Release and Validation History

This page captures the major Eval Labs hardening steps that made the system ready for real Lucia development.

April 2026 milestone

Eval Labs became usable for active Lucia development after the custom prompt, review, tester identity, environment, CORS, and persistence work landed.

Key product changes

Custom prompt launcher

Commit:

884f194 evals: add custom prompt launcher and saved suites

Added:

fork landing page
custom 1–10 prompt launcher
saved custom suites
custom run history
runSource: custom
shared Review Queue reuse

Review completion and navigation

Commit:

3ef6fb5 evals: improve review completion and navigation

Added:

final prompt button label changed from Save & Next to Save
completion action area
top-left brand home navigation
clickable breadcrumbs
dist/ removed from Git tracking

Tester identity exports

Commit:

16f53cd evals: attach tester identity to exports

Added:

TesterIdentity
prompt-level savedBy
top-level exportedBy
reviewer identity in CSV/Markdown exports
Clerk identity normalization

Supabase run item conflict target

Commit:

d51f9b1 evals: fix run item upsert conflict target

Changed the conflict target of the initial eval_run_items upsert from id to run_id,item_index, so conflicts are detected per run slot rather than per row ID.
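The effect of the conflict target can be illustrated with a small in-memory model. This is a sketch, not the Eval Labs code and not the Supabase client: the RunItem shape, the response field, and the upsert helper are hypothetical, and the database's real behavior on the id-keyed path (rejecting the second row if a unique constraint covers the slot) is only noted in a comment.

```typescript
// Hypothetical row shape; only id, run_id and item_index come from the text above.
interface RunItem {
  id: string;
  run_id: string;
  item_index: number;
  response: string | null;
}

type KeyOf = (item: RunItem) => string;

// Two candidate conflict targets.
const byId: KeyOf = (item) => item.id;
const byRunSlot: KeyOf = (item) => `${item.run_id}:${item.item_index}`;

// Minimal upsert model: a row whose key already exists overwrites the stored row,
// any other row is inserted.
function upsert(existing: RunItem[], incoming: RunItem[], keyOf: KeyOf): RunItem[] {
  const table = new Map<string, RunItem>();
  for (const row of existing) table.set(keyOf(row), row);
  for (const row of incoming) table.set(keyOf(row), row);
  return Array.from(table.values());
}

const stored: RunItem[] = [{ id: "a1", run_id: "run-1", item_index: 0, response: null }];
// A later write for the same slot that carries a different client-side id.
const retry: RunItem[] = [{ id: "b7", run_id: "run-1", item_index: 0, response: "hello" }];

// Keyed on id, the retry is treated as a new row, so one slot ends up with two rows.
// (A database with a unique constraint on the slot would reject it instead.)
const keyedOnId = upsert(stored, retry, byId);

// Keyed on the slot, the retry updates the existing row.
const keyedOnSlot = upsert(stored, retry, byRunSlot);
```

In this model, keying on the slot is what turns a repeated write into an update rather than a second row for the same prompt position.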

Supabase row identity reconciliation

Commit:

fd8a366 evals: reconcile run item ids before upsert

Fixed primary-key collisions by reusing existing run item row IDs for the same logical run_id + item_index slot.
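A minimal sketch of this kind of reconciliation, assuming the existing rows for a run are available before the write. The RunItemRow shape and the reconcileRunItemIds name are hypothetical and chosen for illustration; the actual implementation in fd8a366 may differ.

```typescript
// Hypothetical row shape: only the columns needed to identify a slot.
interface RunItemRow {
  id: string;
  run_id: string;
  item_index: number;
}

// Key for the logical slot described above: run_id + item_index.
function slotOf(row: RunItemRow): string {
  return `${row.run_id}:${row.item_index}`;
}

// For each incoming item, reuse the id of an existing row in the same slot.
// Items whose slot has no existing row keep their own id.
function reconcileRunItemIds(existing: RunItemRow[], incoming: RunItemRow[]): RunItemRow[] {
  const idBySlot = new Map<string, string>();
  for (const row of existing) idBySlot.set(slotOf(row), row.id);

  return incoming.map((item) => {
    const existingId = idBySlot.get(slotOf(item));
    return existingId === undefined ? item : { ...item, id: existingId };
  });
}

const existingRows: RunItemRow[] = [{ id: "row-1", run_id: "run-1", item_index: 0 }];
const incomingRows: RunItemRow[] = [
  { id: "tmp-9", run_id: "run-1", item_index: 0 }, // slot already stored
  { id: "tmp-10", run_id: "run-1", item_index: 1 }, // new slot
];

const reconciled = reconcileRunItemIds(existingRows, incomingRows);
const reconciledIds = reconciled.map((row) => row.id); // ["row-1", "tmp-10"]
```

Running the reconciliation before the upsert means the payload sent for an already-stored slot carries that slot's existing row ID.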

Environment hardening

Netlify

Changed Eval Labs endpoint to:

https://api-dev.hellolucia.ai/admin/operator-focus

Render dev Engine

Updated ADMIN_ALLOWED_ORIGINS to include:

https://evaluationlabs.ai
https://www.evaluationlabs.ai
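How an allow-list like this is typically checked can be sketched as follows. This assumes ADMIN_ALLOWED_ORIGINS is a comma-separated string and that the Engine compares the request's Origin header by exact match; both are assumptions, and parseAllowedOrigins and isOriginAllowed are hypothetical helper names.

```typescript
// Parse a comma-separated allow-list into a set of exact origins.
// The comma-separated format is an assumption about ADMIN_ALLOWED_ORIGINS.
function parseAllowedOrigins(raw: string): Set<string> {
  return new Set(
    raw
      .split(",")
      .map((origin) => origin.trim())
      .filter((origin) => origin.length > 0),
  );
}

// An origin is allowed only on an exact string match (scheme + host + port).
function isOriginAllowed(origin: string | undefined, allowed: Set<string>): boolean {
  return origin !== undefined && allowed.has(origin);
}

const allowedOrigins = parseAllowedOrigins(
  "https://evaluationlabs.ai, https://www.evaluationlabs.ai",
);

const apexAllowed = isOriginAllowed("https://evaluationlabs.ai", allowedOrigins);
const wwwAllowed = isOriginAllowed("https://www.evaluationlabs.ai", allowedOrigins);
const lookalikeAllowed = isOriginAllowed("https://evaluationlabs.ai.example", allowedOrigins);
```

Exact matching is why both the apex and the www origin are listed separately: under this scheme, neither implies the other.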

Validation outcome

Validated:

Eval Labs deployed site calls api-dev
Engine returns 200
custom prompt run succeeds
Supabase persistence succeeds
no CORS error
no eval_run_items 409 after latest bundle
exported identity metadata works

Current status

Eval Labs is ready for active Lucia dev refinement and has passed the AI-reviewed platform readiness gate.

Use custom prompt suites as the primary tool for behavior-family refinement. Do not present the gate result as human approval of Lucia's quality.

May 2026 review-layer milestone

Eval Labs gained a full layered review architecture:

adjudication-ready review schema
guided Employee Review fields
suggested review layer
Human Guidance Evaluation
Quick Review UX for non-expert reviewers
review state controls and routing
adjudication queue filters
canon-candidate workflow
JSON, CSV, and Markdown export parity for structured review evidence
lifecycle finalization
Supabase promptRecord payload persistence
dirty/completion state preservation
semantic stepped rating sliders
native-feeling confidence bar visual design

Current doctrine:

Employee reviewers capture reaction.
Senior adjudication assigns canonical meaning.

This should be treated as a major product and Canon milestone, not a cosmetic UI change.

May 2026 product-surface and access milestone

Eval Labs was refined into a more complete internal product surface:

top app shell owns page identity
in-page blog-style mastheads were removed from the app
Custom, Auto-generated, and Controlled Batch Runner surfaces were split
/lucia/auto-generated became the canonical normal generated tester route
/lucia/automated remains a legacy alias
/analysis became the canonical Global Analysis route
/experiments remains a legacy alias
Single Run Analysis was added at /analysis/runs/:sessionId
Run rows were standardized with two-zone layout and Copy dropdown patterns
Copy Session ID / Copy Deep Link controls were added across key surfaces
Global Analysis loading was fixed to show immediately
role-gated owner/admin/evaluator behavior was added as the initial product gate
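The canonical and legacy routes listed above can be summarized in a small lookup. This is an illustrative sketch only: whether the app redirects legacy aliases or renders the same page under both paths is not stated here, and canonicalPath and singleRunAnalysisPath are hypothetical helper names.

```typescript
// Legacy alias -> canonical route, taken from the list above.
const LEGACY_ALIASES = new Map<string, string>([
  ["/lucia/automated", "/lucia/auto-generated"],
  ["/experiments", "/analysis"],
]);

// Resolve a path to its canonical form; non-alias paths pass through unchanged.
function canonicalPath(path: string): string {
  return LEGACY_ALIASES.get(path) ?? path;
}

// Build a path for the Single Run Analysis route, /analysis/runs/:sessionId.
// URL-encoding the session ID is a defensive assumption, not documented behavior.
function singleRunAnalysisPath(sessionId: string): string {
  return `/analysis/runs/${encodeURIComponent(sessionId)}`;
}

const fromLegacyTester = canonicalPath("/lucia/automated"); // "/lucia/auto-generated"
const fromLegacyAnalysis = canonicalPath("/experiments"); // "/analysis"
const runLink = singleRunAnalysisPath("abc 1"); // "/analysis/runs/abc%201"
```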

Historical access limitation at this milestone:

Backend/RLS enforcement was still required before external evaluator rollout.

Current role and RLS posture is documented in Current System State and Eval Labs Roles and Access Matrix.

May 2026 AI-reviewed platform readiness gate

Final gate result:

60 / 60 completed runs
3,000 expected prompts
3,000 eval_run_items
3,000 Lucia responses
3,000 reviews

Supabase verification result:

ready | 60 | 3000 | 3000 | 3000 | 3000
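The agreement this result expresses can be sketched as a simple check. It assumes the columns of the verification row correspond, in order, to the status and the five counts in the final gate result above; the GateCounts shape, the gateStatus helper, and the "not_ready" label are hypothetical.

```typescript
// Hypothetical shape mirroring the five counts in the final gate result.
interface GateCounts {
  completedRuns: number;
  expectedPrompts: number;
  runItems: number;
  responses: number;
  reviews: number;
}

// "ready" only when every run completed and every per-prompt count
// matches the number of expected prompts.
function gateStatus(counts: GateCounts, expectedRuns: number): "ready" | "not_ready" {
  const perPromptCounts = [counts.runItems, counts.responses, counts.reviews];
  const allMatch = perPromptCounts.every((n) => n === counts.expectedPrompts);
  return counts.completedRuns === expectedRuns && allMatch ? "ready" : "not_ready";
}

const finalGate: GateCounts = {
  completedRuns: 60,
  expectedPrompts: 3000,
  runItems: 3000,
  responses: 3000,
  reviews: 3000,
};

const finalGateStatus = gateStatus(finalGate, 60); // "ready"
```

A single missing review or response would break the agreement, which is what makes the all-equal row meaningful as a gate.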

localStorage verification:

sessionCount = 60
persistedLocalFullPayloadSessionCount = 0
persistedLocalHasItemLevelData = false
persistedLocalItemLevelDataSessionCount = 0
ownedSessionCount = 60
otherOwnerSessionCount = 0
ownerlessSessionCount = 0
rawByteSize around 68,815

This proves:

run creation
Lucia response capture
review generation
review persistence
run finalization
Run History truth
Global Analysis truth
Supabase/UI count agreement
localStorage compactness
controlled batch lifecycle
no visible cross-owner local leak in the tested owner context

This does not prove:

Lucia is human-approved
Lucia is ready for real operator use
employee rollout is complete
human evaluators agree with AI scoring
backend/RLS permissions are complete by themselves