This page captures the major Eval Labs hardening steps that made the system ready for real Lucia development.

April 2026 milestone

Eval Labs became usable for active Lucia development after the custom prompt, review, tester identity, environment, CORS, and persistence work landed.

Key product changes

Custom prompt launcher

Commit:
884f194 evals: add custom prompt launcher and saved suites
Added:
  • fork landing page
  • custom 1–10 prompt launcher
  • saved custom suites
  • custom run history
  • runSource: custom
  • shared Review Queue reuse
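
A minimal sketch of how a launcher could enforce the 1–10 prompt limit and tag the run as custom. Only the limit and the runSource: custom value come from the notes above; the function name, the request shape, and the blank-prompt handling are hypothetical.

```typescript
// Hypothetical request shape; only `runSource: "custom"` is taken from the notes.
interface CustomRunRequest {
  runSource: "custom";
  prompts: string[];
}

const MIN_PROMPTS = 1;
const MAX_PROMPTS = 10;

// Builds a custom run request, rejecting anything outside the 1 to 10 prompt range.
function buildCustomRun(rawPrompts: string[]): CustomRunRequest {
  // Drop blank entries so an empty input box does not count as a prompt.
  const prompts = rawPrompts.map((p) => p.trim()).filter((p) => p.length > 0);
  if (prompts.length < MIN_PROMPTS || prompts.length > MAX_PROMPTS) {
    throw new RangeError(
      `Custom runs take ${MIN_PROMPTS} to ${MAX_PROMPTS} prompts, got ${prompts.length}`
    );
  }
  return { runSource: "custom", prompts };
}
```

Validating before the run is created keeps malformed suites out of run history and the shared Review Queue.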

Review completion and navigation

Commit:
3ef6fb5 evals: improve review completion and navigation
Changed:
  • final prompt button now reads Save instead of Save & Next
  • completion action area added
  • top-left brand now navigates home
  • breadcrumbs are clickable
  • dist/ removed from Git tracking

Tester identity exports

Commit:
16f53cd evals: attach tester identity to exports
Added:
  • TesterIdentity
  • prompt-level savedBy
  • top-level exportedBy
  • reviewer identity in CSV/Markdown exports
  • Clerk identity normalization
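
A sketch of what identity normalization and attachment might look like. The notes name TesterIdentity, savedBy, and exportedBy but not their fields, so every field name below is an assumption. ClerkLikeUser is a hypothetical stand-in for a signed-in user, not the real Clerk type.

```typescript
// Hypothetical minimal view of a signed-in user (assumed fields, not Clerk's actual types).
interface ClerkLikeUser {
  id: string;
  fullName?: string | null;
  primaryEmailAddress?: { emailAddress: string } | null;
}

// Assumed shape: the notes name TesterIdentity but do not list its fields.
interface TesterIdentity {
  userId: string;
  displayName: string;
  email: string | null;
}

// Collapses the optional user fields into one stable identity record.
function normalizeTesterIdentity(user: ClerkLikeUser): TesterIdentity {
  const email = user.primaryEmailAddress?.emailAddress.trim().toLowerCase() || null;
  // Fall back from name to email to raw id so exports never carry a blank reviewer.
  const displayName = user.fullName?.trim() || email || user.id;
  return { userId: user.id, displayName, email };
}

// Hypothetical export shapes showing prompt-level savedBy and top-level exportedBy.
interface ExportedPrompt {
  prompt: string;
  savedBy?: TesterIdentity;
}

interface ExportPayload {
  exportedBy: TesterIdentity;
  prompts: ExportedPrompt[];
}

function buildExport(prompts: ExportedPrompt[], exporter: TesterIdentity): ExportPayload {
  return { exportedBy: exporter, prompts };
}
```

Keeping savedBy per prompt and exportedBy at the top level lets an export distinguish who reviewed each prompt from who produced the file.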

Supabase run item conflict target

Commit:
d51f9b1 evals: fix run item upsert conflict target
Changed the conflict target of the initial eval_run_items upsert from id to the composite run_id,item_index.
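
To illustrate why the conflict target matters, here is a toy in-memory model of a table with a primary key on id and a unique constraint on (run_id, item_index). It is a sketch, not the real Supabase client or the code in the commit. The class name, the response column, and the "409" error text are hypothetical. With supabase-js the target is typically passed through the onConflict option of upsert; whether the commit uses that client is an assumption.

```typescript
interface RunItemRow {
  id: string;
  run_id: string;
  item_index: number;
  response: string; // hypothetical payload column
}

type ConflictTarget = "id" | "run_id,item_index";

const slotKey = (r: RunItemRow): string => `${r.run_id}:${r.item_index}`;

// Toy stand-in for a table with PK(id) and UNIQUE(run_id, item_index).
class ToyRunItemsTable {
  private rows: RunItemRow[] = [];

  upsert(incoming: RunItemRow, target: ConflictTarget): "inserted" | "updated" {
    // The conflict target decides which existing row, if any, gets updated.
    const index = this.rows.findIndex((r) =>
      target === "id" ? r.id === incoming.id : slotKey(r) === slotKey(incoming)
    );
    // Any other row sharing the id or the slot violates a constraint
    // that the chosen target does not handle.
    const clash = this.rows.some(
      (r, i) => i !== index && (r.id === incoming.id || slotKey(r) === slotKey(incoming))
    );
    if (clash) throw new Error("409: unique constraint violation");
    if (index === -1) {
      this.rows.push({ ...incoming });
      return "inserted";
    }
    this.rows[index] = { ...incoming };
    return "updated";
  }
}
```

In this model, re-saving the same slot with a freshly generated id fails when the target is id, because the slot constraint is hit without being handled. It succeeds as an update when the target is run_id,item_index.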

Supabase row identity reconciliation

Commit:
fd8a366 evals: reconcile run item ids before upsert
Fixed primary-key collisions by reusing existing run item row IDs for the same logical run_id + item_index slot.
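
A minimal sketch of the reconciliation idea: before upserting, look up which ids already occupy each run_id + item_index slot and reuse them, so a client-generated id never competes with the persisted one. The function name and row shape are hypothetical, and the actual commit may differ.

```typescript
interface RunItem {
  id: string;
  run_id: string;
  item_index: number;
}

// Returns the incoming rows with ids swapped for the persisted id of the same slot.
function reconcileRunItemIds<T extends RunItem>(existing: RunItem[], incoming: T[]): T[] {
  // Index persisted ids by logical slot.
  const idBySlot = new Map<string, string>();
  for (const row of existing) {
    idBySlot.set(`${row.run_id}:${row.item_index}`, row.id);
  }
  // Reuse the persisted id when the slot exists; otherwise keep the new id.
  return incoming.map((row) => {
    const persistedId = idBySlot.get(`${row.run_id}:${row.item_index}`);
    return persistedId === undefined ? row : { ...row, id: persistedId };
  });
}
```

For example, if slot (r1, 0) is already stored under id "a", an incoming row for that slot with id "x" is rewritten to id "a", while a row for the unused slot (r1, 1) keeps its own id.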

Environment hardening

Netlify

Changed Eval Labs endpoint to:
https://api-dev.hellolucia.ai/admin/operator-focus

Render dev Engine

Updated ADMIN_ALLOWED_ORIGINS to include:
https://evaluationlabs.ai
https://www.evaluationlabs.ai
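
A sketch of how an allow-list like this is commonly checked. It assumes ADMIN_ALLOWED_ORIGINS is a comma-separated string, which the notes do not state, and the function names are hypothetical. Origin matching is an exact string comparison, which is why the apex and www hosts are listed separately.

```typescript
// Assumes a comma-separated value; the real format of ADMIN_ALLOWED_ORIGINS is not documented here.
function parseAllowedOrigins(raw: string): Set<string> {
  return new Set(
    raw
      .split(",")
      .map((origin) => origin.trim().replace(/\/+$/, "")) // tolerate spaces and trailing slashes
      .filter((origin) => origin.length > 0)
  );
}

// Returns CORS response headers for an allowed origin, or no headers otherwise.
function corsHeadersFor(origin: string | undefined, allowed: Set<string>): Record<string, string> {
  if (origin === undefined || !allowed.has(origin)) return {};
  // Echo the specific origin and vary on it so caches keep responses separate.
  return { "Access-Control-Allow-Origin": origin, Vary: "Origin" };
}
```

A request from an origin missing from the list gets no Access-Control-Allow-Origin header, and the browser blocks the response as a CORS error.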

Validation outcome

Validated:
  • Eval Labs deployed site calls api-dev
  • Engine returns 200
  • custom prompt run succeeds
  • Supabase persistence succeeds
  • no CORS error
  • no eval_run_items 409 (Conflict) errors after the latest bundle
  • exported identity metadata works

Current status

Eval Labs is ready for active Lucia dev refinement and has passed the AI-reviewed platform readiness gate.
Use custom prompt suites as the primary tool for behavior-family refinement. The gate result is not human approval of Lucia's quality and should not be presented as one.

May 2026 review-layer milestone

Eval Labs gained a full layered review architecture:
  • adjudication-ready review schema
  • guided Employee Review fields
  • suggested review layer
  • Human Guidance Evaluation
  • Quick Review UX for non-expert reviewers
  • review state controls and routing
  • adjudication queue filters
  • canon-candidate workflow
  • JSON, CSV, and Markdown export parity for structured review evidence
  • lifecycle finalization
  • Supabase promptRecord payload persistence
  • dirty/completion state preservation
  • semantic stepped rating sliders
  • native-feeling confidence bar visual design
Current doctrine:
Employee reviewers capture reaction.
Senior adjudication assigns canonical meaning.
This should be treated as a major product and Canon milestone, not a cosmetic UI change.

May 2026 product-surface and access milestone

Eval Labs was refined into a more complete internal product surface:
  • top app shell owns page identity
  • in-page blog-style mastheads were removed from the app
  • Custom, Auto-generated, and Controlled Batch Runner surfaces were split
  • /lucia/auto-generated became the canonical normal generated tester route
  • /lucia/automated remains a legacy alias
  • /analysis became the canonical Global Analysis route
  • /experiments remains a legacy alias
  • Single Run Analysis was added at /analysis/runs/:sessionId
  • Run rows were standardized with two-zone layout and Copy dropdown patterns
  • Copy Session ID / Copy Deep Link controls were added across key surfaces
  • Global Analysis loading was fixed to show immediately
  • role-gated owner/admin/evaluator behavior was added as the initial product gate
Historical access limitation at this milestone:
Backend/RLS enforcement was still required before external evaluator rollout.
Current role and RLS posture is documented in Current System State and Eval Labs Roles and Access Matrix.
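
The canonical routes and legacy aliases from this milestone can be sketched as a small lookup. The route strings come from the list above; the function names are hypothetical, and the notes do not say whether the app redirects a legacy alias or renders it in place.

```typescript
// Legacy alias -> canonical route, as listed in this milestone.
const LEGACY_ROUTE_ALIASES = new Map<string, string>([
  ["/lucia/automated", "/lucia/auto-generated"],
  ["/experiments", "/analysis"],
]);

// Resolves a path to its canonical form; unknown paths pass through unchanged.
function canonicalPath(path: string): string {
  // Ignore trailing slashes, but keep the root path "/" intact.
  const trimmed = path.length > 1 ? path.replace(/\/+$/, "") : path;
  return LEGACY_ROUTE_ALIASES.get(trimmed) ?? trimmed;
}

// Builds the Single Run Analysis path for a session.
function singleRunAnalysisPath(sessionId: string): string {
  return `/analysis/runs/${encodeURIComponent(sessionId)}`;
}
```

A helper like singleRunAnalysisPath is also one way a Copy Deep Link control could derive its target, though that wiring is an assumption.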

May 2026 AI-reviewed platform readiness gate

Final gate result:
60 / 60 completed runs
3,000 expected prompts
3,000 eval_run_items
3,000 Lucia responses
3,000 reviews
Supabase verification result:
ready | 60 | 3000 | 3000 | 3000 | 3000
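
The count agreement behind a result like this can be expressed as a simple check. This is a sketch only: the actual verification query is not shown in these notes, and the field names and the "not_ready" label are hypothetical.

```typescript
interface GateCounts {
  completedRuns: number;
  expectedPrompts: number;
  runItems: number;
  responses: number;
  reviews: number;
}

// "ready" only when every run completed and every expected prompt
// has a run item, a response, and a review.
function gateStatus(expectedRuns: number, counts: GateCounts): "ready" | "not_ready" {
  const allRunsDone = counts.completedRuns === expectedRuns;
  const perPromptCounts = [counts.runItems, counts.responses, counts.reviews];
  const allPromptsCovered = perPromptCounts.every((n) => n === counts.expectedPrompts);
  return allRunsDone && allPromptsCovered ? "ready" : "not_ready";
}
```

With the figures above (60 runs and 3,000 for each per-prompt count) this returns "ready"; a single missing review would return "not_ready".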
localStorage verification:
sessionCount = 60
persistedLocalFullPayloadSessionCount = 0
persistedLocalHasItemLevelData = false
persistedLocalItemLevelDataSessionCount = 0
ownedSessionCount = 60
otherOwnerSessionCount = 0
ownerlessSessionCount = 0
rawByteSize around 68,815
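
A sketch of how ownership and item-level metrics like these could be derived from a compact local session index. The entry shape and function name are hypothetical; only the metric names come from the verification output above, and the real check may compute them differently.

```typescript
// Hypothetical compact entry kept in localStorage for each session.
interface LocalSessionEntry {
  sessionId: string;
  ownerId?: string | null;
  items?: unknown[]; // item-level data, expected to live in Supabase rather than locally
}

interface LocalStorageSummary {
  sessionCount: number;
  ownedSessionCount: number;
  otherOwnerSessionCount: number;
  ownerlessSessionCount: number;
  persistedLocalItemLevelDataSessionCount: number;
  persistedLocalHasItemLevelData: boolean;
}

function summarizeLocalSessions(
  entries: LocalSessionEntry[],
  currentOwnerId: string
): LocalStorageSummary {
  let owned = 0;
  let other = 0;
  let ownerless = 0;
  let withItems = 0;
  for (const entry of entries) {
    // Classify each session by owner relative to the signed-in tester.
    if (!entry.ownerId) ownerless += 1;
    else if (entry.ownerId === currentOwnerId) owned += 1;
    else other += 1;
    // Count sessions that still carry item-level data locally.
    if (Array.isArray(entry.items) && entry.items.length > 0) withItems += 1;
  }
  return {
    sessionCount: entries.length,
    ownedSessionCount: owned,
    otherOwnerSessionCount: other,
    ownerlessSessionCount: ownerless,
    persistedLocalItemLevelDataSessionCount: withItems,
    persistedLocalHasItemLevelData: withItems > 0,
  };
}
```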
This proves:
  • run creation
  • Lucia response capture
  • review generation
  • review persistence
  • run finalization
  • Run History truth
  • Global Analysis truth
  • Supabase/UI count agreement
  • localStorage compactness
  • controlled batch lifecycle
  • no visible cross-owner local leak in the tested owner context
This does not prove:
  • Lucia is human-approved
  • Lucia is ready for real operator use
  • employee rollout is complete
  • human evaluators agree with AI scoring
  • backend/RLS permissions are complete by themselves
Read next: AI-Reviewed Platform Readiness Gate.