This page records the May 2026 review-layer evolution: shared run launchers, employee review, suggested selections, Human Guidance Evaluation, adjudication-ready schema, exports, queue filters, lifecycle finalization, and the later platform-readiness split between normal testing and controlled batch gates.
May 2026 review-layer milestone
Eval Labs evolved from a prompt runner into a layered review product. The key change:
- custom or automated run
- shared Review Queue
- suggested selections plus human review
- lifecycle finalization and export
Major shipped changes
Adjudication-ready schema
Added review model support for:
Employee Review layer
Added guided employee-review fields:
Suggested review layer
Added app-suggested review values for:
Review Queue UX
The Review Queue now favors guided employee judgment:
- single-column Quick Review flow
- numbered question cards
- separate selection boxes
- suggested selections
- reduced freeform text burden
- senior-review routing
- canon-candidate routing
- Human Guidance Evaluation
- Save / Save & Next / Save & next flagged flows
- search and workflow filters
- JSON, CSV, and Markdown export controls
- finalization after all prompts are reviewed
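The last item in the list above, finalization only after all prompts are reviewed, can be illustrated with a small gate function. This is a minimal sketch, not the app's code: `PromptReviewState`, `FinalizeCheck`, `canFinalizeRun`, and their fields are hypothetical names. Treating unsaved (dirty) prompts and empty runs as blocking is an assumption.

```typescript
// Hypothetical shape: field names are illustrative, not the app's schema.
type PromptReviewState = {
  promptId: string;
  reviewed: boolean; // a completed review has been saved
  dirty: boolean;    // unsaved edits exist
};

type FinalizeCheck =
  | { ok: true }
  | { ok: false; blocking: string[] };

// A run can be finalized only when every prompt has been reviewed.
// Assumptions: dirty prompts also block, and an empty run is not finalizable.
function canFinalizeRun(prompts: PromptReviewState[]): FinalizeCheck {
  if (prompts.length === 0) return { ok: false, blocking: [] };
  const blocking = prompts
    .filter((p) => !p.reviewed || p.dirty)
    .map((p) => p.promptId);
  return blocking.length === 0 ? { ok: true } : { ok: false, blocking };
}
```

Returning the blocking prompt IDs rather than a bare boolean would let a queue view point the reviewer at what is still open.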
Semantic confidence sliders
The “How did Lucia do?” scoring section moved from 1–10 button rows to stepped semantic confidence sliders. The final design direction:
Adjudication queue filters
Added workflow queue filters for:
Exports
JSON, CSV, and Markdown exports now preserve structured review, suggested review, Employee Review, Human Guidance Evaluation, adjudication metadata, lifecycle state, tester identity, and prompt dirty/completion state.
Supabase persistence
Supabase persistence now stores run lifecycle metadata on eval_runs, embeds the full case and prompt review record in eval_run_items.payload, and writes eval_item_reviews rows for review persistence.
Hydration prefers the embedded eval_run_items.payload.promptRecord over fragile review-table reads.
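The hydration preference can be sketched as a pure function over rows that have already been fetched. Only the names eval_run_items, payload.promptRecord, and eval_item_reviews come from the text above. The row shapes, the `hydratePromptRecord` helper, and the fallback to review rows when no embedded record exists are assumptions made for illustration.

```typescript
// Hypothetical record and row shapes; the real columns may differ.
type PromptRecord = { promptId: string; review: Record<string, unknown> };

type EvalRunItemRow = {
  id: string;
  payload: { promptRecord?: PromptRecord } | null;
};

type EvalItemReviewRow = {
  item_id: string;
  prompt_id: string;
  review: Record<string, unknown>;
};

// Prefer the record embedded in eval_run_items.payload. Only when it is
// missing, fall back to a matching eval_item_reviews row (assumed behavior).
function hydratePromptRecord(
  item: EvalRunItemRow,
  reviewRows: EvalItemReviewRow[],
): PromptRecord | null {
  const embedded = item.payload?.promptRecord;
  if (embedded) return embedded;
  const row = reviewRows.find((r) => r.item_id === item.id);
  return row ? { promptId: row.prompt_id, review: row.review } : null;
}
```

Keeping the function free of database calls means the preference order can be tested without a Supabase connection.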
Current doctrine impact
This release established a new Eval Labs principle:
Product surface refinement
After the review-layer release, Eval Labs was refined into a clearer product surface:
- top app shell owns page identity
- in-page blog-style mastheads were removed from the app
- Custom Prompt Test, Auto-generated Prompt Test, and Controlled Batch Runner are separate surfaces
- Controlled Batch Runner is controlled readiness tooling; current access is owner/admin/evaluator, not tester
- Auto-generated Prompt Test remains the normal 50-prompt generated tester
- Run History rows use a standardized two-zone layout
- copy controls use Copy Session ID / Copy Deep Link patterns across key surfaces
- Single Run Analysis gives read-only run-level evidence outside the Review Queue
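The Controlled Batch Runner access rule from the list above (owner/admin/evaluator, not tester) could be expressed as a small role check. This is a sketch under stated assumptions: the `Role` and `Surface` identifiers and the `canAccessSurface` helper are hypothetical, and leaving the other two surfaces open to every role is an assumption rather than something the notes state.

```typescript
// Role names follow the access rule in the notes; the string identifiers
// for the three surfaces are hypothetical.
type Role = "owner" | "admin" | "evaluator" | "tester";
type Surface =
  | "custom-prompt-test"
  | "auto-generated-prompt-test"
  | "controlled-batch-runner";

const BATCH_RUNNER_ROLES: ReadonlySet<Role> = new Set<Role>([
  "owner",
  "admin",
  "evaluator",
]);

function canAccessSurface(role: Role, surface: Surface): boolean {
  if (surface === "controlled-batch-runner") {
    return BATCH_RUNNER_ROLES.has(role);
  }
  // Assumption: normal testing surfaces are available to all four roles.
  return true;
}
```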

