Eval Labs now has separate surfaces for owner/admin oversight, evaluator workbench testing, tester prompt-testing, guest-facing verification, controlled platform checks, derived Registry Diagnostics, persisted Behavioral Observatory labels, Run History, Team Review, Global Analysis, Single Run Analysis, and Review Queue work.
Route map
| Route | Surface | Current access intent |
|---|---|---|
| / | Role-aware home / landing | Owner/admin/evaluator/tester by role |
| /lucia/launcher | Launcher / workspace chooser | Owner/admin/evaluator/tester |
| /lucia/custom | Custom Prompt Test | Owner/admin/evaluator/tester |
| /lucia/custom/suites/:suiteId | Custom saved suite deep link | Owner/admin |
| /lucia/auto-generated | Auto-generated Prompt Test | Owner/admin/evaluator/tester |
| /lucia/automated | Legacy Auto-generated Prompt Test alias | Owner/admin/evaluator/tester |
| /guest-facing/verification | Guest Facing Agent Verification Check | Owner/admin/evaluator |
| /guest-facing/verification?view=check | Guest Facing Agent Verification Check | Owner/admin/evaluator |
| /guest-facing/verification/results | Verification Results | Owner/admin/evaluator |
| /lucia/batch-runner | Controlled Batch Runner | Owner/admin/evaluator |
| /lucia/automated/runs | Run History | Owner/admin global; evaluator/tester own or scoped runs |
| /team-review | Team Review overview | Owner/admin |
| /team-review/evaluators/:evaluatorKey | Team Review evaluator detail | Owner/admin |
| /registry-diagnostics | Registry Diagnostics | Owner/admin |
| /dataset-diagnostics | Legacy Registry Diagnostics alias | Owner/admin |
| /behavioral-observatory | Behavioral Observatory | Owner/admin |
| /analysis | Global Analysis | Owner/admin |
| /experiments | Legacy Global Analysis alias | Owner/admin |
| /analysis/runs/:sessionId | Single Run Analysis | Owner/admin |
| /runs/:sessionId/running | In-flight run route | Owner/admin; evaluator/tester only for own scoped runs |
| /runs/:sessionId/review | Review Queue | Owner/admin; evaluator/tester only for own scoped runs |
| /runs/:sessionId/review?eval=:caseId | Direct eval-item review link | Owner/admin; evaluator/tester only for own scoped runs |
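The access column above mixes two kinds of rule: a plain role list, and "owner/admin always, other roles only for their own runs". The sketch below encodes a subset of the table that way. It is a hypothetical illustration, not the Eval Labs implementation: the `Role` union, the `ROUTE_ACCESS` map, and `canAccess` are assumed names, and the caller is assumed to know whether the current user owns the run.

```typescript
// Hypothetical sketch: role names, ROUTE_ACCESS, and canAccess are illustrative assumptions.
// "unassigned" covers users with no role, as mentioned under Team Review.
type Role = "owner" | "admin" | "evaluator" | "tester" | "unassigned";

type Access =
  // Any listed role may open the route.
  | { kind: "roles"; roles: readonly Role[] }
  // Owner/admin may open any run; the listed roles only runs they own.
  | { kind: "ownRuns"; roles: readonly Role[] };

const PRIVILEGED: readonly Role[] = ["owner", "admin"];
const WORKSPACE_ROLES: readonly Role[] = ["owner", "admin", "evaluator", "tester"];
const EVALUATOR_AND_UP: readonly Role[] = ["owner", "admin", "evaluator"];

// A subset of the route map, keyed by route pattern.
const ROUTE_ACCESS = new Map<string, Access>([
  ["/lucia/custom", { kind: "roles", roles: WORKSPACE_ROLES }],
  ["/lucia/custom/suites/:suiteId", { kind: "roles", roles: PRIVILEGED }],
  ["/guest-facing/verification", { kind: "roles", roles: EVALUATOR_AND_UP }],
  ["/lucia/batch-runner", { kind: "roles", roles: EVALUATOR_AND_UP }],
  ["/team-review", { kind: "roles", roles: PRIVILEGED }],
  ["/analysis", { kind: "roles", roles: PRIVILEGED }],
  ["/runs/:sessionId/review", { kind: "ownRuns", roles: ["evaluator", "tester"] }],
]);

function canAccess(role: Role, routePattern: string, ownsRun = false): boolean {
  const access = ROUTE_ACCESS.get(routePattern);
  if (access === undefined) return false; // unknown routes are denied by default
  if (access.kind === "roles") return access.roles.includes(role);
  // "ownRuns": owner/admin see every run; other listed roles only their own runs.
  return PRIVILEGED.includes(role) || (access.roles.includes(role) && ownsRun);
}
```

Denying unmatched routes by default keeps a newly added route from silently becoming visible to testers before its row is added.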
Surface definitions
Role-aware home
The home route is role-aware. Owner/admin see the privileged platform overview. Evaluator and tester users should see onboarding and workspace entry points appropriate to their role.
Launcher
The Launcher is the workspace chooser. It separates:
- Custom Prompt Test
- Auto-generated Prompt Test
- Guest Facing Agent Verification Check
- Controlled Batch Runner
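Combining the role-aware home with the route map's access column gives a simple rule for which Launcher entries each role sees. The sketch below assumes hypothetical names (`LAUNCHER_BY_ROLE`, `HomeView`, `homeViewFor`); the only facts it encodes are the ones stated in this document, namely that owner/admin get the platform overview and that testers cannot open the Guest Facing Agent Verification Check or the Controlled Batch Runner.

```typescript
// Hypothetical sketch of role-aware home and launcher selection; names are assumptions.
type Role = "owner" | "admin" | "evaluator" | "tester";

type Workspace =
  | "Custom Prompt Test"
  | "Auto-generated Prompt Test"
  | "Guest Facing Agent Verification Check"
  | "Controlled Batch Runner";

const ALL_WORKSPACES: readonly Workspace[] = [
  "Custom Prompt Test",
  "Auto-generated Prompt Test",
  "Guest Facing Agent Verification Check",
  "Controlled Batch Runner",
];

// Launcher entries each role may open, following the route map's access column.
const LAUNCHER_BY_ROLE: Record<Role, readonly Workspace[]> = {
  owner: ALL_WORKSPACES,
  admin: ALL_WORKSPACES,
  evaluator: ALL_WORKSPACES,
  tester: ["Custom Prompt Test", "Auto-generated Prompt Test"],
};

type HomeView =
  | { kind: "platform-overview" }
  | { kind: "onboarding"; entryPoints: readonly Workspace[] };

// Owner/admin land on the privileged overview; other roles get onboarding entry points.
function homeViewFor(role: Role): HomeView {
  if (role === "owner" || role === "admin") return { kind: "platform-overview" };
  return { kind: "onboarding", entryPoints: LAUNCHER_BY_ROLE[role] };
}
```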
Custom prompt tester
The Custom Prompt Test lets a user enter 1-10 exact prompts. Use it for targeted testing, evaluator work, and repeatable behavior-family review. Owner/admin, evaluator, and tester roles can use this surface.
Auto-generated prompt tester
The Auto-generated Prompt Test runs the standard generated 50-prompt test. Owner/admin, evaluator, and tester roles can use this surface. It is separate from the controlled batch infrastructure.
Guest Facing Agent Verification Check
Guest Facing Agent Verification Check runs the booked-guest verification scenario pack through the app surface. Owner/admin and evaluator roles can use this surface; the tester role cannot.
Verification Results
Verification Results shows the saved or current Guest Facing Agent verification output, scenario failures, exports, and copied summaries. Owner/admin and evaluator roles can use this surface; the tester role cannot.
Controlled batch runner
The Controlled Batch Runner is platform-readiness tooling for controlled checks. It supports:
- 1-run smoke
- 3-run checkpoint
- 10-run checkpoint
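The size rules stated so far (1-10 prompts for the Custom Prompt Test, a fixed 50-prompt generated test, and 1, 3, or 10 runs for the Controlled Batch Runner) can be checked before a run starts. This is a hypothetical sketch: `RunRequest` and `validateRunRequest` are assumed names, and treating blank prompts as not counting toward the 1-10 limit is an assumption. Prompts themselves are left untouched, since the Custom Prompt Test sends exact prompts.

```typescript
// Hypothetical request shapes; field and function names are illustrative assumptions.
type RunRequest =
  | { kind: "custom"; prompts: string[] }
  | { kind: "auto-generated" }
  | { kind: "batch"; runs: number };

// 1-run smoke, 3-run checkpoint, 10-run checkpoint.
const BATCH_SIZES = [1, 3, 10] as const;

// Returns an error message, or null when the request fits the documented limits.
function validateRunRequest(req: RunRequest): string | null {
  switch (req.kind) {
    case "custom": {
      // Assumption: blank entries do not count; non-blank prompts are not modified.
      const count = req.prompts.filter((p) => p.trim().length > 0).length;
      return count >= 1 && count <= 10
        ? null
        : `Custom Prompt Test needs 1-10 prompts, got ${count}`;
    }
    case "auto-generated":
      return null; // the generator always produces the 50-prompt test
    case "batch":
      return (BATCH_SIZES as readonly number[]).includes(req.runs)
        ? null
        : `Batch runner supports ${BATCH_SIZES.join(", ")} runs, got ${req.runs}`;
  }
}
```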
Run History
Run History is the scoped run ledger. It holds the authoritative record of completed and finalized runs and may also show scoped operational state. Owner/admin can inspect shared, global persisted evidence. Evaluator and tester access is limited to their own permitted work.
Team Review
Team Review is the owner/admin oversight surface. It groups evaluator activity, flags review gaps, and helps owners and admins decide where the human evaluation signal needs closer inspection. It is not available to evaluator, tester, or unassigned users.
Registry Diagnostics
Registry Diagnostics is the read-only diagnostic surface for the Dataset Registry and Human Review Queue 2.0 classification model. It shows derived suggestions from existing Eval Labs run/review data:
- dataset membership suggestions
- confidence
- source fields
- queue lane suggestions
- Noise / Watch classification patterns
/registry-diagnostics is canonical.
/dataset-diagnostics remains a legacy inbound alias.
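Because Registry Diagnostics only derives suggestions from existing run/review data, a natural model is an immutable record per suggestion plus pure functions over those records. The sketch below is hypothetical: the `RegistrySuggestion` field names, the assumed 0..1 confidence scale, and the `"noise" | "watch" | null` pattern field are illustrative guesses at the items listed above, not the real schema.

```typescript
// Hypothetical shape of one derived suggestion; all field names are assumptions.
interface RegistrySuggestion {
  readonly runId: string;
  readonly suggestedDataset: string; // dataset membership suggestion
  readonly confidence: number; // assumed to be on a 0..1 scale
  readonly sourceFields: readonly string[]; // run/review fields the suggestion was derived from
  readonly suggestedLane: string; // queue lane suggestion
  readonly pattern: "noise" | "watch" | null; // Noise / Watch classification, if any
}

// Read-only view: groups suggestions by suggested lane without mutating the input.
function groupByLane(
  suggestions: readonly RegistrySuggestion[],
  minConfidence = 0,
): Map<string, RegistrySuggestion[]> {
  const lanes = new Map<string, RegistrySuggestion[]>();
  for (const s of suggestions) {
    if (s.confidence < minConfidence) continue; // skip low-confidence suggestions
    const bucket = lanes.get(s.suggestedLane) ?? [];
    bucket.push(s);
    lanes.set(s.suggestedLane, bucket);
  }
  return lanes;
}
```

Keeping the records `readonly` mirrors the surface's read-only contract: diagnostics can be filtered and grouped for display, but nothing here writes back to run or review data.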
Behavioral Observatory
Behavioral Observatory is the first-class behavioral labeling surface. It lets a reviewer inspect conversations and save structured labels:
- Intent
- Guest Affect
- Response Strategy
- Humanness
- Notes
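A saved Behavioral Observatory label can be pictured as one record per conversation holding the five fields above. The value vocabularies for Intent, Guest Affect, Response Strategy, and Humanness are not specified here, so this hypothetical sketch keeps them as plain strings. It also assumes, without confirmation from the source, that the four structured fields are required before saving while Notes is optional free text. `BehavioralLabel` and `missingLabelFields` are illustrative names.

```typescript
// Hypothetical label record; field names and the required/optional split are assumptions.
interface BehavioralLabel {
  conversationId: string;
  intent: string;
  guestAffect: string;
  responseStrategy: string;
  humanness: string;
  notes?: string; // free text, assumed optional
}

// Returns the display names of required fields that are still blank.
function missingLabelFields(label: BehavioralLabel): string[] {
  const required: Array<[string, string]> = [
    ["Intent", label.intent],
    ["Guest Affect", label.guestAffect],
    ["Response Strategy", label.responseStrategy],
    ["Humanness", label.humanness],
  ];
  return required.filter(([, value]) => value.trim() === "").map(([name]) => name);
}
```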
Global Analysis
Global Analysis is the read-only owner/admin surface for behavioral and analytics review. It shows AI-analyzed platform evidence, not human approval of Lucia quality.
/analysis is canonical.
/experiments remains a legacy alias.
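The legacy aliases in the route map (/lucia/automated, /dataset-diagnostics, /experiments) each resolve to one canonical route, and the Review Queue has a direct eval-item link form. The sketch below shows one way to handle both. `canonicalPath` and `evalReviewLink` are hypothetical helper names. Aliases match only exact paths, so /lucia/automated/runs (Run History) is deliberately not rewritten.

```typescript
// Legacy inbound aliases and their canonical routes, taken from the route map.
// A Map avoids accidental hits on inherited object keys such as "toString".
const LEGACY_ALIASES = new Map<string, string>([
  ["/lucia/automated", "/lucia/auto-generated"],
  ["/dataset-diagnostics", "/registry-diagnostics"],
  ["/experiments", "/analysis"],
]);

// Hypothetical helper: maps an exact legacy path to its canonical route; other paths pass through.
function canonicalPath(path: string): string {
  return LEGACY_ALIASES.get(path) ?? path;
}

// Hypothetical helper: builds /runs/:sessionId/review?eval=:caseId with URL-safe components.
function evalReviewLink(sessionId: string, caseId: string): string {
  return `/runs/${encodeURIComponent(sessionId)}/review?eval=${encodeURIComponent(caseId)}`;
}
```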

