Product Surfaces and Route Map

Eval Labs now has separate surfaces for owner/admin oversight, evaluator workbench testing, tester prompt testing, guest-facing verification, controlled platform checks, derived Registry Diagnostics, persisted Behavioral Observatory labels, Run History, Team Review, Global Analysis, Single Run Analysis, and Review Queue work.

Route map

Route	Surface	Current access intent
`/`	Role-aware home / landing	Owner/admin/evaluator/tester by role
`/lucia/launcher`	Launcher / workspace chooser	Owner/admin/evaluator/tester
`/lucia/custom`	Custom Prompt Test	Owner/admin/evaluator/tester
`/lucia/custom/suites/:suiteId`	Custom saved suite deep link	Owner/admin
`/lucia/auto-generated`	Auto-generated Prompt Test	Owner/admin/evaluator/tester
`/lucia/automated`	Legacy alias to auto-generated tester	Owner/admin/evaluator/tester
`/guest-facing/verification`	Guest Facing Agent Verification Check	Owner/admin/evaluator
`/guest-facing/verification?view=check`	Guest Facing Agent Verification Check	Owner/admin/evaluator
`/guest-facing/verification/results`	Verification Results	Owner/admin/evaluator
`/lucia/batch-runner`	Controlled Batch Runner	Owner/admin/evaluator
`/lucia/automated/runs`	Run History	Owner/admin global; evaluator/tester own or scoped runs
`/team-review`	Team Review overview	Owner/admin
`/team-review/evaluators/:evaluatorKey`	Team Review evaluator detail	Owner/admin
`/registry-diagnostics`	Registry Diagnostics	Owner/admin
`/dataset-diagnostics`	Legacy Registry Diagnostics alias	Owner/admin
`/behavioral-observatory`	Behavioral Observatory	Owner/admin
`/analysis`	Global Analysis	Owner/admin
`/experiments`	Legacy Global Analysis alias	Owner/admin
`/analysis/runs/:sessionId`	Single Run Analysis	Owner/admin
`/runs/:sessionId/running`	In-flight run route	Owner/admin; evaluator/tester only for own scoped runs
`/runs/:sessionId/review`	Review Queue	Owner/admin; evaluator/tester only for own scoped runs
`/runs/:sessionId/review?eval=:caseId`	Direct eval-item review link	Owner/admin; evaluator/tester only for own scoped runs
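
The access column can be read as a lookup from route pattern to allowed roles. The following is a minimal TypeScript sketch of that idea, not the Eval Labs implementation: the role identifiers, the `ROUTE_ACCESS` table, and the `canOpenRoute` helper are all assumed names, only a subset of routes is listed, and the "own or scoped runs" rules are not modeled.

```typescript
// Assumed role identifiers; the source only names the roles in prose.
type Role = "owner" | "admin" | "evaluator" | "tester";

const PRIVILEGED: readonly Role[] = ["owner", "admin"];
const WORKBENCH: readonly Role[] = ["owner", "admin", "evaluator"];
const ALL_ROLES: readonly Role[] = ["owner", "admin", "evaluator", "tester"];

// Hypothetical lookup table covering a subset of the route map.
const ROUTE_ACCESS: Record<string, readonly Role[]> = {
  "/lucia/launcher": ALL_ROLES,
  "/lucia/custom": ALL_ROLES,
  "/lucia/custom/suites/:suiteId": PRIVILEGED,
  "/lucia/auto-generated": ALL_ROLES,
  "/guest-facing/verification": WORKBENCH,
  "/guest-facing/verification/results": WORKBENCH,
  "/lucia/batch-runner": WORKBENCH,
  "/team-review": PRIVILEGED,
  "/registry-diagnostics": PRIVILEGED,
  "/behavioral-observatory": PRIVILEGED,
  "/analysis": PRIVILEGED,
};

// Returns true only when the pattern is known and lists the role.
// Unknown patterns are denied by default (an assumption of this sketch).
function canOpenRoute(role: Role, pattern: string): boolean {
  const allowed = ROUTE_ACCESS[pattern];
  return allowed !== undefined && allowed.indexOf(role) !== -1;
}
```

In this sketch, `canOpenRoute("tester", "/lucia/batch-runner")` is `false` and `canOpenRoute("tester", "/lucia/custom")` is `true`, matching the table rows for those routes.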

Surface definitions

Role-aware home

The home route is role-aware. Owner/admin users see the privileged platform overview. Evaluator and tester users should see the onboarding and workspace entry points appropriate to their role.

Launcher

The Launcher is the workspace chooser. It separates:

Custom Prompt Test
Auto-generated Prompt Test
Guest Facing Agent Verification Check
Controlled Batch Runner

The top app shell owns page identity. The older in-page blog-style masthead pattern has been removed from the product surface.

Custom prompt tester

The Custom Prompt Test lets a user enter 1 to 10 exact prompts. Use it for targeted testing, evaluator work, and repeatable behavior-family review. Owner/admin, evaluator, and tester roles can use this surface.
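
The prompt-count limit could be enforced with a small check before a run starts. This is a sketch under stated assumptions: `checkCustomPrompts` and its result shape are hypothetical, and ignoring whitespace-only entries is a guess about how blank input should be handled. Non-blank entries are kept verbatim, since the surface takes exact prompts.

```typescript
const MIN_PROMPTS = 1;
const MAX_PROMPTS = 10;

// Hypothetical result shape: error is null when the entries are acceptable.
interface PromptCheck {
  prompts: string[];
  error: string | null;
}

function checkCustomPrompts(entries: string[]): PromptCheck {
  // Assumption: whitespace-only entries are dropped; all others are kept unchanged.
  const prompts = entries.filter((p) => p.trim().length > 0);
  if (prompts.length < MIN_PROMPTS) {
    return { prompts, error: "Enter at least 1 prompt." };
  }
  if (prompts.length > MAX_PROMPTS) {
    return { prompts, error: `Enter at most ${MAX_PROMPTS} prompts.` };
  }
  return { prompts, error: null };
}
```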

Auto-generated prompt tester

The Auto-generated Prompt Test runs the normal generated 50-prompt test. Owner/admin, evaluator, and tester roles can use this surface. It is separate from controlled batch infrastructure.

Guest Facing Agent Verification Check

Guest Facing Agent Verification Check runs the booked-guest verification scenario pack through the app surface. Owner/admin and evaluator roles can use this surface. The tester role cannot.

Verification Results

Verification Results shows the saved/current Guest Facing Agent verification output, scenario failures, exports, and copied summaries. Owner/admin and evaluator roles can use this surface. The tester role cannot.

Controlled batch runner

The Controlled Batch Runner is controlled platform-readiness tooling. It supports:

1-run smoke
3-run checkpoint
10-run checkpoint

It was used for the 60-run readiness gate. Owner/admin and evaluator roles can use it in the current role model. The tester role cannot.
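
The three supported sizes can be captured as a closed set so that other values are rejected early. A minimal sketch, assuming hypothetical names (`BATCH_SIZES`, `isBatchSize`, `describeBatch`); it does not model how the 60-run readiness gate was assembled from these checkpoints.

```typescript
// Checkpoint sizes listed for the Controlled Batch Runner.
const BATCH_SIZES = [1, 3, 10] as const;
type BatchSize = (typeof BATCH_SIZES)[number];

// Type guard: narrows an arbitrary number to one of the supported sizes.
function isBatchSize(n: number): n is BatchSize {
  return (BATCH_SIZES as readonly number[]).indexOf(n) !== -1;
}

// Display label following the naming used in the list of supported runs.
function describeBatch(size: BatchSize): string {
  return size === 1 ? "1-run smoke" : `${size}-run checkpoint`;
}
```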

Run History

Run History is the scoped run ledger. It includes completed/finalized run truth and may show scoped operational state. Owner/admin can inspect shared/global persisted evidence. Evaluator and tester access is scoped to their own allowed work.
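
The scoping rule can be sketched as a filter over run records. The sketch below is an assumption-heavy illustration: `RunRecord`, `Viewer`, `ownerUserId`, and `visibleRuns` are hypothetical names, and it models only the "own runs" case. Any additional scoping beyond ownership is not specified in this document and is left out.

```typescript
type Role = "owner" | "admin" | "evaluator" | "tester";

// Hypothetical shapes; field names are not taken from the Eval Labs schema.
interface RunRecord {
  sessionId: string;
  ownerUserId: string;
}

interface Viewer {
  userId: string;
  role: Role;
}

// Owner/admin see every run; evaluator/tester see only runs they own.
function canViewRun(viewer: Viewer, run: RunRecord): boolean {
  if (viewer.role === "owner" || viewer.role === "admin") {
    return true;
  }
  return run.ownerUserId === viewer.userId;
}

function visibleRuns(viewer: Viewer, runs: RunRecord[]): RunRecord[] {
  return runs.filter((run) => canViewRun(viewer, run));
}
```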

Team Review

Team Review is the owner/admin oversight surface. It groups evaluator activity, flags review gaps, and helps owner/admin decide where human evaluation signal needs inspection. It is not available to evaluator, tester, or unassigned users.

Registry Diagnostics

Registry Diagnostics is the read-only diagnostic surface for the Dataset Registry and Human Review Queue 2.0 classification model. It shows derived suggestions from existing Eval Labs run/review data:

dataset membership suggestions
confidence
source fields
queue lane suggestions
Noise / Watch classification patterns

It does not create labels, save human behavioral decisions, or prove final dataset membership. `/registry-diagnostics` is the canonical route. `/dataset-diagnostics` remains a legacy inbound alias.
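
Legacy aliases like this one can be handled with an exact-match lookup from alias to canonical path. A minimal sketch, assuming a hypothetical `resolveCanonicalPath` helper; the three alias pairs come from the route map, but whether the app redirects or renders the canonical surface in place is not stated here.

```typescript
// Legacy path -> canonical path, as listed in the route map.
const LEGACY_ALIASES: Record<string, string> = {
  "/dataset-diagnostics": "/registry-diagnostics",
  "/experiments": "/analysis",
  "/lucia/automated": "/lucia/auto-generated",
};

// Exact match only, so /lucia/automated/runs (Run History) is left untouched.
function resolveCanonicalPath(path: string): string {
  return Object.prototype.hasOwnProperty.call(LEGACY_ALIASES, path)
    ? LEGACY_ALIASES[path]
    : path;
}
```

Matching on the full path rather than a prefix matters here because `/lucia/automated/runs` is a current route that shares a prefix with a legacy alias.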

Behavioral Observatory

Behavioral Observatory is the first-class behavioral labeling surface. It lets a reviewer inspect conversations and save structured labels:

Intent
Guest Affect
Response Strategy
Humanness
Notes

A label counts as persisted Behavioral Observatory evidence only after Supabase confirms the save. Derived suggestions on this page are starting context, not final judgment.
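
One way to keep unconfirmed labels from being treated as evidence is to track save status explicitly. The sketch below is illustrative only: the `BehavioralLabel` field types, the `LabelState` union, and the `SaveResult` shape are assumptions, and no Supabase client call is shown. The caller is expected to pass in whatever result the save attempt returned.

```typescript
// Field names follow the label list; string types are an assumption.
interface BehavioralLabel {
  intent: string;
  guestAffect: string;
  responseStrategy: string;
  humanness: string;
  notes: string;
}

// Hypothetical outcome of a save attempt: error is null on confirmed success.
interface SaveResult {
  error: string | null;
}

type LabelState =
  | { status: "draft"; label: BehavioralLabel }
  | { status: "saved"; label: BehavioralLabel }
  | { status: "save_failed"; label: BehavioralLabel; reason: string };

// A draft becomes "saved" only when the save result reports no error.
function applySaveResult(label: BehavioralLabel, result: SaveResult): LabelState {
  if (result.error !== null) {
    return { status: "save_failed", label, reason: result.error };
  }
  return { status: "saved", label };
}

// Only confirmed saves count as persisted evidence.
function countsAsEvidence(state: LabelState): boolean {
  return state.status === "saved";
}
```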

Global Analysis

Global Analysis is the read-only owner/admin behavioral and analytics surface. It shows AI-analyzed platform evidence, not human approval of Lucia quality. `/analysis` is the canonical route. `/experiments` remains a legacy alias.

Single Run Analysis

Single Run Analysis is read-only analysis of one completed run/session. It includes run metadata, behavioral summaries, item rows, and copy/deep-link controls when hydrated data is available.

Review Queue

The Review Queue is the scoring and review workflow for prompts/items. In the current role model, evaluator and tester review access is scoped to their own allowed work. Owner/admin can inspect shared persisted evidence where oversight applies.

Copy controls

Copy Session ID and Copy Deep Link controls exist across key surfaces, including run rows, controlled batch summaries, Single Run Analysis, and review/item contexts. These controls are addressability infrastructure. They make future debugging, review handoff, and Canon recovery easier.
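
A deep link of this kind can be assembled from the route patterns in the route map. A minimal sketch, assuming hypothetical helper names (`reviewDeepLink`, `analysisDeepLink`) and an `origin` argument supplied by the caller; the actual copy controls may build links differently.

```typescript
// Builds /runs/:sessionId/review, optionally with ?eval=:caseId.
function reviewDeepLink(origin: string, sessionId: string, caseId?: string): string {
  const base = `${origin}/runs/${encodeURIComponent(sessionId)}/review`;
  return caseId === undefined
    ? base
    : `${base}?eval=${encodeURIComponent(caseId)}`;
}

// Builds /analysis/runs/:sessionId for Single Run Analysis.
function analysisDeepLink(origin: string, sessionId: string): string {
  return `${origin}/analysis/runs/${encodeURIComponent(sessionId)}`;
}
```

Encoding the identifiers keeps the link valid if a session or case ID ever contains characters such as spaces or slashes.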

Surface distinction rule

Use this map:

Run History = run ledger truth
Team Review = owner/admin oversight truth
Global Analysis = read-only platform evidence truth
Registry Diagnostics = derived classification truth
Behavioral Observatory = saved behavioral label truth
Review Queue = prompt/item review workflow

Do not claim a diagnostic suggestion is a saved label.
