Global Analysis, Single Run Analysis, Run History, and Team Review are evidence surfaces for completed Eval Labs work. They support platform and behavior inspection, but they do not replace human Lucia-quality review.

Global Analysis

Canonical route:
/analysis
Legacy alias:
/experiments
Global Analysis is owner/admin-only in the current access model. It is read-only behavioral and analytics evidence. The evidence is AI-analyzed platform evidence, not human quality approval. Owner/admin users should see shared persisted Eval Labs evidence here when Supabase hydration and RLS scope allow it.
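
A minimal sketch of how the legacy alias could be normalized to the canonical route. Only the two paths come from this page; the table name `LEGACY_ALIASES` and the function `toCanonicalRoute` are hypothetical, and the sketch assumes exact-path matching rather than describing how the Eval Labs router actually resolves aliases.

```typescript
// Hypothetical alias table. Only "/experiments" -> "/analysis" is documented.
const LEGACY_ALIASES: Record<string, string> = {
  "/experiments": "/analysis",
};

// Returns the canonical path for a known legacy alias, or the path unchanged.
// Assumption: aliases match on the exact pathname only.
function toCanonicalRoute(pathname: string): string {
  return Object.prototype.hasOwnProperty.call(LEGACY_ALIASES, pathname)
    ? LEGACY_ALIASES[pathname]
    : pathname;
}
```

Under this sketch, `toCanonicalRoute("/experiments")` yields `/analysis`, and any other path passes through untouched.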

Single Run Analysis

Canonical route:
/analysis/runs/:sessionId
Single Run Analysis is read-only analysis of one completed run/session. It can include:
  • run metadata
  • behavioral summaries
  • item rows
  • item-level review links
  • Copy Session ID
  • Copy Deep Link
When only compact local state is available, summary counts can render before full item-level cloud hydration.
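
The deep link for a run can be pictured as the canonical route with the session ID filled in. The sketch below is an assumption about what Copy Deep Link might produce, not a description of the actual implementation; `buildRunDeepLink` and the `origin` parameter are hypothetical names.

```typescript
// Builds a link of the form <origin>/analysis/runs/<sessionId>.
// Assumption: the deep link is the canonical Single Run Analysis route
// with the session ID URL-encoded into the :sessionId segment.
function buildRunDeepLink(origin: string, sessionId: string): string {
  const base = origin.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/analysis/runs/${encodeURIComponent(sessionId)}`;
}
```

Encoding the session ID keeps the link valid even if an ID ever contains characters such as `/` or spaces.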

Run History

Canonical route:
/lucia/automated/runs
Run History is the scoped run ledger. It records completed/finalized run truth and may show scoped operational state. Run History truth means the UI agrees with the persisted run lifecycle and scoped account context. Owner/admin can inspect shared/global persisted run evidence. Evaluator and tester users should only see runs scoped to their own allowed work.
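
The scoping rule above can be sketched as a filter over run records. This is an illustration only: the `RunRecord` and `Viewer` shapes are hypothetical, and "own allowed work" is simplified here to "runs whose `ownerUserId` matches the viewer". In practice the scope would be expected to be enforced server-side (for example by RLS), with any client-side filter only mirroring it.

```typescript
type Role = "owner" | "admin" | "evaluator" | "tester";

// Hypothetical record shape; field names are not taken from the real schema.
interface RunRecord {
  sessionId: string;
  ownerUserId: string;
  status: "completed" | "finalized";
}

interface Viewer {
  userId: string;
  role: Role;
}

// Owner/admin see all persisted runs; evaluator and tester see only their own.
// Assumption: "scoped to their own allowed work" means ownerUserId === userId.
function visibleRuns(viewer: Viewer, runs: RunRecord[]): RunRecord[] {
  if (viewer.role === "owner" || viewer.role === "admin") {
    return runs;
  }
  return runs.filter((run) => run.ownerUserId === viewer.userId);
}
```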

Team Review

Canonical route:
/team-review
Team Review is the owner/admin oversight surface. It exists to inspect evaluator activity, review quality, missing checks, flags, recent work, and evidence that needs owner/admin attention. Team Review is not available to evaluator, tester, or unassigned users.
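
As a rough sketch, the owner/admin-only rule for Team Review and Global Analysis could be expressed as a route check. The names `decideAccess`, `AccessDecision`, and `OWNER_ADMIN_ONLY_ROUTES` are hypothetical. Treating the `/experiments` alias as restricted is an assumption that follows from it being an alias of `/analysis`. Routes whose access rules are not stated in this section return `"unspecified"` rather than a guess.

```typescript
type Role = "owner" | "admin" | "evaluator" | "tester" | "unassigned";
type AccessDecision = "allowed" | "denied" | "unspecified";

// Routes this page describes as owner/admin-only.
// Assumption: the legacy alias inherits the restriction of /analysis.
const OWNER_ADMIN_ONLY_ROUTES: string[] = ["/analysis", "/experiments", "/team-review"];

function decideAccess(role: Role, pathname: string): AccessDecision {
  if (OWNER_ADMIN_ONLY_ROUTES.indexOf(pathname) === -1) {
    // Access rules for other routes are not covered by this sketch.
    return "unspecified";
  }
  return role === "owner" || role === "admin" ? "allowed" : "denied";
}
```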

Registry Diagnostics

Canonical route:
/registry-diagnostics
Registry Diagnostics is read-only diagnostic evidence. It derives Dataset Registry membership suggestions and Human Review Queue 2.0 lane suggestions from existing Eval Labs data. It does not save labels or prove human approval.

Behavioral Observatory

Canonical route:
/behavioral-observatory
Behavioral Observatory is the saved behavioral label surface. It may start from derived run/review context, but its durable truth comes only from saved labels that reload from Supabase.
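
One way to keep derived suggestions and saved labels from being confused is to give them distinct types. The sketch below is illustrative only; the type and field names (`DerivedSuggestion`, `SavedLabel`, `savedAt`, and so on) are hypothetical and do not reflect the actual Supabase schema.

```typescript
// A suggestion computed from existing data; never durable on its own.
interface DerivedSuggestion {
  kind: "derived";
  itemId: string;
  suggestedLabel: string;
}

// A label a reviewer saved and that reloads from persistent storage.
interface SavedLabel {
  kind: "saved";
  itemId: string;
  label: string;
  savedAt: string; // ISO timestamp (assumed field)
}

type LabelEvidence = DerivedSuggestion | SavedLabel;

// Only saved labels count as durable behavioral evidence.
function durableLabels(evidence: LabelEvidence[]): SavedLabel[] {
  return evidence.filter((e): e is SavedLabel => e.kind === "saved");
}
```

With the `kind` discriminant, code that reports durable truth cannot accidentally include a derived suggestion without a type error.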

Truth distinction

Use this distinction everywhere:
Run History truth = the ledger reflects the run lifecycle.
Global Analysis truth = the analytics surface reflects shared persisted run evidence.
Team Review truth = owner/admin oversight reflects evaluator activity and review gaps.
Registry Diagnostics truth = derived suggestions reflect the current classification model.
Behavioral Observatory truth = saved labels reflect reviewer-saved behavioral evidence.
Human quality truth = human reviewers decide whether Lucia worked.
The 60-run AI-reviewed gate proved Run History and Global Analysis were aligned with Supabase and local compact state for that readiness scope. It did not prove human Lucia-quality approval.

Staged hydration

Current dashboards should hydrate in stages:
  • run summaries first
  • recent evidence next
  • deep item-level evidence after that
This is a performance and truth-state requirement. Fast dashboards must still be source-backed. If deep evidence has not loaded yet, the UI or documentation should name that limitation instead of filling the gap with fake metrics. The same rule applies to Team Review and Global Analysis.
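
The "name the limitation, do not fake the metric" rule can be sketched as follows. The stage names, `hasReached`, and `renderMetric` are hypothetical; the sketch only assumes that each metric knows which hydration stage it depends on.

```typescript
// Stages in the order described above, plus "none" before anything has loaded.
type HydrationStage = "none" | "summaries" | "recent" | "deep";

const STAGE_ORDER: HydrationStage[] = ["none", "summaries", "recent", "deep"];

function hasReached(current: HydrationStage, required: HydrationStage): boolean {
  return STAGE_ORDER.indexOf(current) >= STAGE_ORDER.indexOf(required);
}

interface MetricView {
  label: string;
  value: number | null; // null means "not available yet", never a placeholder number
  note?: string;
}

// Shows a value only when its source stage has loaded; otherwise names the gap.
function renderMetric(
  label: string,
  required: HydrationStage,
  current: HydrationStage,
  value: number | undefined
): MetricView {
  if (!hasReached(current, required) || value === undefined) {
    return { label, value: null, note: `Not loaded yet: requires ${required} evidence` };
  }
  return { label, value };
}
```

For example, a summary count can render as soon as the `summaries` stage is reached, while an item-level metric that needs `deep` evidence stays `null` with an explanatory note until that stage loads.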