Analysis and Run Truth

Global Analysis, Single Run Analysis, Run History, and Team Review are evidence surfaces for completed Eval Labs work. They support platform and behavior inspection, but they do not replace human Lucia-quality review.

Global Analysis

Canonical route:

/analysis

Legacy alias:

/experiments

Global Analysis is owner/admin-only in the current access model. It is read-only, AI-analyzed behavioral and analytics evidence about the platform, not human quality approval. Owner/admin users should see shared persisted Eval Labs evidence here when Supabase hydration and RLS scope allow it.
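The relationship between the legacy alias and the canonical route can be sketched as a small lookup. This is a minimal illustration, not the app's router: the `LEGACY_ALIASES` table and `canonicalPath` function are hypothetical names, and the only alias taken from this page is /experiments to /analysis.

```typescript
// Hypothetical alias table. The single entry comes from this page;
// any other alias would be an assumption and is deliberately absent.
const LEGACY_ALIASES = new Map<string, string>([
  ["/experiments", "/analysis"],
]);

// Return the canonical path for a requested path.
// Paths with no known alias are returned unchanged.
function canonicalPath(path: string): string {
  return LEGACY_ALIASES.get(path) ?? path;
}
```

With this shape, links that still point at /experiments resolve to the same surface as /analysis, and every other path passes through untouched.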

Single Run Analysis

Canonical route:

/analysis/runs/:sessionId

Single Run Analysis is read-only analysis of one completed run/session. It can include:

run metadata
behavioral summaries
item rows
item-level review links
Copy Session ID
Copy Deep Link
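As a rough sketch of what Copy Deep Link could produce, the helper below builds a link from the canonical /analysis/runs/:sessionId route. The function names and the `origin` parameter are assumptions for illustration; the page does not describe how the app actually assembles the link.

```typescript
// Build the canonical Single Run Analysis path for a session.
// The session ID is URL-encoded so unusual characters cannot break the path.
function runAnalysisPath(sessionId: string): string {
  return `/analysis/runs/${encodeURIComponent(sessionId)}`;
}

// Build an absolute deep link. `origin` is a hypothetical input such as
// the site's base URL; trailing slashes are trimmed to avoid "//".
function runAnalysisDeepLink(origin: string, sessionId: string): string {
  return origin.replace(/\/+$/, "") + runAnalysisPath(sessionId);
}
```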

When only compact local state is available, summary counts can render before full item-level cloud hydration.

Run History

Canonical route:

/lucia/automated/runs

Run History is the scoped run ledger. It records completed and finalized runs and may show scoped operational state. Run History truth means the UI agrees with the persisted run lifecycle and the scoped account context. Owner/admin users can inspect shared/global persisted run evidence. Evaluator and tester users should see only runs scoped to their own allowed work.
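The scoping rule above can be sketched as a client-side filter. This is an illustration only: the `RunRecord` and `Viewer` shapes are hypothetical, "own allowed work" is modeled here as a simple match on an assumed `ownerUserId` field, and showing nothing to unassigned users is also an assumption. The real scope is whatever the persisted data and RLS policies allow.

```typescript
type Role = "owner" | "admin" | "evaluator" | "tester" | "unassigned";

// Hypothetical record shapes, not the actual Eval Labs schema.
interface RunRecord {
  sessionId: string;
  ownerUserId: string;
}

interface Viewer {
  userId: string;
  role: Role;
}

// Return the runs a viewer may see in Run History.
function visibleRuns(viewer: Viewer, runs: RunRecord[]): RunRecord[] {
  // Owner/admin can inspect shared/global persisted run evidence.
  if (viewer.role === "owner" || viewer.role === "admin") {
    return runs;
  }
  // Evaluator and tester users see only their own scoped runs
  // (assumption: scope is modeled as ownerUserId equality).
  if (viewer.role === "evaluator" || viewer.role === "tester") {
    return runs.filter((run) => run.ownerUserId === viewer.userId);
  }
  // Assumption: unassigned users see no runs.
  return [];
}
```

A filter like this is a display convenience at most. It should agree with server-side scope, not stand in for it.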

Team Review

Canonical route:

/team-review

Team Review is the owner/admin oversight surface. It exists to inspect evaluator activity, review quality, missing checks, flags, recent work, and evidence that needs owner/admin attention. Team Review is not available to evaluator, tester, or unassigned users.
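A minimal sketch of the owner/admin gate follows. It covers only the two routes this page states are owner/admin-only (/team-review and /analysis). The `routeAccess` function and the "not-specified" result are hypothetical, and are used so the sketch does not invent rules for routes whose access this page does not define.

```typescript
type Role = "owner" | "admin" | "evaluator" | "tester" | "unassigned";

type AccessDecision = "allow" | "deny" | "not-specified";

// Routes this page describes as owner/admin-only.
const OWNER_ADMIN_ONLY: ReadonlySet<string> = new Set([
  "/analysis",
  "/team-review",
]);

// Decide access for a canonical path and role.
// Routes outside the set return "not-specified" rather than a guess.
function routeAccess(path: string, role: Role): AccessDecision {
  if (!OWNER_ADMIN_ONLY.has(path)) {
    return "not-specified";
  }
  return role === "owner" || role === "admin" ? "allow" : "deny";
}
```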

Registry Diagnostics

Canonical route:

/registry-diagnostics

Registry Diagnostics is read-only diagnostic evidence. It derives Dataset Registry membership suggestions and Human Review Queue 2.0 lane suggestions from existing Eval Labs data. It does not save labels or prove human approval.

Behavioral Observatory

Canonical route:

/behavioral-observatory

Behavioral Observatory is the saved behavioral label surface. It may start from derived run/review context, but its durable truth comes only from saved labels that reload from Supabase.
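The difference between derived context and durable truth can be sketched with a small filter. The `BehavioralLabel` shape and its `persistedId` field are assumptions made for illustration. Here a label counts as durable only if it carries an identifier that came back from a reload of saved data.

```typescript
// Hypothetical label shape, not the actual Eval Labs schema.
interface BehavioralLabel {
  itemId: string;
  label: string;
  // Assumption: set only for labels that were saved and reloaded
  // from storage; null for labels derived from run/review context.
  persistedId: string | null;
}

// Keep only labels that can be treated as durable truth.
function durableLabels(labels: BehavioralLabel[]): BehavioralLabel[] {
  return labels.filter((label) => label.persistedId !== null);
}
```

Derived labels can still be shown as a starting point, but a consumer that reports saved behavioral evidence would read only the filtered result.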

Truth distinction

Use this distinction everywhere:

Run History truth = the ledger reflects the run lifecycle.
Global Analysis truth = the analytics surface reflects shared persisted run evidence.
Team Review truth = owner/admin oversight reflects evaluator activity and review gaps.
Registry Diagnostics truth = derived suggestions reflect the current classification model.
Behavioral Observatory truth = saved labels reflect reviewer-saved behavioral evidence.
Human quality truth = human reviewers decide whether Lucia worked.

The 60-run AI-reviewed gate proved Run History and Global Analysis were aligned with Supabase and local compact state for that readiness scope. It did not prove human Lucia-quality approval.
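One way to keep this distinction explicit in code is to name each truth kind and record what a given gate actually proved. The sketch below is illustrative: `TruthKind`, `GateResult`, and `provesHumanQuality` are hypothetical names, and the gate entry simply restates the sentence above.

```typescript
type TruthKind =
  | "run-history"
  | "global-analysis"
  | "team-review"
  | "registry-diagnostics"
  | "behavioral-observatory"
  | "human-quality";

interface GateResult {
  name: string;
  proved: TruthKind[];
}

// Restates the text above: the gate covered Run History and Global Analysis only.
const sixtyRunGate: GateResult = {
  name: "60-run AI-reviewed gate",
  proved: ["run-history", "global-analysis"],
};

// A gate counts as human quality approval only if it explicitly proved it.
function provesHumanQuality(gate: GateResult): boolean {
  return gate.proved.some((kind) => kind === "human-quality");
}
```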

Staged hydration

Current dashboards should hydrate in stages:

run summaries first
recent evidence next
deep item-level evidence after that

This is a performance and truth-state requirement. Fast dashboards must still be source-backed. If deep evidence has not loaded yet, the UI or documentation should name that limitation instead of filling the gap with fake metrics. The same rule applies to Team Review and Global Analysis.
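The staged order and the "name the limitation" rule can be sketched as follows. The stage names, `nextStage`, and `itemCountLabel` are hypothetical, and the wording of the placeholder message is an assumption. The point is only that a missing stage produces an explicit statement and never a made-up number.

```typescript
type Stage = "summaries" | "recent" | "deep";

// Order taken from the list above: summaries, then recent evidence,
// then deep item-level evidence.
const STAGE_ORDER: Stage[] = ["summaries", "recent", "deep"];

// Return the next stage to hydrate, or null when everything is loaded.
function nextStage(loaded: ReadonlySet<Stage>): Stage | null {
  for (const stage of STAGE_ORDER) {
    if (!loaded.has(stage)) {
      return stage;
    }
  }
  return null;
}

// Render an item count only when deep evidence is actually loaded.
// Otherwise name the limitation instead of showing a placeholder metric.
function itemCountLabel(
  loaded: ReadonlySet<Stage>,
  itemCount: number | undefined,
): string {
  if (!loaded.has("deep") || itemCount === undefined) {
    return "Item-level evidence not loaded yet";
  }
  return `${itemCount} items`;
}
```

A dashboard built this way can paint summary counts quickly while keeping every displayed figure traceable to a stage that has really loaded.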
