Current System State

Eval Labs is an implemented, role-based human evaluation platform for Lucia. It supports controlled human onboarding, persisted evidence, owner/admin oversight, and evaluator-safe workflows, while some UX and rollout areas remain in active hardening.

Current platform truth

Status: implemented. Eval Labs is no longer only a founder or AI-agent testing tool. It is Lucia’s role-based human evaluation platform, and the following hold today:

Clerk auth works.
Clerk public metadata drives frontend role behavior through eval_labs_role.
The Clerk session token includes eval_labs_role so Supabase RLS can recognize privileged owner/admin access.
Supabase RLS protects persisted evidence.
Real runs must persist to Supabase.
Owner/admin can see shared persisted Eval Labs evidence.
Evaluator and tester data remains scoped to their own work except where owner/admin oversight applies.
Team Review exists as the owner/admin oversight surface.
Staged hydration loads run summaries first, then recent and deeper evidence, so dashboards can render faster without fake metrics.
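
The staged hydration order described above could be sketched roughly as follows. This is a minimal illustration, not the Eval Labs implementation: the `RunSummary`, `Evidence`, and `EvidenceLoaders` shapes, the `hydrateInStages` function, and the `recentCount` cutoff are all hypothetical names, and the sketch assumes summaries arrive newest first.

```typescript
// Hypothetical shapes; the real Eval Labs types and queries are not shown in this document.
type RunSummary = { runId: string; status: string };
type Evidence = { runId: string; detail: string };

// Assumed loader interface; in practice each method would wrap a Supabase query.
interface EvidenceLoaders {
  loadSummaries(): Promise<RunSummary[]>;
  loadRecentEvidence(runIds: string[]): Promise<Evidence[]>;
  loadDeeperEvidence(runIds: string[]): Promise<Evidence[]>;
}

type HydrationStage = "summaries" | "recent" | "deep";

interface DashboardState {
  stage: HydrationStage;
  summaries: RunSummary[];
  evidence: Evidence[];
}

async function hydrateInStages(
  loaders: EvidenceLoaders,
  onUpdate: (state: DashboardState) => void,
  recentCount = 20,
): Promise<DashboardState> {
  // Stage 1: summaries only, so the dashboard renders from persisted data early.
  const summaries = await loaders.loadSummaries();
  let state: DashboardState = { stage: "summaries", summaries, evidence: [] };
  onUpdate(state);

  // Stage 2: evidence for the most recent runs (assumes newest-first ordering).
  const recentIds = summaries.slice(0, recentCount).map((s) => s.runId);
  const recent = await loaders.loadRecentEvidence(recentIds);
  state = { ...state, stage: "recent", evidence: recent };
  onUpdate(state);

  // Stage 3: evidence for the remaining, older runs.
  const olderIds = summaries.slice(recentCount).map((s) => s.runId);
  const deeper = await loaders.loadDeeperEvidence(olderIds);
  state = { ...state, stage: "deep", evidence: [...recent, ...deeper] };
  onUpdate(state);
  return state;
}
```

Each `onUpdate` call carries only data that has actually loaded, so a dashboard built on it shows partial but real evidence at every stage instead of placeholder metrics.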

Current roles

Status: implemented. Current roles:

owner
admin
evaluator
tester
unassigned or missing role
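
As an illustration of how these role states might be resolved in application code, here is a minimal sketch. It assumes the role arrives as a plain string under the `eval_labs_role` key of whatever metadata or claims object the caller passes in. The `resolveRole` and `isPrivileged` helpers are hypothetical, and treating an unrecognized value as unassigned is an assumption rather than documented behavior.

```typescript
// The four named roles from the list above.
const KNOWN_ROLES = ["owner", "admin", "evaluator", "tester"] as const;
type KnownRole = (typeof KNOWN_ROLES)[number];

// "unassigned" stands in for a missing role (and, by assumption, an unrecognized one).
type EvalLabsRole = KnownRole | "unassigned";

// Hypothetical helper: reads `eval_labs_role` from a metadata or claims object.
function resolveRole(
  metadata: Record<string, unknown> | null | undefined,
): EvalLabsRole {
  const raw = metadata ? metadata["eval_labs_role"] : undefined;
  if (typeof raw !== "string") return "unassigned";
  const known = KNOWN_ROLES as readonly string[];
  return known.indexOf(raw) !== -1 ? (raw as KnownRole) : "unassigned";
}

// Owner and admin are the privileged roles with oversight access.
function isPrivileged(role: EvalLabsRole): boolean {
  return role === "owner" || role === "admin";
}
```

Falling back to "unassigned" keeps a typo or stale metadata value from granting access by accident. The canonical matrix remains the authority on what each role may do.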

Read the canonical matrix: Eval Labs Roles and Access Matrix.

Current test surfaces

Status: implemented. Current test surfaces:

Custom Prompt Test
Auto-generated Prompt Test
Guest Facing Agent Verification Check
Controlled Batch Runner

Tester access is intentionally narrower than evaluator access. Tester is for clean prompt-testing onboarding cohorts. Evaluator is for the full evaluator workbench and evaluator-safe test types.

Oversight and analysis

Status: implemented. Owner/admin have full platform access, including shared persisted evidence, Team Review, Global Analysis, and all test surfaces. Team Review is the owner/admin oversight surface for human evaluation work, covering evidence quality, reviewer activity, review gaps, and escalation readiness. Global Analysis is an owner/admin-only surface for platform-wide evidence inspection. It is not a tester or evaluator onboarding surface.

Human onboarding posture

Status: active hardening. Eval Labs is ready for controlled human onboarding by role and assignment. Do not describe the platform as broadly production-mature or open-access. Do not describe Lucia as human-approved on the grounds that the AI-reviewed platform readiness gate passed. Use these status labels:

implemented
active hardening
deferred
future

Avoid softer labels that imply more maturity than the source state proves.

Active hardening

These areas are implemented but still being tightened, polished, or verified for rollout:

evaluator onboarding/workspace polish
first human cohort instructions
role-specific route verification
Clerk-to-Supabase role-claim verification after role or RLS changes
staged hydration behavior across large evidence sets
clear tester-vs-evaluator assignment guidance

Deferred

Deferred means intentionally outside the current access model:

tester access to Verification Check
tester access to Verification Results
tester access to Controlled Batch Runner
tester access to Team Review
tester access to Global Analysis
tester access to Registry Diagnostics
tester access to Behavioral Observatory
evaluator access to Team Review
evaluator access to Global Analysis
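
One way to keep this list honest during route verification is to encode it as data and check routes against it. The sketch below copies the surface names from the list above. The `DEFERRED_SURFACES` table and the `isDeferred` helper are hypothetical and are not part of Eval Labs.

```typescript
type ScopedRole = "tester" | "evaluator";

// Role-to-surface pairs copied from the Deferred list above.
const DEFERRED_SURFACES: Record<ScopedRole, readonly string[]> = {
  tester: [
    "Verification Check",
    "Verification Results",
    "Controlled Batch Runner",
    "Team Review",
    "Global Analysis",
    "Registry Diagnostics",
    "Behavioral Observatory",
  ],
  evaluator: ["Team Review", "Global Analysis"],
};

// True when the pair is intentionally outside the current access model.
function isDeferred(role: ScopedRole, surface: string): boolean {
  return DEFERRED_SURFACES[role].indexOf(surface) !== -1;
}
```

A `false` result only means the pair is not on the deferred list. It does not by itself grant access, which is still decided by the canonical access matrix.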

Future

Future means possible later, not current behavior:

broader public or external evaluator rollout
expanded assignment management
additional owner/admin management tooling
deeper cohort analytics beyond current oversight surfaces
further evaluator UX polish

First human onboarding readiness criteria

Before a first human onboarding cohort starts:

Confirm every participant has Clerk auth access.
Confirm eval_labs_role is set in Clerk public metadata.
Confirm the session token carries the role claim used by Supabase RLS.
Confirm visible routes match the access matrix.
Run a real prompt test and verify Supabase persistence.
Confirm owner/admin can see shared persisted evidence where oversight applies.
Keep testers limited to Custom Prompt Test and Auto-generated Prompt Test.
Give evaluators only assignments that match evaluator-safe surfaces.
Name any active-hardening caveats before the work begins.
Restate that AI-reviewed platform readiness is not human approval of Lucia’s quality.
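
The per-participant criteria above (auth access, role metadata, role claim, route match) could be tracked with a small helper like the one below. This is only a sketch: the `ParticipantReadiness` record and its field names are hypothetical, and each boolean is assumed to come from a manual or scripted confirmation done elsewhere. The platform-level criteria, such as running a real prompt test and verifying persistence, are not modeled here.

```typescript
// Hypothetical record of confirmations for one participant; field names are illustrative.
interface ParticipantReadiness {
  participant: string;
  hasClerkAccess: boolean;
  roleInPublicMetadata: boolean;
  roleClaimInSessionToken: boolean;
  routesMatchAccessMatrix: boolean;
}

// Returns a label for every per-participant criterion that is not yet confirmed.
function unmetCriteria(p: ParticipantReadiness): string[] {
  const checks: Array<[boolean, string]> = [
    [p.hasClerkAccess, "Clerk auth access"],
    [p.roleInPublicMetadata, "eval_labs_role set in Clerk public metadata"],
    [p.roleClaimInSessionToken, "session token carries the role claim"],
    [p.routesMatchAccessMatrix, "visible routes match the access matrix"],
  ];
  return checks.filter(([ok]) => !ok).map(([, label]) => label);
}

// Maps each participant who is not ready to their outstanding criteria.
function cohortBlockers(
  cohort: ParticipantReadiness[],
): Record<string, string[]> {
  const blockers: Record<string, string[]> = {};
  for (const p of cohort) {
    const unmet = unmetCriteria(p);
    if (unmet.length > 0) blockers[p.participant] = unmet;
  }
  return blockers;
}
```

An empty result from `cohortBlockers` means only that the per-participant checks are confirmed. It says nothing about Lucia’s quality or about the remaining platform-level criteria.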
