Eval Labs is an implemented, role-based human evaluation platform for Lucia. It supports controlled human onboarding, persisted evidence, owner/admin oversight, and evaluator-safe workflows, while some UX and rollout areas remain in active hardening.

Current platform truth

Status: implemented. Eval Labs is no longer only a founder or AI-agent testing tool. It is Lucia’s role-based human evaluation platform:
  • Clerk auth works.
  • Clerk public metadata drives frontend role behavior through eval_labs_role.
  • The Clerk session token includes eval_labs_role so Supabase RLS can recognize privileged owner/admin access.
  • Supabase RLS protects persisted evidence.
  • Real runs must persist to Supabase.
  • Owner/admin should see shared persisted Eval Labs evidence.
  • Evaluator and tester data remains scoped to their own work except where owner/admin oversight applies.
  • Team Review exists as the owner/admin oversight surface.
  • Staged hydration loads run summaries first, then recent and deeper evidence, so dashboards can render faster without fake metrics.
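Supabase RLS policies are written in SQL and remain the actual enforcement point. As a minimal sketch of the visibility rule in the list above, the TypeScript below models it as a pure function, which can be useful for unit tests or for checking what a dashboard ought to show. `EvidenceRow`, `ownerUserId`, and the decision to deny unassigned users everything are illustrative assumptions, not the Eval Labs schema.

```typescript
// Roles as listed on this page; "unassigned" stands in for a missing or unknown role.
type Role = "owner" | "admin" | "evaluator" | "tester" | "unassigned";

interface Viewer {
  userId: string; // e.g. the user id carried in the session token (assumption)
  role: Role; // resolved from the eval_labs_role claim
}

interface EvidenceRow {
  id: string;
  ownerUserId: string; // hypothetical column: who produced this evidence
}

// Owner/admin see shared evidence; evaluator and tester see only their own rows.
function canReadEvidence(viewer: Viewer, row: EvidenceRow): boolean {
  if (viewer.role === "owner" || viewer.role === "admin") return true;
  if (viewer.role === "evaluator" || viewer.role === "tester") {
    return row.ownerUserId === viewer.userId;
  }
  return false; // unassigned: denied entirely (assumption)
}

// Filters a result set down to what this viewer is allowed to see.
function visibleEvidence(viewer: Viewer, rows: readonly EvidenceRow[]): EvidenceRow[] {
  return rows.filter((row) => canReadEvidence(viewer, row));
}
```

A frontend check like this can catch mistakes early, but it cannot stand in for the database policy.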

Current roles

Status: implemented. Current roles:
  • owner
  • admin
  • evaluator
  • tester
  • unassigned or missing role
Read the canonical matrix: Eval Labs Roles and Access Matrix.
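Because an unassigned or missing role is its own case, code that reads eval_labs_role should fall back explicitly instead of guessing. A minimal sketch, assuming public metadata arrives as a plain object (for example, a Clerk user's `publicMetadata`); the names `resolveRole` and `ResolvedRole` are hypothetical.

```typescript
// The four assignable roles from the list above.
const EVAL_LABS_ROLES = ["owner", "admin", "evaluator", "tester"] as const;
type EvalLabsRole = (typeof EVAL_LABS_ROLES)[number];
type ResolvedRole = EvalLabsRole | "unassigned";

// Reads eval_labs_role from a public-metadata-shaped object.
// Missing, non-string, or unknown values all resolve to "unassigned".
function resolveRole(publicMetadata: Record<string, unknown> | undefined): ResolvedRole {
  const raw = publicMetadata?.["eval_labs_role"];
  if (typeof raw !== "string") return "unassigned";
  // Exact match only: no trimming or case folding for an access decision.
  return (EVAL_LABS_ROLES as readonly string[]).indexOf(raw) >= 0
    ? (raw as EvalLabsRole)
    : "unassigned";
}
```

Exact matching is a deliberate choice here: a value such as `Admin` resolves to unassigned rather than to admin, which surfaces metadata typos instead of silently granting access.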

Current test surfaces

Status: implemented. Current test surfaces:
  1. Custom Prompt Test
  2. Auto-generated Prompt Test
  3. Guest Facing Agent Verification Check
  4. Controlled Batch Runner
Tester access is intentionally narrower than evaluator access. The tester role is for clean prompt-testing onboarding cohorts. The evaluator role is for the full evaluator workbench and evaluator-safe test types.
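The narrower tester scope can be expressed as a fixed allowlist. In this hypothetical sketch, the evaluator set is passed in rather than hard-coded, because which surfaces count as evaluator-safe is defined by the canonical access matrix, not by this page. Returning no surfaces for an unassigned role is an assumption.

```typescript
type TestSurface =
  | "Custom Prompt Test"
  | "Auto-generated Prompt Test"
  | "Guest Facing Agent Verification Check"
  | "Controlled Batch Runner";

type Role = "owner" | "admin" | "evaluator" | "tester" | "unassigned";

const ALL_SURFACES: readonly TestSurface[] = [
  "Custom Prompt Test",
  "Auto-generated Prompt Test",
  "Guest Facing Agent Verification Check",
  "Controlled Batch Runner",
];

// Testers are limited to the two prompt-testing surfaces.
const TESTER_SURFACES: readonly TestSurface[] = ["Custom Prompt Test", "Auto-generated Prompt Test"];

// evaluatorSafe comes from the canonical access matrix; this sketch does not define it.
function surfacesFor(role: Role, evaluatorSafe: readonly TestSurface[]): TestSurface[] {
  if (role === "owner" || role === "admin") return ALL_SURFACES.slice();
  if (role === "evaluator") return evaluatorSafe.slice();
  if (role === "tester") return TESTER_SURFACES.slice();
  return []; // unassigned: no test surfaces (assumption)
}
```

Note that the tester branch ignores `evaluatorSafe` entirely, so a mistake in the evaluator list cannot widen tester access.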

Oversight and analysis

Status: implemented. Owner/admin have full platform access, shared persisted evidence, Team Review, Global Analysis, and all test surfaces. Team Review exists for owner/admin oversight of human evaluation work: evidence quality, reviewer activity, review gaps, and escalation readiness. Global Analysis is an owner/admin-only surface for inspecting evidence across the whole platform. It is not an onboarding surface for testers or evaluators.
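A minimal sketch of the owner/admin-only gate for the two oversight surfaces, assuming role strings as listed under Current roles. `oversightSurfacesFor` is a hypothetical helper, and a UI gate like this complements, rather than replaces, the RLS policies that protect the evidence itself.

```typescript
type Role = "owner" | "admin" | "evaluator" | "tester" | "unassigned";
type OversightSurface = "Team Review" | "Global Analysis";

const OVERSIGHT_SURFACES: readonly OversightSurface[] = ["Team Review", "Global Analysis"];

// Only owner and admin get Team Review and Global Analysis; every other role gets none.
function oversightSurfacesFor(role: Role): OversightSurface[] {
  return role === "owner" || role === "admin" ? OVERSIGHT_SURFACES.slice() : [];
}
```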

Human onboarding posture

Status: active hardening. Eval Labs is ready for controlled human onboarding by role and assignment. Do not describe the platform as broadly production-mature or open-access. Passing the AI-reviewed platform readiness gate does not make Lucia human-approved, so do not describe it that way. Use only these status labels:
  • implemented
  • active hardening
  • deferred
  • future
Avoid softer labels that imply more maturity than the source state proves.
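One way to keep status wording honest is a small lint over the page source. This hypothetical helper assumes status lines follow the `Status: <label>.` pattern used on this page and reports any label outside the four approved ones.

```typescript
const STATUS_LABELS = ["implemented", "active hardening", "deferred", "future"] as const;
type StatusLabel = (typeof STATUS_LABELS)[number];

function isStatusLabel(value: string): value is StatusLabel {
  return (STATUS_LABELS as readonly string[]).indexOf(value) >= 0;
}

// Scans "Status: <label>." lines and returns every label outside the approved set,
// such as a softer phrase that overstates maturity.
function findNonCanonicalStatuses(doc: string): string[] {
  const problems: string[] = [];
  for (const line of doc.split("\n")) {
    const match = /^\s*Status:\s*([^.]+)\./.exec(line);
    if (match && !isStatusLabel(match[1].trim().toLowerCase())) {
      problems.push(match[1].trim());
    }
  }
  return problems;
}
```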

Active hardening

These areas are implemented but still being tightened, polished, or verified for rollout:
  • evaluator onboarding/workspace polish
  • first human cohort instructions
  • role-specific route verification
  • Clerk-to-Supabase role-claim verification after role or RLS changes
  • staged hydration behavior across large evidence sets
  • clear tester-vs-evaluator assignment guidance
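To verify staged hydration against large evidence sets, it helps to have a model of the expected behavior: stages load in order, the dashboard re-renders after each one, and a metric whose data has not loaded stays undefined instead of showing a fabricated zero. The sketch below assumes three stages (`summaries`, `recent`, `deep`) and hypothetical loader functions; it is not the Eval Labs data layer.

```typescript
// Hypothetical stage names: run summaries first, then recent, then deeper evidence.
type StageName = "summaries" | "recent" | "deep";

interface Stage<T> {
  name: StageName;
  load: () => Promise<T[]>; // hypothetical loader, e.g. a Supabase query
}

// A stage that has not loaded yet is absent from the snapshot.
type Snapshot<T> = Partial<Record<StageName, T[]>>;

// Loads stages in order and re-renders after each, so summaries appear
// before the heavier evidence queries start.
async function hydrateStaged<T>(
  stages: Stage<T>[],
  render: (snapshot: Snapshot<T>) => void,
): Promise<Snapshot<T>> {
  const snapshot: Snapshot<T> = {};
  for (const stage of stages) {
    snapshot[stage.name] = await stage.load();
    render({ ...snapshot }); // pass a copy so earlier renders are not mutated
  }
  return snapshot;
}

// Example metric: undefined (not 0) until run summaries have actually loaded.
function runCount<T>(snapshot: Snapshot<T>): number | undefined {
  return snapshot.summaries?.length;
}
```

Keeping "not loaded" distinct from "zero" is what lets a partially hydrated dashboard render early without showing fake metrics.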

Deferred

Deferred means intentionally outside the current access model:
  • tester access to Verification Check
  • tester access to Verification Results
  • tester access to Controlled Batch Runner
  • tester access to Team Review
  • tester access to Global Analysis
  • tester access to Registry Diagnostics
  • tester access to Behavioral Observatory
  • evaluator access to Team Review
  • evaluator access to Global Analysis
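During role-specific route verification, the deferred list can double as a negative check. This sketch copies the pairs above into data and flags any route a role can see that the list defers. It catches only deferred exposures; a clean result does not prove the routes match the full canonical matrix. `deferredViolations` and the route-label strings are hypothetical.

```typescript
type Role = "owner" | "admin" | "evaluator" | "tester" | "unassigned";

// Deferred role/surface pairs, copied from the list above.
const DEFERRED: ReadonlyArray<readonly [Role, string]> = [
  ["tester", "Verification Check"],
  ["tester", "Verification Results"],
  ["tester", "Controlled Batch Runner"],
  ["tester", "Team Review"],
  ["tester", "Global Analysis"],
  ["tester", "Registry Diagnostics"],
  ["tester", "Behavioral Observatory"],
  ["evaluator", "Team Review"],
  ["evaluator", "Global Analysis"],
];

// Given the surfaces a role actually sees (e.g. collected from the navigation
// while signed in as that role), return the ones the deferred list forbids.
function deferredViolations(role: Role, visibleSurfaces: readonly string[]): string[] {
  return visibleSurfaces.filter((surface) =>
    DEFERRED.some(([r, s]) => r === role && s === surface),
  );
}
```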

Future

Future means possible later, not current behavior:
  • broader public or external evaluator rollout
  • expanded assignment management
  • additional owner/admin management tooling
  • deeper cohort analytics beyond current oversight surfaces
  • further, final-stage evaluator UX polish

First human onboarding readiness criteria

Before a first human onboarding cohort starts:
  1. Confirm every participant has Clerk auth access.
  2. Confirm eval_labs_role is set in Clerk public metadata.
  3. Confirm the session token carries the role claim used by Supabase RLS.
  4. Confirm visible routes match the access matrix.
  5. Run a real prompt test and verify Supabase persistence.
  6. Confirm owner/admin can see shared persisted evidence where oversight applies.
  7. Keep testers limited to Custom Prompt Test and Auto-generated Prompt Test.
  8. Give evaluators only assignments that match evaluator-safe surfaces.
  9. Name any active-hardening caveats before the work begins.
  10. Restate that passing the AI-reviewed platform readiness gate is not human approval of Lucia’s quality.
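Steps 1 to 3 and step 7 can be checked mechanically per participant. The sketch below assumes you have already collected, for each participant, whether Clerk sign-in works, the eval_labs_role value in public metadata, the role claim decoded from one of their session tokens, and the test surfaces their UI shows. The `Participant` shape and `preflight` name are hypothetical. Steps 5, 6, and 8 to 10 still need a real run and human judgment.

```typescript
// Hypothetical per-participant snapshot gathered before the cohort starts.
interface Participant {
  email: string;
  hasClerkAccess: boolean; // step 1
  metadataRole?: string; // eval_labs_role from Clerk public metadata (step 2)
  sessionTokenRole?: string; // eval_labs_role claim decoded from a session token (step 3)
  visibleTestSurfaces: string[]; // what the participant's UI actually shows (step 7)
}

const TESTER_ALLOWED = ["Custom Prompt Test", "Auto-generated Prompt Test"];

// Returns human-readable problems; an empty array means these checks passed.
function preflight(p: Participant): string[] {
  const problems: string[] = [];
  if (!p.hasClerkAccess) problems.push("no Clerk auth access");
  if (!p.metadataRole) problems.push("eval_labs_role missing from public metadata");
  if (p.sessionTokenRole !== p.metadataRole) {
    problems.push("session token role claim does not match public metadata");
  }
  if (p.metadataRole === "tester") {
    for (const surface of p.visibleTestSurfaces) {
      if (TESTER_ALLOWED.indexOf(surface) < 0) problems.push(`tester can see ${surface}`);
    }
  }
  return problems;
}
```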