Eval Labs is an implemented, role-based human evaluation platform for Lucia. It supports controlled human onboarding, persisted evidence, owner/admin oversight, and evaluator-safe workflows, while some UX and rollout areas remain in active hardening.
Current platform truth
Status: implemented. Eval Labs is no longer only a founder or AI-agent testing tool. It is Lucia’s role-based human evaluation platform:
- Clerk auth works.
- Clerk public metadata drives frontend role behavior through eval_labs_role.
- The Clerk session token includes eval_labs_role so Supabase RLS can recognize privileged owner/admin access.
- Supabase RLS protects persisted evidence.
- Real runs must persist to Supabase.
- Owner/admin should see shared persisted Eval Labs evidence.
- Evaluator and tester data remains scoped to their own work except where owner/admin oversight applies.
- Team Review exists as the owner/admin oversight surface.
- Staged hydration loads run summaries first, then recent and deeper evidence, so dashboards can render faster without fake metrics.
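The staged hydration order can be sketched as follows. This is a minimal illustration, not the Eval Labs implementation: the loader functions, the state shape, and the names hydrateInStages and onUpdate are hypothetical, and the real data access layer is not shown in this document. The point is only the ordering: summaries first, then recent evidence, then deeper evidence, with the dashboard updated after each stage from real data.

```typescript
// Hypothetical types: the real Eval Labs data model is not described here.
type Stage = "summaries" | "recent" | "deep";

interface StageLoaders<S, E> {
  loadSummaries: () => Promise<S[]>;      // assumed: cheap run summaries
  loadRecentEvidence: () => Promise<E[]>; // assumed: most recent evidence rows
  loadDeepEvidence: () => Promise<E[]>;   // assumed: older or heavier evidence
}

interface DashboardState<S, E> {
  summaries: S[];
  evidence: E[];
  loadedStages: Stage[];
}

// Loads each stage in order and reports the state after every stage,
// so the UI only ever shows data that has actually been loaded.
async function hydrateInStages<S, E>(
  loaders: StageLoaders<S, E>,
  onUpdate: (state: DashboardState<S, E>) => void,
): Promise<DashboardState<S, E>> {
  let state: DashboardState<S, E> = { summaries: [], evidence: [], loadedStages: [] };

  // Stage 1: run summaries, enough to render the dashboard shell.
  const summaries = await loaders.loadSummaries();
  state = { ...state, summaries, loadedStages: ["summaries"] };
  onUpdate(state);

  // Stage 2: recent evidence.
  const recent = await loaders.loadRecentEvidence();
  state = { ...state, evidence: recent, loadedStages: [...state.loadedStages, "recent"] };
  onUpdate(state);

  // Stage 3: deeper evidence, appended to what is already shown.
  const deep = await loaders.loadDeepEvidence();
  state = {
    ...state,
    evidence: [...state.evidence, ...deep],
    loadedStages: [...state.loadedStages, "deep"],
  };
  onUpdate(state);

  return state;
}
```

Because every update carries loadedStages, a dashboard built on this shape can show a loading indicator for stages that have not arrived instead of placeholder numbers.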
Current roles
Status: implemented. Current roles:
- owner
- admin
- evaluator
- tester
- unassigned or missing role
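A minimal sketch of how these roles might be resolved from a decoded claims object. It assumes eval_labs_role arrives as a top-level string claim; the function names resolveRole and isPrivileged and the "unassigned" label are illustrative, not taken from the Eval Labs codebase. Anything missing or unrecognized falls back to unassigned instead of a privileged default.

```typescript
// Role names come from the list above; "unassigned" is this sketch's label
// for the "unassigned or missing role" case.
const KNOWN_ROLES = ["owner", "admin", "evaluator", "tester"] as const;
type KnownRole = (typeof KNOWN_ROLES)[number];
type EvalLabsRole = KnownRole | "unassigned";

function isKnownRole(value: unknown): value is KnownRole {
  return typeof value === "string" && (KNOWN_ROLES as readonly string[]).indexOf(value) !== -1;
}

// Assumption: the claims object exposes eval_labs_role as a top-level key.
function resolveRole(claims: Record<string, unknown> | null | undefined): EvalLabsRole {
  const raw = claims ? claims["eval_labs_role"] : undefined;
  return isKnownRole(raw) ? raw : "unassigned";
}

// Owner and admin are the privileged roles described in this document.
function isPrivileged(role: EvalLabsRole): boolean {
  return role === "owner" || role === "admin";
}
```

For example, resolveRole({ eval_labs_role: "admin" }) yields "admin", while a missing claim, a non-string value, or an unknown string all yield "unassigned".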
Current test surfaces
Status: implemented. Current test surfaces:
- Custom Prompt Test
- Auto-generated Prompt Test
- Guest Facing Agent Verification Check
- Controlled Batch Runner
Oversight and analysis
Status: implemented. Owner/admin have full platform access, shared persisted evidence, Team Review, Global Analysis, and all test surfaces. Team Review exists for owner/admin oversight of human evaluation work: evidence quality, reviewer activity, review gaps, and escalation readiness. Global Analysis is owner/admin-only platform-wide evidence inspection. It is not a tester or evaluator onboarding surface.
Human onboarding posture
Status: active hardening. Eval Labs is ready for controlled human onboarding by role and assignment. Do not describe the platform as broadly production-mature or open-access. Do not describe Lucia as human-approved because the AI-reviewed platform readiness gate passed. Use:
Active hardening
These areas are implemented but still being tightened, polished, or verified for rollout:
- evaluator onboarding/workspace polish
- first human cohort instructions
- role-specific route verification
- Clerk-to-Supabase role-claim verification after role or RLS changes
- staged hydration behavior across large evidence sets
- clear tester-vs-evaluator assignment guidance
Deferred
Deferred means intentionally outside the current access model:
- tester access to Verification Check
- tester access to Verification Results
- tester access to Controlled Batch Runner
- tester access to Team Review
- tester access to Global Analysis
- tester access to Registry Diagnostics
- tester access to Behavioral Observatory
- evaluator access to Team Review
- evaluator access to Global Analysis
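The access rules stated in this document can be sketched as a small decision function. This is an illustration under assumptions, not the actual access matrix: the accessDecision name and the three-valued result are hypothetical, and "unspecified" marks combinations this document does not settle (for example, which remaining surfaces are evaluator-safe). A caller would reasonably treat "unspecified" as denied until the access matrix confirms otherwise.

```typescript
type Role = "owner" | "admin" | "evaluator" | "tester";

// Surface names as they appear in this document's lists.
type Surface =
  | "Custom Prompt Test"
  | "Auto-generated Prompt Test"
  | "Verification Check"
  | "Verification Results"
  | "Controlled Batch Runner"
  | "Team Review"
  | "Global Analysis"
  | "Registry Diagnostics"
  | "Behavioral Observatory";

type Decision = "allow" | "deny" | "unspecified";

// Testers are limited to the two prompt test surfaces.
const TESTER_ALLOWED: readonly Surface[] = ["Custom Prompt Test", "Auto-generated Prompt Test"];

// Evaluator access to these two surfaces is listed as deferred.
const EVALUATOR_DENIED: readonly Surface[] = ["Team Review", "Global Analysis"];

function accessDecision(role: Role, surface: Surface): Decision {
  switch (role) {
    case "owner":
    case "admin":
      // Owner/admin have full platform access.
      return "allow";
    case "tester":
      return TESTER_ALLOWED.indexOf(surface) !== -1 ? "allow" : "deny";
    case "evaluator":
      // Only the two deferred surfaces are known denials; the rest is not stated here.
      return EVALUATOR_DENIED.indexOf(surface) !== -1 ? "deny" : "unspecified";
  }
}
```

Keeping "unspecified" separate from "deny" makes it visible during route verification which role and surface pairs still need an explicit answer.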
Future
Future means possible later, not current behavior:
- broader public or external evaluator rollout
- expanded assignment management
- additional owner/admin management tooling
- deeper cohort analytics beyond current oversight surfaces
- further evaluator UX polish
First human onboarding readiness criteria
Before a first human onboarding cohort starts:
- Confirm every participant has Clerk auth access.
- Confirm eval_labs_role is set in Clerk public metadata.
- Confirm the session token carries the role claim used by Supabase RLS.
- Confirm visible routes match the access matrix.
- Run a real prompt test and verify Supabase persistence.
- Confirm owner/admin can see shared persisted evidence where oversight applies.
- Keep testers limited to Custom Prompt Test and Auto-generated Prompt Test.
- Give evaluators only assignments that match evaluator-safe surfaces.
- Name any active-hardening caveats before the work begins.
- Repeat that AI-reviewed platform readiness is not human Lucia-quality approval.
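The per-participant identity checks in this list can be sketched as a small pre-flight function. This is a hedged illustration: the ParticipantSnapshot shape and the preflightIssues name are hypothetical, and how the snapshot is collected from Clerk is not shown. The comparison between the metadata role and the token role is an added assumption of this sketch, based on the expectation that both carry the same eval_labs_role value.

```typescript
// Hypothetical snapshot of what was observed for one participant.
interface ParticipantSnapshot {
  id: string;
  hasClerkAccess: boolean;
  metadataRole?: string; // eval_labs_role as read from Clerk public metadata
  tokenRole?: string;    // eval_labs_role as read from the session token
}

// Returns a list of human-readable blockers; an empty list means
// the identity-related checks passed for this participant.
function preflightIssues(p: ParticipantSnapshot): string[] {
  const issues: string[] = [];
  if (!p.hasClerkAccess) {
    issues.push("no Clerk auth access");
  }
  if (!p.metadataRole) {
    issues.push("eval_labs_role missing from public metadata");
  }
  if (!p.tokenRole) {
    issues.push("eval_labs_role missing from session token");
  }
  // Assumption of this sketch: the two sources should agree.
  if (p.metadataRole && p.tokenRole && p.metadataRole !== p.tokenRole) {
    issues.push("metadata role and token role differ");
  }
  return issues;
}
```

The remaining checklist items (route visibility, a real persisted prompt test, owner/admin evidence visibility, and naming caveats) still need to be verified against the running platform; this sketch covers only the identity checks.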

