Product Architecture

Eval Labs is a separate role-based human evaluation platform that tests Lucia through the deployed Engine, stores review evidence in Supabase, and exposes scoped testing, run-history, review, Team Review, and Global Analysis surfaces.

High-level architecture

Employee / Reviewer
→ Eval Labs web app
→ Clerk role and Supabase RLS scope
→ Lucia Engine /admin/operator-focus
→ Lucia response
→ Eval Labs Review Queue
→ Suggested selections plus human review
→ Quick Review / Human Guidance Evaluation
→ Lifecycle finalization
→ Supabase persistence
→ Run History / Team Review / Global Analysis / Exports

Runtime responsibility split

Eval Labs owns:

top app shell and route identity
test launcher UX
custom prompt suite UX
auto-generated prompt tester UX
guest-facing verification check and results UX
controlled batch runner UX
run orchestration
Run History
Team Review
Global Analysis
Single Run Analysis
copy Session ID / copy Deep Link controls
role-gated product access
Clerk public metadata role behavior
Supabase RLS role-claim requirements
Review Queue
suggested review generation
human ratings
semantic scoring sliders
Quick Review
Human Guidance Evaluation
review lifecycle and finalization
dirty / completion state
tester identity capture
exports
Supabase persistence for eval data
staged hydration from run summaries to recent/deep evidence
localStorage compaction for completed cloud-backed runs

Lucia Engine owns:

actual Lucia behavior
intent/routing
response generation
emotional containment
operational prioritization
model gateway behavior

Eval Labs does not decide Lucia’s response quality. It records and evaluates the response Lucia produced. AI-reviewed platform evidence proves that the Eval Labs lifecycle works. Human reviewers still decide Lucia behavioral quality.
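Among the Eval Labs responsibilities listed above, localStorage compaction for completed cloud-backed runs is the least self-explanatory. The following is a minimal sketch. It assumes a hypothetical `StoredRun` cache entry in which a run can be compacted once it is completed and its Supabase write is confirmed. The field names, the `compactRuns` helper, and the compaction rule are illustrative, not the shipped implementation:

```typescript
// Hypothetical local cache entry for one run; field names are illustrative.
interface StoredRun {
  sessionId: string;
  status: "in_progress" | "completed";
  cloudPersisted: boolean; // assumed flag: Supabase write confirmed
  summary: { itemCount: number; reviewedCount: number };
  items?: unknown[]; // full evidence, only needed locally until the cloud copy exists
}

// Keep the summary for every run, but drop full item evidence for runs that are
// both completed and already persisted to the cloud. Unfinished or local-only
// runs keep everything, so no evidence is lost.
function compactRuns(runs: StoredRun[]): StoredRun[] {
  return runs.map((run) => {
    if (run.status === "completed" && run.cloudPersisted) {
      return {
        sessionId: run.sessionId,
        status: run.status,
        cloudPersisted: run.cloudPersisted,
        summary: run.summary,
      };
    }
    return run;
  });
}
```

Under this rule, a compacted run can still be listed from its summary, and its deep evidence would be read back from Supabase when needed. That fits the staged-hydration responsibility above.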

Current Engine target

Eval Labs endpoint selection is environment-configured through VITE_LUCIA_EVAL_ENDPOINT. The current Lucia v0.1.3.6 validation target for active dev refinement is:

https://api-dev.hellolucia.ai/admin/operator-focus

Development is where active iteration happens. Staging is used for promoted validation, and only when it has been intentionally configured.
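Because the target comes from `VITE_LUCIA_EVAL_ENDPOINT`, a misconfigured deploy could point at the wrong Engine without any visible error. The sketch below shows a guarded lookup. It is written against a plain env record so it runs outside Vite; in the app, that record would be `import.meta.env`. The `resolveEvalEndpoint` name and its validation rules are hypothetical:

```typescript
// Hypothetical guard around the environment-configured Engine endpoint.
// The env record is a parameter so the check can run outside Vite.
function resolveEvalEndpoint(env: Record<string, string | undefined>): URL {
  const raw = env.VITE_LUCIA_EVAL_ENDPOINT?.trim();
  if (!raw) {
    // Fail loudly instead of silently falling back to a guessed default.
    throw new Error("VITE_LUCIA_EVAL_ENDPOINT is not set");
  }
  const url = new URL(raw); // the URL constructor throws on malformed input
  if (url.protocol !== "https:") {
    throw new Error(`Refusing non-HTTPS Engine endpoint: ${url.href}`);
  }
  return url;
}
```

With no built-in default, a missing variable surfaces at startup rather than as runs quietly recorded against an unintended environment.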

Source of truth hierarchy

When debugging Eval Labs platform behavior, trust these sources in order, from most to least authoritative:

Browser Network request URL
Current route and role state
Supabase rows and counts
Run History / Analysis UI truth
localStorage diagnostics
Render service environment
Netlify environment variables
Lucia Engine deployed commit
Eval Labs deployed commit
Exported run metadata
Human memory

Human memory is useful. It is not the source of truth.
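When two of these sources disagree, the hierarchy is read top-down. The following is a minimal sketch of that rule. It assumes the list is ordered from most to least authoritative, as its title and its final entry suggest. The `Observation` shape and `mostAuthoritative` helper are hypothetical:

```typescript
// Sources in the order listed above, most authoritative first.
const SOURCE_PRIORITY = [
  "Browser Network request URL",
  "Current route and role state",
  "Supabase rows and counts",
  "Run History / Analysis UI truth",
  "localStorage diagnostics",
  "Render service environment",
  "Netlify environment variables",
  "Lucia Engine deployed commit",
  "Eval Labs deployed commit",
  "Exported run metadata",
  "Human memory",
] as const;

type Source = (typeof SOURCE_PRIORITY)[number];

// Hypothetical record of one piece of debugging evidence.
interface Observation {
  source: Source;
  claim: string;
}

// Return the observation from the highest-ranked source, or undefined if none.
function mostAuthoritative(observations: Observation[]): Observation | undefined {
  let best: Observation | undefined;
  for (const obs of observations) {
    if (!best || SOURCE_PRIORITY.indexOf(obs.source) < SOURCE_PRIORITY.indexOf(best.source)) {
      best = obs;
    }
  }
  return best;
}
```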

Current route architecture

The current route map is documented in Product Surfaces and Route Map. Core canonical paths:

/                           Owner/Admin Home dashboard
/lucia/launcher             workspace chooser
/lucia/custom               Custom prompt tester
/lucia/auto-generated       Auto-generated 50-prompt tester
/guest-facing/verification  Guest Facing Agent Verification Check
/lucia/batch-runner         Controlled Batch Runner
/lucia/automated/runs       Run History
/team-review                Team Review
/analysis                   Global Analysis
/analysis/runs/:sessionId   Single Run Analysis
/runs/:sessionId/review     Review Queue

Legacy aliases:

/lucia/automated            alias to /lucia/auto-generated
/experiments                alias to /analysis
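The canonical paths and legacy aliases can be expressed as data. The following is a minimal sketch of alias redirection plus `:sessionId` matching. The `resolveRoute` helper and its return shape are hypothetical, and the real app presumably delegates this to its router:

```typescript
// Legacy aliases from the route map, redirected to their canonical paths (exact match only).
const LEGACY_ALIASES: Record<string, string> = {
  "/lucia/automated": "/lucia/auto-generated",
  "/experiments": "/analysis",
};

// Canonical patterns; ":sessionId" marks a path parameter.
const CANONICAL_ROUTES = [
  "/",
  "/lucia/launcher",
  "/lucia/custom",
  "/lucia/auto-generated",
  "/guest-facing/verification",
  "/lucia/batch-runner",
  "/lucia/automated/runs",
  "/team-review",
  "/analysis",
  "/analysis/runs/:sessionId",
  "/runs/:sessionId/review",
];

interface RouteMatch {
  pattern: string;
  params: Record<string, string>;
  redirectedFrom?: string;
}

// Segment-by-segment comparison; returns extracted params or null on mismatch.
function matchPattern(pattern: string, path: string): Record<string, string> | null {
  const p = pattern.split("/");
  const s = path.split("/");
  if (p.length !== s.length) return null;
  const params: Record<string, string> = {};
  for (let i = 0; i < p.length; i++) {
    if (p[i].startsWith(":")) {
      if (!s[i]) return null; // parameters must be non-empty
      params[p[i].slice(1)] = decodeURIComponent(s[i]);
    } else if (p[i] !== s[i]) {
      return null;
    }
  }
  return params;
}

function resolveRoute(path: string): RouteMatch | null {
  const canonical = LEGACY_ALIASES[path] ?? path;
  for (const pattern of CANONICAL_ROUTES) {
    const params = matchPattern(pattern, canonical);
    if (params) {
      return canonical === path ? { pattern, params } : { pattern, params, redirectedFrom: path };
    }
  }
  return null;
}
```

The `/lucia/automated` alias applies only to that exact path. `/lucia/automated/runs` is itself canonical (Run History) and must not be rewritten, so the sketch uses exact-key lookup rather than prefix matching.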

Current role architecture

Role gating is documented in Role and Access Model. Current supported Clerk public metadata values:

owner
admin
evaluator
tester

Owner and admin are privileged roles with full platform access: Team Review, Global Analysis, shared persisted evidence, and all test surfaces.

Evaluator is the full evaluator workbench role. Evaluators can use evaluator-safe test surfaces and their own run, review, and history routes, but cannot use Team Review or Global Analysis.

Tester is the entry-level prompt-testing lane. Testers can use Custom Prompt Test and Auto-generated Prompt Test, but not verification, controlled batch, Team Review, Global Analysis, Registry Diagnostics, Behavioral Observatory, or owner/admin tools.

Missing or unknown role metadata should fail closed. Frontend role behavior is driven by Clerk public metadata. Persisted evidence access, however, depends on the Clerk session token carrying eval_labs_role so that Supabase RLS can recognize privileged owner/admin access.
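The following is a minimal sketch of the fail-closed rule. It uses only the surface permissions stated above, plus the assumption that evaluators get everything testers get. The `Surface` names, the `role` key inside public metadata, and the `parseRole`, `canAccess`, and `hasPrivilegedClaim` helpers are hypothetical. Controlled batch and run/review routes are omitted because their per-role rules are not fully spelled out here:

```typescript
type Role = "owner" | "admin" | "evaluator" | "tester";
type Surface = "customPrompt" | "autoGenerated" | "verification" | "teamReview" | "globalAnalysis";

const KNOWN_ROLES: readonly Role[] = ["owner", "admin", "evaluator", "tester"];

// Per-role allowlists, limited to the rules stated in this section.
const ALLOWED: Record<Role, readonly Surface[]> = {
  owner: ["customPrompt", "autoGenerated", "verification", "teamReview", "globalAnalysis"],
  admin: ["customPrompt", "autoGenerated", "verification", "teamReview", "globalAnalysis"],
  evaluator: ["customPrompt", "autoGenerated", "verification"], // assumption: superset of tester
  tester: ["customPrompt", "autoGenerated"],
};

// Read the role from Clerk public metadata (assumed key: "role").
// Anything missing or unrecognized yields null.
function parseRole(publicMetadata: unknown): Role | null {
  if (typeof publicMetadata !== "object" || publicMetadata === null) return null;
  const value = (publicMetadata as { role?: unknown }).role;
  return KNOWN_ROLES.find((r) => r === value) ?? null;
}

// Fail closed: no recognized role means no access.
function canAccess(publicMetadata: unknown, surface: Surface): boolean {
  const role = parseRole(publicMetadata);
  return role !== null && ALLOWED[role].includes(surface);
}

// Mirrors the condition RLS depends on: the session token must carry eval_labs_role.
function hasPrivilegedClaim(claims: Record<string, unknown>): boolean {
  return claims.eval_labs_role === "owner" || claims.eval_labs_role === "admin";
}
```

Frontend gating like this only shapes the UI. The RLS check on the token claim is what guards persisted evidence, which is why the role has to reach the Supabase session token and not only Clerk metadata.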

Important design decision

The custom prompt feature did not require a separate database model because Eval Labs already had a general structure:

Session → Run items → Lucia responses → Human reviews

Custom prompts are a new run source, not a new evaluation universe. That is good architecture.
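The general structure can be sketched as TypeScript types. The field names and the `"generated"` source label are hypothetical. The sketch shows one point: a custom prompt changes only the session's `runSource`, not the shape of items, responses, or reviews:

```typescript
// Hypothetical shapes for Session → Run items → Lucia responses → Human reviews.
type RunSource = "generated" | "custom";

interface HumanReview {
  reviewerId: string;
  rating: number;
}

interface LuciaResponse {
  text: string;
  reviews: HumanReview[];
}

interface RunItem {
  prompt: string;
  response?: LuciaResponse; // absent until the Engine has answered
}

interface Session {
  sessionId: string;
  runSource: RunSource; // the only place a custom run differs
  items: RunItem[];
}

// The same logic works for every run source, because the structure is shared.
function countReviewedItems(session: Session): number {
  return session.items.filter((item) => (item.response?.reviews.length ?? 0) > 0).length;
}
```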

Current run source strategy

Custom runs use:

mode: automated
runSource: custom
category: custom/prompts
templateKey: custom-prompt

This preserves compatibility with the existing run engine while clearly distinguishing custom runs from generated automated runs.

Controlled batch runs use the same platform lifecycle to create, execute, review, finalize, persist, and verify runs. They are operational readiness evidence, not a separate human-review standard.

Guest Facing Agent Verification Check is a separate app surface for booked-guest verification behavior and results. It is evaluator-safe but not tester-facing.
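The four custom-run fields are enough to tell a custom run apart without a separate table. The following is a minimal sketch. Only the four literal values come from this page; the `RunMetadata` type, `CUSTOM_RUN_METADATA`, `newCustomRunMetadata`, and `isCustomRun` are hypothetical names:

```typescript
interface RunMetadata {
  mode: string;
  runSource: string;
  category: string;
  templateKey: string;
}

// Values documented for custom runs.
const CUSTOM_RUN_METADATA: Readonly<RunMetadata> = {
  mode: "automated", // shared with generated runs, so the existing run engine accepts it
  runSource: "custom",
  category: "custom/prompts",
  templateKey: "custom-prompt",
};

// Return a fresh copy per run so callers cannot mutate the shared template.
function newCustomRunMetadata(): RunMetadata {
  return { ...CUSTOM_RUN_METADATA };
}

// Discriminate on runSource, not mode: custom and generated runs both use mode "automated".
function isCustomRun(meta: RunMetadata): boolean {
  return meta.mode === "automated" && meta.runSource === "custom";
}
```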

Review-layer architecture

Eval Labs now separates review responsibility into layers:

Review Queue UI
→ Suggested review values
→ Employee Review fields
→ Human Guidance Evaluation
→ Review State / Escalation flags
→ Lifecycle / dirty / completion state
→ Adjudication metadata
→ Exports / Analysis

The schema supports high-resolution analysis while the employee UI remains simple. This is intentional. The user-facing review experience should remain calm and guided even when the exported data is detailed.
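One way to keep the employee UI simple while exporting detailed data is to store each review layer in its own part of the record and derive completion from the human fields only. The following is a minimal sketch. Every field name, the `isComplete` rule, and the `toExportRow` column naming are hypothetical:

```typescript
// Hypothetical layered review record; each part maps to one layer listed above.
interface LayeredReview {
  suggested: { rating?: number }; // suggested review values: pre-filled, not a decision
  employee: { rating?: number; notes?: string }; // what the reviewer actually set
  guidance: { semanticScores: Record<string, number> }; // Human Guidance Evaluation sliders
  flags: { escalated: boolean };
  lifecycle: { dirty: boolean; finalized: boolean };
  adjudication?: { adjudicatorId: string; decidedAt: string };
}

// Assumed rule: a suggestion alone never completes a review. A human rating
// must exist and there must be no unsaved edits.
function isComplete(review: LayeredReview): boolean {
  return review.employee.rating !== undefined && !review.lifecycle.dirty;
}

// Flatten for export: detailed columns the review UI never needs to show.
function toExportRow(review: LayeredReview): Record<string, string | number | boolean> {
  const row: Record<string, string | number | boolean> = {
    suggested_rating: review.suggested.rating ?? "",
    employee_rating: review.employee.rating ?? "",
    escalated: review.flags.escalated,
    finalized: review.lifecycle.finalized,
  };
  for (const [dimension, score] of Object.entries(review.guidance.semanticScores)) {
    row[`semantic_${dimension}`] = score;
  }
  return row;
}
```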
