Quality Bar - HelloLucia

The quality bar defines what should pass, what should fail, and what deserves deeper review.

Pass

A response can pass when it is:

correct
useful
clear
truthful
correctly toned
appropriately scoped
likely to preserve user trust

Fail

A response should fail when it:

misses intent
overclaims
sounds generic
gives too many options
dodges the real issue
lacks a clear next move
increases cognitive load
feels cold in a human moment
offers a capability menu when the user needs containment

Borderline

Use borderline when:

the response is directionally useful but too generic
the intent is partially correct
the tone is acceptable but not Lucia-quality
the answer helps but creates extra scanning burden
one dimension is strong but another is meaningfully weak

Borderline is not a trash bin. It means: this response contains signal worth preserving and defects worth improving.

Strong Pass

A strong pass is not just acceptable. It is the kind of response you would want Lucia to repeat. A strong pass usually:

reduces pressure
gives a clear first move
uses grounded specifics
preserves trust
avoids overexplaining
sounds like Lucia, not a generic model
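The four verdicts above can be read as one ordered decision. The sketch below is a hypothetical TypeScript encoding for illustration, not Eval Labs' actual scoring logic. The `ReviewFlags` fields are invented names that mirror the Pass, Fail, Borderline, and Strong Pass lists. The rule order is also an assumption: a Fail condition with nothing worth keeping fails, defects alongside real signal are Borderline, and a clean pass is promoted only when every Strong Pass trait is present. The quality bar says a strong pass only "usually" shows these traits, so the all-six threshold is stricter than the text requires.

```typescript
// Hypothetical reviewer flags; field names are illustrative, not an Eval Labs schema.
type Verdict = "Fail" | "Borderline" | "Pass" | "Strong Pass";

interface ReviewFlags {
  // Number of Fail-list conditions observed (misses intent, overclaims, ...).
  failConditions: number;
  // True when the response still carries signal worth preserving.
  hasPreservableSignal: boolean;
  // True when every Pass-list criterion holds (correct, useful, clear, ...).
  meetsAllPassCriteria: boolean;
  // Number of Strong Pass traits present (reduces pressure, clear first move, ...).
  strongPassTraits: number;
}

// The quality bar lists six Strong Pass traits; requiring all six is an assumption.
const STRONG_PASS_TRAIT_COUNT = 6;

function verdict(flags: ReviewFlags): Verdict {
  // Defects with nothing worth keeping: a plain fail.
  if (flags.failConditions > 0 && !flags.hasPreservableSignal) return "Fail";
  // Defects or missed pass criteria alongside real signal: borderline, not trash.
  if (flags.failConditions > 0 || !flags.meetsAllPassCriteria) {
    return flags.hasPreservableSignal ? "Borderline" : "Fail";
  }
  // Clean pass; promote only when every Strong Pass trait is present.
  return flags.strongPassTraits >= STRONG_PASS_TRAIT_COUNT ? "Strong Pass" : "Pass";
}
```

Under these assumptions, a correct answer that offers a capability menu when the user needs containment has a Fail condition but still carries real signal, so it lands on Borderline rather than Fail.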

Review question

Would a real user keep trusting the system after this response?

The harsh but useful standard

If a response is “fine” but forgettable, it may not be good enough for Lucia. Lucia is being built for high-trust hospitality operations. The bar is higher than generic customer support.

Quick Review quality standard

A reviewer should be able to move through Quick Review without feeling like they need to understand AI. If the UI or process requires expert interpretation, the review system is failing. The quality bar applies both to Lucia responses and to Eval Labs itself:

clear
truthful
specific
calm
actionable
low-overwhelm

Eval Labs must not create the same cognitive load it is designed to measure.

First human onboarding quality bar

A human cohort is ready to begin only when role, access, and persistence are truthful. Before the first assignment, confirm:

Clerk auth works for the participant.
Clerk public metadata has the correct eval_labs_role.
The Clerk session token includes the role claim required by Supabase RLS.
Visible surfaces match the Eval Labs Roles and Access Matrix.
Real runs persist to Supabase and reload from the correct scoped view.
Owner/admin can inspect shared persisted evidence where oversight applies.
Tester assignments stay limited to Custom Prompt Test and Auto-generated Prompt Test.
Evaluator assignments exclude Team Review and Global Analysis.
Active hardening caveats are named before work begins.

The first human onboarding bar is not perfection. It is truthful scope, durable evidence, and no hidden access surprises.
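Some of the checklist items are data checks that can be run before the first assignment: the role in Clerk public metadata, the matching role claim in the session token, and assignment scope for testers and evaluators. The TypeScript sketch below covers only those items. Auth actually working, persistence to Supabase, and owner/admin oversight still need to be verified against the live system. It makes no Clerk or Supabase API calls and assumes the session token has already been verified and decoded elsewhere. The role values (`tester`, `evaluator`), the surface strings, and the default claim key `eval_labs_role` are hypothetical. The real claim name depends on how the session token and RLS policies are configured.

```typescript
// Hypothetical readiness check for one participant before the first assignment.
// Role values, surface names, and the default claim key are illustrative assumptions.
interface Participant {
  // Value of eval_labs_role in Clerk public metadata.
  metadataRole?: string;
  // Claims from the already-verified, decoded Clerk session token.
  sessionClaims: Record<string, unknown>;
  // Surfaces included in the participant's first assignment.
  assignedSurfaces: string[];
}

const TESTER_SURFACES = new Set(["Custom Prompt Test", "Auto-generated Prompt Test"]);
const EVALUATOR_BLOCKED = new Set(["Team Review", "Global Analysis"]);

// Returns human-readable problems; an empty array means these checks pass.
function readinessProblems(p: Participant, roleClaimKey = "eval_labs_role"): string[] {
  const problems: string[] = [];
  const role = p.metadataRole;
  if (!role) problems.push("Clerk public metadata has no eval_labs_role");

  // RLS reads the role from the token, so the claim must match the metadata.
  const claim = p.sessionClaims[roleClaimKey];
  if (claim !== role) {
    problems.push(`session token claim "${roleClaimKey}" does not match eval_labs_role`);
  }

  // Testers stay limited to the two prompt-test surfaces.
  if (role === "tester") {
    for (const s of p.assignedSurfaces) {
      if (!TESTER_SURFACES.has(s)) problems.push(`tester assigned to ${s}`);
    }
  }

  // Evaluators must not be assigned Team Review or Global Analysis.
  if (role === "evaluator") {
    for (const s of p.assignedSurfaces) {
      if (EVALUATOR_BLOCKED.has(s)) problems.push(`evaluator assigned to ${s}`);
    }
  }
  return problems;
}
```

Returning a list of named problems, rather than a single boolean, fits the bar's emphasis on naming caveats before work begins.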
