Team Usage Guidelines

Eval Labs is only useful when reviewers are consistent, honest, specific, and operating inside their approved access scope.

Reviewer expectations

Reviewers should be:

honest
specific
consistent
grounded in the quality bar
willing to fail polished responses
careful with emotional signals
precise in notes

Do

write clear notes
identify patterns
flag truth issues
mark uncertainty
compare against the user’s actual need
save reviews before exporting reviewed evidence
use custom suites for targeted refinement
use auto-generated runs for broader regression checks only when your role allows it
keep AI-reviewed platform readiness separate from human Lucia-quality approval
keep derived diagnostic suggestions separate from saved Behavioral Observatory labels

Do not

pass weak responses to be nice
reward fancy wording
ignore tone failures
skip notes on borderline responses
treat one lucky response as proof
mix too many behavior families into one custom suite
confuse generated-only exports with reviewed exports
treat controlled batch results as human approval of Lucia quality
treat Registry Diagnostics suggestions as saved labels
treat Behavioral Observatory labels as global Lucia approval
use owner/admin surfaces from an evaluator role

Review notes

Good notes sound like:

Correct operational priority, but Lucia missed the user's disorientation signal and did not provide containment.

Bad notes sound like:

Seems fine.

Team standard

If another teammate cannot understand your review note, it is not specific enough.

When to escalate

Escalate a pattern when:

the same failure appears across 3+ related prompts
the failure affects trust
the failure affects distress handling
the failure causes wrong operational prioritization
the failure appears after a new deploy
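
The escalation criteria above can be sketched as a small check. This is only an illustration, not part of Eval Labs: the `FailurePattern` type, its field names, and the `escalation_reasons` helper are hypothetical, and the sketch assumes that any single criterion is enough to escalate, which the list does not state explicitly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailurePattern:
    """Hypothetical summary of a failure a reviewer has observed."""
    related_prompt_count: int
    affects_trust: bool = False
    affects_distress_handling: bool = False
    wrong_operational_priority: bool = False
    appeared_after_deploy: bool = False


def escalation_reasons(pattern: FailurePattern) -> List[str]:
    """Return every escalation criterion from the guideline that the pattern meets."""
    reasons = []
    if pattern.related_prompt_count >= 3:
        reasons.append("appears across 3+ related prompts")
    if pattern.affects_trust:
        reasons.append("affects trust")
    if pattern.affects_distress_handling:
        reasons.append("affects distress handling")
    if pattern.wrong_operational_priority:
        reasons.append("causes wrong operational prioritization")
    if pattern.appeared_after_deploy:
        reasons.append("appears after a new deploy")
    return reasons


def should_escalate(pattern: FailurePattern) -> bool:
    # Assumption: meeting any one criterion is sufficient to escalate.
    return bool(escalation_reasons(pattern))
```

Returning the list of matched reasons, rather than only a boolean, gives the reviewer ready-made wording for the escalation note.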

What not to escalate

Do not escalate a single minor wording preference unless it represents a broader pattern. Eval Labs is for product signal, not personal taste fights.

Updated reviewer guidance

Employees should prioritize speed, honesty, and consistency.

Do:

use the guided controls
flag for senior review when uncertain
write short notes only when they add context
mark reusable learning only when the pattern feels durable

Do not:

invent new labels
write long essays
create taxonomy language
treat personal taste as product signal
overthink every prompt

The goal is clean signal, not intellectual performance.

Access rule

Access is role-based by design. Testers should use only Custom Prompt Test and Auto-generated Prompt Test. Evaluators should use evaluator-safe test surfaces and their own run/review/history routes. Owner/admin-only surfaces include Team Review, Global Analysis, Registry Diagnostics, Behavioral Observatory, all-user analytics, cleanup/tools, and future admin/tools.

Do not onboard broader employee workflows until the Employee Onboarding Gate is satisfied. For the simple surface-by-surface path, read the Eval Labs Step-by-Step Operator Guide.
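
As a rough illustration of the access rule, the role-to-surface mapping could be expressed as a lookup. This is a sketch under assumptions, not how Eval Labs enforces access: the role strings, the `may_use` function, and the `evaluator_safe` parameter are hypothetical. The guide does not enumerate the evaluator-safe surfaces, so the caller supplies them, and the sketch assumes owner and admin roles may use every surface.

```python
from typing import AbstractSet

# Surfaces named in the access rule for testers.
TESTER_SURFACES = frozenset({"Custom Prompt Test", "Auto-generated Prompt Test"})

# Surfaces the access rule lists as owner/admin-only.
OWNER_ADMIN_ONLY_SURFACES = frozenset({
    "Team Review",
    "Global Analysis",
    "Registry Diagnostics",
    "Behavioral Observatory",
    "all-user analytics",
    "cleanup/tools",
})


def may_use(role: str, surface: str,
            evaluator_safe: AbstractSet[str] = frozenset()) -> bool:
    """Return True if the role may use the surface under the access rule."""
    if role in ("owner", "admin"):
        return True  # Assumption: owner/admin roles can use every surface.
    if surface in OWNER_ADMIN_ONLY_SURFACES:
        return False  # Never reachable from tester or evaluator roles.
    if role == "tester":
        return surface in TESTER_SURFACES
    if role == "evaluator":
        # The guide does not list evaluator-safe surfaces; the caller provides them.
        return surface in evaluator_safe
    return False  # Unknown roles get no access.
```

Checking the owner/admin-only set before the role-specific sets means a misconfigured `evaluator_safe` list cannot expose an admin surface to an evaluator.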
