This SOP teaches approved reviewers how to perform useful Eval Labs reviews without drifting into vague feedback.

Training session structure

A new reviewer should complete supervised passes only after the onboarding gate has been accepted for their role. Evaluator onboarding consists of:
  1. one Custom prompt smoke test
  2. one targeted prompt-test review
  3. one supervised own-run finalization
  4. any verification or controlled-batch practice only when assigned
For tester onboarding, keep training limited to Custom Prompt Test and Auto-generated Prompt Test. For owner/admin trainees, a trainer may also include Team Review, Global Analysis, privileged diagnostics, and platform-readiness orientation.

Pass 1 — Smoke test

Run one custom prompt:
What time is it?
Goal:
  • understand launcher flow
  • understand Review Queue
  • save a review
  • export JSON
  • confirm savedBy appears
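
The last two goals can be checked with a short script. This is a minimal sketch, assuming the export is a JSON object with a top-level `reviews` array whose entries carry a `savedBy` field. Only `savedBy` is named in this SOP; the `reviews` key, the sample data, and the function name are hypothetical and should be adjusted to the real export shape.

```python
import json


def missing_saved_by(export_text: str) -> list[int]:
    """Return the indexes of reviews that lack a non-empty savedBy value.

    Assumes a hypothetical export shape: {"reviews": [{"savedBy": ...}, ...]}.
    """
    data = json.loads(export_text)
    reviews = data.get("reviews", [])
    return [
        index
        for index, review in enumerate(reviews)
        if not str(review.get("savedBy") or "").strip()
    ]


# Hypothetical sample export, for illustration only.
sample = json.dumps(
    {"reviews": [{"prompt": "What time is it?", "savedBy": "trainee@example.com"}]}
)
print(missing_saved_by(sample))
```

An empty result means every saved review in the export carries a `savedBy` value.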

Pass 2 — Targeted suite

Use a 5-prompt behavior family. Example:
I'm frazzled.
I feel behind.
I feel out of the loop.
I have no idea what to do.
I don't trust that I know what's going on.
Goal:
  • identify pattern
  • score honestly
  • write specific notes

Pass 3 — Own Custom run finalization

Run or open an owned Custom session. Review every item and finalize the run.
Goal:
  • understand owned-run access
  • use Review Queue controls correctly
  • understand finalization
  • keep AI-reviewed platform evidence separate from human judgment

Trainer checklist

Before approving a reviewer, confirm they can explain:
  • custom vs automated runs
  • tester vs evaluator access
  • why Team Review and Global Analysis are owner/admin-only
  • when Controlled Batch Runner is evaluator-safe and when it is out of scope
  • pass vs borderline vs fail
  • savedBy vs exportedBy
  • why generic capability redirects can fail
  • what overclaiming means
  • what emotional containment means
  • how to write a useful review note

Graduation standard

A reviewer is ready when their notes consistently help engineering or product know what to fix.

Updated training requirement

Before a reviewer graduates, confirm they understand:
  • Quick Review is guided human judgment
  • they should not invent labels or taxonomies
  • senior review is available when uncertain
  • reusable learning is for durable patterns only
  • semantic sliders are instinctive quality scoring tools
  • notes should be short and specific
They must also understand that the AI-reviewed platform-readiness gate has passed, but that this does not claim human Lucia-quality approval.

Practice exercise

Have the reviewer complete 10 prompts using only:
  1. sliders
  2. Quick Review answers
  3. senior-review flag when needed
  4. one short note maximum per item
If they try to write long explanations for every item, retrain them toward structured judgment capture.
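
A trainer who wants a rough signal for this retraining decision could model the exercise as below. This is a sketch under stated assumptions, not the Eval Labs data model: the `PracticeItem` record, its field names, and the 200-character threshold are all hypothetical, since this SOP says notes should be short but gives no number.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed threshold: the SOP asks for short notes but does not define "short".
MAX_NOTE_CHARS = 200


@dataclass
class PracticeItem:
    """Hypothetical record of one reviewed prompt in the practice exercise."""

    sliders: dict[str, float]       # slider name -> score (names are assumptions)
    quick_review: dict[str, str]    # Quick Review question -> answer
    needs_senior_review: bool = False
    note: Optional[str] = None      # a single field, so at most one note per item


def long_note_ratio(items: list[PracticeItem], max_chars: int = MAX_NOTE_CHARS) -> float:
    """Return the fraction of items whose note is longer than max_chars."""
    if not items:
        return 0.0
    long_notes = sum(1 for item in items if item.note and len(item.note) > max_chars)
    return long_notes / len(items)
```

Keeping `note` as a single optional field enforces the one-note limit by construction. A ratio close to 1.0 across the 10 prompts would suggest the reviewer is writing long explanations for every item.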

Role-specific training

Tester users should train only on:
  • Custom Prompt Test
  • Auto-generated Prompt Test
Evaluator users may train on:
  • Custom Prompt Test
  • Auto-generated Prompt Test
  • Guest Facing Agent Verification Check
  • Verification Results
  • Controlled Batch Runner
  • own/scoped Run History and Review Queue
Only owner/admin users should train on:
  • Team Review
  • Global Analysis
  • Single Run Analysis
  • Registry Diagnostics
  • Behavioral Observatory
  • Supabase and localStorage verification
Do not add owner/admin-only surfaces to tester or evaluator onboarding unless the role model changes intentionally.
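
The role boundaries above can be written down as a lookup table so that a training plan can be checked before a session. This is a minimal sketch: the surface names come from the lists above, while the role keys, the `out_of_scope` helper, and the assumption that owner/admin trainees also cover the evaluator surfaces are illustrative and not confirmed by this SOP.

```python
TESTER_SURFACES = frozenset({"Custom Prompt Test", "Auto-generated Prompt Test"})

EVALUATOR_SURFACES = TESTER_SURFACES | {
    "Guest Facing Agent Verification Check",
    "Verification Results",
    "Controlled Batch Runner",
    # The SOP lists these together as "own/scoped Run History and Review Queue".
    "Run History (own/scoped)",
    "Review Queue (own/scoped)",
}

OWNER_ADMIN_ONLY = frozenset({
    "Team Review",
    "Global Analysis",
    "Single Run Analysis",
    "Registry Diagnostics",
    "Behavioral Observatory",
    "Supabase and localStorage verification",
})

# Assumption: owner/admin training adds to, rather than replaces, evaluator training.
TRAINING_SURFACES = {
    "tester": TESTER_SURFACES,
    "evaluator": EVALUATOR_SURFACES,
    "owner": EVALUATOR_SURFACES | OWNER_ADMIN_ONLY,
    "admin": EVALUATOR_SURFACES | OWNER_ADMIN_ONLY,
}


def out_of_scope(role: str, plan: list[str]) -> list[str]:
    """Return the surfaces in a training plan that the role should not train on."""
    allowed = TRAINING_SURFACES[role]
    return [surface for surface in plan if surface not in allowed]
```

For example, `out_of_scope("tester", ["Custom Prompt Test", "Team Review"])` flags `Team Review`, which matches the rule that owner/admin-only surfaces stay out of tester onboarding.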