This is a simple operator guide to the major Eval Labs surfaces. Use it to avoid mistaking diagnostic pages, review pages, and analysis pages for one another, or any of them for persistence truth.
Status labels
Use these words exactly:
A. Home
Use Home when you need the owner/admin overview. Home shows:
- platform status
- recent activity
- quick access to major surfaces
- product state
- readiness or evidence summaries when available
B. Registry Diagnostics
Use Registry Diagnostics when you need to inspect derived classification behavior.
Route:
- Check how many datasets exist.
- Check how many runs are included.
- Check how many suggested memberships exist.
- Check confidence breakdowns.
- Treat the whole page as diagnostic.
For each dataset:
- Read the dataset name.
- Read the suggested membership count.
- Check confidence.
- Check source fields.
- Ask whether the evidence is real or thin.
For each suggested lane:
- Read the suggested lane.
- Check which items triggered it.
- Check confidence.
- Look for broad or weak matches.
Across the page:
- Look for overmatching.
- Look for weak low-confidence matches.
- Look for items suggested for too many datasets.
- Write down issues for product or engineering.
- Do not assume dataset membership is final.
- Do not assume queue routing is final.
- Do not assume a human saved a label.
- Do not use this page as Behavioral Observatory.
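If suggested memberships are ever exported for offline review, the weak-match and overmatching checks above could be scripted. This is a hypothetical sketch: the SuggestedMembership record, its fields, and both thresholds are assumptions, not a Registry Diagnostics export format. Its output is still diagnostic and does not make any membership final.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SuggestedMembership:
    # Hypothetical record: one derived (not human-saved) membership suggestion.
    item_id: str
    dataset: str
    confidence: float  # assumed to range from 0.0 to 1.0


# Assumed thresholds for illustration only.
LOW_CONFIDENCE = 0.5
MAX_DATASETS_PER_ITEM = 3


def find_issues(suggestions: list[SuggestedMembership]) -> list[str]:
    """Collect notes on weak and overmatched suggestions for product or engineering."""
    issues = []
    datasets_by_item = defaultdict(set)
    for s in suggestions:
        datasets_by_item[s.item_id].add(s.dataset)
        # Weak low-confidence match.
        if s.confidence < LOW_CONFIDENCE:
            issues.append(f"weak match: {s.item_id} -> {s.dataset} ({s.confidence:.2f})")
    # Items suggested for too many datasets (overmatching).
    for item_id, datasets in sorted(datasets_by_item.items()):
        if len(datasets) > MAX_DATASETS_PER_ITEM:
            issues.append(f"overmatched: {item_id} suggested for {len(datasets)} datasets")
    return issues


sample = [
    SuggestedMembership("item-1", "dataset-a", 0.92),
    SuggestedMembership("item-1", "dataset-b", 0.41),
]
for note in find_issues(sample):
    print(note)  # prints: weak match: item-1 -> dataset-b (0.41)
```

The thresholds are deliberately kept as named constants so that the numbers behind a flagged issue are visible when you write it down for product or engineering.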
C. Behavioral Observatory
Use Behavioral Observatory when you need to review conversations and save behavioral labels.
Route:
Before you start:
- Confirm your role and assignment allow access.
- Confirm the data shown is from the scoped run context you intend to review.
- Notice whether the selected conversation is showing a derived suggestion or a saved label.
To label a conversation:
- Open the labeling queue.
- Pick one conversation.
- Read the Human message first.
- Read Lucia’s response second.
- Choose what the human was trying to do.
- Use Other only when the listed categories do not fit.
- Choose the smallest truthful emotional read.
- Do not dramatize the guest’s state.
- Choose Lucia’s main response move.
- Pick the dominant strategy, not every strategy present.
- Add a note only when it preserves useful behavioral evidence.
- Keep it short.
- Name the behavior and why it matters.
To save and verify the label:
- Click Save label or Save updates.
- Wait for the saved state.
- If the state is error, do not count the label as persisted.
- Refresh the page.
- Confirm the label reloads.
- If it does not reload, treat the label as not verified.
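The save-and-verify steps above reduce to a small decision: a label counts only if it saved and then reloaded after a refresh. This is a minimal sketch. SaveState, LabelStatus, and their members are hypothetical names chosen for illustration. They are not the product's status labels.

```python
from enum import Enum
from typing import Optional


class SaveState(Enum):
    # Hypothetical names for what the save control can show.
    SAVING = "saving"
    SAVED = "saved"
    ERROR = "error"


class LabelStatus(Enum):
    # Illustrative outcomes, not the product's official status labels.
    NOT_PERSISTED = "not persisted"  # save reported an error
    WAITING = "waiting"              # saved state not shown yet
    NEEDS_REFRESH = "needs refresh"  # saved, but not yet checked after a refresh
    NOT_VERIFIED = "not verified"    # saved, but did not reload after a refresh
    VERIFIED = "verified"            # saved and reloaded after a refresh


def label_status(save_state: SaveState, reloaded_after_refresh: Optional[bool] = None) -> LabelStatus:
    """Apply the save-and-verify steps to one label."""
    if save_state is SaveState.ERROR:
        return LabelStatus.NOT_PERSISTED
    if save_state is not SaveState.SAVED:
        return LabelStatus.WAITING
    if reloaded_after_refresh is None:
        return LabelStatus.NEEDS_REFRESH
    return LabelStatus.VERIFIED if reloaded_after_refresh else LabelStatus.NOT_VERIFIED


def counts_as_saved(status: LabelStatus) -> bool:
    # Only a label that reloaded after a refresh should be counted.
    return status is LabelStatus.VERIFIED
```

An error result is checked first, so a label that showed an error never counts, even if something appears after a refresh.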
D. Guest Facing Agent Verification
Use Guest Facing Agent Verification when you need to run or inspect booked-guest verification behavior.
Routes:
Use it for:
- running the scenario pack from the app surface
- inspecting pass/fail results
- reviewing failure details
- exporting or copying verification summaries
Do not use it as:
- a tester lane
- Team Review
- Global Analysis
- proof that Lucia is human-approved
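Exporting or copying a verification summary could look roughly like the sketch below. ScenarioResult and verification_summary are hypothetical names, and the app's actual result fields and export format may differ. The summary reports scenario outcomes only, so it is still not proof that Lucia is human-approved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScenarioResult:
    # Hypothetical shape for one scenario in the verification pack.
    scenario: str
    passed: bool
    failure_detail: Optional[str] = None


def verification_summary(results: list[ScenarioResult]) -> str:
    """Build a plain-text pass/fail summary suitable for copying into a note."""
    failed = [r for r in results if not r.passed]
    passed_count = len(results) - len(failed)
    lines = [f"Guest Facing Agent Verification: {passed_count}/{len(results)} scenarios passed"]
    # List each failure with its detail so reviewers can find it again.
    for r in failed:
        lines.append(f"- FAIL {r.scenario}: {r.failure_detail or 'no detail recorded'}")
    return "\n".join(lines)
```

Keeping each failure detail next to the pass count means a copied summary still points reviewers to the specific failures.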
E. Team Review
Use Team Review when an owner/admin needs oversight of evaluator activity and review quality.
Route:
Use it for:
- inspecting evaluator activity
- finding missing checks
- spotting flags and failures
- reviewing recent human-evaluation signal
- deciding what needs owner/admin attention
Do not use it as:
- evaluator productivity tracking for its own sake
- tester onboarding
- a human approval page
F. Global Analysis
Use Global Analysis when you need read-only behavioral and analytics evidence.
Route:
Use it for:
- inspecting completed run evidence
- reading behavioral summaries
- comparing patterns
- opening Single Run Analysis when available
Do not use it as:
- a human approval page
- a Behavioral Observatory label-save workflow
- Registry Diagnostics
G. Run History
Use Run History when you need the run ledger.
Route:
Use it for:
- finding completed/finalized runs
- checking run lifecycle truth
- copying run/session identifiers
- opening review or analysis routes when allowed
Do not use it as:
- proof that a response was good
- proof that Behavioral Observatory labels exist
- proof that Lucia is human-approved

