Review is where Eval Labs becomes useful. The reviewer’s job is to judge behavior honestly, not politely. AI-reviewed platform evidence does not replace this human judgment.
Review order
Use this order:
- intent
- truth
- usefulness
- clarity
- tone
- next move
- trust aftertaste
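The order above can be kept as a fixed sequence so that findings are always reported in the same order they were reviewed. The following is a minimal sketch, not part of Eval Labs: the names `REVIEW_ORDER` and `failed_steps` are hypothetical, and the pass/fail representation is an assumption made for illustration.

```python
from typing import Dict, List

# Review steps in the order listed above. The identifiers are illustrative,
# not names taken from Eval Labs.
REVIEW_ORDER = (
    "intent",
    "truth",
    "usefulness",
    "clarity",
    "tone",
    "next move",
    "trust aftertaste",
)


def failed_steps(judgments: Dict[str, bool]) -> List[str]:
    """Return the steps judged as failing, sorted in review order.

    `judgments` maps a step name to True (passed) or False (failed).
    Steps missing from `judgments` are treated as not yet reviewed
    and are skipped.
    """
    unknown = set(judgments) - set(REVIEW_ORDER)
    if unknown:
        raise ValueError(f"unknown review steps: {sorted(unknown)}")
    return [step for step in REVIEW_ORDER if judgments.get(step) is False]


example = {"tone": False, "intent": True, "truth": False}
print(failed_steps(example))  # ['truth', 'tone']
```

Reporting failures in review order keeps the earlier, more serious checks (intent and truth) at the top of any summary.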
1. Intent
Did Lucia understand what the user was asking? If intent is wrong, the response usually fails. For example, if the user says:
2. Truth
Did Lucia claim anything she could not know or verify? Truth failures are serious. Examples:
- saying a vendor was contacted when no dispatch happened
- saying an issue is resolved when only a suggestion was made
- implying full confidence when the signal is inferred
3. Usefulness
Did the response help the user move forward? A response can be warm and still useless.
4. Clarity
Was the response easy to understand without extra work? Lucia should not make the operator scan five paragraphs to find the first move.
5. Tone
Was the tone appropriate for the moment? For Lucia, tone should be:
6. Next move
Did Lucia give the right next move when a next move was needed? Not every prompt requires a task. But distress and ops prompts usually require narrowing.
7. Trust aftertaste
After reading the response, ask:
Saving reviews
Use:
- Save & Next for non-final prompts
- Save for the final prompt
- Finalize Run when the run review is complete
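The three save actions can be read as a small decision rule. The sketch below is one interpretation, assuming prompts are numbered from 1 and that the reviewer knows when the run review is complete. The function name `save_action` and its parameters are hypothetical and do not describe how Eval Labs is implemented.

```python
def save_action(prompt_number: int, total_prompts: int,
                run_review_complete: bool = False) -> str:
    """Pick the save action for the current position in a run.

    Assumes prompts are numbered 1..total_prompts (an illustrative choice).
    """
    if total_prompts < 1 or not 1 <= prompt_number <= total_prompts:
        raise ValueError("prompt_number must be between 1 and total_prompts")
    if run_review_complete:
        # Every prompt has been reviewed and saved.
        return "Finalize Run"
    if prompt_number == total_prompts:
        # Final prompt: there is no next item to move to.
        return "Save"
    return "Save & Next"


print(save_action(2, 5))                            # Save & Next
print(save_action(5, 5))                            # Save
print(save_action(5, 5, run_review_complete=True))  # Finalize Run
```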
Reviewer discipline
AI-reviewed readiness runs can prove the platform captured and persisted reviews. They cannot prove the human reviewer agrees with the score or that Lucia is ready for real operator use.
Review Queue vs Behavioral Observatory
Review Queue and Behavioral Observatory are related, but they are not the same workflow. Review Queue is where the reviewer scores and reviews the prompt/response item. Behavioral Observatory is where a reviewer can save structured behavioral labels for a conversation:
Updated Review Queue flow
Use this practical flow:
- read the prompt
- read Lucia’s response
- review any suggested selections
- score the five dimensions with the semantic confidence sliders
- answer Quick Review questions
- add Human Guidance Evaluation scores when useful
- add a short note only if needed
- flag senior review when uncertain or concerned
- mark reusable learning only when the case teaches a durable lesson
- save and move on
Quick Review rule
Quick Review is not a test of the reviewer’s AI knowledge. It is a structured way to capture whether Lucia worked for a human. If you are unsure, use the senior review option instead of inventing your own taxonomy.
Escalation rule
Escalate when:
- Lucia may have overclaimed
- the response creates risk or confusion
- intent is unclear
- the case involves owner stress, money, maintenance, guest trust, or safety
- the response contains a reusable pattern
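The escalation conditions above can be written as a checklist that returns every reason that applies, so the reviewer can record why a case was escalated. This is a minimal sketch under stated assumptions: `ReviewSignals`, `SENSITIVE_TOPICS`, and `escalation_reasons` are hypothetical names, and the topic labels are free-text tags chosen for illustration rather than fields from Eval Labs.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Topics named in the escalation rule. The string labels are illustrative.
SENSITIVE_TOPICS = frozenset(
    {"owner stress", "money", "maintenance", "guest trust", "safety"}
)


@dataclass
class ReviewSignals:
    """What the reviewer observed about one prompt/response item (hypothetical)."""
    may_have_overclaimed: bool = False
    creates_risk_or_confusion: bool = False
    intent_unclear: bool = False
    reusable_pattern: bool = False
    topics: Set[str] = field(default_factory=set)


def escalation_reasons(signals: ReviewSignals) -> List[str]:
    """Return every escalation condition that applies; empty means no escalation."""
    reasons = []
    if signals.may_have_overclaimed:
        reasons.append("Lucia may have overclaimed")
    if signals.creates_risk_or_confusion:
        reasons.append("response creates risk or confusion")
    if signals.intent_unclear:
        reasons.append("intent is unclear")
    sensitive = sorted(signals.topics & SENSITIVE_TOPICS)
    if sensitive:
        reasons.append("sensitive topic: " + ", ".join(sensitive))
    if signals.reusable_pattern:
        reasons.append("response contains a reusable pattern")
    return reasons


signals = ReviewSignals(intent_unclear=True, topics={"money", "pricing"})
print(escalation_reasons(signals))
# ['intent is unclear', 'sensitive topic: money']
```

Returning the list of reasons, instead of a single yes/no flag, keeps the escalation note specific without asking the reviewer to invent new categories.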

