Review Workflow - HelloLucia

Review is where Eval Labs becomes useful. The reviewer’s job is to judge behavior honestly, not politely. AI-reviewed platform evidence does not replace this human judgment.

Review order

Use this order:

intent
truth
usefulness
clarity
tone
next move
trust aftertaste

1. Intent

Did Lucia understand what the user was asking? If intent is wrong, the response usually fails. For example, if the user says:

I feel totally out of the loop.

Lucia should not respond with a generic capability menu. That is likely an intent-layer miss.

2. Truth

Did Lucia claim anything she could not know or verify? Truth failures are serious. Examples:

saying a vendor was contacted when no dispatch happened
saying an issue is resolved when only a suggestion was made
implying full confidence when the signal is inferred

3. Usefulness

Did the response help the user move forward? A response can be warm and still useless.

4. Clarity

Was the response easy to understand without extra work? Lucia should not make the operator scan five paragraphs to find the first move.

5. Tone

Was the tone appropriate for the moment? For Lucia, tone should be:

warm
calm
specific
not robotic
not therapy-bot

6. Next move

Did Lucia give the right next move when a next move was needed? Not every prompt requires a task. But distress and ops prompts usually require narrowing.

7. Trust aftertaste

After reading the response, ask:

Do I trust Lucia more, less, or the same?

If the answer is less, write down why.
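The seven checks above can be sketched as an ordered record, which makes it easy to see the earliest layer that failed. This is a minimal illustration, not the Eval Labs data model: the names `REVIEW_ORDER`, `TrustShift`, and `PromptReview` are hypothetical, and the pass/fail verdict per check is an assumption made for the sketch.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional

# The seven checks, in the order the reviewer applies them.
REVIEW_ORDER = (
    "intent",
    "truth",
    "usefulness",
    "clarity",
    "tone",
    "next move",
    "trust aftertaste",
)


class TrustShift(Enum):
    """Answer to: do I trust Lucia more, less, or the same?"""
    MORE = "more"
    SAME = "same"
    LESS = "less"


@dataclass
class PromptReview:
    # Hypothetical record: check name -> True (pass) or False (fail).
    verdicts: Dict[str, bool] = field(default_factory=dict)
    trust_shift: TrustShift = TrustShift.SAME
    note: str = ""

    def first_failure(self) -> Optional[str]:
        """Return the earliest failed check in review order, or None."""
        for check in REVIEW_ORDER:
            if self.verdicts.get(check) is False:
                return check
        return None

    def missing_note(self) -> bool:
        """True when trust dropped but no reason has been written down."""
        return self.trust_shift is TrustShift.LESS and not self.note.strip()
```

Reporting the first failure in order reflects the guidance that a wrong intent usually sinks the response before later checks matter.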

Saving reviews

Use:

Save & Next for non-final prompts
Save for the final prompt
Finalize Run when the run review is complete

Finalization only marks the run's lifecycle status. It does not replace per-prompt review data.
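The choice between the three actions can be sketched as follows. The function names are hypothetical, and treating "the run review is complete" as "every prompt has a saved review" is an assumption made for this sketch, not a documented platform rule.

```python
from typing import Sequence


def save_action(prompt_index: int, total_prompts: int) -> str:
    """Return the button to use for the prompt at a zero-based index."""
    if total_prompts <= 0 or not 0 <= prompt_index < total_prompts:
        raise ValueError("prompt_index must fall inside the run")
    is_final = prompt_index == total_prompts - 1
    return "Save" if is_final else "Save & Next"


def can_finalize(saved: Sequence[bool]) -> bool:
    """Assumed completeness check: every prompt has a saved review."""
    return len(saved) > 0 and all(saved)
```

For a three-prompt run, `save_action(0, 3)` yields `"Save & Next"` and `save_action(2, 3)` yields `"Save"`. Finalize Run would then follow once `can_finalize` holds.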

Reviewer discipline

Do not pass a response just because it sounds smart. Pass it only if it works.

AI-reviewed readiness runs can prove the platform captured and persisted reviews. They cannot prove the human reviewer agrees with the score or that Lucia is ready for real operator use.

Review Queue vs Behavioral Observatory

Review Queue and Behavioral Observatory are related, but they are not the same workflow. Review Queue is where the reviewer scores and reviews the prompt/response item. Behavioral Observatory is where a reviewer can save structured behavioral labels for a conversation:

Intent
Guest Affect
Response Strategy
Humanness
Notes

Registry Diagnostics is separate again. It shows derived dataset and queue-lane suggestions, not saved human labels.
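One way to keep the distinction concrete is to model saved human labels and derived suggestions as separate types, so one can never be mistaken for the other. The sketch below is illustrative only: the class and field names are hypothetical, and the label values are left as free-form strings because the platform's actual vocabularies are not given here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SavedBehavioralLabel:
    """Label a human reviewer saved in Behavioral Observatory."""
    conversation_id: str
    intent: str
    guest_affect: str
    response_strategy: str
    humanness: str
    notes: str = ""


@dataclass(frozen=True)
class DerivedSuggestion:
    """Derived dataset or queue-lane suggestion from Registry Diagnostics."""
    conversation_id: str
    dataset: Optional[str] = None
    queue_lane: Optional[str] = None


def human_labels(records: List[object]) -> List[SavedBehavioralLabel]:
    """Keep only records a human saved; derived suggestions are dropped."""
    return [r for r in records if isinstance(r, SavedBehavioralLabel)]
```

Keeping the two types apart means a derived suggestion has to be read and re-entered by a reviewer before it can count as a saved label.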

Updated Review Queue flow

Use this practical flow:

read the prompt
read Lucia’s response
review any suggested selections
score the five dimensions with the semantic confidence sliders
answer Quick Review questions
add Human Guidance Evaluation scores when useful
add a short note only if needed
flag senior review when uncertain or concerned
mark reusable learning only when the case teaches a durable lesson
save and move on

If the assignment includes Behavioral Observatory, use the saved-label workflow after reading the conversation carefully. Do not copy derived suggestions blindly.

Quick Review rule

Quick Review is not a test of the reviewer’s AI knowledge. It is a structured way to capture whether Lucia worked for a human. If you are unsure, use the senior review option instead of inventing your own taxonomy.

Escalation rule

Escalate when:

Lucia may have overclaimed
the response creates risk or confusion
intent is unclear
the case involves owner stress, money, maintenance, guest trust, or safety
the response contains a reusable pattern
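The escalation conditions above can be written as a simple predicate that also records why a case was escalated. This is a sketch under stated assumptions: `CaseSignals`, its field names, and the reason strings are hypothetical, and each signal is assumed to be a judgment the reviewer has already made.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Topics the escalation rule names explicitly.
SENSITIVE_TOPICS = frozenset(
    {"owner stress", "money", "maintenance", "guest trust", "safety"}
)


@dataclass
class CaseSignals:
    # Hypothetical reviewer judgments for one prompt/response item.
    may_have_overclaimed: bool = False
    creates_risk_or_confusion: bool = False
    intent_unclear: bool = False
    topics: FrozenSet[str] = frozenset()
    reusable_pattern: bool = False


def escalation_reasons(case: CaseSignals) -> List[str]:
    """List every escalation condition the case meets, in rule order."""
    reasons = []
    if case.may_have_overclaimed:
        reasons.append("possible overclaim")
    if case.creates_risk_or_confusion:
        reasons.append("risk or confusion")
    if case.intent_unclear:
        reasons.append("unclear intent")
    sensitive = sorted(case.topics & SENSITIVE_TOPICS)
    if sensitive:
        reasons.append("sensitive topic: " + ", ".join(sensitive))
    if case.reusable_pattern:
        reasons.append("reusable pattern")
    return reasons


def should_escalate(case: CaseSignals) -> bool:
    """Any single condition is enough to escalate."""
    return bool(escalation_reasons(case))
```

Returning the reasons rather than a bare boolean gives the senior reviewer the context for the flag.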
