This page explains the safe first-run workflow for approved reviewers. It does not replace the employee onboarding gate.

Before you begin

Make sure you know which testing path you are using:
  • Custom Prompt Test = targeted refinement
  • Auto-generated 50-Prompt Test = broad regression coverage
  • Guest Facing Agent Verification Check = booked-guest verification behavior
  • Controlled Batch Runner = controlled platform-readiness tooling
If you are a tester, use only Custom Prompt Test or Auto-generated 50-Prompt Test. If you are an evaluator, use only the evaluator-safe surfaces assigned for the work. Do not use Team Review, Global Analysis, Single Run Analysis, Registry Diagnostics, Behavioral Observatory, or owner/admin tools unless your role explicitly allows it.

First custom smoke test

Use this prompt:
What time is it?
Expected result:
  • Lucia responds with the current time
  • the run completes
  • the Review Queue opens
  • no transport failure occurs
  • the export contains runSource: custom
For evaluator and tester users, the run must be scoped to the signed-in user before review/finalization access is considered valid.
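The export check in the expected results above can be scripted. The following is a minimal sketch, assuming the export is a JSON document with a top-level runSource field; the export format, the function name, and the sample payload are assumptions for illustration, not the documented export schema.

```python
import json


def is_custom_run(export_text: str) -> bool:
    """Return True if the export's top-level runSource equals "custom".

    Assumes the export is JSON with runSource at the top level
    (hypothetical layout; adjust to the real export structure).
    """
    data = json.loads(export_text)
    return isinstance(data, dict) and data.get("runSource") == "custom"


# Hypothetical sample export for the smoke test prompt.
sample = '{"runSource": "custom", "prompts": ["What time is it?"]}'
print(is_custom_run(sample))
```

If the real export nests runSource under another key, only the lookup inside is_custom_run needs to change.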

First real review test

Choose a small behavior family. Example:
I'm overwhelmed.
I feel behind.
I am so lost.
I feel totally out of the loop.
I don't trust that I know what's going on.
Run the suite. Then review each response.

What to do in the Review Queue

For each item:
  1. Read the prompt.
  2. Read Lucia’s response.
  3. Review any suggested selections.
  4. Score each dimension honestly.
  5. Choose Keep talking, Verdict, and Priority.
  6. Answer the Quick Review questions.
  7. Add Human Guidance Evaluation scores when useful.
  8. Write notes when something feels off.
  9. Save the review.
The last item should show Save, not Save & Next. After the last item is saved, use the completion actions:
  • Finalize Run
  • Back to Launcher

Export after reviewing

Export after review when you need to share evidence with product or engineering. Do not export only the generated responses if the goal is human review analysis. Generated-only exports are useful for debugging, but reviewed exports are stronger evidence. Reviewed exports preserve the structured review, suggested review, Employee Review, Human Guidance Evaluation, adjudication metadata, lifecycle state, tester identity, and dirty/completion state.

Finalize Evaluation

Screenshot: Finalize Run and Back to Launcher action buttons.
Screenshot: final item in the Review Queue, showing Save instead of Save & Next.
Finalize only after every prompt has been reviewed. Finalization marks the run lifecycle; it does not replace the per-prompt review data.
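The rule that every prompt must be reviewed before finalizing can be expressed as a simple pre-check. This is a sketch only: the item structure (a list of dicts with prompt and reviewed keys) and both function names are hypothetical and do not describe how the platform stores review state.

```python
def unreviewed_prompts(items):
    """Return the prompts whose review has not been saved yet.

    Each item is assumed to be a dict with a "prompt" string and a
    "reviewed" boolean (hypothetical shape, for illustration only).
    """
    return [item["prompt"] for item in items if not item.get("reviewed", False)]


def can_finalize(items):
    """A run is ready to finalize only if it has items and all are reviewed."""
    return len(items) > 0 and not unreviewed_prompts(items)


# Hypothetical run with one prompt still waiting for review.
run = [
    {"prompt": "I'm overwhelmed.", "reviewed": True},
    {"prompt": "I feel behind.", "reviewed": False},
]
print(unreviewed_prompts(run))
print(can_finalize(run))
```

Listing the unreviewed prompts, rather than returning only a yes/no answer, makes it easier to see which items still need a saved review.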

Not part of first tester workflow

Tester users should not use:
  • Guest Facing Agent Verification Check
  • Controlled Batch Runner
  • Run History/global analytics
  • Team Review
  • Global Analysis
  • Single Run Analysis
  • owner/admin Home dashboard
Evaluator users should use verification, controlled batch, and scoped Run History only when assigned.
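The role rules on this page can be summarized as a small lookup. The sketch below is illustrative only and assumes two roles and a per-evaluator set of assigned surfaces; the function, the role strings, and the assignment mechanism are hypothetical, and other roles are treated as out of scope and return False.

```python
# Surfaces a tester may use, as named on this page.
TESTER_SURFACES = {"Custom Prompt Test", "Auto-generated 50-Prompt Test"}


def may_use(role, surface, assigned=frozenset()):
    """Return True if the role may use the surface under the rules above.

    role: "tester" or "evaluator" (hypothetical identifiers).
    assigned: evaluator-safe surfaces assigned to this evaluator for the work.
    """
    if role == "tester":
        return surface in TESTER_SURFACES
    if role == "evaluator":
        # Evaluators use only what has been assigned to them.
        return surface in assigned
    return False


print(may_use("tester", "Custom Prompt Test"))
print(may_use("tester", "Team Review"))
print(may_use("evaluator", "Controlled Batch Runner",
              assigned={"Controlled Batch Runner"}))
```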