Running Your First Custom Eval

Custom Prompt Test is the safest path for your first evaluator smoke test. Start small, keep the prompt set scoped to your assignment, and review the run you created.

Before you run

Confirm:

your owner/admin assigned the work
you are using the Custom eval surface
your prompts are in scope
you understand what behavior you are testing

Use /lucia/custom. Do not start from Team Review, Global Analysis, Registry Diagnostics, Behavioral Observatory, or owner/admin tools. Use the auto-generated testing, verification, and controlled batch surfaces only when your assignment explicitly calls for them.

First smoke test

For a basic platform check, use one simple prompt:

What time is it?

Expected flow:

Open /lucia/custom.
Enter the prompt.
Run the Custom eval.
Wait for Lucia’s response.
Continue into the Review Queue.
Review the item.
Save the review.
Finalize the run only when review is complete.

First real assignment

For assigned work, use the prompt set provided by an owner/admin. Keep prompts in the assigned behavior family. Do not add unrelated prompts mid-run. After the run completes, review every item before finalizing.

If something looks wrong

Pause and ask an owner/admin if:

you land on a blocked surface
a run is not yours
the Review Queue does not open
the response is missing
the UI asks you to use an unfamiliar tool
you are unsure whether to finalize
