Evaluators create trustworthy review signal by staying inside scope, avoiding overclaiming, and escalating uncertainty.
Do not use restricted surfaces
Do not use owner/admin areas unless an owner/admin explicitly allows it later.
This includes Team Review, Global Analysis, Single Run Analysis, Registry Diagnostics, Behavioral Observatory, all-user analytics, cleanup/tools, and owner/admin evidence surfaces.
Use Auto-generated testing, verification, Controlled Batch Runner, and Run History only inside your assigned evaluator scope. Tester users should not use verification or Controlled Batch Runner at all.
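The access rules above can be sketched as a small permission check. This is only an illustration, not how the platform enforces access: the names Role, may_use, in_assigned_scope, and explicitly_allowed are hypothetical, and the sketch assumes Tester users may still use Auto-generated testing and Run History inside their scope, which the rules above do not state directly.

```python
from enum import Enum


class Role(Enum):
    # Hypothetical role names, used only for this illustration.
    EVALUATOR = "evaluator"
    TESTER = "tester"


# Owner/admin surfaces listed in the rules above.
RESTRICTED_SURFACES = frozenset({
    "Team Review",
    "Global Analysis",
    "Single Run Analysis",
    "Registry Diagnostics",
    "Behavioral Observatory",
    "all-user analytics",
    "cleanup/tools",
    "owner/admin evidence surfaces",
})

# Surfaces that may be used only inside the assigned scope.
SCOPED_SURFACES = frozenset({
    "Auto-generated testing",
    "verification",
    "Controlled Batch Runner",
    "Run History",
})

# Surfaces Tester users should not use at all.
TESTER_BLOCKED = frozenset({"verification", "Controlled Batch Runner"})


def may_use(role, surface, in_assigned_scope, explicitly_allowed=False):
    """Return True only when the rules above clearly permit the surface."""
    if surface in RESTRICTED_SURFACES:
        # Owner/admin areas need an explicit grant.
        return explicitly_allowed
    if surface in SCOPED_SURFACES:
        if role is Role.TESTER and surface in TESTER_BLOCKED:
            return False
        return in_assigned_scope
    # Unknown surface: treat as not allowed and ask an owner/admin.
    return False
```

In this sketch an unrecognized surface is denied rather than allowed, which mirrors the guidance to pause and ask when something is unclear.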
Do not overclaim readiness
Do not say Lucia is human-approved just because the AI-reviewed platform readiness gate passed.
Correct:
Eval Labs passed the AI-reviewed platform readiness gate. Human Lucia-quality approval remains separate.
Do not pass polish
Do not pass a response only because it sounds warm, confident, or well written.
Pass it only if it actually worked for the human situation it was responding to.
Do not change the assignment mid-run
Do not add unrelated prompts after seeing Lucia’s responses.
Do not rewrite the test to make the result look better.
Do not review runs that are not yours unless an owner/admin explicitly assigned them.
Do not invent process
Do not create your own scoring taxonomy, hidden labels, or private review rules.
Use the visible review controls. Ask when uncertain.
Do not ignore uncertainty
If something feels risky, unclear, or out of scope, pause and ask an owner/admin.