The auto-generated 50-prompt test is for broad coverage and regression detection. It is separate from the Controlled Batch Runner.

What it is

The auto-generated launcher generates a full 50-prompt session through Lucia and sends the responses into the same Review Queue used by Custom runs.
Canonical route:
/lucia/auto-generated
Legacy alias:
/lucia/automated
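
As an illustration of how a legacy alias can resolve to the canonical route, here is a minimal sketch. Only the two route strings come from this page; the function name, the mapping, and the trailing-slash handling are hypothetical and do not describe how Lucia actually routes requests.

```python
# Hypothetical sketch: only the two route strings are taken from this page.
CANONICAL_ROUTE = "/lucia/auto-generated"
LEGACY_ALIASES = {"/lucia/automated": CANONICAL_ROUTE}


def resolve_route(path: str) -> str:
    """Return the canonical route for a path, following a legacy alias if one matches."""
    # Assumption: a trailing slash is ignored, so "/lucia/automated/" resolves too.
    normalized = path.rstrip("/") or "/"
    # Paths that are not known aliases are returned unchanged (after normalization).
    return LEGACY_ALIASES.get(normalized, normalized)
```

With this sketch, both `/lucia/automated` and `/lucia/auto-generated` resolve to the canonical route, so links can use the legacy alias without pointing to a separate surface.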

When to use it

Use the auto-generated 50-prompt battery when:
  • a broader Engine change has landed
  • a model or prompt change may affect multiple behavior families
  • an owner/admin validation pass is needed
  • an evaluator assignment needs broader generated coverage
  • a tester cohort needs prompt-testing signal beyond a custom suite
  • you want broad confidence after targeted custom-suite refinement
This surface is available to owner/admin, evaluator, and tester roles in the current access model.

When not to use it

Do not use the 50-prompt battery when trying to isolate one behavior bug. Use a custom suite first. The auto-generated battery is a net, not a scalpel.
Do not use it as the controlled platform-readiness gate. Use the Controlled Batch Runner Protocol for that.

Review strategy

When reviewing a 50-prompt run:
  1. Review obvious failures first.
  2. Look for repeated patterns.
  3. Do not over-focus on one strange answer.
  4. Export after completing enough reviews to support a product decision.
  5. Create a custom suite from repeated failure patterns.
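
The "look for repeated patterns" step can be sketched as a simple tally over reviewed responses. This is a rough sketch under stated assumptions: the `Review` record, its fields, the pattern labels, and the `min_count` threshold are all hypothetical and are not the Review Queue's actual export format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Review:
    """Hypothetical review record; not the actual Review Queue schema."""
    prompt_id: int
    passed: bool
    pattern: str = ""  # reviewer-assigned failure label; empty when the response passed


def repeated_failure_patterns(reviews, min_count=2):
    """Return (pattern, count) pairs for failure labels seen at least min_count times."""
    # Count only failed reviews that carry a label.
    counts = Counter(r.pattern for r in reviews if not r.passed and r.pattern)
    # most_common() orders by count, highest first; one-off failures are filtered out.
    return [(p, n) for p, n in counts.most_common() if n >= min_count]
```

Labels that clear the threshold are candidates for a targeted custom suite. A label that appears once is the "one strange answer" that should not drive the decision on its own.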

Relationship to Custom suites

A strong workflow often looks like this:
  1. Auto-generated 50-prompt run finds pattern
  2. Reviewer creates targeted custom suite
  3. Engineer patches Lucia
  4. Custom suite confirms fix
  5. Auto-generated run checks neighboring regressions

Relationship to controlled batches

The auto-generated tester and Controlled Batch Runner both exercise 50-prompt run infrastructure, but they serve different product jobs.
Auto-generated tester = normal broad regression surface.
Controlled Batch Runner = controlled platform-readiness protocol.
AI-reviewed controlled batch results prove the platform lifecycle. Human review still owns Lucia quality judgment.

Auto-generated 50-prompt suite in Eval Labs

[Image: automated-50-prompt-launcher] Use this for broader regression coverage.
[Image: automated-50-prompt-launcher-with-saved-runs] Custom prompt suite loaded and ready for analysis in Evaluation Labs.