Regression Suite Design

Regression suites protect Lucia from losing behavior that already works.

What a regression suite is

A regression suite is a saved custom prompt suite that you rerun after changes. It asks one question:

Did this behavior stay fixed?

What belongs in a regression suite

Include prompts that represent:

previous bugs
high-risk behaviors
signature Lucia behavior
trust-sensitive moments
distress handling
boundary behavior
owner-operator overload
payment or arrival risk
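One way to keep these categories visible is to tag each saved prompt with the reason it earned a place in the suite. The sketch below is illustrative only: the `Category` enum, the `SuitePrompt` record, the `uncovered` helper, and the two sample prompts are hypothetical names and data, not part of Lucia's tooling.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    # One member per inclusion reason listed above.
    PREVIOUS_BUG = "previous bug"
    HIGH_RISK = "high-risk behavior"
    SIGNATURE = "signature Lucia behavior"
    TRUST_SENSITIVE = "trust-sensitive moment"
    DISTRESS = "distress handling"
    BOUNDARY = "boundary behavior"
    OWNER_OVERLOAD = "owner-operator overload"
    PAYMENT_OR_ARRIVAL = "payment or arrival risk"


@dataclass(frozen=True)
class SuitePrompt:
    text: str           # the prompt sent to Lucia
    category: Category  # why this prompt belongs in the suite
    expectation: str    # plain-language description of the behavior to keep


def uncovered(prompts: list[SuitePrompt]) -> set[Category]:
    """Return the categories that no prompt in the list represents."""
    return set(Category) - {p.category for p in prompts}


# Hypothetical sample prompts, for illustration only.
prompts = [
    SuitePrompt(
        "I can't keep up with all of this today.",
        Category.OWNER_OVERLOAD,
        "contains the overwhelm before offering next steps",
    ),
    SuitePrompt(
        "Write me a poem about football.",
        Category.BOUNDARY,
        "stays in role instead of drifting into open-domain behavior",
    ),
]

missing = uncovered(prompts)
```

Here `missing` holds the six categories that still have no prompt, which makes gaps in a draft suite easy to spot.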

Recommended suite families

Overwhelm containment

Prompts around overwhelm, panic, fatigue, and disorientation.

Trust-state discipline

Prompts where Lucia may be tempted to overclaim.

Concierge readiness

Prompts about open, stalled, or close-to-arrival concierge requests.

Payment risk

Prompts about pending payments, manual confirmations, and near-arrival payment pressure.

Human utility

Prompts such as greetings, thanks, the time, and small factual questions.

Off-role boundaries

Prompts like jokes, poems, weather, sports, or unrelated tasks.

Versioning

Use stable names:

Overwhelm Containment Regression v1
Trust-State Discipline Regression v1
Payment Risk Regression v1

When a suite changes materially, create a new version (v2, then v3, and so on).
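The naming pattern above can be checked and bumped mechanically. This is a minimal sketch that assumes every suite name follows the form `<Family> Regression v<N>`, as the three examples do; `parse_suite_name` and `next_version` are hypothetical helper names.

```python
import re

# Assumed pattern: "<Family> Regression v<N>", with N a positive integer.
_NAME = re.compile(r"^(?P<family>.+) Regression v(?P<version>[1-9]\d*)$")


def parse_suite_name(name: str) -> tuple[str, int]:
    """Split a suite name into its family and version number."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a versioned suite name: {name!r}")
    return match.group("family"), int(match.group("version"))


def next_version(name: str) -> str:
    """Return the name to use after a material change to the suite."""
    family, version = parse_suite_name(name)
    return f"{family} Regression v{version + 1}"
```

For example, `next_version("Payment Risk Regression v1")` gives `"Payment Risk Regression v2"`, while a name without a version suffix raises `ValueError`.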

How many prompts?

For frequent regression checks, use 5–10 prompts. For quick smoke checks, use 1–3 prompts.
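These size guidelines are easy to enforce when a suite is saved. The sketch below encodes the two ranges from the text; the `SIZE_RANGES` table and the `size_ok` helper are hypothetical names.

```python
# Inclusive prompt-count ranges taken from the guidance above.
SIZE_RANGES = {
    "regression": (5, 10),
    "smoke": (1, 3),
}


def size_ok(kind: str, prompt_count: int) -> bool:
    """Check whether a suite of the given kind has a recommended size."""
    low, high = SIZE_RANGES[kind]
    return low <= prompt_count <= high
```

A four-prompt suite falls between the two ranges, so `size_ok` returns `False` for it under either kind.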

Interpreting regression results

A regression is serious when:

a known-passing prompt now fails
a distress prompt loses containment
a boundary prompt leaks into open-domain behavior
a payment/arrival prompt loses urgency
a response becomes colder or more generic
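The first criterion, a known-passing prompt that now fails, can be detected by comparing two runs of the same suite. The sketch below assumes each run is recorded as a mapping from a prompt ID to a pass/fail flag; the `find_regressions` helper and the prompt IDs are hypothetical. The other criteria (lost containment, leaked boundaries, lost urgency, colder tone) still need a reviewer or grader to decide what counts as a pass.

```python
def find_regressions(baseline: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Return IDs of prompts that passed in the baseline run and fail now.

    Prompts missing from the current run are not reported here; they were
    not rerun, so there is no evidence either way.
    """
    return sorted(
        prompt_id
        for prompt_id, passed in baseline.items()
        if passed and current.get(prompt_id) is False
    )


# Hypothetical results for one suite before and after a change.
baseline = {"distress-01": True, "boundary-01": True, "payment-01": False}
current = {"distress-01": False, "boundary-01": True, "payment-01": True}

regressed = find_regressions(baseline, current)
```

In this example `regressed` contains only `"distress-01"`: the payment prompt improved, but the distress prompt that used to pass now fails.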

Rule

Do not trust a patch because one prompt improved. Trust a patch when the behavior family improves and neighboring behavior holds.
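The rule can be sketched as a check over whole families rather than single prompts. This is one possible reading, with assumptions stated plainly: results are stored per family as prompt ID to pass/fail, "the family improves" is taken to mean its pass count rises with no previously passing prompt lost, and "neighboring behavior holds" is taken to mean no previously passing prompt in any other family now fails. The names `Results`, `holds`, and `trust_patch` are hypothetical.

```python
# family name -> {prompt ID -> passed?}
Results = dict[str, dict[str, bool]]


def holds(before: dict[str, bool], after: dict[str, bool]) -> bool:
    """True when every prompt that passed before still passes."""
    return all(after.get(pid, False) for pid, passed in before.items() if passed)


def trust_patch(before: Results, after: Results, target: str) -> bool:
    """Apply the rule: the target family improves and its neighbors hold."""
    improved = sum(after[target].values()) > sum(before[target].values())
    target_holds = holds(before[target], after[target])
    neighbors_hold = all(
        holds(before[family], after.get(family, {}))
        for family in before
        if family != target
    )
    return improved and target_holds and neighbors_hold


# Hypothetical runs: the patch fixes a payment prompt but breaks overwhelm.
before = {
    "payment": {"p1": False, "p2": True},
    "overwhelm": {"o1": True},
}
after_bad = {
    "payment": {"p1": True, "p2": True},
    "overwhelm": {"o1": False},
}
after_good = {
    "payment": {"p1": True, "p2": True},
    "overwhelm": {"o1": True},
}
```

With these inputs `trust_patch(before, after_bad, "payment")` is `False` even though the payment prompt improved, because a neighboring overwhelm prompt regressed; `trust_patch(before, after_good, "payment")` is `True`.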
