Use this workflow when Eval Labs reveals that Lucia is routing prompts into the wrong behavior mode.

What an intent-layer failure looks like

An intent-layer failure happens when Lucia interprets the user's words too superficially and chooses the wrong behavior mode. Example:
User: I feel totally out of the loop.
Bad response: I can help with operator priorities, arrivals, concierge readiness...
The user is not asking for a capability menu. The user is expressing disorientation.

Step 1 — Create a focused suite

Use 5–10 related prompts. Example suite:
I'm generally frazzled.
I feel totally out of the loop.
Ugh, I have no idea what to do.
I am so lost.
Kids just got home and I'm overwhelmed.
I'm frustrated and behind.
I don't trust that I know what's going on.
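Storing the suite as data makes it easier to re-run unchanged later. The following is a minimal sketch under stated assumptions: the `suite` object shape, the `name` value, and the `validateSuite` helper are hypothetical and do not reflect Eval Labs' actual suite format.

```javascript
// Hypothetical suite definition; Eval Labs' real suite format may differ.
const suite = {
  name: "disorientation-signals", // illustrative name only
  prompts: [
    "I'm generally frazzled.",
    "I feel totally out of the loop.",
    "Ugh, I have no idea what to do.",
    "I am so lost.",
    "Kids just got home and I'm overwhelmed.",
    "I'm frustrated and behind.",
    "I don't trust that I know what's going on.",
  ],
};

// Checks the 5-10 prompt guideline and rejects duplicate prompts.
function validateSuite(s) {
  const unique = new Set(s.prompts.map((p) => p.trim().toLowerCase()));
  if (unique.size !== s.prompts.length) {
    return { ok: false, reason: "duplicate prompts" };
  }
  if (s.prompts.length < 5 || s.prompts.length > 10) {
    return { ok: false, reason: "suite should contain 5-10 prompts" };
  }
  return { ok: true, reason: "" };
}

console.log(validateSuite(suite));
```

Keeping the prompts in one frozen list also supports Step 6, where the exact same suite must be run again.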

Step 2 — Confirm the Eval Labs endpoint

Eval Labs runs against the Lucia endpoint configured in its environment. For current Lucia v0.1.3.6 validation, the active target should be:
https://api-dev.hellolucia.ai/admin/operator-focus
Use staging only when the validation pass has been explicitly configured or promoted for staging validation. Do not let an example env file override the intended target for the pass.
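A quick pre-flight check can catch a wrong target before a run starts. This is a sketch, not part of Eval Labs: the `checkEvalTarget` function and the `EVAL_LABS_TARGET_URL` variable name mentioned in the comment are assumptions. Only the expected URL comes from this workflow.

```javascript
// Expected target for Lucia v0.1.3.6 validation, taken from this workflow.
const EXPECTED_DEV_TARGET = "https://api-dev.hellolucia.ai/admin/operator-focus";

// Compares the configured endpoint with the expected one.
// Parsing through URL lets a trailing slash pass without a false mismatch.
function checkEvalTarget(configuredUrl, expectedUrl = EXPECTED_DEV_TARGET) {
  let configured;
  try {
    configured = new URL(configuredUrl);
  } catch {
    return { ok: false, reason: `not a valid URL: ${configuredUrl}` };
  }
  const expected = new URL(expectedUrl);
  const stripSlash = (p) => p.replace(/\/+$/, "");
  const matches =
    configured.protocol === expected.protocol &&
    configured.host === expected.host &&
    stripSlash(configured.pathname) === stripSlash(expected.pathname);
  return matches
    ? { ok: true, reason: "" }
    : { ok: false, reason: `expected ${expectedUrl}, got ${configuredUrl}` };
}

// Example call, assuming a hypothetical env variable name:
//   checkEvalTarget(process.env.EVAL_LABS_TARGET_URL)
```

For a pass that is intentionally configured for staging, pass the staging URL as the second argument instead of relying on the default.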

Step 3 — Review mode accuracy

For each response, ask:
Did Lucia choose the right behavior mode?
If the response answers a distress signal with a generic redirect, fail it.
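The review question can be recorded as a simple pass/fail on mode alone. The sketch below is illustrative: the mode labels in `MODES` and the `reviewModeAccuracy` helper are hypothetical, and the real mode names are whatever the Engine code defines.

```javascript
// Hypothetical mode labels; the real names live in the Engine code.
const MODES = Object.freeze({
  CAPABILITY_MENU: "capability_menu",
  GROUNDING: "grounding",
});

// Records a reviewer's judgment for one response.
// A mode mismatch is always a fail, however good the wording is.
function reviewModeAccuracy({ prompt, expectedMode, observedMode }) {
  const pass = expectedMode === observedMode;
  return {
    prompt,
    expectedMode,
    observedMode,
    pass,
    failureLayer: pass ? null : "intent",
  };
}

// The example from this workflow: a distress signal answered with a menu.
const review = reviewModeAccuracy({
  prompt: "I feel totally out of the loop.",
  expectedMode: MODES.GROUNDING,
  observedMode: MODES.CAPABILITY_MENU,
});
console.log(review.pass, review.failureLayer);
```

The expected and observed modes are both supplied by the reviewer here; the sketch does not attempt to classify responses automatically.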

Step 4 — Patch the correct owner

Likely owner:
operatorFocusBrain.js
Possibly later:
refineOperatorFocusOutput.js
Do not patch wording if the mode is wrong. Fix the mode selection first.

Step 5 — Deploy to the configured target

Local-only changes are not enough, because Eval Labs runs against the configured endpoint rather than your working copy. For current Lucia v0.1.3.6 validation, the Engine behavior must be deployed to dev before Eval Labs can test it. If the pass is intentionally configured for staging validation, deploy or promote to staging as part of that pass and record that choice in the review thread.

Step 6 — Re-run the exact same suite

Do not change prompts yet. Run the same suite again. Compare:
  • which prompts now route correctly
  • which still fail
  • whether any neighboring behavior regressed
  • whether tone improved or became too mushy
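The first three comparisons above can be computed mechanically from two sets of review verdicts. This is a sketch under an assumed data shape (an object mapping each prompt to `true` for pass or `false` for fail); it is not the Eval Labs export format, and `compareRuns` is a hypothetical helper. Tone still needs a human reviewer.

```javascript
// Each run maps prompt text to a reviewer verdict: true = pass, false = fail.
// This shape is an assumption, not the Eval Labs export format.
function compareRuns(before, after) {
  const result = { fixed: [], stillFailing: [], regressed: [], missing: [] };
  for (const [prompt, passedBefore] of Object.entries(before)) {
    if (!Object.prototype.hasOwnProperty.call(after, prompt)) {
      result.missing.push(prompt); // the suite changed between runs
      continue;
    }
    const passedAfter = after[prompt];
    if (!passedBefore && passedAfter) result.fixed.push(prompt);
    else if (!passedBefore && !passedAfter) result.stillFailing.push(prompt);
    else if (passedBefore && !passedAfter) result.regressed.push(prompt);
  }
  return result;
}

const beforeRun = {
  "I am so lost.": false,
  "I'm frustrated and behind.": false,
  "I'm generally frazzled.": true,
};
const afterRun = {
  "I am so lost.": true,
  "I'm frustrated and behind.": false,
  "I'm generally frazzled.": false,
};
console.log(compareRuns(beforeRun, afterRun));
```

A non-empty `missing` list signals that the prompts changed between runs, which this step asks you to avoid.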

Step 7 — Export evidence

Export the run after saving reviews. Share the export with the engineering/product thread.

Success standard

The patch is successful only if the repeated suite shows improved behavior without creating new failures.
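Expressed as a check, the standard has two conditions that must both hold. The sketch assumes a summary object with `fixed` and `regressed` arrays of prompt strings; that shape and the `isPatchSuccessful` name are hypothetical.

```javascript
// Success requires at least one improvement and zero new failures.
// Input shape is assumed: arrays of prompt strings from a before/after review.
function isPatchSuccessful({ fixed, regressed }) {
  return fixed.length > 0 && regressed.length === 0;
}

console.log(isPatchSuccessful({ fixed: ["I am so lost."], regressed: [] }));
console.log(
  isPatchSuccessful({ fixed: ["I am so lost."], regressed: ["I'm generally frazzled."] })
);
```

A patch that fixes several prompts but regresses even one neighboring prompt does not meet the standard.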