Behavioral Observatory is the first-class Eval Labs surface for reviewing conversations and saving structured behavioral labels. Derived suggestions can help the reviewer start, but only saved labels count as Behavioral Observatory label data.
Status
Plain-English definition
Behavioral Observatory is a first-class Eval Labs product surface for reviewing conversations and saving structured behavioral labels. It answers:
- What was the human trying to do?
- How did the human feel?
- What response strategy did Lucia use?
- How human did the response feel?
- What notes should be preserved as behavioral evidence?
What it is
Behavioral Observatory is:
- a conversation review surface
- a structured behavioral labeling workflow
- a place to compare the human message and Lucia’s response
- a place to save reviewer intent, affect, strategy, humanness, and notes
- a persisted evidence layer when Supabase confirms the save
What it is not
Behavioral Observatory is not:
- Registry Diagnostics
- a dataset membership debugger
- a queue-routing model debugger
- a replacement for Review Queue scoring
- a claim that Lucia is globally human-approved
- a guarantee that future evaluator workflows are already supported
Difference from Registry Diagnostics
Registry Diagnostics shows derived classification suggestions from existing Eval Labs data. Behavioral Observatory lets a reviewer save structured behavioral labels. Use this rule:
Derived suggestion
A derived suggestion comes from existing run/review fields. It may prefill or suggest:
- intent
- guest affect
- response strategy
- humanness
Saved label
A saved label is a Behavioral Observatory label saved by a reviewer. Saved labels:
- are stored in Supabase
- reload after refresh when persistence is available
- count as real Behavioral Observatory label data
- drive persisted Behavioral Observatory distributions and trends
- should be treated as intentional behavioral evidence
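The distinction matters most when counting. The following is a minimal Python sketch, not the Eval Labs implementation: the LabelRecord shape, its source field, and the function name are all hypothetical. It shows derived suggestions being left out of a persisted distribution.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelRecord:
    """Hypothetical record shape, used only for illustration."""
    run_item_id: str
    intent: str
    source: str  # assumed values: "derived" or "saved"


def persisted_intent_distribution(records):
    """Count intents from saved labels only; derived suggestions are ignored."""
    return Counter(r.intent for r in records if r.source == "saved")


records = [
    LabelRecord("item-1", "Billing", "saved"),
    LabelRecord("item-2", "Billing", "derived"),  # suggestion only, not counted
    LabelRecord("item-3", "Noise", "saved"),
]
distribution = persisted_intent_distribution(records)
print(dict(distribution))  # {'Billing': 1, 'Noise': 1}
```

Only item-1 and item-3 contribute, because item-2 was never saved by a reviewer.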
Label fields
Intent
Intent describes what the human was trying to do. Current values:
- Booking Help
- Check-In
- Checkout
- Billing
- Noise
- Room Issue
- Concierge
- Other
Use Other only when the conversation does not fit the listed categories.
Guest Affect
Guest Affect describes the human’s emotional state. Current values:
- Neutral
- Mildly Upset
- Upset
- Grateful
Response Strategy
Response Strategy describes what Lucia did as her main response move. Current values:
- Acknowledge
- Apology
- Offer
- Escalation
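The three enumerated fields above can be sketched as closed value sets, so that an invented category is rejected rather than silently stored. This is an illustrative Python sketch: the class names and the parse_label helper are assumptions, and only the value strings come from the lists above.

```python
from enum import Enum


class Intent(Enum):
    BOOKING_HELP = "Booking Help"
    CHECK_IN = "Check-In"
    CHECKOUT = "Checkout"
    BILLING = "Billing"
    NOISE = "Noise"
    ROOM_ISSUE = "Room Issue"
    CONCIERGE = "Concierge"
    OTHER = "Other"


class GuestAffect(Enum):
    NEUTRAL = "Neutral"
    MILDLY_UPSET = "Mildly Upset"
    UPSET = "Upset"
    GRATEFUL = "Grateful"


class ResponseStrategy(Enum):
    ACKNOWLEDGE = "Acknowledge"
    APOLOGY = "Apology"
    OFFER = "Offer"
    ESCALATION = "Escalation"


def parse_label(intent, affect, strategy):
    """Map display strings to enum members.

    Raises ValueError if any value is outside the current lists.
    """
    return Intent(intent), GuestAffect(affect), ResponseStrategy(strategy)


print(parse_label("Billing", "Upset", "Apology"))
```

Calling parse_label with a value that is not in the lists, such as "Parking", raises ValueError, which mirrors the rule that reviewers do not invent new label categories.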
Humanness
Humanness is a 1-7 judgment of how human the response felt. Current anchors:
Notes
Notes preserve short behavioral evidence. Notes should explain the judgment when the structured fields alone are not enough.
Good notes
Good notes are short, specific, and evidence-based. Examples:
Bad notes
Bad notes are vague, personal, or not evidence-based. Examples:
What happens when a label is saved
When a reviewer saves a Behavioral Observatory label:
- Eval Labs confirms the selected run item can be tied to a persisted run and run item.
- The label is written to public.eval_behavioral_labels.
- The label is keyed to the run, run item, owner user, and reviewer user.
- The label status is set to saved unless another supported status is explicitly used.
- The UI can reload the saved label from Supabase.
- Persisted Behavioral Observatory analytics can use the saved label.
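The save flow above can be sketched with an in-memory stand-in for public.eval_behavioral_labels. This is a sketch under stated assumptions, not the real Supabase client or schema: the key field names (run_id, run_item_id, owner_user_id, reviewer_user_id), the field names in the example row, and the overwrite-on-resave behavior are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelKey:
    """Hypothetical key: run, run item, owner user, reviewer user."""
    run_id: str
    run_item_id: str
    owner_user_id: str
    reviewer_user_id: str


class InMemoryLabelStore:
    """Stand-in for public.eval_behavioral_labels; not the real Supabase API."""

    def __init__(self):
        self._rows = {}

    def save(self, key, fields, status="saved"):
        # A label must be tied to a persisted run and run item.
        if not key.run_id or not key.run_item_id:
            raise ValueError("label must be tied to a persisted run and run item")
        row = {**fields, "status": status}
        self._rows[key] = row  # assumption: re-saving the same key overwrites
        return row

    def load(self, key):
        """Return the saved row, or None if nothing was persisted for this key."""
        return self._rows.get(key)


store = InMemoryLabelStore()
key = LabelKey("run-1", "item-7", "owner-1", "reviewer-1")
store.save(key, {"intent": "Noise", "guest_affect": "Upset",
                 "response_strategy": "Apology", "humanness": 5})
print(store.load(key)["status"])  # saved
```

Reloading by the same key returns the stored row, which corresponds to the UI reloading a saved label after a refresh.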
Saved / unsaved / error states
Use these states plainly:
- derived: suggestion only; nothing has been saved as a Behavioral Observatory label
- unsaved: reviewer has changed fields but has not saved them
- saving: save is in progress
- saved: Supabase confirmed the label
- error: save or load failed; do not count it as persisted
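One way to keep these states honest in code is a small state machine in which only saved counts as persisted. The sketch below is illustrative: the state names come from the list above, while the event names ("edit", "save", "confirmed", "failed") and the transition table are assumptions about how a UI might move between them.

```python
from enum import Enum


class LabelState(Enum):
    DERIVED = "derived"
    UNSAVED = "unsaved"
    SAVING = "saving"
    SAVED = "saved"
    ERROR = "error"


# Hypothetical transitions; the event names are assumptions for illustration.
TRANSITIONS = {
    (LabelState.DERIVED, "edit"): LabelState.UNSAVED,
    (LabelState.SAVED, "edit"): LabelState.UNSAVED,
    (LabelState.ERROR, "edit"): LabelState.UNSAVED,
    (LabelState.UNSAVED, "save"): LabelState.SAVING,
    (LabelState.SAVING, "confirmed"): LabelState.SAVED,
    (LabelState.SAVING, "failed"): LabelState.ERROR,
}


def next_state(state, event):
    """Return the next state, or raise ValueError for an unsupported move."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"cannot apply {event!r} in state {state.value!r}")


def counts_as_persisted(state):
    """Only a confirmed save counts as persisted evidence."""
    return state is LabelState.SAVED


state = LabelState.DERIVED
for event in ("edit", "save", "confirmed"):
    state = next_state(state, event)
print(state.value, counts_as_persisted(state))  # saved True
```

A label that ends in saving or error is never treated as persisted, which matches the rule to confirm the saved state before relying on a label.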
Step-by-step usage
- Open /behavioral-observatory if your role and assignment allow it.
- Select a conversation from the labeling queue.
- Read the Human message.
- Read Lucia’s response.
- Notice whether the current values are derived suggestions or a saved label.
- Set Intent.
- Set Guest Affect.
- Set Response Strategy.
- Set Humanness from 1 to 7.
- Add a short note only if it preserves useful behavioral evidence.
- Save the label.
- Confirm the saved state before treating it as persisted evidence.
Entry-level employee rule
Entry-level employees should use Behavioral Observatory only inside an approved assignment. They should:
- read before clicking
- keep labels literal
- use the smallest truthful affect
- choose the dominant response strategy
- write short notes
- ask when uncertain
They should not:
- invent new label categories
- treat derived suggestions as truth
- use Registry Diagnostics as a label workflow
- claim a saved label means Lucia is globally approved

