Behavioral Observatory

Behavioral Observatory is the first-class Eval Labs surface for reviewing conversations and saving structured behavioral labels. Derived suggestions can help the reviewer start, but only saved labels count as Behavioral Observatory label data.

Status

Behavioral Observatory surface: implemented
Behavioral label controls: implemented
Behavioral label persistence: persisted when the Supabase table is applied and the save succeeds
Derived context: derived
Saved Behavioral Observatory label: persisted
Entry-level employee rollout: future / access-dependent
Evaluator-reviewing-owner-run workflow: deferred

Canonical route:

/behavioral-observatory

Current access intent is owner/admin. Broader employee use requires approved access and security decisions.

Plain-English definition

In plain terms, Behavioral Observatory is where a reviewer reads a conversation and records what happened in it as structured labels. Each saved label answers:

What was the human trying to do?
How did the human feel?
What response strategy did Lucia use?
How human did the response feel?
What notes should be preserved as behavioral evidence?

What it is

Behavioral Observatory is:

a conversation review surface
a structured behavioral labeling workflow
a place to compare the human message and Lucia’s response
a place to save reviewer intent, affect, strategy, humanness, and notes
a persisted evidence layer when Supabase confirms the save

This is the surface where saved Behavioral Observatory labels become real behavioral data.

What it is not

Behavioral Observatory is not:

Registry Diagnostics
a dataset membership debugger
a queue-routing model debugger
a replacement for Review Queue scoring
a claim that Lucia is globally human-approved
a guarantee that future evaluator workflows are already supported

Behavioral Observatory labels are specific to the reviewed run item and reviewer.

Difference from Registry Diagnostics

Registry Diagnostics shows derived classification suggestions from existing Eval Labs data. Behavioral Observatory lets a reviewer save structured behavioral labels. Use this rule:

Registry Diagnostics explains what the classification model thinks.
Behavioral Observatory records what the reviewer intentionally saved.

Derived suggestion

A derived suggestion comes from existing run/review fields. It may prefill or suggest:

intent
guest affect
response strategy
humanness

Derived suggestions are useful starting points. They are not final human judgment. They are not saved Behavioral Observatory labels.

Saved label

A saved label is a Behavioral Observatory label saved by a reviewer. Saved labels:

are stored in Supabase
reload after refresh when persistence is available
count as real Behavioral Observatory label data
drive persisted Behavioral Observatory distributions and trends
should be treated as intentional behavioral evidence
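The difference between a derived suggestion and a saved label matters most when labels are counted. The following is a minimal sketch of a distribution that counts saved labels only. The IntentRecord shape, its property names, and the savedIntentDistribution helper are hypothetical and are not taken from the Eval Labs codebase.

```typescript
// Hypothetical record shape; property names are assumptions, not the real schema.
type LabelSource = "derived" | "saved";

interface IntentRecord {
  runItemId: string;
  intent: string;
  source: LabelSource;
}

// Build an intent distribution from saved labels only.
// Derived suggestions are skipped because they are not label data.
function savedIntentDistribution(records: IntentRecord[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const record of records) {
    if (record.source !== "saved") continue;
    counts.set(record.intent, (counts.get(record.intent) ?? 0) + 1);
  }
  return counts;
}

// Example values only.
const exampleRecords: IntentRecord[] = [
  { runItemId: "item-1", intent: "Billing", source: "saved" },
  { runItemId: "item-2", intent: "Billing", source: "derived" },
  { runItemId: "item-3", intent: "Noise", source: "saved" },
];

// Billing is counted once: the derived suggestion for item-2 is ignored.
const distribution = savedIntentDistribution(exampleRecords);
```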

Label fields

Intent

Intent describes what the human was trying to do. Current values:

Booking Help
Check-In
Checkout
Billing
Noise
Room Issue
Concierge
Other

Use Other only when the conversation does not fit the listed categories.

Guest Affect

Guest Affect describes the human’s emotional state. Current values:

Neutral
Mildly Upset
Upset
Grateful

Do not over-dramatize affect. Mark the smallest truthful emotional read.

Response Strategy

Response Strategy describes what Lucia did as her main response move. Current values:

Acknowledge
Apology
Offer
Escalation

Choose the dominant strategy, not every strategy present in the text.

Humanness

Humanness is a 1-7 judgment of how human the response felt. Current anchors, from the low end of the scale to the high end:

Template
Functional
Warm + Specific

Do not use humanness as a general pass/fail score. A response can feel warm and still fail truth or usefulness.

Notes

Notes preserve short behavioral evidence. Notes should explain the judgment when the structured fields alone are not enough.

Good notes

Good notes are short, specific, and evidence-based. Examples:

Good: Guest sounds anxious about check-in; Lucia gave one clear access-code next step.
Good: Apology is appropriate, but no operational next move was offered.
Good: Warm and specific, but implies the team already acted when only a suggestion exists.

Good notes name the behavior and the reason it matters.

Bad notes

Bad notes are vague, personal, or not evidence-based. Examples:

Bad: Sounds good.
Bad: I like this one.
Bad: Weird vibe.
Bad: Make it more AI.

Bad notes create noise. If the structured fields already tell the story, leave notes brief or empty.
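As a rough sketch, the label fields above could be modeled and checked in TypeScript like this. The allowed values come from the lists in this section. The LabelDraft shape, its property names, and the validateLabelDraft helper are hypothetical, and treating Humanness as a whole number is an assumption.

```typescript
// Allowed values, copied from the field lists in this document.
const INTENTS = [
  "Booking Help", "Check-In", "Checkout", "Billing",
  "Noise", "Room Issue", "Concierge", "Other",
];
const GUEST_AFFECTS = ["Neutral", "Mildly Upset", "Upset", "Grateful"];
const RESPONSE_STRATEGIES = ["Acknowledge", "Apology", "Offer", "Escalation"];

// Hypothetical draft shape; property names are assumptions.
interface LabelDraft {
  intent: string;
  guestAffect: string;
  responseStrategy: string;
  humanness: number;
  notes?: string;
}

// Returns a list of problems. An empty list means the draft uses only
// values this document allows.
function validateLabelDraft(draft: LabelDraft): string[] {
  const problems: string[] = [];
  if (INTENTS.indexOf(draft.intent) === -1) {
    problems.push("intent is not a listed value");
  }
  if (GUEST_AFFECTS.indexOf(draft.guestAffect) === -1) {
    problems.push("guest affect is not a listed value");
  }
  if (RESPONSE_STRATEGIES.indexOf(draft.responseStrategy) === -1) {
    problems.push("response strategy is not a listed value");
  }
  // Assumption: Humanness is a whole number from 1 to 7.
  if (!Number.isInteger(draft.humanness) || draft.humanness < 1 || draft.humanness > 7) {
    problems.push("humanness must be a whole number from 1 to 7");
  }
  return problems;
}

// Example values only.
const okDraft: LabelDraft = {
  intent: "Check-In",
  guestAffect: "Mildly Upset",
  responseStrategy: "Offer",
  humanness: 5,
  notes: "Guest sounds anxious about check-in; Lucia gave one clear access-code next step.",
};
const badDraft: LabelDraft = {
  intent: "Parking", // not a listed intent
  guestAffect: "Neutral",
  responseStrategy: "Offer",
  humanness: 9, // outside the 1-7 range
};

const okProblems = validateLabelDraft(okDraft);   // no problems
const badProblems = validateLabelDraft(badDraft); // intent and humanness are rejected
```

A check like this only enforces the rule against inventing new label categories. It cannot tell whether the chosen values are the right read of the conversation.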

What happens when a label is saved

When a reviewer saves a Behavioral Observatory label:

Eval Labs confirms the selected run item can be tied to a persisted run and run item.
The label is written to public.eval_behavioral_labels.
The label is keyed to the run, run item, owner user, and reviewer user.
The label status is set to saved unless another supported status is explicitly used.
The UI can reload the saved label from Supabase.
Persisted Behavioral Observatory analytics can use the saved label.

If the save fails, the label should not be treated as persisted.
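The first steps of that sequence can be sketched as a pure function that prepares the row before anything is written. This is an illustration under stated assumptions: the column names, the LabelInput shape, and the buildLabelRow and labelKey helpers are hypothetical, and the actual write to public.eval_behavioral_labels through Supabase is left out.

```typescript
// Hypothetical row shape for public.eval_behavioral_labels.
// Column names are assumptions, not the real schema.
interface BehavioralLabelRow {
  run_id: string;
  run_item_id: string;
  owner_user_id: string;
  reviewer_user_id: string;
  status: string;
  intent: string;
  guest_affect: string;
  response_strategy: string;
  humanness: number;
  notes: string | null;
}

// Hypothetical input collected from the label controls.
interface LabelInput {
  runId: string | null;     // null when the item is not tied to a persisted run
  runItemId: string | null; // null when the run item is not persisted
  ownerUserId: string;
  reviewerUserId: string;
  intent: string;
  guestAffect: string;
  responseStrategy: string;
  humanness: number;
  notes?: string;
  status?: string;          // omitted in the normal case
}

// Returns the row to write, or null when the label cannot be tied to a
// persisted run and run item and therefore must not be saved.
function buildLabelRow(input: LabelInput): BehavioralLabelRow | null {
  if (!input.runId || !input.runItemId) return null;
  return {
    run_id: input.runId,
    run_item_id: input.runItemId,
    owner_user_id: input.ownerUserId,
    reviewer_user_id: input.reviewerUserId,
    status: input.status ?? "saved", // default status unless one is given
    intent: input.intent,
    guest_affect: input.guestAffect,
    response_strategy: input.responseStrategy,
    humanness: input.humanness,
    notes: input.notes ?? null,
  };
}

// The label is keyed to run, run item, owner user, and reviewer user.
function labelKey(row: BehavioralLabelRow): string {
  return [row.run_id, row.run_item_id, row.owner_user_id, row.reviewer_user_id].join(":");
}

// Example values only.
const exampleInput: LabelInput = {
  runId: "run-1",
  runItemId: "item-1",
  ownerUserId: "owner-1",
  reviewerUserId: "reviewer-1",
  intent: "Billing",
  guestAffect: "Upset",
  responseStrategy: "Apology",
  humanness: 4,
};
const exampleRow = buildLabelRow(exampleInput);
const unpersistedRow = buildLabelRow({ ...exampleInput, runId: null });
```

Building the row does not make the label persisted. It only counts once Supabase confirms the write.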

Saved / unsaved / error states

Use these states plainly:

derived: suggestion only; nothing has been saved as a Behavioral Observatory label
unsaved: reviewer has changed fields but has not saved them
saving: save is in progress
saved: Supabase confirmed the label
error: save or load failed; do not count it as persisted
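One way to keep these states honest in code is a small state machine. The sketch below uses the five state names from the list above. The event names and the nextLabelState and countsAsPersisted helpers are hypothetical, and the transition rules are one reasonable reading of this section, not a description of the actual UI.

```typescript
// State names come from the list above.
type LabelState = "derived" | "unsaved" | "saving" | "saved" | "error";

// Hypothetical event names; these are assumptions for illustration.
type LabelEvent = "edit" | "save_started" | "save_confirmed" | "save_failed" | "load_failed";

function nextLabelState(state: LabelState, event: LabelEvent): LabelState {
  switch (event) {
    case "edit":
      // Any change to the fields means there is unsaved work.
      return "unsaved";
    case "save_started":
      return "saving";
    case "save_confirmed":
      // Only a save that was in progress can be confirmed.
      return state === "saving" ? "saved" : state;
    case "save_failed":
      return state === "saving" ? "error" : state;
    case "load_failed":
      return "error";
  }
}

// Only a label confirmed by Supabase counts as persisted evidence.
function countsAsPersisted(state: LabelState): boolean {
  return state === "saved";
}

// A successful path: derived -> unsaved -> saving -> saved.
let successPath: LabelState = "derived";
successPath = nextLabelState(successPath, "edit");
successPath = nextLabelState(successPath, "save_started");
successPath = nextLabelState(successPath, "save_confirmed");

// A failed path: derived -> unsaved -> saving -> error.
let failedPath: LabelState = "derived";
failedPath = nextLabelState(failedPath, "edit");
failedPath = nextLabelState(failedPath, "save_started");
failedPath = nextLabelState(failedPath, "save_failed");
```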

Step-by-step usage

Open /behavioral-observatory if your role and assignment allow it.
Select a conversation from the labeling queue.
Read the Human message.
Read Lucia’s response.
Notice whether the current values are derived suggestions or a saved label.
Set Intent.
Set Guest Affect.
Set Response Strategy.
Set Humanness from 1 to 7.
Add a short note only if it preserves useful behavioral evidence.
Save the label.
Confirm the saved state before treating it as persisted evidence.

Entry-level employee rule

Entry-level employees should use Behavioral Observatory only inside an approved assignment. They should:

read before clicking
keep labels literal
use the smallest truthful affect
choose the dominant response strategy
write short notes
ask when uncertain

They should not:

invent new label categories
treat derived suggestions as truth
use Registry Diagnostics as a label workflow
claim a saved label means Lucia is globally approved

Canon rule

Derived context helps the reviewer start.
Saved Behavioral Observatory labels are intentional behavioral evidence.
