Behavioral Observatory is the first-class Eval Labs surface for reviewing conversations and saving structured behavioral labels. Derived suggestions can help the reviewer start, but only saved labels count as Behavioral Observatory label data.

Status

  • Behavioral Observatory surface: implemented
  • Behavioral label controls: implemented
  • Behavioral label persistence: persisted when the Supabase table has been applied and the save succeeds
  • Derived context: derived
  • Saved Behavioral Observatory label: persisted
  • Entry-level employee rollout: future / access-dependent
  • Evaluator-reviewing-owner-run workflow: deferred
Canonical route: /behavioral-observatory
Access is currently intended for owners and admins. Broader employee use requires approved access and security decisions.

Plain-English definition

Behavioral Observatory is a first-class Eval Labs product surface for reviewing conversations and saving structured behavioral labels. It answers:
  • What was the human trying to do?
  • How did the human feel?
  • What response strategy did Lucia use?
  • How human did the response feel?
  • What notes should be preserved as behavioral evidence?

What it is

Behavioral Observatory is:
  • a conversation review surface
  • a structured behavioral labeling workflow
  • a place to compare the human message and Lucia’s response
  • a place to save the reviewer’s judgments of intent, affect, strategy, and humanness, along with notes
  • a persisted evidence layer when Supabase confirms the save
This is the surface where saved Behavioral Observatory labels become real behavioral data.

What it is not

Behavioral Observatory is not:
  • Registry Diagnostics
  • a dataset membership debugger
  • a queue-routing model debugger
  • a replacement for Review Queue scoring
  • a claim that Lucia is globally human-approved
  • a guarantee that future evaluator workflows are already supported
Behavioral Observatory labels are specific to the reviewed run item and reviewer.

Difference from Registry Diagnostics

Registry Diagnostics shows derived classification suggestions from existing Eval Labs data. Behavioral Observatory lets a reviewer save structured behavioral labels. Use this rule:
Registry Diagnostics explains what the classification model thinks.
Behavioral Observatory records what the reviewer intentionally saved.

Derived suggestion

A derived suggestion comes from existing run/review fields. It may prefill or suggest:
  • intent
  • guest affect
  • response strategy
  • humanness
Derived suggestions are useful starting points. They are not final human judgment. They are not saved Behavioral Observatory labels.

Saved label

A saved label is a Behavioral Observatory label saved by a reviewer. Saved labels:
  • are stored in Supabase
  • reload after refresh when persistence is available
  • count as real Behavioral Observatory label data
  • drive persisted Behavioral Observatory distributions and trends
  • should be treated as intentional behavioral evidence
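To make the counting rule concrete, here is a minimal Python sketch of how a distribution could exclude derived suggestions and count only saved labels. The `LabelRecord` type, its `origin` field, and `intent_distribution` are hypothetical names for illustration, not the Eval Labs schema or code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Literal

# Hypothetical record shape: "derived" means suggestion only,
# "saved" means Supabase confirmed the label.
Origin = Literal["derived", "saved"]


@dataclass(frozen=True)
class LabelRecord:
    run_item_id: str
    origin: Origin
    intent: str


def intent_distribution(records: Iterable[LabelRecord]) -> Counter:
    """Count intents across saved labels only; derived suggestions are skipped."""
    return Counter(r.intent for r in records if r.origin == "saved")


records = [
    LabelRecord("a", "saved", "Billing"),
    LabelRecord("b", "derived", "Billing"),
    LabelRecord("c", "saved", "Noise"),
]
print(intent_distribution(records))  # Counter({'Billing': 1, 'Noise': 1})
```

The derived Billing suggestion on item `b` does not change the result; only the two saved labels are counted.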

Label fields

Intent

Intent describes what the human was trying to do. Current values:
  • Booking Help
  • Check-In
  • Checkout
  • Billing
  • Noise
  • Room Issue
  • Concierge
  • Other
Use Other only when the conversation does not fit the listed categories.

Guest Affect

Guest Affect describes the human’s emotional state. Current values:
  • Neutral
  • Mildly Upset
  • Upset
  • Grateful
Do not over-dramatize affect. Mark the smallest truthful emotional read.

Response Strategy

Response Strategy describes what Lucia did as her main response move. Current values:
  • Acknowledge
  • Apology
  • Offer
  • Escalation
Choose the dominant strategy, not every strategy present in the text.

Humanness

Humanness is a 1-7 judgment of how human the response felt. Current anchors:
1 = Template
4 = Functional
7 = Warm + Specific
Do not use humanness as a general pass/fail score. A response can feel warm and still fail truth or usefulness.
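The four structured fields can be sketched as a typed record that rejects values outside the current lists and humanness outside 1 to 7. This is a minimal Python illustration that assumes the value lists above are exhaustive; the class names, field names, and the `HUMANNESS_ANCHORS` mapping are hypothetical, not the stored column names.

```python
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    BOOKING_HELP = "Booking Help"
    CHECK_IN = "Check-In"
    CHECKOUT = "Checkout"
    BILLING = "Billing"
    NOISE = "Noise"
    ROOM_ISSUE = "Room Issue"
    CONCIERGE = "Concierge"
    OTHER = "Other"  # only when nothing else fits


class GuestAffect(Enum):
    NEUTRAL = "Neutral"
    MILDLY_UPSET = "Mildly Upset"
    UPSET = "Upset"
    GRATEFUL = "Grateful"


class ResponseStrategy(Enum):
    ACKNOWLEDGE = "Acknowledge"
    APOLOGY = "Apology"
    OFFER = "Offer"
    ESCALATION = "Escalation"


# Hypothetical lookup of the documented anchor points on the 1-7 scale.
HUMANNESS_ANCHORS = {1: "Template", 4: "Functional", 7: "Warm + Specific"}


@dataclass(frozen=True)
class BehavioralLabel:
    intent: Intent
    guest_affect: GuestAffect
    response_strategy: ResponseStrategy  # the single dominant move
    humanness: int                       # 1 to 7 inclusive
    notes: str = ""

    def __post_init__(self) -> None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(self.humanness, bool) or not isinstance(self.humanness, int):
            raise TypeError("humanness must be an integer")
        if not 1 <= self.humanness <= 7:
            raise ValueError("humanness must be between 1 and 7")


label = BehavioralLabel(
    intent=Intent("Check-In"),
    guest_affect=GuestAffect.NEUTRAL,
    response_strategy=ResponseStrategy.OFFER,
    humanness=5,
)
```

Looking up a value that is not on the list, such as `Intent("Lost Luggage")`, raises `ValueError`, which matches the rule that reviewers do not invent new categories.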

Notes

Notes preserve short behavioral evidence. Notes should explain the judgment when the structured fields alone are not enough.

Good notes

Good notes are short, specific, and evidence-based. Examples:
Good: Guest sounds anxious about check-in; Lucia gave one clear access-code next step.
Good: Apology is appropriate, but no operational next move was offered.
Good: Warm and specific, but implies the team already acted when only a suggestion exists.
Good notes name the behavior and the reason it matters.

Bad notes

Bad notes are vague, personal, or not evidence-based. Examples:
Bad: Sounds good.
Bad: I like this one.
Bad: Weird vibe.
Bad: Make it more AI.
Bad notes create noise. If the structured fields already tell the story, leave notes brief or empty.

What happens when a label is saved

When a reviewer saves a Behavioral Observatory label:
  1. Eval Labs confirms that the selected conversation is tied to a persisted run and run item.
  2. The label is written to public.eval_behavioral_labels.
  3. The label is keyed to the run, run item, owner user, and reviewer user.
  4. The label status defaults to saved unless another supported status is explicitly set.
  5. The UI can reload the saved label from Supabase.
  6. Persisted Behavioral Observatory analytics can use the saved label.
If the save fails, the label should not be treated as persisted.
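The save sequence can be sketched as follows. This is a minimal Python illustration, not the Eval Labs implementation: `LabelKey`, `LabelStore`, `InMemoryLabelStore`, and `save_label` are hypothetical names, and the in-memory store stands in for `public.eval_behavioral_labels`. The sketch assumes a re-save for the same key overwrites the earlier row; the real table may behave differently.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol, Set, Tuple


@dataclass(frozen=True)
class LabelKey:
    """Hypothetical key: one label per run, run item, owner, and reviewer."""
    run_id: str
    run_item_id: str
    owner_user_id: str
    reviewer_user_id: str


class LabelStore(Protocol):
    """Hypothetical persistence interface standing in for Supabase."""
    def run_item_exists(self, run_id: str, run_item_id: str) -> bool: ...
    def write_label(self, key: LabelKey, fields: dict, status: str) -> bool: ...
    def load_label(self, key: LabelKey) -> Optional[dict]: ...


def save_label(store: LabelStore, key: LabelKey, fields: dict, status: str = "saved") -> str:
    """Return "saved" only when a reload confirms the write; otherwise "error"."""
    # Step 1: the selected item must map to a persisted run and run item.
    if not store.run_item_exists(key.run_id, key.run_item_id):
        return "error"
    # Steps 2-4: write the row keyed to run, run item, owner, and reviewer;
    # status defaults to "saved" unless the caller passes another value.
    if not store.write_label(key, fields, status):
        return "error"
    # Step 5: reload and compare before treating the label as persisted.
    row = store.load_label(key)
    if row is None or row["fields"] != fields or row["status"] != status:
        return "error"
    return "saved"


class InMemoryLabelStore:
    """Test double; overwriting on the same key is an assumption."""
    def __init__(self, persisted_items: Set[Tuple[str, str]]) -> None:
        self.persisted_items = set(persisted_items)
        self.rows: Dict[LabelKey, dict] = {}

    def run_item_exists(self, run_id: str, run_item_id: str) -> bool:
        return (run_id, run_item_id) in self.persisted_items

    def write_label(self, key: LabelKey, fields: dict, status: str) -> bool:
        self.rows[key] = {"fields": dict(fields), "status": status}
        return True

    def load_label(self, key: LabelKey) -> Optional[dict]:
        return self.rows.get(key)


store = InMemoryLabelStore({("run-1", "item-1")})
key = LabelKey("run-1", "item-1", "owner-1", "reviewer-1")
print(save_label(store, key, {"intent": "Billing", "humanness": 4}))  # saved
```

Returning the same words as the UI states keeps the rule simple: anything other than `saved` is not persisted evidence.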

Saved / unsaved / error states

Use these states plainly:
  • derived: suggestion only; nothing has been saved as a Behavioral Observatory label
  • unsaved: reviewer has changed fields but has not saved them
  • saving: save is in progress
  • saved: Supabase confirmed the label
  • error: save or load failed; do not count it as persisted
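The five states behave like a small state machine. The Python sketch below models the save path only (load failures are left out); the event names `edit`, `save`, `confirmed`, and `failed` and the exact transitions are assumptions for illustration, including that a reviewer may save derived values unchanged after reading them.

```python
from enum import Enum


class LabelState(Enum):
    DERIVED = "derived"  # suggestion only; nothing saved
    UNSAVED = "unsaved"  # reviewer changed fields but has not saved
    SAVING = "saving"    # save in progress
    SAVED = "saved"      # Supabase confirmed the label
    ERROR = "error"      # save failed (load failures not modeled here)


# Assumed transitions for the save path; event names are hypothetical.
TRANSITIONS = {
    (LabelState.DERIVED, "edit"): LabelState.UNSAVED,
    (LabelState.UNSAVED, "edit"): LabelState.UNSAVED,
    (LabelState.SAVED, "edit"): LabelState.UNSAVED,
    (LabelState.ERROR, "edit"): LabelState.UNSAVED,
    (LabelState.DERIVED, "save"): LabelState.SAVING,  # assumed: save reviewed suggestions as-is
    (LabelState.UNSAVED, "save"): LabelState.SAVING,
    (LabelState.ERROR, "save"): LabelState.SAVING,    # retry after a failure
    (LabelState.SAVING, "confirmed"): LabelState.SAVED,
    (LabelState.SAVING, "failed"): LabelState.ERROR,
}


def next_state(state: LabelState, event: str) -> LabelState:
    """Return the next state, or raise ValueError for an event the state does not accept."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.value!r}") from None


def counts_as_persisted(state: LabelState) -> bool:
    """Only a confirmed save counts as persisted evidence."""
    return state is LabelState.SAVED


state = LabelState.DERIVED
for event in ["edit", "save", "failed", "save", "confirmed"]:
    state = next_state(state, event)
print(state.value, counts_as_persisted(state))  # saved True
```

Only `saved` counts as persisted; `saving` and `error` do not, even though the reviewer has pressed save.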

Step-by-step usage

  1. Open /behavioral-observatory if your role and assignment allow it.
  2. Select a conversation from the labeling queue.
  3. Read the Human message.
  4. Read Lucia’s response.
  5. Notice whether the current values are derived suggestions or a saved label.
  6. Set Intent.
  7. Set Guest Affect.
  8. Set Response Strategy.
  9. Set Humanness from 1 to 7.
  10. Add a short note only if it preserves useful behavioral evidence.
  11. Save the label.
  12. Confirm the saved state before treating it as persisted evidence.

Entry-level employee rule

Entry-level employees should use Behavioral Observatory only inside an approved assignment. They should:
  • read before clicking
  • keep labels literal
  • use the smallest truthful affect
  • choose the dominant response strategy
  • write short notes
  • ask when uncertain
They should not:
  • invent new label categories
  • treat derived suggestions as truth
  • use Registry Diagnostics as a label workflow
  • claim a saved label means Lucia is globally approved

Canon rule

Derived context helps the reviewer start.
Saved Behavioral Observatory labels are intentional behavioral evidence.