Eval Labs is the internal evaluation system used to test and improve Lucia’s behavior. Evaluators help decide whether Lucia is useful to humans, not merely whether the platform ran.
The plain version
Eval Labs helps the team test Lucia against real behavioral expectations. It captures:
- prompts
- Lucia responses
- human review
- scores
- notes
- final run state
- role and scope context
- Supabase-backed run evidence when persistence succeeds
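
As a rough illustration, the captured items above could be grouped into a single record per run. This is a minimal sketch, not the actual Eval Labs schema: the class name `EvalRunRecord`, every field name, the example state values, and the helper `has_human_judgment` are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EvalRunRecord:
    """Hypothetical shape for one evaluation run; all names are illustrative."""
    prompt: str                       # what Lucia was asked
    response: str                     # what Lucia answered
    role: str                         # role context for the run
    scope: str                        # scope context for the run
    final_state: str                  # assumed values, e.g. "completed" or "failed"
    review: Optional[str] = None      # human review, absent until someone reviews
    scores: Dict[str, int] = field(default_factory=dict)
    notes: List[str] = field(default_factory=list)
    persisted: bool = False           # True only when persistence succeeded


def has_human_judgment(record: EvalRunRecord) -> bool:
    """True only when a human has both reviewed and scored the run."""
    return record.review is not None and len(record.scores) > 0
```

Keeping `final_state` separate from `review` and `scores` mirrors the distinction the page draws: a run that completed has not yet been judged useful by anyone.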
What evaluators are judging
You are judging whether Lucia worked for the human situation in front of her. Ask:
- Did Lucia understand the prompt?
- Was the response truthful?
- Was it useful?
- Was it clear?
- Was the tone right for the moment?
- Did it reduce confusion or add to it?
- Would a real operator trust Lucia more after reading it?
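
The questions above can be treated as a checklist during review. The sketch below is one hedged way to do that. The keys in `RUBRIC`, the yes/no answer format, and the helpers `unanswered` and `failed` are assumptions for illustration, and this page does not specify how Eval Labs actually records or scales scores.

```python
from typing import Dict, List

# Hypothetical keys mapped to the review questions listed above.
RUBRIC: Dict[str, str] = {
    "understood_prompt": "Did Lucia understand the prompt?",
    "truthful": "Was the response truthful?",
    "useful": "Was it useful?",
    "clear": "Was it clear?",
    "tone_fit": "Was the tone right for the moment?",
    "reduced_confusion": "Did it reduce confusion or add to it?",
    "builds_trust": "Would a real operator trust Lucia more after reading it?",
}


def unanswered(answers: Dict[str, bool]) -> List[str]:
    """Rubric keys the evaluator has not answered yet, in rubric order."""
    return [key for key in RUBRIC if key not in answers]


def failed(answers: Dict[str, bool]) -> List[str]:
    """Rubric keys the evaluator answered with 'no', in rubric order."""
    return [key for key in RUBRIC if answers.get(key) is False]
```

For example, an evaluator who has only recorded `{"truthful": True, "clear": False}` would see five keys from `unanswered` and `["clear"]` from `failed`.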

