ChatBench

Eval-driven agent workflows

Preset workflows with eval data attached.

Phase 1 ChatBench is a deployable catalog, not a blank canvas. Browse Graph IR presets, inspect their eval suites, and send the strongest candidates into hosted execution.

10 Phase 1 presets
84% Average eval pass rate
30 Graph IR nodes

Runtime discipline

The catalog reads Graph IR only. Runtime execution is represented as an adapter plan, so a future LangGraph or raw-code target can be added without changing preset documents.
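
A minimal sketch of that separation, assuming a hypothetical Graph IR shape. None of the names below (`GraphIR`, `Node`, `AdapterStep`, `plan_for_target`, the node kinds, the target strings) come from ChatBench itself; they only illustrate how a new runtime target could be planned from the same preset document without editing it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str
    kind: str  # hypothetical node kinds, e.g. "conversational" or "rag"


@dataclass(frozen=True)
class GraphIR:
    """Hypothetical preset document: immutable nodes and directed edges."""
    preset_id: str
    nodes: tuple[Node, ...]
    edges: tuple[tuple[str, str], ...]


@dataclass(frozen=True)
class AdapterStep:
    """One step of a runtime plan, derived from the graph and a target name."""
    target: str
    node_id: str
    action: str


def plan_for_target(graph: GraphIR, target: str) -> list[AdapterStep]:
    # The graph is only read here; each target gets its own derived plan.
    return [AdapterStep(target, n.id, f"bind:{n.kind}") for n in graph.nodes]


# Illustrative preset with 3 nodes and 2 edges (names are made up).
support = GraphIR(
    preset_id="support",
    nodes=(
        Node("intake", "conversational"),
        Node("retrieve", "rag"),
        Node("reply", "conversational"),
    ),
    edges=(("intake", "retrieve"), ("retrieve", "reply")),
)

hosted_plan = plan_for_target(support, "hosted")
langgraph_plan = plan_for_target(support, "langgraph")  # same preset, new target
```

Because the dataclasses are frozen, adding a target means adding a planner branch, not rewriting `support`.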

Eval-first gate

Every displayed preset includes a starter dataset, node-level eval hooks, a target pass rate, a p95 latency figure, and a cost-per-run estimate.
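
One way to picture the gate is a completeness check that runs before a preset is shown. This is a sketch under stated assumptions: the field names, the dict layout, and the `is_displayable` helper are hypothetical, and the sample values are illustrative rather than taken from a real preset record.

```python
# Hypothetical field names for the five items the gate requires.
REQUIRED_EVAL_FIELDS = (
    "starter_dataset",
    "node_eval_hooks",
    "target_pass_rate",
    "p95_latency_ms",
    "cost_per_run_usd",
)


def missing_eval_fields(preset: dict) -> list[str]:
    """Return required eval fields that are absent or empty, in declared order."""
    return [
        field
        for field in REQUIRED_EVAL_FIELDS
        if preset.get(field) in (None, [], {}, "")
    ]


def is_displayable(preset: dict) -> bool:
    # A preset passes the gate only when nothing is missing.
    return not missing_eval_fields(preset)


complete = {
    "starter_dataset": ["example-1"],          # illustrative
    "node_eval_hooks": {"intake": "hook-a"},   # illustrative
    "target_pass_rate": 0.85,                  # illustrative
    "p95_latency_ms": 142,
    "cost_per_run_usd": 0.41,
}

# A draft with an empty dataset and no cost estimate fails the gate.
draft = {k: v for k, v in complete.items() if k != "cost_per_run_usd"}
draft["starter_dataset"] = []
```

Here `is_displayable(complete)` is true, while `missing_eval_fields(draft)` names the two gaps that would keep the draft out of the catalog.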

Jaxia path

The support workflow already uses conversational nodes and RAG hooks, which is the same shape needed for embedded Jaxia deployments.

Preset catalog

Browse, configure, and deploy the first Graph IR workflows from the PRD.

3 production
Eval 88%, p95 142ms, cost $0.41; 3 nodes, 2 edges, 3 adapter steps; $900
Eval 92%, p95 118ms, cost $0.09; 3 nodes, 3 edges, 3 adapter steps; $1,200
Eval 84%, p95 176ms, cost $0.28; 3 nodes, 2 edges, 3 adapter steps; $650
Eval 83%, p95 126ms, cost $0.12; 3 nodes, 2 edges, 3 adapter steps; $480
Eval 90%, p95 101ms, cost $0.07; 3 nodes, 2 edges, 3 adapter steps; $700
Eval 79%, p95 198ms, cost $0.74; 3 nodes, 2 edges, 3 adapter steps; $1,400
Eval 87%, p95 113ms, cost $0.10; 3 nodes, 2 edges, 3 adapter steps; $350
Eval 81%, p95 135ms, cost $0.16; 3 nodes, 2 edges, 3 adapter steps; $520
Eval 76%, p95 220ms, cost $1.35; 3 nodes, 2 edges, 3 adapter steps; $1,800
Eval 84%, p95 158ms, cost $0.21; 3 nodes, 2 edges, 3 adapter steps; $620
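
The headline figures at the top of the page can be cross-checked against the ten preset cards above. The short sketch below copies the eval pass rates and node counts from the cards and recomputes the two totals; the variable names are illustrative only.

```python
# Eval pass rates (percent) from the ten preset cards, in page order.
eval_pass_rates = [88, 92, 84, 83, 90, 79, 87, 81, 76, 84]

# Every card lists 3 nodes.
nodes_per_preset = [3] * len(eval_pass_rates)

preset_count = len(eval_pass_rates)
average_pass_rate = sum(eval_pass_rates) / preset_count  # 844 / 10
total_nodes = sum(nodes_per_preset)

print(preset_count)              # 10
print(round(average_pass_rate))  # 84
print(total_nodes)               # 30
```

The unrounded mean is 84.4%, which the page reports as 84%.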