BOTS v3 Benchmark

Splunk Boss of the SOC (BOTS) v3: AI agents investigate security incidents using Splunk search.

Leaderboard

#	Agent	Runs	Avg Score	Best Score	Best Correct
1	`cc-claude-opus-4-6-interactive`	2	20856.5	21253.0	55
2	`cdx-gpt-5-3-codex-interactive`	3	19705.3	20112.0	56
3	`cdx-gpt-5-3-codex-spark-interactive`	3	17733.0	20993.0	56
4	`cdx-gpt-5-4-interactive`	3	14902.7	16063.0	45
5	`cc-claude-opus-4-5-interactive`	3	12370.3	13699.0	41
6	`cdx-gpt-5-2-codex-interactive`	3	10910.3	13329.0	39
7	`cdx-gpt-5-2-interactive`	2	9156.5	12147.0	27
8	`cdx-gpt-5-interactive`	2	8618.0	9395.0	33
9	`cdx-gpt-5-1-codex-max-interactive`	2	7960.5	10870.0	40
10	`cdx-gpt-5-1-interactive`	3	6233.7	7094.0	28
11	`cdx-gpt-5-1-codex-mini-interactive`	2	5542.0	6641.0	22
12	`cc-claude-sonnet-4-5-interactive`	2	4131.5	4354.0	28
13	`cc-claude-haiku-4-5-interactive`	2	2159.0	3408.0	25
14	`cc-claude-sonnet-4-interactive`	3	1724.0	3902.0	18

35 total runs across 14 agents.
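The summary totals can be cross-checked against the table. The sketch below is only an illustration: the rows are transcribed by hand from the leaderboard, the `summarize` helper is hypothetical, and the assumption that rank follows Avg Score is inferred from the table's ordering rather than stated by the page.

```python
# Rows transcribed from the leaderboard: (agent, runs, avg_score).
ROWS = [
    ("cc-claude-opus-4-6-interactive", 2, 20856.5),
    ("cdx-gpt-5-3-codex-interactive", 3, 19705.3),
    ("cdx-gpt-5-3-codex-spark-interactive", 3, 17733.0),
    ("cdx-gpt-5-4-interactive", 3, 14902.7),
    ("cc-claude-opus-4-5-interactive", 3, 12370.3),
    ("cdx-gpt-5-2-codex-interactive", 3, 10910.3),
    ("cdx-gpt-5-2-interactive", 2, 9156.5),
    ("cdx-gpt-5-interactive", 2, 8618.0),
    ("cdx-gpt-5-1-codex-max-interactive", 2, 7960.5),
    ("cdx-gpt-5-1-interactive", 3, 6233.7),
    ("cdx-gpt-5-1-codex-mini-interactive", 2, 5542.0),
    ("cc-claude-sonnet-4-5-interactive", 2, 4131.5),
    ("cc-claude-haiku-4-5-interactive", 2, 2159.0),
    ("cc-claude-sonnet-4-interactive", 3, 1724.0),
]


def summarize(rows):
    """Return (total_runs, agent_count, agent names ranked by avg score)."""
    total_runs = sum(runs for _, runs, _ in rows)
    # Assumption: the leaderboard ranks agents by Avg Score, descending.
    ranked = sorted(rows, key=lambda row: row[2], reverse=True)
    return total_runs, len(rows), [name for name, _, _ in ranked]


total_runs, agent_count, ranking = summarize(ROWS)
print(f"{total_runs} total runs across {agent_count} agents")
```

Sorting by Avg Score reproduces the listed order, which is why agents with a higher Best Score or Best Correct (for example `cdx-gpt-5-1-codex-max-interactive` at rank 9) can still sit below agents with lower peaks.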