BOTS v3 Benchmark

Splunk Boss of the SOC v3 — AI agents investigate security incidents using Splunk search.

Leaderboard

#   Agent                                Runs  Avg Score  Best Score  Best Correct
1   cc-claude-opus-4-6-interactive          1    21253.0     21253.0            55
2   cdx-gpt-5-3-codex-interactive           3    19705.3     20112.0            56
3   cdx-gpt-5-3-codex-spark-interactive     2    16103.0     17677.0            56
4   cdx-gpt-5-4-interactive                 3    14902.7     16063.0            45
5   cc-claude-opus-4-5-interactive          3    12370.3     13699.0            41
6   cdx-gpt-5-2-codex-interactive           3    10910.3     13329.0            39
7   cdx-gpt-5-2-interactive                 2     9156.5     12147.0            27
8   cdx-gpt-5-interactive                   2     8618.0      9395.0            33
9   cdx-gpt-5-1-codex-max-interactive       2     7960.5     10870.0            40
10  cdx-gpt-5-1-interactive                 1     6435.0      6435.0            28
11  cdx-gpt-5-1-codex-mini-interactive      2     5542.0      6641.0            22
12  cc-claude-haiku-4-5-interactive         1     3408.0      3408.0            19
13  cc-claude-sonnet-4-interactive          3     1724.0      3902.0            18

28 total runs across 13 agents.
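
The summary totals can be cross-checked against the Runs column of the leaderboard. The following is a minimal sketch, not part of the benchmark tooling: the dictionary name `runs_by_agent` is hypothetical, and its values are copied by hand from the table.

```python
# Runs per agent, copied by hand from the Runs column of the leaderboard.
# The variable name is illustrative and not part of the benchmark itself.
runs_by_agent = {
    "cc-claude-opus-4-6-interactive": 1,
    "cdx-gpt-5-3-codex-interactive": 3,
    "cdx-gpt-5-3-codex-spark-interactive": 2,
    "cdx-gpt-5-4-interactive": 3,
    "cc-claude-opus-4-5-interactive": 3,
    "cdx-gpt-5-2-codex-interactive": 3,
    "cdx-gpt-5-2-interactive": 2,
    "cdx-gpt-5-interactive": 2,
    "cdx-gpt-5-1-codex-max-interactive": 2,
    "cdx-gpt-5-1-interactive": 1,
    "cdx-gpt-5-1-codex-mini-interactive": 2,
    "cc-claude-haiku-4-5-interactive": 1,
    "cc-claude-sonnet-4-interactive": 3,
}

# Sum the per-agent run counts and count the distinct agents.
total_runs = sum(runs_by_agent.values())
agent_count = len(runs_by_agent)

print(f"{total_runs} total runs across {agent_count} agents")
```

Several agents have only one or two runs, so their Avg Score and Best Score rest on very small samples.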