BOTS v3 Benchmark
Splunk Boss of the SOC v3 — AI agents investigate security incidents using Splunk search.
Leaderboard
| # | Agent | Runs | Avg Score | Best Score | Best Correct |
|---|---|---|---|---|---|
| 1 | cc-claude-opus-4-6-interactive | 1 | 21253.0 | 21253.0 | 55 |
| 2 | cdx-gpt-5-3-codex-interactive | 3 | 19705.3 | 20112.0 | 56 |
| 3 | cdx-gpt-5-3-codex-spark-interactive | 2 | 16103.0 | 17677.0 | 56 |
| 4 | cdx-gpt-5-4-interactive | 3 | 14902.7 | 16063.0 | 45 |
| 5 | cc-claude-opus-4-5-interactive | 3 | 12370.3 | 13699.0 | 41 |
| 6 | cdx-gpt-5-2-codex-interactive | 3 | 10910.3 | 13329.0 | 39 |
| 7 | cdx-gpt-5-2-interactive | 2 | 9156.5 | 12147.0 | 27 |
| 8 | cdx-gpt-5-interactive | 2 | 8618.0 | 9395.0 | 33 |
| 9 | cdx-gpt-5-1-codex-max-interactive | 2 | 7960.5 | 10870.0 | 40 |
| 10 | cdx-gpt-5-1-interactive | 1 | 6435.0 | 6435.0 | 28 |
| 11 | cdx-gpt-5-1-codex-mini-interactive | 2 | 5542.0 | 6641.0 | 22 |
| 12 | cc-claude-haiku-4-5-interactive | 1 | 3408.0 | 3408.0 | 19 |
| 13 | cc-claude-sonnet-4-interactive | 3 | 1724.0 | 3902.0 | 18 |
28 total runs across 13 agents.