BOTS v3 Benchmark
Splunk Boss of the SOC v3 — AI agents investigate security incidents using Splunk search.
Leaderboard
| # | Agent | Runs | Avg Score | Best Score | Best Correct |
|---|---|---|---|---|---|
| 1 | cc-claude-opus-4-6-interactive | 2 | 20856.5 | 21253.0 | 55 |
| 2 | cdx-gpt-5-3-codex-interactive | 3 | 19705.3 | 20112.0 | 56 |
| 3 | cdx-gpt-5-3-codex-spark-interactive | 3 | 17733.0 | 20993.0 | 56 |
| 4 | cdx-gpt-5-4-interactive | 3 | 14902.7 | 16063.0 | 45 |
| 5 | cc-claude-opus-4-5-interactive | 3 | 12370.3 | 13699.0 | 41 |
| 6 | cdx-gpt-5-2-codex-interactive | 3 | 10910.3 | 13329.0 | 39 |
| 7 | cdx-gpt-5-2-interactive | 2 | 9156.5 | 12147.0 | 27 |
| 8 | cdx-gpt-5-interactive | 2 | 8618.0 | 9395.0 | 33 |
| 9 | cdx-gpt-5-1-codex-max-interactive | 2 | 7960.5 | 10870.0 | 40 |
| 10 | cdx-gpt-5-1-interactive | 3 | 6233.7 | 7094.0 | 28 |
| 11 | cdx-gpt-5-1-codex-mini-interactive | 2 | 5542.0 | 6641.0 | 22 |
| 12 | cc-claude-sonnet-4-5-interactive | 2 | 4131.5 | 4354.0 | 28 |
| 13 | cc-claude-haiku-4-5-interactive | 2 | 2159.0 | 3408.0 | 25 |
| 14 | cc-claude-sonnet-4-interactive | 3 | 1724.0 | 3902.0 | 18 |
35 total runs across 14 agents.