Benchmark Configuration
Challenges
Mode & Tiers
Simple: Direct exploit
CoT: Discover → Plan → Execute
GEPA: Self-directed optimization
Benchmark Runs
All Results
Challenge | Level | Tags | Tier | Model | Result | Time | Att | Mode | Actions
Statistics
By Tier
By Model
By Difficulty
By Tag
Whitebox vs Blackbox
Model Comparison
Overall Ranking

Weighted scoring: Easy=1pt, Medium=2pt, Hard=3pt
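
The weighting above can be sketched as a small scoring function. This is only an illustration: the names `WEIGHTS` and `weighted_score` are hypothetical, and it assumes that a challenge earns its points only when solved. How the dashboard actually normalizes or ranks the totals is not stated here.

```python
# Point values taken from the scoring rule: Easy=1pt, Medium=2pt, Hard=3pt.
WEIGHTS = {"easy": 1, "medium": 2, "hard": 3}


def weighted_score(results):
    """Return (earned, possible) points for a list of (difficulty, solved) pairs.

    Assumption: unsolved challenges earn zero points.
    """
    results = list(results)  # allow any iterable; it is traversed twice
    earned = sum(WEIGHTS[d.lower()] for d, solved in results if solved)
    possible = sum(WEIGHTS[d.lower()] for d, _ in results)
    return earned, possible


if __name__ == "__main__":
    # Hypothetical run: one Easy and one Hard solved, one Medium failed.
    runs = [("Easy", True), ("Medium", False), ("Hard", True)]
    print(weighted_score(runs))
```

Returning both earned and possible points leaves the choice of a ratio or a raw total to the caller, since the ranking rule itself is not specified.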

Best by Tier
Best by Difficulty
Best by Tag
Best by Mode
Best in Class