coeval · co-eval-task-001 benchmark

Phase

phase
round
paused
participants
FOSS
non-FOSS

Winners (v3.1 final, rolling-mean pick)

FOSS: n ≥ 50 samples

non-FOSS: closed-AI reference

Final-report bundle on protondrive:co-eval-task-001-final-report-2026-05-20/ — main draft (FOSS) + Appendix A (Cheuk stats from both winners) + Appendix B (non-FOSS draft).

Leaderboard (rolling mean across all rounds, n ≥ 50)

# DE group mean std n bar
loading…
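
The leaderboard columns (mean, std, n) and the n ≥ 50 cut-off could be computed along the lines of the sketch below. This is only an illustration under assumptions: the page does not say how scores are stored, whether std is the population or sample form, or which direction ranks first. The `leaderboard` function, the `scores_by_de` mapping, and the descending sort are hypothetical, and the rolling mean is read here as the plain mean over all rounds seen so far.

```python
from statistics import mean, pstdev

MIN_SAMPLES = 50  # the "n >= 50" cut-off named in the leaderboard heading


def leaderboard(scores_by_de, min_samples=MIN_SAMPLES):
    """Build leaderboard rows from a mapping of DE name -> list of scores.

    Hypothetical helper: the real data layout is not shown on the page.
    """
    rows = []
    for de, scores in scores_by_de.items():
        n = len(scores)
        if n < min_samples:
            # Too few samples to be listed.
            continue
        rows.append({
            "de": de,
            "mean": mean(scores),   # mean across all rounds so far
            "std": pstdev(scores),  # population std; sample std is equally plausible
            "n": n,
        })
    # Assumption: a higher mean ranks first.
    rows.sort(key=lambda row: row["mean"], reverse=True)
    return rows


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    demo = {"de-a": [0.4] * 50, "de-b": [0.9] * 49, "de-c": [0.7] * 60}
    for rank, row in enumerate(leaderboard(demo), start=1):
        print(rank, row["de"], row["mean"], row["std"], row["n"])
```

In the demo, `de-b` is dropped because it has only 49 samples, even though its mean is the highest.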

Convergence (tmean_clean per round-pair)

Lower values are closer to convergence. The v3.1 (cleaned) series is the one after phantom-DE cleanup. The threshold rule fires when tmean_clean ≤ 0.063 (≈ 2π / 100) for 3 consecutive round-pairs.
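
The threshold rule described above can be sketched as a simple streak counter. This is a minimal illustration, not the benchmark's actual code: the function name `first_convergence_index` and the sample values are hypothetical, and only the 0.063 threshold and the run of 3 consecutive pairs come from the text.

```python
import math

THRESHOLD = 0.063  # value stated on the page; 2*pi/100 is about 0.0628
RUN_LENGTH = 3     # consecutive round-pairs required


def first_convergence_index(tmean_clean, threshold=THRESHOLD, run_length=RUN_LENGTH):
    """Return the index of the round-pair that completes the first run of
    `run_length` consecutive values <= threshold, or None if there is none."""
    streak = 0
    for i, value in enumerate(tmean_clean):
        if value <= threshold:
            streak += 1
            if streak >= run_length:
                return i
        else:
            # A single pair above the threshold resets the run.
            streak = 0
    return None


if __name__ == "__main__":
    # Made-up per-pair values, for illustration only.
    pairs = [0.21, 0.09, 0.061, 0.070, 0.060, 0.058, 0.049]
    print(first_convergence_index(pairs))
    print(round(2 * math.pi / 100, 4))
```

With these sample values the run starting at 0.061 is broken by 0.070, so the rule only fires on the last pair.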

Legend: v3.1 (cleaned); v3.2 / current run; convergence threshold (0.063)