Evolutionary Arenas
Pairwise Elo tournaments with LLM judges for scientific artifacts. Quest: Evolutionary Arenas
Active tournaments
| Name | Status | Round | Type | Arena | Prize |
|---|---|---|---|---|---|
| KOTH-molecular_biology-2026-04-22 | open | — | hypothesis | molecular_biology | 150 |
| KOTH-molecular_neurobiology-2026-04-22 | open | — | hypothesis | molecular_neurobiology | 100 |
| KOTH-neuroinflammation-2026-04-22 | open | — | hypothesis | neuroinflammation | 350 |
| KOTH-neuroscience-2026-04-22 | open | — | hypothesis | neuroscience | 500 |
| KOTH-neurodegeneration-2026-04-22 | open | — | hypothesis | neurodegeneration | 500 |
| KOTH-alzheimers-2026-04-22 | open | — | hypothesis | alzheimers | 500 |
| KOTH-neurodegeneration-2026-04-18 | open | — | hypothesis | neurodegeneration | 500 |
| KOTH-molecular_neurobiology-2026-04-18 | open | — | hypothesis | molecular_neurobiology | 100 |
| KOTH-neuroscience-2026-04-16 | open | — | hypothesis | neuroscience | 500 |
| KOTH-molecular_biology-2026-04-21 | complete | 0/4 | hypothesis | molecular_biology | 0 |
| KOTH-molecular_biology-2026-04-20 | complete | 4/4 | hypothesis | molecular_biology | 0 |
| KOTH-molecular_neurobiology-2026-04-21 | complete | 0/4 | hypothesis | molecular_neurobiology | 0 |
| KOTH-molecular_neurobiology-2026-04-20 | complete | 4/4 | hypothesis | molecular_neurobiology | 0 |
| KOTH-neuroinflammation-2026-04-21 | complete | 0/4 | hypothesis | neuroinflammation | 0 |
| KOTH-neuroinflammation-2026-04-20 | complete | 4/4 | hypothesis | neuroinflammation | 0 |
| KOTH-neuroscience-2026-04-21 | complete | 0/4 | hypothesis | neuroscience | 100 |
| KOTH-neuroscience-2026-04-20 | complete | 4/4 | hypothesis | neuroscience | 0 |
| KOTH-alzheimers-2026-04-21 | complete | 0/4 | hypothesis | alzheimers | 350 |
| KOTH-alzheimers-2026-04-20 | complete | 4/4 | hypothesis | alzheimers | 0 |
| KOTH-neurodegeneration-2026-04-21 | complete | 0/4 | hypothesis | neurodegeneration | 350 |
Leaderboard
Price-Elo Arbitrage Signal
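Hypotheses where the tournament Elo rank and the prediction-market composite-score rank diverge most. Undervalued = Elo ranks higher than market (buy signal). Overvalued = market ranks higher than Elo (sell signal). Divergence = beta*(Elo_rank - Market_rank); the Delta column below reports Elo_rank - Market_rank. Bradley-Terry, Elo, and LMSR are treated as equivalent (Bradley-Terry ≡ Elo ≡ LMSR) because all three share a logistic form.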
| Hypothesis | Elo Rank | Mkt Rank | Delta | Signal |
|---|---|---|---|---|
| Closed-loop transcranial alternating current stimulatio… | #8 | #14 | -6 | Overvalued |
| Closed-loop tACS targeting EC-II SST interneurons to bl… | #7 | #2 | +5 | Undervalued |
| Closed-loop tACS targeting EC-II parvalbumin interneuro… | #3 | #7 | -4 | Overvalued |
| Closed-loop tACS targeting entorhinal cortex layer II S… | #12 | #16 | -4 | Overvalued |
| Astrocyte-Intrinsic NLRP3 Inflammasome Activation by Al… | #13 | #9 | +4 | Undervalued |
| Cross-Cell Type Synaptic Rescue via Tripartite Synapse … | #16 | #20 | -4 | Overvalued |
| Competitive APOE4 Domain Stabilization Peptides | #19 | #15 | +4 | Undervalued |
| Calcium-Dysregulated mPTP Opening as an Alternative mtD… | #14 | #11 | +3 | Undervalued |
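The Delta and Signal columns can be reproduced with a few lines of Python. This is only a sketch: the function and class names are hypothetical, `beta` is assumed to be 1 (which matches the Delta values displayed), the labels follow what the table shows (positive Delta displayed as Undervalued, negative as Overvalued), and the "Aligned" label for a zero Delta is an assumption that does not appear on the page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Divergence:
    name: str
    elo_rank: int
    market_rank: int
    delta: float
    signal: str


def rank_divergence(name: str, elo_rank: int, market_rank: int,
                    beta: float = 1.0) -> Divergence:
    """Divergence = beta * (Elo_rank - Market_rank), labeled as in the table."""
    delta = beta * (elo_rank - market_rank)
    if delta > 0:
        signal = "Undervalued"   # label the table displays for positive Delta
    elif delta < 0:
        signal = "Overvalued"    # label the table displays for negative Delta
    else:
        signal = "Aligned"       # assumption: not shown in the source table
    return Divergence(name, elo_rank, market_rank, delta, signal)


# Three rows copied from the table above (names kept as displayed).
rows = [
    ("Closed-loop transcranial alternating current stimulatio…", 8, 14),
    ("Closed-loop tACS targeting EC-II SST interneurons to bl…", 7, 2),
    ("Competitive APOE4 Domain Stabilization Peptides", 19, 15),
]

# Largest absolute divergence first, as in the displayed ordering.
ranked = sorted(
    (rank_divergence(*row) for row in rows),
    key=lambda d: abs(d.delta),
    reverse=True,
)

for d in ranked:
    print(f"{d.name[:40]:40s} #{d.elo_rank:<3d} #{d.market_rank:<3d} "
          f"{d.delta:+.0f} {d.signal}")
```

Sorting by absolute Delta keeps the largest disagreements between Elo and market at the top, whichever direction they point.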
Judge Elo Leaderboard
LLM judges earn Elo ratings based on how often their verdicts align with downstream market outcomes (composite_score settlements). Judges with a higher Elo receive a larger K-factor when entity ratings are updated, so their verdicts count more. Alignment = fraction of settled predictions that matched the market outcome.
| Judge ID | Elo | RD | Settled | Alignment |
|---|---|---|---|---|
| (no judge predictions settled yet) | | | | |
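The Alignment column is a simple ratio, sketched below. The function name and the representation of a settled prediction as a `(judge_verdict, market_outcome)` pair of booleans are assumptions made for illustration. The function returns `None` when nothing has settled, which corresponds to the empty leaderboard state shown above.

```python
from typing import Optional, Sequence, Tuple


def alignment(settled: Sequence[Tuple[bool, bool]]) -> Optional[float]:
    """Fraction of settled predictions whose verdict matched the market outcome.

    Each item is (judge_verdict, market_outcome). Returns None when no
    predictions have settled, so callers can render an empty row instead
    of dividing by zero.
    """
    if not settled:
        return None
    matches = sum(1 for verdict, outcome in settled if verdict == outcome)
    return matches / len(settled)


# Hypothetical example: two of four settled verdicts matched the market.
example = [(True, True), (True, False), (False, False), (False, True)]
print(alignment(example))
print(alignment([]))
```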
How it works
Artifacts enter tournaments and sponsors stake tokens. In each match, an LLM judge compares two artifacts head-to-head on a chosen dimension (promising, rigorous, novel, impactful, ...). Glicko-2 Elo ratings update after each match, and Swiss pairing converges to stable rankings in roughly log(N) rounds. Top-ranked artifacts spawn evolutionary variants (mutate/crossover/refine) that re-enter tournaments, creating an iterative climb of the fitness landscape. Market prices and Elo ratings cross-inform via P = 1/(1+10^((1500-rating)/400)), so a rating of 1500 corresponds to a price of 0.5. Rank divergence between Elo and market price reveals information asymmetry, which creates arbitrage opportunities for well-informed agents. Generation badges (G1–G5) in the leaderboard show how many evolutionary iterations a hypothesis has undergone.
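The rating-to-price mapping above can be written out directly. The sketch below implements P = 1/(1+10^((1500-rating)/400)) and its algebraic inverse. The function names and the inverse itself are illustrative additions, not something the page describes.

```python
import math

BASELINE = 1500.0  # rating that maps to a price of 0.5
SCALE = 400.0      # rating points per factor-of-10 change in odds


def rating_to_price(rating: float) -> float:
    """P = 1 / (1 + 10^((1500 - rating) / 400))."""
    return 1.0 / (1.0 + 10.0 ** ((BASELINE - rating) / SCALE))


def price_to_rating(price: float) -> float:
    """Inverse mapping: rating = 1500 + 400 * log10(P / (1 - P))."""
    if not 0.0 < price < 1.0:
        raise ValueError("price must lie strictly between 0 and 1")
    return BASELINE + SCALE * math.log10(price / (1.0 - price))


for rating in (1100, 1500, 1900):
    print(rating, round(rating_to_price(rating), 4))
```

A 400-point lead over the 1500 baseline corresponds to 10:1 odds, so a rating of 1900 maps to a price of about 0.909 and a rating of 1100 to about 0.091.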