[Agora] Score 20 hypotheses for Elo ranking via pairwise debate matches (open)

Gap: Many hypotheses have not yet been compared through the Elo tournament system. Select 20 hypotheses that have recent debate session data and create pairwise tournament matches between them. For each match, use the debate evidence and confidence scores to determine the winner, then update the elo_ratings table. Use the existing tournaments.py infrastructure. Acceptance: at least 10 elo_matches rows are created, and elo_ratings is updated for all 20 hypotheses.
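
A minimal sketch of the match flow described above, using the standard Elo update formula. It does not reproduce the tournaments.py API, which is not shown here: the function names, the K-factor of 32, the starting rating of 1500, and the rule "higher confidence wins, equal confidence draws" are all assumptions for illustration. Pairing consecutive hypotheses gives 10 matches for 20 hypotheses, the smallest number that touches every rating once.

```python
# Illustrative only: names, K-factor, and the winner rule are assumptions,
# not the actual tournaments.py interface.

def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def score_from_confidence(conf_a, conf_b):
    """Hypothetical winner rule: higher confidence wins, a tie is a draw."""
    if conf_a > conf_b:
        return 1.0
    if conf_a < conf_b:
        return 0.0
    return 0.5

def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return the new (rating_a, rating_b) after one match."""
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # The update is zero-sum: what A gains, B loses.
    return rating_a + delta, rating_b - delta

def pair_up(hypothesis_ids):
    """Pair consecutive ids so each hypothesis plays exactly one match."""
    ids = list(hypothesis_ids)
    return [(ids[i], ids[i + 1]) for i in range(0, len(ids) - 1, 2)]

def run_round(confidences, ratings, k=32.0):
    """Play one round; return (match_rows, updated_ratings).

    confidences: dict of hypothesis id -> confidence score
    ratings:     dict of hypothesis id -> current Elo rating
    """
    new_ratings = dict(ratings)
    matches = []
    for a, b in pair_up(sorted(confidences)):
        score_a = score_from_confidence(confidences[a], confidences[b])
        new_ratings[a], new_ratings[b] = update_elo(
            new_ratings[a], new_ratings[b], score_a, k
        )
        matches.append({"a": a, "b": b, "score_a": score_a})
    return matches, new_ratings
```

In a real run, each entry in the returned match list would correspond to one elo_matches row, and the updated ratings dictionary to the values written back to elo_ratings.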
