[Forge] Benchmark quality signal pipeline — link 227 scored submissions to hypotheses, surface overall leaderboard, update hypothesis benchmark ranks
Status: done

6 benchmarks have 227 submissions, all with primary_score populated (avg 0.6635). The benchmark evaluation harness (daf64586, merged 2026-04-28) added a per-benchmark leaderboard API. What's still missing: (1) an overall cross-benchmark leaderboard, (2) hypothesis linkage from submissions via model_artifact_id, (3) benchmark rank on hypothesis records.

NOTE: The per-benchmark leaderboard at GET /api/benchmarks/{benchmark_id}/leaderboard already exists. Build what's still missing.

WHAT TO DO:
1. Fix status tracking:
   UPDATE benchmark_submissions SET status='scored' WHERE primary_score IS NOT NULL AND status='accepted'
2. Link submissions to hypotheses:
   SELECT bs.id, bs.model_artifact_id, bs.primary_score, a.hypothesis_id FROM benchmark_submissions bs LEFT JOIN artifacts a ON a.id = bs.model_artifact_id WHERE a.hypothesis_id IS NOT NULL
3. For each hypothesis with benchmark submissions: compute best primary_score across all benchmarks and update a hypothesis field (benchmark_top_score or similar; check if column exists, add migration if needed)
4. Add GET /api/benchmarks/leaderboard (overall, not per-benchmark) to api.py or api_routes/forge.py returning top-20 submissions across all benchmarks with benchmark_title, submitter_id, primary_score, hypothesis_id
5. Add /forge/benchmarks HTML page rendering the overall leaderboard
6. Test: GET /api/benchmarks/leaderboard returns 200 with valid JSON

DO NOT: delete primary_score values; change benchmark task definitions; touch scidex.db; duplicate the per-benchmark endpoint that already exists.

Spec: docs/planning/specs/forge_benchmark_leaderboard_quality_signal_spec.md (merged after task creation)
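
A minimal sketch of steps 1 through 4, run against a throwaway in-memory SQLite database rather than the real one. The names benchmark_submissions, artifacts, model_artifact_id, primary_score, status, hypothesis_id, submitter_id and benchmark_top_score come from the task text. Everything else is an assumption made for illustration: the benchmarks and hypotheses table names, the benchmark_id and title columns, the sample rows, the helper function names, and the idea that a higher primary_score is better. The merged implementation may differ.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in database (schema details are assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.executescript("""
        CREATE TABLE benchmarks (id TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE artifacts (id TEXT PRIMARY KEY, hypothesis_id TEXT);
        CREATE TABLE hypotheses (id TEXT PRIMARY KEY, benchmark_top_score REAL);
        CREATE TABLE benchmark_submissions (
            id TEXT PRIMARY KEY, benchmark_id TEXT, submitter_id TEXT,
            model_artifact_id TEXT, primary_score REAL, status TEXT);
        INSERT INTO benchmarks VALUES ('b1', 'Bench One'), ('b2', 'Bench Two');
        INSERT INTO artifacts VALUES ('a1', 'h1'), ('a2', NULL);
        INSERT INTO hypotheses VALUES ('h1', NULL), ('h2', NULL);
        INSERT INTO benchmark_submissions VALUES
            ('s1', 'b1', 'u1', 'a1', 0.70, 'accepted'),
            ('s2', 'b2', 'u2', 'a1', 0.90, 'accepted'),
            ('s3', 'b1', 'u3', 'a2', 0.80, 'accepted'),
            ('s4', 'b2', 'u4', NULL, NULL, 'accepted');
    """)
    return conn


def run_pipeline(conn: sqlite3.Connection) -> None:
    """Steps 1-3: mark scored rows, then store each hypothesis's best score."""
    # Step 1: the status fix, verbatim from the task description.
    conn.execute(
        "UPDATE benchmark_submissions SET status='scored' "
        "WHERE primary_score IS NOT NULL AND status='accepted'"
    )
    # Steps 2-3: link submissions to hypotheses through artifacts and keep
    # the maximum primary_score per hypothesis. Hypotheses without any
    # linked, scored submission are left untouched.
    conn.execute("""
        UPDATE hypotheses
        SET benchmark_top_score = (
            SELECT MAX(bs.primary_score)
            FROM benchmark_submissions bs
            JOIN artifacts a ON a.id = bs.model_artifact_id
            WHERE a.hypothesis_id = hypotheses.id)
        WHERE id IN (
            SELECT a.hypothesis_id
            FROM benchmark_submissions bs
            JOIN artifacts a ON a.id = bs.model_artifact_id
            WHERE a.hypothesis_id IS NOT NULL
              AND bs.primary_score IS NOT NULL)
    """)
    conn.commit()


def overall_leaderboard(conn: sqlite3.Connection, limit: int = 20) -> list[dict]:
    """Step 4: top submissions across all benchmarks, best score first."""
    rows = conn.execute("""
        SELECT b.title AS benchmark_title, bs.submitter_id,
               bs.primary_score, a.hypothesis_id
        FROM benchmark_submissions bs
        JOIN benchmarks b ON b.id = bs.benchmark_id
        LEFT JOIN artifacts a ON a.id = bs.model_artifact_id
        WHERE bs.primary_score IS NOT NULL
        ORDER BY bs.primary_score DESC
        LIMIT ?
    """, (limit,)).fetchall()
    return [dict(row) for row in rows]


if __name__ == "__main__":
    demo = build_demo_db()
    run_pipeline(demo)
    for entry in overall_leaderboard(demo):
        print(entry)
```

The leaderboard query keeps the LEFT JOIN on artifacts so that submissions without a linked hypothesis still appear, with hypothesis_id set to null. In the linkage query from step 2, the WHERE a.hypothesis_id IS NOT NULL filter makes the LEFT JOIN behave like an inner join, which is why the sketch uses a plain JOIN there.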

Completion Notes

Auto-release: work already on origin/main

Git Commits (3)

Squash merge: orchestra/task/d62d114f-benchmark-quality-signal-pipeline-link-2 (2 commits) (#1291) 2026-04-28
[Forge] Fix benchmark detail-page leaderboard query to include scored status [task:d62d114f-5177-42c4-9a0c-ead041c5cc63] 2026-04-28
[Forge] Surface benchmark quality signal leaderboard [task:d62d114f-5177-42c4-9a0c-ead041c5cc63] 2026-04-28
Payload JSON
{
  "completion_shas": [
    "6d98f3b2f"
  ],
  "completion_shas_checked_at": ""
}
