Six benchmarks have 227 submissions, all with primary_score populated (avg 0.6635). The benchmark evaluation harness (daf64586, merged 2026-04-28) added a per-benchmark leaderboard API. Still missing: (1) an overall cross-benchmark leaderboard, (2) hypothesis linkage from submissions via model_artifact_id, (3) benchmark rank on hypothesis records.
NOTE: The per-benchmark leaderboard at GET /api/benchmarks/{benchmark_id}/leaderboard already exists. Build what's still missing.
WHAT TO DO:
1. Fix status tracking: UPDATE benchmark_submissions SET status='scored' WHERE primary_score IS NOT NULL AND status='accepted'
2. Link submissions to hypotheses: SELECT bs.id, bs.model_artifact_id, bs.primary_score, a.hypothesis_id FROM benchmark_submissions bs LEFT JOIN artifacts a ON a.id = bs.model_artifact_id WHERE a.hypothesis_id IS NOT NULL
3. For each hypothesis with benchmark submissions, compute the best primary_score across all benchmarks and write it to a hypothesis field (benchmark_top_score or similar — check whether the column exists and add a migration if not; see the first sketch after this list)
4. Add GET /api/benchmarks/leaderboard (overall, not per-benchmark) to api.py or api_routes/forge.py, returning the top 20 submissions across all benchmarks with benchmark_title, submitter_id, primary_score, and hypothesis_id (see the endpoint sketch after this list)
5. Add a /forge/benchmarks HTML page rendering the overall leaderboard (see the page sketch after this list)
6. Test that GET /api/benchmarks/leaderboard returns 200 with valid JSON (see the test sketch after this list)
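
For step 3, a minimal sketch of the score rollup, assuming a SQLite backing database with a hypotheses table keyed by id; the column name benchmark_top_score, the database path, and the function name are assumptions, not confirmed by the task.

```python
# Sketch only: backfill a best-benchmark-score field on hypotheses.
# Assumed: SQLite DB at db_path, hypotheses(id, ...) table, column name benchmark_top_score.
import sqlite3

def backfill_benchmark_top_scores(db_path: str = "forge.db") -> None:
    conn = sqlite3.connect(db_path)
    try:
        # Lightweight migration guard: add the column only if it is missing.
        cols = {row[1] for row in conn.execute("PRAGMA table_info(hypotheses)")}
        if "benchmark_top_score" not in cols:
            conn.execute("ALTER TABLE hypotheses ADD COLUMN benchmark_top_score REAL")

        # Best primary_score per hypothesis across all benchmarks,
        # joining submissions to hypotheses through the model artifact.
        conn.execute(
            """
            UPDATE hypotheses
            SET benchmark_top_score = (
                SELECT MAX(bs.primary_score)
                FROM benchmark_submissions bs
                JOIN artifacts a ON a.id = bs.model_artifact_id
                WHERE a.hypothesis_id = hypotheses.id
                  AND bs.primary_score IS NOT NULL
            )
            WHERE id IN (
                SELECT a.hypothesis_id
                FROM benchmark_submissions bs
                JOIN artifacts a ON a.id = bs.model_artifact_id
                WHERE a.hypothesis_id IS NOT NULL
            )
            """
        )
        conn.commit()
    finally:
        conn.close()
```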
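For step 4, a sketch of the overall leaderboard endpoint, assuming the app is Flask and that a benchmarks table with id and title columns exists and that benchmark_submissions has a benchmark_id column; those names, the blueprint, and the DB path are assumptions beyond the task text.

```python
# Sketch only: overall (cross-benchmark) leaderboard endpoint, assuming Flask.
import sqlite3
from flask import Blueprint, jsonify

forge_bp = Blueprint("forge_benchmarks", __name__)

@forge_bp.get("/api/benchmarks/leaderboard")
def overall_leaderboard():
    conn = sqlite3.connect("forge.db")  # assumed DB path
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        """
        SELECT b.title AS benchmark_title,
               bs.submitter_id,
               bs.primary_score,
               a.hypothesis_id
        FROM benchmark_submissions bs
        JOIN benchmarks b ON b.id = bs.benchmark_id
        LEFT JOIN artifacts a ON a.id = bs.model_artifact_id
        WHERE bs.primary_score IS NOT NULL
        ORDER BY bs.primary_score DESC
        LIMIT 20
        """
    ).fetchall()
    conn.close()
    return jsonify({"leaderboard": [dict(r) for r in rows]})
```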
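For step 5, a minimal page route, again assuming Flask; the template name and the fetch_overall_leaderboard_rows helper are hypothetical placeholders for whatever query helper the endpoint above ends up using.

```python
# Sketch only: HTML page for the overall leaderboard, assuming Flask templates.
from flask import render_template

@forge_bp.get("/forge/benchmarks")
def benchmarks_page():
    # Reuse the overall-leaderboard query and hand the rows to a template.
    rows = fetch_overall_leaderboard_rows()  # hypothetical helper wrapping the SQL above
    return render_template("forge/benchmarks_leaderboard.html", leaderboard=rows)
```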
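For step 6, a test sketch assuming pytest with a Flask test client fixture named client; the fixture name and the "leaderboard" response key are assumptions.

```python
# Sketch only: smoke test for the new endpoint, assuming a Flask test client fixture.
def test_overall_leaderboard_returns_valid_json(client):
    resp = client.get("/api/benchmarks/leaderboard")
    assert resp.status_code == 200
    payload = resp.get_json()
    assert payload is not None
    assert "leaderboard" in payload  # assumed response shape
```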
DO NOT: delete primary_score values; change benchmark task definitions; touch scidex.db; duplicate the per-benchmark endpoint that already exists.
Spec: docs/planning/specs/forge_benchmark_leaderboard_quality_signal_spec.md (merged after task creation)