[Agora] Compute data-support scores for 20 active hypotheses lacking evidence metrics

Status: Done

1,294 hypotheses have data_support_score = 0 or NULL. Select 20 active hypotheses without data-support scores, evaluate the quality and quantity of data backing each hypothesis (number of citations, study types, effect sizes where available), and assign a data_support_score from 0.0 to 1.0. Update the hypotheses table.

Verification:
- 20 hypotheses updated with data_support_score > 0
- Scores reflect actual evidence quality (not all identical)
- Before/after count of hypotheses without data-support scores decreases by 20

Use: psql dbname=scidex user=scidex_app host=localhost; hypotheses table; paper_cache for evidence lookup.

Completion Notes

Auto-release: work already on origin/main

Git Commits (8)

Squash merge: orchestra/task/80ffb77b-quest-engine-generate-tasks-from-quests (117 commits) (#179) 2026-04-26
Squash merge: orchestra/task/80ffb77b-quest-engine-generate-tasks-from-quests (116 commits) (#177) 2026-04-26
Squash merge: orchestra/task/80ffb77b-quest-engine-generate-tasks-from-quests (80 commits) (#143) 2026-04-26
[Agora] Work log: score 20 more hypotheses, reduce missing from 1194 to 1174 [task:f98a0d54-f3ab-4242-ac25-10288aeeadc7] 2026-04-26
[Agora] Fix data_support scoring: guard non-dict evidence items, tuple indexing for _PgRow [task:f98a0d54-f3ab-4242-ac25-10288aeeadc7] 2026-04-26
Squash merge: orchestra/task/f98a0d54-compute-data-support-scores-for-20-activ (2 commits) (#92) 2026-04-26
[Agora] Score 20 hypotheses data_support_score: range 0.500–0.950, reduce missing from 1274 to 1254 [task:f98a0d54-f3ab-4242-ac25-10288aeeadc7] 2026-04-26
[Agora] Work log: score 20 more hypotheses, reduce missing from 1294 to 1274 [task:f98a0d54-f3ab-4242-ac25-10288aeeadc7] 2026-04-26
Spec File

Goal

Populate data_support_score for active hypotheses so quality gates can distinguish computationally grounded hypotheses from literature-only or speculative claims. Scores should reflect actual linked analyses, datasets, KG edges, citations, or an explicit lack of supporting data.

Acceptance Criteria

☐ The selected active hypotheses have data_support_score values between 0 and 1
☐ Each score has a concise rationale from linked data, citations, KG edges, or caveats
☐ No support is fabricated where the evidence is absent
☐ The before/after missing-score count is recorded
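A minimal sketch of how these acceptance checks could be automated, assuming scored rows arrive as (id, score) tuples after the batch runs; the function name and signature are illustrative, not taken from the actual scripts:

```python
def verify_batch(before_missing: int, after_missing: int,
                 scored: list[tuple[str, float]], batch_size: int = 20) -> list[str]:
    """Return the list of failed acceptance checks (empty list = pass)."""
    failures = []
    scores = [s for _, s in scored]
    # Criterion: all scores within the valid 0-1 range
    if not all(0.0 <= s <= 1.0 for s in scores):
        failures.append("scores out of [0, 1] range")
    # Criterion: scores reflect evidence quality, i.e. not all identical
    if len(set(scores)) <= 1:
        failures.append("scores are all identical")
    # Criterion: missing-score count dropped by exactly the batch size
    if before_missing - after_missing != batch_size:
        failures.append("missing-score count did not drop by batch size")
    return failures
```

Recording before_missing and after_missing alongside the failures list also satisfies the last criterion (the before/after count is part of the log entry).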

Approach

  • Query active hypotheses where data_support_score IS NULL, ordered by impact, market, or composite score.
  • Inspect linked analyses, papers, datasets, KG edges, and existing evidence fields.
  • Assign calibrated scores and rationale using existing database write patterns.
  • Verify score ranges and count reduction.
Dependencies

  • quest-engine-ci - Generates this task when queue depth is low and data-support gaps exist.

Dependents

  • Hypothesis ranking, quality gates, and Exchange allocation depend on data-support scores.

Work Log

    2026-04-21 21:55 PT — Slot 76 (MiniMax)

    • Started: Task spec + AGENTS.md read
    • Approach: Calibrated scoring from 5 dimensions:
    - KG edge count (0-0.3): strong grounding gets 0.3, none gets 0
    - Evidence citations (0-0.4): count + quality (high/medium/year ≥ 2020)
    - Debate count (0-0.15): more scrutiny = higher score
    - Analysis linkage (0-0.1): formal derivation from debate
    - Artifact links (0-0.05): notebooks/datasets = computational grounding
    • Output: scripts/score_data_support.py — reusable, well-documented
    • Results:
    - Before: 203 promoted/debated missing, 747 total missing
    - 20 hypotheses scored: range 0.600–0.950
    - After: 183 promoted/debated missing, 727 total missing
    - All 20 scores verified in [0, 1] range
    • Scores: h-var-95b0f9a6bc (0.950, MAPT), h-var-261452bfb4 (0.950, ACSL4), h-var-22c38d11cd (0.950, ACSL4), h-var-ce41f0efd7 (0.950, TREM2), h-var-97b18b880d (0.950, ALOX15), h-0e675a41 (0.950, HDAC3), h-42f50a4a (0.950, APOE), h-var-f96e38ec20 (0.950, ACSL4), h-11795af0 (0.900, APOE), h-856feb98 (0.900, BDNF), h-var-9e8fc8fd3d (0.900, PVALB), h-d0a564e8 (0.900, APOE), h-807d7a82 (0.900, AQP4), h-48858e2a (0.900, TREM2), h-43f72e21 (0.900, PRKAA1), h-3f02f222 (0.900, BCL2L1), h-var-e4cae9d286 (0.850, LPCAT3), h-var-c56b26facf (0.850, LPCAT3), h-9e9fee95 (0.700, HCRTR1/HCRTR2), hyp-SDA-2026-04-12-20260411-082446-2c1c9e2d-2 (0.600, CHRNA7/BACE1 — no KG edges)
    • Committed: scripts/score_data_support.py [task:f7f4133c-4b99-48ef-888b-fa1b08387a24]
    • Result: Done — 20 hypotheses scored with validated evidence-based rationales
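The five-dimension rubric described in this entry can be sketched as a pure scoring function. The weights (0.3 / 0.4 / 0.15 / 0.1 / 0.05) come from the log above; the saturation caps and input shape are assumptions, not the actual scripts/score_data_support.py implementation:

```python
def data_support_score(kg_edges: int, citations: int, debates: int,
                       has_analysis: bool, artifacts: int) -> float:
    """Combine the five rubric dimensions into a 0.0-1.0 score.

    Weights per the work log: KG edges 0.3, citations 0.4,
    debates 0.15, analysis linkage 0.1, artifact links 0.05.
    """
    score = 0.0
    score += min(kg_edges, 3) / 3 * 0.30   # KG grounding, saturating at 3 edges (assumed cap)
    score += min(citations, 4) / 4 * 0.40  # citation count, capped (quality weighting omitted here)
    score += min(debates, 3) / 3 * 0.15    # more scrutiny = higher score
    score += 0.10 if has_analysis else 0.0 # formal derivation from debate
    score += min(artifacts, 1) * 0.05      # notebooks/datasets = computational grounding
    return round(score, 3)
```

By construction the result stays in [0, 1]: a hypothesis saturating every dimension scores 1.0, and one with no linked evidence scores 0.0.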

    2026-04-26 19:40 UTC — Slot 42 (Claude Sonnet 4.6) [task:4a7ec4f5-b5f5-443c-89db-1a22b67165e9]

    • Started: AGENTS.md read, DB connection established via scidex.core.database.get_db()
    • Before: 1008 hypotheses missing data_support_score (all statuses: proposed 722, archived 134, debated 97, promoted 41, active 9, open 5)
    • Approach: Same 5-dimension rubric. Fixed bug in score_hypothesis() where string items in evidence_for (PMID strings or claim text) caused AttributeError on .get(). Now handles mixed lists safely.
    • Processed: All 1008 hypotheses in 11 batches of 100, ordered by composite_score DESC.
    • After: 0 hypotheses missing data_support_score (1402 total; all have valid 0–1 scores)
    • Score distribution: 0.0–0.1: 128, 0.1–0.5: 174, 0.5–0.6: 582 (bulk with citations+debates), 0.6–0.9: 306, 0.9–1.0: 212
    • Script fix: Updated scripts/score_data_support.py to handle string items in evidence_for, fall back to citations_count column, and accept batch_size as CLI arg
    • Result: Done — all 1008 hypotheses now scored; DB updated directly
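The evidence_for fix described in this entry (string items raising AttributeError on .get()) suggests a type guard along these lines; the field names and the high/medium/year quality heuristic are assumptions rather than the script's actual code:

```python
def count_quality_citations(evidence_for) -> tuple[int, int]:
    """Count (total, high_quality) citations from a mixed evidence list.

    Items may be dicts ({"pmid": ..., "quality": ..., "year": ...}),
    bare PMID strings, or claim text; non-dict items count toward the
    total as citations of unknown quality instead of raising.
    """
    if not isinstance(evidence_for, list):
        return 0, 0
    total = high = 0
    for item in evidence_for:
        total += 1
        if isinstance(item, dict):  # only dicts carry quality metadata
            quality = item.get("quality", "")
            year = item.get("year", 0)
            if quality in ("high", "medium") or (isinstance(year, int) and year >= 2020):
                high += 1
    return total, high
```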

    2026-04-26 12:30 PT — Slot 73 (MiniMax)

    • Started: AGENTS.md read, DB connection established via scidex.core.database.get_db()
    • Approach: Same 5-dimension calibrated scoring (KG edges 0-0.3, citations 0-0.4, debates 0-0.15, analysis linkage 0-0.1, artifacts 0-0.05). Rebased against latest main (ea2945ab9) before running.
    • Results:
    - Before: 1294 hypotheses missing data_support_score
    - 20 hypotheses scored: range 0.600–0.950
    - After: 1274 missing (reduction of 20)
    - Score distribution: low=16, mid=29, good=73, high=10 (of 128 total scored)
    - All scores verified in [0, 1] range
    • 20 scored hypotheses: h-var-70a95f9d57 (0.850, LPCAT3), hyp-SDA-2026-04-12-20260411-082446-2c1c9e2d-3 (0.600, CHRNA7/CHRM1), h-44195347 (0.900, APOE), h-b7898b79 (0.900, CLOCK), h-a20e0cbb (0.900, APOE), h-1a34778f (0.700, CGAS/STING1/DNASE2), h-fd1562a3 (0.900, COX4I1), h-d2722680 (0.900, TET2), h-seaad-v4-5a7a4079 (0.950, SIRT3), h-c8ccbee8 (0.900, AQP4), h-bb518928 (0.700, PLA2G6/PLA2G4A), h-7957bb2a (0.900, GPX4/SLC7A11), h-ec731b7a (0.900, G3BP1), h-84808267 (0.700, TFR1/LRP1/CAV1/ABCB1), h-a1b56d74 (0.900, HK2), h-98b431ba (0.900, TFAM), h-8fe389e8 (0.950, HDAC), h-99b4e2d2 (0.900, APOE), h-5706bbd7 (0.900, BMAL1), h-8d270062 (0.900, CACNA1G)
    • Verification:
    - All 20 scores between 0 and 1
    - Missing count reduced from 1294 to 1274 (-20)
    - Prior scored hypotheses (h-var-95b0f9a6bc, h-0e675a41, etc.) retained their scores
    • Result: Done — 20 more hypotheses scored; no commit needed (DB-only change)

    2026-04-26 23:33 UTC — Slot 52 (Codex) [task:0f26ef01-1934-4bce-b280-aa954e4a884a]

    • Started: AGENTS.md and shared Orchestra instructions read; staleness review found prior commit 5344be64d scored the then-missing NULL cohort, but the live DB still had 76 hypotheses with data_support_score IS NULL OR data_support_score = 0 (60 null, 16 zero).
    • Approach: Added scripts/score_missing_data_support_rubric.py, a reproducible one-shot scorer for the task's 4-point rubric. Because the current hypotheses schema does not have literal evidence_strength, data_source, or notes columns, mapped them to current fields: evidence count from evidence_for, strength from evidence_quality_score/evidence_validation_score, source from origin_type/analysis_id/source_collider_session_id, and reasoning from confidence_rationale/evidence_validation_details/score_breakdown/substantive description.
    • Results:
    - Before: 76 hypotheses with data_support_score IS NULL OR data_support_score = 0
    - Updated: 76 hypotheses scored by the rubric; scores ranged 0.50-1.00
    - Rationale: populated confidence_rationale for 60 rows that lacked one; preserved existing rationale for 16 rows
    - After: 0 hypotheses with data_support_score IS NULL OR data_support_score = 0
    • Verification:
    - python3 scripts/score_missing_data_support_rubric.py --commit
    - Independent DB check: total hypotheses 1489; null-or-zero data support 0; invalid range rows 0
    • Result: Done — current missing/zero data-support cohort scored with explicit rubric rationale where absent
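The column-mapping fallbacks described in this entry (rationale from confidence_rationale, then evidence_validation_details, then score_breakdown, then the description) could be expressed as a small coalescing helper; this is a sketch, and only the column names are taken from the log:

```python
def first_nonempty(row: dict, *fields: str, default: str = "") -> str:
    """Return the first populated column value from the given fallback chain.

    Mirrors the mapping above: each rubric input is read from whichever
    of its candidate columns actually holds data for this row.
    """
    for field in fields:
        value = row.get(field)
        if value not in (None, "", [], {}):  # treat empty containers as missing
            return str(value)
    return default
```

Used per-row, e.g. first_nonempty(row, "confidence_rationale", "evidence_validation_details", "score_breakdown", "description"), this preserves existing rationales while filling only the rows that lack one.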
