SciDEX — Task: [Agora] Add data-support scores to 20 active hypot

Active hypotheses lack data_support_score values, which weakens quality gates for computationally grounded science.

Verification:

- 20 active hypotheses have data_support_score between 0 and 1
- Each score is justified from linked data, citations, KG edges, or an explicit lack of data
- The count of remaining active hypotheses without data_support_score is reduced

Start by reading this task's spec and checking for duplicate recent work.

Spec File

Goal

Populate data_support_score for active hypotheses so quality gates can distinguish computationally grounded hypotheses from literature-only or speculative claims. Scores should reflect actual linked analyses, datasets, KG edges, citations, or an explicit lack of supporting data.

Acceptance Criteria

☑ The selected active hypotheses have data_support_score values between 0 and 1

☑ Each score has a concise rationale from linked data, citations, KG edges, or caveats

☑ No support is fabricated where the evidence is absent

☑ The before/after missing-score count is recorded

Approach

Query active hypotheses where data_support_score IS NULL, ordered by impact, market, or composite score.

Inspect linked analyses, papers, datasets, KG edges, and existing evidence fields.

Assign calibrated scores and rationale using existing database write patterns.

Verify score ranges and count reduction.
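
The query and verification steps above can be sketched as follows. This is a minimal illustration, not the project's script: it uses an in-memory SQLite table in place of the live PostgreSQL database, the rows are invented, and the helper names (missing_active, count_missing, count_invalid) are hypothetical. The table and column names (hypotheses, status, composite_score, data_support_score) follow the work log below. The 0.5 written during the update is a placeholder, not a rubric result.

```python
import sqlite3

# In-memory stand-in for the live database; rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hypotheses ("
    "id TEXT PRIMARY KEY, status TEXT, "
    "composite_score REAL, data_support_score REAL)"
)
conn.executemany(
    "INSERT INTO hypotheses VALUES (?, ?, ?, ?)",
    [
        ("h-demo-1", "active", 0.82, None),
        ("h-demo-2", "active", 0.64, None),
        ("h-demo-3", "active", 0.71, 0.55),
        ("h-demo-4", "proposed", 0.90, None),
    ],
)


def missing_active(conn, limit=20):
    """Ids of active hypotheses with no score, highest composite_score first."""
    rows = conn.execute(
        "SELECT id FROM hypotheses "
        "WHERE status = 'active' AND data_support_score IS NULL "
        "ORDER BY composite_score DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [row[0] for row in rows]


def count_missing(conn, status=None):
    """Count rows with a NULL score, optionally restricted to one status."""
    sql = "SELECT COUNT(*) FROM hypotheses WHERE data_support_score IS NULL"
    params = ()
    if status is not None:
        sql += " AND status = ?"
        params = (status,)
    return conn.execute(sql, params).fetchone()[0]


def count_invalid(conn):
    """Count non-NULL scores that fall outside the [0, 1] range."""
    return conn.execute(
        "SELECT COUNT(*) FROM hypotheses "
        "WHERE data_support_score IS NOT NULL "
        "AND (data_support_score < 0 OR data_support_score > 1)"
    ).fetchone()[0]


before = count_missing(conn, "active")
selected = missing_active(conn, limit=20)
for hyp_id in selected:
    # Placeholder value; a real run would compute the score from evidence.
    conn.execute(
        "UPDATE hypotheses SET data_support_score = ? WHERE id = ?",
        (0.5, hyp_id),
    )
after = count_missing(conn, "active")

print(f"active missing: before={before}, after={after}")
print(f"all missing={count_missing(conn)}, invalid range={count_invalid(conn)}")
```

Recording the before and after counts in one run mirrors the "active missing / all missing / invalid range" checks reported in the work log entries.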

Dependencies

quest-engine-ci - Generates this task when queue depth is low and data-support gaps exist.

Dependents

Hypothesis ranking, quality gates, and Exchange allocation depend on data-support scores.

Work Log

2026-04-28 09:15 UTC — Slot 79 (MiniMax-M2) [task:610eae81-1f71-4675-babe-4d1d1efe69e5]

Started: Staleness review — DB shows 0 active hypotheses missing data_support_score (open=14, proposed=25 remain). Prior iterations (slot 47, 43, 56) scored 60 active, completing the active cohort.
Approach: Ran scripts/score_missing_data_support_rubric.py --status active --limit 20 --commit — confirmed 0 active hypotheses remain un-scored.
Results:

- Before: 0 active hypotheses missing data_support_score (missing by status: open=14, proposed=25, active=0).
- Updated: 0 rows; no active hypotheses needed scoring, so the task acceptance criteria were already met.
- After: 0 active hypotheses missing data_support_score; 128 total active hypotheses all scored; invalid out-of-range rows: 0.

Verification: DB check confirmed active missing=0, all missing=39 (open+proposed only), invalid range=0.
Result: Done — all 128 active hypotheses have valid data_support_score; open/proposed cohort (39 total) not in scope for this task.

2026-04-28 PT — Slot 40 (Claude Sonnet 4.6) [task:610eae81-1f71-4675-babe-4d1d1efe69e5]

Confirmation: Independent verification — DB shows 0 active hypotheses missing data_support_score (58 proposed missing, out of scope). All 128 active hypotheses scored (min=0.35, max=0.75, avg=0.673); invalid out-of-range: 0. Acceptance criteria fully met.

2026-04-28 PT — Slot 47 (Claude Sonnet 4.6) [task:14694fa2-137e-499e-aeb5-449d35800f47]

Started: Staleness review — DB shows 60 active hypotheses still missing data_support_score (99 total missing). Prior iterations (slot 56 + slot 43) scored 40 active, reducing from 100 → 60 active missing.
Approach: Ran scripts/score_missing_data_support_rubric.py --status active --limit 20 --commit using the existing 4-point schema-mapped rubric.
Results:

- Before: 60 active hypotheses missing data_support_score (99 total missing).
- Updated: 20 active hypotheses scored; score range 0.50-0.75.
- After: 40 active hypotheses still missing data_support_score; invalid out-of-range rows: 0.

Scored cohort: hyp-SDA-2026-04-08-gap-pubmed-20260406-062111-db808ee9-{1,2,3,4,5,6,7}, hyp-SDA-2026-04-08-gap-pubmed-20260406-062122-bfac06c8-{1,2,3,4,5,6,7}, hyp-SDA-2026-04-08-gap-pubmed-20260406-062207-b800e5d3-{1,2,3,4,5,6}.
Verification: DB check confirmed active missing=40, all missing=79, invalid range=0.

2026-04-28 PT — Slot 43 (Claude Sonnet 4.6) [task:24ad08d9-07b8-44d8-9cec-6e4de1f668d9]

Started: Staleness review — DB shows 80 active hypotheses still missing data_support_score (119 total missing). Prior iteration (slot 56) scored 20, reducing from 100 → 80 active missing.
Approach: Ran scripts/score_missing_data_support_rubric.py --status active --limit 20 --commit using the existing 4-point schema-mapped rubric.
Results:

- Before: 80 active hypotheses missing data_support_score (119 total missing).
- Updated: 20 active hypotheses scored; score range 0.50-0.75.
- After: 60 active hypotheses still missing data_support_score; invalid out-of-range rows: 0.

Scored cohort: hyp-SDA-2026-04-08-gap-debate-20260406-062033-16eccec1-{2,3,4,5,6,7}, hyp-SDA-2026-04-08-gap-debate-20260406-062033-fecb8755-{1,2,3,4,5,6,7}, hyp-SDA-2026-04-08-gap-debate-20260406-062045-ce866189-{1,2,3,4,5,6,7}.
Verification: DB check confirmed active missing=60, all missing=99, invalid range=0.

2026-04-27 23:40 PT — Slot 56 (Codex) [task:42b4e8fb-f5cc-40e8-befa-1447dad0820b]

Started: Staleness review found the task remains current despite prior scoring passes: live PostgreSQL has 139 hypotheses with data_support_score IS NULL, including 100 active rows.
Approach: Reused the existing schema-mapped data-support rubric and added --status / --limit controls to scripts/score_missing_data_support_rubric.py so active cohorts can be scored directly without touching archived/proposed rows.
Results:

- Before: 100 active hypotheses missing data_support_score (139 total missing).
- Updated: 20 active hypotheses scored with PYTHONPATH=. python3 scripts/score_missing_data_support_rubric.py --status active --limit 20 --commit.
- Score range: 0.50-0.75; all 20 have confidence_rationale LIKE 'data_support rubric:%' with explicit support/caveat text.
- After: 80 active hypotheses still missing data_support_score (119 total missing); invalid out-of-range rows: 0.

Scored cohort: hyp-lyso-snca-1d58cf205e1f, hyp-lyso-snca-3429d8065d63, hyp-lyso-snca-3a610efd001e, hyp-lyso-snca-c9e088045c26, hyp-lyso-snca-3f4d11c5e9e4, hyp-lyso-snca-548064db6357, hyp-sda-2026-04-01-001-1, hyp-sda-2026-04-01-001-2, hyp-sda-2026-04-01-001-3, hyp-sda-2026-04-01-001-4, hyp-sda-2026-04-01-001-5, hyp-sda-2026-04-01-001-6, hyp-sda-2026-04-01-001-7, hyp-sda-2026-04-01-gap-9137255b-1, hyp-sda-2026-04-01-gap-9137255b-2, hyp-sda-2026-04-01-gap-9137255b-3, hyp-SDA-2026-04-04-frontier-proteomics-1c3dba72-1, hyp-SDA-2026-04-04-frontier-proteomics-1c3dba72-2, hyp-SDA-2026-04-04-frontier-proteomics-1c3dba72-3, hyp-SDA-2026-04-08-gap-debate-20260406-062033-16eccec1-1.
Verification: python3 -m py_compile scripts/score_missing_data_support_rubric.py; independent DB check confirmed active missing=80, all missing=119, invalid range=0, updated rationale count=20.

2026-04-21 21:55 PT — Slot 76 (MiniMax)

Started: Task spec + AGENTS.md read
Approach: Calibrated scoring from 5 dimensions:

- KG edge count (0-0.3): strong grounding gets 0.3, none gets 0
- Evidence citations (0-0.4): count + quality (high/medium/year ≥ 2020)
- Debate count (0-0.15): more scrutiny = higher score
- Analysis linkage (0-0.1): formal derivation from debate
- Artifact links (0-0.05): notebooks/datasets = computational grounding
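
A minimal sketch of how the five weighted dimensions above could combine into a single score. The maximum weights come from the list; the saturation thresholds (how many KG edges, citations, or debates earn a dimension's full weight) are assumptions made for illustration and are not taken from scripts/score_data_support.py. Citation quality is ignored here for brevity.

```python
# Maximum contribution of each dimension, as listed in the rubric above.
WEIGHTS = {
    "kg_edges": 0.30,
    "citations": 0.40,
    "debates": 0.15,
    "analysis": 0.10,
    "artifacts": 0.05,
}

# Hypothetical saturation points: the count at which a dimension earns
# its full weight. These values are assumptions, not the script's.
SATURATION = {"kg_edges": 10, "citations": 8, "debates": 3}


def saturating(count, full_at):
    """Map a non-negative count to [0, 1], reaching 1.0 at `full_at`."""
    return min(max(count, 0) / full_at, 1.0)


def data_support_score(kg_edges, citations, debates, has_analysis, has_artifacts):
    """Weighted sum of per-dimension fractions, clamped to at most 1."""
    parts = {
        "kg_edges": saturating(kg_edges, SATURATION["kg_edges"]),
        "citations": saturating(citations, SATURATION["citations"]),
        "debates": saturating(debates, SATURATION["debates"]),
        "analysis": 1.0 if has_analysis else 0.0,
        "artifacts": 1.0 if has_artifacts else 0.0,
    }
    score = sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)
    return round(min(score, 1.0), 3)


# A hypothesis with no KG edges, a few citations, and full debate coverage.
print(data_support_score(kg_edges=0, citations=4, debates=3,
                         has_analysis=True, has_artifacts=False))
```

Because the five maximum weights sum to 1.0, a hypothesis that saturates every dimension scores 1.0 and one with no evidence scores 0.0, so every result stays in the required range.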

Output: scripts/score_data_support.py — reusable, well-documented
Results:

- Before: 203 promoted/debated missing, 747 total missing
- 20 hypotheses scored: range 0.600–0.950
- After: 183 promoted/debated missing, 727 total missing
- All 20 scores verified in [0, 1] range

Scores: h-var-95b0f9a6bc (0.950, MAPT), h-var-261452bfb4 (0.950, ACSL4), h-var-22c38d11cd (0.950, ACSL4), h-var-ce41f0efd7 (0.950, TREM2), h-var-97b18b880d (0.950, ALOX15), h-0e675a41 (0.950, HDAC3), h-42f50a4a (0.950, APOE), h-var-f96e38ec20 (0.950, ACSL4), h-11795af0 (0.900, APOE), h-856feb98 (0.900, BDNF), h-var-9e8fc8fd3d (0.900, PVALB), h-d0a564e8 (0.900, APOE), h-807d7a82 (0.900, AQP4), h-48858e2a (0.900, TREM2), h-43f72e21 (0.900, PRKAA1), h-3f02f222 (0.900, BCL2L1), h-var-e4cae9d286 (0.850, LPCAT3), h-var-c56b26facf (0.850, LPCAT3), h-9e9fee95 (0.700, HCRTR1/HCRTR2), hyp-SDA-2026-04-12-20260411-082446-2c1c9e2d-2 (0.600, CHRNA7/BACE1 — no KG edges)
Committed: scripts/score_data_support.py [task:f7f4133c-4b99-48ef-888b-fa1b08387a24]
Result: Done — 20 hypotheses scored with validated evidence-based rationales

2026-04-26 19:40 UTC — Slot 42 (Claude Sonnet 4.6) [task:4a7ec4f5-b5f5-443c-89db-1a22b67165e9]

Started: AGENTS.md read, DB connection established via scidex.core.database.get_db()
Before: 1008 hypotheses missing data_support_score (all statuses: proposed 722, archived 134, debated 97, promoted 41, active 9, open 5)
Approach: Same 5-dimension rubric. Fixed bug in score_hypothesis() where string items in evidence_for (PMID strings or claim text) caused AttributeError on .get(). Now handles mixed lists safely.
Processed: All 1008 hypotheses in 11 batches of up to 100 (10 full batches plus a final batch of 8), ordered by composite_score DESC.
After: 0 hypotheses missing data_support_score (1402 total; all have valid 0–1 scores)
Score distribution: 0.0–0.1: 128, 0.1–0.5: 174, 0.5–0.6: 582 (bulk with citations+debates), 0.6–0.9: 306, 0.9–1.0: 212
Script fix: Updated scripts/score_data_support.py to handle string items in evidence_for, fall back to citations_count column, and accept batch_size as CLI arg
Result: Done — all 1008 hypotheses now scored; DB updated directly
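
The evidence_for fix described in this entry can be illustrated with a short sketch. It is not the code from score_hypothesis(); the function name count_citations and the dictionary keys "pmid" and "claim" are hypothetical. It shows one way to count items in a list that mixes dicts with plain strings without calling .get() on a string, and to fall back to a citations_count value when the list yields nothing.

```python
def count_citations(evidence_for, citations_count=None):
    """Count usable evidence items in a list that may mix dicts and strings."""
    count = 0
    for item in evidence_for or []:
        if isinstance(item, dict):
            # Structured entry: count it only if it carries an identifier
            # or a claim (key names are assumptions for this sketch).
            if item.get("pmid") or item.get("claim"):
                count += 1
        elif isinstance(item, str):
            # Bare PMID string or claim text: count it if non-empty.
            if item.strip():
                count += 1
        # Any other type (None, numbers) is ignored rather than raising.
    if count == 0 and citations_count:
        # Fall back to the stored column when the list has nothing usable.
        return int(citations_count)
    return count


mixed = [{"pmid": "123"}, "PMID:456", "", {"note": "no identifier"}, None]
print(count_citations(mixed))
print(count_citations(None, citations_count=5))
```

Checking the type before calling .get() is what avoids the AttributeError on string items; empty strings and unrecognised entries are skipped instead of counted.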

2026-04-26 12:30 PT — Slot 73 (MiniMax)

Started: AGENTS.md read, DB connection established via scidex.core.database.get_db()
Approach: Same 5-dimension calibrated scoring (KG edges 0-0.3, citations 0-0.4, debates 0-0.15, analysis linkage 0-0.1, artifacts 0-0.05). Rebased against latest main (ea2945ab9) before running.
Results:

- Before: 1294 hypotheses missing data_support_score
- 20 hypotheses scored: range 0.600–0.950
- After: 1274 missing (reduction of 20)
- Score distribution: low=16, mid=29, good=73, high=10 (of 128 total scored)
- All scores verified in [0, 1] range

20 scored hypotheses: h-var-70a95f9d57 (0.850, LPCAT3), hyp-SDA-2026-04-12-20260411-082446-2c1c9e2d-3 (0.600, CHRNA7/CHRM1), h-44195347 (0.900, APOE), h-b7898b79 (0.900, CLOCK), h-a20e0cbb (0.900, APOE), h-1a34778f (0.700, CGAS/STING1/DNASE2), h-fd1562a3 (0.900, COX4I1), h-d2722680 (0.900, TET2), h-seaad-v4-5a7a4079 (0.950, SIRT3), h-c8ccbee8 (0.900, AQP4), h-bb518928 (0.700, PLA2G6/PLA2G4A), h-7957bb2a (0.900, GPX4/SLC7A11), h-ec731b7a (0.900, G3BP1), h-84808267 (0.700, TFR1/LRP1/CAV1/ABCB1), h-a1b56d74 (0.900, HK2), h-98b431ba (0.900, TFAM), h-8fe389e8 (0.950, HDAC), h-99b4e2d2 (0.900, APOE), h-5706bbd7 (0.900, BMAL1), h-8d270062 (0.900, CACNA1G)
Verification:

- All 20 scores between 0 and 1
- Missing count reduced from 1294 to 1274 (-20)
- Prior scored hypotheses (h-var-95b0f9a6bc, h-0e675a41, etc.) retained their scores

Result: Done — 20 more hypotheses scored; no commit needed (DB-only change)

2026-04-26 23:33 UTC — Slot 52 (Codex) [task:0f26ef01-1934-4bce-b280-aa954e4a884a]

Started: AGENTS.md and shared Orchestra instructions read; staleness review found prior commit 5344be64d scored the then-missing NULL cohort, but the live DB still had 76 hypotheses with data_support_score IS NULL OR data_support_score = 0 (60 null, 16 zero).
Approach: Added scripts/score_missing_data_support_rubric.py, a reproducible one-shot scorer for the task's 4-point rubric. Because the current hypotheses schema does not have literal evidence_strength, data_source, or notes columns, mapped them to current fields: evidence count from evidence_for, strength from evidence_quality_score/evidence_validation_score, source from origin_type/analysis_id/source_collider_session_id, and reasoning from confidence_rationale/evidence_validation_details/score_breakdown/substantive description.
Results:

- Before: 76 hypotheses with data_support_score IS NULL OR data_support_score = 0
- Updated: 76 hypotheses scored by the rubric; scores ranged 0.50-1.00
- Rationale: populated confidence_rationale for 60 rows that lacked one; preserved existing rationale for 16 rows
- After: 0 hypotheses with data_support_score IS NULL OR data_support_score = 0

Verification:

- python3 scripts/score_missing_data_support_rubric.py --commit
- Independent DB check: total hypotheses 1489; null-or-zero data support 0; invalid range rows 0

Result: Done — current missing/zero data-support cohort scored with explicit rubric rationale where absent

2026-04-27 01:30 UTC — Slot 46 (Claude Sonnet 4.6) [task:d492747e-7d9d-491f-8e03-da93a880b589]

Started: Staleness review + DB check. Found 33 new "proposed" hypotheses created 2026-04-26 without data_support_score (all others had valid scores from prior runs).
Before: 33 hypotheses missing data_support_score (all status=proposed, created 2026-04-26); 1489 already scored
Approach: Ran scripts/score_data_support.py 100 (same 5-dimension rubric: KG edges 0-0.3, citations 0-0.4, debates 0-0.15, analysis linkage 0-0.1, artifact links 0-0.05).
Results:

- All 33 scored in single batch; scores range 0.200–0.500
- Genes scored: TREM2, LRRK2, C1Q, NUP98, LRRK2/RAB29, C9orf72, RAB29, APOE, ABL1, RAB27A, PERK/LRRK2, USP30, ABL1/c-Abl, LRRK2/PI4P, FUS, LCN2, PINK1/PRKN, XOR, SNCA, PLCG2, C1QA, GBA, BRD4, HDAC2, OGT, SIRT1, SDC3
- After: 0 hypotheses missing data_support_score (1522 total; all valid 0–1 scores)

Note: data_support_score is distinct from data_availability_score used in composite_score formula; composite_score not recalculated as it draws from separate dimension columns.
Result: Done — 33 newly-proposed hypotheses scored; DB now fully covered

2026-04-27 06:56 UTC — Slot 73 (MiniMax) [task:d492747e-7d9d-491f-8e03-da93a880b589]

Started: Rebased on latest main (2ac18cef3). Verified remaining DB gap — 2 proposed hypotheses (h-var-95b0f9a6bc-pro MAPT, h-11ba42d0-cel APOE) created 2026-04-26 still had NULL data_support_score.
Before: 2 hypotheses missing data_support_score; 1579 already scored
Approach: Ran PYTHONPATH=. python3 scripts/score_data_support.py 10
Results: h-var-95b0f9a6bc-pro score=0.600; h-11ba42d0-cel score=0.500. After: 0 missing (1581 total, all valid 0–1 scores)
Verification: All 24 active hypotheses have non-null data_support_score in [0,1]
Result: Done — 2 remaining hypotheses scored; DB fully covered

2026-04-27 UTC — Slot 46 (Claude Sonnet 4.6) [task:2dabab0c-f1c5-47ec-8a74-8845791231ad]

Task: Add data_availability_score to 25 hypotheses missing validation readiness
Before: 113 non-archived hypotheses missing data_availability_score (226 total incl. archived)
Approach: 6-dimension rubric calibrated for validation data availability (distinct from data_support_score):

- KG edges (0-0.25): rich public data ecosystem
- Citations count (0-0.25): measurable, published data exists
- Evidence items in evidence_for (0-0.20): linked validation evidence
- Clinical trials (0-0.15): active data collection infrastructure
- Gene expression context (0-0.10): expression data available for target
- Analysis linkage (0-0.05): formal computational analysis done

25 hypotheses scored (top 25 by composite_score, non-archived):

- h-seaad-v4-26ba859b (1.000, ACSL4), h-var-22c38d11cd (1.000, ACSL4), h-var-261452bfb4 (1.000, ACSL4)
- h-var-97b18b880d (0.950, ALOX15), h-seaad-v4-5a7a4079 (0.950, SIRT3)
- h-var-e4cae9d286 (0.900, LPCAT3), h-var-c56b26facf (0.900, LPCAT3), h-var-70a95f9d57 (0.900, LPCAT3)
- SDA-2026-04-02-gap-tau-prop-20260402003221-H001 (0.800, LRP1)
- SDA-2026-04-02-gap-tau-prop-20260402003221-H004 (0.750, VCP)
- SDA-2026-04-02-gap-tau-prop-20260402003221-H002 (0.650, TREM2)
- SDA-2026-04-02-gap-tau-prop-20260402003221-H003 (0.620, CHMP4B)
- hyp-SDA-2026-04-12-20260411-082446-2c1c9e2d-1 (0.600, APP/PSEN1/CHAT)
- h-48d1115a (0.500, no gene)
- hyp-SDA-2026-04-12-20260411-082446-2c1c9e2d-2 (0.400, CHRNA7/BACE1), -3 (0.400, CHRNA7/CHRM1)
- h-e3557d75fa56 (0.400, MAPT), h-c704dd991041 (0.400, MAPT)
- h-SDA-...plasma-nfl (0.300, NEFL), h-SDA-...albumin-quotient (0.300, ALB)
- h-immunity-6e54942b (0.250, C3/NFKB1)
- h-SDA-...-circulating-sol (0.100, PDGFRβ), h-SDA-...-mmp9-timp1 (0.100, MMP-9/TIMP-1), h-SDA-...-pubmed (0.100, MT1/BACE1), h-SDA-...-s100b (0.100, S100B)

After: 88 non-archived hypotheses still missing data_availability_score (reduction of 25)
Verification: 1380 hypotheses have valid 0-1 data_availability_score; no out-of-range values
Score distribution: <0.2: 12, 0.2-0.5: 319, 0.5-0.8: 887, ≥0.8: 162
Acceptance criteria: ✓ 25 scored, ✓ all in [0,1] range, ✓ remaining (88) ≤ 98
Result: Done — DB updated directly; no script committed (used inline Python)


[Agora] Add data-support scores to 20 active hypotheses done