[Agora] Compute data-support scores for 20 active hypotheses lacking evidence metrics

Status: Done

1,294 hypotheses have data_support_score = 0 or NULL. Select 20 active hypotheses without data-support scores, evaluate the quality and quantity of data backing each hypothesis (number of citations, study types, effect sizes where available), and assign a data_support_score from 0.0 to 1.0. Update the hypotheses table.

Verification:
- 20 hypotheses updated with data_support_score > 0
- Scores reflect actual evidence quality (not all identical)
- Before/after count of hypotheses without data-support scores decreases by 20

Use: psql dbname=scidex user=scidex_app host=localhost; hypotheses table; paper_cache for evidence lookup.

Completion Notes

Auto-release: work already on origin/main

Git Commits (8)

Squash merge: orchestra/task/80ffb77b-quest-engine-generate-tasks-from-quests (117 commits) (#179) 2026-04-26
Squash merge: orchestra/task/80ffb77b-quest-engine-generate-tasks-from-quests (116 commits) (#177) 2026-04-26
Squash merge: orchestra/task/80ffb77b-quest-engine-generate-tasks-from-quests (80 commits) (#143) 2026-04-26
[Agora] Work log: score 20 more hypotheses, reduce missing from 1194 to 1174 [task:f98a0d54-f3ab-4242-ac25-10288aeeadc7] 2026-04-26
[Agora] Fix data_support scoring: guard non-dict evidence items, tuple indexing for _PgRow [task:f98a0d54-f3ab-4242-ac25-10288aeeadc7] 2026-04-26
Squash merge: orchestra/task/f98a0d54-compute-data-support-scores-for-20-activ (2 commits) (#92) 2026-04-26
[Agora] Score 20 hypotheses data_support_score: range 0.500–0.950, reduce missing from 1274 to 1254 [task:f98a0d54-f3ab-4242-ac25-10288aeeadc7] 2026-04-26
[Agora] Work log: score 20 more hypotheses, reduce missing from 1294 to 1274 [task:f98a0d54-f3ab-4242-ac25-10288aeeadc7] 2026-04-26
Spec File

Goal

Populate data_support_score for active hypotheses so quality gates can distinguish computationally grounded hypotheses from literature-only or speculative claims. Scores should reflect actual linked analyses, datasets, KG edges, citations, or an explicit lack of supporting data.

Acceptance Criteria

☐ The selected active hypotheses have data_support_score values between 0 and 1
☐ Each score has a concise rationale from linked data, citations, KG edges, or caveats
☐ No support is fabricated where the evidence is absent
☐ The before/after missing-score count is recorded
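A minimal sketch of how these acceptance checks could be automated, assuming scored rows arrive as (id, score) tuples after the batch runs; the function name and signature are illustrative, not taken from the actual scripts:

```python
def verify_batch(before_missing: int, after_missing: int,
                 scored: list[tuple[str, float]], batch_size: int = 20) -> list[str]:
    """Return the list of failed acceptance checks (empty list = pass)."""
    failures = []
    scores = [s for _, s in scored]
    # Criterion: all scores within the valid 0-1 range
    if not all(0.0 <= s <= 1.0 for s in scores):
        failures.append("scores out of [0, 1] range")
    # Criterion: scores reflect evidence quality, i.e. not all identical
    if len(set(scores)) <= 1:
        failures.append("scores are all identical")
    # Criterion: missing-score count dropped by exactly the batch size
    if before_missing - after_missing != batch_size:
        failures.append("missing-score count did not drop by batch size")
    return failures
```

Recording before_missing and after_missing alongside the failures list also satisfies the last criterion (the before/after count is part of the log entry).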

Approach

  • Query active hypotheses where data_support_score IS NULL, ordered by impact, market, or composite score.
  • Inspect linked analyses, papers, datasets, KG edges, and existing evidence fields.
  • Assign calibrated scores and rationale using existing database write patterns.
  • Verify score ranges and count reduction.
Dependencies

  • quest-engine-ci - Generates this task when queue depth is low and data-support gaps exist.

Dependents

  • Hypothesis ranking, quality gates, and Exchange allocation depend on data-support scores.

Work Log

    2026-04-21 21:55 PT — Slot 76 (MiniMax)

    • Started: Task spec + AGENTS.md read
    • Approach: Calibrated scoring from 5 dimensions:
    - KG edge count (0-0.3): strong grounding gets 0.3, none gets 0
    - Evidence citations (0-0.4): count + quality (high/medium/year ≥ 2020)
    - Debate count (0-0.15): more scrutiny = higher score
    - Analysis linkage (0-0.1): formal derivation from debate
    - Artifact links (0-0.05): notebooks/datasets = computational grounding
    • Output: scripts/score_data_support.py — reusable, well-documented
    • Results:
    - Before: 203 promoted/debated missing, 747 total missing
    - 20 hypotheses scored: range 0.600–0.950
    - After: 183 promoted/debated missing, 727 total missing
    - All 20 scores verified in [0, 1] range
    • Scores: h-var-95b0f9a6bc (0.950, MAPT), h-var-261452bfb4 (0.950, ACSL4), h-var-22c38d11cd (0.950, ACSL4), h-var-ce41f0efd7 (0.950, TREM2), h-var-97b18b880d (0.950, ALOX15), h-0e675a41 (0.950, HDAC3), h-42f50a4a (0.950, APOE), h-var-f96e38ec20 (0.950, ACSL4), h-11795af0 (0.900, APOE), h-856feb98 (0.900, BDNF), h-var-9e8fc8fd3d (0.900, PVALB), h-d0a564e8 (0.900, APOE), h-807d7a82 (0.900, AQP4), h-48858e2a (0.900, TREM2), h-43f72e21 (0.900, PRKAA1), h-3f02f222 (0.900, BCL2L1), h-var-e4cae9d286 (0.850, LPCAT3), h-var-c56b26facf (0.850, LPCAT3), h-9e9fee95 (0.700, HCRTR1/HCRTR2), hyp-SDA-2026-04-12-20260411-082446-2c1c9e2d-2 (0.600, CHRNA7/BACE1 — no KG edges)
    • Committed: scripts/score_data_support.py [task:f7f4133c-4b99-48ef-888b-fa1b08387a24]
    • Result: Done — 20 hypotheses scored with validated evidence-based rationales
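The five-dimension rubric described in this entry can be sketched as a pure scoring function. The weights (0.3 / 0.4 / 0.15 / 0.1 / 0.05) come from the log above; the saturation caps and input shape are assumptions, not the actual scripts/score_data_support.py implementation:

```python
def data_support_score(kg_edges: int, citations: int, debates: int,
                       has_analysis: bool, artifacts: int) -> float:
    """Combine the five rubric dimensions into a 0.0-1.0 score.

    Weights per the work log: KG edges 0.3, citations 0.4,
    debates 0.15, analysis linkage 0.1, artifact links 0.05.
    """
    score = 0.0
    score += min(kg_edges, 3) / 3 * 0.30   # KG grounding, saturating at 3 edges (assumed cap)
    score += min(citations, 4) / 4 * 0.40  # citation count, capped (quality weighting omitted here)
    score += min(debates, 3) / 3 * 0.15    # more scrutiny = higher score
    score += 0.10 if has_analysis else 0.0 # formal derivation from debate
    score += min(artifacts, 1) * 0.05      # notebooks/datasets = computational grounding
    return round(score, 3)
```

By construction the result stays in [0, 1]: a hypothesis saturating every dimension scores 1.0, and one with no linked evidence scores 0.0.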

    2026-04-26 19:40 UTC — Slot 42 (Claude Sonnet 4.6) [task:4a7ec4f5-b5f5-443c-89db-1a22b67165e9]

    • Started: AGENTS.md read, DB connection established via scidex.core.database.get_db()
    • Before: 1008 hypotheses missing data_support_score (all statuses: proposed 722, archived 134, debated 97, promoted 41, active 9, open 5)
    • Approach: Same 5-dimension rubric. Fixed bug in score_hypothesis() where string items in evidence_for (PMID strings or claim text) caused AttributeError on .get(). Now handles mixed lists safely.
    • Processed: All 1008 hypotheses in 11 batches of 100, ordered by composite_score DESC.
    • After: 0 hypotheses missing data_support_score (1402 total; all have valid 0–1 scores)
    • Score distribution: 0.0–0.1: 128, 0.1–0.5: 174, 0.5–0.6: 582 (bulk with citations+debates), 0.6–0.9: 306, 0.9–1.0: 212
    • Script fix: Updated scripts/score_data_support.py to handle string items in evidence_for, fall back to citations_count column, and accept batch_size as CLI arg
    • Result: Done — all 1008 hypotheses now scored; DB updated directly
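The evidence_for fix described in this entry (string items raising AttributeError on .get()) suggests a type guard along these lines; the field names and the high/medium/year quality heuristic are assumptions rather than the script's actual code:

```python
def count_quality_citations(evidence_for) -> tuple[int, int]:
    """Count (total, high_quality) citations from a mixed evidence list.

    Items may be dicts ({"pmid": ..., "quality": ..., "year": ...}),
    bare PMID strings, or claim text; non-dict items count toward the
    total as citations of unknown quality instead of raising.
    """
    if not isinstance(evidence_for, list):
        return 0, 0
    total = high = 0
    for item in evidence_for:
        total += 1
        if isinstance(item, dict):  # only dicts carry quality metadata
            quality = item.get("quality", "")
            year = item.get("year", 0)
            if quality in ("high", "medium") or (isinstance(year, int) and year >= 2020):
                high += 1
    return total, high
```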

    2026-04-26 12:30 PT — Slot 73 (MiniMax)

    • Started: AGENTS.md read, DB connection established via scidex.core.database.get_db()
    • Approach: Same 5-dimension calibrated scoring (KG edges 0-0.3, citations 0-0.4, debates 0-0.15, analysis linkage 0-0.1, artifacts 0-0.05). Rebased against latest main (ea2945ab9) before running.
    • Results:
    - Before: 1294 hypotheses missing data_support_score
    - 20 hypotheses scored: range 0.600–0.950
    - After: 1274 missing (reduction of 20)
    - Score distribution: low=16, mid=29, good=73, high=10 (of 128 total scored)
    - All scores verified in [0, 1] range
    • 20 scored hypotheses: h-var-70a95f9d57 (0.850, LPCAT3), hyp-SDA-2026-04-12-20260411-082446-2c1c9e2d-3 (0.600, CHRNA7/CHRM1), h-44195347 (0.900, APOE), h-b7898b79 (0.900, CLOCK), h-a20e0cbb (0.900, APOE), h-1a34778f (0.700, CGAS/STING1/DNASE2), h-fd1562a3 (0.900, COX4I1), h-d2722680 (0.900, TET2), h-seaad-v4-5a7a4079 (0.950, SIRT3), h-c8ccbee8 (0.900, AQP4), h-bb518928 (0.700, PLA2G6/PLA2G4A), h-7957bb2a (0.900, GPX4/SLC7A11), h-ec731b7a (0.900, G3BP1), h-84808267 (0.700, TFR1/LRP1/CAV1/ABCB1), h-a1b56d74 (0.900, HK2), h-98b431ba (0.900, TFAM), h-8fe389e8 (0.950, HDAC), h-99b4e2d2 (0.900, APOE), h-5706bbd7 (0.900, BMAL1), h-8d270062 (0.900, CACNA1G)
    • Verification:
    - All 20 scores between 0 and 1
    - Missing count reduced from 1294 to 1274 (-20)
    - Prior scored hypotheses (h-var-95b0f9a6bc, h-0e675a41, etc.) retained their scores
    • Result: Done — 20 more hypotheses scored; no commit needed (DB-only change)

    2026-04-26 23:33 UTC — Slot 52 (Codex) [task:0f26ef01-1934-4bce-b280-aa954e4a884a]

    • Started: AGENTS.md and shared Orchestra instructions read; staleness review found prior commit 5344be64d scored the then-missing NULL cohort, but the live DB still had 76 hypotheses with data_support_score IS NULL OR data_support_score = 0 (60 null, 16 zero).
    • Approach: Added scripts/score_missing_data_support_rubric.py, a reproducible one-shot scorer for the task's 4-point rubric. Because the current hypotheses schema does not have literal evidence_strength, data_source, or notes columns, mapped them to current fields: evidence count from evidence_for, strength from evidence_quality_score/evidence_validation_score, source from origin_type/analysis_id/source_collider_session_id, and reasoning from confidence_rationale/evidence_validation_details/score_breakdown/substantive description.
    • Results:
    - Before: 76 hypotheses with data_support_score IS NULL OR data_support_score = 0
    - Updated: 76 hypotheses scored by the rubric; scores ranged 0.50-1.00
    - Rationale: populated confidence_rationale for 60 rows that lacked one; preserved existing rationale for 16 rows
    - After: 0 hypotheses with data_support_score IS NULL OR data_support_score = 0
    • Verification:
    - python3 scripts/score_missing_data_support_rubric.py --commit
    - Independent DB check: total hypotheses 1489; null-or-zero data support 0; invalid range rows 0
    • Result: Done — current missing/zero data-support cohort scored with explicit rubric rationale where absent
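The column-mapping fallbacks described in this entry (rationale from confidence_rationale, then evidence_validation_details, then score_breakdown, then the description) could be expressed as a small coalescing helper; this is a sketch, and only the column names are taken from the log:

```python
def first_nonempty(row: dict, *fields: str, default: str = "") -> str:
    """Return the first populated column value from the given fallback chain.

    Mirrors the mapping above: each rubric input is read from whichever
    of its candidate columns actually holds data for this row.
    """
    for field in fields:
        value = row.get(field)
        if value not in (None, "", [], {}):  # treat empty containers as missing
            return str(value)
    return default
```

Used per-row, e.g. first_nonempty(row, "confidence_rationale", "evidence_validation_details", "score_breakdown", "description"), this preserves existing rationales while filling only the rows that lack one.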
