Make SciDEX hypotheses GROUNDED IN DATA ANALYSIS, not just literature debates. Computational findings (statistical patterns, genomic correlations, biomarker trends) should feed into the debate engine as structured EVIDENCE, and hypotheses should carry a data_support_score based on how much computational evidence backs them.
forge/computational_analysis.py module performs REAL computations on seed datasets (AD genetic loci, biomarkers, trial failures)
data_support_score is calculated and stored for hypotheses when they are saved
forge/computational_analysis.py
A new module that:
datasets/)
data_support_score calculation
In post_process.py, when hypotheses are saved:
data_support_score = (# computational findings supporting hypothesis) / (total findings), scaled 0-1
In agent.py run_debate():
In post_process.py check_evidence_gate():
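The score and gate steps of the approach can be sketched as follows. This is a minimal illustration, not the code in post_process.py: the function names, the evidence item shapes (plain strings or dicts with `pmid` / `claim` keys), and the two regular expressions are assumptions made for the example.

```python
import re
from typing import Iterable, Mapping, Union

# Assumed citation shapes (not taken from the real code base):
#   computational:<dataset_name>   e.g. computational:ad_genetic_risk_loci
#   PMID:<digits>                  e.g. PMID:12345678
COMPUTATIONAL_CITATION = re.compile(r"computational:[A-Za-z0-9_]+")
PMID_CITATION = re.compile(r"PMID:?\s*\d+", re.IGNORECASE)

EvidenceItem = Union[str, Mapping[str, object]]


def data_support_score(supporting_findings: int, total_findings: int) -> float:
    """Ratio of supporting computational findings to all findings, clamped to [0, 1]."""
    if total_findings <= 0:
        return 0.0  # no computational evidence at all
    return max(0.0, min(1.0, supporting_findings / total_findings))


def cites_source(item: EvidenceItem) -> bool:
    """True if one evidence item carries a PMID or a computational: citation."""
    if isinstance(item, str):
        text = item
    else:
        if item.get("pmid"):  # hypothetical key for literature evidence
            return True
        # Search every field, so a citation embedded in a claim string counts.
        text = " ".join(str(value) for value in item.values())
    return bool(PMID_CITATION.search(text) or COMPUTATIONAL_CITATION.search(text))


def passes_evidence_gate(evidence: Iterable[EvidenceItem]) -> bool:
    """Gate passes when at least one evidence item cites a recognised source."""
    return any(cites_source(item) for item in evidence)
```

Searching all dict fields rather than a single `source` key is a deliberate choice in this sketch, so that a citation written inline in a claim is not silently dropped.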
ad_genetic_risk_loci.csv: seed dataset (already exists)
ad_biomarker_registry.csv: seed dataset (already exists)
ad_clinical_trial_failures.csv: seed dataset (already exists)
data_support_score column already exists in hypotheses table but is NULL
forge/computational_analysis.py module with:
compute_genetic_risk_analysis(): 8 findings from AD genetic loci (APOE, TREM2, BIN1, ABCA7, CLU, CR1, PICALM, MS4A6A, SORL1)
compute_biomarker_analysis(): 7 findings from biomarker registry (CSF panel, plasma biomarkers, PET imaging, p-tau217 specificity)
compute_trial_outcomes_analysis(): 6 findings from trial failures (BACE class failure, amyloid antibody pattern, symptomatic vs disease-modifying)
get_computational_evidence_for_gene(): filter findings by gene
format_findings_for_prompt(): format as evidence string for Theorist prompt
calculate_data_support_score(): compute 0-1 score based on computational evidence citations
agent.py run_debate():
post_process.py:
data_support_score before hypothesis INSERT
data_support_score to INSERT columns
check_evidence_gate() to accept computational: source citations alongside PMIDs
[Forge] Data-driven hypotheses — computational analysis for evidence grounding [task:ba0513b9-f6f2-4315-a13b-11b60109f9ef]
Key changes:
forge/computational_analysis.py (NEW): 21 computational findings from 3 seed datasets
agent.py: injects computational evidence into Theorist prompts
post_process.py: calculates and stores data_support_score, accepts computational: citations in evidence gate
docs/planning/specs/ba0513b9_f6f_spec.md (NEW): spec file with approach and work log
generate_notebook_for_analysis(analysis_id, db_path, out_dir) to gen_notebooks.py
_add_computational_evidence_cells() that injects all 21 computational findings
post_process.py: calls generate_notebook_for_analysis() after each analysis is parsed
forge/computational_analysis.py:
_hypergeometric_pmf/_pvalue(): hypergeometric test (stdlib only)
_wilson_ci(): Wilson score 95% confidence interval (stdlib only)
compute_pathway_enrichment_analysis(): enrichment of AD pathways (microglial, lipid,
compute_mechanism_efficacy_statistics(): per-mechanism trial success rates with CIs
compute_cross_domain_analysis() updated to include both new statistical analyses
forge/provenance_tracker.py (NEW): full provenance from raw data to hypothesis
record_hypothesis_provenance(): writes datasets/provenance/<id>.json per hypothesis
get_hypothesis_provenance(): reads provenance for any hypothesis
compute_provenance_coverage(): aggregate statistics on provenance coverage
post_process.py: after each hypothesis save, calls record_hypothesis_provenance()
run_ad_computational_analysis as a callable Claude tool in agent.py:
CLAUDE_TOOLS with full JSON schema (gene_symbol, analysis_type enum)
_run_ad_computational_analysis() dispatcher that routes to the 5 analysis
run_ad_computational_analysis(gene_symbol="TREM2")
self.tool_functions so call_claude() can dispatch it
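The tool registration described above might look roughly like the sketch below. Only the tool name and the two parameter names (gene_symbol, analysis_type) come from these notes; the enum values, the stub analyzers and their placeholder findings, and the dispatch_tool_call helper are hypothetical and stand in for the real functions in forge/computational_analysis.py and agent.py.

```python
from typing import Any, Callable, Dict, List, Optional

Finding = Dict[str, Any]

# Stub analyzers returning placeholder findings (no real data).
def _genetic_stub() -> List[Finding]:
    return [
        {"dataset": "ad_genetic_risk_loci", "gene": "TREM2", "claim": "stub finding"},
        {"dataset": "ad_genetic_risk_loci", "gene": "APOE", "claim": "stub finding"},
    ]

def _biomarker_stub() -> List[Finding]:
    return [{"dataset": "ad_biomarker_registry", "gene": None, "claim": "stub finding"}]

def _trial_stub() -> List[Finding]:
    return [{"dataset": "ad_clinical_trial_failures", "gene": None, "claim": "stub finding"}]

# Hypothetical analysis_type values; the real enum is not listed in these notes.
ANALYZERS: Dict[str, Callable[[], List[Finding]]] = {
    "genetic_risk": _genetic_stub,
    "biomarker": _biomarker_stub,
    "trial_outcomes": _trial_stub,
}

# Tool definition in the name / description / input_schema shape used for Claude tools.
RUN_AD_COMPUTATIONAL_ANALYSIS_TOOL = {
    "name": "run_ad_computational_analysis",
    "description": "Run computational analyses on the AD seed datasets.",
    "input_schema": {
        "type": "object",
        "properties": {
            "gene_symbol": {"type": "string"},
            "analysis_type": {"type": "string", "enum": sorted(ANALYZERS)},
        },
        "required": [],
    },
}

def run_ad_computational_analysis(
    gene_symbol: Optional[str] = None, analysis_type: Optional[str] = None
) -> List[Finding]:
    """Route to one analyzer (or all of them), then optionally filter by gene."""
    if analysis_type is not None and analysis_type not in ANALYZERS:
        raise ValueError(f"unknown analysis_type: {analysis_type!r}")
    selected = [analysis_type] if analysis_type else list(ANALYZERS)
    findings = [f for name in selected for f in ANALYZERS[name]()]
    if gene_symbol:
        findings = [f for f in findings if f.get("gene") == gene_symbol.upper()]
    return findings

# Name-to-callable registry, analogous in spirit to self.tool_functions.
tool_functions = {"run_ad_computational_analysis": run_ad_computational_analysis}

def dispatch_tool_call(name: str, tool_input: Dict[str, Any]) -> List[Finding]:
    """Look up a tool by name and call it with the model-supplied arguments."""
    return tool_functions[name](**tool_input)
```

Keeping the enum in the schema derived from the same dict that drives dispatch avoids the two drifting apart when an analysis is added.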
migrations/085_hypothesis_dataset_evidence.py): SQL table hypothesis_dataset_evidence: queryable provenance linking hypotheses to specific
compute_gene_biomarker_cross_correlation() in forge/computational_analysis.py:
compute_cross_domain_analysis() as a 4th cross_dataset key.
forge/provenance_tracker.py: _write_provenance_to_db() inserts rows into hypothesis_dataset_evidence after JSON file write; query_hypotheses_by_dataset() and query_hypotheses_by_gene() reverse-lookup helpers added.
compute_dataset_cross_correlation tool in agent.py: Theorist can now call compute_dataset_cross_correlation(gene_symbol="TREM2") to get cross-dataset linkages
forge/computational_analysis.py: 30 findings from 3 datasets; hypergeometric p-values
data_support_score: 0.600 with computational evidence, 0.000 without
COMPUTATIONAL EVIDENCE header + computational: citations present
claim (computational:dataset_name) confirmed
forge/provenance_tracker.py: record_hypothesis_provenance() + compute_provenance_coverage() available
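For reference, the stdlib-only statistics mentioned in these notes (a hypergeometric test and a Wilson score interval) can be written in a few lines. The sketch below is illustrative only and assumes Python 3.8+ for math.comb; the function names and signatures are not those of the private helpers in forge/computational_analysis.py.

```python
import math
from typing import Tuple


def hypergeometric_pmf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X = k) when drawing `draws` items without replacement from `population`
    items, of which `successes` are marked."""
    return (
        math.comb(successes, k)
        * math.comb(population - successes, draws - k)
        / math.comb(population, draws)
    )


def hypergeometric_pvalue(k: int, population: int, successes: int, draws: int) -> float:
    """One-sided enrichment p-value, P(X >= k)."""
    upper = min(successes, draws)  # largest achievable overlap
    return sum(
        hypergeometric_pmf(i, population, successes, draws)
        for i in range(max(k, 0), upper + 1)
    )


def wilson_ci(successes: int, trials: int, z: float = 1.959963984540054) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (default z gives ~95%)."""
    if trials == 0:
        return (0.0, 1.0)  # nothing observed, so the interval is uninformative
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return (max(0.0, center - half), min(1.0, center + half))
```

The Wilson interval is a common choice for success rates over a small number of trials because, unlike the normal approximation, it stays inside [0, 1] and remains usable when the observed rate is 0 or 1.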
check_evidence_gate missed Format 1c: dict with claim containing (computational:dataset) text was silently ignored, causing Theorist-style citations
post_process.py (8 lines).
{
  "_stall_skip_providers": [
    "minimax"
  ],
  "_stall_requeued_by": "minimax",
  "_stall_requeued_at": "2026-04-14 11:45:16",
  "completion_shas": [
    "439af56da",
    "5cac24083",
    "f6f090387",
    "2706ed8a6",
    "280fcb326"
  ],
  "completion_shas_checked_at": "2026-04-16T23:43:19.039968+00:00",
  "completion_shas_missing": [
    "dbbd0d5c5f1c9f5fc2bca2081b414d1120fe8833",
    "dfa1dd89ee0b07b3b5a9dc693f47b2b6ec4f3083",
    "9b64a72d4616317de3f4bb2ad970f4340e3ed010",
    "da4b84e68e4e615f0cfc3e8260d3de39c04b5c40",
    "15e6394744a34a5a89b6558cb91f7ff4d0f51f72",
    "d4bb108c961529777a0ecd01c15d3c9dab116328",
    "7dc1f519e4cb816a6b32b08e5a5bfc5999e91391",
    "58672c954b18b5c94e08d0fb36b7d4b1e6e08358",
    "333807f1a8e1b6d79cd6d95fcef7004742e9032e",
    "69cd90b08bb8fb1c371862724fcea34ed345dc32"
  ],
  "requirements": {
    "coding": 7,
    "reasoning": 7,
    "analysis": 8
  },
  "_stall_skip_at": {},
  "_stall_skip_pruned_at": "2026-04-14T10:37:14.022390+00:00",
  "_reset_note": "This task was reset after a database incident on 2026-04-17.\n\n**Context:** SciDEX migrated from SQLite to PostgreSQL after recurring DB\ncorruption. Some work done during Apr 16-17 may have been lost.\n\n**Before starting work:**\n1. Check if the task's goal is ALREADY satisfied (run the relevant checks)\n2. Check `git log --all --grep=task:YOUR_TASK_ID` for prior commits\n3. If complete, verify and mark done. If partial, continue. If not done, proceed.\n\n**DB change:** SciDEX now uses PostgreSQL. `get_db()` auto-detects via\nSCIDEX_DB_BACKEND=postgres env var.",
  "_reset_at": "2026-04-18T06:29:22.046013+00:00",
  "_reset_from_status": "done",
  "_watchdog_repair_task_id": "14126bdd-f965-4e6f-9bce-a07e89e8444a",
  "_watchdog_repair_created_at": "2026-04-20T23:51:53.727957+00:00"
}