SciDEX — Task: [Agora] Falsifiable prediction evaluation pipeline

SciDEX has 3,741 hypothesis_predictions: 3,719 pending, 9 confirmed, 5 falsified. Only 0.37% have been evaluated. Each prediction is a falsifiable claim tied to a hypothesis. Evaluating them against the literature demonstrates predictive validity, and with it the platform's scientific credibility. The infrastructure exists (status field, evidence_pmids). What is missing is the evaluation pipeline.

What to do:

1. Start with predictions from hypotheses with composite_score >= 0.8 (88 hypotheses, highest signal-to-noise).
2. Generate search terms per prediction and query PubMed via paper_cache.search_papers().
3. Use an LLM to assess evidence relevance and direction (supporting vs. contradicting).
4. Update hypothesis_predictions.status (confirmed/falsified) and add the evidence PMIDs.
5. Feed confirmed predictions back into the hypothesis evidence_validation_score.

Confidence threshold: only update status if evidence strength >= 0.75 (require 2+ independent PMIDs for confirmed).

Success per iteration: >= 50 predictions evaluated. Total target: >= 500.

Read first: docs/planning/specs/quest_agora_prediction_evaluation_pipeline.md
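The confidence threshold above can be sketched as a small gating function that sits between the LLM assessment (step 3) and the status update (step 4). This is only an illustration under stated assumptions, not the pipeline's actual code: the `Evidence` record, the `decide_status` name, and the 0.0 to 1.0 strength scale are hypothetical. It also assumes that "independent" means distinct PMIDs, that one strong contradicting PMID is enough to falsify (the task text only sets a count for confirmed), and that a prediction with strong evidence in both directions stays pending.

```python
from dataclasses import dataclass
from typing import Iterable

STRENGTH_THRESHOLD = 0.75   # from the task description
MIN_CONFIRMING_PMIDS = 2    # from the task description ("2+ independent PMIDs")


@dataclass(frozen=True)
class Evidence:
    """One LLM-assessed paper (hypothetical record shape)."""
    pmid: str
    direction: str   # "supporting" or "contradicting"
    strength: float  # assumed to be scored on a 0.0-1.0 scale


def decide_status(evidence: Iterable[Evidence]) -> tuple[str, list[str]]:
    """Return (new_status, evidence_pmids) for one prediction.

    Only evidence at or above the strength threshold is counted.
    Distinct PMIDs are treated as independent (an assumption).
    """
    strong = [e for e in evidence if e.strength >= STRENGTH_THRESHOLD]
    supporting = sorted({e.pmid for e in strong if e.direction == "supporting"})
    contradicting = sorted({e.pmid for e in strong if e.direction == "contradicting"})

    # Assumption: conflicting strong evidence is left for a later pass.
    if supporting and contradicting:
        return "pending", []
    if len(supporting) >= MIN_CONFIRMING_PMIDS:
        return "confirmed", supporting
    # Assumption: a single strong contradicting PMID falsifies.
    if contradicting:
        return "falsified", contradicting
    return "pending", []


if __name__ == "__main__":
    # Placeholder PMIDs, not real citations.
    sample = [
        Evidence("11111111", "supporting", 0.90),
        Evidence("22222222", "supporting", 0.80),
        Evidence("33333333", "supporting", 0.40),  # below threshold, ignored
    ]
    print(decide_status(sample))  # ('confirmed', ['11111111', '22222222'])
```

Keeping the gate as a pure function makes it easy to test without touching the database or PubMed. The caller would write the returned status and PMIDs to hypothesis_predictions only when the status is not "pending".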

Last Error

watchdog: worker lease expired; requeued

Git Commits (3)

[Agora] Prediction evaluator iter2: keyword fallback, shorter queries, looser open threshold [task:2c4b95b0-39d3-4ae8-b067-f9ae1241aec3] (#1343) 2026-04-30

[Agora] Falsifiable prediction evaluation pipeline — auto-score pending predictions against literature [task:2c4b95b0-39d3-4ae8-b067-f9ae1241aec3] (#1332) 2026-04-28

Squash merge: orchestra/task/80ffb77b-ambitious-quest-task-generator-xhigh-eff (2 commits) (#1292) 2026-04-28

Sibling Tasks in Quest (Agora) ↗

○ [Agora] CI: Trigger debates for analyses with 0 debate sessions (P94)

○ [Agora] Debate-to-hypothesis synthesis engine — mine 864 debate sessions for novel scientific hypotheses (P94)

○ [Agora] CI: Run debate quality scoring on new/unscored sessions (P93)

○ [Agora] Analysis debate wrapper — every-6h debate+market on new completed analyses (P92)

○ [Agora] Run debates for analyses without debate sessions (P91)

○ [Agora] Weekly debate snapshot (P82)

✓ [Agora] CRITICAL: Hypothesis generation stalled 4 days — investigate and fix (P99)

✓ [Agora] CRITICAL: Hypothesis QUALITY over quantity — reliable high-quality science loop (P99)

✓ [Agora] D16.2: SEA-AD Single-Cell Analysis - Allen Brain Cell Atlas (P98)

✓ [Agora] D16.3: Aging Mouse Brain Atlas Analysis (P97)

[Agora] Falsifiable prediction evaluation pipeline — auto-score 3,741 pending predictions against literature (status: open)
