[Agora] Falsifiable prediction evaluation pipeline — auto-score 3,741 pending predictions against literature
Status: running

SciDEX has 3,741 hypothesis_predictions: 3,719 pending, 9 confirmed, 5 falsified. Only 14 (0.37%) have been evaluated. Each prediction is a falsifiable claim tied to a hypothesis. Evaluating them against the literature demonstrates predictive validity, which underpins the platform's scientific credibility. The infrastructure exists (status field, evidence_pmids); the evaluation pipeline is missing.

What to do:
1. Start with predictions from hypotheses with composite_score >= 0.8 (88 hypotheses, highest signal-to-noise).
2. Generate search terms per prediction and query PubMed via paper_cache.search_papers().
3. Use an LLM to assess evidence relevance and direction (supporting vs. contradicting).
4. Update hypothesis_predictions.status (confirmed/falsified) and add the evidence PMIDs.
5. Feed confirmed predictions back into the hypothesis evidence_validation_score.

Confidence threshold: only update status if evidence strength >= 0.75 (require 2+ independent PMIDs for confirmed).
Success per iteration: >= 50 predictions evaluated. Total target: >= 500.
Read first: docs/planning/specs/quest_agora_prediction_evaluation_pipeline.md
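
The confidence threshold described above could be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the Evidence record, the decide_status helper, and its parameter names are all hypothetical. Two behaviours are assumptions that the task description does not specify: a single strong contradicting PMID is enough to mark a prediction falsified, and a prediction with strong evidence in both directions stays pending. "Independent" is approximated here as "distinct PMID".

```python
from dataclasses import dataclass
from typing import List, Literal, Tuple

Direction = Literal["supporting", "contradicting"]


@dataclass(frozen=True)
class Evidence:
    """Hypothetical record for one LLM-assessed paper."""
    pmid: str
    direction: Direction
    strength: float  # assumed to be in [0, 1]


def decide_status(
    evidence: List[Evidence],
    min_strength: float = 0.75,   # threshold from the task description
    min_confirm_pmids: int = 2,   # "2+ independent PMIDs for confirmed"
) -> Tuple[str, List[str]]:
    """Return (new_status, evidence_pmids) for one prediction."""
    # Ignore anything below the evidence-strength threshold.
    strong = [e for e in evidence if e.strength >= min_strength]
    # Deduplicate by PMID; distinct PMIDs stand in for "independent".
    supporting = sorted({e.pmid for e in strong if e.direction == "supporting"})
    contradicting = sorted({e.pmid for e in strong if e.direction == "contradicting"})

    # Assumption: conflicting strong evidence leaves the prediction pending.
    if supporting and contradicting:
        return "pending", []
    if len(supporting) >= min_confirm_pmids:
        return "confirmed", supporting
    # Assumption: one strong contradicting PMID is enough to falsify.
    if contradicting:
        return "falsified", contradicting
    return "pending", []
```

Under these assumptions, a prediction with only one strong supporting paper stays pending, so it can be picked up again in a later iteration once more literature has been retrieved.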

Last Error

cli-get-next: phantom running task — linked run 0c1a19b0-cb1 already terminal (abandoned); requeued immediately

Git Commits (5)

[Agora] Prediction evaluator iter2: keyword fallback, shorter queries, looser open threshold [task:2c4b95b0-39d3-4ae8-b067-f9ae1241aec3] 2026-04-28
[Agora] Falsifiable prediction evaluation pipeline — auto-score pending predictions against literature [task:2c4b95b0-39d3-4ae8-b067-f9ae1241aec3] (#1332) 2026-04-28
[Agora] Falsifiable prediction evaluation pipeline — auto-score pending predictions against literature [task:2c4b95b0-39d3-4ae8-b067-f9ae1241aec3] (#1332) 2026-04-28
[Agora] Falsifiable prediction evaluation pipeline — auto-score pending predictions against literature [task:2c4b95b0-39d3-4ae8-b067-f9ae1241aec3] (#1332) 2026-04-28
Squash merge: orchestra/task/80ffb77b-ambitious-quest-task-generator-xhigh-eff (2 commits) (#1292) 2026-04-28
