[Atlas] Spawn open questions from falsified hypotheses + market-resolved misses
Status: done

When a hypothesis is falsified or a market resolves against consensus, generate successor open_questions seeded at the parent hypothesis's Elo + 50.

Completion Notes

Auto-release: work already on origin/main

Git Commits (2)

Squash merge: orchestra/task/a4c450f7-biomni-analysis-parity-port-15-use-cases (87 commits) (#717) 2026-04-27
[Atlas] Spawn open questions from falsified hypotheses + market misses [task:bbe35802-07b1-4bcc-8b0f-bd0c33a2cf41] (#672) 2026-04-27
Spec File

Goal

When a hypothesis prediction is falsified or a prediction market resolves
against the consensus, the most valuable downstream artifact is the new
question that the falsification opens. ("If APOE4 → tau spread is wrong, what
DOES drive tau spread in the absence of APOE4?") Today, falsifications close
the hypothesis loop without spawning any new search frontier. This task wires
a falsification → open_question generator so every refuted prediction becomes
a ranked question for the field.

Acceptance Criteria

☐ New module scidex/agora/open_question_from_falsified.py (≤400 LoC).
☐ Triggers on two event sources:
- hypothesis_predictions.outcome_status flipping to
falsified / unsupported (poll table; no event_bus dependency).
- markets.resolution_status='resolved' with the resolution disagreeing
with the pre-resolution consensus probability by ≥0.4.
☐ For each trigger, calls scidex.core.llm.complete to ask a
domain-expert persona (selected from DOMAIN_JUDGES in
scidex/agora/open_question_tournament.py) to generate 1-3 successor
questions that the falsification specifically motivates. Each must
include a verbatim "what we now know" + "what we still don't know"
contrast.
☐ Emits open_question artifacts with:
- metadata.source_kind='falsified_prediction' or 'market_resolution'
- metadata.parent_hypothesis_id set
- metadata.field_tag inherited from the parent hypothesis
- metadata.importance_elo seeded at the parent hypothesis's prior
Elo + 50 (falsifications point at the most-bet-on questions, so they
deserve a head start in the per-field tournament).
☐ Cross-links via artifact_links: link_type='answered_by' (falsified
hypothesis → new question), and the new question gets a
link_type='succeeds' edge back.
☐ Acceptance backfill: process the existing falsified hypotheses
(SELECT id FROM hypotheses WHERE status='falsified') and emit ≥30
new open_question artifacts; assert in test.
☐ Idempotent: rerunning over the same falsification set creates 0 new
questions (dedup via question_hash).
☐ Pytest harness mocks the LLM and verifies link-graph shape, Elo seeding,
and dedup behavior.
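
The metadata and link-graph criteria above can be sketched as two small pure functions. This is a minimal illustration, not the module's actual code: `ParentHypothesis`, `build_question_artifact`, `build_links`, and the dictionary keys outside `metadata` (`artifact_type`, `title`, `source_id`, `target_id`) are hypothetical names chosen for the example. Only the `metadata.*` fields, the two `link_type` values, and the +50 offset come from the spec.

```python
from dataclasses import dataclass

# From the acceptance criteria: seed at the parent's prior Elo + 50.
ELO_HEAD_START = 50
ALLOWED_SOURCE_KINDS = ("falsified_prediction", "market_resolution")


@dataclass
class ParentHypothesis:
    """Hypothetical stand-in for a hypothesis row; field names are assumptions."""
    id: str
    field_tag: str
    elo: float


def build_question_artifact(parent, question_text, source_kind):
    """Return an open_question payload carrying the metadata the spec lists."""
    if source_kind not in ALLOWED_SOURCE_KINDS:
        raise ValueError(f"unknown source_kind: {source_kind!r}")
    return {
        "artifact_type": "open_question",  # assumed key name
        "title": question_text,            # assumed key name
        "metadata": {
            "source_kind": source_kind,
            "parent_hypothesis_id": parent.id,
            "field_tag": parent.field_tag,  # inherited from the parent
            "importance_elo": parent.elo + ELO_HEAD_START,
        },
    }


def build_links(parent_id, question_id):
    """Return the two edges: answered_by (parent -> question), succeeds (back)."""
    return [
        {"source_id": parent_id, "target_id": question_id, "link_type": "answered_by"},
        {"source_id": question_id, "target_id": parent_id, "link_type": "succeeds"},
    ]
```

Keeping these functions free of database access makes the Elo seeding and link shape easy to assert in a pytest harness with a mocked LLM.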

Approach

  • Inspect hypothesis_predictions schema in scidex/core/database.py and
    markets table in exchange.py for resolution columns.
  • Reuse persona selection logic from
    scidex/agora/open_question_tournament.py DOMAIN_JUDGES.
  • Run as a daily systemd timer (scidex-openq-falsified.timer); single-shot
    for backfill.
  • Write resulting question count + sample to
    data/scidex-artifacts/reports/openq_falsified_<utc>.json.
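
The report step in the last bullet might look like the sketch below. The spec only fixes the directory and the `openq_falsified_<utc>.json` naming pattern. The timestamp format (`%Y%m%dT%H%M%SZ`), the JSON keys, the `sample_size` default, and the function name `write_report` are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_report(questions, report_dir="data/scidex-artifacts/reports",
                 sample_size=5, now=None):
    """Write question count plus a small sample to openq_falsified_<utc>.json."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")  # assumed rendering of <utc>
    out_dir = Path(report_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"openq_falsified_{stamp}.json"
    payload = {
        "generated_at": now.isoformat(),
        "question_count": len(questions),
        "sample": list(questions[:sample_size]),
    }
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path
```

Passing `now` explicitly keeps the file name deterministic in tests, while the timer-driven run can rely on the default.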

    Dependencies

    • q-openq-mine-from-wiki-pages — shared dedup util
    • b2d85e76-51f3 — open_question schema
    • 47ee9103-ccc0 — Elo tournament reads seeded importance_elo

    Work Log

    2026-04-27 — Implementation (task:bbe35802)

    Schema findings:

    • hypothesis_predictions.status (not outcome_status) holds 'falsified'/'unsupported'. 5 falsified rows exist (test data).
    • hypotheses.status='falsified' — 0 rows currently; backfill handles both paths.
    • markets uses current_price + resolution_price; consensus fallback via metadata['consensus_probability'].
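
Given these columns, the market divergence gate (resolution disagreeing with pre-resolution consensus by ≥0.4) could be sketched as follows. The row is modeled as a plain dict, and the lookup order (use `current_price` when present, otherwise `metadata['consensus_probability']`) is an assumption about how the fallback is meant to work. `consensus_probability` and `is_market_miss` are hypothetical helper names.

```python
# From the acceptance criteria: divergence of at least 0.4 triggers a question.
DIVERGENCE_THRESHOLD = 0.4


def consensus_probability(market):
    """Pre-resolution consensus; the lookup order here is an assumption."""
    price = market.get("current_price")
    if price is not None:
        return float(price)
    fallback = (market.get("metadata") or {}).get("consensus_probability")
    return None if fallback is None else float(fallback)


def is_market_miss(market, threshold=DIVERGENCE_THRESHOLD):
    """True when a resolved market landed far from the prior consensus."""
    if market.get("resolution_status") != "resolved":
        return False
    consensus = consensus_probability(market)
    resolved = market.get("resolution_price")
    if consensus is None or resolved is None:
        return False  # not enough data to judge; skip rather than guess
    return abs(float(resolved) - consensus) >= threshold
```

Markets missing either value are skipped rather than treated as misses, so incomplete rows do not spawn questions.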
    Deliverables:
    • scidex/agora/open_question_from_falsified.py (378 LoC):
    - process_falsified_prediction / process_market_resolution per-source processors
    - run_poll(since_hours) / run_backfill(limit) batch runners
    - Idempotency: source_id + source_kind guard prevents re-processing
    - Dedup: SimHash question_hash from open_question_miner_wiki
    - Elo seeding: parent_hypothesis_elo + 50; answered_by + succeeds links
    - DOMAIN_JUDGES persona selection from open_question_tournament
    • tests/agora/test_open_question_from_falsified.py — 19 tests all pass:
    field-tag inference, heuristic stub, dedup, link-graph shape, idempotency,
    market divergence gate, backfill≥30 artifacts, Elo boost assertion
    • deploy/scidex-openq-falsified.{service,timer} — daily at 03:00 UTC
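
For readers unfamiliar with SimHash-based dedup, the idea behind `question_hash` can be illustrated with a generic sketch. This is not the shared util from open_question_miner_wiki: the tokenizer, the 64-bit BLAKE2b token hash, the Hamming-distance threshold of 3, and the function names are all assumptions chosen for the example.

```python
import hashlib
import re

HASH_BITS = 64  # fingerprint width; matches the 8-byte digest below


def simhash(text):
    """Compute a 64-bit SimHash over lowercase alphanumeric tokens."""
    weights = [0] * HASH_BITS
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(HASH_BITS):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def is_duplicate(text, seen_hashes, max_distance=3):
    """Treat a question as a repeat if any stored hash is within max_distance bits."""
    h = simhash(text)
    return any(hamming(h, s) <= max_distance for s in seen_hashes)
```

Because near-identical wordings map to nearby fingerprints, a rerun over the same falsification set would find every regenerated question already present, which is consistent with the 0-new-questions idempotency criterion.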

    Payload JSON
    {
      "completion_shas": [
        "15fa5de4c5dee93403dfb7df8cb37b372880e41f"
      ],
      "completion_shas_checked_at": ""
    }
