[Exchange] Re-score hypotheses on new evidence

Goal

When a new analysis references the target gene of an existing hypothesis, re-evaluate that hypothesis automatically. In post_process.py, after parsing, check whether any new hypothesis shares a target_gene with existing hypotheses. Use Claude to re-score each existing hypothesis given the new evidence, and insert a market_transactions row for each update.
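The overlap check described above can be sketched as a self-join on target_gene. This is a minimal illustration, not the code in post_process.py: the `id` and `analysis_id` column names, the function name `find_overlapping_hypotheses`, and the demo rows are assumptions, and an in-memory sqlite3 database is used only to keep the example self-contained.

```python
import sqlite3

def find_overlapping_hypotheses(db, analysis_id):
    """Return (new_id, existing_id, gene) triples where a hypothesis from
    `analysis_id` shares a non-empty target_gene with a hypothesis that
    belongs to a different analysis. Column names are assumed."""
    return db.execute(
        """
        SELECT n.id, e.id, n.target_gene
        FROM hypotheses AS n
        JOIN hypotheses AS e
          ON e.target_gene = n.target_gene
         AND e.analysis_id <> n.analysis_id
        WHERE n.analysis_id = ?
          AND n.target_gene IS NOT NULL
          AND n.target_gene <> ''
        ORDER BY e.id
        """,
        (analysis_id,),
    ).fetchall()

# Tiny in-memory demo with made-up rows.
demo_db = sqlite3.connect(":memory:")
demo_db.execute(
    "CREATE TABLE hypotheses (id TEXT PRIMARY KEY, analysis_id TEXT, target_gene TEXT)"
)
demo_db.executemany(
    "INSERT INTO hypotheses VALUES (?, ?, ?)",
    [
        ("h-old-1", "A1", "AQP4"),
        ("h-old-2", "A2", "APOE"),
        ("h-new-1", "A3", "AQP4"),
        ("h-new-2", "A3", ""),  # empty gene: must not match anything
    ],
)
overlaps = find_overlapping_hypotheses(demo_db, "A3")
print(overlaps)
```

Excluding rows from the same analysis keeps a new hypothesis from triggering a re-score of its own siblings.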

Acceptance Criteria

☑ market_transactions has rows.
☑ Hypothesis prices change as new analyses run.

Approach

  • Read AGENTS.md and relevant source files
  • Understand existing code patterns before modifying
  • Implement changes following existing conventions (f-string HTML, SQLite, Bedrock Claude)
  • Test: curl affected pages, verify rendering, run scidex status
  • Commit atomically with descriptive message

Work Log

    2026-04-25 22:30 PT — Slot 0

    • Re-validated the task against current origin/main state instead of trusting the older completion log.
    • Confirmed that historical score_update rows still exist in PostgreSQL, but reproduced a failure in the current post_process.py code path: get_db() still calls get_db_write(DB_PATH), where DB_PATH is the retired .sqlite-retired path, so the rescore cannot run.
    • Identified a second runtime defect in the rescore path: log_edit() references before_state without defining it locally.
    • Plan: route post_process.get_db() through the shared default PostgreSQL helper, define the pre-update state for journaling, and verify a controlled rescore writes a fresh market_transactions score_update row and updates hypothesis price.

    2026-04-25 22:15 PT — Slot 0 — Verification

    • Patched post_process.get_db() to use the shared PostgreSQL default instead of the retired .sqlite-retired path.
    • Patched save_hypothesis_version() for the current PostgreSQL hypothesis_versions schema and explicit id allocation so version snapshots no longer fail on stale sequence state.
    • Added before_state initialization and per-hypothesis db.rollback() on errors in rescore_overlapping_hypotheses() so one bad candidate does not abort the rest of the batch.
    • A deterministic live verification run against SDA-2026-04-26-gap-pubmed-20260410-181340-8acb24dc-debate, with the Claude call stubbed, returned rescored=6 and error_lines=0, and increased the number of market_transactions rows with action='score_update' from 53366 to 53372.
    • Latest verification write: hypothesis h-15b0ade220, old_price=0.57, new_price=0.57, created at 2026-04-25 22:14:03 PT.
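The two fixes noted above (capturing the pre-update state before it is journaled, and rolling back per hypothesis so one bad candidate does not abort the batch) can be sketched as follows. This is a hedged illustration, not the patched rescore_overlapping_hypotheses(): the function name `rescore_batch`, the table columns, the price CHECK constraint, and the stubbed scorer are all assumptions, and sqlite3 stands in for the PostgreSQL connection only so the example runs on its own.

```python
import sqlite3

def rescore_batch(db, candidate_ids, rescore_one):
    """Re-score each candidate in its own transaction.
    Returns (rescored_count, [(hypothesis_id, error_message), ...])."""
    rescored, errors = 0, []
    for hyp_id in candidate_ids:
        try:
            row = db.execute(
                "SELECT market_price FROM hypotheses WHERE id = ?", (hyp_id,)
            ).fetchone()
            if row is None:
                raise LookupError(f"unknown hypothesis {hyp_id}")
            # Capture the pre-update state before anything is modified,
            # so the journal/transaction row can reference it.
            before_state = {"market_price": row[0]}
            new_price = rescore_one(hyp_id, before_state)
            db.execute(
                "UPDATE hypotheses SET market_price = ? WHERE id = ?",
                (new_price, hyp_id),
            )
            db.execute(
                "INSERT INTO market_transactions "
                "(hypothesis_id, action, old_price, new_price) "
                "VALUES (?, 'score_update', ?, ?)",
                (hyp_id, before_state["market_price"], new_price),
            )
            db.commit()
            rescored += 1
        except Exception as exc:
            # Undo only this candidate's partial writes, then continue.
            db.rollback()
            errors.append((hyp_id, str(exc)))
    return rescored, errors

# Demo schema and rows are made up for illustration.
demo_db = sqlite3.connect(":memory:")
demo_db.executescript(
    """
    CREATE TABLE hypotheses (id TEXT PRIMARY KEY, market_price REAL);
    CREATE TABLE market_transactions (
        id INTEGER PRIMARY KEY,
        hypothesis_id TEXT,
        action TEXT,
        old_price REAL,
        new_price REAL CHECK (new_price BETWEEN 0 AND 1)
    );
    """
)
demo_db.executemany(
    "INSERT INTO hypotheses VALUES (?, ?)",
    [("h-1", 0.5), ("h-bad", 0.5), ("h-2", 0.5)],
)
demo_db.commit()

# Stub standing in for the Claude call; 1.5 violates the demo CHECK constraint.
stub_prices = {"h-1": 0.6, "h-bad": 1.5, "h-2": 0.38}
rescored, errors = rescore_batch(
    demo_db, ["h-1", "h-bad", "h-2"], lambda hyp_id, before: stub_prices[hyp_id]
)
bad_price = demo_db.execute(
    "SELECT market_price FROM hypotheses WHERE id = 'h-bad'"
).fetchone()[0]
tx_count = demo_db.execute("SELECT COUNT(*) FROM market_transactions").fetchone()[0]
```

In the demo, the failing candidate's UPDATE is undone by the rollback, while the two valid candidates are committed independently.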

    2026-04-01 19:50 PT — Slot 0 — COMPLETE

    • Analysis Phase:
    - Read AGENTS.md, post_process.py, agent.py for architecture understanding
    - Inspected database schema: hypotheses table has target_gene field
    - Confirmed market_transactions table exists with CHECK constraint on action field
    - Found multiple overlapping target genes (APOE: 5 hypotheses, AQP4: 3, etc.)
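A CHECK constraint on the action field means an insert with an unlisted action value is rejected by the database. The sketch below illustrates that behavior under assumptions: only 'score_update' is named in this spec, so the other allowed values ('buy', 'sell'), the column layout, and the helper `record_transaction` are hypothetical, and sqlite3 is used purely to keep the example runnable.

```python
import sqlite3

tx_db = sqlite3.connect(":memory:")
tx_db.execute(
    """
    CREATE TABLE market_transactions (
        id INTEGER PRIMARY KEY,
        hypothesis_id TEXT NOT NULL,
        -- Allowed values other than 'score_update' are placeholders.
        action TEXT NOT NULL CHECK (action IN ('buy', 'sell', 'score_update')),
        old_price REAL,
        new_price REAL
    )
    """
)

def record_transaction(db, hypothesis_id, action, old_price, new_price):
    """Insert one row; return True on success, False if a constraint rejects it."""
    try:
        with db:  # commits on success, rolls back if the insert raises
            db.execute(
                "INSERT INTO market_transactions "
                "(hypothesis_id, action, old_price, new_price) VALUES (?, ?, ?, ?)",
                (hypothesis_id, action, old_price, new_price),
            )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = record_transaction(tx_db, "h-demo", "score_update", 0.42, 0.60)
rejected = record_transaction(tx_db, "h-demo", "rescore", 0.42, 0.60)
row_count = tx_db.execute("SELECT COUNT(*) FROM market_transactions").fetchone()[0]
```

Checking the constraint's allowed values before introducing a new action name avoids a silent failure mode where the score is updated but no transaction row is written.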

    • Implementation:
    - Added import anthropic to post_process.py
    - Created rescore_overlapping_hypotheses(analysis_id) function:
      - Queries new hypotheses with non-empty target_gene fields
      - For each, finds existing hypotheses (from different analyses) with same gene
      - Calls Claude Bedrock API to re-evaluate existing hypothesis with new evidence
      - Parses 10-dimension scores from Claude's JSON response
      - Computes new composite_score using same formula as initial scoring
      - Updates hypothesis scores and market_price in database
      - Inserts market_transactions row with action='score_update'
    - Integrated into main() workflow as Step 2 (after parsing, before enrichment)
    - Processes last 50 analyses to catch any recent overlaps
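The parse-and-score steps listed above might look roughly like the sketch below. It rests on loud assumptions: the spec gives neither the ten dimension names nor the composite formula, so the keys `dimension_1` through `dimension_10`, the [0, 1] score range, and the unweighted mean in `composite_score` are placeholders rather than the formula used by the initial scoring.

```python
import json

EXPECTED_DIMENSIONS = 10  # count taken from the work log; names are not given

def parse_dimension_scores(response_text):
    """Extract a {dimension: score} dict from a model reply that contains
    one JSON object, possibly surrounded by prose."""
    start, end = response_text.find("{"), response_text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in response")
    scores = json.loads(response_text[start:end + 1])
    if len(scores) != EXPECTED_DIMENSIONS:
        raise ValueError(f"expected {EXPECTED_DIMENSIONS} scores, got {len(scores)}")
    for name, value in scores.items():
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        # The [0, 1] range is an assumption for this sketch.
        if not is_number or not 0.0 <= value <= 1.0:
            raise ValueError(f"invalid score for {name!r}: {value!r}")
    return scores

def composite_score(scores):
    """PLACEHOLDER formula: unweighted mean of the dimension scores.
    The real formula is whatever the initial scoring uses."""
    return sum(scores.values()) / len(scores)

# Made-up reply: prose followed by a JSON object with ten scores.
demo_reply = "Here are the updated scores:\n" + json.dumps(
    {f"dimension_{i}": i / 10 for i in range(1, 11)}
)
demo_scores = parse_dimension_scores(demo_reply)
demo_composite = round(composite_score(demo_scores), 2)
print(demo_composite)
```

Validating the reply before computing a composite keeps a malformed model response from writing a bad price; in a batch, the resulting ValueError would be handled per hypothesis.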

    • Testing:
    - Installed anthropic package (pip3 install --break-system-packages)
    - Tested with SDA-2026-04-01-gap-008: found AQP4 overlaps, re-scored 2 hypotheses
    - Tested with SDA-2026-04-01-gap-013: re-scored 2 AQP4 hypotheses successfully
    - Verified market_transactions table populated with score_update actions
    - Confirmed hypothesis prices changed: e.g., $0.42 → $0.60, $0.41 → $0.38
    - WAL mode working correctly (transactions persist after checkpoint)

    • Result: ✅ DONE
    - ✅ market_transactions has rows
    - ✅ Hypothesis prices change as new analyses run
    - Re-scoring triggers automatically when post_process.py runs
    - Claude evaluates impact of new evidence on existing hypotheses
    - Market reflects updated confidence based on accumulating evidence

    File: 78da8351_c5a_re_score_hypotheses_spec.md
    Modified: 2026-04-25 23:40
    Size: 4.6 KB