[Exchange] Re-score hypotheses on new evidence

Goal

When a new analysis references the target gene of an existing hypothesis, re-evaluate that hypothesis automatically. In post_process.py, after parsing, check each new hypothesis for a target_gene that overlaps with existing hypotheses, use Claude to re-score the existing hypothesis in light of the new evidence, and insert a market_transactions row recording the change.

Acceptance Criteria

☑ market_transactions has rows.

☑ Hypothesis prices change as new analyses run.

Approach

Read AGENTS.md and relevant source files

Understand existing code patterns before modifying

Implement changes following existing conventions (f-string HTML, SQLite, Bedrock Claude)

Test: curl affected pages, verify rendering, run scidex status

Commit atomically with descriptive message

Work Log

2026-04-25 22:30 PT — Slot 0

Re-validated the task against current origin/main state instead of trusting the older completion log.
Confirmed that historical score_update rows still exist in PostgreSQL, but reproduced a failure in the current post_process.py code path: get_db() still calls get_db_write(DB_PATH), and DB_PATH points at the retired .sqlite-retired path, so the rescore step cannot run.
Identified a second runtime defect in the rescore path: log_edit() references before_state without defining it locally.
Plan: route post_process.get_db() through the shared default PostgreSQL helper, define the pre-update state for journaling, and verify a controlled rescore writes a fresh market_transactions score_update row and updates hypothesis price.

2026-04-25 22:15 PT — Slot 0 — Verification

Patched post_process.get_db() to use the shared PostgreSQL default instead of the retired .sqlite-retired path.
Patched save_hypothesis_version() for the current PostgreSQL hypothesis_versions schema and explicit id allocation so version snapshots no longer fail on stale sequence state.
Added before_state initialization and per-hypothesis db.rollback() on errors in rescore_overlapping_hypotheses() so one bad candidate does not abort the rest of the batch.
Deterministic live verification against SDA-2026-04-26-gap-pubmed-20260410-181340-8acb24dc-debate with the Claude call stubbed returned rescored=6, error_lines=0, and increased market_transactions.action='score_update' rows from 53366 to 53372.
Latest verification write: hypothesis h-15b0ade220, old_price=0.57, new_price=0.57, created at 2026-04-25 22:14:03 PT.
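
The per-hypothesis rollback and the before_state initialization described above can be sketched as follows. This is a minimal illustration, not the code in post_process.py: it uses an in-memory sqlite3 database as a stand-in for PostgreSQL so that it runs on its own, and the table layout, the hypothesis_id column, and the rescore_batch and demo names are assumptions made for the example.

```python
import sqlite3


def rescore_batch(db, candidates, score_fn):
    """Re-score each candidate in its own transaction.

    Returns (rescored_count, error_lines). A failure on one candidate is
    rolled back and recorded, then the loop moves on to the next one.
    """
    rescored, errors = 0, []
    for hyp_id in candidates:
        try:
            row = db.execute(
                "SELECT market_price FROM hypotheses WHERE id = ?", (hyp_id,)
            ).fetchone()
            if row is None:
                raise LookupError("unknown hypothesis")
            # Capture the pre-update state before any write, so the
            # journaling step never references an undefined name.
            before_state = {"market_price": row[0]}
            new_price = score_fn(hyp_id)
            db.execute(
                "UPDATE hypotheses SET market_price = ? WHERE id = ?",
                (new_price, hyp_id),
            )
            db.execute(
                "INSERT INTO market_transactions "
                "(hypothesis_id, action, old_price, new_price) "
                "VALUES (?, 'score_update', ?, ?)",
                (hyp_id, before_state["market_price"], new_price),
            )
            db.commit()
            rescored += 1
        except Exception as exc:
            # Undo the partial write for this candidate only.
            db.rollback()
            errors.append(f"{hyp_id}: {exc}")
    return rescored, errors


def demo():
    db = sqlite3.connect(":memory:")
    # Hypothetical schema, reduced to the columns the sketch needs.
    db.executescript(
        """
        CREATE TABLE hypotheses (id TEXT PRIMARY KEY, market_price REAL);
        CREATE TABLE market_transactions (
            id INTEGER PRIMARY KEY,
            hypothesis_id TEXT NOT NULL,
            action TEXT NOT NULL,
            old_price REAL NOT NULL,
            new_price REAL NOT NULL
        );
        INSERT INTO hypotheses VALUES ('h-1', 0.57), ('h-2', 0.40), ('h-3', 0.30);
        """
    )
    # Stubbed scorer: 'h-2' yields no price, so its INSERT violates NOT NULL
    # after the UPDATE has already run, which exercises the rollback.
    stub_prices = {"h-1": 0.61, "h-2": None, "h-3": 0.35}
    rescored, errors = rescore_batch(db, ["h-1", "h-2", "h-3"], stub_prices.get)
    prices = dict(db.execute("SELECT id, market_price FROM hypotheses"))
    tx_count = db.execute("SELECT COUNT(*) FROM market_transactions").fetchone()[0]
    return rescored, errors, prices, tx_count
```

Committing per candidate keeps the rollback scope small: a failed candidate loses only its own UPDATE, and the rows written for earlier candidates stay in place.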

2026-04-01 19:50 PT — Slot 0 — COMPLETE

Analysis Phase:

- Read AGENTS.md, post_process.py, agent.py for architecture understanding
- Inspected database schema: hypotheses table has target_gene field
- Confirmed market_transactions table exists with CHECK constraint on action field
- Found multiple overlapping target genes (APOE: 5 hypotheses, AQP4: 3, etc.)

Implementation:

- Added import anthropic to post_process.py
- Created rescore_overlapping_hypotheses(analysis_id) function:
  - Queries new hypotheses with non-empty target_gene fields
  - For each, finds existing hypotheses (from different analyses) with same gene
  - Calls Claude Bedrock API to re-evaluate existing hypothesis with new evidence
  - Parses 10-dimension scores from Claude's JSON response
  - Computes new composite_score using same formula as initial scoring
  - Updates hypothesis scores and market_price in database
  - Inserts market_transactions row with action='score_update'
- Integrated into main() workflow as Step 2 (after parsing, before enrichment)
- Processes last 50 analyses to catch any recent overlaps
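
The flow listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual rescore_overlapping_hypotheses() implementation: the Claude call is replaced by a stub, the ten dimension names are placeholders, the unweighted mean stands in for the real composite formula (which this spec does not state), and the schema, including the values allowed by the CHECK constraint, is hypothetical.

```python
import json
import sqlite3

# Placeholder names; the real ten scoring dimensions are not listed in this spec.
DIMENSIONS = [f"dim_{i}" for i in range(1, 11)]


def composite_from_response(text):
    """Parse a JSON object of per-dimension scores and average them.

    The unweighted mean is an assumption, not the project's formula.
    """
    scores = json.loads(text)
    values = [float(scores[name]) for name in DIMENSIONS]
    return round(sum(values) / len(values), 4)


def find_overlaps(db, analysis_id):
    """Existing hypotheses from other analyses that share a gene with the new one."""
    return db.execute(
        """
        SELECT DISTINCT old.id, old.market_price, old.target_gene
        FROM hypotheses AS new
        JOIN hypotheses AS old
          ON old.target_gene = new.target_gene
         AND old.analysis_id <> new.analysis_id
        WHERE new.analysis_id = ?
          AND new.target_gene IS NOT NULL
          AND new.target_gene <> ''
        ORDER BY old.id
        """,
        (analysis_id,),
    ).fetchall()


def rescore(db, analysis_id, score_fn):
    """Re-score each overlapping hypothesis and log a score_update row."""
    count = 0
    for hyp_id, old_price, gene in find_overlaps(db, analysis_id):
        new_score = composite_from_response(score_fn(hyp_id, gene))
        db.execute(
            "UPDATE hypotheses SET composite_score = ?, market_price = ? WHERE id = ?",
            (new_score, new_score, hyp_id),
        )
        db.execute(
            "INSERT INTO market_transactions "
            "(hypothesis_id, action, old_price, new_price) "
            "VALUES (?, 'score_update', ?, ?)",
            (hyp_id, old_price, new_score),
        )
        count += 1
    db.commit()
    return count


def demo():
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE hypotheses (
            id TEXT PRIMARY KEY, analysis_id TEXT, target_gene TEXT,
            composite_score REAL, market_price REAL
        );
        CREATE TABLE market_transactions (
            id INTEGER PRIMARY KEY, hypothesis_id TEXT,
            -- Allowed values other than 'score_update' are made up here.
            action TEXT CHECK (action IN ('buy', 'sell', 'score_update')),
            old_price REAL, new_price REAL
        );
        INSERT INTO hypotheses VALUES
            ('h-old-1', 'A1', 'AQP4', 0.42, 0.42),
            ('h-old-2', 'A1', 'APOE', 0.50, 0.50),
            ('h-new-1', 'A2', 'AQP4', 0.55, 0.55);
        """
    )
    # Stub standing in for the Claude call: every dimension scores 0.6.
    stub = lambda hyp_id, gene: json.dumps({name: 0.6 for name in DIMENSIONS})
    rescored = rescore(db, "A2", stub)
    rows = db.execute(
        "SELECT hypothesis_id, action, old_price, new_price FROM market_transactions"
    ).fetchall()
    return rescored, rows
```

In the demo, only h-old-1 shares a gene (AQP4) with the new analysis A2, so it is the only hypothesis re-scored; h-old-2 targets APOE and is left untouched.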

Testing:

- Installed anthropic package (pip3 install --break-system-packages)
- Tested with SDA-2026-04-01-gap-008: found AQP4 overlaps, re-scored 2 hypotheses
- Tested with SDA-2026-04-01-gap-013: re-scored 2 AQP4 hypotheses successfully
- Verified market_transactions table populated with score_update actions
- Confirmed hypothesis prices changed: e.g., $0.42 → $0.60, $0.41 → $0.38
- WAL mode working correctly (transactions persist after checkpoint)

Result: ✅ DONE

- ✅ market_transactions has rows
- ✅ Hypothesis prices change as new analyses run
- Re-scoring triggers automatically when post_process.py runs
- Claude evaluates impact of new evidence on existing hypotheses
- Market reflects updated confidence based on accumulating evidence


File: 78da8351_c5a_re_score_hypotheses_spec.md

Modified: 2026-04-25 23:40

Size: 4.6 KB