[Senate] Capture belief snapshots for 30 hypotheses with stale confidence data done

Many active hypotheses have confidence data that hasn't been snapshotted in weeks, making it impossible to track whether beliefs are converging or diverging.

Verification:

  • 30 hypotheses gain belief_snapshots rows within the last 7 days
  • Each snapshot captures composite_score, evidence_count, market_price, and confidence_score from live DB state
  • No two snapshots from the same hypothesis on the same day are created

Start by selecting active hypotheses from PostgreSQL (dbname=scidex user=scidex_app) where no belief_snapshots row exists with snapshot_date >= CURRENT_DATE - 7. For each, compute current score and evidence counts from the hypotheses table and linked evidence. Insert belief_snapshots rows idempotently and verify counts.
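The selection step described above can be sketched in plain Python. This is a minimal illustration over in-memory records, not the contents of scripts/backfill_belief_snapshots.py; the function name select_stale, the dict layout, and the sample IDs are hypothetical.

```python
from datetime import date, timedelta


def select_stale(hypotheses, snapshots, today, window_days=7,
                 excluded_statuses=("archived",), limit=30):
    """Return ids of hypotheses with no snapshot on or after today - window_days."""
    cutoff = today - timedelta(days=window_days)
    # Hypotheses that already have a snapshot inside the window.
    fresh = {s["hypothesis_id"] for s in snapshots if s["snapshot_date"] >= cutoff}
    stale = [
        h["id"] for h in hypotheses
        if h["status"] not in excluded_statuses and h["id"] not in fresh
    ]
    return stale[:limit]


# Hypothetical sample data, for illustration only.
hypotheses = [
    {"id": "h-1", "status": "proposed"},
    {"id": "h-2", "status": "debated"},
    {"id": "h-3", "status": "archived"},
    {"id": "h-4", "status": "promoted"},
]
snapshots = [
    {"hypothesis_id": "h-1", "snapshot_date": date(2026, 4, 23)},  # inside the window
    {"hypothesis_id": "h-2", "snapshot_date": date(2026, 4, 1)},   # older than 7 days
]

stale_ids = select_stale(hypotheses, snapshots, today=date(2026, 4, 24))
print(stale_ids)  # ['h-2', 'h-4']
```

Passing the excluded statuses as a parameter keeps the sketch independent of which status values the live data actually uses.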

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (1)

Squash merge: orchestra/task/0c46779c-capture-belief-snapshots-for-30-hypothes (3 commits) 2026-04-24
Spec File

Goal

Backfill recent belief snapshots for hypotheses that lack current time-series state. Belief snapshots let SciDEX measure convergence, evidence accumulation, and confidence changes over time.

Acceptance Criteria

☑ A concrete batch of hypotheses receives current belief_snapshots rows
☑ Snapshots include score, evidence count, citation count, and reviewer count from live state
☑ Same-day duplicate snapshots are avoided
☑ Before/after missing-recent-snapshot counts are recorded

Approach

  • Select active hypotheses without a belief snapshot in the past seven days.
  • Compute score, evidence_count, citations, and reviewer_count from existing hypothesis and review state.
  • Insert idempotent belief_snapshots rows using the standard PostgreSQL connection.
  • Verify snapshot rows and remaining backlog.
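As a rough sketch of the "compute" step, the function below derives one snapshot row from a hypothesis record. The field mapping follows the work log (score from composite_score, evidence_count from the two evidence arrays, citations from citations_count), and the 0.5 fallback mirrors the COALESCE in the logged SQL. The name build_snapshot_row and the dict layout are assumptions, not the script's actual code.

```python
def build_snapshot_row(hypothesis, reviewer_count, snapshot_date):
    """Map a hypothesis record onto the belief_snapshots columns."""
    evidence_for = hypothesis.get("evidence_for") or []
    evidence_against = hypothesis.get("evidence_against") or []
    score = hypothesis.get("composite_score")
    return {
        "hypothesis_id": hypothesis["id"],
        # Fall back to 0.5 when no composite score exists yet.
        "score": 0.5 if score is None else score,
        "evidence_count": len(evidence_for) + len(evidence_against),
        "citations": hypothesis.get("citations_count") or 0,
        "reviewer_count": reviewer_count,
        "snapshot_date": snapshot_date,
    }


# Hypothetical record, for illustration only.
example = {
    "id": "h-example",
    "composite_score": None,
    "evidence_for": ["e1", "e2"],
    "evidence_against": None,
    "citations_count": 3,
}
row = build_snapshot_row(example, reviewer_count=1, snapshot_date="2026-04-24")
print(row["score"], row["evidence_count"], row["citations"])  # 0.5 2 3
```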
Dependencies

    • q-epistemic-rigor - Epistemic Rigor quest

    Dependents

    • Epistemic rigor metrics, convergence dashboards, and Senate retrospectives

    Work Log

    2026-04-26 17:09 UTC — task:eb53fe78

    • Before: 61 non-archived hypotheses missing snapshots in the last 7 days
    • Ran python3 scripts/backfill_belief_snapshots.py --limit 50
    • Inserted 50 snapshot rows with composite_score, confidence_score, market_price, evidence_count, citations, reviewer_count all populated
    • After: 11 non-archived hypotheses missing snapshots (reduction: 50)
    • Same-day duplicate hypotheses: 6 (from earlier overlapping runs)
    • 50 new belief_snapshots rows confirmed on 2026-04-26
    • Sample: SDA-2026-04-02-gap-tau-prop-20260402003221-H006 (proposed, score=0.547), h-var-71ac892791 (proposed, score=0.649), h-6be901fb (promoted, score=0.560)
    • Acceptance criteria: 50 hypothesis snapshots ✓ | score/evidence/citations/reviewer captured ✓ | remaining missing = 11 ✓

    2026-04-28 08:35 UTC — task:b3ad731f

    • Before: 252 non-archived hypotheses missing snapshots in the last 7 days
    • Ran python3 scripts/backfill_belief_snapshots.py --limit 50
    • Inserted 50 snapshot rows with composite_score, confidence_score, market_price, evidence_count, citations, reviewer_count all populated
    • After: 202 non-archived hypotheses missing snapshots (reduction: 50)
    • Same-day duplicate hypotheses: 0
    • 50 new belief_snapshots rows confirmed on 2026-04-28
    • Sample: hyp-SDA-2026-04-08-gap-pubmed-20260406-062207-b800e5d3-3 (active, score=0.455, evidence=6, citations=5, reviewers=0), h-ab1c104108 (proposed, score=0.380, evidence=7), hyp-lyso-snca-548064db6357 (active, score=0.733, evidence=6)
    • Acceptance criteria: 50 hypothesis snapshots ✓ | score/evidence/citations/reviewer captured ✓ | remaining missing = 202 (well under 328 target) ✓
    • Before: 302 non-archived hypotheses missing snapshots in the last 7 days
    • Ran python3 scripts/backfill_belief_snapshots.py --limit 50
    • Inserted 50 snapshot rows with composite_score, confidence_score, market_price, evidence_count, citations, reviewer_count all populated
    • After: 252 non-archived hypotheses missing snapshots (reduction: 50)
    • Same-day duplicate hypotheses: 0
    • 50 new belief_snapshots rows confirmed on 2026-04-28
    • Sample: h-bb29eefbe7 (proposed, score=0.626), h-509c8f986c (proposed, score=0.612), hyp-SDA-2026-04-09-gap-debate-... (active, score=0.455, evidence=6, citations=5)
    • Acceptance criteria: 50 hypothesis snapshots ✓ | score/evidence/citations/reviewer captured ✓ | remaining missing = 252 (well under 345 target) ✓

    2026-04-27 00:57 UTC — task:998ab561

    • Before: 161 non-archived hypotheses missing snapshots in the last 7 days
    • Ran python3 scripts/backfill_belief_snapshots.py --limit 50
    • Inserted 50 snapshot rows with composite_score, confidence_score, market_price, evidence_count, citations, reviewer_count all populated
    • After: 111 non-archived hypotheses missing snapshots (reduction: 50)
    • Same-day duplicate hypotheses: 6 (from earlier overlapping runs; NOTIFY quest engine to tighten window)
    • 50 new belief_snapshots rows confirmed on 2026-04-27
    • Sample: h-72c719461c (proposed, score=0.720), h-44b1c9d415 (proposed, score=0.817), h-377232dcc8 (proposed, score=0.640)
    • Acceptance criteria: 50 hypothesis snapshots ✓ | score/evidence/citations/reviewer captured ✓ | remaining missing = 111 (well under 235 target) ✓

    2026-04-26 15:17 UTC — task:770ac149 (third cycle)

    • Before: 104 non-archived hypotheses missing snapshots in the last 7 days
    • Ran python3 scripts/backfill_belief_snapshots.py --limit 30
    • Inserted 30 snapshot rows with composite_score, confidence_score, market_price, evidence_count, citations, reviewer_count all populated
    • After: 74 non-archived hypotheses missing snapshots (reduction: 30)
    • Same-day duplicates: 0
    • 30 new belief_snapshots rows confirmed on 2026-04-26
    • Sample: h-346639e8 (debated, score=0.621), h-6b394be1 (proposed, score=0.619), h-8f9633d9 (debated, score=0.616)

    2026-04-26 22:14 UTC — task:770ac149 (second cycle)

    • Improved script: INSERT now populates composite_score, confidence_score, market_price columns (previously left NULL)
    • Before: 194 non-archived hypotheses missing snapshots in the last 7 days
    • Ran python3 scripts/backfill_belief_snapshots.py --limit 30
    • Inserted 30 snapshot rows with composite_score, confidence_score, market_price, evidence_count, citations, reviewer_count all populated
    • After: 164 non-archived hypotheses missing snapshots (reduction: 30)
    • Same-day duplicates: 0
    • 30 new belief_snapshots rows confirmed on 2026-04-26

    2026-04-26 15:09 UTC — task:770ac149

    • Before: 223 non-archived hypotheses missing snapshots in the last 7 days
    • Ran python3 scripts/backfill_belief_snapshots.py --limit 30
    • Inserted 30 snapshot rows with score=composite_score, market_price and confidence_score logged for audit, evidence_count=len(evidence_for)+len(evidence_against), citations=citations_count, reviewer_count=hypothesis_reviews count
    • After: 193 non-archived hypotheses missing snapshots (reduction: 30)
    • Same-day duplicates: 0
    • 30 new belief_snapshots rows confirmed on 2026-04-26

    2026-04-26 14:51 UTC — task:e2be8805

    • Before: 373 non-archived hypotheses missing snapshots in the last 7 days
    • Ran python3 scripts/backfill_belief_snapshots.py --limit 30
    • Inserted 30 snapshot rows with score=composite_score, market_price and confidence_score logged for audit, evidence_count=len(evidence_for)+len(evidence_against), citations=citations_count, reviewer_count=hypothesis_reviews count
    • After: 343 non-archived hypotheses missing snapshots (reduction: 30)
    • Same-day duplicates: 0
    • 30 new belief_snapshots rows confirmed on 2026-04-26
    • Verification: 30 rows inserted today, 0 duplicate hypotheses; sampled rows confirm score=composite_score, market_price and confidence_score logged

    2026-04-26 14:36 UTC — task:87b53b03

    • Before: 453 non-archived hypotheses missing snapshots in the last 7 days
    • Ran python3 scripts/backfill_belief_snapshots.py --limit 50
    • Inserted 50 snapshot rows with score=composite_score, evidence_count=len(evidence_for)+len(evidence_against), citations=citations_count, reviewer_count=hypothesis_reviews count
    • After: 403 non-archived hypotheses missing snapshots (reduction: 50)
    • Same-day duplicates: 0
    • 50 new belief_snapshots rows confirmed on 2026-04-26

    2026-04-26 12:42 UTC — task:e33e3af2 (second cycle)

    • Before: 534 non-archived hypotheses missing snapshots in the last 7 days
    • Ran python3 scripts/backfill_belief_snapshots.py --limit 50
    • Inserted 50 snapshot rows with score=composite_score, evidence_count=len(evidence_for)+len(evidence_against), citations=citations_count, reviewer_count=hypothesis_reviews count
    • After: 484 non-archived hypotheses missing snapshots (reduction: 50)
    • Same-day duplicates: 0
    • 50 new belief_snapshots rows confirmed on 2026-04-26

    2026-04-26 12:40 UTC — task:e33e3af2

    • Improved script: log all inserted hypothesis IDs (was first 5 only); fix dry-run after-count to show simulated reduction
    • Before: 584 non-archived hypotheses missing snapshots in the last 7 days
    • Ran python3 scripts/backfill_belief_snapshots.py --limit 50
    • Inserted 50 snapshot rows with score=composite_score, evidence_count=len(evidence_for)+len(evidence_against), citations=citations_count, reviewer_count=hypothesis_reviews count
    • After: 534 non-archived hypotheses missing snapshots (reduction: 50)
    • Same-day duplicates: 0
    • 50 new belief_snapshots rows confirmed on 2026-04-26

    2026-04-26 09:26 UTC — task:0ebc6ca1

    • Before: 694 non-archived hypotheses missing snapshots in the last 7 days
    • Ran python3 scripts/backfill_belief_snapshots.py --limit 50
    • Inserted 50 snapshot rows with score=composite_score, evidence_count=len(evidence_for)+len(evidence_against), citations=citations_count, reviewer_count=hypothesis_reviews count
    • After: 644 non-archived hypotheses missing snapshots (reduction: 50)
    • Same-day duplicates: 0
    • 50 new belief_snapshots rows confirmed on 2026-04-26

    2026-04-24 14:50 UTC — task:0c46779c

    • Reviewed the live PostgreSQL schema via scidex.core.database.get_db() before changing anything.
    • Confirmed belief_snapshots currently stores score, evidence_count, citations, and reviewer_count; the generated task wording mentioning market_price and confidence_score is ahead of the live schema.
    • Verified the real hypothesis status set is proposed / promoted / debated / open / archived; there is no active status in the current data.
    • Measured the live backlog using non-archived hypotheses and a 7-day window: 548 hypotheses were missing recent snapshots before this run.
    • Planned implementation for this run: update scripts/backfill_belief_snapshots.py to use the 7-day window, target non-archived hypotheses, default to a batch size of 30, and keep same-day inserts idempotent.
    • Executed python3 scripts/backfill_belief_snapshots.py --limit 30.
    • Before/after backlog (7-day window, excluding archived / rejected / resolved): 548 -> 518.
    • Verified 30 belief_snapshots rows were written on 2026-04-24 and 0 same-day duplicate hypothesis snapshots exist.
    • Spot-checked inserted rows against live hypotheses state: stored score matched composite_score, stored evidence_count matched live evidence array counts, and the sampled rows' live market_price / confidence_score values were recorded during execution for audit context.
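A spot check like the one in the last bullet could be automated along these lines. This is a sketch under assumptions: spot_check is a hypothetical helper, both records are plain dicts, and only the two stored fields named in the bullet (score and evidence_count) are compared.

```python
import math


def spot_check(snapshot, hypothesis):
    """Return the names of snapshot fields that disagree with live hypothesis state."""
    mismatches = []
    live_evidence = (len(hypothesis.get("evidence_for") or [])
                     + len(hypothesis.get("evidence_against") or []))
    # Compare floats with a tolerance rather than exact equality.
    if not math.isclose(snapshot["score"], hypothesis["composite_score"], abs_tol=1e-9):
        mismatches.append("score")
    if snapshot["evidence_count"] != live_evidence:
        mismatches.append("evidence_count")
    return mismatches


# Hypothetical records, for illustration only.
live = {"composite_score": 0.621, "evidence_for": ["a", "b"], "evidence_against": ["c"]}
good = {"score": 0.621, "evidence_count": 3}
bad = {"score": 0.5, "evidence_count": 2}

print(spot_check(good, live))  # []
print(spot_check(bad, live))   # ['score', 'evidence_count']
```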

    2026-04-21 - Quest engine template

    • Created reusable spec for quest-engine generated belief snapshot backfill tasks.

    2026-04-22 — task:4cfdd27b — Backfill 30 hypotheses

    • Before count: 679 hypotheses missing belief snapshots (30-day window)
    • Inserted 30 new snapshots via direct SQL INSERT with NOT EXISTS guard
    • Mapping: score ← composite_score, evidence_count ← len(evidence_for)+len(evidence_against), citations ← citations_count, reviewer_count ← 0
    • After count (1-day window): 300 snapshots (up from 270 pre-run)
    • Remaining missing (30-day): 649
    SQL used:

    INSERT INTO belief_snapshots (hypothesis_id, score, evidence_count, citations, reviewer_count, snapshot_date)
    SELECT h.id,
      COALESCE(h.composite_score, 0.5),
      COALESCE(jsonb_array_length(COALESCE(h.evidence_for, '[]'::jsonb)), 0) +
      COALESCE(jsonb_array_length(COALESCE(h.evidence_against, '[]'::jsonb)), 0),
      COALESCE(h.citations_count, 0),
      0,
      NOW()
    FROM hypotheses h
    WHERE NOT EXISTS (
      SELECT 1 FROM belief_snapshots bs
      WHERE bs.hypothesis_id = h.id AND bs.snapshot_date > NOW() - INTERVAL '30 days'
    )
    ORDER BY h.created_at DESC LIMIT 30;
    -- Result: INSERT 0 30
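
The NOT EXISTS guard can be tried out locally to confirm that a second run inserts nothing. The sketch below is an assumption-heavy stand-in: it uses an in-memory SQLite database instead of PostgreSQL, a reduced column set, and a same-day guard rather than the 30-day window in the statement above. run_backfill is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hypotheses (id TEXT PRIMARY KEY, composite_score REAL);
CREATE TABLE belief_snapshots (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    hypothesis_id TEXT,
    score REAL,
    snapshot_date TEXT
);
INSERT INTO hypotheses VALUES ('h-1', 0.62), ('h-2', NULL);
""")

# Insert one row per hypothesis unless it already has a snapshot for :today.
GUARDED_INSERT = """
INSERT INTO belief_snapshots (hypothesis_id, score, snapshot_date)
SELECT h.id, COALESCE(h.composite_score, 0.5), :today
FROM hypotheses h
WHERE NOT EXISTS (
    SELECT 1 FROM belief_snapshots bs
    WHERE bs.hypothesis_id = h.id AND bs.snapshot_date = :today
)
"""


def snapshot_count(conn):
    return conn.execute("SELECT COUNT(*) FROM belief_snapshots").fetchone()[0]


def run_backfill(conn, today):
    """Run the guarded insert and return how many rows it added."""
    before = snapshot_count(conn)
    conn.execute(GUARDED_INSERT, {"today": today})
    conn.commit()
    return snapshot_count(conn) - before


first = run_backfill(conn, "2026-04-22")
second = run_backfill(conn, "2026-04-22")  # same day: guard should block every row
print(first, second, snapshot_count(conn))  # 2 0 2
```

The guard only protects against re-runs that happen one after another; runs that overlap in time would need a unique constraint or similar to be fully safe.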

    2026-04-21 14:00-14:05 UTC — Backfill execution

    • Before count: 830 hypotheses missing belief snapshots (7-day window)
    • After count: 730 hypotheses missing belief snapshots (7-day window)
    • Reduction: 100 (exceeds 50 target due to concurrent quest_engine runs)
    • Created scripts/backfill_belief_snapshots.py with idempotent INSERT using WHERE NOT EXISTS clause
    • 50 specific hypothesis IDs received belief_snapshots rows via the script
    • Snapshots include: score (composite_score), evidence_count (len(evidence_for) + len(evidence_against)), citations (citations_count), reviewer_count (hypothesis_reviews count)
    • Same-day duplicates avoided via conditional INSERT
    • Remaining missing (730) is well under the 780 threshold
    Verification commands run:

    -- Before/after counts
    SELECT COUNT(*) FROM hypotheses h WHERE NOT EXISTS (SELECT 1 FROM belief_snapshots b WHERE b.hypothesis_id = h.id AND b.snapshot_date >= CURRENT_DATE - INTERVAL '7 days');
    
    -- Verify 50 new snapshots with all required fields
    SELECT bs.hypothesis_id, bs.score, bs.evidence_count, bs.citations, bs.reviewer_count, bs.snapshot_date
    FROM belief_snapshots bs WHERE bs.snapshot_date::date = CURRENT_DATE ORDER BY bs.id DESC LIMIT 50;
    
    -- Verify no duplicates today
    SELECT hypothesis_id, COUNT(*) FROM belief_snapshots WHERE snapshot_date::date = CURRENT_DATE GROUP BY hypothesis_id HAVING COUNT(*) > 1;
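
The duplicate check in the last query has a straightforward Python equivalent, which can be handy when inspecting exported rows. A minimal sketch, assuming snapshot_date values are ISO-8601 strings; same_day_duplicates and the sample rows are hypothetical.

```python
from collections import Counter


def same_day_duplicates(snapshots):
    """Return {(hypothesis_id, day): count} for pairs that occur more than once."""
    counts = Counter(
        (s["hypothesis_id"], s["snapshot_date"][:10])  # first 10 chars are YYYY-MM-DD
        for s in snapshots
    )
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical rows, for illustration only.
rows = [
    {"hypothesis_id": "h-1", "snapshot_date": "2026-04-26T14:36:00"},
    {"hypothesis_id": "h-1", "snapshot_date": "2026-04-26T15:09:00"},
    {"hypothesis_id": "h-2", "snapshot_date": "2026-04-26T15:09:00"},
    {"hypothesis_id": "h-1", "snapshot_date": "2026-04-27T00:57:00"},
]
duplicates = same_day_duplicates(rows)
print(duplicates)  # {('h-1', '2026-04-26'): 2}
```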
