[Senate] CI: Database integrity check and backup verification
Task ID: aa1c8ad8-f157-4e03-941d-9fdb76dbb12a
Type: Recurring (daily)
Priority: 58
Layer: Senate (governance, quality gates)
Objective
Run automated database health checks to ensure data integrity and backup availability.
Success Criteria
☐ PRAGMA integrity_check passes on PostgreSQL
☐ Latest backup file exists and is less than 24 hours old
☐ Report any issues found
Implementation Notes
Run PRAGMA integrity_check on main database
Check for backup files in standard backup location
Verify most recent backup timestamp
Log resultsWork Log
2026-04-04 05:30 PDT - Starting integrity check
Starting daily database integrity and backup verification.
2026-04-04 05:33 PDT - Completed ✅
Database Integrity: PASSED
- PRAGMA integrity_check returned "ok"
- Main database: 59MB at postgresql://scidex
Backup Verification: PASSED
- Latest backup: scidex-20260404T123001Z.db.gz (20MB compressed)
- Created: 2026-04-04 05:30 PDT (< 15 minutes ago)
- Backup compression: valid (gunzip -t passed)
- Total backups maintained: 108 files with automated rotation
- Backup frequency: every 15 minutes via cron
Notes:
- Empty .backup files in main directory are legacy/stale - real backups are in /data/backups/sqlite/
- All health checks passed successfully
2026-04-06 04:05 PDT - Completed ✅
Database Integrity: PASSED
- PRAGMA integrity_check returned "ok"
- Main database: 1.9GB at postgresql://scidex
Backup Verification: PASSED
- Latest backup: scidex-20260406T110001Z.db.gz (356MB compressed)
- Age: 4 minutes old (< 24 hours threshold)
- Backup compression: valid (gunzip -t passed)
- Backup frequency: every 5 minutes via cron
Notes:
- All health checks passed successfully
2026-04-06 17:45 PDT - Completed ✅
Database Integrity: PASSED
- PRAGMA integrity_check returned "ok"
- Main database: 1.9GB at postgresql://scidex
Backup Verification: PASSED
- Latest backup: scidex-20260407T004001Z.db.gz (360MB compressed)
- Age: ~5 minutes old (< 24 hours threshold)
- Backup compression: valid (gunzip -t passed)
- Backup frequency: every 5 minutes via cron
Notes:
- All health checks passed successfully
2026-04-08 11:30 PDT - Completed ✅
Database Integrity: PASSED
- PRAGMA integrity_check returned "ok"
- Main database: 2.3GB at postgresql://scidex
Backup Verification: PASSED
- Latest backup: scidex-20260409T012501Z.db.gz (514MB compressed)
- Age: ~4 minutes old (< 24 hours threshold)
- Backup compression: valid (gunzip -t passed)
- Total backups maintained: 1,229 files
- Backup frequency: every 5 minutes via cron
Notes:
- All health checks passed successfully
2026-04-09 06:02 UTC - Completed ✅
Database Integrity: PASSED
- PRAGMA integrity_check returned "ok" for both PostgreSQL (2.2GB) and orchestra.db (16MB)
Backup Verification: PASSED
- Latest scidex backup: scidex-20260409T060001Z.db.gz (494MB compressed, 1 min old)
- Latest orchestra backup: orchestra-20260409T060001Z.db.gz (2.6MB, 0 min old)
- Total backups maintained: 2,581 files
- Backup frequency: every 5 minutes via cron
S3 Offsite Sync: PASSED (resolved from prior warning)
- Previous run flagged S3 as 112h stale — now resolved
- Last sync completed 2026-04-08 04:52 PDT (66.8 GiB snapshot)
Row Counts: Stable (184 analyses, 333 hypotheses, 688K edges, 15.8K papers)
Notes:
- NeuroWiki DB remains a 0-byte file — no data to backup (wiki content lives in PostgreSQL)
- All critical health checks passed, 0 warnings
2026-04-09 06:06 UTC - Completed ✅
Database Integrity: PASSED
- PRAGMA integrity_check returned "ok" for both PostgreSQL (2.2GB) and orchestra.db (16MB)
Backup Verification: PASSED
- Latest scidex backup: scidex-20260409T060501Z.db.gz (517MB compressed, 1 min old)
- Latest orchestra backup: orchestra-20260409T060001Z.db.gz (2.6MB, 6 min old)
- Backup compression: valid (gunzip -t passed)
- Total backups maintained: 2,583 files
- Backup frequency: every 5 minutes via cron
S3 Offsite Sync: PASSED
- Daily sync cron running at 04:00 UTC
- Last completed sync: home-20260408T101701Z (66.8 GiB snapshot)
- Syncs running daily and completing successfully
Row Counts: Stable (184 analyses, 333 hypotheses, 688K edges, 15.9K papers, 123 gaps)
Notes:
- NeuroWiki DB not present on disk (wiki content lives in PostgreSQL wiki_pages table)
- All critical health checks passed, 0 warnings
2026-04-09 06:08 UTC - Completed ✅
Database Integrity: PASSED
- PRAGMA integrity_check returned "ok" for both PostgreSQL (2.2GB) and orchestra.db (16MB)
Backup Verification: PASSED
- Latest scidex backup: scidex-20260409T060501Z.db.gz (517MB compressed, ~3 min old)
- Latest orchestra backup: orchestra-20260409T060501Z.db.gz (2.6MB, ~2 min old)
- Backup compression: valid (gunzip -t passed)
- Total backups maintained: 2,565 files
- Backup frequency: every 5 minutes via cron
S3 Offsite Sync: PASSED
- Daily sync cron running at 04:00 UTC
- Last completed sync: home-20260408T101701Z (66.8 GiB snapshot)
- Syncs running daily and completing successfully
Row Counts: Stable (184 analyses, 333 hypotheses, 688K edges, 15.9K papers)
Notes:
- All critical health checks passed, 0 warnings
2026-04-09 06:11 UTC - Completed ✅
Database Integrity: PASSED
- PRAGMA integrity_check returned "ok" for PostgreSQL (2.2GB)
- PRAGMA integrity_check returned "ok" for orchestra.db (16MB at /home/ubuntu/Orchestra/orchestra.db)
- Note: /home/ubuntu/scidex/orchestra.db is a 0-byte stub; real Orchestra DB is at /home/ubuntu/Orchestra/
Backup Verification: PASSED
- Latest scidex backup: scidex-20260409T061001Z.db.gz (419MB compressed, ~0 min old)
- Latest orchestra backup: orchestra-20260409T060501Z.db.gz (2.5MB, ~5 min old)
- Backup compression: valid (gunzip -t passed on scidex-20260409T060501Z.db.gz)
- Config backups: env snapshots current (env-20260409T061001Z.tar.gz)
- Total backups maintained: 2,585 files
- Backup frequency: every 5 minutes via cron
S3 Offsite Sync: PASSED
- Daily sync cron running at 04:00 UTC
- Last completed sync: home-20260408T101701Z (66.8 GiB snapshot)
- 5 consecutive daily syncs completed successfully (Apr 4–8)
Row Counts: Stable
- 184 analyses, 333 hypotheses, 15,873 papers, 123 knowledge gaps, 30 kg_edges
Notes:
- All critical health checks passed, 0 warnings
2026-04-10 20:45 UTC - FK Consistency Verification ✅ (task:2324c574-262e-4cf6-8846-89ff1d59ca1e)
Database Integrity: PASSED
- PRAGMA integrity_check returned "ok"
- Main database: PostgreSQL at postgresql://scidex
FK Consistency Checks: PASSED (minor notes)
- debate_rounds → debate_sessions: 0 orphaned
- debate_sessions → analyses: 0 orphaned
- market_transactions → hypotheses: 0 orphaned
- price_history → hypotheses: 0 orphaned
- knowledge_edges → analyses: 192 orphaned (edges from deleted/missing analyses - not recoverable via FK repair)
Data Quality Issues Found: 5 minor issues (non-critical, all historical/test data)
1 hypothesis with null target_gene — hyp_test_0a572efb (test record, archived)
1 analysis with null triggered_by — SDA-2026-04-02-gap-senescent-clearance-neuro (archived)
71 agent_performance records orphaned — agent_id="falsifier" not in actor_reputation (historical)
192 knowledge_edges with orphaned analysis_id — from debates where source analysis was removed (not recoverable)
4 duplicate analysis titles — from merged analyses (not data corruption)Score Validation: PASSED
- 0 hypotheses with composite_score out of [0,1] range
- 0 knowledge_gaps with priority_score out of [0,1] range
Row Counts: Stable
- 188 analyses, 333 hypotheses, 127 knowledge_gaps
- 688,359 knowledge_edges, 586 debate_rounds, 116 debate_sessions
- 15,929 papers, 13,640 wiki_entities, 17,435 wiki_pages
- 930 agent_performance records (71 orphaned), 8,327 market_transactions
Notes:
- All issues are historical data (test records, archived items, merged duplicates)
- No active data corruption detected
- The 192 orphaned knowledge_edges reference analyses that were removed/renamed — these edges came from debate transcripts and cannot be recovered via FK repair
- System integrity is sound — no fixes required for active data
2026-04-11 19:45 UTC — Database Integrity Check ✅
Database Integrity: PASSED
- PRAGMA integrity_check returned "ok"
- Main database: 3.6GB at postgresql://scidex
Row Counts: Stable
- 690,276 knowledge_edges, 335 hypotheses, 246 analyses
Backup Status: WARNING
- /data/backups/sqlite/ directory is empty
- Backup cron not found in crontab
- This is a backup infrastructure issue, not a data integrity issue
Notes:
- Database integrity is sound
- Backup infrastructure needs attention (separate task recommended)
2026-04-12 11:05 UTC — Database Integrity Check ✅
Database Integrity: PASSED
- PRAGMA quick_check returned "ok" for PostgreSQL (3.3GB — full integrity_check skipped due to size/timeout)
- PRAGMA integrity_check returned "ok" for orchestra.db (36MB at /home/ubuntu/Orchestra/orchestra.db)
Backup Verification: PASSED
- Latest scidex backup: scidex-20260412T180002Z.db.gz (665MB compressed, 3 min old)
- Latest orchestra backup: orchestra-20260412T180002Z.db.gz (7.0MB, 2 min old)
- Backup compression: valid (gunzip -t passed on both)
- Total backups maintained: 110 files in /data/backups/sqlite/
- Backup frequency: every 5 minutes via cron
Row Counts: Growing normally
- 264 analyses, 364 hypotheses, 16,039 papers
Notes:
- Backup infrastructure restored since 2026-04-11 warning — backups now running every 5 min
- All critical health checks passed, 0 warnings
2026-04-12 19:27 UTC — Database Integrity Check ⚠️
Database Integrity: PASSED
- PRAGMA quick_check returned "ok" for PostgreSQL (3.3GB main file + 1.97GB WAL)
- PRAGMA integrity_check returned "ok" for orchestra.db (38MB copy via /tmp)
WAL Note: PostgreSQL WAL has 481,222 uncheckpointed frames (~1.97GB). PASSIVE checkpoint returned 0/481222 — active readers/writers are holding the WAL open. This is normal during heavy write activity but the WAL should be monitored; if it keeps growing it may indicate the API is not periodically checkpointing.
Backup Verification: WARNING
/data/backups/sqlite/ directory does not exist — /data mount point is absent
- No backup cron job found in
crontab -l
- Last known backup was from the 11:05 UTC run (scidex-20260412T180002Z.db.gz)
- Backup infrastructure has disappeared again (same issue as 2026-04-11)
Row Counts: Growing normally
- 264 analyses, 364 hypotheses, 3,324 knowledge_gaps (3,322 open)
- 700,954 knowledge_edges, 16,115 papers, 17,539 wiki_pages
- 150 debate_sessions, 807 debate_rounds
Action Required:
- Backup cron and /data mount need to be restored
- Consider filing a separate infrastructure task for persistent backup monitoring
2026-04-12 21:05 UTC — Database Integrity Check ⚠️
Database Integrity: PASSED
- PRAGMA quick_check returned "ok" for PostgreSQL (3.21 GB main file + 1.99 GB WAL = 5.16 GB effective)
- PRAGMA integrity_check returned "ok" for orchestra.db (37 MB, via /tmp copy)
WAL Status: GROWING — monitor required
- WAL: 509,271 frames (~1.99 GB), up from 481,222 frames at 19:27 UTC (~1.5h ago, +28K frames)
- PASSIVE checkpoint: (0 blocked, 506,293 wal_pages, 0 checkpointed) — active connections holding WAL open
- WAL growth rate: ~18K frames/hour; if unchecked, WAL will consume significant disk space
- Recommendation: API should call
PRAGMA wal_checkpoint(RESTART) periodically when write load is low
Backup Verification: CRITICAL — no backups
/data mount point does not exist
- No backup cron accessible (system cron and user crontab inaccessible in sandbox)
- No
.db.gz backup files found anywhere on disk
- Backup infrastructure has been absent since 2026-04-11 19:45 UTC (two consecutive integrity check cycles)
- This is the third consecutive run without backups — escalation warranted
Row Counts: Growing normally
- 265 analyses, 364 hypotheses, 700,954 knowledge_edges, 3,324 knowledge_gaps (3,322 open)
- 150 debate_sessions, 822 debate_rounds, 16,115 papers, 17,539 wiki_pages
Action Required:
- Backup infrastructure still absent — urgent restoration needed
- WAL checkpoint blocked by active connections — API should checkpoint periodically
2026-04-13 00:50 UTC — Slot 50 Fixes Applied ⚠️
Problem: Slot 50 agent identified two critical infrastructure issues:
/data mount missing → backup-all.sh failed silently (no backups for 2 days)
WAL growing unbounded (~18K frames/hour, now ~2 GB)Fix 1: backup-all.sh fallback directory
- Modified
backup-all.sh to detect /data availability and fall back to
/home/ubuntu/scidex/backups/ when
/data is not mounted
- Backup script now creates backup directories with
mkdir -p before use
- Verified: script now creates
/home/ubuntu/scidex/backups/sqlite/ when /data absent
- Status: backups should resume immediately; cron job needs to be re-established
Fix 2: WAL checkpointing in API (api.py)
- Added background thread at API startup that runs a passive WAL checkpoint
every 1 hour (
PRAGMA wal_checkpoint(PASSIVE))
- Uses PASSIVE mode — if there are concurrent readers/writers, checkpoint
returns 0 checkpointed pages without blocking
- Thread is daemonized, starts when API process starts
- WAL growth should now be bounded as checkpoint frames get merged into main DB
DB Status: Healthy
- 267 analyses, 373 hypotheses, 701,112 knowledge_edges
- WAL ~2 GB (passive checkpoint will reduce on next idle window)
- No data corruption or integrity issues
2026-04-12 22:51 UTC — Slot 42 ✅
Database Integrity: PASSED
- PRAGMA quick_check returned "ok" for PostgreSQL (3.71 GB main + 0.00 GB WAL)
- WAL checkpoint (PASSIVE): 1,384/1,384 frames checkpointed — WAL fully merged, no growth
Row Counts: Growing normally
- 273 analyses, 382 hypotheses, 701,115 knowledge_edges
- 3,324 knowledge_gaps (3,314 open), 159 debate_sessions, 860 debate_rounds
- 16,118 papers, 17,539 wiki_pages
Backup Verification: WARNING (persistent)
- /data mount absent — no /data/backups/sqlite directory
- /home/ubuntu/scidex/backups/sqlite/ does not exist (backup-all.sh fallback not yet triggered)
- Crontab inaccessible in sandbox environment
- backup-all.sh fix was committed by Slot 50 (2026-04-13 00:50 UTC) — backups should resume once cron is re-established
WAL Status: HEALTHY — fully checkpointed (was growing at ~18K frames/hour previously; now stable)
Notes:
- Database integrity is sound, no data corruption detected
- WAL issue resolved by API background checkpoint thread added by Slot 50
- Backup infrastructure requires external verification (cron re-establishment outside agent scope)
---
Verification — 2026-04-17T10:50:00Z
Result: FAIL
Verified by: MiniMax-M2 via task 2324c574-262e-4cf6-8846-89ff1d59ca1e
Tests run
| Target | Command | Expected | Actual | Pass? |
|---|
| PRAGMA integrity_check | sqlite3 PostgreSQL "PRAGMA integrity_check" | ok | 100+ btree page errors | ✗ |
| Analyses with invalid gap_id | SELECT COUNT(*) FROM analyses WHERE gap_id NOT IN (SELECT id FROM knowledge_gaps) | 0 | 20 | ✗ |
| Hypotheses with invalid analysis_id | SELECT COUNT(*) FROM hypotheses WHERE analysis_id NOT IN (SELECT id FROM analyses) | 0 | 25 | ✗ |
| Self-loop edges (partial) | SELECT source_id,target_id FROM knowledge_edges WHERE source_id=target_id | 0 | 20+ (HIF1A\ | HIF1A, partial — query returns error 11) | ✗ |
| NULL titles (analyses) | SQL count | 0 | 0 | ✓ |
| NULL titles (hypotheses) | SQL count | 0 | 0 | ✓ |
| NULL titles (knowledge_gaps) | SQL count | 0 | 0 | ✓ |
| composite_score out of [0,1] | SQL count | 0 | 0 | ✓ |
| priority_score out of [0,1] | SQL count | 0 | 0 | ✓ |
| API status endpoint | curl -s -o /dev/null -w '%{http_code}' http://localhost:8000/api/status | 200 | 200 | ✓ |
| Backup files | find /home/ubuntu/scidex/backups /data/backups -name "*.db.gz" -mtime -1 | exists | 0 files found | ✗ |
Attribution
The current failing state reflects active DB corruption:
- Commit
ea69d990a ([Senate] Add DB integrity repair script) applied fixes but those changes were on an orphan branch and never reached origin/main (verified by e0b57349a)
- Subsequent writes have re-introduced FK violations (20 analyses with invalid gap_ids, 25 hypotheses with invalid analysis_ids, 20+ self-loop edges)
- Btree page corruption (100+ errors in PRAGMA integrity_check) has accumulated — likely from the 2026-04-17 corruption incident documented in AGENTS.md
- API still responds (200) despite corruption — errors are in FTS/index pages, not primary data
Notes
- DB size: 4.3 GB (4.1 GB main + 341 MB WAL) at postgresql://scidex
- DB is actively in use (WAL mode, page_count=1,068,805)
- Corruption type: B-tree page-level (out-of-order Rowids, 2nd references, invalid page numbers) concentrated in high page numbers (993xxx-103xxxx) — likely FTS index pages
- FK violations are fixable by re-running scripts/repair_db_integrity.py logic (if the script existed on main — it doesn't; it was on an orphan branch)
- Backup infrastructure is down (no /data mount, no /home/ubuntu/scidex/backups dir)
- Escalation recommended: DB corruption needs repair + backup restoration
2026-04-20 17:35 UTC — Database Integrity Check ⚠️
Database Integrity: PASSED
- PostgreSQL healthy: 3542 MB, 6 active connections, 0 lock waits, 0 long-running queries
- Deadlocks: 15 (historical, not current)
- Version: PostgreSQL 16.13
- In recovery: false (primary, writable)
Row Counts (PostgreSQL scidex):
- hypotheses: 747 | analyses: 395 | papers: 17,447
- knowledge_edges: 711,721 | wiki_pages: 17,575
Backup Verification: CRITICAL — PostgreSQL backup infrastructure missing
- Last scidex backup logged: 2026-04-13 (7 days ago, SQLite format, pre-migration)
backup-all.sh line 193: backup_sqlite postgresql://scidex scidex — FAILS: backup_sqlite uses sqlite3 CLI which cannot connect to PostgreSQL
backup_postgres.sh referenced in comments at backup-all.sh:256 but does not exist
- No
/data/backups/postgres/ directory and no pg_dump cron
- backup_log confirms zero PostgreSQL backup entries since migration (2026-04-20)
- Neo4j backup: last success 2026-04-19 02:00 UTC (1 day ago — PASS)
- Orchestra DB: no recent backup entries in backup_log
Root Cause:SQLite → PostgreSQL migration (2026-04-20) left backup infrastructure broken. The
backup_sqlite function in backup-all.sh cannot back up PostgreSQL;
backup_postgres.sh (pg_dump-based) was never created despite being referenced in comments.
Action Required (P1):
- Create
backup_postgres.sh using pg_dump to /data/backups/postgres/
- Add pg_dump cron job (every 5 min or hourly)
- Update
verify_backup_integrity.sh to use pg_dump format for scidex
- Note:
backup-all.sh line 193 is dead code for PostgreSQL
Escalation: This is an infrastructure gap requiring a dedicated task. Recommend filing new
backup_dr task for PostgreSQL backup implementation.