Goal
The knowledge_gaps table grew from 48 → 2472 in one prioritization-quest cycle (50x). Only 19 of those 2472 gaps have ever produced an analysis (0.77%). The gap factory is creating gaps faster than the system can process them, and most of the new gaps appear to be low-quality (high-volume PubMed extraction without filtering).
This task implements four fixes:
Add a quality filter so only gaps with importance_score >= 0.5 are inserted
Add a per-cycle cap on new gap inserts (max 50 per cycle)
Close gaps older than 30 days with status='open' and zero downstream activity
Reduce the gap factory frequency from daily to weeklyAcceptance Criteria
☐ gap_scanner.py only inserts gaps when importance_score >= 0.5 (using gap_scoring.py)
☐ gap_scanner.py caps new gap inserts at 50 per cycle
☐ debate_gap_extractor.py only inserts gaps when importance_score >= 0.5
☐ gap_pipeline.py closes stale gaps (30+ days, open status, no downstream activity)
☐ systemd timer changed from daily to weekly
☐ All changes tested and committed
Approach
gap_scanner.py modifications:
- Import and call
gap_scoring.score_knowledge_gap() before creating each gap
- Use the returned
importance_score to filter (>= 0.5 threshold)
- Track
gaps_created count and stop when reaching cap of 50
- Change
--days default from 7 to 1 to reduce paper scan window
debate_gap_extractor.py modifications:
- After extracting question, call
gap_scoring.score_knowledge_gap() - Only create gap if importance_score >= 0.5
gap_pipeline.py additions:
- Add
close_stale_gaps() function to find and close gaps with:
- status = 'open'
- created_at < 30 days ago
- No entries in analyses or hypotheses tables
- Run this as part of the regular pipeline
scidex-gap-scanner.timer modification:
- Change
OnCalendar=daily to
OnCalendar=weekly - Alternatively:
OnCalendar=*-0/7 03:00:00 for weekly
Dependencies
- gap_scoring.py must remain functional (already exists)
- db_writes.create_knowledge_gap must remain functional (already exists)
Work Log
2026-04-11 — Slot 0
- Investigated gap factory issue: gap_scanner.py and debate_gap_extractor.py create gaps without setting importance_score
- Found gap_scoring.py exists but is not called during gap creation
- Analyzed database: 2470 open gaps, only ~2023 have importance_score >= 0.5
- Created spec file
2026-04-11 — Implementation
- gap_scanner.py: Added import for gap_scoring; modified create_gap() to call gap_scoring.score_knowledge_gap() and filter on importance_score >= 0.5; added MAX_GAPS_PER_CYCLE=50 constant; added per-cycle cap logic in scan_papers(); changed default --days from 7 to 1
- debate_gap_extractor.py: Added import for gap_scoring; modified create_gap_from_question() to call gap_scoring.score_knowledge_gap() and filter on importance_score >= 0.5; updated process_debate_session() to handle filtered (None) gap_ids
- gap_pipeline.py: Added close_stale_gaps() function with --close-stale and --close-stale-days CLI args; closes gaps with status='open', >30 days old, no downstream analyses/hypotheses
- scidex-gap-scanner.service: Changed --days from 7 to 1
- scidex-gap-scanner.timer: Changed OnCalendar from daily to weekly
- All Python files passed syntax check and import tests
- Committed as 37c7ec19 and pushed to origin
2026-04-11 — Merge Gate Fix (attempt 3)
- Merge gate again blocked: still had infra/scripts deletions, and close_stale_gaps not wired into pipeline
- Restored infra/scripts/{README.md,backup-all.sh,snapshot-home-hourly.sh,sync-full-s3.sh} (were deleted in prior commits but are unrelated to gap throttle)
- Wired close_stale_gaps() into systemd service: scidex-gap-scanner.service now runs both gap_scanner.py AND gap_pipeline.py --close-stale in a single ExecStart bash -c chain
- Committed as 8f136147, pushed to origin
- Final diff: 6 files, +225/-10 lines (gap_scanner.py, debate_gap_extractor.py, gap_pipeline.py, scidex-gap-scanner.{service,timer}, spec)
2026-04-14 — Slot 0 (Verification)
- Verified all 5 acceptance criteria on branch (already merged to main via 37c7ec199, 8f136147):
- [x] gap_scanner.py: importance_score >= 0.5 filter present (line 408-413, uses gap_scoring.score_knowledge_gap)
- [x] gap_scanner.py: MAX_GAPS_PER_CYCLE = 50 cap (line 462, enforced at lines 514-515, 530-531)
- [x] debate_gap_extractor.py: importance_score >= 0.5 filter (lines 286-291, 588-595)
- [x] gap_pipeline.py: close_stale_gaps() with --close-stale / --close-stale-days args (lines 400-480)
- [x] scidex-gap-scanner.timer: OnCalendar=weekly (line 6)
- [x] scidex-gap-scanner.service: wires both gap_scanner.py AND gap_pipeline.py --close-stale (line 9)
- Updated spec work log
- Result: Done — throttle gap factory task complete, all acceptance criteria verified