Goal
SciDEX has 3,545 knowledge gaps, of which only 12 are resolved (0.34% resolution rate as of
2026-04-28). Gaps are generated rapidly but almost never close, so the gap system acts as a noise
accumulator rather than a progress tracker. This task builds the resolution engine that closes
gaps when sufficient evidence exists.
Why this matters
A gap resolution rate of 0.34% means:
- The platform cannot demonstrate progress to researchers
- The knowledge graph has no "proof of work" feedback
- Market participants cannot price gap-resolution contributions
- The system looks like it generates problems without solving them
Raising the resolution rate from 0.34% to 10% or more in one task would be the single largest
measurable improvement to the value of SciDEX's scientific output.
What to build
The resolution pipeline
A gap is considered resolved when all three conditions are met:
- Hypothesis coverage: At least one hypothesis with composite_score ≥ 0.7 directly addresses
  the gap's topic (match by disease, target gene/pathway, or question text)
- Evidence support: The matching hypothesis has evidence_for with ≥ 2 PubMed-cited entries
- Debate engagement: At least one debate session exists for an analysis related to the hypothesis

Build a function resolve_matching_gaps(batch_size=50) that:
- Queries open knowledge gaps (status='open')
- For each gap, finds matching hypotheses using:
  - Direct analysis_id or target_gene foreign keys where available
  - Text similarity between gap title/description and hypothesis title/description
  - Disease field matching
- Checks the three resolution conditions above
- If conditions met: updates knowledge_gaps.status = 'resolved' with a resolution_summary
  JSON field containing hypothesis_id, debate_session_id, evidence_summary
- Emits a KG edge: gap -[resolved_by]-> hypothesis

What NOT to do
- Do NOT manually write resolution summaries for each gap (that's row-count work)
- Do NOT lower the resolution bar to artificially inflate the count
- Do NOT create a new recurring driver for this. After building and testing the engine,
  register it with the existing "[Agora] CI: Trigger debates for analyses with 0 debate sessions"
  driver, or propose integration with the Senate world-model driver
Schema reference
-- knowledge_gaps table: id, title, description, status, disease, analysis_id
-- Update target: status = 'resolved', add resolution metadata to description
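The batch loop described under "The resolution pipeline" can be sketched against the columns listed above. This is a minimal illustration, not the engine itself: it uses an in-memory SQLite database for self-containment, the knowledge_edges column names (source_id, relation, target_id) are hypothetical, the resolution_summary column follows the spec text rather than any confirmed live schema, and the matching step is replaced by a precomputed `matches` dictionary.

```python
import json
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database (illustrative schema only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE knowledge_gaps (
            id INTEGER PRIMARY KEY, title TEXT, description TEXT,
            status TEXT, disease TEXT, analysis_id TEXT,
            resolution_summary TEXT  -- assumed column, per the spec text
        );
        -- Hypothetical edge table layout; the real column names may differ.
        CREATE TABLE knowledge_edges (
            source_id INTEGER, relation TEXT, target_id TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO knowledge_gaps (id, title, status) VALUES (?, ?, ?)",
        [(1, "gap A", "open"), (2, "gap B", "open"), (3, "gap C", "resolved")],
    )
    conn.commit()
    return conn


def resolve_matching_gaps_sketch(conn, matches, batch_size=50):
    """Resolve up to batch_size open gaps that have a qualifying match.

    `matches` maps gap id -> dict with hypothesis_id, debate_session_id and
    evidence_summary. It stands in for the real matching and condition checks.
    """
    rows = conn.execute(
        "SELECT id FROM knowledge_gaps WHERE status = 'open' ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    resolved = []
    for (gap_id,) in rows:
        match = matches.get(gap_id)
        if match is None:
            continue  # no qualifying hypothesis: the gap stays open
        conn.execute(
            "UPDATE knowledge_gaps SET status = 'resolved', "
            "resolution_summary = ? WHERE id = ?",
            (json.dumps(match), gap_id),
        )
        conn.execute(
            "INSERT INTO knowledge_edges (source_id, relation, target_id) "
            "VALUES (?, 'resolved_by', ?)",
            (gap_id, match["hypothesis_id"]),
        )
        resolved.append(gap_id)
    conn.commit()
    return resolved


if __name__ == "__main__":
    demo = make_demo_db()
    demo_matches = {
        1: {"hypothesis_id": "h-1", "debate_session_id": "d-1",
            "evidence_summary": "2 PubMed-cited entries"},
    }
    print(resolve_matching_gaps_sketch(demo, demo_matches))
```

Writing the status update and the edge insert in the same transaction keeps a resolved gap from existing without its resolved_by edge, which is one plausible way to satisfy the "KG edges emitted for resolved gaps" criterion.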
Acceptance criteria
☑ resolve_matching_gaps() function implemented and tested
☑ Resolution conditions verified against real data (not loosened to inflate count)
☑ At least 50 gaps resolved in first run, with resolution summaries
☑ KG edges emitted for resolved gaps
☑ Resolution rate improves from 0.34% baseline
☑ Function registered as utility in scidex/agora/ — see scidex/agora/gap_resolution_engine.py
Implementation (2026-04-28)
Module: scidex/agora/gap_resolution_engine.py
Algorithm
Two-phase SQL query to avoid expensive cross-join:
Phase 1 — Pre-filter qualifying hypotheses:
- composite_score >= 0.7
- evidence_for has ≥ 2 entries with pmid field
- At least one debate_session exists for the hypothesis's analysis_id
Phase 2 — Match gaps to hypotheses using four signals:
- Direct FK chain (priority=100): analyses.gap_id = gap.id → hypotheses.analysis_id
- Exact domain/disease (priority=50): LOWER(hypothesis.disease) = LOWER(gap.domain)
- Partial domain overlap (priority=25-30): substring containment in either direction
- Text rank gate: ts_rank(hypothesis.search_vector, plainto_tsquery('english', gap.title)) >= 0.05

The text rank gate (hypothesis search_vector vs gap title) prevents broad-domain false positives
(e.g. a "gut microbiome/NLRP3" hypothesis resolving an "APOE4 lipid metabolism" gap).
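The priority scheme and gate can be illustrated in plain Python. This is a sketch under stated assumptions, not the module's SQL: the text rank is passed in as a number (in the real query it comes from PostgreSQL's ts_rank), the gate is assumed to apply only to domain-based matches, and the split of the 25-30 range by containment direction is a guess made for illustration.

```python
from typing import Optional

MIN_TEXT_RANK_FOR_DOMAIN_MATCH = 0.05


def match_priority(
    gap_id: str,
    gap_domain: Optional[str],
    hyp_source_gap_id: Optional[str],
    hyp_disease: Optional[str],
    text_rank: float,
) -> Optional[int]:
    """Return a match priority, or None when the pair should not match.

    hyp_source_gap_id is the gap reached through the FK chain
    hypotheses.analysis_id -> analyses.gap_id (None if there is no chain).
    """
    # Signal 1: direct FK chain. Assumed here to bypass the text rank gate.
    if hyp_source_gap_id is not None and hyp_source_gap_id == gap_id:
        return 100

    # Gate: domain-based matches need non-trivial topical overlap.
    if text_rank < MIN_TEXT_RANK_FOR_DOMAIN_MATCH:
        return None

    gap_dom = (gap_domain or "").strip().lower()
    hyp_dis = (hyp_disease or "").strip().lower()
    if not gap_dom or not hyp_dis:
        return None

    # Signal 2: exact, case-insensitive domain/disease match.
    if gap_dom == hyp_dis:
        return 50

    # Signal 3: substring containment in either direction.
    # Which direction maps to 30 versus 25 is an assumption of this sketch.
    if gap_dom in hyp_dis:
        return 30
    if hyp_dis in gap_dom:
        return 25
    return None


if __name__ == "__main__":
    # Same broad domain, but the gap title barely overlaps: rejected by the gate.
    print(match_priority("g1", "neurodegeneration", None, "Neurodegeneration", 0.01))
    # Same domain with sufficient text rank: exact-match priority.
    print(match_priority("g1", "neurodegeneration", None, "Neurodegeneration", 0.12))
```

When several hypotheses match one gap, taking the highest priority first means a direct FK link always wins over a domain-only match.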
Thresholds:
| Parameter | Value | Rationale |
|---|---|---|
| MIN_COMPOSITE_SCORE | 0.7 | Top-tier hypotheses only |
| MIN_EVIDENCE_COUNT | 2 | Multiple PubMed citations required |
| MIN_TEXT_RANK_FOR_DOMAIN_MATCH | 0.05 | Non-trivial topical overlap required |
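As an illustration of how the first two thresholds combine with the debate requirement in the Phase 1 pre-filter, the check can be written as a small predicate. This is a sketch, assuming evidence_for is a list of dictionaries in which cited entries carry a non-empty "pmid" key; the helper names are hypothetical.

```python
MIN_COMPOSITE_SCORE = 0.7
MIN_EVIDENCE_COUNT = 2


def count_pmid_entries(evidence_for):
    """Count evidence entries that carry a non-empty pmid field."""
    return sum(
        1 for entry in (evidence_for or [])
        if isinstance(entry, dict) and entry.get("pmid")
    )


def hypothesis_qualifies(composite_score, evidence_for, debate_session_count):
    """True only when all three resolution conditions hold."""
    return (
        composite_score >= MIN_COMPOSITE_SCORE
        and count_pmid_entries(evidence_for) >= MIN_EVIDENCE_COUNT
        and debate_session_count >= 1
    )


if __name__ == "__main__":
    evidence = [
        {"pmid": "12345678", "claim": "example claim A"},
        {"pmid": "23456789", "claim": "example claim B"},
        {"claim": "uncited note"},  # no pmid, so it is not counted
    ]
    print(hypothesis_qualifies(0.74, evidence, debate_session_count=1))
```

Keeping the thresholds as named constants makes it visible in review if the bar is ever lowered, which the spec explicitly forbids.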
Resolution action per gap
- knowledge_gaps.status = 'resolved', quality_status = 'resolved'
- knowledge_gaps.evidence_summary: structured text with hypothesis_id, score, PMIDs
- knowledge_edges row: gap -[resolved_by]-> hypothesis (evidence_strength = composite_score)
- events row: event_type='gap_resolved' via event_bus.publish()

Results (2026-04-28)
| Metric | Before | After |
|---|---|---|
| Total gaps | 3,545 | 3,545 |
| Resolved gaps | 12 | 299 |
| Resolution rate | 0.34% | 8.4% |
| Open gaps | 3,533 | 3,048 |
| KG edges (resolved_by) | 0 | 287 |
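The resolution-rate figures in the table follow from resolved / total, and the 287 new resolved_by edges equal the increase in resolved gaps (299 − 12). A short check, with a hypothetical helper name:

```python
def resolution_rate(resolved: int, total: int) -> float:
    """Resolution rate as a percentage of all gaps."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * resolved / total


if __name__ == "__main__":
    total_gaps = 3545
    print(round(resolution_rate(12, total_gaps), 2))   # before: 0.34
    print(round(resolution_rate(299, total_gaps), 1))  # after: 8.4
    print(299 - 12)                                    # newly resolved: 287
```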
Work Log
Created 2026-04-28 by task generator cycle 2
3,545 gaps, 12 resolved = 0.34% resolution rate. The resolution loop has never been
implemented. Building it would turn the gap system from noise into progress signal.
2026-04-28 — Initial Implementation (task:31eeae8d-40b3-41c4-9032-ea028239662a)
Agent: claude-sonnet-4-6 (commit bac57b22e)
- Investigated the knowledge_gaps (3,545 total, 12 resolved), hypotheses (1,886 total, 346 meeting
  score ≥ 0.7 + evidence ≥ 2), and debate_sessions (590 linked to analyses) tables
- Designed two-phase SQL query: pre-filter qualifying hypotheses, then join gaps by domain match
- Added text rank gate (ts_rank(hypothesis.search_vector, plainto_tsquery(gap.title)) >= 0.05)
  to prevent broad domain matches from creating false positives
- Ran initial batch, identified 23 false positives (text_rank < 0.05), reverted, re-ran
- Final: 287 new gaps resolved with text rank validation; total = 299 (8.4% resolution rate)

Files:
scidex/agora/gap_resolution_engine.py — new module
docs/planning/specs/quest_agora_gap_resolution_engine_spec.md — this file
2026-04-28 — Atlas closure pass (task:f4f7b129-0f43-4c84-abd8-20d4e701842d)
Agent: codex
- Staleness review found the original 12-resolved baseline was partly addressed
  by the initial implementation, but the live DB still had only 299 resolved
  gaps and 2,866 open gaps.
- Added scidex/atlas/gap_closure_pipeline.py, a bounded closure pass that
  matches open gaps against accumulated hypotheses, debate sessions, and paper
  full-text vectors using direct analysis links plus conservative keyword
  scoring. The current schema has no resolution_summary column, so the
  pipeline writes structured resolution metadata into evidence_summary.
- Dry-run result: 230 resolvable gaps, 1 partially addressed gap, 719 skipped
  because they lacked specific text or hypothesis coverage.
- Production run result: 230 gaps moved from open to resolved, 1 moved to
  partially_addressed, and 230 gap_resolution KG edges inserted with
  relation='resolved_by'. A follow-up repair aligned quality_status on
  all 231 task-touched rows.
- Final live status after this pass: 529 resolved, 308 partially_addressed,
  and 2,635 open gaps. Resolved-gap rate is now approximately 14.9% of
  3,545 total gaps.
Files:
scidex/atlas/gap_closure_pipeline.py — reusable Atlas closure pipeline
scidex/senate/quality_checks.py — recognize the existing resolved_by
resolution relation in KG edge quality checks
docs/planning/specs/quest_agora_gap_resolution_engine_spec.md — this log