Effort: thorough
We can list the 310+ hypotheses but we cannot show what fraction of the
neurodegeneration research space they cover. compute_diversity_score in
gap_pipeline.py:61 gives a per-gap scalar — useful, but not a map.
Build a 2-D coverage map: project every hypothesis embedding onto a fixed
2-D UMAP grid, then overlay the density of (a) existing hypotheses (orange =
crowded, white = untouched), (b) papers in the papers table (blue, the prior
literature density), and (c) Senate gaps (red dots, where we think there is
work to do). The white-on-blue regions, i.e. densely cited literature with zero
SciDEX hypotheses, are the highest-leverage targets for new theorising.
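A minimal sketch of the density-and-threshold idea described above, assuming the points have already been projected to 2-D. The helper names `density_grid` and `underexplored_mask`, the map bounds, and the synthetic points are all hypothetical and are not taken from coverage_map.py; the percentiles are computed over grid cells, which is one possible reading of the 70th/10th percentile rule. Outlier clipping is omitted.

```python
import numpy as np
from scipy.stats import gaussian_kde


def density_grid(points, bounds, grid=128):
    """Evaluate a Gaussian KDE of 2-D points on a grid x grid lattice."""
    xmin, xmax, ymin, ymax = bounds
    xs = np.linspace(xmin, xmax, grid)
    ys = np.linspace(ymin, ymax, grid)
    gx, gy = np.meshgrid(xs, ys)                 # both have shape (grid, grid)
    kde = gaussian_kde(points.T)                 # gaussian_kde expects shape (2, n)
    values = kde(np.vstack([gx.ravel(), gy.ravel()]))
    return values.reshape(grid, grid)            # rows index y, columns index x


def underexplored_mask(paper_density, hypothesis_density):
    """Cells above the 70th percentile of paper density and below the
    10th percentile of hypothesis density (percentiles taken over cells)."""
    paper_cut = np.percentile(paper_density, 70)
    hyp_cut = np.percentile(hypothesis_density, 10)
    return (paper_density > paper_cut) & (hypothesis_density < hyp_cut)


# Synthetic stand-ins for projected coordinates (not real SciDEX data).
rng = np.random.default_rng(42)
papers = np.vstack([
    rng.uniform([0.0, 0.0], [10.0, 10.0], size=(300, 2)),   # background literature
    rng.normal([9.5, 5.0], 0.4, size=(300, 2)),             # dense cluster near the right edge
])
hypotheses = rng.uniform([0.0, 0.0], [8.0, 10.0], size=(400, 2))  # nothing beyond x = 8

bounds = (0.0, 10.0, 0.0, 10.0)                  # assumed fixed map extent
paper_density = density_grid(papers, bounds)
hypothesis_density = density_grid(hypotheses, bounds)
mask = underexplored_mask(paper_density, hypothesis_density)

# Column indices of flagged cells map back to x-coordinates on the grid.
flagged_x = np.linspace(bounds[0], bounds[1], 128)[np.nonzero(mask)[1]]
```

In this toy setup the flagged cells sit around the right-hand paper cluster, which no hypothesis reaches. Because both grids share the same bounds and resolution, the two densities can be compared cell by cell.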
- scidex/atlas/coverage_map.py::build_map(snapshot_id) -> dict projects all hypotheses + papers via a shared UMAP fit (parametric, so future points re-project to the same coords).
- coverage_map.compute_density_grid(points, grid=128) returns a 128×128 KDE density.
- coverage_map.find_underexplored_regions() -> list[Region]: regions with paper_density > 70th_percentile AND hypothesis_density < 10th_percentile.
- coverage_snapshots(id PK, built_at, umap_params_json, hypothesis_count, paper_count); coverage_underexplored(snapshot_id, region_id, centroid_xy, paper_count, hypothesis_count, top_terms_json).
- scidex/senate/scheduled_tasks.py rebuilds the snapshot Sundays 04:00 UTC (UMAP fit pinned via random_state=42).
- /atlas/coverage page renders the map (SVG, three layers as toggleable overlays) + a top-10 underexplored regions list with their representative top terms (TF-IDF on titles).
- GET /api/atlas/coverage/{snapshot_id} returns the regions JSON.
- paper_cluster, auto-emit a Senate coverage_gap proposal with the regions as evidence and a suggested Theorist task to draft hypotheses for them.

- scidex/agora/gap_pipeline.py:61 to mirror conventions.
- scidex/core/embeddings.py; parametric UMAP via umap-learn skill (parametric=True so future hypotheses re-project without refitting).
- scipy.stats.gaussian_kde; clip extreme outliers (>3σ).
- market_dynamics.generate_market_overview_svg: no front-end chart library.

- scidex/core/embeddings.py.
- scidex/agora/gap_pipeline.py:61 compute_diversity_score.
- umap-learn, scikit-learn (TF-IDF).
- q-hdiv-anti-mode-collapse-penalty (uses underexplored regions as a reward target).

Files created/modified:
- scidex/atlas/coverage_map.py: core module with build_map, compute_density_grid, find_underexplored_regions, render_coverage_svg, _find_regions_in_grid, _maybe_emit_coverage_gap_proposals
- migrations/20260427_coverage_map.sql: creates coverage_snapshots and coverage_underexplored tables (run against PG)
- scidex/senate/scheduled_tasks.py: added coverage-map-weekly task (interval 10080 min)
- api.py: added GET /api/atlas/coverage/{snapshot_id}, GET /api/atlas/coverage (latest), POST /api/atlas/coverage/build, GET /atlas/coverage HTML page
- tests/atlas/test_coverage_map.py: 7 unit/integration tests; all pass
- random_state=42 and saved to MODELS_DIR/coverage_map/umap_{snap_id}.pkl for re-projection of future points
- governance_rule type (within existing CHECK constraint) with coverage_gap_emission: true in metadata

Rebased onto origin/main (commit d67a47daa); resolved conflict in scheduled_tasks.py by
preserving both the upstream arbitrage-scanner task and the new coverage-map-weekly task.
All 7 tests pass post-rebase. Branch pushed as 5165d9e4a.
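
The coverage-map-weekly interval of 10080 min is exactly one week (7 × 24 × 60), so a run anchored on a Sunday at 04:00 UTC stays on that slot. The sketch below checks this arithmetic with the standard library only. `next_sunday_0400_utc` is a hypothetical helper, not a function in scheduled_tasks.py, and how the real task anchors its first run is not stated here.

```python
from datetime import datetime, timedelta, timezone

WEEKLY_INTERVAL_MIN = 7 * 24 * 60   # 10080, the interval quoted for coverage-map-weekly


def next_sunday_0400_utc(now):
    """Return the first Sunday 04:00 UTC strictly after `now` (timezone-aware)."""
    now = now.astimezone(timezone.utc)
    days_ahead = (6 - now.weekday()) % 7          # Monday is 0, Sunday is 6
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=4, minute=0, second=0, microsecond=0
    )
    if candidate <= now:                          # this week's slot has already passed
        candidate += timedelta(days=7)
    return candidate


# Arbitrary reference time chosen for illustration.
first = next_sunday_0400_utc(datetime(2026, 4, 27, 12, 0, tzinfo=timezone.utc))
second = first + timedelta(minutes=WEEKLY_INTERVAL_MIN)
```

Adding the interval to an aligned start lands on the following Sunday at the same hour, so a fixed-interval task only needs its first run aligned.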