[Atlas] CI: Verify experiment extraction quality metrics and extract from new papers

blocked · analysis:6 · coding:7 · reasoning:6 · safety:6

Check experiment extraction quality: (1) count papers with structured experiments (goal: >500), (2) check field completeness ratio (goal: >80%), (3) check KG entity link precision via sampling, (4) run extraction pipeline on papers added since last run that have 0 experiment records, (5) generate summary report. See: experiment_extractor.py, ci_notebook_coverage.py pattern for CI style. Spec: docs/planning/specs/quest_experiment_extraction_spec.md

Completion Notes

Auto-release: recurring task had no work this cycle

Git Commits (2)

[Atlas] Improve experiment extraction CI targeting and worktree DB access [task:8f7dc2dc-829a-41e0-8824-c5c872cde977] (2026-04-09)
[Atlas] CI: experiment extraction quality check + incremental pipeline [task:8f7dc2dc-829a-41e0-8824-c5c872cde977] (2026-04-06)
Spec File

[Atlas] CI: Verify experiment extraction quality metrics and extract from new papers

> ## Continuous-process anchor
>
> This spec describes an instance of one of the retired-script themes
> documented in docs/design/retired_scripts_patterns.md. Before
> implementing, read:
>
> 1. The "Design principles for continuous processes" section of that
> atlas — every principle is load-bearing. In particular:
> - LLMs for semantic judgment; rules for syntactic validation.
> - Gap-predicate driven, not calendar-driven.
> - Idempotent + version-stamped + observable.
> - No hardcoded entity lists, keyword lists, or canonical-name tables.
> - Three surfaces: FastAPI + orchestra + MCP.
> - Progressive improvement via outcome-feedback loop.
> 2. The theme entry in the atlas matching this task's capability:
> AG2 (pick the closest from Atlas A1–A7, Agora AG1–AG5,
> Exchange EX1–EX4, Forge F1–F2, Senate S1–S8, Cross-cutting X1–X2).
> 3. If the theme is not yet rebuilt as a continuous process, follow
> docs/planning/specs/rebuild_theme_template_spec.md to scaffold it
> BEFORE doing the per-instance work.
>
> **Specific scripts named below in this spec are retired and must not
> be rebuilt as one-offs.** Implement (or extend) the corresponding
> continuous process instead.

ID: 8f7dc2dc-829a-41e0-8824-c5c872cde977 · Priority: 92 · Frequency: weekly · Status: open

Goal

Check experiment extraction quality: (1) count papers with structured experiments (goal: >500), (2) check field completeness ratio (goal: >80%), (3) check KG entity link precision via sampling, (4) run extraction pipeline on papers added since last run that have 0 experiment records, (5) generate summary report. See: experiment_extractor.py, ci_notebook_coverage.py pattern for CI style. Spec: docs/planning/specs/quest_experiment_extraction_spec.md

This recurring check should explicitly suppress chaff. Candidate selection and
reporting should distinguish empirical studies from reviews/editorials, so the
system spends extraction effort on papers that can actually ground analyses
and mechanistic claims.
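
A minimal sketch of how these five steps could hang together, assuming a SQLite store; the table and column names (papers, experiments, entity_links, payload) are illustrative, not the schema ci_experiment_extraction.py actually uses:

```python
import json
import random
import sqlite3

GOALS = {"papers_with_experiments": 500, "field_completeness": 0.80, "kg_precision": 0.90}

def papers_with_experiments(conn: sqlite3.Connection) -> int:
    # (1) Count distinct papers that have at least one structured experiment.
    return conn.execute("SELECT COUNT(DISTINCT paper_id) FROM experiments").fetchone()[0]

def field_completeness(conn: sqlite3.Connection, fields: list[str]) -> float:
    # (2) Share of non-empty values across the tracked experiment fields.
    filled = total = 0
    for (payload,) in conn.execute("SELECT payload FROM experiments"):
        record = json.loads(payload)
        for field in fields:
            total += 1
            filled += bool(record.get(field))
    return filled / total if total else 0.0

def kg_link_sample(conn: sqlite3.Connection, n: int = 75, seed: int = 0) -> list:
    # (3) Draw a reproducible sample of KG entity links; precision is the
    # fraction judged correct by a reviewer or LLM outside this sketch.
    links = conn.execute("SELECT entity, kg_id FROM entity_links").fetchall()
    random.Random(seed).shuffle(links)
    return links[:n]

def unprocessed_paper_ids(conn: sqlite3.Connection, limit: int) -> list[str]:
    # (4) Papers with zero experiment records, candidates for the next batch.
    rows = conn.execute(
        "SELECT paper_id FROM papers WHERE paper_id NOT IN "
        "(SELECT DISTINCT paper_id FROM experiments) LIMIT ?",
        (limit,),
    ).fetchall()
    return [r[0] for r in rows]
```

Step (5), the summary report, would then render these numbers against GOALS into the dated markdown file the work log references.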

Acceptance Criteria

☑ Concrete deliverables created
☑ Work log updated with timestamped entry

Work Log

2026-04-06 — Initial CI run (task:8f7dc2dc-829a-41e0-8824-c5c872cde977)

Created ci_experiment_extraction.py following the ci_notebook_coverage.py pattern.

Metrics from this run:

| Check | Result | Goal | Status |
| --- | --- | --- | --- |
| Papers with experiments | 26 | ≥ 500 | ❌ Far below target |
| Field completeness | 95.1% avg | ≥ 80% | ✅ Meets target |
| KG entity precision | 94.7% (71/75 entities) | ≥ 90% | ✅ Meets target |

Notes:
  • 14,819 papers have abstracts but no extracted experiments.
  • Extraction batch of 5 papers ran but yielded 0 experiments — the top-cited unprocessed papers appear to be review articles (expected; extractor only extracts from empirical papers with clear methods/results).
  • Multi-entity string issue: 13 artifacts store multiple genes as "SQSTM1, CALCOCO2" in a single array element instead of separate elements. These still resolve correctly after comma-splitting; a normalization sketch follows these notes.
  • disease field completeness is 74.8% (below 80%) but all other fields are ≥ 82%, bringing the average to 95.1%.
  • Report saved to: docs/code_health/experiment_extraction_ci_2026-04-06.md
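
A minimal sketch of that comma-splitting normalization, assuming the multi-entity values arrive as plain Python lists; the function name is illustrative, not part of experiment_extractor.py:

```python
def split_multi_entity(values: list[str]) -> list[str]:
    # Expand elements like "SQSTM1, CALCOCO2" that pack several genes into
    # one array slot, preserving order and dropping blanks and duplicates.
    seen: list[str] = []
    for value in values:
        for part in (p.strip() for p in value.split(",")):
            if part and part not in seen:
                seen.append(part)
    return seen

assert split_multi_entity(["SQSTM1, CALCOCO2", "MAP1LC3B"]) == ["SQSTM1", "CALCOCO2", "MAP1LC3B"]
```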

2026-04-09 20:38 PDT — Slot 20

  • Re-ran the experiment extraction CI in a worktree after reviewing ci_experiment_extraction.py, experiment_extractor.py, and the quest spec.
  • Identified the main selection problem: the batch extractor was ordering unprocessed papers by citations only, which kept surfacing reviews and guidelines instead of empirical studies.
  • Added empirical-study ranking heuristics in experiment_extractor.py so candidate selection now boosts method/result-style abstracts and penalizes review-like titles and journals (sketched after this entry).
  • Extended ci_experiment_extraction.py reporting to show selected candidate papers and post-run coverage so the report reflects the actual state after extraction.
  • Found and fixed a worktree-specific DB bug in artifact_registry.py: its default DB_PATH was a relative SQLite path, so runs inside a worktree created and queried an empty local database, failing with `no such table: artifacts` during registration (path-resolution sketch after this entry).
  • Validated the fix with live extraction runs:
- ci_experiment_extraction.py --limit 2 registered 11 experiments across PMIDs 41268978 and 40693377
- ci_experiment_extraction.py --limit 1 registered 2 experiments for PMID 40446574
- ci_experiment_extraction.py --limit 1 registered 5 experiments for PMID 40020261
  • Final report: docs/code_health/experiment_extraction_ci_2026-04-10.md
  • Post-run metrics from the latest validation:
- Papers with extracted experiments: 30
- Field completeness: 95.5%
- KG entity precision: 94.7%
  • Result: recurring CI now selects experiment-like papers and successfully registers extracted experiments from an Orchestra worktree.
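
A sketch of the ranking idea behind the candidate-selection fix above; the cue patterns and weights are illustrative assumptions, not the actual heuristics in experiment_extractor.py:

```python
import re

# Illustrative cue patterns; the real heuristics may use different signals.
EMPIRICAL_CUES = re.compile(
    r"we (measured|performed|treated|quantified)|n\s*=\s*\d+|knockdown|western blot",
    re.IGNORECASE,
)
REVIEW_CUES = re.compile(
    r"\b(review|meta-analysis|perspective|editorial|guidelines?)\b", re.IGNORECASE
)

def empirical_score(title: str, journal: str, abstract: str, citations: int) -> float:
    # Start from citations, then reshape the ordering so empirical studies
    # outrank highly cited reviews instead of sorting by citations alone.
    score = float(citations)
    if EMPIRICAL_CUES.search(abstract):
        score *= 2.0   # boost method/result-style abstracts
    if REVIEW_CUES.search(title) or REVIEW_CUES.search(journal):
        score *= 0.1   # penalize review-like titles and journals
    return score
```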
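
And a sketch of the worktree-safe DB path fix; the environment-variable name and default location are assumptions, not artifact_registry.py's actual values:

```python
import os
from pathlib import Path

# Resolve the artifact DB relative to this module (or an explicit override)
# rather than the process working directory, so a run launched from a git
# worktree cannot silently create and query an empty local database.
DB_PATH = Path(
    os.environ.get("ARTIFACT_DB_PATH")
    or Path(__file__).resolve().parent / "data" / "artifacts.db"
)
```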

Payload JSON

```json
{
  "requirements": {
    "coding": 7,
    "reasoning": 6,
    "analysis": 6,
    "safety": 6
  }
}
```

Sibling Tasks in Quest (Experiment Extraction)