[Atlas] Extraction quality scoring and confidence calibration
running analysis:5

Quality scoring for extracted experiments: completeness, statistical rigor, consistency, calibrated confidence

Completion Notes

Changed files:
- api.py
- docs/planning/specs/85f3ccfa-539d-4c45-bee2-690f36715fa6_spec.md
- docs/planning/specs/atl-ex-04-QUAL_extraction_quality_scoring_spec.md
- docs/planning/specs/atl-ex-06-META_meta_analysis_support_spec.md
- scidex/agora/extraction_quality.py
- scidex/atlas/meta_analysis.py
- scripts/generate_wiki_content.py

Diff stat:
 api.py                                             |   16 +
 .../85f3ccfa-539d-4c45-bee2-690f36715fa6_spec.md   |   23 -
 ...l-ex-04-QUAL_extraction_quality_scoring_spec.md |   30 +
 .../atl-ex-06-META_meta_analysis_support_spec.md   |   22 +-
 scidex/agora/extraction_quality.py                 |  791 +++++++++++++++
 scidex/atlas/meta_analysis.py                      |  569 -----------
 scripts/generate_wiki_content.py                   | 1004 ++++++++------------
 7 files changed, 1214 insertions(+), 1241 deletions(-)

Last Error

Review gate REVISE attempt 1/10: Auto-deploy blocked: Push failed: [pre-push] BLOCKED: commit range touches critical file 'api.py' but no commit
[pre-push]          message in the range mentions it. This pattern is how
[pre-push]          the 2026-04-11 stale-worktre
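
The rule stated in the hook message (a commit range that touches a critical file must have at least one commit message that names it) can be sketched as below. This is an illustration only, not the actual hook: the function name, the commit dictionary shape, and the `CRITICAL_FILES` set are assumptions, and only `api.py` is known from the message to be treated as critical.

```python
from typing import Dict, Iterable, List, Set

# Assumption: the real hook's list of critical files is not shown in the
# error message; 'api.py' is the only entry the message confirms.
CRITICAL_FILES: Set[str] = {"api.py"}


def unmentioned_critical_files(
    commits: Iterable[Dict[str, object]],
    critical_files: Set[str] = CRITICAL_FILES,
) -> List[str]:
    """Return critical files touched in the range but named in no message.

    Each commit is a hypothetical dict with a 'message' string and a
    'files' list of changed paths.
    """
    touched: Set[str] = set()
    messages: List[str] = []
    for commit in commits:
        touched |= set(commit["files"]) & critical_files
        messages.append(str(commit["message"]))
    all_messages = "\n".join(messages)
    # A file passes the check if any message in the range mentions its path.
    return sorted(path for path in touched if path not in all_messages)


if __name__ == "__main__":
    commit_range = [
        {
            "message": "[Atlas] Extraction quality scoring and confidence calibration",
            "files": ["api.py", "scidex/agora/extraction_quality.py"],
        }
    ]
    blocked = unmentioned_critical_files(commit_range)
    print("BLOCKED" if blocked else "OK", blocked)
```

Under this reading, amending one commit message in the range so that it mentions `api.py` would satisfy the check.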

Git Commits (18)

[Atlas] API endpoints for experiment browsing, search, and filtering [task:atl-ex-08-API] 2026-04-25
[Atlas] Update spec work log for extraction quality scoring [task:atl-ex-04-QUAL] 2026-04-25
[Atlas] Extraction quality scoring and confidence calibration [task:atl-ex-04-QUAL] 2026-04-25
Squash merge: orchestra/task/atl-ex-0-meta-analysis-support-aggregate-results (2 commits) 2026-04-25
[Atlas] Update spec work log for extraction quality scoring [task:atl-ex-04-QUAL] 2026-04-25
[Verify] Meta-analysis spec verified and updated — all criteria implemented [task:atl-ex-06-META] 2026-04-25
[Atlas] Extraction quality scoring and confidence calibration [task:atl-ex-04-QUAL] 2026-04-25
[Atlas] Add meta-analysis module with pooled effect sizes and heterogeneity [task:atl-ex-06-META] 2026-04-25
[Atlas] Replication tracking: clustering module + /api/experiments/replication/{entity} [task:atl-ex-05-REPL] 2026-04-25
[Atlas] Replication tracking: clustering module + /api/experiments/replication/{entity} [task:atl-ex-05-REPL] 2026-04-25
Squash merge: orchestra/task/atl-ex-0-build-llm-extraction-pipeline-from-paper (2 commits) 2026-04-15
[Atlas] Update spec work log for atl-ex-02-PIPE [task:atl-ex-02-PIPE] 2026-04-15
[Atlas] Improve experiment extraction pipeline with schema-aligned prompts, graceful missing data, and quota-aware rate limiting 2026-04-15
Squash merge: orchestra/task/atl-ex-0-backfill-188-existing-experiment-artifac (1 commits) 2026-04-15
[Atlas] Backfill 188 experiment artifacts with structured metadata [task:atl-ex-07-BKFL] 2026-04-15
[Atlas] Auto-link extracted experiments to KG entities [task:atl-ex-03-LINK] 2026-04-13
[Docs] Update atl-ex-01-SCHM work log: implementation complete [task:atl-ex-01-SCHM] 2026-04-13
[Atlas] Add experiment extraction constants and validate_experiment_metadata() [task:atl-ex-01-SCHM] 2026-04-13

Spec File

Goal

Build a quality scoring system for extracted experiments that assesses completeness,
consistency, and extraction confidence. Calibrate confidence scores so that experiments
marked 0.9 confidence are correct 90% of the time (well-calibrated).
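
To make "well-calibrated" concrete, the sketch below bins hand-verified extractions by stated confidence and compares each bin's mean confidence with its observed accuracy. It is a minimal illustration and not the code in `scidex/agora/extraction_quality.py`: the function names, the equal-width binning, and the use of expected calibration error as a summary number are all assumptions.

```python
from typing import List, Sequence, Tuple

# (mean stated confidence, observed accuracy, number of samples) for one bin
Bin = Tuple[float, float, int]


def calibration_bins(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> List[Bin]:
    """Group verified extractions into equal-width confidence bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    buckets: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp the index so a confidence of exactly 1.0 lands in the last bin.
        idx = min(max(int(conf * n_bins), 0), n_bins - 1)
        buckets[idx].append((conf, ok))
    rows: List[Bin] = []
    for bucket in buckets:
        if not bucket:
            continue  # empty bins carry no evidence and are skipped
        mean_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        rows.append((mean_conf, accuracy, len(bucket)))
    return rows


def expected_calibration_error(rows: Sequence[Bin]) -> float:
    """Sample-weighted mean gap between stated confidence and accuracy."""
    total = sum(n for _, _, n in rows)
    if total == 0:
        return 0.0
    return sum(n * abs(acc - conf) for conf, acc, n in rows) / total


if __name__ == "__main__":
    # Ten extractions stated at 0.9 confidence, nine verified correct.
    stated = [0.9] * 10
    verified = [True] * 9 + [False]
    bins = calibration_bins(stated, verified)
    print(bins, expected_calibration_error(bins))
```

Plotting each bin's mean confidence against its accuracy gives the calibration curve; points on the diagonal correspond to the "0.9 confidence is correct 90% of the time" target.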

Acceptance Criteria

☐ Quality score computed from: field completeness, statistical rigor, internal consistency
☐ Completeness score: fraction of schema fields populated (weighted by importance)
☐ Statistical rigor: presence of p-values, confidence intervals, effect sizes, sample sizes
☐ Consistency: results match conclusions, effect direction matches measurements
☐ Confidence calibration: sample-verify 50 extractions, plot calibration curve
☐ Quality feeds into artifact quality_score via propagate_quality()
☐ Low-confidence extractions flagged for human review or re-extraction
☐ API: quality distribution dashboard showing extraction health
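
The first three criteria and the review flag can be sketched as follows. Everything here is hypothetical: the field names, the importance weights, the 0.4/0.3/0.3 blend, and the 0.5 review threshold are placeholders chosen for illustration, and the spec does not state the real values. The consistency score is taken as an input because the spec does not say how it is computed.

```python
from typing import Any, Mapping

# Hypothetical schema fields and importance weights (not from the spec).
FIELD_WEIGHTS = {
    "hypothesis": 3.0,
    "method": 2.0,
    "sample_size": 2.0,
    "p_value": 1.0,
    "effect_size": 1.0,
    "conclusion": 1.0,
}
# Statistical-rigor indicators named in the acceptance criteria.
RIGOR_FIELDS = ("p_value", "confidence_interval", "effect_size", "sample_size")


def _populated(value: Any) -> bool:
    """Treat None and empty strings/containers as missing; 0 counts as present."""
    if value is None:
        return False
    if isinstance(value, (str, list, tuple, dict)):
        return len(value) > 0
    return True


def completeness_score(exp: Mapping[str, Any],
                       weights: Mapping[str, float] = FIELD_WEIGHTS) -> float:
    """Importance-weighted fraction of schema fields that are populated."""
    total = sum(weights.values())
    filled = sum(w for name, w in weights.items() if _populated(exp.get(name)))
    return filled / total if total else 0.0


def rigor_score(exp: Mapping[str, Any]) -> float:
    """Fraction of statistical-rigor indicators present in the extraction."""
    present = sum(1 for name in RIGOR_FIELDS if _populated(exp.get(name)))
    return present / len(RIGOR_FIELDS)


def quality_score(exp: Mapping[str, Any], consistency: float) -> float:
    """Blend the three components; the blend weights are placeholders."""
    return (0.4 * completeness_score(exp)
            + 0.3 * rigor_score(exp)
            + 0.3 * consistency)


def needs_review(confidence: float, threshold: float = 0.5) -> bool:
    """Flag low-confidence extractions for human review or re-extraction."""
    return confidence < threshold


if __name__ == "__main__":
    experiment = {"hypothesis": "x", "method": "", "sample_size": 40, "p_value": 0.03}
    print(completeness_score(experiment), rigor_score(experiment))
    print(quality_score(experiment, consistency=1.0), needs_review(0.35))
```

Keeping each component as its own function makes it possible to report the components separately on a distribution dashboard as well as the blended score.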

Dependencies

  • atl-ex-02-PIPE — Extraction pipeline produces the experiments to score
  • atl-ex-03-LINK — Entity linking quality is a scoring factor

Dependents

  • atl-ex-05-REPL — Replication tracking needs quality-filtered experiments

Work Log

Payload JSON
{
  "requirements": {
    "analysis": 5
  },
  "_gate_retry_count": 1,
  "_gate_last_decision": "REVISE",
  "_gate_last_reason": "Auto-deploy blocked: Push failed: [pre-push] BLOCKED: commit range touches critical file 'api.py' but no commit\n[pre-push]          message in the range mentions it. This pattern is how\n[pre-push]          the 2026-04-11 stale-worktre",
  "_gate_branch": "orchestra/task/atl-ex-0-extraction-quality-scoring-and-confidenc",
  "_gate_changed_files": [
    "api.py",
    "docs/planning/specs/85f3ccfa-539d-4c45-bee2-690f36715fa6_spec.md",
    "docs/planning/specs/atl-ex-04-QUAL_extraction_quality_scoring_spec.md",
    "docs/planning/specs/atl-ex-06-META_meta_analysis_support_spec.md",
    "scidex/agora/extraction_quality.py",
    "scidex/atlas/meta_analysis.py",
    "scripts/generate_wiki_content.py"
  ],
  "_gate_diff_stat": "api.py                                             |   16 +\n .../85f3ccfa-539d-4c45-bee2-690f36715fa6_spec.md   |   23 -\n ...l-ex-04-QUAL_extraction_quality_scoring_spec.md |   30 +\n .../atl-ex-06-META_meta_analysis_support_spec.md   |   22 +-\n scidex/agora/extraction_quality.py                 |  791 +++++++++++++++\n scidex/atlas/meta_analysis.py                      |  569 -----------\n scripts/generate_wiki_content.py                   | 1004 ++++++++------------\n 7 files changed, 1214 insertions(+), 1241 deletions(-)",
  "_gate_history": [
    {
      "ts": "2026-04-26 06:33:20",
      "decision": "REVISE",
      "reason": "Auto-deploy blocked: Push failed: [pre-push] BLOCKED: commit range touches critical file 'api.py' but no commit\n[pre-push]          message in the range mentions it. This pattern is how\n[pre-push]          the 2026-04-11 stale-worktre",
      "instructions": "",
      "judge_used": "",
      "actor": "minimax:77",
      "retry_count": 1
    }
  ]
}
