SciDEX — Task: [Atlas] Replication tracking

Cluster experiments by target/relation/direction, track replication status, flag conflicts

Completion Notes

Auto-release: work already on origin/main

Git Commits (13)

[Docs] Spec the shipped experiment-extras work + capture deferred ideas + fix stale handler refs [task:experiment-extras-docs-2026-05-18] (#1419) 2026-05-18

[Docs] Spec the shipped experiment-extras work + capture deferred ideas + fix stale handler refs [task:experiment-extras-docs-2026-05-18] 2026-05-18

Squash merge: orchestra/task/atl-ex-0-api-endpoints-for-experiment-browsing-se (7 commits) 2026-04-26

Squash merge: atlas/atl-ex-04-QUAL-push (2 commits) 2026-04-26

[Atlas] Update spec work log for extraction quality scoring [task:atl-ex-04-QUAL] 2026-04-25

[Atlas] Extraction quality scoring and confidence calibration [task:atl-ex-04-QUAL] 2026-04-25

Squash merge: orchestra/task/atl-ex-0-meta-analysis-support-aggregate-results (2 commits) 2026-04-25

[Atlas] Replication tracking: clustering module + /api/experiments/replication/{entity} [task:atl-ex-05-REPL] 2026-04-25

Squash merge: orchestra/task/atl-ex-0-build-llm-extraction-pipeline-from-paper (2 commits) 2026-04-15

Squash merge: orchestra/task/atl-ex-0-backfill-188-existing-experiment-artifac (1 commits) 2026-04-15

[Atlas] Auto-link extracted experiments to KG entities [task:atl-ex-03-LINK] 2026-04-13

[Docs] Update atl-ex-01-SCHM work log: implementation complete [task:atl-ex-01-SCHM] 2026-04-13

[Atlas] Add experiment extraction constants and validate_experiment_metadata() [task:atl-ex-01-SCHM] 2026-04-13

Spec File

Goal

Identify when multiple experiments test the same claim and track whether their results replicate,
conflict, or are inconclusive. This is critical for evidence quality: a replicated finding
warrants far more trust than a single observation.

Acceptance Criteria

☑ Clustering algorithm that groups experiments by: target entity + relation + direction

☑ replication_groups table or metadata linking replicated experiments

☑ Replication status per group: replicated, conflicting, single_study, inconclusive

☑ Effect size comparison across replication group (forest plot data)

☑ Replication status feeds back into experiment quality scores (replicated = higher trust)

☑ API: GET /api/experiments/replication/{entity} — show replication landscape

☑ Conflicting results flagged as high-priority debate targets
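As an illustration of the effect-size comparison criterion above, forest-plot rows could be derived as in the following minimal sketch. It assumes each experiment supplies an effect estimate and a standard error, and that weights are inverse-variance shares; the spec does not state the weighting scheme, and the names ForestRow and forest_rows are hypothetical.

```python
from dataclasses import dataclass

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


@dataclass
class ForestRow:
    experiment_id: str
    effect_estimate: float
    ci_low: float
    ci_high: float
    weight: float  # share of the total inverse-variance weight


def forest_rows(experiments):
    """Build forest-plot rows from (experiment_id, effect, standard_error) tuples.

    Assumption: weights are inverse-variance shares; experiments with a missing
    or non-positive standard error are skipped.
    """
    usable = [(eid, eff, se) for eid, eff, se in experiments if se and se > 0]
    total = sum(1.0 / se**2 for _, _, se in usable)
    rows = []
    for eid, eff, se in usable:
        half_width = Z_95 * se
        rows.append(
            ForestRow(eid, eff, eff - half_width, eff + half_width, (1.0 / se**2) / total)
        )
    return rows
```

With this weighting, an experiment whose standard error is half that of another receives four times the weight.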

Dependencies

atl-ex-04-QUAL — Need quality-filtered experiments for meaningful comparison

Dependents

atl-ex-06-META — Meta-analysis builds on replication groups
q-artifact-debates — Conflicting experiments are prime debate targets

Work Log

2026-04-25 — Implementation

State at start: the replication_groups table held 80 rows, the replication_clusters and replication_cluster_members tables were empty, and 228 of 632 experiments had replication_group_id set (all from a one-off batch on 2026-04-17). There was no Python module for clustering, no API endpoint, and no effect-size data.

Implemented:

scidex/atlas/replication_clustering.py — new module with:

- cluster_experiments(db, entity_filter=None): groups experiments by normalised target_gene + inferred relation_type (from experiment_type) + inferred direction (from title/primary_outcome keyword matching). Upserts replication_groups rows with experiment_ids (JSONB), replication_status, agreement_score, and effect_sizes (forest-plot data: per-experiment effect_estimate, 95% CI, weight). Applies a composite_score nudge: +0.05 for replicated, −0.05 for conflicting, +0.01 for inconclusive.
- get_replication_landscape(entity, db): fetches all replication groups matching an entity, annotates conflict flag, returns effect_sizes and slim experiment details. Falls back to per-experiment listing when no groups exist.
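The clustering step described above can be sketched roughly as follows. This is not the shipped module: the keyword lists, the experiment_type-to-relation mapping, and the function names are hypothetical. The log also does not say how direction disagreements become a conflicting status, so the sketch assumes experiments are bucketed by target and relation and the inferred directions are then compared inside each bucket. Only the nudge values (+0.05, -0.05, +0.01) come from the log.

```python
from collections import Counter, defaultdict

# Hypothetical vocabularies; the real module's keywords are not given in the log.
UP_WORDS = ("increase", "upregulat", "enhance", "promote")
DOWN_WORDS = ("decrease", "downregulat", "reduce", "inhibit")
RELATION_BY_TYPE = {"knockout": "loss_of_function", "overexpression": "gain_of_function"}

# Composite-score nudges as stated in the work log.
NUDGE = {"replicated": 0.05, "conflicting": -0.05, "inconclusive": 0.01}


def infer_direction(text):
    """Keyword match on free text; mixed or absent signals count as unknown."""
    lowered = text.lower()
    up = any(word in lowered for word in UP_WORDS)
    down = any(word in lowered for word in DOWN_WORDS)
    if up == down:
        return "unknown"
    return "up" if up else "down"


def classify(directions):
    """Return (replication_status, agreement_score) for one bucket."""
    if len(directions) < 2:
        return "single_study", 1.0
    counts = Counter(d for d in directions if d != "unknown")
    if not counts:
        return "inconclusive", 0.0
    agreement = max(counts.values()) / len(directions)
    if len(counts) > 1:
        return "conflicting", agreement
    if counts.most_common(1)[0][1] >= 2:
        return "replicated", agreement
    return "inconclusive", agreement


def cluster_sketch(experiments):
    """Group experiment dicts and attach a status, agreement score, and nudge."""
    buckets = defaultdict(list)
    for exp in experiments:
        gene = exp["target_gene"].strip().upper()  # normalise the target name
        relation = RELATION_BY_TYPE.get(exp["experiment_type"], "associated_with")
        buckets[(gene, relation)].append(exp)
    groups = []
    for (gene, relation), members in buckets.items():
        directions = [
            infer_direction(f"{m['title']} {m.get('primary_outcome', '')}") for m in members
        ]
        status, agreement = classify(directions)
        groups.append({
            "target": gene,
            "relation": relation,
            "experiment_ids": [m["id"] for m in members],
            "replication_status": status,
            "agreement_score": round(agreement, 2),
            "score_nudge": NUDGE.get(status, 0.0),
        })
    return groups
```

In this sketch a single-study group receives no nudge, which matches the log listing adjustments only for the other three statuses.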

api.py — added GET /api/experiments/replication/{entity} endpoint (after the existing priority-queue route). Accepts optional ?refresh=true to re-cluster on demand. Returns entity, groups, conflict_count, replicated_count, single_study_count, experiment_count, summary.
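To make the response shape concrete, the sketch below assembles the listed fields from a list of group dicts. The field names come from the log; the function name landscape_payload, the per-group conflict key, and the wording of summary are assumptions, since the log does not show the actual payload.

```python
def landscape_payload(entity, groups):
    """Assemble a response dict with the fields named in the work log.

    Each group is assumed to carry 'replication_status' and 'experiment_ids'.
    """
    def count(status):
        return sum(1 for g in groups if g["replication_status"] == status)

    # Count each experiment once even if it appears in several groups.
    experiment_ids = {eid for g in groups for eid in g["experiment_ids"]}
    conflicts, replicated, single = count("conflicting"), count("replicated"), count("single_study")
    return {
        "entity": entity,
        # Hypothetical key name for the per-group conflict annotation.
        "groups": [dict(g, conflict=g["replication_status"] == "conflicting") for g in groups],
        "conflict_count": conflicts,
        "replicated_count": replicated,
        "single_study_count": single,
        "experiment_count": len(experiment_ids),
        "summary": (
            f"{len(groups)} replication group(s) for {entity}: "
            f"{replicated} replicated, {conflicts} conflicting, {single} single-study."
        ),
    }
```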

Conflict flagging is already handled by scidex/agora/debate_trigger.py rule 4 (_rule4_conflicting_replication): experiments with replication_status='conflicting' are queued as high-priority debate targets.
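The behaviour described here amounts to a filter over experiment records. The sketch below is illustrative only and does not reproduce the code in debate_trigger.py; the function name and the queue-entry fields are hypothetical.

```python
def conflicting_debate_targets(experiments):
    """Return one high-priority queue entry per experiment marked conflicting."""
    return [
        {
            "experiment_id": exp["id"],
            "priority": "high",
            "reason": "conflicting_replication",  # hypothetical reason label
        }
        for exp in experiments
        if exp.get("replication_status") == "conflicting"
    ]
```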

Verified:

get_replication_landscape('TREM2') returns 3 groups, 38 experiments, effect_sizes populated.
cluster_experiments(db, entity_filter='TREM2') writes 1 group, updates 19 experiments, composite_score nudged +0.05.
python3 -m py_compile api.py passes.

Sibling Tasks in Quest (Experiment Extraction)

○ [Atlas] CI: Verify experiment extraction quality metrics and extract from new papers P88

✓ [Atlas] Define experiment extraction schemas per experiment type P93

✓ [Atlas] Auto-link extracted experiments to KG entities P93

✓ [Atlas] Backfill 188 existing experiment artifacts with structured metadata P93

✓ [Atlas] Build LLM extraction pipeline from paper abstracts and full text P92

✓ [Atlas] Extraction quality scoring and confidence calibration P88

✓ [Atlas] API endpoints for experiment browsing, search, and filtering P87

✓ [Atlas] Meta-analysis support: aggregate results across experiments P84

[Atlas] Replication tracking — match experiments testing same hypothesis done