[Exchange] Enrich experiment descriptions — add protocols and expected outcomes

Goal

Bring thin neurodegeneration-relevant experiment records up to a usable Exchange quality bar by expanding protocol text, expected outcomes, and success criteria, and by attaching missing parent hypothesis links where the scientific connection is defensible. The task should improve what users see on /experiments and experiment detail pages without changing the page code itself. The current repository state differs from the original task framing, so the work will target the current PostgreSQL-backed experiment corpus rather than the old 188-row snapshot.

Acceptance Criteria

☑ Current state audited for stale assumptions and remaining thin experiment records
☑ PostgreSQL-safe enrichment script added under scripts/
☑ A targeted batch of neurodegeneration-relevant experiments updated with richer protocols, outcomes, and criteria
☑ Missing hypothesis links added where the parent relationship is supported by target, disease, or title context
☑ Updated records verified via SQL and at least one page/API rendering check
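
The SQL verification step can be approximated by a length query over the three enriched fields. The sketch below is a minimal illustration, not the project's actual check: the table name `experiments` and the column names `protocol`, `expected_outcomes`, and `success_criteria` are assumptions, and the standard library's `sqlite3` stands in for PostgreSQL so the example runs without a database server.

```python
import sqlite3

# Assumed table and column names; the real schema may differ.
# sqlite3 uses "?" placeholders; PostgreSQL drivers such as psycopg use "%s".
LENGTH_QUERY = """
SELECT id,
       LENGTH(COALESCE(protocol, '')),
       LENGTH(COALESCE(expected_outcomes, '')),
       LENGTH(COALESCE(success_criteria, ''))
FROM experiments
WHERE id = ?
"""


def field_lengths(conn, experiment_id):
    """Return (id, protocol_len, outcomes_len, criteria_len), or None if the id is absent."""
    return conn.execute(LENGTH_QUERY, (experiment_id,)).fetchone()


def demo_connection():
    """Build an in-memory stand-in database holding one hypothetical record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE experiments ("
        "id TEXT PRIMARY KEY, protocol TEXT, "
        "expected_outcomes TEXT, success_criteria TEXT)"
    )
    conn.execute(
        "INSERT INTO experiments VALUES (?, ?, ?, ?)",
        ("exp-demo", "abcdefgh", "hello", None),
    )
    return conn
```

Wrapping each column in `COALESCE` makes a NULL field report a length of 0 rather than NULL, so unset and empty fields are treated the same when looking for thin records.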

Approach

  • Audit current experiment counts and identify the highest-value incomplete neurodegeneration records.
  • Add an up-to-date enrichment script that uses scidex.core.database.get_db() and the repo-root llm.py abstraction.
  • Run a small batch, inspect generated content quality, then continue with a larger targeted pass.
  • Verify the updated records in PostgreSQL and confirm that experiment pages still render successfully.
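
The audit step can be pictured as a simple count of records whose fields fall below a length cutoff. This is a hedged sketch only: the spec does not state what counts as "thin", so the 200-character threshold, the `ExperimentRecord` class, and its field names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumption: a field is "thin" below this many characters.
# The real cutoff used by the project is not given in the spec.
THIN_THRESHOLD = 200


@dataclass
class ExperimentRecord:
    """Hypothetical in-memory view of one experiment row."""
    id: str
    protocol: Optional[str] = None
    expected_outcomes: Optional[str] = None
    success_criteria: Optional[str] = None
    hypothesis_ids: List[str] = field(default_factory=list)


def is_thin(text, threshold=THIN_THRESHOLD):
    """Treat None, empty, and short text as thin."""
    return len((text or "").strip()) < threshold


def audit(records):
    """Count thin fields and missing hypothesis links across records."""
    counts = {
        "total": 0,
        "thin_protocol": 0,
        "thin_outcomes": 0,
        "thin_criteria": 0,
        "missing_links": 0,
    }
    for rec in records:
        counts["total"] += 1
        counts["thin_protocol"] += is_thin(rec.protocol)
        counts["thin_outcomes"] += is_thin(rec.expected_outcomes)
        counts["thin_criteria"] += is_thin(rec.success_criteria)
        counts["missing_links"] += not rec.hypothesis_ids
    return counts
```

Running the same count before and after a batch gives the kind of before/after figures recorded in the work log.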
Dependencies

  • None

Dependents

  • None known

Work Log

    2026-04-25 23:29 PDT — Codex

    • Performed staleness review against current main (28f900c8f) and current DB state.
    • Confirmed the task is still necessary: the experiment corpus is now 632 rows, with 315 thin protocols and 165 missing hypothesis links; prior task commit a8007ddc1 only enriched the top 5 records.
    • Found that the archived helper scripts/archive/oneoff_scripts/enrich_experiment_descriptions.py is not usable on current HEAD: it assumes the SQLite-era imports and layout and fails immediately with ModuleNotFoundError: No module named 'llm'.
    • Decided to create a new PostgreSQL-safe enrichment script under scripts/ and target neurodegeneration-relevant experiments where the work is most aligned with SciDEX's mission and visible on Exchange pages.
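
A ModuleNotFoundError of this kind typically appears when a script imports a top-level module whose directory is not on `sys.path`. One possible way for a script under scripts/ to reach a repo-root module is sketched below, under that assumption; the helper name `ensure_repo_root_on_path` and the `levels_up` parameter are hypothetical and are not taken from the actual script.

```python
import sys
from pathlib import Path


def ensure_repo_root_on_path(script_path, levels_up=1):
    """Put the directory `levels_up` above the script's folder on sys.path.

    For <repo>/scripts/tool.py, levels_up=1 selects <repo>.
    For a script three folders deep, levels_up=3 would be needed.
    """
    repo_root = Path(script_path).resolve().parents[levels_up]
    root_str = str(repo_root)
    if root_str not in sys.path:
        # Insert at the front so repo modules win over same-named packages.
        sys.path.insert(0, root_str)
    return repo_root
```

A script would call `ensure_repo_root_on_path(__file__)` before importing repo-root modules. Installing the repository as a package is the more robust alternative where that is practical.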

    2026-04-26 00:07 PDT — Codex

    • Added scripts/enrich_experiment_descriptions.py, a PostgreSQL-safe enrichment utility with neurodegeneration filtering, current DB access, JSON parsing guards, and deterministic fallback text generation.
    • Applied a targeted live batch update to 12 top incomplete neurodegeneration experiments using the deterministic fallback path plus heuristic parent-hypothesis linking.
    • After the update, counts for neurodegeneration experiments dropped from 62 to 44 thin protocols, 55 to 39 thin expected-outcomes fields, 59 to 38 thin success-criteria fields, and 10 to 7 missing hypothesis-link sets.
    • Verified sample updated records in PostgreSQL and confirmed /experiment/exp-e9c371ae-4aea-46c9-b21f-946ea6c42bd7 and /experiment/exp-6815b60b-6325-4c88-b293-ef6936222780 both return HTTP 200 and render Protocol / Expected Outcomes / Success Criteria sections.
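
The combination of JSON parsing guards and a deterministic fallback described in this entry can be sketched as follows. This is an illustration under stated assumptions rather than the script's real code: the required keys, the function names, and the fallback wording are all hypothetical.

```python
import json
import re

# Assumed keys expected in the model's JSON reply.
REQUIRED_KEYS = ("protocol", "expected_outcomes", "success_criteria")


def parse_llm_json(raw):
    """Return a cleaned dict of the required fields, or None if unusable."""
    if not raw:
        return None
    # Tolerate prose or code fences around the JSON object.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for key in REQUIRED_KEYS:
        value = data.get(key)
        if not isinstance(value, str) or not value.strip():
            return None
    return {key: data[key].strip() for key in REQUIRED_KEYS}


def fallback_fields(title, target, disease):
    """Build the same text for the same inputs, with no model call."""
    return {
        "protocol": (
            f"Design a controlled study for '{title}' that perturbs "
            f"{target} in a {disease} model and records predefined readouts."
        ),
        "expected_outcomes": (
            f"If {target} contributes to {disease}, the perturbed group "
            "should differ from controls on the primary readout."
        ),
        "success_criteria": (
            "The primary readout shows a reproducible difference across "
            "independent replicates."
        ),
    }


def enrich(raw_response, title, target, disease):
    """Prefer valid model output; otherwise use the deterministic text."""
    parsed = parse_llm_json(raw_response)
    return parsed if parsed is not None else fallback_fields(title, target, disease)
```

Because the fallback depends only on its inputs, rerunning a batch produces identical text, which keeps updates reproducible when no model response is available.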

    2026-04-26 08:12 PDT — Codex

    • Re-verified the current live neurodegeneration subset with python3 scripts/enrich_experiment_descriptions.py --stats: the subset contains 286 neurodegeneration-relevant experiments, of which 44 have thin protocols, 39 have thin expected-outcomes fields, 38 have thin success-criteria fields, and 7 have missing hypothesis-link sets after the applied batch.
    • Re-checked representative updated records directly in PostgreSQL: exp-e9c371ae-4aea-46c9-b21f-946ea6c42bd7 now has protocol/outcome/criteria lengths 1026/450/447 and five linked hypotheses; exp-6815b60b-6325-4c88-b293-ef6936222780 now has lengths 414/527/466 and three linked hypotheses.
    • Re-verified Exchange rendering with HTTP 200 responses for both experiment detail pages and confirmed the rendered HTML still includes Protocol, Expected Outcomes, and Success Criteria sections.
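
A rendering check like the one in this entry can be reduced to two tests: the response status and the presence of the three section labels in the HTML. The sketch below assumes the labels appear verbatim in the page markup; the function names are hypothetical, and the caller must supply the full URL because the host is not stated in the spec.

```python
from urllib.request import urlopen

REQUIRED_SECTIONS = ("Protocol", "Expected Outcomes", "Success Criteria")


def missing_sections(html, required=REQUIRED_SECTIONS):
    """List the section labels that do not occur anywhere in the HTML.

    This is a naive substring test; it does not confirm the label is a heading.
    """
    lowered = html.lower()
    return [name for name in required if name.lower() not in lowered]


def check_page(url, timeout=10):
    """Fetch a page and return (status, missing_section_labels).

    urlopen raises urllib.error.HTTPError for 4xx and 5xx responses,
    so a returned status is normally a success code.
    """
    with urlopen(url, timeout=timeout) as response:
        status = response.status
        body = response.read().decode("utf-8", errors="replace")
    return status, missing_sections(body)
```

A page passes when `check_page` returns a status of 200 and an empty list.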

    2026-04-26 09:03 PDT — Codex

    • Patched scripts/enrich_experiment_descriptions.py to support --skip-llm deterministic execution and heuristic parent-hypothesis scoring so the batch can complete even when LLM providers are rate-limited.
    • Applied an additional live batch update to 20 neurodegeneration-relevant experiments, including several rich-but-unlinked records such as the TRIM21 stress-granule and autophagy-receptor experiments.
    • After the update, neurodegeneration stats improved from 44 to 27 thin protocols, 39 to 25 thin expected-outcomes fields, 38 to 23 thin success-criteria fields, and 7 to 3 missing hypothesis-link sets.
    • Re-verified rendered Exchange pages for exp-a3090cf0-854f-45dc-8ef7-06cf9a6bb754 and exp-b6c4a13e-3b7b-41fa-9e4d-eacfb11bb61f; both return HTTP 200 and include Protocol, Expected Outcomes, and Success Criteria sections.
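
Heuristic parent-hypothesis scoring based on target, disease, and title context could look roughly like the sketch below. The weights, the `min_score` cutoff, the dictionary keys (`target_gene`, `disease`, `title`, `id`), and the function names are all assumptions chosen for illustration; the actual script may score links differently.

```python
import re

STOPWORDS = {"the", "and", "for", "with", "via", "from", "into"}


def tokens(text):
    """Lowercased word set, dropping stopwords and very short tokens."""
    words = re.findall(r"[a-z0-9]+", (text or "").lower())
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}


def same(a, b):
    """Case-insensitive equality that ignores empty values."""
    a, b = (a or "").strip().lower(), (b or "").strip().lower()
    return bool(a) and a == b


def score_link(experiment, hypothesis):
    """Score one candidate link; higher means better supported."""
    score = 0.0
    if same(experiment.get("target_gene"), hypothesis.get("target_gene")):
        score += 3.0  # assumed weight for a shared target
    if same(experiment.get("disease"), hypothesis.get("disease")):
        score += 2.0  # assumed weight for a shared disease
    exp_words = tokens(experiment.get("title"))
    hyp_words = tokens(hypothesis.get("title"))
    if exp_words and hyp_words:
        # Jaccard overlap of title words, contributing at most 1.0
        score += len(exp_words & hyp_words) / len(exp_words | hyp_words)
    return score


def pick_parents(experiment, hypotheses, min_score=2.0, limit=5):
    """Return ids of the best-scoring hypotheses at or above min_score."""
    scored = [(score_link(experiment, h), h["id"]) for h in hypotheses]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by id
    return [hyp_id for _, hyp_id in kept[:limit]]
```

Requiring a minimum score keeps links limited to cases where the connection is defensible, and breaking ties by id keeps the result deterministic across runs.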

    File: 81a417e3_exchange_enrich_experiment_descriptions_spec.md
    Modified: 2026-04-26 01:25
    Size: 5.0 KB