[Agora] Generate structured research briefs for top 25 hypotheses — actionable scientific synthesis reports

Goal

SciDEX holds 1,878 hypotheses, 77 of them with composite_score ≥ 0.8, but these ideas exist
only as database rows with partial scores and fragmentary evidence. They lack the structured
scientific output that would make them actionable to researchers, funders, or drug discovery
teams.

This task generates structured research briefs for the top 25 hypotheses (by composite_score)
in a format that demonstrates SciDEX's core value: synthesizing debates, evidence, and KG
connections into actionable scientific intelligence.

Output format per hypothesis

Each research brief is a structured analysis artifact (~1,500-2,500 words) covering:

  • Executive summary (3-4 sentences): the core claim and why it matters for neurodegeneration
  • Mechanistic model: detailed A→B→C causal chain supported by evidence
  • Evidence synthesis: structured summary of evidence_for and evidence_against with
    PMID citations evaluated for recency, quality, and consensus
  • Debate synthesis: summary of key points from all debate sessions for this hypothesis
  • Falsifiable predictions: 3-5 specific testable predictions with experimental designs
  • Therapeutic angles: druggable targets, existing compounds, estimated clinical relevance
  • Confidence assessment: 8-dimensional score explanation (mechanistic_plausibility,
    druggability, safety, data_availability, reproducibility, etc.)
  • Open questions: what remains unresolved (feeds back to knowledge_gaps)
  • Suggested next experiments: ranked by cost/feasibility/impact

What to do

  • Query top 25 hypotheses by composite_score (all must have composite_score ≥ 0.7)
  • For each hypothesis:
    a. Collect all evidence_for/evidence_against entries, look up PMIDs
    b. Find debate sessions linked to the hypothesis's analysis_id
    c. Query KG edges involving the hypothesis target_gene/pathway
    d. Search PubMed for recent evidence using paper_cache.search_papers()
    e. Generate the research brief via Claude API call with the collected context
    f. Store the brief as an analyses artifact with:
    - analysis_type = 'research_brief'
    - hypothesis_id in metadata
    - Full markdown content
    - Source citations list
  • Update hypotheses table with a research_brief_url or similar pointer
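
The selection and storage steps above can be sketched as follows. This is a minimal illustration, assuming hypothesis rows are already loaded as dictionaries with `id` and `composite_score` keys. The helper names `select_top_hypotheses` and `build_brief_record` are hypothetical, and the record layout mirrors the bullet list above rather than the actual analyses schema.

```python
from typing import Any

MIN_SCORE = 0.7   # eligibility threshold stated in this spec
TOP_N = 25        # cohort size stated in this spec


def select_top_hypotheses(rows: list[dict[str, Any]],
                          limit: int = TOP_N,
                          min_score: float = MIN_SCORE) -> list[dict[str, Any]]:
    """Return up to `limit` rows with composite_score >= min_score, best first."""
    eligible = [
        r for r in rows
        if r.get("composite_score") is not None
        and r["composite_score"] >= min_score
    ]
    return sorted(eligible, key=lambda r: r["composite_score"], reverse=True)[:limit]


def build_brief_record(hypothesis_id: str, markdown: str,
                       pmids: list[str]) -> dict[str, Any]:
    """Assemble an assumed (not actual-schema) payload for one stored brief."""
    return {
        "metadata": {
            "analysis_type": "research_brief",
            "hypothesis_id": hypothesis_id,
        },
        "content": markdown,               # full markdown content
        "citations": sorted(set(pmids)),   # de-duplicated source citation list
    }
```

Filtering before sorting keeps the score threshold a hard requirement, so the cohort may contain fewer than 25 entries rather than admit a hypothesis below 0.7.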

Why this is ambitious

    • Research briefs are SciDEX's primary deliverable to researchers — the "so what" of the platform
    • A collection of 25 high-quality research briefs would demonstrate SciDEX's scientific value
    proposition better than any system metric
    • Each brief closes the loop between: paper ingestion → debate → hypothesis → actionable output
    • Briefs feed back into the prediction markets (priced against experimental outcomes)

    Acceptance criteria

    ☐ 25 research briefs generated, each covering all 9 sections
    ☐ All PMIDs in briefs are real and verified
    ☐ Briefs stored as analysis artifacts in the DB
    ☐ Each brief references at least 2 debate sessions and 3 PubMed citations
    ☐ Hypothesis table updated with brief references
    ☐ Briefs accessible via the API (at minimum as /api/analyses/<id>)
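
    Several of these criteria can be checked mechanically before a brief is stored. The sketch below is one possible pre-storage check, not the project's actual validator: `check_brief` is a hypothetical helper, section detection is a simple case-insensitive substring match (an assumption about how headings appear in the markdown), and the 1,000-word floor comes from the "What NOT to do" list.

```python
import re

REQUIRED_SECTIONS = [
    "Executive summary", "Mechanistic model", "Evidence synthesis",
    "Debate synthesis", "Falsifiable predictions", "Therapeutic angles",
    "Confidence assessment", "Open questions", "Suggested next experiments",
]
MIN_WORDS = 1000            # hard floor from "What NOT to do"
MIN_DEBATE_SESSIONS = 2
MIN_CITATIONS = 3


def check_brief(markdown: str, debate_session_ids: list[str],
                pmids: list[str]) -> list[str]:
    """Return human-readable failures; an empty list means the brief passes."""
    failures = []
    lowered = markdown.lower()
    for section in REQUIRED_SECTIONS:
        # Assumption: each section title appears verbatim somewhere in the text.
        if section.lower() not in lowered:
            failures.append(f"missing section: {section}")
    word_count = len(re.findall(r"\S+", markdown))
    if word_count < MIN_WORDS:
        failures.append(f"too short: {word_count} words")
    if len(set(debate_session_ids)) < MIN_DEBATE_SESSIONS:
        failures.append("fewer than 2 debate sessions")
    if len(set(pmids)) < MIN_CITATIONS:
        failures.append("fewer than 3 PubMed citations")
    return failures
```

    Returning a list of failures rather than a boolean lets a batch run report every problem in a brief at once. This check does not establish that the PMIDs are real; that needs a lookup against the paper cache.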

    What NOT to do

    • Do NOT generate briefs with hallucinated citations
    • Do NOT create briefs shorter than 1,000 words (length indicates substance)
    • Do NOT include hypotheses with composite_score < 0.7
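
    One way to guard against hallucinated citations is to extract every PMID from the generated text and reject the brief if any of them cannot be confirmed. The sketch below assumes citations are written as "PMID: <digits>" or "PMID <digits>". The `is_known` callable is a stand-in for a lookup such as paper_cache.get_paper(), whose exact signature is not given in this spec, and `find_unverified_pmids` is a hypothetical helper name.

```python
import re
from typing import Callable

# Assumption: PMIDs are cited as "PMID: 123", "PMID:123", or "PMID 123".
PMID_PATTERN = re.compile(r"PMID:?\s*(\d{1,8})\b")


def find_unverified_pmids(markdown: str,
                          is_known: Callable[[str], bool]) -> list[str]:
    """Return cited PMIDs, in first-seen order, that the lookup does not confirm."""
    seen: list[str] = []
    for pmid in PMID_PATTERN.findall(markdown):
        if pmid not in seen:
            seen.append(pmid)
    return [p for p in seen if not is_known(p)]
```

    A brief would be stored only when this function returns an empty list. Passing the lookup as a parameter keeps the check usable against a local cache without network fetches.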

    Dependencies

    • paper_cache.search_papers() and paper_cache.get_paper() for citation lookup
    • scidex.core.database.get_db() for DB writes
    • Debate sessions in debate_sessions.transcript_json
    • KG edges for mechanistic context

    Work Log

    Created 2026-04-28 by task generator cycle 2

    77 hypotheses at composite_score > 0.8 exist as database rows without structured scientific
    output. Research briefs are the platform's core deliverable. Task generator cycle 2 identified
    this as the highest-leverage scientific-output gap.

    2026-04-28 iteration 1 plan — task 33dca458-3177-4621-ad53-0a5d07c885c5

    Staleness review found the task is still valid: PostgreSQL currently has hundreds of
    hypotheses above the 0.7 score threshold and no existing analyses whose metadata marks analysis_type=research_brief. This iteration will add a reusable batch generator and run
    it for an initial top-hypothesis cohort. Because the current hypotheses table has no
    dedicated brief-pointer column and migrations are protected for this task, each generated
    brief will be linked through the new analyses row metadata (hypothesis_id, analysis_type=research_brief) and through the governed artifacts row entity_ids
    field rather than changing schema.

    2026-04-28 iteration 1 result — task 33dca458-3177-4621-ad53-0a5d07c885c5

    Added scripts/generate_hypothesis_research_briefs.py and generated the first five
    structured research briefs for high-scoring hypotheses with at least two linked debate
    sessions. Each brief is 2,057-2,265 words, covers all nine required sections, includes
    13-17 real PMID identifiers, records citation-alignment risk for weak evidence-PMID
    matches, cites four linked debate sessions, and has a corresponding PostgreSQL analyses
    row with metadata.analysis_type=research_brief. The five analysis IDs are: SRB-2026-04-28-h-var-b7e4505525, SRB-2026-04-28-h-var-e2b5a7e7db, SRB-2026-04-28-h-var-e95d2d1d86, SRB-2026-04-28-h-bdbd2120, and SRB-2026-04-28-h-var-a4975bdd96.

    2026-04-28 continuation — task 33dca458-3177-4621-ad53-0a5d07c885c5

    Revalidated the five generated briefs and registered them into PostgreSQL as both analyses rows and artifacts rows with artifact_type='analysis', entity links,
    and /api/analyses/<id> report URLs. Updated the registrar so PMID validation uses
    the existing paper cache without network fetches, avoiding sandbox-side paper-cache
    mutations and failed submodule auto-commit attempts. Verification query found 5 research_brief analyses, minimum 2,076 words, 13-17 verified PMIDs, four linked
    debate sessions per brief, and all nine required sections in each committed brief.

    File: quest_agora_hypothesis_research_briefs_spec.md
    Modified: 2026-04-28 18:12
    Size: 6.2 KB