Of SciDEX's 1,878 hypotheses, 77 have composite_score ≥ 0.8, but these ideas exist only as
database rows with partial scores and fragmentary evidence. They lack the structured
scientific output that would make them actionable to researchers, funders, or drug
discovery teams.
This task generates structured research briefs for the top 25 hypotheses (by composite_score)
in a format that demonstrates SciDEX's core value: synthesizing debates, evidence, and KG
connections into actionable scientific intelligence.
Each research brief is a structured analysis artifact (~1,500-2,500 words) covering:
- evidence_for and evidence_against with paper_cache.search_papers()
- analyses artifact with: analysis_type = 'research_brief'; hypothesis_id in metadata
- hypotheses table with a research_brief_url or similar pointer
- /api/analyses/<id>)
- paper_cache.search_papers() and paper_cache.get_paper() for citation lookup
- scidex.core.database.get_db() for DB writes
- debate_sessions.transcript_json
77 hypotheses at composite_score > 0.8 exist as database rows without structured scientific
output. Research briefs are the platform's core deliverable. Task generator cycle 2 identified
this as the highest-leverage scientific-output gap.
A staleness review found the task still valid: PostgreSQL currently has hundreds of
hypotheses above the 0.7 score threshold and no analyses whose metadata sets
analysis_type=research_brief. This iteration will add a reusable batch generator and run
it for an initial cohort of top-scoring hypotheses. Because the hypotheses table has no
dedicated brief-pointer column and migrations are protected for this task, each generated
brief will be linked to its hypothesis through the metadata of its new analyses row
(hypothesis_id, analysis_type=research_brief) and through the entity_ids field of its
governed artifacts row, rather than through a schema change.
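The linking scheme described above can be sketched with plain Python dictionaries. This is a minimal illustration under assumptions, not the registrar's code: only the field names hypothesis_id, analysis_type, entity_ids, and artifact_type='analysis' and the /api/analyses/<id> URL shape come from these notes, while the helper names, the report_url key, and the overall row shapes are hypothetical. The second helper shows how a brief can be found from its hypothesis without a pointer column on the hypotheses table.

```python
from typing import Any


# Hypothetical row shapes: only hypothesis_id, analysis_type, entity_ids and
# artifact_type='analysis' are named in the task notes.
def build_brief_links(
    analysis_id: str, hypothesis_id: str
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (analyses_row, artifacts_row) dicts linking one brief to its hypothesis."""
    analyses_row = {
        "id": analysis_id,
        "metadata": {
            "analysis_type": "research_brief",
            "hypothesis_id": hypothesis_id,
        },
    }
    artifacts_row = {
        "artifact_type": "analysis",
        "entity_ids": [hypothesis_id],
        # Hypothetical key name for the report URL.
        "report_url": f"/api/analyses/{analysis_id}",
    }
    return analyses_row, artifacts_row


def briefs_for_hypothesis(
    analyses_rows: list[dict[str, Any]], hypothesis_id: str
) -> list[str]:
    """Find brief IDs for a hypothesis by reading metadata instead of a pointer column."""
    return [
        row["id"]
        for row in analyses_rows
        if row.get("metadata", {}).get("analysis_type") == "research_brief"
        and row.get("metadata", {}).get("hypothesis_id") == hypothesis_id
    ]
```

Keeping the link in metadata means the lookup is a filter over analyses rows, so no migration is needed; the trade-off is that the link is not enforced by a foreign key.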
Added scripts/generate_hypothesis_research_briefs.py and generated the first five
structured research briefs for high-scoring hypotheses with at least two linked debate
sessions. Each brief is 2,057-2,265 words, covers all nine required sections, includes
13-17 real PMIDs, records citation-alignment risk for weak evidence-PMID
matches, cites four linked debate sessions, and has a corresponding PostgreSQL analyses
row with metadata.analysis_type=research_brief. The five analysis IDs are:
SRB-2026-04-28-h-var-b7e4505525, SRB-2026-04-28-h-var-e2b5a7e7db,
SRB-2026-04-28-h-var-e95d2d1d86, SRB-2026-04-28-h-bdbd2120, and
SRB-2026-04-28-h-var-a4975bdd96.
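Cohort selection and ID construction can be sketched as follows, assuming the generator filters to hypotheses with at least two linked debate sessions and then ranks by composite_score. The debate_counts mapping, the function names, and the tie-breaking rule are hypothetical, and the SRB-<date>-<hypothesis id> pattern is inferred from the five IDs above rather than documented.

```python
from datetime import date
from typing import Any


def select_cohort(
    hypotheses: list[dict[str, Any]],
    debate_counts: dict[str, int],
    top_n: int = 25,
    min_debates: int = 2,
) -> list[dict[str, Any]]:
    """Top-N hypotheses by composite_score among those with enough linked debates."""
    eligible = [h for h in hypotheses if debate_counts.get(h["id"], 0) >= min_debates]
    # Highest score first; ties broken by id so repeated runs pick the same cohort.
    eligible.sort(key=lambda h: (-h["composite_score"], h["id"]))
    return eligible[:top_n]


def brief_id(hypothesis_id: str, run_date: date) -> str:
    """Build an ID shaped like 'SRB-2026-04-28-h-bdbd2120' (pattern inferred, not documented)."""
    return f"SRB-{run_date.isoformat()}-{hypothesis_id}"
```

Filtering before ranking means a very high-scoring hypothesis with only one debate session is skipped rather than producing a brief with thin debate coverage.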
Revalidated the five generated briefs and registered them in PostgreSQL as both
analyses rows and artifacts rows with artifact_type='analysis', entity links,
and /api/analyses/<id> report URLs. Updated the registrar so PMID validation reads
the existing paper cache without network fetches, avoiding sandbox-side paper-cache
mutations and failed submodule auto-commit attempts. A verification query found 5
research_brief analyses with a minimum of 2,076 words, 13-17 verified PMIDs, four linked
debate sessions per brief, and all nine required sections in each committed brief.
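The offline checks can be sketched as one validator that never touches the network: it counts words, looks for required section headings, and compares cited PMIDs with a set of PMIDs already present in the local paper cache. This is an illustration under assumptions: citations are taken to appear as "PMID: <digits>", section names are passed in because the nine sections are not listed here, the 1,500-2,500 word bounds come from the brief spec, and how the cached PMID set is loaded from paper_cache is deliberately left out rather than guessed.

```python
import re

# Assumed citation format in brief text, e.g. "PMID: 12345678".
PMID_PATTERN = re.compile(r"PMID:\s*(\d{1,8})")


def validate_brief(
    text: str,
    required_sections: list[str],
    cached_pmids: set[str],
    min_words: int = 1500,
    max_words: int = 2500,
) -> list[str]:
    """Return a list of problems; an empty list means the brief passed these checks."""
    problems = []
    words = len(text.split())
    if not min_words <= words <= max_words:
        problems.append(f"word count {words} outside {min_words}-{max_words}")
    # Simple substring check for section headings (hypothetical heuristic).
    missing = [s for s in required_sections if s not in text]
    if missing:
        problems.append(f"missing sections: {missing}")
    # Compare against the local cache only; no network lookups.
    cited = set(PMID_PATTERN.findall(text))
    unverified = sorted(cited - cached_pmids)
    if unverified:
        problems.append(f"PMIDs not in local cache: {unverified}")
    return problems
```

Returning a list of problems instead of raising lets a batch run report every failing brief in one pass.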