Implement a paper_review_workflow tool that: (1) ingests a paper by PMID/DOI, (2) extracts named entities (genes, proteins, diseases, pathways, phenotypes, brain regions, cell types, drugs) via LLM, (3) cross-references each entity against the SciDEX knowledge graph (KG) to find existing edges and match strength, (4) finds related hypotheses and knowledge gaps, (5) produces a structured review summary. Results are stored in the paper_reviews table.
The paper_reviews table already exists in PostgreSQL with the right schema:
- id, paper_id, pmid, doi — paper identification
- extracted_entities — JSON dict of entity type → list of names (from LLM extraction)
- kg_matches — JSON dict of entity → KG edge count (how well-connected each entity is)
- related_hypotheses — JSON list of {id, title, composite_score} for related hypotheses
- related_gaps — JSON list of {gap_id, title, priority_score} for related gaps
- novel_findings — JSON list of novel entity findings
- review_summary — human-readable review of the paper's contribution to SciDEX

Add a paper_review_workflow tool function in scidex/forge/tools.py:

@log_tool_call
def paper_review_workflow(identifier: str) -> dict:
"""
Run the full paper review pipeline:
1. Fetch paper metadata (via paper_cache.get_paper)
2. Extract entities via LLM from title+abstract
3. Cross-reference entities against KG (knowledge_edges table)
4. Find related hypotheses (by entity/gene match)
5. Find related knowledge gaps (by entity match)
6. Identify novel findings (entities with 0 KG edges)
7. Generate structured review summary via LLM
8. Write to paper_reviews table
Args:
identifier: PMID or DOI of the paper to review
Returns: dict with review_id, extracted_entities, kg_matches,
related_hypotheses, related_gaps, novel_findings, review_summary
    """

Entity extraction: Use llm.py::complete with a prompt that parses title+abstract and returns structured JSON of entities by type. Types: gene, protein, disease, pathway, phenotype, brain_region, cell_type, drug.
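The extraction step above could be sketched as follows. This is a hedged sketch, not the implementation: `llm.complete(prompt)` is assumed (per the spec) to return raw model text, and `parse_entities` defends against the model wrapping its JSON in prose or fences; the prompt wording is illustrative.

```python
# Sketch of step 2 (entity extraction). Entity types come from the spec;
# the prompt text and the JSON-repair logic are assumptions.
import json
import re

ENTITY_TYPES = ["gene", "protein", "disease", "pathway", "phenotype",
                "brain_region", "cell_type", "drug"]

def build_extraction_prompt(title: str, abstract: str) -> str:
    """Ask the model for a strict JSON dict of entity type -> list of names."""
    return (
        "Extract named entities from this paper. Return ONLY a JSON object "
        f"with these keys: {', '.join(ENTITY_TYPES)}. Each value is a list "
        "of entity names found in the text (empty list if none).\n\n"
        f"Title: {title}\nAbstract: {abstract}"
    )

def parse_entities(raw: str) -> dict:
    """Tolerate prose or markdown fences around the JSON payload."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    data = json.loads(match.group(0)) if match else {}
    # Keep only known types; coerce every value to a list of strings.
    return {t: [str(n) for n in data.get(t, [])] for t in ENTITY_TYPES}
```

The workflow would pass `build_extraction_prompt(...)` to `llm.complete` and feed the reply through `parse_entities`, so a chatty model response still yields a dict with all eight keys.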
KG cross-reference: For each extracted entity name, query knowledge_edges for count of edges where source_id or target_id matches the entity (case-insensitive). Also fetch sample edges for context.
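A minimal sketch of that cross-reference query. The `knowledge_edges` columns (`source_id`, `target_id`) and case-insensitive matching come from the spec; the DB-API cursor, the `%s` placeholders, a `relation` column, and the sample size are assumptions about the database layer.

```python
# Sketch of step 3 (KG cross-reference): per-entity edge count plus a few
# sample edges for context. Parameterized SQL avoids injection via entity names.
KG_COUNT_SQL = """
    SELECT COUNT(*) FROM knowledge_edges
    WHERE LOWER(source_id) = LOWER(%s) OR LOWER(target_id) = LOWER(%s)
"""

KG_SAMPLE_SQL = """
    SELECT source_id, relation, target_id FROM knowledge_edges
    WHERE LOWER(source_id) = LOWER(%s) OR LOWER(target_id) = LOWER(%s)
    LIMIT %s
"""

def cross_reference_kg(cursor, entities: dict, sample_size: int = 3) -> dict:
    """Map each extracted entity name to its KG edge count plus sample edges."""
    matches = {}
    for names in entities.values():
        for name in names:
            cursor.execute(KG_COUNT_SQL, (name, name))
            count = cursor.fetchone()[0]
            cursor.execute(KG_SAMPLE_SQL, (name, name, sample_size))
            matches[name] = {"edge_count": count,
                             "sample_edges": cursor.fetchall()}
    return matches
```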
Related hypotheses: For top entities (by KG edge count), search hypotheses table for matches in title or target_gene.
Related gaps: For top entities, search knowledge_gaps for title matches.
Novel findings: Entities with 0 KG edges are flagged as novel.
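Steps 4–6 reduce to ranking and filtering on the edge counts from step 3. A sketch under the assumption that each kg_matches entry is a dict carrying an `edge_count` key; the SQL constants are illustrative (table and column names from the spec, the ILIKE patterns and LIMITs are assumptions).

```python
# Sketch of steps 4-6: rank entities by KG connectivity, look up related
# hypotheses/gaps for the top ones, flag zero-edge entities as novel.
def top_entities(kg_matches: dict, n: int = 5) -> list:
    """Entities sorted by KG edge count, best-connected first (for steps 4-5)."""
    ranked = sorted(kg_matches.items(),
                    key=lambda kv: kv[1]["edge_count"], reverse=True)
    return [name for name, info in ranked[:n] if info["edge_count"] > 0]

def find_novel(kg_matches: dict) -> list:
    """Step 6: entities with zero KG edges are candidate novel findings."""
    return [name for name, info in kg_matches.items()
            if info["edge_count"] == 0]

# Illustrative lookups; bind each %s to an f"%{name}%" pattern.
RELATED_HYPOTHESES_SQL = """
    SELECT id, title, composite_score FROM hypotheses
    WHERE title ILIKE %s OR target_gene ILIKE %s
    LIMIT 10
"""
RELATED_GAPS_SQL = """
    SELECT gap_id, title, priority_score FROM knowledge_gaps
    WHERE title ILIKE %s
    LIMIT 10
"""
```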
Review summary: LLM call that synthesizes the above into a structured review of the paper's contribution.
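The synthesis prompt for step 7 might look like the following; the wording and the exact fields fed to the model are assumptions, and the result would be passed to `llm.complete` per the spec.

```python
# Illustrative step-7 prompt builder: feeds the pipeline outputs back to the
# LLM so the summary is grounded in the KG cross-reference results.
def build_summary_prompt(title: str, kg_matches: dict, novel: list,
                         hypotheses: list, gaps: list) -> str:
    ranked = sorted(kg_matches, key=lambda n: -kg_matches[n]["edge_count"])
    hyp_titles = [h["title"] for h in hypotheses]
    gap_titles = [g["title"] for g in gaps]
    return (
        "Write a structured review of this paper's contribution to SciDEX.\n"
        f"Paper: {title}\n"
        f"KG-connected entities (best first): {ranked}\n"
        f"Novel entities (no KG edges): {novel}\n"
        f"Related hypotheses: {hyp_titles}\n"
        f"Related knowledge gaps: {gap_titles}\n"
        "Cover: novelty, fit with existing hypotheses, and which gaps it addresses."
    )
```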
DB write: Use db_writes.py helper or direct INSERT into paper_reviews.
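For the direct-INSERT path, step 8 could be sketched as below. Column names follow the schema above; the DB-API cursor, `%s` placeholders, and `RETURNING id` are assumptions about the PostgreSQL driver (psycopg-style), and the JSON columns are serialized with `json.dumps`.

```python
# Sketch of step 8 (DB write): one parameterized INSERT, returning the new
# review_id so the tool can include it in its result dict.
import json

INSERT_REVIEW_SQL = """
    INSERT INTO paper_reviews
        (paper_id, pmid, doi, extracted_entities, kg_matches,
         related_hypotheses, related_gaps, novel_findings, review_summary)
    VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)
    RETURNING id
"""

def write_review(cursor, result: dict) -> int:
    """Persist one pipeline result; JSON fields are serialized explicitly."""
    cursor.execute(INSERT_REVIEW_SQL, (
        result["paper_id"], result["pmid"], result["doi"],
        json.dumps(result["extracted_entities"]),
        json.dumps(result["kg_matches"]),
        json.dumps(result["related_hypotheses"]),
        json.dumps(result["related_gaps"]),
        json.dumps(result["novel_findings"]),
        result["review_summary"],
    ))
    return cursor.fetchone()[0]
```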
API endpoints:
- POST /api/papers/{pmid}/review — trigger the workflow for a paper by PMID. Returns the full result dict including review_id.
- GET /api/papers/{pmid}/review — get the existing review for a paper.
Acceptance criteria:
- paper_review_workflow("31883511") returns a structured dict with all 6 fields
- paper_reviews table with all fields populated
- POST /api/papers/{pmid}/review endpoint exists and works
- GET /api/papers/{pmid}/review endpoint returns the stored review

Reuse existing infrastructure:
- TOOL_NAME_MAPPING — register the new tool
- paper_cache.get_paper() for paper fetching
- llm.complete() for LLM calls
- database.get_db() for DB access
- paper_reviews table schema
- add `from llm import complete` to tools.py imports

{
"requirements": {
"coding": 8,
"reasoning": 8
},
"completion_shas": [
"09f8582ba6daec83ca419178572295ba713f2d86",
"1745a983df2a87a7cd8ef17ef24978b9148fbdf3"
],
"completion_shas_checked_at": "2026-04-14T04:49:21.420936+00:00"
}