[Docs] Paper review workflow — ingest paper, cross-reference against KG, produce structured review (status: done; coding: 8, reasoning: 8)

## REOPENED TASK — CRITICAL CONTEXT

This task was previously marked 'done' but the audit could not verify the work actually landed on main. The original work may have been:

  • Lost to an orphan branch / failed push
  • Only a spec-file edit (no code changes)
  • Already addressed by other agents in the meantime
  • Made obsolete by subsequent work

**Before doing anything else:**

1. **Re-evaluate the task in light of CURRENT main state.** Read the spec and the relevant files on origin/main NOW. The original task may have been written against a state of the code that no longer exists.
2. **Verify the task still advances SciDEX's aims.** If the system has evolved past the need for this work (different architecture, different priorities), close the task with reason "obsolete: " instead of doing it.
3. **Check if it's already done.** Run `git log --grep=''` and read the related commits. If real work landed, complete the task with `--no-sha-check --summary 'Already done in '`.
4. **Make sure your changes don't regress recent functionality.** Many agents have been working on this codebase. Before committing, run `git log --since='24 hours ago' -- ` to see what changed in your area, and verify you don't undo any of it.
5. **Stay scoped.** Only do what this specific task asks for. Do not refactor, do not "fix" unrelated issues, do not add features that weren't requested. Scope creep at this point is regression risk.

If you cannot do this task safely (because it would regress, conflict with current direction, or the requirements no longer apply), escalate via `orchestra escalate` with a clear explanation instead of committing.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (2)

[Docs] Update sci-doc-15 work log: implementation complete [task:4df875b8-817f-42ca-add6-1b80e45edf51] (2026-04-13)
[Docs] Paper review workflow: ingest paper, cross-reference KG, produce structured review [task:4df875b8-817f-42ca-add6-1b80e45edf51] (2026-04-13)
Spec File

Goal

Implement a paper_review_workflow tool that: (1) ingests a paper by PMID/DOI, (2) extracts named entities (genes, proteins, diseases, pathways, phenotypes, brain regions, cell types, drugs) via LLM, (3) cross-references each entity against the SciDEX knowledge graph (KG) to find existing edges and match strength, (4) finds related hypotheses and knowledge gaps, (5) produces a structured review summary. Results are stored in the paper_reviews table.

Context

The paper_reviews table already exists in PostgreSQL with the right schema:

  • id, paper_id, pmid, doi — paper identification
  • extracted_entities — JSON dict of entity type → list of names (from LLM extraction)
  • kg_matches — JSON dict of entity → KG edge count (how well-connected each entity is)
  • related_hypotheses — JSON list of {id, title, composite_score} for related hypotheses
  • related_gaps — JSON list of {gap_id, title, priority_score} for related gaps
  • novel_findings — JSON list of novel entity findings
  • review_summary — human-readable review of paper's contribution to SciDEX
Problem: The table exists but there is no write API. The 3 existing rows show "Review summary generation failed." — meaning a prior attempt existed but the LLM step failed silently.

Approach

Step 1 — paper_review_workflow tool function in scidex/forge/tools.py

@log_tool_call
def paper_review_workflow(identifier: str) -> dict:
    """
    Run the full paper review pipeline:
    1. Fetch paper metadata (via paper_cache.get_paper)
    2. Extract entities via LLM from title+abstract
    3. Cross-reference entities against KG (knowledge_edges table)
    4. Find related hypotheses (by entity/gene match)
    5. Find related knowledge gaps (by entity match)
    6. Identify novel findings (entities with 0 KG edges)
    7. Generate structured review summary via LLM
    8. Write to paper_reviews table

    Args:
        identifier: PMID or DOI of the paper to review

    Returns: dict with review_id, extracted_entities, kg_matches,
             related_hypotheses, related_gaps, novel_findings, review_summary
    """

Entity extraction: Use llm.py::complete with a prompt that parses title+abstract and returns structured JSON of entities by type. Types: gene, protein, disease, pathway, phenotype, brain_region, cell_type, drug.
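The fragile part of this step is parsing the LLM reply, which may come back wrapped in a markdown code fence or missing some entity types. A defensive parsing sketch (prompt wording and function names are illustrative):

```python
import json

ENTITY_TYPES = ["gene", "protein", "disease", "pathway",
                "phenotype", "brain_region", "cell_type", "drug"]

EXTRACTION_PROMPT = """Extract named entities from this paper.
Return ONLY a JSON object mapping each of these types to a list of
names (empty list if none): {types}

Title: {title}
Abstract: {abstract}"""

def parse_entity_response(raw: str) -> dict:
    """Parse the LLM reply into {entity_type: [names]}, tolerating
    markdown code fences and guaranteeing all 8 types are present."""
    text = raw.strip()
    if text.startswith("```"):
        # strip surrounding backticks and an optional "json" language tag
        text = text.strip("`").removeprefix("json").strip()
    data = json.loads(text)
    return {t: list(data.get(t, [])) for t in ENTITY_TYPES}
```

Normalizing to all eight keys up front keeps the downstream KG cross-reference and DB write from having to handle missing types.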

KG cross-reference: For each extracted entity name, query knowledge_edges for count of edges where source_id or target_id matches the entity (case-insensitive). Also fetch sample edges for context.

Related hypotheses: For top entities (by KG edge count), search hypotheses table for matches in title or target_gene.

Related gaps: For top entities, search knowledge_gaps for title matches.

Novel findings: Entities with 0 KG edges are flagged as novel.
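The "top entities" ranking and the novelty flag are both pure operations on the kg_matches dict. A minimal sketch (function names and the limit of 5 are illustrative choices, not spec requirements):

```python
def top_entities(kg_matches: dict, limit: int = 5) -> list:
    """Rank entities by KG edge count, descending; the best-connected
    entities drive the hypothesis and gap lookups. Zero-edge entities
    are excluded here because they are handled as novel findings."""
    ranked = sorted(kg_matches.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, count in ranked[:limit] if count > 0]

def novel_findings(kg_matches: dict) -> list:
    """Entities with no existing KG edges are flagged as novel."""
    return [name for name, count in kg_matches.items() if count == 0]
```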

Review summary: LLM call that synthesizes the above into a structured review of the paper's contribution.
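One way to assemble that synthesis prompt from the earlier pipeline outputs; the wording and field names below are a sketch, not the prompt the implementation must use:

```python
def build_review_prompt(paper: dict, kg_matches: dict, novel: list) -> str:
    """Assemble the review-summary prompt from paper metadata,
    KG connectivity, and the novel-entity list."""
    known = [f"{name} ({count} KG edges)"
             for name, count in kg_matches.items() if count > 0]
    return (
        "Write a structured review of this paper's contribution to the "
        "SciDEX knowledge graph.\n"
        f"Title: {paper['title']}\n"
        f"Abstract: {paper['abstract']}\n"
        f"Known entities: {', '.join(known) or 'none'}\n"
        f"Novel entities (no KG edges yet): {', '.join(novel) or 'none'}\n"
        "Cover: novelty, fit with existing edges, and suggested follow-ups."
    )
```

Passing the edge counts into the prompt lets the LLM distinguish well-established entities from thinly connected ones, which is the substance of the review.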

DB write: Use db_writes.py helper or direct INSERT into paper_reviews.

Step 2 — API endpoint

POST /api/papers/{pmid}/review — trigger workflow for a paper by PMID. Returns the full result dict including review_id.

GET /api/papers/{pmid}/review — get existing review for a paper.

Step 3 — Error handling

  • If paper not found: raise 404
  • If entity extraction fails: return partial results with review_summary = "Entity extraction failed"
  • If DB write fails: raise 500 with error detail
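The status-code mapping above can be kept framework-agnostic so it is testable without spinning up the API. A sketch where handlers return `(status, body)` and dependencies are injected; the exception types are stand-ins, not the actual error classes SciDEX raises:

```python
def get_review_response(pmid: str, load_review) -> tuple:
    """GET handler logic: 404 if no stored review, else 200 + review."""
    review = load_review(pmid)
    if review is None:
        return 404, {"detail": f"no review for PMID {pmid}"}
    return 200, review

def post_review_response(pmid: str, run_workflow) -> tuple:
    """POST handler logic mapping workflow failures to HTTP codes:
    paper not found -> 404, DB write failure -> 500, otherwise 200."""
    try:
        result = run_workflow(pmid)
    except KeyError:             # stand-in for "paper not found"
        return 404, {"detail": f"paper {pmid} not found"}
    except RuntimeError as exc:  # stand-in for a DB write failure
        return 500, {"detail": str(exc)}
    return 200, result
```

The partial-results case (entity extraction failed) intentionally returns 200: the workflow still writes a row, with review_summary set to "Entity extraction failed".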

Acceptance Criteria

paper_review_workflow("31883511") returns structured dict with all 6 fields
☑ Entities extracted from title+abstract (gene, protein, disease, pathway, phenotype, brain_region, cell_type, drug)
☑ KG cross-reference shows edge count per entity (0 = novel)
☑ Related hypotheses found (by entity/gene match)
☑ Related gaps found (by entity match)
☑ Novel findings flagged (entities with 0 KG edges)
☑ Review summary generated by LLM
☑ Result written to paper_reviews table with all fields populated
POST /api/papers/{pmid}/review endpoint exists and works
GET /api/papers/{pmid}/review endpoint returns stored review
☑ Tool registered in TOOL_NAME_MAPPING
☑ Test with a known PMID, verify paper_reviews row created

Dependencies

  • paper_cache.get_paper() for paper fetching
  • llm.complete() for LLM calls
  • database.get_db() for DB access
  • Existing paper_reviews table schema

Dependents

  • Quest task for paper review enrichment will batch-process papers via this tool
  • Wiki entity enrichment will use extracted entities for cross-linking

Work Log

2026-04-14 04:25 PT — Slot minimax:56

  • Investigated: paper_reviews table exists in PostgreSQL with correct schema
  • Found 3 existing rows (all show "Review summary generation failed." — prior LLM-based attempt that silently failed)
  • Confirmed: no write API exists for paper_reviews in api.py
  • Confirmed: no paper_review_workflow tool in forge/tools.py
  • Task is NOT done — needs implementation

2026-04-14 05:05 PT — Slot minimax:56

  • Created sci-doc-15-REVIEW_paper_review_workflow_spec.md with full spec
  • Implemented paper_review_workflow() in scidex/forge/tools.py:
      - Step 1: Fetch paper via paper_cache.get_paper
      - Step 2: LLM entity extraction (8 entity types)
      - Step 3: KG cross-reference via knowledge_edges count query
      - Step 4: Related hypotheses lookup (by entity/gene match)
      - Step 5: Related gaps lookup (by entity match in title/description)
      - Step 6: Novel findings flag (entities with 0 KG edges)
      - Step 7: LLM review summary generation
      - Step 8: DB write to paper_reviews table
  • Added POST/GET /api/papers/{pmid}/review endpoints to api.py
  • Registered paper_review_workflow in TOOL_NAME_MAPPING
  • Added `from llm import complete` to tools.py imports
  • Committed and pushed (7d6fd8844)
  • Status: done

Payload JSON
{
  "requirements": {
    "coding": 8,
    "reasoning": 8
  },
  "completion_shas": [
    "09f8582ba6daec83ca419178572295ba713f2d86",
    "1745a983df2a87a7cd8ef17ef24978b9148fbdf3"
  ],
  "completion_shas_checked_at": "2026-04-14T04:49:21.420936+00:00"
}