- search_pubmed() (15x duplicates) → scidex/agora/pubmed_utils.py (EXISTS)
- extract_edges_from_abstract() (12x) → scidex/agora/kg_extraction_utils.py (NEED TO CREATE)
- classify_relation() (9x) → scidex/agora/kg_extraction_utils.py
- find_entities_in_text() (8x) → scidex/agora/kg_extraction_utils.py
- generate_mermaid() (7x) → scidex/atlas/mermaid_utils.py (NEED TO CREATE)

- scidex/agora/kg_extraction_utils.py created with consolidated extract_edges_from_abstract, classify_relation, find_entities_in_text
- scidex/atlas/mermaid_utils.py created with consolidated generate_mermaid
- enrichment/*.py files updated to import from scidex.agora.pubmed_utils
- kg_expansion/*.py files updated to import from scidex.agora.kg_extraction_utils
- scripts/ updated to use consolidated modules
- scripts/archive/ NOT modified (preserve historical state)
- python3 -c "from scidex.agora import pubmed_utils, kg_extraction_utils; from scidex.atlas import mermaid_utils"

1. Review scidex/agora/pubmed_utils.py to understand patterns
2. Create kg_extraction_utils.py with the most complete implementations + constants
3. Create mermaid_utils.py with consolidated generate_mermaid
4. Update enrichment/enrich_kg_abstracts.py to use new modules (first mover)
5. Update the remaining enrichment/*.py files to import instead of define
6. Update kg_expansion/*.py files similarly

- scidex/agora/pubmed_utils.py (already exists on main)

Created consolidated modules and updated first caller:
- scidex/agora/kg_extraction_utils.py (350+ lines)
- scidex/atlas/mermaid_utils.py (200+ lines)
- enrichment/enrich_kg_abstracts.py

Remaining work (not done due to scope):
Consolidated 2 more kg_expansion files:
- kg_expansion/expand_kg_pubmed.py
  - scidex.agora.kg_extraction_utils
  - extract_edges_from_abstract with thin wrapper to extract_edges_from_abstract_dict()
- kg_expansion/expand_kg_batch2.py
  - scidex.agora.kg_extraction_utils
  - extract_edges_from_abstract with thin wrapper to extract_edges_from_abstract_dict()

Why mermaid files not updated: The mermaid generate_mermaid implementations have incompatible signatures and different behaviors (different SQL queries, node ID schemes, styling approaches). Direct replacement would change output format and break callers. These need case-by-case review.
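
The thin-wrapper approach used for the two kg_expansion files can be sketched roughly as follows. This is an illustration only: the real signature and return shape of extract_edges_from_abstract_dict() are not shown in this log, so the stand-in below, its `paper` dict argument, the `focus_entity` keyword, the edge fields, and the name extract_edges_from_abstract_focused are all assumptions. The two legacy parameter lists are taken from the signatures recorded in this log, on the assumption that they belong to the extract_edges_from_abstract variants.

```python
from typing import Any, Dict, List, Optional


def extract_edges_from_abstract_dict(
    client: Any,
    paper: Dict[str, str],
    focus_entity: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Stand-in for the consolidated function (hypothetical signature).

    In the real codebase this would be imported from
    scidex.agora.kg_extraction_utils instead of being defined here.
    The body is placeholder logic so the wrappers below can be exercised.
    """
    source = focus_entity or paper["pmid"]
    return [
        {
            "source": source,
            "relation": "mentions",
            "target": paper["title"],
            "pmid": paper["pmid"],
        }
    ]


def extract_edges_from_abstract(client, pmid, title, abstract, mesh_terms=""):
    """Thin wrapper keeping the legacy positional signature for existing callers."""
    paper = {
        "pmid": pmid,
        "title": title,
        "abstract": abstract,
        "mesh_terms": mesh_terms,
    }
    return extract_edges_from_abstract_dict(client, paper)


def extract_edges_from_abstract_focused(
    client, pmid, title, abstract, focus_entity, mesh_terms=""
):
    """Thin wrapper for the legacy variant that also takes focus_entity."""
    paper = {
        "pmid": pmid,
        "title": title,
        "abstract": abstract,
        "mesh_terms": mesh_terms,
    }
    return extract_edges_from_abstract_dict(client, paper, focus_entity=focus_entity)
```

Keeping the wrappers this small means existing call sites do not change, while the extraction logic lives in one place. The same pattern is harder to apply to generate_mermaid because, as noted above, those implementations differ in behavior and not only in signature.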
Remaining duplicates (require architectural review to consolidate safely):
- generate_mermaid has 7+ files with different signatures
- (client, pmid, title, abstract, mesh_terms="")
- (client, pmid, title, abstract, focus_entity, mesh_terms="")
- python3 -c "from kg_expansion.expand_kg_pubmed import extract_edges_from_abstract; from kg_expansion.expand_kg_batch2 import extract_edges_from_abstract as b2; print('OK')" passes

Resolved merge failure by cleaning up branch:
- python3 -c "from scidex.agora import kg_extraction_utils, pubmed_utils; from scidex.atlas import mermaid_utils; print('All imports OK')" passes

{
"_reset_note": "This task was reset after a database incident on 2026-04-17.\n\n**Context:** SciDEX migrated from SQLite to PostgreSQL after recurring DB\ncorruption. Some work done during Apr 16-17 may have been lost.\n\n**Before starting work:**\n1. Check if the task's goal is ALREADY satisfied (run the relevant checks)\n2. Check `git log --all --grep=task:YOUR_TASK_ID` for prior commits\n3. If complete, verify and mark done. If partial, continue. If not done, proceed.\n\n**DB change:** SciDEX now uses PostgreSQL. `get_db()` auto-detects via\nSCIDEX_DB_BACKEND=postgres env var.",
"_reset_at": "2026-04-18T06:29:22.046013+00:00",
"_reset_from_status": "done"
}
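
Step 1 of the reset note (run the relevant checks) could be scripted along these lines, bundling the import checks quoted in the work log into one pass. This is a minimal sketch: the helper names check_imports and main and the MODULES list layout are hypothetical, and the module names are simply the ones the log's python3 -c commands import.

```python
import importlib
from typing import Dict, Iterable

# Modules named in the work log's import checks.
MODULES = [
    "scidex.agora.pubmed_utils",
    "scidex.agora.kg_extraction_utils",
    "scidex.atlas.mermaid_utils",
    "kg_expansion.expand_kg_pubmed",
    "kg_expansion.expand_kg_batch2",
]


def check_imports(module_names: Iterable[str]) -> Dict[str, str]:
    """Try to import each module; return {module_name: error} for failures."""
    failures: Dict[str, str] = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # report any import-time error, not only ImportError
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def main() -> int:
    """Print a summary and return a shell-style exit code (0 means all imports OK)."""
    failures = check_imports(MODULES)
    for name, error in failures.items():
        print(f"FAILED {name}: {error}")
    if not failures:
        print("All imports OK")
    return 1 if failures else 0
```

Run from the repository root, for example by calling sys.exit(main()) in a small script, so that the scidex and kg_expansion packages are importable.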