Published papers concentrate their open questions in Discussion, Limitations,
and Future Work sections. SciDEX has 15,000+ papers in the papers table and
PMC full text in paper_cache. Most papers are never re-read after ingestion.
Extract their open questions into first-class open_question artifacts so
literature-grounded questions land in the per-field leaderboards alongside
internally-mined ones, with full PMID provenance.
- scidex/agora/open_question_miner_papers.py (≤600 LoC).
- papers joined to paper_cache.full_text_xml (PMC OA
- papers.abstract.
- <sec sec-type="discussion">, <sec sec-type="conclusions">, and headings matching (?i)^(discussion|future (work|directions)|limitations|outlook|open questions)
- {question_text, field_tag, evidence_summary, page_anchor, verbatim_excerpt, tractability_score, potential_impact_score}.
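
As a rough illustration of the section detection described above, the sketch below picks out candidate sections from JATS XML by sec-type attribute or by heading regex. It uses the standard-library xml.etree.ElementTree instead of lxml so it runs without extra dependencies. The names extract_candidate_sections and SEC_TYPES and the sample XML are hypothetical, not taken from open_question_miner_papers.py.

```python
import re
import xml.etree.ElementTree as ET

# Heading pattern quoted from the spec above.
HEADING_RE = re.compile(
    r"(?i)^(discussion|future (work|directions)|limitations|outlook|open questions)"
)
# sec-type values named in the spec; the set name is an assumption.
SEC_TYPES = {"discussion", "conclusions"}


def extract_candidate_sections(jats_xml: str) -> list[tuple[str, str]]:
    """Return (title, text) pairs for sections likely to hold open questions."""
    root = ET.fromstring(jats_xml)
    found = []
    for sec in root.iter("sec"):
        title = (sec.findtext("title") or "").strip()
        if sec.get("sec-type") in SEC_TYPES or HEADING_RE.match(title):
            # Collapse whitespace inside each paragraph, one paragraph per line.
            # Note: nested matching sections would repeat their paragraphs.
            paragraphs = [
                " ".join("".join(p.itertext()).split()) for p in sec.iter("p")
            ]
            found.append((title, "\n".join(paragraphs)))
    return found


SAMPLE = """<article><body>
<sec sec-type="methods"><title>Methods</title><p>We measured X.</p></sec>
<sec sec-type="discussion"><title>Discussion</title>
  <p>It remains unclear whether X drives Y.</p></sec>
<sec><title>Limitations and outlook</title><p>Our cohort was small.</p></sec>
</body></article>"""

sections = extract_candidate_sections(SAMPLE)
for title, text in sections:
    print(title, "->", text)
```

In this sample the Methods section is skipped, while the Discussion section matches by sec-type and the untyped section matches by its heading. The text of each matched section would then be passed to the extraction step that produces the JSON fields listed above.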
- open_question:
  - metadata.source_kind='paper'
  - metadata.source_id=<pmid>, metadata.source_doi, metadata.source_paper_id
  - artifact_links row link_type='derived_from' pointing to the paper
  - metadata.evidence_summary includes the verbatim excerpt + PMID for _render_open_question_detail at line ~26912).
- SELECT pmid FROM papers ORDER BY cited_by_count DESC NULLS LAST LIMIT 500),
- scidex.exchange.cost_ledger.
- open_question artifacts via the shared _question_dedup.py SimHash util; expected dedup-rate ≥30% (papers
- data/scidex-artifacts/reports/openq_papers_<utc>.json
- paper_cache schema in scidex/atlas/ to find the full_text_xml
- lxml for JATS parsing (already a dep via biopython); regex fallback
- _question_dedup.py from q-openq-mine-from-wiki-pages.
- scidex-openq-papers.timer to incrementally
- papers.created_at > last_high_water).
- q-openq-mine-from-wiki-pages: dedup util
- b2d85e76-51f3: open_question schema
- scidex/agora/open_question_miner_papers.py (779 LoC)
- <sec sec-type="discussion">, <sec sec-type="conclusions">, limitations, future-work, outlook)
- scidex.core.llm.complete, JSON mode, falls back to defaults
- artifact_registry.register_artifact with source_kind='paper'
- derived_from artifact_links when paper artifact exists
- data/scidex-artifacts/reports/openq_papers_<utc>.json
- tests/agora/test_open_question_miner_papers.py (326 LoC, 22 passing tests)
- ) in character class)
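
The SimHash-based deduplication mentioned above can be sketched as follows. This is a minimal stand-alone illustration of the general technique, not the actual API of _question_dedup.py, which is not shown here. The function names, the MD5 token hash, and the max_distance threshold of 3 bits are all assumptions.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """Compute a SimHash fingerprint over lowercase word tokens (bits <= 64)."""
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit token hash
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def is_duplicate(candidate: str, existing: list[str], max_distance: int = 3) -> bool:
    """True if the candidate is within max_distance bits of any existing question."""
    fp = simhash(candidate)
    return any(hamming(fp, simhash(e)) <= max_distance for e in existing)


known = ["Does X drive Y in aged neurons?"]
print(is_duplicate("does  x drive y in aged neurons", known))
```

Questions that differ only in case, punctuation, or spacing produce the same token set and therefore the same fingerprint. How aggressively paraphrases are merged depends on the chosen threshold, which would need tuning against the expected dedup rate.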