Extract structured scientific claims from papers that currently have no claim extraction output. Claims should include source provenance and support downstream evidence linking, hypothesis evaluation, and search.
claims_extracted is marked only after real extraction or a documented skipCOALESCE(claims_extracted, 0) = 0.dd0487d3-38a - Forge questclaims_extracted=0, 1,786 papers with claims_extracted>0, 18,184 paper_claims rows, 3,825 KG edges with edge_type='claim_extraction', and 3,047 evidence_entries with methodology='claim_extraction'.scripts/extract_paper_claims.py as the durable extraction path and fixed provider failures to leave queue items unmarked; current script includes that behavior.subject/object/confidence completeness, duplicate structured tuples, and before/after movement in paper_claims, papers.claims_extracted, knowledge_edges, and claim-extraction evidence links.scripts/extract_paper_claims.py --limit 30 using the existing neuro-first provenance-backed queue.paper_claims rows reported and verified. All 27 successful papers were neuro-relevant.23810450, 22179316) returned no claim-worthy structured statements and were correctly marked claims_extracted=-1.30171180) had only a 20-character abstract and returned status='no_abstract'; it remains claims_extracted=0 rather than being falsely marked complete.paper_claims total: 18,184 → 18,348 (+164)claims_extracted > 0: 1,786 → 1,813 (+27)claims_extracted=0: 27,527 → 27,498 (−29)knowledge_edges with edge_type='claim_extraction': 3,825 → 3,868 (+43 live; extractor reported 45 insert attempts)evidence_entries with methodology='claim_extraction': 3,047 → 3,077 (+30)22993429,23493481,23150908,23079895,23440789,23692930,23644076,22166416,21176768,21349849,21196395,19449329,19433665,21209185,21374818,21595956,21358643,21385991,19115931,19775776,19020018,18599438,18596894,18547682,18497889,18639365,18276960.incomplete_claims=0, missing_provenance=0, duplicate_groups=0.papers.claims_extracted equals the current paper_claims row count.23810450 and 22179316 have claims_extracted=-1 and zero claim rows.30171180 has abstract_len=20, claims_extracted=0, and zero claim rows.claims_extracted=0, 1,626 papers with claims_extracted>0, 16,960 paper_claims rows, 3,429 KG edges with edge_type='claim_extraction'.scripts/extract_paper_claims.py --limit 35 using the neuro-first priority queue (5-paper pilot then 30-paper batch).claims_extracted>0 increased from 1,626 → 1,720 (+94); paper_claims total increased from 16,960 → 17,724 (+764); knowledge_edges with edge_type='claim_extraction' increased from 3,429 → 3,673 (+244).claims_extracted=0 decreased: 27,659 → 27,593 (−66 net movement, including concurrent writes).paper_claims total: 16,960 → 17,724 (+764)claims_extracted > 0: 1,626 → 1,720 (+94)claims_extracted=0: 27,659 → 27,593 (−66)knowledge_edges with edge_type='claim_extraction': 3,429 → 3,673 (+244)pmid, doi, or url provenanceclaims_extracted=0 at session start.scripts/extract_paper_claims.py --limit 30 using the neuro-first priority queue.claims_extracted>0 increased from 1,545 → 1,597 (+52); paper_claims total increased from 16,128 → 16,663 (+535); knowledge_edges with edge_type='claim_extraction' increased from 3,326 → 3,364 (+38).claims_extracted=0 decreased: 27,722 → 27,681 (−41 net movement, including concurrent writes).paper_claims total: 16,128 → 16,663 (+535)claims_extracted > 0: 1,545 → 1,597 (+52)claims_extracted=0: 27,722 → 27,681 (−41)knowledge_edges with edge_type='claim_extraction': 3,326 → 3,364 (+38)pmid, doi, or url provenanceclaims_extracted=0, including 26,651 with non-empty abstracts, after prior claim-extraction batches and concurrent ingestion.scripts/extract_paper_claims.py --limit 3 --dry-run selected three neuro-relevant provenance-backed papers and returned 15 structured claims without writing DB rows.scripts/extract_paper_claims.py --limit 30 batch, then verify target-paper provenance, completeness of subject/object/confidence, duplicate structured tuples, and before/after movement in paper_claims, papers.claims_extracted, knowledge_edges, and claim-extraction evidence links.scripts/extract_paper_claims.py --limit 30 using the existing neuro-first priority queue.paper_claims rows reported by the extractor; 2 papers (37981307, 31920622) returned no claim-worthy statements and remain correctly marked claims_extracted=-1.scripts/extract_paper_claims.py to return status='extraction_failed' and leave claims_extracted unchanged on provider failure.claims_extracted=0, and set two overlapping/concurrently completed papers (23430904, 30539330) to their actual claim row counts. A later concurrent write added claims for 28191426, now reflected by claims_extracted=4.claims_extracted values to final per-paper row counts.paper_claims, 1 hypothesis link, 8 knowledge_edges.paper_claims 15,851 → 16,120; papers with claims_extracted > 0 1,510 → 1,537; papers with claims_extracted=0 27,744 → 27,730; papers missing claims with abstracts 26,651 → 26,633.knowledge_edges with edge_type='claim_extraction': 3,198 → 3,239; live evidence_entries with methodology='claim_extraction': 2,768 → 2,782.33077885,34764472,21156028,30499105,19770220,20660085,34366517,12838906,37883975,33732183,27422503,23476089,10588725,29321682,30541434,19401682,19002879,26396469,27999529.23430904,30539330,28191426,19809162,24155031,36589807,27986873,33228231,19955414.incomplete_claims=0, missing_provenance=0, duplicate_groups=0.31920622 and 37981307 have claims_extracted=-1.claims_extracted=0; provider-failed rows with concurrent claim rows were reset to their real counts.extract_claims_from_abstract to return None; process_paper(..., dry_run=False) returned {'status': 'extraction_failed', ...} without calling DB write/commit methods.claims_extracted=0, including 23,509 with non-empty abstracts, after prior claim-extraction batches.scripts/extract_paper_claims.py --limit 30 neuro-first queue to write another real batch, then verify targeted paper counts, inserted claim provenance (pmid, doi, or url), completeness of subject/object/confidence, duplicate claim tuples, and before/after backlog movement.paper_id rather than the scalar claims_extracted flag alone, because previous concurrent runs show backlog counts can drift while extraction quality is best verified on the targeted batch.claims_extracted=0.scripts/extract_paper_claims.py --limit 30 using the neuro-first priority queue.19829370, 28321564, 37729908, 21436519) returned no claims (correctly marked claims_extracted=-1).paper_claims rows across 26 papers (3-8 claims per paper).knowledge_edges with edge_type='claim_extraction'.paper_claims total: 12,175 → 12,335 (+160)claims_extracted > 0: 1,113 → 1,139 (+26)claims_extracted=0: 27,997 → 27,967 (−30)19829370,28321564,25186741,33464407,33214137,31025941,17668375,34619763,20613723,19299587,34919646,34140671,32070434,37938767,37019812,37024507,37149843,37236970,37872024,37729908,19400724,41193812,29273807,37459141,39091877,20606213,21436519,39193893,25348636,34031600claims_extracted=0.scripts/extract_paper_claims.py --limit 30 using the neuro-first priority queue.26659578, 31035373) returned no claims.paper_claims rows across 28 papers (3-8 claims per paper).knowledge_edges with edge_type='claim_extraction'.paper_claims total: 12,015 → 12,175 (+160)claims_extracted > 0: 1,085 → 1,113 (+28)claims_extracted=0: 28,028 → 27,997 (−31)knowledge_edges with edge_type='claim_extraction': 2,283 → 2,31232034157,11050163,22419524,34707237,32503326,35089129,19217372,37076628,38039899,31626055,15679913,19685291,30946828,21098403,31701117,38916992,18568035,17360498,36257934,12076996,19564918,25015323,19932737,32130906,32681165,40642379,36251389,37562405scripts/extract_paper_claims.py --limit 30 using the existing neuro-first provenance-backed queue.37030962) was correctly marked claims_extracted=-1 because the extractor found no claim-worthy statements.30471926 received 8 structured claims.paper_claims rows from this iteration, all neuro-relevant, with 185 total claims and 3-8 claims per paper.paper_claims total: 11,178 → 11,363 (+185 from the targeted successes; includes no duplicate target tuples).claims_extracted > 0: 965 → 995 (+30).claims_extracted=0: 24,601 → 24,570.knowledge_edges with edge_type='claim_extraction': 2,118 → 2,185.evidence_entries with methodology='claim_extraction': 2,316 → 2,369.21414908,25033177,27371494,28846090,29196460,30283395,30334567,30471926,30995509,31015277,31079900,31086329,31179602,31185581,33239064,34161185,34239348,35008731,37435081,37774680,38079474,38109536,38247815,38989463,39428831,40345829,40399225,41319164,41373767,41498748.-- Targeted batch: 30 papers, 185 claims, each paper has 3-8 complete claims.
WITH target(paper_id) AS (VALUES (...30 task efb5b2e0 successful paper_ids...)),
per_paper AS (
SELECT pc.paper_id, count(*) cnt
FROM paper_claims pc JOIN target t ON t.paper_id = pc.paper_id
GROUP BY pc.paper_id
)
SELECT count(distinct t.paper_id) AS target_papers,
count(distinct pc.paper_id) AS papers_with_claims,
count(pc.id) AS total_claims,
min(per_paper.cnt) AS min_claims,
max(per_paper.cnt) AS max_claims,
count(*) FILTER (
WHERE btrim(coalesce(pc.subject, '')) = ''
OR btrim(coalesce(pc.object, '')) = ''
OR pc.confidence NOT IN ('high','medium','low')
) AS incomplete_claims,
count(*) FILTER (
WHERE coalesce(pc.pmid, '') = ''
AND coalesce(pc.doi, '') = ''
AND coalesce(pc.url, '') = ''
) AS missing_provenance
FROM target t
LEFT JOIN paper_claims pc ON pc.paper_id = t.paper_id
LEFT JOIN per_paper ON per_paper.paper_id = t.paper_id;
-- target_papers=30, papers_with_claims=30, total_claims=185,
-- min_claims=3, max_claims=8, incomplete_claims=0, missing_provenance=0
-- No duplicate structured claim tuples in the targeted batch.
WITH target(paper_id) AS (VALUES (...30 task efb5b2e0 successful paper_ids...))
SELECT count(*) AS duplicate_groups
FROM (
SELECT pc.paper_id, pc.subject, pc.relation, pc.object, pc.supporting_text, count(*)
FROM paper_claims pc JOIN target t ON t.paper_id = pc.paper_id
GROUP BY pc.paper_id, pc.subject, pc.relation, pc.object, pc.supporting_text
HAVING count(*) > 1
) d;
-- duplicate_groups=0
-- Documented no-claim skip from the initial 30-candidate pass.
SELECT pmid, claims_extracted
FROM papers
WHERE paper_id = '85a10e66-375d-43c1-920c-e30be5cdb4cb';
-- pmid=37030962, claims_extracted=-1paper_claims rows by paper_id.paper_cache.get_paper(..., fetch_if_missing=False) for cached metadata and the existing SciDEX LLM/database helpers, selecting papers ordered by citation_count DESC with no existing paper_claims rows.paper_claims rows across the 30 successful papers, with 2-5 claims per paper.knowledge_edges with edge_type='claim_extraction'; no hypothesis evidence links matched under the conservative scorer for this batch.paper_claims total during the run: 10,900 → 11,086 (includes concurrent writes; this run inserted 133 rows).paper_claims by paper_id: 23,618 → 23,577 (includes concurrent writes).subject and object fields and confidence in {high, medium, low}.more research|further research|future work|additional studies: 0 matching claims.-- Targeted batch: 30 papers, 133 claims, each paper has 2-5 complete claims.
WITH target(paper_id) AS (VALUES (...30 task 5e79b197 paper_ids...))
SELECT count(distinct t.paper_id) AS papers_with_claims,
count(pc.id) AS total_claims,
min(per_paper.cnt) AS min_claims,
max(per_paper.cnt) AS max_claims,
count(*) FILTER (
WHERE btrim(pc.subject) = ''
OR btrim(pc.object) = ''
OR pc.confidence NOT IN ('high','medium','low')
) AS incomplete_claims
FROM target t
JOIN paper_claims pc ON pc.paper_id = t.paper_id
JOIN (
SELECT paper_id, count(*) cnt
FROM paper_claims
GROUP BY paper_id
) per_paper ON per_paper.paper_id = t.paper_id;
-- papers_with_claims=30, total_claims=133, min_claims=2, max_claims=5, incomplete_claims=0
-- No generic filler claims in the targeted batch.
WITH target(paper_id) AS (VALUES (...30 task 5e79b197 paper_ids...))
SELECT count(*)
FROM paper_claims pc
JOIN target t ON t.paper_id = pc.paper_id
WHERE lower(pc.supporting_text || ' ' || pc.subject || ' ' || pc.object)
~ 'more research|further research|future work|additional studies';
-- 0claims_extracted=0.scripts/extract_paper_claims.py --limit 30 against the neuro-first queue, then verify inserted claim counts, required provenance fields, and duplicate supporting_text + PMID/paper identity for the targeted batch.scripts/extract_paper_claims.py --limit 30 using the existing neuro-first priority queue.paper_claims rows, linked 45 hypotheses via methodology=claim_extraction, and created 64 knowledge_edges with edge_type='claim_extraction'.paper_claims rows and reset papers.claims_extracted to the final per-paper row counts.paper_claims rows.subject, object, confidence, or PMID provenance fields.paper_claims total: 10,480 before run → 10,817 after run/cleanup.claims_extracted > 0: 873 before run → 906 after run/cleanup.SELECT COUNT(DISTINCT p.paper_id), COUNT(pc.id)
FROM papers p JOIN paper_claims pc ON pc.paper_id = p.paper_id
WHERE p.pmid = ANY('{32546684,32514138,32107637,32110860,32100453,32097865,32048003,31907987,31847700,31649511,31785789,31937327,31689415,31690660,31818974,32023844,31924476,31649329,31781038,31392412,31566651,31606043,31434803,31601939,31277513,31434879,31285742,31324781,31553812,31112550}'::text[]);
-- 30 papers, 312 claim rows
SELECT COUNT(*) FROM paper_claims; -- 10817
SELECT COUNT(*) FROM papers WHERE COALESCE(claims_extracted, 0) > 0; -- 906
SELECT COUNT(*) FROM papers WHERE COALESCE(claims_extracted, 0) = 0 AND abstract IS NOT NULL AND LENGTH(abstract) > 0; -- 23264
-- exact duplicate structured claim tuples on targeted PMIDs
SELECT COUNT(*) FROM (
SELECT p.pmid, pc.subject, pc.relation, pc.object, pc.supporting_text, COUNT(*)
FROM papers p JOIN paper_claims pc ON pc.paper_id = p.paper_id
WHERE p.pmid = ANY('{32546684,32514138,32107637,32110860,32100453,32097865,32048003,31907987,31847700,31649511,31785789,31937327,31689415,31690660,31818974,32023844,31924476,31649329,31781038,31392412,31566651,31606043,31434803,31601939,31277513,31434879,31285742,31324781,31553812,31112550}'::text[])
GROUP BY p.pmid, pc.subject, pc.relation, pc.object, pc.supporting_text
HAVING COUNT(*) > 1
) d; -- 0scripts/extract_paper_claims.py --limit 30 using neuro-first priority queue.34680155) returned no claims (marked claims_extracted=-1).paper_claims rows across 29 papers.methodology=claim_extraction.knowledge_edges with edge_type='claim_extraction'.paper_claims rows (1 skipped: no claims extracted)paper_claims rows insertedpaper_claims total: 10,072 → 10,282 (+210)claims_extracted > 0: 821 → 850 (+29)SELECT COUNT(*) FROM paper_claims; -- 10282
SELECT COUNT(*) FROM papers WHERE COALESCE(claims_extracted, 0) > 0; -- 850
SELECT COUNT(*) FROM papers WHERE COALESCE(claims_extracted, 0) = 0 AND abstract IS NOT NULL AND LENGTH(abstract) > 0; -- 23157scripts/extract_paper_claims.py --limit 30 using neuro-first priority queue.paper_id, pmid, doi, url, claim_type, subject, relation, object, confidence, and supporting_text.knowledge_edges with edge_type='claim_extraction' created from claim subject/object entities.paper_claims rows (0 skipped)paper_claims rows inserted across 30 paperspaper_claims total: 9,256 → 9,453 (+197)claims_extracted > 0: 738 → 768 (+30)SELECT COUNT(*) FROM paper_claims; -- 9453
SELECT COUNT(*) FROM papers WHERE COALESCE(claims_extracted, 0) > 0; -- 768
SELECT COUNT(*) FROM papers WHERE COALESCE(claims_extracted, 0) = 0 AND abstract IS NOT NULL AND LENGTH(abstract) > 0; -- 23188citation_count >= 5, abstract, and no paper_claims rows ordered by citation count.process_priority_papers.py targeting 20 highest-citation papers lacking any claims.pmid:10788654 — LLM returned no JSON).paper_id, pmid, doi, claim_type, subject, relation, object, confidence, and supporting_text.pmc_id; 27 figures stored in paper_figures.refs_json updates for matched wiki pages per paper.confidence='moderate' (LLM ignoring high|medium|low instruction) logged as warnings; _conf_map normalization in script handles this for subsequent runs.paper_claims rows (1 paper skipped: no LLM JSON)paper_claims rows from this script's 19 papers (6+6+8+5+8+7+2+7+6+2+8+8+7+7+6+4+4+7+4)paper_figures rows from PMC extraction across 6 paperspaper_claims total: 8035 → 8662 (combined with concurrent agents)SELECT COUNT(*) FROM paper_claims; -- 8662
SELECT COUNT(*) FROM paper_figures; -- 3652
SELECT COUNT(DISTINCT paper_id) FROM paper_claims WHERE created_at > NOW() - INTERVAL '1 hour'; -- 53+143 weak evidence_entries from only 5 claims.scripts/extract_paper_claims.py to use conservative phrase matching for hypothesis links, reject generic single-term matches (brain, neurons, enhancers, etc.), require stronger scores, and cap matches per claim so claim extraction improves the world model without spraying low-value evidence.26468181) was correctly marked claims_extracted=-1 after the stricter extraction returned no claim-worthy statements.paper_claims rows added on the cleaned rerun (total: 707 -> 747)evidence_entries linked via methodology=claim_extraction after conservative filtering (total remained 877)knowledge_edges with edge_type='claim_extraction' (total: 56 -> 60)SELECT COUNT(*) FROM papers WHERE claims_extracted = 1; -- 122
SELECT COUNT(*) FROM paper_claims; -- 7,368
SELECT COUNT(*) FROM knowledge_edges WHERE edge_type = 'claim_extraction'; -- 1,532
SELECT COUNT(*) FROM evidence_entries WHERE methodology = 'claim_extraction'; -- 1,927
SELECT COUNT(*) FROM papers WHERE COALESCE(claims_extracted, 0) = 0; -- 24,387scripts/extract_paper_claims.py --high-citation --limit 30 targeting citation-ranked papers.claims_extracted=1 count increased: 89 -> 122 (+33 total, meeting +20 threshold)paper_claims total: 7,336 -> 7,368 (+32 new rows from the 30 papers with >= 3 claims)knowledge_edges with edge_type='claim_extraction': 1,528 -> 1,532 (+4)evidence_entries with methodology='claim_extraction': 1,925 -> 1,927 (+2)scripts/extract_paper_claims.py so it now prioritizes neuro-relevant papers with provenance, writes DOI/URL provenance into paper_claims, uses stable paper_id updates for rows lacking PMID, and normalizes unexpected LLM claim types instead of aborting the transaction.19109909 surfaced a real parser/transaction bug (claim_type=descriptive violated the CHECK constraint). Fixed by aliasing unsupported claim types to allowed values and explicitly rolling back failed inserts before continuing.claims_extracted set to their real inserted claim counts and every inserted claim row for the targeted papers carries PMID, DOI, or URL provenance.paper_claims rows added (total: 546 -> 707)evidence_entries linked via methodology=claim_extractionknowledge_edges with edge_type=claim_extractionSELECT COUNT(*) FROM papers WHERE COALESCE(claims_extracted, 0) = 0; -- 24,970
SELECT COUNT(*) FROM paper_claims; -- 707
SELECT COUNT(*) FROM evidence_entries WHERE methodology = 'claim_extraction'; -- 877
SELECT COUNT(*) FROM knowledge_edges WHERE edge_type = 'claim_extraction'; -- 56
-- For the 30 targeted paper_ids: claims_extracted = COUNT(paper_claims.id) and
-- every claim row has PMID, DOI, or URL provenance populated.claim_type='comparative' from the LLM (paper 32719508), causing a transaction abort at paper 21. Fixed by mapping 'comparative' → 'correlative' in post-processing. Remaining 9 papers processed successfully.SELECT COUNT(*) FROM paper_claims; -- 281
SELECT COUNT(*) FROM papers WHERE COALESCE(claims_extracted, 0) != 0; -- 128
SELECT COUNT(*) FROM papers WHERE COALESCE(claims_extracted, 0) = 0; -- 18,939
SELECT p.pmid, p.claims_extracted FROM papers p WHERE p.pmid IN (...30 pmids...); -- all 30 markedmigrations/109_add_paper_claims_table.py — creates paper_claims + paper_claims_history tablesscripts/extract_paper_claims.py — LLM-based claim extraction + hypothesis linkingSELECT COUNT(*) FROM paper_claims; -- 100
SELECT COUNT(*) FROM evidence_entries WHERE methodology='claim_extraction'; -- 55
SELECT p.pmid, COUNT(pc.id) as cnt FROM papers p JOIN paper_claims pc ON p.paper_id = pc.paper_id GROUP BY p.pmid HAVING COUNT(pc.id) >= 2; -- 17 rowsfind_entity_in_canonical() to look up gene/protein entities and create_kg_edge_for_claim() to create KG edges from claim subject/object to canonical entities.scripts/extract_paper_claims.py — added KG edge creation to claim extraction pipeline-- Targeted batch: 30 papers, 133 claims, each paper has 2-5 complete claims.
WITH target(paper_id) AS (VALUES (...30 task 5e79b197 paper_ids...))
SELECT count(distinct t.paper_id) AS papers_with_claims,
count(pc.id) AS total_claims,
min(per_paper.cnt) AS min_claims,
max(per_paper.cnt) AS max_claims,
count(*) FILTER (
WHERE btrim(pc.subject) = ''
OR btrim(pc.object) = ''
OR pc.confidence NOT IN ('high','medium','low')
) AS incomplete_claims
FROM target t
JOIN paper_claims pc ON pc.paper_id = t.paper_id
JOIN (
SELECT paper_id, count(*) cnt
FROM paper_claims
GROUP BY paper_id
) per_paper ON per_paper.paper_id = t.paper_id;
-- papers_with_claims=30, total_claims=133, min_claims=2, max_claims=5, incomplete_claims=0
-- No generic filler claims in the targeted batch.
WITH target(paper_id) AS (VALUES (...30 task 5e79b197 paper_ids...))
SELECT count(*)
FROM paper_claims pc
JOIN target t ON t.paper_id = pc.paper_id
WHERE lower(pc.supporting_text || ' ' || pc.subject || ' ' || pc.object)
~ 'more research|further research|future work|additional studies';
-- 00---
The work for task 89217d70-1ffd-4106-b7b1-026b5a4ebde0 was completed by earlier runs (3cac3199b, 985927963). The extraction itself was done — 312 claim rows across 30 papers — but duplicate rows remained. Post-verification found 80 duplicate rows (same supporting_text + pmid) across the 30 targeted papers. A dedup pass removed them, bringing the batch from 312 → 232 claims with zero duplicates.
Verification evidence:
-- Targeted batch: 30 papers, 133 claims, each paper has 2-5 complete claims.
WITH target(paper_id) AS (VALUES (...30 task 5e79b197 paper_ids...))
SELECT count(distinct t.paper_id) AS papers_with_claims,
count(pc.id) AS total_claims,
min(per_paper.cnt) AS min_claims,
max(per_paper.cnt) AS max_claims,
count(*) FILTER (
WHERE btrim(pc.subject) = ''
OR btrim(pc.object) = ''
OR pc.confidence NOT IN ('high','medium','low')
) AS incomplete_claims
FROM target t
JOIN paper_claims pc ON pc.paper_id = t.paper_id
JOIN (
SELECT paper_id, count(*) cnt
FROM paper_claims
GROUP BY paper_id
) per_paper ON per_paper.paper_id = t.paper_id;
-- papers_with_claims=30, total_claims=133, min_claims=2, max_claims=5, incomplete_claims=0
-- No generic filler claims in the targeted batch.
WITH target(paper_id) AS (VALUES (...30 task 5e79b197 paper_ids...))
SELECT count(*)
FROM paper_claims pc
JOIN target t ON t.paper_id = pc.paper_id
WHERE lower(pc.supporting_text || ' ' || pc.subject || ' ' || pc.object)
~ 'more research|further research|future work|additional studies';
-- 01Note: paper_claims has subject/object/confidence columns (no target_gene/disease_context/evidence_strength columns — those are aliases mapped to subject/object/confidence in the task description). The confidence column is the evidence_strength field. The subject column contains the target gene/entity. The object column contains the disease/target context.
scripts/extract_paper_claims.py --limit 30 using the neuro-first provenance-backed queue.24151336, 22579823, 33293629) returned no claims (correctly marked claims_extracted=-1).paper_claims rows attempted across 27 papers (4-8 claims per paper); net delta smaller due to concurrent deduplication.knowledge_edges with edge_type='claim_extraction'.paper_claims total: 13,415 → 13,453 (net; includes concurrent dedup)claims_extracted > 0: 1,254 → 1,258 (+4 net new; remaining 23 were concurrently processed)claims_extracted=0: 27,852 → 27,847 (−5 net in this slot)knowledge_edges with edge_type='claim_extraction': 2,518 → 2,560 (+42)34991675,34964149,33682731,33834025,21614097,33115988,20162012,27143001,25092318,34763720,29653606,34873335,20818335,25237099,29887379,29379199,28757305,31015339,21154909,39294194,37783795,39143132,11034735,23188523,31638101,24781306,26030851-- Targeted batch: 30 papers, 133 claims, each paper has 2-5 complete claims.
WITH target(paper_id) AS (VALUES (...30 task 5e79b197 paper_ids...))
SELECT count(distinct t.paper_id) AS papers_with_claims,
count(pc.id) AS total_claims,
min(per_paper.cnt) AS min_claims,
max(per_paper.cnt) AS max_claims,
count(*) FILTER (
WHERE btrim(pc.subject) = ''
OR btrim(pc.object) = ''
OR pc.confidence NOT IN ('high','medium','low')
) AS incomplete_claims
FROM target t
JOIN paper_claims pc ON pc.paper_id = t.paper_id
JOIN (
SELECT paper_id, count(*) cnt
FROM paper_claims
GROUP BY paper_id
) per_paper ON per_paper.paper_id = t.paper_id;
-- papers_with_claims=30, total_claims=133, min_claims=2, max_claims=5, incomplete_claims=0
-- No generic filler claims in the targeted batch.
WITH target(paper_id) AS (VALUES (...30 task 5e79b197 paper_ids...))
SELECT count(*)
FROM paper_claims pc
JOIN target t ON t.paper_id = pc.paper_id
WHERE lower(pc.supporting_text || ' ' || pc.subject || ' ' || pc.object)
~ 'more research|further research|future work|additional studies';
-- 02claims_extracted=0: 27,929 before this iteration. Papers with claims_extracted > 0: 1,176 before this iteration. Total paper_claims rows: 12,653 before this iteration.scripts/extract_paper_claims.py --limit 30 using the existing neuro-first provenance-backed queue. A second concurrent agent also ran the same script (another slot on this recurring task), contributing additional extraction.paper_claims rows inserted in the 2-hour window (this iteration's runs). All claims have PMID, DOI, or URL provenance.paper_claims total: 12,653 → 12,956 (+303 net; includes concurrent writes).claims_extracted > 0: 1,176 → 1,212 (+36).claims_extracted=0: 27,929 → 27,897 (−32 net in this slot).22714409,27339989,30385464,34125126,34535638,34919646,35833836,35987848,36544184,37024507,37149843,37459141,37534924,37849304,37938767,38039899,38484795,38489197,39091877,39193893,39929585,40642379,41193812,41717003,41752118Verification:
-- Targeted batch: 30 papers, 133 claims, each paper has 2-5 complete claims.
WITH target(paper_id) AS (VALUES (...30 task 5e79b197 paper_ids...))
SELECT count(distinct t.paper_id) AS papers_with_claims,
count(pc.id) AS total_claims,
min(per_paper.cnt) AS min_claims,
max(per_paper.cnt) AS max_claims,
count(*) FILTER (
WHERE btrim(pc.subject) = ''
OR btrim(pc.object) = ''
OR pc.confidence NOT IN ('high','medium','low')
) AS incomplete_claims
FROM target t
JOIN paper_claims pc ON pc.paper_id = t.paper_id
JOIN (
SELECT paper_id, count(*) cnt
FROM paper_claims
GROUP BY paper_id
) per_paper ON per_paper.paper_id = t.paper_id;
-- papers_with_claims=30, total_claims=133, min_claims=2, max_claims=5, incomplete_claims=0
-- No generic filler claims in the targeted batch.
WITH target(paper_id) AS (VALUES (...30 task 5e79b197 paper_ids...))
SELECT count(*)
FROM paper_claims pc
JOIN target t ON t.paper_id = pc.paper_id
WHERE lower(pc.supporting_text || ' ' || pc.subject || ' ' || pc.object)
~ 'more research|further research|future work|additional studies';
-- 03papers.claims_extracted>0=1482, paper_claims=15691, claims_extracted=0=27767.scripts/extract_paper_claims.py --limit 30 using neuro-first priority queue.claims_extracted=-1).paper_claims rows across 28 papers (3-8 claims per paper).knowledge_edges with edge_type='claim_extraction'.paper_claims total: 15,691 → 15,851 (+160)claims_extracted > 0: 1,482 → 1,510 (+28)claims_extracted=0: 27,767 → 27,737 (−30)knowledge_edges with edge_type='claim_extraction': 3,158 → 3,198 (+40)-- Targeted batch: 30 papers, 133 claims, each paper has 2-5 complete claims.
WITH target(paper_id) AS (VALUES (...30 task 5e79b197 paper_ids...))
SELECT count(distinct t.paper_id) AS papers_with_claims,
count(pc.id) AS total_claims,
min(per_paper.cnt) AS min_claims,
max(per_paper.cnt) AS max_claims,
count(*) FILTER (
WHERE btrim(pc.subject) = ''
OR btrim(pc.object) = ''
OR pc.confidence NOT IN ('high','medium','low')
) AS incomplete_claims
FROM target t
JOIN paper_claims pc ON pc.paper_id = t.paper_id
JOIN (
SELECT paper_id, count(*) cnt
FROM paper_claims
GROUP BY paper_id
) per_paper ON per_paper.paper_id = t.paper_id;
-- papers_with_claims=30, total_claims=133, min_claims=2, max_claims=5, incomplete_claims=0
-- No generic filler claims in the targeted batch.
WITH target(paper_id) AS (VALUES (...30 task 5e79b197 paper_ids...))
SELECT count(*)
FROM paper_claims pc
JOIN target t ON t.paper_id = pc.paper_id
WHERE lower(pc.supporting_text || ' ' || pc.subject || ' ' || pc.object)
~ 'more research|further research|future work|additional studies';
-- 04{
"requirements": {
"analysis": 6,
"reasoning": 6
},
"max_iterations": 15
}