Run artifact_dedup_agent.run_full_scan() on a recurring schedule to generate deduplication recommendations as content grows. The scans use high similarity thresholds to minimize false positives.
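A minimal sketch of the recurring schedule, assuming a simple in-process loop. The helper name run_recurring, the interval, and the error handling are illustrative assumptions; the source does not say how or how often the scan is scheduled.

```python
import time
from typing import Callable, Optional


def run_recurring(
    scan: Callable[[], object],
    interval_seconds: float,
    max_runs: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call scan() repeatedly, pausing interval_seconds between runs.

    Returns the number of runs attempted. max_runs=None loops forever.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            scan()
        except Exception as exc:
            # Keep the schedule alive if a single scan fails.
            print(f"scan failed: {exc}")
        runs += 1
        # Skip the pause after the final run.
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs


# Hypothetical usage (the 6-hour interval is an assumption):
# run_recurring(artifact_dedup_agent.run_full_scan, interval_seconds=6 * 3600)
```

Passing the sleep function as a parameter keeps the loop testable without real waiting. A cron entry or a systemd timer would serve equally well in place of this loop.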
- scan_hypothesis_duplicates (threshold=0.42, limit=200): finds near-duplicate hypotheses across different analyses
- scan_wiki_duplicates (threshold=0.60, limit=500): finds similar wiki pages by title + entity overlap
- scan_artifact_duplicates (threshold=0.85): finds exact duplicates by content_hash
- Output: dedup_recommendations table
- Requires postgresql://scidex (main database with full schema)
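To illustrate how a threshold such as 0.60 separates candidates from non-candidates, here is a sketch of one possible "title + entity overlap" score. The Jaccard measure, the equal weighting, and the function names are assumptions made for illustration, not the scoring that scan_wiki_duplicates is known to use.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def wiki_similarity(title_a: str, entities_a: set,
                    title_b: str, entities_b: set) -> float:
    """Average of title-token overlap and entity overlap (assumed weighting)."""
    title_score = jaccard(set(title_a.lower().split()),
                          set(title_b.lower().split()))
    entity_score = jaccard(set(entities_a), set(entities_b))
    return 0.5 * title_score + 0.5 * entity_score


def is_duplicate_candidate(score: float, threshold: float = 0.60) -> bool:
    """A pair becomes a recommendation only at or above the threshold."""
    return score >= threshold


score = wiki_similarity("Tau protein", {"MAPT", "tau"},
                        "Tau protein aggregation", {"MAPT", "tau"})
print(is_duplicate_candidate(score))
```

Raising the threshold trades recall for precision: fewer pairs are flagged, and those that are flagged are more likely to be true duplicates.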
- db.executemany() → cursor loop with individual cursor.execute() calls (psycopg connections have no executemany() shortcut; the method exists only on cursors)
- GROUP_CONCAT → string_agg(id::text, ',') (PostgreSQL has no GROUP_CONCAT)
- json_set() → Python dict + json.dumps() (similarity_details is TEXT, not JSONB)
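The three replacements above can be sketched as follows. The artifacts table name, the recommendation_type column, and the helper names are hypothetical; only dedup_recommendations, similarity_details, content_hash, and string_agg(id::text, ',') come from the notes. The conn argument is assumed to be a psycopg connection whose cursors work as context managers.

```python
import json

# GROUP_CONCAT replacement: string_agg needs a text argument, hence id::text.
# The "artifacts" table name is an assumption for illustration.
GROUPED_IDS_SQL = """
SELECT content_hash, string_agg(id::text, ',') AS ids
FROM artifacts
GROUP BY content_hash
HAVING COUNT(*) > 1
"""


def build_similarity_details(score, shared_entities):
    """json_set() replacement: build a dict, then serialize for a TEXT column."""
    details = {"score": score}
    details["shared_entities"] = sorted(shared_entities)
    return json.dumps(details, sort_keys=True)


def insert_recommendations(conn, rows):
    """db.executemany() replacement: one cursor.execute() call per row."""
    sql = (
        "INSERT INTO dedup_recommendations "
        "(recommendation_type, similarity_details) VALUES (%s, %s)"
    )
    with conn.cursor() as cur:
        for row in rows:
            cur.execute(sql, row)
    return len(rows)
```

Because similarity_details is TEXT, any later update has to load the string with json.loads(), change the dict, and write the whole value back.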
- dedup_recommendations_id_seq sequence (was stuck at 1, table had rows up to id=2362)
- postgresql://scidex

{
  "requirements": {
    "analysis": 3
  },
  "completion_shas": [
    "f6f2cd3b5f98bc3746ed0cf99bfd56862c7a282a"
  ],
  "completion_shas_checked_at": "2026-04-12T17:19:29.992468+00:00",
  "completion_shas_missing": [
    "30683bafd3f72119104eedcd61c83303cf6172c3",
    "af558f3aa94ddc4dec8fb60c3d6ffa6fc205445e"
  ],
  "_stall_skip_providers": [
    "glm"
  ]
}