SciDEX — Task: [Atlas] Zotero library import

API-key-auth Zotero importer using pyzotero skill; corpus-scoped literature search and gap-scan; in-page 'in your library' badge.

Completion Notes

Auto-release: work already on origin/main

Git Commits (1)

[Atlas] Zotero library import — claim collections as evidence corpus [task:c6bdb4f4-5a0f-4cb2-b4c0-4ece8ac4417c] (#773) 2026-04-27

Spec File

Effort: deep

Goal

Researchers spend years curating Zotero libraries that represent
their personal map of a literature. Those libraries are exactly the
"trusted seed corpus" SciDEX needs to bootstrap a researcher-
specific gap-scanner or literature review. Wire Zotero's Web API +
the pyzotero skill (already shipped in the K-Dense skills
bundle) so a logged-in researcher (via q-integ-orcid-oauth-claim)
can import a Zotero collection as a SciDEX evidence corpus, with
items materialised as papers rows and the collection itself
becoming a queryable artifact.

Acceptance Criteria

☐ Schema migrations/<date>_zotero_corpora.sql:

CREATE TABLE zotero_corpora (
    id UUID PRIMARY KEY,
    researcher_id UUID NOT NULL REFERENCES researcher_identity(id),
    zotero_user_id TEXT NOT NULL,
    zotero_collection_key TEXT NOT NULL,
    collection_name TEXT,
    item_count INT,
    last_synced_at TIMESTAMP DEFAULT NOW(),
    sync_status TEXT NOT NULL DEFAULT 'pending'
        CHECK (sync_status IN ('pending','syncing','synced','error')),
    last_error TEXT,
    UNIQUE(researcher_id, zotero_collection_key)
);
CREATE TABLE zotero_corpus_items (
    corpus_id UUID NOT NULL REFERENCES zotero_corpora(id),
    zotero_item_key TEXT NOT NULL,
    paper_id TEXT REFERENCES papers(pmid),
    doi TEXT,
    title TEXT,
    added_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (corpus_id, zotero_item_key)
);

☐ OAuth. Zotero supports OAuth 1.0a and key-based access;
use the API-key flow (simpler, no signature dance). New page
/researcher/zotero/connect accepts a Zotero API key + user
id and stores them in zotero_corpora (key encrypted with
the existing secrets_box).

☐ Importer scidex/atlas/zotero_import.py using the pyzotero skill:
- list_collections(researcher_id): fetches all top-level
  collections.
- import_collection(researcher_id, collection_key):
  pages through items, resolves DOI → PMID via OpenAlex
  (scidex/forge/tools.py), inserts/upserts papers rows,
  writes zotero_corpus_items.
- Runs as a deferred job (q-perf-deferred-work-queue) so a
  10K-item import doesn't block the request.
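
The paging step of import_collection could look roughly like the sketch below. It is an illustration under stated assumptions, not the shipped module: iter_collection_items, PAGE_SIZE, NON_PAPER_TYPES, and FakeZotero are hypothetical names, and the client is duck-typed on a collection_items(key, start=..., limit=...) method, which is how pyzotero's Zotero.collection_items is assumed to be called here. Items are assumed to be dicts with a "data" mapping that carries "itemType".

```python
from typing import Any, Dict, Iterator, List

PAGE_SIZE = 100  # assumption: the Zotero Web API caps `limit` at 100 per request

# Item types that are not bibliographic records (assumption for this sketch).
NON_PAPER_TYPES = {"attachment", "note"}


def iter_collection_items(client: Any, collection_key: str) -> Iterator[Dict[str, Any]]:
    """Yield bibliographic items from a collection, one page at a time."""
    start = 0
    while True:
        page: List[Dict[str, Any]] = client.collection_items(
            collection_key, start=start, limit=PAGE_SIZE
        )
        if not page:  # an empty page means we have walked past the last item
            break
        for item in page:
            if item["data"].get("itemType") not in NON_PAPER_TYPES:
                yield item
        start += len(page)


class FakeZotero:
    """Hypothetical stand-in for a pyzotero client, used only for this example."""

    def __init__(self, items: List[Dict[str, Any]]) -> None:
        self._items = items

    def collection_items(self, collection_key: str, start: int = 0, limit: int = 100):
        # Serve a slice of the canned items, mimicking start/limit paging.
        return self._items[start : start + limit]
```

Because the generator never holds more than one page in memory, the same loop should serve a 50-item collection and a 10K-item one; the fake client doubles as the kind of stub the test criterion asks for.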

☐ Use the corpus. New flag ?corpus=<id> on
/api/literature-search constrains results to the
researcher's corpus; new flag on /api/gap-scanner seeds
the gap-scan from corpus papers only.
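
A corpus-scoped search can be little more than a membership filter over the existing results. The sketch below assumes each result is a dict with a "pmid" field and that the corpus has already been loaded as a set of paper ids; scope_to_corpus is a hypothetical name, and how the set is loaded is left out.

```python
from typing import Dict, Iterable, List, Optional, Set


def scope_to_corpus(
    results: Iterable[Dict[str, str]],
    corpus_ids: Optional[Set[str]],
) -> List[Dict[str, str]]:
    """Keep only results whose pmid is in the corpus; pass through if unscoped."""
    if corpus_ids is None:  # no ?corpus=<id> flag was supplied
        return list(results)
    return [r for r in results if r.get("pmid") in corpus_ids]
```

Treating a missing flag as "no filter" keeps the existing route's behaviour unchanged for callers that never pass ?corpus=<id>.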

☐ Researcher landing. /researcher/{orcid_id} gains a
"Curated literature" panel listing imported collections with
counts and last-synced.

☐ Hypothesis page. When a hypothesis cites a paper that
lives in the viewer's corpus, the citation chip is
highlighted with an "in your library" badge.

☐ Tests tests/test_zotero_import.py: stub pyzotero
response, end-to-end collection import, dedup on re-import,
DOI→PMID resolution.

☐ Smoke evidence. Import a 50-item public collection; show
the gap-scanner's results change when scoped to it.

Approach

API-key auth first: simpler than full OAuth, and the Zotero docs
recommend it for personal libraries.

Importer is deferred-queue-backed; idempotent on re-run.
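
Idempotency on re-run follows from the composite primary key on (corpus_id, zotero_item_key). A minimal sketch, using an in-memory SQLite database as a stand-in for the real database (an assumption), a table trimmed to three columns, and hypothetical helpers upsert_item and count_items:

```python
import sqlite3

# In-memory SQLite stands in for the production database in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE zotero_corpus_items (
        corpus_id TEXT NOT NULL,
        zotero_item_key TEXT NOT NULL,
        title TEXT,
        PRIMARY KEY (corpus_id, zotero_item_key)
    )
    """
)


def upsert_item(corpus_id: str, item_key: str, title: str) -> None:
    """Insert an item, or refresh its title if the key pair already exists."""
    conn.execute(
        """
        INSERT INTO zotero_corpus_items (corpus_id, zotero_item_key, title)
        VALUES (?, ?, ?)
        ON CONFLICT (corpus_id, zotero_item_key)
        DO UPDATE SET title = excluded.title
        """,
        (corpus_id, item_key, title),
    )


def count_items(corpus_id: str) -> int:
    """Return how many rows the corpus currently holds."""
    row = conn.execute(
        "SELECT COUNT(*) FROM zotero_corpus_items WHERE corpus_id = ?",
        (corpus_id,),
    ).fetchone()
    return row[0]
```

Running the same import twice then leaves the row count unchanged while still picking up edits made in Zotero between runs.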

Reuse pyzotero skill (Anthropic skill already loaded into
the kdense-skills bundle); no need to bring our own client.

Corpus-scoped search is a thin filter on the existing
literature-search route.

Dependencies

q-integ-orcid-oauth-claim — supplies researcher_identity.
q-perf-deferred-work-queue — runs the import.
pyzotero skill (already in vendor/kdense-skills).

Dependents

q-integ-hypothesis-is-annotations: annotations created on a
paper that lives in the viewer's Zotero corpus get higher
trust weight.

Work Log

2026-04-27 — Implementation (task:c6bdb4f4)

Delivered:

migrations/20260427_zotero_corpora.sql: zotero_corpora + zotero_corpus_items
tables applied to live DB. Used actor_id TEXT REFERENCES actors(id) instead of
researcher_id REFERENCES researcher_identity(id), because
q-integ-orcid-oauth-claim has not yet created that table. A later migration
can remap to researcher_identity.

scidex/atlas/zotero_import.py — full importer module:

- Fernet encryption for stored API keys (consistent with wallet_manager.py)
- store_credentials / list_collections / import_collection / _run_import
- @register("zotero_import_collection") deferred-queue handler
- corpus_paper_ids / actor_corpus_contains_paper / list_corpora helpers
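
The @register(...) handler pattern can be illustrated with a small in-process registry. This sketches the general technique only: the actual q-perf-deferred-work-queue API is not shown in this document, so HANDLERS, register, dispatch, handle_import, and the payload shape are all hypothetical.

```python
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Any]

# Maps a job-type string to the function that processes that job.
HANDLERS: Dict[str, Handler] = {}


def register(job_type: str) -> Callable[[Handler], Handler]:
    """Decorator that records a handler under the given job type."""

    def decorator(fn: Handler) -> Handler:
        if job_type in HANDLERS:
            raise ValueError(f"duplicate handler for {job_type!r}")
        HANDLERS[job_type] = fn
        return fn  # return the function unchanged so it stays directly callable

    return decorator


@register("zotero_import_collection")
def handle_import(payload: Dict[str, Any]) -> str:
    # Placeholder body: a real handler would run the import for this payload.
    return f"import {payload['collection_key']} for {payload['actor_id']}"


def dispatch(job_type: str, payload: Dict[str, Any]) -> Any:
    """Look up the handler for a queued job and run it."""
    return HANDLERS[job_type](payload)
```

With this shape, the request handler only enqueues a job type plus a small payload, and a worker later calls dispatch, which is what keeps a large import off the request path.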

API routes added to api.py:

- POST /api/researcher/zotero/connect — store encrypted credentials
- GET /api/researcher/zotero/collections — list Zotero collections
- POST /api/researcher/zotero/import — trigger deferred import
- GET /api/researcher/zotero/corpora — list imported corpora + status
- GET /api/researcher/zotero/corpora/{id}/status — poll one corpus
- GET /api/papers/search?corpus=<id>&q=<query> — corpus-scoped paper search
- GET /api/gaps/corpus-seeded?corpus=<id> — gaps scoped to corpus papers
- GET /researcher/zotero — connection UI page

Hypothesis page: client-side JS badge injected. For each PMID link on the
hypothesis detail page, it checks the viewer's synced corpora and adds a
"📚 In your library" badge.

Contributor profile: client-side JS renders a "Curated Literature" panel if
the authenticated viewer has synced corpora.

tests/test_zotero_import.py: 11 tests covering encryption round-trip,
list_collections (stub), imports_items, dedup on reimport, skip attachments,
DOI resolution, corpus_paper_ids. All pass.

Deviations from spec:

researcher_identity → actors FK (upstream dependency not yet created)
No live smoke test of 50-item collection (no real Zotero test key available)
DOI→PMID via OpenAlex: falls back to a direct INSERT (without PMID lookup) since
the OpenAlex adapter isn't wired as a standalone utility; full PMID enrichment
can be added when the paper is subsequently fetched via paper_cache.
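
If the PMID enrichment is added later, the parsing side might resemble the sketch below. It assumes an OpenAlex work record exposes a PubMed link under ids.pmid (a URL ending in the PMID digits); normalize_doi and pmid_from_openalex are hypothetical helpers, and no network call is made: the caller passes in an already-fetched record.

```python
import re
from typing import Any, Dict, Optional

# Matches the URL or "doi:" prefixes that commonly wrap a bare DOI.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def normalize_doi(raw: str) -> str:
    """Strip URL/'doi:' prefixes and lowercase, so equal DOIs compare equal."""
    return _DOI_PREFIX.sub("", raw.strip()).lower()


def pmid_from_openalex(work: Dict[str, Any]) -> Optional[str]:
    """Return the PMID digits from an OpenAlex-style work record, if present."""
    pmid_url = (work.get("ids") or {}).get("pmid")
    if not pmid_url:
        return None
    match = re.search(r"(\d+)/?$", pmid_url)  # trailing digits of the URL
    return match.group(1) if match else None
```

Returning None rather than raising lets the importer keep its current fallback: store the item with its DOI and leave paper_id empty until a PMID is known.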

[Atlas] Zotero library import - claim collections as evidence corpus done