[Atlas] Zotero library import - claim collections as evidence corpus done

API-key-auth Zotero importer using pyzotero skill; corpus-scoped literature search and gap-scan; in-page 'in your library' badge.

Completion Notes

Auto-release: work already on origin/main

Git Commits (1)

[Atlas] Zotero library import — claim collections as evidence corpus [task:c6bdb4f4-5a0f-4cb2-b4c0-4ece8ac4417c] (#773) 2026-04-27
Spec File

Effort: deep

Goal

Researchers spend years curating Zotero libraries that represent
their personal map of a literature. Those libraries are exactly the
"trusted seed corpus" SciDEX needs to bootstrap a researcher-
specific gap-scanner or literature review. Wire Zotero's Web API +
the pyzotero skill (already shipped in the K-Dense skills
bundle) so a logged-in researcher (via q-integ-orcid-oauth-claim)
can import a Zotero collection as a SciDEX evidence corpus, with
items materialised as papers rows and the collection itself
becoming a queryable artifact.

Acceptance Criteria

Schema migrations/<date>_zotero_corpora.sql:

CREATE TABLE zotero_corpora (
    id UUID PRIMARY KEY,
    researcher_id UUID NOT NULL REFERENCES researcher_identity(id),
    zotero_user_id TEXT NOT NULL,
    zotero_collection_key TEXT NOT NULL,
    collection_name TEXT,
    item_count INT,
    last_synced_at TIMESTAMP DEFAULT NOW(),
    sync_status TEXT NOT NULL DEFAULT 'pending'
        CHECK (sync_status IN ('pending','syncing','synced','error')),
    last_error TEXT,
    UNIQUE(researcher_id, zotero_collection_key)
);

CREATE TABLE zotero_corpus_items (
    corpus_id UUID NOT NULL REFERENCES zotero_corpora(id),
    zotero_item_key TEXT NOT NULL,
    paper_id TEXT REFERENCES papers(pmid),
    doi TEXT,
    title TEXT,
    added_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (corpus_id, zotero_item_key)
);

OAuth. Zotero supports OAuth 1.0a and key-based access;
use the API-key flow (simpler, no signature dance). New page
/researcher/zotero/connect accepts a Zotero API key + user
id and stores them in zotero_corpora (key encrypted with
the existing secrets_box).
Importer scidex/atlas/zotero_import.py using the
pyzotero skill:
- list_collections(researcher_id) — fetches all top-level
collections.
- import_collection(researcher_id, collection_key)
pages through items, resolves DOI → PMID via OpenAlex
(scidex/forge/tools.py), inserts/upserts papers rows,
writes zotero_corpus_items.
- Runs as a deferred job (q-perf-deferred-work-queue) so a
10K-item import doesn't block the request.
Use the corpus. New flag ?corpus=<id> on
/api/literature-search constrains results to the
researcher's corpus; new flag on /api/gap-scanner seeds
the gap-scan from corpus papers only.
Researcher landing. /researcher/{orcid_id} gains a
"Curated literature" panel listing imported collections with
counts and last-synced.
Hypothesis page. When a hypothesis cites a paper that
lives in the viewer's corpus, the citation chip is
highlighted with an "in your library" badge.
Tests tests/test_zotero_import.py: stub pyzotero
response, end-to-end collection import, dedup on re-import,
DOI→PMID resolution.
Smoke evidence. Import a 50-item public collection; show
the gap-scanner's results change when scoped to it.
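
The paging step of import_collection described above could look roughly like the sketch below. It is illustrative only, not the shipped importer: iter_collection_items and SKIPPED_ITEM_TYPES are hypothetical names, and zot is assumed to be a pyzotero-style client whose collection_items(key, limit=..., start=...) returns item dicts shaped like Zotero Web API v3 JSON. Skipping notes as well as attachments is an assumption; the work log only mentions attachments.

```python
from typing import Any, Dict, Iterator, Optional

# Item types assumed to carry no bibliographic record of their own.
SKIPPED_ITEM_TYPES = {"attachment", "note"}


def iter_collection_items(
    zot: Any, collection_key: str, page_size: int = 100
) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one flat record per bibliographic item in a collection.

    `zot` is any object exposing a pyzotero-style
    collection_items(key, limit=..., start=...) method.
    """
    start = 0
    while True:
        page = zot.collection_items(collection_key, limit=page_size, start=start)
        if not page:
            break  # an empty page means we have walked past the last item
        for item in page:
            data = item.get("data", {})
            if data.get("itemType") in SKIPPED_ITEM_TYPES:
                continue
            yield {
                "zotero_item_key": item["key"],
                # Treat missing or blank DOI fields as None.
                "doi": (data.get("DOI") or "").strip() or None,
                "title": data.get("title") or None,
            }
        # Advance by the number of raw items returned, not the number yielded.
        start += len(page)
```

Because the function is a generator, a deferred job can stream a large collection into the database page by page rather than holding every item in memory at once.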

Approach

  • API-key auth first: simpler than full OAuth, and the Zotero docs
    recommend it for personal libraries.
  • Importer is deferred-queue-backed; idempotent on re-run.
  • Reuse pyzotero skill (Anthropic skill already loaded into
    the kdense-skills bundle), so there is no need to bring our own client.
  • Corpus-scoped search is a thin filter on the existing
    literature-search route.
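
Two of the points above, idempotent re-import and the "thin filter", can be illustrated with a small sketch. It uses an in-memory SQLite table as a stand-in for the PostgreSQL zotero_corpus_items table (foreign keys and timestamps omitted). upsert_items and filter_to_corpus are hypothetical helper names, and the shape of a search result (a dict with a "pmid" key) is an assumption rather than the real route's contract.

```python
import sqlite3
from typing import Dict, Iterable, List, Optional, Set

# Simplified stand-in for zotero_corpus_items.
DDL = """
CREATE TABLE zotero_corpus_items (
    corpus_id TEXT NOT NULL,
    zotero_item_key TEXT NOT NULL,
    paper_id TEXT,
    doi TEXT,
    title TEXT,
    PRIMARY KEY (corpus_id, zotero_item_key)
);
"""

# The composite primary key makes a re-import update rows instead of
# duplicating them. COALESCE keeps an already-resolved paper_id when the
# re-import arrives without one.
UPSERT = """
INSERT INTO zotero_corpus_items (corpus_id, zotero_item_key, paper_id, doi, title)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (corpus_id, zotero_item_key) DO UPDATE SET
    paper_id = COALESCE(excluded.paper_id, zotero_corpus_items.paper_id),
    doi = excluded.doi,
    title = excluded.title
"""


def upsert_items(
    conn: sqlite3.Connection,
    corpus_id: str,
    items: Iterable[Dict[str, Optional[str]]],
) -> None:
    """Insert or refresh corpus items; safe to call repeatedly."""
    rows = [
        (corpus_id, it["zotero_item_key"], it.get("paper_id"), it.get("doi"), it.get("title"))
        for it in items
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, rows)


def filter_to_corpus(results: List[dict], corpus_pmids: Set[str]) -> List[dict]:
    """Keep only search results whose PMID belongs to the corpus."""
    return [r for r in results if r.get("pmid") in corpus_pmids]
```

Running upsert_items twice with the same items leaves the row count unchanged, which is the property the dedup-on-re-import test relies on.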

    Dependencies

    • q-integ-orcid-oauth-claim — supplies researcher_identity.
    • q-perf-deferred-work-queue — runs the import.
    • pyzotero skill (already in vendor/kdense-skills).

    Dependents

    • q-integ-hypothesis-is-annotations — annotations created on a
    paper that lives in the viewer's Zotero corpus get higher
    trust weight.

    Work Log

    2026-04-27 — Implementation (task:c6bdb4f4)

    Delivered:

    • migrations/20260427_zotero_corpora.sql: zotero_corpora + zotero_corpus_items
    tables applied to live DB. Used actor_id TEXT REFERENCES actors(id) instead of
    researcher_id REFERENCES researcher_identity(id) (that table not yet created by
    q-integ-orcid-oauth-claim). A migration can remap to researcher_identity later.

    • scidex/atlas/zotero_import.py — full importer module:
    - Fernet encryption for stored API keys (consistent with wallet_manager.py)
    - store_credentials / list_collections / import_collection / _run_import
    - @register("zotero_import_collection") deferred-queue handler
    - corpus_paper_ids / actor_corpus_contains_paper / list_corpora helpers

    • API routes added to api.py:
    - POST /api/researcher/zotero/connect — store encrypted credentials
    - GET /api/researcher/zotero/collections — list Zotero collections
    - POST /api/researcher/zotero/import — trigger deferred import
    - GET /api/researcher/zotero/corpora — list imported corpora + status
    - GET /api/researcher/zotero/corpora/{id}/status — poll one corpus
    - GET /api/papers/search?corpus=<id>&q=<query> — corpus-scoped paper search
    - GET /api/gaps/corpus-seeded?corpus=<id> — gaps scoped to corpus papers
    - GET /researcher/zotero — connection UI page

    • Hypothesis page — client-side JS badge injected: for each PMID link on the
    hypothesis detail page, checks viewer's synced corpora and adds
    "📚 In your library" badge.

    • Contributor profile — client-side JS: "Curated Literature" panel rendered if
    the authenticated viewer has synced corpora.

    • tests/test_zotero_import.py — 11 tests: encryption round-trip, list_collections
    (stub), imports_items, dedup on reimport, skip attachments, DOI resolution,
    corpus_paper_ids. All pass.
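
For orientation, here is one way the corpus_paper_ids and actor_corpus_contains_paper helpers listed above might be written. The function names come from the work log, but the bodies, the signatures, and the simplified SQLite tables are assumptions made for illustration and may not match the shipped module.

```python
import sqlite3
from typing import Set

# Simplified SQLite stand-ins for the two tables, using the actor_id
# column that the work log says replaced researcher_id.
DDL = """
CREATE TABLE zotero_corpora (
    id TEXT PRIMARY KEY,
    actor_id TEXT NOT NULL,
    sync_status TEXT NOT NULL DEFAULT 'pending'
);
CREATE TABLE zotero_corpus_items (
    corpus_id TEXT NOT NULL REFERENCES zotero_corpora(id),
    zotero_item_key TEXT NOT NULL,
    paper_id TEXT,
    PRIMARY KEY (corpus_id, zotero_item_key)
);
"""


def corpus_paper_ids(conn: sqlite3.Connection, corpus_id: str) -> Set[str]:
    """Return the resolved paper ids in one corpus (unresolved items are skipped)."""
    rows = conn.execute(
        "SELECT paper_id FROM zotero_corpus_items "
        "WHERE corpus_id = ? AND paper_id IS NOT NULL",
        (corpus_id,),
    )
    return {row[0] for row in rows}


def actor_corpus_contains_paper(
    conn: sqlite3.Connection, actor_id: str, paper_id: str
) -> bool:
    """True if any of the actor's synced corpora contains the paper."""
    row = conn.execute(
        """
        SELECT 1
        FROM zotero_corpus_items AS i
        JOIN zotero_corpora AS c ON c.id = i.corpus_id
        WHERE c.actor_id = ? AND c.sync_status = 'synced' AND i.paper_id = ?
        LIMIT 1
        """,
        (actor_id, paper_id),
    ).fetchone()
    return row is not None
```

Restricting the membership check to corpora with sync_status = 'synced' mirrors the badge behaviour described above, where only synced corpora are consulted.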

    Deviations from spec:

    • researcher_identity → actors FK (upstream dependency not yet created)
    • No live smoke test of 50-item collection (no real Zotero test key available)
    • DOI→PMID via OpenAlex: fallback to direct INSERT (without PMID lookup) since
    the OpenAlex adapter isn't wired as a standalone utility; full PMID enrichment
    can be added when the paper is subsequently fetched via paper_cache.
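
Since items are stored without a PMID lookup for now, a later enrichment pass has to match them by DOI. Storing DOIs in one canonical form would make that matching more predictable. The sketch below shows one possible normaliser; normalize_doi is a hypothetical helper, not part of the delivered module, and it relies on the fact that DOIs are case-insensitive.

```python
import re
from typing import Optional

# Common ways a DOI is wrapped in reference-manager fields.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(raw: Optional[str]) -> Optional[str]:
    """Return a lowercase bare DOI ("10.xxxx/...") or None if none is present."""
    if not raw:
        return None
    doi = _DOI_PREFIX.sub("", raw.strip()).lower()
    # Every DOI starts with the "10." directory indicator.
    return doi if doi.startswith("10.") else None
```

With this in place, "https://doi.org/10.1000/ABC" and "doi: 10.1000/abc" both reduce to the same stored value.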
