> Goal. Build high-fidelity personas of named scientists — starting with 9 Allen Institute affiliates — so SciDEX debates can be scoped to "what would a working scientist actually want / critique / publish?". Replaces the dormant upstream mimeo submodule (never initialized on this host) with a native builder that uses the existing scidex/core/llm.py provider-swap machinery so personas can be minted via Claude, Codex, MiniMax, or any provider in that chain.
Parent: [scidex_economy_design_spec.md](scidex_economy_design_spec.md).
Prereqs: [quest_paper_accumulation_spec.md](quest_paper_accumulation_spec.md).
Consumers: [quest_allen_experiments_spec.md](quest_allen_experiments_spec.md), every multi_iter artifact-generation quest that needs scientist roles in its debate round.
---
Existing state (as of 2026-04-24):
- Existing personas live in /home/ubuntu/scidex/personas/ (theorist, skeptic, methodologist, synthesizer, falsifier, replicator, domain-expert, evidence-auditor, statistician). Each is a SKILL.md bundle read by the Agora debate engine.
- mimeo is wired as a git submodule at vendor/mimeo but never initialized; the SciDEX wrapper scidex/ingest/mimeo_runner.py shells out to a uv run mimeo … binary that doesn't exist on this host. Even if initialized, mimeo hard-codes OPENROUTER_API_KEY + PARALLEL_API_KEY.
- scidex/core/llm.py already routes through Claude / MiniMax / GLM / OpenRouter / Anthropic / Codex-CLI with automatic fallback, which is the exact provider swap this quest needs.

Replace mimeo_runner.py:114-126 with a native Python pipeline that produces the same SKILL.md output (so persona_registry.register() and the debate engine keep working unchanged) but draws on paper cache + Allen profile + provider-swap LLM instead of Parallel.ai + OpenRouter.

First 19 personas built. Seventeen are Allen Institute affiliates (or likely-Allen; the builder confirms on first fetch). The list includes Christof Koch, emeritus founding Chief Scientist of the Allen Institute for Brain Science, whose perspective as a consciousness researcher and long-time strategist is invaluable for grounding the economy's valuation logic. Two are from the Allen Institute for AI (AI2), Peter Clark and Dan Weld, who bring AI-research perspectives complementary to Allen's wet-lab / brain-science side. Each points to the scientist's institutional profile URL plus ORCID (where known). For the scientists added late (Torgerson, Skene, Gustavson, Li, Mufti, Pinglay, Pepper) the builder resolves profile URL + role from the Allen Institute site during the first build and can emit a needs_curator sub-task if a scientist is ambiguous or not actually Allen-affiliated.
Andy Hickl's persona is built from the same pipeline but with a different scidex_skills set ([ai_infrastructure, ml_pipelines, data_platforms, research_tooling]) and scidex_default_actions oriented toward technical-feasibility critique ([review_computational_plan, evaluate_ml_approach, flag_tooling_risks]). In debates he reads like an engineering director, not a bench scientist — useful as an orthogonal voice when proposed experiments have heavy computational / ML / data-platform components.
Peter Clark and Dan Weld (AI2) bring the "AI-for-science" perspective. Peter's persona reads with scidex_skills = [scientific_reasoning, commonsense_qa, knowledge_representation, evaluation_rigor] and greenlights artifacts where the reasoning chain is explicit and testable. Dan's persona reads with scidex_skills = [scholarly_discovery, literature_review, semantic_scholar, research_tools, human_ai_interaction] and greenlights artifacts where literature coverage is genuine and where the artifact would measurably improve a researcher's workflow. Both are especially valuable for evaluating the weight-vector + composite-value model artifacts (parent spec §2) — they can critique whether the valuation logic is itself well-reasoned.
Disambiguation for the AI2 pair:
- @allenai.org.
- @allenai.org / @cs.washington.edu.
- alleninstitute.org/person/<slug>/. If the profile page 404s, the builder falls back to an Allen-site search + a name+"Allen Institute" web search, and opens a needs_curator sub-task tagged to this quest if the match is still ambiguous.
- @alleninstitute.org email, Allen-affiliated publications, or a curator pin).

Per scientist, the builder consumes:
- quest_paper_accumulation. Output of that quest is a papers/ directory with the scientist's tagged publications. Persona builder reads titles + abstracts + (when open-access) full text.
- alleninstitute.org/person/<slug>/ and parsed for role, bio, recent news blurbs, lab links.
- data_portal_hint column per scientist), which gives the builder the concrete artifacts the persona would care about.

Every persona's SKILL.md references a local avatar so /showcase, /personas, and debate-transcript UIs can render a face next to the persona name. Photos live inside each persona bundle:

    personas/<slug>/
      SKILL.md
      avatar.jpg                (primary; prefer square crop)
      avatar-attribution.json   (source URL + license + fetched_at)
      references/...

The persona builder's photo step:
- alleninstitute.org/person/<slug>/, extracts the <img> on the profile card (typically a CDN URL matching cdn.alleninstitute.org/…), downloads it, and stores as avatar.jpg.
- allenai.org/team/<slug> or semanticscholar.org/author/<id>; same extraction logic.
- person.orcidId → picture record if set; then a cautious Google Images query scoped to the institution domain. Never commits a photo from an unverified source: if the attribution isn't clean (institutional domain, CC / public-domain license, or explicit author consent), the builder leaves avatar.jpg absent and flags the persona with needs_curator_photo=true in the frontmatter so the UI can render a placeholder + "add photo" action.
- avatar-attribution.json records {source_url, fetched_at, license_statement, curator_verified: false}. Curator flips curator_verified to true after a human check. The UI shows attribution on hover.

This step is part of the persona build task (blocked_by the paper-accumulation task, same as the rest of the build). Failures on the photo step are non-blocking: a persona without an avatar is still admitted, and the UI just falls back to initials.
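The "clean attribution" rule above can be sketched as a host allowlist plus an attribution record. This is a minimal illustration, not the builder's actual code: `INSTITUTIONAL_HOSTS`, `is_institutional`, and `avatar_decision` are hypothetical names, and the sketch covers only the institutional-domain condition (the license and author-consent conditions would need curator input).

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlparse

# Hypothetical allowlist built from the institutional domains named in this spec.
INSTITUTIONAL_HOSTS = ("alleninstitute.org", "allenai.org", "semanticscholar.org")


def is_institutional(url: str) -> bool:
    """True when the URL's host is an allowlisted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in INSTITUTIONAL_HOSTS)


def avatar_decision(source_url, license_statement):
    """Return (needs_curator_photo, attribution_json_text_or_None).

    A photo from a non-allowlisted host is never kept; the persona is flagged
    for a curator instead, matching the non-blocking behaviour described above.
    """
    if not source_url or not is_institutional(source_url):
        return True, None
    attribution = {
        "source_url": source_url,
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "license_statement": license_statement,
        "curator_verified": False,  # flipped to true only after a human check
    }
    return False, json.dumps(attribution, indent=2)
```

Matching on the exact host or a dot-prefixed suffix keeps look-alike domains out, which a plain substring test would not.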
Same schema as /home/ubuntu/scidex/personas/theorist/SKILL.md (Apache-2.0 example). Validated by scidex/senate/personas/schemas.py:26-156 (pydantic PersonaMetadata + PersonaBundle). Key fields:

    ---
    name: hongkui-zeng
    description: Brain-cell-types expert — 20+ years at Allen Institute, lead on the mouse whole-brain cell atlas…
    license: Apache-2.0
    metadata:
      author: SciDEX persona builder (2026-04-24)
      version: 1.0.0
      scidex_agent_type: scientist
      scidex_layer: agora
      scidex_persona_category: mimeo_generated
      scidex_reputation_floor: 0.6
      scidex_skills: [brain_cell_types, spatial_transcriptomics, mouse_connectomics]
      scidex_default_actions: [critique_hypothesis, propose_experiment, evaluate_invention]
      scidex_can_score: true
      allen_profile_url: https://alleninstitute.org/person/hongkui-zeng/
      orcid: 0000-0002-0326-5878
      data_portals:
        - portal.brain-map.org/atlases-and-data/bkp
        - celltypes.brain-map.org
      avatar: avatar.jpg   # relative to persona dir; see §3a
      avatar_source: https://alleninstitute.org/person/hongkui-zeng/
      avatar_license: Allen Institute profile page (public)
    ---
    # Hongkui Zeng — brain-cell-types expert
    [markdown body: the body IS the LLM system prompt, per schemas.py:153-155]
    ## What I care about
    …
    ## How I critique an invention
    …
    ## What data / datasets I reach for
    …

The markdown body is the LLM-facing persona prompt. It combines:
New module: /home/ubuntu/scidex/scidex/ingest/mimeo_native.py (name preserves the mimeo_* convention so CLI entry points don't move).

    # Pseudocode: single scientist build
    from pathlib import Path

    def build_persona(slug: str, *, provider: str = "minimax") -> Path:
        paper_set = load_papers_from_cache(slug)            # quest_paper_accumulation
        profile = fetch_allen_profile(slug)                 # alleninstitute.org
        orcid_bio = fetch_orcid_if_present(slug)
        avatar_info = fetch_profile_photo(slug, profile)    # §3a

        research_brief = llm.complete(system=RESEARCH_INTERESTS_SYS,
                                      prompt=assemble(paper_set, profile),
                                      provider=provider)
        method_sig = llm.complete(system=METHODS_SIG_SYS,
                                  prompt=assemble(paper_set.methods_sections),
                                  provider=provider)
        critique_styles = llm.complete(system=CRITIQUE_SYS,
                                       prompt=assemble(paper_set.review_articles),
                                       provider=provider)

        skill_md = render_skill_template(
            slug=slug, profile=profile, research=research_brief,
            methods=method_sig, critique=critique_styles,
            bio=orcid_bio, portals=ALLEN_PORTAL_HINTS.get(slug, []),
            avatar=avatar_info,
        )
        # The photo step is non-blocking, so avatar_info may be empty.
        write_skill_bundle(slug, skill_md,
                           avatar_bytes=avatar_info.bytes if avatar_info else None)
        return Path(f"/home/ubuntu/scidex/personas/{slug}/SKILL.md")

Key points:
- llm.complete(…, provider="minimax" | "claude_cli" | …) is the existing abstraction in scidex/core/llm.py; NO new LLM-call code needed.
- SCIDEX_LLM_PROVIDERS env var; override per call if the user wants a specific provider for persona building.
- render_skill_template is a new Jinja template shared by all personas so formatting stays consistent.
- mimeo_runner.py subprocess path is kept intact for backward compat; run_pipeline(name, provider="native") now dispatches to mimeo_native.build_persona(slug, provider=...).

task_type = multi_iter, artifact_class = "persona". Per-scientist:

    required_roles = ["researcher", "critic", "synthesizer"]
    debate_rounds = 2
    max_iterations = 2
    target_cell = (scientist_slug)
    acceptance_criteria = [
        {metric: "schema_valid", op: "=", threshold: true},            # pydantic validation passes
        {metric: "paper_grounding", op: ">=", threshold: 0.6},         # fraction of persona claims backed by cited papers
        {metric: "allen_dataset_mentions", op: ">=", threshold: 2},    # references ≥2 specific portals
        {metric: "orcid_verified", op: "=", threshold: true},          # where ORCID known
    ]

The "Critic" role fact-checks the draft persona against the paper corpus. Any unsupported claim triggers a retry with the critique appended.
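As an illustration of how the acceptance criteria above might be checked after a build, here is a minimal sketch. The criteria are copied from the block above; `failed_criteria`, the `skip` parameter, and the shape of the `metrics` dict are assumptions made for this example, not the quest engine's real interface.

```python
import operator

# Only the two comparison operators that appear in the spec.
OPS = {"=": operator.eq, ">=": operator.ge}

ACCEPTANCE_CRITERIA = [
    {"metric": "schema_valid", "op": "=", "threshold": True},
    {"metric": "paper_grounding", "op": ">=", "threshold": 0.6},
    {"metric": "allen_dataset_mentions", "op": ">=", "threshold": 2},
    {"metric": "orcid_verified", "op": "=", "threshold": True},
]


def failed_criteria(metrics, criteria=ACCEPTANCE_CRITERIA, skip=()):
    """Return the names of criteria that the measured metrics do not satisfy.

    A metric that was never measured counts as a failure. `skip` lets the
    caller leave out a criterion that does not apply, for example
    orcid_verified when no ORCID is known for the scientist.
    """
    failures = []
    for criterion in criteria:
        name = criterion["metric"]
        if name in skip:
            continue
        value = metrics.get(name)
        if value is None or not OPS[criterion["op"]](value, criterion["threshold"]):
            failures.append(name)
    return failures
```

A non-empty result would be the natural trigger for the retry described above, with the failing metric names passed back alongside the critique.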
- quest_paper_accumulation): accumulate papers for scientist X. Emits paper set.
- quest_personas): build persona for X. blocked_by = [A].
- quest_allen_experiments): use the persona to propose experiments. blocked_by = [B].
- blocked_by.
- scidex_skills; if the persona gets invoked for a topic outside its specs, the reputation weight lowers.)
- WebFetch to confirm Apache-compatibility if anyone ever wants to initialize the submodule. Not blocking: the native builder doesn't need the upstream code.
- scidex/ingest/mimeo_native.py: profile-declared og:image / JSON-LD image URLs could bypass the institutional-domain allowlist entirely.
- <img> candidates, and rebuilds must delete stale avatar artifacts when no clean photo remains.
- mimeo_native.py so metadata-declared images are filtered through the same institutional-host checks as ordinary <img> candidates, and rebuilds now remove stale avatar.jpg / avatar-attribution.json files when no clean source survives.
- peter-clark via python -m scidex.ingest.mimeo_native "Peter Clark" --slug peter-clark --provider auto; the AI2 page only exposed https://allenai.org/team/images/Peter_Clark.jpg, which returned HTTP 404, so the bundle now falls back to initials with needs_curator_photo: "true" instead of preserving an unverified github.io photo.
- scidex.senate.personas.loader.load_bundle('peter-clark'); avatar_path is absent, needs_curator_photo=true, and the stale avatar artifacts were removed from personas/peter-clark/.
- [Senate] Persona build — Xiaojun Li (xiaojun-li) in the bound task worktree after reading AGENTS.md, CLAUDE.md, docs/planning/alignment-feedback-loops.md, and this quest spec.
- personas/xiaojun-li/ was absent in the current worktree, so the persona bundle still needed to be built here.
- xiaojun-li via python -m scidex.ingest.mimeo_native "Xiaojun Li" --slug xiaojun-li --provider auto.
- scidex/ingest/mimeo_native.py avatar extraction to prefer og:image / twitter:image metadata before generic <img> tags; without this, Allen profile pages like Xiaojun Li's could resolve to a decorative institute banner instead of the scientist headshot.
- personas/xiaojun-li/SKILL.md (4293 bytes), avatar.jpg (41,061 bytes, 600x600 JPEG), and avatar-attribution.json (398 bytes).
- scidex.senate.personas.loader.load_bundle('xiaojun-li') (can_score=True, reputation_floor=0.6, avatar=avatar.jpg).
- https://alleninstitute.org/wp-content/uploads/2022/12/xiaojun_li__web.jpg instead of the prior generic site image fallback; curator verification remains false pending human review.
- TBD; scidex_skills remains domain_expertise_tbd, consistent with the current builder seed map for late-added personas.
- personas/peter-clark/ is only partially staged on main: author_manifest.json exists, but SKILL.md, avatar.jpg, and avatar-attribution.json do not.
- author_manifest.json resolves Peter Clark to AI2 profile https://allenai.org/team/peterc with ORCID 0000-0002-8006-7015.
- scidex/ingest/mimeo_native.py, scidex/senate/personas/schemas.py, and this quest spec before editing.
- mimeo_native.py to read personas/<slug>/author_manifest.json for profile_url/orcid fallback so the task's required command (python -m scidex.ingest.mimeo_native "Peter Clark" --slug peter-clark --provider auto) works without manual --ai2-url, then build + validate the persona bundle and log the resulting artifacts here.
- mimeo_native.py to load manifest-backed profile_url/orcid values and to prefer absolute profile-declared image URLs (og:image / JSON-LD image) before falling back to <img> tags.
- peter-clark via python -m scidex.ingest.mimeo_native "Peter Clark" --slug peter-clark --provider auto.
- personas/peter-clark/SKILL.md (4479 bytes), avatar.jpg (19,001 bytes), avatar-attribution.json (361 bytes).
- scidex.senate.personas.loader.load_bundle('peter-clark').
- https://allenai.org/team/peterc, ORCID 0000-0002-8006-7015, skills scientific_reasoning, commonsense_qa, knowledge_representation, evaluation_rigor.
- https://pclark425.github.io/images/Peter_Clark.jpg with curator_verified=false; build warnings cleared after image-extraction fix.
- hongkui-zeng persona via python -m scidex.ingest.mimeo_native "Hongkui Zeng" --slug hongkui-zeng --allen-url "https://alleninstitute.org/person/hongkui-zeng/" --provider auto
- personas/hongkui-zeng/SKILL.md (6114 bytes), avatar.jpg (40KB), avatar-attribution.json (411 bytes)
- _orcid_for_slug() to mimeo_native.py to emit confirmed ORCIDs (hongkui-zeng → 0000-0002-0326-5878); wired into template at build_persona()
- scidex.senate.personas.loader.load_bundle)
- 790c4fedc: [Senate] Build Hongkui Zeng persona; add _orcid_for_slug() [task:d40b75de-d60d-485b-9abf-1fb83afeea00]
- avatar-attribution.json correctly records the source URL; curator review needed to confirm acceptable use
- jesse-gray persona using fixed mimeo_native.py (post f8aac49a3 avatar-extraction fix)
- cc752cca2, merged as 38c67110e) had avatar sourced from Allen theme texture (texture-bottom-sm.png, 40KB)
- wp-content/uploads/2023/01/mirall-allen-institute-hq-DSC03284_e-1_1920x1080.jpg (478KB)
- ru-gunawardane persona via python -m scidex.ingest.mimeo_native "Ru Gunawardane" --slug ru-gunawardane --allen-url "https://alleninstitute.org/person/ruwanthi-ru-gunawardane/" --provider auto
- personas/ru-gunawardane/SKILL.md (5118 bytes), avatar.jpg (40KB), avatar-attribution.json (422 bytes)
- https://alleninstitute.org/person/ruwanthi-ru-gunawardane/ (note: the persona slug is ru-gunawardane but the Allen URL uses the full name ruwanthi-ru-gunawardane)
- PersonaBundle with all fields validated (name, description, skills_list, can_score, reputation_floor)
- ef97a2bf3: [Senate] Build Ru Gunawardane persona; add _orcid_for_slug() [task:8b7007cf-6795-4058-a277-54d94b586d9b]
- andy-hickl persona via python -m scidex.ingest.mimeo_native "Andy Hickl" --slug andy-hickl --allen-url "https://alleninstitute.org/person/andrew-hickl/" --provider auto
- personas/andy-hickl/SKILL.md (4455 bytes), avatar.jpg (478,824 bytes), avatar-attribution.json (434 bytes)
- https://alleninstitute.org/person/andrew-hickl/
- load_bundle('andy-hickl')
- 32aea73bc: [Senate] Build Andy Hickl scientist persona — SKILL.md + avatar + attribution [task:bdba74f1-5f11-4b74-8745-cce357a9f234]
- andy-hickl avatar: replaced building/HQ placeholder photo with actual headshot from Allen Institute profile
- mirall-allen-institute-hq-DSC03284_e-1_1920x1080.jpg (478,824 bytes; same photo used as page background)
- Andy-Hickl-Headshot-1.jpg (40,555 bytes; confirmed headshot via og:image meta tag)
- TBD to empty string (no confirmed ORCID available)
- avatar_source in SKILL.md and source_url/bytes in avatar-attribution.json
- fd4efe572: [Senate] Fix Andy Hickl avatar — replace building photo with real headshot [task:bdba74f1-5f11-4b74-8745-cce357a9f234]
- origin/main (had diverged by 15 commits; main had added andy-hickl, jesse-gray, ru-gunawardane, rui-costa personas that were absent when branch was created)
- cc7495ef3 (karel-svoboda build commit) onto rebased HEAD after resolving .orchestra-slot.json conflict
- ddc0d5171: [Senate] Build Karel Svoboda scientist persona via mimeo_native [task:318b8bff-52db-4c25-82e8-2e1decccc905]
- personas/karel-svoboda/SKILL.md (5841 bytes), avatar.jpg (40,700 bytes), avatar-attribution.json (414 bytes)
- https://alleninstitute.org/person/karel-svoboda-2/
- PersonaBundle with all fields validated (name, description, can_score=True, reputation_floor=0.6)
- avatar-attribution.json correctly records the source URL; curator review needed to confirm acceptable use
- rui-costa persona via python -m scidex.ingest.mimeo_native "Rui Costa" --slug rui-costa --allen-url "https://alleninstitute.org/person/rui-costa/" --provider auto
- orcid: TBD and a 40KB theme texture as avatar
- 0000-0003-0495-8374 and fetches real photo (478KB, mirall-allen-institute-hq-DSC03284_e-1_1920x1080.jpg)
- personas/rui-costa/SKILL.md (+13/-16 lines), avatar.jpg (40KB → 478KB), avatar-attribution.json (updated source_url + fetched_at)
- load_bundle('rui-costa')
- 0000-0003-0495-8374 (confirmed via builder's known list)
- e257a9165: [Senate] Rebuild Rui Costa persona — correct ORCID + real photo [task:4c597f41-7fe2-45aa-aa4a-f6e9f43da06b]
- claire-gustavson persona via python -m scidex.ingest.mimeo_native "Claire Gustafson" --slug claire-gustavson --allen-url "https://alleninstitute.org/person/claire-gustafson/" --provider auto
- claire-gustafson (confirmed 200); author_manifest.json from paper-accumulation task confirmed ORCID and OpenAlex IDs
- https://alleninstitute.org/person/claire-gustafson/ (HTTP 200 confirmed)
- personas/claire-gustavson/SKILL.md (4835 bytes), avatar.jpg (478,824 bytes), avatar-attribution.json (438 bytes)
- 0000-0002-1437-6709 (confirmed via author_manifest.json from paper-accumulation task 24a1b067)
- avatar-attribution.json records source URL with curator_verified=false; note image may be institutional rather than personal portrait; curator review recommended
- personas/shoaib-mufti/ did not exist yet in this worktree and no prior Shoaib bundle was present on current origin/main.
- AGENTS.md, CLAUDE.md, docs/planning/alignment-feedback-loops.md, and the quest spec before execution; inspected scidex/ingest/mimeo_native.py to confirm current builder behavior, avatar handling, and known slug skill seeds.
- scidex.senate.personas.schemas.PersonaBundle, and patch the builder only if build output exposed a concrete defect.
- shoaib-mufti persona via python -m scidex.ingest.mimeo_native "Shoaib Mufti" --slug shoaib-mufti --provider auto.
- _extract_photo_url() selected the generic Allen HQ hero image (mirall-allen-institute-hq-DSC03284_e-1_1920x1080.jpg) instead of Shoaib Mufti's headshot because the real image was stored in data-lazy-src.
- scidex/ingest/mimeo_native.py to parse whole <img> tags, prefer data-lazy-src / data-src over placeholder src, and prioritize headshot-hinted alt/class attributes.
- personas/shoaib-mufti/SKILL.md (3842 bytes), avatar.jpg (27,315 bytes), avatar-attribution.json (403 bytes).
- https://alleninstitute.org/wp-content/uploads/2022/12/shoaib_mufti-5-new.jpg from https://alleninstitute.org/person/shoaib-mufti/.
- scidex.senate.personas.loader.load_bundle('shoaib-mufti').
- data_platforms,infrastructure,engineering_leadership.
- TBD (no confirmed ORCID in builder's known list yet).
- susan-kaech persona via python -m scidex.ingest.mimeo_native "Sue Kaech" --slug susan-kaech --allen-url "https://alleninstitute.org/person/susan-kaech-2/" --provider auto
- personas/susan-kaech/SKILL.md (4364 bytes), avatar.jpg (159,882 bytes), avatar-attribution.json (396 bytes)
- https://alleninstitute.org/wp-content/uploads/2026/02/Sue-Kaech.jpg (real Allen headshot)
- scidex.senate.personas.loader.load_bundle('susan-kaech')
- t_cell_memory,immune_atlas,tumor_immunology
- 0000-0002-3339-8698 (confirmed via OpenAlex; added in follow-up commit)
- 84160fa5f (initial build), aec42f02f (ORCID + author_manifest.json)
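
Several log entries above describe the same avatar-extraction ordering: profile-declared metadata images first, then <img> tags, reading data-lazy-src / data-src before a placeholder src and favouring tags whose alt/class hints at a headshot. The sketch below illustrates that ordering with the standard-library HTML parser. It is not the code in mimeo_native.py; `PhotoCandidates`, `pick_photo_url`, and the hint words are hypothetical, and the institutional-host filtering mentioned in the log would still have to be applied to whatever URL is returned.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

HINT_WORDS = ("headshot", "portrait")  # assumed hint vocabulary for this sketch


class PhotoCandidates(HTMLParser):
    """Collect candidate image URLs from a profile page."""

    def __init__(self):
        super().__init__()
        self.meta_images = []  # og:image / twitter:image values
        self.img_tags = []     # (url, looks_like_headshot) pairs

    def handle_starttag(self, tag, attrs):
        a = {key: (value or "") for key, value in attrs}
        if tag == "meta":
            kind = a.get("property") or a.get("name", "")
            if kind in ("og:image", "twitter:image") and a.get("content"):
                self.meta_images.append(a["content"])
        elif tag == "img":
            # Lazy-loading pages keep the real image outside src.
            url = a.get("data-lazy-src") or a.get("data-src") or a.get("src")
            if url:
                hint = (a.get("alt", "") + " " + a.get("class", "")).lower()
                self.img_tags.append((url, any(w in hint for w in HINT_WORDS)))


def pick_photo_url(html, page_url):
    """Return the preferred photo URL as an absolute URL, or None."""
    parser = PhotoCandidates()
    parser.feed(html)
    if parser.meta_images:
        return urljoin(page_url, parser.meta_images[0])
    hinted = [url for url, is_headshot in parser.img_tags if is_headshot]
    plain = [url for url, is_headshot in parser.img_tags if not is_headshot]
    ordered = hinted + plain
    return urljoin(page_url, ordered[0]) if ordered else None
```

Resolving through urljoin keeps relative paths usable, so the same host check can run on every candidate regardless of how the page declared it.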