
Quest: Competitive Biotools — Compete, Learn, Co-Adapt with Biomni + K-Dense

Layer: Cross-cutting | Priority: P94 | Status: active

Tracked competitors

  • Biomni (Stanford / Phylo, Apache 2.0, $13.5M seed, 7,000+ labs) — agent
runs end-to-end biomedical analyses; 150 tools, Biomni-R0 RL model,
Biomni-Eval1 benchmark. Profile: docs/bio_competitive/biomni_profile.md.
  • K-Dense (Biostate AI / K-Dense AI, Accel + Dario Amodei, 29.2% BixBench)
— hierarchical dual-loop planner+executor; 133 open-source Agent Skills,
250+ databases, 500K+ Python packages. Profile:
docs/bio_competitive/k_dense_profile.md.
  • Amass Tech — enterprise scientific-intelligence SaaS (40M+155M+235M
corpus, GEMA citation-backed Q&A). Profile:
docs/bio_competitive/amass_profile.md.
  • Amazon Bio Discovery (AWS, launched April 2026) — enterprise agentic AI
for drug development. Tracked inside amass_profile.md.
  • Alpha1 Science (late-2025 launch) — biomedical-specific Rigor Check
agent: 2 independent AI evaluators, 8 methodological-rigor dimensions
grounded in NIH / MDAR / ARRIVE 2.0 / CONSORT / EQUATOR, every rating
carries an evidence citation from the paper's own text. Profile:
docs/bio_competitive/alpha1_science_profile.md.
  • OpenAI PRISM (launched 2026-01-27) — free LaTeX-native AI workspace for
scientists, powered by GPT-5.2; Paper Review feature added April 2026 but
with no biomedical-guideline basis. Adjacent product category — tracked
for positioning, not absorption. Profile:
docs/bio_competitive/openai_prism_profile.md.

Vision

Biomni (Stanford / Phylo, Apache 2.0, $13.5M seed, 7,000+ labs) and K-Dense
(Biostate AI / K-Dense AI, backed by Accel and Dario Amodei, 29.2% BixBench vs
GPT-4's 22.9%) own the mindshare for "agent runs the biomedical analysis
end-to-end." Both are strong, well-funded, and moving fast — Biomni with 150
tools / 59 databases / a Biomni-R0 RL reasoning model / a 433-instance
Biomni-Eval1 benchmark, K-Dense with 133 open-source Agent Skills / 250+
databases / 500K+ packages / a hierarchical dual-loop planner+executor. SciDEX
cannot ignore either of them, and cannot blindly copy either of them.

Our differentiation is not "agent runs the analysis." It is **world model +
debate + market + resource awareness**: agents generate / debate / score /
price hypotheses against a living knowledge graph, with every contribution
credited back through the token economy. Biomni and K-Dense run the analysis;
SciDEX runs the analysis and ingests its result as a hypothesis-anchored,
debated, market-priced contribution to the world model. This quest invests in
bringing sophisticated-analysis execution to Biomni parity — while keeping
every analysis wrapped in our epistemic / market / credit layer — and treats
Biomni and K-Dense as both competitors (whose mindshare we must answer) and
upstream tooling (whose open-source skills we can absorb).

The bet is that the epistemic layer is the durable moat. Anyone can fund more
tools; fewer can build a self-auditing market for scientific claims.

Non-goals

  • Building a generic bioinformatics IDE. We are not trying to replace Biomni
Lab or K-Dense Analyst for users who just want to run a Scanpy pipeline.
  • Registering every tool under the sun. The tool-growth freeze still applies;
K-Dense skills adoption (WS3) is a one-time structured absorption, not an
invitation to broad tool sprawl.
  • Re-implementing Biomni/K-Dense infrastructure internals (datalake mirror,
PDF reports, Gradio UI). We reuse what's useful through API calls or the
skills registry — we do not fork.
  • "Beating" Biomni on Biomni-Eval1 or K-Dense on BixBench in the abstract.
Benchmarks are a signal, not the product. We aim to be competitive, not
supreme, on pure analysis execution.

Principles

  • Every sophisticated analysis feeds the epistemic stack. Running a
    survival analysis or a scRNA pipeline is not the deliverable. The
    deliverable is a hypothesis-anchored artifact that enters the knowledge
    graph, triggers a debate, moves a market price, and credits the sponsoring
    agent. An analysis that doesn't close that loop is a stub.
  • Call upstream when upstream is better. When Biomni or K-Dense has a
    well-tested recipe for a subtask (e.g. CRISPR primer design, ligand-receptor
    inference), we call their API / invoke their skill rather than rebuild.
    The SciDEX-unique value is the debate + market wrap, not the subroutine.
  • No stubs. Each showcase analysis in WS2 must produce publication-grade
    artifacts ≥50KB, cite real datasets (SEA-AD / ABC Atlas / CZ Cellxgene /
    etc. per quest_real_data_pipeline_spec.md), and be consumable by the
    debate pipeline without synthetic fallbacks.
  • Absorb best features, keep differentiation. Adopt K-Dense's skills
    repo pattern, Biomni's GPU-as-a-tool, K-Dense's dual-loop plan+validate
    — but keep WM / debate / markets / resource tracking as the frame. Do not
    strip our differentiators to look more like Biomni.
  • Attribution is non-negotiable. Every skill call, every Biomni API
    invocation, every ported analysis cites its upstream origin in the
    artifact metadata and the Atlas wiki page. No laundering of others' work.

Workstreams

WS1: Competitive intelligence driver

A recurring agent scans the Biomni and K-Dense surface area and feeds an
internal digest. Sources: github.com/snap-stanford/Biomni and
github.com/K-Dense-AI/claude-scientific-skills (commits, releases, issues,
wiki changes), blog posts (biomni.stanford.edu, k-dense.ai), published papers
(bioRxiv 2025.05.30.656746, arXiv 2508.07043 and their citation graph), and
public social signal (HN, Twitter/X, LinkedIn where accessible). Aggregates
into a weekly markdown report at docs/bio_competitive/weekly/YYYY-MM-DD.md
with: new tools/skills released, new papers citing them, new funding /
customers / benchmark claims, deltas vs our capability map.

Delivers: task-id-pending_biotools_competitive_intel_spec.md (recurring,
weekly).
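The digest step can be sketched as a pure rendering function. This is a minimal sketch, assuming the commit dicts have already been fetched from GitHub (the fetch layer is omitted); the tracked repo slugs and weekly-report convention come from this spec, while the field names (`repo`, `sha`, `message`) and the function name are hypothetical:

```python
from datetime import date

TRACKED_REPOS = [
    "snap-stanford/Biomni",
    "K-Dense-AI/claude-scientific-skills",
]

def render_weekly_digest(commits: list[dict], week_of: date) -> str:
    """Group commits by tracked repo into a weekly markdown report body."""
    lines = [f"# Competitive intel digest: week of {week_of.isoformat()}", ""]
    for repo in TRACKED_REPOS:
        repo_commits = [c for c in commits if c["repo"] == repo]
        lines.append(f"## {repo} ({len(repo_commits)} commits)")
        for c in repo_commits:
            # short SHA plus first line of the commit message
            lines.append(f"- `{c['sha'][:7]}` {c['message'].splitlines()[0]}")
        lines.append("")
    return "\n".join(lines)
```

A repo with zero trailing-week commits still gets a section, so "no activity" is visible rather than silent.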

WS2: Analysis parity — Biomni's 15 showcase use cases

Port each of Biomni's 15 showcased use cases into SciDEX as a
hypothesis-anchored showcase analysis. The 15: spatial transcriptomics,
binder design, biomarker panel design, clinical trial landscaping, survival
analysis, scRNA-seq processing & annotation, cell-cell communication, novel
Cas13 primer design, proteomics differential expression, gene regulatory
network inference, gene co-expression networks, microbiome analysis,
polygenic risk scores, variant annotation, fine-mapping. For each, the
delivered artifact must carry:

(i) A hypothesis or knowledge gap from hypotheses or knowledge_gaps
that motivated running the analysis. If none exists, one must be generated
and debated before the analysis runs.
(ii) Artifacts ≥50KB — code, data outputs, figures, write-up — stored
under artifacts/ with a wiki entry cross-linking the dataset, the
hypothesis, and the upstream Biomni/K-Dense recipe we adapted.
(iii) Debate trace — a debate_sessions row where at least Theorist and
Skeptic weigh in on the analysis conclusion, with quality_score ≥ 0.6.
(iv) Market price update — a price_history row on the sponsoring
hypothesis with event_source pointing at the analysis artifact.
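The four-part contract above is mechanically checkable. A minimal acceptance-gate sketch: the thresholds (≥50KB, quality_score ≥ 0.6) are the spec's, but the flat record dict and its key names are a hypothetical stand-in for however the pipeline actually represents a delivered analysis:

```python
MIN_ARTIFACT_BYTES = 50_000   # (ii) artifacts must total at least 50KB
MIN_DEBATE_QUALITY = 0.6      # (iii) debate quality_score floor

def ws2_gate(record: dict) -> list[str]:
    """Return unmet requirements; an empty list means the analysis passes."""
    failures = []
    if not record.get("hypothesis_id"):                       # (i) hypothesis anchor
        failures.append("no sponsoring hypothesis")
    if record.get("artifact_bytes", 0) < MIN_ARTIFACT_BYTES:  # (ii)
        failures.append("artifacts under 50KB")
    if record.get("debate_quality", 0.0) < MIN_DEBATE_QUALITY:  # (iii)
        failures.append("debate quality below 0.6")
    if not record.get("price_history_id"):                    # (iv) market price update
        failures.append("no price update")
    return failures
```

Returning the full failure list (rather than failing fast) lets the coordinator report every gap in one review pass.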

Delivers: task-id-pending_biomni_analysis_parity_spec.md (quest-coordinator;
spawns 5 parallel sub-agents, 3 analyses each).

WS3: K-Dense skills adoption

K-Dense's claude-scientific-skills repo (Apache 2.0, 133 skills) is directly
compatible with Claude-based agents — which is us. Run
npx skills add K-Dense-AI/claude-scientific-skills once inside the Forge
toolchain; wire each imported skill into our Forge tool registry as a
first-class tool so that skill invocations flow through the existing
@log_tool_call instrumentation, get priced through the resource intelligence
scorer, and credit the invoking agent through agent_contributions. Prefer
adoption over re-implementation: if K-Dense already wraps BioPython / pysam /
Scanpy / RDKit / DeepChem / ESM / OpenMM, we use their wrapper rather than
adding another entry to tools.py. A recurring sub-task checks for skills-repo
updates and re-syncs.
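The registry wire-up can be sketched as a decorator pass over each imported skill. Here `log_tool_call` is a local stand-in for the real @log_tool_call instrumentation, and the registry and call-log structures are hypothetical; only the upstream-attribution requirement comes from this spec:

```python
import functools

REGISTRY: dict = {}   # stand-in for the Forge tool registry
CALL_LOG: list = []   # stand-in for the @log_tool_call sink

def log_tool_call(name: str, upstream: str):
    """Wrap a skill so every invocation is logged with upstream attribution."""
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            CALL_LOG.append({"tool": name, "upstream": upstream})
            return fn(*args, **kwargs)
        return wrapper
    return deco

def register_skill(name, fn, upstream="K-Dense-AI/claude-scientific-skills"):
    """Register an imported skill as a first-class, instrumented Forge tool."""
    REGISTRY[name] = log_tool_call(name, upstream)(fn)
```

After registration, calling `REGISTRY[name](...)` both returns the skill's result and appends an attributed log entry, so pricing and agent_contributions credit can hang off the same hook.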

Delivers: task-id-pending_kdense_skills_adoption_spec.md (one-shot install
+ registry wire-up, plus monthly refresh task).

WS4: Sandboxed GPU execution

Biomni's GPU-as-a-tool lets its agents fine-tune Borzoi / scGPT / ESM2 /
UniRef / ADMET models inside a sandbox. To port the 15 analyses honestly we
need at least one working end-to-end fine-tune. This workstream pilots one
model — scGPT preferred because it feeds directly into WS2's scRNA-seq
analyses — inside a bwrap sandbox with a GPU launcher that:

• Reserves the GPU via resource_tracker.
• Launches the fine-tune inside scripts/sandbox/run_gpu.sh with a network
allow-list limited to model-weight CDNs and the dataset registry.
• Caps wall-time and VRAM; kills + cleans on overrun.
• Captures training logs, final weights, and validation metrics as artifacts.
• Credits the sponsoring agent and debits the pool via the resource
allocation system (quest_economics_spec.md).

Success is one end-to-end scGPT fine-tune on an SEA-AD subset, artifacts
landed, debate triggered on the fine-tune's utility. Scope does not extend
to multi-model support in this quest.

Delivers: task-id-pending_gpu_sandbox_pilot_spec.md (one-shot).

WS5: Epistemic layer wraps

For every analysis produced by WS2 (and going forward, every new analysis of
comparable scope), auto-trigger: (a) a multi-agent debate seeded with the
analysis conclusion; (b) a price update on the hypothesis the analysis
informs; (c) a resource-cost ledger entry debited from the sponsoring
agent's wallet via cost_ledger; (d) a follow-up gap if the analysis
exposed a new knowledge gap. This is how our differentiation from Biomni /
K-Dense becomes systemic rather than per-analysis.
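The (a)-(d) sequence can be sketched as a single wrapper over injected hooks. The hook names and the analysis-record keys are hypothetical stand-ins for the real debate, pricing, cost_ledger, and gap subsystems; only the ordering and the "gap only if exposed" rule come from the text above:

```python
def wrap_analysis(analysis: dict, hooks: dict) -> list[str]:
    """Fire the four epistemic-layer steps for one landed analysis."""
    fired = []
    verdict = hooks["debate"](analysis["conclusion"])       # (a) seed a debate
    fired.append("debate")
    hooks["price"](analysis["hypothesis_id"], verdict)      # (b) price update
    fired.append("price")
    hooks["ledger"](analysis["sponsor"], analysis["cost"])  # (c) cost debit
    fired.append("ledger")
    if analysis.get("new_gap"):                             # (d) only if exposed
        hooks["gap"](analysis["new_gap"])
        fired.append("gap")
    return fired
```

Making the wrapper return the fired steps gives WS5's recurring run an easy orphan check: any landed analysis whose record lacks all of debate/price/ledger is flagged.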

Delivers: task-id-pending_analysis_debate_wrapper_spec.md
(recurring, every-6h).

Success criteria

☐ WS1: Weekly competitive intel report landing every Monday; ≥90% of
Biomni/K-Dense commits in the trailing week surfaced within 7 days;
report file size ≥8KB; cited in at least one Senate decision within
30 days.
☐ WS2: 15/15 Biomni showcase analyses ported, each with
hypothesis + artifacts ≥50KB + debate quality_score ≥ 0.6 + price
update. Zero synthetic-data fallbacks.
☐ WS3: 133 K-Dense skills ingested into the Forge registry; ≥30 skills
invoked by an agent in the first 60 days; logged through
@log_tool_call; monthly refresh runs without manual intervention.
☐ WS4: One scGPT fine-tune run end-to-end inside the sandbox, artifacts
stored, resource cost reconciled against resource_allocations.
☐ WS5: 100% of analyses ≥50KB in the last 30 days have an associated
debate + price update + cost ledger entry. No orphaned analyses.
☐ Benchmark check-in: within 6 months, SciDEX scores a published
number on BixBench comparable to K-Dense Analyst (within 5 pts).
This is an informational check, not a pass/fail gate.
☐ Debate quality metric on wrapped analyses: mean quality_score on
WS2-generated debates ≥ 0.65, 20% higher than the current all-analysis
baseline (measurable via backfill_debate_quality.py).

Quality requirements

• Reference quest_quality_standards_spec.md and
quest_real_data_pipeline_spec.md. No stubs: no empty notebooks, no
<50KB artifacts, no 0-edge analyses, no "generic" debates.
• Parallel agents mandatory for batches ≥10 items. WS2 in particular runs as
5 parallel sub-agents covering 3 analyses each.
• All wrapped analyses must cite real datasets (SEA-AD / ABC Atlas /
Cellxgene / ClinicalTrials.gov / OpenTargets / etc.) per
quest_real_data_pipeline_spec.md. No simulated inputs.
• All skills / Biomni API calls logged through @log_tool_call with
upstream attribution in the artifact metadata.
• Every new quest commit uses [Cross-cutting] or a layer-specific prefix
with the task ID.

Parallel agent execution

• WS2 is explicitly parallel. 5 agents × 3 analyses = 15 showcase
analyses. Agents run concurrently, each responsible for a disjoint
3-analysis slice, coordinated by the WS2 quest-coordinator task. Sub-agent
outputs merge through Orchestra sync push onto the coordinator's branch;
the coordinator runs integration tests + debate-wrap checks before
promoting.
• WS1 is single-agent recurring (weekly cadence, small batch, no parallelism
needed).
• WS3 is single-agent for the initial install, parallel only if the
post-install registry wire-up touches ≥10 skills per batch (which it will
— expect 133 skills split across 3–5 agents for the first pass).
• WS4 is single-agent (one pilot model, no parallelism useful).
• WS5 is single-agent recurring (every-6h, wraps whatever analyses landed
since last run).

Risks & mitigations

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| Competitive intel access blocked (LinkedIn login walls, paywalled papers) | High | Medium | Track the blocked sources in docs/bio_competitive/access_notes.md; escalate a manual-fetch request rather than fabricating content. Use GitHub + bioRxiv + arXiv + company blogs as the fallback spine. |
| Biomni-style exec() of LLM-generated code is unsafe without sandboxing | High | High | All WS2 analyses run inside the bwrap sandbox per quest_analysis_sandboxing_spec.md. No os.system / unsandboxed subprocess.run of LLM-generated code. WS4's GPU launcher extends the existing sandbox; it does not bypass it. |
| GPU cost blows the budget | Medium | High | WS4 pilots one model. resource_tracker enforces wall-time and VRAM caps. Cost-ledger debit runs before the job, not after, to prevent silent overrun. Senate cap on GPU hours per week. |
| K-Dense skills repo churn (upstream renames or deprecates skills between our syncs) | Medium | Low | Monthly refresh task in WS3 diffs the registry and surfaces deletions as governance tickets rather than silently dropping them. |
| "Analysis parity" becomes a parade of shallow notebooks | Medium | High | quest_quality_standards_spec.md acceptance gate: <50KB artifact = reject; no debate = reject; no hypothesis = reject. Coordinator holds the gate. |
| Upstream license conflict (Biomni Apache 2.0 is permissive; proprietary Phylo / K-Dense SaaS endpoints are not) | Medium | Medium | WS1 intel report flags license changes. WS3 uses only the open Apache-2.0 skills repo, not K-Dense's SaaS endpoints. Senate reviews any API-call dependency on paid upstream services before adoption. |

Related quests

• quest_forge_spec.md — tool registry / sandboxing / tool-augmented
analysis; WS3 skills adoption lands here.
• quest_real_data_pipeline_spec.md — real datasets for WS2 analyses; the
15 showcase analyses cannot ship with synthetic data.
• quest_epistemic_rigor.md — debate + evidence + trust scoring
infrastructure that WS5 hooks into; also the home of the new
WS-rigor-ruleset workstream absorbing Alpha1 Science's 8-dim
biomedical rigor rubric.
• quest_experiment_extraction_spec.md — structured experiment records are
the ground truth that WS2 analyses compare their predictions against.
• artifact_enrichment_quest_spec.md — artifact quality gates that WS2
deliverables must clear; the ≥50KB artifact requirement comes from here.
• quest_analysis_sandboxing_spec.md — bwrap sandbox that WS4's GPU
launcher extends.
• quest_economics_spec.md — token economy, resource allocation, and cost
ledger that WS5 debits against per analysis.

Related competitive-intel docs

• [docs/bio_competitive/README.md](../../bio_competitive/README.md) — tree
overview and provenance rules.
• [docs/bio_competitive/biomni_profile.md](../../bio_competitive/biomni_profile.md)
• [docs/bio_competitive/k_dense_profile.md](../../bio_competitive/k_dense_profile.md)
• [docs/bio_competitive/amass_profile.md](../../bio_competitive/amass_profile.md)
• [docs/bio_competitive/alpha1_science_profile.md](../../bio_competitive/alpha1_science_profile.md)
• [docs/bio_competitive/openai_prism_profile.md](../../bio_competitive/openai_prism_profile.md)
• [docs/bio_competitive/comparison_matrix.md](../../bio_competitive/comparison_matrix.md)
• [docs/bio_competitive/access_notes.md](../../bio_competitive/access_notes.md)

Work Log

_No entries yet._

File: quest_competitive_biotools_spec.md
Modified: 2026-04-24 07:15
Size: 16.9 KB