Quest: Competitive Biotools — Compete, Learn, Co-Adapt with Biomni + K-Dense
Layer: Cross-cutting
Priority: P94
Status: active
Tracked competitors
- Biomni (Stanford / Phylo, Apache 2.0, $13.5M seed, 7,000+ labs) — agent
runs end-to-end biomedical analyses; 150 tools, Biomni-R0 RL model,
Biomni-Eval1 benchmark. Profile:
docs/bio_competitive/biomni_profile.md.
- K-Dense (Biostate AI / K-Dense AI, Accel + Dario Amodei, 29.2% BixBench)
— hierarchical dual-loop planner+executor; 133 open-source Agent Skills,
250+ databases, 500K+ Python packages. Profile:
docs/bio_competitive/k_dense_profile.md.
- Amass Tech — enterprise scientific-intelligence SaaS (40M+155M+235M
corpus, GEMA citation-backed Q&A). Profile:
docs/bio_competitive/amass_profile.md.
- Amazon Bio Discovery (AWS, launched April 2026) — enterprise agentic AI
for drug development. Tracked inside
amass_profile.md.
- Alpha1 Science (late-2025 launch) — biomedical-specific Rigor Check
agent: 2 independent AI evaluators, 8 methodological-rigor dimensions
grounded in NIH / MDAR / ARRIVE 2.0 / CONSORT / EQUATOR, every rating
carries an evidence citation from the paper's own text. Profile:
docs/bio_competitive/alpha1_science_profile.md.
- OpenAI PRISM (launched 2026-01-27) — free LaTeX-native AI workspace for
scientists, powered by GPT-5.2; Paper Review feature added April 2026 but
with no biomedical-guideline basis. Adjacent product category — tracked
for positioning, not absorption. Profile:
docs/bio_competitive/openai_prism_profile.md.
Vision
Biomni (Stanford / Phylo, Apache 2.0, $13.5M seed, 7,000+ labs) and K-Dense
(Biostate AI / K-Dense AI, backed by Accel and Dario Amodei, 29.2% BixBench vs
GPT-4's 22.9%) own the mindshare for "agent runs the biomedical analysis
end-to-end." Both are strong, well-funded, and moving fast — Biomni with 150
tools / 59 databases / a Biomni-R0 RL reasoning model / a 433-instance
Biomni-Eval1 benchmark, K-Dense with 133 open-source Agent Skills / 250+
databases / 500K+ packages / a hierarchical dual-loop planner+executor. SciDEX
cannot ignore either of them, and cannot blindly copy either of them.
Our differentiation is not "agent runs the analysis." It is **world model +
debate + market + resource awareness**: agents generate / debate / score /
price hypotheses against a living knowledge graph, with every contribution
credited back through the token economy. Biomni and K-Dense run the analysis;
SciDEX runs the analysis and ingests its result as a hypothesis-anchored,
debated, market-priced contribution to the world model. This quest invests in
bringing our analysis execution to parity with Biomni — while keeping every
analysis wrapped in our epistemic / market / credit layer — and treats Biomni
and K-Dense as both competitors (whose mindshare we must answer) and upstream
tooling (whose open-source skills we can absorb).
The bet is that the epistemic layer is the durable moat. Anyone can fund more
tools; fewer can build a self-auditing market for scientific claims.
Non-goals
- Building a generic bioinformatics IDE. We are not trying to replace Biomni
Lab or K-Dense Analyst for users who just want to run a Scanpy pipeline.
- Registering every tool under the sun. The tool-growth freeze still applies;
K-Dense skills adoption (WS3) is a one-time structured absorption, not an
invitation to broad tool sprawl.
- Re-implementing Biomni/K-Dense infrastructure internals (datalake mirror,
PDF reports, Gradio UI). We reuse what's useful through API calls or the
skills registry — we do not fork.
- "Beating" Biomni on Biomni-Eval1 or K-Dense on BixBench in the abstract.
Benchmarks are a signal, not the product. We aim to be competitive, not
supreme, on pure analysis execution.
Principles
Every sophisticated analysis feeds the epistemic stack. Running a
survival analysis or a scRNA pipeline is not the deliverable. The
deliverable is a hypothesis-anchored artifact that enters the knowledge
graph, triggers a debate, moves a market price, and credits the sponsoring
agent. An analysis that doesn't close that loop is a stub.
Call upstream when upstream is better. When Biomni or K-Dense has a
well-tested recipe for a subtask (e.g. CRISPR primer design, ligand-receptor
inference), we call their API / invoke their skill rather than rebuild.
The SciDEX-unique value is the debate + market wrap, not the subroutine.
No stubs. Each showcase analysis in WS2 must produce publication-grade
artifacts ≥50KB, cite real datasets (SEA-AD / ABC Atlas / CZ Cellxgene / etc.,
per quest_real_data_pipeline_spec.md), and be consumable by the debate
pipeline without synthetic fallbacks.
Absorb best features, keep differentiation. Adopt K-Dense's skills
repo pattern, Biomni's GPU-as-a-tool, K-Dense's dual-loop plan+validate
— but keep WM / debate / markets / resource tracking as the frame. Do not
strip our differentiators to look more like Biomni.
Attribution is non-negotiable. Every skill call, every Biomni API
invocation, every ported analysis cites its upstream origin in the
artifact metadata and the Atlas wiki page. No laundering of others' work.
Workstreams
WS1: Competitive intelligence driver
A recurring agent scans the Biomni and K-Dense surface area and feeds an
internal digest. Sources: github.com/snap-stanford/Biomni and
github.com/K-Dense-AI/claude-scientific-skills (commits, releases, issues,
wiki changes), blog posts (biomni.stanford.edu, k-dense.ai), published papers
(bioRxiv 2025.05.30.656746, arXiv 2508.07043 and their citation graph), and
public social signal (HN, Twitter/X, LinkedIn where accessible). Aggregates
into a weekly markdown report at docs/bio_competitive/weekly/YYYY-MM-DD.md
with: new tools/skills released, new papers citing them, new funding /
customers / benchmark claims, deltas vs our capability map.
Delivers: task-id-pending_biotools_competitive_intel_spec.md (recurring,
weekly).
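The digest layout described above can be sketched as a small renderer. `build_weekly_digest` and the section titles are hypothetical names, assuming the scan step has already collected commit subjects, citing papers, and capability deltas:

```python
from datetime import date

def build_weekly_digest(report_date: date, commits: dict[str, list[str]],
                        papers: list[str], deltas: list[str]) -> str:
    """Render one weekly intel digest as markdown.

    `commits` maps a tracked repo (e.g. "snap-stanford/Biomni") to the
    commit subjects surfaced in the trailing week.
    """
    lines = [f"# Competitive intel: week of {report_date.isoformat()}", ""]
    lines.append("## New tools / skills (commits)")
    for repo, subjects in commits.items():
        lines.append(f"### {repo}")
        lines.extend(f"- {s}" for s in subjects)
    lines += ["", "## New papers citing tracked competitors"]
    lines.extend(f"- {p}" for p in papers)
    lines += ["", "## Deltas vs our capability map"]
    lines.extend(f"- {d}" for d in deltas)
    return "\n".join(lines) + "\n"

# Illustrative invocation with placeholder content.
digest = build_weekly_digest(
    date(2026, 5, 4),
    {"snap-stanford/Biomni": ["Add variant-annotation recipe"]},
    ["New bioRxiv preprint citing Biomni-Eval1"],
    ["K-Dense added a fine-mapping skill; our WS2 slot still open"],
)
```

The ≥8KB report-size criterion in Success criteria can then be checked against `len(digest.encode())` before the file lands.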
WS2: Analysis parity — Biomni's 15 showcase use cases
Port each of Biomni's 15 showcased use cases into SciDEX as a
hypothesis-anchored showcase analysis. The 15: spatial transcriptomics,
binder design, biomarker panel design, clinical trial landscaping, survival
analysis, scRNA-seq processing & annotation, cell-cell communication, novel
Cas13 primer design, proteomics differential expression, gene regulatory
network inference, gene co-expression networks, microbiome analysis,
polygenic risk scores, variant annotation, fine-mapping. For each, the
delivered artifact must carry:
(i) A hypothesis or knowledge gap from hypotheses or knowledge_gaps
that motivated running the analysis. If none exists, one must be generated
and debated before the analysis runs.
(ii) Artifacts ≥50KB — code, data outputs, figures, write-up — stored
under artifacts/ with a wiki entry cross-linking the dataset, the
hypothesis, and the upstream Biomni/K-Dense recipe we adapted.
(iii) Debate trace — a debate_sessions row where at least Theorist and
Skeptic weigh in on the analysis conclusion, with quality_score ≥ 0.6.
(iv) Market price update — a price_history row on the sponsoring
hypothesis with event_source pointing at the analysis artifact.
Delivers: task-id-pending_biomni_analysis_parity_spec.md (quest-coordinator;
spawns 5 parallel sub-agents, 3 analyses each).
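A coordinator-side acceptance gate for requirements (i)–(iv) could look like the sketch below. The record fields (`hypothesis_id`, `artifact_bytes`, `debate`, `price_history_id`) are assumed names for illustration, not the actual schema:

```python
MIN_ARTIFACT_BYTES = 50 * 1024  # the >=50KB gate from requirement (ii)

def passes_ws2_gate(record: dict) -> tuple[bool, list[str]]:
    """Check one ported analysis against requirements (i)-(iv)."""
    failures = []
    if not record.get("hypothesis_id"):                       # (i)
        failures.append("no sponsoring hypothesis")
    if record.get("artifact_bytes", 0) < MIN_ARTIFACT_BYTES:  # (ii)
        failures.append("artifact under 50KB")
    debate = record.get("debate") or {}
    has_core_roles = {"Theorist", "Skeptic"} <= set(debate.get("participants", []))
    if debate.get("quality_score", 0.0) < 0.6 or not has_core_roles:  # (iii)
        failures.append("debate missing, low-quality, or missing core roles")
    if not record.get("price_history_id"):                    # (iv)
        failures.append("no market price update")
    return (not failures, failures)
```

Returning the failure reasons, rather than a bare boolean, lets the coordinator file a concrete rejection ticket per analysis.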
WS3: K-Dense skills adoption
K-Dense's claude-scientific-skills repo (Apache 2.0, 133 skills) is directly
compatible with Claude-based agents — which is us. Run
npx skills add K-Dense-AI/claude-scientific-skills once inside the Forge
toolchain; wire each imported skill into our Forge tool registry as a
first-class tool so that skill invocations flow through the existing
@log_tool_call instrumentation, get priced through the resource intelligence
scorer, and credit the invoking agent through agent_contributions. Prefer
adoption over re-implementation: if K-Dense already wraps BioPython / pysam /
Scanpy / RDKit / DeepChem / ESM / OpenMM, we use their wrapper rather than
adding another entry to tools.py. A recurring sub-task checks for skills
repo updates and re-syncs.
Delivers: task-id-pending_kdense_skills_adoption_spec.md (one-shot install
+ registry wire-up, plus monthly refresh task).
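A minimal sketch of the registry wire-up, assuming hypothetical `register_skill` / `TOOL_REGISTRY` names and a stand-in for the real @log_tool_call decorator; the `gc_content` skill is illustrative, not an actual K-Dense skill:

```python
import functools

TOOL_REGISTRY: dict = {}
CALL_LOG: list = []

def log_tool_call(upstream: str):
    """Stand-in for the Forge @log_tool_call decorator: records every
    invocation together with its upstream attribution."""
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            CALL_LOG.append({"tool": fn.__name__, "upstream": upstream})
            return fn(*args, **kwargs)
        return wrapper
    return deco

def register_skill(name: str, fn, upstream_repo: str) -> None:
    """Wire an imported skill into the registry as a first-class tool,
    already wrapped in call logging."""
    TOOL_REGISTRY[name] = log_tool_call(upstream_repo)(fn)

# Illustrative skill body (not an actual K-Dense skill).
def gc_content(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)

register_skill("gc_content", gc_content, "K-Dense-AI/claude-scientific-skills")
result = TOOL_REGISTRY["gc_content"]("GATTACA")
```

Because attribution is injected at registration time, no individual skill can be invoked without its upstream origin landing in the call log.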
WS4: Sandboxed GPU execution
Biomni's GPU-as-a-tool lets its agents fine-tune Borzoi / scGPT / ESM2 /
UniRef / ADMET models inside a sandbox. To port the 15 analyses honestly we
need at least one working end-to-end fine-tune. This workstream pilots
one model — scGPT preferred because it feeds directly into WS2's
scRNA-seq analyses — inside a bwrap sandbox with a GPU launcher that:
- Reserves the GPU via resource_tracker.
- Launches the fine-tune inside scripts/sandbox/run_gpu.sh with a network
  allow-list limited to model-weight CDNs and the dataset registry.
- Caps wall-time and VRAM; kills + cleans on overrun.
- Captures training logs, final weights, and validation metrics as artifacts.
- Credits the sponsoring agent and debits the pool via the resource
  allocation system (quest_economics_spec.md).
Success is one end-to-end scGPT fine-tune on an SEA-AD subset, artifacts
landed, debate triggered on the fine-tune's utility. Scope does not extend
to multi-model support in this quest.
Delivers: task-id-pending_gpu_sandbox_pilot_spec.md (one-shot).
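The wall-time cap can be sketched in Python as below. This is illustrative only: the real launcher wraps bwrap, reserves the GPU first, and enforces the network allow-list and VRAM cap, none of which a plain process timeout covers. It also assumes a POSIX host:

```python
import shlex
import subprocess

def run_sandboxed(cmd: str, wall_time_s: int) -> dict:
    """Run a job with a hard wall-time cap; kill and report on overrun.

    Sketch only: the real launcher wraps bwrap, reserves the GPU via
    resource_tracker first, and enforces the network allow-list and
    VRAM cap as well.
    """
    try:
        proc = subprocess.run(shlex.split(cmd), capture_output=True,
                              text=True, timeout=wall_time_s)
        return {"status": "ok", "returncode": proc.returncode,
                "stdout": proc.stdout}
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising on timeout
        return {"status": "killed", "reason": f"wall-time > {wall_time_s}s"}
```

The returned dict is what the artifact-capture step would serialize alongside the training logs.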
WS5: Epistemic layer wraps
For every analysis produced by WS2 (and going forward, every new analysis of
comparable scope), auto-trigger: (a) a multi-agent debate seeded with the
analysis conclusion; (b) a price update on the hypothesis the analysis
informs; (c) a resource-cost ledger entry debited from the sponsoring
agent's wallet via cost_ledger; (d) a follow-up gap if the analysis
exposed a new knowledge gap. This is how our differentiation from Biomni /
K-Dense becomes systemic rather than per-analysis.
Delivers: task-id-pending_analysis_debate_wrapper_spec.md
(recurring, every-6h).
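The fan-out of the four wrap actions (a)–(d) can be sketched as a pure function over one landed analysis; the record fields and action shapes here are assumptions for illustration, not the actual queue schema:

```python
def wrap_analysis(analysis: dict) -> list:
    """Return the follow-up actions the every-6h wrapper would enqueue
    for one landed analysis (field names are illustrative)."""
    actions = [
        {"kind": "debate", "seed": analysis["conclusion"]},          # (a)
        {"kind": "price_update",                                     # (b)
         "hypothesis_id": analysis["hypothesis_id"],
         "event_source": analysis["artifact_path"]},
        {"kind": "cost_ledger",                                      # (c)
         "agent": analysis["sponsor"],
         "amount": -analysis["compute_cost"]},
    ]
    if analysis.get("new_gap"):                                      # (d)
        actions.append({"kind": "knowledge_gap",
                        "text": analysis["new_gap"]})
    return actions

# Illustrative record for one WS2 analysis.
actions = wrap_analysis({
    "conclusion": "Survival differs by APOE status in SEA-AD subset",
    "hypothesis_id": "hyp-017",
    "artifact_path": "artifacts/ws2/survival_analysis.md",
    "sponsor": "agent-theorist-3",
    "compute_cost": 2.5,
    "new_gap": "no covariate data for vascular comorbidity",
})
```

Keeping the wrapper a pure function of the analysis record makes the "no orphaned analyses" success criterion checkable by replay.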
Success criteria
☐ WS1: Weekly competitive intel report landing every Monday; ≥90% of
Biomni/K-Dense commits in the trailing week surfaced within 7 days;
report file size ≥8KB; cited in at least one Senate decision within
30 days.
☐ WS2: 15/15 Biomni showcase analyses ported, each with hypothesis +
    artifacts ≥50KB + debate quality_score ≥ 0.6 + price update. Zero
    synthetic-data fallbacks.
☐ WS3: 133 K-Dense skills ingested into Forge registry; ≥30 skills
invoked by an agent in the first 60 days; logged through
@log_tool_call; monthly refresh runs without manual intervention.
☐ WS4: One scGPT fine-tune run end-to-end inside the sandbox, artifacts
    stored, resource cost reconciled against resource_allocations.
☐ WS5: 100% of analyses ≥50KB in the last 30 days have an associated
debate + price update + cost ledger entry. No orphaned analyses.
☐ Benchmark check-in: within 6 months, SciDEX scores a published
number on BixBench comparable to K-Dense Analyst (within 5 pts).
This is an informational check, not a pass/fail gate.
☐ Debate quality metric on wrapped analyses: mean quality_score on
    WS2-generated debates ≥ 0.65, 20% higher than the current all-analysis
    baseline (measurable via backfill_debate_quality.py).
Quality requirements
- Reference quest_quality_standards_spec.md and
  quest_real_data_pipeline_spec.md. No stubs: no empty notebooks, no <50KB
  artifacts, no 0-edge analyses, no "generic" debates.
- Parallel agents mandatory for batches ≥10 items. WS2 in particular runs as
5 parallel sub-agents covering 3 analyses each.
- All wrapped analyses must cite real datasets (SEA-AD / ABC Atlas /
Cellxgene / ClinicalTrials.gov / OpenTargets / etc.) per
quest_real_data_pipeline_spec.md. No simulated inputs.
- All skills / Biomni API calls logged through @log_tool_call with upstream
  attribution in the artifact metadata.
- Every new quest commit uses [Cross-cutting] or a layer-specific prefix
  with the task ID.
Parallel agent execution
- WS2 is explicitly parallel. 5 agents × 3 analyses = 15 showcase
analyses. Agents run concurrently, each responsible for a disjoint
3-analysis slice, coordinated by the WS2 quest-coordinator task. Sub-agent
  outputs merge through Orchestra sync push onto the coordinator's branch;
the coordinator runs integration tests + debate-wrap checks before
promoting.
- WS1 is single-agent recurring (weekly cadence, small batch, no parallelism
needed).
- WS3 is single-agent for the initial install, parallel only if the
post-install registry wire-up touches ≥10 skills per batch (which it will
— expect 133 skills split across 3–5 agents for the first pass).
- WS4 is single-agent (one pilot model, no parallelism useful).
- WS5 is single-agent recurring (every-6h, wraps whatever analyses landed
since last run).
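The disjoint-slice split for WS2 (and any future ≥10-item batch) can be sketched as a round-robin partition; `slice_batches` is a hypothetical helper name:

```python
def slice_batches(items: list, n_agents: int) -> list:
    """Partition a batch into disjoint, near-equal slices, one per
    agent, by round-robin assignment."""
    return [items[i::n_agents] for i in range(n_agents)]

# WS2 shape: 15 showcase analyses across 5 parallel sub-agents.
analyses = [f"analysis-{i:02d}" for i in range(15)]
slices = slice_batches(analyses, 5)
```

Round-robin keeps slices balanced even when the batch size is not an exact multiple of the agent count.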
Risks & mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Competitive intel access blocked — LinkedIn login walls, paywalled papers | High | Medium | Track the blocked sources in docs/bio_competitive/access_notes.md; escalate a manual-fetch request rather than fabricating content. Use GitHub + bioRxiv + arXiv + company blogs as the fallback spine. |
| Biomni-style exec() of LLM-generated code is unsafe without sandboxing | High | High | All WS2 analyses run inside the bwrap sandbox per quest_analysis_sandboxing_spec.md. No os.system / unsandboxed subprocess.run of LLM-generated code. WS4's GPU launcher extends the existing sandbox; does not bypass it. |
| GPU cost blows the budget | Medium | High | WS4 pilots one model. resource_tracker enforces wall-time and VRAM caps. Cost ledger debit runs before the job, not after, to prevent silent overrun. Senate cap on GPU hours per week. |
| K-Dense skills repo churn — upstream renames or deprecates skills between our syncs | Medium | Low | Monthly refresh task in WS3 diffs the registry and surfaces deletions as governance tickets rather than silently dropping them. |
| "Analysis parity" becomes a parade of shallow notebooks | Medium | High | quest_quality_standards_spec.md acceptance gate: <50KB artifact = reject; no-debate = reject; no-hypothesis = reject. Coordinator holds the gate. |
| Upstream license conflict — Biomni Apache 2.0 is permissive; any proprietary Phylo / K-Dense SaaS endpoints are not | Medium | Medium | WS1 intel report flags license changes. WS3 uses only the open Apache-2.0 skills repo, not K-Dense's SaaS endpoints. Senate reviews any API-call dependency on paid upstream services before adoption. |
Related quests
- quest_forge_spec.md — tool registry / sandboxing / tool-augmented analysis;
  WS3 skills adoption lands here.
- quest_real_data_pipeline_spec.md — real datasets for WS2 analyses; the 15
  showcase analyses cannot ship with synthetic data.
- quest_epistemic_rigor.md — debate + evidence + trust scoring infrastructure
  that WS5 hooks into; also the home of the new WS-rigor-ruleset workstream
  absorbing Alpha1 Science's 8-dim biomedical rigor rubric.
- quest_experiment_extraction_spec.md — structured experiment records are the
  ground truth that WS2 analyses compare their predictions against.
- artifact_enrichment_quest_spec.md — artifact quality gates that WS2
  deliverables must clear; the ≥50KB artifact requirement comes from here.
- quest_analysis_sandboxing_spec.md — bwrap sandbox that WS4's GPU launcher
  extends.
- quest_economics_spec.md — token economy, resource allocation, and cost
  ledger that WS5 debits against per analysis.
Related competitive-intel docs
- [docs/bio_competitive/README.md](../../bio_competitive/README.md) — tree
  overview and provenance rules.
- [docs/bio_competitive/biomni_profile.md](../../bio_competitive/biomni_profile.md)
- [docs/bio_competitive/k_dense_profile.md](../../bio_competitive/k_dense_profile.md)
- [docs/bio_competitive/amass_profile.md](../../bio_competitive/amass_profile.md)
- [docs/bio_competitive/alpha1_science_profile.md](../../bio_competitive/alpha1_science_profile.md)
- [docs/bio_competitive/openai_prism_profile.md](../../bio_competitive/openai_prism_profile.md)
- [docs/bio_competitive/comparison_matrix.md](../../bio_competitive/comparison_matrix.md)
- [docs/bio_competitive/access_notes.md](../../bio_competitive/access_notes.md)
Work Log
_No entries yet._