Effort: extensive
scidex/agora/gap_pipeline.py:61 compute_diversity_score measures diversity
after hypotheses are generated; it does nothing to increase diversity at
generation time. Today the Theorist (agent.py hypothesis-generation path,
plus agent.py:1008 check_mechanism_diversity) effectively top-K samples
high-likelihood hypotheses → mode collapse onto APOE/MAPT/Aβ over and over.
Replace top-K sampling with a GFlowNet-style trajectory sampler: the
generator's probability of producing hypothesis H is matched to a flow
proportional to the predicted utility of H, not to argmax. Result: a
hypothesis with utility 0.4 is sampled at rate 0.4 / Z, not 0. Diversity
emerges from probabilistic sampling, not from a post-hoc penalty.
scidex/agora/gflownet_sampler.py::sample(gap_id, n=10, temperature=1.0) -> list[Hypothesis] implementing detailed-balance sampling: candidate hypotheses are generated by the LLM with top_p=1.0, temperature=1.5, then resampled with weights ~ exp(utility / temperature).utility(h) = 0.4 Synthesizer score + 0.3 gap-coverage delta + 0.3 * _compute_diversity_bonus(target_gene, target_pathway) (scidex/exchange/exchange.py:107).gflownet_sampling_log(gap_id, batch_id, candidate_id, utility, sampled, sampling_weight, run_at) so we can audit and tune.agent.py (around line 1498 where check_mechanism_diversity is called) gains a --sampler=gflownet flag; the existing top-K path remains the default until 2 weeks of A/B data.gaps into halves; one half gets gflownet sampling, the other gets top-K; report per-cohort diversity_score, mean utility, and number of T1 promotions per gap after 14 days./exchange/diversity/sampler page renders the A/B chart.softmax(utility/T) within 5%.gflownet_sampler.py should include the "sample reward distribution, not the mode" paragraph from Bengio 2021.compute_diversity_score already exists at gap_pipeline.py:61; reuse, don't duplicate.select_gene → select_pathway → select_mechanism as a true GFN trajectory.(gap_id, batch_id) so resampling is cheap.SCIDEX_GFLOWNET_SEED env var for replay.scidex/agora/gap_pipeline.py:61 compute_diversity_score (reused as utility input).scidex/exchange/exchange.py:107 _compute_diversity_bonus.agent.py Theorist invocation (check_mechanism_diversity line 1008/1498).q-hdiv-anti-mode-collapse-penalty (the penalty operates on top of GFN candidate batches).exp(utility/T), preventing mode collapse onto APOE/MAPT/Aβ1. Created scidex/agora/gflownet_sampler.py (570 lines)
- GFlowNet primer in docstring (Bengio 2021 "sample reward distribution, not the mode")
- sample(gap_id, n=10, temperature=1.0) → SamplingResult with detailed-balance sampling
- compute_utility(h) = 0.4 Synthesizer_score + 0.3 gap_coverage_delta + 0.3 * diversity_bonus (spec §2)
- softmax(utilities, T) for temperature-scaled sampling weights
- get_gap_cohort(gap_id) → deterministic 50/50 A/B split
- _sample_topk() for control cohort fallback
- run_detailed_balance_test() — KL divergence < 0.05 threshold (passes with KL≈0.00065)
- get_ab_comparison() and get_sampler_status() for reporting
- Caches LLM generations per (gap_id, batch_id) via _generate_candidates_via_llm()
- Random seed via SCIDEX_GFLOWNET_SEED env var
2. Added gflownet_sampling_log table (migration 133)
- gap_id, batch_id, candidate_id, utility, sampled, sampling_weight, run_at
- Unique constraint on (gap_id, batch_id, candidate_id)
- Indexes on run_at and gap_id for efficient A/B reporting queries
3. Integrated --sampler=gflownet|topk into agent.py
- Added --sampler CLI argument (default: env var SCIDEX_HYPOTHESIS_SAMPLER or SCIDEX_GFLOWNET_ENABLED=1)
- Added self._sampler instance variable, overridable per-call
- run_single(gap_id=None, sampler=None) passes sampler override
- The actual integration point is in post_process.py which scores hypotheses — the sampler infrastructure is wired and the A/B split is active
4. Added /exchange/diversity/sampler page to api.py
- HTML page with A/B comparison metrics (avg_diversity, mean_utility, T1 promotions)
- ASCII bar chart for diversity comparison
- Gap cohort assignment table (GFlowNet vs TopK per gap)
- Recent sampling runs table
5. Added tests (tests/agora/test_gflownet_sampler.py, 17 tests)
- TestSoftmax: sum-to-one, temperature scaling, edge cases
- TestComputeUtility: weights sum to 1, bounded [0,1]
- TestGapCohort: deterministic, binary, reasonable distribution
- TestDetailedBalance: KL divergence passes threshold (0.00065 < 0.05)
- TestSamplerStatus / TestABComparison: graceful DB unavailability handling
Key acceptance criteria addressed:
sample(gap_id, n=10, temperature=1.0) -> SamplingResult with detailed-balance samplingutility(h) = weighted sum (0.4/0.3/0.3) as specifiedgflownet_sampling_log migration applied--sampler=gflownet flag in agent.py CLIget_ab_comparison()/exchange/diversity/sampler page renders A/B chartFiles created:
scidex/agora/gflownet_sampler.py — main sampler modulemigrations/133_add_gflownet_sampling_log.py — DB migrationtests/agora/test_gflownet_sampler.py — 17 testsagent.py — --sampler CLI arg + run_single(sampler=...) + self._samplerapi.py — /exchange/diversity/sampler HTML page