{"quest":{"id":"q-epistemic-rigor","name":"Epistemic Rigor","description":"Evolve SciDEX toward rigorous scientific epistemology: 6-tier epistemic classification (T0 axiom → T5 contested) with asymmetric promotion/demotion (Gödel Machine principle), adversarial Falsifier persona as 5th debate round, proof-gated KG updates with contradiction checking against established knowledge, GFlowNet-inspired diverse hypothesis portfolio maintenance (anti-monoculture rules), pre-registration of predictions before analysis, replication status tracking (unreplicated/replicated/failed_replication), and calibration-adjusted believability weights. Every claim must be traceable to ground truth.","layer":"Cross-cutting","priority":95,"status":"active","created_at":"2026-04-03 23:02:52","updated_at":"2026-04-03 23:09:38"},"tasks":[{"id":"t-epistemic-promotion-ui","title":"Display epistemic tier promotion roadmap on hypothesis pages","description":"","status":"open","priority":30,"task_type":"one_shot","frequency":"","assigned_slot":"","started_at":null,"completed_at":null,"updated_at":"2026-04-25T07:47:19.383916+00:00","summary":"","completion_notes":"","last_error":"cli-reopen-manual: reopened — task was marked 'done' but has no task_runs row in (done/completed/success)","time_estimate_hours":0.0,"completion_count":0,"spec_path":"","provider":"claude","payload_json":"{}"},{"id":"de85e156-b7e3-4513-a586-4f3762a824f7","title":"[Atlas] Hypothesis predictions table — explicit falsifiability","description":"Add hypothesis_predictions table: each hypothesis gets explicit testable predictions with predicted outcomes, falsification criteria, and status tracking. Add falsifiable flag + predictions_count to hypotheses. Backfill existing hypotheses by extracting predictions from debate transcripts.\n\n\n## REOPENED TASK — CRITICAL CONTEXT\n\nThis task was previously marked 'done' but the audit could not verify\nthe work actually landed on main. 
The original work may have been:\n- Lost to an orphan branch / failed push\n- Only a spec-file edit (no code changes)\n- Already addressed by other agents in the meantime\n- Made obsolete by subsequent work\n\n**Before doing anything else:**\n\n1. **Re-evaluate the task in light of CURRENT main state.** Read the\n   spec and the relevant files on origin/main NOW. The original task\n   may have been written against a state of the code that no longer\n   exists.\n\n2. **Verify the task still advances SciDEX's aims.** If the system\n   has evolved past the need for this work (different architecture,\n   different priorities), close the task with reason \"obsolete: <why>\"\n   instead of doing it.\n\n3. **Check if it's already done.** Run `git log --grep='<task-id>'`\n   and read the related commits. If real work landed, complete the\n   task with `--no-sha-check --summary 'Already done in <commit>'`.\n\n4. **Make sure your changes don't regress recent functionality.** Many\n   agents have been working on this codebase. Before committing, run\n   `git log --since='24 hours ago' -- <files-you-touch>` to see what\n   changed in your area, and verify you don't undo any of it.\n\n5. **Stay scoped.** Only do what this specific task asks for. Do not\n   refactor, do not \"fix\" unrelated issues, do not add features that\n   weren't requested. 
Scope creep at this point is regression risk.\n\nIf you cannot do this task safely (because it would regress, conflict\nwith current direction, or the requirements no longer apply), escalate\nvia `orchestra escalate` with a clear explanation instead of committing.\n","status":"done","priority":95,"task_type":"one_shot","frequency":"","assigned_slot":"","started_at":null,"completed_at":"2026-04-15T23:42:22.095910+00:00","updated_at":"2026-04-15T23:42:22.095910+00:00","summary":"","completion_notes":"Auto-completed by supervisor after successful deploy to main","last_error":"","time_estimate_hours":0.0,"completion_count":0,"spec_path":"docs/planning/specs/de85e156_b7e_spec.md","provider":"any","payload_json":"{\"requirements\": {\"analysis\": 7, \"reasoning\": 6}, \"_stall_skip_providers\": [], \"_stall_requeued_by\": \"minimax\", \"_stall_requeued_at\": \"2026-04-14 00:16:29\", \"_stall_skip_at\": {}, \"_stall_skip_pruned_at\": \"2026-04-14T10:37:14.022390+00:00\", \"completion_shas\": [\"7abb57f3ea325a62100e7bf6bf3f13aae09c99fd\"], \"completion_shas_checked_at\": \"2026-04-15T23:42:22.077420+00:00\"}"},{"id":"5f27f904-d33b-4f57-9c05-110d93f20ad9","title":"[Forge] Experiment results and validation pipeline","description":"Add experiment_results table linking experiment outcomes to predictions. When an experiment completes, compare actual results to predicted outcomes and auto-update hypothesis scores. Build the prediction-vs-reality comparison engine. Connect to existing experiments table.\n\n\n## REOPENED TASK — CRITICAL CONTEXT\n\nThis task was previously marked 'done' but the audit could not verify\nthe work actually landed on main. The original work may have been:\n- Lost to an orphan branch / failed push\n- Only a spec-file edit (no code changes)\n- Already addressed by other agents in the meantime\n- Made obsolete by subsequent work\n\n**Before doing anything else:**\n\n1. 
**Re-evaluate the task in light of CURRENT main state.** Read the\n   spec and the relevant files on origin/main NOW. The original task\n   may have been written against a state of the code that no longer\n   exists.\n\n2. **Verify the task still advances SciDEX's aims.** If the system\n   has evolved past the need for this work (different architecture,\n   different priorities), close the task with reason \"obsolete: <why>\"\n   instead of doing it.\n\n3. **Check if it's already done.** Run `git log --grep='<task-id>'`\n   and read the related commits. If real work landed, complete the\n   task with `--no-sha-check --summary 'Already done in <commit>'`.\n\n4. **Make sure your changes don't regress recent functionality.** Many\n   agents have been working on this codebase. Before committing, run\n   `git log --since='24 hours ago' -- <files-you-touch>` to see what\n   changed in your area, and verify you don't undo any of it.\n\n5. **Stay scoped.** Only do what this specific task asks for. Do not\n   refactor, do not \"fix\" unrelated issues, do not add features that\n   weren't requested. 
Scope creep at this point is regression risk.\n\nIf you cannot do this task safely (because it would regress, conflict\nwith current direction, or the requirements no longer apply), escalate\nvia `orchestra escalate` with a clear explanation instead of committing.\n","status":"done","priority":93,"task_type":"one_shot","frequency":"","assigned_slot":"","started_at":null,"completed_at":"2026-04-14T04:09:57.790000+00:00","updated_at":"2026-04-14T04:09:57.790000+00:00","summary":"","completion_notes":"Auto-completed by supervisor after successful deploy to main","last_error":"","time_estimate_hours":0.0,"completion_count":0,"spec_path":"docs/planning/specs/5f27f904_d33_spec.md","provider":"any","payload_json":"{\"requirements\": {\"coding\": 8, \"safety\": 9}, \"completion_shas\": [\"ca83f0ce0d5abf9ba16e8a7e2c86aa2741fe8644\", \"cfa06e56a685f6f762b27697930d92f3502c39b8\"], \"completion_shas_checked_at\": \"2026-04-14T04:09:57.770302+00:00\", \"completion_shas_missing\": [\"641517efac9d1628df1b58395d04526bc467ce66\", \"3d4260ea9b9702ecf33ca0ac743cac06deb9eb4a\"]}"},{"id":"dca29466-2f7e-4a1e-848b-1049acc67ffc","title":"[Agora] epi-01-PRED: Add hypothesis_predictions table for falsifiable predictions","description":"Create hypothesis_predictions table linking hypotheses to explicit testable predictions. Each prediction should have: the claim, what would falsify it, expected measurement, confidence level. This is the foundational step for epistemic rigor — transforms hypotheses from scored claims to scientifically testable statements. Spec: quest_epistemic_rigor.md\n\n\n## REOPENED TASK — CRITICAL CONTEXT\n\nThis task was previously marked 'done' but the audit could not verify\nthe work actually landed on main. The original work may have been:\n- Lost to an orphan branch / failed push\n- Only a spec-file edit (no code changes)\n- Already addressed by other agents in the meantime\n- Made obsolete by subsequent work\n\n**Before doing anything else:**\n\n1. 
**Re-evaluate the task in light of CURRENT main state.** Read the\n   spec and the relevant files on origin/main NOW. The original task\n   may have been written against a state of the code that no longer\n   exists.\n\n2. **Verify the task still advances SciDEX's aims.** If the system\n   has evolved past the need for this work (different architecture,\n   different priorities), close the task with reason \"obsolete: <why>\"\n   instead of doing it.\n\n3. **Check if it's already done.** Run `git log --grep='<task-id>'`\n   and read the related commits. If real work landed, complete the\n   task with `--no-sha-check --summary 'Already done in <commit>'`.\n\n4. **Make sure your changes don't regress recent functionality.** Many\n   agents have been working on this codebase. Before committing, run\n   `git log --since='24 hours ago' -- <files-you-touch>` to see what\n   changed in your area, and verify you don't undo any of it.\n\n5. **Stay scoped.** Only do what this specific task asks for. Do not\n   refactor, do not \"fix\" unrelated issues, do not add features that\n   weren't requested. Scope creep at this point is regression risk.\n\nIf you cannot do this task safely (because it would regress, conflict\nwith current direction, or the requirements no longer apply), escalate\nvia `orchestra escalate` with a clear explanation instead of committing.\n","status":"done","priority":92,"task_type":"one_shot","frequency":"","assigned_slot":"","started_at":null,"completed_at":"2026-04-16T09:55:07.071394+00:00","updated_at":"2026-04-16T09:55:07.071394+00:00","summary":"","completion_notes":"Audit misreported ORPHAN_BRANCH. Commit 835ef91f7 ([Agora] epi-01-PRED: Add hypothesis_predictions columns, API endpoints, and detail page section [task:dca29466-2f7e-4a1e-848b-1049acc67ffc]) IS on origin/main - verified via git merge-base --is-ancestor 835ef91f7 82b003414 (origin/main HEAD). 
Migration 056 applied, table has 988 rows, API endpoints functional.","last_error":"","time_estimate_hours":0.0,"completion_count":0,"spec_path":"docs/planning/specs/quest_epistemic_rigor.md","provider":"any","payload_json":"{\"requirements\": {\"analysis\": 7, \"reasoning\": 6}, \"completion_shas\": [\"835ef91f7c285957828dc04ab4c418be72b6cea3\"], \"completion_shas_checked_at\": \"2026-04-16T09:55:07.049627+00:00\"}"},{"id":"b5298ea7-69ec-47bd-9115-f2968a374f6d","title":"[Atlas] Evidence chain provenance — trace every claim to ground truth","description":"Add evidence_chains table: tracks full provenance of every evidence claim (who evaluated it, when, using what methodology, from what source). Normalize evidence_for/evidence_against from JSON into structured evidence_entries table with source_type (paper/experiment/debate/manual), source_id, methodology, evaluator_agent_id. Every claim traceable to ground data.\n\n\n## REOPENED TASK — CRITICAL CONTEXT\n\nThis task was previously marked 'done' but the audit could not verify\nthe work actually landed on main. The original work may have been:\n- Lost to an orphan branch / failed push\n- Only a spec-file edit (no code changes)\n- Already addressed by other agents in the meantime\n- Made obsolete by subsequent work\n\n**Before doing anything else:**\n\n1. **Re-evaluate the task in light of CURRENT main state.** Read the\n   spec and the relevant files on origin/main NOW. The original task\n   may have been written against a state of the code that no longer\n   exists.\n\n2. **Verify the task still advances SciDEX's aims.** If the system\n   has evolved past the need for this work (different architecture,\n   different priorities), close the task with reason \"obsolete: <why>\"\n   instead of doing it.\n\n3. **Check if it's already done.** Run `git log --grep='<task-id>'`\n   and read the related commits. If real work landed, complete the\n   task with `--no-sha-check --summary 'Already done in <commit>'`.\n\n4. 
**Make sure your changes don't regress recent functionality.** Many\n   agents have been working on this codebase. Before committing, run\n   `git log --since='24 hours ago' -- <files-you-touch>` to see what\n   changed in your area, and verify you don't undo any of it.\n\n5. **Stay scoped.** Only do what this specific task asks for. Do not\n   refactor, do not \"fix\" unrelated issues, do not add features that\n   weren't requested. Scope creep at this point is regression risk.\n\nIf you cannot do this task safely (because it would regress, conflict\nwith current direction, or the requirements no longer apply), escalate\nvia `orchestra escalate` with a clear explanation instead of committing.\n","status":"done","priority":92,"task_type":"one_shot","frequency":"","assigned_slot":"","started_at":null,"completed_at":"2026-04-16T07:50:34.493467+00:00","updated_at":"2026-04-16T07:50:34.493467+00:00","summary":"","completion_notes":"Auto-completed by supervisor after successful deploy to main","last_error":"","time_estimate_hours":0.0,"completion_count":0,"spec_path":"docs/planning/specs/b5298ea7_69e_spec.md","provider":"any","payload_json":"{\"completion_shas\": [\"891030ead9bce5da2a1be0a371fe3fb0381acb19\", \"a62e2c9186ce5e2876cea8d7bfcdf316e9f95f84\", \"fbc87b49cf5617a5df6808f6d9cecd51f84eb012\", \"d68545ac31d0391e86ff431b8a30aa4a8163a982\"], \"completion_shas_checked_at\": \"2026-04-16T07:50:34.472350+00:00\"}"},{"id":"edd66643-1332-4e67-b11e-0da9fadebaab","title":"[Atlas] Trust scores on knowledge graph edges","description":"Add trust_score (0-1) with confidence_interval to knowledge_edges. Compute trust from: evidence chain depth, source paper quality (impact factor, citations, recency), number of independent confirmations, contradiction count. Add trust propagation — downstream edges inherit reduced trust from upstream. 
Visualize trust on /graph.\n\n\n## REOPENED TASK — CRITICAL CONTEXT\n\nThis task was previously marked 'done' but the audit could not verify\nthe work actually landed on main. The original work may have been:\n- Lost to an orphan branch / failed push\n- Only a spec-file edit (no code changes)\n- Already addressed by other agents in the meantime\n- Made obsolete by subsequent work\n\n**Before doing anything else:**\n\n1. **Re-evaluate the task in light of CURRENT main state.** Read the\n   spec and the relevant files on origin/main NOW. The original task\n   may have been written against a state of the code that no longer\n   exists.\n\n2. **Verify the task still advances SciDEX's aims.** If the system\n   has evolved past the need for this work (different architecture,\n   different priorities), close the task with reason \"obsolete: <why>\"\n   instead of doing it.\n\n3. **Check if it's already done.** Run `git log --grep='<task-id>'`\n   and read the related commits. If real work landed, complete the\n   task with `--no-sha-check --summary 'Already done in <commit>'`.\n\n4. **Make sure your changes don't regress recent functionality.** Many\n   agents have been working on this codebase. Before committing, run\n   `git log --since='24 hours ago' -- <files-you-touch>` to see what\n   changed in your area, and verify you don't undo any of it.\n\n5. **Stay scoped.** Only do what this specific task asks for. Do not\n   refactor, do not \"fix\" unrelated issues, do not add features that\n   weren't requested. 
Scope creep at this point is regression risk.\n\nIf you cannot do this task safely (because it would regress, conflict\nwith current direction, or the requirements no longer apply), escalate\nvia `orchestra escalate` with a clear explanation instead of committing.\n","status":"done","priority":91,"task_type":"one_shot","frequency":"","assigned_slot":"","started_at":null,"completed_at":"2026-04-16T11:25:26.640847+00:00","updated_at":"2026-04-16T11:25:26.640847+00:00","summary":"","completion_notes":"Auto-completed by supervisor after successful deploy to main","last_error":"","time_estimate_hours":0.0,"completion_count":0,"spec_path":"docs/planning/specs/edd66643_133_spec.md","provider":"any","payload_json":"{\"requirements\": {\"coding\": 5, \"analysis\": 5}}"},{"id":"50b2dc6c-7208-4fd6-aae4-b997507e8515","title":"[Atlas] Hypothesis and experiment dependency graph","description":"Add hypothesis_dependencies table (hypothesis A depends on hypothesis B being true). Add experiment_prerequisites (experiment A must complete before B). Add evidence_supports (evidence X supports/contradicts claim Y). Build DAG traversal to compute critical paths — which experiments would most advance knowledge. Expose dependency graph via API and visualize.\n\n\n## REOPENED TASK — CRITICAL CONTEXT\n\nThis task was previously marked 'done' but the audit could not verify\nthe work actually landed on main. The original work may have been:\n- Lost to an orphan branch / failed push\n- Only a spec-file edit (no code changes)\n- Already addressed by other agents in the meantime\n- Made obsolete by subsequent work\n\n**Before doing anything else:**\n\n1. **Re-evaluate the task in light of CURRENT main state.** Read the\n   spec and the relevant files on origin/main NOW. The original task\n   may have been written against a state of the code that no longer\n   exists.\n\n2. 
**Verify the task still advances SciDEX's aims.** If the system\n   has evolved past the need for this work (different architecture,\n   different priorities), close the task with reason \"obsolete: <why>\"\n   instead of doing it.\n\n3. **Check if it's already done.** Run `git log --grep='<task-id>'`\n   and read the related commits. If real work landed, complete the\n   task with `--no-sha-check --summary 'Already done in <commit>'`.\n\n4. **Make sure your changes don't regress recent functionality.** Many\n   agents have been working on this codebase. Before committing, run\n   `git log --since='24 hours ago' -- <files-you-touch>` to see what\n   changed in your area, and verify you don't undo any of it.\n\n5. **Stay scoped.** Only do what this specific task asks for. Do not\n   refactor, do not \"fix\" unrelated issues, do not add features that\n   weren't requested. Scope creep at this point is regression risk.\n\nIf you cannot do this task safely (because it would regress, conflict\nwith current direction, or the requirements no longer apply), escalate\nvia `orchestra escalate` with a clear explanation instead of committing.\n","status":"done","priority":90,"task_type":"one_shot","frequency":"","assigned_slot":"","started_at":null,"completed_at":"2026-04-18T15:44:03.090168+00:00","updated_at":"2026-04-18T15:44:03.090168+00:00","summary":"","completion_notes":"Auto-completed by supervisor after successful deploy to main","last_error":"","time_estimate_hours":0.0,"completion_count":0,"spec_path":"docs/planning/specs/50b2dc6c_720_spec.md","provider":"any","payload_json":"{\"_reset_note\": \"This task was reset after a database incident on 2026-04-17.\\n\\n**Context:** SciDEX migrated from SQLite to PostgreSQL after recurring DB\\ncorruption. Some work done during Apr 16-17 may have been lost.\\n\\n**Before starting work:**\\n1. Check if the task's goal is ALREADY satisfied (run the relevant checks)\\n2. 
Check `git log --all --grep=task:YOUR_TASK_ID` for prior commits\\n3. If complete, verify and mark done. If partial, continue. If not done, proceed.\\n\\n**DB change:** SciDEX now uses PostgreSQL. `get_db()` auto-detects via\\nSCIDEX_DB_BACKEND=postgres env var.\", \"_reset_at\": \"2026-04-18T06:29:22.046013+00:00\", \"_reset_from_status\": \"done\"}"},{"id":"31ab7f59-adf4-4092-828f-ddf11b609d8f","title":"[Senate] Evidence versioning and audit trail","description":"Add evidence_versions table tracking every change to evidence, scores, and claims with timestamps, actor, reason, and diff. Add confidence_justifications table — every score change must have a structured reason (new evidence, debate outcome, experiment result, recalibration). Build audit trail viewer on hypothesis detail pages showing full history.\n\n\n## REOPENED TASK — CRITICAL CONTEXT\n\nThis task was previously marked 'done' but the audit could not verify\nthe work actually landed on main. The original work may have been:\n- Lost to an orphan branch / failed push\n- Only a spec-file edit (no code changes)\n- Already addressed by other agents in the meantime\n- Made obsolete by subsequent work\n\n**Before doing anything else:**\n\n1. **Re-evaluate the task in light of CURRENT main state.** Read the\n   spec and the relevant files on origin/main NOW. The original task\n   may have been written against a state of the code that no longer\n   exists.\n\n2. **Verify the task still advances SciDEX's aims.** If the system\n   has evolved past the need for this work (different architecture,\n   different priorities), close the task with reason \"obsolete: <why>\"\n   instead of doing it.\n\n3. **Check if it's already done.** Run `git log --grep='<task-id>'`\n   and read the related commits. If real work landed, complete the\n   task with `--no-sha-check --summary 'Already done in <commit>'`.\n\n4. 
**Make sure your changes don't regress recent functionality.** Many\n   agents have been working on this codebase. Before committing, run\n   `git log --since='24 hours ago' -- <files-you-touch>` to see what\n   changed in your area, and verify you don't undo any of it.\n\n5. **Stay scoped.** Only do what this specific task asks for. Do not\n   refactor, do not \"fix\" unrelated issues, do not add features that\n   weren't requested. Scope creep at this point is regression risk.\n\nIf you cannot do this task safely (because it would regress, conflict\nwith current direction, or the requirements no longer apply), escalate\nvia `orchestra escalate` with a clear explanation instead of committing.\n","status":"done","priority":88,"task_type":"one_shot","frequency":"","assigned_slot":"","started_at":null,"completed_at":"2026-04-19T00:56:15.329089+00:00","updated_at":"2026-04-19T00:56:15.329089+00:00","summary":"","completion_notes":"","last_error":"","time_estimate_hours":0.0,"completion_count":0,"spec_path":"docs/planning/specs/31ab7f59_adf_spec.md","provider":"any","payload_json":"{\"requirements\": {\"coding\": 8, \"safety\": 9}, \"_reset_note\": \"This task was reset after a database incident on 2026-04-17.\\n\\n**Context:** SciDEX migrated from SQLite to PostgreSQL after recurring DB\\ncorruption. Some work done during Apr 16-17 may have been lost.\\n\\n**Before starting work:**\\n1. Check if the task's goal is ALREADY satisfied (run the relevant checks)\\n2. Check `git log --all --grep=task:YOUR_TASK_ID` for prior commits\\n3. If complete, verify and mark done. If partial, continue. If not done, proceed.\\n\\n**DB change:** SciDEX now uses PostgreSQL. 
`get_db()` auto-detects via\\nSCIDEX_DB_BACKEND=postgres env var.\", \"_reset_at\": \"2026-04-18T06:29:22.046013+00:00\", \"_reset_from_status\": \"done\"}"},{"id":"08c73de3-03cc-481d-abb1-3bfa4a814ec1","title":"[Exchange] Combinable knowledge units — composable evidence blocks","description":"Design and implement knowledge_units — atomic, addressable, combinable evidence blocks. Each unit has: content, source provenance, trust score, version, links to hypotheses/experiments/papers. Units can be combined (A+B supports C), split, and reused across hypotheses. Build the foundation for reproducible, composable scientific reasoning.\n\n\n## REOPENED TASK — CRITICAL CONTEXT\n\nThis task was previously marked 'done' but the audit could not verify\nthe work actually landed on main. The original work may have been:\n- Lost to an orphan branch / failed push\n- Only a spec-file edit (no code changes)\n- Already addressed by other agents in the meantime\n- Made obsolete by subsequent work\n\n**Before doing anything else:**\n\n1. **Re-evaluate the task in light of CURRENT main state.** Read the\n   spec and the relevant files on origin/main NOW. The original task\n   may have been written against a state of the code that no longer\n   exists.\n\n2. **Verify the task still advances SciDEX's aims.** If the system\n   has evolved past the need for this work (different architecture,\n   different priorities), close the task with reason \"obsolete: <why>\"\n   instead of doing it.\n\n3. **Check if it's already done.** Run `git log --grep='<task-id>'`\n   and read the related commits. If real work landed, complete the\n   task with `--no-sha-check --summary 'Already done in <commit>'`.\n\n4. **Make sure your changes don't regress recent functionality.** Many\n   agents have been working on this codebase. Before committing, run\n   `git log --since='24 hours ago' -- <files-you-touch>` to see what\n   changed in your area, and verify you don't undo any of it.\n\n5. 
**Stay scoped.** Only do what this specific task asks for. Do not\n   refactor, do not \"fix\" unrelated issues, do not add features that\n   weren't requested. Scope creep at this point is regression risk.\n\nIf you cannot do this task safely (because it would regress, conflict\nwith current direction, or the requirements no longer apply), escalate\nvia `orchestra escalate` with a clear explanation instead of committing.\n","status":"done","priority":85,"task_type":"one_shot","frequency":"","assigned_slot":"","started_at":null,"completed_at":"2026-04-19T04:27:18.042106+00:00","updated_at":"2026-04-19T04:27:18.042106+00:00","summary":"","completion_notes":"Auto-completed by supervisor after successful deploy to main","last_error":"","time_estimate_hours":0.0,"completion_count":0,"spec_path":"docs/planning/specs/08c73de3_03c_spec.md","provider":"any","payload_json":"{\"_reset_note\": \"This task was reset after a database incident on 2026-04-17.\\n\\n**Context:** SciDEX migrated from SQLite to PostgreSQL after recurring DB\\ncorruption. Some work done during Apr 16-17 may have been lost.\\n\\n**Before starting work:**\\n1. Check if the task's goal is ALREADY satisfied (run the relevant checks)\\n2. Check `git log --all --grep=task:YOUR_TASK_ID` for prior commits\\n3. If complete, verify and mark done. If partial, continue. If not done, proceed.\\n\\n**DB change:** SciDEX now uses PostgreSQL. `get_db()` auto-detects via\\nSCIDEX_DB_BACKEND=postgres env var.\", \"_reset_at\": \"2026-04-18T06:29:22.046013+00:00\", \"_reset_from_status\": \"done\"}"},{"id":"4bb367b9-9d69-4807-a215-01f4c3323007","title":"[Senate] Epistemic health dashboard and continuous improvement","description":"Build /epistemic dashboard showing: falsifiability coverage (% hypotheses with predictions), evidence provenance coverage (% claims traced to source), trust score distribution across KG, dependency graph health, evidence freshness. 
Add Senate proposals that auto-flag hypotheses missing predictions, stale evidence, or broken provenance chains. This is the self-improvement loop for epistemic rigor.\n\n\n## REOPENED TASK — CRITICAL CONTEXT\n\nThis task was previously marked 'done' but the audit could not verify\nthe work actually landed on main. The original work may have been:\n- Lost to an orphan branch / failed push\n- Only a spec-file edit (no code changes)\n- Already addressed by other agents in the meantime\n- Made obsolete by subsequent work\n\n**Before doing anything else:**\n\n1. **Re-evaluate the task in light of CURRENT main state.** Read the\n   spec and the relevant files on origin/main NOW. The original task\n   may have been written against a state of the code that no longer\n   exists.\n\n2. **Verify the task still advances SciDEX's aims.** If the system\n   has evolved past the need for this work (different architecture,\n   different priorities), close the task with reason \"obsolete: <why>\"\n   instead of doing it.\n\n3. **Check if it's already done.** Run `git log --grep='<task-id>'`\n   and read the related commits. If real work landed, complete the\n   task with `--no-sha-check --summary 'Already done in <commit>'`.\n\n4. **Make sure your changes don't regress recent functionality.** Many\n   agents have been working on this codebase. Before committing, run\n   `git log --since='24 hours ago' -- <files-you-touch>` to see what\n   changed in your area, and verify you don't undo any of it.\n\n5. **Stay scoped.** Only do what this specific task asks for. Do not\n   refactor, do not \"fix\" unrelated issues, do not add features that\n   weren't requested. 
Scope creep at this point is regression risk.\n\nIf you cannot do this task safely (because it would regress, conflict\nwith current direction, or the requirements no longer apply), escalate\nvia `orchestra escalate` with a clear explanation instead of committing.\n","status":"done","priority":83,"task_type":"one_shot","frequency":"","assigned_slot":"","started_at":null,"completed_at":"2026-04-19T05:06:00.146624+00:00","updated_at":"2026-04-19T05:06:00.146624+00:00","summary":"","completion_notes":"Auto-completed by supervisor after successful deploy to main","last_error":"","time_estimate_hours":0.0,"completion_count":0,"spec_path":"docs/planning/specs/4bb367b9_9d6_spec.md","provider":"any","payload_json":"{\"requirements\": {\"coding\": 7, \"reasoning\": 6}, \"_reset_note\": \"This task was reset after a database incident on 2026-04-17.\\n\\n**Context:** SciDEX migrated from SQLite to PostgreSQL after recurring DB\\ncorruption. Some work done during Apr 16-17 may have been lost.\\n\\n**Before starting work:**\\n1. Check if the task's goal is ALREADY satisfied (run the relevant checks)\\n2. Check `git log --all --grep=task:YOUR_TASK_ID` for prior commits\\n3. If complete, verify and mark done. If partial, continue. If not done, proceed.\\n\\n**DB change:** SciDEX now uses PostgreSQL. `get_db()` auto-detects via\\nSCIDEX_DB_BACKEND=postgres env var.\", \"_reset_at\": \"2026-04-18T06:29:22.046013+00:00\", \"_reset_from_status\": \"done\"}"},{"id":"t-tier-classification","title":"Epistemic tier classification engine","description":"epistemic_tiers.py: classify_hypothesis(), classify_edge(), can_override() with asymmetric ratchet logic. Tiers: T0 axiom → T5 contested. Promotion requires evidence, demotion requires stronger counter-evidence.\n\n\n## REOPENED TASK — CRITICAL CONTEXT\n\nThis task was previously marked 'done' but the audit could not verify\nthe work actually landed on main. 
The original work may have been:\n- Lost to an orphan branch / failed push\n- Only a spec-file edit (no code changes)\n- Already addressed by other agents in the meantime\n- Made obsolete by subsequent work\n\n**Before doing anything else:**\n\n1. **Re-evaluate the task in light of CURRENT main state.** Read the\n   spec and the relevant files on origin/main NOW. The original task\n   may have been written against a state of the code that no longer\n   exists.\n\n2. **Verify the task still advances SciDEX's aims.** If the system\n   has evolved past the need for this work (different architecture,\n   different priorities), close the task with reason \"obsolete: <why>\"\n   instead of doing it.\n\n3. **Check if it's already done.** Run `git log --grep='<task-id>'`\n   and read the related commits. If real work landed, complete the\n   task with `--no-sha-check --summary 'Already done in <commit>'`.\n\n4. **Make sure your changes don't regress recent functionality.** Many\n   agents have been working on this codebase. Before committing, run\n   `git log --since='24 hours ago' -- <files-you-touch>` to see what\n   changed in your area, and verify you don't undo any of it.\n\n5. **Stay scoped.** Only do what this specific task asks for. Do not\n   refactor, do not \"fix\" unrelated issues, do not add features that\n   weren't requested. 
Scope creep at this point is regression risk.\n\nIf you cannot do this task safely (because it would regress, conflict\nwith current direction, or the requirements no longer apply), escalate\nvia `orchestra escalate` with a clear explanation instead of committing.\n","status":"done","priority":80,"task_type":"one_shot","frequency":"","assigned_slot":"","started_at":null,"completed_at":"2026-04-19T05:44:34.288372+00:00","updated_at":"2026-04-19T05:44:34.288372+00:00","summary":"","completion_notes":"","last_error":"","time_estimate_hours":0.0,"completion_count":0,"spec_path":"","provider":"any","payload_json":"{\"_reset_note\": \"This task was reset after a database incident on 2026-04-17.\\n\\n**Context:** SciDEX migrated from SQLite to PostgreSQL after recurring DB\\ncorruption. Some work done during Apr 16-17 may have been lost.\\n\\n**Before starting work:**\\n1. Check if the task's goal is ALREADY satisfied (run the relevant checks)\\n2. Check `git log --all --grep=task:YOUR_TASK_ID` for prior commits\\n3. If complete, verify and mark done. If partial, continue. If not done, proceed.\\n\\n**DB change:** SciDEX now uses PostgreSQL. `get_db()` auto-detects via\\nSCIDEX_DB_BACKEND=postgres env var.\", \"_reset_at\": \"2026-04-18T06:29:22.046013+00:00\", \"_reset_from_status\": \"done\"}"},{"id":"t-tier-seeding","title":"Seed epistemic tiers on existing data","description":"Run run_tier_audit() on all 199 hypotheses. Result: 57 established, 26 supported, 66 provisional, 50 contested. Health score: 0.507.\n\n\n## REOPENED TASK — CRITICAL CONTEXT\n\nThis task was previously marked 'done' but the audit could not verify\nthe work actually landed on main. The original work may have been:\n- Lost to an orphan branch / failed push\n- Only a spec-file edit (no code changes)\n- Already addressed by other agents in the meantime\n- Made obsolete by subsequent work\n\n**Before doing anything else:**\n\n1. 
**Re-evaluate the task in light of CURRENT main state.** Read the\n   spec and the relevant files on origin/main NOW. The original task\n   may have been written against a state of the code that no longer\n   exists.\n\n2. **Verify the task still advances SciDEX's aims.** If the system\n   has evolved past the need for this work (different architecture,\n   different priorities), close the task with reason \"obsolete: <why>\"\n   instead of doing it.\n\n3. **Check if it's already done.** Run `git log --grep='<task-id>'`\n   and read the related commits. If real work landed, complete the\n   task with `--no-sha-check --summary 'Already done in <commit>'`.\n\n4. **Make sure your changes don't regress recent functionality.** Many\n   agents have been working on this codebase. Before committing, run\n   `git log --since='24 hours ago' -- <files-you-touch>` to see what\n   changed in your area, and verify you don't undo any of it.\n\n5. **Stay scoped.** Only do what this specific task asks for. Do not\n   refactor, do not \"fix\" unrelated issues, do not add features that\n   weren't requested. Scope creep at this point is regression risk.\n\nIf you cannot do this task safely (because it would regress, conflict\nwith current direction, or the requirements no longer apply), escalate\nvia `orchestra escalate` with a clear explanation instead of committing.\n","status":"done","priority":75,"task_type":"one_shot","frequency":"","assigned_slot":"","started_at":null,"completed_at":"2026-04-19T11:15:03.085944+00:00","updated_at":"2026-04-19T11:15:03.085944+00:00","summary":"","completion_notes":"","last_error":"","time_estimate_hours":0.0,"completion_count":0,"spec_path":"","provider":"any","payload_json":"{\"_reset_note\": \"This task was reset after a database incident on 2026-04-17.\\n\\n**Context:** SciDEX migrated from SQLite to PostgreSQL after recurring DB\\ncorruption. Some work done during Apr 16-17 may have been lost.\\n\\n**Before starting work:**\\n1. 
Check if the task's goal is ALREADY satisfied (run the relevant checks)\\n2. Check `git log --all --grep=task:YOUR_TASK_ID` for prior commits\\n3. If complete, verify and mark done. If partial, continue. If not done, proceed.\\n\\n**DB change:** SciDEX now uses PostgreSQL. `get_db()` auto-detects via\\nSCIDEX_DB_BACKEND=postgres env var.\", \"_reset_at\": \"2026-04-18T06:29:22.046013+00:00\", \"_reset_from_status\": \"done\"}"},{"id":"t-contradiction-check","title":"Contradiction checking for KG edges","description":"contradiction_check() in epistemic_tiers.py — checks if proposed edge conflicts with T0/T1/T2 edges. API: /api/contradiction-check.\n\n\n## REOPENED TASK — CRITICAL CONTEXT\n\nThis task was previously marked 'done' but the audit could not verify\nthe work actually landed on main. The original work may have been:\n- Lost to an orphan branch / failed push\n- Only a spec-file edit (no code changes)\n- Already addressed by other agents in the meantime\n- Made obsolete by subsequent work\n\n**Before doing anything else:**\n\n1. **Re-evaluate the task in light of CURRENT main state.** Read the\n   spec and the relevant files on origin/main NOW. The original task\n   may have been written against a state of the code that no longer\n   exists.\n\n2. **Verify the task still advances SciDEX's aims.** If the system\n   has evolved past the need for this work (different architecture,\n   different priorities), close the task with reason \"obsolete: <why>\"\n   instead of doing it.\n\n3. **Check if it's already done.** Run `git log --grep='<task-id>'`\n   and read the related commits. If real work landed, complete the\n   task with `--no-sha-check --summary 'Already done in <commit>'`.\n\n4. **Make sure your changes don't regress recent functionality.** Many\n   agents have been working on this codebase. Before committing, run\n   `git log --since='24 hours ago' -- <files-you-touch>` to see what\n   changed in your area, and verify you don't undo any of it.\n\n5. 
**Stay scoped.** Only do what this specific task asks for. Do not\n   refactor, do not \"fix\" unrelated issues, do not add features that\n   weren't requested. Scope creep at this point is regression risk.\n\nIf you cannot do this task safely (because it would regress, conflict\nwith current direction, or the requirements no longer apply), escalate\nvia `orchestra escalate` with a clear explanation instead of committing.\n","status":"done","priority":70,"task_type":"one_shot","frequency":"","assigned_slot":"","started_at":null,"completed_at":"2026-04-20T16:31:55.700892+00:00","updated_at":"2026-04-20T16:31:55.700892+00:00","summary":"","completion_notes":"Auto-completed by supervisor after successful deploy to main","last_error":"","time_estimate_hours":0.0,"completion_count":0,"spec_path":"","provider":"any","payload_json":"{\"_reset_note\": \"This task was reset after a database incident on 2026-04-17.\\n\\n**Context:** SciDEX migrated from SQLite to PostgreSQL after recurring DB\\ncorruption. Some work done during Apr 16-17 may have been lost.\\n\\n**Before starting work:**\\n1. Check if the task's goal is ALREADY satisfied (run the relevant checks)\\n2. Check `git log --all --grep=task:YOUR_TASK_ID` for prior commits\\n3. If complete, verify and mark done. If partial, continue. If not done, proceed.\\n\\n**DB change:** SciDEX now uses PostgreSQL. `get_db()` auto-detects via\\nSCIDEX_DB_BACKEND=postgres env var.\", \"_reset_at\": \"2026-04-18T06:29:22.046013+00:00\", \"_reset_from_status\": \"done\"}"},{"id":"t-epistemic-dashboard","title":"Epistemic health dashboard","description":"/senate/epistemic-health page with tier distribution bars, replication status, falsification results, health score. API: /api/epistemic-health.\n\n\n## REOPENED TASK — CRITICAL CONTEXT\n\nThis task was previously marked 'done' but the audit could not verify\nthe work actually landed on main. 
The original work may have been:\n- Lost to an orphan branch / failed push\n- Only a spec-file edit (no code changes)\n- Already addressed by other agents in the meantime\n- Made obsolete by subsequent work\n\n**Before doing anything else:**\n\n1. **Re-evaluate the task in light of CURRENT main state.** Read the\n   spec and the relevant files on origin/main NOW. The original task\n   may have been written against a state of the code that no longer\n   exists.\n\n2. **Verify the task still advances SciDEX's aims.** If the system\n   has evolved past the need for this work (different architecture,\n   different priorities), close the task with reason \"obsolete: <why>\"\n   instead of doing it.\n\n3. **Check if it's already done.** Run `git log --grep='<task-id>'`\n   and read the related commits. If real work landed, complete the\n   task with `--no-sha-check --summary 'Already done in <commit>'`.\n\n4. **Make sure your changes don't regress recent functionality.** Many\n   agents have been working on this codebase. Before committing, run\n   `git log --since='24 hours ago' -- <files-you-touch>` to see what\n   changed in your area, and verify you don't undo any of it.\n\n5. **Stay scoped.** Only do what this specific task asks for. Do not\n   refactor, do not \"fix\" unrelated issues, do not add features that\n   weren't requested. 
Scope creep at this point is regression risk.\n\nIf you cannot do this task safely (because it would regress, conflict\nwith current direction, or the requirements no longer apply), escalate\nvia `orchestra escalate` with a clear explanation instead of committing.\n","status":"done","priority":65,"task_type":"one_shot","frequency":"","assigned_slot":"","started_at":null,"completed_at":"2026-04-20T20:37:28.263194+00:00","updated_at":"2026-04-20T20:37:28.263194+00:00","summary":"","completion_notes":"Auto-completed by supervisor after successful deploy to main","last_error":"","time_estimate_hours":0.0,"completion_count":0,"spec_path":"","provider":"any","payload_json":"{\"_reset_note\": \"This task was reset after a database incident on 2026-04-17.\\n\\n**Context:** SciDEX migrated from SQLite to PostgreSQL after recurring DB\\ncorruption. Some work done during Apr 16-17 may have been lost.\\n\\n**Before starting work:**\\n1. Check if the task's goal is ALREADY satisfied (run the relevant checks)\\n2. Check `git log --all --grep=task:YOUR_TASK_ID` for prior commits\\n3. If complete, verify and mark done. If partial, continue. If not done, proceed.\\n\\n**DB change:** SciDEX now uses PostgreSQL. `get_db()` auto-detects via\\nSCIDEX_DB_BACKEND=postgres env var.\", \"_reset_at\": \"2026-04-18T06:29:22.046013+00:00\", \"_reset_from_status\": \"done\"}"},{"id":"t-expiry-system","title":"Auto-expiry for speculative and provisional claims","description":"check_expiry(): speculative items > 30 days without promotion → archived. Provisional items > 90 days without evidence → demoted to speculative.\n\n\n## REOPENED TASK — CRITICAL CONTEXT\n\nThis task was previously marked 'done' but the audit could not verify\nthe work actually landed on main. 
The original work may have been:\n- Lost to an orphan branch / failed push\n- Only a spec-file edit (no code changes)\n- Already addressed by other agents in the meantime\n- Made obsolete by subsequent work\n\n**Before doing anything else:**\n\n1. **Re-evaluate the task in light of CURRENT main state.** Read the\n   spec and the relevant files on origin/main NOW. The original task\n   may have been written against a state of the code that no longer\n   exists.\n\n2. **Verify the task still advances SciDEX's aims.** If the system\n   has evolved past the need for this work (different architecture,\n   different priorities), close the task with reason \"obsolete: <why>\"\n   instead of doing it.\n\n3. **Check if it's already done.** Run `git log --grep='<task-id>'`\n   and read the related commits. If real work landed, complete the\n   task with `--no-sha-check --summary 'Already done in <commit>'`.\n\n4. **Make sure your changes don't regress recent functionality.** Many\n   agents have been working on this codebase. Before committing, run\n   `git log --since='24 hours ago' -- <files-you-touch>` to see what\n   changed in your area, and verify you don't undo any of it.\n\n5. **Stay scoped.** Only do what this specific task asks for. Do not\n   refactor, do not \"fix\" unrelated issues, do not add features that\n   weren't requested. 
Scope creep at this point is regression risk.\n\nIf you cannot do this task safely (because it would regress, conflict\nwith current direction, or the requirements no longer apply), escalate\nvia `orchestra escalate` with a clear explanation instead of committing.\n","status":"done","priority":60,"task_type":"one_shot","frequency":"","assigned_slot":"","started_at":null,"completed_at":"2026-04-20T20:24:46.903134+00:00","updated_at":"2026-04-20T20:24:46.903134+00:00","summary":"","completion_notes":"","last_error":"","time_estimate_hours":0.0,"completion_count":0,"spec_path":"","provider":"any","payload_json":"{\"_reset_note\": \"This task was reset after a database incident on 2026-04-17.\\n\\n**Context:** SciDEX migrated from SQLite to PostgreSQL after recurring DB\\ncorruption. Some work done during Apr 16-17 may have been lost.\\n\\n**Before starting work:**\\n1. Check if the task's goal is ALREADY satisfied (run the relevant checks)\\n2. Check `git log --all --grep=task:YOUR_TASK_ID` for prior commits\\n3. If complete, verify and mark done. If partial, continue. If not done, proceed.\\n\\n**DB change:** SciDEX now uses PostgreSQL. `get_db()` auto-detects via\\nSCIDEX_DB_BACKEND=postgres env var.\", \"_reset_at\": \"2026-04-18T06:29:22.046013+00:00\", \"_reset_from_status\": \"done\"}"}],"reviews":[],"effectiveness":{},"spec_content":"---\ntitle: \"Epistemic Rigor — Falsifiable Hypotheses, Evidence Chains, and Trust Propagation\"\ndescription: \"Quest to evolve SciDEX toward rigorous scientific epistemology\"\nstatus: active\npriority: 95\nlayer: Cross-cutting\nquest: q-epistemic-rigor\n---\n\n# Quest: Epistemic Rigor\n\n## Vision\n\nEvery claim in SciDEX should be **falsifiable, traceable, versioned, and trust-scored**.\n\nToday, hypotheses are scored by composite metrics but lack explicit testable predictions.\nEvidence is stored as JSON blobs without provenance. 
Knowledge graph edges have no trust scores.\nThere's no dependency structure between hypotheses, experiments, and evidence. Score changes\nhappen without structured justification.\n\nThis quest transforms SciDEX from \"we scored this hypothesis 0.73\" to \"this hypothesis\npredicts X, experiment Y tested it, the result was Z, which updated our confidence because\nof evidence chain A->B->C, each link traceable to ground truth with trust score T.\"\n\n## Current State (What Exists)\n\n| Component | Status | Gap |\n|-----------|--------|-----|\n| 10-dimension hypothesis scoring | Working | No explicit predictions or falsification criteria |\n| Evidence for/against (JSON) | Working | Unstructured, no provenance, no methodology |\n| Evidence validation (PMID relevance) | Working | Scores relevance, not trustworthiness |\n| Belief snapshots (time series) | Working | Tracks score evolution but not WHY scores changed |\n| Debate quality scoring | Working | Includes falsifiability dimension, but not structured |\n| Persona believability | Working | Per-dimension credibility, but no update from outcomes |\n| Experiments table | Working | No results storage, no prediction-vs-reality comparison |\n| Knowledge graph edges | Working | evidence_strength field but no trust model |\n| Price history with events | Working | event_source is free text, not structured provenance |\n| Quality gates | Working | Code quality, not epistemic quality |\n\n## Architecture: 8 Tasks in Dependency Order\n\n```\nTask 1: Predictions Table ──┬──> Task 2: Experiment Results\n                            │\nTask 3: Evidence Chains ────┼──> Task 4: Trust Scores on KG\n                            │\n                            ├──> Task 5: Dependency Graph\n                            │\nTask 6: Versioning/Audit ───┘\n                            \nTask 7: Knowledge Units (depends on 3, 4, 6)\n\nTask 8: Epistemic Dashboard (depends on all above)\n```\n\n## Key Design Principles\n\n1. 
**Ground truth anchoring**: Every claim must trace to a paper, experiment, or dataset\n2. **Bayesian updating**: New evidence should update confidence via structured reasoning, not just recompute\n3. **Falsifiability first**: Hypotheses without predictions are speculation, not science\n4. **Trust propagation**: Downstream conclusions inherit (diminished) trust from upstream evidence\n5. **Audit completeness**: Every score change has a structured justification\n6. **Composability**: Evidence blocks are atomic, addressable, and combinable\n7. **Incremental delivery**: Each task is independently valuable and deployable\n8. **Experiment-boosted ranking**: Hypotheses with explicit falsifiable predictions AND high-quality associated experiments (feasible, impactful) should receive a composite score boost. The system should reward hypotheses that are not just well-scored but actively testable with concrete, feasible experiments.\n\n## Hypothesis-Experiment Scoring Feedback Loop\n\nThe composite score formula should incorporate a **testability bonus** that rewards hypotheses linked to high-quality experiments:\n\n### Scoring Considerations\n\n1. **Falsifiability bonus**: Hypotheses with explicit, testable predictions (via `hypothesis_predictions` table) receive an additive score bonus. A hypothesis that merely claims \"X causes Y\" ranks lower than one that predicts \"If X, then Y should be measurable as Z with effect size > threshold.\"\n\n2. **Experiment quality signal**: When a hypothesis has associated experiments (via `experiments.hypothesis_ids`), the experiment's own quality scores feed back into the hypothesis ranking:\n   - **Feasibility**: A hypothesis testable by a practical, affordable experiment is more valuable than one requiring impossible resources\n   - **Impact**: A hypothesis whose experiment would significantly update the world model (high information gain) ranks higher\n   - **Experiment composite** = feasibility × 0.4 + impact × 0.4 + novelty × 0.2\n\n3. 
**Combined boost formula**:\n   ```\n   testability_bonus = 0.0\n   if has_falsifiable_predictions:\n       testability_bonus += 0.05\n   if has_associated_experiments:\n       avg_experiment_quality = mean(exp.composite for exp in linked_experiments)\n       testability_bonus += 0.10 * avg_experiment_quality\n   adjusted_composite = base_composite + testability_bonus\n   ```\n\n4. **Virtuous cycle**: This creates a feedback loop where:\n   - Hypotheses with predictions attract experiment design\n   - High-quality experiments boost hypothesis ranking\n   - Higher-ranked hypotheses get more attention and resources\n   - More attention produces better predictions and experiments\n\n### Implementation Notes\n\n- The testability bonus should be computed in `post_process.py` alongside the existing composite score\n- Requires Task 1 (predictions table) and Task 2 (experiment results) to be complete\n- The bonus is additive, not multiplicative, to avoid runaway scores\n- Experiments without results still provide a feasibility/impact signal\n- Cap the total bonus at 0.15 to prevent gaming\n\n## Key Files (Existing)\n\n- `exchange.py` — 10-dim scoring, believability weighting, allocation\n- `senate_proposals.py` — Evidence strength scoring (papers + citations + recency)\n- `evidence_validator.py` — PMID relevance scoring via Claude Haiku\n- `belief_tracker.py` — Temporal belief snapshots, convergence metrics\n- `backfill_debate_quality.py` — 4-dim debate quality (includes falsifiability)\n- `quality_gates.py` — Pre-merge, post-completion, prevention gates\n- `market_dynamics.py` — LMSR price model with event logging\n\n## Key Tables (Existing)\n\n- `hypotheses` — evidence_for/against (JSON), evidence_validation_score\n- `hypothesis_papers` — Junction: hypothesis <-> PMID with direction/claim/strength\n- `papers` — PubMed metadata (pmid, title, abstract, citations, year)\n- `knowledge_edges` — source/target with evidence_strength (REAL)\n- `experiments` — hypothesis_ids 
(JSON), protocol, expected_outcomes, status\n- `debate_sessions` — transcript_json, quality_score\n- `debate_rounds` — evidence_cited (JSON), hypotheses_referenced\n- `belief_snapshots` — Time series of hypothesis score + evidence count\n- `price_history` — Score changes with event_type and event_source\n- `persona_believability` — Per-persona, per-dimension credibility\n- `edit_history` — Generic audit log (actor, content_type, diff, reason)\n\n## Workstreams\n\n### WS-rigor-ruleset — Absorb Alpha1 Science's 8-dimension biomedical rigor rubric\n\nAbsorb Alpha1 Science's 8-dimension rigor rubric (scientific premise,\nstudy design, blinding, power analysis, resource identification,\nstatistical reporting, data availability + 1 TBD) grounded in\n**NIH / MDAR / ARRIVE 2.0 / CONSORT / EQUATOR** biomedical reporting\nguidelines. Every hypothesis and analysis gets a rigor score card.\nEvery score carries an **evidence citation** pointing to the exact\ntext it was derived from. Score card becomes a new artifact type;\noptionally publishable to the SciDEX community view. See\n[`docs/bio_competitive/alpha1_science_profile.md`](../../bio_competitive/alpha1_science_profile.md).\n\n**Deliverables:**\n- Rubric dictionary: JSON mapping NIH / MDAR / ARRIVE 2.0 / CONSORT /\n  EQUATOR items to the 8 dimensions, with specific guideline-item\n  pointers per dimension.\n- Two-agent independent evaluation pipeline. 
Reuse the Skeptic\n  persona; the second agent is a separately-seeded Skeptic instance\n  (different prompt seed or different provider) to preserve\n  independence — matches Alpha1's 2-independent-agent design.\n- Rigor score card as a first-class artifact type in the `artifacts`\n  table, with lineage to the hypothesis or analysis it scores.\n- Evidence-citation schema: every score row carries a\n  `source_quote` + `source_location` (PMID / page / paragraph) so a\n  reviewer can verify the score without re-reading the whole paper.\n- Community-publish surface: optional Atlas wiki page per score card\n  for the ones authors opt to publish.\n- Recurring Senate-layer task to produce score cards for the\n  backlog; see\n  [`docs/planning/specs/task-id-pending_rigor_score_card_spec.md`](task-id-pending_rigor_score_card_spec.md).\n\n**Dependency on existing tasks:**\n- Task 3 (Evidence Chains) provides the structured-evidence table\n  the rigor score card cites.\n- Task 4 (Trust Scores on KG) consumes the rigor score to weight KG\n  edges drawn from papers with a published rigor score.\n- Task 7 (Knowledge Units) can include a rigor-score summary as one\n  of its atomic evidence blocks.\n\n**Success metric:** every hypothesis and every ≥50KB analysis in the\ntrailing 30 days has a rigor score card with ≥95% of scores carrying\nan evidence citation; inter-rater agreement between the two\nindependent agents ≥0.7 (Cohen's κ or equivalent).\n\n## Related Quests\n\n| Quest | Relationship |\n|-------|-------------|\n| **Experiment Extraction** (q-experiment-extraction) | Structured experiments are the ground-truth anchors for evidence chains |\n| **Artifact Debates** (q-artifact-debates) | Any artifact can be debated, accumulating evidence about its quality |\n| **Schema Governance** (q-schema-governance) | Evidence schemas evolve through governance to maintain integrity |\n| **Artifacts** (8db4834c-51e) | All evidence is stored as artifacts with lineage and provenance |\n| 
**Competitive Biotools** (q-competitive-biotools) | Tracks Alpha1 Science + PRISM; the WS-rigor-ruleset workstream absorbs Alpha1's 8-dim rubric into this quest |\n\n### How These Quests Interlock\n\nThe epistemic rigor vision depends on three supporting quests:\n\n1. **Experiment Extraction** provides the **ground truth** — structured experimental results\n   with p-values, effect sizes, and methodology that anchor evidence chains to reality.\n\n2. **Artifact Debates** provides the **self-correction mechanism** — when evidence is\n   contested, structured debates accumulate arguments for and against, and quality scores\n   update based on debate outcomes.\n\n3. **Schema Governance** provides the **integrity guarantee** — as we learn what evidence\n   structure is useful, schemas evolve through agent governance rather than ad-hoc changes,\n   ensuring data remains queryable and trustworthy.\n\nTogether: experiments ground claims in data, debates correct errors, governance maintains integrity.\n\n## Success Metrics\n\n- **Falsifiability coverage**: >80% of hypotheses have explicit testable predictions\n- **Provenance coverage**: >90% of evidence claims traced to source (paper/experiment/debate)\n- **Trust score coverage**: 100% of KG edges have computed trust scores\n- **Audit completeness**: 100% of score changes have structured justifications\n- **Dependency graph**: All hypothesis-experiment links are bidirectional and queryable\n- **Experiment grounding**: >500 structured experiments extracted, each traceable to source paper\n- **Debate breadth**: >5 artifact types have been debated (not just hypotheses)\n- **Schema integrity**: All artifact types have governed schemas with validation\n\n## Code Quality Requirements\n\nAll code produced by this quest must:\n- Use shared `database.py` for DB connections (not local `get_db()` definitions)\n- Include migration testing (`--dry-run` verification before apply)\n- Add tests for trust computation, Bayesian score 
updates, and provenance chain traversal\n- New modules must be < 500 lines; split if larger\n- No duplicate utility functions — reuse from `pubmed_utils.py`, `kg_extraction_utils.py`\n- Schema changes reviewed for index coverage and query performance\n","spec_html":"<div style=\"font-size:0.85rem\"><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><h2 style=\"color:#4fc3f7;margin:1.5rem 0 0.6rem;font-size:1.2rem;font-weight:700\">Quest: Epistemic Rigor</h2></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><h3 style=\"color:#4fc3f7;margin:1.4rem 0 0.5rem;font-size:1.1rem;font-weight:700;border-bottom:2px solid rgba(79,195,247,0.3);padding-bottom:0.2rem\">Vision</h3></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\">Every claim in SciDEX should be <strong style=\"color:#e0e0e0\">falsifiable, traceable, versioned, and trust-scored</strong>.</p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\">Today, hypotheses are scored by composite metrics but lack explicit testable predictions.<br>Evidence is stored as JSON blobs without provenance. Knowledge graph edges have no trust scores.<br>There&#x27;s no dependency structure between hypotheses, experiments, and evidence. 
Score changes<br>happen without structured justification.</p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\">This quest transforms SciDEX from &quot;we scored this hypothesis 0.73&quot; to &quot;this hypothesis<br>predicts X, experiment Y tested it, the result was Z, which updated our confidence because<br>of evidence chain A-&gt;B-&gt;C, each link traceable to ground truth with trust score T.&quot;</p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><h3 style=\"color:#4fc3f7;margin:1.4rem 0 0.5rem;font-size:1.1rem;font-weight:700;border-bottom:2px solid rgba(79,195,247,0.3);padding-bottom:0.2rem\">Current State (What Exists)</h3></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><table style=\"width:100%;border-collapse:collapse;margin:0.5rem 0;background:#151525;border-radius:6px;overflow:hidden\"><thead><tr><th style=\"padding:0.3rem 0.6rem;border-bottom:2px solid rgba(79,195,247,0.3);color:#4fc3f7;font-size:0.8rem;text-align:left\">Component</th><th style=\"padding:0.3rem 0.6rem;border-bottom:2px solid rgba(79,195,247,0.3);color:#4fc3f7;font-size:0.8rem;text-align:left\">Status</th><th style=\"padding:0.3rem 0.6rem;border-bottom:2px solid rgba(79,195,247,0.3);color:#4fc3f7;font-size:0.8rem;text-align:left\">Gap</th></tr></thead><tbody><tr><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">10-dimension hypothesis scoring</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Working</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">No explicit predictions or falsification criteria</td></tr><tr><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Evidence for/against (JSON)</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid 
rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Working</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Unstructured, no provenance, no methodology</td></tr><tr><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Evidence validation (PMID relevance)</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Working</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Scores relevance, not trustworthiness</td></tr><tr><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Belief snapshots (time series)</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Working</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Tracks score evolution but not WHY scores changed</td></tr><tr><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Debate quality scoring</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Working</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Includes falsifiability dimension, but not structured</td></tr><tr><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Persona believability</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Working</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Per-dimension credibility, but no update from outcomes</td></tr><tr><td 
style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Experiments table</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Working</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">No results storage, no prediction-vs-reality comparison</td></tr><tr><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Knowledge graph edges</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Working</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">evidence_strength field but no trust model</td></tr><tr><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Price history with events</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Working</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">event_source is free text, not structured provenance</td></tr><tr><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Quality gates</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Working</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Code quality, not epistemic quality</td></tr></tbody></table>\n<h3 style=\"color:#4fc3f7;margin:1.4rem 0 0.5rem;font-size:1.1rem;font-weight:700;border-bottom:2px solid rgba(79,195,247,0.3);padding-bottom:0.2rem\">Architecture: 8 Tasks in Dependency Order</h3></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><pre 
style=\"background:#0a0a14;padding:0.8rem;border-radius:6px;border:1px solid rgba(79,195,247,0.15);color:#e0e0e0;font-size:0.8rem;overflow-x:auto;margin:0.5rem 0;line-height:1.5\"><code>Task 1: Predictions Table ──┬──&gt; Task 2: Experiment Results\n                            │\nTask 3: Evidence Chains ────┼──&gt; Task 4: Trust Scores on KG\n                            │\n                            ├──&gt; Task 5: Dependency Graph\n                            │\nTask 6: Versioning/Audit ───┘\n                            \nTask 7: Knowledge Units (depends on 3, 4, 6)\n\nTask 8: Epistemic Dashboard (depends on all above)</code></pre></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><h3 style=\"color:#4fc3f7;margin:1.4rem 0 0.5rem;font-size:1.1rem;font-weight:700;border-bottom:2px solid rgba(79,195,247,0.3);padding-bottom:0.2rem\">Key Design Principles</h3></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Ground truth anchoring</strong>: Every claim must trace to a paper, experiment, or dataset</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Bayesian updating</strong>: New evidence should update confidence via structured reasoning, not just recompute</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Falsifiability first</strong>: Hypotheses without predictions are speculation, not science</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Trust propagation</strong>: Downstream conclusions inherit (diminished) trust from upstream evidence</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Audit completeness</strong>: Every score change has a structured justification</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Composability</strong>: Evidence blocks are atomic, addressable, and combinable</li>\n<li style=\"margin:0.15rem 
0;color:#bbb\"><strong style=\"color:#e0e0e0\">Incremental delivery</strong>: Each task is independently valuable and deployable</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Experiment-boosted ranking</strong>: Hypotheses with explicit falsifiable predictions AND high-quality associated experiments (feasible, impactful) should receive a composite score boost. The system should reward hypotheses that are not just well-scored but actively testable with concrete, feasible experiments.</li></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><h3 style=\"color:#4fc3f7;margin:1.4rem 0 0.5rem;font-size:1.1rem;font-weight:700;border-bottom:2px solid rgba(79,195,247,0.3);padding-bottom:0.2rem\">Hypothesis-Experiment Scoring Feedback Loop</h3></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\">The composite score formula should incorporate a <strong style=\"color:#e0e0e0\">testability bonus</strong> that rewards hypotheses linked to high-quality experiments:</p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><h4 style=\"color:#e0e0e0;margin:1.2rem 0 0.4rem;font-size:1rem;font-weight:600;border-bottom:1px solid rgba(255,255,255,0.08);padding-bottom:0.2rem\">Scoring Considerations</h4></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Falsifiability bonus</strong>: Hypotheses with explicit, testable predictions (via <code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">hypothesis_predictions</code> table) receive a score multiplier. 
A hypothesis that merely claims &quot;X causes Y&quot; ranks lower than one that predicts &quot;If X, then Y should be measurable as Z with effect size &gt; threshold.&quot;</li></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Experiment quality signal</strong>: When a hypothesis has associated experiments (via <code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">experiments.hypothesis_ids</code>), the experiment&#x27;s own quality scores feed back into the hypothesis ranking:</li>\n<ul style=\"padding-left:1.5rem;margin:0.4rem 0\"><li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Feasibility</strong>: A hypothesis testable by a practical, affordable experiment is more valuable than one requiring impossible resources</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Impact</strong>: A hypothesis whose experiment would significantly update the world model (high information gain) ranks higher</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Experiment composite</strong> = feasibility × 0.4 + impact × 0.4 + novelty × 0.2</li></ul></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Combined boost formula</strong>:</li>\n<pre style=\"background:#0a0a14;padding:0.8rem;border-radius:6px;border:1px solid rgba(79,195,247,0.15);color:#e0e0e0;font-size:0.8rem;overflow-x:auto;margin:0.5rem 0;line-height:1.5\"><code>testability_bonus = 0.0\nif has_falsifiable_predictions:\n    testability_bonus += 0.05\nif has_associated_experiments:\n    avg_experiment_quality = mean(exp.composite for exp in linked_experiments)\n    testability_bonus += 0.10 * avg_experiment_quality\nadjusted_composite = base_composite + testability_bonus</code></pre></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Virtuous cycle</strong>: This creates a feedback loop 
where:</li>\n<ul style=\"padding-left:1.5rem;margin:0.4rem 0\"><li style=\"margin:0.15rem 0;color:#bbb\">Hypotheses with predictions attract experiment design</li>\n<li style=\"margin:0.15rem 0;color:#bbb\">High-quality experiments boost hypothesis ranking</li>\n<li style=\"margin:0.15rem 0;color:#bbb\">Higher-ranked hypotheses get more attention and resources</li>\n<li style=\"margin:0.15rem 0;color:#bbb\">More attention produces better predictions and experiments</li></ul></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><h4 style=\"color:#e0e0e0;margin:1.2rem 0 0.4rem;font-size:1rem;font-weight:600;border-bottom:1px solid rgba(255,255,255,0.08);padding-bottom:0.2rem\">Implementation Notes</h4></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><ul style=\"padding-left:1.5rem;margin:0.4rem 0\"><li style=\"margin:0.15rem 0;color:#bbb\">The testability bonus should be computed in <code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">post_process.py</code> alongside the existing composite score</li>\n<li style=\"margin:0.15rem 0;color:#bbb\">Requires Task 1 (predictions table) and Task 2 (experiment results) to be complete</li>\n<li style=\"margin:0.15rem 0;color:#bbb\">The bonus is additive, not multiplicative, to avoid runaway scores</li>\n<li style=\"margin:0.15rem 0;color:#bbb\">Experiments without results still provide a feasibility/impact signal</li>\n<li style=\"margin:0.15rem 0;color:#bbb\">Cap the total bonus at 0.15 to prevent gaming</li>\n</ul>\n<h3 style=\"color:#4fc3f7;margin:1.4rem 0 0.5rem;font-size:1.1rem;font-weight:700;border-bottom:2px solid rgba(79,195,247,0.3);padding-bottom:0.2rem\">Key Files (Existing)</h3></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><ul style=\"padding-left:1.5rem;margin:0.4rem 0\"><li style=\"margin:0.15rem 0;color:#bbb\"><code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">exchange.py</code> — 10-dim scoring, believability weighting, allocation</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 
0.3rem;border-radius:3px;font-size:0.85em\">senate_proposals.py</code> — Evidence strength scoring (papers + citations + recency)</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">evidence_validator.py</code> — PMID relevance scoring via Claude Haiku</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">belief_tracker.py</code> — Temporal belief snapshots, convergence metrics</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">backfill_debate_quality.py</code> — 4-dim debate quality (includes falsifiability)</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">quality_gates.py</code> — Pre-merge, post-completion, prevention gates</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">market_dynamics.py</code> — LMSR price model with event logging</li>\n</ul>\n<h3 style=\"color:#4fc3f7;margin:1.4rem 0 0.5rem;font-size:1.1rem;font-weight:700;border-bottom:2px solid rgba(79,195,247,0.3);padding-bottom:0.2rem\">Key Tables (Existing)</h3></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><ul style=\"padding-left:1.5rem;margin:0.4rem 0\"><li style=\"margin:0.15rem 0;color:#bbb\"><code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">hypotheses</code> — evidence_for/against (JSON), evidence_validation_score</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">hypothesis_papers</code> — Junction: hypothesis &lt;-&gt; 
PMID with direction/claim/strength</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">papers</code> — PubMed metadata (pmid, title, abstract, citations, year)</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">knowledge_edges</code> — source/target with evidence_strength (REAL)</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">experiments</code> — hypothesis_ids (JSON), protocol, expected_outcomes, status</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">debate_sessions</code> — transcript_json, quality_score</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">debate_rounds</code> — evidence_cited (JSON), hypotheses_referenced</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">belief_snapshots</code> — Time series of hypothesis score + evidence count</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">price_history</code> — Score changes with event_type and event_source</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">persona_believability</code> — Per-persona, per-dimension credibility</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 
0.3rem;border-radius:3px;font-size:0.85em\">edit_history</code> — Generic audit log (actor, content_type, diff, reason)</li>\n</ul>\n<h3 style=\"color:#4fc3f7;margin:1.4rem 0 0.5rem;font-size:1.1rem;font-weight:700;border-bottom:2px solid rgba(79,195,247,0.3);padding-bottom:0.2rem\">Workstreams</h3></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><h4 style=\"color:#e0e0e0;margin:1.2rem 0 0.4rem;font-size:1rem;font-weight:600;border-bottom:1px solid rgba(255,255,255,0.08);padding-bottom:0.2rem\">WS-rigor-ruleset — Absorb Alpha1 Science&#x27;s 8-dimension biomedical rigor rubric</h4></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\">Absorb Alpha1 Science&#x27;s 8-dimension rigor rubric (scientific premise, study design, blinding, power analysis, resource identification, statistical reporting, data availability + 1 TBD) grounded in <strong style=\"color:#e0e0e0\">NIH / MDAR / ARRIVE 2.0 / CONSORT / EQUATOR</strong> biomedical reporting guidelines. Every hypothesis and analysis gets a rigor score card, and every score carries an <strong style=\"color:#e0e0e0\">evidence citation</strong> pointing to the exact text it was derived from. The score card becomes a new artifact type, optionally publishable to the SciDEX community view. 
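One score-card row could carry the fields below (a minimal Python sketch; the dataclass shape and any field names beyond <code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">source_quote</code> and <code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">source_location</code> are illustrative assumptions, not the shipped schema):</p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><pre style=\"background:#0a0a14;padding:0.8rem;border-radius:6px;border:1px solid rgba(79,195,247,0.15);color:#e0e0e0;font-size:0.8rem;overflow-x:auto;margin:0.5rem 0;line-height:1.5\"><code>from dataclasses import dataclass\n\n@dataclass\nclass RigorScoreRow:\n    dimension: str        # one of the 8 rubric dimensions, e.g. blinding\n    score: float          # 0.0 to 1.0 for this dimension\n    guideline_item: str   # guideline pointer, e.g. an ARRIVE 2.0 item id\n    source_quote: str     # exact text the score was derived from\n    source_location: str  # PMID / page / paragraph\n\ndef card_composite(rows):\n    # Unweighted mean across dimensions; weighting is an open design choice\n    return sum(r.score for r in rows) / len(rows)</code></pre>Each row stays independently verifiable: a reviewer checks the quote against the source without re-scoring the whole card.</p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\">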
See [<code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">docs/bio_competitive/alpha1_science_profile.md</code>](../../bio_competitive/alpha1_science_profile.md).</p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><strong style=\"color:#e0e0e0\">Deliverables:</strong>\n<ul style=\"padding-left:1.5rem;margin:0.4rem 0\"><li style=\"margin:0.15rem 0;color:#bbb\">Rubric dictionary: JSON mapping NIH / MDAR / ARRIVE 2.0 / CONSORT / EQUATOR items to the 8 dimensions, with specific guideline-item pointers per dimension.</li>\n<li style=\"margin:0.15rem 0;color:#bbb\">Two-agent independent evaluation pipeline. Reuse the Skeptic persona; the second agent is a separately-seeded Skeptic instance (different prompt seed or different provider) to preserve independence — matches Alpha1&#x27;s 2-independent-agent design.</li>\n<li style=\"margin:0.15rem 0;color:#bbb\">Rigor score card as a first-class artifact type in the <code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">artifacts</code> table, with lineage to the hypothesis or analysis it scores.</li>\n<li style=\"margin:0.15rem 0;color:#bbb\">Evidence-citation schema: every score row carries a <code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">source_quote</code> + <code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">source_location</code> (PMID / page / paragraph) so a reviewer can verify the score without re-reading the whole paper.</li>\n<li style=\"margin:0.15rem 0;color:#bbb\">Community-publish surface: optional Atlas wiki page per score card for the ones authors opt to publish.</li>\n<li style=\"margin:0.15rem 0;color:#bbb\">Recurring Senate-layer task to produce score cards for the backlog; see [<code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">docs/planning/specs/task-id-pending_rigor_score_card_spec.md</code>](task-id-pending_rigor_score_card_spec.md).</li>\n</ul></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><strong style=\"color:#e0e0e0\">Dependency on existing tasks:</strong>\n<ul style=\"padding-left:1.5rem;margin:0.4rem 0\"><li style=\"margin:0.15rem 0;color:#bbb\">Task 3 (Evidence Chains) provides the structured-evidence table the rigor score card cites.</li>\n<li style=\"margin:0.15rem 0;color:#bbb\">Task 4 (Trust Scores on KG) consumes the rigor score to weight KG edges drawn from papers with a published rigor score.</li>\n<li style=\"margin:0.15rem 0;color:#bbb\">Task 7 (Knowledge Units) can include a rigor-score summary as one of its atomic evidence blocks.</li>\n</ul></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><strong style=\"color:#e0e0e0\">Success metric:</strong> every hypothesis and every ≥50KB analysis in the trailing 30 days has a rigor score card with ≥95% of scores carrying an evidence citation; inter-rater agreement between the two independent agents ≥0.7 (Cohen&#x27;s κ or equivalent).</p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><h3 style=\"color:#4fc3f7;margin:1.4rem 0 0.5rem;font-size:1.1rem;font-weight:700;border-bottom:2px solid rgba(79,195,247,0.3);padding-bottom:0.2rem\">Related Quests</h3></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><table style=\"width:100%;border-collapse:collapse;margin:0.5rem 0;background:#151525;border-radius:6px;overflow:hidden\"><thead><tr><th 
style=\"padding:0.3rem 0.6rem;border-bottom:2px solid rgba(79,195,247,0.3);color:#4fc3f7;font-size:0.8rem;text-align:left\">Quest</th><th style=\"padding:0.3rem 0.6rem;border-bottom:2px solid rgba(79,195,247,0.3);color:#4fc3f7;font-size:0.8rem;text-align:left\">Relationship</th></tr></thead><tbody><tr><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\"><strong style=\"color:#e0e0e0\">Experiment Extraction</strong> (q-experiment-extraction)</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Structured experiments are the ground-truth anchors for evidence chains</td></tr><tr><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\"><strong style=\"color:#e0e0e0\">Artifact Debates</strong> (q-artifact-debates)</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Any artifact can be debated, accumulating evidence about its quality</td></tr><tr><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\"><strong style=\"color:#e0e0e0\">Schema Governance</strong> (q-schema-governance)</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Evidence schemas evolve through governance to maintain integrity</td></tr><tr><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\"><strong style=\"color:#e0e0e0\">Artifacts</strong> (8db4834c-51e)</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">All evidence is stored as artifacts with lineage and provenance</td></tr><tr><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\"><strong style=\"color:#e0e0e0\">Competitive 
Biotools</strong> (q-competitive-biotools)</td><td style=\"padding:0.3rem 0.6rem;border-bottom:1px solid rgba(255,255,255,0.05);color:#bbb;font-size:0.8rem\">Tracks Alpha1 Science + PRISM; the WS-rigor-ruleset workstream absorbs Alpha1&#x27;s 8-dim rubric into this quest</td></tr></tbody></table>\n<h4 style=\"color:#e0e0e0;margin:1.2rem 0 0.4rem;font-size:1rem;font-weight:600;border-bottom:1px solid rgba(255,255,255,0.08);padding-bottom:0.2rem\">How These Quests Interlock</h4></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\">The epistemic rigor vision depends on three supporting quests:</p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Experiment Extraction</strong> provides the <strong style=\"color:#e0e0e0\">ground truth</strong> — structured experimental results with p-values, effect sizes, and methodology that anchor evidence chains to reality.</li></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Artifact Debates</strong> provides the <strong style=\"color:#e0e0e0\">self-correction mechanism</strong> — when evidence is contested, structured debates accumulate arguments for and against, and quality scores update based on debate outcomes.</li></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Schema Governance</strong> provides the <strong style=\"color:#e0e0e0\">integrity guarantee</strong> — as we learn what evidence structure is useful, schemas evolve through agent governance rather than ad-hoc changes, ensuring data remains queryable and trustworthy.</li></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\">Together: experiments ground claims in data, debates correct errors, governance maintains integrity.</p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><h3 
style=\"color:#4fc3f7;margin:1.4rem 0 0.5rem;font-size:1.1rem;font-weight:700;border-bottom:2px solid rgba(79,195,247,0.3);padding-bottom:0.2rem\">Success Metrics</h3></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\"><ul style=\"padding-left:1.5rem;margin:0.4rem 0\"><li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Falsifiability coverage</strong>: &gt;80% of hypotheses have explicit testable predictions</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Provenance coverage</strong>: &gt;90% of evidence claims traced to source (paper/experiment/debate)</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Trust score coverage</strong>: 100% of KG edges have computed trust scores</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Audit completeness</strong>: 100% of score changes have structured justifications</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Dependency graph</strong>: All hypothesis-experiment links are bidirectional and queryable</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Experiment grounding</strong>: &gt;500 structured experiments extracted, each traceable to source paper</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Debate breadth</strong>: &gt;5 artifact types have been debated (not just hypotheses)</li>\n<li style=\"margin:0.15rem 0;color:#bbb\"><strong style=\"color:#e0e0e0\">Schema integrity</strong>: All artifact types have governed schemas with validation</li>\n</ul>\n<h3 style=\"color:#4fc3f7;margin:1.4rem 0 0.5rem;font-size:1.1rem;font-weight:700;border-bottom:2px solid rgba(79,195,247,0.3);padding-bottom:0.2rem\">Code Quality Requirements</h3></p><p style=\"color:#bbb;line-height:1.6;margin:0.4rem 0\">All code produced by this quest must:\n<ul style=\"padding-left:1.5rem;margin:0.4rem 0\"><li style=\"margin:0.15rem 
0;color:#bbb\">Use shared <code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">database.py</code> for DB connections (not local <code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">get_db()</code> definitions)</li>\n<li style=\"margin:0.15rem 0;color:#bbb\">Include migration testing (<code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">--dry-run</code> verification before apply)</li>\n<li style=\"margin:0.15rem 0;color:#bbb\">Add tests for trust computation, Bayesian score updates, and provenance chain traversal</li>\n<li style=\"margin:0.15rem 0;color:#bbb\">New modules must be &lt; 500 lines; split if larger</li>\n<li style=\"margin:0.15rem 0;color:#bbb\">No duplicate utility functions — reuse from <code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">pubmed_utils.py</code>, <code style=\"background:#1a1a2e;color:#ce93d8;padding:0.1rem 0.3rem;border-radius:3px;font-size:0.85em\">kg_extraction_utils.py</code></li>\n<li style=\"margin:0.15rem 0;color:#bbb\">Schema changes reviewed for index coverage and query performance</li>\n</ul></p></div>","spec_file":"quest_epistemic_rigor.md","commits":[{"hash":"59405c7c5","message":"docs: AGENTS.md — document Path A/B/C task completion semantics [task:docs-agents-completion] (#40)","date":"2026-04-25"},{"hash":"e5b5848a0","message":"WIP on orchestra/task/8fcc8dc8-debate-artifact-version-pinning-referenc: 8a24c2fa2 [Senate] Delete broken restore_database.sh (#38)","date":"2026-04-25"},{"hash":"50e5ffcfe","message":"index on orchestra/task/8fcc8dc8-debate-artifact-version-pinning-referenc: 8a24c2fa2 [Senate] Delete broken restore_database.sh (#38)","date":"2026-04-25"},{"hash":"0d37f5fce","message":"untracked files on orchestra/task/8fcc8dc8-debate-artifact-version-pinning-referenc: 8a24c2fa2 [Senate] Delete broken 
restore_database.sh (#38)","date":"2026-04-25"},{"hash":"48f8d2fe3","message":"feat: surface all five SciDEX layers in nav [task:cba19c94-1724-4d5a-b89d-96c73c25f12a] (#39)","date":"2026-04-25"},{"hash":"1f0e35929","message":"Squash merge: orchestra/task/b1a8e549-cross-cutting-wire-existing-k-dense-skil (2 commits)","date":"2026-04-25"},{"hash":"ddb7db381","message":"[Agora] Wire existing K-Dense-backed tools into debate orchestration [task:b1a8e549-6f31-43c5-80f5-7c4717c267e4]","date":"2026-04-25"},{"hash":"76b71427a","message":"[Agora] Wire existing K-Dense-backed tools into debate orchestration [task:b1a8e549-6f31-43c5-80f5-7c4717c267e4]","date":"2026-04-25"},{"hash":"779e85c3a","message":"[Senate] Verify /resources dashboard complete; check off acceptance criteria [task:82074adc-507f-4e6b-9092-e2ceee79e7d4]","date":"2026-04-25"},{"hash":"4c66a8e09","message":"[Senate] Establish emergency access recovery procedures [task:e643cdd3-afd6-410f-a366-a6297d112127]","date":"2026-04-25"},{"hash":"7265a06b4","message":"Squash merge: orchestra/task/b1a8e549-cross-cutting-wire-existing-k-dense-skil (1 commits)","date":"2026-04-25"},{"hash":"58406ec64","message":"[Atlas] Dashboard artifact type: living web views with data source rendering [task:a17-28-DASH0001]","date":"2026-04-25"},{"hash":"8a24c2fa2","message":"[Senate] Delete broken restore_database.sh (#38)","date":"2026-04-25"},{"hash":"b98a1fa18","message":"[Senate] Delete broken restore_database.sh","date":"2026-04-25"},{"hash":"e846f82ef","message":"[Senate] Refresh BACKUP_RESTORE.md + docs/runbooks/emergency_restore.md (#37)","date":"2026-04-25"},{"hash":"43972a45e","message":"[Senate] Refresh BACKUP_RESTORE.md + docs/runbooks/emergency_restore.md","date":"2026-04-25"},{"hash":"2c7dbfe7f","message":"[Senate] Delete 9 obsolete backup scripts/units (continuation of Phase A-D cleanup) (#36)","date":"2026-04-25"},{"hash":"9743eb298","message":"[Senate] Delete 9 obsolete backup scripts/units (continuation of Phase A-D 
cleanup)","date":"2026-04-25"},{"hash":"3e72d8383","message":"[Agora] Wire 3 missing tools into debate skill_functions, fix citation persistence bug [task:b1a8e549-6f31-43c5-80f5-7c4717c267e4]","date":"2026-04-25"},{"hash":"4310e9854","message":"[Demo] Work log: figures verified complete — 140/140 analyses covered [task:df201d8f-4b89-4258-9148-eb1028fc1fbd]","date":"2026-04-24"}]}