SciDEX — Task: [Senate] Question emitter

Question-classified comments mint open_question artifacts that enter per-field Elo; title-hash dedup, derives_from link.

Completion Notes

Auto-completed by supervisor after successful deploy to main

Git Commits (2)

Squash merge: orchestra/task/a4c450f7-biomni-analysis-parity-port-15-use-cases (87 commits) (#717) 2026-04-27

[Senate] Question emitter: question comments spawn open_question artifacts [task:4320d55a-47a6-492f-9ad6-096bf1580ac4] (#664) 2026-04-27

Spec File

Goal

Close the question arm of percolation: when a comment classified as question is posted on any artifact, the system mints a new open_question artifact (the artifact_type already used by the open-question
quest b307ad54-a95), seeded with the comment text, attributed to the
comment author, and linked back to the host artifact via a typed artifact_link. The new open_question then enters the per-field Elo ranking
that quest already owns. Today, questions buried in a discussion
disappear; this emitter promotes them to first-class artifacts that compete
for attention.

Acceptance Criteria

☑ New module scidex/senate/question_emitter.py with

scan_candidates, extract_question_text, emit_open_question,
run_once mirroring the action / refutation emitter shape.

☑ Selection: artifact_comments with

comment_type_labels::jsonb @> '["question"]' AND
spawned_open_question_id IS NULL. No consensus required (questions
need not be agreed-upon to be tracked).
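
The selection rule above can be read as a predicate over comment rows. A minimal in-memory sketch, assuming each row is a dict whose comment_type_labels is already decoded to a list of strings; the real module is expected to run the jsonb containment filter in the database, and is_candidate is a hypothetical name used only for illustration:

```python
from typing import Any, Mapping


def is_candidate(row: Mapping[str, Any]) -> bool:
    """Mirror of the SQL filter: labelled 'question' and not yet spawned."""
    labels = row.get("comment_type_labels") or []
    return "question" in labels and row.get("spawned_open_question_id") is None


# Hypothetical rows, shaped like artifact_comments records.
rows = [
    {"id": "c1", "comment_type_labels": ["question"], "spawned_open_question_id": None},
    {"id": "c2", "comment_type_labels": ["question"], "spawned_open_question_id": "oq-1"},
    {"id": "c3", "comment_type_labels": ["action"], "spawned_open_question_id": None},
]
candidates = [r["id"] for r in rows if is_candidate(r)]
```

Only c1 qualifies here: c2 has already spawned an open_question and c3 is not labelled as a question.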

☑ Migration migrations/20260428_question_emitter.sql:

- ALTER TABLE artifact_comments ADD COLUMN IF NOT EXISTS
  spawned_open_question_id text, plus a partial index.
- CREATE TABLE comment_question_emitter_runs ... mirroring
  existing *_emitter_runs tables.

☑ extract_question_text(comment_content) -> str | None: returns the

first sentence ending in ? if the comment text contains one;
otherwise returns the first 280 chars. Pure helper; unit-tested.
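
One way the helper described above could look, as a sketch rather than the shipped code. It assumes sentences are delimited by ., ! or ? (so abbreviations and decimals are split naively) and that empty input returns None; both are assumptions, as are the names MAX_FALLBACK_CHARS and _SENTENCE_RE:

```python
import re
from typing import Optional

MAX_FALLBACK_CHARS = 280  # fallback length named in the acceptance criterion

# A "sentence" is a run of non-terminators followed by one or more terminators.
_SENTENCE_RE = re.compile(r"[^.!?]+[.!?]+")


def extract_question_text(comment_content: str) -> Optional[str]:
    """Return the first sentence ending in '?', else the first 280 chars."""
    text = (comment_content or "").strip()
    if not text:
        return None  # assumption: empty comments yield no question text
    for match in _SENTENCE_RE.finditer(text):
        sentence = match.group().strip()
        if sentence.endswith("?"):
            return sentence
    return text[:MAX_FALLBACK_CHARS]
```

The function touches no database or global state, which is what makes it straightforward to unit-test in isolation.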

☑ emit_open_question(...): creates an artifact via

scidex.atlas.artifact_registry.register_artifact with
artifact_type='open_question',
title=extracted_question_text,
created_by=comment.author_id,
metadata={"source_comment_id":..., "host_artifact_id":...,
          "field": host_artifact.metadata.get("field")}.
Inherits the host's field so the per-field Elo from
scidex/agora/open_question_tournament.py (ENTITY_TYPE='open_question')
picks it up automatically.

☑ Provenance: writes an artifact_provenance row with

action_kind='spawn_proposal', the closest kind already allowed by the
check constraint.
Reusing it avoids another schema migration; document the choice in
the Work Log.

☑ Link: artifact_link with link_type='derives_from',

source_artifact_id=open_question_id,
target_artifact_id=host_artifact_id, lifecycle='confirmed'.
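
The direction of the link is easy to get backwards, so here is a small sketch of the record shape. ArtifactLink and build_derives_from_link are hypothetical names for illustration only; the actual write goes through the artifact registry, whose call signature is not shown in this spec:

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ArtifactLink:
    """Illustrative stand-in for one artifact_link row."""
    source_artifact_id: str
    target_artifact_id: str
    link_type: str = "derives_from"
    lifecycle: str = "confirmed"


def build_derives_from_link(open_question_id: str, host_artifact_id: str) -> ArtifactLink:
    # The new open_question is the source; the host artifact it came from is the target.
    return ArtifactLink(
        source_artifact_id=open_question_id,
        target_artifact_id=host_artifact_id,
    )


link = asdict(build_derives_from_link("oq-123", "host-456"))
```

Read aloud, the row says "oq-123 derives_from host-456".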

☑ De-dup: unique partial index on spawned_open_question_id.

Bonus: also de-dup by question text similarity within the same field
using a simple normalized-title hash (lowercase + strip + sha1) before
minting; if a match exists, link to the existing open_question
instead of creating a duplicate.
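
A sketch of the normalized-title hash and the field-scoped lookup it enables. The 12-character truncation follows the Work Log; the in-memory index is a hypothetical stand-in for the database lookup, and InMemoryQuestionIndex and find_or_register are names invented for this example:

```python
import hashlib
from typing import Dict, Tuple


def title_hash(title: str) -> str:
    """lowercase + strip + sha1, truncated to 12 hex characters."""
    normalized = title.strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:12]


class InMemoryQuestionIndex:
    """Maps (field, title_hash) to an existing open_question id."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, str], str] = {}

    def find_or_register(self, field: str, title: str, new_id: str) -> Tuple[str, bool]:
        """Return (artifact_id, created); reuse the existing id on a hash match."""
        key = (field, title_hash(title))
        existing = self._by_key.get(key)
        if existing is not None:
            return existing, False
        self._by_key[key] = new_id
        return new_id, True
```

Because the key includes the field, the same wording asked in two different fields still mints two separate open_question artifacts. Note that this only catches titles that are identical after normalization, not paraphrases.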

☑ API: POST /api/senate/question_emitter/run and

GET /api/senate/question_emitter/status.

☑ Tests: extract first-sentence vs no-question-mark, dedup-by-hash,

dry-run no-op, end-to-end emit creates artifact + provenance + link.

Approach

Read the open_question wiki backfill task (already done in this quest;

the [Atlas/feat] Wiki TODO/Open-Question section parser is the pattern
for creating open_question artifacts) to copy the field-inheritance and
create-artifact call shape.

Build the title-hash dedup helper and unit-test it against synthetic

near-duplicates ("Why does X?" vs "why does x?").

Implement migration + emitter + routes + tests.

Smoke-test by classifying a synthetic question comment and verifying the

new open_question lands on the per-field Elo leaderboard.

Dependencies

q-perc-comment-classifier-v1 — supplies comment_type_labels.
Open-question quest schema (already shipped: b2d85e76 per the wiki

parser task description).

Dependents

q-perc-comment-trace-ui — surfaces "your question is now tracked as open

question X, current Elo Y".

Work Log

2026-04-27 11:00 UTC — Slot 79 (minimax)

Staleness review: Task branch orchestra/task/4320d55a-question-emitter-question-comments-spawn

created 2026-04-27T10:53. Worktree was clean; rebased against origin/main (e9ab5b9aa).
Task title + acceptance criteria remain fully valid — no sibling has addressed this yet.

Spec read: understood goal (question-classified comments mint open_question artifacts entering per-field Elo).

Existing patterns studied: read action_emitter.py, refutation_emitter.py, and

open_question_miner_wiki.py. Used register_artifact + create_link from
scidex.atlas.artifact_registry (same pattern as wiki miner). Reused
spawn_proposal for action_kind (already in CHECK constraint, avoids migration).
Reused derives_from for link_type (already allowed per chk_link_type).

Migration migrations/20260428_question_emitter.sql:

- Applied ALTER TABLE artifact_comments ADD COLUMN IF NOT EXISTS spawned_open_question_id text
+ partial index idx_ac_spawned_open_question_id.
- Applied CREATE TABLE comment_question_emitter_runs (...) + index.

Module scidex/senate/question_emitter.py: implemented all four functions

(scan_candidates, extract_question_text, emit_open_question, run_once) plus
get_audit_stats and CLI. Key design decisions:
- extract_question_text: returns first ?-ending sentence, else first 280 chars.
Pure function, no DB access.
- title_hash: SHA1(normalized_lower) → 12-char hex, used for field-scoped dedup
before minting.
- emit_open_question: checks dedup via _find_existing_by_hash before creating.
Field is inherited from host_artifact_metadata.field_tag, then metadata.field,
and otherwise defaults to "neurodegeneration".
- No consensus gate (questions don't need agreement to be tracked).
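
The field fallback chain noted above could be sketched as follows. This assumes field_tag and field are both keys of the host artifact's metadata dict and that they are tried in that order; resolve_field and DEFAULT_FIELD are hypothetical names:

```python
from typing import Any, Mapping, Optional

DEFAULT_FIELD = "neurodegeneration"  # default recorded in the Work Log


def resolve_field(host_metadata: Optional[Mapping[str, Any]]) -> str:
    """Pick the Elo field: field_tag, then field, then the default."""
    meta = host_metadata or {}
    # Empty strings and None both fall through to the next option.
    return meta.get("field_tag") or meta.get("field") or DEFAULT_FIELD
```

Always returning a non-empty field means every minted open_question lands in some per-field Elo pool, at the cost of misfiling questions from hosts that carry no field metadata.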

API routes in api_routes/senate.py:

- POST /api/senate/question_emitter/run — delegates to _qe.run_once
- GET /api/senate/question_emitter/status — delegates to _qe.get_audit_stats

Tests tests/test_question_emitter.py: 12 tests covering:

- extract_question_text: first ? sentence, no ?, long text, empty input
- title_hash: case insensitivity, whitespace, 12-char hex format
- emit_open_question: dry_run flag, empty content error
- scan_candidates: column shape via mock DB