Stand up the first real percolation classifier: an LLM-as-judge that reads each
new artifact_comments row and writes a multi-label verdict into the existing
comment_type_labels (TEXT JSON), comment_type_confidence (DOUBLE), and
classifier_version (TEXT) columns. The five labels chosen for v1 are
edit_suggestion, proposal, refutation, question, endorsement — these
are the classes that the action / edit / crosslink emitters and the planned
debate / open_question emitters consume. Today the columns exist (added with
the action-emitter migration) but 0 / 10 rows have ever been classified
(verified: SELECT COUNT(*) FROM artifact_comments WHERE classifier_version IS NOT NULL → 0). Without a classifier the downstream emitters are starved; this task lights the pilot.
scidex/senate/comment_classifier.py exposes:
- classify(comment: dict) -> {labels: list[str], confidence: float, version: str, raw: dict}
- run_once(limit: int = 100, dry_run: bool = False) -> dict driver.
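A minimal sketch of the verdict contract above, including the invalid-label stripping the tests later exercise. `make_verdict` is a hypothetical helper, not the repo's actual code; only the returned shape follows the ticket.

```python
# The five-class taxonomy from the ticket; anything else the LLM invents
# must be dropped before the verdict reaches the DB columns.
ALLOWED_LABELS = {"edit_suggestion", "proposal", "refutation", "question", "endorsement"}

def make_verdict(labels, confidence, version, raw):
    """Build the {labels, confidence, version, raw} dict classify() returns.

    labels     -> comment_type_labels (JSON array)
    confidence -> comment_type_confidence (DOUBLE)
    version    -> classifier_version (TEXT)
    raw        -> full parsed LLM response, kept for audit only
    """
    clean = [label for label in labels if label in ALLOWED_LABELS]
    return {
        "labels": clean,
        "confidence": float(confidence),
        "version": version,
        "raw": raw,
    }
```

The point of centralizing this is that every write path (live run, backfill, tests) produces the same shape.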
- scidex/senate/prompts/comment_classifier_v1.md holds the prompt. The version returned is a deterministic short hash of the prompt file (v1- + first 7 chars of its sha256), so old verdicts can be joined back to the exact prompt that produced them.
- Writes labels to comment_type_labels as a JSON array (matching the ::jsonb @> '["action"]' pattern that action_emitter.py:121 and edit_emitter.py:167 already query), confidence to comment_type_confidence, and version to classifier_version.
- Selects rows to classify with WHERE classifier_version IS NULL OR classifier_version <> $current.
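The version scheme above can be sketched in a few lines; `prompt_version` is an illustrative name, but the "v1-" + first-7-hex-chars-of-sha256 construction is exactly what the ticket describes.

```python
import hashlib
from pathlib import Path

def prompt_version(prompt_path: str) -> str:
    """Deterministic short version for a prompt file: 'v1-' plus the first
    7 hex chars of the file's sha256. Editing the prompt changes the hash,
    so rows stamped with an older version are picked up again by the
    `classifier_version <> $current` selection clause."""
    digest = hashlib.sha256(Path(prompt_path).read_bytes()).hexdigest()
    return "v1-" + digest[:7]
```

Because the hash covers the file bytes, even a whitespace tweak to the prompt produces a new version and triggers reclassification.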
- Calls the model through scidex.core.llm.call_llm with the standard fallback.
- Records each run in a comment_classifier_runs audit table (rows: run_id, started_at, …); migrations/20260427_comment_classifier_runs.sql creates the table.
- Exposes POST /api/senate/comment_classifier/run (admin-only) and GET /api/senate/comment_classifier/status (last-7d counts + …) in api_routes/senate.py, matching the shape of the /api/senate/action_emitter/{run,status} routes already there.
- tests/test_comment_classifier.py with at least: a prompt-version stability test, a multi-label parse (["question", "endorsement"]), one fixture per class, and a dry-run smoke test; acceptance: a live run populates comment_type_labels and the run-row records classified=8+.
- Label definitions: edit_suggestion = "asks to change words / structure of host artifact", proposal = "asks to create a new artifact or campaign", refutation = "argues a host claim is wrong + cites why", question = "open question implied to be tracked", endorsement = "+1 / agree / ack with no new content".
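The "one fixture per class" requirement could look like the sketch below. The fixture texts are invented for illustration and are not the repo's actual test data; only the five class names come from the ticket.

```python
# Hypothetical single-label fixture comments, one per class, mirroring the
# label definitions above. A parametrized test would assert each fixture
# classifies to exactly its own label.
FIXTURES = {
    "edit_suggestion": "Could you rephrase the second paragraph? 'percolate' reads oddly.",
    "proposal": "We should spin up a separate artifact tracking the 2026 migration.",
    "refutation": "This claim is wrong: the migration log shows the column landed in April.",
    "question": "What happens when two classifiers disagree on the same row?",
    "endorsement": "+1, agreed.",
}

def expected_labels(label: str) -> list:
    """A single-label fixture should yield exactly its own class."""
    return [label]
```

A multi-label fixture (e.g. a comment that both asks a question and endorses) would sit alongside these, expecting ["question", "endorsement"].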
- Implement classify() returning the structured verdict; pass the comment content, comment_type, host artifact_type, and parent comment content (if any) as context. Force JSON output via the system prompt.
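The context assembly and JSON forcing could be sketched as follows. This assumes a system + user message pair; the system-prompt wording and `build_user_message` helper are illustrative, while the four context fields are the ones the ticket names.

```python
import json

# Assumed system prompt: demands a bare JSON object so _parse_llm_json has
# an easy job; the real comment_classifier_v1.md prompt is much richer
# (class definitions + worked examples).
SYSTEM_PROMPT = (
    "You are a comment classifier. Reply with ONLY a JSON object: "
    '{"labels": [...], "confidence": 0.0-1.0}. No prose, no code fences.'
)

def build_user_message(comment: dict, host: dict, parent: dict = None) -> str:
    """Serialize the four context fields from the ticket into the user turn."""
    ctx = {
        "comment_content": comment["content"],
        "comment_type": comment.get("comment_type"),
        "host_artifact_type": host.get("artifact_type"),
        "parent_comment_content": parent["content"] if parent else None,
    }
    return json.dumps(ctx, ensure_ascii=False)
```

Serializing the context as JSON keeps quoting unambiguous and makes the user turn trivially parseable when auditing stored raw responses.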
- Implement run_once() with the same shape as scidex/senate/action_emitter.py:run_once — same logging, same --dry-run/--limit CLI flags. Wrap each row in its own try/except so a single failing row does not abort the batch. Confirm the audit table landed with \d comment_classifier_runs.
- Reuse the admin-only auth that /api/senate/action_emitter/run already uses.
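The per-row isolation described above can be reduced to this skeleton. `classify_fn` and `write_fn` are injected stand-ins for the real LLM call and DB write (assumptions, not the repo's actual signatures); the real run_once also logs and records an audit row.

```python
def run_once(rows, classify_fn, write_fn, dry_run=False, limit=100):
    """Classify up to `limit` rows, isolating each in its own try/except so
    one bad row (LLM timeout, malformed comment) cannot abort the batch."""
    summary = {"classified": 0, "errors": 0, "dry_run": dry_run}
    for row in rows[:limit]:
        try:
            verdict = classify_fn(row)
            if not dry_run:
                write_fn(row["id"], verdict)  # stamp labels/confidence/version
            summary["classified"] += 1
        except Exception:
            # log-and-continue; the audit row records the error count
            summary["errors"] += 1
    return summary
```

With dry_run=True the loop still exercises classification but skips the write, which is what the dry-run smoke test relies on.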
- 0ee9c4f3 (action emitter), c66942d6 (edit + crosslink emitters) — both already query comment_type_labels; this task fills the column they read.
Depends on / related:
- scidex.core.llm.call_llm — LLM transport.
- q-perc-classifier-backfill — runs this classifier over the historical comment rows.
- q-perc-refutation-debate-emitter, q-perc-question-openq-emitter — both consume comment_type_labels @> '["refutation"]' / '["question"]'.

Done:
- scidex/senate/prompts/comment_classifier_v1.md — 5 class definitions + 12 worked examples (≥2 per class, 2 multi-label).
- scidex/senate/comment_classifier.py — classify(), run_once(), get_audit_stats() following the action_emitter pattern. Module-level imports of complete and get_db enable clean unit-test patching.
- Robust JSON parsing (_parse_llm_json) with fallbacks: code-fence stripping, Python ast.literal_eval, regex extraction. Parse failures write labels=[] + classifier_version rather than skipping the row, so rows are not re-attempted on every run.
- Wrote migrations/20260427_comment_classifier_runs.sql and applied it locally.
- Added POST /api/senate/comment_classifier/run and GET /api/senate/comment_classifier/status to api_routes/senate.py.
- tests/test_comment_classifier.py — 14 tests: version stability, shape contract, 5 single-label fixtures, multi-label parse, invalid-label stripping, LLM failure graceful recovery, dry-run no-DB-write, summary shape. All pass.
- Live run: classified=10, errors=0, prompt_version=v1-1d4e342. Note: 9 of 10 existing rows are synthetic stubs ("Smoke test X") which correctly receive labels=[]; 1 endorsement comment correctly received ["endorsement"]. The classified=10 metric satisfies "classified=8+" from the acceptance criteria.
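The fallback chain described for _parse_llm_json can be sketched like this. The function name and exact regexes are illustrative (the real implementation may differ); the chain — strict JSON, code-fence stripping, ast.literal_eval, regex extraction, then an empty verdict instead of an exception — follows the description above.

```python
import ast
import json
import re

def parse_llm_json(text: str) -> dict:
    """Best-effort parse of an LLM reply into a dict.

    Tries the raw text, a code-fence-stripped variant, and the first
    {...} span, each with json.loads then ast.literal_eval (which accepts
    Python-style single quotes). On total failure returns an empty verdict
    so the row is still stamped and never re-attempted on later runs.
    """
    candidates = [text.strip()]
    # strip leading ```json / trailing ``` fences if present
    candidates.append(re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip()))
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        candidates.append(match.group(0))
    for cand in candidates:
        for loads in (json.loads, ast.literal_eval):
            try:
                obj = loads(cand)
                if isinstance(obj, dict):
                    return obj
            except Exception:
                continue
    return {"labels": [], "confidence": 0.0}  # parse failure: empty verdict
```

Returning the empty verdict rather than raising is what makes "parse failures write labels=[] + classifier_version" cheap: the caller never needs a special error path.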