[Senate] Image quality evaluation pipeline via vision models

ID: 692f13b8-876 Priority: 72 Type: one_shot Status: open

Goal

Use GLM-4.6V-Flash (free) or sonnet-4.6 as the vision evaluator. Score each image from 1 to 5 on four dimensions: scientific_accuracy, clarity, informativeness, and aesthetic_quality. Automatically regenerate any image whose average score falls below the threshold (avg < 2.5). Log quality trends to data/metrics/image_quality.json, and update each artifact's quality_score from the evaluation.
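The threshold rule in the goal can be sketched as follows. This is a minimal illustration only, not the code in the evaluator module: the function names average_score and needs_regeneration, and the validation behavior, are assumptions made for this example.

```python
# Threshold from the goal: regenerate when the average falls below 2.5.
REGEN_THRESHOLD = 2.5
DIMENSIONS = (
    "scientific_accuracy",
    "clarity",
    "informativeness",
    "aesthetic_quality",
)


def average_score(scores: dict) -> float:
    """Mean of the four dimension scores; rejects missing or out-of-range values."""
    for dim in DIMENSIONS:
        value = scores.get(dim)
        if value is None or not 1 <= value <= 5:
            raise ValueError(f"{dim} must be a score in 1..5, got {value!r}")
    return sum(scores[dim] for dim in DIMENSIONS) / len(DIMENSIONS)


def needs_regeneration(scores: dict, threshold: float = REGEN_THRESHOLD) -> bool:
    """True only when the average is strictly below the threshold (avg < 2.5)."""
    return average_score(scores) < threshold
```

Note that the comparison is strict, so an image averaging exactly 2.5 is kept rather than regenerated.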

Acceptance Criteria

☐ Concrete deliverables created

☐ Work log updated with timestamped entry

Work Log

2026-04-17 03:55 PT — Slot minimax:67

Status: Task already implemented on main; verification confirms that no new work is needed
Evidence: scidex/forge/image_quality_evaluator.py (601 lines) exists on origin/main, authored in commits:

- 75f3726af — [Senate] Package refactor: scidex/ package with 6 layer subpackages + re-export shims
- 90f1ffff7 — [Senate] Prioritization run 57
- bd3b63bd8 — [Forge] Supervisor quota awareness (last update to this file)

Module verification: All functions are available. The module scores 4 dimensions (scientific_accuracy, clarity, informativeness, aesthetic_quality) on a 1-5 scale, applies the 2.5 threshold, and includes the auto-regeneration logic, metrics logging to data/metrics/image_quality.json, and the quality_score update in the artifacts table.
Result: Task already delivered — no changes needed

2026-04-17 04:17 PT — Slot minimax:62

Status: Verified on origin/main after rebase to 8ad527919
Verification: scidex/forge/image_quality_evaluator.py confirmed present with all functions:

- evaluate_image_quality() — GLM/Sonnet vision scoring on 4 dimensions
- should_regenerate() — avg < 2.5 triggers regeneration
- regenerate_image_with_improvements() — auto-regen with improvements
- update_artifact_quality_score() — updates artifacts table quality_score
- log_quality_metrics() — logs to data/metrics/image_quality.json
- evaluate_and_improve_artifact() — full pipeline
- batch_evaluate_artifacts() — CLI batch evaluation

Only the spec file was updated (no code changes needed; the code is already on main)
Result: Task already complete — spec work log updated
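For orientation, the metrics-logging step (log_quality_metrics() in the list above) might look roughly like the sketch below. The on-disk layout is not described in this spec, so the JSON-list-of-records format, the field names, and the helper name append_quality_entry are all assumptions for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_quality_entry(path, artifact_id, scores):
    """Append one evaluation record to a JSON file holding a list of records.

    Hypothetical format: the real image_quality.json layout may differ.
    """
    path = Path(path)
    # Start from an empty history when the file does not exist yet.
    history = json.loads(path.read_text()) if path.exists() else []
    entry = {
        "artifact_id": artifact_id,
        "evaluated_at": datetime.now(timezone.utc).isoformat(),
        "scores": scores,
        "average": sum(scores.values()) / len(scores),
    }
    history.append(entry)
    # Create data/metrics/ (or whichever parent directory) on first use.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(history, indent=2))
    return entry
```

Keeping one timestamped record per evaluation is what makes a trend readable later; a file that stored only the latest score would lose that history.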

2026-04-20 20:52 PDT — Slot codex:41 (Watchdog repair f653b69f)

Root cause: The image quality evaluator existed, but the retirement of SQLite in favor of PostgreSQL exposed SQLite-era code paths: sqlite3.Row row factory assignment, PRAGMA journal_mode, SQLite json_set(), and direct json.loads() calls on PostgreSQL jsonb values that already arrive as dicts. The earlier work passed grep-based verification but would fail when exercised against the live PostgreSQL artifacts table.
Fix: Updated scidex/forge/image_quality_evaluator.py to use PostgreSQL-safe metadata handling: parse jsonb/text fields through helpers, remove row factory and PRAGMA calls, merge quality evaluation metadata in Python, and write metadata with an explicit ?::jsonb cast.
Tests: Added tests/test_image_quality_evaluator.py; ran pytest -q tests/test_image_quality_evaluator.py (2 passed), python3 -m py_compile scidex/forge/image_quality_evaluator.py, and a live PostgreSQL smoke test that inserted, updated, and deleted a temporary figure artifact (quality_score=0.75, metadata quality evaluation persisted).
Result: Original task should now retry successfully instead of failing on PostgreSQL/SQLite incompatibilities.
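The metadata handling described in the fix can be illustrated with a small sketch. It assumes the driver may return a jsonb column as a dict and a text column as a string. The helper names parse_metadata and merge_quality_evaluation are hypothetical, not the names used in the module; the quality_evaluation key is the one mentioned in this work log.

```python
import json


def parse_metadata(value):
    """Return metadata as a dict, whether it arrives as a dict (jsonb) or a string (text)."""
    if value is None:
        return {}
    if isinstance(value, dict):
        return dict(value)  # shallow copy so the caller's object is not mutated
    if isinstance(value, (str, bytes, bytearray)):
        if not value.strip():
            return {}
        parsed = json.loads(value)
        # Non-object JSON (lists, scalars) cannot be merged into; treat as empty.
        return parsed if isinstance(parsed, dict) else {}
    raise TypeError(f"unsupported metadata type: {type(value).__name__}")


def merge_quality_evaluation(metadata, evaluation):
    """Merge the evaluation in Python and return a JSON string for a jsonb-cast parameter."""
    merged = parse_metadata(metadata)
    merged["quality_evaluation"] = evaluation
    return json.dumps(merged)
```

The returned string is what would be bound to the placeholder carrying the explicit jsonb cast, which replaces the SQLite json_set() call with a merge done entirely in Python.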

2026-04-21 04:36 UTC — Slot codex:43 (Watchdog repair e4b26d91)

Root cause: The evaluator code path was already PostgreSQL-compatible, but the task's verification surface still had a stale legacy helper at scripts/test_image_quality_evaluator.py. Running it directly failed because the repo root was not on sys.path, and collecting it together with tests/test_image_quality_evaluator.py failed with a pytest import-file mismatch because both files imported as test_image_quality_evaluator.
Fix: Renamed the legacy helper to scripts/check_image_quality_evaluator.py and added explicit repo-root sys.path setup so the legacy image_quality_evaluator shim imports reliably. This removes the duplicate pytest module name while preserving the manual verification helper.
Tests: pytest -q tests/test_image_quality_evaluator.py scripts/check_image_quality_evaluator.py (7 passed; the only warnings arise because the legacy helper's checks return booleans); python3 scripts/check_image_quality_evaluator.py (5/5 passed); python3 -m py_compile scidex/forge/image_quality_evaluator.py image_quality_evaluator.py; a live PostgreSQL smoke test inserted, evaluated, and deleted the temporary figure artifact tmp-image-quality-watchdog-e4b26d91 and verified quality_score=0.75, metadata quality_evaluation, and metrics output.
Result: Original task can retry without the verification collision/import failure; evaluator functionality is verified against the live PostgreSQL path.
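The repo-root sys.path setup mentioned in the fix could be done along these lines. This is a sketch under the assumption that the helper sits one directory below the repository root (as scripts/ does); the function name ensure_repo_root_on_path is invented for this example and is not taken from the script.

```python
import sys
from pathlib import Path


def ensure_repo_root_on_path(script_path, levels_up=1):
    """Put the repository root at the front of sys.path and return it as a string.

    Assumes the script lives `levels_up` directories below the repo root.
    """
    repo_root = str(Path(script_path).resolve().parents[levels_up])
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    return repo_root


# In a script under scripts/, this would run before the shim import:
#   ensure_repo_root_on_path(__file__)
#   import image_quality_evaluator
```

Resolving the path from the script's own location, rather than from the current working directory, is what lets the helper run both directly and under pytest.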

File: 692f13b8_876_spec.md

Modified: 2026-04-25 23:40

Size: 4.8 KB