ID: 692f13b8-876 Priority: 72 Type: one_shot Status: open
Use GLM-4.6V-Flash (free) or sonnet-4.6 as vision evaluator. Score images on 4 dimensions (1-5): scientific_accuracy, clarity, informativeness, aesthetic_quality. Auto-regenerate below threshold (avg < 2.5). Log quality trends to data/metrics/image_quality.json. Update artifact quality_score from evaluation.
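The regeneration rule above (average of the four dimension scores below 2.5) can be sketched as follows. This is a minimal illustration, not the code in the repository: `average_score` and `needs_regeneration` are hypothetical names, and the real `should_regenerate()` signature is not shown in this task.

```python
# Hypothetical sketch of the "avg < 2.5" regeneration rule; names are illustrative.
DIMENSIONS = (
    "scientific_accuracy",
    "clarity",
    "informativeness",
    "aesthetic_quality",
)
REGEN_THRESHOLD = 2.5  # threshold taken from the task description


def average_score(scores: dict) -> float:
    """Average the four 1-5 dimension scores, rejecting missing or out-of-range values."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for dim in DIMENSIONS:
        if not 1 <= scores[dim] <= 5:
            raise ValueError(f"{dim} must be between 1 and 5, got {scores[dim]}")
    return sum(scores[dim] for dim in DIMENSIONS) / len(DIMENSIONS)


def needs_regeneration(scores: dict, threshold: float = REGEN_THRESHOLD) -> bool:
    """True when the average falls strictly below the threshold."""
    return average_score(scores) < threshold
```

The comparison is strict, so an image averaging exactly 2.5 is kept rather than regenerated.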
scidex/forge/image_quality_evaluator.py (601 lines) exists on origin/main, authored in commits:
- 75f3726af: [Senate] Package refactor: scidex/ package with 6 layer subpackages + re-export shims
- 90f1ffff7: [Senate] Prioritization run 57
- bd3b63bd8: [Forge] Supervisor quota awareness (last update to this file)
scidex/forge/image_quality_evaluator.py confirmed present with all functions:
- evaluate_image_quality(): GLM/Sonnet vision scoring on 4 dimensions
- should_regenerate(): avg < 2.5 triggers regeneration
- regenerate_image_with_improvements(): auto-regen with improvements
- update_artifact_quality_score(): updates artifacts table quality_score
- log_quality_metrics(): logs to data/metrics/image_quality.json
- evaluate_and_improve_artifact(): full pipeline
- batch_evaluate_artifacts(): CLI batch evaluation
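As a rough illustration of logging quality trends to a JSON file such as data/metrics/image_quality.json, the sketch below appends one record per evaluation. The file layout (a JSON list of records), the record fields, and the helper name `append_quality_record` are assumptions; the actual `log_quality_metrics()` may store the data differently.

```python
import json
from pathlib import Path


def append_quality_record(path, record: dict) -> int:
    """Append one evaluation record to a JSON list on disk; return the new record count.

    Assumes the metrics file holds a JSON list (hypothetical layout).
    """
    path = Path(path)
    history = []
    if path.exists():
        text = path.read_text(encoding="utf-8").strip()
        if text:
            history = json.loads(text)
    history.append(record)

    # Write to a temporary sibling file first, then swap it in, so a crash
    # mid-write does not leave a truncated metrics file behind.
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(history, indent=2), encoding="utf-8")
    tmp.replace(path)
    return len(history)
```

Rewriting the whole list on each append is fine for small metric files; a line-delimited format would scale better if the history grows large.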
Follow-up review found SQLite-only constructs still in the file: sqlite3.Row row factory assignment, PRAGMA journal_mode, SQLite json_set(), and direct json.loads() of PostgreSQL jsonb dict values. The previous task could be verified by grep but would fail when exercised against the live PostgreSQL artifacts table.

Updated scidex/forge/image_quality_evaluator.py to use PostgreSQL-safe metadata handling: parse jsonb/text fields through helpers, remove row factory and PRAGMA calls, merge quality evaluation metadata in Python, and write metadata with an explicit ?::jsonb cast.

Added tests/test_image_quality_evaluator.py. Ran pytest -q tests/test_image_quality_evaluator.py (2 passed), python3 -m py_compile scidex/forge/image_quality_evaluator.py, and a live PostgreSQL smoke test that inserted, updated, and deleted a temporary figure artifact (quality_score=0.75, metadata quality evaluation persisted).

Found a legacy helper at scripts/test_image_quality_evaluator.py. Running it directly failed because the repo root was not on sys.path, and collecting it together with tests/test_image_quality_evaluator.py failed with a pytest import-file mismatch because both files imported as test_image_quality_evaluator.

Renamed the helper to scripts/check_image_quality_evaluator.py and added explicit repo-root sys.path setup so the legacy image_quality_evaluator shim imports reliably. This removes the duplicate pytest module name while preserving the manual verification helper.

Verification:
- pytest -q tests/test_image_quality_evaluator.py scripts/check_image_quality_evaluator.py (7 passed; warnings only, because the legacy helper returns booleans)
- python3 scripts/check_image_quality_evaluator.py (5/5 passed)
- python3 -m py_compile scidex/forge/image_quality_evaluator.py image_quality_evaluator.py
- live PostgreSQL smoke test inserted, evaluated, and deleted temporary figure artifact tmp-image-quality-watchdog-e4b26d91 and verified quality_score=0.75, metadata quality_evaluation, and metrics output
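The PostgreSQL-safe metadata handling described above might look roughly like the sketch below. It is an assumption-laden illustration, not the repository code: the helper names (`parse_metadata`, `merge_quality_evaluation`, `build_update`) and the `id` and `metadata` column names are hypothetical, while the `artifacts` table, `quality_score`, the `quality_evaluation` key, and the `?::jsonb` cast come from the notes above.

```python
import json


def parse_metadata(value) -> dict:
    """Normalise a metadata column to a dict.

    PostgreSQL drivers typically return jsonb as a dict already, while a text
    column (or SQLite) returns a JSON string; calling json.loads() on a dict
    raises TypeError, so each case is handled explicitly.
    """
    if value is None:
        return {}
    if isinstance(value, dict):
        return dict(value)  # copy so the caller's row is not mutated
    if isinstance(value, (bytes, bytearray)):
        value = value.decode("utf-8")
    if isinstance(value, str):
        if not value.strip():
            return {}
        parsed = json.loads(value)
        return parsed if isinstance(parsed, dict) else {}
    raise TypeError(f"unsupported metadata type: {type(value).__name__}")


def merge_quality_evaluation(metadata, evaluation: dict) -> dict:
    """Merge the evaluation in Python instead of relying on SQLite json_set()."""
    merged = parse_metadata(metadata)
    merged["quality_evaluation"] = evaluation
    return merged


def build_update(artifact_id: str, quality_score: float, metadata: dict):
    """Return (sql, params) for the write; column names other than quality_score are assumed."""
    sql = "UPDATE artifacts SET quality_score = ?, metadata = ?::jsonb WHERE id = ?"
    return sql, (quality_score, json.dumps(metadata), artifact_id)
```

Serialising with `json.dumps()` and casting in SQL keeps the write independent of whether the driver adapts Python dicts to jsonb on its own.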