[Agora] Rebuild theme AG1: thin-content enrichment as a polymorphic continuous process

Rebuild spec — follow docs/planning/specs/rebuild_theme_template_spec.md first.

Theme anchor

  • Theme: AG1 — Thin-content enrichment
  • Layer: Agora
  • Full description: docs/design/retired_scripts_patterns.md → AG1

Why this is first on the list

~30 deleted scripts instantiated this one pattern (enrich_thin_*, enrich_under1k_, enrich_top_hypotheses, enrich_hero_hypotheses, enrich_batch2, enrich_batch3, …). That's the textbook case of "a
recurring process that never got built, respawning as variants". Every
other theme will benefit from this one being the first rebuilt as a
canonical reference.

An open recurring task already exists for this capability
([Exchange] CI: Enrich thin hypotheses — expand next 5 descriptions,
every-2h, 1b911c77-4c4e-4dbe-a320-0cb129f1c38b). That task's previous
implementation referenced scripts/deprecated/enrich_thin_hypotheses_batch2.py,
which has since been deleted. Rebuilding theme AG1 removes that task's
dependency on the missing script.

Template fills

  • {{THEME_ID}} = AG1
  • {{THEME_NAME}} = thin-content enrichment
  • {{LAYER}} = Agora
  • {{LAYER_SLUG}} = agora
  • {{THEME_SLUG}} = thin_enrichment
  • {{CADENCE}} = every 1h
  • {{CORE_JUDGMENT}} = "is this record's prose thin/stubby relative
to the available structured context, and if so, what is the richer
version that remains grounded in the same facts?"
  • {{GAP_PREDICATE}} = rows where
length(prose_column) < dynamic_threshold OR
prose_version < current_rubric_version — with dynamic_threshold
read from theme_config, not hardcoded.
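As a minimal sketch of how the gap predicate might be turned into a parameterized query: the GapConfig shape, the batch_size argument, the %s placeholder style (as used by psycopg-like drivers), and the coalesce() guard for NULL prose are all assumptions made for illustration, not part of this spec. Only the predicate itself and the rule that the threshold comes from theme_config are taken from the text above.

```python
import re
from dataclasses import dataclass
from typing import Tuple

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


@dataclass(frozen=True)
class GapConfig:
    """Hypothetical view of one theme_config row (field names are assumptions)."""
    table_name: str
    prose_column: str
    priority_column: str
    dynamic_threshold: int
    current_rubric_version: int


def _ident(name: str) -> str:
    # Identifiers cannot be bound as query parameters, so reject anything
    # that is not a plain SQL identifier before quoting it.
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return f'"{name}"'


def build_gap_query(cfg: GapConfig, batch_size: int = 5) -> Tuple[str, tuple]:
    """Return (sql, params) selecting thin or stale rows, highest priority first."""
    table = _ident(cfg.table_name)
    prose = _ident(cfg.prose_column)
    priority = _ident(cfg.priority_column)
    sql = (
        f"SELECT * FROM {table} "
        # coalesce() is an added assumption: it treats NULL prose as empty.
        f"WHERE length(coalesce({prose}, '')) < %s "
        f"OR prose_version < %s "
        f"ORDER BY {priority} DESC LIMIT %s"
    )
    # Threshold and rubric version are passed as parameters, never hardcoded.
    return sql, (cfg.dynamic_threshold, cfg.current_rubric_version, batch_size)
```

Because the threshold and rubric version travel as parameters, changing either one in theme_config changes which rows are selected without touching the query text.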

Polymorphism (this is the defining feature)

The rebuilt process must not be hypothesis-specific. It takes (table_name, prose_column, priority_column, context_columns[]) as
config and works over:

  • hypotheses (description)
  • wiki_entities (content)
  • experiments (protocol / description)
  • analyses (summary / findings)
  • gaps (description)

One process, N content types. New content types enrolled by adding a
row to theme_config — no code change.

Discover the schema at runtime via information_schema.columns rather
than hardcoding "table has column X".
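The runtime schema check could be sketched as follows. This is an illustration under stated assumptions: the config row is modelled as a plain dict keyed by the tuple fields named above, the database call is injected as a fetch_columns callable, and the sample column names (page_views, entity_type, aliases) are invented for the demo. COLUMNS_SQL shows the kind of information_schema.columns lookup a real fetcher might run; a production query would likely also filter on table_schema.

```python
from typing import Callable, Dict, Iterable, List

# Example lookup a real fetch_columns implementation might execute.
COLUMNS_SQL = (
    "SELECT column_name FROM information_schema.columns "
    "WHERE table_name = %s"
)


def required_columns(config_row: Dict) -> List[str]:
    """Columns a content type needs before it can be enrolled."""
    return [
        config_row["prose_column"],
        config_row["priority_column"],
        *config_row["context_columns"],
    ]


def missing_columns(
    table_name: str,
    required: Iterable[str],
    fetch_columns: Callable[[str], Iterable[str]],
) -> List[str]:
    """Return the required columns that the live table does not have."""
    present = set(fetch_columns(table_name))
    return sorted(set(required) - present)


# Stand-in for the database: hypothetical table layout for the demo only.
FAKE_SCHEMA = {"wiki_entities": ["id", "content", "page_views", "entity_type"]}


def fake_fetch(table_name: str) -> List[str]:
    return FAKE_SCHEMA.get(table_name, [])


row = {
    "table_name": "wiki_entities",
    "prose_column": "content",
    "priority_column": "page_views",
    "context_columns": ["entity_type", "aliases"],
}
print(missing_columns(row["table_name"], required_columns(row), fake_fetch))
# prints ['aliases']
```

Checking the config row against the discovered columns before each run means a misconfigured content type is reported and skipped instead of failing mid-batch.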

Rubric v1 sketch

The rubric asks an LLM, given (current prose, structured context,
linked KG neighborhood, citations):

  • Score the current prose for information density, grounding,
and section coverage (0..1).
  • If below target, produce an expanded prose that strictly adds
information present in the context; no invention.
  • Return {score, expanded_prose, added_claims, citations_used,
cannot_expand_reason | null}.

Never return text unless it is grounded in the provided context. If the
context is too thin to expand without inventing, return the
cannot_expand_reason and skip the row. (The gap predicate will re-select
the row once its context grows.)
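One way to hold the rubric's return shape and reject inconsistent responses is sketched below. The field names come from the rubric sketch above; the RubricResult class, the validate helper, and the specific consistency rules (in particular, treating expanded_prose and cannot_expand_reason as mutually exclusive) are assumptions about how the contract could be enforced.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RubricResult:
    """Container for the rubric output fields listed above."""
    score: float
    expanded_prose: Optional[str] = None
    added_claims: List[str] = field(default_factory=list)
    citations_used: List[str] = field(default_factory=list)
    cannot_expand_reason: Optional[str] = None


def validate(result: RubricResult) -> List[str]:
    """Return a list of contract violations; an empty list means acceptable."""
    problems = []
    if not 0.0 <= result.score <= 1.0:
        problems.append("score must be in 0..1")
    # Assumed rule: a row is either expanded or skipped with a reason.
    if result.expanded_prose and result.cannot_expand_reason:
        problems.append(
            "expanded_prose and cannot_expand_reason are mutually exclusive"
        )
    # Assumed rule: claims can only be reported alongside an expansion.
    if result.added_claims and not result.expanded_prose:
        problems.append("added_claims requires expanded_prose")
    return problems


skipped = RubricResult(score=0.2, cannot_expand_reason="context too thin")
print(validate(skipped))
# prints []
```

A result with neither expanded_prose nor cannot_expand_reason is also accepted here, which covers rows whose score is already at or above target.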

Outcome feedback signals (what makes this self-improving)

  • Downstream quality-rubric score on the same row over the next 24h
(did the expansion land a higher score?).
  • Citation verification rate (did the claims added survive a
PMID-claim check?).
  • User engagement proxies if available (time-on-page, internal clicks).

Feedback worker writes to agora_thin_enrichment_outcome_feedback.
Monthly meta-worker proposes rubric_v2 based on which expansions
scored highest downstream.
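The first two feedback signals can be illustrated with a small record type. This is a sketch only: the OutcomeFeedback fields are hypothetical and do not describe the actual columns of agora_thin_enrichment_outcome_feedback, and ranking by raw score delta is just one plausible reading of "scored highest downstream".

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class OutcomeFeedback:
    """Hypothetical feedback row for one expansion (field names are assumptions)."""
    row_id: str
    rubric_version: int
    score_before: float
    score_after_24h: float
    claims_added: int
    claims_verified: int

    @property
    def score_delta(self) -> float:
        # Did the expansion land a higher downstream score?
        return self.score_after_24h - self.score_before

    @property
    def verification_rate(self) -> Optional[float]:
        # Share of added claims that survived the citation check;
        # None when the expansion added no claims.
        if self.claims_added == 0:
            return None
        return self.claims_verified / self.claims_added


def top_expansions(feedback: Iterable[OutcomeFeedback], n: int = 3) -> List[OutcomeFeedback]:
    """Expansions with the largest downstream score gain, best first."""
    return sorted(feedback, key=lambda f: f.score_delta, reverse=True)[:n]


rows = [
    OutcomeFeedback("h-1", 1, 0.25, 0.75, 4, 3),
    OutcomeFeedback("h-2", 1, 0.5, 0.5, 0, 0),
]
print([f.row_id for f in top_expansions(rows, n=1)])
# prints ['h-1']
```

A monthly meta-worker could feed the top-ranked expansions, filtered by verification rate, into whatever process drafts the rubric_v2 proposal.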

Acceptance

All template acceptance criteria, plus:

☐ Runs over ≥ 3 content types (hypotheses, wiki_entities,
experiments) in the same process.
☐ theme_config has rows for each content type with thinness
threshold + priority weights.
☐ At least one rubric_v1 → rubric_v2 promotion path worked
end-to-end in testing (even if v2 is a trivial tweak).
☐ The existing recurring task 1b911c77-4c4e-4dbe-a320-0cb129f1c38b
is reassigned (not recreated) to this process: its spec-path
updated, its description pointing at this spec.

File: rebuild_theme_AG1_thin_content_enrichment_spec.md
Modified: 2026-04-28 02:29
Size: 3.9 KB