Datasets containing TREM2:
┌─────────────────────────────┬──────────────┬──────────┬────────────┐
│ Dataset                     │ Column       │ Rows     │ Values     │
├─────────────────────────────┼──────────────┼──────────┼────────────┤
│ SEA-AD Differential Expr.   │ gene_symbol  │ 15,000   │ log_fc=2.3 │
│ Allen Brain Cell Atlas      │ gene         │ 50,000   │ expr=high  │
│ AD GWAS Summary Stats       │ gene_name    │ 8,000    │ p=1.2e-8   │
└─────────────────────────────┴──────────────┴──────────┴────────────┘

Implementation:
def get_datasets_for_entity(entity_name, entity_type=None):
    """Query artifact_links and column metadata to find datasets
    containing this entity. Returns dataset artifacts with the
    specific column and summary statistics for that entity."""

Column: gene_symbol (linked to: gene entities)
Known genes in KG: 12,450 / 15,000 (83% coverage)
Top entities by KG connectivity:
TREM2 (142 edges) | APOE (138 edges) | APP (95 edges) | MAPT (89 edges)
Column: cell_type (linked to: cell_type entities)
Known cell types in KG: 8 / 8 (100% coverage)
microglia (256 edges) | astrocyte (198 edges) | neuron (312 edges)

Implementation:
def get_kg_context_for_dataset(dataset_artifact_id):
    """For each linked column in the dataset, resolve entities against
    the KG and return coverage stats and top entities by connectivity."""

linked_entity_type:
get_datasets_for_entity() returns datasets containing a given entity
get_kg_context_for_dataset() returns KG coverage per linked column

{
  "requirements": {
    "coding": 5,
    "analysis": 5,
    "safety": 9
  },
  "completion_shas": [
    "915c9692b",
    "520dbcc2e"
  ],
  "completion_shas_checked_at": ""
}
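
Illustrative sketch (not the actual implementation): the two stubs above, get_datasets_for_entity() and get_kg_context_for_dataset(), could behave roughly as follows if artifact_links, column metadata, and the KG were plain in-memory structures. The LinkedColumn record, the COLUMNS and KG_EDGES containers, the "ds-1"/"ds-2" ids, the GENE_X placeholder, and the top_n parameter are all assumptions made for this example. Only the TREM2 values and the edge counts are taken from the sample output shown earlier.

```python
from dataclasses import dataclass, field


@dataclass
class LinkedColumn:
    """Hypothetical record for one dataset column linked to an entity type."""
    dataset_id: str
    dataset: str
    column: str
    linked_entity_type: str
    # entity name -> summary statistic for that entity (None = not shown here)
    stats: dict = field(default_factory=dict)


# Toy stand-in for artifact_links plus column metadata (assumed shape).
COLUMNS = [
    LinkedColumn("ds-1", "SEA-AD Differential Expr.", "gene_symbol", "gene",
                 {"TREM2": "log_fc=2.3", "APOE": None, "GENE_X": None}),
    LinkedColumn("ds-2", "AD GWAS Summary Stats", "gene_name", "gene",
                 {"TREM2": "p=1.2e-8"}),
]

# Toy stand-in for the KG: entity name -> number of edges.
KG_EDGES = {"TREM2": 142, "APOE": 138, "APP": 95, "MAPT": 89}


def get_datasets_for_entity(entity_name, entity_type=None):
    """Return one row per linked column that contains entity_name."""
    hits = []
    for col in COLUMNS:
        # Optional filter on the entity type the column is linked to.
        if entity_type is not None and col.linked_entity_type != entity_type:
            continue
        if entity_name in col.stats:
            hits.append({"dataset": col.dataset,
                         "column": col.column,
                         "values": col.stats[entity_name]})
    return hits


def get_kg_context_for_dataset(dataset_artifact_id, top_n=3):
    """Return KG coverage and the top-connected entities per linked column."""
    context = []
    for col in COLUMNS:
        if col.dataset_id != dataset_artifact_id:
            continue
        # Entities in the column that also exist in the KG.
        known = [e for e in col.stats if e in KG_EDGES]
        top = sorted(known, key=KG_EDGES.get, reverse=True)[:top_n]
        context.append({
            "column": col.column,
            "linked_entity_type": col.linked_entity_type,
            "known": len(known),
            "total": len(col.stats),
            "top_entities": [(e, KG_EDGES[e]) for e in top],
        })
    return context
```

In this sketch, coverage is simply known / total per column, and ranking by edge count mirrors the "Top entities by KG connectivity" line in the sample output. A real version would presumably query the database rather than scan a list.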