Re-identification of individuals in genomic datasets using public face images.

Venkatesaramani R, Malin BA, Vorobeychik Y
Science Advances, 2021

Recent studies suggest that genomic data can be matched to images of human faces, raising the concern that genomic data can be re-identified with relative ease. However, such investigations assume access to well-curated images, which are rarely available in practice and are challenging to derive from photos not generated in a controlled laboratory setting. In this study, we reconsider re-identification risk and find that, for most individuals, the actual risk posed by linkage attacks to typical face images is substantially smaller than claimed in prior investigations. Moreover, we show that a small amount of well-calibrated noise, imperceptible to humans, can be added to images to markedly reduce such risk. These results open an opportunity to build image filters that give individuals better control over linkage-based re-identification risk.
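The risk discussed above is reported in terms of top-k matching (the figure captions refer to top 1 matching and k = 1). As a rough illustration only, and not the authors' implementation, the sketch below computes a top-k success rate from a hypothetical score matrix. The function name `top_k_success_rate`, the score values, and the convention that image i truly belongs to genome i are all assumptions made for this example.

```python
def top_k_success_rate(scores, k=1):
    """Fraction of images whose true genome is among the k best-scoring candidates.

    Assumption: scores[i][j] is a hypothetical match score between face
    image i and genome j, and the true match for image i is genome i.
    """
    if not scores:
        return 0.0
    hits = 0
    for i, row in enumerate(scores):
        # Rank candidate genomes from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Made-up scores for three individuals (illustrative values, not real data).
scores = [
    [0.9, 0.1, 0.3],
    [0.4, 0.5, 0.6],
    [0.2, 0.3, 0.8],
]

print(top_k_success_rate(scores, k=1))
print(top_k_success_rate(scores, k=2))
```

In this toy setting, a defense that perturbs an image would aim to lower the score of the true genome relative to the others, so that it drops out of the top k.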

Fig. 1.
Effectiveness of matching individuals’ photos to their DNA sequences in OpenSNP. (A) Success rate for top 1 matching for the Real dataset. (B) Suc...
Fig. 2.
Evaluating small image perturbations as a defense. (A) Effectiveness of perturbations as a defense against re-identification for k = 1 (i.e., the ...
Fig. 3.
Evaluation of models that are trained to increase robustness to small perturbations through adversarial training when only the top match is considered...