Social media platforms generate a tremendous amount of visual material that can be exploited by researchers in the social sciences and digital humanities. However, privacy regulations impose significant restrictions on both data collection and data sharing. CAMOUFLAGE leverages recent advances in controlled image synthesis to generate a synthetic version of an image corpus with characteristics similar to those of a target collection, while removing all personally identifiable information to ensure the anonymity of the users who published the original images. Achieving this ambitious goal requires tackling three distinct yet related research objectives: designing and implementing controllable image synthesis that retains the visual and semantic content of a target image; determining whether the resulting synthetic images can be considered successfully anonymized; and assessing whether the synthetic collection is semantically equivalent to the original one.
The CAMOUFLAGE synthesizer extracts non-sensitive information from the original image and uses it to constrain a diffusion model, so that the generated image preserves the composition of the original under a predetermined measure of “equivalence” while personal identifiers are removed.
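The two requirements above (content preserved, identity removed) can be illustrated as a simple per-image acceptance test over embeddings. This is a minimal sketch under stated assumptions, not the CAMOUFLAGE method: it assumes that some semantic encoder and some identity encoder (for instance a face-recognition model) have already mapped the original and synthetic images to vectors, and the function names `cosine_similarity` and `accept_synthetic` as well as both threshold values are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def accept_synthetic(
    orig_semantic: Sequence[float],
    synth_semantic: Sequence[float],
    orig_identity: Sequence[float],
    synth_identity: Sequence[float],
    semantic_threshold: float = 0.8,  # hypothetical value, would need calibration
    identity_threshold: float = 0.3,  # hypothetical value, would need calibration
) -> bool:
    """Hypothetical acceptance test for one synthetic image.

    Accepts the image only if its content is preserved (high semantic
    similarity to the original) AND its identity is not (low identity
    similarity to the original).
    """
    preserves_content = cosine_similarity(orig_semantic, synth_semantic) >= semantic_threshold
    is_anonymized = cosine_similarity(orig_identity, synth_identity) <= identity_threshold
    return preserves_content and is_anonymized


# Toy embeddings standing in for real encoder outputs.
print(accept_synthetic([1, 0, 1], [1, 0, 0.9], [1, 0], [0, 1]))  # True
print(accept_synthetic([1, 0, 1], [1, 0, 0.9], [1, 0], [1, 0]))  # False: identity unchanged
```

Cosine similarity with fixed thresholds is only one possible choice of “equivalence” measure; whether any such test is a valid criterion for anonymization is itself one of the open research objectives stated above.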
As a motivating example and case study, CAMOUFLAGE will focus on the semiotic analysis of visual big data, specifically a collection of profile pictures from platforms such as Facebook and Instagram. Different big data analytics scenarios will be considered, ranging from the large-scale automatic extraction of quantitative information with pre-trained neural networks to visual analysis by expert semioticians. If successful, CAMOUFLAGE will not only deliver a useful tool and anonymized assets to the community, but may also bring novel insights into the limitations and biases of current generative models.
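For the automatic-extraction scenario, one simple way to check whether the synthetic collection supports the same quantitative conclusions as the original is to run the same pre-trained model on both collections and compare the resulting label frequencies. The sketch below assumes the per-image labels have already been produced by some pre-trained classifier (the label names are made up) and uses total variation distance as one possible metric; the helper names are hypothetical and this is not the project's evaluation protocol.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def label_distribution(labels: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Relative frequency of each label in a collection."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty collection")
    return {label: n / total for label, n in counts.items()}


def total_variation(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in labels)


# Hypothetical per-image labels from the same pre-trained classifier.
original = ["smiling", "smiling", "neutral", "sunglasses"]
synthetic = ["smiling", "neutral", "neutral", "sunglasses"]
print(total_variation(label_distribution(original), label_distribution(synthetic)))  # 0.25
```

A low distance on such aggregate statistics would be necessary but not sufficient for semantic equivalence, since it says nothing about the finer-grained readings expected from expert semioticians.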
The team:
Lia Morra
Luca Piano