AI4Media’s work on “Human- and Society-centered AI Algorithms” during the first 16 months of the project comprises the following activities:
- policy recommendations for content moderation, which investigate aspects of future regulation: who should decide which content should be removed, for which reasons, when and how;
- development of detectors for content manipulation and synthesis, which address the growing problem of disinformation based on visual, audio and textual content;
- development of trusted recommenders, which address challenges related to privacy and bias for recommendation services;
- development of tools for healthier political debate, covering sentiment analysis, public opinion monitoring, and measuring the overall “healthiness” of online discussions;
- development of tools to understand the perception of hyper-local news, currently focusing on health information;
- measuring user perception of social media, focusing on tools and methods that can accurately predict or identify viewers’ emotions and their perception of content properties such as interestingness or memorability;
- measuring real-life effects of private content sharing, which can often lead to unexpected and serious consequences.
All this is presented in the document “First-generation of Human- and Society-centered AI algorithms (D6.1)”, which also includes references to publications and published software.
Policy recommendations for content moderation: This section addresses a key legal topic around media: Who should decide which content is removed, for which reasons, and when and how? In this context, several questions are addressed:
- Which overall approach should be taken? Self-regulation (such as codes of practice or codes of conduct), or hard-law EU regulatory instruments?
- How can regulatory approaches be designed to respect fundamental rights such as freedom of expression without limiting open public debate?
- How can it be ensured that legitimate, lawful content is not deleted and that freedom of expression is not violated?
- How do users know what is deleted, and whether the deleted content actually violates the law?
Beyond that, the section addresses the use of automated tools in content moderation, offers a critical assessment of the technical limitations of algorithmic content moderation, and points out risks to fundamental human rights, such as freedom of expression. Finally, it introduces the main elements of the EU regulatory framework applicable to content moderation.
Manipulation and synthetic content detection in multimedia: This section addresses various approaches to audio, video and textual content verification, i.e. the detection and localization of manipulations and fabrications, with a focus on the latter. Especially due to the latest advancements in Generative Adversarial Networks (GANs) and Language Models (LMs), the distinction between real and fake content (Deepfakes) is becoming increasingly difficult to make. Apart from many beneficial applications, there are also many applications that are potentially harmful to individuals, communities, and society as a whole, especially with respect to the creation and distribution of propaganda, phishing attacks, fraud, etc. Consequently, there is a growing demand for technologies to support content verification and fact-checking. AI4Media aims at developing such technologies, which are also used within several of the AI4Media use cases. This document reports on the activities and results of the first project phase:
- for visual synthesis and manipulation detection, three methods for detecting synthetic/manipulated images and videos (based on facial features with CNN/LSTM architectures, optical flow, and CNNs), one method for image synthesis (layout-to-image translation based on a novel Double Pooling GAN with a Double Pooling Module), and an evaluation of existing state-of-the-art CNN-based approaches are presented (a minimal detection sketch follows this list);
- for audio synthesis and manipulation detection, two detection methods (based on microphone classification and DNNs) as well as synthetic speech generation tools for training and testing are presented;
- for text synthesis and manipulation detection, an approach for composing a dataset of DeepFake tweets and a method to distinguish between synthetic and original tweets are presented.
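To make the detection approach concrete, below is a minimal sketch of a CNN-based real-vs-synthetic image classifier in PyTorch. This is not one of the D6.1 methods; it only illustrates the general pattern of fine-tuning a pretrained CNN as a binary real/fake classifier. The data layout (data/train/real, data/train/fake), the ResNet-18 backbone and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch: fine-tune a pretrained CNN to classify images as
# real vs. synthetic. Not the D6.1 method; paths and hyperparameters
# are placeholders.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Standard ImageNet preprocessing for a pretrained backbone.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical dataset layout: data/train/real and data/train/fake.
train_set = datasets.ImageFolder("data/train", transform=preprocess)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32,
                                           shuffle=True)

# Pretrained ResNet-18 with a fresh 2-class head (real vs. synthetic).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)
model = model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in train_loader:
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```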
Hybrid, privacy-enhanced recommendation: This section outlines the initial activities related to recommendation (most of this work will take place in the second half of the project). Recommender systems are powerful tools that can help users find the “needle in the haystack” and provide orientation, but they also strongly influence how users perceive the world and can contribute to a problem often referred to as “filter bubbles” – AI4Media aims at proposing how such effects can be minimized. Beyond that, the task also aims at developing tools to address privacy, which is a potential issue for all recommenders that exploit user or usage data, by applying so-called Privacy Enhancing Technologies (PET).
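As an illustration of the PET idea, the following sketch applies randomized response, a standard local differential privacy mechanism, to a binary interaction vector before it is shared with a recommender. This is a generic textbook mechanism, not the technique selected in AI4Media; the epsilon value and toy data are arbitrary.

```python
# Sketch of one Privacy Enhancing Technique for recommenders: perturb
# a user's raw interaction vector with local differential privacy
# (randomized response) before it ever leaves the device. Generic
# illustration only; epsilon and the data are example values.
import numpy as np

def randomized_response(interactions: np.ndarray,
                        epsilon: float = 1.0) -> np.ndarray:
    """Keep each 0/1 bit with probability e^eps / (1 + e^eps), else flip.

    Smaller epsilon -> more flips -> stronger privacy, noisier signal.
    """
    p_keep = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    keep = np.random.rand(interactions.size) < p_keep
    return np.where(keep, interactions, 1 - interactions)

# Hypothetical binary watch/no-watch vector for one user over 10 items.
user_vector = np.array([1, 0, 0, 1, 1, 0, 0, 0, 1, 0])
private_vector = randomized_response(user_vector, epsilon=1.0)
print(private_vector)  # the recommender only ever sees this noisy vector
```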
AI for Healthier Political Debate: This section describes how neural knowledge transfer can be applied for improved sentiment analysis of texts containing figurative language (e.g. sarcasm, irony, metaphors), with many applications in automated social media monitoring, customer feedback processing, e-mail scanning, etc. It also describes a new approach for public opinion monitoring via semantic analysis of tweets, especially relevant for political debates, including the preparation of an annotated dataset for semantic analysis of tweets in the Greek language and the application and validation of the aforementioned analysis tools on it. Finally, it describes how the healthiness of online discussions on Twitter was assessed using the temporal dynamics of attention data.
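The knowledge-transfer idea can be illustrated with a short sketch: a generic pretrained language model is fine-tuned on irony/sarcasm labels so that its general linguistic knowledge transfers to the figurative-language task. The model name (bert-base-uncased), label scheme and example texts below are placeholder assumptions, not the models or datasets used in the project.

```python
# Sketch of knowledge transfer for figurative-language sentiment:
# fine-tune a generic pretrained LM on irony/sarcasm labels. Model,
# labels and data are placeholders, not the project's setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # 0 = literal, 1 = ironic/sarcastic

# Tiny illustrative batch; real training would use an annotated corpus.
texts = ["Great, another Monday. I am thrilled.",
         "The concert last night was genuinely wonderful."]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True,
                  return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
optimizer.zero_grad()
outputs = model(**batch, labels=labels)  # loss is computed internally
outputs.loss.backward()
optimizer.step()
```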
Perception of hyper-local news: Local news is an indispensable source of information and stories of relevance to individuals and communities. This section describes several approaches for analysing local news and understanding its perception by both people and machines: classification of COVID-19-related misinformation and disinformation in online news articles, building a corpus of local news about COVID-19 vaccination across European countries, and exploration of online video as another health information source.
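For illustration, a simple misinformation classification baseline could look like the sketch below, which pairs TF-IDF features with logistic regression. This is a generic baseline under invented example data, not the classifier developed in D6.1.

```python
# Sketch of a misinformation classifier for news text: TF-IDF features
# plus logistic regression. Generic baseline; articles and labels are
# invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

articles = [
    "Vaccine trial results published in peer-reviewed journal.",
    "Miracle cure suppressed by doctors, share before deletion!",
]
labels = ["reliable", "misinformation"]  # placeholder ground truth

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression())
clf.fit(articles, labels)
print(clf.predict(["New study questions mask efficacy, experts respond."]))
```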
Measuring and Predicting User Perception of Social Media: This section describes tools and methods developed to accurately predict or identify viewers’ emotions and their perception of content, including:
- benchmarking and predicting media interestingness in images and videos
- predicting video memorability, using Vision Transformers
- use of decision-level fusion/ensembling systems for media memorability, violence detection and media interestingness
- use of a Pairwise Ranking Network for affect recognition, validated on EEG data (see the sketch after this list)
- estimating continuous affect with label uncertainty
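The pairwise ranking approach can be sketched as follows: rather than regressing absolute affect values, a shared scoring network is trained on pairs of samples with a margin ranking loss, learning which sample of each pair expresses stronger affect. The architecture, feature dimension and random data below are illustrative assumptions, not the EEG setup from D6.1.

```python
# Sketch of a pairwise ranking network for affect recognition: a
# shared scorer is trained so that the more intense sample of each
# pair receives the higher score. Shapes and data are placeholders.
import torch
import torch.nn as nn

class RankNet(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        # Shared scorer applied to both items of a pair.
        self.scorer = nn.Sequential(
            nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, a: torch.Tensor, b: torch.Tensor):
        return self.scorer(a).squeeze(-1), self.scorer(b).squeeze(-1)

model = RankNet()
criterion = nn.MarginRankingLoss(margin=0.5)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Hypothetical batch of 16 feature pairs; target = 1 means the first
# element of each pair was annotated as more intense than the second.
feats_a, feats_b = torch.randn(16, 128), torch.randn(16, 128)
target = torch.ones(16)

optimizer.zero_grad()
score_a, score_b = model(feats_a, feats_b)
loss = criterion(score_a, score_b, target)
loss.backward()
optimizer.step()
```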
Real-life effects of private content sharing: This section describes activities related to the analysis of content sharing, which can often lead to unexpected and serious consequences, especially when content appears in an unintended context (e.g. a job application process rather than a personal environment). The main objective is to improve user awareness about data processing through feedback contextualization, applying a method that rates visual user profiles and individual photos in a given situation by exploiting situation models, visual detectors and a dedicated photographic profiles dataset.
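The rating idea can be sketched as follows: given the objects detected in a photo, the photo’s impact in a situation is an aggregate of the situation-specific impact ratings of those objects. The object names, ratings and aggregation rule below are invented for illustration; the actual models and data are available in the LERVUP repository listed further down.

```python
# Sketch of situation-aware photo rating: aggregate crowdsourced
# impact ratings of detected objects for a given situation. Names,
# ratings and the averaging rule are invented for illustration.
from typing import Dict, List

# Hypothetical crowdsourced impact ratings per situation, in [-1, 1]:
# negative values hurt the profile in that situation, positive help.
IMPACT: Dict[str, Dict[str, float]] = {
    "job_search": {"beer": -0.8, "laptop": 0.5,
                   "suit": 0.7, "party_hat": -0.4},
}

def photo_score(detected_objects: List[str], situation: str) -> float:
    """Average the situation-specific impact of each detected object."""
    ratings = [IMPACT[situation].get(obj, 0.0) for obj in detected_objects]
    return sum(ratings) / len(ratings) if ratings else 0.0

# Detections would come from visual detectors; hard-coded here.
print(photo_score(["beer", "party_hat"], "job_search"))  # negative impact
print(photo_score(["suit", "laptop"], "job_search"))     # positive impact
```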
The document can be found HERE, and the initial results include the following OSS tools:
- Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation: a novel two-stage framework with a new Cascaded Cross MLP-Mixer (CrossMLP) sub-network in the first stage and a refined pixel-level loss in the second stage. See https://github.com/Amazingren/CrossMLP
- DeepFusionSystem_v2: a DNN-based late fusion approach that uses a custom number of inducers as inputs and outputs a new result, according to late fusion schemes (a hedged sketch of the late-fusion idea follows this list). See https://github.com/cmihaigabriel/DeepFusionSystem_v2
- Predicting Media Memorability: a dataset containing video samples annotated with short- and long-term memorability ground truth values, obtained via memory tests performed with human annotators. See https://multimediaeval.github.io/editions/2020/tasks/memorability/ and https://multimediaeval.github.io/editions/2021/tasks/memorability/
- LERVUP (LEarning to Rate Visual User Profiles): an approach that focuses on the effects of data sharing in impactful real-life situations, which relies on three components: (1) a set of visual objects with associated situation impact ratings obtained by crowdsourcing, (2) a corresponding set of object detectors for mining users’ photos and (3) a ground truth dataset made of 500 visual user profiles which are manually rated per situation. See https://github.com/v18nguye/lervup_official
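As a hedged illustration of the late-fusion idea behind DeepFusionSystem_v2, the sketch below trains a small network that combines the per-item scores of several independent “inducers” into one fused prediction. The number of inducers, the architecture and the random data are assumptions, not the repository’s actual design.

```python
# Sketch of DNN-based late fusion: several independent "inducers"
# each score the same item, and a small network learns to combine
# them. Generic illustration under assumed shapes, not the actual
# DeepFusionSystem_v2 architecture.
import torch
import torch.nn as nn

n_inducers = 4  # assumed number of upstream systems

fusion = nn.Sequential(
    nn.Linear(n_inducers, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())

optimizer = torch.optim.Adam(fusion.parameters(), lr=1e-3)
criterion = nn.BCELoss()

# Hypothetical batch: per-item scores from 4 inducers + ground truth.
inducer_scores = torch.rand(32, n_inducers)
ground_truth = torch.randint(0, 2, (32, 1)).float()

optimizer.zero_grad()
fused = fusion(inducer_scores)        # learned late-fusion output
loss = criterion(fused, ground_truth)
loss.backward()
optimizer.step()
```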
Authors: Patrick Aichroth & Thomas Köllmer (Fraunhofer IDMT)