* The workshop was streamed live on Facebook by @MKLabCERTH.
* Please visit this link to watch it again (no Facebook account is necessary).
* A video recording is also available on YouTube, with each talk as a separate chapter.
* The presentations are available below.
Motivated by the challenges, risks and opportunities that the wide use of AI brings to media, society and politics, AI4Media aspires to become a centre of excellence and a wide network of researchers across Europe and beyond. Its focus is on delivering the next generation of core AI advances to serve the key sector of media, on making sure that the European values of ethical and trustworthy AI are embedded in future AI deployments, and on reimagining AI as a crucial, beneficial enabling technology in the service of society and media.
The AI4Media consortium, comprising 30 leading partners in the areas of AI and media (9 universities, 9 research centres, 12 industrial partners) and 35 associate members, will establish the networking infrastructure to bring together the currently fragmented European AI landscape in the field of media, and foster deeper, long-running interactions between academia and industry, including Digital Innovation Hubs. It will also shape a research agenda for media AI research, and pursue research and innovation both in cutting-edge technologies at the core of AI research and in specific fields of media-related AI. AI4Media will provide a targeted funding framework through open calls to speed up the uptake of innovations developed within the network. A PhD programme will further strengthen links to industry and foster the exchange of talent, while providing motivation to prevent brain drain, and a set of use cases will be developed by the network to demonstrate the impact of the achieved advances in the media sector. The Excellence Centre established during the AI4Media project, and the ecosystem that will grow around it, will provide a long-term basis for supporting AI excellence in Europe well beyond the project's end, with the aim of ensuring that Ethical AI guided by European values assumes a global leading role in the field of media.
Artificial intelligence (AI) is transforming the ways information reaches us, for instance through content recommendation and delivery. It also influences how content is created: AI, and particularly machine learning (ML), can be used to summarise videos, enhance picture quality and process content. In that context, this talk addresses new ways of applying ML advances to specific use cases that require the prediction of pixels.
Adding colour to the pixels of black and white video has traditionally been an expensive and time-consuming task - until now. Recent advances in deep learning have enabled the development of new colourisation algorithms. In particular, the talk will address the use of Generative Adversarial Networks (GANs) to predict the colours of a black and white source image using knowledge from a given set of colourised content. This enables us to obtain more natural, realistic and plausible colourised content.
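As a minimal sketch of this kind of conditional-GAN colourisation setup, the toy example below pairs the grayscale (L) channel of a Lab image with predicted colour (ab) channels in a pix2pix-style arrangement; the layer sizes, loss weights and toy data are illustrative placeholders, not the architecture presented in the talk.

```python
# Toy conditional-GAN colourisation sketch (illustrative, not the speaker's
# actual architecture). The generator maps the L (lightness) channel to the
# two ab colour channels; the discriminator judges (L, ab) pairs.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 2, 4, stride=2, padding=1), nn.Tanh(),  # ab in [-1, 1]
        )
    def forward(self, L):
        return self.net(L)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, stride=2, padding=1),  # patch-level real/fake scores
        )
    def forward(self, L, ab):
        return self.net(torch.cat([L, ab], dim=1))

G, D = Generator(), Discriminator()
bce = nn.BCEWithLogitsLoss()
L = torch.rand(4, 1, 64, 64)               # toy grayscale batch
ab_real = torch.rand(4, 2, 64, 64) * 2 - 1  # toy ground-truth colours

ab_fake = G(L)
d_fake = D(L, ab_fake)
# Generator objective: fool D, while an L1 term anchors the prediction
# to the ground-truth colours.
g_loss = bce(d_fake, torch.ones_like(d_fake)) + 100 * nn.functional.l1_loss(ab_fake, ab_real)
print(g_loss.item())
```

The adversarial term is what pushes the output towards the natural, plausible look mentioned above, while the L1 term keeps the predicted colours tied to the training examples.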
Furthermore, models that learn to predict pixels in videos can improve content compression, allowing content delivery channels to be used more effectively. Since this often comes with increased computational complexity and little transparency, the talk will also present examples of how to use ML more efficiently.
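As a rough illustration of learned compression (not the method discussed in the talk), the toy autoencoder below squeezes frames through a small bottleneck that stands in for the compressed representation; real learned codecs additionally quantise the latent and use entropy models to control the actual bitrate.

```python
# Toy learned-compression sketch (illustrative only): an autoencoder whose
# small bottleneck stands in for the compressed representation. Real learned
# codecs add quantisation and entropy coding; this shows only reconstruction
# and a crude size comparison.
import torch
import torch.nn as nn

class TinyCodec(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 8, 4, stride=2, padding=1),  # 8-channel latent at 1/4 resolution
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(8, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )
    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

codec = TinyCodec()
frame = torch.rand(1, 3, 64, 64)
recon, latent = codec(frame)
print("pixel values:", frame.numel(), "latent values:", latent.numel())  # ~6x fewer
print("MSE:", nn.functional.mse_loss(recon, frame).item())
```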
Generative Adversarial Networks (GANs) have been used to generate highly compelling pictures and videos, such as manipulated facial animations and interior and outdoor scenes. This lecture provides an overview of several GAN applications for media production, such as image content generation (e.g., human facial and body images), automatic image restyling/translation/captioning, text-to-image synthesis, video frame prediction, video content generation (e.g., human animations) and automatic audio-visual content captioning. Future progress in this area holds the promise of revolutionizing arts and media production.
This talk addresses the problem of generating video summaries from original, full-length videos. For this, we will present deep network architectures that are based on Generative Adversarial Networks (GANs). An important feature of these GAN-based architectures is that they can be trained without supervision, i.e. without using manually-generated ground-truth video summaries; thus, they can be easily re-trained for different types of video content. Based on one of these network architectures, we will also present a Web service that supports the automatic generation of video summaries that fulfill the specifications for posting on the most common video sharing platforms and social networks with just a few mouse clicks.
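A minimal sketch of this unsupervised adversarial idea is shown below, in the spirit of SUM-GAN-style architectures (the networks in the talk differ): a scorer weights per-frame features, a reconstructor rebuilds the sequence from the weighted features, and a discriminator is trained to tell original sequences from summary-based reconstructions, so no ground-truth summaries are needed anywhere in the loop. The feature size, networks and sparsity target are illustrative.

```python
# Minimal sketch of unsupervised GAN-based summarisation (illustrative; in
# the spirit of SUM-GAN-style models, not a specific published network).
import torch
import torch.nn as nn

FEAT = 128  # per-frame feature size (e.g. from a pretrained CNN)

scorer = nn.Sequential(nn.Linear(FEAT, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
reconstructor = nn.GRU(FEAT, FEAT, batch_first=True)
discriminator = nn.Sequential(nn.Linear(FEAT, 64), nn.ReLU(), nn.Linear(64, 1))

frames = torch.rand(1, 30, FEAT)            # one video, 30 frame features
scores = scorer(frames)                     # (1, 30, 1) importance in [0, 1]
recon, _ = reconstructor(frames * scores)   # rebuild from score-weighted frames

bce = nn.BCEWithLogitsLoss()
d_fake = discriminator(recon.mean(dim=1))
# Generator side: fool the discriminator while keeping the summary short
# (the sparsity term pushes the mean score toward a target length, here 15%).
g_loss = bce(d_fake, torch.ones_like(d_fake)) + (scores.mean() - 0.15) ** 2
print(g_loss.item())
```

Because the only training signal is reconstruction plausibility plus a length constraint, the same setup can be retrained on any new type of video content, which is the re-trainability property highlighted above.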
Deepfake creation technologies enable the automatic generation of highly realistic visual content, typically animated people, on demand. Their emergence and constant improvement have given rise to concerns about trust in media and have exacerbated the challenge of online disinformation. As a result, solutions for deepfake detection are considered a valuable tool in the fight against disinformation. This talk will provide a brief introduction to the problem of deepfake detection, and will describe the experiences gained from the participation of the Media Verification (MeVer) team of ITI-CERTH in the DeepFake Detection Challenge (DFDC). Special focus will be given to the importance of training data pre-processing for the performance of the deepfake detection system. Additionally, the talk will discuss practical considerations and challenges in deploying the service "in the wild", as part of the WeVerify verification platform.
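To illustrate the pre-processing point, the sketch below shows one common first step: detecting and cropping faces so the classifier sees face regions rather than full frames. The Haar-cascade detector, the 20% margin and the file name are simple stand-ins; production pipelines typically use stronger detectors and richer augmentation.

```python
# Illustrative pre-processing for a deepfake detector: detect and crop faces
# with some surrounding context, then resize to the classifier's input size.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_crops(frame_bgr, margin=0.2, size=224):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    crops = []
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
        m = int(margin * max(w, h))            # extra context around the face
        x0, y0 = max(x - m, 0), max(y - m, 0)
        crop = frame_bgr[y0:y + h + m, x0:x + w + m]
        crops.append(cv2.resize(crop, (size, size)))
    return crops

frame = cv2.imread("frame.jpg")                # hypothetical video frame
if frame is not None:
    print(f"{len(face_crops(frame))} face crop(s) ready for the classifier")
```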
Image animation consists of generating a video sequence in which an object from a source image is animated according to some external information (a conditioning label or the motion of a driving video). In this talk I will present two solutions to this problem: 1) generating facial expressions, e.g., smiles that differ from each other (spontaneous, tense, etc.), using diversity as the driving force; and 2) generating videos without using any annotation or prior information about the specific object to animate. Once trained on a set of videos depicting objects of the same category (e.g. faces, human bodies), our method can be applied to any object of this class. To achieve this, we decouple appearance and motion information using a self-supervised formulation. To support complex motions, we use a representation consisting of a set of learned keypoints along with their local affine transformations. A generator network models occlusions arising during target motions and combines the appearance extracted from the source image with the motion derived from the driving video. Our solutions score best on diverse benchmarks and on a variety of object categories.
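The sketch below illustrates the self-supervised decoupling idea at a toy scale; it omits the local affine transformations and occlusion modelling of the full method, and its layers, sizes and keypoint count are placeholders. A keypoint network turns a frame into heatmaps whose soft-argmax gives coordinates, and a generator combines the source frame's appearance with the driving frame's keypoints to reconstruct the driving frame.

```python
# Toy sketch of self-supervised appearance/motion decoupling (illustrative;
# loosely in the spirit of first-order-motion-style models).
import torch
import torch.nn as nn

K = 10  # number of learned keypoints (placeholder)

class KeypointNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.heat = nn.Conv2d(3, K, kernel_size=7, padding=3)
    def forward(self, img):
        h = self.heat(img)                              # (B, K, H, W) heatmaps
        b, k, H, W = h.shape
        p = torch.softmax(h.view(b, k, -1), dim=-1).view(b, k, H, W)
        ys = torch.linspace(-1, 1, H).view(1, 1, H, 1)
        xs = torch.linspace(-1, 1, W).view(1, 1, 1, W)
        # Soft-argmax: expected (x, y) position under each heatmap.
        return torch.stack([(p * xs).sum(dim=(2, 3)),
                            (p * ys).sum(dim=(2, 3))], dim=-1)  # (B, K, 2)

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.fuse = nn.Conv2d(3 + 2 * K, 3, kernel_size=3, padding=1)
    def forward(self, source, kp_driving):
        b, _, H, W = source.shape
        kp_maps = kp_driving.view(b, 2 * K, 1, 1).expand(b, 2 * K, H, W)
        return self.fuse(torch.cat([source, kp_maps], dim=1))

kp_net, gen = KeypointNet(), Generator()
source, driving = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
# Self-supervision pairs two frames of the SAME video: the model must
# reconstruct the driving frame from the source frame plus keypoints,
# forcing keypoints to carry motion and the source to carry appearance.
recon = gen(source, kp_net(driving))
loss = nn.functional.l1_loss(recon, driving)
print(loss.item())
```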
Face de-identification methods commonly employed in media involve applying additive noise techniques such as pixelation and blurring to the facial image region, degrading image quality without necessarily achieving sufficient de-identification performance. Recently proposed deep learning-based methods promise excellent de-identification performance while producing images that are visually pleasing yet still of limited use to human viewers. This lecture overviews the face de-identification problem, focusing on adversarial methods that disable automated face detection/recognition in a humanly imperceptible manner, thus producing machine de-identified images that maintain maximal utility.
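The toy example below sketches the adversarial principle behind such methods: a small FGSM-style perturbation pushes a face embedding away from the subject's original embedding while remaining visually imperceptible. The embedding network here is a random stand-in; real systems attack actual face-recognition models, with stronger iterative attacks.

```python
# Sketch of adversarial machine de-identification (illustrative): an
# FGSM-style step lowers the cosine similarity between the image's face
# embedding and the subject's identity embedding.
import torch
import torch.nn as nn

embedder = nn.Sequential(                  # stand-in for a face-recognition net
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 128))

face = torch.rand(1, 3, 112, 112)          # toy face image
target_emb = embedder(face).detach()       # identity the recogniser would match

x = face.clone().requires_grad_(True)
sim = nn.functional.cosine_similarity(embedder(x), target_emb).sum()
sim.backward()

eps = 2 / 255                              # tiny budget keeps the change invisible
deid = (face - eps * x.grad.sign()).clamp(0, 1)  # step that LOWERS similarity
with torch.no_grad():
    print("similarity after attack:",
          nn.functional.cosine_similarity(embedder(deid), target_emb).item())
```

Because the perturbation budget is tiny, the image stays useful for human viewers, which is exactly the utility-preserving property contrasted above with pixelation and blurring.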
This presentation will describe some major challenges in the detection of synthetic media and deepfakes. In particular, it will present recent approaches that try to fool current detectors of synthetic media. Analyzing the vulnerabilities of such detectors is extremely important in order to build better forgery detectors able to withstand malicious attacks. This is well known in forensics, where many counter-forensic methods have been proposed in the literature. Indeed, forensics and counter-forensics go hand in hand, a competition that contributes to improving the level of digital integrity over time. The talk will focus especially on GAN image detection and show that current powerful deep learning-based detectors can be easily fooled by properly inserting camera-based traces into such synthetic images, without any knowledge of the architecture under attack.
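As a crude illustration of the trace-insertion idea (real attacks model camera traces far more carefully; the file names and blend weight here are hypothetical), the snippet below extracts a high-frequency residual from a real photograph and blends it into a GAN-generated image, so the synthetic image inherits camera-like traces that GAN detectors often key on.

```python
# Crude illustration of camera-trace insertion (not the attack from the
# talk): the high-frequency residual of a real photo (image minus a
# denoised copy) is blended into a GAN image.
import cv2
import numpy as np

camera_img = cv2.imread("real_photo.jpg")   # hypothetical input files
gan_img = cv2.imread("gan_image.jpg")
assert camera_img is not None and gan_img is not None, "provide both images"
camera_img = camera_img.astype(np.float32)
gan_img = gan_img.astype(np.float32)

# Residual = image minus its denoised version: keeps sensor/pipeline noise.
residual = camera_img - cv2.GaussianBlur(camera_img, (5, 5), 0)
residual = cv2.resize(residual, (gan_img.shape[1], gan_img.shape[0]))

attacked = np.clip(gan_img + 0.5 * residual, 0, 255).astype(np.uint8)
cv2.imwrite("gan_image_with_traces.png", attacked)
```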