Seven use cases have been defined by AI4Media’s industry partners, informed by emerging market opportunities and urgent industry challenges, raising specific requirements and research questions. AI4Media use cases highlight how AI applies throughout the media industry value chain, from research and content creation to production, distribution, consumption/interaction, performance and quality measurement. These industry cases play a key role in exploiting and sustaining results of AI4Media research activities. Have a look at them:
This use case from Deutsche Welle (DW) and Athens Technology Center (ATC) leverages AI technologies to improve support tools used by journalists and fact-checking experts for digital content verification and disinformation detection. While partner DW provides journalistic and media-focused requirements, ATC is responsible for AI component integration and the operation of the demonstrators, Truly Media – a web-based platform for collaborative verification – and TruthNest – a Twitter analytics and bot detection tool. Two main topics are covered within the use case: 1) verification of content from social media with a focus on synthetic media detection, and 2) detection of communication narratives and patterns related to disinformation. The key motivation behind this work is to demonstrate how advanced AI support functions can enable news journalists to keep up with rapid new developments in the area of manipulated social media content, synthetic media, and disinformation.
To that end, related AI technologies that are being integrated into the use case demonstrators support journalists in detecting manipulated and synthetically generated images, videos, and audio, as well as in detecting bot-generated tweets and managing their content through media summarization technologies. These AI-based tools are being developed by some of the largest research centres in Europe, such as the Centre for Research and Technology Hellas (CERTH), Fraunhofer, and CEA. We are also experimenting with AI-at-the-edge applications for journalism, exploring how the latest advances in the area can be leveraged to perform critical media processing tasks on-device, such as deepfake detection and face anonymization or NLP-based text analysis and question answering. This is considered a valuable capability in the context of counteracting disinformation, especially in cases where the media content of interest is of a confidential or sensitive nature, or where the surrounding context does not allow it to be shared over public communication networks (e.g. areas without high-bandwidth connectivity or under strict monitoring by local authorities).
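To make the on-device idea concrete, the following is a minimal sketch of what a local face anonymization step might look like, assuming only OpenCV is installed on the device. It is an illustrative stand-in, not one of the actual AI4Media edge components, and the file names and blur settings are invented.

```python
# Minimal sketch of on-device face anonymization (illustrative, not the
# project's actual edge component). Assumes OpenCV installed locally;
# no media leaves the device.
import cv2

def anonymize_faces(image_path: str, output_path: str) -> int:
    """Detect faces with a bundled Haar cascade and blur them in place."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        region = image[y:y + h, x:x + w]
        # Heavy Gaussian blur makes the face unrecognisable while keeping the layout.
        image[y:y + h, x:x + w] = cv2.GaussianBlur(region, (51, 51), 30)
    cv2.imwrite(output_path, image)
    return len(faces)

if __name__ == "__main__":
    print(anonymize_faces("source_photo.jpg", "anonymized_photo.jpg"), "faces blurred")
```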
Another key aspect is the exploration of Trustworthy AI in relation to these topics and the specific needs of media organisations. Our goals are to explore and demonstrate how an AI component from a third-party provider can be enhanced in terms of transparency and robustness, to develop related AI transparency information documents for different target groups within a media organisation, and to make such transparency information available in the user interface of the demonstrators.
If you wish to know more about this work, please contact Danae Tsabouraki at d.tsabouraki@atc.gr.
Journalists face a challenging environment where the amount of incoming content is ever increasing, while the need to publish news as fast as possible is extremely pressing. At the same time, journalists need to ensure the published content is both relevant to their audience and trustworthy, avoiding errors and misinformation. This use case from the Flemish Public Broadcaster (VRT) focuses on interweaving smart (AI-powered) tools into day-to-day journalistic workflows in modern newsrooms, aiming to optimise repetitive tasks and create opportunities for new story formats supported by these tools. VRT is creating a Smart News Assistant, i.e., a multi-functional and AI-driven toolbox that will support journalists in monitoring, fact-checking, and creating engaging news formats.
Current work has focused on investigating the workflow of a journalist and how to customise it with AI. We enhanced the Image Verification Tool from CERTH by creating a new user interface that provides step-by-step guidance through the image verification process. We also developed a new prototype called Video Curator that matches incoming audiovisual content with news-related text written by journalists, in order to suggest suitable video output. Work is underway on a new prototype that will help journalists better understand and use data in their stories.
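The matching idea behind a tool like Video Curator can be illustrated with an off-the-shelf vision-language model. The sketch below is not VRT's implementation: it assumes the Hugging Face transformers library and a public CLIP checkpoint, and simply ranks candidate keyframes against a journalist's text.

```python
# Illustrative sketch of text-to-keyframe matching (not VRT's Video Curator).
# Assumes the transformers library and a public CLIP model.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def rank_keyframes(article_text: str, keyframe_paths: list[str]) -> list[tuple[str, float]]:
    """Score each candidate keyframe against the article text and sort by relevance."""
    images = [Image.open(p) for p in keyframe_paths]
    inputs = processor(text=[article_text], images=images,
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image holds one text-image similarity score per keyframe.
    scores = outputs.logits_per_image.squeeze(-1).tolist()
    return sorted(zip(keyframe_paths, scores), key=lambda x: x[1], reverse=True)

# Hypothetical usage: suggest the most suitable shots for a short news item.
# ranking = rank_keyframes("Flood damage in Limburg after heavy rainfall",
#                          ["shot_001.jpg", "shot_002.jpg", "shot_003.jpg"])
```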
Content preservation, high-quality production and process automation are at the core of the current transformation of Public Service Media (PSM) from its traditional business to the modern digital era. Emerging AI-based technologies can support PSMs in this transition by providing capabilities to simplify and accelerate content production processes and to enhance existing content, such as broadcasters’ archives.
The use case defined by Rai, the Italian public broadcaster, focuses on three main tasks usually accomplished during everyday operations, namely content access, content production and content monitoring. Content access includes tools supporting users to find content according to specific semantic features, like persons’ names, places and organisations referenced in texts, recognising monuments depicted in images or identifying TV celebrities appearing in videos. Content production involves activities aimed at the creation and enhancement of content (e.g., video super resolution, video denoising). Content monitoring comprises some of the pillars of public media services, such as diversity analysis, content reliability assessment and social media analysis.
The use case aims to explore the plethora of new AI-driven tools, find the most suitable ones for each of these application domains, and identify the smoothest possible integration of each component into well-established media workflows. Content access tasks have already been tackled, working on the possible introduction into the production workflows of four AI-driven components related to informative content and archive exploitation.
Indeed, since the ability to leverage visual features instead of relying only on textual metadata could help journalists in their search and retrieval activities, we worked on technologies that allow professionals to identify faces of TV personalities and geographic landmarks in video, as well as to search for content using images as queries. Another important feature that has been integrated into Rai's tools for journalists is an AI-driven named entity recognition (NER) component working on English and German content, which will improve the daily workflows of professionals working in the bilingual regions of Italy.
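As a rough illustration of how such a bilingual NER step can be wired into a workflow (this is not the component integrated at Rai), the sketch below assumes spaCy with its publicly available English and German models.

```python
# Illustrative bilingual NER sketch (not the component integrated at Rai).
# Assumes spaCy with its small public models installed:
#   python -m spacy download en_core_web_sm
#   python -m spacy download de_core_news_sm
import spacy

MODELS = {
    "en": spacy.load("en_core_web_sm"),
    "de": spacy.load("de_core_news_sm"),
}

def extract_entities(text: str, lang: str) -> list[tuple[str, str]]:
    """Return (entity text, entity label) pairs: persons, places, organisations, etc."""
    doc = MODELS[lang](text)
    return [(ent.text, ent.label_) for ent in doc.ents]

print(extract_entities("Die Präsidentin besuchte Bozen im März.", "de"))
print(extract_entities("The minister met journalists in Rome.", "en"))
```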
As for the content production activities, Public Service Media are extremely interested in video enhancement tasks, in order to upgrade the large amount of archived content from, e.g., HD to 4K (or sometimes even from SD to HD) for possible reuse. Following this path, we assessed a super-resolution component and compared its performance with state-of-the-art (SOTA) technologies, obtaining promising results. Further tests will follow using different models. Content monitoring activities will also be tackled in the next period.
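The kind of comparison behind such an assessment can be reproduced with standard full-reference quality metrics. The sketch below is not Rai's evaluation pipeline; it assumes scikit-image and that the upscaled and reference frames are already aligned, and the file names are placeholders.

```python
# Minimal sketch of scoring a super-resolution output against a reference frame
# with standard full-reference metrics (PSNR/SSIM). Not Rai's evaluation
# pipeline; assumes scikit-image and aligned frames of identical resolution.
from skimage import io
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def score_frame(reference_path: str, upscaled_path: str) -> dict:
    ref = io.imread(reference_path)
    sr = io.imread(upscaled_path)
    return {
        "psnr_db": peak_signal_noise_ratio(ref, sr),
        "ssim": structural_similarity(ref, sr, channel_axis=-1),
    }

# Hypothetical usage: compare one 4K ground-truth frame with the model's output.
# print(score_frame("frame_4k_reference.png", "frame_4k_upscaled.png"))
```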
Researchers working in media studies, history, political sciences, and other fields within social sciences and humanities (SSH) have greatly benefited from the digitization of audiovisual archives. It has expanded the scale and scope of their investigations, stimulating new research questions – a great example of this is an examination of political party representation in the media during the election period. The Netherlands Institute for Sound & Vision developed a use case that investigates how AI-based tooling could enhance SSH research with big data from archival collections. Specifically, AI4Media has provided us with an opportunity to expand the capabilities of the CLARIAH Media Suite, a digital environment for conducting research with multimodal datasets from Dutch heritage organisations.
Over the last two years we have been collaborating with Fraunhofer IDMT to develop the Partial Audio Matching (PAM) functionality for the Media Suite, which allows researchers to detect and trace the reuse of audiovisual programs based on the matching of identical audio signals. This can reveal how moving images have been reused to frame a topic by giving source material a different meaning in a new context. For instance, a Media Suite user might choose a particular press conference and perform a PAM analysis to identify how segments from this program have been quoted in the evening news in the weeks that follow, allowing them to compare how the same topic is reported on by different TV channels.
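Conceptually, audio-based reuse detection boils down to locating a short query segment inside a longer broadcast. The naive sketch below illustrates that idea with MFCC features and a sliding window; it is not Fraunhofer IDMT's fingerprinting technology, assumes librosa, and the file names are invented.

```python
# Naive illustration of audio-based reuse detection (not Fraunhofer IDMT's
# fingerprinting technology). Slides the query's MFCC sequence over a longer
# broadcast and reports the best-matching time offset.
import librosa
import numpy as np

def best_match_offset(query_path: str, broadcast_path: str, sr: int = 22050):
    hop = 512
    q, _ = librosa.load(query_path, sr=sr)
    b, _ = librosa.load(broadcast_path, sr=sr)
    q_feat = librosa.feature.mfcc(y=q, sr=sr, hop_length=hop)  # shape (n_mfcc, Tq)
    b_feat = librosa.feature.mfcc(y=b, sr=sr, hop_length=hop)  # shape (n_mfcc, Tb)
    t_q = q_feat.shape[1]
    best_pos, best_dist = 0, np.inf
    for start in range(b_feat.shape[1] - t_q + 1):
        window = b_feat[:, start:start + t_q]
        dist = np.mean((window - q_feat) ** 2)  # frame-wise MSE over the window
        if dist < best_dist:
            best_pos, best_dist = start, dist
    return best_pos * hop / sr, best_dist  # (offset in seconds, distance)

# Hypothetical usage: locate where a press-conference quote reappears in a newscast.
# print(best_match_offset("press_quote.wav", "evening_news.wav"))
```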
We have already performed an initial evaluation of PAM with researchers in the field of media studies. They confirmed the usefulness of the tool in studying the circulation and ‘canonization’ of images and speeches. Researchers were particularly excited to see a tool that is based on audio rather than visual analysis. This opens up new possibilities for currently underrepresented research areas, such as the analysis of soundscapes. What also became evident during this evaluation is that researchers place a high priority on the explainability and trustworthiness of AI tools. They need to be transparent about the limitations of their methods or potential biases and make their research replicable. Therefore, the next step in our work will be extending PAM with a model card based on IBM’s AI Fairness 360.
Digital games are one of the fastest-growing multimedia sectors, with the market projected to reach $200 billion by 2023. This incredible trajectory is partly supported by a “games-as-a-service” business model, in which games are continuously developed and expanded beyond their initial release. While the steady flow of content helps with customer retention, it also puts pressure on developers because this content has to be tested and optimised before release. Artificial Intelligence (AI) can provide a radically new approach to game development and testing by allowing developers to test thousands of different configurations. AI can replace or augment existing practices by providing product evaluations faster than current methods and can evaluate potential products with a reduced need for human labour or data.
Automated Testing for Games: In the first sub-use case, Automated Testing for Games, MODL.AI demonstrates how AI tools can enhance the development process through automated testing and bug finding. The first objective is to provide a prototype of the platform where users can investigate a number of quality assurance reports generated by an AI agent; these reports are produced by a quality-diversity agent run in a simple game demo. We are currently working to expand this prototype into a fully functional platform, where the user can investigate quality assurance reports generated by an AI agent in any game, supported by plug-ins for the most widely used game engines, Unity and Unreal Engine, for easy integration by game developers.
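For readers unfamiliar with quality-diversity agents, the toy loop below shows the core idea: an archive keeps the best-scoring test episode for each cell of a behaviour grid, so the resulting reports cover diverse play styles rather than a single optimum. It is a generic MAP-Elites-style sketch, not modl.ai's platform; all parameters and scores are invented.

```python
# Toy MAP-Elites-style quality-diversity loop (generic sketch, not modl.ai's
# platform). The archive keeps the best-scoring episode per behaviour cell,
# so the resulting QA reports cover diverse play styles.
import random

def simulate_episode(params):
    """Hypothetical stand-in for running the game with one agent configuration.
    Returns (quality score, behaviour descriptors)."""
    quality = -abs(params["aggression"] - 0.6) + random.random() * 0.1
    distance = params["aggression"] * random.uniform(50, 100)   # behaviour 1
    items = int(params["curiosity"] * random.randint(0, 20))    # behaviour 2
    return quality, (distance, items)

def map_elites(iterations=1000, grid=(10, 10)):
    archive = {}  # behaviour cell -> (quality, params)
    for _ in range(iterations):
        params = {"aggression": random.random(), "curiosity": random.random()}
        quality, (distance, items) = simulate_episode(params)
        cell = (min(int(distance / 10), grid[0] - 1), min(items, grid[1] - 1))
        if cell not in archive or quality > archive[cell][0]:
            archive[cell] = (quality, params)
    return archive  # each entry could back one QA report

print(f"{len(map_elites())} distinct play styles found")
```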
Improved Music Analysis for Games: Even though video game producers usually ask human musicians to compose original background music, the development team needs audio examples that match the ambiance of the game in order to define audio mood boards and to provide music examples that facilitate communication with the composers. Finding suitable music examples is not a simple task and can take a long time. In this context, IRCAM intends to demonstrate the benefit of AI methods for video game development. Based on an automatic analysis of music files, the demonstrator enables the exploration of a wide music catalogue that is not manually annotated.
In the current release, a catalogue of 105,000 songs was analysed to predict attributes (music genres, moods, tempo, etc.) and to compute similarity embeddings. The “Music Explorer” demonstrator – a web service – then allows the exploration of the catalogue in two ways. In the first, the user defines the musical attributes which fit the ambiance of the game, and the service proposes a list of songs matching those attributes. Unlike similar tools, the criterion here is based on automatically estimated attributes, so the method is applicable even to catalogues which are not manually annotated. The second search method is based on music similarity: the user chooses a reference song and selects one or more music concepts (among genre, mood, instrumentation, era, harmony and rhythm) to define the meaning of “similarity”, and the service returns the list of the closest songs in the catalogue. The analyses, for attributes and similarity search, are based on AI methods, and the web service is composed of a GUI displayed in the user's web browser and a back-end integrated on a remote server running the AI components. During the first evaluation, the Music Explorer demonstrator proved its usefulness and its ability to quickly find music examples to help video game producers during the creation of a game.
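The similarity search can be pictured as a concept-weighted nearest-neighbour query over pre-computed embeddings. The sketch below is illustrative rather than IRCAM's production model: it assumes each track already has one embedding per concept, and ranks the catalogue by average cosine similarity over the concepts the user selects.

```python
# Sketch of concept-weighted music similarity search (illustrative, not
# IRCAM's production model). Assumes each track already has one embedding
# per concept (genre, mood, rhythm, ...), stored as NumPy arrays.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def similar_tracks(catalogue, reference_id, concepts, top_k=10):
    """Rank catalogue tracks by average cosine similarity over the chosen concepts."""
    ref = catalogue[reference_id]
    scores = []
    for track_id, embeddings in catalogue.items():
        if track_id == reference_id:
            continue
        score = np.mean([cosine(ref[c], embeddings[c]) for c in concepts])
        scores.append((track_id, score))
    return sorted(scores, key=lambda x: x[1], reverse=True)[:top_k]

# Example with a tiny random catalogue: find tracks close in mood and rhythm.
rng = np.random.default_rng(0)
catalogue = {f"track_{i}": {c: rng.normal(size=64) for c in ("genre", "mood", "rhythm")}
             for i in range(100)}
print(similar_tracks(catalogue, "track_0", concepts=["mood", "rhythm"], top_k=3))
```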
This use case, developed by the Barcelona Supercomputing Center, explores the relationship between human creation and AI tools for music composition. Labelled as human co-creation, it may have a deep impact on an industry feeding content to a society continuously consuming media production. We are currently developing novel tools that may contribute to an efficient creation process, where the efforts of the artist or creator are focused on deeply creative tasks, relying on the assistant to transparently perform less critical parts during content co-creation. As the functionalities of these models can be complex to handle, the purpose is to provide the final user – typically a music creator – with a collection of well-organised functionalities and capabilities for AI-assisted music creation. These functionalities enable users to a) train and manipulate the model using a dataset selected by them, b) generate novel content from the trained model based on a small audio seed, and c) assess the quality of the generated audio content and publish it on popular audio platforms.
The current developments allow a non-expert user to use advanced, pre-trained generative models or to prepare datasets for training under controlled conditions. We include a number of generative models released within the AI4Media project as well as elsewhere. In addition, we have explored user requirements to understand the needs of a community of non-experts approaching AI tools. The implementation of musical processing tools opens up the possibility of transparently creating content to be used in multiple formats. Composers use large datasets of music fragments and combine them using machine learning methods. While a single training run may provide a large amount of content (different audio files), using different datasets improves the quality and variability of the generated output. However, the computational requirements are large, and better training methods and data models are needed.
Media companies have accumulated vast digital archives and collections of images and videos over the years. Since these collections have been built gradually and iteratively over time, often by different departments and units, they usually have little or no metadata such as tags, categories, and other types of annotations. This lack of coherent media asset organisation tailored to the media company's business and services precludes the easy reuse and successful monetisation of these media assets, as well as the creation and offering of new services. In addition, both big traditional media companies and, even more so, digital media platforms increasingly combine in their collections content created by these companies with user-generated content (UGC). Such hybrid media archives need advanced content moderation (CM) solutions, often working in real time, to safeguard viewers and meet the laws and regulations of various jurisdictions.
Currently our work focuses on the integration and use of Imagga's content moderation and facial recognition technologies. Imagga has implemented novel methodologies based on advanced deep learning techniques, such as CNNs and RNNs, aimed at photo and video moderation – tagging, categorisation, and facial recognition. As part of the content moderation, we have included object detection of infamous symbols and analysis of whether the video contains not-safe-for-work (NSFW) or explicit content. For facial recognition, we have included a celebrity recognition model able to recognise around 3,000 different celebrities. For each scene in each video, we generate annotation metadata that is used for filtering and searching. The videos are split by keyframes and then processed by Imagga's technologies to obtain coordinates for infamous symbols and celebrities present in the extracted keyframe images. These frames are also analysed for the presence of NSFW content. The content can then be searched and filtered through a user-friendly web UI.
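A generic version of such a keyframe-based pipeline might look like the sketch below. It is not Imagga's API: the classifier call is a placeholder for the actual moderation and recognition services, and only OpenCV is assumed for frame sampling.

```python
# Generic sketch of a keyframe-based moderation pipeline (not Imagga's API).
# The classify_frame callable stands in for the real NSFW, symbol-detection
# and celebrity-recognition services.
import cv2

def sample_keyframes(video_path: str, every_n_seconds: float = 2.0):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = int(fps * every_n_seconds)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append((index / fps, frame))  # (timestamp in seconds, image)
        index += 1
    cap.release()
    return frames

def moderate_video(video_path: str, classify_frame) -> list[dict]:
    """Annotate each sampled keyframe with whatever the classifier returns
    (e.g. NSFW flag, detected symbols, recognised celebrities)."""
    return [{"timestamp": ts, **classify_frame(frame)}
            for ts, frame in sample_keyframes(video_path)]

# Hypothetical usage: plug the real moderation service in behind classify_frame.
# metadata = moderate_video("episode_01.mp4", classify_frame=my_moderation_client)
```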
Conclusions
The active guidance provided by use case partners to research partners throughout the integration process plays a crucial role in achieving success in all cases. This emphasises the significance of industry and research collaboration right from the project’s inception, highlighting that a lab-to-market transfer process requires their joint efforts. Moreover, the direct involvement of end-users in iterative and agile development processes further amplifies the potential market adoption of AI-related innovations, fostering a user-centric approach and ensuring the practical relevance of the developed solutions.
Nevertheless, as in the case of every innovation process, various challenges arise and often need to be tackled impromptu. Even more so, since we are dealing with the fast-evolving and frequently disrupted domain of digital technologies. Among these challenges are problems related to structural and organisational differences among the consortium partners, integration complexities, usability and understandability issues, human-AI collaboration challenges, and dataset creation concerns. Moreover, ChatGPT’s public release was certainly a game changer for AI-driven innovation, although its long term impact on the media industry remains to be seen.
To address these challenges, we sought to enhance collaboration by establishing closer, one-to-one relationships between industrial and research partners. Knowledge exchange, co-design activities, and joint events were also used to strengthen collaboration. Efforts to design more user-friendly interfaces and increase transparency for end users of the demonstrators were made in order to address usability issues. Human-AI collaboration aspects were improved through the development of transparency information and feedback mechanisms to enhance user trust in AI-generated results. Finally, careful dataset curation and Non-Disclosure-Agreements addressed bias and privacy concerns. Overall, our experience shows that a collaborative approach and ongoing adaptability are key to addressing challenges and ensuring the successful integration of AI research innovations into real-world applications.
Author(s): Danae Tsabouraki (Athens Technology Center); Birgit Gray (Deutsche Welle); Chaja Libot (VRT); Maurizio Montagnuolo (RAI); Rasa Bocyte (Netherlands Institute for Sound & Vision); Christoffer Holmgård (modl.ai); Rémi Mignot (IRCAM); Artur Garcia (BSC); Chris Georgiev (Imagga Technologies).