AI4media project

Development of a Realistic Crowd Simulation Environment for Fine-grained Validation of People Tracking Methods

CrowdSim2: an Open Synthetic Benchmark for Object Detectors

A Spatio-Temporal Attentive Network for Video-Based Crowd Counting

Learning to Detect Fallen People in Virtual Worlds

Predicting Player Engagement in Tom Clancy’s The Division 2: A Multimodal Approach via Pixels and Gamepad Actions

Selecting a Diverse Set of Aesthetically-pleasing and Representative Video Thumbnails using Reinforcement Learning

Expressive Piano Performance Rendering from Unpaired Data

Recent advances in data-driven expressive performance rendering have enabled automatic models to reproduce the characteristics and the variability of human performances of musical compositions. However, these models need to be trained with aligned pairs of scores and performances and they rely notably on score-specific markings, which limits their scope of application. This work tackles the piano performance rendering task in a low-informed setting by only considering the score note information and without aligned data.
The proposed model relies on an adversarial training where the basic score notes properties are modified in order to reproduce the expressive qualities contained in a dataset of real performances. First results for unaligned score-to-performance rendering are presented through a conducted listening test. While the interpretation quality is not on par with highly-supervised methods and human renditions, our method shows promising results for transferring realistic expressivity into scores.

DDSP-Piano: A Neural Sound Synthesizer Informed by Instrument Knowledge

Instrument sound synthesis using deep neural networks has received numerous improvements over the last couple of years. Among them, the Differentiable Digital Signal Processing (DDSP) framework has modernized the spectral modeling paradigm by including signal-based synthesizers and effects into fully differentiable architectures. The present work extends the applications of DDSP to the task of polyphonic sound synthesis, with the proposal of a differentiable piano synthesizer conditioned on MIDI inputs. The model architecture is motivated by high-level acoustic modeling knowledge of the instrument, which, along with the sound structure priors inherent to the DDSP components, makes for a lightweight, interpretable, and realistic-sounding piano model. A subjective listening test has revealed that the proposed approach achieves better sound quality than a state-of-the-art neural-based piano synthesizer, but physical-modeling-based models still hold the best quality. Leveraging its interpretability and modularity, a qualitative analysis of the model behavior was also conducted: it highlights where additional modeling knowledge and optimization procedures could be inserted in order to improve the synthesis quality and the manipulation of sound properties. Eventually, the proposed differentiable synthesizer can be further used with other deep learning models for alternative musical tasks handling polyphonic audio and symbolic data.

Eliciting and Annotating Emotion in Virtual Spaces

We propose an online methodology where moment-to-moment affect annotations are gathered while exploring and visually interacting with virtual environments. For this task we developed an application to support this methodology, targeting both a VR and a desktop experience, and conducted a study to evaluate these two media of display. Results show that in terms of usability, both experiences were perceived equally positive. Presence was rated significantly higher for the VR experience, while participant ratings indicated a tendency for medium distraction during the annotation process. Additionally, effects between the architectural design elements were identified with perceived pleasure. The strengths and limitations of the proposed approach are highlighted to ground further work in gathering affect data in immersive and interactive media within the context of architectural appraisal.

Knowing Your Annotator: Rapidly Testing the Reliability of Affect Annotation

The laborious and costly nature of affect annotation is a key detrimental factor for obtaining large scale corpora with valid and reliable affect labels. Motivated by the lack of tools that can effectively determine an annotator’s reliability, this paper proposes general quality assurance (QA) tests for real-time continuous annotation tasks. Assuming that the annotation tasks rely on stimuli with audiovisual components, such as videos, we propose and evaluate two QA tests: a visual and an auditory QA test. We validate the QA tool across 20 annotators that are asked to go through the test followed by a lengthy task of annotating the engagement of gameplay videos. Our findings suggest that the proposed QA tool reveals, unsurprisingly, that trained annotators are more reliable than the best of untrained crowdworkers we could employ. Importantly, the QA tool introduced can predict effectively the reliability of an affect annotator with 80% accuracy, thereby, saving on resources, effort and cost, and maximizing the reliability of labels solicited in affective corpora. The introduced QA tool is available and accessible through the PAGAN annotation platform.

Proceedings of the 3rd International Workshop on Learning to Quantify (LQ 2023)

The 3rd International Workshop on Learning to Quantify (LQ 2023 – https: //lq-2023.github.io/) was held in Torino, IT, on September 18, 2023, as a satellite workshop of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2023). While the 1st edition of the workshop (LQ 2021 – https://cikmlq2021. github.io/) had to be an entirely online event, LQ 2023 (like the 2nd edi- tion LQ 2022 – https://lq-2023.github.io/) was a hybrid event, with presentations given in-presence, and both in-presence attendees and remote attendees. The workshop was the second part (Sep 18 afternoon) of a full-day event, whose first part (Sep 18 morning) consisted of a tutorial on Learning to Quantify presented by Alejandro Moreo and Fabrizio Sebastiani. The LQ 2023 workshop consisted of the presentations of seven contributed papers, and a final collective discussion on the open problems of learning to quantify and on future initiatives.

The present volume contains five of the seven contributed papers that were accepted for presentation at the workshop (the authors of the other two papers decided not to have their paper included in the proceedings). Each contributed paper was submitted as a response to the call for papers, was reviewed by at least three members of the international program commit- tee, and was revised by the authors so as to take into account the feedback provided by the reviewers.

GreekPolitics: Sentiment Analysis on Greek Politically Charged Tweets

The rapid growth of on-line social media platforms has rendered opinion mining/sentiment analysis a critical area of research. This paper focuses on analyzing Twitter posts (tweets), written in the Greek language and politically charged in content. This is a rather underexplored topic, due to the inadequacy of publicly available annotated datasets. Thus, we present and release GreekPolitics: a dataset of Greek tweets with politically charged content, annotated for four different sentiments: polarity, figurativeness, aggressiveness and bias. GreekPolitics has been evaluated comprehensively using state-of-the-art Deep Neural Networks (DNNs) and data augmentation methods. This paper details the dataset, the evaluation process and the experimental results.

Quantifying the knowledge in Deep Neural Networks: an overview

Deep Neural Networks (DNNs) have proven to be extremely effective at learning a wide range of tasks. Due to their complexity and frequently inexplicable internal state, DNNs are difficult to analyze: their black-box nature makes it challenging for humans to comprehend their internal behavior. Several attempts to interpret their operation have been made during the last decade, but analyzing deep neural models from the perspective of the knowledge encoded in their layers is a very promising research direction, which has barely been touched upon. Such a research approach could provide a more accurate insight into a DNN model, its internal state, learning progress, and knowledge
storage capabilities. The purpose of this survey is two-fold: a) to review the concept of DNN knowledge quantification and highlight it as an important near-future challenge, as well as b) to provide a brief account of the scant existing methods attempting to actually quantify DNN knowledge. Although a few such algorithms have been proposed, this is an emerging topic still under investigation.

Political Tweet Sentiment Analysis For Public Opinion Polling

Public opinion measurement through polling is a classical political analysis task, e.g. for predicting national and local election results. However, polls are expensive to run and their results may be biased primarily due to improper population sampling. In this paper, we propose two innovative methods for employing tweet sentiment analysis’ results for public opinion polling. Our first method utilizes merely the tweet sentiment analysis’ results outperforming a plethora of well-recognised methods. In addition, we introduce a novel hybrid way to estimate electorally results from both public opinion polls and tweets. This method enables more accurate, frequent and inexpensive public opinion estimation and used for estimating the result of the 2023 Greek national election. Our method managed to achieve lower deviation than the conventional public opinion polls from the actual election’s results, introducing new possibilities for public opinion estimation using social media platforms.

Towards Human Society-inspired Decentralized DNN Inference

In human societies, individuals make their own decisions and they may select if and who may influence it, by e.g., consulting with people of their acquaintance or experts of a field. At a societal level, the overall knowledge is preserved and enhanced by individual person empowerment, where complicated consensus protocols have been developed over time in the form of societal mechanisms to assess, weight, combine and isolate individual people opinions. In distributed machine learning environments however, individual AI agents are merely part of a system where decisions are made in a centralized and aggregated fashion or require a fixed network topology, a practice prone to security risks and collaboration is nearly absent. For instance, Byzantine Failures may tamper both the training and inference stage of individual AI agents, leading to significantly reduced overall system performance. Inspired by societal practices, we propose a decentralized inference strategy where each individual agent is empowered to make their own decisions, by exchanging and aggregating information with other agents in their network. To this end, a ”Quality of Inference” consensus protocol (QoI) is proposed, forming a single commonly accepted inference rule applied by every individual agent. The overall system knowledge and decisions on specific manners can thereby be stored by all individual agents in a decentralized fashion, employing e.g., blockchain technology. Our experiments in classification tasks indicate that the proposed approach forms a secure decentralized inference framework, that prevents adversaries at tampering the overall process and achieves comparable performance with centralized decision aggregation methods.

Deep Reinforcement Learning with semi-expert distillation for autonomous UAV cinematography

Unmanned Aerial Vehicles (UAVs, or drones) have revolutionized modern media production. Being rapidly deployable “flying cameras”, they can easily capture aesthetically pleasing aerial footage of static or moving filming targets/subjects. Current approaches rely either on manual UAV/gimbal control by human experts or on a combination of complex computer vision algorithms and hardware configurations for automating the flight+flying process. This paper explores an efficient Deep Reinforcement Learning (DRL) alternative, which implicitly merges the target detection and path planning steps into a single algorithm. To achieve this, a baseline DRL approach is augmented with a novel policy distillation component, which transfers knowledge from a suitable, semi-expert Model Predictive Control (MPC) controller into the DRL agent. Thus, the latter is able to autonomously execute a specific UAV cinematography task with purely visual input. Unlike the MPC controller, the proposed DRL agent does not need to know the 3D world position of the filming target during inference. Experiments conducted in a photorealistic simulator showcase superior performance and training speed compared to the baseline agent while surpassing the MPC controller in terms of visual occlusion avoidance.

Knowledge Distillation-driven Communication Framework for Neural Networks: Enabling Efficient Student-Teacher Interactions

This paper presents a novel framework for facilitating communication and knowledge exchange among neural networks, leveraging the roles of both students and teachers. In our proposed framework, each node represents a neural network, capable of acting as either a student or a teacher. When new data is introduced and a network has not been trained on it, the node assumes the role of a student, initiating a communication process. The student node communicates with potential teachers, identifying those networks that have already been trained on the incoming data. Subsequently, the student node employs knowledge distillation techniques to learn from the teachers and gain insights from their accumulated knowledge. This approach enables efficient and effective knowledge transfer within the neural network ecosystem, enhancing learning capabilities and fostering collaboration among diverse networks. Experimental results demonstrate the efficacy of our framework in improving overall network performance and knowledge utilization.

Archives: Scientific Papers

Environment Classification via Blind Roomprints Estimation

Audio Splicing Detection and Localization Based on Acquisition Device Traces

Speaker-Independent Microphone Identification in Noisy Conditions

Edge Guided GANs with Contrastive Learning for Semantic Image Synthesis

PI-Trans: Parallel-ConvMLP and Implicit-Transformation Based GAN for Cross-View Image Translation

Spectral Denoising for Microphone Classification

Neural Knowledge Transfer for Sentiment Analysis in Texts with Figurative Language

Exploiting Caption Diversity for Unsupervised Video Summarization