Eliciting and Annotating Emotion in Virtual Spaces

We propose an online methodology in which moment-to-moment affect annotations are gathered while participants explore and visually interact with virtual environments. To support this methodology we developed an application targeting both a VR and a desktop experience, and conducted a study to evaluate these two display media. Results show that, in terms of usability, both experiences were perceived equally positively. Presence was rated significantly higher for the VR experience, while participant ratings indicated a moderate level of distraction during the annotation process. Additionally, effects of architectural design elements on perceived pleasure were identified. The strengths and limitations of the proposed approach are highlighted to ground further work on gathering affect data in immersive and interactive media within the context of architectural appraisal.


Proceedings of the 3rd International Workshop on Learning to Quantify (LQ 2023)

The 3rd International Workshop on Learning to Quantify (LQ 2023 – https://lq-2023.github.io/) was held in Torino, IT, on September 18, 2023, as a satellite workshop of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2023). While the 1st edition of the workshop (LQ 2021 – https://cikmlq2021.github.io/) had to be an entirely online event, LQ 2023 (like the 2nd edition, LQ 2022 – https://lq-2023.github.io/) was a hybrid event, with presentations given in person and with both in-person and remote attendees. The workshop was the second part (Sep 18 afternoon) of a full-day event, whose first part (Sep 18 morning) consisted of a tutorial on Learning to Quantify presented by Alejandro Moreo and Fabrizio Sebastiani. The LQ 2023 workshop consisted of the presentations of seven contributed papers and a final collective discussion on the open problems of learning to quantify and on future initiatives.

The present volume contains five of the seven contributed papers that were accepted for presentation at the workshop (the authors of the other two papers decided not to have their papers included in the proceedings). Each contributed paper was submitted as a response to the call for papers, was reviewed by at least three members of the international program committee, and was revised by the authors so as to take into account the feedback provided by the reviewers.

GreekPolitics: Sentiment Analysis on Greek Politically Charged Tweets

The rapid growth of online social media platforms has rendered opinion mining/sentiment analysis a critical area of research. This paper focuses on analyzing Twitter posts (tweets), written in the Greek language and politically charged in content. This is a rather underexplored topic, due to the inadequacy of publicly available annotated datasets. Thus, we present and release GreekPolitics: a dataset of Greek tweets with politically charged content, annotated for four different sentiments: polarity, figurativeness, aggressiveness and bias. GreekPolitics has been evaluated comprehensively using state-of-the-art Deep Neural Networks (DNNs) and data augmentation methods. This paper details the dataset, the evaluation process and the experimental results.
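
As a purely illustrative aid, the sketch below shows one plausible way such an annotated record could be represented in code; the field names and label values are assumptions for illustration and do not reflect the released GreekPolitics schema.

```python
# Hypothetical record layout for a tweet annotated along the four dimensions
# named in the abstract. Field names and label values are assumptions, not the
# actual GreekPolitics schema.
from dataclasses import dataclass

@dataclass
class AnnotatedTweet:
    text: str            # Greek-language tweet text
    polarity: str        # e.g. "positive" / "negative" / "neutral" (assumed label set)
    figurativeness: str  # e.g. "literal" / "figurative"
    aggressiveness: str  # e.g. "aggressive" / "non-aggressive"
    bias: str            # e.g. "biased" / "unbiased"

example = AnnotatedTweet(
    text="...",  # placeholder; no real tweet content is reproduced here
    polarity="negative",
    figurativeness="figurative",
    aggressiveness="aggressive",
    bias="biased",
)
```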

Quantifying the knowledge in Deep Neural Networks: an overview

Deep Neural Networks (DNNs) have proven to be extremely effective at learning a wide range of tasks. Due to their complexity and frequently inexplicable internal state, DNNs are difficult to analyze: their black-box nature makes it challenging for humans to comprehend their internal behavior. Several attempts to interpret their operation have been made during the last decade, but analyzing deep neural models from the perspective of the knowledge encoded in their layers is a very promising research direction, which has barely been touched upon. Such a research approach could provide a more accurate insight into a DNN model, its internal state, learning progress, and knowledge storage capabilities. The purpose of this survey is two-fold: a) to review the concept of DNN knowledge quantification and highlight it as an important near-future challenge, as well as b) to provide a brief account of the scant existing methods attempting to actually quantify DNN knowledge. Although a few such algorithms have been proposed, this is an emerging topic still under investigation.

Political Tweet Sentiment Analysis For Public Opinion Polling

Public opinion measurement through polling is a classical political analysis task, e.g. for predicting national and local election results. However, polls are expensive to run and their results may be biased, primarily due to improper population sampling. In this paper, we propose two innovative methods for employing tweet sentiment analysis results for public opinion polling. Our first method utilizes only the tweet sentiment analysis results, outperforming a plethora of well-recognised methods. In addition, we introduce a novel hybrid way to estimate electoral results from both public opinion polls and tweets. This method enables more accurate, frequent and inexpensive public opinion estimation and was used to estimate the result of the 2023 Greek national election. Our method achieved lower deviation from the actual election results than the conventional public opinion polls, introducing new possibilities for public opinion estimation using social media platforms.
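
The abstract does not specify how the two signals are fused; as a minimal sketch only, one can picture a per-party convex combination of poll-based and tweet-sentiment-based vote shares. The weighting scheme and all names below are illustrative assumptions, not the authors' estimator.

```python
# Minimal sketch: blend poll-based and tweet-sentiment-based vote-share
# estimates with a tunable weight alpha, then renormalize. This is an assumed
# fusion rule for illustration, not the method proposed in the paper.

def hybrid_vote_share(poll_share, tweet_share, alpha=0.5):
    """poll_share / tweet_share: dicts mapping party -> estimated share."""
    parties = set(poll_share) | set(tweet_share)
    blended = {
        p: alpha * poll_share.get(p, 0.0) + (1 - alpha) * tweet_share.get(p, 0.0)
        for p in parties
    }
    total = sum(blended.values()) or 1.0
    return {p: share / total for p, share in blended.items()}

# Example with made-up numbers:
polls = {"Party A": 0.36, "Party B": 0.30, "Party C": 0.11}
tweets = {"Party A": 0.41, "Party B": 0.26, "Party C": 0.14}
print(hybrid_vote_share(polls, tweets, alpha=0.7))
```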

Towards Human Society-inspired Decentralized DNN Inference

In human societies, individuals make their own decisions and may choose whether, and by whom, those decisions are influenced, e.g. by consulting acquaintances or experts in a field. At the societal level, the overall knowledge is preserved and enhanced through the empowerment of individuals, and elaborate consensus protocols have been developed over time, in the form of societal mechanisms, to assess, weight, combine and isolate individual opinions. In distributed machine learning environments, however, individual AI agents are merely parts of a system in which decisions are made in a centralized and aggregated fashion or which requires a fixed network topology; this practice is prone to security risks, and collaboration is nearly absent. For instance, Byzantine failures may tamper with both the training and the inference stage of individual AI agents, leading to significantly reduced overall system performance. Inspired by societal practices, we propose a decentralized inference strategy where each individual agent is empowered to make its own decisions by exchanging and aggregating information with other agents in its network. To this end, a “Quality of Inference” (QoI) consensus protocol is proposed, forming a single commonly accepted inference rule applied by every individual agent. The overall system knowledge and decisions on specific matters can thereby be stored by all individual agents in a decentralized fashion, employing e.g. blockchain technology. Our experiments on classification tasks indicate that the proposed approach forms a secure decentralized inference framework that prevents adversaries from tampering with the overall process and achieves performance comparable to centralized decision aggregation methods.
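
As an illustrative sketch only (the aggregation rule, quality scores and threshold below are assumptions, not the paper's QoI protocol), one can picture each agent broadcasting its class probabilities together with a quality score, with every agent applying the same acceptance-and-weighting rule locally:

```python
# Illustrative aggregation rule applied identically by every agent: peers whose
# self-reported quality falls below a shared threshold are ignored, and the rest
# are combined by quality-weighted averaging. Assumed for illustration only.
import numpy as np

def aggregate_inference(messages, quality_threshold=0.5):
    """messages: list of (class_probs: np.ndarray, quality: float) from peer agents."""
    accepted = [(p, q) for p, q in messages if q >= quality_threshold]
    if not accepted:
        return None  # no trusted peers; the agent falls back to its own prediction
    weights = np.array([q for _, q in accepted])
    stacked = np.stack([p for p, _ in accepted])
    combined = (weights[:, None] * stacked).sum(axis=0) / weights.sum()
    return int(np.argmax(combined))
```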

Deep Reinforcement Learning with semi-expert distillation for autonomous UAV cinematography

Unmanned Aerial Vehicles (UAVs, or drones) have revolutionized modern media production. Being rapidly deployable “flying cameras”, they can easily capture aesthetically pleasing aerial footage of static or moving filming targets/subjects. Current approaches rely either on manual UAV/gimbal control by human experts or on a combination of complex computer vision algorithms and hardware configurations for automating the flying process. This paper explores an efficient Deep Reinforcement Learning (DRL) alternative, which implicitly merges the target detection and path planning steps into a single algorithm. To achieve this, a baseline DRL approach is augmented with a novel policy distillation component, which transfers knowledge from a suitable, semi-expert Model Predictive Control (MPC) controller into the DRL agent. Thus, the latter is able to autonomously execute a specific UAV cinematography task with purely visual input. Unlike the MPC controller, the proposed DRL agent does not need to know the 3D world position of the filming target during inference. Experiments conducted in a photorealistic simulator showcase superior performance and training speed compared to the baseline agent, while surpassing the MPC controller in terms of visual occlusion avoidance.
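
As a minimal sketch of the general idea (the loss form, interfaces and weighting factor are assumptions, not the paper's implementation), the distillation component can be pictured as an extra loss term pulling the agent's action towards the action the semi-expert MPC controller would take in the same state:

```python
# Sketch: augment the baseline DRL objective with an imitation (distillation)
# term towards the semi-expert MPC action. Continuous actions and an MSE
# penalty are assumptions made for illustration.
import torch
import torch.nn.functional as F

def distilled_policy_loss(rl_loss, policy_action, mpc_action, beta=0.1):
    """rl_loss: scalar baseline DRL loss; policy_action / mpc_action: action tensors."""
    distill_term = F.mse_loss(policy_action, mpc_action)
    return rl_loss + beta * distill_term

# Example with dummy tensors:
agent_action = torch.zeros(1, 4, requires_grad=True)  # e.g. velocity/gimbal commands
expert_action = torch.ones(1, 4)                       # action the MPC would take
total_loss = distilled_policy_loss(torch.tensor(1.0), agent_action, expert_action)
total_loss.backward()
```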

Knowledge Distillation-driven Communication Framework for Neural Networks: Enabling Efficient Student-Teacher Interactions

This paper presents a novel framework for facilitating communication and knowledge exchange among neural networks, leveraging the roles of both students and teachers. In our proposed framework, each node represents a neural network, capable of acting as either a student or a teacher. When new data is introduced and a network has not been trained on it, the node assumes the role of a student, initiating a communication process. The student node communicates with potential teachers, identifying those networks that have already been trained on the incoming data. Subsequently, the student node employs knowledge distillation techniques to learn from the teachers and gain insights from their accumulated knowledge. This approach enables efficient and effective knowledge transfer within the neural network ecosystem, enhancing learning capabilities and fostering collaboration among diverse networks. Experimental results demonstrate the efficacy of our framework in improving overall network performance and knowledge utilization.
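
A schematic sketch of the exchange described above is given below, with assumed class and method names (this is not the paper's API): a node that has not been trained on incoming data locates peers that have, and distills from them.

```python
# Schematic student/teacher exchange; names and interfaces are assumptions.
class Node:
    def __init__(self, model, seen_datasets=()):
        self.model = model
        self.seen = set(seen_datasets)

    def has_seen(self, dataset_id):
        return dataset_id in self.seen

    def handle_new_data(self, dataset_id, data, peers, distill_fn):
        if self.has_seen(dataset_id):
            return  # already trained on this data; can act as a teacher instead
        teachers = [p for p in peers if p.has_seen(dataset_id)]  # student role
        for teacher in teachers:
            # knowledge distillation: fit this node's model to the teacher's
            # soft outputs on the incoming data (distill_fn supplied by the user)
            distill_fn(student=self.model, teacher=teacher.model, data=data)
        self.seen.add(dataset_id)
```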

Prompting Visual-Language Models for Dynamic Facial Expression Recognition

This paper presents a novel visual-language model called DFER-CLIP, which is based on the CLIP model and designed for in-the-wild Dynamic Facial Expression Recognition (DFER). Specifically, the proposed DFER-CLIP consists of a visual part and a textual part. For the visual part, based on the CLIP image encoder, a temporal model consisting of several Transformer encoders is introduced for extracting temporal facial expression features, and the final feature embedding is obtained as a learnable “class” token. For the textual part, we use as inputs textual descriptions of the facial behaviour related to the classes (facial expressions) we are interested in recognising; those descriptions are generated using large language models, like ChatGPT. In contrast to works that use only the class names, this more accurately captures the relationship between the expressions. Alongside the textual descriptions, we introduce a learnable token which helps the model learn relevant context information for each expression during training. Extensive experiments demonstrate the effectiveness of the proposed method and show that our DFER-CLIP also achieves state-of-the-art results compared with the current supervised DFER methods on the DFEW, FERV39k, and MAFW benchmarks.
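
As a schematic sketch of the kind of architecture described above (module sizes, shapes and names are assumptions; this is not the released DFER-CLIP code), per-frame CLIP features can be pooled by a small Transformer with a learnable class token and matched against text embeddings of the LLM-generated expression descriptions:

```python
# Schematic temporal head and similarity-based classification; all hyperparameters
# and shapes are assumptions made for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalHead(nn.Module):
    def __init__(self, dim=512, depth=2, heads=8):
        super().__init__()
        self.cls_token = nn.Parameter(torch.randn(1, 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, frame_feats):                  # (B, T, dim) CLIP frame features
        cls = self.cls_token.expand(frame_feats.size(0), -1, -1)
        out = self.encoder(torch.cat([cls, frame_feats], dim=1))
        return out[:, 0]                             # learnable "class" token output

def expression_logits(video_emb, text_embs):
    """Cosine similarity between the video embedding and per-class text embeddings."""
    return F.normalize(video_emb, dim=-1) @ F.normalize(text_embs, dim=-1).t()
```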

MaskCon: Masked Contrastive Learning for Coarse-Labelled Dataset

Deep learning has achieved great success in recent years with the aid of advanced neural network structures and large-scale human-annotated datasets. However, it is often costly and difficult to accurately and efficiently annotate large-scale datasets, especially for specialized domains where fine-grained labels are required. In this setting, coarse labels are much easier to acquire, as they do not require expert knowledge. In this work, we propose a contrastive learning method, called masked contrastive learning (MaskCon), to address this under-explored problem setting, where we learn with a coarse-labelled dataset in order to address a finer labelling problem. More specifically, within the contrastive learning framework, for each sample our method generates soft labels with the aid of coarse labels against other samples and another augmented view of the sample in question. In contrast to self-supervised contrastive learning, where only the sample's augmentations are considered hard positives, and to supervised contrastive learning, where only samples with the same coarse labels are considered hard positives, we propose soft labels based on sample distances that are masked by the coarse labels. This allows us to utilize both inter-sample relations and coarse labels. We demonstrate that many existing state-of-the-art works can be obtained as special cases of our method and that it provides tighter bounds on the generalization error. Experimentally, our method achieves significant improvements over the current state of the art on various datasets, including the CIFAR10, CIFAR100, ImageNet-1K, Stanford Online Products and Stanford Cars196 datasets.
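
As a minimal sketch of the soft-label construction described above (temperature, normalization and masking details are assumptions, not the authors' implementation), similarities to other in-batch samples can be converted into soft targets and masked so that only samples sharing the query's coarse label may act as positives:

```python
# Soft targets from inter-sample similarities, masked by the coarse labels.
# Assumes at least one key shares the query's coarse label.
import torch
import torch.nn.functional as F

def masked_soft_labels(query, keys, coarse_query, coarse_keys, tau=0.1):
    """query: (D,); keys: (N, D); coarse_query: int; coarse_keys: (N,) long tensor."""
    sims = F.normalize(keys, dim=-1) @ F.normalize(query, dim=0)  # (N,) cosine similarities
    mask = (coarse_keys == coarse_query)                          # same coarse class only
    logits = (sims / tau).masked_fill(~mask, float("-inf"))       # mask out other classes
    return torch.softmax(logits, dim=0)                           # soft targets over the masked set
```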