2022.06.27

One step ahead in multimedia analysis and summarization

AI4Media explores innovative Deep Neural Networks (DNNs) for image/video/audio analysis and summarization through cutting-edge machine learning. The work performed so far has resulted in novel ways to automatically shorten long videos through unsupervised key-frame extraction, as well as in novel AI tools for the management or retrieval of media datasets.

However, typical DNNs require very large amounts of labeled training data to achieve good performance. In a systematic effort to bypass this requirement, AI4Media has also researched novel approaches to training or adapting DNNs for scenarios marked by a lack of large-scale, domain-specific datasets or annotations. The results so far include several innovative methods for few-shot, semi-supervised or unsupervised learning with media data.

In addition, AI4Media has researched advanced audio analysis for automatic music annotation and audio partial matching/reuse detection, mainly relying on DNNs. Overall, these algorithms can be readily exploited by industry-oriented tools for intelligent and automated media archiving, management, analysis, search or retrieval, as well as for synthetic audio detection/verification.

In this context, AI4Media has so far produced several modern AI tools for:

Video key-frame extraction. Check out the related papers:

Video Summarization Using Deep Neural Networks: A Survey (Link)

Adversarial Unsupervised Video Summarization Augmented with Dictionary Loss (Link)

Information retrieval on cultural media datasets, relying on a synthesis of computational deep learning with symbolic semantic reasoning. Check out the related paper:

Learning and Reasoning for Cultural Metadata Quality (Link)

Few-shot object detection. Check out the related code:

Few-shot object detection (Code)

Unsupervised domain adaptation for traffic density estimation/counting or for visual object detection. Check out the related paper:

Domain Adaptation for Traffic Density Estimation (Link)

Advanced video browsing and search. Check out the related paper:

The VISIONE Video Search System: Exploiting Off-the-Shelf Text Search Engines for Large-Scale Video Retrieval (Link)

Semi-supervised learning for fine-grained visual categorization. Check out the related paper:

Fine-Grained Adversarial Semi-supervised Learning (Link)

Deep dictionary-based representation learning. Check out the related paper and code:

When Dictionary Learning Meets Deep Learning: Deep Dictionary Learning and Coding Network for Image Recognition With Limited Data (Link)

Deep Micro-Dictionary Learning and Coding Network (Code)

Although these results are only the outcomes of the first project period, plans for future research have already been laid out, with the intention of expanding upon them in exciting new directions.

Author: Ioannis Mademlis (Aristotle University of Thessaloniki)