Machine Learning for Audiovisual Analytics

Enabling computers to extract insights from audiovisual data means translating raw pixels or audio samples into a representation that humans or other computer systems can interpret. The goal is to train computers to identify, classify, and categorize audiovisual content as humans do.

In this context, we focus on methods for the automated content analysis of digital audio/video material.
In the visual domain, this includes methods for cut detection, camera motion detection, text detection, object detection, person/face detection, concept detection, and similarity search. In the auditory domain, we have developed methods for speaker recognition, acoustic event recognition, and audio classification. Furthermore, we have investigated methods for object segmentation in satellite images and fluorescence microscopy images. Many of the proposed methods are based on machine learning approaches, in particular on deep learning.
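
To make one of the listed tasks concrete, the following is a minimal sketch of hard cut detection. It uses a classical baseline, comparing intensity histograms of consecutive frames, and is not the method used in our work. The function names, the bin count, the threshold, and the synthetic frames are all assumptions chosen for illustration.

```python
from typing import List, Sequence

# Assumption: a frame is a flattened grayscale image with pixel values 0..255.
Frame = Sequence[int]


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Return a normalized intensity histogram of a non-empty grayscale frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]


def detect_cuts(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Return indices i where a hard cut is assumed between frames i-1 and i."""
    cuts: List[int] = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        # Half the L1 distance between normalized histograms lies in [0, 1].
        distance = sum(abs(a - b) for a, b in zip(previous, current)) / 2
        if distance > threshold:
            cuts.append(i)
        previous = current
    return cuts


# Synthetic "video": three dark frames followed by three bright frames.
dark = [20] * 64
bright = [230] * 64
video = [dark, dark, dark, bright, bright, bright]
print(detect_cuts(video))  # -> [3]
```

A fixed threshold like this tends to miss gradual transitions such as fades and dissolves, which is one reason learned approaches are attractive for this task.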

We are currently supporting several projects in the digital humanities. In particular, we helped to perform optical character recognition of scanned documents related to Syrian literature and developed a web application for doing research with the resulting data in the field of Arabic literature. Moreover, we work on a deep learning approach to perform "stamp recognition" on index cards in the context of producing the Lessico Etimologico Italiano.

Selected Publications

  • Nikolaus Korfhage, Markus Mühling, and Bernd Freisleben:
    ElasticHash: Semantic Image Similarity Search by Deep Hashing with Elasticsearch. 19th International Conference on Computer Analysis of Images and Patterns (CAIP). In: N. Tsapatsoulis et al. (Eds.): CAIP 2021, LNCS 13053, pp. 1–10, Springer Nature Switzerland, 2021. (Best Paper Award)
  • Daniel Schneider, Nikolaus Korfhage, Markus Mühling, Peter Lüttig, Bernd Freisleben:
    Automatic Transcription of Organ Tablature Music Notation with Deep Neural Networks. Transactions of the International Society for Music Information Retrieval 4(1): 14-28, 2021, doi: 10.5334/tismir.77
  • Nikolaus Korfhage, Markus Mühling, Stefan Ringshandl, Anke Becker, Bernd Schmeck, Bernd Freisleben:
    Detection and Segmentation of Morphologically Complex Eukaryotic Cells in Fluorescence Microscopy Images via Feature Pyramid Fusion. PLOS Computational Biology 16(9): e1008179, 2020
  • Nikolaus Korfhage, Markus Mühling, Bernd Freisleben:
    Intentional Image Similarity Search. 9th IAPR TC3 Workshop on Artificial Neural Networks in Pattern Recognition, Winterthur, Switzerland, LNCS 12294, 23-35, Springer, 2020
  • Markus Mühling, Jakob Franz, Nikolaus Korfhage, Bernd Freisleben:
    Bird Species Recognition via Neural Architecture Search. CLEF 2020, CEUR Proceedings, Vol. 2696, 2020
    (Winner of BirdCLEF 2020)
  • Markus Mühling, Manja Meister, Nikolaus Korfhage, Jörg Wehling, Angelika Hörth, Ralph Ewerth, Bernd Freisleben:
    Content-based Video Retrieval in Historical Collections of the German Broadcasting Archive. International Journal on Digital Libraries 20(2): 167-183, 2019
  • Johannes Drönner, Nikolaus Korfhage, Sebastian Egli, Markus Mühling, Boris Thies, Jörg Bendix, Bernd Freisleben, Bernhard Seeger:
    Fast Cloud Segmentation Using Convolutional Neural Networks. Remote Sensing 10(11): 1782, 2018
  • Markus Mühling, Nikolaus Korfhage, Eric Müller, Christian Otto, Matthias Springstein, Thomas Langelage, Uli Veith, Ralph Ewerth, Bernd Freisleben:
    Deep Learning for Content-based Video Retrieval in Film and Television Production. Multimedia Tools and Applications 76(21): 22169-22194, 2017
  • Ralph Ewerth, Markus Mühling, Bernd Freisleben:
    Robust Video Content Analysis via Transductive Learning. ACM Transactions on Intelligent Systems and Technology 3(3): 41:1-41:26, 2012
  • Ralph Ewerth, Khalid Ballafkir, Markus Mühling, Dominik Seiler, Bernd Freisleben:
    Long-Term Incremental Web-Supervised Learning of Visual Concepts via Random Savannas. IEEE Transactions on Multimedia 14(4): 1008-1020, 2012
  • Martin Schwalb, Ralph Ewerth, Bernd Freisleben:
    Fast Motion Estimation on Graphics Hardware for H.264 Video Encoding. IEEE Transactions on Multimedia 11(1): 1-10, 2009
  • Thilo Stadelmann, Bernd Freisleben:
    Unfolding Speaker Clustering Potential: A Biomimetic Approach. 17th International Conference on Multimedia 2009, Beijing, China, 185-194, ACM, 2009
  • Ralph Ewerth, Markus Mühling, Bernd Freisleben:
    Self-Supervised Learning of Face Appearances in TV Casts and Movies. International Journal on Semantic Computing 1(2): 185-204, 2007

Further information