Data Science Doctoral Program Seminar

This seminar is aimed primarily at the members of the structured doctoral program Data Science. Guests are, of course, welcome at any time.

The seminar takes place every two weeks during the lecture period and lasts about one hour. The upcoming dates are listed below.

Current dates

Whenever possible, the seminar will be held as a hybrid event (in lecture hall HS V and in parallel via BBB). The so-called "3G rule" applies to in-person attendance.

Wednesday, 20 October 2021, 16:30

Speaker: Paolo Rosso (Full Professor, NLEL Valencia)
Title: Detecting online harmful information: fake news, conspiracy theories, and misogyny

The ease of generating content online and the anonymity that social media provide have increased the amount of harmful content that is published. Fake news, conspiracy theories, and offensive content are published and propagated on a daily basis. In this keynote I will describe how fake news and conspiracy theories can be detected by going beyond purely textual information: emotions, psycholinguistic characteristics, and multimodal information play an important role and should be in the loop. At the end of my talk I will also mention the problem of hate speech, specifically misogyny, and the Multimodal Automatic Misogyny Identification (MAMI) shared task that we will be organising at SemEval 2022.

Wednesday, 3 November 2021, 16:30

Speaker: Ramit Sawhney (Software Developer, IIIT Delhi)
Title: HYPMIX - Method for Interpolative Data Augmentation in Hyperbolic Space, applied on Text, Speech, and Vision
Abstract: Paper link

Wednesday, 17 November 2021, 16:30

Speaker: M. Sc. Dorian Vogel (doctoral student SPP, research group Prof. Dahlke)
Title: High order methods for PDEs: Quarklets, hp-adaptivity and tree structures
Abstract: tba

Wednesday, 24 November 2021, 15:30

Speaker: Prof. Dr. Jochen Leidner (University Professor, Hochschule Coburg)
Title: Fun and Profit with Machine-Learning & Natural Language Processing

The triad of Natural Language Processing (NLP), Information Retrieval (IR) and Machine Learning (ML) provides powerful tools and techniques to build useful and commercially valuable systems that support today's professional knowledge workers in many vertical domains, including law, police/law enforcement, journalism/news, pharmacology, finance and insurance.
In this talk, I will describe a number of practical examples of how NLP, IR and ML can help build capabilities that generate enormous economic value in B2B settings (not facing consumers) involving professionals like lawyers, judges, reporters/investigative journalists, financial analysts, traders and insurance claim investigators. As we shall see, these techniques have long since arrived in the daily lives and work of many professionals.

About the speaker:
Dr. Jochen L. Leidner is the Professor of Explainable and Responsible Artificial Intelligence with Insurance Applications at Coburg University of Applied Sciences and Arts in Germany, and a Visiting Professor of Data Analytics at the University of Sheffield in the UK. His research areas include Natural Language Processing, Information Retrieval and applied Machine Learning.
After obtaining an M.A. in Computational Linguistics, English language and linguistics and computer science at FAU (Friedrich-Alexander-Universität Erlangen-Nuremberg) and an M.Phil. in Computer Speech, Text and Internet Technology at the University of Cambridge, he obtained a Ph.D. in Informatics from the University of Edinburgh. His industry experience includes software engineering roles at SAP and Director of Research (R&D) positions at Thomson Reuters and startups.
His research has been published at conferences including ACM SIGIR, ACL, AGI, NAACL, IEEE ICDM and ECIR. He is a Fellow of the Royal Geographical Society, a member of ACL, ACM, GI, SRA and the BCS Information Retrieval Specialist Group, and presently Award Chair for the Microsoft/BCS IRSG Karen Spärck Jones Award. He won scholarships and prizes from DAAD, Cambridge Prince's Trust, and Peterhouse, Cambridge.
In 2015 and 2016, he was named Thomson Reuters Inventor of the Year for the best patent application, and he won the Best Method Paper Award of the Social Media Society (2016) as well as grants such as a Scottish Enterprise/Royal Society of Edinburgh Enterprise Fellowship in Electronic Markets (2007) and a Royal Academy of Engineering Visiting Professorship (2017-2020). He has co-authored or edited more than 70 publications in AI, computational linguistics, machine learning, information retrieval, geography and pharmacology, and he holds more than a dozen granted patents.

Wednesday, 8 December 2021, 16:30

Speaker: Prof. Dr. Massimo Fornasier (University Professor, TU München)
Title: Three Mathematical Tales of Machine Learning

I tell three mathematical tales of machine learning:
1. Identification of deep neural networks, 2. Global optimization over manifolds, 3. Mean-field optimal control of NeurODE.
Tale 1 is about the proof that, despite the NP-hardness of the problem, generic neural networks can be identified up to natural symmetries by a finite number of input-output samples scaling with the complexity of the network. Numerical validation of the result is presented. A crucial subproblem of the identification pipeline is the solution of a nonconvex optimization over the sphere.
Tale 2 is in fact about solving global optimizations over spheres by means of a multi-agent dynamics, which combines a consensus mechanism and random exploration. The proof of global solution is based on showing that the large particle limit of the SDE system is distributed as the solution of a deterministic PDE, whose large-time asymptotics converge to a global minimizer. I present numerical results in robust linear regression for computing eigenfaces.
In Tale 3, I introduce NeurODE, which are neural networks approximable by ODEs. I show that their training can be formulated as a mean-field optimal control problem, and I present results on existence and finite particle approximation. I also show the derivation of a mean-field Pontryagin maximum principle and its well-posedness. Again, a numerical experiment on a simple 2D classification problem validates the theoretical results.

Wednesday, 19 January 2022, 16:30

Speaker: Prof. Dr. Philipp Grohs (University Professor, Universität Wien)
Title: tba
Abstract: tba

Wednesday, 26 January 2022, 16:30

Speaker: Dr. Sebastian Lerch (Junior Research Group Leader, KIT)
Title: tba
Abstract: tba

Wednesday, 9 February 2022, 16:30

Speaker: M. Sc. Frank Sommer (doctoral student SPP, research group Prof. Komusiewicz)
Title: tba
Abstract: tba

Dates from past semesters

Wednesday, 5 May 2021, 16:30

Speaker: M. Sc. Sebastian Spänig (doctoral student SPP, research group Prof. Heider)
Title: A large-scale comparative study on peptide encodings for biomedical classification

Owing to the great variety of distinct peptide encodings, working on a biomedical classification task at hand is challenging. Researchers have to determine encodings capable of representing underlying patterns as numerical input for the subsequent machine learning. A general guideline is lacking in the literature; thus, we present here the first large-scale comprehensive study to investigate the performance of a wide range of encodings on multiple datasets from different biomedical domains. For the sake of completeness, we added additional sequence- and structure-based encodings. In particular, we collected 50 biomedical datasets and defined a fixed parameter space for 48 encoding groups, leading to a total of 397,700 encoded datasets. Our results demonstrate that none of the encodings is superior for all biomedical domains. Nevertheless, some encodings often outperform others, thus reducing the initial encoding selection substantially. Our work allows researchers to objectively compare novel encodings to the state of the art. Our findings pave the way for a more sophisticated encoding optimization, e.g., as part of automated machine learning pipelines. The work presented here is implemented as a large-scale, end-to-end workflow designed for easy reproducibility and extensibility. All standardized datasets and results are available for download to comply with the FAIR standards.

Wednesday, 19 May 2021, 16:30

Speaker: Dr. rer. nat. Martin Grohe (University Professor, RWTH Aachen)
Title: word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings of Structured Data

Vector representations of graphs and relational data, whether hand-crafted feature vectors or learned representations, enable us to apply standard data analysis and machine learning techniques to these forms of structured data. A wide range of methods for generating such vector embeddings has been studied in the machine learning and knowledge representation literature. However, vector embeddings have received relatively little attention from a theoretical point of view.

The first part of my talk will be devoted to embedding algorithms in practice. Starting with a brief overview of common embedding techniques, I will speak about a new embedding algorithm for dynamically changing relational data. In the second part of my talk, I will discuss theoretical ideas that have proved useful for analysing and designing vector embeddings and that may help us to develop a more principled view on the area.

Wednesday, 2 June 2021, 16:30

Speaker: M. Sc. Sven Heuer (doctoral student SPP, research group Prof. Dahlke)
Title: Analysing audio files by the example of birdsong recognition

Audio transformation is a well-researched area of interest in Mathematics and Computer Science, with the short-time Fourier transform as a standard tool. However, some finer aspects are still open, especially concerning the function spaces that have to be considered, depending on the given signal. In this talk, we will use the example of audio files containing birdsongs to look at some of these aspects. Our main topics will be denoising and compression, detection and classification. For the classification part, we will also look at the integration of the Gabor transform directly into a Convolutional Neural Network.
A different transformation to use would be the wavelet transform. We will look at some of its properties and, at the end, give an outlook on how we might be able to integrate it into our current approaches.

Wednesday, 9 June 2021, 13:00

As part of the Oberseminar zur Numerik und Optimierung (advanced seminar on numerical analysis and optimization).
Speaker: Prof. Dr. Holger Wendland (Universität Bayreuth)
Title: Multiscale Approximation by Radial Basis Functions

Linked here.

Wednesday, 16 June 2021, 16:30

Speaker: Prof. Dr. Michael Möller (Universität Siegen)
Title: On the confluence of machine learning and energy minimization methods for inverse problems

Wednesday, 30 June 2021, 16:30

Speaker: M. Sc. Niels Grüttemeier (doctoral student SPP, research group Komusiewicz)
Title: The Computational Problem of Bayesian Network Structure Learning

Bayesian Network Structure Learning (BNSL) is motivated by finding conditional dependencies of random variables such that these dependencies describe a multivariate probability distribution as closely as possible. In the computational problem, one is given a set of vertices N and a family of so-called local scores, and one aims to find an arc set A such that (N,A) is a directed acyclic graph and the score is maximized. In this talk, I provide some intuition for this computational problem and describe its role in the process of modelling and reasoning with Bayesian networks. Furthermore, I sketch some of our results on the parameterized complexity of BNSL when additional structural constraints are imposed on the network or on its so-called moralized graph.
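For readers unfamiliar with the problem statement, the following is a minimal brute-force sketch of BNSL on a toy instance. It is not the speaker's method and says nothing about the parameterized results of the talk. It assumes that the total score is the sum of the local scores of each vertex given its parent set, and the names `best_network`, `toy_score` and `TOY_SCORES`, as well as the score values, are hypothetical and chosen only for illustration.

```python
from itertools import combinations, permutations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def best_network(vertices, local_score):
    """Exhaustively search for an arc set A maximizing the summed local scores.

    Every DAG has a topological ordering, so it suffices to try each
    ordering and let every vertex pick its best parent set among the
    vertices that precede it. This takes factorial time and is only
    meant for tiny instances.
    """
    best_total, best_arcs = float("-inf"), set()
    for order in permutations(vertices):
        total, arcs = 0.0, set()
        for position, v in enumerate(order):
            predecessors = order[:position]
            parents = max(subsets(predecessors), key=lambda p: local_score(v, p))
            total += local_score(v, parents)
            arcs |= {(u, v) for u in parents}  # arc u -> v for each parent u
        if total > best_total:
            best_total, best_arcs = total, arcs
    return best_total, best_arcs


# Hypothetical toy scores (assumption): parent sets not listed score 0.
TOY_SCORES = {
    ("b", frozenset({"a"})): 2.0,
    ("c", frozenset({"a", "b"})): 3.0,
    ("a", frozenset({"c"})): 2.5,
}


def toy_score(vertex, parents):
    return TOY_SCORES.get((vertex, frozenset(parents)), 0.0)


score, arcs = best_network(["a", "b", "c"], toy_score)
print(score, sorted(arcs))
```

In this toy instance the three rewarded parent sets cannot all be used at once, because the arcs c -> a and a -> c would form a cycle. The search therefore has to trade off local scores against acyclicity, which is the combinatorial core of the problem.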