Seminar of the Data Science Doctoral Program

This seminar is primarily aimed at the members of the structured doctoral program (SPP) Data Science. Guests are, of course, welcome at any time.

The seminar takes place every two weeks during the lecture period and lasts about one hour. The upcoming dates are listed below.

Current dates

Whenever possible, the seminar will be held as a hybrid event. For the in-person part, the current general decree (Allgemeinverfügung) of Philipps-Universität Marburg on preventing coronavirus infections applies (https://www.uni-marburg.de/de/universitaet/administration/sicherheit/coronavirus/regelungen).

Thursday, 20 October 2022, 14:15, Room 04A23 (HS V)

Speaker: Yunxiao Ren (doctoral candidate, SPP; research group of Prof. Heider)

Title: Deep Transfer Learning Enables Robust Prediction of Antimicrobial Resistance for Small Sample Sizes

Antimicrobial resistance (AMR) has become a serious global health problem, threatening the effective treatment of a growing number of infections. Machine learning and deep learning show great potential for rapid and accurate AMR prediction. However, training these models requires a large number of samples. In particular, for novel antibiotics, limited training samples and data imbalance hinder the models’ generalization performance and overall accuracy. We propose a deep transfer learning approach: we first constructed a basic convolutional neural network (CNN) for each antibiotic in our dataset, namely ciprofloxacin (CIP), cefotaxime (CTX), ceftazidime (CTZ), and gentamicin (GEN). We then used the model for CIP, i.e., the best-performing CNN, as the pre-trained model and transferred its knowledge to improve the prediction for the other three antibiotics, i.e., CTX, CTZ, and GEN. Our results demonstrate that this approach can improve model performance for AMR prediction on small, imbalanced datasets. As our approach relies on transfer learning and secondary mutations, it is also applicable to novel antibiotics and emerging resistance in the future.

Thursday, 3 November 2022, 14:15, Room 04A23 (HS V)

Speaker: Jan Ruhland (doctoral candidate, SPP; research group of Prof. Heider)

Title: The virtual doctor - a medical decision support system

The COVID pandemic has revealed the flaws of health systems around the world and the immense pressure physicians are exposed to. Yet no impactful measures have been established so far to relieve health-care workers of their increasing burdens. The WHO even estimates a shortage of 12.9 million health-care workers by 2035. In this talk, I will introduce the virtual doctor project, which aims to relieve health-care professionals by providing automated examinations of patients as well as disease predictions based on machine learning models. The project ranges from the hardware used for automated data collection to machine learning methods for predicting possible diseases.

Thursday, 17 November 2022, 14:15, Room 04A23 (HS V)

Speaker: Leon Fehse (doctoral candidate, SPP; research group of Prof. Heider)

Title: Data analysis of the taxonomic profile of the human gut microbiome

Which microbial species our gut contains, and in what abundance they occur, is believed to have a considerable impact on our overall health and well-being. Unsurprisingly, in recent years the scientific community has shown increased interest in research on the biomedical importance of the microbiome for many different aspects of our health. In my research, I take a closer look at the taxonomic profile of the human gut microbiome and try to infer a connection to some medically relevant conditions. For example, is the microbiome connected to our mental health? Is the microbiome a suitable predictor of our biological age? To try to answer these questions, I employ machine learning algorithms to analyze existing microbiome data sets.

Dates from previous semesters

Wednesday, 5 May 2021, 16:30

Speaker: M.Sc. Sebastian Spänig (doctoral candidate, SPP; research group of Prof. Heider)
Title: A large-scale comparative study on peptide encodings for biomedical classification

Owing to the great variety of distinct peptide encodings, working on a given biomedical classification task is challenging. Researchers have to determine encodings capable of representing underlying patterns as numerical input for the subsequent machine learning. A general guideline is lacking in the literature; thus, we present here the first large-scale comprehensive study to investigate the performance of a wide range of encodings on multiple datasets from different biomedical domains. For the sake of completeness, we added additional sequence- and structure-based encodings. In particular, we collected 50 biomedical datasets and defined a fixed parameter space for 48 encoding groups, leading to a total of 397,700 encoded datasets. Our results demonstrate that none of the encodings is superior for all biomedical domains. Nevertheless, some encodings often outperform others, thus reducing the initial encoding selection substantially. Our work allows researchers to objectively compare novel encodings to the state of the art. Our findings pave the way for a more sophisticated encoding optimization, e.g., as part of automated machine learning pipelines. The work presented here is implemented as a large-scale, end-to-end workflow designed for easy reproducibility and extensibility. All standardized datasets and results are available for download to comply with FAIR standards.

Wednesday, 19 May 2021, 16:30

Speaker: Dr. rer. nat. Martin Grohe (University Professor, RWTH Aachen)
Title: word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings of Structured Data

Vector representations of graphs and relational data, whether hand-crafted feature vectors or learned representations, enable us to apply standard data analysis and machine learning techniques to these forms of structured data. A wide range of methods for generating such vector embeddings has been studied in the machine learning and knowledge representation literature. However, vector embeddings have received relatively little attention from a theoretical point of view.

The first part of my talk will be devoted to embedding algorithms in practice. Starting with a brief overview of common embedding techniques, I will speak about a new embedding algorithm for dynamically changing relational data. In the second part of my talk, I will discuss theoretical ideas that have proved useful for analysing and designing vector embeddings and that may help us to develop a more principled view on the area.

Wednesday, 2 June 2021, 16:30

Speaker: M.Sc. Sven Heuer (doctoral candidate, SPP; research group of Prof. Dahlke)
Title: Analysing audio files by the example of birdsong recognition

Audio transformation is a well-researched area of interest in mathematics and computer science, with the short-time Fourier transform as a standard tool. However, some finer aspects are still open, especially concerning the function spaces that have to be considered, depending on the given signal. In this talk, we will use the example of audio files containing birdsong to look at some of these aspects. Our main topics will be denoising and compression, detection, and classification. For the classification part, we will also look at the integration of the Gabor transform directly into a convolutional neural network.
A different transformation to use would be the wavelet transform. We will look at some of its properties and, in the end, give an outlook on how we might be able to integrate it into our current approaches.

Wednesday, 9 June 2021, 13:00

Held as part of the advanced seminar (Oberseminar) on Numerics and Optimization.
Speaker: Prof. Dr. Holger Wendland (Universität Bayreuth)
Title: Multiscale Approximation by Radial Basis Functions

Linked here.

Wednesday, 16 June 2021, 16:30

Speaker: Prof. Dr. Michael Möller (Universität Siegen)
Title: On the confluence of machine learning and energy minimization methods for inverse problems

Wednesday, 30 June 2021, 16:30

Speaker: M.Sc. Niels Grüttemeier (doctoral candidate, SPP; Komusiewicz research group)
Title: The Computational Problem of Bayesian Network Structure Learning

Bayesian Network Structure Learning (BNSL) is motivated by finding conditional dependencies of random variables such that these dependencies describe a multivariate probability distribution as closely as possible. In the computational problem, one is given a set of vertices N and a family of so-called local scores, and one aims to find an arc set A such that (N,A) is a directed acyclic graph and the score is maximized. In this talk, I provide some intuition for this computational problem and describe its role in the process of modelling and reasoning with Bayesian networks. Furthermore, I sketch some of our results on the parameterized complexity of BNSL when additional structural constraints are imposed on the network or on its so-called moralized graph.
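As an illustration of the problem statement only (not of the speaker's algorithms), the following minimal Python sketch solves a tiny BNSL instance by brute force. It assumes that the total score is the sum of the local scores of each vertex's chosen parent set; the function names, the `max_parents` bound, and the toy scores are hypothetical.

```python
from itertools import combinations


def is_acyclic(parents):
    """Return True if the graph given as vertex -> parent set has no directed cycle."""
    remaining = set(parents)
    while remaining:
        # A vertex can be removed once none of its parents is still remaining.
        removable = {v for v in remaining if not (parents[v] & remaining)}
        if not removable:
            return False  # every remaining vertex has a remaining parent: cycle
        remaining -= removable
    return True


def best_network(vertices, local_score, max_parents=2):
    """Exhaustively search parent-set assignments; exponential time, toy sizes only."""
    vertices = list(vertices)
    candidates = {
        v: [frozenset(c)
            for k in range(max_parents + 1)
            for c in combinations([u for u in vertices if u != v], k)]
        for v in vertices
    }
    best_score, best_parents = float("-inf"), None

    def extend(i, chosen, total):
        nonlocal best_score, best_parents
        if i == len(vertices):
            if total > best_score and is_acyclic(chosen):
                best_score, best_parents = total, dict(chosen)
            return
        v = vertices[i]
        for parent_set in candidates[v]:
            chosen[v] = parent_set
            extend(i + 1, chosen, total + local_score(v, parent_set))
        del chosen[v]

    extend(0, {}, 0.0)
    return best_score, best_parents


# Hypothetical local scores; every parent set not listed scores 0.
TOY_SCORES = {
    ("a", frozenset({"c"})): 4.0,
    ("b", frozenset({"a"})): 3.0,
    ("c", frozenset({"b"})): 2.0,
    ("c", frozenset({"a", "b"})): 2.5,
}


def toy_score(vertex, parent_set):
    return TOY_SCORES.get((vertex, parent_set), 0.0)


score, parents = best_network(["a", "b", "c"], toy_score)
# The arc set A: an arc (u, v) means that u is a parent of v.
arcs = {(u, v) for v, ps in parents.items() for u in ps}
```

In this toy instance, taking the three highest local scores together would close the cycle a, b, c, so the search has to give up one of them. The acyclicity requirement is what separates BNSL from simply picking the best parent set for each vertex independently.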

Wednesday, 20 October 2021, 16:30

Speaker: Paolo Rosso (Full Professor, NLEL Valencia)
Title: Detecting online harmful information: fake news, conspiracy theories, and misogyny

The ease of generating content online and the anonymity that social media provide have increased the amount of harmful content that is published. Fake news, conspiracy theories, and offensive content are published and propagated on a daily basis. In this keynote, I will describe how fake news and conspiracy theories can be detected by going beyond textual information alone: emotions, psycholinguistic characteristics, and multimodal information play an important role and should be in the loop. At the end of my talk, I will also mention the problem of hate speech, specifically misogyny, and the Multimodal Automatic Misogyny Identification (MAMI) shared task that we will be organising at SemEval 2022.

Wednesday, 3 November 2021, 16:30

Speaker: Ramit Sawhney (Software Developer, IIIT Delhi)
Title: HYPMIX - Method for Interpolative Data Augmentation in Hyperbolic Space, applied on Text, Speech, and Vision

Wednesday, 17 November 2021, 16:30

Speaker: M.Sc. Dorian Vogel (doctoral candidate, SPP; research group of Prof. Dahlke)
Title: High order methods for PDEs: Quarklets, hp-adaptivity and tree structures
Abstract: tba

Wednesday, 24 November 2021, 15:30

Speaker: Prof. Dr. Jochen Leidner (University Professor, Hochschule Coburg)
Title: Fun and Profit with Machine-Learning & Natural Language Processing

The triad of Natural Language Processing (NLP), Information Retrieval (IR) and Machine Learning (ML) provides powerful tools and techniques to build useful and commercially valuable systems that support today's professional knowledge workers in many vertical domains, including law, police/law enforcement, journalism/news, pharmacology, finance and insurance.
In this talk, I will describe a number of practical examples of how NLP, IR and ML can help build capabilities that generate enormous economic value in B2B business (not facing consumers) involving professionals like lawyers, judges, reporters/investigative journalists, financial analysts and traders, and insurance claim investigators. As we shall see, these techniques have long since arrived in the daily lives and work of many professionals.

About the speaker:
Dr. Jochen L. Leidner is the Professor of Explainable and Responsible Artificial Intelligence with Insurance Applications at Coburg University of Applied Sciences and Arts in Germany, and a Visiting Professor of Data Analytics at the University of Sheffield in the UK. His research areas include Natural Language Processing, Information Retrieval and applied Machine Learning.
After obtaining an M.A. in Computational Linguistics, English language and linguistics and computer science at FAU (Friedrich-Alexander-Universität Erlangen-Nuremberg) and an M.Phil. in Computer Speech, Text and Internet Technology at the University of Cambridge, he obtained a Ph.D. in Informatics from the University of Edinburgh. His industry experience includes software engineering roles at SAP and Director of Research (R&D) positions at Thomson Reuters and startups.
His research has been published at conferences including ACM SIGIR, ACL, AGI, NAACL, IEEE ICDM and ECIR. He is a Fellow of the Royal Geographical Society, a member of ACL, ACM, GI, SRA and the BCS Information Retrieval Specialist Group, and presently Award Chair for the Microsoft/BCS IRSG Karen Spärck Jones Award. He won scholarships and prizes from DAAD, Cambridge Prince's Trust, and Peterhouse, Cambridge.
In 2015 and 2016, he was named Thomson Reuters Inventor of the Year for the best patent application, and he won the Best Method Paper Award of the Social Media Society (2016) as well as grants like a Scottish Enterprise/Royal Society of Edinburgh Enterprise Fellowship in Electronic Markets (2007) and a Royal Academy of Engineering Visiting Professorship (2017-2020). He has co-authored or edited more than 70 publications in AI, computational linguistics, machine learning, information retrieval, geography and pharmacology, and he holds more than a dozen granted patents.

Wednesday, 8 December 2021, 16:30

Speaker: Prof. Dr. Massimo Fornasier (University Professor, TU München)
Title: Three Mathematical Tales of Machine Learning

I tell three mathematical tales of machine learning:
1. Identification of deep neural networks, 2. Global optimization over manifolds, 3. Mean-field optimal control of NeurODE.
Tale 1 is about the proof that, despite the NP-hardness of the problem, generic neural networks can be identified up to natural symmetries by a finite number of input-output samples scaling with the complexity of the network. Numerical validation of the result is presented. A crucial subproblem of the identification pipeline is the solution of a nonconvex optimization problem over the sphere.
Tale 2 is in fact about solving global optimization problems over spheres by means of a multi-agent dynamics, which combines a consensus mechanism and random exploration. The proof of global solution is based on showing that the large particle limit of the SDE system is distributed as the solution of the deterministic PDE, whose large-time asymptotics converge to a global minimizer. I present numerical results in robust linear regression for computing eigenfaces.
In Tale 3, I introduce NeurODEs, which are neural networks approximable by ODEs. I show that their training can be formulated as a mean-field optimal control problem, and I present results on existence and finite particle approximation. I also show the derivation of a mean-field Pontryagin maximum principle and its well-posedness. Again, a numerical experiment on a simple 2D classification problem validates the theoretical results.

Wednesday, 19 January 2022, 16:30

Speaker: Prof. Dr. Philipp Grohs (University Professor, Universität Wien)
Title: tba
Abstract: tba

Wednesday, 16 February 2022, 17:00

Speaker: M.Sc. Frank Sommer (doctoral candidate, SPP; research group of Prof. Komusiewicz)
Title: tba
Abstract: tba

Thursday, 2 June 2022, 14:15, Room 04C37 (SRXV C)

Speaker: Dr. Björn Krüger (Gokhale Method Enterprise Inc., Stanford, USA)
Title: Capturing and Analyzing Motion Data - Technologies and Applications

The capture and analysis of motion data have made enormous progress in recent years. While optical motion capture systems were the only option for capturing motion at high spatial and temporal resolution twenty years ago, wearable devices have now been developed that capture data for special use cases with high accuracy. Data-driven approaches using motion-capture datasets are commonly used to analyze readings from wearables and to gain insights. In my talk, I will present some of my work on motion data, discuss the development of a wearable to measure posture, and show some of the emerging applications. I’ll conclude my talk with a sneak peek at possible future developments in this field.

Wednesday, 15 June 2022, 16:00, Room 04A23 (HS V A4)

Speaker: Dr. Sebastian Ordyniak (University of Leeds, UK)
Title: Novel Parameterized Algorithms for Decision Tree Learning

Decision trees (DTs) have become an invaluable tool for providing interpretable models of data in various areas of computer science. Probably the most fundamental computational task in the context of DTs is to learn DTs from data. Here one usually aims at finding small trees, since those are easier to interpret and require fewer decisions. Although learning a smallest DT is known to be NP-hard, the problem’s behavior under natural restrictions on the data is widely open.

To improve our understanding, we provide the first parameterized complexity analysis of the problem. Our starting point is a set of hardness results which show that the problem is not fixed-parameter tractable when parameterized by solution size alone (i.e., size or depth of the obtained DT). We then identify natural additional and necessary restrictions (modelled by parameters) to achieve fixed-parameter tractability. Our results provide a comprehensive complexity map for the considered parameters, exhibiting the significance of each parameter. The fixed-parameter tractability result is based on a new algorithmic technique that is of independent interest.

In particular, we show that learning DTs is fixed-parameter tractable when parameterized by size (or depth) and an additional parameter that can be shown to behave well in practice. We complement our algorithmic results with lower bounds that allow us to arrive at a comprehensive complexity map with respect to all considered parameterizations.

Thursday, 30 June 2022, 14:15, Room 04C37 (SRXV C)

Speaker: M.Sc. Allie Lahnala (doctoral candidate, SPP; research group of Prof. Flek)
Title: (Challenges of) Computational Empathy Understanding

Abstract: tba

Wednesday, 13 July 2022, 16:00, Room 04A23 (HS V A4)

Speaker: Dr. Imme Baumüller (Vaillant Group)
Title: Culture eats Data Strategy for Breakfast – challenges in setting up a Data Unit in business enterprises

It's 2022, and hardly any company can ignore the topics of data architecture, data analytics, BI or data science. The corresponding job profiles are among the most sought-after in the market. Not only the recruitment of talent but also the development and establishment of data units and a company-wide data strategy present companies with major challenges. In her presentation, Imme Baumüller will report on her experiences setting up data units in the media industry and at a large industrial company. Among other things, she will discuss how a data unit can be structured and where it should be anchored organizationally, how to design the cooperation with the business side, and why communication is the central success factor in data initiatives.