Deep Legion

Detection of virulence factor protein domains in Legionella using deep autoencoders

Legionella pneumophila (L. pneumophila) is a Gram-negative, non-encapsulated, aerobic bacillus with a single polar flagellum. It is an important intracellular pathogen that causes Legionnaires' disease, also known as legionellosis, a specific form of pneumonia in humans. L. pneumophila is highly adapted to intracellular replication and to many protist hosts, such as amoebae, in aquatic environments. It manipulates vital host cell functions, such as vesicle trafficking and gene expression, using specific virulence factors.
Legionella expresses over 300 of these virulence factors, which can be injected into the host cell cytosol via a Type IV secretion system. Many of them contain eukaryote-like protein motifs that were acquired during co-evolution with their hosts, inserted into bacterial factors (e.g., factors containing Type IV secretion motifs), and optimized for bacterial expression. Overall, Legionella genomes contain the highest number of these eukaryote-like protein motifs.
Many of these factors are essential for virulence and are therefore important both for understanding disease pathophysiology and as potential therapeutic targets. Accurate annotation of these sequences is essential for improving patient treatment.
This project aims to identify these factors computationally from our current data, predict their origin and function, and validate these predictions in vitro. Because existing tools have many limitations, we aim to develop a deep learning-based framework that can identify such virulence factors in near real time with high accuracy. Our deep learning-based annotation pipeline will pave the way for new applications of precision medicine in infectious diseases.
Over the course of this project, we will implement this new deep learning-based annotation pipeline as an ultrafast command-line bioinformatics tool. We will complement the tool with database creation features for compiling customized databases. In doing so, we will provide the bioinformatics community with a powerful tool for the rapid annotation of DNA sequences, protein motifs, and domains. Additionally, we will containerize the tool and provide it within highly scalable cloud computing infrastructures.

Project Partners:

Prof. Dr. Dominik Heider (Coordinator)
Philipps-University of Marburg
Department of Mathematics & Computer Science

Prof. Dr. Alexander Goesmann
Justus-Liebig-University Giessen
Bioinformatics and Systems Biology

Funded within the BMBF initiative on Computational Life Sciences 2021