Areas

The list of our research interests covers different areas and goes beyond what we can show on this website. We love data, challenging tasks and large-scale computer systems. We are also very interested in collaborative, cross-domain research projects and the development of open-source software. A few more concrete areas that we currently pursue are listed below.

Data Management
Studies have shown that data scientists spend more time on data management than on actual data analysis. This shows that data engineering is a complex yet insufficiently supported activity. For this reason, we conduct research on novel data engineering approaches that offer the intelligence and performance required to meet practical needs. Our efforts in this area focus on data retrieval, data preparation, data cleaning, data transformation, data integration, and data linkage.

Data Profiling
Without proper metadata, datasets become inaccessible for many applications, ranging from basic information retrieval and data integration tasks to machine learning scenarios. In our data profiling research, we develop automatic, highly efficient metadata discovery algorithms. Among other things, these algorithms detect uniqueness, functional dependencies, order constraints, entity matches, and inter-table links. We also investigate the application of the discovered metadata in various practical use cases.
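
To make two of these metadata types concrete, the following minimal sketch checks uniqueness and a functional dependency on a small in-memory table. The function names `is_unique` and `holds_fd` and the toy rows are hypothetical and chosen only for illustration. The sketch validates a single given candidate by brute force and does not represent the efficient discovery algorithms developed in this research area.

```python
def is_unique(rows, columns):
    """Return True if no two rows share the same values in `columns`."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False  # duplicate value combination found
        seen.add(key)
    return True


def holds_fd(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in `rows`."""
    mapping = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = row[rhs]
        # The first value seen for a key is stored; any later deviation
        # means the left-hand side does not determine the right-hand side.
        if mapping.setdefault(key, value) != value:
            return False
    return True


# Hypothetical toy table, for illustration only.
rows = [
    {"id": 1, "zip": "10115", "city": "Berlin"},
    {"id": 2, "zip": "10115", "city": "Berlin"},
    {"id": 3, "zip": "80331", "city": "Munich"},
]

print(is_unique(rows, ["id"]))          # is `id` a unique column?
print(is_unique(rows, ["zip"]))         # is `zip` a unique column?
print(holds_fd(rows, ["zip"], "city"))  # does zip -> city hold?
```

Checking one candidate is cheap. The difficulty in discovery comes from the number of candidate column combinations, which grows exponentially with the number of columns.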

Time Series Analytics
Time series datasets are particularly challenging for data engineering and analytics due to their size, recording speed, and complex nature. They are also a dominant form of data in statistics, econometrics, the sciences, and engineering. To extend the toolbox of time series analytics techniques with more efficient and more effective approaches, we devise novel systems for time series forecasting, anomaly detection, pattern recognition, and classification.

Scalable Computing
Until the turn of the century, software systems became steadily faster simply because each new hardware generation increased its clock speed. This free lunch is over: novel and existing algorithms need to embrace parallelism and distribution to make use of modern hardware. For many algorithms, in particular the more difficult ones in data engineering and data science, this paradigm shift is very challenging to realize. We therefore develop novel distributed algorithms and systems for complex, data-centric tasks. Our research objectives cover not only efficiency but also robustness, elasticity, and energy consumption.

Data Analytics
Statistics, data mining, and machine learning are well-understood and widely adopted data analytics activities. In this group, we study the applicability of such techniques in practical, real-world use cases, with special attention to performance and runtime aspects.