
Tools for Intelligent System Management of Very Large Computing Systems


Project purpose


TIMaCS addresses the challenges of administering very large computing systems whose resources have a cumulative performance of several petaflops.

The project aims to reduce the complexity of manually administering computing systems by realizing a framework for intelligent system management of very large computing systems. The framework builds on virtualization, knowledge-based analysis and validation of collected information, and the definition of metrics and policies. In addition to notifying an administrator, it will be able to start predefined actions automatically. Beyond that, data analysis based on previous monitoring data, regression tests and intensive regular checks is aimed at taking preventive actions before failures occur. The framework will include open interfaces so that it can be connected easily to relevant existing systems such as accounting or user management systems (user policies, priority, ...). The project partners seek to develop a framework that is ready for production and validation at the High Performance Computing Center Stuttgart (HLRS), the Center for Information Services and High Performance Computing (ZIH) in Dresden and the Computing Center at the Philipps-Universität Marburg. Additional project partners are NEC European High Performance Computing Technology Center and Science + Computing AG.

Objectives

  1. Design and implementation of a robust and highly scalable monitoring solution for very large computing systems based on existing tools and supplementary implementations ready for production.
  2. Design and implementation of a system for partitioning and dynamically assigning users of very large computing systems based on virtualization technologies. Easy setup or removal of single compute nodes from a heterogeneous or hybrid system will be included.
  3. Development of a management framework that supports different automation and escalation strategies based on policies: notification of an administrator, semi-automatic to fully automatic counteractions, forecasts, anomaly detection and their validation under production conditions.
  4. Development of tools for detecting and automatically handling errors as well as for performing preventive actions.
  5. Ensuring sustainability by defining standards-conforming interfaces and an integrated framework that combines the so far unsynchronized developments of tools for monitoring and management, cluster virtualization, policy-based management and knowledge-based data analysis.
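
To make the escalation strategies in objective 3 more concrete, the following is a minimal sketch of how a policy could map a monitored metric to one of the three levels named there (notification, semi-automatic, fully automatic). It is not taken from the TIMaCS code base: the class names, the metric names and the action names such as drain_node are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Escalation(Enum):
    NOTIFY = "notify"          # only inform an administrator
    SEMI_AUTOMATIC = "semi"    # propose an action and wait for confirmation
    AUTOMATIC = "auto"         # start the predefined action directly


@dataclass(frozen=True)
class Policy:
    metric: str                # hypothetical metric name
    threshold: float           # policy triggers when the value exceeds this
    escalation: Escalation
    action: str = ""           # hypothetical name of a predefined action


def evaluate(policies: List[Policy], sample: Dict[str, float]) -> List[str]:
    """Return one decision string per policy triggered by the sample."""
    decisions = []
    for policy in policies:
        value = sample.get(policy.metric)
        # Skip metrics that were not reported or are within limits.
        if value is None or value <= policy.threshold:
            continue
        reason = f"{policy.metric}={value} > {policy.threshold}"
        if policy.escalation is Escalation.NOTIFY:
            decisions.append(f"notify admin: {reason}")
        elif policy.escalation is Escalation.SEMI_AUTOMATIC:
            decisions.append(
                f"propose {policy.action}, await confirmation: {reason}"
            )
        else:
            decisions.append(f"run {policy.action}: {reason}")
    return decisions
```

A real framework would obtain the samples from the monitoring layer and dispatch the decisions to an action executor; here the decisions are returned as plain strings so that the policy logic stays easy to inspect.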


Project website: http://www.timacs.de

Funding: Bundesministerium für Bildung und Forschung (BMBF, German Federal Ministry of Education and Research), High-Performance Computing Program

Contact: Matthias Schmidt, Roland Schwarzkopf

Last updated: 04.08.2011 · fallenbn

 
 
 
Department 12 - Mathematics and Computer Science (Fb. 12 - Mathematik und Informatik)

Distributed Systems (Verteilte Systeme, AG Freisleben), Hans-Meerwein-Straße, D-35032 Marburg
Phone 06421/28-21567, Fax 06421/28-21573, Email: freisleb@informatik.uni-marburg.de

URL of this page: http://www.uni-marburg.de/fb12/verteilte_systeme/forschung/timacs
