Software
This page lists public software developed by the Databionics research group for scientific purposes. Please cite the corresponding publications.
Name | Language | Licence | Link | Authors |
DataIO | R | GPL | Github | Alfred Ultsch, Florian Lerch, Michael Thrun, Catharina Lippmann, Felix Pape, Onno Hansen-Goos, Sabine Herda |
DataVisualizations | R | GPL | CRAN | Michael Thrun, Felix Pape, Onno Hansen-Goos, Fredericke Matz, Alfred Ultsch |
DatabionicSwarm | R | GPL | CRAN | Michael Thrun |
ProjectionBasedClustering | R | GPL | CRAN | Michael Thrun, Florian Lerch, Felix Pape, Kristian Nybo, Jarkko Venna |
GeneralizedUmatrix | R | GPL | CRAN | Michael Thrun, Alfred Ultsch |
Umatrix | R | GPL | Download, Manual | Florian Lerch, Michael Thrun, Alfred Ultsch |
AdaptGauss: Gaussian Mixture Models (GMM) | R | GPL | CRAN | Michael Thrun, Onno Hansen-Goos, Rabea Griese, Catharina Lippmann, Florian Lerch, Jörn Lötsch, Alfred Ultsch |
ABCanalysis | R | GPL | CRAN, Online | Michael Thrun, Florian Lerch, Jörn Lötsch, Alfred Ultsch |
Vademecum | Java | GPL | SourceForge Project | Torben Rühl, Steffen Springer, Burcu Dalmis, Jan Kohlhof, Dirk Schäfer |
Databionic ESOM Tools | Java | GPL | SourceForge Project | Christian Stamm, Mario Nöcker, Fabian Mörchen, and many others |
Databionic MusicMiner | Java | GPL | SourceForge Project | Mario Nöcker, Christian Stamm, Fabian Mörchen, Niko Efthymiou, Michael Thies, Ingo Löhken, and many others |
Time Series Knowledge Mining | Matlab | GPL | Download | Fabian Mörchen |
Pareto Density Estimation | R | GPL | CRAN | Michael Thrun, Onno Hansen-Goos, Rabea Griese, Catharina Lippmann, Jörn Lötsch, Alfred Ultsch |
Persist Time Series Discretization | Matlab | GPL | Download | Fabian Mörchen |
Audio Feature Extraction | Matlab | GPL | On request | Ingo Löhken, Michael Thies, Fabian Mörchen |
DWT/DFT time series feature extraction | Matlab | GPL | Download | Fabian Mörchen |
LaTeX/PDF Reports | Matlab | GPL | Download | Fabian Mörchen |
Spin3D | Java | GPL | SourceForge Project | Pascal Lehwark |
Generalized Umatrix
Projections from a high-dimensional data space onto a two-dimensional plane are used to detect structures, such as clusters, in multivariate data. The generalized Umatrix is able to visualize errors of these two-dimensional scatter plots by using a 3D topographic map.
Ultsch, A., & Thrun, M. C.: Credible Visualizations for Planar Projections, in Cottrell, M. (Ed.), 12th International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization (WSOM), IEEE Xplore, France, 2017.
Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, Heidelberg, ISBN: 978-3-658-20539-3, https://doi.org/10.1007/978-3-658-20540-9, 2018.
Databionic Swarm
The databionic swarm (DBS) is a swarm system that adapts itself to structures of high-dimensional data, such as natural clusters characterized by distance- and/or density-based structures in the data space. It consists of three modules. The first module is the parameter-free projection method Pswarm, which exploits the concepts of self-organization and emergence, game theory, swarm intelligence and symmetry considerations. The second module is a parameter-free high-dimensional data visualization technique, which generates projected points on a topographic map with hypsometric colors based on the generalized U-matrix. The third module is the clustering method itself, which has non-critical parameters. The clustering can be verified by the visualization and vice versa.
Projection Based Clustering
DataVisualizations
Various visualizations of high-dimensional data are presented here, such as the heat map and silhouette plot for grouped data, visualizations of the distribution of distances, the scatter-density plot for two variables, the Shepard density plot and many more. Additionally, 'DataVisualizations' makes it possible to inspect the distribution of each feature of a dataset visually through the combination of four methods.
Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, in Baier, D. & Wernecke, K. D. (Eds.), Innovations in classification, data science, and information systems, (Vol. 27, pp. 91-100), Berlin, Germany, Springer, 2005.
Thrun, M. C., & Ultsch, A.: Effects of the payout system of income taxes to municipalities in Germany, 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, Vol. accepted, Foundation of the Cracow University of Economics, Zakopane, Poland, 2018.
Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, (Ultsch, A. & Huellermeier, E. Eds., 10.1007/978-3-658-20540-9), Doctoral dissertation, Heidelberg, Springer, ISBN: 978-3658205393, 2018.
Umatrix
The emergent SOM (ESOM) is an enhancement of self-organizing maps that gains the property of emergence through self-organization. The result of the projection by ESOM is a three-dimensional landscape in the form of a Ustar matrix, which is the combination of a U-Matrix and a P-Matrix. The Ustar matrix displays a representation of the distance and density structures of the input data. Automatic and/or interactive generation of rectangular islands as well as supervised clustering is possible. Currently, we offer a stable beta version as a preview on this web page. The following packages have to be installed (Imports): Rcpp, ggplot2, shiny, ABCanalysis, shinyjs, reshape2, fields, plyr, abind, tcltk, png, tools, grid, rgl
Thrun, M. C., Lerch, F., Lötsch, J., & Ultsch, A.: Visualization and 3D Printing of Multivariate Data of Biomarkers, Proc. of International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, Plzen, 2016.
AdaptGauss
Multimodal distributions can be modelled as a mixture of components. The model is derived using the Pareto Density Estimation (PDE) as an estimate of the probability density function (pdf). PDE has been designed in particular to identify groups/classes in a dataset. Precise limits for the classes can be calculated using Bayes' theorem. The model can be verified with a QQ plot and a chi-squared test.
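The role of Bayes' theorem in finding class limits can be illustrated with a minimal Python sketch. It is not the AdaptGauss implementation: the function names, the bisection search and the model parameters below are assumptions chosen for illustration, and the mixture is not fitted to any data.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of the normal distribution N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posteriors(x, means, sds, weights):
    """Bayes' theorem: posterior probability of each mixture component given x."""
    joint = [w * gaussian_pdf(x, m, s) for m, s, w in zip(means, sds, weights)]
    total = sum(joint)
    return [j / total for j in joint]


def class_limit(means, sds, weights, lo, hi, tol=1e-9):
    """Bisection for the x in [lo, hi] where the first two components
    have equal posterior probability (hypothetical helper)."""
    def diff(x):
        p = posteriors(x, means, sds, weights)
        return p[0] - p[1]

    if diff(lo) * diff(hi) > 0:
        raise ValueError("no sign change of the posterior difference in [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diff(lo) * diff(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Hypothetical two-component model (illustrative values only).
means, sds, weights = [0.0, 4.0], [1.0, 1.0], [0.5, 0.5]
limit = class_limit(means, sds, weights, lo=means[0], hi=means[1])
print(round(limit, 6))
```

With equal weights and equal standard deviations, the limit falls halfway between the two means; unequal weights or variances shift it toward the less probable component.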
Ultsch, A., Thrun, M. C., Hansen-Goos, O., Lötsch, J.: Identification of Molecular Fingerprints in Human Heat Pain Thresholds by Use of an Interactive Mixture Model R Toolbox (AdaptGauss), International Journal of Molecular Sciences, doi:10.3390/ijms161025897, 2015.
Thrun, M. C., Ultsch, A.: Models of Income Distributions for Knowledge Discovery, European Conference on Data Analysis, doi:10.13140/RG.2.1.4463.0244, Colchester, 2015.
Computed ABC Analysis
For a given data set, the package provides a novel method of computing precise limits to acquire subsets which are easily interpreted. Closely related to the Lorenz curve, the ABC curve visualizes the data by graphically representing the cumulative distribution function. Based on an ABC analysis, the algorithm uses the ABC curve to calculate the optimal limits by exploiting the mathematical properties of the distribution of the analyzed items. The data, which must contain positive values, is divided into three disjoint subsets A, B and C: subset A comprises very profitable values, i.e. the largest data values ("the important few"); subset B comprises values where the profit equals the effort required to obtain it; and subset C comprises non-profitable values, i.e. the smallest data values ("the trivial many").
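To make the idea of the ABC curve concrete, the following Python sketch builds the curve as cumulative yield over cumulative effort and picks the curve point closest to the ideal point (0, 1), that is, no effort and full yield. This is an illustration under stated assumptions: the function names and the example values are hypothetical, and the limit computation of the ABCanalysis package itself is not reproduced here.

```python
def abc_curve(values):
    """Return (effort, yield_) lists for the positive values, largest first.

    effort[i] is the fraction of items used, yield_[i] the fraction of the
    total sum they contribute.
    """
    data = sorted((v for v in values if v > 0), reverse=True)
    if not data:
        raise ValueError("ABC curve needs at least one positive value")
    total = sum(data)
    n = len(data)
    effort, yield_ = [0.0], [0.0]
    running = 0.0
    for i, v in enumerate(data, start=1):
        running += v
        effort.append(i / n)
        yield_.append(running / total)
    return effort, yield_


def closest_to_ideal(effort, yield_):
    """Index of the curve point nearest to (0, 1) (hypothetical helper)."""
    return min(range(len(effort)),
               key=lambda i: effort[i] ** 2 + (1.0 - yield_[i]) ** 2)


# Hypothetical data: a few large values and several small ones.
values = [50, 30, 10, 5, 3, 2]
effort, yield_ = abc_curve(values)
idx = closest_to_ideal(effort, yield_)
candidate_a = sorted(values, reverse=True)[:idx]
print(idx, candidate_a)
```

For this example the nearest point is reached after the two largest values, which together account for 80 percent of the total, so they form a natural candidate for subset A.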
Ultsch, A., Lötsch, J.: Computed ABC analysis for rational selection of most informative variables in multivariate data, PLoS One, 2015.
Databionic ESOM Tools
As part of a project group, we developed the Databionics ESOM Tools, a software package for training, visualization and interactive analysis of emergent self-organizing maps (ESOM). The software is available under the GPL. For all further information, please visit the SourceForge Project.
Ultsch, A., Mörchen, F.: ESOM-Maps: tools for clustering, visualization, and classification with Emergent SOM, Technical Report No. 46, Dept. of Mathematics and Computer Science, University of Marburg, Germany, (2005)
Databionic MusicMiner
As part of a project group, we developed the Databionic MusicMiner. It is a program that computes the similarity of pieces of music from their sound and, based on that, displays a music collection as a map. The software is available under the GPL. For all further information, please visit the SourceForge Project.
Mörchen, F., Ultsch, A., Thies, M., Löhken, I., Nöcker, M., Stamm, C., Efthymiou, N., Kümmerer, M.: MusicMiner: Visualizing timbre distances of music as topographical maps, Technical Report No. 47, Dept. of Mathematics and Computer Science, University of Marburg, Germany, (2005)
Time Series Knowledge Mining
Time Series Knowledge Mining (TSKM) is a methodology for finding understandable patterns in multivariate time series. We will (shortly) provide a Matlab implementation under the GPL: Download.
Mörchen, F.: Time Series Knowledge Mining, PhD thesis, Dept. of Mathematics and Computer Science, University of Marburg, Germany, (2006)
Pareto Density Estimation
Pareto Density Estimation (PDE) is an information-optimal estimation of the empirical probability density. PDE has been designed in particular to identify groups/classes in a dataset. It is now part of the R package AdaptGauss.
Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, in Baier, D. & Wernecke, K. D. (Eds.), Innovations in classification, data science, and information systems, Proc. GfKl 2003, pp. 91-100, Springer, Berlin, 2005.
Persist Time Series Discretization
The Persist algorithm discretizes time series into states of optimal duration. In contrast to conventional static histogram methods, it uses the temporal order of the values to optimize the bins. We provide a Matlab implementation under the GPL: Download.
Mörchen, F., Ultsch, A.: Optimizing Time Series Discretization for Knowledge Discovery, in Grossman, R.L., Bayardo, R., Bennet, K., Vaidya, J. (Eds.), Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, (2005), pp. 660-665
Audio Feature Extraction
The analysis of music data is often based on audio features that are computed on short time windows. A well-known example are the Mel Frequency Cepstral Coefficients (MFCC). As part of a project group, we created flexible software for computing a large number of such audio features. We will (shortly) provide a Matlab implementation under the GPL.
Mörchen, F., Ultsch, A., Thies, M., Löhken, I.: Modelling timbre distance with temporal statistics from polyphonic music, IEEE Transactions on Speech and Audio Processing, 14(1), IEEE, pp. 81-90, 2006.
DWT/DFT time series feature extraction
In terms of energy preservation, the best selection of coefficients from the Discrete Wavelet Transform (DWT) or the Discrete Fourier Transform (DFT) of a time series is in descending order of magnitude. For a set of time series, as is typical for clustering or classification, this leads to poorly comparable representations, because different coefficients may be selected for each time series. We therefore proposed a global selection strategy that combines a comparable representation with good energy preservation. We provide a Matlab implementation under the GPL: Download.
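The difference between selecting coefficients per series and selecting them once for the whole set can be sketched in Python with a plain DFT. This is not the Matlab implementation: the function names are hypothetical, and ranking coefficient positions by their energy summed over all series is an assumption made for this illustration.

```python
import cmath
import math


def dft(series):
    """Naive discrete Fourier transform of a real-valued sequence."""
    n = len(series)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(series))
            for k in range(n)]


def local_top_k(coeffs, k):
    """Positions of the k largest-magnitude coefficients of ONE series."""
    ranked = sorted(range(len(coeffs)),
                    key=lambda i: abs(coeffs[i]) ** 2, reverse=True)
    return sorted(ranked[:k])


def global_top_k(all_coeffs, k):
    """Positions of the k coefficients with the largest energy summed over
    ALL series, so every series is represented by the same positions."""
    n = len(all_coeffs[0])
    energy = [sum(abs(c[i]) ** 2 for c in all_coeffs) for i in range(n)]
    ranked = sorted(range(n), key=lambda i: energy[i], reverse=True)
    return sorted(ranked[:k])


# Two hypothetical series of length 8 mixing frequencies 1 and 2.
n = 8
a = [3.0 * math.cos(2 * math.pi * t / n) + 2.0 * math.cos(4 * math.pi * t / n)
     for t in range(n)]
b = [1.0 * math.cos(2 * math.pi * t / n) + 2.5 * math.cos(4 * math.pi * t / n)
     for t in range(n)]
coeffs = [dft(a), dft(b)]

local_a = local_top_k(coeffs[0], 2)   # frequency 1 dominates series a
local_b = local_top_k(coeffs[1], 2)   # frequency 2 dominates series b
shared = global_top_k(coeffs, 2)      # one common choice for both series
print(local_a, local_b, shared)
```

The per-series selections differ, so the resulting feature vectors would not be comparable position by position, whereas the shared selection gives every series the same coefficient positions at the cost of somewhat lower energy preservation for individual series.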
Mörchen, F.: Time series feature extraction for data mining using DWT and DFT, Technical Report No. 33, Dept. of Mathematics and Computer Science, University of Marburg, Germany, (2003)
LaTeX/PDF Reports
This small toolbox makes it possible to create PDF reports with Matlab functions. By appending results in the form of tables and figures, documentation is generated automatically and can be analyzed conveniently later. LaTeX and Ghostscript are required as additional software: Download.