- DataGenerator of Norbert Beckmann
- R*-tree implementations (e.g. Revised R*-tree, R*-tree, Hilbert R-tree, …) of Norbert Beckmann
This page serves as a source for datasets which we have used in our experiments.
- Spatial Data
| No. | Area | Description | Number of MBRs | Zipped size in MB | Coverage | Source | Used for experiments in |
|---|---|---|---|---|---|---|---|
| M1 | L.A. | streets | 131,461 | 1.35 | 0.03 | TIGER | [BKS 93], [BSS 00], [DS 00] |
| M2 | L.A. | rivers and railways | 128,971 | 1.99 | 0.22 | TIGER | [BKS 93], [BSS 00], [DS 00] |
| M3 | California | streets | 1,888,012 | 16.93 | 0.12 | TIGER | [BSS 00], [DS 00] |
| M4 | California | railways | 625,640 | 0.33 | 0.21 | TIGER | |
| M5 | California | borders | 234,251 | 2.82 | | | |
| M6 | California | hydrography | 360,330 | 4.12 | | | |
- The temporal data we used in our experiments with the multi-version B-tree (MVBT) is available here (small size 100'000).
- Datasets for MVBT experiments
| Description | Format description |
|---|---|
| d50 | ASCII file containing a sequence of 10'000'000 blank-separated triples. Each triple has the format: operation type, long key, integer payload. Operation type 1 denotes an insert, 2 an update, and 3 a delete. The first 1'000'000 operations are insertions (10% of the data set). The remaining 90% of the file is a mix of insertions, deletions and updates. The share of each operation is encoded in the file name. For example, the file d50 consists of 1'000'000 insert operations followed by a mix of insertions (4'500'000) and deletions (4'500'000). The file u75 consists of 1'000'000 insert operations followed by a mix of insertions (2'250'000) and updates (6'750'000). Note that the default payload value is 0; please replace it for your purpose. In our experiments we replaced it with a 16-byte payload. Version numbers are generated while reading the file line by line. |
| u0 | |
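The file format described above can be read with a few lines of Python. The following is a minimal sketch, not the code used in the experiments. The names `Operation`, `parse_operations` and `expected_mix` are hypothetical, and it assumes that the version number is simply the running count of operations read so far and that the number in the file name is the percentage of deletions (d) or updates (u) among the operations after the initial insertions.

```python
from typing import Dict, Iterable, Iterator, NamedTuple

# Operation type codes as given in the format description.
INSERT, UPDATE, DELETE = 1, 2, 3


class Operation(NamedTuple):
    version: int   # assumption: running operation counter, starting at 1
    op_type: int   # 1 = insert, 2 = update, 3 = delete
    key: int       # long key
    payload: int   # integer payload (0 by default in the files)


def parse_operations(lines: Iterable[str]) -> Iterator[Operation]:
    """Parse blank-separated triples, one operation per non-empty line."""
    version = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        op_type, key, payload = (int(field) for field in fields)
        if op_type not in (INSERT, UPDATE, DELETE):
            raise ValueError(f"unknown operation type: {op_type}")
        version += 1
        yield Operation(version, op_type, key, payload)


def expected_mix(file_name: str,
                 total: int = 10_000_000,
                 initial_inserts: int = 1_000_000) -> Dict[int, int]:
    """Operation counts implied by a file name such as 'd50' or 'u75'."""
    kind, percent = file_name[0], int(file_name[1:])
    if kind not in ("d", "u"):
        raise ValueError(f"unexpected file name: {file_name}")
    remaining = total - initial_inserts
    other = remaining * percent // 100      # deletions or updates
    counts = {INSERT: initial_inserts + (remaining - other),
              UPDATE: 0,
              DELETE: 0}
    counts[DELETE if kind == "d" else UPDATE] = other
    return counts
```

Passing an open text file to `parse_operations` streams the operations without loading the whole file into memory, and `expected_mix("u75")` reproduces the 2'250'000 insertions and 6'750'000 updates quoted above for the mixed part of the file.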
- Results of Spatial Joins
| Description | Number of results | Format description | Zipped size in MB |
|---|---|---|---|
| M2 & M1 | 85,854 | M2.ID M1.ID (the MBRs of M2 are numbered starting with 10,000,000, the MBRs of M1 with 0) | 0.37 |
| M3 & M3 | 9,784,072 | | |
- Results of k-nearest neighbors
| Description | Format description | Size in MB |
|---|---|---|
| 20-nearest neighbors for each element in M2 | M2.ID M1.ID k 'Euclidean distance' (the center of the MBRs was used for the computation; the MBRs of M2 are numbered starting with 10,000,000, the MBRs of M1 with 0) | 33.79 |
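The distance used for these results is measured between MBR centers. The sketch below illustrates that computation with a brute-force search; it is not the method used to produce the result file. The function names and the tuple layout `(xlow, ylow, xhigh, yhigh)` for an MBR are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

# Assumed MBR layout: (xlow, ylow, xhigh, yhigh).
MBR = Tuple[float, float, float, float]


def center(mbr: MBR) -> Tuple[float, float]:
    """Center point of a minimum bounding rectangle."""
    xlow, ylow, xhigh, yhigh = mbr
    return ((xlow + xhigh) / 2.0, (ylow + yhigh) / 2.0)


def center_distance(a: MBR, b: MBR) -> float:
    """Euclidean distance between the centers of two MBRs."""
    (ax, ay), (bx, by) = center(a), center(b)
    return math.hypot(ax - bx, ay - by)


def k_nearest(query: MBR, candidates: Dict[int, MBR],
              k: int) -> List[Tuple[int, float]]:
    """Brute-force k-nearest neighbors of `query` by center distance.

    Returns (candidate ID, distance) pairs, closest first.
    """
    ranked = sorted(
        ((mbr_id, center_distance(query, mbr))
         for mbr_id, mbr in candidates.items()),
        key=lambda pair: pair[1],
    )
    return ranked[:k]
```

For data sets of the size of M1 and M2 a brute-force scan per query is slow; it is shown here only because it makes the distance definition explicit.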
- Datasets for Sort-based Parallel R-tree loading ([ASSS
| Description | Format description | Size in MB |
|---|---|---|
| USA-data | Contains the minimum bounding rectangles of all streets from the TIGER files, 72 million rectangles in total. The file is in Hadoop sequence file format with <NullWritable, DoublePointRectangle> records. For convenience, we provide a plain data set consisting of rectangles with the following format: <xlow, ylow, xhigh, yhigh>, each coordinate occupying 8 bytes in double floating point format. The file can be obtained from here. | 3338 |
| E-USA-data | Extended USA data set, composed of four copies of USA-data obtained by translating the original data set with the following vectors: (0.0, 0.0), (75.5, -33.9), (0.0, -33.9), (75.5, -3.9). A plain file can be obtained from here (see USA-data for the file formats). | 13414 |
| qr1 | Query point data set obtained by taking the center point of every 100th rectangle from USA-data, consisting of 722,261 points. The files are in plain format, sequential point data with <x,y> coordinates, each coordinate occupying 8 bytes in double floating point format. | 22 |
| qr2 | Query rectangle data with square rectangles where each rectangle returns 100 results on average, consisting of 722,226 rectangles. The file format is the same as for qr1. | 2.2 |
| qr3 | Query rectangle data with square rectangles where each rectangle returns 1000 results on average, consisting of 22,856 rectangles. The file format is the same as for qr1. | 0.7 |
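The plain rectangle format (four 8-byte doubles per rectangle) can be read as in the following sketch. The byte order is not stated on this page; big-endian is assumed here because that is what Java's `DataOutputStream` writes, so change the format string to `"<4d"` if the values look wrong. The helper names are hypothetical, and which rectangle counts as the first of "every 100th" is also an assumption.

```python
import io
import struct
from typing import Iterable, Iterator, Tuple

Rect = Tuple[float, float, float, float]  # (xlow, ylow, xhigh, yhigh)

# Assumption: big-endian doubles; 4 coordinates * 8 bytes = 32 bytes.
RECT = struct.Struct(">4d")


def read_rectangles(stream: io.BufferedIOBase) -> Iterator[Rect]:
    """Yield rectangles from a binary stream of consecutive 32-byte records."""
    while True:
        chunk = stream.read(RECT.size)
        if not chunk:
            return  # clean end of file
        if len(chunk) != RECT.size:
            raise ValueError("file ends with an incomplete rectangle")
        yield RECT.unpack(chunk)


def translate(rect: Rect, dx: float, dy: float) -> Rect:
    """Shift a rectangle by the vector (dx, dy), as done for E-USA-data."""
    xlow, ylow, xhigh, yhigh = rect
    return (xlow + dx, ylow + dy, xhigh + dx, yhigh + dy)


def every_nth_center(rects: Iterable[Rect],
                     n: int = 100) -> Iterator[Tuple[float, float]]:
    """Center points of every n-th rectangle (assumed to start at the first)."""
    for index, (xlow, ylow, xhigh, yhigh) in enumerate(rects):
        if index % n == 0:
            yield ((xlow + xhigh) / 2.0, (ylow + yhigh) / 2.0)
```

A file opened with `open(path, "rb")` can be passed directly to `read_rectangles`, which reads one record at a time and therefore does not need to hold the whole data set in memory.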