A review of The TileDB Array Data Storage Manager

Posted in reviews by Christopher R. Wirz on Fri Mar 02 2018

TileDB is a storage manager for multi-dimensional arrays - suitable for scientific applications.

Note: This review is in the format of the CS265 Discussion format and may reflect my personal opinion.

1. What is the problem?

When data is multi-dimensional, there are several options for storing it: columns of rows, rows of columns, columns of columns, or rows of rows. Multi-dimensional data is also often sparse, so empty cells can end up as wasted space. While solutions exist to mitigate this, the DBMS must still perform with reasonable read rates.
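
To make these layout options concrete, here is a minimal sketch - not TileDB code, and with hypothetical helper names - of how a 2D cell maps to a linear offset under row-major versus column-major cell order:

```c
#include <stdio.h>

/* Hypothetical helpers (not TileDB API): map a 2D cell (r, c) in an
 * n_rows x n_cols array to a linear offset under each cell order. */
static size_t row_major(size_t r, size_t c, size_t n_rows, size_t n_cols) {
    (void)n_rows;              /* unused in row-major order */
    return r * n_cols + c;     /* rows of columns: each row is contiguous */
}

static size_t col_major(size_t r, size_t c, size_t n_rows, size_t n_cols) {
    (void)n_cols;              /* unused in column-major order */
    return c * n_rows + r;     /* columns of rows: each column is contiguous */
}

int main(void) {
    /* The same cell (1, 2) in a 4x3 array lands at different offsets. */
    printf("row-major:    %zu\n", row_major(1, 2, 4, 3)); /* prints 5 */
    printf("column-major: %zu\n", col_major(1, 2, 4, 3)); /* prints 9 */
    return 0;
}
```

Whichever order is chosen, cells that are adjacent in one order are scattered in the other - which is why the choice of cell order matters for read rates.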

2. Why is it important?

Most modern programming languages offer arbitrary-length multi-dimensional arrays as a data structure - but not all DBMS solutions can persist these objects, whether due to dimensionality or base type. Array storage is important to scientific fields such as astronomy and genomics, and to other applications where data is large and sparse.

3. Why is it hard?

Arrays can have arbitrary dimensionality, which means the schema is arbitrary. The values are written linearly in a global cell order, and cells that are accessed together must be co-located on disk or in memory to minimize seeks and misses - otherwise read times increase. Mixing sparse and dense blocks also makes a single common compression scheme ineffective.
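
Because the dimensionality is arbitrary, the addressing arithmetic cannot be a fixed formula. A generic N-dimensional row-major linearization - sketched below with hypothetical names, my own illustration rather than anything from the paper - has to fold over the dimensions:

```c
#include <stdio.h>

/* Hypothetical helper (not TileDB API): linearize an N-dimensional
 * coordinate into a row-major global cell offset. `dims` holds the
 * extent of each dimension; `coords` holds the cell's coordinates. */
static size_t linearize(const size_t *coords, const size_t *dims, int ndims) {
    size_t offset = 0;
    for (int d = 0; d < ndims; d++)
        offset = offset * dims[d] + coords[d]; /* Horner's rule over dims */
    return offset;
}

int main(void) {
    size_t dims[3]   = {4, 5, 6};  /* a 4x5x6 array */
    size_t coords[3] = {1, 2, 3};
    /* ((1 * 5) + 2) * 6 + 3 = 45 */
    printf("offset = %zu\n", linearize(coords, dims, 3));
    return 0;
}
```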

4. Why existing solutions do not work?

TileDB is optimized for both sparse and dense arrays. Array storage managers like HDF5 group elements into chunks, which are stored in large binary files. HDF5 stores dense regions of sparse arrays as separate dense arrays, which take time to identify. Also, HDF5 performs in-place writes, which do not work well for randomly placed small blocks. Finally, HDF5 cannot perform concurrent writes on compressed data.
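
For reference, this is roughly what HDF5 chunking looks like from its C API - a minimal sketch that creates a chunked 2D integer dataset (error handling omitted; the 100x100 chunk shape is an arbitrary choice of mine, not from the paper):

```c
#include "hdf5.h"

int main(void) {
    hsize_t dims[2]  = {1000, 1000};  /* full dataset extent       */
    hsize_t chunk[2] = {100, 100};    /* arbitrary 100x100 chunks  */

    /* Create a file, a 2D dataspace, and a chunked dataset layout. */
    hid_t file  = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(2, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk);     /* elements are grouped into chunks */

    hid_t dset = H5Dcreate2(file, "/A", H5T_NATIVE_INT, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    /* Any update, however small, is materialized chunk-by-chunk in
     * place - which is why scattered small writes are costly. */

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}
```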

SciDB is another popular array database, but its storage management is limited. It was not designed for sparse arrays, and it uses chunks (like HDF5) that must be updated all at once - even when a single element changes. SciDB has tried to overcome this by grouping under-utilized chunks into bigger chunks, but the original chunks remain the unit of processing.

5. What is the core intuition for the solution?

TileDB organizes array elements into ordered collections called fragments (which contain tiles). This organization leads to sequential writes as well as efficient reads. Dense fragments group their elements into regular tiles, while sparse fragments explicitly store non-empty cells with their coordinates in global order - packed into tiles of fixed capacity. Fragments improve the write performance of batches, which operate on one fragment at a time in a predictable order. The authors also add a consolidation algorithm for compacting multiple fragments into one. For each sparse data tile, TileDB stores the minimum bounding rectangle (MBR) and the bounding coordinates (the first and last coordinates) of the tile.
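
As an illustration of the per-tile metadata, here is a minimal sketch - my own code, not TileDB's - that computes a 2D sparse tile's MBR and its bounding coordinates from cells already sorted in the global cell order:

```c
#include <stdio.h>

/* A 2D cell coordinate. */
typedef struct { long row, col; } coord2d;

/* Hypothetical sketch (not TileDB code): given a sparse tile's cells
 * sorted in global order, compute the tile's MBR (per-dimension
 * min/max) and its bounding coordinates (first and last cells). */
static void tile_metadata(const coord2d *cells, size_t n,
                          coord2d *mbr_lo, coord2d *mbr_hi,
                          coord2d *first, coord2d *last) {
    *mbr_lo = *mbr_hi = cells[0];
    for (size_t i = 1; i < n; i++) {
        if (cells[i].row < mbr_lo->row) mbr_lo->row = cells[i].row;
        if (cells[i].col < mbr_lo->col) mbr_lo->col = cells[i].col;
        if (cells[i].row > mbr_hi->row) mbr_hi->row = cells[i].row;
        if (cells[i].col > mbr_hi->col) mbr_hi->col = cells[i].col;
    }
    *first = cells[0];      /* bounding coordinates: first and last */
    *last  = cells[n - 1];  /* cells of the tile in global order    */
}

int main(void) {
    coord2d cells[] = {{0, 3}, {1, 1}, {2, 4}};  /* sorted in global order */
    coord2d lo, hi, first, last;
    tile_metadata(cells, 3, &lo, &hi, &first, &last);
    printf("MBR rows [%ld,%ld], cols [%ld,%ld]; first (%ld,%ld), last (%ld,%ld)\n",
           lo.row, hi.row, lo.col, hi.col,
           first.row, first.col, last.row, last.col);
    return 0;
}
```

A read query can then skip any data tile whose MBR does not intersect the query subarray.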

6. Does the paper prove its claims?

The authors show that TileDB achieves faster reads and writes than SciDB and Vertica, and faster writes than HDF5. The authors provide good experimental controls, including flushing all caches before each experiment. For a fairer comparison, TileDB would need its own cache rather than relying on the OS cache, but that reliance was a deliberate design decision.

7. What is the setup of analysis/experiments? is it sufficient?

The TileDB API mirrors that of HDF5, so for experimentation it could simply be dropped in place. This was an excellent control and speaks well of TileDB's design. HDF5 was compared on dense arrays and parallel operations, while SciDB and Vertica were compared on both dense and sparse arrays. The test system was a 2.3GHz 36-core CPU with 128GB of RAM and both SSD and HDD storage. This is about what one would expect of a scientific computing workstation.

8. Are there any gaps in the logic/proof?

The described experiments use only 2D synthetic datasets of integers. A more convincing experiment would use higher-dimensional, real-world data. Also, TileDB should be run in modes that correspond more directly to its competitors' configurations.

Most DBAs run an OPTIMIZE TABLE command periodically to clean up OLAP databases, and research data would most likely reside in an OLAP system. While the authors allow for DBA-initiated consolidation, they do not discuss whether any file locking occurs or how the freed space is reclaimed.
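
Conceptually, consolidation resembles the merge step of an LSM-tree compaction: cells from sorted fragments are merged in global order, with the newer fragment winning on coordinate collisions. A hedged sketch - my own simplification to 1D coordinates and two fragments, not the paper's algorithm:

```c
#include <stdio.h>

typedef struct { long coord; int value; } cell;

/* Hypothetical sketch (not TileDB's implementation): merge an older
 * and a newer fragment, both sorted by coordinate, into one
 * consolidated fragment. On a collision, the newer cell wins. */
static size_t consolidate(const cell *old_f, size_t n_old,
                          const cell *new_f, size_t n_new,
                          cell *out) {
    size_t i = 0, j = 0, k = 0;
    while (i < n_old && j < n_new) {
        if (old_f[i].coord < new_f[j].coord)      out[k++] = old_f[i++];
        else if (new_f[j].coord < old_f[i].coord) out[k++] = new_f[j++];
        else { out[k++] = new_f[j++]; i++; }  /* newer value overrides */
    }
    while (i < n_old) out[k++] = old_f[i++];
    while (j < n_new) out[k++] = new_f[j++];
    return k;  /* number of cells in the consolidated fragment */
}

int main(void) {
    cell old_f[] = {{1, 10}, {4, 40}, {7, 70}};
    cell new_f[] = {{4, 41}, {9, 90}};
    cell out[5];
    size_t n = consolidate(old_f, 3, new_f, 2, out);
    for (size_t k = 0; k < n; k++)
        printf("(%ld -> %d) ", out[k].coord, out[k].value);
    printf("\n");  /* (1 -> 10) (4 -> 41) (7 -> 70) (9 -> 90) */
    return 0;
}
```

The open questions above - locking during the merge, and reclaiming the space of the discarded fragments - sit exactly at the boundaries of this sketch.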

Since most of the data discussed by the authors is inherently wide, a column-major cell order does not make much sense. While column-major examples are not discussed, it seems that this is still a storage option.

9. Describe at least one possible next step.

The authors discuss scalability with large datasets, but not datasets larger than the storage of a single node. Currently, they only consider multiple instances within a node for parallel processing. A clustered, multi-node solution would be a good next step.

The authors should also consider types other than integers. Eventually, they should attempt to store object types, as many languages now allow collections of collections of objects.

BibTeX Citation

@article{papadopoulos2016tiledb,
  title={The TileDB array data storage manager},
  author={Papadopoulos, Stavros and Datta, Kushal and Madden, Samuel and Mattson, Timothy},
  journal={Proceedings of the VLDB Endowment},
  volume={10},
  number={4},
  pages={349--360},
  year={2016},
  publisher={VLDB Endowment}
}