A review of The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases

Posted in reviews by Christopher R. Wirz on Wed Feb 28 2018



As memory becomes less expensive (specifically if a system has many nodes), many databases can fit into RAM. In this case, the index structure performance will drive the performance of the entire database.

Note: This review is in the CS265 format and may reflect my personal opinion.

1. What is the problem?

The authors have developed adaptive radix tree (ART) to efficiently index main-memory. This structure supports insertions and deletions and is space-efficient. It is also sorted such that it supports range scans.

2. Why is it important?

Simplifying and speeding up key comparisons leads to increased performance of the database system. However, increasing speed may increase the size of the data structure. Limiting the size of the tree can be accomplished through the selection of internal data structures. Using adaptive nodes, space can be saved by the indexing data structure.

3. Why is it hard?

Modern CPUs can perform multiple comparisons with a single SIMD instruction, but with a tree, the depth of evaluation is very limited. An alternative is GPUs, which allows for more throughput, but does not have large enough memory capacity and loses communication with the main memory. Most data structure require rebuilding to be updated.

4. Why existing solutions do not work?

Previous in-memory data structures are not efficient on modern hardware, because they take into effect CPU caches (example balanced binary search trees (O(log n) access time)). Faster queries, such as hast tables (O(1) access time), only support point queries (range scans are poorly supported). Also, hash tables are expensive to grow or reorganize (O(n) complexity).

Most databases systems avoid disk access as much as possible (except persistence), but the data structures that previously supported this are not efficient on modern hardware (which is faster). Typically cache-friendly access patterns (like B+-trees) are expensive when performing updates. Also, when performing a scan, the results of comparisons cannot be predicted easily and additional latencies are added after every second comparison.

5. What is the core intuition for the solution?

Because radix trees have ordered keys, ART can adapt the representation of every node by adapting each inner node locally. The global space and access efficiency are then updated as well by these changes. Also, long keys can be collapsed to decrease tree height - as tree height reduces space efficiency. One advantage of Radix trees is that they require no rebalancing in that the same tree will result regardless of data insertion order. Finally, the path to the key is the key with the Radix tree.

6. Does the paper prove its claims?

The authors show that the maximum consumption is 52 bytes per key, with an experimental average of 8.1 bytes per key. ART is compared to other state of the air memory index structure - as well as other search trees.

7. What is the setup of analysis/experiments? is it sufficient?

ART is incorporated into HyPer and ran against the TPC-C benchmark. This is a very real-life series of transactions and data - so it is a very valid test.

Evaluations were performed on a system with an Intel Core i7 3930K CPU (6 cores, 12 threads) 3.2 GHz with 32GB ram. This system represents a commodity configuration - or a reasonable cloud virtual machine - which is a typical use case.

The authors evaluated multiple key structures including a B+-tree, kary, FAST, Generalized Prefix Tree [GPT], red-black tree [RB], and chained hash table [HT].

8. Are there any gaps in the logic/proof?

The authors analyze various tree structures, but discuss the size of the tree in terms of number of nodes, not the size of each node times the number of nodes. For example, the node size of a binary search tree would take up less space than a radix node. Also, it isn't explicitly stated if the radix key is stored on the node - or just that the key is the path to the node.

9. Describe at least one possible next step.

The next step would be to persist the data onto disk such that the total size of the data set exceeds the size of the memory. Depending on the structure of the radix node, the authors will need to strategize a format that supports continuous reading.

BibTex Citation

@inproceedings{leis2013adaptive,
  title={The adaptive radix tree: ARTful indexing for main-memory databases},
  author={Leis, Viktor and Kemper, Alfons and Neumann, Thomas},
  booktitle={Data Engineering (ICDE), 2013 IEEE 29th International Conference on},
  pages={38--49},
  year={2013},
  organization={IEEE}
}
Looking for a job?