Column-Oriented Databases vs RDBMS and Oracle
This is the second part of my article about Column-Store databases. In the first part Column-Oriented Databases – Old Idea, New wave I was focusing on topics like performance and functionality of Column-Oriented Databases and their comparison to RDBMS, specifically to Oracle database. This time I will continue the comparison of two database camps – Column-Stores vs Row-Stores – in areas of compression, partitioning. I’ll mention also the usage of Column-oriented storage benefits in Oracle products, like for example a new Oracle 12c database In-Memory Option.
Compression of columns vs rows
One of the potential advantages of column-oriented storage is the possibility of good compression. It is important to understand why compressing the data can be advantageous. It is not primarily the pure cost of having enough disk space to cover the physical size of the data that matters – disks are relatively cheap and are getting larger and cheaper at a steady rate. Rather, the potential benefit is when data has to be retrieved from disk as part of processing queries. Good I/O bandwidth is not cheap and techniques, such as compression, that reduce the size of the data that is retrieved from storage can be very advantageous, although there is usually some CPU-cost associated with compressing and uncompressing the data.
Oracle for example provides several major mechanisms for utilizing compression to benefit query processing. One is the row-level objects compression feature; another is Exadata Hybrid Columnar Compression – HCC (see below).
Partitioning VERTICAL vs HORIZONTAL
Column-oriented storage is a form of vertical partitioning of the data. One of the disadvantages of this type of partitioning is the cost of joining different partitions to reconstruct a row. This cost usually never highlighted in column database benchmarks. Since, conveniently, the “projections” were created for the queries in a way that obviated the need to join them. In reality, the impact of vertical partitioning on operations like joins, updates, and date loads makes it hard to use since the possible benefits are very workload dependent and it can make some operations in a workload run much slower even if there are others that run faster.
There is another form of partitioning that exists in RDBMS and that lacks the kind of drawbacks associated with vertical partitioning, namely horizontal partitioning. Horizontal partitioning has tremendous benefits for performance, availability, and manageability. Benefits include:
– Partition pruning
– Partition-wise joins
– High availability
– ETL processing
Implementation of horizontal partitioning provides users with a high degree of flexibility due to different options for partitioning criteria.
RDBMS are not pure Row-Stores
I’m certain if columnar data stores are the new wave of the future that become a proven technology, you can be sure Oracle will have it. In fact best columnar stores advantages are already utilized in the RDBMS market leader. Some features examples are below:
Oracle user Hybrid Columnar Compression (HCC) in own Exadata machines. HCC is a compression method for organizing how data is stored on Exadata machines since 11gR2 version. Instead of storing the data in traditional rows, the data is grouped, ordered, stored and compressed one column at a time.
Oracle In-Memory Database
Oracle In-Memory Database is a Row-Store but it uses in-memory columnar compression. It provides the ability to compress database objects at the column level, thus storing the data more efficiently. This mechanism provides space reduction for tables by eliminating the redundant storage of duplicate values within columns and improves the performance of SQL queries that perform full table scans.
Oracle 12c database In-Memory Option
Oracle 12c database In-Memory Option will be released with next PatchSet in June 2014 and will use in-memory cache with columnar structure.
As a conclusion, proponents of column-oriented databases have recently been making claims that would make it seem like this storage organization is far superior to row-oriented storage. In support of such claims, magic benchmarks have been published that, unlike industry standard benchmarks like TPC-H, avoids joins and only measures how fast a materialized view can be scanned. Such a benchmark tells very little about the pros and cons of column-oriented storage in real-world data warehouse environments. In reality, column-based storage and vertical partitioning are old concepts that have both advantages and disadvantages. In the complex world of data warehousing, there may well be scenarios where such storage organizations have some benefits. However, for real-world environments, it is likely that both the scenarios and benefits are limited compared to long-established Oracle features like materialized views, bitmap indexing, horizontal partitioning, built-in OLAP and other valuable features required for production database solutions.