This is the second part of my article about Column-Store databases. In the first part Column-Oriented Databases – Old Idea, New wave I was focusing on topics like performance and functionality of Column-Oriented Databases and their comparison to RDBMS, specifically to Oracle database. This time I will continue the comparison of two database camps – Column-Stores vs Row-Stores – in areas of compression, partitioning. I’ll mention also the usage of Column-oriented storage benefits in Oracle products, like for example a new Oracle 12c database In-Memory Option.
Compression of columns vs rows
One of the potential advantages of column-oriented storage is the possibility of good compression. It is important to understand why compressing the data can be advantageous. It is not primarily the pure cost of having enough disk space to cover the physical size of the data that matters – disks are relatively cheap and are getting larger and cheaper at a steady rate. Rather, the potential benefit is when data has to be retrieved from disk as part of processing queries. Good I/O bandwidth is not cheap and techniques, such as compression, that reduce the size of the data that is retrieved from storage can be very advantageous, although there is usually some CPU-cost associated with compressing and uncompressing the data.
Oracle for example provides several major mechanisms for utilizing compression to benefit query processing. One is the row-level objects compression feature; another is Exadata Hybrid Columnar Compression – HCC (see below).
Partitioning VERTICAL vs HORIZONTAL
Column-oriented storage is a form of vertical partitioning of the data. One of the disadvantages of this type of partitioning is Read more »
During the last few years I went through several POC of different Column-Store databases reviewing their functionality, performance and use cases. Usually at the beginning of every exercise I saw the impressive vendor promises of reach functionality, great performance and scalability. Some even said: this is a new trend in database world, even a standard! You do not need RDBMS anymore!
In this type of cases I usually act as conservative database architect. And you know what – that always helped eliminating additional companies’ efforts and frustrations in implementing specialized database solutions. This time I share some experiences in evaluating Column-Store databases. But let start with basics first.
While most commercial RDBMS products store data in some form of row format, some database vendors provide column-oriented storage of data. The supposed advantages of storing the data by column rather than by row include a better ability to compress the data, something that would reduce the need for disk-I/O. The idea of column-based storage is not new and has been used in commercial products from former Sybase and Sand Technology for well over a decade. In reality, each storage format has its own set of advantages and disadvantages and there is no free lunch – only tradeoffs.
The tradeoffs associated with column-based storage include the cost of tracking and eventual reconstruction of the rows to which the column values belong as well as additional complexity for ETL and OLTP processing. While recognizing that each storage format has its pros and cons and that there are scenarios where a column-based format has some merit, it is worth examining whether the column-based format lives up to its recent hype.
Beware of disingenuous benchmark numbers
Yes folks – PERFORMANCE is the main sales factor of the columnar databases!
There are claims that Column-Stores outperform a commercial row-store RDBMS by large factors. I just want to warn you to not rely blindly on magic performance benchmarks the vendors have done, in house themselves. Usually these performance test cases are not similar to the real production database loads, created often for read-only data using database engines that lacks RDBMS features and functionality that would be required in a production system.
A second observation is that the often benchmarks against Column-Stores do not test joins. Read more »