Which NoSQL database to choose
Big Data application requirements are usually extensive and may not fit into a single database technology. The list of requirements can be long and very specific. Some general ones are below:
• Handling huge amounts of unstructured data
• A combination of search and analytical queries
• Denormalized data patterns
• Support for high-volume writes and reads that are not always predictable
• High availability
• No downtime deployments
Perhaps that is why NoSQL solutions are so application- and situation-specific: how well a product fits depends on the functionality and features it offers within its category. Usually no single NoSQL database matches all of an application's requirements the way a general-purpose RDBMS often can. Which NoSQL database fits best depends heavily on the application architecture, and that decision is yours to make.
NoSQL databases each have specific strengths and perform best when used for that purpose. You cannot, for example, use a key-value store when you need a graph or document database, whereas relational database systems (RDBMS) are largely interchangeable.
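To illustrate why the categories are not interchangeable, here is a minimal sketch in plain Python. It uses no real database; the records, `kv_get` and `doc_find` are hypothetical names made up for this example. It only shows the difference in access patterns: a key-value store answers "get by key", while a document store can filter on fields inside the data.

```python
# Hypothetical data: the same user records held two ways.

# Key-value view: the store only understands "get by key"; values are opaque.
kv_store = {
    "user:1": b'{"name": "Ann", "city": "Oslo"}',
    "user:2": b'{"name": "Bob", "city": "Rome"}',
}

# Document view: the store understands the fields, so it can filter on them.
doc_store = [
    {"_id": 1, "name": "Ann", "city": "Oslo"},
    {"_id": 2, "name": "Bob", "city": "Rome"},
]


def kv_get(key):
    """Key-value access: a direct lookup by key, nothing else."""
    return kv_store.get(key)


def doc_find(field, value):
    """Document access: filter on any field inside the documents."""
    return [doc for doc in doc_store if doc.get(field) == value]
```

Asking the key-value view for "everyone in Rome" would require reading and decoding every value in the application, which is the kind of mismatch to look for when matching requirements to a category.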
So which NoSQL database should you choose? There is no clear answer to this. Again, it depends heavily on the application architecture and requirements. However, if it is your task to make the decision, I recommend the following simple approach:
1) Gather the app requirements carefully in terms of:
– required data structures (do NOT use NoSQL for structured data),
– data model type and its agility,
– type of data operations,
– type of app workload.
2) Try/evaluate an RDBMS first for your use case. If you have no budget, go for MySQL. If it does not work, go on to step 3.
3) Based on your requirements, choose the proper NoSQL category (read my previous article on that: NoSQL database types comparison in examples).
4) Choose a few candidates from that category.
5) Carefully evaluate the databases' feature sets and applicable use cases, and match them again to your requirements.
6) Contact vendors, read the documentation and test the products yourself.
7) Do not blindly trust NoSQL vendor presentations (it's just sales).
8) Find similar use cases and references from other users.
9) Do not use a particular NoSQL engine only because it is bundled with a development framework.
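One way to make steps 1 and 5 concrete is a simple weighted scoring matrix. The sketch below is only an illustration: the function name `score_candidates`, the requirement names, the weights and the ratings are all made up for this example and are not benchmark results for any real product.

```python
def score_candidates(weights, ratings):
    """Return (name, weighted score) pairs, best first.

    weights: requirement -> importance (any positive number)
    ratings: candidate -> {requirement -> rating from 0 to 5}
    """
    total_weight = sum(weights.values())
    scores = {}
    for name, rating in ratings.items():
        # A requirement the candidate was not rated on counts as 0.
        weighted = sum(w * rating.get(req, 0) for req, w in weights.items())
        scores[name] = round(weighted / total_weight, 2)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical numbers for illustration only.
weights = {"write_throughput": 3, "flexible_queries": 2, "multi_dc": 1}
ratings = {
    "candidate_a": {"write_throughput": 5, "flexible_queries": 2, "multi_dc": 5},
    "candidate_b": {"write_throughput": 2, "flexible_queries": 5, "multi_dc": 3},
}
ranking = score_candidates(weights, ratings)
```

The ratings should come from your own tests and references (steps 6 to 8), not from vendor material. The score is a way to structure the discussion, not a substitute for testing with your own workload.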
To help you with your choice, I share a few of my notes on the most popular NoSQL databases below.
MongoDB is certainly one of the leaders among NoSQL document databases. You can use the Community Edition or get commercial support with the Enterprise Edition.
Nowadays MongoDB is a popular part of development stacks built around Node.js and others. It is used as a general data store, even for structured data (which is wrong). It is simple to start with and use in development, but that is exactly the catch: do not assume it was made for all of your application's data and use cases.
For all its popularity, MongoDB has some drawbacks: instability, poor write performance on big data volumes, and the lack of a robust high-availability solution. Below are some pros and cons:
Pros:
• Rapid software development thanks to the document-oriented, schema-less design
• Rich API support for application development
• Comprehensive set of features (support for secondary indexes, etc.)
• Flexible querying and Aggregation Framework
• Both free open-source and enterprise editions are available
Cons:
• No multi-document ACID transactions in versions before 4.0 (no rollbacks, possibility of inconsistent reads)
• Concurrent workloads can be a problem due to locking, the lack of an enforced schema, and eventual consistency
• Poor write performance on big data volumes that do not fit into memory
• Scalability and availability can be limited compared to Cassandra
See more about NoSQL MongoDB
Oracle NoSQL database
Oracle NoSQL Database is a key-value store built on Oracle Berkeley DB, with a large set of infrastructure features on top of it. If you are an Oracle customer, it is worth looking into, mainly because of its tight integration with Oracle Database and Hadoop. You can even get the Oracle Big Data Appliance, bundled with everything you need for high-performance Big Data processing. Some advantages of Oracle NoSQL Database are listed below:
• Simple data model using key-value pairs with major and sub-keys
• Efficient programming model with ACID transactions and JSON support
• Good integration with Oracle Database and Hadoop
• Data distribution with support for multiple data centers
• High availability option including remote failover and synchronization
• Scalable throughput and possibility of adding capacity dynamically
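The major and sub-key model from the first bullet can be pictured with a toy in-memory store. This is a sketch in plain Python, not the Oracle NoSQL Database API: the class `MajorMinorStore` and its methods are hypothetical names made up for illustration. The idea it shows is that records sharing a major key are grouped together, so they can be fetched as a unit, while the sub-key (minor key) selects one record within that group.

```python
from collections import defaultdict


class MajorMinorStore:
    """Toy in-memory store keyed by (major path, minor path)."""

    def __init__(self):
        # major key -> {minor key -> value}
        self._data = defaultdict(dict)

    def put(self, major, minor, value):
        self._data[tuple(major)][tuple(minor)] = value

    def get(self, major, minor):
        """Return one record, or None if it does not exist."""
        return self._data.get(tuple(major), {}).get(tuple(minor))

    def multi_get(self, major):
        """Return every minor-key record stored under one major key."""
        return dict(self._data.get(tuple(major), {}))


store = MajorMinorStore()
store.put(["user", "42"], ["profile"], {"name": "Ann"})
store.put(["user", "42"], ["settings"], {"lang": "en"})
store.put(["user", "43"], ["profile"], {"name": "Bob"})
```

Here `store.multi_get(["user", "42"])` returns both the profile and the settings records for that user in one call, which is the access pattern the major/sub-key split is designed for.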
See more about Oracle NoSQL database
Couchbase is a commercial product that was originally based on the CouchDB NoSQL document store, with some extras on top plus vendor support.
• Couchbase is open source and includes an in-memory option
• Documents are stored as key values in a binary form
• ANSI98 SQL support based on JSON docs (not on a table level)
• Read Consistency only for a single document
• Data is compressed on storage
• Aggregations based on index service
• Data is loaded in memory fully or partly
See more about NoSQL Couchbase database
Cassandra would be a good candidate to evaluate from the column store family. It has a Community Edition plus an enterprise edition offered by DataStax.
Below are some pros and cons:
Pros:
• Highly distributed and scalable database
• Supports heavy write operations and reasonable read operations (should be faster than MongoDB, especially when writing large data volumes)
• Strong and friendly partitioning implementation
• Close integration with Hadoop (an HDFS-compatible file system implementation lets Hadoop read directly from Cassandra keyspaces rather than copying data around)
• Includes reliable remote data center replication
• Enterprise support for the packaged DSE offering via DataStax
Cons:
• The feature set is less comprehensive than MongoDB's (but Cassandra is quickly catching up; what matters is finding the features your application requires)
• Maintenance is rather complex, especially in clustered environments
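The partitioning point above can be sketched as a token ring: each row's partition key is hashed to a token, and the node responsible for that part of the ring owns the row. The code below is a simplified illustration, not Cassandra's implementation. The class `TokenRing` and the node names are made up, each node gets a single token, and MD5 is used as a stand-in hash (Cassandra's default partitioner is based on Murmur3).

```python
import bisect
import hashlib


class TokenRing:
    """Toy token ring: maps a partition key to the node that owns it."""

    def __init__(self, nodes):
        # Assumption for this sketch: one token per node, derived from its name.
        self._ring = sorted((self._token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    @staticmethod
    def _token(value):
        # MD5 is only a stand-in hash for the illustration.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def owner(self, partition_key):
        """Return the node whose token is the first at or after the key's token."""
        idx = bisect.bisect_left(self._tokens, self._token(partition_key))
        # Wrap around to the first node when the key hashes past the last token.
        return self._ring[idx % len(self._ring)][1]


ring = TokenRing(["node-a", "node-b", "node-c"])
owner = ring.owner("user:42")
```

Because placement depends only on the hash of the partition key, any node can work out where a row lives without a central lookup. This is also why the choice of partition key matters so much for spreading write load evenly.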
See more about NoSQL Cassandra database
That was a short review with some recommendations on how to choose the best NoSQL database for your use case. I will update this post with more tips later. In the meantime, you can read another post of mine: NoSQL database types comparison in examples