In the cloud computing and Big Data world, we often use the 3Vs (Volume, Velocity, Variety) to measure a data technology’s effectiveness. Traditional relational databases often struggle to scale quickly with large amounts of unstructured data, whether in storage, processing or query performance, once data volumes are measured not in megabytes (MB) or gigabytes (GB) but in terabytes (TB) or petabytes (PB). Analyzing a large influx of social media data or real-time streaming data is a typical example of the challenge. In such scenarios, various NoSQL databases, coupled with cloud-enabled processing technologies (e.g. the Hadoop ecosystem), have entered the arena.
Compared with relational databases, and especially when dealing with unstructured data, NoSQL database technologies are in general less expensive, more scalable and often offer better query performance. Typically these technologies allow data to be stored in more native formats and do not need a schema to be enforced in advance.
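As a rough illustration of storing data without a schema defined in advance, the sketch below uses plain Python dictionaries rather than any particular NoSQL product. The records, field names and the `with_field` helper are hypothetical and exist only for this example.

```python
# Three records in one "collection"; each carries a different set of fields.
# No table definition or schema migration is needed to add "geo" or "hashtags".
records = [
    {"id": 1, "user": "alice", "text": "hello"},
    {"id": 2, "user": "bob", "text": "hi", "geo": {"lat": 47.6, "lon": -122.3}},
    {"id": 3, "user": "carol", "hashtags": ["nosql"]},
]


def with_field(items, field_name):
    """Return only the records that actually contain the given field."""
    return [item for item in items if field_name in item]


# The structure is interpreted at read time: ask only for records that have "geo".
geo_ids = [record["id"] for record in with_field(records, "geo")]
```

The trade-off is that the reading code, not the database, has to cope with missing or differently shaped fields.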
Today’s NoSQL database products can be roughly divided into the following categories:
– Document DB: Data are stored in a “document” structure that consists of many different key-value pairs. The document itself is an object container. MongoDB is one of the leading Document DBs on the market; its data are stored in BSON (Binary JSON) format. Microsoft is now offering a fully managed, JSON-based DocumentDB service on Azure. The document concepts are the same.
– Graph DB: Data are stored in a network structure that can easily be represented by visual graphs, such as social connections. Neo4j and HyperGraphDB are examples of Graph DB offerings. The data in a Graph DB are stored as nodes and the relationships (edges) between them; in Neo4j, nodes can also carry labels. This allows faster queries on relationships between nodes, for example to answer a question about the potential relationship between two seemingly unrelated Twitter IDs, because the query need not perform expensive joins. When building a graph from raw data, most of the relationships in the data model need to be pre-defined, for example in JSON files. Dynamic relationship-crawling APIs can still be a challenge for Graph DBs.
– Simple Key-value DB: Every item in the database is a key-value pair. Redis is one such example. Additional functionality can be added to the pair, such as specifying a type for the value (“string”, “integer”, etc.). Google acquired Firebase in 2014 as a real-time database for developers. It is in fact a JSON-based key-string DB. JSON can be retrieved from client-side code through a RESTful API. Sample uses of Firebase include real-time chat rooms, control notifications, etc.
– Wide-column stores: Examples are Cassandra and HBase. These open-source databases are optimized for storing data across multiple clusters and for fast query performance over large datasets. They are widely used today for analyzing Big Data. HBase is also available in the Microsoft Azure HDInsight service offering.
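To make the four categories above more concrete, here is a minimal sketch that mimics each data model with plain Python structures. It does not use MongoDB, Neo4j, Redis, Cassandra or any other real product's API; all keys, names and the `shortest_path` helper are hypothetical. The graph part shows how a relationship between two IDs can be found by following edges instead of joining tables.

```python
from collections import deque

# Document model: one self-contained, nested record per entity.
document = {
    "_id": "u1",
    "name": "Alice",
    "tags": ["nosql", "cloud"],
    "address": {"city": "Seattle"},
}

# Key-value model: a value is looked up by exactly one key.
key_value = {"session:42": "alice"}

# Wide-column model: row key -> column family -> column -> value.
# Rows are free to carry different columns.
wide_column = {
    "row1": {"profile": {"name": "Alice"}, "metrics": {"logins": 12}},
    "row2": {"profile": {"name": "Bob", "email": "bob@example.com"}},
}

# Graph model: nodes plus directed "follows" relationships (adjacency lists).
follows = {
    "alice": ["bob"],
    "bob": ["carol"],
    "carol": ["dave"],
    "dave": [],
    "erin": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest path as a list, or None."""
    if start == goal:
        return [start]
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for neighbor in graph.get(path[-1], []):
            if neighbor in visited:
                continue
            if neighbor == goal:
                return path + [neighbor]
            visited.add(neighbor)
            queue.append(path + [neighbor])
    return None


# How is "alice" connected to "dave"? Follow edges, no joins required.
connection = shortest_path(follows, "alice", "dave")
```

A real graph database would index and persist these relationships, but the traversal idea is the same: the query walks from node to node along stored edges.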
Because many of these NoSQL data structures are based on JSON or BSON, developers can write object-oriented code against the data objects, which in turn can be easily integrated into other application logic.
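As a small sketch of that idea, the example below maps a JSON document onto a Python object and back using only the standard library. The `User` class, its fields and the sample document are assumptions made for illustration, not the output of any specific database driver.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class User:
    """Hypothetical application object backed by a JSON document."""
    name: str
    tags: list = field(default_factory=list)

    def has_tag(self, tag):
        # Application logic lives on the object, not in the storage layer.
        return tag in self.tags


# A JSON document as it might come back from a document or key-value store.
raw = '{"name": "Alice", "tags": ["nosql", "cloud"]}'

# JSON -> object: the document's keys map directly onto the object's fields.
user = User(**json.loads(raw))

# Object -> JSON: the object can be serialized again for storage.
round_trip = json.dumps(asdict(user), sort_keys=True)
```

Because the stored shape and the in-memory shape are nearly identical, little mapping code is needed compared with translating rows from several relational tables.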