Other Databases

PostgreSQL is a powerful, open source object-relational database system. It has more than 15 years of active development and a proven architecture that has earned it a strong reputation for reliability, data integrity, and correctness. It runs on all major operating systems, including Linux, UNIX (AIX, BSD, HP-UX, SGI IRIX, Mac OS X, Solaris, Tru64), and Windows. It is fully ACID compliant and has full support for foreign keys, joins, views, triggers, and stored procedures (in multiple languages). It includes most SQL:2008 data types, including INTEGER, NUMERIC, BOOLEAN, CHAR, VARCHAR, DATE, INTERVAL, and TIMESTAMP. It also supports storage of binary large objects, including pictures, sounds, or video. It has native programming interfaces for C/C++, Java, .NET, Perl, Python, Ruby, Tcl, and ODBC, among others, and exceptional documentation.

An enterprise-class database, PostgreSQL boasts sophisticated features such as Multi-Version Concurrency Control (MVCC), point-in-time recovery, tablespaces, asynchronous replication, nested transactions (savepoints), online/hot backups, a sophisticated query planner/optimizer, and write-ahead logging for fault tolerance. It supports international character sets, multibyte character encodings, and Unicode, and it is locale-aware for sorting, case sensitivity, and formatting. It is highly scalable both in the sheer quantity of data it can manage and in the number of concurrent users it can accommodate. There are active PostgreSQL systems in production environments that manage in excess of 4 terabytes of data.
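The nested transactions (savepoints) listed above can be illustrated with a minimal sketch. As an assumption made for portability, it uses Python's built-in sqlite3 module as a stand-in, since SQLite also accepts the SAVEPOINT, ROLLBACK TO SAVEPOINT, and RELEASE SAVEPOINT statements; the table name, savepoint name, and data are hypothetical. With PostgreSQL, the same kind of statements would be sent through a database driver instead.

```python
import sqlite3

def demo_savepoint():
    # isolation_level=None puts the connection in autocommit mode, so the
    # BEGIN / SAVEPOINT / COMMIT statements below are issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")

    conn.execute("BEGIN")
    conn.execute("INSERT INTO accounts VALUES ('alice', 100)")

    # Start a nested unit of work inside the outer transaction.
    conn.execute("SAVEPOINT risky_update")
    conn.execute("UPDATE accounts SET balance = balance - 500 WHERE name = 'alice'")

    # Undo only the work done since the savepoint; the INSERT survives.
    conn.execute("ROLLBACK TO SAVEPOINT risky_update")
    conn.execute("RELEASE SAVEPOINT risky_update")
    conn.execute("COMMIT")

    result = conn.execute("SELECT name, balance FROM accounts").fetchall()
    conn.close()
    return result

account_rows = demo_savepoint()
```

The outer transaction still commits the inserted row, while the update made after the savepoint is discarded.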

Choosing between MySQL and PostgreSQL is a decision many must make when approaching open-source relational database management systems. Both are time-proven solutions that compete strongly with proprietary database software. MySQL has long been assumed to be the faster but less full-featured of the two database systems, while PostgreSQL was assumed to be a more densely featured database system often described as an open-source version of Oracle. MySQL has been popular among various software projects because of its speed and ease of use, while PostgreSQL has had a close following from developers who come from an Oracle or SQL Server background.

These assumptions, however, are mostly outdated and incorrect. MySQL has come a long way in adding advanced functionality, while PostgreSQL has dramatically improved its speed over its last few major releases. Many, however, are unaware of the convergence and still hold on to stereotypes based on MySQL 4.1 and PostgreSQL 7.4. The current versions are MySQL 5.6 and PostgreSQL 9.1.

Microsoft SQL Server

Microsoft SQL Server is a relational database management system (RDBMS). Its primary query language is Transact-SQL, an implementation of the ANSI/ISO standard Structured Query Language (SQL) used by Microsoft and Sybase.

The architecture of Microsoft SQL Server is broadly divided into three components: SQLOS, which implements the basic services required by SQL Server, including thread scheduling, memory management, and I/O management; the Relational Engine, which implements the relational database components, including support for databases, tables, queries, and stored procedures, as well as the type system; and the Protocol Layer, which exposes the SQL Server functionality.

Microsoft SQL Server’s tight integration with Windows Server, automated self-tuning and management tools, and the wide availability of developers and compatible business applications can help small to medium-sized businesses achieve a positive ROI (Return-on-Investment).

NoSQL Databases

The general definition of a NoSQL data store is that it manages data that is not strictly tabular and relational, so it does not make sense to use SQL for the creation and retrieval of the data. NoSQL data stores are usually non-relational, distributed, open-source, and horizontally scalable.

The NoSQL (or “not only SQL”) movement is defined by a simple premise: Use the solution that best suits the problem and objectives.

If the data structure is more appropriately accessed through key-value pairs, then the best solution is likely a dedicated key-value database.

If the objective is to quickly find connections within data containing objects and relationships, then the best solution is a graph database that can get results without any need for translation (O/R mapping).

Although the original emergence of NoSQL data stores was motivated by web-scale data, the movement has grown to encompass a wide variety of data stores that just happen to not use SQL as their processing language. There is no general agreement on the taxonomy of NoSQL data stores, but the categories below capture much of the landscape.

Tabular / Columnar Data Stores

Storing sparse tabular data, these stores look most like traditional tabular databases. Their primary data retrieval paradigm utilizes column filters, generally leveraging hand-coded map-reduce algorithms.
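The retrieval paradigm described above can be sketched in a few lines. This is an in-memory illustration only, not the API of any of the stores named in this section; the row keys, column names, and function names are hypothetical. Each row holds only the columns it actually has (sparse data), the map step applies a column filter, and the reduce step aggregates per key.

```python
from collections import defaultdict

# Sparse rows: each row key maps only to the columns it actually stores.
sparse_rows = {
    "page:1": {"lang": "en", "views": 120},
    "page:2": {"lang": "de"},                 # no "views" column stored
    "page:3": {"lang": "en", "views": 30},
    "page:4": {"views": 7},                   # no "lang" column stored
}

def map_phase(row_key, columns):
    # Column filter: emit a pair only for rows that hold both columns.
    if "lang" in columns and "views" in columns:
        yield columns["lang"], columns["views"]

def reduce_phase(key, values):
    # Combine all values emitted for one key into a single result.
    return key, sum(values)

def map_reduce(table):
    grouped = defaultdict(list)
    for row_key, columns in table.items():
        for key, value in map_phase(row_key, columns):
            grouped[key].append(value)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

views_by_lang = map_reduce(sparse_rows)
```

In a distributed store the map and reduce phases would run in parallel across many machines; here they run in a single process to keep the idea visible.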

BigTable is a compressed, high-performance, proprietary database system built on Google File System (GFS), Chubby Lock Service, and a few other Google programs.

HBase is an open source, non-relational, distributed database modeled after Google’s BigTable and written in Java. It runs on top of HDFS, providing a fault-tolerant way of storing large quantities of sparse data.

Hypertable is an open source database inspired by publications on the design of Google’s BigTable. Hypertable runs on top of a distributed file system such as the Apache Hadoop DFS, GlusterFS, or the Kosmos File System (KFS). It is written almost entirely in C++ for performance.

VoltDB is an in-memory, ACID-compliant RDBMS that uses a shared-nothing architecture. It is based on the academic H-Store project and supports SQL access from within pre-compiled Java stored procedures.

Google Fusion Tables is a free service for sharing and visualizing data online. It allows you to upload and share data, merge data from multiple tables into interesting derived tables, and see the most up-to-date data from all sources.

Document Stores

These NoSQL data sources store unstructured (e.g., text) or semi-structured (e.g., XML) documents. Their data retrieval paradigm varies widely, but documents can always be retrieved by a unique handle. XML data sources leverage XQuery. Text documents are indexed, facilitating keyword-search-style retrieval.
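The two retrieval paths described above, lookup by unique handle and keyword search over an index, can be sketched as follows. This is a toy in-memory illustration, not the API of any real document store; the class name, method names, and sample documents are hypothetical.

```python
import re
import uuid
from collections import defaultdict

class TinyDocumentStore:
    """In-memory illustration only; not the API of any real document store."""

    def __init__(self):
        self._docs = {}                    # handle -> document text
        self._index = defaultdict(set)     # keyword -> set of handles

    def put(self, text):
        handle = uuid.uuid4().hex          # unique handle for the document
        self._docs[handle] = text
        # Index every distinct lowercase word so it can be searched later.
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            self._index[word].add(handle)
        return handle

    def get(self, handle):
        # Retrieval by unique handle.
        return self._docs[handle]

    def search(self, keyword):
        # Keyword retrieval: returns the handles of matching documents.
        return set(self._index.get(keyword.lower(), set()))

store = TinyDocumentStore()
h1 = store.put("CouchDB is written in Erlang")
h2 = store.put("MongoDB is written in C++")
```

Here `store.get(h1)` returns the first document by its handle, while `store.search("written")` returns the handles of both documents.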

Apache CouchDB, commonly referred to as CouchDB, is an open source document-oriented database written in the Erlang programming language. It is designed for local replication and to scale horizontally across a wide range of devices.

MongoDB is an open source, scalable, high-performance, schema-free, document-oriented database written in the C++ programming language.

Terrastore is a distributed, scalable, and consistent document store supporting single-cluster and multi-cluster deployments. It provides advanced scalability and elasticity features without sacrificing consistency at the data level.

Graph Databases

These NoSQL sources store graph-oriented data with nodes, edges, and properties and are commonly used to store associations in social networks.
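A minimal sketch of this data model, assuming a tiny in-memory social graph: nodes and edges both carry properties, and a breadth-first search finds how two people are connected. It is not the API of any graph database named here; all node names, property names, and function names are hypothetical.

```python
from collections import deque

# Nodes and edges both carry properties.
nodes = {
    "ann": {"city": "Oslo"},
    "bob": {"city": "Lima"},
    "cat": {"city": "Oslo"},
    "dan": {"city": "Pune"},
}
edges = [
    ("ann", "bob", {"type": "follows"}),
    ("bob", "cat", {"type": "follows"}),
    ("cat", "dan", {"type": "follows"}),
]

def neighbors(node, edge_type):
    # Follow only outgoing edges whose "type" property matches.
    return [dst for src, dst, props in edges
            if src == node and props["type"] == edge_type]

def shortest_path(start, goal, edge_type="follows"):
    """Breadth-first search along edges of one type; returns a node list or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors(path[-1], edge_type):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

Because relationships are stored directly as edges, the traversal walks from node to node without joining tables or mapping rows to objects.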

Neo4j is an open-source graph database implemented in Java. It is described as an “embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs.”

AllegroGraph is a graph database. It considers each stored item to have any number of relationships. These relationships can be viewed as links, which together form a network, or graph.

FlockDB is an open source, distributed, fault-tolerant graph database for managing data at web scale. It was initially used by Twitter to build its database of users and manage their relationships to one another. It scales horizontally and is designed for online, low-latency, high-throughput environments such as websites.

VertexDB is a high-performance graph database server that supports automatic garbage collection. It uses HTTP for requests and JSON as its response data format. Its API is inspired by the FUSE file system API, plus a few extra methods for queries and queues.

Key/Value Stores

These sources store simple key/value pairs, like a traditional hash table. Their data retrieval paradigm is simple: given a key, return the value.

Dynamo is a highly available, proprietary key-value structured storage system. It has properties of both databases and distributed hash tables (DHTs). It is not directly exposed as a web service, but it is used to power parts of other Amazon Web Services.

Memcached is a general-purpose distributed memory caching system. It is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times an external data source must be read.
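The caching pattern described above can be sketched as follows. As a loudly stated assumption, a plain Python dict stands in for the RAM cache, so this is not the Memcached client API; the class name, the counter, and the `slow_lookup` function are hypothetical.

```python
class CacheAside:
    """Read-through helper; a dict stands in for the RAM cache."""

    def __init__(self, load_from_source):
        self._load = load_from_source   # slow lookup, e.g. a database query
        self._cache = {}
        self.source_reads = 0           # counts trips to the external source

    def get(self, key):
        if key in self._cache:          # cache hit: no external read
            return self._cache[key]
        self.source_reads += 1          # cache miss: read once, then remember
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        # Call after the underlying data changes so stale values are not served.
        self._cache.pop(key, None)

def slow_lookup(user_id):
    # Hypothetical stand-in for a database read.
    return {"id": user_id, "name": f"user-{user_id}"}

users = CacheAside(slow_lookup)
first = users.get(42)    # miss: reads the external source
second = users.get(42)   # hit: served from the cache
```

After the two calls, `users.source_reads` is 1: the second request never reaches the external source, which is the saving the paragraph above describes.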

Cassandra is an open source distributed database management system. It is designed to handle very large amounts of data spread out across many commodity servers while providing a highly available service with no single point of failure. It is a NoSQL solution that was initially developed by Facebook and powers their Inbox Search feature.

Amazon SimpleDB is a distributed database written in Erlang by Amazon.com. It is used as a web service in concert with EC2 and S3 and is part of Amazon Web Services.

Voldemort is a distributed key-value storage system. It is used at LinkedIn for certain high-scalability storage problems where simple functional partitioning is not sufficient.

Kyoto Cabinet is a library of routines for managing a database. The database is a simple data file containing records; each record is a pair of a key and a value. There is no concept of data tables or data types. Records are organized in a hash table or a B+ tree.
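The record model described above (a data file of untyped key/value pairs) can be illustrated with Python's standard `dbm.dumb` module. This is a stand-in chosen because it ships with Python; it is not the Kyoto Cabinet API, and the file name and keys are hypothetical.

```python
import dbm.dumb
import os
import tempfile

def roundtrip():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records")

        # "c" creates the database file if it does not exist yet.
        with dbm.dumb.open(path, "c") as db:
            # Keys and values are plain byte strings: no tables, no types.
            db[b"color"] = b"blue"
            db[b"count"] = b"3"

        # Reopen read-only and fetch records by key.
        with dbm.dumb.open(path, "r") as db:
            return db[b"color"], db.get(b"missing"), sorted(db.keys())

result = roundtrip()
```

Note that the number 3 is stored as the bytes `b"3"`: interpreting a value is left entirely to the application.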

Scalaris is a scalable, transactional, distributed key-value store. It can be used for building scalable Web 2.0 services.

Riak is a Dynamo-inspired database that is being used in production by companies like Mozilla.

Object and Multi-value Databases

These types of stores preceded the NoSQL movement, but they have found new life as part of it. Object databases store objects (as in object-oriented programming). Multi-value databases store tabular data, but individual cells can store multiple values. Examples include Objectivity, GemStone, and Unidata. These stores use proprietary query languages.

Miscellaneous NoSQL Sources

Several other data stores can be classified as NoSQL stores, but they don’t fit into any of the categories above. Examples include GT.M, IBM Lotus/Domino, and the ISIS family.