- Other articles in this series:
- Part 1
In "Four Reasons to Love CouchDB, Part 1" I explained how easy CouchDB is to learn and implement, even though it differs from the popular database platforms you may be used to. I even suggested that you don't have to learn a server-side language to interact with it. Now, let's move on to our next reason:
Reason #2: It's not an RDBMS
Relational databases persist data in a very complex manner. One of my professors a while ago (do they call them that at community college?) used the analogy of a garage to show why that doesn't always make sense. "If your garage were an RDBMS and you wanted to store your car in it, you would remove the engine, wheels, steering wheel, gas tank, etc. and insert them individually in a neat manner. When you wanted your car back, you'd select all of the parts and reassemble them." This works well if you ever want to quickly find the serial number and displacement of your engine, or want to share tires with another car, but it doesn't work well when you (more commonly) just want to drive your car. Let's not forget the schema changes that would need to be made if you brought home a motorcycle!
CouchDB is a Document-Oriented Database
In MySQL, we're often taught to normalize data to increase performance and reduce data duplication. This results in lots of "JOIN category ON category.id = article.category_id JOIN author ON author.id = article.author_id" just to fetch all of the useful information for a single object. CouchDB is a document-oriented database: all data for a specific object is self-contained within the same document. This results in data being persisted in the same way that we think of it, at the cost of some data duplication.
But that doesn't mean that ALL data is duplicated, of course. We can still retain the concept of "foreign keys", but we use them minimally. The idea is to be semi-structured.
A real world example.
We want to store articles, each with an author and a collection of comments. We want to be able to show an article on a page, along with its comments. We also want to be able to get all comments across all posts for a "latest comments" page.
In MySQL, we would normally normalize all of this data into separate tables and create foreign key references. There would be a table for authors, a table for articles, and a table for comments. With this schema, we'd end up with queries such as the following:

    SELECT *
    FROM articles
    LEFT JOIN authors ON authors.id = articles.author_id
    WHERE articles.id = $articleId

That shows the article information. A second query then fetches all comments for that article:

    SELECT * FROM comments WHERE article_id = $articleId

In CouchDB, we can easily store all of this data within the same document without hindering our ability to query it. Although schema-less, CouchDB does have the concept of "views", which are similar to views in an RDBMS. Views are defined by functions stored within special documents called "design documents". A design document contains a field named "views", an object that maps each view name to JavaScript (the default language) functions: a map function that filters documents and emits the keys the results are sorted by, and an optional reduce function that generates aggregate results.
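To make that a little more concrete, here is a rough sketch of what such a design document could look like, written as a JavaScript object. The design document name "_design/articles" and the view names "by_id" and "comment_count" are my own assumptions for illustration; the functions themselves are stored as strings.

```javascript
// Hypothetical design document; the names "_design/articles", "by_id"
// and "comment_count" are illustrative assumptions.
const designDoc = {
  _id: "_design/articles",
  language: "javascript",
  views: {
    by_id: {
      // Map only: one row per article, keyed on the document id.
      map: "function (doc) { if (doc.type === 'article') { emit(doc._id, doc); } }"
    },
    comment_count: {
      // Map emits the number of comments per article...
      map: "function (doc) { if (doc.type === 'article') { emit(doc._id, (doc.comments || []).length); } }",
      // ...and reduce adds them up. sum() is a helper provided by
      // CouchDB's JavaScript view server.
      reduce: "function (keys, values, rereduce) { return sum(values); }"
    }
  }
};

// This JSON is what would be saved to the database as the design document.
console.log(JSON.stringify(designDoc, null, 2));
```

Keeping several related views in one design document like this is only one way to organize them; how you group them is a judgment call.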
The following document represents all of the data we stored in our MySQL schema above, except that it's all stored within one object.

    {
      "_id" : "article_id_2",
      "type" : "article",
      "title" : "Some Title",
      "author" : {
        "name" : "A.J. Brown",
        "email" : "nospam@ajbrown.org"
      },
      ...
      "comments": [
        {
          "name" : "John Smith",
          "text" : "A.J. Brown is the best, ever",
          "created" : "2008-12-21 01:23:24 AM"
        },
        {
          "name" : "Jane Applebees",
          "text" : "I totally agree with John",
          "created" : "2008-12-21 02:55:24 AM"
        }
      ]
    }

Although everything is stored within the structure of a single document, we can still get all of the data we need easily. We just need a view that keys our documents on the article id:

    function (doc) {
      if (doc.type == 'article') {
        // Key each article on its document id.
        emit(doc._id, doc);
      }
    }

This function says: show me all documents with a field named "type" that has a value of "article", and sort and collate (key) them by their document id. Now we can call the view and get all articles and their comments. By adding "?key=
To retrieve only the comments of all posts, we'll need to create a second view. This view extracts the comments from every document that has a type of "article" and collates them by the comment's created field.

    function (doc) {
      // Only articles carry a comments array.
      if (doc.type == 'article' && doc.comments) {
        for (var i = 0; i < doc.comments.length; i++) {
          emit(doc.comments[i].created, doc.comments[i]);
        }
      }
    }

Our result set would look something like this (note: view results also include key and id information, which I've left out for simplicity's sake):

    {
      "name" : "John Smith",
      "text" : "A.J. Brown is the best, ever",
      "created" : "2008-12-21 01:23:24 AM"
    },
    {
      "name" : "Jane Applebees",
      "text" : "I totally agree with John",
      "created" : "2008-12-21 02:55:24 AM"
    }

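If you want to see how the comments view produces those rows without a running server, the following is a minimal sketch that runs the map logic in plain JavaScript. The emit() function here is only an in-memory stand-in for CouchDB's own, the second document ("author_1") is a made-up example of a non-article document, and "_id" is CouchDB's document id field.

```javascript
// The sample article, trimmed to the fields this view touches.
const article = {
  _id: "article_id_2",
  type: "article",
  title: "Some Title",
  comments: [
    { name: "John Smith", text: "A.J. Brown is the best, ever", created: "2008-12-21 01:23:24 AM" },
    { name: "Jane Applebees", text: "I totally agree with John", created: "2008-12-21 02:55:24 AM" }
  ]
};

// Stand-in for CouchDB's emit(): collect rows in memory instead of
// writing them to a view index.
const commentRows = [];
let currentDocId = null;
function emit(key, value) {
  commentRows.push({ id: currentDocId, key: key, value: value });
}

// Map logic of the comments view: one row per comment, keyed on "created".
function mapComments(doc) {
  if (doc.type === "article" && Array.isArray(doc.comments)) {
    doc.comments.forEach(function (comment) {
      emit(comment.created, comment);
    });
  }
}

// Feed every document through the map function. The second document is a
// hypothetical non-article that the view should skip.
const docs = [article, { _id: "author_1", type: "author", name: "A.J. Brown" }];
docs.forEach(function (doc) {
  currentDocId = doc._id;
  mapComments(doc);
});

// CouchDB returns rows ordered by key. A plain string comparison is only a
// rough approximation of its collation, but it is enough for these keys.
commentRows.sort(function (a, b) {
  return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
});

commentRows.forEach(function (row) {
  console.log(row.key, row.id, row.value.name);
});
```

Each row carries the id of the document it came from, which is the "key and id information" left out of the listing above.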
Since we keyed our view on the comment's created field, we can easily narrow our results by date by specifying a startkey and an endkey in our request URI. Keys in the query string are JSON values, so a string key is wrapped in double quotes:
http://couchserver/database/_view/articles/comments?startkey="2008-12-21 02:00:00 AM"
This would only return Jane Applebees' comment, since John Smith's earlier comment has a key that sorts before the specified start key.
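When building such a request from code, the key has to be JSON-encoded and then URL-encoded. Here is a small sketch of a helper that does this; viewUrl is a name I made up, and the endkey value is just an example upper bound that sorts after every key from December 21st.

```javascript
// Hypothetical helper: builds a view URL, JSON-encoding and then
// URL-encoding every query parameter value.
function viewUrl(base, params) {
  const query = Object.keys(params)
    .map(function (name) {
      return name + "=" + encodeURIComponent(JSON.stringify(params[name]));
    })
    .join("&");
  return query ? base + "?" + query : base;
}

const commentsUrl = viewUrl("http://couchserver/database/_view/articles/comments", {
  startkey: "2008-12-21 02:00:00 AM",
  endkey: "2008-12-22" // example upper bound, not from the original request
});

console.log(commentsUrl);
```

Using JSON.stringify for every value means numbers and arrays are encoded correctly too, should a view ever emit those as keys.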
Conclusion
Storing data this way makes a lot of sense in a lot of cases. For example, my newest project needs to be able to store system events. Every event has (and enforces) a standard set of fields, but there may be additional custom (unpredictable) fields as well, as defined by whatever process sends the event. Those custom fields must be available for querying by any program that does know of them. Handling the custom fields is simple with CouchDB -- I just add them to the document being stored. A relational database schema for this data model would be complex and hard to follow.
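As a rough illustration of the event case, the sketch below indexes events on one custom field and simply skips the documents that don't have it. Everything in it is hypothetical: the field names ("source", "created", "invoice_id", "recipient", "retry"), the sample events, and the eventsByInvoice function, which only imitates what a CouchDB view would do.

```javascript
// Hypothetical events: "type", "source" and "created" stand in for the
// standard fields, everything else is a custom field added by the sender.
const events = [
  { _id: "evt1", type: "event", source: "billing", created: "2009-01-05T10:00:00Z", invoice_id: 42 },
  { _id: "evt2", type: "event", source: "mailer", created: "2009-01-05T10:05:00Z", recipient: "someone@example.com" },
  { _id: "evt3", type: "event", source: "billing", created: "2009-01-05T10:09:00Z", invoice_id: 17, retry: true }
];

// Imitates a view over the custom "invoice_id" field.
function eventsByInvoice(docs) {
  const rows = [];
  let currentId = null;

  // In-memory stand-in for CouchDB's emit().
  function emit(key, value) {
    rows.push({ id: currentId, key: key, value: value });
  }

  // Map logic: only events that actually carry the custom field are indexed.
  function map(doc) {
    if (doc.type === "event" && doc.invoice_id !== undefined) {
      emit(doc.invoice_id, doc.source);
    }
  }

  docs.forEach(function (doc) {
    currentId = doc._id;
    map(doc);
  });

  // View rows come back ordered by key; these keys are all numbers.
  return rows.sort(function (a, b) { return a.key - b.key; });
}

const invoiceRows = eventsByInvoice(events);
invoiceRows.forEach(function (row) {
  console.log(row.key, row.id, row.value);
});
```

No schema change is needed when a sender starts adding a new custom field; a program that knows about the field can add a view for it later.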
RDBMSs are powerful, but sometimes they're overkill for our applications. I've seen some claims suggesting that CouchDB's document-oriented strategy would work for 90% of web applications today. Who knows how accurate that is, but it definitely makes you think. After all, you can't get any more cutting edge than a database engine that asks us to rethink how we model data, written in a language (Erlang) that asks us to rethink parallel processing.
Check back tomorrow for Reason #3
Special thanks to the following blogs for helping me get started with CouchDB:












Great overview. Thanks.
Waiting impatiently for part 3
Been a little busy right after the holidays, but it's in the works!