Rightsource

View Original

Understanding NoSQL document-oriented databases using CouchDB

BY: Ramiro Alves

RightSource Developer

Somehow I remember that this month I’m gonna have 5 years using CouchDB, an Open Source small yet powerful Database Manager System that we started using because at some point in a Security Project I was working on, our previous Databases Engineer went too far away with normalization rules and queries started taking too long to throw results and our team started evaluating options on how to access our data the fastest way, we ended up with CouchDB v1.5… but why?

Our team needed fastest access to data, so CouchDB being a NoSQL document-oriented database let us access documents directly using it’s incorporated APIs directly, no need to make queries, basically we have an horizontal-organized document’s structure and it was awesome, and then, as any “new” paradigm to conventional programmers, questions started to emerge, and according to my experience I’ll try to answer them one by one.

What’s a document and how it is formed?

A document for CouchDB is a quasi free structured JSON.

Why a quasi free structured JSON?

CouchDB have it’s own internal processes, to make the magic work, CouchDB required some obligatory fields like:

  • _id: A main and unique identifier formatted in a UUID v4 form.

  • deleted: CouchDB naturally doesn’t delete documents, but stores them with a deletion boolean flag, this flag removes the document from almost any list, view, or limit it’s access to the user, but can be retrieved at any time.

  • _rev: Anytime we modify a document, it keeps a version control, this way the prior version of a document gets stored in the database and the newest one take its place in views, list, or access to the user. To access old version, the user must know it’s old revision number.

Can I make queries against the database?

Short answer: No “queries”

What if I want to only get documents with certain info?  

Normally when retrieving data, you might wanna get a range of records that meets some conditions, CouchDB gave us naturally the feature of being able to retrieve from our horizontal pool a set of documents if we knew their _id field, but what would happen if we wanted to retrieve documents that meets some condition let’s say… from yesterday until today, knowing that for each stored document we added a field to know when it was stored? At that point, we needed to create views, at the end I’ll tell you a story about this caveats.I did read that you added a field to know when the document was created, is there any other field you would like to recommend?

Yes, totally, as we know, JSONs can take any form and be stored in CouchDB, in our database we stored users, settings for those users, and other related info. I’m pretty sure it’s a common scenario to add a “type” key to the JSON defining what entity this JSON contains info for.

How do we relate documents to others? 

CouchDB is not a relational database, it was conceived in part to make access pretty straightforward and therefore have no relational features per se, but as a developers, we always want to granulate data and keep it the most organized possible, so let’s assume that we can store JSONs with fields that relate to other documents in our database,in the following example, Document A relates to Document B based on Document’s A _id field and Document’s B idUser field:

Ok, this is great, but you still haven’t told me how I can retrieve documents from certain type:

Yes, I forgot, we were covering the basic doubts and we’ll keep with it. The only way at the time was that we had to build a design document and in every design document add the view. The view should look like a function as the following:

After having that view in our database, we should be able to query it by making an HTTP GET Request against our Database host, and it should look like this: 

  • DATABASE_URI is the URI for our CouchDB running instance.

  • DATABASE_NAME is the name of our database.

  • DESIGN_DOCUMENT_NAME is the name of the design document containing our view function.

  • VIEW the function that’s going to filter our results, this function will expose documents in two values, a key, which is the query index we are going to use, and value, which is the value exposed for that record.

Sure, that view will return all user-type documents in our database, but what if besides that I would like to know which users were created from yesterday until today.

Well, in that scenario we might want to add another view that exposes the date the user was created, and should look as the following:

The small change we made comparing the previous view, in out emit function, we are emitting the date it was created, instead of the main identifier. So finally, we would be able to query our users from yesterday until today adding the following params to our HTTP GET Request.

  • startkey: assigning our date from yesterday.

  • endkey: providing the end value of the date we are looking for.

I’m getting it, but what if I want to do something else, for example, counting how many documents of a kind we have.

Ok, let’s look at our two points above, have you seen how we’ve been returning null values in our emit functions? Let’s do this little change.

Aside from that, we would have to add a SUM function as a reducer for that view as it follows:

Normally this would do nothing if we keep querying as we’ve been doing, that’s why we will need to add a new param to our HTTP GET Request if we want to only count:

  • reduce: a boolean value indicating that if we want to apply our reduce function to the values in the returning records.

Great, now… how do we rollback? 

Sadly, the only way to rollback should be using the stored documents revisions, but there’s no built-in way to rollback at certain time or point without knowing the revisions we want to rollback to, so this would have to be code controlled.

How do we make migrations? 

As rollbacks, this changes on the structure of the documents, needs to be code-controlled.

How do we make transactions possible?

As migrations, to take control of transaction changes, this needs to be handled by code.

Are views the only way to query?

Nope, not at all, after IBM got interested in CouchDB, they incorporated MANGO Queries to CouchDB in which it is necessary to add indexes to our database to perform complex queries using only a JSON, this way of querying in CouchDB really made life easier for us in the sense that all of this became pretty intuitive. Let’s assume we have the following structure.

What we need to understand from the structure above is that making a query with that object we’ll try to retrieve all documents that:

  • Have a valid name field, and the retrieved type should be “user” or “userSettings”

  • The records are gonna be retrieved sorted by name.

  • The records only will contain the fields _id and name.

  • We are only taking 10 documents from the whole collection.

  • Those 10 documents taken, started to count after 20 documents that met the condition above.

To close the subject, aside from the non-conventional approach that CouchDB and it’s environment brings to the table, it also provides some quick-response on execution environments, light-weighted memory space, and a ton of fun yet powerful features to achieve real-time responses from heavy systems.