DISCOVERY

December 16th, 2017

Learning MongoDB Part II: Working with Documents

MongoDB

JavaScript

Relational Database

NoSQL

Document Database

Today I'm building on my first MongoDB discovery and looking at documents in more depth. To start, let's implement Christmas tree purchases in the database. The first task is picking a tree to buy! I searched the database for a tree I liked and used the findOne() function to return a single tree.

db.tree.findOne({type: "douglas", grade: "7-8ft", height: "7' 3\""})

/* Result */
{
    "_id" : ObjectId("5a3352702e48ee76cb1fe459"),
    "type" : "douglas",
    "height" : "7' 3\"",
    "source_price" : 10,
    "sell_price" : 60,
    "grade" : "7-8ft",
    "sold" : false,
    "buyer_id" : undefined
}

Next I created a customer collection to hold all the people who bought trees:

db.customer.insert({
    username: "andy",
    name: "Andrew Jarombek",
    email: "andy@jarombek.com"
})

One thing you may have noticed is that each document has a field called _id with a value of ObjectId(...) containing a 24-digit hex number (12 bytes). These hex digits are not simply random. Instead they hold organized information about the document. The first eight hex digits (four bytes) of the ObjectId are a timestamp of when the id was created. In the MongoDB version used here, the rest of the id is broken down into three pieces: the machine ID, the process ID, and a counter which increments each time an ObjectId is generated. (Newer MongoDB releases replace the machine and process IDs with a single random value generated once per process.) All these items together create a very reliable unique key; collisions are so unlikely that you don't have to worry about them.
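As a rough illustration of that layout, the creation time can be recovered from the id string with plain JavaScript. This is a minimal sketch that runs in Node without a database. The helper name objectIdTimestamp is my own, and it only relies on the first four bytes being a Unix timestamp in seconds.

```javascript
// Hypothetical helper (not part of MongoDB): decode the creation time
// from the first 8 hex digits (4 bytes) of an ObjectId string.
function objectIdTimestamp(hexId) {
    if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
        throw new Error("Expected a 24 digit hex string");
    }
    // The leading 4 bytes hold seconds since the Unix epoch.
    const seconds = parseInt(hexId.slice(0, 8), 16);
    return new Date(seconds * 1000);
}

// The tree document id returned by the findOne() query above.
const created = objectIdTimestamp("5a3352702e48ee76cb1fe459");
console.log(created.toISOString()); // 2017-12-15T04:41:20.000Z
```

In the mongo shell the same information should be available by calling getTimestamp() on an ObjectId value.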

So why am I going into the details of the ObjectId? These unique ids are commonly used to link documents together by creating a property on one document that contains the id of another document. In the case of the Christmas tree database, I created a collection for purchases. In this collection, each document is linked to both the purchased tree and the customer. I took the tree and customer document ids and put them in the purchase document.

// Look up the ids of the documents that the purchase will link to
let tree_id = db.tree.findOne({type: "douglas", grade: "7-8ft", height: "7' 3\""})._id;
let user_id = db.customer.findOne()._id;

// Store both ids on the new purchase document
db.purchase.insert({
    type: "douglas",
    grade: "7-8ft",
    price: 60,
    tree_id: tree_id,
    username: "andy",
    user_id: user_id,
    date: Date()
})

This code also shows that JavaScript variables can be used in MongoDB shell queries. Variables allow for readable and structured queries. Now I can see the other document ids in the purchase document:

db.purchase.findOne()

/* Result */
{
    "_id" : ObjectId("5a34a1942e48ee76cb1fe82c"),
    "type" : "douglas",
    "grade" : "7-8ft",
    "price" : 60,
    "tree_id" : ObjectId("5a3352702e48ee76cb1fe459"),
    "username" : "andy",
    "user_id" : ObjectId("5a349e732e48ee76cb1fe82b"),
    "date" : "Fri Dec 15 2017 23:31:16 GMT-0500 (EST)"
}
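Note that the date field came back as a string. That is standard JavaScript behavior rather than anything specific to MongoDB: calling Date() without new returns a string, while new Date() returns a Date object. A small sketch of the difference, runnable in plain Node:

```javascript
// Date() called as a plain function returns a human-readable string.
const asString = Date();
// new Date() constructs a Date object.
const asDate = new Date();

console.log(typeof asString);        // string
console.log(asDate instanceof Date); // true
```

If the purchase date needs to be compared or sorted as a date, inserting new Date() instead is probably the safer choice.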

You will notice there are some duplicated fields from other collections in the purchase document (such as the username property). This sort of duplication is frowned upon in an RDBMS; however, since ordinary MongoDB queries have no JOINs, duplication is okay [1].
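Because the purchase stores tree_id and user_id, a linked document can be fetched with one extra lookup per link instead of a JOIN. The sketch below mimics that pattern in plain JavaScript, with arrays standing in for the collections and strings standing in for ObjectIds. findById is a hypothetical helper; in the shell the equivalent step would be a findOne() on _id.

```javascript
// In-memory stand-ins for the tree, customer and purchase collections.
const trees = [
    {_id: "5a3352702e48ee76cb1fe459", type: "douglas", grade: "7-8ft", sell_price: 60}
];
const customers = [
    {_id: "5a349e732e48ee76cb1fe82b", username: "andy", email: "andy@jarombek.com"}
];
const purchase = {
    _id: "5a34a1942e48ee76cb1fe82c",
    tree_id: "5a3352702e48ee76cb1fe459",
    user_id: "5a349e732e48ee76cb1fe82b",
    username: "andy", // duplicated from the customer document
    price: 60
};

// Hypothetical helper: resolve a reference with a second lookup,
// similar in spirit to db.tree.findOne({_id: purchase.tree_id}).
function findById(collection, id) {
    return collection.find(doc => doc._id === id) || null;
}

const tree = findById(trees, purchase.tree_id);
const buyer = findById(customers, purchase.user_id);

console.log(tree.type);         // douglas
console.log(buyer.email);       // andy@jarombek.com
// The duplicated username is available without any lookup at all.
console.log(purchase.username); // andy
```

The trade-off is that a duplicated field such as username has to be kept in sync by the application if the customer document ever changes.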

I've demonstrated how to link related documents in MongoDB, making it easy to find a linked document without a JOIN operation. Let's take a step back and look at the first query I made for picking out a Christmas tree. I called the explain() function on this query to find useful execution information:

db.tree.find({type: "douglas", grade: "7-8ft", height: "7' 3\""}).explain("executionStats")

/* Result */
{
    "executionStats" : {
        "executionSuccess" : true,
        "nReturned" : 6,
        "executionTimeMillis" : 4,
        "totalKeysExamined" : 0,
        "totalDocsExamined" : 1003
    }
}

The most important property in the returned JSON object is totalDocsExamined. Notice that the query looked at every single document in the collection (1003 of them, with zero index keys examined) just to return six matches. Now imagine how slow this could be if there were millions of documents in the collection! For anyone who has used databases before, the solution should come to mind: an index. Let's add indexes to the commonly queried fields in tree.

db.tree.createIndex({type: 1})
db.tree.createIndex({grade: 1})
db.tree.createIndex({height: 1})

You may be wondering about the significance of the value 1. It means that the index is stored in ascending order, while -1 means descending order [2]. When I call explain() again, only the returned documents are examined. Much better!
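To see why an index cuts down the work, here is a toy model in plain JavaScript. It is only a sketch: real MongoDB indexes are B-trees, whereas this uses a Map from a field value to the matching documents, and the collection contents are made up. The point is the difference in how many documents each approach has to examine.

```javascript
// Build a made-up collection of 1003 trees; 6 of them are 7' 3" tall.
const collection = [];
for (let i = 0; i < 1003; i++) {
    collection.push({_id: i, height: i < 6 ? "7' 3\"" : "6' 0\""});
}

// Collection scan: every document is examined.
function scan(docs, height) {
    let examined = 0;
    const results = [];
    for (const doc of docs) {
        examined++;
        if (doc.height === height) results.push(doc);
    }
    return {results, examined};
}

// Toy index: map each height value to the documents holding it.
function buildIndex(docs) {
    const index = new Map();
    for (const doc of docs) {
        if (!index.has(doc.height)) index.set(doc.height, []);
        index.get(doc.height).push(doc);
    }
    return index;
}

// Indexed lookup: only the matching documents are examined.
function indexedFind(index, height) {
    const results = index.get(height) || [];
    return {results, examined: results.length};
}

const scanned = scan(collection, "7' 3\"");
const indexed = indexedFind(buildIndex(collection), "7' 3\"");

console.log(scanned.examined, scanned.results.length); // 1003 6
console.log(indexed.examined, indexed.results.length); // 6 6
```

The index costs extra memory and has to be maintained on every write, which is why it makes sense to index only the fields that are queried often.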

All the indexes on a collection can be listed with the getIndexes() function.

db.tree.getIndexes()

Indexes are used for other purposes besides speeding up query times. They can expire documents in a time-to-live (TTL) collection [3]. These collections use an index on a date field to decide when each document expires. In order to create a TTL collection, a date property needs to exist on the documents. In the tree documents I set this date to Christmas Eve, since nobody will buy a tree after then.

db.tree.updateMany({}, {$set: {"availableUntil": new Date("2017-12-24")}})

Next I created an index on the availableUntil property. The second parameter of createIndex() contains additional options, in this case expiring the document zero seconds after the date in availableUntil [4].

db.tree.createIndex({availableUntil: 1}, {expireAfterSeconds: 0})
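The expiry rule itself is simple: a document becomes eligible for deletion once the current time passes the indexed date plus expireAfterSeconds. The sketch below expresses that rule as a plain JavaScript function (isExpired is a hypothetical name, not a MongoDB API). It also shows that new Date("2017-12-24") is parsed as midnight UTC, so the cutoff actually falls on the evening of December 23rd in US Eastern time.

```javascript
// Hypothetical helper mirroring the TTL rule:
// expired when now >= indexed date + expireAfterSeconds.
function isExpired(doc, field, expireAfterSeconds, now) {
    const value = doc[field];
    // Assumption for this sketch: a field that is not a Date never expires.
    if (!(value instanceof Date)) return false;
    return now.getTime() >= value.getTime() + expireAfterSeconds * 1000;
}

const tree = {type: "douglas", availableUntil: new Date("2017-12-24")};

// Date-only ISO strings are interpreted as UTC.
console.log(tree.availableUntil.toISOString()); // 2017-12-24T00:00:00.000Z

console.log(isExpired(tree, "availableUntil", 0, new Date("2017-12-23T12:00:00Z"))); // false
console.log(isExpired(tree, "availableUntil", 0, new Date("2017-12-24T00:00:01Z"))); // true
```

In practice deletion is not instantaneous. As far as I know, a background task removes expired documents periodically (roughly once a minute), so a document may linger briefly past its expiry.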

I applied a lot of new MongoDB concepts to the tree database. The power of linked documents and indexes in MongoDB is now clear. I will look at MongoDB even more in my next discovery. The code for this discovery can be found on GitHub.

[1] Kyle Banker, Peter Bakkum, Shaun Verch, Douglas Garrett & Tom Hawkins, MongoDB In Action, 2nd ed (Shelter Island, NY: Manning, 2016), 83

[2] "db.collection.createIndex()", https://docs.mongodb.com/v3.4/reference/method/db.collection.createIndex/

[3] Banker et al., 90

[4] "Expire Data from Collections by Setting TTL", https://docs.mongodb.com/manual/tutorial/expire-data/