The best way to learn any new product is to a) read the manual, and b) start using the product.
I created a simple sample application so I could understand the various functions, including adding data, searching, and management. As with any good sample application, using a source of data that already exists makes life easier. For this example I’m going to use Operating System output, so I will have an ever-increasing amount of data for no additional work.
I will be starting with a database called ‘stats’. For this database my first collection is going to be called ‘system’ and this is going to record the most basic of information including date/time, host and cpu (user,sys,idle) stats. I have a simple shell script that creates an appropriate JSON string and I use mongoimport to load the data. Here is my Version 0.1 architectural structure.
mongo> use stats;
mongo> db.system.findOne();
{
    "_id" : ObjectId("4c11183580399ad2db4f503b"),
    "host" : "barney",
    "epoch" : 1276188725,
    "date" : "Thu Jun 10 12:52:05 EDT 2010",
    "cpu" : { "user" : 2, "sys" : 2, "idle" : 95 },
    "raw" : " 11435699 1379565 9072198 423130352 2024835 238766 2938641 0 0"
}
I made some initial design decisions before I understood the full strengths and limitations of MongoDB, as well as what my actual access paths to the data will be.
While I’m using seconds since epoch for simple range searching, I’m adding a presentation date for user readability. I’ve created a separate sub-element for cpu because a) this element has a number of individual attributes I will want to report and search on, and b) this collection should be extended to include other information like load average, running processes, memory etc.
If my shell script runs in debug mode, I also record the raw data used to determine the end result. This makes debugging easier.
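To make the link between the raw string and the cpu sub-element concrete, here is a minimal JavaScript sketch of how such a document could be built. It assumes the raw string holds the counters from the cpu line of Linux's /proc/stat (user, nice, system, idle, ...) and that the percentages are taken over the first four fields. My shell script is not shown here, so buildDocument and its parameters are hypothetical names for illustration only.

```javascript
// Hypothetical sketch: turn one raw counter line into a document for the
// 'system' collection. The field order (user, nice, system, idle) is an
// assumption based on the Linux /proc/stat "cpu" line.
function buildDocument(host, epoch, date, raw, debug) {
  const [user, nice, sys, idle] = raw.trim().split(/\s+/).map(Number);
  const total = user + nice + sys + idle;
  const pct = (n) => Math.floor((n * 100) / total);

  const doc = {
    host: host,
    epoch: epoch,
    date: date,
    cpu: { user: pct(user), sys: pct(sys), idle: pct(idle) },
  };
  if (debug) {
    doc.raw = raw; // keep the input only when running in debug mode
  }
  return doc;
}

const sample = buildDocument(
  "barney",
  1276188725,
  "Thu Jun 10 12:52:05 EDT 2010",
  " 11435699 1379565 9072198 423130352 2024835 238766 2938641 0 0",
  true
);

// mongoimport reads one JSON document per line, so print it on a single line.
console.log(JSON.stringify(sample));
```

Under these assumptions the sample line reproduces the cpu values of 2, 2 and 95 stored in the document above.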
Here is my first query.
Find all statistics between two dates. It took a few attempts to get the syntax right: $le and $ge didn’t work, and RTFM highlighted the correct operators, $lte and $gte. I also first included two separate elements for epoch, which behaved like an OR condition rather than the range I wanted. I see you can add multiple comparison operators to a single element to get an AND operation.
mongo> db.system.find({epoch: { $gte: 1276188725, $lte: 1276188754}});
{ "_id" : ObjectId("4c11183580399ad2db4f503b"), "host" : "barney", "epoch" : 1276188725, "date" : "Thu Jun 10 12:52:05 EDT 2010", "cpu" : { "user" : 2, "sys" : 2, "idle" : 95 }, "raw" : " 11435699 1379565 9072198 423130352 2024835 238766 2938641 0 0" }
{ "_id" : ObjectId("4c11184c80399ad2db4f503c"), "host" : "barney", "epoch" : 1276188748, "date" : "Thu Jun 10 12:52:28 EDT 2010", "cpu" : { "user" : 2, "sys" : 2, "idle" : 95 }, "raw" : " 11436605 1379565 9072320 423138450 2024862 238770 2938641 0 0" }
{ "_id" : ObjectId("4c11185080399ad2db4f503d"), "host" : "barney", "epoch" : 1276188752, "date" : "Thu Jun 10 12:52:32 EDT 2010", "cpu" : { "user" : 2, "sys" : 2, "idle" : 95 }, "raw" : " 11437005 1379565 9072330 423139527 2024862 238770 2938641 0 0" }
{ "_id" : ObjectId("4c11185180399ad2db4f503e"), "host" : "barney", "epoch" : 1276188753, "date" : "Thu Jun 10 12:52:33 EDT 2010", "cpu" : { "user" : 2, "sys" : 2, "idle" : 95 }, "raw" : " 11437130 1379565 9072334 423139862 2024862 238770 2938641 0 0" }
{ "_id" : ObjectId("4c11185280399ad2db4f503f"), "host" : "barney", "epoch" : 1276188754, "date" : "Thu Jun 10 12:52:34 EDT 2010", "cpu" : { "user" : 2, "sys" : 2, "idle" : 95 }, "raw" : " 11437316 1379565 9072338 423140325 2024910 238770 2938641 0 0" }
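One possible explanation for why two separate epoch elements did not produce a range, assuming the shell evaluates the query as an ordinary JavaScript object literal: a repeated key replaces the earlier one, so only the last condition is ever sent. The plain JavaScript sketch below needs no database. The names twoKeys and oneKey are mine.

```javascript
// Two 'epoch' keys in one object literal: the second silently replaces the first.
const twoKeys = { epoch: { $gte: 1276188725 }, epoch: { $lte: 1276188754 } };

// One 'epoch' key holding both operators: both conditions survive.
const oneKey = { epoch: { $gte: 1276188725, $lte: 1276188754 } };

console.log(JSON.stringify(twoKeys)); // {"epoch":{"$lte":1276188754}}
console.log(JSON.stringify(oneKey));  // {"epoch":{"$gte":1276188725,"$lte":1276188754}}
```

If that is what happened, the first form would match everything up to the upper bound, which would explain getting more rows than expected.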
Assuming I’m going to have stats from more than one server in my data, I should always filter by hostname and then by the given period.
mongo> db.system.find({host: "barney", epoch: { $gte: 1276188725, $lte: 1276188754}});
If I only want to see the Date/Time and CPU stats, I can show a subset of the elements found.
mongo> db.system.find({epoch: { $gte: 1276188725, $lte: 1276188754}}, {date:1,cpu:1});
{ "_id" : ObjectId("4c11183580399ad2db4f503b"), "date" : "Thu Jun 10 12:52:05 EDT 2010", "cpu" : { "user" : 2, "sys" : 2, "idle" : 95 } }
{ "_id" : ObjectId("4c11184c80399ad2db4f503c"), "date" : "Thu Jun 10 12:52:28 EDT 2010", "cpu" : { "user" : 2, "sys" : 2, "idle" : 95 } }
...
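The second argument acts as a field selector. As a rough mental model only (a toy function, not how MongoDB implements it), the effect seen in the output above can be sketched in plain JavaScript. The project helper is hypothetical, and _id is a plain string here because ObjectId is not available outside the shell.

```javascript
// Toy illustration only: keep _id plus every top-level field flagged with 1.
function project(doc, fields) {
  const out = { _id: doc._id };
  for (const name of Object.keys(fields)) {
    if (fields[name] === 1 && name in doc) {
      out[name] = doc[name];
    }
  }
  return out;
}

const doc = {
  _id: "4c11183580399ad2db4f503b", // plain string stand-in for the ObjectId
  host: "barney",
  epoch: 1276188725,
  date: "Thu Jun 10 12:52:05 EDT 2010",
  cpu: { user: 2, sys: 2, idle: 95 },
};

console.log(project(doc, { date: 1, cpu: 1 }));
```

Note that _id comes back even though it was not requested, matching the shell output.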
Filtering on a sub-element is also possible; however, I found that there is no implied conversion between strings and numbers. In the following example “2” does not match any results, while 2 does.
mongo> db.system.findOne({host: "barney", "cpu.user": "2"})
null
mongo> db.system.findOne({host: "barney", "cpu.user": 2})
{ "_id" : ObjectId("4c11161680399ad2db4f5033"), "host" : "barney", "epoch" : 1276188182, "date" : "Thu Jun 10 12:43:02 EDT 2010", "cpu" : { "user" : 2, "sys" : 2, "idle" : 95 } }
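The behaviour resembles strict equality in JavaScript, where a string never equals a number. The sketch below is an analogy in plain JavaScript rather than a description of MongoDB internals. getPath is a hypothetical helper that resolves a dotted name such as "cpu.user".

```javascript
// Hypothetical helper: walk a dotted path like "cpu.user" through nested objects.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

const doc = { host: "barney", cpu: { user: 2, sys: 2, idle: 95 } };

// Strict comparison: the stored number 2 is not the string "2".
console.log(getPath(doc, "cpu.user") === "2"); // false
console.log(getPath(doc, "cpu.user") === 2);   // true
```

The practical consequence is that the load script needs to emit numbers unquoted in the JSON if they are to be searched as numbers.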
Given that the collection and load process works, data is being recorded and I can perform some searching, I now have the basis for adding richer data elements and learning about the internal DBA operations possible, after I fix the bug with all my values being 2/2/95.
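One possible cause of the constant 2/2/95 values, offered as a guess rather than a diagnosis: if the raw string holds counters that accumulate since boot (as the cpu line of Linux's /proc/stat does), percentages taken over the totals barely move between samples. Working from the difference between two consecutive samples would show the activity in that interval instead. The JavaScript sketch below assumes that field order (user, nice, system, idle). parseCounters and cpuBetween are hypothetical names.

```javascript
// Assumption: the first four fields are cumulative user, nice, system and idle
// counters, as in the Linux /proc/stat "cpu" line.
function parseCounters(raw) {
  const [user, nice, sys, idle] = raw.trim().split(/\s+/).map(Number);
  return { user, nice, sys, idle };
}

// Percentages for the interval between two samples, based on counter deltas.
function cpuBetween(rawBefore, rawAfter) {
  const a = parseCounters(rawBefore);
  const b = parseCounters(rawAfter);
  const d = {
    user: b.user - a.user,
    nice: b.nice - a.nice,
    sys: b.sys - a.sys,
    idle: b.idle - a.idle,
  };
  const total = d.user + d.nice + d.sys + d.idle;
  const pct = (n) => (total === 0 ? 0 : Math.floor((n * 100) / total));
  return { user: pct(d.user), sys: pct(d.sys), idle: pct(d.idle) };
}

// The raw strings from the first two documents returned by the range query.
const first = " 11435699 1379565 9072198 423130352 2024835 238766 2938641 0 0";
const second = " 11436605 1379565 9072320 423138450 2024862 238770 2938641 0 0";

console.log(cpuBetween(first, second));
```

For these two samples the deltas give user 9, sys 1 and idle 88, noticeably different from the stored 2/2/95.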