Map Reduce in Mongo
Recently I was interested in finding out which of our users has received the most coupons. We store information about each coupon in a MongoDB document that looks something like this:
db.coupons.findOne()
{
    "_id" : ObjectId("4cb314e7c42af81a30ec3938"),
    "recipient" : {
        "$ref" : "users",
        "$id" : ObjectId("4cb314c7c42af81a238c3838")
    },
    "text" : "Foo coupon",
    "dateTime" : "Wed May 12 2010 00:00:00 GMT+0100 (GMT)"
}
I started off looking at the Aggregation page of the MongoDB help documents. This page lists a number of querying options:
- Count – useful if I wanted to know the number of coupons received by just one member, hence not quite what I was after
- Distinct – for a whole other category of problems
- Group – this looked exactly like what I needed, at least until I read the fine print: “Note: the result is returned as a single BSON object and for this reason must be fairly small – less than 10,000 keys, else you will get an exception. For larger grouping operations without limits, please use map/reduce.” We have 631,000 coupon documents. So map/reduce it is…
Following the example given here, I ended up with these functions:
m = function () { emit(this.recipient.$id, { coupons: 1 }); };
r = function (key, values) {
    // reduce may be re-run on its own output, so add up the partial counts
    var sum = 0;
    for (var i = 0; i < values.length; i++) {
        sum += values[i].coupons;
    }
    return { coupons: sum };
};
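One detail worth spelling out: MongoDB may call the reduce function more than once for the same key, feeding the output of earlier calls back in as values, so reduce has to be able to consume its own output. Here is a minimal sketch of that behaviour in plain JavaScript, run outside MongoDB. The `emit` stand-in, the `reduceInBatches` helper, the batch size of 2, and the sample documents (with string ids instead of ObjectIds) are all assumptions made up for illustration, not MongoDB's API or its actual batching strategy.

```javascript
// Plain-JavaScript imitation of map/reduce; nothing here talks to MongoDB.
// Sample documents are invented; ids are strings instead of ObjectIds.
var docs = [
  { recipient: { $id: "alice" } },
  { recipient: { $id: "bob" } },
  { recipient: { $id: "alice" } },
  { recipient: { $id: "alice" } },
  { recipient: { $id: "bob" } },
  { recipient: { $id: "alice" } },
  { recipient: { $id: "alice" } }
];

// Stand-in for the shell's emit(): collect emitted values per key.
var emitted = {};
function emit(key, value) {
  (emitted[key] = emitted[key] || []).push(value);
}

var m = function () { emit(this.recipient.$id, { coupons: 1 }); };

var r = function (key, values) {
  var sum = 0;
  for (var i = 0; i < values.length; i++) {
    sum += values[i].coupons; // add partial counts, not just 1 per value
  }
  return { coupons: sum };
};

// Hypothetical helper: reduce in small batches and feed the partial
// results back into reduce until one value is left (batchSize must be >= 2).
function reduceInBatches(key, values, reduce, batchSize) {
  while (values.length > 1) {
    var next = [];
    for (var i = 0; i < values.length; i += batchSize) {
      next.push(reduce(key, values.slice(i, i + batchSize)));
    }
    values = next;
  }
  return values[0];
}

// Map phase: call m with each document bound to `this`.
docs.forEach(function (doc) { m.call(doc); });

// Reduce phase: one final value per key.
var results = {};
Object.keys(emitted).forEach(function (key) {
  results[key] = reduceInBatches(key, emitted[key], r, 2);
});

console.log(results); // alice has 5 coupons, bob has 2
```

In this sketch, a reduce that only counted the entries in `values` would report 2 for "alice" rather than 5, because the partial results `{ coupons: 2 }` would each be counted as a single coupon on the second pass.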
Running this:
res = db.coupons.mapReduce(m,r);
And then querying the results:
db[res.result].find().sort({"value.coupons" : -1})
Gave the answer I needed:
{ "_id" : ObjectId("4cb314c5c42af81aa14f3838"), "value" : { "coupons" : 2236 } }
{ "_id" : ObjectId("4cb314c8c42af81adbb13838"), "value" : { "coupons" : 1133 } }
{ "_id" : ObjectId("4cb314c7c42af81ab2923838"), "value" : { "coupons" : 782 } }
...
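The `-1` in the sort requests descending order, which is why the heaviest recipients come first. As a rough plain-JavaScript equivalent of that sort, here is a small sketch; the short ids are hypothetical placeholders and the counts are simply the three shown above.

```javascript
// Rows shaped like the map/reduce output. The short ids are hypothetical
// placeholders; the counts are the three from the output above.
var rows = [
  { _id: "user-b", value: { coupons: 1133 } },
  { _id: "user-c", value: { coupons: 782 } },
  { _id: "user-a", value: { coupons: 2236 } }
];

// Equivalent of sort({"value.coupons" : -1}): largest count first.
// slice() copies the array so the original order is left untouched.
var sorted = rows.slice().sort(function (a, b) {
  return b.value.coupons - a.value.coupons;
});

var top = sorted[0];
console.log(top._id + " received " + top.value.coupons + " coupons");
```

In the shell itself, appending `.limit(1)` to the sorted cursor should return only the top document.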
Overall, this was a fairly straightforward map/reduce job, and it made for quite a nice introduction to the technique.