Liz Douglass

Posts Tagged ‘MongoDB’

Map Reduce in Mongo


Recently I was interested in finding out which of our users has received the most coupons. We store information about each coupon in a MongoDB document that looks something like this:

db.coupons.findOne()
{
    "_id" : ObjectId("4cb314e7c42af81a30ec3938"),
    "recipient" : {
        "$ref" : "users",
        "$id" : ObjectId("4cb314c7c42af81a238c3838")
    },
    "text" : "Foo coupon",
    "dateTime" : "Wed May 12 2010 00:00:00 GMT+0100 (GMT)"
}

I started off looking at the Aggregation page of the MongoDB help documents. This page lists a number of querying options:

  • Count – useful if I wanted to know the number of coupons received by just one member, hence not quite what I was after
  • Distinct – for a whole other category of problems
  • Group – this looked exactly like what I needed, at least until I read the fine print: “Note: the result is returned as a single BSON object and for this reason must be fairly small – less than 10,000 keys, else you will get an exception. For larger grouping operations without limits, please use map/reduce.” We have 631,000 coupon documents. So map/reduce it is…

Following the example given here, I ended up with these functions:

m = function(){ emit(this.recipient.$id, {coupons : 1})};

r = function (key, values) {
    var sum = 0;
    for (var i = 0; i < values.length; i++) {
        // Add each value's count: reduce can be re-run on its own earlier output
        sum += values[i].coupons;
    }
    return { coupons: sum };
};

Running this:

res = db.coupons.mapReduce(m,r);

And then querying the results:

db[res.result].find().sort({"value.coupons" : -1})

Gave the answer I needed:

{ "_id" : ObjectId("4cb314c5c42af81aa14f3838"), "value" : { "coupons" : 2236 } }
{ "_id" : ObjectId("4cb314c8c42af81adbb13838"), "value" : { "coupons" : 1133 } }
{ "_id" : ObjectId("4cb314c7c42af81ab2923838"), "value" : { "coupons" : 782 } }
...

This is a fairly straightforward map/reduce, and it made for quite a nice introduction to the technique.
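One detail worth checking in any counting reduce: MongoDB may call the reduce function more than once for the same key, feeding earlier outputs back in as values, so the function needs to add up the coupons fields rather than count the entries. Below is a minimal sketch in plain JavaScript (no MongoDB needed) that imitates this. The sample documents, the ids "u1" and "u2", the function names and the batching driver are all made up for illustration; this is not how Mongo schedules its reduces, only a way to see the effect.

```javascript
// Plain-JavaScript simulation of the coupon count; no MongoDB required.
// The sample documents and recipient ids ("u1", "u2") are invented.
const coupons = [
  { recipient: "u1" }, { recipient: "u2" }, { recipient: "u1" },
  { recipient: "u1" }, { recipient: "u2" }, { recipient: "u1" },
];

// Map step: one { coupons: 1 } value per document, keyed by recipient.
function mapCoupon(doc) {
  return [doc.recipient, { coupons: 1 }];
}

// Reduce step: add up the counts carried by the values. Counting the
// entries instead would go wrong once a value is itself a partial total.
function reduceCoupons(key, values) {
  let sum = 0;
  for (const value of values) {
    sum += value.coupons;
  }
  return { coupons: sum };
}

// Hypothetical driver: reduces each key in batches, then reduces the
// partial results again, so reduce sees its own output as input.
function runMapReduce(docs, batchSize) {
  const emitted = new Map();
  for (const doc of docs) {
    const [key, value] = mapCoupon(doc);
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  }
  const results = [];
  for (const [key, values] of emitted) {
    const partials = [];
    for (let i = 0; i < values.length; i += batchSize) {
      partials.push(reduceCoupons(key, values.slice(i, i + batchSize)));
    }
    results.push({ _id: key, value: reduceCoupons(key, partials) });
  }
  // Equivalent of sort({ "value.coupons": -1 }) on the result collection.
  return results.sort((a, b) => b.value.coupons - a.value.coupons);
}

const results = runMapReduce(coupons, 2);
console.log(results);
```

With a batch size of 2, the four values for "u1" are first reduced to two partial totals and then reduced again. A reduce that only counted its inputs would report 2 for that user at the second step instead of 4.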


Written by lizdouglass

December 15, 2010 at 9:23 am

Posted in Uncategorized


MongoDB


Recently I started on a project that is using some interesting technologies, including Scala, MongoDB and Django. Some of these are quite new to me and I’ve learnt a great deal. Here are some observations on what I’ve learnt so far.

MongoDB

Mongo is a schema-less document-oriented database that stores data in binary encoded JSON documents – BSON documents. The online documentation is quite good and there is a good tutorial and an online interactive shell.

How have we been using Mongo so far?

We have two projects that interact with Mongo: one is a RESTful API back-end and the other is a tool for populating a Mongo database with data from a MySQL database. Both projects are written in Scala and use the Mongo Java API.

Why Mongo?

– The JSON-like documents allow us to store data in a way that is obvious, because they read like plain English. We need to store information about the users of our system. In nearly all cases, the information available is different for each person – some people have no contact phone numbers, others have 6 children. Mongo has allowed us to build up profiles for our users and only include the pieces of information that we actually have available for them.

– The schema-less and denormalised nature of the database means that we can modify the structure frequently. We only started this project a couple of weeks ago and have already made several quite large changes. These include how we organise the Mongo documents that we’ve been generating from a legacy MySQL database. This sort of flexibility is fantastic, especially at the beginning of a new project.

Populating the Mongo database

Our data migration project extracts data for each record in the MySQL database using a SQL query. This data is then used to create a Person domain object, which is composed of microtypes like the one below. The Person type and all of these microtypes implement the ConvertableToMongo trait:

class NextOfKin(val relationship: Relationship, val person: Person) extends ConvertableToMongo {
  def toMongoObject: DBObject = {
    new BasicDBObject(Map(
      "relationship" -> relationship.toMongoObject,
      "person" -> person.toMongoObject).asJava)
  }
}

where:

class Relationship(val description: String) extends ConvertableToMongo {
  def toMongoObject: DBObject = {
    new BasicDBObject(Map("relationship" -> description).asJava)
  }
}

ConvertableToMongo is a trait:

trait ConvertableToMongo {
  def toMongoObject: DBObject
}

Note that we need to use the asJava method from the scala-javautils library to convert the Scala map to the Java map required by the Mongo API.

The ConvertableToMongo trait has a single method that returns a Mongo DBObject. These are inserted into a Mongo collection like this:

val usersCollection = mongo.getCollection("user")
members foreach(user  => usersCollection insert(user toMongoObject))

The end result is a Mongo document like this one:

{
	"_id" : ObjectId("4c29f7fdbe924173a47a759f"),
	"firstName" : "Joe",
	"surname" : "Bloggs",
	"gender" : "Male",
	"nextOfKin" : {
		"relationship" : "Son",
		"person" : {
			"name" : "John Bloggs"
		}
	}
}

Note that unless one is specified, every document added to a Mongo collection is automatically assigned an ObjectId under the key _id.

Written by lizdouglass

July 19, 2010 at 8:31 am

Posted in Uncategorized
