Liz Douglass

Map Reduce in Mongo


Recently I was interested in finding out which of our users has received the most coupons. We store information about each coupon in a MongoDB document that looks something like this:

db.coupons.findOne()
{
    "_id" : ObjectId("4cb314e7c42af81a30ec3938"),
    "recipient" : {
        "$ref" : "users",
        "$id" : ObjectId("4cb314c7c42af81a238c3838")
    },
    "text" : "Foo coupon",
    "dateTime" : "Wed May 12 2010 00:00:00 GMT+0100 (GMT)",
}

I started off looking at the Aggregation page of the MongoDB help documents. This page lists a number of querying options:

  • Count – useful if I wanted to know the number of coupons received by just one member, hence not quite what I was after
  • Distinct – for a whole other category of problems
  • Group – this looked exactly like what I needed, at least until I read the fine print: “Note: the result is returned as a single BSON object and for this reason must be fairly small – less than 10,000 keys, else you will get an exception. For larger grouping operations without limits, please use map/reduce.” We have 631,000 coupon documents. So map/reduce it is…

Following the example given here, I ended up with these functions:

m = function() { emit(this.recipient.$id, {coupons : 1}); };

r = function(key, values) {
    var sum = 0;
    for (var i = 0; i < values.length; i++) {
        // sum the emitted counts rather than adding 1 per entry: Mongo may
        // call reduce repeatedly, feeding already-reduced partial results
        // back in as values
        sum += values[i].coupons;
    }
    return { coupons: sum };
};

Running this:

res = db.coupons.mapReduce(m,r);

And then querying the results:

db[res.result].find().sort({"value.coupons" : -1})

Gave the answer I needed:

{ "_id" : ObjectId("4cb314c5c42af81aa14f3838"), "value" : { "coupons" : 2236 } }
{ "_id" : ObjectId("4cb314c8c42af81adbb13838"), "value" : { "coupons" : 1133 } }
{ "_id" : ObjectId("4cb314c7c42af81ab2923838"), "value" : { "coupons" : 782 } }
...

This is a fairly straightforward map/reduce and was a nice introduction to the technique.
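
The same job can also be run from Scala through the MongoDB Java driver. Below is a minimal sketch rather than the code we ran: it assumes the 2.x driver’s mapReduce(map, reduce, outputTarget, query) overload, a database named mydb, and a hypothetical output collection called coupon_counts:

import com.mongodb.{Mongo, BasicDBObject}

object CouponCounts {
  def main(args: Array[String]) {
    val coupons = new Mongo().getDB("mydb").getCollection("coupons")

    val map = "function() { emit(this.recipient.$id, {coupons: 1}); }"
    val reduce = "function(key, values) { var sum = 0; values.forEach(function(v) { sum += v.coupons; }); return {coupons: sum}; }"

    // run the job over all documents (empty query), writing the
    // results to the coupon_counts collection
    val out = coupons.mapReduce(map, reduce, "coupon_counts", new BasicDBObject())

    // list recipients ordered by coupon count, highest first
    val cursor = out.getOutputCollection.find().sort(new BasicDBObject("value.coupons", -1))
    while (cursor.hasNext) println(cursor.next)
  }
}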


Written by lizdouglass

December 15, 2010 at 9:23 am

Posted in Uncategorized


Scalatra, Scalate and Scaml


A while ago I set about creating a webapp for monitoring the status of the various Quartz jobs that we use to keep our main application ticking. It was put together quickly using Simple Build Tool and Scalatra. I needed a templating engine and noticed that Scalatra has support for Scalate (although at the time the support was only experimental). I decided to give it a try. The first step was to add the dependencies for Scalate and Scalatra-Scalate into our build file:

val scalatraScalate = "org.scalatra" % "scalatra-scalate_2.8.0" % "2.0.0.M1"
val scalateCore = "org.fusesource.scalate" % "scalate-core" % "1.2"

I soon ran into the Scalatra issue with using Scalate and Logback and instead decided to use Scalate on its own. Referring to the Scalate Embedding Guide, I created an instance of the Scalate TemplateEngine class in a trait that is extended by my Scalatra servlet:

import java.io.{PrintWriter, StringWriter}
import javax.servlet.http.HttpServletResponse
import org.fusesource.scalate.{DefaultRenderContext, TemplateEngine}

trait ScalateTemplateEngine {
    def render(templatePath: String, context: Map[String, Any], response: HttpServletResponse) = {
        val templateEngine = new TemplateEngine
        val template = templateEngine.load(templatePath)

        val buffer = new StringWriter()
        val renderContext = new DefaultRenderContext(templateEngine, new PrintWriter(buffer))

        context.foreach({ case (key, value) => renderContext.attributes(key) = value })
        template.render(renderContext)

        response.getWriter.write(buffer.toString)
    }
}
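
To give a sense of how the trait is used, a route in the servlet can build up the context map and delegate to render. This is a hypothetical route and template path, not the monitoring app’s actual code:

import org.scalatra.ScalatraServlet

class SchedulerServlet extends ScalatraServlet with ScalateTemplateEngine {
  get("/") {
    // the keys here line up with the -@ declarations in index.scaml
    render("src/main/webapp/index.scaml", Map(
      "title" -> "Scheduler",
      "header" -> "Scheduler",
      "quartzStatus" -> "All jobs running",
      "scheduledTasks" -> List(Map("name" -> "foo-job", "status" -> "OK"))
    ), response)
  }
}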

Then, referring to this excellent blog post, I created an index.scaml file. The scaml file (below) is concise and reasonably readable. The only hiccups I encountered in making it were:

  1. Figuring out the syntax and correct indenting of the for loop
  2. Realising that all the values in the map are automatically converted to Options (hence the many get calls).

-@ val title: String = "Scheduler"
-@ val header: String = "Scheduler"
-@ val quartzStatus: String = "No quartz status information available"
-@ val scheduledTasks: List[Map[String, String]] = Nil
-@ val quartzInformation: String = "No quartz information available"
!!! 5
%html
  %head
    %title= title
  %body
    %h1
      != header
    %h5
      != quartzStatus

    %table{:border => 1}
      %thead
        %td Task
        %td Status
        %td Quantity
        %td Last Attempt Start Time
        %td Last Attempt Finish Time
        %td Last Successful Run Start Time
        %td Last Successful Run Finish Time
        %td Last Error
        %td Action
      = for(scheduledTask <- scheduledTasks)
        %tbody
          %td= scheduledTask.get("name").get
          %td= scheduledTask.get("status").get
          %td= scheduledTask.get("quantity").get
          %td= scheduledTask.get("lastRunStart").get
          %td= scheduledTask.get("lastRunFinish").get
          %td= scheduledTask.get("lastSuccessStart").get
          %td= scheduledTask.get("lastSuccessFinish").get
          %td= scheduledTask.get("lastException").get
          %td
            %form(method="post")
              %input(type="submit" name={scheduledTask.get("action").get} value="Run now")

Written by lizdouglass

December 15, 2010 at 9:22 am

Posted in Uncategorized


Using Mockito in a Scala unit test


Our project has been using ScalaTest for unit and integration testing. For some of our unit tests we have been using the Mockito mocking library. Recently I was caught out on two occasions while writing a Scala unit test that used some mocked classes.

The first head-scratching moment was caused by a verification statement like this one:

verify(myClass).createSomething(eq(name), anyString)

I had added an import statement for the Matchers class and expected to be cooking with gas… but then this error appeared:

[error] MyClass.scala:47: type mismatch;
[error]  found   : Boolean
[error]  required: String
[error]     verify(myClass).createSomething(eq(name), anyString)
[error]                                      ^
[error] one error found

Why was the eq method returning a Boolean and not the type of the name variable (i.e. String)? I had written similar things in Java many times before. Then I realised that the eq method being used was the one defined on the Scala AnyRef class:

def eq(arg0: AnyRef): Boolean

This was simply fixed by explicitly calling the eq method from the Matchers class:

verify(myClass).createSomething(org.mockito.Matchers.eq(name), anyString)
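
Another option, which avoids spelling out the package on every call, is to rename the matcher at import time so that it no longer clashes with AnyRef.eq:

import org.mockito.Matchers.{eq => eqTo, anyString}

verify(myClass).createSomething(eqTo(name), anyString)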

A little later on I ran into another problem with a verification. Once again, I thought there was not much to it:

verify(myClass).sendToGroup(org.mockito.Matchers.eq("foo"), startsWith(name), org.mockito.Matchers.eq(something))

But then…

org.mockito.exceptions.misusing.InvalidUseOfMatchersException:
Invalid use of argument matchers!
0 matchers expected, 3 recorded.
This exception may occur if matchers are combined with raw values:
//incorrect:
someMethod(anyObject(), "raw String");
When using matchers, all arguments have to be provided by matchers.
For example:
//correct:
someMethod(anyObject(), eq("String by matcher"));

I eventually realised that this meant I had not provided matchers for all of the parameters. In fact the method has a fourth parameter, but it has a default value. I hadn’t expected to need a matcher for the defaulted parameter, but as the docs state: “If you are using argument matchers, all arguments have to be provided by matchers.”
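
A stripped-down illustration of the gotcha, using a hypothetical GroupService rather than our real class – the call site compiles the default in, so the verification needs a fourth matcher:

import org.mockito.Mockito.{mock, verify}
import org.mockito.Matchers.{anyBoolean, startsWith}

// hypothetical service with a defaulted fourth parameter
trait GroupService {
  def sendToGroup(group: String, subject: String, body: String, urgent: Boolean = false)
}

object DefaultParameterGotcha {
  def main(args: Array[String]) {
    val service = mock(classOf[GroupService])
    service.sendToGroup("foo", "Weekly update", "some body text")

    // four matchers, including one for the defaulted parameter
    verify(service).sendToGroup(
      org.mockito.Matchers.eq("foo"),
      startsWith("Weekly"),
      org.mockito.Matchers.eq("some body text"),
      anyBoolean())
  }
}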

Written by lizdouglass

December 15, 2010 at 8:58 am

Posted in Uncategorized


Django form validation – why does the error message always appear?


Recently I’ve been working on a piece of functionality involving a Django form. The form was defined like this:

from django import forms

class FooForm(forms.Form):
    recipient = forms.CharField(widget=forms.HiddenInput)
    text = forms.CharField(widget=forms.Textarea)

    # Django only calls clean_<fieldname> methods, so the method must be
    # named after the 'text' field for this validation to run
    def clean_text(self):
        data = self.cleaned_data['text']
        message = data.strip(' \t\n\r')
        if len(message) == 0:
            raise forms.ValidationError("You need to provide some text")
        return data

It was instantiated in a handler method like so:

from django.shortcuts import render_to_response
from django.template import RequestContext

def bar(request):
    form = FooForm({'recipient' : some.name})
    return render_to_response('my.html', RequestContext(request, {'form': form}))

The validation was working a little too well: the error message for the text field appeared as soon as the form was first loaded. The response to a question posted here explains why this was happening. As the Django documentation says, the form has been data bound, because a data dictionary was provided as the first argument to the form constructor. This “trigger(s) validation of this input” (Lott, March 17 2009). So the error message appeared because the text field was not bound with a value that passes validation. As the Django documentation also says, the alternative to binding data is to provide dynamic initial values, like so:

def bar(request):
    form = FooForm(initial={'recipient' : some.name})
    return render_to_response('my.html', RequestContext(request, {'form': form}))

Using this, validation is not triggered immediately and the message only appears when an invalid form has been submitted.

Written by lizdouglass

November 1, 2010 at 9:49 pm

Posted in Uncategorized


Migrating to Scala 2.8


A few months ago Scala 2.8.0 was released. We migrated our project from version 2.7.7 quite soon after the announcement. Making the switch was quite straightforward: all we needed to do was change the build.scala.versions property in our Simple Build Tool build.properties file. Admittedly it took several hours to fix all the compilation errors, but once this was done we found we were able to make our code base more readable, thanks to two shiny new things in particular:

1. Collections:

As I’ve written about before, we are using MongoDB for persistence. We are also using the MongoDB Java driver. This library makes heavy use of BasicDBObjects. These are the simplest implementation of the DBObject interface. This interface represents “A key-value map that can be saved to the database”. BasicDBObjects are used for more than just inserting data. In fact they are also used extensively in many of the methods defined on the DBCollection class. For example, this is the definition of the update method:

WriteResult update(DBObject q, DBObject o)

(Source: class com.mongodb.DBCollection)
(q is the query used to find the correct element and o is the update operation that should be performed)

We often use the BasicDBObject constructor that takes a Java map. When we were using Scala 2.7.7 we had a library to handle the conversion of maps from Scala to Java. We ended up with a lot of repository methods littered with calls to asJava, like the one below:

import java.util.Date
import org.scala_tools.javautils.Imports._
import com.mongodb._
import org.bson.types.ObjectId

def updateLastLogin(personId: String) {
    val criteria = new BasicDBObject(Map("_id" -> new ObjectId(personId)).asJava)
    val action = new BasicDBObject(Map("$set" -> new BasicDBObject(Map("lastLogin" -> new Date).asJava)).asJava)
    collection.update(criteria, action)
}

Scala 2.8 has a new mechanism for converting collection types from Scala to Java, and thankfully all the asJava calls have disappeared from our codebase. Now we have cleaner functions like this one:

def markLogin(memberId: String) = set(new MemberId(memberId).asObjectId, "lastLogin" -> new Date)

Where:

def set(id: ObjectId, keyValues: (String, Any)*) = collection.update(Map("_id" -> id), $set (keyValues: _*), false, false)

Note that we are now also using the Casbah library; the update method above is defined on Casbah’s MongoCollectionWrapper.
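
For the places where we still build BasicDBObjects directly, the 2.8 conversions on their own are enough. A minimal sketch, assuming the scala.collection.JavaConversions implicits that ship with 2.8:

import scala.collection.JavaConversions._
import com.mongodb.BasicDBObject
import org.bson.types.ObjectId
import java.util.Date

def updateLastLogin(personId: String) {
    // the Scala Maps are implicitly converted to java.util.Maps at the
    // BasicDBObject constructor calls - no explicit asJava needed
    val criteria = new BasicDBObject(Map("_id" -> new ObjectId(personId)))
    val action = new BasicDBObject(Map("$set" -> new BasicDBObject(Map("lastLogin" -> new Date))))
    collection.update(criteria, action)
}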

2. JSON

We have a Scala backend API and a Django frontend. The backend serves JSON to the frontend. We are using the Lift Json library both for serialising in the backend webapp and for de-serialising in our integration tests. Some of our tests use the Lift Json “LINQ” style of JSON parsing, while others use the “Xpath” style (both of which are described here):

LINQ style:

import net.liftweb.json.JsonParser._
import net.liftweb.json.JsonAST._
...

val (data, status) = ApiClient.get(myUri)
val json = parse(data)

val myValues = for { JField("something", JString(myValue)) <- (json \ "somethingElse") } yield myValue
myValues should have length (3)
myValues should (
    contain("frog")
    and contain("tennis")
    and contain("phone")
)

Xpath style:

scenario("succesfully view something") {
   val json = getMemberLiftJson(someId)
   assert((json \ "bar").extract[String] === "ExpectedString")
}

The Xpath style is readable and compact, and we’ve retained quite a few of these types of tests in our code base. Personally I find the LINQ style quite confusing; I need to refer to the Lift project github site every time I try to use it. Scala 2.8 now has built-in JSON support (scala.util.parsing.json) and we have been using this instead for parsing in the tests. Specifically, we have been using the parseFull method like so:

scenario("bar") {
    val (content, status) = ApiClient.get(someUri)
    JSON.parseFull(content) match {
        case Some(data : List[Any]) => {
            data.length should be(4)
            data should not contain("Internet")
        }
        case json => fail("Couldn't find a list in response %s".format(json))
    }
}

Go Scala 2.8!

Written by lizdouglass

November 1, 2010 at 9:48 pm

Posted in Uncategorized


Migrating from Maven to Simple Build Tool


A while ago I moved our Scala project build from Maven to Simple Build Tool (sbt).

Why sbt?

  • sbt is made for Scala projects. The buildfile is written in Scala and is as concise as the Buildr ones that I have worked with previously.
  • sbt has several project types including the basic and web project types. Each has multiple build tasks/actions defined and all of these can be customised.
  • sbt has support for dependencies to be declared in either Ivy or Maven configuration files. All our dependencies were already specified in pom files, so not having to migrate these (at least straight away) made the transition to sbt easier.
  • sbt compiles fast. Robert first added sbt to our project so that he could compile the code more quickly than is possible in IntelliJ.
  • sbt has support for ScalaTest – the framework that we use for all our unit and integration tests. When we were running our ScalaTest tests as part of our Maven build we found that we needed to include the word ‘Test’ somewhere in the class name. Forgetting the requirement had cost one of our project team members several hours on one occasion.
  • We can now run both our webapp and integration tests at the same time. We’d found that it wasn’t possible to configure Maven to do this. We had instead started up a Jetty server from within our integration test project in order to run the web project.
  • sbt promotes faster development. Continuous compilation, testing and redeployment mean that our work cycle is faster. This is particularly noticeable when we are working on a feature that requires changes in both our Django front end and our Scala backend: we can change the source code of both and have both projects automatically redeploy (see the session sketched just after this list).
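
For instance, with sbt 0.7’s triggered execution (the ~ prefix before an action), a console session for continuous redeployment looks something like this:

> jetty-run
> ~ prepare-webapp

jetty-run starts the webapp on the configured Jetty port, and ~ prepare-webapp recompiles and redeploys it whenever a source file changes; running ~ test in the same way re-runs the test suite on every change.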

Creating the sbt buildfile:

At the time of the migration to sbt our Scala backend was divided into 3 subprojects:

  • Core – a Scala project with ScalaTest unit tests
  • IntegrationTests – ScalaTest integration tests
  • Webapp – a basic webapp

Part 1: Declare all the sub-projects in the buildfile:

The first step in creating our build file was to declare all three subprojects. The core and integration subprojects are sbt DefaultProjects and have tasks such as compile and test defined. The webapp is an sbt WebappProject and has additional tasks such as jetty-run. Both the webapp and integration projects depend on the core project:

lazy val core = project("my-api-core", "Core", info => new DefaultProject(info))
lazy val webapp = project(webappProjectPath, "Webapp", info => new WebappProject(info), core)
lazy val integration = project("integration", "IntegrationTests", info => new DefaultProject(info), core)

Part 2: Getting the webapp running from sbt:

Although the webapp was declared as a webapp project, it wasn’t possible to run it without declaring some additional Jetty dependencies. These were specified as inline sbt dependencies in the WebappProject class, which extends the sbt DefaultWebProject (please see below). Note that the Jetty port and context path can also be overridden.

class WebappProject(info: ProjectInfo) extends DefaultWebProject(info) {

    val jetty7Webapp = "org.eclipse.jetty" % "jetty-webapp" % "7.0.2.RC0" % "test"
    val jetty7Server = "org.eclipse.jetty" % "jetty-server" % "7.0.2.RC0" % "test"

    override def jettyPort = 8069
    override def jettyContextPath = "/my-api"
}

Part 3: Getting the integration tests running from sbt:

As mentioned above we need to run our webapp project at the same time as our integration test project. This is so that our integration tests can make calls to the webapp project endpoints. In addition to this our integration tests need to have a MongoDB database populated with some test data.

1) Populating the Mongo database for testing:

All our integration test classes extend from a common trait (below). This trait populates the MongoDB database at the start of each test:

trait JsonIntegrationTest extends FeatureSpec with BeforeAndAfterEach with ShouldMatchers {
    implicit val formats = new Formats {
        val dateFormat = DefaultFormats.lossless.dateFormat
    }

    // each suite supplies the JavaScript that seeds its own test data
    def suiteData: String

    override def beforeEach = {
        IntegrationDB.init
        IntegrationDB.eval(suiteData)
    }
}

The init method populates the database. The name of the MongoDB database is taken from a configuration file:

object IntegrationDB extends Assertions {
    private val config = new ConfigurationFactory getConfiguration("my-api")
    val dbName = config.getStringProperty("mongo.db")
    val db = new Mongo().getDB(dbName)

    def init = {
        val testDataFile: java.lang.String = "mongodb/IntegrationBaseData.js"
        val testDataFileInputStream = getClass.getClassLoader.getResourceAsStream(testDataFile)
        if (testDataFileInputStream != null) {
            eval(Source.fromInputStream(testDataFileInputStream).mkString)
        } else {
        fail("a message")
        }
    }

    def eval(js: String) = {
        db.eval(js)
    }
}

We are using The Guardian Configuration project to manage the config of all of our Scala sub-projects. One of the features of this library is that it enables you to read properties from a number of sources. All of the properties for our projects are Service Domain Properties, meaning that they will be loaded from files on the classpath. The Configuration library loads the properties from whichever correctly named file it encounters first on the classpath.

As mentioned above, our IntegrationTests project has a dependency on the Core project, and therefore the config files of both projects appear on the classpath. Both projects had a properties file with the same name that specified the mongo.db property. The intention was for the integration test properties to override those in the Core project. This did not work as planned because the ordering of the two config files on the classpath could not be guaranteed. sbt does allow you to add additional items to the classpath of your project using the +++ method, and I did try to promote the integration test properties using this method (below). Unfortunately this did not guarantee classpath order either.

class IntegrationProject(info: ProjectInfo) extends DefaultProject(info) {
    val pathFinder: PathFinder = Path.lazyPathFinder(mainResourcesPath :: Nil)
    override def testClasspath = pathFinder +++ super.testClasspath
}

The work around was to set a system property called int.service.domain, which the Guardian Configuration project uses to determine the name of the properties file that should be loaded from the classpath. Our integration test project now has a properties file with a different name to the one in the core project. The test action in the IntegrationTests project calls a method to switch to the integration test properties before the tests are run.

In the IntegrationTests project definition:

val useIntegrationConfig = true

override def testAction = super.testAction dependsOn (task { setSystemProperty(useIntegrationConfig); None })

private def setSystemProperty(integrationConfiguration: Boolean) = {
    if (integrationConfiguration) System.setProperty("int.service.domain", integrationTestConfigDomain)
    else System.setProperty("int.service.domain", defaultConfigDomain)
}

2) Starting the main web application from the Integration test project:

As I mentioned above, we’d found it hadn’t been possible to start up the webapp as well as run the integration tests using Maven. With sbt it is possible. Another webapp project was declared inside the definition of the IntegrationTests project. This sbt project has a separate output path and Jetty port from the main webapp, which enables us to keep the main webapp running and run the integration tests at the same time:

class IntegrationProject(info: ProjectInfo) extends DefaultProject(info) {
    val useIntegrationConfig = true

    lazy val localJettyPort = 8071
    lazy val localWebappOutputPath: Path = "target" / "localWebappTarget"
    lazy val localWebapp = project(webappProjectPath, "IntegrationTestWebapp",
        info => new WebappProject(info, localJettyPort, localWebappOutputPath, useIntegrationConfig), core)

    override def testAction = super.testAction dependsOn (task {setSystemProperty(useIntegrationConfig); None}, localWebapp.jettyRestart)
    lazy val startLocalWebapp = localWebapp.jettyRestart
    lazy val stopLocalWebapp = localWebapp.jettyStop
}

class WebappProject(info: ProjectInfo, port: Int, targetOutputDir: Path, useIntegrationTestConfiguration: Boolean) extends DefaultWebProject(info) {
    override def outputPath = if (targetOutputDir != "default") targetOutputDir else super.outputPath

    val jetty7Webapp = "org.eclipse.jetty" % "jetty-webapp" % "7.0.2.RC0" % "test"
    val jetty7Server = "org.eclipse.jetty" % "jetty-server" % "7.0.2.RC0" % "test"

    override def jettyPort = port
    override def jettyContextPath = "/my-api"
    override def jettyRunAction = super.jettyRunAction dependsOn (task {setSystemProperty(useIntegrationTestConfiguration); None})
}

Note that the testAction for the IntegrationTests project starts up the webapp but does not shut it down after the tests have finished. I did try several techniques for getting this to happen, including the one below, but this attempt started Jetty, stopped it, and only then tried to run the tests. Please let me know if you have any better ideas 🙂

def startJettyAndTestAction = super.testAction dependsOn (localWebapp.jettyRun)
override def testAction = task { None } dependsOn (startJettyAndTestAction, localWebapp.jettyStop)

Written by lizdouglass

November 1, 2010 at 9:47 pm

Posted in Uncategorized


MongoDB


Recently I started on a project that is using some interesting technologies, including Scala, MongoDB and Django. Some are quite new to me and I’ve learnt a great deal. Here are some observations about what I’ve learnt.

MongoDB

Mongo is a schema-less, document-oriented database that stores data in binary encoded JSON documents – BSON documents. The online documentation is quite good, and there is a tutorial and an online interactive shell.

How have we been using Mongo so far?

We have two projects that interact with Mongo: one is a RESTful API back-end and the other is a tool for populating a Mongo database with data from a MySQL database. Both of these projects are written in Scala and use the Mongo Java API.

Why Mongo?

– The JSON-like documents allow us to store data in a way that is obvious, because they read like plain English. We need to store information about the users of our system, and in nearly all cases the information available is different for each person – some people have no contact phone numbers, others have 6 children. Mongo has allowed us to build up profiles for our users that only include the pieces of information we actually have for them.

– The schema-less and denormalised nature of the database means that we can modify the structure frequently. We only started this project a couple of weeks ago and have already made several quite large changes. These include how we organise the Mongo documents that we’ve been generating from a legacy MySQL database. This sort of flexibility is fantastic, especially at the beginning of a new project.

Populating the Mongo database

Our data migration project extracts data for each record in the MySQL database using a SQL query. This data is then used to create a Person domain object, which is composed of microtypes like the one below. The Person type, as well as all of these microtypes, implements the ConvertableToMongo trait:

class NextOfKin(val relationship: Relationship, val person: Person) extends ConvertableToMongo {
  def toMongoObject: DBObject = {
    new BasicDBObject(Map(
      "relationship" -> relationship.toMongoObject,
      "person" -> person.toMongoObject).asJava)
  }
}

where:

class Relationship(val description: String) extends ConvertableToMongo {
  def toMongoObject(): DBObject = {
    new BasicDBObject(Map("relationship" -> description).asJava)
  }
}

ConvertableToMongo is a trait:

trait ConvertableToMongo {
  def toMongoObject: DBObject
}

Note that we need the asJava method from the scala-javautils library to convert the Scala map to the Java map required by the Mongo API.

The ConvertableToMongo trait has a single method that returns a Mongo DBObject. These are inserted into a Mongo collection like this:

val usersCollection = mongo.getCollection("user")
members foreach(user => usersCollection insert(user toMongoObject))

The end result is a Mongo document like this one:

{
	"_id" : ObjectId("4c29f7fdbe924173a47a759f"),
	"firstName" : "Joe",
	"surname" : "Bloggs",
	"gender" : "Male",
	"nextOfKin" : {
		"relationship" : "Son",
		"person" : {
			"name" : "John Bloggs"
		}
	}
}

Note that unless one is specified, every document added to a Mongo collection is automatically assigned an ObjectId with the key _id.
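
The driver sets this generated id on the object you pass in, so it can be read straight back after an insert. A small sketch using the usersCollection from above:

val doc = new BasicDBObject(Map("firstName" -> "Joe").asJava)
usersCollection.insert(doc)
// the driver has filled in the generated _id client-side
val id = doc.get("_id")   // an ObjectId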

Written by lizdouglass

July 19, 2010 at 8:31 am

Posted in Uncategorized
