Adventures in HttpContext: All the stuff after ‘Hello, World’

Go Style Directory Layout for Scala with SBT

I’ve come to appreciate Go’s directory layout, where test and build files sit side-by-side with the source. This promotes a conscious testing priority. It also enables easy navigation between the usage of a particular class/trait/object and its implementation. After reading through the getting-better-every-day sbt documentation I noticed you can easily change the default directories for sources, alleviating the folder craziness of default projects. Simply add a few lines to your build.sbt:

//Why do I need a Scala folder? I don't!
//Set the folder for Scala sources to the root "src" folder
scalaSource in Compile := baseDirectory.value / "src"

//Do the same for the test configuration. 
scalaSource in Test := baseDirectory.value / "src"

//We'll suffix our test files with _test, so we can exclude
//them from the main build, and keep the HiddenFileFilter
excludeFilter in (Compile, unmanagedSources) := HiddenFileFilter || "*_test.scala"

//And we need to re-include them for Tests 
excludeFilter in (Test, unmanagedSources) := HiddenFileFilter

Although breaking from the norm of Java build tools may cause confusion, if you like the way something works, go for it; don’t chain yourself to past practices. I never understood the class-to-file relationship of Java sources, and I absolutely hate navigating one-item folders. Thankfully Scala improved the situation, but sbt’s Maven-like defaults are still folder-heavy. IDEs make the situation easier, but I prefer simple text editors; and to paraphrase Dan North, “Your fancy IDE is a painkiller for your shitty language”.

Running Consul on CoreOS

I’m a big fan of Consul, Hashicorp’s service discovery tool. I’ve also become a fan of CoreOS, the cluster framework for running docker containers. Even though CoreOS comes with etcd for service discovery I find the feature set of Consul more compelling. And as a programmer I know I can have my cake and eat it too.

My first take was to modify my ansible-consul fork to run consul natively on CoreOS. Although this could work, I find it defeats CoreOS’s container-first approach with fleet. Jeff Lindsay created a consul docker container which does the job well. I created two fleet service files: one for launching the consul container and another for service discovery. At first the service discovery aspect seemed weird; I tried to pass IP addresses via the --join parameter or use ExecStartPost for running the join command. However I took a cue from the CoreOS cluster setup: sometimes you need a third party to get stuff done. In this case we use the built-in etcd server to manage the join IP address that kickstarts the consul cluster.

The second fleet service file acts as a sidekick:

  • For every running consul service there’s a sidekick process
  • The sidekick process writes the current IP to a key only if that key doesn’t exist
  • The sidekick process uses the value of that key to join the cluster with docker exec
  • The sidekick process removes the key if the consul service dies.

The two service files are below, but you should tweak them for your needs.

  • You only need a 3 or 5 node server cluster. If your CoreOS deployment is large, use some form of restriction for the server nodes. You can do the same for the client nodes.
  • The discovery script could be optimized. It will try to join whatever IP address is listed in the key. This avoids a few split-brain scenarios, but needs more testing.
  • If you want DNS to work properly you need to set some Docker daemon options. Read the docker-consul README.
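
As a rough sketch only (not the exact files from the repo; the image name, etcd key, ports and flags are illustrative, and it assumes fleet template units plus CoreOS’s /etc/environment providing COREOS_PRIVATE_IPV4), the two units might look something like this:

# consul@.service (sketch): runs the docker-consul container on a node
[Unit]
Description=Consul agent
After=docker.service
Requires=docker.service

[Service]
EnvironmentFile=/etc/environment
ExecStartPre=-/usr/bin/docker kill consul
ExecStartPre=-/usr/bin/docker rm consul
ExecStartPre=/usr/bin/docker pull progrium/consul
ExecStart=/usr/bin/docker run --name consul -h %H \
  -p ${COREOS_PRIVATE_IPV4}:8301:8301 -p ${COREOS_PRIVATE_IPV4}:8301:8301/udp \
  -p ${COREOS_PRIVATE_IPV4}:8500:8500 \
  progrium/consul -server -advertise ${COREOS_PRIVATE_IPV4} -bootstrap-expect 3
ExecStop=/usr/bin/docker stop consul

[X-Fleet]
Conflicts=consul@*.service

# consul-discovery@.service (sketch): the sidekick described above
[Unit]
Description=Consul join sidekick
BindsTo=consul@%i.service
After=consul@%i.service

[Service]
EnvironmentFile=/etc/environment
# write this node's IP to the key only if the key doesn't already exist ('mk' fails otherwise)
ExecStartPre=-/usr/bin/etcdctl mk /services/consul/join ${COREOS_PRIVATE_IPV4}
# periodically (re)join whatever IP the key holds, using docker exec against the consul container
ExecStart=/bin/sh -c 'while true; do docker exec consul consul join $$(etcdctl get /services/consul/join); sleep 30; done'
# remove the key when the consul service dies so another node can take over
ExecStop=/usr/bin/etcdctl rm /services/consul/join

[X-Fleet]
MachineOf=consul@%i.service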

Clustering Akka Applications with Docker — Version 3

The SBT Native Packager plugin now offers first-class Docker support for building Scala based applications. My last post involved combining SBT Native Packager, SBT Docker, and a custom start script to launch our application. We can simplify the process in two ways:

  1. Although the SBT Docker plugin allows for better customization of Dockerfiles it’s unnecessary for our use case. SBT Native Packager is enough.
  2. A separate start script was required for IP address inspection so TCP traffic can be routed to the actor system. I recently contributed an update for better ENTRYPOINT support within SBT Native Packager which gives us options for launching our app in a container.

With this PR we can now add our IP address inspection snippet to our build, removing the need for extraneous files. We could have added this snippet to bashScriptExtraDefines, but that is a global change, requiring /sbin/ifconfig eth0 to be available wherever the application is run. This is definitely infrastructure bleed-out and must be avoided.

The new code, on GitHub, uses a shell with ENTRYPOINT exec mode to set our environment variable before launching the application:

dockerExposedPorts in Docker := Seq(1600)

dockerEntrypoint in Docker := Seq("sh", "-c", "CLUSTER_IP=`/sbin/ifconfig eth0 | grep 'inet addr:' | cut -d: -f2 | awk '{ print $1 }'` bin/clustering $*")

The $* allows command-line parameters to be honored when launching the container. Because the app leverages the Typesafe Config library we can also set configuration via Java system properties:

docker run --rm -i -t --name seed mhamrah/clustering:0.3 -Dclustering.cluster.name=example-cluster

Launching the cluster is exactly as before:

docker run --rm -d --name seed mhamrah/clustering:0.3
docker run --rm -d --name member1 --link seed:seed mhamrah/clustering:0.3

For complex scripts it may be too messy to overload the ENTRYPOINT sequence. For those cases simply bake your own docker container as a base and use the ENTRYPOINT approach to call out to your script. SBT Native Packager will still upload all your dependencies and its bash script to /opt/docker/bin/<your app>. The Docker WORKDIR is set to /opt/docker so you can drop the /opt/docker prefix, as above.
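
If you go that route, the idea (sketched below with made-up image and script names, using the plugin’s dockerBaseImage setting) is to put the helper script in the base image and keep the entrypoint itself short:

//Sketch: a custom base image that already contains an ip-inspection helper script
dockerBaseImage in Docker := "mycompany/java8-cluster-base"

//Call out to the helper instead of inlining the ifconfig pipeline
dockerEntrypoint in Docker := Seq("sh", "-c", "CLUSTER_IP=$(/usr/local/bin/cluster-ip.sh) bin/clustering $*")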

Accelerate Team Development with your own SBT Plugin Defaults

My team manages several Scala services built with SBT. The setup of these projects is very similar: the same plugins, dependencies, and build-and-deploy configurations. At first we simply copied and pasted these settings across projects, but as the number of services increased the hunt-and-change strategy became laborious. Time to optimize.

I had heard of a few teams that created their own sbt plugins for default settings, but couldn’t find information on how this looked. The recent change to AutoPlugins also made much of the existing documentation out of date. I found Will Sargent’s excellent post on writing an sbt plugin helpful, but it wasn’t what I was looking for: I wanted a plugin which includes other plugins and sets defaults for those plugins. The goal is to “drop in” this plugin and automatically have a set of defaults: sbt-native-packager included, sbt-release configured, and our Nexus artifact server good to go.

File Locations

As an sbt refresher, anything in the project/ folder relates to the build. If you want to develop your own plugin just for the current project, you can simply add your .scala files to project/. If you want to develop your own plugin as a standalone project, you put those files in the src/ directory as usual. I mistakenly thought an sbt plugin project only required files in the project/ folder. Silly me.

SBT Builds

It’s important to note that the project folder–and the build itself–is separate from how your source code is built. sbt uses Scala 2.10, so anything in the project/ folder will be built against 2.10 even if your project is set to 2.11. Thus, when developing your plugin, use Scala 2.10 to match sbt.
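
For example, the plugin project’s own build.sbt might look roughly like this (names and versions are illustrative):

//Mark this project as an sbt plugin
sbtPlugin := true

name := "plugin-defaults"

organization := "com.hamrah"

//Match sbt's Scala version, per the note above
scalaVersion := "2.10.4"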

Dependencies

Usually when you include a plugin you specify it in project/plugins.sbt, right? But what if you’re developing a plugin that uses other plugins? Your code is in src/, so it won’t pick up anything in project/ as that only relates to your build. So you need to add whatever plugin you want as a dependency in your build so it’s available within your project, just like any other dependency. But there’s a trick with sbt plugins. Originally I had the usual in build.sbt:

libraryDependencies += "com.typesafe.sbt" % "sbt-native-packager" % "0.8.0-M2"

but kept getting unresolved dependency errors. This made no sense to me as the plugin is clearly available. It turns out that if you want to include an sbt plugin as a project dependency you need to specify it in a special way, explicitly setting the sbt and Scala versions you want:

libraryDependencies += sbtPluginExtra("com.typesafe.sbt" % "sbt-native-packager" % "0.8.0-M2", sbtV = "0.13", scalaV = "2.10")

With that, your dependency will resolve and you can use anything under sbt-native-packager when developing your plugin.

Specifying your Plugin Defaults

With your separate project and dependencies satisfied, you can now create your plugin which uses other plugins and sets defaults specific to you. This part is easy and follows the usual documentation: declare an object which extends AutoPlugin and override projectSettings or buildSettings. This code looks exactly like it would if you were setting things manually in your build.

For instance, here’s how we’d set the java_server archetype as the default in our plugin:

package com.hamrah.plugindefaults

import sbt._
import Keys._
import com.typesafe.sbt.SbtNativePackager._

object PluginDefaults extends AutoPlugin {
 override lazy val projectSettings = packageArchetype.java_server
}

You can concatenate any other settings you want onto projectSettings, like scalaVersion, scalacOptions, etc.
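
For example, a sketch of appending a few more defaults (the version numbers here are just illustrative):

object PluginDefaults extends AutoPlugin {
  override lazy val projectSettings = packageArchetype.java_server ++ Seq(
    scalaVersion := "2.11.2",
    scalacOptions ++= Seq("-feature", "-deprecation", "-unchecked")
  )
}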

Using the Plugin

You can build and publish your plugin to a repo and include it like you would any other plugin. Or you can include it locally for testing by putting this in your sbt file:

lazy val root = project.in( file(".") ).dependsOn( defaultPluginSettings )
lazy val defaultPluginSettings = uri("file:///<full path to your plugin directory>")

Your default settings can be explicitly added to your project, if they aren’t applied automatically, with a simple:

PluginDefaults.projectSettings
//or
settings = PluginDefaults.projectSettings // in a .scala file

In Closing

As an FYI, there may be better ways to do this. A lot of the above was trial and error, but it works. If you have feedback or better suggestions please leave a comment!

Accessing the Docker Host Server Within a Container

Docker links are a great way to link two containers together but sometimes you want to know more about the host and network from within a container. You have a couple of options:

  • You can access the Docker host by the container’s gateway.
  • You can access the Docker host by its ip address from within a container.

The Gateway Approach

This GitHub Issue outlines the solution. Essentially you’re using netstat to parse the gateway the docker container uses to access the outside world. This is the docker0 bridge on the host.

As an example, we’ll run a simple docker container which returns the hostname of the container on port 8080:

docker run -d -p 8080:8080 mhamrah/mesos-sample

Next we’ll run /bin/bash in another container to do some discovery:

docker run -i -t ubuntu /bin/bash
#once in, install curl:
apt-get update
apt-get install -y curl

We can use the following command to pull out the gateway from netstat:

netstat -nr | grep '^0\.0\.0\.0' | awk '{print $2}'
#returns 172.17.42.1 for me.

We can then curl our other docker container, and we should get that docker container’s hostname:

curl 172.17.42.1:8080
# returns 00b019ce188c

Nothing exciting, but you get the picture: it doesn’t matter that the service is inside another container; we’re accessing it via the host, and we didn’t need to use links. We just needed to know the port the other service was listening on. If you had a service running on some other port, say Postgres on 5432, not running in a Docker container, you could access it via 172.17.42.1:5432.

If you have docker installed in your container you can also query the docker host:

# In a container with docker installed, list the other containers running on the host:
docker -H tcp://172.17.42.1:2375 ps
CONTAINER ID        IMAGE                         COMMAND                CREATED              STATUS              PORTS                     NAMES
09d035054988        ubuntu:14.04                  /bin/bash              About a minute ago   Up About a minute   0.0.0.0:49153->8080/tcp   angry_bardeen
00b019ce188c        mhamrah/mesos-sample:latest   /opt/delivery/bin/de   8 minutes ago        Up 8 minutes        0.0.0.0:8080->8080/tcp    suspicious_colden

You can use this for some hacky service discovery.

The IP Approach

The gateway approach is great because you can figure out how to access the host entirely from within a container. You also have the same access via the host’s IP address. I’m using boot2docker, whose IP address is 192.168.59.103, so I can accomplish the same tasks as with the gateway approach:

# Docker processes, via ip:
docker -H tcp://192.168.59.103:2375 ps
# Other docker containers, via ip:
curl 192.168.59.103:8080

Although there’s no way to introspect the host’s ip address (AFAIK) you can pass this in via an environment variable:

docker@boot2docker:~$  docker run -i -t -e DOCKER_HOST=192.168.59.103 ubuntu /bin/bash
root@07561b0607f4:/# env
HOSTNAME=07561b0607f4
DOCKER_HOST=192.168.59.103
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/

If the container knows the IP address of its host, it can broadcast this out to other services via the container’s application. This is useful for service discovery tools run from within a container, where you want to advertise the host IP so other services can find you.

Service Discovery Options with Marathon and Deimos

I’ve become a fan of Mesos and Marathon: combined with Deimos you can create a DIY PaaS for launching and scaling Docker containers across a number of nodes. Marathon supports a bare-bones service-discovery mechanism through its task API, but it would be nice for containers to register themselves with a service discovery tool. To achieve this, containers need to know their host IP address and the port Marathon assigned them, so they can tell other interested services where they can be found.

Deimos allows default parameters to be passed in when executing docker run, and Marathon adds assigned ports to a container’s environment variables. If a container has this information it can register itself with a service discovery tool.

Here we assign the host’s IP address as a default run option in our Deimos config file.

#/etc/deimos.cfg
[containers.options]
append: ["-e", "HOST_IP=192.168.33.12"]

Now let’s launch our mesos-sample container to our Mesos cluster via Marathon:

// Post to http://192.168.33.12/v2/apps
{
  "container": {
    "image": "docker:///mhamrah/mesos-sample"
  },
  "cpus": "1",
  "id": "www",
  "instances": 1,
  "mem": 512,
  "ports": [0],
  "uris": [],
  "cmd": ""
}

Once our app is launched, we can inspect all the environment variables in our container with the /env endpoint from mhamrah/mesos-sample:

curl http://192.168.33.12:31894/env
[ {
  "HOSTNAME" : "a4305981619d"
}, {
  "PORT0" : "31894"
}, {
  "PATH" : "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
}, {
  "PWD" : "/tmp/mesos-sandbox"
}, {
  "PORTS" : "31894"
}, {
  "HOST_IP" : "192.168.33.12"
}, {
  "PORT" : "31894"
}]

With this information some startup script could use the PORT (or PORT0) and HOST_IP to register itself for direct point-to-point communication in a cluster.
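
For example, a hypothetical wrapper script (not from the original post) could register the task with a Consul agent assumed to be listening on the host, then hand off to the real command:

#!/bin/sh
# Hypothetical entrypoint wrapper: register this task using the Marathon/Deimos
# supplied HOST_IP and PORT0 variables, then exec the real application.
curl -s -X PUT "http://${HOST_IP}:8500/v1/agent/service/register" \
  -d "{\"Name\": \"www\", \"Address\": \"${HOST_IP}\", \"Port\": ${PORT0}}"

exec "$@"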

Setting up a Multi-Node Mesos Cluster running Docker, HAProxy and Marathon with Ansible

UPDATE

With Mesos 0.20 Docker support is now native, and Deimos has been deprecated. The ansible-mesos-playbook has been updated appropriately, and most of this blog post still holds true. There are slight variations with how you post to Marathon.

The Google Omega paper has given birth to cloud vNext: cluster schedulers managing containers. You can make a bunch of nodes appear as one big computer and deploy anything to your own private cloud; just like Docker, but across any number of nodes. Google’s Kubernetes, Flynn, Fleet and Apache Mesos (born at UC Berkeley and battle-tested at Twitter) are implementations of the Omega idea, with the goal of abstracting away discrete nodes and optimizing compute resources. Each implementation has its own tweak, but they all follow the same basic setup: leaders for coordination and scheduling; some service discovery component; some underlying cluster tool (like ZooKeeper); and followers for processing.

In this post we’ll use Ansible to install a multi-node Mesos cluster using packages from Mesosphere. Mesos, as a cluster framework, allows you to run a variety of cluster-enabled software, including Spark, Storm and Hadoop. You can also run Jenkins, schedule tasks with Chronos, and even run ElasticSearch and Cassandra without having to dedicate specific servers to them. We’ll also set up Marathon for running services, with Deimos support for Docker containers.

Mesos, even with Marathon, doesn’t offer the holistic integration of some other tools, namely Kubernetes, but at this point it’s easier to set up on your own set of servers. Although the space is young, Mesos is one of the oldest projects of the group and allows more of a DIY approach to service composition.

TL;DR

The playbook is on GitHub; just follow the readme! If you simply want to try out Mesos, Marathon, and Docker, Mesosphere has an excellent tutorial to get you started on a single node. This post outlines the creation of a more complex multi-node setup.

System Setup

The system is divided into two parts: a set of masters, which handle scheduling and task distribution, and a set of slaves, which provide compute power. Mesos uses ZooKeeper for cluster coordination and leader election. A key component is service discovery: you don’t know which host or port will be assigned to a task, which makes, say, accessing a website running on a slave difficult. The Marathon API allows you to query task information, and we use this feature to configure HAProxy frontend/backend resources.

Our masters run:

  • Zookeeper
  • Mesos-Master
  • HAProxy
  • Marathon

and our slaves run:

  • Mesos-Slave
  • Docker
  • Deimos, the Mesos -> Docker bridge

Ansible

Ansible works by running a playbook, composed of roles, against a set of hosts, organized into groups. My Ansible-Mesos-Playbook on GitHub has an example hosts file with some EC2 instances listed. You should be able to replace these with your own EC2 instances, or your own private instances, running Ubuntu 14.04. Ansible allows us to pass node information around so we can properly set up the masters and the ZooKeeper ensemble, point slaves to the masters, and configure Marathon for high availability.

We want at least three servers in our master group for a proper zookeeper quorum. We use host variables to specify the zookeeper id for each node.

[mesos_masters]
ec2-54-204-214-172.compute-1.amazonaws.com zoo_id=1
ec2-54-235-59-210.compute-1.amazonaws.com zoo_id=2
ec2-54-83-161-83.compute-1.amazonaws.com zoo_id=3

The mesos-ansible playbook will use nodes in the mesos_masters group for a variety of configuration options. First, /etc/zookeeper/conf/zoo.cfg will list all master nodes, with /etc/zookeeper/conf/myid set appropriately. It will also set up upstart scripts in /etc/init/mesos-master.conf and /etc/init/mesos-slave.conf, with default configuration files in /etc/defaults/mesos.conf. Mesos 0.19 supports external executors, so we use Deimos to run docker containers. This is only required on slaves, but the configuration options are set in the shared /etc/defaults/mesos.conf file.

Marathon and HAProxy

The playbook leverages an ansible-marathon role to install a custom build of Marathon with Deimos support. If Mesos is the OS for the data center, Marathon is the init system. Marathon allows us to HTTP POST new tasks, containing docker container configurations, which will run on Mesos slaves. With HAProxy we can use the masters as a load-balancing proxy, routing traffic from known hosts (the masters) to whatever node/port is running the Marathon task. HAProxy is configured via a cron job running a custom bash script. The script queries the Marathon API and routes to the appropriate backend by matching a host header prefix to the Marathon job name.
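
The actual script ships with the playbook; as a rough sketch of the idea (assuming jq is installed and Marathon is reachable on localhost:8080), it walks the Marathon API and emits one HAProxy backend per app:

#!/bin/bash
# Sketch: generate HAProxy backend stanzas from Marathon's task list.
MARATHON="http://localhost:8080"

for app in $(curl -s "$MARATHON/v2/apps" | jq -r '.apps[].id'); do
  name="${app#/}"   # strip any leading slash from the app id
  echo "backend ${name}_backend"
  # one server line per running task: the slave host plus its assigned port
  curl -s "$MARATHON/v2/apps/$name/tasks" |
    jq -r '.tasks[] | "  server \(.id) \(.host):\(.ports[0]) check"'
done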

Mesos Followers (Slaves)

The slaves are pretty straightforward. We don’t need any host variables, so we just list whatever slave nodes you’d like to configure:

[mesos_slaves]
ec2-54-91-78-105.compute-1.amazonaws.com
ec2-54-82-227-223.compute-1.amazonaws.com 

Mesos-Slave will be configured with Deimos support.

The Result

With all this set up you can point a wildcard domain name, say *.example.com, at all of your master node IP addresses. If you launch a task like “www” you can visit www.example.com and you’ll hit whatever server is running your application. Let’s try launching a simple web server which returns the docker container’s hostname:

POST to one of our masters:

POST /v2/apps

{
  "container": {
    "image": "docker:///mhamrah/mesos-sample"
  },
  "cpus": ".25",
  "id": "www",
  "instances": 4,
  "mem": 512,
  "ports": [0],
  "uris": []
}

We run four instances, each allocating 25% of a CPU, with an application name of www. If we hit www.example.com, we’ll get the hostname of the docker container running on whatever slave node is hosting the task. Deimos will inspect whatever ports are EXPOSEd in the docker container and assign a port for Mesos to use. Even though the config script only works on port 80 you can easily reconfigure it for your own needs.

To view marathon tasks, simply go to one of your master hosts on port 8080. Marathon will proxy to the correct master. To view mesos tasks, navigate to port 5050 and you’ll be redirected to the appropriate master. You can also inspect the STDOUT and STDERR of Mesos tasks.

Notes

In my testing I noticed that, on rare occasions, the cluster didn’t have a leader or Marathon wasn’t running. You can simply restart zookeeper, mesos, or marathon via ansible:

#Restart Zookeeper
ansible mesos_masters -a "sudo service zookeeper restart"

There’s a high probability something won’t work; check the logs, as it took me a while to get things working. Grepping /var/log/syslog will help, along with /var/log/upstart/mesos-master.conf, mesos-slave.conf and marathon.conf, and the logs under /var/log/mesos/.

What’s Next

Cluster schedulers are an exciting tool for running production applications. It’s never been easier to build, package and deploy services on public, private clouds or bare metal servers. Mesos, with Marathon, offers a cool combination for running docker containers–and other mesos-based services–in production. This Twitter U video highlights how OpenTable uses Mesos for production. The HAProxy approach, albeit simple, offers a way to route traffic to the correct container. HAProxy will detect failures and reroute traffic accordingly.

I didn’t cover inter-container communication (say, a website requiring a database), but you can use your service-discovery tool of choice to solve the problem. The Mesos-Master nodes provide good “anchor points”: known locations to look things up, and you can always query the Marathon API for service discovery. Ansible provides a way to automate the install and configuration of mesos-related tools across multiple nodes so you can have a serious mesos-based platform for testing or production use.

Akka Clustering with SBT-Docker and SBT-Native-Packager

Since my last post on akka clustering with docker containers a new plugin, SBT-Docker, has emerged which allows you to build docker containers directly from SBT. I’ve updated my akka-docker-cluster-example to leverage these two plugins for a smoother docker build experience.

One Step Build

The approach is basically the same as the previous example: we use SBT Native Packager to gather up the appropriate dependencies, upload them to the docker container, and create the entrypoint. I decided to keep the start script approach to “prep” any environment variables required before launching. With SBT Docker linked to Native Packager all you need to do is fire

docker

from sbt and you have a docker container ready to launch or push.

Understanding the Build

SBT Docker requires a dockerfile defined in your build. I want to pass in artifacts from native packager to docker. This allows native packager to focus on application needs while docker is focused on docker. Docker turns into just another type of package for your app.

We can pass in arguments by mapping the appropriate parameters to a function which returns the Dockerfile. In build.sbt:

// Define a dockerfile, using parameters from native-packager
dockerfile in docker <<= (name, stagingDirectory in Universal) map {
  case(appName, stageDir) =>
    val workingDir = s"/opt/${appName}"
    new Dockerfile {
      //use java8 base box
      from("relateiq/oracle-java8")
      maintainer("Michael Hamrah")
      //expose our akka port
      expose(1600)
      //upload native-packager staging directory files
      add(stageDir, workingDir)
      //make files executable
      run("chmod", "+x", s"/opt/${appName}/bin/${appName}")
      run("chmod", "+x", s"/opt/${appName}/bin/start")
      //set working directory
      workDir(workingDir)
      //entrypoint into our start script
      entryPointShell(s"bin/start", appName, "$@")
    }
}
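
For reference, the docker/start script wired into that entrypoint lives in the example repo; roughly (a sketch, not the exact file), it resolves the container’s IP and exports it before launching the app that native-packager staged:

#!/bin/bash
# docker/start (sketch): entryPointShell passes the app name as $1 and any
# container arguments after it. Resolve the container IP for Akka remoting,
# export it, then hand off to the native-packager launch script under bin/.
APP_NAME=$1
shift

CLUSTER_IP=$(/sbin/ifconfig eth0 | grep 'inet addr:' | cut -d: -f2 | awk '{ print $1 }')
export CLUSTER_IP

exec "bin/${APP_NAME}" "$@"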

Linking SBT Docker to SBT Native Packager

Because we’re relying on Native Packager to assemble our runtime dependencies we need to ensure the native packager files are “staged” before docker tries to upload them. Luckily it’s easy to create dependencies with SBT. We simply have docker depend on the native packager’s stage task:

docker <<= docker.dependsOn(com.typesafe.sbt.packager.universal.Keys.stage.in(Compile))

Adding Additional Files

The last step is to add our start script to the native packager build. Native packager has a mappings key where we can add files to our package. I kept the start script in the docker folder and I want it in the bin directory within the docker container. Here’s the mapping:

mappings in Universal += baseDirectory.value / "docker" / "start" -> "bin/start"

With this setting everything will be assembled as needed and we can package to any type we want. Setting up a cluster with docker is the same as before:

docker run --name seed -i -t clustering
docker run --name c1 --link seed:seed -i -t clustering

It’s interesting to note SBT Native Packager also has docker support, but it’s undocumented and doesn’t allow granular control over the Dockerfile output. Until SBT Native Packager fully supports docker output the SBT Docker plugin is a nice tool to package your sbt-based apps.

Spray Directives: Creating Your Own, Simple Directive

The spray-routing package provides an excellent DSL for creating RESTful APIs with Scala and Akka. This DSL is powered by directives: small building blocks you compose to filter, process and transform requests and responses for your API. Building your own directives lets you create reusable components and better organize your application.

I recently refactored some code in a Spray API to leverage custom directives. The Spray documentation provides a good reference on custom directives but I found myself getting hung up in a few places.

As an example we’re going to write a custom directive which produces a UUID for each request. Here’s how I want to use this custom directive:

generateUUID { uuid =>
  path("foo") {
   get {
     //log the uuid, pass it to your app, or maybe just return it
     complete { uuid.toString }
   }
  }
}

Usually you leverage existing directives to build custom directives. I (incorrectly) started with the provide directive to provide a value to an inner route:

import spray.routing._
import java.util.UUID
import Directives._

trait UuidDirectives {
  def generateUuid: Directive1[UUID] = {
    provide(UUID.randomUUID)
  }
}

Before I explain what’s wrong, let’s dig into the code. First, generateUuid is a function which returns a Directive1 wrapping a UUID value. Directive1[UUID] is just a type alias for Directive[UUID :: HNil]. Directives are centered around a feature of the shapeless library called heterogeneous lists, or HLists. An HList is simply a list, but each element in the list can have its own specific type. Instead of a generic List[Any], your list can be composed of, say, a String, an Int, another String and a UUID. The first element of this list is a String, not an Any, and the second is an Int, with all the features of an Int. In the directive above I just have an HList with one element: UUID. If I write Directive[UUID :: String :: HNil] I have a two-element list of UUID and String, and the compiler will throw an error if I try to use this directive with anything other than a UUID and a String. HLists sound like a lightweight case class, but with an HList you get a lot of list-like features. HLists let the compiler do the heavy lifting of type safety, so you have strongly-typed functions to compose together.
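
To make that concrete, here’s a tiny shapeless example (independent of Spray) showing how an HList keeps a distinct type for each element:

import shapeless._

//A two-element HList: the compiler knows element 0 is a String and element 1 is an Int
val record: String :: Int :: HNil = "www" :: 8080 :: HNil

val host: String = record.head      //statically a String, not an Any
val port: Int = record.tail.head    //statically an Int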

Provide is a directive which (surprise surprise) will provide a value to an inner route. I thought this would be perfect for my directive, and the corresponding test ensures it works:

import org.scalatest._
import org.scalatest.matchers._
import spray.testkit.ScalatestRouteTest
import spray.http._
import spray.routing.Directives._

class UuidDirectivesSpec
  extends FreeSpec
  with Matchers
  with UuidDirectives
  with ScalatestRouteTest {

  "The UUID Directive" - {
    "can generate a UUID" in {
      Get() ~> generateUuid { uuid => complete(uuid.toString) } ~> check  {
        responseAs[String].size shouldBe 36
      }
    }
  }
}

But there’s an issue! Spray directives are classes which are composed into the route structure when they’re instantiated via their apply() functions. The Spray docs on understanding the DSL structure explain it best, but in summary, generateUuid will only be called once when the routing tree is built, not on every request.

A better unit test shows the issue:

"will generate different UUID per request" in {
      //like the runtime, instantiate route once
      val uuidRoute =  generateUuid { uuid => complete(uuid.toString) }

      var uuid1: String = ""
      var uuid2: String = ""
      Get() ~> uuidRoute ~> check  {
        responseAs[String].size shouldBe 36
        uuid1 = responseAs[String]
      }
      Get() ~> uuidRoute ~> check  {
        responseAs[String].size shouldBe 36
        uuid2 = responseAs[String]
      }
      //fails!
      uuid1 shouldNot equal (uuid2)
    }
  }

The fix is simple: we need to use the extract directive, which takes a function of the current RequestContext and so is evaluated on every request. For our UUID directive we don’t need anything from the request, just a function which is run each time:

trait UuidDirectives {
  def generateUuid: Directive[UUID :: HNil] = {
    extract(ctx =>
        UUID.randomUUID)
  }
}

With our randomUUID call wrapped in an extract directive we have a unique call per request, and our tests pass!

In a follow-up post we’ll add some more complexity to our custom directive; stay tuned!

Spray Directives: Custom Directives, Part Two: flatMap

Our last post covered custom Spray Directives. We’re going to expand our UUID directive a little further. Generating a unique ID per request is great, but what if we want the client to pass in an existing unique identifier to act as a correlation id between systems?

We’ll modify our existing directive by checking to see if the client supplied a correlation-id request-header using the existing optionalHeaderValueByName directive:

def generateUuid: Directive[UUID :: HNil] = {
    optionalHeaderValueByName("correlation-id") {
      case Some(value) => provide(UUID.fromString(value))
      case None => provide(UUID.randomUUID)
    }
  }

Unfortunately this code doesn’t compile! We get an error because Spray is looking for a Route, which is a function of RequestContext => Unit:

[error]  found   : spray.routing.Directive1
[error]     (which expands to)  spray.routing.Directive[shapeless.::]
[error]  required: spray.routing.RequestContext => Unit
[error]       case Some(value) => provide(UUID.fromString(value))

What do we do? flatMap comes to the rescue. Here’s the deal: we need to transform one directive (optionalHeaderValueByName) into another directive (one that provides a UUID). We do this by using flatMap to focus on the value inside the first directive (the Option returned by optionalHeaderValueByName) and return a new directive providing another value (the UUID). With flatMap we are basically “repackaging” one value into another package.

Here’s the updated code which properly compiles:

def generateUuid: Directive[UUID :: HNil] = {
    //use flatMap to match on the Option returned and provide
    //a new value
    optionalHeaderValueByName("correlation-id").flatMap {
      case Some(value) => provide(UUID.fromString(value))
      case None => provide(UUID.randomUUID)
    }
  }

and the test:

"can extract a uuid value from the header" in {
      val uuid = java.util.UUID.randomUUID.toString

      Get() ~> addHeader("correlation-id", uuid) ~> uuidRoute ~> check {
        responseAs[String] shouldEqual uuid
      }
    }

There’s a small tweak we’ll make to our UUID directive to show another example of directive composition. If the client doesn’t supply a UUID and we call generateUuid multiple times, we’ll get different UUIDs for the same request. This defeats the purpose of a single correlation id, and prevents us from extracting the same UUID multiple times per request. A failing test shows the issue:

"can extract the same uuid twice per request" in {
      var uuid1: String =""
      var uuid2: String = ""
      Get() ~> generateUuid { uuid =>
        {
          uuid1 = uuid.toString
          generateUuid { another =>
            uuid2 = another.toString
            complete("")
          }
        }
      } ~> check {
        //fails
        uuid1 shouldEqual uuid2
      }
    }

To fix the issue, if we generate a UUID, we will add it to the request header as if the client supplied it. We’ll use the mapRequest directive to add the generated UUID to the header.

def generateUuid: Directive[UUID :: HNil] = {
    optionalHeaderValueByName("correlation-id").flatMap {
      case Some(value) => provide(UUID.fromString(value))
      case None =>
        val id = UUID.randomUUID
        mapRequest(r => r.withHeaders(r.headers :+ RawHeader("correlation-id", id.toString))) & provide(id)
    }
  }

In my first version I had the mapRequest call and the provide call on separate lines (there was no &). mapRequest was never being applied, because the directive it creates was not returned as part of the result; only the provide directive was returned. We need to “merge” these two directives with the & operator. mapRequest extracts nothing, returning a Directive0 (a Directive with an empty HList), so combining it with provide yields a Directive1[UUID], which is exactly what we want.
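
For reference, the first (non-working) version looked roughly like this; the mapRequest directive is constructed but immediately discarded, since only the final expression of the case is returned:

def generateUuid: Directive[UUID :: HNil] = {
    optionalHeaderValueByName("correlation-id").flatMap {
      case Some(value) => provide(UUID.fromString(value))
      case None =>
        val id = UUID.randomUUID
        //this directive is never combined with the result, so the header is never added
        mapRequest(r => r.withHeaders(r.headers :+ RawHeader("correlation-id", id.toString)))
        provide(id)
    }
  }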