Easy Scaling with Fleet and CoreOS

Apr 10 2015

One element of a successful production deployment is the ability to easily scale the number of instances your process is running. Many cloud providers, on both the PaaS and IaaS fronts, offer such functionality: AWS Auto Scaling Groups, Heroku’s process size, Marathon’s instance count. I was hoping for something similar in the CoreOS world. Deis, the PaaS-on-CoreOS service, offers Heroku-like scaling, but I don’t want to commit to the Deis layer or its buildpack approach (for no other reason than personal preference). Fleet, CoreOS’s distributed systemd service, offers service templating, but you cannot say “now run three instances of service x”. Being programmers, we can do whatever we want, and luckily we’re only a little bash script away from replicating the “scale to x instances” functionality of popular providers.

You’ll want to enable the Fleet HTTP API for this script to work. You can easily port this to the Fleet CLI, but I much prefer the HTTP API because it doesn’t involve ssh and gives you more flexibility in how and where you run the script.

Conceptually the flow is straightforward:

Given a process we want to set the number of running instances to some desired_count.
If desired_count is less than current_count, scale down.
If desired_count is more than current_count, scale up.
If they are the same, do nothing.
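As a minimal sketch of that decision, assuming both counts are already known as whole numbers (the function name scale_action is made up for illustration):

```shell
# Hypothetical helper: prints which action the flow above calls for.
scale_action() {
  local desired_count=$1 current_count=$2
  if (( desired_count < current_count )); then
    echo "scale down"
  elif (( desired_count > current_count )); then
    echo "scale up"
  else
    echo "do nothing"
  fi
}

scale_action 3 5   # prints "scale down"
```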

Fleet offers service templating, so you can have a service unit named my_awesome_app@.service with specific copies named my_awesome_app@1, my_awesome_app@2, my_awesome_app@N representing specific running instances. Currently Fleet doesn’t offer a way to group these related services together, but we can easily pattern match on the service name to control specific running instances. The steps are straightforward:

Query the Fleet API for all instances
Filter by all services matching the specified name
See how many instances we have running for the given service
Destroy or create instances using specific service names until we match the desired_size.
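The filtering and counting steps can be tried without touching the API at all. The sketch below assumes unit names arrive one per line and that the service name contains no regex metacharacters; filter_instances and the sample names are hypothetical:

```shell
# Hypothetical helper: reads unit names on stdin, one per line, and keeps
# only numbered instances of the given service (e.g. my_awesome_app@2.service).
# The bare template (my_awesome_app@.service) is filtered out.
filter_instances() {
  local service=$1
  grep -E "^${service}@[0-9]+\.service$"
}

# Sample unit names standing in for what the Fleet API might return.
printf '%s\n' \
  'my_awesome_app@.service' \
  'my_awesome_app@1.service' \
  'my_awesome_app@2.service' \
  'other_app@1.service' | filter_instances my_awesome_app
```

Piping the result through wc -l gives the current instance count for the service.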

All of these steps are easily achievable with Fleet’s HTTP API (or fleetctl) and a little bash. To give our script some context, let’s start with how we want to use it. Ideally it will look like this:

./scale-fleet my_awesome_app 5

First, let’s set up our script scale-fleet and set the command line arguments:

#!/bin/bash

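# Placeholder: replace with your Fleet API host
FLEET_HOST="<YOUR FLEET API HOST>"

# You may want to consider cli flags
# SERVICE is the base name of the templated unit, e.g. my_awesome_app
SERVICE=$1
DESIRED_SIZE=$2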
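Short of full cli flags, a small check on the positional arguments can catch mistakes before any API call is made. This is only a sketch; is_count and validate_args are hypothetical helper names, not part of Fleet:

```shell
# Hypothetical helper: succeeds only for a non-negative whole number.
is_count() {
  [[ $1 =~ ^[0-9]+$ ]]
}

# Hypothetical helper: checks the two positional arguments the script expects.
validate_args() {
  if [[ $# -ne 2 ]] || ! is_count "$2"; then
    echo "usage: scale-fleet <service_name> <desired_size>" >&2
    return 1
  fi
}
```

Calling validate_args "$@" || exit 1 near the top of the script would stop early on a missing or non-numeric size.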

Next we want to query the Fleet API and filter on all units that have the service name as a prefix and carry a process number. This gives us an array of units matching my_awesome_app@N.service, not the base template my_awesome_app@.service. These are the units we will either add to or destroy as appropriate. The latest 1.5 version of jq supports regular expressions, but as of this writing 1.4 is the common release version, so we’ll parse the JSON response with jq and then filter with grep. Finally, some bash trickery will parse the result into an array we can loop through later.

# Curl the API and filter on a specific pattern, storing results in an array.
# jq -r prints raw names without quotes; grep keeps only numbered instances.
INSTANCES=($(curl -s "$FLEET_HOST/fleet/v1/units" | jq -r ".units[].name | select(startswith(\"$SERVICE@\"))" | grep -E '@[0-9]+\.service$'))

# A bash trick to get size of array
CURRENT_SIZE=${#INSTANCES[@]}
echo "Current instance count for $SERVICE is: $CURRENT_SIZE"

Next let’s scaffold the various scenarios for matching CURRENT_SIZE with DESIRED_SIZE, which boils down to some if statements.

# Use -eq and -lt so the sizes are compared as integers, not strings
if [[ $DESIRED_SIZE -eq $CURRENT_SIZE ]]; then
  echo "doing nothing, current size is equal to desired size"
elif [[ $DESIRED_SIZE -lt $CURRENT_SIZE ]]; then
  echo "going to scale down instance $CURRENT_SIZE"
  # More stuff here
else
  echo "going to scale up to $DESIRED_SIZE"
  # More stuff here
fi
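When comparing sizes like this in bash, it matters whether the comparison is lexical or numeric: inside [[ ]], the < operator compares strings, while -lt compares integers. A small sketch (the two function names are made up for illustration):

```shell
# Inside [[ ]], < compares strings; -lt compares integers.
string_less()  { [[ $1 < $2 ]]; }
integer_less() { [[ $1 -lt $2 ]]; }

# As strings, "10" sorts before "9" because "1" sorts before "9".
if string_less 10 9; then echo "string compare: 10 < 9"; fi

# As integers, 10 is not less than 9.
if ! integer_less 10 9; then echo "integer compare: 10 is not less than 9"; fi
```

With string comparison, a request to go from 9 instances to 10 would be treated as a scale down.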

When the desired size equals the current size we don’t need to do anything. Scaling down is easy: we simply loop, deleting the highest-numbered instance each time, until the desired and current sizes match. You can drop in the following snippet for scaling down:

  until [[ $DESIRED_SIZE -eq $CURRENT_SIZE ]]; do
    # Delete the highest-numbered instance, then count down
    curl -X DELETE "$FLEET_HOST/fleet/v1/units/${SERVICE}@${CURRENT_SIZE}.service"

    CURRENT_SIZE=$((CURRENT_SIZE - 1))
  done
  echo "new instance count is $CURRENT_SIZE"
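Before pointing DELETE requests at a live cluster, it can help to see which units the loop would remove. The dry-run sketch below makes no API calls and assumes instances are numbered contiguously from 1 up to the current size; units_to_delete is a hypothetical name:

```shell
# Hypothetical helper: prints the unit names a scale-down would delete,
# highest-numbered first, without calling the API.
units_to_delete() {
  local service=$1 current=$2 desired=$3
  while (( current > desired )); do
    echo "${service}@${current}.service"
    current=$((current - 1))
  done
}

units_to_delete my_awesome_app 5 3
# my_awesome_app@5.service
# my_awesome_app@4.service
```

If the numbering has gaps (say instance 2 was destroyed by hand), the names produced this way will not line up with the running units, so the loop is only safe under that assumption.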

Scaling up is a bit trickier. Unfortunately, with the HTTP API you can’t simply create a new unit from a template like you can with the fleetctl CLI. But you can do exactly what fleetctl does: copy the body from the base template and create a new unit with the specific full unit name. With the body in hand we can loop, creating instances, until our current size matches the desired size. Let’s walk through it step by step:

  echo "going to scale up to $DESIRED_SIZE"
  # Get payload by parsing the options field from the base template
  # and build our new payload for PUTing later
  payload=$(curl -s "$FLEET_HOST/fleet/v1/units/${SERVICE}@.service" | jq '{ "desiredState": "launched", "options": .options }')

  # Loop, PUTing our new template with the appropriate name
  until [[ $DESIRED_SIZE -eq $CURRENT_SIZE ]]; do
    CURRENT_SIZE=$((CURRENT_SIZE + 1))

    curl -X PUT -d "${payload}" -H 'Content-Type: application/json' "$FLEET_HOST/fleet/v1/units/${SERVICE}@${CURRENT_SIZE}.service"
  done
  echo "new instance count is $CURRENT_SIZE"
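To see what the jq step produces without a cluster, you can feed it a sample template. The JSON below is invented for illustration and only mimics the general shape of a unit (a name, a desiredState, and a list of options); build_payload is a hypothetical wrapper, and jq is assumed to be installed:

```shell
# Hypothetical helper: reads a template unit's JSON on stdin and prints the
# body for a new launched instance, keeping only the template's options.
build_payload() {
  jq -c '{ desiredState: "launched", options: .options }'
}

# Sample template JSON, made up for illustration.
template='{"name":"my_awesome_app@.service","desiredState":"inactive","options":[{"section":"Service","name":"ExecStart","value":"/usr/bin/sleep 3000"}]}'
```

Running echo "$template" | build_payload prints a compact body with desiredState set to launched and the options copied over unchanged; the template's name is dropped because the new name goes in the URL.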

With our script in place we can scale away:

# Scale up to 5 instances
$ ./scale-fleet my_awesome_app 5

# Scale down
$ ./scale-fleet my_awesome_app 3

Because this all comes down to a simple bash script, you can easily run it from a variety of places. It can be part of a parameterized Jambi job to scale manually with a UI, part of an envconsul setup with a key set in Consul, or it can fit into a larger script that reads performance characteristics from some monitoring tool and reacts accordingly. You can also combine this with AWS CloudFormation or another cloud provider: if your CPUs hit a certain threshold, you can scale the specific worker role running your instances, and have your desired_size be some factor of that number.
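As a rough sketch of that last idea, the desired size could be derived from the number of worker machines. Both the helper name desired_size_for and the "instances per worker" factor are assumptions for illustration; how you obtain the worker count depends on your provider:

```shell
# Hypothetical helper: desired instance count as a multiple of worker machines.
desired_size_for() {
  local workers=$1 per_worker=$2
  echo $(( workers * per_worker ))
}

desired_size_for 4 2   # prints 8
```

A wrapper script could then call ./scale-fleet my_awesome_app "$(desired_size_for 4 2)" whenever the worker count changes.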

I’ve been on a bash kick lately. It’s a versatile scripting language that is easily portable. The syntax can be somewhat mystic, but as long as you have a shell, you have all you need to run your script.

The final, complete script is here:
