guv: Automatic scaling of Heroku workers

At The Grid we do a lot of computationally heavy work server-side, in order to produce websites from user-provided content. This includes image analytics (for understanding the content), constraint solving (for page layout) and image processing (optimization and filtering to achieve a particular look). Currently we serve some thousand sites, with some hundred thousands sites expected by the time we’ve completed beta – so scalability is a core concern.

All computationally intensive work is put as jobs in a AMQP/RabbitMQ message queue, which are consumed by Heroku workers. To make it easy to manage many queues and worker roles we also use MsgFlo.
This provides us with the required flexibility to scale: the queues buffer the in-progress work, broker distributes evenly between available workers, and with Heroku we can change number of workers with one command. But, it still leaves us with the decision on how much compute capacity to provision. And when load is dynamic, it is tedious & inefficient to do it manually – especially as Heroku bills workers used by the second.

RabbitMQ and Heroku dashboards

Monitoring RabbitMQ queues and scaling Heroku workers manually when demand changes; not fun.

If we would instead regulate this every 1-5 minute based on demand, we would reduce costs. Or alternatively, with a fixed budget, provide a better quality-of-service. And most importantly, let developers worry about other things.

Of course, there already exists a number of solutions for this. However, some used particular metrics providers which we were not using, some used metrics with unclear relationship to required workers (like number of users), or had unacceptable limitations (only one worker per service, only run as a service with pay-by-number-of-workers).

guv

guv 0.1 implements a simple proportional scaling model. Based the current number of jobs in the queue, and an estimate of job processing time – it calculates the number of workers required for all work to be completed within a configured deadline.

guv system model

The deadline is the maximum time you allow for your users to wait for a completed job. The job processing time [average, deviation] can be calculated from metrics of previous jobs. And the number of jobs in queue is read directly from RabbitMQ.

# A simple guv config for one worker role.
# One guv instance typically manages many worker roles
'*':
  app: my-heroku-app
analyze:
  queue: 'analyze.IN' # RabbitMQ queue name
  worker: analyzeworker # Heroku dyno role name
  process: 20
  deadline: 120.0
  min: 1 # keep something always running
  max: 15 # budget limits

Now there are a couple limitations of this model. Primarily, it is completely reactive; we do not attempt to predict how traffic will develop in the future. Prediction is after all terribly tricky business – better not go there if it can be avoided.
And since it takes a non-zero amount of time to spin up a new worker (about 45-60 seconds), on a sudden spike in demand may cause some jobs to miss a tight deadline, as the workers can’t spin up fast enough. To compensate for this, there is some simple hysteresis: scale up more aggressively, and scale down a bit reluctanctly – we might need the workers next couple of minutes.

As a bonus, guv includes some integration with common metrics services: The statuspage.io metrics about ‘jobs-in-flight’ on status.thegrid.io, come directly from guv. And using New Relic Insights, we can analyze how the scaling is performing.

Last 2 days of guv scaling history on some of the workers roles at The Grid.

If we had a manual scaling with a constant number over 48 hours period, workers=35 (Max), then we would have paid at least 3-4 times more than we did with autoscaling (difference in size of area under Max versus area under the 10 minute line). Alternatively we could have provisioned a lower number of workers, but then with spikes above that number – our users would have suffered because things would be taking longer than normal.

We’ve been running this in production since early June. Back then we had 25 users, where as now we have several thousand. Apart from updating the configuration to reflect service changes we do not deal with scaling – the minute to minute decisions are all done by guv. Not much is planned in terms of new features for guv, apart from some more tools to analyze configuration. For more info on using guv, see the README.

imgflo 0.3: GEGL metaoperations++

Time for a new release of imgflo, the image processing server and dataflow runtime based on GEGL. This iteration has been mostly focused on ironing out various workflow issues, including documentation. Primarily so that the creatives in our team can be productive in developing new image filters/processing. Eventually this will also be an extension point for third parties on our platform.

By porting the png and jpeg loading operations in GEGL to GIO, we’ve added support for loading images into imgflo over HTTP or dataURLs. The latter enables opening local file through a file selector in Flowhub. Eventually we’d like to also support picking from web services.

Loading local file using html input type="file"

Loading local file using HTML5 input type=”file”

 

Another big feature is allowing to live-code new GEGL operations (in C) and load them. This works by sending the code over to the runtime, which then compiles it into a new .so file and loads it. Newly instatiated operations then uses that revision of code. We currently do not change the active operation of currently running instances, though we could.
Operations are never unloaded, due both to a glib limitation and the general trickyness of guaranteeing this to be safe for native code. This is not a big deal as this is a development-only feature, and the memory growth is slow.

Live-coding new image processing operations in C

Live-coding new image processing operations in C

 

imgflo now supports showing the data going through edges, which is very useful to understand how a particular graph works.

Selecting edges shows the buffer at that point in the graph

Selecting edges shows the buffer at that point in the graph

Using Heroku one can get started without installing anything locally. Eventually we might have installers for common OS’es as well.

Get started with imgflo using Heroku

 

Vilson Viera added a set of new image filters to the server, inspired by Instagram. Vilson is also working on our image analytics pipeline, the other piece required for intelligent automatic- and semi-automatic image processing.

Various insta filters

 

GEGL has for a long time supported meta-operations: operations which are built as a sub-graph of other operations. However, they had to be built programatically using the C API which limited tooling support and the platform-specific nature made them hard to distribute.
Now GEGL can load such operations from the JSON format also used by imgflo (and several other runtimes). This lets one use operations built with Flowhub+imgflo in GIMP:

imgflo-meta-ops-gimp

This makes Flowhub+imgflo a useful tool also outside the web-based processing workflow it is primarily built for. Feature is available in GEGL and GIMP master as of last week, and will be released in GIMP 2.10 / GEGL 0.3.

 

Next iteration will be primarily about scaling out. Both allowing multiple “apps” (including individual access to graphs and usage monitoring/quotas) served from a single service, and scaling performance horizontally. The latter will be critical when the ~20k+ users who have signed up start coming onboard.
If you have an interest in using our hosted imgflo service outside of The Grid, get in contact.

imgflo 0.2, The Grid launched

When I announced the first release of the imgflo project in April, it was perhaps difficult to see what exactly it was useful for and why we are developing it. This has changed now as 3 weeks ago we launched The Grid, our AI-based web publishing platform. We are on a bold mission to have “websites build themselves”; because until posting to personal websites becomes easier and more rewarding than posting to social media, content on the web will continue to pile up in closed silos.

thegrid-5k-join

To help solve this problem we built several open source technologies:

NoFlo: for creating highly testable, component-based, distributed software.
Flowhub: for visually and interactively building programs and extensions.
GSS: for building constraint-based, responsive layouts
And of course imgflo: for on-demand server-side image processing.

In total over 100k lines of code, and around 5000 commits over the last 12 months. Some of the stack is expained in more detail in a recent interview with Libre Graphics World.

imgflo on The Grid

thegrid.io launch site is of course built with The Grid. In the particular layout filter used, the look & feel is driven largely by the content. Colors for text captions are extracted from tweets and social media posts, and the featured images are largely unfiltered. Other Grid layout filters may style all provided content, including images, towards a uniform look specified by a color scheme. Or a layout filter may mix-and-match content- versus style-driven design.

The background texture on this section was created with imgflo, by passing the featured image through a blur graph:

thegridio-imgflo-bg-texture
https://imgflo.herokuapp.com/graph/vahj1ThiexotieMo/1ff47cef6f354fe0fbdefb…Fimages%2Fgrid-chrome.jpg&width=1300&height=768&std-dev-x=25&std-dev-y=25

It is important to note that no-one chose this exact image to be used in the particular layout section (and thus have the given image filter applied), which is why processing happens on-demand. The layout section with image inside a computer screen is available for content which has images of type “screenshot”. This property may be automatically detected by our image analytics pipeline, or manually annotated by user. The system allows describing many other such constraints, which are all taken into account when it works to create the appropriate layout for given content.

Even without considering styling, imgflo has a couple of important roles on a Grid site. Important is the ability create multiple scaled down versions to optimize download size. For this we also created a helper library called RIG, which is used to generate a set of CSS media-queries with imgflo request urls.


> rig = require 'rig-up'
> css = rig content, serverconfig, 'passthrough', parameters, ... 
 # passthrough is name of the graph to process through
 
@media (max-width: 503px) {
  .media, .background {
    background-image: url('https://imgflo.herokuapp.com/graph/apikey/6bb56129dc707894baa88d10a02a12b9/passthrough?input=https%3A%2F%2Fa.com%2Fb.png&width=400&height=225');
  }
}
@media (min-width: 504px) and (max-width: 1007px) {
  .media, .background {
    background-image: url('https://imgflo.herokuapp.com/graph/apikey/d099f7222293d335a6192d742f523bfa/passthrough?input=https%3A%2F%2Fa.com%2Fb.png&width=800&height=450');
  }
}

Processing images through imgflo also means that they are cached. So if the original image becomes unavailable, the website still has versions it can use. This can happen for instance on Twitter when people change their profile picture.
Note that while we optimize images when presented on site, we don’t touch the original image (non-destructive). This means image uploaded to The Grid has the full data & metadata preserved, unlike on some other social/web services. However, we are currently not preserving metadata in processed images.

 

imgflo 0.2

imgflo is now split into three repositories, the GEGL-based Flowhub runtime, the HTTP API server and the native dependencies. The runtime itself is plain C with glib, and could be used in non-web applications for desktop, mobile or embedded.

A major feature is that processing requests can now be authenticated, so that non-legitimate users cannot disrupt legitimate ones by overloading the server. We also use Amazon S3 for caching processed images, offloading a large portion of the work. Servicing 10k++ visits a day with a 2-dyno Heroko app has been no problem with this setup.

In imgflo-server we’ve also added support for using different processors than imgflo (which uses GEGL), in particular NoFlo with noflo-canvas. One can now build and deploy image processing pipelines using JavaScript, including all the libraries that work with the <canvas> element.

Building NoFlo image processing graph in Flowhub, then requesting from imgflo-server

Building NoFlo image processing graph in Flowhub, then requesting from imgflo-server

Full details about the changes can be found in the changelogs: server, runtime.

Scale

Flowhub provides imgflo a node-based visual & interactive IDE for developing new image filters for The Grid. It is similar to etablished tools like FilterForge, the Blender compositor,  vvvv and nuke – which many designers and visual artists are familiar with. However there are still many snags in the workflow for non-technical people. Smoothing out these is major part of the next imgflo milestone.
After that the focus will be on horizontal scalability, to handle the load as The Grid enters beta and opens to founding members in spring.

MicroFlo 0.3 & Flowhub Beta

Its been nearly 6 months since the previous release of MicroFlo, which was the first that allowed you to visually program your Arduino using NoFlo UI. While we are still just getting started, lots of things have changed since then.

Flowhub

Flowhub is the name of the officially supported, packaged and hosted version of the open source NoFlo UI. This is the IDE used for programming with MicroFlo, and a lot of work has been put into it the last couple of months. Today we released the beta version.

You can test it out for browser, node.js and microcontrollers now.

Other runtimes are in development, and you can add your own by implementing the FBP runtime protocol. For instance there are runtimes for desktop development for GNOME, image processing using GEGL, audio synthesis using SuperCollider and for Python.

Heterogenous FBP

Lightbulb idea: two microcontrollers programmed with MicroFlo used as components in a NoFlo program

Often microcontrollers act as sensors and actuators in a larger system, where embedded computers, mobile devices and servers are used to provide the data storage and processing as well as user interfaces and connectivity with other systems.

With MicroFlo 0.3 one can visually create a microcontroller program, and then export ports on this program to make the entire microcontroller available as a component in NoFlo on node.js. This allows to seamlessly create programs which combine microcontrollers and embedded computers.
We made use of this functionality when we created an interactive table that shows the status of the Ingress virtual reality game.

Platform support

Lunchbox electronics: Some boards that can run MicroFlo

MicroFlo has worked on AVR-based Arduinos from day 1, but it was always the goal to not be specific to Arduino. Therefore I’m happy to say that there are now basic platform implementations for:

  • AVR-based Arduino and derivatives
  • Atmel AVR8 (without using Arduino)
  • mbed LPC1768 (ARM Cortex M3)
  • Texas Instruments Tiva-C (ARM Cortex M4)
  • Embedded Linux (RPi, BeagleBone Black)

Basic bring-up up of new platform can be done in a couple of days, and components which do not use platform-dependent libraries can be used immediately. The goal is to be portable enough that you can pick up whatever capable device you find in your parts-bin, prototype your initial code there – then move the program over to a more ideal device if/when appropriate.
If you have particular platforms you’d like to see supported, leave a note.

Simulation & Automated testing

Automated testing in the embedded world using C/C++ is very painful compared to that of recent web technologies. CoffeScript (or JavaScript) in combination with a modern BDD framework like Mocha makes for simple and beautiful tests.

These tests are ran in a simulator which implements the MicroFlo I/O backend (C++) in JavaScript using a Node.js addon. Unfortunately there are not many good open source instruction-level simulators for the platforms MicroFlo support, so the platform backends can only be tested once we support on-device testing.

Automated testing is a critical piece to making sure that MicroFlo is not only a fun and rewarding way to create microcontroller programs, but also an excellent way to make industrial quality devices.

Easier to get started

A Chrome app is slightly less intimidating for users than a terminal

MicroFlo now ships a Chrome app, used to let Flowhub communicate with the MicroFlo runtime running on device over serial/USB/Bluetooth. This means  it is no longer neccesary to run node.js in a terminal, removing a usability issue in getting started.

In the future, this functionality will be baked into the Flowhub Chrome app itself. With time component code editing, building the firmware and flashing the device will also be available there, making Flowhub a true integrated development environment for MicroFlo.

 

 

 

imgflo 0.1: An image processing server and Flowhub runtime

At TheGrid we are in need for a flexible service for doing server-side image processing. So after some discussion at LGM in Leipzig, I started writing one based on GEGL, the image processing library that will power the upcoming GIMP 2.10 release. The library provides a demand-driven graph-based API , with a ton of operations included and support for GPU processing using OpenCL.

Runtime

For creating image processing pipelines for the server, imgflo acts as a runtime for Flowhub, our open source visual programming IDE. It adds to the existing NoFlo browser, Node.js and MicroFlo microcontroller runtime targets, all possible thanks to the runtime-agnostic protocol.

The preview is live and changes whenever changes are made to the graph. This allows to quickly experiment and develop new graphs.

Server

As a server, imgflo provides a simple HTTP API where you specify the input image as an URL, the graph to process it through and any parameters exposed on that graph.

Requests being made to the imgflo server HTTP API on demo page

Processed images are cached, so that subsequent request on the same url just returns the image out of the cache.
The git repository includes configuration and build setup for Heroku, so deploying an instance is a 5 minute job.

Next

imgflo 0.1 is now minimally useful as image processing server, but there are many more enhancements on the todo list. Scalability and integration with NoFlo are two big topics, as as expanding the pool of available operations and graphs. Porting filters from GIMP to GEGL is a way of helping with the latter.

Additionally it would be interesting to provide a way of using imgflo with GIMP. First of all one could use graphs made with Flowhub via GEGL meta-operations. A more crazy idea is to integrate directly,  using Flowhub as a companion node-based editor.