5 years of emlearn

Back in 2018, during my master's in Data Science, I started the emlearn project – implementing classic Machine Learning models designed to run on microcontrollers and embedded devices (aka “TinyML”).
Back then, the primary purpose was to learn the ins and outs of these algorithms, to make sure I really understood them well. Thinking about how the algorithms can be implemented efficiently in terms of RAM usage, storage space and CPU time is excellent for that purpose. But I always made sure to keep the code at production-quality, with a decent set of automated tests.

emlearn in use

Fast-forward 5 years, and emlearn has actually been used in a wide range of real-world projects, which is fantastic. Due to the open-source nature, everyone can just download it and use it for whatever they want – so there are probably many uses we will never hear about. And that is fine 🙂 But, there are some uses that I do know of. So here is a small selection of some of the projects using emlearn.

These projects highlight some of the typical characteristics for projects where emlearn is a good fit. Sensor data is analyzed on-device using a combination of Digital Signal Processing and Machine Learning. The extracted information is a low rate data-stream which can then be transmitted wirelessly. All of this can be done on low-cost hardware using general-purpose microcontrollers. Many tasks can also be done at low-power and run on battery.

Locating faults in the electrical power grid

Paper: Micro Random Forest: A Local, High-Speed Implementation of a Machine-Learning Fault Location Method for Distribution Power Systems

Researchers from Sandia National Laboratories developed a system to identify the origin of faults in the power grid. When an abrupt fault occurs in a power transmission line, this causes a wave that propagates along the line. The wave is influenced by the electrical characteristics of the line as it travels along to other locations in the grid. The researchers used a combination of Digital Signal Processing and a Random Forest classifier to analyze the signature of an incoming wave and determine where the fault originated. They tested the system by injecting faults in a simulated model of a grid, and then ported the procedure to a DSP board from Texas Instruments.

Being able to automatically locate failures that occur in grid power lines is key to being able to dispatch maintenance crews to the right place, and could enable quicker recovery times after failures.

Breathing rate estimation in wearables

Paper: Remote Breathing Rate Tracking in Stationary Position Using the Motion and Acoustic Sensors of Earables

Researchers at Samsung used a combination of motion data from the accelerometer and sound data from the microphone on Samsung Galaxy earbuds to estimate breathing rate. First, breathing rate estimation is attempted using the accelerometer. If no estimation can be made, then the audio microphone data is used to estimate breathing rate. This multi-modal approach handles a wide range of cases, which they tested both in a laboratory and home setting. The battery consumption was low enough that the measurements could run continuously.

This kind of technology could be especially useful for people with respiratory diseases, who often need to check their breathing rate to manage and quantify their lung condition.

Activity monitoring for milk cows

Paper: Power Efficient Wireless Sensor Node through Edge Intelligence

Abhishek Damle, a master's student at Virginia Tech in 2022, developed a battery-powered sensor that tracks cattle activity. The sensor is mounted on the cow’s halter, and uses a low-power accelerometer to sense motion. A decision tree is used to classify whether the cow is lying/walking/standing/grazing/ruminating. The detected activity is then transmitted over LoRaWAN to a central location. They estimate that by doing the machine learning on-sensor, it consumes under 1 mW, which is 50 times less than if transmitting the raw data.

The activity data can be used to detect abnormal behavior in a cow, which can indicate health issues or other problems.

Recent developments

Classification, regression and anomaly detection

emlearn allows converting scikit-learn and Keras models trained in Python into efficient and portable C code. The first versions only supported classification. Over the last few years, we have also added support for regression and anomaly/outlier detection for a range of models.
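As a rough illustration of what this looks like from Python, here is a minimal sketch of converting a regressor and an outlier detector – assuming RandomForestRegressor and IsolationForest are among the supported model types (see the emlearn documentation for the current list):

import emlearn
import numpy
from sklearn.ensemble import RandomForestRegressor, IsolationForest

# Toy data, just for illustration
X = numpy.random.rand(100, 3)
y = X.sum(axis=1)

# Regression
regressor = RandomForestRegressor(n_estimators=10).fit(X, y)
emlearn.convert(regressor).save(name='myregressor', file='myregressor.h')

# Anomaly/outlier detection
detector = IsolationForest(n_estimators=10).fit(X)
emlearn.convert(detector).save(name='mydetector', file='mydetector.h')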

Examples of outputs for some of the models supported by emlearn, in different types of ML tasks

Improved documentation

To facilitate further usage, I decided to do a pass to improve the documentation this summer.
Now there is more detailed documentation for Getting Started, covering setting up and running your first model – be it on your PC, on Arduino, or in a Web browser. There are also task-oriented pages in the user guide – covering Classification, Regression, Anomaly Detection and Event Detection (for now).

emlearn documentation now has a User Guide, C API reference, Python API reference and Examples

MicroPython support

Whereas C is the lingua franca for embedded/firmware engineers, Python is the lingua franca for Data Scientists and Machine Learning Engineers.
emlearn itself is written in portable C99, and makes it easy to train in Python (on computer) and deploy to C (on device).
This is practical, but for some users and applications it would be even nicer to also write the device code in Python. This is now made easy with emlearn-micropython.

The MicroPython project is a Python (3.x) implementation designed for microcontrollers. It is around 10 years old by now, and has a good level of maturity. It is a very efficient and compact Python implementation, and runs on 32-bit devices with at least 256 kB of code space and 16 kB of RAM.

emlearn-micropython is released as .mpy modules, which can be added to an existing board by copying them over USB/serial/WiFi. No rebuilding or flashing of the firmware is needed! This uses the dynamic native modules feature in MicroPython. This is really quite magical, and lets us realize a workflow which has the convenience of Python and the efficiency of C.

Example code for host
import emlearn
from sklearn.ensemble import RandomForestClassifier

# Example training data (XOR problem); assumed here, the original snippet omits it
X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 0, 1, 1]

# Train a model using scikit-learn
estimator = RandomForestClassifier(n_estimators=3, max_depth=3, max_features=2, random_state=1)
estimator.fit(X, y)

# Convert model using emlearn
cmodel = emlearn.convert(estimator, method='inline')

# Save as loadable .csv file
cmodel.save(file='xor_model.csv', name='xor', format='csv')
Example code for device
import emltrees
import array

model = emltrees.new(5, 30) # ensemble of up to 5 trees and 30 decision nodes

# Load a CSV file with the model
with open('xor_model.csv', 'r') as f:
    emltrees.load_model(model, f)

# run inference with the model
examples = [
    array.array('f', [0.0, 0.0]),
    array.array('f', [1.0, 1.0]),
    array.array('f', [0.0, 1.0]),
    array.array('f', [1.0, 0.0]),
]
for ex in examples: 
    result = model.predict(ex)
    print(ex, result)

There are some existing projects that allow generating (Micro)Python code for ML models, such as m2cgen and everywhereml. But while MicroPython is a fairly efficient Python implementation, a C implementation should still be able to do better. To test this, I ran some quick benchmarks. These were performed on an RP2040, an affordable and popular ARM Cortex-M0+ chip.
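For reference, here is a rough sketch of how such a throughput measurement can be done on-device in MicroPython, using time.ticks_us and reusing model and examples from the device example above (the actual emlearn-micropython benchmarks may be set up differently):

import time

N = 1000
start = time.ticks_us()
for i in range(N):
    # `model` and `examples` as defined in the device example above
    model.predict(examples[i % len(examples)])
elapsed_us = time.ticks_diff(time.ticks_us(), start)
print('samples per second:', N / (elapsed_us / 1e6))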

Inference speed comparison of emlearn-micropython compared to other frameworks.
Y axis is the number of samples per second (higher is better).

As expected, the emlearn approach is many times faster than pure-Python based approaches. This could be further increased by enabling fixed-point/integer support in emlearn.

The TinyML wave has only just begun

emlearn is a library for Machine Learning on microcontrollers and embedded systems – often referred to as TinyML. There are over 20 billion microcontrollers shipped every year. As of 2022, it is estimated that over 2 billion of these are TinyML devices, up from 15 million in 2019 [1].
So it is still very early days for this field. And it will be very interesting to see where emlearn goes in the next 5 years.

Optimizing latency of an Arduino MIDI controller

Update: The dhang is now available for preorder, and you can join a workshop to build it yourself!

Feedback from first user testing of the dhang digital hand drum was that the latency was too high. How did we bring it down to a good level?

dhang: A MIDI controller using capacitive touch sensors for triggering. An Arduino board processes the sensor data and sends MIDI notes over USB to a PC or mobile device. A synthesizer on the computer turns the notes into sound.

Testing latency

For an interactive system like this, what matters is the performance experienced by the user. For a MIDI controller that means the end-to-end latency, from hitting the pad until the sound triggered is heard. So this is what we must be able to observe in order to evaluate current performance and the impact of attempted improvements. And to have concrete, objective data to go by, we need to measure it.

My first idea was to use a high-speed camera, using the video image to determine when the pad is hit and the audio to detect when sound comes from the computer. However, even at 120 FPS, which some modern cameras/smartphones can do, there are 8.33 ms per frame. So finding when the pad was hit with higher accuracy (1 ms) would require using multiple frames and interpolating the motion between them.

Instead we decided to go with a purely audio-based method:

Test setup for measuring MIDI controller end2end latency using audio recorded with smartphone.

  • The microphone is positioned close to the controller pad and the output speaker
  • The controller pad is tapped with the finger quickly and hard enough to be audible
  • Volume of the output was adjusted to be roughly the same level as the sound of physically hitting the pad
  • In case the images are useful for understanding the recorded test, video is also recorded
  • The synthesized sound was chosen to be easily distinguished from the thud of the controller pad

To get access to more settings, the open-source OpenCamera Android app was used. We set a low video bitrate to save space, and enabled macro mode to make focusing on close objects easier. For synthesizing sounds from the MIDI signals we used LMMS, a simple but powerful digital music studio.

Then we open the video in the Audacity audio editor to analyze the results. Using Effect->Amplify to normalize the audio to -1 dB makes it easier to see the waveforms. Then we can manually select and label the distance between the starting points of the sounds to get our end-to-end latency.

Raw sound data, data with normalized amplitude and measured distance between the sound of tapping the sensor and the sound coming from speakers.

How good is good enough?

We now know that the latency experienced by our testers was around 137 ms. For reference, when playing a (relatively slow) 4/4 beat at 120 beats per minute, the distance between each 16th note is 125 ms (60 s / 120 beats = 500 ms per quarter note, divided by 4). In the following soundclip the kickdrum plays the 4/4 beat and the ‘ping’ plays all 16 16th notes.

So the latency experienced would offset the sound by more than one 16th note! We can understand that this would make it tricky to play.

For professional-level audio, less than 10 ms is commonly cited as the desired performance, especially for percussion. From Action-Sound Latency: Are Our Tools Fast Enough?

Wessel and Wright suggested that digital musical instruments should aim for latency less than 10ms [22]

Dahl and Bresin [3] found that in a system with latency, musicians execute their gestures ahead of the beat to align the sound with a metronome, and that they can maintain synchronisation this way up to 55ms latency.

Since the instrument in question is going to be a kit targeted at hobbyists/amateurs, we decided on an initial target of <30ms.

Sources of latency

Latency, like other performance issues, is a compounding problem: each operation in the chain adds to it. However, usually a large portion of the time is spent in a small part of the system, so an important part of optimization is to locate the areas which matter (or rule out areas that don’t).

For the MIDI controller system in question, a software-centric view looks something like:

A functional view of the system and major components that may contribute to latency. Made with Flowhub

There are also sources of latency outside the software and electronics of the system. The capacitive effect that the sensor relies on has a non-zero response time, and it takes time for sound played by the speakers to reach our ears. The latter can quickly become significant; at 4 meters the delay is already over 10 milliseconds.

And at this time, we know what the total latency is, but don’t have information about how it is divided.

With simulation-hardened Arduino firmware

The system tested by users was running the very first hardware and firmware version. It used an Arduino Uno. Because the Uno lacks native USB, a serial-to-MIDI bridge process had to run on the PC. Afterwards we developed a new firmware, guided by recorded sensor data and host-based simulation. From the data gathered we also decided to switch to a more sensitive sensor setup, and we switched to an Arduino Leonardo with native USB-MIDI.

Latency with new firmware (with 1 sensor) was reduced by 50 ms (35%).

This firmware also logs how long each sensor reading cycle takes. It was under 1 ms for the recorded single-sensor setup. The sensor readings went almost instantly from low to high (1-3 cycles). So if the sensor reading and triggering takes just 3 ms, the remaining 84 ms must be elsewhere in the system!

Low-latency audio, a hard real-time problem

The two other main areas of the system are: the USB/MIDI communication from the Arduino to the PC, and the sound synthesis/playback. USB MIDI should generally be relatively low-latency, and it is a subsystem which we cannot influence so easily – so we focus first on the sound aspects.

Since a PC must be able to multi-task, audio is processed in chunks: a buffer of N samples. This allows some flexibility. However, if processing is interrupted for too long or too often, the buffer may not be completely filled. The resulting glitch is usually heard as a pop or crackle. The lower the latency we want, the smaller the buffer, and the higher the chance that something will interrupt for too long. At 96 samples per buffer at a 48 kHz sample rate, each buffer is just 2 milliseconds long.
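A quick back-of-the-envelope calculation of buffer duration, sketched in Python using the sample rate and the buffer sizes mentioned in this post:

samplerate = 48000  # Hz
for buffer_size in (96, 256):  # buffer sizes used in the tests below
    buffer_ms = 1000.0 * buffer_size / samplerate
    print('%d samples = %.1f ms per buffer' % (buffer_size, buffer_ms))
# 96 samples -> 2.0 ms, 256 samples -> 5.3 ms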

With JACK on Linux

I did the next tests on Linux, since I know it better than Windows. Configuring JACK to 256 samples/buffer, we see that the audio configuration does indeed have a large impact.

Latency reduced to half by configuring Linux with JACK for low-latency audio.


With ASIO4ALL on Windows

But users of the kit are unlikely to use Linux, so a solution that works on Windows is needed (at least). We tried all the different driver options in LMMS, switched to the Hydrogen drum machine, and attempted to use JACK on Windows. None of these options worked well.
So in the end we went with ASIO, using the ASIO4ALL replacement drivers. Since ASIO is proprietary, LMMS/PortAudio does not support it out of the box. Instead you have to manually replace the PortAudio DLL that comes with LMMS with a custom one 🙁 *nasty*.

With ASIO4ALL we were able to set the buffer size as low as 96 samples (2 buffers) without glitches.

ASIO on Windows achieves very low latencies. Measurement of single sensor.

Completed system

Bringing back the 8 other sensors adds around 6 ms to the sensor reading, bringing the final latency to around 20 ms. There are likely still possibilities for significant improvements, but the target was reached, so this is good enough for now.

A note on jitter

The variation in latency of an audio system is called jitter. Ideally a musical instrument would have a constant latency (no jitter). When a musical instrument has significant amounts of jitter, it will be harder for the player to compensate for the latency.

Measuring the amount of jitter would require some automated tools for the audio analysis, but should otherwise be doable with the same test setup.
The audio pipeline should have practically no variation, but the USB/MIDI communication might be a source of variation. The CapacitiveSensor Arduino library is known to have variation in sensor readout time, depending on the current capacitance of the sensor.

Conclusions

By recording audible taps of the sensor with a smartphone, and analyzing the recording with a standard audio editor, one can measure end-to-end latency in a tactile-to-sound instrument. A combination of tweaking the sensor hardware layout, improving the Arduino firmware, and configuring the PC software for low-latency audio was needed to achieve acceptable levels of latency. The first round of improvements brought the latency down from an ‘almost unplayable’ 134 ms to a ‘hobby-friendly’ 20 ms.

Comparison of latency between the different configurations tested.


Host-based simulation for embedded systems

Embedded systems and microcontroller programs can be really hard to understand. Here are some techniques that can help.

An embedded system is a computer system with a dedicated function within a larger mechanical or electrical system …
– Wikipedia

Examples of embedded systems include everything from a fridge thermostat, to the airbag sensors in your car, to consumer electronics like an electronic keyboard for music. Programming such a system can be extra challenging compared to writing software for a general-purpose computer. The complicating factors may include:

  • Limited computing power, memory, storage or bandwidth
  • Need for low and deterministic response times (soft or hard real-time)
  • Running on battery, within a constrained power budget
  • Need for high reliability without user intervention over long periods of time (months, years)
  • A malfunctioning system may cause material damage or harm people
  • Problem may require considerable domain-specific knowledge

But before all that, we typically need to come to terms with a more mundane and practical problem:
that it is usually very hard to understand what is going on with our program!

Why embedded systems are harder to debug

This problem of hard-to-understand software systems is not at all unique to embedded; it also happens frequently when developing for a PC or mobile device. But several aspects typically make the problem more severe.

  • It is often hard to stimulate the system with inputs automatically, as they are typically physical/external in nature
  • Inputs, outputs and internal state of the system may change faster than humans can observe in real-time
  • The environment of the system typically influences results, sometimes in unforeseen ways
  • Low-level programming languages and techniques are still the norm
  • The target device does not have a UI or development tools
  • The connection to a system is often, at best, a slow serial connection

To make working with the system more pleasant and productive, I have found two techniques very useful:

  • Recording sensor data from device, and then analyzing it ‘offline’ on the PC
  • Running the firmware logic on the PC, by abstracting away hardware dependencies

Case: Triggering MIDI with capacitive sensors

A friend of mine is building an electronic music instrument, a Hang-like thing with 9 pads. It uses capacitive touch sensors and sends MIDI notes over USB for triggering a sampler or synthesizer to make sound.

Core components of system. Capacitive sensor(s) connected to Arduino, sending MIDI messages to computer software over USB.

The firmware on the device was an Arduino sketch, using the CapacitiveSensor library to read the capacitance of the pads.
Summing up N consecutive samples gives basic filtering, and a note is triggered if the value exceeds a specified threshold TH.
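As a rough illustration, the detection logic just described looks something like this (sketched in Python for readability; the real firmware is the Arduino sketch in C++, and the N and TH values here are arbitrary):

def detect(readings, N=70, TH=5000):
    # Basic filtering: sum of the last N consecutive samples
    value = sum(readings[-N:])
    # Trigger a note when the filtered value exceeds the threshold TH
    return value > TH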

The setup worked in principle, and in practice for some ways of hitting the drumpad. But it seemed impossible to find a combination of N and TH where the device would trigger correctly in all cases. If N was too high, fast taps would be dropped/ignored. If N was too low, it caused occasional double triggering. If TH was too high, it would not trigger on single-finger taps. If TH was too low, it would trigger when a palm was just hovering over the pad.

How to make it work reliably?

To solve a problem, you first have to understand it

Several hours had been spent tweaking the values, recompiling the sketch, uploading it, hitting the pads a bit, and listening whether it did the right thing. This gave us some rough intuitions about what worked and what did not, but generally our understanding of the situation was quite sparse. Which is understandable; there is only so much one can learn from observing a system in real-time from the outside with the naked eye/ear.

It seemed that how the pad was hit had an influence. Which is plausible; the surface area might influence how effective the capacitive coupling is.

Example poses that should trigger. Hitting with fingertip, finger flat, 3-fingers and side of thumb.

Some cases which should *not* trigger. Hand hovering over pad, touching casing with thumb but not the pad.

What we needed to understand was: what is the input data in different scenarios?

First, we added logging to the Arduino sketch, sending the time between readings and the current sensor value (for one sensor).

  const long beforeRead = millis();
  const Input input = readInputs();
  const long afterRead = millis();
  Serial.print("(");
  Serial.print(afterRead-beforeRead);
  Serial.print(",");
  Serial.print(input.values[0].capacitance);
  Serial.println(")");

The stream showed that with N=70, reading the sensors took a long time, over 40 ms. This is unacceptable for a musical instrument, so it was clear that this value *had* to go down. Any issues caused would have to be fixed in some other way.

Then we wrote a small Python script to read the serial port and write the raw data to a file.
This allowed us to record the data for a whole scenario. For instance we recorded things like ‘tapping-3-times-quickly’ or ‘hovering-then-touching’, using the filename as a description of what the scenario was, and directories to group sets of recordings.
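The recording script itself can be tiny. A sketch of the idea using pyserial (the port name, baud rate and file name are assumptions, not the original script):

import serial  # pyserial

PORT = '/dev/ttyACM0'   # assumption: adjust to where the Arduino shows up
BAUDRATE = 115200       # assumption: must match the sketch

with serial.Serial(PORT, BAUDRATE, timeout=1) as ser, \
        open('tapping-3-times-quickly.log', 'w') as out:
    while True:            # stop with Ctrl-C when the scenario is done
        line = ser.readline().decode(errors='ignore')
        if line:
            out.write(line)
            out.flush()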

Another Python script was then used for analyzing the data, parsing the raw values and using matplotlib to plot them.
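A sketch of what such an analysis script can look like, parsing the “(time,capacitance)” lines produced by the logging code above and plotting them with matplotlib (the file names are just examples):

import matplotlib.pyplot as plt

def parse_log(path):
    times, values = [], []
    t = 0
    for line in open(path):
        line = line.strip()
        if not (line.startswith('(') and line.endswith(')')):
            continue  # skip anything that is not a logged reading
        dt, capacitance = line[1:-1].split(',')
        t += int(dt)                  # milliseconds since start of recording
        times.append(t)
        values.append(int(capacitance))
    return times, values

times, values = parse_log('tapping-3-times-quickly.log')
plt.plot(times, values)
plt.xlabel('time (ms)')
plt.ylabel('capacitance (raw)')
plt.savefig('tapping-3-times-quickly.png')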

Now we could finally *see* what was going on, over longer periods of time, and compare different scenarios against each other.

This case illustrated the crux of our problem. The red areas indicate where we are hovering *over* the pad with the palm, but the sensed capacitance values are higher than when touching with a fingertip.
If the threshold is set too high (orange line) we miss the finger tap, and if it is too low (yellow) we will false-trigger on a hovering palm.

Can we do better with an alternative detection algorithm? Maybe with a high-pass filter to detect the changes at the edges, it would be possible to distinguish both cases. Plugging a high-pass filter into the Python analysis script and playing with the values seemed to support this.
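A sketch of how that prototyping can look in the analysis script: a high-pass computed as the signal minus its exponential moving average, mirroring the filter that ended up in the firmware (the alpha value is just a starting point to play with):

def highpass(values, alpha=0.1):
    # Slow drift (e.g. a hovering palm) is tracked by the moving average
    # and removed; abrupt changes (taps) remain in the output.
    average = values[0]
    out = []
    for v in values:
        average = alpha*v + (1 - alpha)*average
        out.append(v - average)
    return out

highpassed = highpass(values)   # 'values' from the recording parsed earlier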

But we cannot run Python for real-time processing. We need to be able to implement the filter in the Arduino firmware.

Exponential Moving Average filters

An Exponential Moving Average (EMA or EWMA) was selected as the basis of the filter. It has many desirable properties for use in a latency-sensitive application on a microcontroller: it only requires storing one number, is computationally simple, and is robust against variation in sampling time (jitter). And unlike a FIR filter, it does not introduce latency (apart from the time-constant of the filter itself). Here is a nice introduction for Arduino usage.

static int
exponentialMovingAverage(const int value, const int previous, const float alpha) {
  return (alpha*value) + ((1-alpha)*previous);
}
...
next.highfilter = exponentialMovingAverage(input.capacitance, previous.highfilter, appConfig.highpass);
next.highpassed = input.capacitance - next.highfilter;
...

What value should highpass have, and how do we know if it works correctly?

Host-based simulation

A regular Arduino sketch can generally only run on the target microcontroller. This is because the application logic is mixed with the hardware-dependent I/O libraries, in this case CapacitiveSensor and MidiUSB.
But Arduino is just C++. Nothing prevents us from separating out the application logic and making it hardware-independent so it can also execute on our host. The easiest method is to put the code into a .hpp, and then include that in our sketch and any host-only tools we have.

#include "./hangdrum.hpp"

This lets us use all the regular C++ tools and practices for testing and validating code, without needing access to the hardware: automated unit and integration testing, fuzz testing, mutation testing, dynamic analysis like Valgrind, and continuous integration services like Travis CI. In a project with custom hardware, it lets you develop most parts of the software before the hardware is finalized, potentially saving a lot of time.

I like to express the entire application logic of the firmware as a pure function which takes Input and current State, and returns the new State. This formulation lets us know exactly what may affect the system – no hidden dependencies or state.

State
calculateState(const State &previous, const Input &input, const Config &config) {
  State next = previous;
  next.time = input.time;

  for (unsigned int i=0; i<N_PADS; i++) {
    next.pads[i] = calculateStatePad(previous.pads[i], input.values[i], config.pads[i], config);
  }
  calculateMidiMessages(next, config, next.messages);
  return next;
}

Because all the inputs and outputs of the functions are plain-old-data, we can safely and meaningfully serialize and deserialize them.
To get better visibility into the internals of the system and help our understanding, we also store intermediate values:

PadState
calculateStatePad(const PadState &previous, const PadInput input,
                  const PadConfig &config, const Config &appConfig) {
...
  next.raw = input.capacitance;
  next.highfilter = exponentialMovingAverage(input.capacitance, previous.highfilter, appConfig.highpass);
  next.highpassed = input.capacitance - next.highfilter;
  next.lowpassed = exponentialMovingAverage(next.highpassed, previous.lowpassed, appConfig.lowpass);
  next.value = next.lowpassed;
...
}

If we had used a dataflow system like MicroFlo, we would have gotten this introspectability for free.

Combining the recorded input data logs with this platform-independent application logic, we can now make a simulator for our firmware:

    std::vector<hangdrum::Input> inputEvents = read_events(filename);
    std::vector<hangdrum::State> states;

    hangdrum::State state;
    for (auto &event : inputEvents) {
        state = hangdrum::calculateState(state, event, config);
        states.push_back(state);
    }
    write_flowtrace(trace_filename, trace);

To store the execution trace I used Flowtrace, a JSON-based format for tracing Flow-based-programming/dataflow systems.

Because time is just data in our programming model (part of Input or State), we can run through hours of input scenarios in seconds.
I made another plotting tool, this time reading the flowtrace, visualizing all the steps in our signal processing pipeline, and the detected notes.

Avoiding false triggering for hovering hand, by looking at changes instead of absolute values.

By going over a range of different input scenarios and seeing how different values perform, we get a decent confidence that the algorithm works. But does it actually run fast enough on the Arduino?

Profiling on device

The Atmel AVR chip on the Arduino Leonardo is an 8-bit processor without a floating point unit. So I was a bit worried about the exponential averaging filter using several expensive features: 16-bit `int`, divisions and a multiplication with a float. Using an Arduino sketch to do some simple profiling showed that my worries were unfounded.

const long beforeCalculation = millis();
State next = state;
for (int i=0; i<100; i++) {
  next = hangdrum::calculateState(state, input, config);
}
state = next;
const long afterCalculation = millis();
Serial.print("calculating: ");
Serial.println(afterCalculation-beforeCalculation);

The 100 iterations of the application logic took 80 ms with both a high-pass and a low-pass, or less than 1 ms per execution. Since sensor readout takes up to 10 ms, it dominates the time spent. So if we want lower latency, optimization efforts should be focused on sensor readout first. Only when sensor readout is down to around 1 ms does it make sense to optimize the filtering.

Don’t forget the hardware

Testing the highpass-based code in practice showed that yes, it did correctly detect tapping while suppressing false triggers from a palm hovering over the sensor. Another benefit of change detection is that a note will trigger even if one finger is already touching and the pad is hit with another finger. With absolute-value thresholding, the second finger tap is not detected.

However, we also found that by moving the sensor to the outside, the data quality increases a lot. In this case, even the simple absolute threshold code was able to correctly discriminate a hovering palm. The higher data quality may also enable other features like velocity or aftertouch.

Putting sensor on outside of body gives better readings, but requires additional manufacturing steps.


In conclusion

Sensor recording, separating hardware from application logic, and host-based simulation are powerful tools that help you better understand an embedded system. By visualizing the input data, the internal state and the outputs of the firmware, you can more easily see and understand the conditions which cause problems. The effects of changing the firmware can be quickly understood, as re-running the simulation suite on a wide range of inputs can be done in seconds. It can be implemented easily in C++ firmware, and any scripting language can be used for the data analysis.

The techniques here form a baseline level of tooling which can be extended in many ways.
If we had recorded a high-framerate video stream together with the input data, it would be easier to see how the input data corresponds to physical actions.

We could annotate the input data to indicate the correct locations of notes (expected output of system), and write an automated test against the output trace to check how well the firmware detects them. By using data-driven testing, we could generate test variations over different inputs and filter configuration. And then use machine learning techniques to help find the best values for the filters.

We could also create an end-to-end test covering the vast majority of the code by injecting input sensor data into the on-device firmware over serial, and then verifying that it produces the expected MIDI messages.

Live programming IoT systems with MsgFlo+Flowhub

Last weekend at FOSDEM I presented in the Internet of Things (IoT) devroom,
showing how one can use MsgFlo with Flowhub to visually live-program devices that talk MQTT.

If the video does not work, try the alternatives here. See also full presentation notes, incl example code.

Background

Since announcing MsgFlo in 2015, it has mostly been used to build scalable backend systems (“cloud”), using AMQP and RabbitMQ. At The Grid we’ve been processing hundreds of thousands of jobs each week, so that use case is pretty well tested by now.

However, MsgFlo was designed from the beginning to support multiple messaging systems (including MQTT), as well as other kinds of distributed systems – like networks of embedded devices working together (one aspect of “IoT”). And in MsgFlo 0.10 this is starting to work pretty nicely.

Visual system architecture

Typical MQTT devices have the topic names hidden in code. Any documentation is typically kept in sync (or not…) by hand.
MsgFlo lets you represent your devices and services as FBP/dataflow “components”, and a system as a connected graph of component instances. Each device periodically sends a discovery message to the broker. This message describes the role name, as well as what ports exist (including the MQTT topic names). This leads to a system architecture which can be visualized easily:

Imaginary solution to a typically Norwegian problem: avoiding your water pipes freezing in the winter.

Rewire live system

In most MQTT devices, output is sent directly to the input of another device, by using the same MQTT topic name. This hardcodes the system functionality, reducing encapsulation and reusability.
With MsgFlo, each device *should* instead send output and receive input on topics namespaced to that device.
Connections between devices are handled on the layer above, by the broker/router binding different topics together. With Flowhub, one can change these connections while the system is running.

Change program values on the fly

Changing a parameter or configuration of an embedded device usually requires changing the code and flashing it. This means recompiling and usually being connected to the device over USB. This makes the iteration cycle pretty slow and tedious.
In MsgFlo, devices can (and should!) expose their parameters on MQTT and declare them as inports.
Then they can be changed in Flowhub, with the device instantly reflecting the new values.

This is great for exploratory coding: quickly trying out different values to find the right one.
Examples include tweaking animations or game mechanics, where it is near impossible to know up front what feels right.

Add component as adapters

MsgFlo encourages devices to be fairly stupid, focused on a single generally-useful task like providing sensor data, or a way to cause actions in the real world. This lets us define “applications” without touching the individual devices, and adapt the behavior of the system over time.

Imagine we have a device which periodically sends the current temperature, as a floating-point number in Celsius. And a display device which can display text (for instance a small OLED). To show the current temperature, we could wire them together directly:

Our display would show something like “22.3333333”. Not very friendly; how does one know what this number means?

Better to add a component to do some formatting.

Adding a Python component

Component formatting incoming temperature number to a friendly text string
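The gist of such a formatting component can be sketched as a plain MQTT client using paho-mqtt – note that the broker address and topic names below are assumptions for illustration, and the actual MsgFlo participant libraries additionally take care of the discovery message:

import paho.mqtt.client as mqtt

BROKER = 'localhost'                         # assumption
IN_TOPIC = 'formattemperature/in/number'     # assumption: namespaced inport
OUT_TOPIC = 'formattemperature/out/text'     # assumption: namespaced outport

def on_message(client, userdata, msg):
    # Format the incoming temperature number into a friendly text string
    celsius = float(msg.payload)
    client.publish(OUT_TOPIC, 'Temperature: %.1f C' % celsius)

client = mqtt.Client()
client.on_message = on_message
client.connect(BROKER)
client.subscribe(IN_TOPIC)
client.loop_forever()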

And then insert it before the display. This will create a new process, and route the data through it.

Our display would now show “Temperature: 22.3 C”

Over time maybe the system grows further

Added another sensor, output now like “Inside 22.2 C Outside: -5.5 C”.

Getting started with MsgFlo

If you have existing “things” that support MQTT, you can start using MsgFlo by either:
1) Modifying the code to also send the discovery message.
2) Using the msgflo-foreign-participant tool to provide discovery without code changes

If you have new things, using one of the MsgFlo libraries is a quick way to support MQTT and MsgFlo. Right now there are libraries for Python, C++11, Node.js, NoFlo and Arduino.

MicroFlo 0.2.0, visual Arduino programming

Two months after MicroFlo 0.1.0, another important milestone has been reached. This release brings a basic visual programming environment and initial support for all major desktop platforms (Win/OSX/Linux). The project is still very much experimental, but it is now starting to demonstrate potential advantages over traditional Arduino programming.

Official release notes and announcement here.

The start of something visual

The “Hello World” adapted from Arduino: a program that blinks the built-in LED a couple of times per second. Pressing Play (>) uploads the program to the Arduino using MicroFlo.

The IDE shown is NoFlo UI, a visual programming environment which can also be used to program JavaScript for the browser and Node.js using the NoFlo runtime. This project is developed by Henri Bergius and the rest of the NoFlo team. For more details about the NoFlo IDE project, check their latest update and follow their Kickstarter project.

Talk

At Piksel 2013 in Bergen, I also presented MicroFlo for the first time, to an audience of mostly new media and experimental sound artists. The talk goes into detail about the motivations behind the project, from the quite practical to the more philosophical considerations. Not my most coherent talk, but it gives some insight.

Link

Next

For the next milestone, MicroFlo 0.3, several things are already planned. The focus is mostly on practical improvements to the system, but I also hope to complete prototype support for “heterogeneous FBP”: programming systems consisting of both host computer and microcontroller programs in a unified manner using NoFlo+MicroFlo.

I am also planning a MicroFlo workshop at Bitraf some time in December and to demo the project at Maker Faire Oslo.

In the meantime, you can get started with MicroFlo for Arduino by following this tutorial. Feedback and contributions welcomed!

MicroFlo 0.1.0, and an Arduino powered fridge

Lately I’ve been playing with microcontrollers again; Atmel AVRs with and without Arduino boards. I’ve made a couple of tiny projects myself, helped an artist friend do interactive works, and helped integrate a microcontroller into an embedded product at work. With Arduino, one does not have to worry about interrupts, registers and custom hardware programmers to get things done using a microcontroller. This has opened the door for many more people than pre-Arduino. But the Arduino language is just a collection of C++ classes and functions; users are still left with telling the microcontroller how to do things: “first do this, then this, then this…”.

I think always having to work on such a low level limits what people make with Arduino, both in who is able to use it and what current users are able to achieve. So I created a new experimental project: MicroFlo. It has a couple of goals, the first two being the most important:

People should not need to understand text-based, C-style programming to be able to program microcontrollers. But those that do know it should be able to use that knowledge, and be able to mix-and-match it with higher-level paradigms within a single program.

It should be possible to verify correctness of a microcontroller program in an automated way, and ideally in a hardware-independent manner.

Inspired by NoFlo, and designed for integration with it, MicroFlo implements Flow-based programming (FBP). In FBP, a program is constructed by connecting a set of independent components. Each component has in-ports and out-ports, and can only communicate with other components through these. The connections can be defined programmatically, using a declarative text language, or using a visual editor. 2D/3D artists will recognize the concept from node compositors like the one in Blender, and sound artists from applications like Reaktor.

Current status: A fridge

I have an old used fridge, by the looks of it made in the GDR some time before I was born. Not long after I got it, the thermostat broke and the cooler would not turn off. Instead of throwing it away and getting a new one, which would be the cool and practical* thing to do, I decided to fix it. Using an Arduino and MicroFlo.
* especially considering that it is several months since it broke…

A fridge is a simple system, something that should be simple for hobbyists to create. So it was a decent first use case to test the framework on. In principle, such a system looks something like this:


The thermostat decides whether to turn the cooler on or off, and the cooler switch realizes this decision. There are many alternative methods of implementing each of these two components. I used a DS1820 digital thermometer IC to read temperature, and a hacked NEXA remote-controlled relay for the switch.
All the logic, including the temperature thresholds, is done in software on an Arduino Uno.
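To make the graph below easier to follow: the HysteresisLatch component keeps the previous on/off decision until the temperature crosses one of two thresholds. A minimal sketch of that logic, in Python for illustration (the actual component is C++ inside MicroFlo, configured with the thresholds shown in the graph):

def hysteresis_latch(temperature, cooling_on, low=2.0, high=5.0):
    # Stop cooling below the low threshold, start again above the high one;
    # in between, keep the previous state to avoid rapid on/off switching.
    if temperature <= low:
        return False
    if temperature >= high:
        return True
    return cooling_on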

The code below for the cooler switch would have been simpler (a one-liner, left as an exercise for the reader) if I instead had used an active-high relay directly on the mains (illegal if you are not a certified electrician), or alternatively reverse-engineered the 433 MHz protocol used.


MicroFlo code for the fridge, in the .FBP domain specific language (examples/fridge.fbp)
# Thermostat
timer(Timer) OUT -> TRIGGER thermometer(ReadDallasTemperature)
thermometer() OUT -> IN hysteresis(HysteresisLatch)

# On/Off switch
hysteresis() OUT -> IN switch(BreakBeforeMake)
switch() OUT1 -> IN ia(InvertBoolean) OUT -> IN turnOn(DigitalWrite)
switch() OUT2 -> IN ic(InvertBoolean) OUT -> IN turnOff(DigitalWrite)
# Feedback cycle to switch required for synchronizing break-before-make logic
turnOn() OUT -> IN ib(InvertBoolean) OUT -> MONITOR1 switch()
turnOff() OUT -> IN id(InvertBoolean) OUT -> MONITOR2 switch()

# Config
'5000' -> INTERVAL timer() # milliseconds
'2' -> LOWTHRESHOLD hysteresis() # Celsius
'5' -> HIGHTHRESHOLD hysteresis() # Celsius
'["0x28", "0xAF", "0x1C", "0xB2", "0x04", "0x00", "0x00", "0x33"]' -> ADDRESS thermometer()
board(ArduinoUno) PIN9 -> PIN thermometer()
board() PIN12 -> PIN turnOff()
board() PIN11 -> PIN turnOn()

Is the above solution nicer than using the Arduino IDE and writing C++? At the moment, maybe not significantly so. But it does prove that this kind of high-level dynamic programming model is feasible to implement even on devices with 2 kB RAM and 32 kB program memory. And it is a starting point for more interesting exploration.

Next steps

I will continue to experiment with using MicroFlo for new projects, to develop more components and test/validate the architecture and programming model. I also need to read through the canonical book on FBP by J. Paul Morrison.

Some bigger things that I want to add include:

  • Ability to introspect the graph running on the device, in particular the packets moving between components.
  • Automated testing (of the framework, individual components and application graphs) using JavaScript BDD test frameworks like Mocha or Vows.
  • Ability to change graphs at runtime, and then persist them to EEPROM so they will be loaded on the next reset.

And eventually: allowing running graphs to be manipulated and monitored visually, using the NoFlo development environment. See bug #1.

Curious still? Check out the code, and ask on the FBP mailing list if you have any questions!