OpenSensors are pioneers in open data and the Internet of Things, surfacing a wide range of data sets for open analysis. As an open data aggregator we deliver content over a common infrastructure; whether air quality or transport data, you only have to think about one integration point. Future cities need low data transaction costs for friction-free operation. Bridging technical gaps slows progress, so keeping the number of integration points low makes sense for everybody.
Our journey starts here. As we build out our open data content, expect to see more stories, more insight and hopefully some catalysts for positive change.
Before our first story, consider what will make open data and the Internet of things useful.
We must bridge the gap from data to information, and allow consumers to abstract away the complexity of IoT so they can ask questions that make sense to them.
Take data from the London Air Quality Network (LAQN): the network is sparse, so it’s improbable that our need maps directly to a sensor. By coupling some simple Python code with OpenSensors data we’ll mash some LAQN data together to get some insight about air quality in Wapping.
In this story I’ll show how we can bridge the information gap with some simple code, yielding valuable insight along the way!
Chapter 1: OpenSensors.io Primer
First, a quick primer on how data is structured in OpenSensors.io (for more detail check out our forum and glossary of terms):
- Devices – Each connected ‘thing’ maps one-to-one to a secured device
- Topics – Devices publish data to topics; a topic is a URI that points to a stream of data
- Organisations (orgs) – An organisation owns many topics and is the root of its topics’ URIs
- Payloads – Payloads are the string content of messages sent to topic URIs, typically JSON
Also check out our RESTful and streaming APIs on the website for more background and online examples.
Chapter 2: Putting JSON to Work
You can use the OpenSensors REST API to gather data for research, but it comes in chunks of JSON, which isn’t great for data science. For convenience I wrapped up some common data sources for London into a Python class. Since IoT data is rarely in a nice columnar form, it’s valuable to build some simple functions to shape the data into something a bit more useful.
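As a rough illustration of that shaping step, here is a minimal sketch using only the Python standard library. The message layout (the `topic`, `date` and `payload` keys), the topic URIs and the readings are all assumptions made up for the example; they are not taken from the OpenSensors API or from the class mentioned above.

```python
import json

# Hypothetical messages: each carries a topic URI, a timestamp and a
# string payload that is usually (but not always) valid JSON.
raw_messages = [
    {"topic": "/orgs/example-org/site-a", "date": "2016-01-01T09:00:00Z",
     "payload": '{"species": "NO2", "value": 41.5}'},
    {"topic": "/orgs/example-org/site-b", "date": "2016-01-01T09:00:00Z",
     "payload": '{"species": "NO2", "value": 38.0}'},
    {"topic": "/orgs/example-org/site-c", "date": "2016-01-01T09:00:00Z",
     "payload": "not json"},
]


def flatten_messages(messages):
    """Turn messages with JSON string payloads into flat row dicts."""
    rows = []
    for msg in messages:
        try:
            payload = json.loads(msg["payload"])
        except (KeyError, TypeError, ValueError):
            continue  # skip messages with a missing or malformed payload
        if not isinstance(payload, dict):
            continue  # only JSON objects map cleanly onto columns
        row = {"topic": msg.get("topic"), "date": msg.get("date")}
        row.update(payload)
        rows.append(row)
    return rows


def to_columns(rows):
    """Pivot row dicts into a dict of equal-length column lists."""
    keys = sorted({key for row in rows for key in row})
    return {key: [row.get(key) for row in rows] for key in keys}


columns = to_columns(flatten_messages(raw_messages))
print(columns["topic"], columns["value"])
```

The column dict can be handed straight to a data frame library if you use one; missing fields come through as `None` rather than raising.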
Chapter 3: Introducing the Turk’s Head
I’m fortunate to spend a lot of time in Wapping, in and around the community of the Turk’s Head Workspace and Cafe, but unfortunately we don’t have a local LAQN sensor. With a bit of data science and OpenSensors.io open data we can estimate what NO2 levels might be around the cafe and workspace.
A simple way to estimate NO2 is a weighted average of all the LAQN sensors; in this case we derive the weights from the distance between each sensor and our location. Since we want to overweight the closest sensors, we can use an exponential decay so that the weights of faraway sensors shrink towards zero.
For the Turk’s Head, the sensors in Aldgate, Southwark, Tower Hamlets and the City are the closest and have the biggest impact on our estimate.
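The distance-weighted estimate described above might be sketched as follows. This is an illustration rather than the code behind the charts: the haversine distance, the 2 km decay scale, the sensor names, the coordinates and the NO2 readings are all assumptions chosen for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def decay_weights(distances_km, scale_km=2.0):
    """Exponentially decaying weights, normalised to sum to 1.

    scale_km is an assumed tuning parameter: smaller values push
    more of the weight onto the nearest sensors.
    """
    raw = [math.exp(-d / scale_km) for d in distances_km]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_estimate(readings, weights):
    """Weighted average of sensor readings."""
    return sum(r * w for r, w in zip(readings, weights))


# Illustrative inputs only: made-up coordinates and NO2 readings,
# not real LAQN sites or measurements.
target = (51.505, -0.060)
sensors = [
    ("sensor_a", 51.510, -0.070, 42.0),
    ("sensor_b", 51.500, -0.100, 55.0),
    ("sensor_c", 51.450, -0.250, 30.0),
]
distances = [haversine_km(target[0], target[1], lat, lon)
             for _, lat, lon, _ in sensors]
weights = decay_weights(distances)
estimate = weighted_estimate([no2 for *_, no2 in sensors], weights)
print([round(w, 3) for w in weights], round(estimate, 1))
```

Normalising the weights keeps the estimate within the range of the observed readings, and the decay scale is the one knob worth experimenting with.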
Chapter 4: Getting into the Data
With our air quality time series and our weights, we can dig into what our estimates for the Turk’s Head look like (NO2 * weight). Here’s the series for NO2 over the last 20 days; it looks like the peaks and troughs repeat, and the falling or rising trend is persistent in between.
Trend followers in finance use moving averages to identify trends, for example the MACD indicator (moving average convergence divergence). MACD uses the delta between a fast and a slow moving average to identify rising or falling trends; we’ll do the same. For our purposes we’ll speed the averages up, using decays of 3 and 6 periods (LAQN data is hourly and we are resampling to give estimates on the hour).
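A minimal sketch of the fast/slow delta, in plain Python. It assumes the 3 and 6 period decays are exponential moving average spans, using the common convention alpha = 2 / (span + 1); the original analysis may define the decay differently (for example as a half-life), and the NO2 values below are made up.

```python
def ewma(values, span):
    """Exponentially weighted moving average, seeded with the first value.

    Assumes the 'span' convention: alpha = 2 / (span + 1).
    """
    alpha = 2.0 / (span + 1.0)
    out = []
    avg = None
    for value in values:
        avg = value if avg is None else alpha * value + (1.0 - alpha) * avg
        out.append(avg)
    return out


def trend_delta(values, fast_span=3, slow_span=6):
    """MACD-style delta: fast average minus slow average.

    Positive means the fast average sits above the slow one (rising NO2),
    negative means it sits below (falling NO2).
    """
    fast = ewma(values, fast_span)
    slow = ewma(values, slow_span)
    return [f - s for f, s in zip(fast, slow)]


# Made-up hourly NO2 estimates: a rise followed by a fall.
hourly_no2 = [30, 32, 36, 41, 47, 52, 50, 45, 39, 34, 31, 29]
delta = trend_delta(hourly_no2)
print([round(d, 2) for d in delta])
```

The sign of the delta flips some time after the series turns, which is the lag you accept in exchange for a smoother signal.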
What can we conclude from the charts for the Turk’s Head? From the left-hand chart we can see the data is a little noisy, with a flat line showing some missing or ‘stalled’ data. Looking at the 3 and 6 period decayed averages, the data is smoother, with the faster average persistently trending ahead of the slower one.
Even with fast-moving decays the averages cross only a couple of times a day, showing persistence when in trend. So using a simple trend indicator and the LAQN we can build a simple air barometer for the Turk’s Head.
Good: 3 period exp average < 6 period average (green)
Bad: 3 period exp average > 6 period average (red)
This is helpful because, given a persistent trend state, if we have ‘good’ air now we’ll probably have ‘good’ air for the following hour.
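One way to put a number on that persistence is to label each hour ‘good’ or ‘bad’ and count how often the label carries over to the next hour. The sketch below assumes you already have the fast and slow averages as two equal-length lists; the function names and sample values are hypothetical.

```python
def trend_states(fast, slow):
    """Label each hour 'good' (fast below slow) or 'bad' (otherwise).

    Ties are counted as 'bad' here; that is an arbitrary choice.
    """
    return ["good" if f < s else "bad" for f, s in zip(fast, slow)]


def persistence(states):
    """Fraction of consecutive hours whose state is unchanged.

    Returns None when there are fewer than two observations.
    """
    pairs = list(zip(states, states[1:]))
    if not pairs:
        return None
    unchanged = sum(1 for now, nxt in pairs if now == nxt)
    return unchanged / len(pairs)


# Made-up averages: five 'good' hours followed by five 'bad' hours.
fast = [40, 39, 38, 37, 36, 42, 44, 46, 48, 50]
slow = [45, 44, 43, 42, 41, 41, 42, 43, 44, 45]
states = trend_states(fast, slow)
print(states, persistence(states))
```

A value close to 1 means the current state is a reasonable guess for the next hour; a value near 0.5 would mean the barometer tells you very little.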
Chapter 5: What’s the trend across London?
So we now have a means of defining how NO2 levels at the Turk’s Head are trending, but is the trend state predictable over a 24 hour period?
Remember we define good or bad air quality trend as:
Good: ‘fast’ average < ‘slow’ average = falling NO2
Bad: ‘fast’ average > ‘slow’ average = rising NO2
If we aggregate data into hourly buckets we can visualise how much of the time, over the past 20 days, a sensor has been in a falling NO2 trend (‘good’) for a given hour.
x = hour of the day
y = percentage of bucket that is in a ‘good’ state
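The hourly bucketing could look something like this sketch, which takes timestamped ‘good’/‘bad’ states and returns the percentage of ‘good’ observations for each hour of the day. The input shape, a list of (datetime, state) pairs, and the sample observations are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime


def share_good_by_hour(observations):
    """Percentage of 'good' states per hour of day.

    observations: iterable of (datetime, state) pairs, where state is
    'good' or 'bad'. Hours with no observations are omitted.
    """
    counts = defaultdict(lambda: [0, 0])  # hour -> [good count, total count]
    for timestamp, state in observations:
        bucket = counts[timestamp.hour]
        bucket[1] += 1
        if state == "good":
            bucket[0] += 1
    return {hour: 100.0 * good / total
            for hour, (good, total) in sorted(counts.items())}


# Made-up observations across two days.
observations = [
    (datetime(2016, 1, 1, 6), "good"),
    (datetime(2016, 1, 2, 6), "good"),
    (datetime(2016, 1, 1, 8), "bad"),
    (datetime(2016, 1, 2, 8), "good"),
]
print(share_good_by_hour(observations))
```

Feeding in the observations from every sensor at once, rather than one sensor at a time, gives a city-wide version of the same curve.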
We can see that for each 1 hour bucket (24 in total) there is a city wide pattern; if we aggregate across the city (using the same measure, the percentage of sensors in up or down trend) we get an idea of how NO2 trends over a typical day.
Our right-hand chart shows the percentage of ‘good’ versus ‘bad’ NO2 sensor states across London, collected from about 80 sensors over the past 20 days.
Now this is a really simple analysis but it suggests the proportion of ‘good’ trends across London is high before 7am, and then falls away dramatically during the morning commute. No surprises there.
But the pattern isn’t symmetrical; after peaking around lunchtime, when only ~20% of the city’s sensors have improving NO2, NO2 falls throughout the afternoon. From a behavioural standpoint this makes sense; there is a more concentrated morning commute relative to the evening. Most of us arrive at the workplace between say 8 and 9am, but in the evening we may go to the gym, we may go out for dinner, or just work late. The dispersion of our exits from the city is wider than when we enter.
Chapter 6: PM versus NO2
So far we have considered NO2 as our core measure, in part because there are more sensors in the LAQN delivering this data than particulates. But let’s consider particulates for a moment: LAQN delivers PM10 and PM2.5 measures; the definition can be found here.
Our temporal curves for particles differ from NO2, taking longer to disperse during the evening rush hour (remember we are measuring the percentage of sensors in a ‘good’ state). As a measure of air quality, NO2 builds up faster and decays faster once peak traffic flows have completed, whereas particles linger, only fading deep into the night (on average).
In our data set, NO2 and PM measures differ in their average behaviour over a typical 24 hour period.
- Behavioural interventions will need to consider whether particulates or NO2 are the most impactful.
- How can we communicate air quality to our citizens, and relate their personal needs to the measures most impactful on their lives?
- Do we need additional sensors to create a more dense air quality resource? How can we allocate funds to optimally support network expansion and air quality services?
- Knowing the characteristics of a sensor (location, calibration, situation (elevated, kerb side, A or B road)) will improve estimates; how can we deliver this metadata?
Plenty of food for thought… information
Notes and Resources
Our stories are quick and dirty demonstrators to promote innovation and should be treated as such. All data science and statistics should be used responsibly 🙂
All of the code supporting this can be found on GitHub, with data sourced from the OpenSensors LAQN feed, and I use a postcode lookup to get long/lat locations for Wapping. I’ve also taken some inspiration from https://github.com/e-dard/boris and https://github.com/tzano/OpenSensors.io-Py, so thanks for their contributions!