Bird feeders live




Following my blog post on Dear Data where I wished for more collecting, curating and sharing of small-scale, local data, I started measuring how much birdseed I topped up in two feeders in my garden. Here, those measurements are converted to rates, the rates are smoothed using cubic splines, and the result is plotted over the year. The customers are mostly house sparrows but there's occasionally a robin, and at certain times of the year, great tits (spring) and goldfinches (summer). Now that the collection has extended over more than one year, you can see whether there are consistent seasonal patterns. I don't know if this represents an ornithological breakthrough as much as a dataviz exercise, but reflections and technical notes are below for the curious.

Reflections

July 2015 seemed to be a busy time and a lot of seeds were getting munched each day. However, when I returned home one day, having topped up in the morning, I was surprised to see the entire lot scoffed. I suspected a squirrel attack and made a note to disregard that day's data. Much later, I tried out the splines and ran them with and without the suspicious day. It seemed to make little difference, not because of the smoothing but because that day wasn't really as unusual in retrospect as it had seemed at the time. This seems to be an important lesson: why be systematic about the whole analytical process but make choices about the validity of the data based on nothing more than a hunch? But that's often what we do.

On the other hand, your data are never really totally free from some guesswork, assumptions and that'll-do. There is, as far as I know, a small amount of mammalian pilfering, spillage etc., but I decided to ignore that. The sharp drop in March 2016 was the result of work going on to replace the roof on our garage, which introduced scary humans into the garden all day. Big wood pigeons have sometimes managed to get at the feeders and cram their crops full of No-Mess Sunflower Seeds, and I've moved the feeders away from them (if you want to get fed in an English garden, be small and cute) but haven't sought to adjust the data at all; I tend to think that's a slippery slope to go down. None of these were a priori decisions; they were made up as I went along.

I interpret the pattern along these lines: in mid-summer, the consumption increases massively as all the chicks leave the nest and start learning how to feed themselves. The sparrows in particular move around and feed in flocks of up to 20 birds. Once seeds and berries are available in the country though, it is safer for them to move out there than to stay in the suburbs with prowling cats everywhere. But as the new year arrives, the food runs out and they move back in gradually, still in large flocks, before splitting into small territories to build nests. Cycle of life and all that.

Technical notes

It's a little bit naughty to call this "live" because the data are captured using those cornerstones of science, the pencil and the tape measure.

Every time I topped up the seeds, I wrote down the date and the height that had been added to the feeders. The two feeders do not have quite the same cross-sectional area, but close enough. Later, this went into a .csv file, so now I just add new measurements to that using a text editor (you could use a spreadsheet if you prefer). Next, I have an R script to convert this to rates and thence to splines. (Splines are a way of getting a smooth curve through your data. In the same way you could interpolate between points with a straight line, splines use simple curve shapes and these can be fitted to your data relatively easily by the computer.) The rates are important because measurements are made only when topping up, and not at regular intervals. There is an assumption here that consumption is approximately constant between top-ups, but this seems justified because the splines will smooth out bumps anyway, and because the feeders are not large: when there is a feeding frenzy, they could be emptied in a day, so if there is a sudden increase, it will soon be reflected in the data. If I went on holiday for a few weeks in the summer, that would lead to missing data (and hungry birds) but in 2015 and 2016, that didn't happen.
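For anyone who wants to see the rate calculation spelled out, here is a minimal sketch in JavaScript. It is not the R script itself. It assumes that the height added at each top-up replaces what was eaten since the previous top-up, so the rate for that interval is the height added divided by the number of days elapsed. The field names (date, heightCm), the function name toRates and the numbers are all made up for illustration.

```javascript
// Hypothetical records: date of each top-up and the height of seed added (cm).
// The field names and values are illustrative, not the real data.
const topUps = [
  { date: "2015-07-01", heightCm: 6.0 },
  { date: "2015-07-03", heightCm: 8.0 },
  { date: "2015-07-07", heightCm: 10.0 },
];

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Convert irregular top-ups into daily rates.
// Assumption: the seed added at one top-up replaces what was eaten since the
// previous one, so rate = height added / days since the previous top-up.
function toRates(records) {
  const rates = [];
  for (let i = 1; i < records.length; i++) {
    const start = Date.parse(records[i - 1].date); // date-only ISO strings parse as UTC
    const end = Date.parse(records[i].date);
    const days = (end - start) / MS_PER_DAY;
    if (!(days > 0)) continue; // skip same-day, out-of-order or unparseable entries
    rates.push({
      from: records[i - 1].date,
      to: records[i].date,
      cmPerDay: records[i].heightCm / days,
    });
  }
  return rates;
}

console.log(toRates(topUps));
```

The first record produces no rate of its own, because there is no earlier top-up to measure the interval from. Each rate applies to the whole interval between two top-ups, which is exactly the constant-consumption assumption described above.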

The R script produces two .csv files, one for rates and one for splines. Each year has its own pair of these files. These are uploaded and the page you are looking at contains D3 JavaScript code that turns those files into the chart before you. (You can view that in the 'source code' in your browser - how to do that depends on what browser you have, but you can just search for instructions in the browser's help documentation.) If the data files change, the chart changes too, so when I top up, I measure the heights, add them to the raw .csv file, run the R script, and upload the resulting pair of .csv files. Processing the data through R before uploading is not quite as elegant as having the rates and splines all done inside the JavaScript, so at some point in the future I will switch to that, probably using the smooth.js library, or respectfully stealing from Paul Lambert's script.
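If the spline step did move into JavaScript, one possible starting point is sketched below. It is a natural cubic spline that passes exactly through every point, which is simpler than a smoothing spline and not necessarily what the R script or smooth.js does. The function name naturalCubicSpline and the sample numbers are invented for illustration.

```javascript
// Build a natural cubic spline through the points (xs[i], ys[i]).
// xs must be strictly increasing; returns a function that evaluates the curve.
// Note: this interpolates (hits every point); it does not smooth.
function naturalCubicSpline(xs, ys) {
  const n = xs.length - 1; // number of segments
  if (n < 1 || ys.length !== xs.length) {
    throw new Error("need at least two points and equal-length arrays");
  }
  const h = [];
  for (let i = 0; i < n; i++) {
    h.push(xs[i + 1] - xs[i]);
    if (!(h[i] > 0)) throw new Error("xs must be strictly increasing");
  }

  // Solve the tridiagonal system for the quadratic coefficients c[i],
  // with c[0] = c[n] = 0 (the "natural" end condition).
  const mu = new Array(n + 1).fill(0);
  const z = new Array(n + 1).fill(0);
  const c = new Array(n + 1).fill(0);
  for (let i = 1; i < n; i++) {
    const alpha =
      (3 / h[i]) * (ys[i + 1] - ys[i]) - (3 / h[i - 1]) * (ys[i] - ys[i - 1]);
    const l = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1];
    mu[i] = h[i] / l;
    z[i] = (alpha - h[i - 1] * z[i - 1]) / l;
  }

  // Back-substitute to get the linear (b) and cubic (d) coefficients.
  const b = new Array(n).fill(0);
  const d = new Array(n).fill(0);
  for (let j = n - 1; j >= 0; j--) {
    c[j] = z[j] - mu[j] * c[j + 1];
    b[j] = (ys[j + 1] - ys[j]) / h[j] - (h[j] * (c[j + 1] + 2 * c[j])) / 3;
    d[j] = (c[j + 1] - c[j]) / (3 * h[j]);
  }

  return function evaluate(x) {
    // Find the segment containing x; outside the data, the first or last
    // segment's cubic is extrapolated.
    let j = 0;
    while (j < n - 1 && x > xs[j + 1]) j++;
    const t = x - xs[j];
    return ys[j] + b[j] * t + c[j] * t * t + d[j] * t * t * t;
  };
}

// Hypothetical (day of year, cm per day) points, for illustration only.
const days = [10, 40, 75, 120];
const cmPerDay = [1.0, 1.5, 0.5, 2.0];
const curve = naturalCubicSpline(days, cmPerDay);
console.log(curve(40)); // value at a data point
console.log(curve(60)); // value between data points
```

Because this curve goes through every rate exactly, it will follow every bump in the data. Getting a genuinely smoothed curve like the one in the chart would need a smoothing spline or a similar method on top of this.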