DCSF Statistical Releases, the BBC and Better Data Formats

Simon Dickson picks up an interesting story from the BBC’s Editors’ blog about official releases of statistics.

Usually, when the Department for Children, Schools and Families releases new statistics, they're given to the media in advance. The media need this lead time to format all their articles and tables and make sure everything is correct and works properly. This is fair enough, given that the data they're working with lives in lots of Excel spreadsheets, with multiple sections, differing layouts and everything else you really don't want if you're tasked with this kind of job.

Given what they have to work with, the BBC's anger is understandable, but perhaps misplaced: why are we still dealing with a bunch of spreadsheets in the information age? Why isn't there an API that allows this data to be queried, or at the very least, a standard data format that doesn't change from year to year and doesn't rely on proprietary technologies that are hard to work with?

An API or standard data format would allow media organisations to write code once that generates the statistics they need every year. They wouldn't have to build new tools to be tweaked and tested every time there are new statistics. Better still, it would create a market for someone to build a tool that did this for them, saving them money. Even better than that, it would allow anyone who wants to do something innovative with these statistics to do so far more easily.
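To make the "write it once, rerun it every year" point concrete, here is a minimal sketch in Python. It assumes a hypothetical CSV release whose columns (`school`, `year`, `pupils`, `passes`) never change between years; neither that layout nor the `pass_rates` function reflects anything the DCSF actually publishes. The point is only that, with a stable format, the same script works on each new file unchanged, and it fails loudly if a release ever breaks the agreed layout.

```python
import csv
import io
from typing import Dict, TextIO

# Hypothetical column names: a stable, published schema is assumed here,
# not something the DCSF actually provides.
REQUIRED_COLUMNS = {"school", "year", "pupils", "passes"}


def pass_rates(stream: TextIO) -> Dict[str, float]:
    """Return the percentage of passes per school from a CSV stream.

    Because the layout is assumed not to change between releases,
    the same function can be rerun on every year's file unchanged.
    """
    reader = csv.DictReader(stream)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        # Fail loudly if a release ever breaks the agreed format.
        raise ValueError(f"missing columns: {sorted(missing)}")
    rates: Dict[str, float] = {}
    for row in reader:
        pupils = int(row["pupils"])
        passes = int(row["passes"])
        # Guard against division by zero for schools with no pupils listed.
        rates[row["school"]] = 100.0 * passes / pupils if pupils else 0.0
    return rates


if __name__ == "__main__":
    # Made-up illustrative rows, not real statistics.
    sample = io.StringIO(
        "school,year,pupils,passes\n"
        "School A,2008,200,150\n"
        "School B,2008,80,60\n"
    )
    print(pass_rates(sample))
```

Next year's release would simply be passed to the same function; no one has to reverse-engineer a new spreadsheet layout first.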

I think I’m not alone in saying that the case for releasing data properly — in reusable formats, to everyone, for free, whenever it is possible — has been made, has been heard and has been widely accepted as valid.

Why are we still fiddling with messy spreadsheets, and bemoaning the fact that we only have days to do what should take hours?