The Future of White Papers

Or how linked data can strengthen the connection between policy-making and evidence

The Housing White Paper

On 7 February, the Department for Communities and Local Government published a white paper, ‘Fixing Our Broken Housing Market’. White papers like this attract a lot of interest from a broad range of people because they deal with big issues, and issues don’t come much bigger than housing, especially when the Prime Minister’s foreword acknowledges that “Our broken housing market is one of the greatest barriers to progress in Britain today”.

The paper describes the scale of the challenge, and the sorts of things that need to be done to tackle the housing shortage. Other people have written about the paper, so I’m not going to do that. What was of particular interest to me was the amount of data quoted in the paper. Right from the beginning, the opening foreword by Prime Minister Theresa May states that “Today the average house costs almost eight times average earnings — an all-time record”, and the rest of the paper is littered with numbers, and maps, and charts, and facts.

In an unrelated article published on the Guardian website on 31 January, National Statistician John Pullinger wrote that “in a post-truth world, statistics could provide an essential public service”. In it, he says that statistics can help us make decisions based on good evidence, rather than on prejudice. This is something that I’m particularly interested in, both in terms of how available data is used to inform decision-making (note: inform, rather than drive; see this talk from Jeni Tennison, CEO of the Open Data Institute), and also how the people that make the decisions provide this evidence back to the wider population.

So back to the white paper. As I mentioned, the paper is full of data. That doesn’t necessarily mean that data is being used to inform decisions, but it’s a pretty good indication that it’s being considered. The problem is that the data is not as accessible as it could be:

Page 10, Paragraph 2
Page 10 Footnotes 8–12

This section of a paragraph on page 10 quotes some statistics, each marked with a superscripted number. The number points the reader to a footnote at the bottom of the page, which then names the source of the data: in this case, the English Housing Survey 2014/15. Neither route gets you to the figure itself. Googling the specific stat takes you to a series of news articles, while the English Housing Survey website on GOV.UK is a long list of links.

Now footnotes have been in use for almost 500 years, so I’m certainly not suggesting we get rid of them. But I do think there’s a better way of dealing with this — providing better links to the data/statistics used in documents like this.

Linked Open Data from the DCLG

This particular white paper was written and published by the Department for Communities and Local Government (DCLG). DCLG has a team within its Analysis and Data Directorate called Open Data Communities (ODC). ODC is responsible for DCLG’s open data, and as part of that, it hosts a website, also called Open Data Communities. This is actually a linked data site, using Swirrl’s PublishMyData platform.

ODC currently hosts around 200 datasets from DCLG. One of the great things about the linked data on ODC is that each ‘thing’ in the datastore has its own webpage. This means each theme, dataset, or even datapoint (observation) has its own URL that anyone can browse to, and view the information in a wider context — metadata, associated data, and other related resources (depending on the thing being viewed). Because it has a URL, anyone can copy it and send it to someone else, in exactly the same way you might share a news article.
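As a rough illustration of the ‘everything has its own URL’ idea, here is a minimal Python sketch. The base address, the path layout, and the dataset, area and period names are all made up for the example; they are assumptions, not the real Open Data Communities URL scheme.

```python
from dataclasses import dataclass

# Hypothetical base address, used only for illustration.
BASE = "http://example.org"


@dataclass(frozen=True)
class Observation:
    """One datapoint, identified by its dataset and its dimensions."""

    dataset: str  # a dataset slug (made up here)
    area: str     # the place the figure describes
    period: str   # the time period the figure covers

    @property
    def dataset_url(self) -> str:
        # The dataset page: where metadata and definitions would live.
        return f"{BASE}/data/{self.dataset}"

    @property
    def url(self) -> str:
        # The observation page: one shareable address per datapoint.
        return f"{self.dataset_url}/{self.period}/{self.area}"


obs = Observation(dataset="example-dataset", area="example-area", period="2015")
print(obs.dataset_url)
print(obs.url)
```

The point of the sketch is only that the address is derived from what the datapoint is about, so the same figure always has the same link, and that link can be pasted into an email or a footnote like any other.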

So back to the white paper. The paper already contains hyperlinks to other documents:

Footnotes 20–23 on Page 22

So where other documents are mentioned in the paper, they are linked to in the footnotes to provide additional context. Because of the way DCLG stores some of its data on Open Data Communities, there is a fantastic opportunity to link any data or statistics to the equivalent observation’s linked data page on the internet.

In-line Data Provenance

To explain this a little better, I’ve annotated a few pages of the white paper with some of the possibilities.

Page 9 — Annotated

Page 9 contains a statement that says “Since 1998, the ratio of average house prices to average earnings has doubled”. In the actual white paper, the footnote suggests that this has come from DCLG Live Table 577. This table has been created as a linked dataset on Open Data Communities, and so we could link directly to the dataset page, as if it were any other document.

This is useful, as it allows the reader immediate access to data and metadata, so things like the definition of the dataset can be checked. You can see that there is regional data provided, so it would be simple to see whether all regions saw the same increase in the ratio. It’s also easy enough to look at other years, to see whether 1998 was specifically chosen as the baseline to inflate the message (it doesn’t seem to be…).

In a similar vein, page 10 has a chart, and mentions some specific figures within the body of the text:

Page 10 — Annotated

As before, because the datasets being referenced here are hosted on Open Data Communities, we can link directly to the dataset’s page on the internet. We can actually go further than this, though. The chart at the top of the page is a simple visualisation of a single row of data. This row of data can be viewed on ODC, and so we could link to that, and we can also link directly to the chart.

There are also a couple of figures referenced on the page about home ownership amongst 25–34 year olds, which we can link to two observations:

Home ownership amongst 25–34 year olds just over a decade ago

Home ownership amongst 25–34 year olds today

Again, we can use this to find out more information about the figures being quoted: who publishes the data, and when it will next be updated, for example. We can also see the figures in context: looking at the trend, or even seeing what the other 63% of 25–34 year olds do:

And because of the way the data is structured, we can use this to explore the data — so clicking ‘Social renters’ in the chart above will open up a different slice of the data:
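To make the idea of a ‘slice’ a bit more concrete, here is a small Python sketch. It assumes each observation is stored as a value plus the dimensions that locate it (age group, tenure, year). The dimension names, the year labels and the values are placeholders I have invented for the example; they are not real statistics and not the real dataset structure.

```python
# Each observation is a value plus the dimensions that locate it.
# All values below are placeholders, not real statistics.
observations = [
    {"age": "25-34", "tenure": "Owner occupiers", "year": "year-1", "value": 10.0},
    {"age": "25-34", "tenure": "Owner occupiers", "year": "year-2", "value": 20.0},
    {"age": "25-34", "tenure": "Social renters", "year": "year-1", "value": 30.0},
    {"age": "25-34", "tenure": "Social renters", "year": "year-2", "value": 40.0},
]


def slice_by(observations, **fixed):
    """Keep the observations whose dimensions match every fixed value."""
    return [
        obs
        for obs in observations
        if all(obs.get(name) == wanted for name, wanted in fixed.items())
    ]


# Fixing age group and tenure leaves a trend over time for that group.
trend = slice_by(observations, age="25-34", tenure="Social renters")
for obs in trend:
    print(obs["year"], obs["value"])
```

Clicking a different tenure in a chart is, in this picture, just changing one of the fixed values and looking at the observations that remain.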

These references could be repeated throughout the document, wherever data / stats are mentioned — providing direct links into the evidence that has been used to inform the policy choices. As an exercise, this shouldn’t be too onerous, because the data has already been sourced for the document.
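As a sketch of what that annotation step might look like in practice, the Python below wraps a quoted figure in an HTML link to the page for its data. The function name and the example.org address are hypothetical, chosen for the example; a real document would use the actual dataset or observation URL.

```python
import html


def link_statistic(sentence: str, phrase: str, url: str) -> str:
    """Wrap the first occurrence of `phrase` in `sentence` in an HTML link."""
    if phrase not in sentence:
        raise ValueError(f"{phrase!r} not found in sentence")
    before, _, after = sentence.partition(phrase)
    # Escape the text and the address so the result is safe to embed in a page.
    return (
        html.escape(before)
        + f'<a href="{html.escape(url, quote=True)}">{html.escape(phrase)}</a>'
        + html.escape(after)
    )


sentence = (
    "Since 1998, the ratio of average house prices "
    "to average earnings has doubled"
)
# Hypothetical address standing in for the dataset page.
linked = link_statistic(sentence, "has doubled", "http://example.org/data/example-dataset")
print(linked)
```

The sentence itself is unchanged for the reader; the only difference is that the claim now carries its own route back to the evidence.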

Whilst I’ve used the recent white paper as the example, the methods here could be applied to all sorts of things, from official reports, to academic papers and even blog / social media posts. And creating truly web-enabled documents using linked open data is a powerful way to make documents more useful, and decisions more transparent, by supplying the provenance of the data being used to inform those decisions.

Find out more about Open Data Communities here.