Open Data And Policy Evidence Are Here To Stay

Over the first year of Covid-19 in the UK, demand for data analysis boomed. More data, it seems, was shared by government departments, agencies and other authoritative sources than ever before. Journalists took this up, and news outlets competed to deliver the most up-to-date, meaningful and understandable data-driven content. Critically, the public engaged with data as never before. This transformation of public understanding and appetite for data is here to stay, which means higher standards of transparency around evidence and decision-making.

In this article, Robert Grant considers what this will mean for public sector and NGO analysts and decision-makers.

Government

Open data and accountable decision-making are not new, but the public and organisational appetite for data is. This data (or these statistics) helps us understand whether policy is justified, and lets us plan for our own circumstances. That places unprecedented demands on public sector and charitable data sources. Politically, it is very unlikely that the cat will go back into the bag. It no longer seems trustworthy to ask those who are not privy to the data (the public, companies and smaller public sector organisations) to follow policy because some unspecified analysis happened on some unspecified data. Only three years ago, it was perfectly feasible for Brexit planning to be based on an expert analysis which the government refused to publish.

If this journey toward open data flows and statistical understanding within the timeframe of Covid is a microcosm of a longer-term trend, then what is new? Two rapid changes have appeared: a further degree of openness in national official statistics supporting policy decisions, and critical consumption, first by journalists and then by the public.

Open data has been an imperative of UK government since the Cabinet Office's white paper on the subject in 2012. Availability has steadily gone up and latency has come down. There is even a requirement to move from static publication formats to interactive, queryable APIs. This has given local organisations with in-house technical skills the opportunity to create their own tools that draw on national government data. The Open Data Institute has been actively promoting this efficient reuse of data and hosts a variety of case studies on its website.
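
To make that concrete, here is a minimal sketch in Python of the kind of tool a local organisation might build on top of such an API. The endpoint, filters and field names are hypothetical stand-ins, not any department's actual interface.

```python
import requests

# Hypothetical open-data endpoint: a stand-in for any government API
# that serves queryable JSON rather than a static spreadsheet.
BASE_URL = "https://api.example.gov.uk/v1/data"

params = {
    "areaType": "region",        # assumed filter: geography level
    "areaName": "London",        # assumed filter: which area
    "metric": "newCasesByDate",  # assumed field name
}

response = requests.get(BASE_URL, params=params, timeout=30)
response.raise_for_status()  # fail loudly on HTTP errors

# Assumed response shape: {"data": [{"date": ..., "newCasesByDate": ...}, ...]}
for row in response.json()["data"]:
    print(row["date"], row["newCasesByDate"])
```

The point is less the specific calls than the workflow: once data sits behind a queryable interface, keeping a local dashboard current becomes a scheduled script rather than a manual download.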

Before Covid, government data, however much it may technically have been "open", was not something that journalists invested effort in checking and investigating, let alone members of the public.

Over the course of 2020, we saw a gradual expansion of the detail supporting policy decisions: from national to regional to local statistics, and from point predictions to predictions with uncertainty to alternative parallel models. Interestingly, this was not enough for a parliamentary committee, which issued a public complaint about the government's lack of data sharing and information supporting policy.

Some topics, like positive test results, were also broken down by age group to illustrate particular trends. This may have been an exciting new level of transparency, but it quickly became the new normal; when the same granularity was not provided for vaccination statistics, it attracted an official complaint from the UK Statistics Authority. It seems that at least this official regulator of government statistics will not be content to return to pre-Covid levels of openness.

The public

However useful the local statistics were, it soon became apparent that infections spread along edges in social networks, which are largely unknown to government, and spill over geographical boundaries (in the economic sense too). The obvious next question for critical consumers of the stats is "what is MY risk?" It is usually obvious that each of us differs in some way from the averages for our nation or even our neighbourhood, but it is not at all obvious just how much the numbers will change.

The public have grappled with several new concepts such as incidence, prevalence, exponential growth, and competing statistics for diagnostic test accuracy. These are well-known pitfalls for students of medical statistics and epidemiology, and the fact that the public are stumbling across them means that the general level of understanding is rising fast.
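
To see why the competing accuracy statistics trip people up, consider a worked example with made-up but plausible numbers: a test that detects 95% of infections can still be wrong about half the time when it returns a positive result, if the condition is rare. A minimal sketch:

```python
# Illustrative numbers only: a fairly accurate test applied to a
# population where just 1% are truly infected.
sensitivity = 0.95   # P(test positive | infected)
specificity = 0.99   # P(test negative | not infected)
prevalence = 0.01    # P(infected)

# Bayes' theorem: P(infected | test positive)
true_positives = sensitivity * prevalence
false_positives = (1 - specificity) * (1 - prevalence)
ppv = true_positives / (true_positives + false_positives)

print(f"positive predictive value = {ppv:.1%}")  # about 49%
```

Sensitivity and positive predictive value are both casually called "accuracy", but they answer different questions, and that is exactly the kind of distinction the public began to encounter.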

By April 2020, there was even a joke about armchair epidemiology on a TV comedy show. Once the public had gained the insight to spot poor analyses, the era of the "data bros" was over.

Journalism

In those early days, we also witnessed the phenomenon of "armchair epidemiologists": in the absence of authoritative forecasting, anyone with an Excel spreadsheet could produce a compelling-looking forecast and be taken seriously by people searching frantically for any information.

Among the sins committed in this time were fitting normal probability density functions to case counts, fitting polynomial curves to case counts, and comparing countries' disease burden by counting cases (Monaco is doing really well!). It's easy to laugh at these errors in retrospect (and in possession of a degree in statistics), but each was adopted briefly by, let's just say, prominent organisations (there's nothing to be gained by naming and shaming after we have all learnt so much in a short time). In short, if the accountable people who have the data don't communicate, someone else will. And if those in power don't have data either, they might just get pulled in by the allure of certainty.
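
As a toy demonstration of the polynomial sin, here is a sketch on synthetic data (the doubling time and noise model are assumptions for illustration). A quadratic fitted to two weeks of exponentially growing case counts looks plausible in-sample, but its forecasts fall further and further below the truth:

```python
import numpy as np

# Synthetic "early epidemic": cases doubling roughly every 3 days,
# with Poisson noise standing in for reporting variation.
rng = np.random.default_rng(1)
days = np.arange(14)
true_cases = 10 * 2 ** (days / 3)
observed = rng.poisson(true_cases)

# The sin: fit a quadratic to the first fortnight and extrapolate.
coeffs = np.polyfit(days, observed, deg=2)
future = np.arange(14, 21)
poly_forecast = np.polyval(coeffs, future)
true_future = 10 * 2 ** (future / 3)

for d, p, t in zip(future, poly_forecast, true_future):
    print(f"day {d}: quadratic forecast {p:5.0f}, exponential truth {t:5.0f}")
```

The quadratic describes the data it has seen perfectly well; it simply cannot bend fast enough to follow exponential growth, so every forecast it produces is flattering.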

David Spiegelhalter and Tim Harford, popular translators of complex analyses, were busier than ever explaining and critiquing the numbers. Often, reframing the same number brings a new insight: for example, moving from reporting the number of Covid deaths (hard to contextualise) to the percentage of all deaths which were due to Covid.

"We are four weeks behind Italy" also came from the period (10 March 2020) when we had little information except for some confirmed cases by PCR test, which at the time was only being used on the most seriously ill people. But it had the advantage of narrative and referred to demonstrable, empirical events, and mobilised action in government and concern in the public.

Widespread dispute of, first, the Imperial epidemiological model (16 March 2020), which provided only a point prediction for any given timepoint, and later Patrick Vallance's "illustration" of exponential growth (21 Sep 2020), seems to show an intolerance of prognostication based on theory alone, without data; predictions without uncertainty, meanwhile, will almost inevitably be wrong. I think this is a new development in public discourse. It must be led by journalism and so, perhaps, comes out of a longer trend in data journalism, data visualisation and numeracy in the media.

What's next?

The UK has an unusual system for data sharing from government: there is a policy of open data, and an independent statistics regulator. That makes it less likely that this recent trend will be reversed here, though it may be elsewhere. We might expect other parts of government, central and local, to be held to the same standards, but the comparison is not that simple.

Local government (including public health) have to work hard to build and maintain the infrastructure and systems needed for low-latency, high-quality data. Even where they succeed, local data is small data, which means more noise and more risk of identifying individuals.

Also, many aspects of policy-making elude a simple number, notably where competing interests have to be balanced. Developing utility models of the various harms, benefits and costs accruing to different parts of society in different ways is more the realm of economics than statistics. Even then, the politician is traditionally tasked with making a value judgement to synthesise the evidence.

Beyond the numbers, we have all been confronted with the fact that policy succeeds or fails purely on the extent of public understanding and support, or at least faith. Previously, faith was more the norm than critical querying of the statistics behind policy decisions, not least because the stats were hidden from view, or presented in confusing and, to be honest, boring formats.

Analysts and policy-makers need to be prepared to justify decisions more, whether that's in public health or elsewhere. You should expect your audience to be more critical, more quantitatively minded, and more curious than ever before. Covid-19 did that. But before you fear this new age of scrutiny, remember that they also appreciate your efforts more.

Robert Grant will be presenting Introduction to Model Building Techniques with Stata on 23 June 2021. How do you know if your models are useful, or maybe even wrong? This one-day course provides an introduction to the techniques used in model building.

This article is written by Robert Grant, a chartered statistician at BayesCamp.

Robert's email: robert@bayescamp.com
