Many of us spent much of 2020 looking at news reports and websites filled with visual representations of COVID-19 data. We saw epidemiological modelling, financial modelling, graphs, mortality rates, and much more all represented in vivid colors and complex, intersecting lines and bars. Among the many reasons for which we will remember 2020, one will be that it was the year the benefits of data went mainstream.
Yet the COVID-19 pandemic also exposed many of the ways our policies concerning data sharing, quality, and governance have fallen short and failed to provide decision-makers with the information they need when they need it. Although the pandemic is far from over, the development of multiple vaccines that will slow its advance means we must start assessing what we have learned, creating new plans of action, and making sure we are better prepared next time.
The Trinity Challenge is a coalition of industry thought-leaders including Facebook, Imperial College, University of Cambridge, and the Bill and Melinda Gates Foundation. It is dedicated to improving the use of data and analytics to predict, identify, respond to, and recover from future health emergencies. In December 2020, the coalition published its report Better Decisions to Protect Against Health Emergencies. The lessons it contains will be of critical importance to crafting public health policies that can meet future crises.
What Are the Immediate Challenges?
Efficiency: The right people need to access and analyze the right data at the right time.
Data is “right” when it supports the questions we need to ask. In the case of COVID-19, this would include not only questions about distribution and cause, but also those about themes such as economic behavior, agricultural prices, and financial transactions. The people are “right” when they are data analysts, data translators, and leaders who can access the data they need without having to search for it or devise complex workarounds to compensate for a lack of timely data-sharing agreements or a lack of access to privately held data that is intended for commercialization. The “right” time is when the data is needed, which is immediately. This means finding ways around regulatory challenges or process inefficiencies.
Effectiveness: Insights from data analysis should not remain siloed and fragmented.
Data needs to be shared to be useful. Collecting data and then sitting on it accomplishes nothing. However, sharing is more complicated than simply handing raw or analyzed data over to another party. A lack of global standards and definitions in many areas can hinder analysis and prevent knowledge sharing. COVID-19, for example, lacks universal definitions for measuring mortality and distribution, which makes it difficult to share information between agencies in different geographies. Data is also frequently fragmented and incomplete. During the COVID-19 pandemic, a lack of information about supply preparedness, supply chain resiliency, and consumer behavior led to slow or poor decision-making, likely at the cost of an inestimable amount of human suffering and financial loss. Finally, academic and business institutions must be incentivized to share the data they have instead of protecting it in anticipation of publication or commoditization.
Sufficiency: Decision-makers should have the information they need.
In 2002, United States Secretary of Defense Donald Rumsfeld said the following about the situation regarding the presence of weapons of mass destruction in Iraq: “There are known knowns: there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don’t know we don’t know. And if one looks throughout the history of our country and other free countries, it is the latter category that tends to be the difficult ones.” While a data scientist would probably add another important category, that of the things we don’t know we know, this is an example of the power of data. Data that is collected, subject to quality processes, analyzed, and used can bring everything into the category of known knowns so that the leadership can make the right decisions. This means that traditional barriers between private, public, and academic organizations need to be overcome to create a universal approach to the sharing of critical data.
What Can We Do Right Now?
Accelerate efforts to make better use of publicly available data.
Data governance and quality initiatives are vital for creating reusable, sustainable data that can be easily accessed by the people who most need it, including researchers and decision-makers.
Increase accessibility to privately held data.
While private data will continue to be an important commodity for commercialization, regulatory and commercial considerations are not sufficient grounds for withholding data during a time of crisis. Private organizations must be incentivized to share critical data with those who need it during public emergencies.
Data providers and decision-makers must be in closer contact.
Decision-makers will have a better understanding of the benefits of data governance and sharing when they spend more time communicating with analysts and data generators, which will contribute to more opportunities for collaboration to respond to complex challenges.
The Importance of Data Quality
In addition to governance and sharing of data, organizations must consider data quality. The COVID-19 pandemic has shown us that while the way in which we use data is important for understanding and predicting crises, the quality of the data is crucial to ensuring that errors and flaws do not undermine those efforts. There are six data quality dimensions that must be considered:
- Completeness: The presence or absence of expected values.
- Accuracy: The extent to which the data represents the real world.
- Validity: The compliance of the data with data standards.
- Uniqueness: The lack of duplicate data.
- Consistency: The agreement of the data across different sources.
- Timeliness: The availability of data for consumption.
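To make these dimensions concrete, here is a minimal Python sketch of how five of them might be scored on a small set of case reports. Everything in it is an assumption made for illustration: the records, the field names, the second source, and the two-day timeliness threshold are all invented, and each score is just one simple way to turn a dimension into a number. Accuracy is left out because measuring it requires a trusted real-world reference to compare against.

```python
from datetime import date

# Hypothetical case reports; field names and values are invented for illustration.
REPORTS = [
    {"id": "r1", "region": "north", "reported": date(2020, 12, 1),
     "received": date(2020, 12, 2), "deaths": 4},
    {"id": "r2", "region": "south", "reported": date(2020, 12, 1),
     "received": date(2020, 12, 9), "deaths": None},
    {"id": "r2", "region": "south", "reported": date(2020, 12, 1),
     "received": date(2020, 12, 9), "deaths": None},
    {"id": "r3", "region": "east", "reported": date(2020, 12, 2),
     "received": date(2020, 12, 3), "deaths": -1},
]
# A second hypothetical source reporting death counts for some of the same ids.
OTHER_SOURCE = {"r1": 4, "r3": 2}

FIELDS = ("id", "region", "reported", "received", "deaths")


def completeness(rows):
    """Share of expected values that are present (not None)."""
    present = sum(row.get(f) is not None for row in rows for f in FIELDS)
    return present / (len(rows) * len(FIELDS))


def validity(rows):
    """Share of recorded death counts that are non-negative integers."""
    counts = [row["deaths"] for row in rows if row["deaths"] is not None]
    return sum(isinstance(c, int) and c >= 0 for c in counts) / len(counts)


def uniqueness(rows):
    """Share of rows that carry a distinct id."""
    return len({row["id"] for row in rows}) / len(rows)


def consistency(rows, other):
    """Share of ids found in both sources whose death counts agree."""
    shared = {row["id"]: row["deaths"] for row in rows if row["id"] in other}
    return sum(other[i] == d for i, d in shared.items()) / len(shared)


def timeliness(rows, max_days=2):
    """Share of rows received within max_days of the reporting date."""
    on_time = sum((row["received"] - row["reported"]).days <= max_days
                  for row in rows)
    return on_time / len(rows)


if __name__ == "__main__":
    print("completeness", completeness(REPORTS))
    print("validity", validity(REPORTS))
    print("uniqueness", uniqueness(REPORTS))
    print("consistency", consistency(REPORTS, OTHER_SOURCE))
    print("timeliness", timeliness(REPORTS))
```

Scores like these are only a starting point. A real pipeline would need agreed definitions for each field, handling for empty inputs, and thresholds chosen by the people who rely on the data.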
While the term “data-driven” has increasingly come to prominence during the COVID-19 pandemic, obstacles such as proprietary and regulatory barriers, poor data quality, and a lack of standardization mean that there is still considerable work to be done before the power of data can be fully brought to bear on future health crises.
The Trinity Challenge. (December 2020). Better Decisions to Protect Against Health Emergencies. https://thetrinitychallenge.org/media/1279/ttc_better-decisions-to-protect-against-health-emerg_final.pdf. Accessed January 5, 2021.
World Economic Forum. (December 15, 2020). Data Can Prevent the Next Global Health Emergency. Here’s How. https://www.weforum.org/agenda/2020/12/data-can-prevent-the-next-global-health-emergency/. Accessed January 4, 2021.