What Good Public Data Looks Like
Public sector data — collected, stewarded, and shared by government in the public interest — is foundational to an informed and thriving democracy. Over the last year, Hawaiʻi Data Collaborative has written about concerns with shifts in federal data priorities, our dependencies on federal data, and promising efforts to bolster local public data capacities. Changes at the federal level have sparked rich engagement across the country about the need for robust public data, and what it would mean for that data to be compromised.
Guiding Principles for Public Data
This month, the Association of Public Data Users (APDU), a national membership nonprofit that strengthens public data by linking public data stakeholders and through education and advocacy, released its APDU Guiding Principles for Public Data. Building from the guiding principle that “public data should be public” (of, by, and for the people), APDU offers six supporting principles to ensure public data meets that promise.
Public Data Are Public and Accessible
When data is collected by government entities, it should be made available in conformance with privacy and ethical guidelines, and made accessible through standardized formats, documentation, readily navigable open data platforms, and multiple modes of access (e.g. API, downloadable files, and interactive tools).
Privacy and Confidentiality are Protected
When public data is collected from individuals, households, or private organizations, there is tremendous opportunity to aggregate and share that data for broader public benefit. It is of primary importance, however, that robust safeguards are in place to ensure privacy and confidentiality throughout the data lifecycle (see below). While tradeoffs across privacy, utility, and accessibility are inherent to all data collection, privacy should be the priority.
Every Step of the Data Lifecycle Supports Ethical Data
The lifecycle of data is the pathway it follows from collection, to processing, to storage, to utilization/analysis, and finally to archiving/destruction. At every step, stewards of public data are responsible for ensuring the process balances maximal public benefit with minimal risks to the individuals and entities the data represent.
Public Data Are Transparent
Public data sources, and the lifecycle through which the data is developed, should not be a black box. The purpose of data should be made clear, and the management of data through the lifecycle should be clearly documented and accessible to the widest possible audience (i.e. documentation should be both robust and understandable to those with a non-technical background). Changes to public data procedures should undergo a public review process, with decisions ultimately determined by both use case and public comment.
Public Data Are Fit-for-Purpose
The utility of data is a function of its timeliness, accuracy, completeness, and how well it represents what it is supposed to measure. Data should be maximally useful to the needs of the agency or agencies collecting the data, and for broader identified use cases. Data should also be equitable in its representation of people, with procedures to ensure completeness for the established use case. Finally, there should be regular evaluation of public data sources to identify opportunities for improvement, and to ensure data continues to be relevant.
Public Data Systems are Sustainable
Public data is information infrastructure. Like roads and buildings, it requires a commitment of resources to sustain, upgrade, and modernize over time. Much like physical infrastructure, there is a risk that the utility and quality of data, as well as the safety of those the data represents, will degrade over time if maintenance is not taken seriously.
Recommendations for Applying the Principles
The APDU report provides numerous recommendations for how each of these principles can be achieved and sustained. The recommendations encompass the following themes:
Supportive policies and budgeting: Much of what agencies are able (or compelled) to do with data depends on clear policies established in statute or local law.
Strong data governance: Established rules and procedures for access to, and management of, data held within and across agencies ensure data is both accessible and protected.
Robust platforms: Holders of public data need sufficient IT infrastructure to manage and utilize data effectively internally, and to make that data broadly accessible to public users.
Empowered data stewards: Roles within an organization must be held by qualified individuals who are resourced, hold authority, and are tasked with managing data effectively.
External oversight: Formal bodies composed of external experts, with sufficient access, should have the authority to evaluate the adequacy and sustainability of public data sources.
For many, these guidelines may seem obvious, but given recent events at the federal level, they should not be taken for granted. Locally, these recommendations can serve as the foundational principles by which we assess current systems and build more robust data capacities in the years ahead.