Reconciling Open Data and Privacy


Digital rights advocates have a hard time these days, because the ease of publishing things online and the growth of online data have brought two “rights” into conflict. On the one hand, transparency and accountability advocates argue that getting government data online in all its forms will contribute to accountability. On the other hand, privacy advocates are concerned that our right to privacy may be undermined by the collection of data for any end, virtuous or nefarious.

This is challenging because the very serendipity the Open Data movement hopes for, the innovative combining of datasets to add value, may end up compromising the privacy of individuals or groups through the “jigsaw effect”. There is more about this in a previous post.

Today I’d like to propose a scenario that might help Open Data and Privacy get along more easily. Here are the steps I think are needed:

  1. I think public statements such as the G8 Open Data Charter need to explicitly acknowledge privacy as a core issue. The recently published Open Economic Principles do a lovely job of this and are a great step in this direction. I see no reason why economics should be unique in this regard.
  2. “Open by Default” is arguably the most essential principle of Open Data.  Open by Default is an attempt to effect cultural change in data sharing, to create a culture of openness.  We need to preserve that cultural orientation in any discussion of privacy.
  3. My suggestion, therefore, is that every open dataset should have a privacy statement, even if it is as simple as “There are no known privacy issues related to this dataset”. Any dataset without a privacy statement shouldn’t be published. The existence of a privacy statement shows that someone has thought about the privacy implications of releasing the dataset, however briefly. There is a saying that the most important thing about a strategic plan is that it reveals that strategic thinking has taken place; a privacy statement does the same for privacy.
  4. It will be important to attribute privacy statements to their authors so that an ongoing dialogue is possible. Privacy changes with time and context, and any privacy statement may need revision and new input as things change. We ought to be able to follow the reasoning back to its source.
  5. To support this, we need to develop methodologies and processes for having a structured conversation about the privacy issues of any particular dataset. A list of structured questions would be a good start, helping people think through those issues as efficiently as possible.
  6. I think we also need to acknowledge that there may be degrees of openness / privacy which provide limited access in some cases. The binary notion of open or closed isn’t going to get us far enough.
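To make points 3, 4 and 6 a little more concrete, here is a minimal sketch of what a dataset catalogue record might look like if it followed these rules. It is written in Python purely for illustration: the class names, the four access levels and the rule that a CLOSED dataset is listed but its data not released are my own assumptions, not part of the G8 Charter, any existing standard, or any open data portal software.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class AccessLevel(Enum):
    """Hypothetical degrees of openness (point 6): more than open/closed."""
    OPEN = "open"
    REGISTERED = "registered"   # e.g. access after agreeing to terms of use
    RESTRICTED = "restricted"   # e.g. vetted researchers only
    CLOSED = "closed"           # existence is listed, data is not released


@dataclass
class PrivacyStatement:
    """One attributed contribution to the privacy dialogue (point 4)."""
    author: str
    written: date
    text: str


@dataclass
class Dataset:
    name: str
    access: AccessLevel
    # Full history, oldest first, so reasoning can be traced to its source.
    privacy_statements: List[PrivacyStatement] = field(default_factory=list)

    def add_statement(self, author: str, text: str,
                      written: Optional[date] = None) -> None:
        """Append a new, attributed statement; earlier ones are kept."""
        self.privacy_statements.append(
            PrivacyStatement(author, written or date.today(), text))

    def current_statement(self) -> Optional[PrivacyStatement]:
        """The most recent statement, or None if nobody has written one."""
        return self.privacy_statements[-1] if self.privacy_statements else None


def catalogue_entry(ds: Dataset) -> dict:
    """The dataset's existence is always listed; its data is released
    (at its access level) only once a privacy statement exists (point 3)."""
    statement = ds.current_statement()
    released = statement is not None and ds.access is not AccessLevel.CLOSED
    return {
        "name": ds.name,
        "access": ds.access.value,
        "privacy_statement": statement.text if statement else None,
        "statement_author": statement.author if statement else None,
        "data_released": released,
    }


# Usage with a made-up dataset and steward.
roads = Dataset("road-segments", AccessLevel.OPEN)
print(catalogue_entry(roads)["data_released"])  # False: no privacy statement yet
roads.add_statement("Example data steward",
                    "There are no known privacy issues related to this dataset.")
print(catalogue_entry(roads)["data_released"])  # True
```

Keeping every statement rather than overwriting the last one is the design choice that serves point 4: when context changes and someone revises the assessment, the earlier reasoning and its author remain visible.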

In summary, for Open Data, disclosing the existence of every dataset ought to be mandatory under all circumstances, and the data itself should be openly accessible subject to the publication of a privacy statement.