Tag Archives: Open Data

Reconciling Open Data and Privacy


Digital rights advocates have a hard time these days because the ability to manifest things online and the growth of online data have brought two “rights” into conflict. On one hand, transparency and accountability advocates argue that getting government data online in all its forms will contribute to accountability. On the other hand, privacy advocates are concerned that our right to privacy may be undermined through the collection of data for whatever end, virtuous or nefarious.

This is challenging because the very serendipity that the Open Data movement hopes for, in the innovative combining of datasets to add value, may end up compromising the privacy of individuals or groups through the “jigsaw effect”. More about this in a previous post.

Today I’d like to propose a scenario that might allow Open Data and Privacy to get along more easily.  Here are the steps I think need to be taken to do it:

  1. I think the public statements like the G8 Open Data Charter need to explicitly acknowledge privacy as a core issue. The recently published Open Economic Principles do a lovely job of this and are a great step in this direction.  I see no reason why Economics should be unique in this regard.
  2. “Open by Default” is arguably the most essential principle of Open Data.  Open by Default is an attempt to effect cultural change in data sharing, to create a culture of openness.  We need to preserve that cultural orientation in any discussion of privacy.
  3. My suggestion therefore is that every open dataset should have a privacy statement, even if it is as simple as something that states “There are no known privacy issues related to this dataset”. Any dataset without a privacy statement shouldn’t be published. Existence of a privacy statement reveals that someone has at least thought about privacy related to the dataset, however briefly.  There is a saying that the most important thing about a strategic plan is that it reveals that strategic thinking  has taken place.  Similarly here, a privacy statement reveals that someone has thought about the privacy implications of releasing a given dataset.
  4. It will be important to be able to attribute privacy statements in order to facilitate an ongoing dialogue.  Privacy changes with time and context.  Any privacy statement may require revision and new input as things change.  We ought to be able to follow reasoning back to its source.
  5. To assist this, we need to develop methodologies and processes for having a structured conversation about privacy related to any particular dataset.  A list of structured questions might help as a beginning to help people think through privacy issues as efficiently as possible.
  6. I think we also need to acknowledge that there may be degrees of openness / privacy which provide limited access in some cases. The binary notion of open or closed isn’t going to get us far enough.

In summary, for Open Data, the existence of every dataset ought to be public under all circumstances, and the dataset itself ought to be openly accessible subject to the publication of a privacy statement.
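To make the proposal a little more concrete, here is a minimal sketch of how a data catalogue might encode it. Everything in it is an assumption for illustration: the class names, the `access_level` values, and the rule that the most recent statement is the current one are mine, not part of any existing Open Data standard.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class PrivacyStatement:
    text: str
    author: str        # attribution, so reasoning can be traced to its source
    assessed_on: date  # privacy changes with time, so record when it was assessed


@dataclass
class Dataset:
    name: str
    access_level: str = "open"  # hypothetical levels, e.g. "open" or "restricted"
    statements: List[PrivacyStatement] = field(default_factory=list)

    def current_statement(self) -> Optional[PrivacyStatement]:
        # The most recent statement wins; older ones are kept as history.
        return max(self.statements, key=lambda s: s.assessed_on, default=None)


def catalogue_entry(ds: Dataset) -> dict:
    # The existence of the dataset is always public, whatever its access level.
    return {
        "name": ds.name,
        "access_level": ds.access_level,
        "has_privacy_statement": ds.current_statement() is not None,
    }


def can_publish(ds: Dataset) -> bool:
    # No privacy statement, no publication.
    return ds.current_statement() is not None
```

A dataset with no statement would still appear in the catalogue, but `can_publish` would return `False` until someone attaches even the simplest statement. Revisions are appended rather than overwritten, which keeps the dialogue traceable.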

The Open Data Cart and Twin Horses of Accountability and Innovation


Let me start by saying that the ideals of the Open Data movement (transparency, accountability, and citizen innovation) are ones that I hold near to my heart. Further, I admire and respect many of the leaders of the Open Data movement. However, and it may simply be my own myopia, I have yet to see many voices for caution with respect to Open Data. In particular, what concerns me most at the moment is the air of euphoria in which Open Government Data is being pitched to the public. The recent series of articles around the proposed G8 Open Data Charter, such as this one by Martin Tisné and this by Professor Sir Nigel Shadbolt, create an expectation about data that will, at best, be very hard to fulfil and, at worst, actually shift the attention from where it ought to be. Here are some of the things I worry about.

Data as Fetish

Children’s charity organiser Benita Refson is recently quoted in the Guardian as saying: “If you’re not using data, you’re just another person with an opinion”. Somehow this quotation sums up all of my pent-up apprehension about the Open Data movement. Implicit in the above is the assumption that “data”, “facts”, and “truth” are roughly equivalent. We see this in common expressions like “The data just doesn’t back it up”. This amounts to a kind of fetishisation of data as having some mystical, immutable truth quality, a quality which of course does not exist. Data is collected by people. Even when it is collected by machines, it is collected by machines designed by people. And this means that data is vulnerable to all of the vagaries that humans are prey to: bias, laziness, hidden purpose, myopia, etc. We choose what data to collect and how often and where. We choose what level of quality we would like. We choose how to represent that data and what story we think the data is backing up.


As more data accumulates our ability to see whatever we want to see increases.  Researcher Kate Crawford and author Nassim Taleb have both eloquently pointed out that with larger amounts of data the ease of coming to a false conclusion only increases.  Taleb says:

big data means anyone can find fake statistical relationships, since the spurious rises to the surface. This is because in large data sets, large deviations are vastly more attributable to variance (or noise) than to information (or signal). It’s a property of sampling: In real life there is no cherry-picking, but on the researcher’s computer, there is.
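A small simulation makes the point tangible. This is only a sketch under my own assumptions, not Taleb’s analysis: it generates pure noise, then searches more and more unrelated noise series for the one that best “explains” a target. The function names, the 20-point series length, and the seed are all arbitrary choices for illustration.

```python
import math
import random


def pearson(xs, ys):
    # Plain Pearson correlation coefficient between two equal-length lists.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def strongest_spurious_correlation(n_candidates, n_points=20, seed=0):
    # Every series is independent random noise, so any correlation is spurious.
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best


for n in (10, 100, 1000):
    print(n, round(strongest_spurious_correlation(n), 2))
```

Because the same seed is reused, each larger search includes the smaller one, so the strongest correlation found can only stay the same or grow as more series are examined. None of it means anything, which is the cherry-picking problem in miniature.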

Researchers ranging from Duncan Watts to Daniel Kahneman have documented our human tendency to apply our expectations and biases to what we see.  The image at the left, which recently became a popular meme on the Internet, illustrates the dangers of what can be done with too much data.

If we allow “data” to retain this mystical aura of “fact”, we run the risk of allowing ourselves to be swayed by the truthiness of the data sources we happen to like.  Far more reassuring for me is the Open Science Data movement which seeks to expose these very issues by allowing other scientists to corroborate or contradict the findings of other researchers using the same data.  The next Open movement I would like to see is one about Open Corroboration where we come up with better means for assessing the validity and meaning of data, news, research, etc.

Not to mention the fact that the very representation of data can influence how it is interpreted.  The rise of infographics has made data even more difficult to interrogate and can be used to skew the interpretation of data.  See this article on how the venerable pie chart can be used more to confuse and mislead than inform.

Data and Privacy

Another key issue for me is the privacy implications of Open Government Data policies. If the new default for all government data is to be open, then it is inevitable that at some point the Mosaic Effect will come into play. The Mosaic Effect occurs when the information in an individual dataset, in isolation, may not pose a risk of identifying an individual (or threatening some other important interest such as security), but when combined with other available information, could pose such risk. A recent US Government policy memo instructed agencies that they must account for the “mosaic effect” of data aggregation. This can be quite hard to do and there is concern among the Open Government Data community that such a requirement could be used as an excuse to block the publication of data. A better strategy would be for the Open Government Data community to move towards a more demand-driven approach. Indeed this is the recommendation of UK researcher Kieron O’Hara in his paper, Transparent Government, Not Transparent Citizens: A Report on Privacy and Transparency for the Cabinet Office, where he suggests:

In a demand-driven regime, information entrepreneurs would ask for the datasets they felt they needed, or felt that they could use to create value, whether social value or commercial value (profit) for their own firms. This suggests two requirements.
1. Entrepreneurs must know what datasets there are.
2. There must be a screening process to ensure that privacy-threatening releases (and other problematic releases, such as ones which might threaten national security) could be challenged and blocked.

The issue of privacy is not limited to that of personal information. For instance, what right does an endangered species have to not have its location disclosed? The impact of making datasets public is hard to predict. Martin and Prof Shadbolt have done a great job in highlighting the positive but there are negative outcomes as well. Consider the saga of the map of registered gun owners that was published in the wake of the Newtown shooting. Patrick Meier has a thoughtful analysis of what happened and its implications. He ends by drawing on a basic principle from the Hippocratic oath, do no harm. Addressing the “do no harm” standard in some way should be an essential pre-requisite for publishing any dataset.
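The Mosaic Effect described in this section is easier to appreciate with a toy example. The sketch below is hypothetical throughout: the two datasets, the field names, and the people in them are invented, and real re-identification work is considerably more subtle. It only shows how two releases that look harmless in isolation can identify someone once they are joined on shared attributes.

```python
from collections import defaultdict

# Attributes that are not names, but that both datasets happen to share.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")

# Invented "anonymised" release: no names, only a sensitive attribute.
clinic_visits = [
    {"postcode": "A1", "birth_year": 1970, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "B2", "birth_year": 1985, "sex": "M", "diagnosis": "diabetes"},
]

# Invented public register: names, but nothing sensitive.
public_register = [
    {"name": "Person One", "postcode": "A1", "birth_year": 1970, "sex": "F"},
    {"name": "Person Two", "postcode": "B2", "birth_year": 1985, "sex": "M"},
    {"name": "Person Three", "postcode": "B2", "birth_year": 1985, "sex": "M"},
]


def reidentify(anonymous_rows, named_rows, sensitive_field, keys=QUASI_IDENTIFIERS):
    # Index the named dataset by its quasi-identifier combination.
    names_by_key = defaultdict(list)
    for row in named_rows:
        names_by_key[tuple(row[k] for k in keys)].append(row["name"])

    # An anonymous row is re-identified only when exactly one name matches.
    matches = []
    for row in anonymous_rows:
        names = names_by_key.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:
            matches.append((names[0], row[sensitive_field]))
    return matches


print(reidentify(clinic_visits, public_register, "diagnosis"))
```

Here the first clinic record matches exactly one person in the register and is re-identified, while the second matches two people and stays ambiguous. A screening process of the kind O’Hara describes would need to ask this sort of question about the other datasets already in circulation, not just the one being released.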

Data and Complexity

Yet another concern for me is the danger of looking at complex systems in an overly simplistic way. The fact that we can measure one aspect of a complex system may give us the temptation to intervene without fully understanding the systemic role of the thing we measure. It may give us a false sense of understanding. If we have complete scrutiny of government, will that affect the way civil servants and politicians behave in an entirely positive way? Or will it simply move existing bad behaviour into areas beyond scrutiny?

A top-down, data-driven, supply-side approach to Open Government is also likely to mean a focus on what is available as opposed to what is needed.  As Einstein (and others) famously said, “Not everything that counts can be counted, and not everything that can be counted counts.”

Open Data as Symptom or Cause of an Open Government

Finally, real Open Government is a cultural issue not a data issue.  I worry that the disproportionate focus on datasets will distract from the harder challenge of building a culture of openness.  I can’t help but wonder whether Open Government Data is a symptom of Open Government and not the other way around.  Of course it is a bidirectional relationship but perhaps it only becomes bidirectional when there is a sufficient level of cultural openness in government.

This may be less of a worry in the G8 nations where there is a reasonably robust civil society to advocate for openness but for countries without that luxury, a focus on Open Government Data may not bring about the desired results.

A move towards a more demand-driven approach would have the benefit not only of allowing due consideration of the merits and impact of releasing specific datasets but also, hopefully, of catalysing more of a dialogue (as I have argued for previously) between civil society and government. To paraphrase JFK, ask not what your Open Government can do for you, ask what you can do to open your government!

Right Openness: If You Meet the Open Guru on the Road, Kill Him

Let me start by saying how much I love Open Source software, peer production, the tide that raises all ships, Wikipedia, all things “open”. It is part of how I define myself. I love what happens when people share expertise, resources, their spare time. It makes me feel like I am part of something larger. It makes me feel powerful and creative; only my effort and imagination can hold me back. Yet, for some time, I have felt a growing unease with the “open” movement.

I think it started back in 2006 when the South African government established a policy directing the use of Open Source software within government departments “unless proprietary software is demonstrated to be significantly superior”.

This policy did not achieve its aim of converting government departments to the use of Open Source. If anything it probably alienated civil servants more than it made them converts to Open Source. It made them feel like FOSS was some kind of second class solution they were obliged to use because they couldn’t afford the best. I knew I didn’t like this policy at the time but I couldn’t really put my finger on why except for the basic awareness that nobody likes to be forced to do something, even if it’s good for them.

Fast forward a year to 2007 and we have Tony Blair saying:

“Open v closed” is as important today in politics as “left v right”. Nations do best when they are prepared to be open to the world. This means open in their economies, eschewing protectionism, welcoming foreign investment, running flexible labour markets. It means also open to the benefit of controlled immigration.

Once again, I respond positively to openness and indeed open vs closed does seem a more practical unit of political analysis than left vs right these days, at least on some levels. However, again I experience a trace of unease and this time I can’t put my finger on it. And in the ensuing 5-6 years we see openness emerging as a western agenda with the implicit assumption that open = good and by extension more open = more good.

I expressed some of my concerns when I wrote about this in 2009. I likened open vs. closed to a kind of yin and yang, two parts that can’t live without each other. That was close to what I want to say here but I think the problem is bigger now and that analogy misses some critical nuance.

More recently I’ve been thinking about “open” in the context of Open Data and how that relates to personal privacy. Clearly more open cannot always equate to more good in this context. If we acknowledge that the need for privacy is contextual, then it is axiomatic that the need for openness is also contextual. The problem with making a virtue of “open” is that it tends to steamroller nuance and context.

I am reminded, as I often am, of the Taoist parable of a farmer and his “good” fortune. Nothing is inherently good or bad but is defined by the context in which we understand it. The more I think about it, the more I think the crux of the problem lies in an essentially Manichean worldview where open is now equated with virtue, where we must fight the forces of closed-ness wherever we find them.

This is wrong in the same way that the Golden Rule is wrong. Do unto others as you would have them do unto you. What could be simpler? Yet, the angry drunk who enjoys getting into fights in bars has no problem with this rule. It helps him get into more fights. Absolute rules tend not to fare very well in complex environments.

So, let’s consider some other “good” words.  How about “kindness” and “cleanliness”? Any parent or anyone who has been in the wrong relationship knows that you have to be “cruel to be kind” sometimes.  Doing the right thing might actually involve not being “kind”.  So perhaps kindness is not a very good goal.  And cleanliness.  We know it is next to godliness and on the surface of things how could one argue with it.  But a quest for cleanliness actually has led to some surprisingly negative outcomes such as the growth of allergies.  The more we understand about our bodies, the more we realise that what we previously thought of as “unclean” is actually a part of what makes us human.

So, would a “kindness” movement serve us well?  Or a “cleanliness” movement? Well, the answer is not yes or no, it is “mostly”.  Mostly being kind is a good idea and mostly being clean is a good idea but they are bad when turned into doctrine and orthodoxy. The rationale for orthodoxy is that if you don’t keep things pure enough, then it is a slippery slope to the increasing adulteration of all you hold dear.

The problem with purity is that it leads to fragility. In Antifragile, Nassim Taleb argues that all complex systems need to be stressed in order to grow stronger, to reduce fragility. Perhaps open works need proprietary works to stress them into improving and evolving. As I wrote previously, the evidence from multi-party prisoner’s dilemma simulations would seem to support this, namely that “open” strategies succeed very well in very closed ecologies and “closed” strategies succeed very well in very open environments.

So what’s an Open Source advocate to do? Well, if you were Evgeny Morozov you could rubbish the entire open movement but that doesn’t work for me because I really do see and live the benefits of open all around me. I think what is needed is a new concept, that of “Right Openness”. In Buddhist philosophy, one of the principal teachings is the Noble Eightfold Path, which describes the “path” to enlightenment. Each element of the path begins with the word “right”: Right Intention, Right Speech, Right Action, Right Livelihood, etc. What is notable about this is that there is no prescribed behaviour. There is the overall goal of reducing suffering and being compassionate but the way you achieve that is not specified. Mostly kindness is a great way of being compassionate but not always. Mostly openness is a great way of achieving good outcomes in the growth of knowledge, in good governance, etc. but not always. One need only look at a for-profit 3-D gun printing initiative to see how “open” as orthodoxy can lead to the wrong sort of outcome.

Let’s take the example of Right Speech. In the west, we value truth and freedom when it comes to speech. Yet, anyone who has uttered the words “Let’s be honest” in a relationship, knows that there are truths and there are truths. Time and geography matter when it comes to truth. Learning the truth about a murder that happened hundreds of years ago, half-way around the world is not the same as learning the truth about a murder that happened an hour ago, next door. The same with freedom of speech. Encouraging someone to help their neighbour is not the same as encouraging someone to kill their neighbour. We know this, yet we still defend freedom of speech when I think what we really mean is Right Speech, speech that does not harm others, that is timely, etc.

What then would Right Openness look like? It would recognise that “openness” is not an inherent virtue but rather a contextual good. What would that look like in practice? Well, it would always hark back to the question of the larger goal, whether more equitable sharing of knowledge, good governance, etc. It would then ask what right openness looks like in that context. It would lead by example, not by doctrine. In the Open Data world, it would embrace privacy issues as being fundamental to effective openness without getting hung up on privacy as a violation of openness.

As we struggle to understand the complexity of the world we live in, we look for simple rules to help guide us through the storm. That’s great as long as we treat them as rules of thumb. To paraphrase George Box, all rules are flawed, but some are useful.

So let’s hear it for Right Openness and remember kids, “Only a Sith deals in absolutes”.

Photo courtesy Stephanie Davidson 2008 CC BY SA


If I Had 50 Million Dollars

If I had any poetic talent, I would have done this in rhyming couplets set to the music of the Barenaked Ladies but sadly today you are left with my prose. Hum along with me anyway.

If you work in the area of Open Government, Open Data, Transparency, or even just ICTs and Development in general, you have probably heard of the Making All Voices Count (MAVC) initiative. MAVC is a Grand Challenge for Development which brings together the UK Department for International Development (DFID/UKAID), U.S. Agency for International Development (USAID), Omidyar Network (ON), and the Swedish International Development Cooperation Agency (SIDA) to create a $50 million fund to “support innovation, scaling-up, and research that will deepen existing innovations and help harness new technologies to enable citizen engagement and government responsiveness.”

On Saturday, in response to an increasing number of interactions around MAVC that I’ve had over the last few weeks, I tweeted the following:

which garnered a few reactions. Most interestingly @wayanvota issued a challenge to speak up and say just what was wrong with MAVC. I was a little surprised by his reaction as I thought there were some fairly self-evident problems, mostly related to what happens when there is a large pot of money on the table. It then occurred to me that perhaps this was not self-evident to all or perhaps even that I was simply jaded and cynical. My first thought was to blog about the challenges that I think MAVC will face. But frankly it’s easy to be a critic and anyone engaged in the field of philanthropy knows that it is hard, very hard indeed to do well. If I had the time, I could pick holes in development initiatives all day. It would be like shooting fish in a barrel but not as much fun.

So, perhaps more challenging, more constructive, and more fun would be to say, well, *what would I do*? That is to say, what if someone said, here, take this bag of 50 million dollars and go forth to create more open governance in the South? A challenging prospect. Achieving impact through the giving away of money is much more difficult than achieving impact in the private sector. Philanthropy lacks that marvellous feedback loop called the market which provides plenty of data for self-correction. This doesn’t, as some suggest, make philanthropy bad. It just makes it more challenging. So herewith my suggestion as to how to most effectively spend 50 million dollars on Open Government in the South.

My recipe is very simple.  I would pick 10 universities, one each in 10 countries in the South.  I would endow a chair for Cyber Law and Governance in each university for 10 years giving each university 5 million dollars.  That’s it.  Maybe I’d keep a million or two back to fund and facilitate networking among the universities but that’s it.  Here is my rationale:

  1. Open Governance, if it happens at all, has to be home-grown.  The power imbalance in development assistance hasn’t gone away.  Putting southern researchers in control of the agenda is a start towards mitigating that problem.
  2. Open Governance is still in its infancy. There is no significant body of evidence of it making a difference in the South. Granted MAVC plans to fund research, as do others, but what is really needed is sustained dialogue in the South between informed civil society and government. Think of the role that someone like Michael Geist plays in Canada or Rufus Pollock in the UK. Universities were and are the critical enablers for them. We need more of that in the South, that is to say dialogue, not solutions. Solutions emerge naturally from constructive dialogue.
  3. Open Government is complex.  There is a kind of naive optimism around Open Government which comes from the forty thousand foot view that many donors have.  Kenya, the poster child for Open Government in Africa, has experienced its own challenges with Open Government. There are vested interests, entrenched centres of power, contradictory priorities (protection of privacy, cyber security, etc), lack of capacity and many other issues, all of which take time and engagement to deal with.  This calls more for sustained local dialogue, engagement, and capacity building than for entrepreneurs building open data apps.
  4. Most countries in the South have a critical lack of institutions that can engage on cyber governance issues.  It is not just Open Government but digital privacy, surveillance, cyber security, Internet governance and a host of other issues that demand a generation of researchers and policy-makers with the interest and capacity to lead their countries and probably the world to better decision-making on these issues.  Invest in those institutions and you will get Southern leadership on these issues and make it easier for future funders to find the right places to engage.
  5. Policy work is a long game.  Institutions need to know they can commit beyond a few years.  This would allow them the time and resources needed to bring about real change.

My 2 cents or 50 million dollars as the case may be.