Care.data, pseudonymised data and the ICO
I find the ICO’s response to Dr Neil Bhatia’s request for clarification on care.data and the Data Protection Act (DPA) to be very strange.
Care.data is the name of NHS England’s programme to centralise all GP patient health records, together with all hospital visit records, in one big data warehouse available to researchers. It was originally due to go live this month, but has been delayed by six months.
Dr Neil Bhatia is a privacy activist opposed to the programme. Where a patient objects to his or her data being shared outside the NHS, the intention is to share ‘pseudonymised’ health data. Dr Bhatia wrote to the ICO asking for clarification on how the Data Protection Act applies to pseudonymised health records.
The ICO replied that anonymised data is not covered by the DPA: if the subject cannot be identified, there can be no loss of privacy.
But on pseudonymised data he gives no clear answer:
Pseudonymised data on its own would not constitute personal data, as it does not enable individuals [to] be identified.
However, it is possible that pseudonymised data may become personal data if it is held by an organisation which holds other information which could be used in conjunction with the pseudonymised data to identify individuals.
As such, whether pseudonymised data would be covered by the DPA would depend on other information which is in the data controller’s possession.
ICO’s letter to Dr Neil Bhatia
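To make the distinction concrete: pseudonymisation typically replaces a direct identifier, such as an NHS number, with a token, while leaving the demographic and clinical fields intact. The sketch below assumes a keyed hash (HMAC-SHA256) as the pseudonymisation function. That choice, the key, the field names and the sample record are all hypothetical illustrations, not a description of how care.data actually pseudonymises records.

```python
import hashlib
import hmac

# Hypothetical secret key held by whoever performs the pseudonymisation.
SECRET_KEY = b"example-key-held-by-the-data-processor"


def pseudonymise(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Replace the direct identifiers with a keyed-hash token.

    Every other field (the quasi-identifiers and the clinical data) is
    passed through unchanged, which is what leaves room for
    re-identification by inference or linkage.
    """
    token = hmac.new(key, record["nhs_number"].encode(), hashlib.sha256).hexdigest()
    out = {k: v for k, v in record.items() if k not in ("nhs_number", "name")}
    out["pseudonym"] = token
    return out


# Hypothetical patient record; every value is invented for illustration.
patient = {
    "nhs_number": "0000000000",
    "name": "A. Patient",
    "birth_date": "1961-04-17",
    "sex": "F",
    "postcode_district": "AB1",
    "diagnosis": "J45",
}

released = pseudonymise(patient)
print(sorted(released))
# ['birth_date', 'diagnosis', 'postcode_district', 'pseudonym', 'sex']
```

Because the same NHS number always maps to the same token, separate episodes for one patient can still be linked into a longitudinal record, and the birth date, sex and area travel with the data untouched.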
There are two problems with the ICO’s statement, one in its first sentence and one in its last. On the first, there is a growing academic consensus that data simply cannot be pseudonymised in a way that prevents the individuals concerned from being re-identified.
On Friday, Professor Ross Anderson blogged on Light Blue Touchpaper and posted the audio and slides of a talk he gave at the Open Data Institute. The talk is titled ‘Why Anonymity Fails’. In the inference section he notes:
- If you link episodes into longitudinal records, most patients can be re-identified
- Add demographic, family data: worse still
- Active attacks: worse still (Iceland example)
- Social network stuff: worse still
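Anderson’s first point can be illustrated with a simple uniqueness count. A record is a candidate for re-identification when its combination of attribute values is shared with nobody else in the dataset (k = 1 in k-anonymity terms). The sketch below uses a handful of invented records with hypothetical field names; only the trend matters, not the particular fractions.

```python
from collections import Counter


def fraction_unique(records, fields):
    """Return the share of records whose values on `fields` occur exactly once."""
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)


# Invented longitudinal records: one row per patient, episodes already linked.
records = [
    {"birth_year": 1961, "sex": "F", "admit_1": "2011-03", "admit_2": "2012-07"},
    {"birth_year": 1961, "sex": "F", "admit_1": "2011-03", "admit_2": "2013-01"},
    {"birth_year": 1961, "sex": "M", "admit_1": "2010-11", "admit_2": "2012-07"},
    {"birth_year": 1975, "sex": "F", "admit_1": "2011-03", "admit_2": "2012-07"},
    {"birth_year": 1975, "sex": "F", "admit_1": "2010-11", "admit_2": "2013-01"},
    {"birth_year": 1961, "sex": "M", "admit_1": "2010-11", "admit_2": "2012-07"},
]

# Each linked episode adds another field, and more records become unique.
for fields in (["birth_year", "sex"],
               ["birth_year", "sex", "admit_1"],
               ["birth_year", "sex", "admit_1", "admit_2"]):
    print(len(fields), "fields:", round(fraction_unique(records, fields), 2))
```

On this toy data no record is unique on birth year and sex alone, a third become unique once the first admission is linked, and two thirds once the second is added.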
[The ‘Iceland example’, incidentally, refers to 1998, when deCODE genetics offered Iceland free IT systems in return for access to the country’s medical records. The funding came from Roche, one of the Big Pharma companies.]
Now look at the 33 Bits of Entropy blog run by Arvind Narayanan (an Assistant Professor of Computer Science and CITP at Princeton, and an affiliate scholar at Stanford Law School’s Center for Internet and Society):
The title refers to the fact that there are only 6.6 billion people in the world, so you only need 33 bits (more precisely, 32.6 bits) of information about a person to determine who they are.
…If your hometown has 100,000 people, then knowing your hometown gives me 16 bits of entropy about you, and only 17 bits remain.
About 33 Bits
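The arithmetic behind the quotation is just base-2 logarithms: singling out one person among N equally likely people takes log2(N) bits, and narrowing N down to a group of size G reveals log2(N / G) bits. A short sketch, using the 6.6 billion and 100,000 figures from the quoted blog; the function names are my own.

```python
import math


def bits_to_identify(population: float) -> float:
    """Bits needed to single out one person among `population` equally likely people."""
    return math.log2(population)


def bits_revealed(population: float, group_size: float) -> float:
    """Bits learned by narrowing `population` down to a group of `group_size`."""
    return math.log2(population / group_size)


WORLD = 6.6e9       # world population figure used on the 33 Bits blog
HOMETOWN = 100_000  # hometown size in Narayanan's example

print(round(bits_to_identify(WORLD), 1))         # 32.6
print(round(bits_revealed(WORLD, HOMETOWN), 1))  # 16.0
print(round(bits_to_identify(HOMETOWN), 1))      # 16.6
```

The bits revealed and the bits remaining add back up to the total; the blog’s ‘only 17 bits remain’ is the 16.6 rounded.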
It is clear that a determined adversary will be able to extract the necessary 33 bits of entropy from within the pseudonymised data itself. Yet the ICO seems to ignore this, as if simply labelling a dataset ‘pseudonymised’ made it impossible to re-identify the subjects.
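As a rough illustration of how quickly ordinary fields add up, the sketch below sums the bits carried by a few quasi-identifiers that could plausibly survive pseudonymisation. Every number in it is an assumption chosen for illustration, not a measured figure: birth dates spread evenly over about 90 years, two sexes, a hypothetical home area holding 1/3,000 of the population, and an assumed population of 53 million to be distinguished (so fewer than the 33 bits needed for the whole world).

```python
import math


def attribute_bits(distinct_values: float) -> float:
    """Bits carried by an attribute with `distinct_values` equally likely values."""
    return math.log2(distinct_values)


# Illustrative assumptions only, not measured figures.
quasi_identifiers = {
    "date_of_birth": 365.25 * 90,  # birth dates spread evenly over ~90 years
    "sex": 2,
    "home_area": 3_000,            # hypothetical: each area holds 1/3000 of the population
}
POPULATION = 53e6                  # assumed population to be distinguished

# Summing assumes the attributes are independent of one another.
total = sum(attribute_bits(v) for v in quasi_identifiers.values())
needed = math.log2(POPULATION)
print(f"available: {total:.1f} bits, needed: {needed:.1f} bits")
```

Real attributes are neither uniform nor independent, so this is only an order-of-magnitude estimate; but under these assumptions three unremarkable fields already carry more bits than are needed to single someone out.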
And that, of course, is before the health data is married to other easily obtainable databases, such as the edited version of the electoral register, which includes the details of everyone who has not specifically opted out of it. There are no restrictions on the use of this data.
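Marrying datasets in this way is essentially a database join on whatever fields the two sides share. The sketch below assumes, purely for illustration, that a pseudonymised extract retains postcode and house number and that a named list carries the same fields; the datasets, names and the `link` helper are all hypothetical.

```python
from collections import defaultdict


def link(pseudonymised, register, keys):
    """Map each pseudonym to a name when exactly one register entry shares its `keys`."""
    index = defaultdict(list)
    for person in register:
        index[tuple(person[k] for k in keys)].append(person["name"])
    matches = {}
    for rec in pseudonymised:
        candidates = index.get(tuple(rec[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match is a re-identification
            matches[rec["pseudonym"]] = candidates[0]
    return matches


# Hypothetical data: the health extract has no names, the register has no diagnoses.
health = [
    {"pseudonym": "p1", "postcode": "AB1 2CD", "house": "14", "diagnosis": "F32"},
    {"pseudonym": "p2", "postcode": "AB1 2CD", "house": "9", "diagnosis": "E11"},
]
register = [
    {"name": "J. Smith", "postcode": "AB1 2CD", "house": "14"},
    {"name": "K. Jones", "postcode": "AB1 2CD", "house": "9"},
    {"name": "L. Jones", "postcode": "AB1 2CD", "house": "9"},
]

print(link(health, register, ["postcode", "house"]))  # {'p1': 'J. Smith'}
```

The second record is not uniquely matched here, but it has been narrowed to two people in one household, which any further dataset the holder acquires later could resolve.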
The second problem with the ICO’s statement is that it refers only to ‘other information’ already in the data controller’s possession, and says nothing about data the controller could obtain later. This means that an unscrupulous operator could easily obtain care.data precisely because he holds no ‘other information’, and then marry it with separately obtained databases. He would then become subject to the Data Protection Act, but he might no longer be within the ICO’s jurisdiction. Having got the data, he would have everything he needed to sell personal information to whoever would buy it.
The simple reality is that it is impossible to protect the anonymity of patient health data while retaining the value that the researchers (big pharma, insurers, credit companies and the like) demand. With big business and David Cameron singing from the same hymn sheet, care.data may have been delayed, but it is a long way from being abandoned, particularly when we have an ICO more determined to apply his own interpretation of the law than to protect the people.