I find the ICO’s response to Dr Neil Bhatia’s request for clarification on care.data and the Data Protection Act (DPA) to be very strange.
Care.data is the name for NHS England’s program to centralise all GP patient health records together with all hospital visit records in one big data warehouse available to researchers. While originally intended to go live this month, it has been delayed for six months.
Dr Neil Bhatia is a privacy activist opposed to the program. Where a patient objects to his or her data being shared outside of the NHS, the intention is to share ‘pseudonymised’ health data. Dr Bhatia wrote to the ICO for clarification on the Data Protection Act and pseudonymised health records.
The ICO replied that anonymised data is not covered by the DPA. If the subject cannot be identified there can be no privacy loss.
But on pseudonymised data his response is far less clear:
Pseudonymised data on its own would not constitute personal data, as it does not enable individuals to be identified.
However, it is possible that pseudonymised data may become personal data if it is held by an organisation which holds other information which could be used in conjunction with the pseudonymised data to identify individuals.
As such, whether pseudonymised data would be covered by the DPA would depend on other information which is in the data controller’s possession.
ICO’s letter to Dr Neil Bhatia
There are two problems with the ICO’s statement: one in the first of these sentences and one in the last. On the first, there is a growing academic consensus that you simply cannot pseudonymise data in a way that prevents the individuals concerned from being re-identified.
On Friday, Professor Ross Anderson blogged in Light Blue Touchpaper and provided both audio and slides for a talk he gave at the Open Data Institute. The talk is titled ‘Why Anonymity Fails’. In the inference section he notes:
- If you link episodes into longitudinal records, most patients can be re-identified
- Add demographic, family data: worse still
- Active attacks: worse still (Iceland example)
- Social network stuff: worse still
[The 'Iceland example', incidentally, refers to the occasion in 1998 when DeCODE offered Iceland free IT systems in return for access to medical records. The funding came from the Big Pharma company, Roche.]
Now check the 33 Bits of Entropy blog operated by Arvind Narayanan (a Computer Science/CITP Assistant Professor at Princeton and affiliate scholar at Stanford Law School’s CIS):
The title refers to the fact that there are only 6.6 billion people in the world, so you only need 33 bits (more precisely, 32.6 bits) of information about a person to determine who they are.
…If your hometown has 100,000 people, then knowing your hometown gives me 16 bits of entropy about you, and only 17 bits remain.
About 33 Bits
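The arithmetic behind the ‘33 bits’ figure is easy to check. Here is a quick Python sketch (the 6.6 billion world-population figure is the one used in the quoted post; the function name is mine):

```python
import math

WORLD_POPULATION = 6.6e9  # figure used in the quoted post

def bits_to_identify(candidates: float) -> float:
    """Bits of information needed to single out one person among `candidates`."""
    return math.log2(candidates)

total = bits_to_identify(WORLD_POPULATION)        # ~32.6 bits to identify anyone on Earth
hometown = 100_000
gained = total - bits_to_identify(hometown)       # ~16.0 bits revealed by knowing the hometown
remaining = bits_to_identify(hometown)            # ~16.6 bits still needed after that

print(f"total: {total:.1f}, gained: {gained:.1f}, remaining: {remaining:.1f}")
```

Each extra quasi-identifier (age band, postcode, admission dates) chips away at the remaining bits, which is why a handful of ‘harmless’ fields is usually enough.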
It is clear that a determined adversary will be able to obtain the necessary 33 bits of entropy from within the pseudonymised data itself; yet the ICO seems to ignore this, as if simply labelling a dataset ‘pseudonymised’ actually made it impossible to re-identify the subjects.
And this, of course, is before marrying the health data to other easily obtainable databases, such as the edited version of the electoral register, which lists everyone who hasn’t specifically opted out. There are no restrictions on the use of that register.
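To see how such a linkage works in practice, here is a minimal, entirely hypothetical sketch (the records, field names and postcodes are all invented): neither dataset names anyone on its own, but a simple join on the shared quasi-identifiers is all it takes.

```python
# Hypothetical pseudonymised health extract: no names, but postcode and
# birth year survive pseudonymisation.
pseudonymised_health = [
    {"pseudo_id": "a91f", "postcode": "YO8 9QX", "birth_year": 1961, "diagnosis": "..."},
    {"pseudo_id": "c24d", "postcode": "SW1A 2AA", "birth_year": 1975, "diagnosis": "..."},
]

# Hypothetical slice of an edited electoral register: names, but no health data.
electoral_register = [
    {"name": "J. Smith", "postcode": "YO8 9QX", "birth_year": 1961},
]

def link(health, register):
    """Re-identify health records by matching on (postcode, birth_year)."""
    index = {(r["postcode"], r["birth_year"]): r["name"] for r in register}
    return {
        rec["pseudo_id"]: index[(rec["postcode"], rec["birth_year"])]
        for rec in health
        if (rec["postcode"], rec["birth_year"]) in index
    }

print(link(pseudonymised_health, electoral_register))  # {'a91f': 'J. Smith'}
```

In a real attack the register would cover millions of people, and a full postcode plus year of birth narrows most of them down to one household.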
The second problem with the ICO’s statement is that he talks only about ‘other information’ already in the data controller’s possession; he says nothing about other data that the controller could subsequently obtain. An unscrupulous operator could therefore legitimately acquire care.data, because he holds no ‘other information’, and later marry it with separately obtained databases. At that point he would become subject to the Data Protection Act, but might no longer be within the ICO’s jurisdiction. Having got the data, he would have everything necessary to sell personal information to whoever would buy it.
The simple reality is that it is impossible to protect the anonymity of patient health data while retaining the value that the researchers (big pharma, insurance, credit companies and so on) demand. With big business and David Cameron both singing from the same hymn sheet, care.data may have been delayed, but it is a long way from being abandoned; particularly when we have an ICO more determined to apply his own interpretation of the law than to protect the people.
I had to visit the hospital the other day. I’m not going to say why, because that’s private, personal and confidential. Suffice it to say that it’s not a condition I would hide from my mother; but it is one that I’d prefer potential employers and insurers to know nothing about unless I tell them (it’s probably nothing anyway). I would most certainly not want the pharmaceutical industry to know: the drugs they offer make the (possible) condition much worse, and introduce new ones.
But I don’t need to worry, do I? At the bottom of the hospital appointment letter, in bold type, is the statement:
All personal information about you is kept confidential at all times and is only shared when necessary to support your care and treatment. If we want to use your information for any other purpose, with the exception of when the law requires us to do so, we will talk with you and obtain your consent. If you have any concerns regarding this, please talk to the person providing your care and treatment.
(see grammatical note at the end of this post)
But that’s a lie. While the government wants to start centralising our GP records in the autumn, it is already centralising our hospital records through HES (Hospital Episode Statistics). These are already held by the Health and Social Care Information Centre (HSCIC), which is where all of the records will eventually be held. According to the HSCIC website:
HES is a data warehouse containing details of all admissions, outpatient appointments and A&E attendances at NHS hospitals in England.
This data is collected during a patient’s time at hospital and is submitted to allow hospitals to be paid for the care they deliver. HES data is designed to enable secondary use, that is use for non-clinical purposes, of this administrative data.
It is a records-based system that covers all NHS trusts in England, including acute hospitals, primary care trusts and mental health trusts. HES information is stored as a large collection of separate records – one for each period of care – in a secure data warehouse.
We apply a strict statistical disclosure control in accordance with the HES protocol, to all published HES data. This suppresses small numbers to stop people identifying themselves and others, to ensure that patient confidentiality is maintained.
Compare the two statements. It is perfectly clear that the hospital is lying. But the reality is, so is HSCIC.
Back in 2012, the consultancy PA Consulting bought a copy of the HES data.
So we bought the data and installed it (with certain security restrictions) on our own hardware… [But querying the data took too long.] The alternative was to upload it to the cloud using tools such as Google Storage and use BigQuery to extract data from it… Within two weeks of starting to use the Google tools we were able to produce interactive maps directly from HES queries in seconds.
(That document seems to have been removed from the PA site, or hidden away. Anyway, I can no longer find it, and have to rely on the copy I have. It seems to have been replaced by a press statement from PA here and another from HSCIC here in a coordinated release. Neither of these should satisfy any patient.)
The HES data sold by the government is pseudonymised, but still includes postcode and age (PA denies that it received date of birth or address, but doesn’t say whether that denial covers ‘age’ and ‘postcode’). In other words, standard HES data identifies exactly who 98% of the patients are and where they live.
And then there’s Beacon Dodsworth, a firm that “provides geographical information system (GIS) mapping software and marketing technology to clients in a wide range of industries” including Estee Lauder, Trinity Mirror Group and Boots. It says:
Hospital Episode Statistics (HES) have now been integrated with our P2 People & Places people classification thanks to some hard work from our clever developers.
This means you can now better understand the health needs of local communities and populations and identify trends and patterns in order to target health improvement more effectively.
So we seem to have a system that quite readily sells our hospital records to any marketing company that will pay for them, and then allows those marketing firms to advertise the ability to target us on the basis of our health. And at the same time, the NHS itself tells us something completely different: that the data is only seen by those involved in our treatment.
Now Ross Anderson, chair at the Foundation for Information Policy Research; Phil Booth, coordinator at medConfidential; and Nick Pickles, director at Big Brother Watch, have all filed a complaint with the ICO requesting that the issue be examined in relation to the Data Protection Act.
It will be interesting to see how the ICO can reconcile what to everyone else is a clear but hidden breach of confidential patient data — and the Data Protection Act — with this government’s desire to sell and share everything about us to anyone willing to pay for it, irrespective of our own wishes. Because the one thing we can be very sure about in all of this is that the ICO will do all he can to avoid doing anything at all.
The first sentence is a complete statement. The second is also a complete sentence. There is nothing in the second sentence to indicate that it qualifies the first. There is nothing in these two sentences from which a reasonable patient could infer that they really mean, “We will not share your personal data with anyone other than the centralised government database operated by HSCIC, to whom we will always provide all of your details all of the time, and over which we have not the slightest control nor responsibility for your personal data.”
Goodle (that is, the UK’s ICO) is friendly with Google. You can see that in its behaviour over Street View (the collection, inadvertent or otherwise, of personal wifi data while driving round the streets of the world). Germany fined Google over it. Goodle just said stop it, don’t do it again, and get rid of what you’ve got.
When Google didn’t get rid of it, Goodle had to get really tough, and say get rid of it now, because we really, really mean it this time!
But back to Article 29. Problematically for Goodle, the UK is one of six EU member states chosen to take enforcement action against Google. CNIL, the French regulator, has already completed its task. It has instructed Google in exactly what it must do to come into conformance with French laws. Google has three months to comply before CNIL levies a fine.
Spain is likely to be next. The Spanish regulator announced on Thursday that it has “found evidence of five serious privacy law breaches — each punishable with fines of up to 300,000 euros ($395,000).” (AFP) An enforcement notice with threats will likely follow shortly.
Germany is hardly likely to take a softer line – generally speaking it is tougher than most other EU nations on matters of personal privacy (some can remember Nazi Germany, and most can remember Stasi Germany).
Then we have Italy, the Netherlands, and of course Goodle. My bet is that Italy and the Netherlands do the same as France and Spain. But what then? What about the UK? What’s a good Goodle to do if all the other nations slap Google as hard as they can? It’s a difficult position for a loyal Google Poodle.
I’ve never been convinced on the value of the UK’s data protection regulator, the ICO. There are numerous reasons for this. Firstly, the Data Protection Act is a law. Upholding the law is a job for the police and courts, not a government-controlled quango. Secondly, to uphold the law you need a grounding in the law: the ICO should be a lawyer not a marketer. And thirdly, the whole premise of the Data Protection Act is absurd. The way it is established means that proof of compliance is not getting hacked, while proof of non-compliance is getting hacked. And getting hacked is a lottery that has little relationship to security spend.
But I think I lost all respect when the ICO published an ‘independent’ report on the GDPR last month. It was undertaken by London Economics and relies on statistics (a survey of 506 data protection professionals working in UK companies). Statistics always reflect the bias of the author, so they’re always pretty meaningless. But that’s not the issue. It was what the ICO said about it:
Today’s report is the latest contribution from the ICO to this debate. We’d urge the European Commission to take on board what it says, and to refocus on the importance of developing legislation that delivers real protections for consumers without damaging business or hobbling regulators.
This is gobbledygook. ‘Without damaging business or hobbling regulators’ is rather confused since it is protecting business that hobbles regulators. He claims to want ‘real protections for consumers’ when what he is advocating limits the genuinely real protections for consumers proposed by the EC.
But above all, what is the Information Commissioner doing in advocating for business rights? His mission, in his own words, “is to uphold information rights in the public interest.” Yet here he is trying to uphold business rights to the detriment of the public interest. Lobbying against the GDPR on behalf of business is none of his concern, and a betrayal of the people he is supposed to protect.
He is, however, toeing the UK government line; which in turn is toeing the US government and US corporate line. PRISM shows us that the US government cannot be trusted with our personal data. GCHQ’s involvement with PRISM and the MPs’ call to get the Snoopers’ Charter back on course show that UK politicians cannot be trusted with our personal data.
And where is the ICO on PRISM? God knows. He has published no statement, and posted no blog on the subject. Instead, he is lobbying on behalf of business to make the transfer of our personal data (via Google, Facebook, Microsoft et al) to the NSA all the easier.
It’s time for the ICO to be abolished and replaced by something more meaningful, and someone more willing to fight for the people rather than lobby for business at the behest of government.
Compliance – at least European regulatory compliance – bothers me. Whenever I speak to a security expert, my concerns are allayed for as long as we talk; and then they come back again.
The problem is that Europe passes principle-based legislation (the US is more likely to pass rule-based legislation). The former tells you what must be achieved (the principle), while the latter tells you how it must be done (the rules).
The European Data Protection Directive is a perfect example of principle-based legislation. It says that personal information must be held securely; but it doesn’t tell you how it should be done.
Here’s my problem. Data that hasn’t been lost or stolen has, de facto, been held securely, and the company is in compliance – even if it spends nothing on security. Data that has been lost or stolen has, de jure, not been held securely, and the company fails compliance – even if it has spent many millions of pounds on security. The existence or absence of infosecurity defences is irrelevant: if you lose the data, you are in breach of the act; if you do not, you are not.
I’m not interested in claims that proof you spent money on security will make the ICO (a marketing man, mark you – not a lawyer) go easy on you. That’s just marketing dross to hide the underlying contradiction.
What I want to know is quite simple. How can it possibly be right to frame a law that states someone who tries to comply can fail compliance, while someone who ignores compliance can be compliant? The result is that there is no logical reason to spend money on securing personal data – just hope you don’t get hacked. This is aggravated by the common and growing perception that if you get targeted, you will get breached. So if you get targeted, you will have failed compliance whether you try to comply or not. Why bother?
Have I mentioned that the ICO is a waste of both space and money? Well, if you ever doubted me, doubt no more. It has been treated with utter contempt by Google, and there’s not a damn thing it can do.
Do you remember Spy-Fi, when Google engaged in its very own version of drive-by downloading? Well the ICO said, “No! Stop it. Don’t do it again. And delete what you’ve got.” And Google said, ever-so politely, yes sir – we will. Only it didn’t. “Google has recently confirmed that it still has in its possession a small portion of payload data collected by our Street View vehicles in the UK.” It says it was an error and will work with the ICO to remedy the situation.
But how does the ICO know? How does the ICO know what Google has done with that payload data, what it may do with that payload data, or how many copies of that payload exist in what parts of Google’s vast and nebulous cloud? It wrote back, even more politely, asking for Google to store the data securely for examination before being told what to do with it.
But how does the ICO know, and what can it do? Nothing, except take the word of big business.
Nick Pickles, director of privacy and civil liberties group Big Brother Watch, has no doubt on what should happen:
The Information Commissioner is hampered by a woeful lack of powers and is forced to trust organisations to tell the truth. Given Google’s behaviour has called into question if that really is a proper way to protect our personal data, it must be right to now demand a proper regulator with the powers and punishments to fully protect British people’s privacy.
It’s time to get rid of the self-congratulatory lap dog and replace it with an angry pit bull.