Personal Data Industry: The New Tobacco?
01 December 2021
Isn’t Data more like Oil?
“Data is the new oil”, said Clive Humby in 2006. Clearly it is not, but the metaphor has some merits. It’s a new source of economic power. Both involve extraction, refinement and infrastructure. Both can be used to create many new, useful and sometimes toxic products. But whereas oil is finite and extracted from the earth, new data is always being generated and comes from many different sources. If we consider those sources, we might helpfully classify them into two types.
We have impersonal data from systems, machines, documents, and so on; hence the term Internet of Things. The second source is about you: your behaviour, words, photos, sentiments, expressions, movement and, to cap it all, your DNA. We call this personal data.
If we choose to call impersonal data the new oil, what do we call personal data? Let’s call it the new tobacco. Like tobacco, it’s unhealthy as sold and used.
"If Impersonal Data is the new Oil, then Personal Data is the new Tobacco"
The metaphor only goes so far, but you get the point. Don’t forget all that 20th century propaganda about the positive aspects of tobacco. It portrayed you as cool, thoughtful and powerful.
Are you falling for 21st century propaganda about personal data? Have you bought the line that personal data, extracted and refined by AI, makes you empowered and discerning, through “personalisation” and “targeting”?
Personal Data is Personal
Recognising the importance of personal data in the digital age, governments have sought to control its transfer and usage. The most notable example is the EU GDPR.
But the GDPR has not stopped the rise of the new tobacco industry, aka surveillance capitalism. In the name of convenience, and through digital “nudges”, we have all been handing over our personal data. The GDPR is flouted at massive scale (see this video for a shocking explanation) and until recent years we failed to question the propaganda of surveillance capitalism, notably about the benefits of “personalisation” and “targeting”.
At Mojeek we have always recognised the difference. We collect impersonal data by crawling the open web at scale, storing it effectively and sorting that data for you on request. As with oil, we extract (respectfully), store and refine. Personal data is, and always was for us, a total no-go zone; we have never collected it from our users. Indeed we go further and actively seek to stop others collecting it behind the scenes. No Google Analytics, captchas or fonts for us.
But we do recognise that in our modern economy data can and does fuel important innovations in many fields. In the last few years governments have recognised this and made data a key part of their national strategies. Indeed some are now involved in a race for AI supremacy. There are many incredible and useful innovations that have come, and will come, from the application of impersonal data and AI. But there is also a dangerous trend towards ever more bias and manipulation of behaviour, as personal data continues to be harvested and used recklessly by surveillance capitalists.
And so the tension grows between those seeking tools and regulations that protect our privacy, and those wanting flexibility to extract, process, use and trade in data. The Chinese-American and Big Tech race for AI supremacy is putting pressure on other countries to also engage in what is now a World War III: a data and cyber war. How do the EU, the UK and other countries keep up or respond?
Data: a new direction?
The pressure is on, with new legislation being considered to deal with this new world (dis)order. In the EU, reform of the GDPR is anticipated soon in the new Data Act. Meanwhile, taking advantage, some would say, of Brexit, the UK seeks to diverge from the existing UK GDPR, with its announcement of “Data: a new direction”.
As part of this the UK government is conducting a consultation on “reforms to create an ambitious, pro-growth and innovation-friendly data protection regime that underpins the trustworthy use of data.” At Mojeek we responded to this consultation, doing so where we felt we had important things to say and relevant expertise. Specifically we provided our response to four of the consultation questions and made some constructive suggestions for reform of privacy policies. As you may imagine, we were heavily critical of what we see as support for the new tobacco industry, aka surveillance capitalism.
Here we summarise our response to one of the questions, which concerns the proposal to drop the need for the GDPR “balancing test” for the processing of personal data. For context, the GDPR has a three-part legitimate interest test regarding the use of personal data. The UK GDPR has a similar structure, which the ICO (Information Commissioner’s Office) explains as follows:
- Purpose test – is there a legitimate interest behind the processing (of personal data)?
- Necessity test – is the processing necessary for that purpose?
- Balancing test – is the legitimate interest overridden by the individual’s interests, rights or freedoms?
The UK government is proposing to drop the balancing test for a “limited, generic but exhaustive list of activities”. A proposed list of activities includes:
- Monitoring, detecting or correcting bias in relation to developing AI systems
- Using audience measurement cookies or similar technologies to improve web pages that are frequently visited by service users
- Improving the safety of a product or service that the organisation provides or delivers
- Using personal data for internal research and development purposes
- Using personal data for business innovation purposes aimed at improving services for customers
Sounds great, doesn’t it?
No, it does not. We consider the proposal to be well-meaning but dangerous, and perhaps naive.
In a world where companies cannot be trusted to act responsibly under the GDPR as it is, why would we make it easier for surveillance capitalists to harvest more personal data, and without consent? Without the balancing test, surveillance capitalists would have even more freedom to act in their sole interest and against yours. The proposal is a potential bonfire of an individual's interests, rights and freedoms.
With some reflection and knowledge about how data is used, and how machine learning works, you might imagine many scenarios, for each of the activities suggested in the consultation, in which advantage is taken of individuals.
A New Tobacco Loophole
To illustrate, we outline below one example of how a loophole, created by the proposed dropping of the balancing test, could be exploited. This is a stark and real example based on what we observe of some current projects being pursued by GAFAM, and no doubt others.
What follows is an example of a next-generation surveillance-based digital advertising service that could be developed and deployed without requiring user consent at any stage, and all without the knowledge of users or regulators under this proposed data regime.
1. A new form of surveillance-based digital advertising (“Cohort Ads”) is envisaged by a Big Tech company (“GAME”), which already has large datasets at its disposal.
2. Since the Cohort Ads will, in GAME’s view, provide more relevance for users, it is decided that there is a legitimate interest.
3. An innovative new machine learning model is trained using the existing large datasets and new personal data harvested from users, during a pilot phase developed with a subset of the GAME user base in the UK, under this reformed GDPR.
4. GAME decides this falls under one of the exempted activities: “business innovation purposes aimed at improving services for customers”, and so no balancing test is required.
5. This new model and prototype software works well. It is therefore decided it will be suitable for use in “Cohort Ads” across all users globally.
6. Cohort Ads, originally trained on personal data, uses machine learning inference and so no longer accesses what GAME considers to be personal data. GAME considers it to be privacy-by-design, even if it is in reality a new form of profiling.
7. Before roll-out, GAME gives a second consideration to the balancing test. Personal data is not being used explicitly in the processing (inference) of the model, even if it is implicitly embedded in the machine learning model. GAME therefore decides that no balancing test is required, so Cohort Ads can be deployed without consent.
8. Cohort Ads become the next generation of surveillance capitalist ads product.
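The crux of steps 3–7 is that personal data harvested at training time stays implicitly embedded in what is served later, even though the serving path never touches it. Here is a deliberately minimal, hypothetical sketch of that pattern; all names, profiles and ad labels are invented for illustration, not taken from any real system:

```python
# Hypothetical sketch of the "Cohort Ads" scenario above. All names and
# data are invented. Personal data is used once, at "training" time, to
# assign users to cohorts; at serving ("inference") time only a cohort ID
# is processed, yet each cohort assignment implicitly encodes the
# personal data it was derived from.

# -- "Pilot phase": personal data (per-user topic counts) is harvested. --
personal_profiles = {
    "user_a": {"sport": 9, "finance": 1},
    "user_b": {"sport": 8, "finance": 2},
    "user_c": {"sport": 0, "finance": 10},
}

def assign_cohort(profile):
    """Derive a cohort label from personal data (the 'training' step)."""
    return max(profile, key=profile.get)  # dominant interest topic

cohort_of = {user: assign_cohort(p) for user, p in personal_profiles.items()}

# -- "Roll-out": the ad server sees only cohort IDs, never raw profiles. --
ads_by_cohort = {"sport": "trainers_ad", "finance": "broker_ad"}

def serve_ad(cohort_id):
    """Inference step: no personal data is accessed directly."""
    return ads_by_cohort[cohort_id]

# The served ad is still a function of each user's harvested personal
# data, even though the serving path never touches it.
print(serve_ad(cohort_of["user_a"]))  # -> trainers_ad
```

A real system would replace the toy cohort assignment with a trained model, but the asymmetry is the same: deleting the raw profiles after training does not remove their influence from what gets served.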
One can probably come up with numerous scenarios, across the proposed exempt activities, in which technology companies would figure out how to take advantage of this proposed “balancing test” loophole. As it happens, and unbeknown to us, the UK ICO and CMA in their responses to the consultation have also expressed similar concerns about the proposals on the balancing test. The CMA actually refers to the same activity as mentioned in the example above:
“We have particular concerns about the broad exemption to remove the need to undertake a balancing test when processing personal data for ‘business innovation purposes aimed at improving services for customers'.”
Anything that encourages extraction, processing, usage and trade in personal data is a dangerous road to take. At Mojeek we refuse to take it.
As it happens, the ICO has also just published its opinion on “Data protection and privacy expectations for online advertising proposals” and expects any new initiatives to address user choice, as follows (5.1B):
“Individuals must be offered the ability to receive adverts without tracking, profiling or targeting based on personal data, eg contextual advertising that does not require any tracking of user interaction with content.”
We welcome this, naturally. After all, we have always advocated for, and practised, search and search ads without surveillance.
If you are interested, here is Mojeek's full response to the UK government data consultation.