The MI6 headquarters in London. UK intelligence agencies are already using bulk data to train AI. Photograph: Jeff Greenberg/Universal Images Group/Getty Images

UK spy agencies want to relax ‘burdensome’ laws on AI data use

This article is more than 9 months old

GCHQ, MI6 and MI5 propose weakening safeguards that limit training of AI models with bulk personal datasets

The UK intelligence agencies are lobbying the government to weaken surveillance laws they argue place a “burdensome” limit on their ability to train artificial intelligence models with large amounts of personal data.

The proposals would make it easier for GCHQ, MI6 and MI5 to use certain types of data, by relaxing safeguards designed to protect people’s privacy and prevent the misuse of sensitive information.

Privacy experts and civil liberties groups have expressed alarm at the move, which would unwind some of the legal protection introduced in 2016 after disclosures by Edward Snowden about intrusive state surveillance.

The UK’s spy agencies are increasingly using AI-based systems to help analyse the vast and growing quantities of data they hold. Privacy campaigners argue rapidly advancing AI capabilities require stronger rather than weaker regulation.

However, a recent but little-noticed review of surveillance powers reveals how the intelligence agencies are arguing for a reduction in the safeguards regulating their use of large volumes of information, known as bulk personal datasets (BPDs).

These datasets often contain information, some of which may be sensitive, about extremely large groups of people, most of whom are unlikely to be of intelligence and security interest.

MI5, MI6 and GCHQ frequently use BPDs that are drawn from a wide range of closed and open sources and can also be acquired through covert means.

The agencies, who argue these datasets help them identify potential terrorists and future informants, want to relax rules about how they use BPDs in which they believe people have a “low or no expectation of privacy”.

The proposed changes were presented to David Anderson, a senior barrister and member of the House of Lords, whom the Home Office commissioned earlier this year to independently review changes to the Investigatory Powers Act.

In his findings, Lord Anderson said the agencies’ proposals would replace existing safeguards, which include a requirement for a judge to approve examination and retention of BPDs, with a quicker process of self-authorisation.

Anderson said the agencies had used AI for many years and were already training machine-learning models with BPDs. He said significant increases in the type and volume of the datasets meant machine learning tools “are proving useful” to British intelligence.

But he said the existing regulations relating to BPDs were perceived by the agencies as “disproportionately burdensome” when applied to “publicly available datasets, specifically those containing data in respect of which the subject has little or no reasonable expectation of privacy”.

The intelligence services have argued this information should be placed into a new category of BPDs which, according to Anderson, could include content from video-sharing platforms, podcasts, academic papers, public records, and company information.

The cross-bench peer concluded the law should be amended to create “a less onerous set of safeguards” for the new category of BPDs and said the “deregulatory effect of the proposed changes is relatively minor”.

However, he recommended retaining a degree of ministerial and judicial oversight in the process, rather than allowing intelligence officers alone to decide which BPDs are placed into the new category.

While considering how the intelligence services would use the new category of BPDs, Anderson acknowledged that it seemed the “use of data for training models might be a factor pointing towards a lower level of oversight”.

Last week, during a Lords debate about AI, Anderson said that “in a world where everybody is using open-source datasets to train large language models” the intelligence agencies are “uniquely constrained” by the current legislation.

“I found that these constraints … impinge in certain important contexts on [the intelligence agencies’] agility, on its cooperation with commercial partners, on its ability to recruit and retain data scientists, and ultimately on its effectiveness,” the peer said.

A source familiar with the agencies’ proposals said their desire to use AI-based tools, in particular to train large language models, was “definitely a driver” for putting them forward. However, frustrations about time-consuming administrative processes when using certain datasets were also a factor.

Do you have information about this story? Email harry.davies@theguardian.com.

During Anderson’s review, the human rights organisations Liberty and Privacy International urged the peer to oppose any reduction in existing safeguards relating to BPDs, which they argue are already weak, ineffective and unlawful.

“It should not be made easier to store the data of people who are not under suspicion by the state, especially such large datasets affecting so many people,” a lawyer for Liberty told him. “Any temptation in this review to recommend legislative changes which widen bulk powers or lessen safeguards should be fiercely resisted.”

Both organisations argued their opposition was supported by findings made earlier this year by a specialist surveillance court, which ruled MI5 had committed “serious failings” by unlawfully processing large volumes of data in systems that breached legal requirements.

Responding to Anderson’s review, a leading privacy and surveillance expert, Ian Brown, wrote on his website that “data scientists’ disappointment they don’t get to play with all their wonderful new toys isn’t a good justification for weakening fundamental rights protection”.

“Given the rapid advances in machine learning techniques in the last decade, this will make it particularly difficult” for intelligence officials and the judges overseeing their work “to decide which datasets could be included in a ‘low/no expectation of privacy’ regime”, he added.

According to a Whitehall source, the government is now considering Anderson’s recommendations and will publish its response later this year.
