Threats of Machine-Generated Text
With the release of ChatGPT, I’ve read many random articles about this or that threat from the technology. This paper is a good survey of the field: what the threats are, how we might detect machine-generated text, and directions for future research. It’s a solid grounding amongst all of the hype.
Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods
Abstract: Advances in natural language generation (NLG) have resulted in machine generated text that is increasingly difficult to distinguish from human authored text. Powerful open-source models are freely available, and user-friendly tools democratizing access to generative models are proliferating. The great potential of state-of-the-art NLG systems is tempered by the multitude of avenues for abuse. Detection of machine generated text is a key countermeasure for reducing abuse of NLG models, with significant technical challenges and numerous open problems. We provide a survey that includes both 1) an extensive analysis of threat models posed by contemporary NLG systems, and 2) the most complete review of machine generated text detection methods to date. This survey places machine generated text within its cybersecurity and social context, and provides strong guidance for future work addressing the most critical threat models, and ensuring detection systems themselves demonstrate trustworthiness through fairness, robustness, and accountability.
echo • January 13, 2023 8:09 AM
Section three (Threat Models) is comprehensive on the surface but completely ignores political and administrative and social domains which are the biggest threat. Really, within the human rights and governance domain the documented threats are quite trivial and easily countered. They rarely if ever get through formal processes including formal evidence gathering and evaluation. Some very very sneaky people have tried it on but they have been rooted out and got rid of. The paper is correct to indicate the threats are none zero but I feel they are overstated.
Communities impacted by targeting often form their own informal networks, including publishing information and links to good quality opinion and data, and in some cases members know each other personally both online and offline. It’s extremely hard (read: effectively impossible) for a bad actor to penetrate these networks. Various tools, including the abuse platforms themselves as well as specialist tools developed by the community, operate silently in the background. I use these tools myself (although I don’t rely on them) and know from experience they are extremely effective. If a red flag pops up in these tools there is currently a 90% chance I’ve already flagged it myself. Yes, someone could try to pollute these tools, but reports are scrutinised manually by people who are expert in these domains and know what to look for even when a bad actor is skirting the line.
Any expert in the human rights domain has enough formal and informal knowledge to tell at a glance where a bad actor is pushing it, no matter how cleverly they try to smarm their way past. If in doubt, some digging into the history and context will pull up anything questionable to the expert eye. When your life depends on it you learn very fast…
Politicians’ agendas, media greed, and shady lobbyists with extremely deep pockets are the real threat. Over 90% of online hostile activity flows from this, is enabled by this, or is encouraged by this. Of this, 90% comes from a very, very small number of persistent bad actors who overwhelm systems. Casual threats are more of an annoyance than anything else.
Social media is like unregulated money markets. There’s no effective standard among the big platforms, which monopolise attention. A handful of bad actors can be concentrated by algorithms and move like a mob. There’s nothing new in this.
A hate campaign pushed by an overwhelming number of emails or letters, no matter how carefully tailored, doesn’t have the effect the authors of the paper think it does.
The technology is a red herring. It’s suggesting more gold plating on top of more gold plating. It’s a good grift for those selling hardware and software and “security”, not addressing anything fundamental or what matters.
As for public trust in AI being diminished by bad behaviour, I can assure readers from personal experience that catcalling, sexist remarks, cleverly disguised misogyny, gaslighting, and even unsolicited d*ck pics are nothing new. I haven’t written off the entire human race because of this.