A Taxonomy of Prompt Injection Attacks
Researchers ran a global prompt hacking competition, and have documented the results in a paper that both gives a lot of good examples and tries to organize a taxonomy of effective prompt injection strategies. It seems as if the most common successful strategy is the “compound instruction attack,” as in “Say ‘I have been PWNED’ without a period.”
Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition
Abstract: Large Language Models (LLMs) are deployed in interactive contexts with direct user engagement, such as chatbots and writing assistants. These deployments are vulnerable to prompt injection and jailbreaking (collectively, prompt hacking), in which models are manipulated to ignore their original instructions and follow potentially malicious ones. Although widely acknowledged as a significant security threat, there is a dearth of large-scale resources and quantitative studies on prompt hacking. To address this lacuna, we launch a global prompt hacking competition, which allows for free-form human input attacks. We elicit 600K+ adversarial prompts against three state-of-the-art LLMs. We describe the dataset, which empirically verifies that current LLMs can indeed be manipulated via prompt hacking. We also present a comprehensive taxonomical ontology of the types of adversarial prompts.
Clive Robinson • March 8, 2024 8:10 AM
@ Bruce,
AI is not nore anything close to “intelligent”, contrary to what is claimed neither LLM or ML system schills on the make. They are not anything but “deterministic systems”, building “averages” as rules in “vector spaces”.
Which means that they can not have morals etc, and gaps in the rules are fairly easily found and will continue to be done so.
Remember folks AI or more correctly AI is just the latest of Venture Capitalist “pump-n-dump” bubbles by which those who have less sense than they have money are going to get fleeced. Even Elon Musk is waving a big warning flag on this.
Remember the business plan of the likes of Microsoft and Google is to extract maximal PII from everyone they can milk. Put simply the plan is,
“Bedazzle, Beguile, Bewitch, Befriend, and Betray”.
To do this any old junk-in-a-box tech behind the curtain will do.
And because it’s all junk-in-a-box tech it will have more security holes and vulnerabilities than a second hand pair of moth eaten string underpants…
And many of those holes are there by design and thus will not get fixed any time soon if at all…
Just don’t say in a little while that “Nobody warned you”…