New SHA-1 Attack

There’s a new, practical, collision attack against SHA-1:

In this paper, we report the first practical implementation of this attack, and its impact on real-world security with a PGP/GnuPG impersonation attack. We managed to significantly reduce the complexity of collision attacks against SHA-1: on an Nvidia GTX 970, identical-prefix collisions can now be computed with a complexity of 2^61.2 rather than 2^64.7, and chosen-prefix collisions with a complexity of 2^63.4 rather than 2^67.1. When renting cheap GPUs, this translates to a cost of 11k US$ for a collision, and 45k US$ for a chosen-prefix collision, within the means of academic researchers. Our actual attack required two months of computations using 900 Nvidia GTX 1060 GPUs (we paid 75k US$ because GPU prices were higher, and we wasted some time preparing the attack).

It has practical applications:

We chose the PGP/GnuPG Web of Trust as demonstration of our chosen-prefix collision attack against SHA-1. The Web of Trust is a trust model used for PGP that relies on users signing each other’s identity certificate, instead of using a central PKI. For compatibility reasons the legacy branch of GnuPG (version 1.4) still uses SHA-1 by default for identity certification.

Using our SHA-1 chosen-prefix collision, we have created two PGP keys with different UserIDs and colliding certificates: key B is a legitimate key for Bob (to be signed by the Web of Trust), but the signature can be transferred to key A which is a forged key with Alice’s ID. The signature will still be valid because of the collision, but Bob controls key A with the name of Alice, and signed by a third party. Therefore, he can impersonate Alice and sign any document in her name.

From a news article:

The new attack is significant. While SHA1 has been slowly phased out over the past five years, it remains far from being fully deprecated. It’s still the default hash function for certifying PGP keys in the legacy 1.4 version branch of GnuPG, the open-source successor to PGP application for encrypting email and files. Those SHA1-generated signatures were accepted by the modern GnuPG branch until recently, and were only rejected after the researchers behind the new collision privately reported their results.

Git, the world’s most widely used system for managing software development among multiple people, still relies on SHA1 to ensure data integrity. And many non-Web applications that rely on HTTPS encryption still accept SHA1 certificates. SHA1 is also still allowed for in-protocol signatures in the Transport Layer Security and Secure Shell protocols.

Posted on January 8, 2020 at 9:38 AM

Comments

David Newman January 8, 2020 11:02 AM

As of 8 January 2020 the GPG Suite package for MacOS on gpgtools.org still uses a vulnerable version of GnuPG. GnuPG itself no longer supports SHA-1 as of version 2.2.19 and is available at gnupg.org. However, installing that version does not overwrite a previous installation of the GPG Suite.

Steven Clark January 8, 2020 11:25 AM

I wouldn’t say anything supporting OpenPGP doesn’t support SHA-1. After all, SHA-1 is still the fingerprint algorithm, whether you like it or not. GnuPG has for years refused to change this unilaterally without a standards change, as if it weren’t essentially the only implementer left.

Robert Stoddard January 8, 2020 12:27 PM

Git uses SHA-1 for commit IDs; I imagine this allows an attacker to create a replacement commit in the history. This could be a serious issue for Git’s security.
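Git’s object IDs are indeed SHA-1 digests, computed over a short type-and-size header followed by the raw content. A minimal sketch of the blob case, mirroring what `git hash-object` does:

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    # Git hashes the header "blob <size>\0" followed by the raw bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# The well-known object ID of the empty blob:
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the ID covers the whole content, a collision attack would need two different files hashing to the same blob ID; a server-side check that refuses to store two distinct objects with the same ID is the mitigation TruePath mentions further down.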

Lee January 8, 2020 12:54 PM

If anyone’s avoiding SHA-256 due to the longer values: I just use base 62/64, which only requires 43 characters, compared with the 40 hex digits of SHA-1. Even where security doesn’t seem relevant, it’s worth ditching SHA-1 completely.
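Those character counts are easy to check; a quick sketch comparing the hex and unpadded base64 encodings of a SHA-256 digest:

```python
import base64
import hashlib

digest = hashlib.sha256(b"example input").digest()  # 32 bytes

hex_form = digest.hex()                                   # 64 characters
b64_form = base64.urlsafe_b64encode(digest).rstrip(b"=")  # 43 characters

print(len(hex_form), len(b64_form))  # 64 43
# SHA-1's hex form is 40 characters, so base64-encoded SHA-256 is
# only three characters longer than hex-encoded SHA-1.
```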

Clive Robinson January 8, 2020 1:20 PM

@ ALL,

The fact that this attack is easier than the last does not mean that SHA-1 should be entirely consigned to the dustbin/trashcan of history.

Not all uses of it are security sensitive. That is, there are some uses where it makes no difference whether it is secure or hopelessly insecure. Such uses can be found in random number generators for simulators, where it’s the quality and repeatability of the pseudorandomness, not the security, that counts.

That said, for security SHA-1 has been a “dead duck waddling” for quite some time now, so there is no excuse for code where security does count not to have been cleaned up and prepared for SHA-1 to be removed and replaced.

There is always the legacy code argument for people to have an excuse to sit on their thumbs and in effect do nothing, or more likely put their index fingers in their ears and loudly say “nanananana…” every time anyone mentions it. Likewise there are arguments about “not alarming end users with warning messages”…

Thus anyone who can get between Alice and Bob can mount a fallback or downgrade attack and they won’t get warned.

Unless this “don’t worry the user” excuse and others like it are quashed entirely, we will not have any secure systems going forward. It should never be possible for a third party to get in between two users and force their software to negotiate a downgrade to a known-insecure protocol.

And yes, I’m aware of the “embedded software” argument, where the smart meter etc. has been designed so that it cannot be upgraded, yet has an expected life of 25-50 years in open connection to what is an insecure network (they all are, if a third party can see/share the network). Such non-upgradable equipment should be ripped out at the system designers’ or owners’ expense.

But I’ve been saying this for far longer than SHA-1 has been considered vulnerable, and in all that time the problem has only got worse, a lot worse.

But history tells us one thing: those who created the mess will stoutly and robustly deny it’s their responsibility[1], and worse, will continue in their bad habits until they are made responsible in one way or another. Solid no-nonsense regulation or legislation would once have been a solution, but these are the same people that buy political control and favourable legislation these days…

[1] At the very least, the Upton Sinclair observation about “salaries and opinions” applies.

xxx January 8, 2020 3:26 PM

Speaking of MD5: how easy or difficult is it to forge an ISO image so that it contains malware but the MD5 hash remains the original?

David Leppik January 8, 2020 3:57 PM

@Clive:

While I agree that a lot of uses may not actually need security from SHA-1, as far as I can tell PRNGs aren’t one of them. Mersenne Twister (and its variants) is the most popular algorithm for scientific applications, and there’s no advantage to using a SHA-1 hash to initialize it. So long as seed collisions are unlikely, it’s fine to use any non-random number source. Typically you use the system clock along with something to make it thread safe. (Java’s built-in seed generator xors System.nanoTime() with a simple thread-safe secondary PRNG. I’m guessing Python, which is the most common language for the sciences and neural network training, does the same thing, but I don’t have the source code in front of me.)
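That style of seed generation is easy to sketch. The constants and mixing function below are illustrative stand-ins, not Java’s or CPython’s actual internals:

```python
import random
import threading
import time

_lock = threading.Lock()
_counter = 0x9E3779B97F4A7C15  # arbitrary nonzero start (illustrative)

def _next_uniquifier() -> int:
    # A simple thread-safe secondary sequence, so that two threads
    # asking for a seed in the same nanosecond still diverge.
    global _counter
    with _lock:
        _counter = (_counter * 6364136223846793005 + 1) & (2**64 - 1)
        return _counter

def fresh_seed() -> int:
    # Mix the clock with the uniquifier, in the spirit of xor-ing
    # nanoTime() with a secondary generator's output.
    return _next_uniquifier() ^ time.time_ns()

rng = random.Random(fresh_seed())  # CPython's Random is a Mersenne Twister
```

No cryptographic hash is involved anywhere; uniqueness of seeds, not secrecy, is all a simulation needs.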

David Leppik January 8, 2020 4:01 PM

One thing I find interesting is that now that cloud computing is so prevalent, these attacks have a fairly precise and predictable dollar cost. It used to be that attack cost was described in terms of CPU hours on whatever architecture the researcher was using; these days it’s just as easy to extrapolate from their AWS bill.

mostly harmful January 8, 2020 4:42 PM

I would be quite interested to hear more about the issue @Steven Clark raises.

The advice under «GnuPG Frequently Asked Questions: How do I validate certificates?» currently exhibits some plain flaws in its reasoning regarding fingerprints, in light of the news discussed here.

In particular that FAQ says: “By comparing the fingerprints of the certificate you have against the fingerprint they specified, you’re ensuring that you have the right certificate.”

But if I understand correctly what I am reading here, that is simply false.

Electron 007 January 8, 2020 4:50 PM

@O.P.

The new attack is significant. While SHA1 has been slowly phased out over the past five years, it remains far from being fully deprecated. It’s still the default hash function for certifying PGP keys in the legacy 1.4 version branch of GnuPG, the open-source successor to PGP application for encrypting email and files. Those SHA1-generated signatures were accepted by the modern GnuPG branch until recently, and were only rejected after the researchers behind the new collision privately reported their results.

Git, the world’s most widely used system for managing software development among multiple people, still relies on SHA1 to ensure data integrity.

@”Me”

Hopefully this lights the appropriate fire under their butt about fully deprecating this thing.

https://en.wiktionary.org/wiki/deprecate

“Deprecate” is a curse word. People have their head stuck up their backend repository system.

Clive Robinson January 8, 2020 7:41 PM

@ David Leppik,

While I agree that a lot of uses may not actually need security from SHA-1, as far as I can tell PRNGs aren’t one of them.

It’s not a question of “need” as such, more one of “perception” and “history”.

SHA was an NSA hash function that was in open libraries of code, which meant it was quick to add to code, even though the code ended up running more slowly than it would have using any one of many insecure algorithms. However, marketing pushed SHA usage as it was certified to be good by NIST in all sorts of ways the other algorithms were not. So they could put a “gold sticker” on the box (yup, back then software still came in boxes, and not all were even “shrink wrapped”).

Then things went wrong in 1995, because the NSA had “apparently” goofed with SHA’s design and it was vulnerable. So SHA-1 was brought in in a hurry, and SHA became a mark of badness, which is not marketing friendly… So people started dumping SHA and moved over to SHA-1, as open libraries of code were quickly available. Thus the work input for a programmer was not that great, and marketing people got their “perception” of being proactive easily, and a new gold sticker to replace the old one.

The Mersenne Twister was not thought up for another couple of years, and its entry into code bases was slow. In fact many assume it was a child of this century, not of the end of the 20th.

As we can see from the amount of SHA-1 code still around, once something is added to a code base people do not remove it unless it is declared not just “bad” (2005, 2011, 2017) but “pull your hair out and scream down the house because the sky is about to fall existentially bad” (2020), and maybe not even then…

But if you want another example, how about a “Linus Torvalds” quote about SHA-1 and Git,

    Nobody has been able to break SHA-1, but the point is the SHA-1, as far as Git is concerned, isn’t even a security feature. It’s purely a consistency check.

I will now “duck and cover” 😉

But ask yourself why Linus chose SHA-1 as a “consistency check” when he had a choice of other algorithms, some of which would have been faster.

His answer at the time was,

    it’s just the best hash you can get.

And that is the same point those marketing people in effect used, and the “big boss” was happy because not only was he told it was “the best,” he was also told that “it’s free” and “comes with a NIST guarantee.” So “Best, Free, Guaranteed”: what’s not to like about SHA-1? Especially when you throw in an allusion to no “real” work for the programmers to do, as it’s all been done in a free library…

Basic “Free Market Economics” should tell you the what and why of how we got to where we are with SHA-1 today.

yyy January 9, 2020 4:56 AM

@xxx

I would say A LOT easier. MD5 has been outdated for many years, and creating collisions was a matter of hours back in 2004. Forging something is harder, but a disk image provides enough freedom* to do so.

  • In the case of the forged SHA-1 certificates, they used a user attribute packet with a JPEG image, which ignores anything after the “end of image” marker, giving the attacker enough freedom to manipulate the certificate.

Anders January 9, 2020 8:56 AM

@xxx

Look at this:

http://www.mscs.dal.ca/~selinger/md5collision/

There are two exe files, hello and erase, with the same MD5 hash but different functions. I haven’t tried yet to create a disk image, but I think it would be sufficiently easy to do with a sector-level disk editor, as there’s plenty of free room for such a manipulation. Should be a fun project to try, actually.

Steven Clark January 9, 2020 11:59 AM

@mostly harmful I got the paper and downloaded and imported the public keys. Their fingerprints don’t match, fortunately, because the fingerprint is largely a hash of the raw key instead of the key and the user ID as seen here. Getting a collision that is also a valid key is going to be difficult, so my jab was a bit mean-spirited, born of frustration that I can’t tick this checkbox as done.

This still affects GPG’s web-of-trust model, though, as a signature applied to one key over the key and user data could then be transferred to the other key. You couldn’t use it to get RPM to check a package with a different key (which wouldn’t be an attack, as both keys would have to be installed), for example. But maybe similar techniques could be used, by padding the description with garbage, to get RPM to accept malware in place of a trusted package. That is fortunately fixed in new versions allowing SHA-256 signatures.

Electron 007 January 9, 2020 12:29 PM

On a more serious note, a hash function such as SHA-1 needs to be considered irretrievably, irredeemably broken by the time a “practical” exploit or collision appears in the open literature.

And there’s a whole family of them: SHA-2, SHA256, SHA512, even a SHA-3, released by our beloved NSA for the general public to use for lawfully permitted kid-sister or baby-sitter cryptographic purposes. And then we learn that they had us all on a baby monitor the whole time.

Think about the private equity and hedge funds behind the various electronic cryptocurrencies.

https://www.varsitytutors.com/hotmath/hotmath_help/topics/domain-and-range

The basic problem is that the domain of a “hash function” is much greater in cardinality than its range. Consequently, many different values in the domain will unavoidably map to the same value in the range.

Checksums, hash functions and the like can and do help assure data integrity, but the very idea of a collision-free many-to-few mapping is undependable and prima facie suspect, even to the uninitiated; in some light it appears as a mad-scientist endeavor to circumvent the second law of thermodynamics and recover entropy that has been irreversibly dissipated.
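The pigeonhole argument is easy to make concrete. The sketch below truncates SHA-256 to 32 bits (purely so the effect is observable on a laptop) and finds two distinct inputs with the same truncated digest after roughly 2^16 attempts, the birthday bound, far fewer than the 2^32 a brute-force preimage search would need:

```python
import hashlib

def truncated_hash(data: bytes, bits: int = 32) -> int:
    # SHA-256 truncated to `bits` bits: a deliberately tiny range,
    # so collisions are forced quickly by the pigeonhole principle.
    full = hashlib.sha256(data).digest()
    return int.from_bytes(full, "big") >> (256 - bits)

def find_collision(bits: int = 32):
    # Birthday search: remember every digest seen; a repeat gives
    # two distinct inputs with the same truncated hash, after about
    # 2**(bits/2) attempts on average.
    seen = {}
    i = 0
    while True:
        h = truncated_hash(str(i).encode(), bits)
        if h in seen:
            return seen[h], i, h
        seen[h] = i
        i += 1

a, b, h = find_collision(32)
print(f"inputs {a} and {b} share the 32-bit digest {h:#010x}")
```

This is exactly why collision resistance of an n-bit hash tops out at about 2^(n/2) work, while preimage resistance can be as high as 2^n, the distinction MrC draws further down the thread.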

mostly harmful January 9, 2020 12:58 PM

@ Steven Clark

Thank you for straightening out some of my confusion over the implications (my domain literacy being dangerously close to that of the dog in Gary Larson’s Far Side cartoon “What Dogs Hear”).

Electron 007 January 9, 2020 3:12 PM

@SpaceLifeForm

@ Electron 007

Well said.

Any concerns about BLAKE2 ?

An additional hash, based on radically different mathematical theories or ideas, used alongside an existing algorithm as another cross-check for data integrity, can be a good thing. But my concern is a general over-reliance on hash functions, and a vain search for the “one true hash,” versus the huge infrastructure and large sums of money that have gone into “mining” and breaking said hash functions.

We need to break away from that Bernstein cult or college fraternity, involve other great minds, and allow them to work independently. Tenured professors are under a general obligation to teach others freely what they have learned, not just to conduct their own proprietary research in an ivory tower with our patronage.

Felix January 9, 2020 3:39 PM

@Electron 007

Cool down and spin down. As long as the digest is smaller than the message, there is ALWAYS the possibility of collision. So fundamentally there is no such thing as a collision-free “secure hash function.”

Dave January 9, 2020 10:04 PM

@Steven Clark: Their fingerprints don’t match, fortunately, because the fingerprint is largely a hash of the raw key instead of the key and the user ID as seen here. Getting a collision that is also a valid key is going to be difficult

They also rely on stuffing a shorter fake key into the space for a ridiculously-sized original key, and a bunch of other special-case things set up specifically for this case. Not trying to bash the paper, but I’ve looked at it from several angles and it’s hard to see how to generalize this attack beyond some quite special-case situations.

For starters, it doesn’t affect any online protocols like TLS, SSH, IPsec, etc., because for those you’d need to forge sigs in real time or close to it. It probably doesn’t affect X.509, because there’s no obvious way to set things up the way you can for OpenPGP, and X.509’s ASN.1 adds a lot of fixed-format-and-position gunk to the encoding where you can’t vary things at random as you can for OpenPGP, so S/MIME is probably also hard. Beyond that, you’re starting to get into the long tail, where eventually you’ll find something else that’s vulnerable, but it’s so far out the end of the tail that it’s either not a worthwhile target (no value to an attacker to expend the effort) or there are so few users that an attack won’t have much effect.

Having said that, it’s a good paper, and one more timely warning to move away from SHA-1 where possible.

MrC January 10, 2020 12:37 AM

Before panicking too much, remember that this is a collision attack, not a preimage attack. Most of the nightmare scenarios people are bringing up require a preimage attack.

trsm.mckay January 10, 2020 4:34 PM

@ Electron 007
And there’s a whole family of them: SHA-2, SHA256, SHA512, even a SHA-3, released by our beloved NSA

One of these things is not like the others. SHA-3 originated through a NIST-organized contest, and has significant differences from the SHA-1/SHA-2 family. Now, the contest approach is not a guarantee of perfection: for example, many have speculated that the NSA designed the AES contest rules to end up with an algorithm that was particularly susceptible to side-channel attacks. So we can’t say that Keccak was designed entirely independently of the NSA…

Who? January 12, 2020 6:25 AM

MD5 has been vulnerable for the last twenty years, and SHA-1 is vulnerable now. We should not trust these hash algorithms alone. What about a file that has both MD5 and SHA-1 hashes? How difficult is achieving a double collision?

Lots of old files provide both MD5 and SHA-1 hashes, so this question matters.
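Checking a download against both published digests is straightforward; a minimal sketch (the payload and digests below are made up for illustration):

```python
import hashlib

def check_both(data: bytes, md5_hex: str, sha1_hex: str) -> bool:
    # An attacker now has to collide MD5 and SHA-1 simultaneously,
    # not just one of them.
    return (hashlib.md5(data).hexdigest() == md5_hex
            and hashlib.sha1(data).hexdigest() == sha1_hex)

payload = b"example release tarball bytes"
md5_ref = hashlib.md5(payload).hexdigest()
sha1_ref = hashlib.sha1(payload).hexdigest()

print(check_both(payload, md5_ref, sha1_ref))      # True
print(check_both(b"tampered", md5_ref, sha1_ref))  # False
```

It is worth noting, though, that Joux’s 2004 multicollision result suggests the pair is weaker than one might hope: for iterated hash constructions, a simultaneous MD5-and-SHA-1 collision costs only modestly more than a collision against the stronger of the two alone.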

TruePath January 18, 2020 7:49 PM

I’m also a bit skeptical about the claimed impact on SCM from Git’s use of SHA-1 hashes. It does use them, but I don’t really see the threat model under which this is a widespread threat, especially given that we can easily patch git servers and places like GitHub to simply issue an error if there is ever a hash collision.

If you are talking about using the collision to tamper with mainstream packages like the Linux kernel or whatever, that means uploading a bad version of a commit to somewhere like GitHub (or even some private git server) while simultaneously getting your submission accepted by the repo maintainers without them noticing the weird crap you stuck in headers or wherever to set up the collision. That’s computationally more difficult, plus there’s all the social difficulty of getting your patch accepted without modifications by a project sufficiently important for the targets to use, but sufficiently quiet that you won’t just be patched over or forced to merge. And all that on the hope that the code server fails in the one way you can use (serving your malware version).

Given the large expense, and the difficulty of retaining anonymity while submitting the patch, I don’t see much danger here. But maybe I’m missing something.
