The Tricky Aftermath of Source Code Leaks

Lapsus$ hackers leaked Microsoft’s Bing and Cortana source code. How bad is that, really?
Illustration: Elena Lacey

The Lapsus$ digital extortion group is the latest to mount a high-profile data-stealing rampage against major tech companies. And among other things, the group is known for grabbing and leaking source code at every opportunity, including from Samsung, Qualcomm, and Nvidia. At the end of March, alongside revelations that they had breached an Okta subprocessor, the hackers also dropped a trove of data containing portions of the source code for Microsoft's Bing, Bing Maps, and its Cortana virtual assistant. Sounds bad, right?

Businesses, governments, and other institutions have been plagued by ransomware attacks, business email compromise, and an array of other breaches in recent years. Researchers say, though, that while source code leaks may seem catastrophic, and certainly aren't good, they typically aren't the worst-case scenario of a criminal data breach.

“Some source code does represent trade secrets, some parts of source code may make it easier for people to abuse systems, but accounts and user data are typically the biggest things companies have to protect,” says Shane Huntley, director of Google's Threat Analysis Group. “For a vulnerability hunter, it makes certain things easier, allowing them to skip a lot of steps. But it’s not magic. Just because someone can see the source code doesn't mean they'll be able to exploit it right then.”

In other words, when attackers gain access to source code—and especially when they leak it for all to see—a company's intellectual property could be exposed in the process, and attackers may be able to spot vulnerabilities in their systems more quickly. But source code alone isn't a road map to find exploitable bugs. Attackers can't take over Cortana from Microsoft or access users' accounts simply because they have some of the source code for the platform. In fact, as open source software shows, it's possible for source code to be publicly available without making the software it underpins less secure.

Google's Huntley points out that the same broad and diverse vetting needed to secure open source software is also vital for critical proprietary source code, just in case it is ever stolen or leaks. And he also notes that major vulnerabilities in open source software, like the recent Log4j flaws, have often lurked undiscovered for years or even decades, similar to inconspicuous typos that aren't caught by an author, editor, or copyeditor. 

Microsoft detailed its Lapsus$ breach on March 22 and said in a statement that “Microsoft does not rely on the secrecy of code as a security measure and viewing source code does not lead to elevation of risk.”

Typically, security researchers and attackers alike must use “reverse engineering” to find exploitable vulnerabilities in software, working backward from the final product to understand its components and how it works. And researchers say that process can actually be more helpful than looking at source code for finding bugs, because it involves more creative and open-ended analysis than just looking at a recipe. Still, there's no doubt that source code leaks can be problematic, especially for organizations that haven't done enough auditing and vetting to be sure that they've caught most basic bugs.

Brett Callow, a threat analyst at the antivirus company Emsisoft, also points out that attackers have a clear interest in making source code leaks sound as damaging as possible, regardless of the reality for a particular organization.

“Attackers want to make the incident seem as bad as they possibly can, and that isn’t simply to extract payment from the current victim,” Callow says. “It’s also sending a warning shot to their future victims saying, ‘Look how much attention these incidents can bring; we make your life thoroughly miserable. The easiest and least painful option is simply to pay us!’”

In practice, though, Callow says that while some data breach victims have specific concerns about source code leaks, they aren't the highest-priority concern for most organizations. “It isn’t to say it can never be problematic, just that it usually isn’t,” he says. 

The bigger concern about source code leaks often isn't about the source code itself. Rather, if an attacker has compromised something as highly guarded as source code, it could mean that they've grabbed other crown jewels like sensitive user data, encryption keys, or code-signing certificates, which are meant to verify that a piece of software hasn't been altered by a malicious actor. If stolen, these have more urgent and immediate ramifications for the security of a company, its products, and, most importantly, its customers.
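To see why stolen signing material is so dangerous, it helps to look at what the verification step actually checks. The sketch below is a simplified illustration: real code signing uses public-key signatures and certificate chains, but a keyed HMAC (using only Python's standard library, with a hypothetical key) shows the same core idea, that any change to the artifact invalidates the tag unless the attacker also holds the signing key.

```python
import hashlib
import hmac

# Hypothetical signing key; in real code signing this would be a
# closely guarded private key, not a shared secret.
SIGNING_KEY = b"hypothetical-signing-key"

artifact = b"contents of a software update"

# Vendor side: attach an integrity tag computed with the signing key.
tag = hmac.new(SIGNING_KEY, artifact, hashlib.sha256).digest()

def verify(data: bytes, tag: bytes, key: bytes = SIGNING_KEY) -> bool:
    """Client side: recompute the tag and compare in constant time."""
    expected = hmac.new(key, data, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

print(verify(artifact, tag))                        # unmodified: True
print(verify(artifact + b"malicious patch", tag))   # tampered: False
```

The point of the sketch is the asymmetry: a leaked copy of the artifact (or its source) doesn't let an attacker forge a valid tag, but a leaked signing key does, which is why stolen keys and certificates matter far more than stolen code.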

Most dangerous of all is when an attacker can not just access or steal a copy of a product's source code but actually change it, whether through a malicious software update or other manipulation. That's the type of breach that can have dire consequences.
