
Trojan Source: Invisible Vulnerabilities in Most Code

This Flaw Could Lead to an Attack Like SolarWinds

Two researchers from the University of Cambridge have discovered a vulnerability that affects most computer code compilers and many software development environments, they report in a research paper titled "Trojan Source: Invisible Vulnerabilities."


The researchers say that attackers could use the bug to introduce subtle vulnerabilities into open-source projects and set up a SolarWinds-like supply chain attack. The attack is particularly powerful in the context of software supply chains: If an adversary deceives human reviewers and successfully commits targeted vulnerabilities into open-source code, downstream software will likely inherit them, they say.

The vulnerability affects almost all compilers that transform human-readable source code into computer-executable machine code, according to Nicholas Boucher and Ross Anderson, who wrote the research paper. They say they informed 19 independent companies and organizations - including Red Hat, Atlassian and Rustc - in a coordinated disclosure effort.

The Vulnerability

Tracked as CVE-2021-42574, this vulnerability resides in Unicode's Bidirectional Algorithm, which handles the display order of both left-to-right scripts, such as English, and right-to-left scripts, such as Arabic and Hebrew. When a piece of text mixes the two, the algorithm has to settle on one display order to avoid a directionality conflict. BiDi override control characters let an author force that choice, making left-to-right text display from right to left - and vice versa - the researchers say.

According to the CVE listing, "[The vulnerability] permits the visual reordering of characters via control sequences, which can be used to craft source code that renders different logic than the logical ordering of tokens ingested by compilers and interpreters. Adversaries can leverage this to encode source code for compilers accepting Unicode such that targeted vulnerabilities are introduced invisibly to human reviewers."

To explain how the exploit works in certain editors and code review tools, Atlassian staff member Srivathsav Gandrathi provides an example, saying, "When you copy and paste a code snippet with bidirectional override characters to a vulnerable code editor/block, Unicode characters that change the order of the text are not displayed. Many websites and online editors don’t render these special characters, so a developer could unintentionally introduce an attacker’s code into their own codebase by copying and pasting a code snippet from another vulnerable website, without realizing it."
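The hidden characters described above can be made visible with a few lines of code. The following Python sketch is an illustration only: the function name reveal_bidi_controls and the sample pasted line are hypothetical, and the table covers the nine Unicode embedding, override and isolate formatting characters, not every character that can influence text direction.

```python
# Abbreviations for the Unicode bidirectional formatting characters.
BIDI_CONTROLS = {
    "\u202a": "LRE",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b": "RLE",  # RIGHT-TO-LEFT EMBEDDING
    "\u202c": "PDF",  # POP DIRECTIONAL FORMATTING
    "\u202d": "LRO",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e": "RLO",  # RIGHT-TO-LEFT OVERRIDE
    "\u2066": "LRI",  # LEFT-TO-RIGHT ISOLATE
    "\u2067": "RLI",  # RIGHT-TO-LEFT ISOLATE
    "\u2068": "FSI",  # FIRST STRONG ISOLATE
    "\u2069": "PDI",  # POP DIRECTIONAL ISOLATE
}


def reveal_bidi_controls(text: str) -> str:
    """Replace each invisible bidi control with a visible <NAME> tag."""
    return "".join(
        f"<{BIDI_CONTROLS[ch]}>" if ch in BIDI_CONTROLS else ch
        for ch in text
    )


# Hypothetical pasted line, written with escapes so nothing is hidden here.
pasted = "access = 'user\u202e \u2066# admin only\u2069 \u2066'"
print(reveal_bidi_controls(pasted))
```

Running a pasted snippet through a helper like this before committing it shows a reviewer the logical character order that a compiler would read, whatever the editor chooses to display.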

In short, the reviewed code is different from the compiled code. Another example from Rustc, shown below, gives more clarity on this:

Example of bidirectional override (Source: Rustc)

Although the CVE program has not assigned a severity for this vulnerability, Atlassian, whose cloud, server and data center products were affected by the flaw, has rated its severity as "high," based on Atlassian severity levels. Red Hat Product Security, however, has rated this flaw as having a moderate security impact, based on its CVSS v3 severity metrics.

Exploit Techniques

The researchers note that there are at least three different techniques to exploit the visual reordering of source code tokens. They are:

  • Early returns: In the early-return exploit technique, adversaries disguise a genuine return statement so that it visually appears to sit inside a comment or string literal. The function short-circuits by executing that return statement and returns earlier than it appears to, the researchers say.
  • Commenting-out: In this exploit technique, text that appears to be legitimate code actually exists within a comment and is thus never executed. This allows an adversary to show a reviewer some code that appears to be executed but is not present from the perspective of the compiler or interpreter.
  • Stretched strings: This exploit causes portions of string literals to visually appear as code. Text that appears to be outside a string literal is actually located within it and has the same effect as commenting-out, causing string comparisons to fail.
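The early-return technique can be illustrated with a short Python sketch. This is a hedged illustration, not the researchers' code: the function subtract_funds, the bank dictionary and the wrapper run_early_return_demo are hypothetical names chosen for the example. The control character is written as an escape so that it stays visible here. In a bidi-aware viewer, the second line of the generated source may render with the return statement apparently inside the docstring, while the interpreter sees a docstring followed by a real return.

```python
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE, invisible in most editors

# Logical order: a docstring, then ";return". The isolate is meant to make
# a viewer draw "return" as if it were the last word of the docstring.
SOURCE = (
    "def subtract_funds(account, amount):\n"
    f"    ''' Subtract funds from bank account then {RLI}''' ;return\n"
    "    bank[account] -= amount\n"
    "    return\n"
)


def run_early_return_demo() -> int:
    """Execute the crafted source and report the balance afterward."""
    bank = {"alice": 100}
    namespace = {"bank": bank}
    exec(SOURCE, namespace)  # compiles the logical character order
    namespace["subtract_funds"]("alice", 50)
    return bank["alice"]


# The balance is unchanged because the function returns before subtracting.
print(run_early_return_demo())
```

The same gap between displayed and logical order underlies the commenting-out and stretched-string techniques. Only the position of the hidden control characters changes.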

The researchers have validated these techniques by implementing proof-of-concept attacks in the C, C++, C#, JavaScript, Java, Rust, Go and Python programming languages, and they say they successfully verified them on GNU's gcc v7.5.0 on Ubuntu and Apple clang v12.0.5 on macOS.

The Other Variant

The Cambridge research duo also found another similar attack that uses homoglyphs, or characters that appear near identical. This is tracked as CVE-2021-42694.

Homoglyph function attack in C++ (Source: Trojan Source research paper)

In the above example, researchers show how a homoglyph attack can be carried out in C++. They used two H's that look similar but actually are different - the Latin H, in blue, and the Cyrillic Н, in red. This program outputs the text Goodbye, World! when compiled using clang++, the researchers say.

This example is not malicious in nature, but the researchers note that "an attacker can define such homoglyph functions in an upstream package imported into the global namespace of the target, which they then call from the victim code."
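A brief Python sketch shows why two such names are distinct to a language implementation even though they look alike on screen. The identifiers sayHello and its Cyrillic look-alike are assumptions made for illustration, and the helper non_ascii_chars is hypothetical. The look-alike is spelled with an escape so the difference is visible in the listing.

```python
import unicodedata

latin_name = "sayHello"          # all Latin letters
cyrillic_name = "say\u041dello"  # U+041D replaces the Latin H


def non_ascii_chars(identifier: str) -> list:
    """List (code point, Unicode name) for every non-ASCII character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in identifier
        if ord(ch) > 127
    ]


# Both strings are valid identifiers, but they are not the same identifier.
print(latin_name.isidentifier(), cyrillic_name.isidentifier())
print(latin_name == cyrillic_name)

# NFKC normalization, which Python applies to identifiers, does not
# fold the Cyrillic letter into the Latin one.
print(unicodedata.normalize("NFKC", cyrillic_name) == latin_name)

print(non_ascii_chars(cyrillic_name))
```

Listing the non-ASCII code points in an identifier, as the last line does, is one simple way for a reviewer to spot a substituted letter.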

Raising the Defense

The simplest defense tactic that the researchers suggest is to ban the use of text directionality control characters both in language specifications and in compilers implementing these languages. To further explain the tactic, they divide it into three parts:

  • Compilers, interpreters, and build pipelines supporting Unicode need to display errors or warnings for unterminated bidirectional control characters in comments or string literals and for identifiers with mixed-script confusable characters.
  • Language specifications should formally disallow unterminated bidirectional control characters in comments and string literals.
  • Code editors and repository front ends should make bidirectional control characters and mixed-script confusable characters perceptible with visual symbols or warnings.
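The checks recommended above could be approximated in a build pipeline with a small scanner. The Python sketch below is a rough illustration under stated assumptions, not a tool from the researchers: the function names are hypothetical, bidirectional controls are balanced per line and not per comment or string literal, and the first word of each character's Unicode name stands in for its script, which is cruder than a real confusables check.

```python
import re
import unicodedata

OPEN_EMBED = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
CLOSE_EMBED = "\u202c"                                  # PDF
OPEN_ISOLATE = {"\u2066", "\u2067", "\u2068"}          # LRI, RLI, FSI
CLOSE_ISOLATE = "\u2069"                                # PDI

IDENTIFIER = re.compile(r"[^\W\d]\w*")  # Unicode-aware identifier pattern


def has_unterminated_bidi(line: str) -> bool:
    """True if the line opens an embedding or isolate it never closes."""
    embeds = isolates = 0
    for ch in line:
        if ch in OPEN_EMBED:
            embeds += 1
        elif ch == CLOSE_EMBED and embeds:
            embeds -= 1
        elif ch in OPEN_ISOLATE:
            isolates += 1
        elif ch == CLOSE_ISOLATE and isolates:
            isolates -= 1
    return embeds > 0 or isolates > 0


def scripts_in(identifier: str) -> set:
    """Approximate the scripts used, e.g. {'LATIN', 'CYRILLIC'}."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in identifier
        if ch.isalpha()
    }


def scan(source: str) -> list:
    """Return (line number, message) pairs for suspicious lines."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if has_unterminated_bidi(line):
            findings.append((lineno, "unterminated bidi control"))
        for ident in IDENTIFIER.findall(line):
            if len(scripts_in(ident)) > 1:
                findings.append((lineno, f"mixed-script identifier {ident!r}"))
    return findings


sample = "x = 1\nsay\u041dello()\ns = 'a\u202e'\n"
for finding in scan(sample):
    print(finding)
```

A check like this errs on the side of warnings: identifiers that legitimately mix scripts would also be flagged, so a project would likely need an allowlist before failing builds on it.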

The vulnerabilities were originally reported to 19 companies on July 25 and were under a 99-day embargo period that ended on Nov. 1. The researchers noted that 11 of the affected companies had a bug bounty program and said five of those programs rewarded them with an average payment of $2,246.

The researchers also shared their findings with the CERT Coordination Center sponsored by the U.S. Cybersecurity and Infrastructure Security Agency, which gave all affected vendors access to VINCE, a tool providing a shared communication platform across vendors implementing defenses.

Tim Erlin, vice president of strategy at Tripwire, acknowledges that there’s an obvious challenge in patching this vulnerability across the variety of components affected, but he says the bigger issue is finding instances of it being exploited in the wild.

Erlin told Information Security Media Group: "It’s tempting to think that the initial public disclosure is the moment when everyone finds out about the vulnerability, but history suggests that attackers, who have a vested interest in keeping it to themselves, are likely to have known about these types of conditions before public disclosure."

Citing a discussion on GitHub from 2017, in which a user seems to explain a very similar issue, Erlin says, "There’s evidence that this disclosure isn’t actually the first for this type of attack."


About the Author

Mihir Bagwe

Principal Correspondent, Global News Desk, ISMG

Bagwe previously worked at CISO magazine, reporting the latest cybersecurity news and trends and interviewing cybersecurity subject matter experts.



