CVE-2022-42889 - Text4Shell Analysed

As part of our Attack Surface Management capabilities delivered through the watchTowr Platform, we analyse vulnerabilities in technology that is likely to be prevalent across the attack surfaces of our clients. This allows us to rapidly build a PoC and identify vulnerable systems across large attack surfaces.

You may have heard of the recent 'Text4Shell' (or, more formally, 'CVE-2022-42889') bug in Apache Commons Text, and you might want more information about it to assess your position.

This vulnerability, on paper, gave cyber security professionals worldwide vivid flashbacks of responding to Log4Shell in December 2021 - with most of the social media pundits that we all love and follow declaring the end of the world.

However, this story starts a little less violently with an email to an Apache mailing list.

The initial posting to the Apache mailing list detailing the vulnerability is brief but gives us enough information to find more:

Apache Commons Text performs variable interpolation, allowing properties to 
be dynamically evaluated and expanded.

The standard format for interpolation is "${prefix:name}", where "prefix" is
used to locate an instance of org.apache.commons.text.lookup.StringLookup
that performs the interpolation. Starting with version 1.5 and continuing
through 1.9, the set of default Lookup instances included interpolators that
could result in arbitrary code execution or contact with remote servers.

These lookups are: 
 - "script" - execute expressions using the JVM script execution engine 
 - "dns" - resolve dns records 
 - "url" - load values from urls, including from remote servers
 
Applications using the interpolation defaults in the affected versions may be
vulnerable to remote code execution or unintentional contact with remote
servers if untrusted configuration values are used. Users are recommended to
upgrade to Apache Commons Text 1.10.0, which disables the problematic
interpolators by default.

Ok - a lot of words, but what does this actually mean? Let's break it down and explore this bug some more.

String Interpolation

"String Interpolation" is one of those terms that has very little meaning outside of software development circles. Although it is somewhat opaque, it expresses a straightforward concept - that of replacing placeholders in a string with values at run time. For example, Python 3.6 introduced 'f-strings', which permit interpolation. It's much easier to show this with an example than it is to explain it:

>>> mynumber = 4919
>>> print(f"My number is {mynumber}, you see")
My number is 4919, you see
>>>

As you can see, the variable 'mynumber' is substituted into the string before it is passed to the print function. In addition to this, Python's string interpolation provides formatting features - for example:

>>> print(f"My number is {mynumber}, which is {mynumber:x} expressed in hex!")
My number is 4919, which is 1337 expressed in hex!
>>>

Note the :x part of the format expression here. This instructs Python to render the number in hex. Format expressions are pretty versatile, allowing the developer to specify things like precision and padding.

Apache Commons Text - the Java library which contains the vulnerability - ships with several built-in lookups that work in a similar way. For example, perhaps you have some base64-encoded data and you wish to log it. Instead of decoding it yourself and cluttering up the code, you can use the base64Decoder lookup:

package log4text;

import org.apache.commons.text.StringSubstitutor;

class log4text {
    public static void main(String[] args) {
        final StringSubstitutor interpolator = StringSubstitutor.createInterpolator();

        final String logmsg = interpolator.replace(
            "The message is: ${base64Decoder:SGVsbG9Xb3JsZCE=}\n");
        System.out.println(logmsg);
    }
}

This results in the output:

The message is: HelloWorld!
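
To make the "${prefix:name}" dispatch from the advisory a little more concrete, here is a rough sketch of the idea using only the Java standard library. To be clear, this is a toy and not how Apache Commons Text is actually implemented - the class name ToyInterpolator and its methods are our own inventions for illustration. It registers a single lookup under the base64Decoder prefix and leaves any unregistered prefix untouched:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only: NOT the Apache Commons Text implementation.
class ToyInterpolator {
    // Matches ${prefix:name}. No nesting, escaping or defaults are supported.
    private static final Pattern TOKEN =
        Pattern.compile("\\$\\{([A-Za-z0-9]+):([^}]*)\\}");

    // Maps a prefix (e.g. "base64Decoder") to the function that resolves it.
    private final Map<String, Function<String, String>> lookups;

    ToyInterpolator(Map<String, Function<String, String>> lookups) {
        this.lookups = lookups;
    }

    // Hypothetical factory: registers only a harmless base64 decoding lookup.
    static ToyInterpolator withSafeLookups() {
        return new ToyInterpolator(Map.of(
            "base64Decoder",
            s -> new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8)));
    }

    String replace(String input) {
        Matcher m = TOKEN.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Function<String, String> lookup = lookups.get(m.group(1));
            // Unregistered prefixes are copied through unchanged.
            String value = (lookup == null) ? m.group() : lookup.apply(m.group(2));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        ToyInterpolator interpolator = withSafeLookups();
        System.out.println(interpolator.replace(
            "The message is: ${base64Decoder:SGVsbG9Xb3JsZCE=}"));
        System.out.println(interpolator.replace(
            "Not registered: ${script:javascript:3 + 4}"));
    }
}
```

The point to take away is that the prefix alone decides which piece of code runs, so the set of registered prefixes defines what a string is able to do.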

This is useful to a developer, but not very interesting to an attacker, right? Well..

Interpolating Too Much

Now that we know what string interpolation actually is, we can drill down into the meat of the bug itself. Looking at the advisory again, we can zoom in on the following:

These lookups are: 
 - "script" - execute expressions using the JVM script execution engine 
 - "dns" - resolve dns records 
 - "url" - load values from urls, including from remote servers

This would seem to suggest that, just as we used a 'hex' format specifier in our previous example, the commons-text library permits the use of a 'script' specifier to evaluate code at runtime. If we take a look at the changeset which fixes the bug, we can see that the issue was remediated simply by removing a large swath of interpolation functionality - the removed test cases make good examples of how to trigger the bug. Here's one minimised example, showing the dns interpolator:

final String text = interpolator.replace("DNS: ${dns:address|apache.org}\n");

Let's try out a similar thing in our skeleton app.

package log4text;

import org.apache.commons.text.StringSubstitutor;

class log4text {
    public static void main(String[] args) {
        final StringSubstitutor interpolator = StringSubstitutor.createInterpolator();

        final String logmsg = interpolator.replace(
            "The message is: ${dns:address|watchtowr.com}\n");
        System.out.println(logmsg);
    }
}

Our output:

The message is: 99.83.190.102

Neat - the dns:address specifier has performed a DNS lookup and returned the result. If this text is supplied by an attacker, they could perform lookups of arbitrary domain names - bad, but not terrible. What could be worse? Well, the script interpolator. Taking a look at the (now removed) testcase for this interpolator fills me with dread:

Assertions.assertEquals("7",
    StringSubstitutor.createInterpolator().replace(
        "${script:javascript:3 + 4}"));

Yikes! If this string comes from an attacker, they can execute their own JavaScript payloads, simply by including the magical script:javascript interpolator.

For the sake of completeness, there is a third 'lookup' which is referenced by the advisory - the url lookup. Sifting through the patch, we notice the following removed text:

"URL Content (HTTP):    ${url:UTF-8:http://www.apache.org}\n"
"URL Content (HTTPS):   ${url:UTF-8:https://www.apache.org}\n"
"URL Content (File):    ${url:UTF-8:file:///${sys:user.dir}/src/test/resources/document.properties}\n"

This would allow for an entire file to be fetched and inserted into the rendered string - from an arbitrary HTTP or HTTPS endpoint, bringing SSRF concerns, or simply from an accessible local file. If the user has the ability to view the output of the interpolated string, they could simply send a payload such as ${url:UTF-8:file:///etc/passwd}, for example, to obtain sensitive files.

Relation to Log4Shell

Most people in defensive roles remember the catastrophe that was Log4Shell, a similar bug that was disclosed at the end of 2021. Log4Shell also came into existence when untrusted input was examined for format information, devastatingly permitting an interpolated string to specify a Java class which it would then load from a remote source, in order to format the output.

Mitigation

As the advisory notes, updating to the latest Apache Commons Text (1.10.0) fixes the vulnerability.
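
When triaging a dependency inventory, the affected range from the advisory (1.5 through 1.9) can be checked with something like the sketch below. The class and method names are hypothetical, and it assumes version strings of the form "major.minor[.patch]". Note that versions are compared numerically - a naive string comparison would wrongly sort 1.10.0 before 1.9:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for triage; not part of any Apache tooling.
class CommonsTextVersionCheck {
    // Captures the leading "major.minor" of a version string.
    private static final Pattern MAJOR_MINOR = Pattern.compile("^(\\d+)\\.(\\d+)");

    // True if the version falls within 1.5 through 1.9, per the advisory.
    static boolean isAffected(String version) {
        Matcher m = MAJOR_MINOR.matcher(version.trim());
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognised version: " + version);
        }
        int major = Integer.parseInt(m.group(1));
        int minor = Integer.parseInt(m.group(2));
        return major == 1 && minor >= 5 && minor <= 9;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1.4", "1.5", "1.9", "1.10.0"}) {
            System.out.println(v + " affected: " + isAffected(v));
        }
    }
}
```

A version inside the range is only a starting point: as the advisory says, an application is at risk when it uses the interpolation defaults and passes untrusted values to them.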

It does so by disabling dangerous interpolation operators (that is, url, dns, and script) by default. If you are unable to update, preventing the flow of untrusted data to Apache Commons Text is an acceptable (although perhaps difficult-to-achieve) stopgap.

Those who wish to hunt for exploitation attempts could start by looking for traffic containing the interpolation operators - ${script:javascript:, for example. It seems unlikely that there is a legitimate use case for these, although the labyrinth that is modern enterprise software may have found one.
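
As a starting point for that kind of hunting, the sketch below flags log lines containing one of the three lookup prefixes named in the advisory. The class name and the sample request lines are made up for illustration, and two choices are assumptions on our part rather than anything stated in the advisory: matching case-insensitively, and also checking a URL-decoded copy of each line in case the payload arrived percent-encoded:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical log-hunting helper; a starting point, not a complete detection.
class Text4ShellHunter {
    // Matches "${script:", "${dns:" or "${url:" (case-insensitive by assumption).
    private static final Pattern SUSPICIOUS =
        Pattern.compile("\\$\\{(script|dns|url):", Pattern.CASE_INSENSITIVE);

    static boolean looksSuspicious(String line) {
        if (SUSPICIOUS.matcher(line).find()) {
            return true;
        }
        try {
            // Also check the percent-decoded form, e.g. %24%7Bscript%3A...
            String decoded = URLDecoder.decode(line, StandardCharsets.UTF_8);
            return SUSPICIOUS.matcher(decoded).find();
        } catch (IllegalArgumentException e) {
            // Malformed percent-encoding: fall back to the raw result only.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] sampleLines = {
            "GET /search?q=${script:javascript:3 + 4}",
            "GET /search?q=%24%7Bdns%3Aaddress%7Cexample.com%7D",
            "GET /index.html"
        };
        for (String line : sampleLines) {
            System.out.println(looksSuspicious(line) + "  " + line);
        }
    }
}
```

A simple pattern like this will miss nested or otherwise obfuscated payloads, so treat a clean result as inconclusive rather than as proof that nothing was attempted.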

Conclusion

This rapid analysis allowed watchTowr to proactively enumerate vulnerable systems across our client base, and we now test for the vulnerability continuously.

We believe this is an interesting example of a vulnerability that could have a material impact on any organisation - and that requires a rapid reaction to identify vulnerable systems before Internet-wide exploitation begins. Your penetration test 3 months later would be too late.

At watchTowr, we believe continuous security testing is the future, enabling the rapid identification of holistic high-impact vulnerabilities that affect your organisation.

If you'd like to learn more about how the watchTowr Platform, our Attack Surface Management and Continuous Automated Red Teaming solution, can support your organisation, please get in touch.