By — Aliz Hammond — Feb 3, 2023

"I don't need no zero-dayz" - Docker Container Images (1/4)

This post is the first part of a series on our recent adventures into DockerHub. You may be interested in the next instalments, which are:

Here at watchTowr, we love delving into 'forgotten' attack surfaces - resources that are exposed to the public, forgotten about, and give us all the information we need to gain initial access to an organisation.

The more dark corners of an attack surface that we can see, the more we fulfil our ultimate goal - helping our clients understand how they could be compromised today.

Today, I'd like to share some research that we recently undertook in order to assess the scale of one such oft-neglected attack surface - that is exposed by public Docker container images.

Through the watchTowr Platform's analysis of attack surfaces, we frequently discover sensitive data in Docker container images. In order to more formally assess the scale of this problem, and also to raise awareness of the associated veiled attack surface, we decided to embark on a research project searching a representative portion of available DockerHub images for sensitive information. Our expectation at this point was that we would discover access keys for small-scale applications, but we soon found that even large organisations were not without weaknesses in this area.

In this post, the first of a four-part series, we will discuss an overview of our findings at a high level. Subsequent posts will explain our technical methodology in more detail, and offer additional insight into the problems we encountered during the research - as well as observations on the distribution of sensitive data and recommendations to those who wish to avoid their own inadvertent exposure.

What Are Docker Containers?

Docker containers, favoured by application developers and DevOps teams worldwide, are used to package an entire system configuration into a single 'container', making it easy for end-users to start an application without having to configure their own system appropriately. For example, a developer who produces forum software could make such a container available to the wider public, containing not only their forum software, but also a database and web server, plus the appropriate configuration files to enable the application to "just work" out-of-the-box.

These 'containers' are usually stored in a 'registry', which can be configured to allow access only to an organisation's employees, or to the wider public. One such registry, 'DockerHub', is operated by the developers of Docker themselves, and is designed to allow the public to easily share their images, in a similar manner to how GitHub allows developers to easily share their code.

When developing such a 'container', however, there is a risk that sensitive data is inadvertently included in the configuration of a container published publicly. This results in the wider public being able to access such sensitive data, and use it for nefarious means, which can often have a significant impact (discussed below).

One example could be in the hypothetical forum software I mention above. If the database is configured to allow remote connections, and it is also configured with a hardcoded password, then an attacker could extract the password from the container image and use it to access any instance of the software built from that container image.

Everyone Gets A Secret! - Over 2,000,000 Secrets

At this point it is important to note the inherent difficulty in appropriating or validating secret data held by organisations with which we are not involved. For example, we found some data claiming to be a list of passwords for email accounts - but it is impossible for us to verify their validity without simply using them to log in, which would cross ethical and legal boundaries. Also difficult to classify are API tokens, which are strings similar to passwords, but with varying levels of permissions - given an API token, it is impossible for us to determine if it is a token holding powerful privileges, or a low-privilege token designed to be made public.

While we've taken every effort to remove obvious false positives (for example, we saw secret keys such as "NotARealKey", which are obviously bogus), some may remain in the dataset. For purposes of calibration, however, here is an example file which we consider valid:

{
        "cf_email": "<redacted>",",
        "cf_apikey": "<redacted>",
        "cf_org": "<redacted>Prod",
        "cf_space": "<redacted>prodspace2",
        "cf_broker_memory": "512M",

        "devex": {
                "ace_app_space": "opsconsole",
                "ace_app_suffix": "dev",
                "bssr_client_id": "<redacted>",
                "bssr_client_secret": "<redacted>",
                "node_env": "production",
                "session_key": "opsConsole.sid",
                "session_secret": "<redacted>",
                "slack_endpoint": "<TBD>",
                "uaa_callback_url": "<redacted>",
                "uaa_client_id": "<redacted>",
                "uaa_client_secret": "<redacted>"
        },
        "CLOUDANT_USER": "<redacted>",
        "CLOUDANT_PASSWORD": "<redacted>",
        "BLUEMIX_CLOUDANT_DB_NAME": "<redacted>",
        "TERRAFORM_CLOUDANT_DB_NAME": "<redacted>",
        "CLOUDANT_DB_URL": "<redacted>",
        "BLUEMIX_SPACE": "<redacted>",
        "UAA_CLIENT_SECRET": "<redacted>",
        "UAA_CALLBACK": ""<redacted>",",
        "SESSION_CACHE": "Redis-cam-ui",
        "TERRAFORM_PLUGIN_ENDPOINTURL": "<redacted>/api",
        "CAM_TOKEN":"<redacted>",
        "A8_REGISTRY_TOKEN": "<redacted>",
        "A8_CONTROLLER_TOKEN": "<redacted>",
        "REDIS_HOST": "<redacted>",
        "REDIS_PASSWORD": "<redacted>",
        "REDIS_PORT": "<redacted>",
        "SRE_SLACK_WEBHOOK_URI":"https://hooks.slack.com/services/<redacted>
        ",
}

Looking at this file, we can estimate that 'cf' represents CloudFare, the wildly popular CDN. We also note credentials for "Terraform", which is a system used by DevOps engineers for provisioning entire 'fleets' of servers to build entire environments. Furthermore, there are connections to Slack, the popular business chat application, and other services. It is easy to imagine a scenario in which these credentials are leveraged to deploy ransomware across a large environment, resulting in considerable loss of business and revenue, especially given the presence of the string 'prodspace2', suggesting that this file contains secrets for a production environment.

This file is particularly interesting as it belongs to a fortune-10 organisation, who are well-funded and motivated to secure their environments - we hope. Our initial expectations of finding 'low-hanging fruit' are clearly, slightly exceeded here.

Our initial findings located a vast amount of such secrets - around 2.5 million across a small-subset of container images reviewed from DockerHub - roughly 22,000 images representing roughly 0.4% of all available container images.

Let's drill down into this huge number and find the most interesting data.

150,000 Unique Secrets

While a naïve analysis of the dataset suggests over 2.5 million credentials, the watchTowr team quickly identified a large number of 'false positives' and pared this number down to slightly over 1.1 million. Many of these are duplicated, however, and removing these duplicated results reveals a core of 152,000 unique secrets, sourced from around 22,000 vulnerable Docker images.

*From the 2,500,000 initial potential credentials we pare down to 152,000*

We can further explore this dataset, breaking down our results and categorising by the service each secret applies to. This helps us suggest a more realistic assessment of the true risk that exposure entails.

*The types of credentials we discovered*

The majority of the secrets, as you can see, are those for "generic" API endpoints. These typically enable applications to integrate with external services, such as Cloudfare or Terraform. While this category is interesting, remaining credentials have been categorised by the service they are associated with, which makes analysis much more useful. Let's put these "generic" results aside for now, and look more into the second-largest category - that of 'Cloud provider'.

Cloud Platform Access Keys - Cryptomining For Days

Once we set aside the 'generic' keys, we can see a large amount of credentials (around 23% of our total) categorised as being for a 'Cloud Provider'. These services, such as Amazon's ubiquitous 'AWS' and Google's 'GCP', enable developers and devops teams to virtualise their compute resources, often running the entire companies infrastructure from one cloud account.

As mentioned before, these credentials are typically generated by developers and assigned a level of privilege appropriate to their intended use-case, with some keys designed for public consumption and some used for sensitive operations. While it is impossible for us to determine the privileges associated with each key without crossing ethical and legal boundaries, it is difficult to overstate the risk that these keys can potentially represent. Even on an unused account, a malicious actor can use these keys to run cloud services to mine cryptocurrency - an activity popular enough that it has spawned public tools such as the Denonia crypto miner to ease the process. With an active account, however, things are even worse, as a powerful cloud token allows access to virtual servers, S3 buckets, virtual networks, VPN traffic, and other resources.

History shows us of the potential consequences of a cloud provider credential leak. Back in 2021, AWS accounts belonging to Ubiquiti were allegedly leaked by a disgruntled ex-employee, resulting in an extortion attempt (more info here) and a significant drop in stock price, highlighted in the red box below:

*Stock market price change correlated to an AWS compromise (red rectangle)*

This is not an isolated incident - those with longer memories may recall the 2020 story of a Cisco ex-employee who maliciously deleted ~450 virtual servers from Cisco's AWS account, resulting in one million USD of customer refunds and a further 1.4 million USD in employee time (more info here) as Cisco scrambled to recover during a two-week period. Clearly, the dangers of an exposed and compromised cloud provider account are significant and well-known.

🗨️

Over 30,000 unique access keys for Amazon AWS were found

However, this was not reflected in our results. We found slightly over 30,000 unique AWS keys, a truly terrifying number (even though we expect a portion of these to be low-privilege tokens). Indeed, AWS keys alone form around 23% of the credentials we discovered.

Other cloud services were not absent from the data, either, as we also found almost 90 keys for the competing Google Compute Platform, as well as credentials for other services such as Alibaba Cloud. These keys allow a similar-scale breach via other cloud services.

We leave it to the reader's imagination and critical thinking as to the reason for the huge difference in the number of exposed keys per platform.

Payment Processing - Issuing All Of The Refunds

The keys for payment processors are often valued by attackers for obvious reasons - they often represent the ability to transfer funds, issuing refunds or making payments. Any single leaked credential of this type almost certainly spells disaster for the owner - yet we still found over a hundred instances of these tokens in our crawl.

*Zooming in on remaining secrets, we can see payment processor tokens are also exposed en-mass*

These tokens are for a variety of services, but the most well-represented in our dataset is the payment processor 'Stripe', which is particularly popular for websites which wish to provide e-commerce abilities without handling card data directly. Stripe themselves comment on the power of these keys here:

Your secret API key can be used to make any API call on behalf of your account, such as creating a charge or performing a refund.

It seems obvious that the presence of these keys in our dataset is worrisome. What else do we have?

As shown in the graphic above, social media tokens are also prevalent in our haul, as 301 images contained at least one such access token. We estimate these to allow an adversary to post and manipulate an organisation's online presence. Around 10% of these tokens were for Facebook, while LinkedIn and Twitter were also popular. We also noted tokens for mass-email companies such as SendGrid and MailGun.

*Credentials for a variety of social media sites were discovered*

While a breach of an organisation's social media account may seem minor compared to a compromise of payments, it still represents a significant risk. Years of careful public relations can be undone quickly as customers lose faith in an organisation's security posture, or sensitive data may be recovered via the accounts 'Direct Messaging' feature. Often, attackers use compromised social media accounts to spread cryptocurrency scams for a quick payout (for an example, see here):

*Bill Gates isn't really giving away Bitcoin!*

While often overlooked, the use of a compromised social media account to facilitate additional social engineering attacks inside an organisation should not be underestimated.

Related to social media, we found 67 images containing tokens for the popular business-oriented communication application Slack. These tokens may allow an attacker to disrupt communications and even read private messages sent and received by employees via the service. While it is tempting to doubt the severity of such a breach, there are historic examples of attackers leveraging Slack - the 2020 breach of Animal Jam involved compromised AWS credentials exposed via a breach of their Slack environment.

SSH

On a more technical note, we found 559 images (~2.5%) containing a cryptographic 'host' key pair, typically used to secure a 'Secure Shell' (or 'SSH' ) connection for encrypted communication. We were able to determine that a subset of these keys are in use by active, publicly accessible endpoints on the wider internet. More seriously, 189 images contained at least one authentication key - potentially allowing an attacker to authenticate and execute commands on the host. Historic breaches involving such keys include the 2016 breach of DataDog.

Other Secrets

Of all the users of the Internet, few are more concerned with security than users of cryptocurrency. Nevertheless, we discovered six Docker containers which held cryptocurrency wallets, allowing anyone to access funds and perform transfers. We even found a key for an "onion" hidden service, as used by the ToR project to communicate extremely sensitive information securely.

Finally, we noted a small number of Docker images which included credentials to further Docker registries, such as those owned privately by an organisation.

The only discernible barrier to finding more credentials in this dataset seemed to be the amount of time we were willing to invest in searching, which does not bode well for anyone.

The Dataset

Finally, it may help to provide a quick summary of the dataset, to give an idea of the scale of our research.

Over a period of around three weeks, we obtained the newest tag from 22,194 docker images, which we believe is around 0.4% of the total available repositories at the time of publishing. These images contain around 240 million files, of which approximately 32 million (around 3.8 TB) are unique in their contents.

🗨️

Our dataset spans almost 4TB of data in 32 million unique files, taken from over 22,000 docker images

We selected docker images randomly, and analyzed only their newest tag, in order to avoid a dataset skewed by a single container. For simplicity, we analyzed only Linux-based containers on the amd64 platform. We believe our sample to be representative of the broader DockerHub file collection.

We used Amazon EC2 for the project, using mostly t2.xlarge workers (more detail will follow in subsequent posts). Our total spend was under 2000 USD, which speaks to the accessibility of the data we collected - this sum is low enough that an adversary might expect to recoup it by monetising access to the resultant stolen data.

As you can imagine, locating sensitive information in such a large dataset is a technical challenge. We've taken every effort to remove false positives from the results we present here, but some may still remain - this topic will be discussed at length in a subsequent post. With some notable exceptions, it is impossible to verify if a secret is truly sensitive (for example, if a credential is valid) without actually trying to use it, which is not something we are able to do.

Summary

We're quite excited to share the methodology and further insights we gained from this research. To that end, we will be posting a series of weekly blog posts going into more technical detail around specific topics. Expect the following posts:

Next up, "Layer Cake: How Docker Handles Filesystem Access (2/4)" will go into detail about how Docker assembles filesystem images for containers, how these are stored on DockerHub, and how we can best archive them,
After this, "Learning To Crawl (For DockerHub Enthusiasts, Not Toddlers) (3/4)" will lay out a general overview of our archival process, and then dive into the specifics of how it operates,
And finally, "What Does This Key Open? - Detecting Secrets (4/4)" will talk about how we customised an open-source tool to crawl through the dataset and locate interesting information. We will go into detail about our findings, and end the series with some final thoughts on how to secure your environment.

At watchTowr, we believe continuous security testing is the future, enabling the rapid identification of holistic high-impact vulnerabilities that affect your organisation.

If you'd like to learn more about the watchTowr Platform, our Continuous Automated Red Teaming and Attack Surface Management solution, please get in touch.

At watchTowr, we passionately believe that continuous security testing is the future and that rapid reaction to emerging threats single-handedly prevents inevitable breaches.

With the watchTowr Platform, we deliver this capability to our clients every single day - it is our job to understand how emerging threats, vulnerabilities, and TTPs could impact their organizations, with precision.

If you'd like to learn more about the watchTowr Platform, our Attack Surface Management and Continuous Automated Red Teaming solution, please get in touch.