Path-Style Model for AWS S3 Can Lead to Namespace Squatting

Summary

The path-style addressing model for AWS S3, and for other services supporting the S3 APIs, can lead to namespace squatting. An attacker can create a bucket whose name matches a special filename like “robots.txt” or “crossdomain.xml”, and insert their own content into it via the names of keys placed in that bucket. Services that verify domain ownership by checking for a specific file, but are not precise about checking the content of that file, may end up incorrectly verifying ownership of the parent domain. We have not yet been able to confirm this via testing.

AWS will be deprecating this functionality as of September 30th, 2020.

Details

Amazon Web Services (AWS) provides a storage service called Simple Storage Service (S3) which allows users to store data as files located inside separate containers called buckets (see docs). S3 currently supports two different addressing models: path-style and virtual-hosted style. The path-style looks like this:

https://s3.amazonaws.com/bucket/file

The virtual-hosted style looks like this:

https://bucket.s3.amazonaws.com/file

It is possible to name a bucket using a reserved name like “robots.txt”, “sitemap.xml” or “crossdomain.xml” and have it be available via path-style addressing. HOWEVER, the only thing that would get returned is an XML-type directory listing. An attacker can add additional files into that bucket to try to influence the directory listing, but most parsers would disregard the entire response since it is malformed. What may end up happening is that the attacker essentially squats this namespace.
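As a rough sketch of what this squatting would look like with boto3 (the bucket name “robots.txt” is almost certainly taken already, since bucket names are globally unique; this is for illustration only):

import boto3

s3 = boto3.client("s3")

# Squat the reserved filename by claiming it as a bucket name (only works
# if nobody else owns that bucket name yet).
s3.create_bucket(Bucket="robots.txt")

# https://s3.amazonaws.com/robots.txt now returns an XML bucket listing
# rather than a plain-text robots file; keys added to the bucket show up
# in that listing and can be used to try to influence naive parsers.
s3.put_object(Bucket="robots.txt", Key="User-agent: *", Body=b"")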

It is not possible to reserve anything in the “.well-known” directory since it starts with a period and bucket names must start with a lowercase letter or a number. Thus it would not be possible to get an SSL certificate issued this way.

Additionally, if a third-party service like Google Webmaster Tools, Bing, etc. uses a domain validation approach that verifies ownership by placing a file in the root directory, it may be possible to claim the “s3.amazonaws.com” domain as follows (a sketch of these steps appears after the list):

1. Create a bucket matching the verification name of the file.
2. Add the verification content as a key in that bucket.
3. Make the bucket public.
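A minimal sketch of these three steps using boto3 (the file name “verification.html” and the token value are hypothetical, and the bucket name must still be unclaimed for this to work):

import boto3

s3 = boto3.client("s3")

# Hypothetical name of the verification file the third-party service checks for.
bucket = "verification.html"

# 1. Create a bucket matching the verification file name.
s3.create_bucket(Bucket=bucket)

# 2. + 3. Add the verification content as a publicly readable key, so that
#    https://s3.amazonaws.com/verification.html returns a listing containing it.
s3.put_object(Bucket=bucket,
              Key="hypothetical-verification-token",
              Body=b"hypothetical-verification-token",
              ACL="public-read")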

When the verification service hits the URL for “s3.amazonaws.com/verification.html”, it will hit the bucket listing that was created. If the service disregards the surrounding XML and uses the value it finds, it may end up registering the service domain in the attacker’s account.

In our testing we have not yet found such a service: most services will not parse the XML directory listing that S3 produces.

Vendor Response and Mitigation

The vendor provided the following response:

We do not believe the behavior you describe in this report presents a security concern, given what you have outlined is theoretical.

Additionally, AWS has announced that the path-style addressing model will be deprecated as of September 30th, 2020 (see here and here).

Credits

Text written by Y. Shafranovich.

Timeline

2019-02-03: Initial report to the vendor
2019-02-06: Followup communication with the vendor
2019-02-12: Followup communication with the vendor
2019-02-18: Followup communication with the vendor
2019-02-19: Followup communication with the vendor
2019-05-03: Followup communication with the vendor
2019-07-28: Draft blog post sent to the vendor for review
2019-08-14: Public disclosure

Research: The Dangers of Proxying S3 Content

Background

It is common for organizations to use Amazon’s S3 service as a place to host static assets and other content. The content within Amazon S3 is organized in “buckets”. Amazon also provides the ability to point custom domains at S3 buckets through virtual hosting or the static website endpoints. In both cases, a CNAME mapping is created from the custom domain to an Amazon domain name.
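For example, assuming a hypothetical bucket named “assets.example.com” (with virtual hosting the bucket name must match the custom hostname), the mapping would look roughly like this:

assets.example.com CNAME assets.example.com.s3.amazonaws.com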

However, SSL support is not available via the custom domain name; SSL is provided only if either the “s3.amazonaws.com/<bucket-name>” URL or the “<bucket-name>.s3.amazonaws.com” domain name is used directly (as long as there are no periods in the bucket name). The reason SSL doesn’t work when accessing the custom domain names is that Amazon cannot provide certificates for them, since Amazon is not the domain owner and current S3 functionality does not allow custom certificates to be loaded. For “s3.amazonaws.com” domains, however, Amazon provides a wildcard certificate which works just fine. If you try to access the custom domain names directly over HTTPS, you will be served content with that same wildcard certificate, which of course will not match the domain names.

Possible SSL Solution – CloudFront or Another CDN

One possible solution offered by AWS is to use their CDN offering, CloudFront. In that case, you can set up CloudFront distributions that sit in front of the S3 buckets and CNAME your domain names to them. This of course comes at a higher price and with a confusing set of options: you can use the cloudfront.net subdomains, a free SNI-enabled SSL certificate that is not compatible with older browsers, or a costly ($600/month/per domain) option to upload your own SSL certificate. The data would then flow as follows:

[S3] >—-internal AWS network—-> [CloudFront] >—–SSL—-> [users]

Another set of solutions is to use a non-AWS CDN like CloudFlare, etc. and have the CDN proxy the content with SSL. The setup would be similar to Amazon’s, with SNI and non-SNI SSL options available. The data flow would then look like this (for CloudFlare):

[S3] >—-HTTP—-> [CloudFlare] >—–SSL—-> [users]

What You Should Not Do – Proxy S3 Content Yourself

Of course many developers would immediately react to this particular problem in the same way: I can do it better by myself! The usual solution is to have a script or a webserver rule that automatically retrieves the content from S3 and serves it to the user under a path on the main site, say “/static/”.

Everything after “/static/” would then be retrieved from some S3 bucket, let’s say “marketing.example”. So the following path would be followed:

[S3] >—-HTTP—-> [webserver script] >—–SSL—-> [users]
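A minimal sketch of such a script, assuming Flask and the requests library (the bucket name “marketing.example” is the article’s example; the route and function names are illustrative only):

from flask import Flask, Response, abort
import requests

app = Flask(__name__)

# The single, hard-coded bucket the site serves static content from.
BUCKET = "marketing.example"

@app.route("/static/<path:key>")
def static_proxy(key):
    # Fetch the object from S3 via the path-style URL and relay it to the user.
    resp = requests.get(f"https://s3.amazonaws.com/{BUCKET}/{key}")
    if resp.status_code != 200:
        abort(404)
    return Response(resp.content,
                    content_type=resp.headers.get("Content-Type",
                                                  "application/octet-stream"))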

Of course this only lasts as long as there is only one bucket. Let’s say another bucket is now needed, called “support.example”. So the script will be changed to take the bucket name from the URL as well, something like “/static/<bucket-name>/<file>”:
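Continuing the sketch above, the naive change is to pass the bucket name straight from the URL to S3:

@app.route("/static/<bucket>/<path:key>")
def static_proxy_any_bucket(bucket, key):
    # DANGEROUS: the bucket name comes from the request and is never checked
    # against the set of buckets the site actually owns.
    resp = requests.get(f"https://s3.amazonaws.com/{bucket}/{key}")
    if resp.status_code != 200:
        abort(404)
    return Response(resp.content,
                    content_type=resp.headers.get("Content-Type",
                                                  "application/octet-stream"))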

What will often happen at this point is that the developer will not realize that the bucket names need to be validated against a whitelist of valid buckets. Because S3 bucket names are not unique to one AWS user but share a global namespace across all S3 users, this script would be able to retrieve data from any other S3 bucket, as long as it is public. This will not happen when using CloudFront or other CDNs because they will be mapped 1-to-1 against a specific S3 bucket.

What would this look like? If an attacker can figure out that the script takes arbitrary bucket names, they can go ahead and create a new bucket called “evil.example” and then use URLs like the following to retrieve content from it:
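For example (the site and file names here are hypothetical):

https://www.example.com/static/evil.example/payload.html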

What can this be leveraged for? Some examples:

  • Serving malware since the content will be served under the target domain and the target SSL certificates
  • Facilitating phishing attacks
  • XSS, since HTML / JS content served from the same domain as the target will bypass the same origin policy
  • Stealing session cookies since the code will run in the same domain and have access to cookies
  • If the content is retrieved using the S3 APIs, then an attacker could set up a “Requester Pays Bucket” and make money off the target (although Amazon would probably catch this eventually)
  • [insert your exploit here]

Recommendations

  • Don’t re-invent the wheel; use an existing solution like CloudFront or some other CDN
  • If you must proxy content yourself, make sure you have a whitelist of valid buckets (see the sketch below), and use other technologies like subdomains, HTTPOnly cookies, CSP headers, etc. to segregate the S3 content from the rest of the site
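As an illustration of the whitelist check, here is a sketch in the same style as the earlier Flask example (the bucket names are the article’s examples; everything else is hypothetical):

from flask import Flask, Response, abort
import requests

app = Flask(__name__)

# Only buckets the site actually owns may be proxied.
ALLOWED_BUCKETS = {"marketing.example", "support.example"}

@app.route("/static/<bucket>/<path:key>")
def static_proxy(bucket, key):
    if bucket not in ALLOWED_BUCKETS:
        abort(404)  # reject any bucket that is not explicitly whitelisted
    resp = requests.get(f"https://s3.amazonaws.com/{bucket}/{key}")
    if resp.status_code != 200:
        abort(404)
    return Response(resp.content,
                    content_type=resp.headers.get("Content-Type",
                                                  "application/octet-stream"))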