So we all know how caching works, right? Let’s have a quick overview to be sure.
Caching uses a high-speed data storage layer to store a subset of transient web data so that future requests for the same data are served quickly, without the need to access the data’s primary storage location. Cached data usually lives in fast-access hardware like random-access memory (RAM). The primary purpose is to increase the speed of data retrieval by reducing the need to access the slower underlying storage layer.
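To make the idea concrete, here is a minimal Python sketch of a cache sitting in front of a slower store. The class name TTLCache, the fetch callback standing in for primary storage, and the fixed time-to-live are illustrative assumptions. Real caches such as Varnish or a CDN add eviction, size limits, and HTTP-aware freshness rules on top of this basic pattern.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Illustrative in-memory cache in front of a slower data source (assumption).

    Entries expire after `ttl_seconds`; a miss or an expired entry falls
    through to `fetch`, which stands in for the primary storage layer.
    """

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be demonstrated without sleeping
        self._store: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            # Fresh copy available: serve it from fast storage.
            self.hits += 1
            return entry[0]
        # Miss or stale entry: go to the slow primary store and cache the result.
        self.misses += 1
        value = self._fetch(key)
        self._store[key] = (value, now + self._ttl)
        return value


if __name__ == "__main__":
    fake_now = [0.0]
    cache = TTLCache(fetch=lambda k: f"value-for-{k}", ttl_seconds=60,
                     clock=lambda: fake_now[0])
    cache.get("/index.html")  # miss: fetched from primary storage
    cache.get("/index.html")  # hit: served from the cache
    fake_now[0] = 61.0
    cache.get("/index.html")  # expired: fetched again
    print(cache.hits, cache.misses)  # 1 2
```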
Many businesses and organizations host their own cache using software like Varnish, while others opt for content delivery networks (CDNs) like Cloudflare that distribute cached content across many geographic locations. Then there are content management systems like Drupal with built-in caching. Lots of permutations, same purpose: boosting performance on both the server and client sides.
Ok, so now we know how caching works, but what is cache poisoning? Let’s get down and dirty.
Web cache poisoning and DNS cache poisoning both work by getting a harmful response saved to a cache, which then serves it up to other users. With DNS, the result is bogus internet addresses (DNS corruption); with web caches, it can go as far as complete web-cache misappropriation and application takeover.
Within the web-caching mechanism, HTTP performs integrity checks only on the server side; the cache does not independently validate or authenticate the responses it stores. This lack of validation and authentication gives attackers an opportunity to poison the cache.
When a cache receives a request for a resource, it must decide whether it already has a saved copy of this exact resource to reply with or whether it needs to forward the request to the application server to retrieve the resource. Identifying whether two requests are attempting to load the same resource is tricky: matching requests byte-for-byte is ineffective because HTTP requests are chock full of immaterial data, like information about the user’s browser and system. Cache Keys sidestep this problem by defining which components of a visitor’s HTTP request the cache uses to match it against stored resources.
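A minimal sketch of how a cache might reduce a request to its Cache Key, assuming a hypothetical scheme that keys on the method, the Host header, and the path plus query string. Real caches let you configure this, and defaults vary by product; the function name cache_key and the KEYED_HEADERS setting are illustrative only.

```python
from typing import Mapping, Tuple

# Hypothetical key scheme (assumption): method + Host + path/query.
# Every header not listed here is "unkeyed" and ignored when matching.
KEYED_HEADERS = ("host",)


def cache_key(method: str, path: str, headers: Mapping[str, str]) -> Tuple[str, ...]:
    """Reduce an HTTP request to the handful of components that form its key.

    User-Agent, Accept-Language, cookies and so on are ignored, so two
    requests that differ only in those headers map to the same key.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    keyed = tuple(lowered.get(h, "").lower() for h in KEYED_HEADERS)
    return (method.upper(),) + keyed + (path,)


if __name__ == "__main__":
    firefox = {"Host": "example.com", "User-Agent": "Mozilla/5.0 Firefox/126.0"}
    chrome = {"Host": "Example.com", "User-Agent": "Mozilla/5.0 Chrome/125.0",
              "Accept-Language": "en-GB"}
    # Same key despite different browsers: these headers are not keyed.
    print(cache_key("GET", "/?page=1", firefox) == cache_key("GET", "/?page=1", chrome))  # True
    # Different query string, different key.
    print(cache_key("GET", "/?page=1", firefox) == cache_key("GET", "/?page=2", firefox))  # False
```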
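Composed of a small number of specific HTTP request components, Cache Keys are treated as fully identifying the resource the user is requesting. The upshot is that a cache treats two separate requests with the same key as equivalent, responding to the second with data cached from the first. The vulnerabilities here are pretty obvious: any request component left out of the key (an unkeyed input) is ignored when the cache matches requests, so if that component still influences the response, an attacker can use it to shape a response that the cache then serves to everyone whose request shares the same key.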
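The following toy simulation, which makes no network calls, shows why unkeyed inputs matter. It assumes a hypothetical application that builds a script URL from the X-Forwarded-Host header (chosen purely as an example of an unkeyed header) and a shared cache keyed only on method, Host, and path. The names origin and SharedCache are invented for illustration.

```python
from typing import Dict, Mapping, Tuple

Key = Tuple[str, str, str]


def cache_key(method: str, path: str, headers: Mapping[str, str]) -> Key:
    # Keyed inputs: method, Host and path. Every other header is unkeyed.
    # (Header lookups are case-sensitive here for brevity; real HTTP ones are not.)
    return (method.upper(), headers.get("Host", "").lower(), path)


def origin(method: str, path: str, headers: Mapping[str, str]) -> str:
    # Hypothetical vulnerable application: it trusts an unkeyed header
    # (X-Forwarded-Host, an example assumption) when building a script URL.
    host = headers.get("X-Forwarded-Host") or headers.get("Host", "")
    return f'<script src="https://{host}/static/app.js"></script>'


class SharedCache:
    """Toy shared cache that stores whatever the origin returns, per key."""

    def __init__(self) -> None:
        self._store: Dict[Key, str] = {}

    def handle(self, method: str, path: str, headers: Mapping[str, str]) -> str:
        key = cache_key(method, path, headers)
        if key not in self._store:
            # Miss: forward to the origin and cache its response for everyone.
            self._store[key] = origin(method, path, headers)
        return self._store[key]


if __name__ == "__main__":
    cache = SharedCache()
    # The attacker's request arrives first and carries the unkeyed header.
    cache.handle("GET", "/", {"Host": "example.com", "X-Forwarded-Host": "attacker.example"})
    # An ordinary visitor's request has the same key, so it gets the poisoned copy.
    print(cache.handle("GET", "/", {"Host": "example.com"}))
```

Because the visitor never sends X-Forwarded-Host, nothing about their request is malicious; the harm comes entirely from the cache reusing a response that an unkeyed input shaped.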
In theory, sites can use the Vary response header to name additional request headers that should be part of the key. In practice, Vary is used sparingly: some CDNs ignore it or honor it only partially, and most people don't even know whether their application supports header-based input.
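As a sketch of what honoring Vary could look like, the function below (a hypothetical helper, not any particular cache's implementation) folds the request headers named in a response's Vary header into the key. It treats Vary: * as "do not reuse", which matches the header's meaning in HTTP, but otherwise skips details such as normalizing header values.

```python
from typing import Mapping, Optional, Tuple


def vary_aware_key(method: str, host: str, path: str,
                   request_headers: Mapping[str, str],
                   vary: str) -> Optional[Tuple[str, ...]]:
    """Hypothetical helper: extend a basic cache key with the request
    headers named in the response's Vary header.

    Returns None for "Vary: *", meaning the response should not be reused.
    """
    names = {n.strip().lower() for n in vary.split(",") if n.strip()}
    if "*" in names:
        return None
    # HTTP header names are case-insensitive, so compare them in lower case.
    lowered = {k.lower(): v for k, v in request_headers.items()}
    # Sort so that "A, B" and "B, A" produce the same key.
    extra = tuple(f"{n}={lowered.get(n, '')}" for n in sorted(names))
    return (method.upper(), host.lower(), path) + extra


if __name__ == "__main__":
    vary = "Accept-Encoding, X-Forwarded-Host"
    attacker = {"X-Forwarded-Host": "attacker.example", "Accept-Encoding": "gzip"}
    visitor = {"Accept-Encoding": "gzip"}
    # With the header keyed, the attacker's request gets its own cache entry.
    print(vary_aware_key("GET", "example.com", "/", attacker, vary)
          != vary_aware_key("GET", "example.com", "/", visitor, vary))  # True
```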
Cache poisoning isn't an end in itself, but rather a conduit for the exploitation of secondary vulnerabilities like XSS (cross-site scripting). By exploiting secondary vulnerabilities, attackers can progress from simple single-request attacks to more complex exploit chains that hijack JavaScript, skip across cache layers, subvert social media, and corrupt cloud services. The toxic tailoring of data responses through cache poisoning attacks essentially turns your cache into a highly effective exploit delivery system.
To actually poison the web cache and deliver an exploit to all subsequent visitors, an attacker must make sure theirs is the first request for the target resource (the homepage, say) after the cached response expires. A crude way to achieve this is to use a tool like Burp Intruder or a custom script to send large numbers of requests to the site. The more discerning attacker, however, may reverse engineer the cache expiry system, predicting expiry times by monitoring the site’s responses over a prolonged period. This is a bit of a commitment, even for the most determined hacker.
Unfortunately, many websites don’t require this level of commitment before they give up their data. Response headers often reveal the age of the cached response (the Age header) along with its lifetime or expiry time (Cache-Control: max-age, or Expires), meaning attackers know exactly when to send their payload to guarantee that their response is the one that gets cached.
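To show how much these headers give away, here is a simplified Python estimate of the time remaining before a cached response expires, computed as the advertised lifetime minus the Age header. It assumes the lifetime comes from Cache-Control (s-maxage takes precedence over max-age for shared caches) and ignores Expires, heuristic freshness, and clock skew; the function name is hypothetical. Defenders can run the same arithmetic against their own responses to see what they reveal.

```python
import re
from typing import Mapping, Optional

# Matches max-age / s-maxage directives, optionally quoted (simplified parser).
_DIRECTIVE = re.compile(r"(?:^|,)\s*(s-maxage|max-age)\s*=\s*\"?(\d+)\"?", re.IGNORECASE)


def seconds_until_expiry(headers: Mapping[str, str]) -> Optional[int]:
    """Hypothetical helper: estimate how long a cached response has left.

    Simplified assumption: uses s-maxage if present (it overrides max-age
    for shared caches), else max-age, minus the Age header. Ignores Expires,
    heuristic freshness and clock skew. Returns None if no lifetime is given.
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    directives = {name.lower(): int(value)
                  for name, value in _DIRECTIVE.findall(lowered.get("cache-control", ""))}
    lifetime = directives.get("s-maxage", directives.get("max-age"))
    if lifetime is None:
        return None
    age = int(lowered.get("age", "0"))
    return max(lifetime - age, 0)


if __name__ == "__main__":
    headers = {"Cache-Control": "public, max-age=300", "Age": "240"}
    print(seconds_until_expiry(headers))  # 60
```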
In an ironic twist of fate, the much-beloved Mr. Robot TV show fell victim to a cache poisoning attack a while back. Embarrassingly, it took the intervention of white-hat hacker Zemnmez to highlight the vulnerability: a cross-site scripting (XSS) flaw with the potential to expose users’ private Facebook information.
This ethical hacker had difficulty alerting the show’s creators and website developers to the flaw and had to go to some lengths to bring it to their attention. Fortunately for Mr. Robot, plenty of hackers out there, unlike the show’s protagonist, are trying to help shore up weaknesses in web defenses and prevent attacks like cache poisoning. The incident shows that even the (seemingly) savviest among us can fall victim to this type of attack if we let our guard down.
Web cache poisoning through unkeyed input exploitation was once thought too complex to be a real threat; we now know it is a reality. DNS poisoning, meanwhile, is a relatively easy means of accessing a website’s data pipeline.
But what can we do to protect ourselves? There’s no simple fix, and disabling the caching mechanism entirely is not feasible for most of us, but there are options:
Cache poisoning is a rapidly evolving cash cow for hackers. While these preventative measures form a strong line of defense, it is vital that you keep your ear to the ground about cache poisoning. Be vigilant, be suspicious, and take the time to create robust fortifications to prevent hackers from weaponizing your cache.
It’s time for the red pill, Mr. Anderson. If you know it’s toxic, don’t be slippin’ under!