A new kind of protest: “Occupy Web Site”

My good friend Rafal Los (Wh1t3 Rabbit) and I have recently spent a good amount of time at HP Discover chatting about IT security, performance, and new stories from the real world of IT. Of course I had to share some new things I have learned from a few customer engagements – in particular, a new twist on Denial of Service attacks that you might not normally associate with a performance vulnerability. Here is the blog post we created together:
Many of the vulnerabilities we hear about in technology systems are exploited for financial gain, as a competitive tactic, or simply for the challenge of doing it. One especially common class of attack, the denial of service (DoS) attack and its more sinister variant, the distributed denial of service (DDoS) attack, can cripple the infrastructure of your website. In these attacks, your website is bombarded with SYN floods or other low-level network activity that overloads the physical infrastructure of the system. Recent variants of this attack targeted faulty web server handling of requests, enabling even tiny, slow, carefully crafted packets to completely stop a website's function. But there’s another type of DoS attack, one that exploits a vulnerability at higher levels of the architecture. The vulnerability is the persistent cart object of your e-commerce web application.
Recently during a testing engagement, a test engineer performed root-cause analysis on a strange anomaly in the website. The engineer was observing an occasionally blocking, long garbage collection in the app layer that caused an extended stop-the-world state, in essence “gracefully pausing” all activity on the system. The investigation showed that a few persistent carts each held more than 100,000 items. Here’s the logical root cause of the performance issue:
  1. an individual persistent cart grows to 100,000 items (objects) in the cart object
  2. when an end-user opens the cart, the app server must re-populate the cart with those objects, loading them into the heap
  3. when the session ends after the client is closed (aborted), those objects age on the server until they must be marked and cleaned up by GC (garbage collection)
  4. when that GC time comes and you have enabled many threads for parallel GC (very common), you hit an STW (stop-the-world) condition on the server
  5. no processing is allowed, everything is “paused” and end-user response times come to a standstill
No real human being added 100,000 items to an online shopping cart, or at least not with any reasonable purpose. Of course, if we consider the Occupy movements across the globe, demonstrating and protesting against income inequality and inequitable policies around commerce and taxation, this persistent cart vulnerability could become a seemingly benign form of occupation that develops into a serious threat: Occupy Wall Street could become Occupy Web Site.
Most likely this anomaly was the result of some mechanical automation (and how hard is it for most of us to 'mechanically automate' 100,000 connections to a web server?), but it still showed us what could happen if someone (or several people) loaded up carts on a website but never checked out and never cleared them. It would be a slow trickle of traffic over a longer period of time, the kind that your IDS or firewall protection would never flag or block.
Imagine you work for an online retailer that has a vested interest in doing well over the holiday season, which is upon us. Imagine also that you're like many online retailers and make a large chunk of your yearly revenue over the brief, but extremely busy, holiday gift-buying rush. You and I can probably name at least 2 or 3 of these e-tailers right now... If you work for one of these e-tailers, you know there is a "holiday freeze" that runs from the week before US Thanksgiving until right after the New Year's holiday, during which no code changes are allowed. OK, so combine those two things with the fact that many companies rush out code for holiday promotions, site upgrades, etc., right before the holiday rush and often forget (or neglect?) to test performance to the degree they should. So how is this a problem?
Effectively, if one of your competitors wasn't as ethical as you are, they could pay a couple of (unethical, black-hat) hackers a bit of money to make sure you don't have a good shopping season. Those hackers would then script up an attack, presumably using some of the hundreds of millions of compromised computers world-wide, to make sure your site and e-commerce system were completely offline during the times when your company needs to be making money and selling its wares. This attack would cost little, but have a potentially devastating effect on your organization's yearly revenue... believe me, this is not FUD. From a Washington Post article: "...November results offer an important benchmark for retailers and economists. During the holiday shopping season, merchants can make up to 40 percent of their annual revenue." (Washington Post, 12/1/11)

How to avoid or protect against this “Occupy Web Site” condition:
  1. limit the number of items that can be added to a cart
  2. sweep through persistent cart objects in the database and clean out any of them with over 100 items
  3. configure parallel GC appropriately, leaving enough threads open for normal processing to avoid STW
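To make the first two suggestions concrete, here is a minimal sketch in Python. It is illustrative only and not taken from the engagement described above: the `Cart` class, the `CartFullError` exception, the `sweep_oversized` helper, and the in-memory dictionary standing in for the cart database are all hypothetical, and the cap of 100 simply reuses the sweep threshold from the list. Whether an oversized cart should be emptied, deleted, or only flagged for review is a business decision; the sketch assumes it is emptied.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed cap: reuses the 100-item sweep threshold suggested above.
MAX_CART_ITEMS = 100


class CartFullError(Exception):
    """Raised when adding an item would push a cart past MAX_CART_ITEMS."""


@dataclass
class Cart:
    """Hypothetical stand-in for a persistent cart record."""
    cart_id: str
    items: List[str] = field(default_factory=list)

    def add(self, sku: str) -> None:
        # Suggestion 1: refuse the add once the cart is at the cap.
        if len(self.items) >= MAX_CART_ITEMS:
            raise CartFullError(
                f"cart {self.cart_id} already holds {MAX_CART_ITEMS} items"
            )
        self.items.append(sku)


def sweep_oversized(carts: Dict[str, Cart], limit: int = MAX_CART_ITEMS) -> List[str]:
    """Suggestion 2: empty every stored cart holding more than `limit` items.

    Returns the ids of the carts that were cleaned, so they can be logged
    or reviewed afterwards.
    """
    cleaned = []
    for cart_id, cart in carts.items():
        if len(cart.items) > limit:
            cart.items.clear()
            cleaned.append(cart_id)
    return cleaned
```

The cap stops new carts from growing without bound, while the sweep deals with carts that grew large before the cap existed. In a real system the sweep would more likely run as a scheduled job against the cart tables than over objects in memory.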
The original post on Raf’s website is here: http://h30499.www3.hp.com/t5/Following-the-White-Rabbit-A/OWWWS-The-Other-Form-of-Occupy-Occupy-World-Wide-Web-Site/ba-p/5408553
Note: references to the Occupy movement are made with full awareness that the majority of network and system administrators working in the trenches of a datacenter in the IT department are members of the 99%.