One of the things I love about working at Verisign is the amazing amount of technology that we have developed over the years. This technology wasn't created in a vacuum by some folks sitting around saying, "wouldn't it be cool to build this widget?" It was developed to solve real, internal problems.
The most famous of our internally developed platforms is the Advanced Transaction Look-up and Signaling Platform (ATLAS) resolution system that powers all of our Domain Name System (DNS) services (including .com, .net, Managed DNS, etc.). I will go into detail on the ATLAS platform in a future blog post, but in this post I'd like to focus on one of my favorite internally developed technologies: Athena, the custom distributed denial of service (DDoS) mitigation platform that powers the Verisign DDoS Protection Service.
As DDoS attacks have increased in size and complexity, the task of protecting our Internet-scale services has pushed the limits of commercial off-the-shelf (COTS) DDoS detection and mitigation gear. The latest attacks aren't "simple dumb attacks" - when I talk to analysts, customers, prospects, etc., I talk about how we have gone from an "OR" DDoS world to an "AND" one.
Back in the day, DDoS attacks were either really big OR really complex (see a previous post where I break down what a complex DDoS attack is). What we have seen lately are attacks that are really big AND really complex. It takes more than just bandwidth and some commercial mitigation gear to block them, and the scale at which we operate exposed the limitations of relying on someone else's roadmap for our DDoS mitigation needs.
Because vendors need to satisfy a broad range of mitigation needs for enterprise customers, features that are critical to operating cloud services and protecting multiple customers simultaneously, as Verisign does, can often get pushed to the back of the priority queue. To avoid this, we built our own mitigation platform, completely developed and controlled in house by Verisign engineers, to enable us to be more agile in responding to new threats. Our DDoS mitigation experts sit right next to the engineers who build and develop the system, providing instant feedback on the types of attacks we are seeing and what problems we may have. This allows the engineers to quickly develop new features to target those requirements.
So What is Athena Exactly?
Athena is made up of three core components: the shield, proxy and load balancer. Let's break down what each component is, and how it works.
Many attacks we see are rather straightforward layer three and layer four attacks - i.e., simpler attacks that target network services at the User Datagram Protocol (UDP) and Transmission Control Protocol (TCP) levels. You might have heard some of the names like "SYN flood," "Smurf," "ICMP flood," etc. These attack types are generally just trying to starve the network-layer resources that sit below an application, like bandwidth or the ability of routers to route packets. Most of these attacks can be handled very simply at the network layer with mitigation techniques like applying IP reputation lists that have millions of IP addresses, packet inspection to determine legitimacy, blacklisting, whitelisting, etc.
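To make the reputation-list idea a bit more concrete, here is a minimal Python sketch of an allowlist/blocklist check on a packet's source address. This is purely illustrative and is not how Athena is implemented: the class name, the verdict strings and the "allowlist wins" ordering are all assumptions made for the example, and the addresses come from documentation-only ranges.

```python
import ipaddress


class ReputationFilter:
    """Toy layer-three filter (hypothetical): allowlist wins, then blocklist."""

    def __init__(self, allow, block):
        # Store each entry as a network so single hosts (/32) and whole
        # ranges are handled the same way.
        self.allow = [ipaddress.ip_network(n) for n in allow]
        self.block = [ipaddress.ip_network(n) for n in block]

    def verdict(self, src_ip):
        addr = ipaddress.ip_address(src_ip)
        if any(addr in net for net in self.allow):
            return "forward"   # known-good source, pass it straight through
        if any(addr in net for net in self.block):
            return "drop"      # known-bad source, discard the packet
        return "inspect"       # unknown source, hand off for deeper checks


flt = ReputationFilter(
    allow=["198.51.100.0/24"],
    block=["203.0.113.0/24", "198.51.100.66/32"],
)
print(flt.verdict("203.0.113.9"))    # blocklisted range
print(flt.verdict("198.51.100.66"))  # on both lists; the allowlist is checked first
print(flt.verdict("192.0.2.1"))      # on neither list
```

The linear scan keeps the sketch short. With lists holding millions of addresses, you would want a structure built for fast prefix lookups (a radix trie, for example) rather than a Python list.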
The Athena Shield is a wicked fast processing system that does our layer three and layer four filtering using the techniques described above (as well as many others). Athena Shield gets its speed from the massive amount of performance tuning we have done internally to do line-rate speed inspection and filtering of packets - we are now actually only limited by the speed of the various network cards on the market. Athena Shield can also inspect and filter on higher-level protocols across packet boundaries, dropping junk packets before they come anywhere near the back-end systems. Packets that pass the initial sniff test can either be forwarded directly or subjected to additional validation by the Athena Proxy.
Connection-oriented protocols like HTTP(S) can be difficult to defend because getting a full picture of the transaction requires interaction with the protected server. Athena Proxy stands in for that server in the initial stages of a transaction, allowing us to inspect and filter HTTP and HTTPS-level content. The request string, method, query parameters, headers and body content are all trivial to parse and inspect (if you have an Athena handy).
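As a rough illustration of what that kind of inspection can look like, here is a small Python sketch that splits a raw HTTP/1.x request into its parts and flags one simple anomaly: a declared Content-Length that doesn't match the bytes actually present. The function names are made up for this example, and it assumes the whole request has already been buffered and has a well-formed request line. It ignores chunked transfer encoding entirely, so treat it as a sketch and not as a description of Athena's internals.

```python
def parse_request(raw: bytes):
    """Split a buffered HTTP/1.x request into method, target, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Assumes a well-formed request line: METHOD SP TARGET SP VERSION
    method, target, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, headers, body


def content_length_mismatch(raw: bytes) -> bool:
    """True if the declared Content-Length disagrees with the body length."""
    _method, _target, headers, body = parse_request(raw)
    declared = headers.get("content-length")
    if declared is None:
        return False  # nothing declared, so nothing to compare
    if not (declared.isascii() and declared.isdigit()):
        return True   # a non-numeric length is itself an anomaly
    return int(declared) != len(body)


good = (b"POST /login HTTP/1.1\r\nHost: example.com\r\n"
        b"Content-Length: 9\r\n\r\nuser=test")
bad = (b"POST /login HTTP/1.1\r\nHost: example.com\r\n"
       b"Content-Length: 5000\r\n\r\nuser=test")

print(content_length_mismatch(good))  # declared length matches the 9-byte body
print(content_length_mismatch(bad))   # declared 5000 bytes, only 9 present
```

A check like this only makes sense once the proxy holds the complete request, which is part of why standing in for the protected server is useful.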
We can often spot anomalies in the header values (e.g., the Content-Length doesn't match the actual length of the request body) and create the mitigation rules needed to block the bad traffic. We also have some cool features in the proxy that let us do challenge/response and other types of client verification. After the proxy inspects incoming requests and drops the bad ones, the legitimate requests go on to the protected servers to be handled by their applications, but before that Athena has one more trick up its sleeve:
Athena Load Balancer
"You built your own load balancer? Are you crazy?" I hear that often from various folks externally when I talk about the third part of the Athena platform. When you handle an average of 70 billion DNS queries a day across the globe, and deal with the size and complexity of DDoS attacks that we see, it's not just DDoS mitigation gear that needs to be augmented. That was the case with the load balancers we deployed internally. The COTS gear could handle our needs under steady state or normal traffic times, but when a massive or really complex attack hit either our DNS or our DDoS platforms, the load-balancers were at risk of melting down under the load. As a result, we finally decided to build our own load balancing system that better fit with our stringent requirements, and Athena gave us the perfect platform to start with.
We have Athena Load Balancers deployed across our entire constellation protecting all of our Verisign services. The beauty of the Athena Load Balancer is that it does line-rate attack filtering right at the load balancer before requests ever touch any of the transaction services. This allows the Athena Proxy, Shield, ATLAS DNS platform or any of our applications to focus on really complex application-level attacks that are specific to that platform (e.g., DNS dictionary attacks can be easily handled by our ATLAS platform, so we let that platform handle the complex DNS attacks while allowing the Athena Load Balancer to filter out junk packets, SYN floods, non-DNS protocol packets, etc.), greatly increasing our resolution capacity.
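For a feel of what "filtering out non-DNS protocol packets" might involve, here is a small Python sketch that checks whether a UDP payload even looks like a DNS query, based on the fixed 12-byte header from the DNS specification. The function name and the exact acceptance rules (standard query opcode, exactly one question, no answers) are assumptions chosen for the example. It is a heuristic sketch, not Athena's actual logic.

```python
import struct

# DNS header: id, flags, qdcount, ancount, nscount, arcount (16 bits each)
DNS_HEADER = struct.Struct("!HHHHHH")

# Smallest possible question: root name (1 byte) + QTYPE (2) + QCLASS (2)
MIN_QUESTION = 5


def looks_like_dns_query(payload: bytes) -> bool:
    """Cheap sanity check (hypothetical): does this payload resemble a query?"""
    if len(payload) < DNS_HEADER.size + MIN_QUESTION:
        return False
    _id, flags, qdcount, ancount, _ns, _ar = DNS_HEADER.unpack_from(payload)
    is_response = bool(flags & 0x8000)   # QR bit: 0 = query, 1 = response
    opcode = (flags >> 11) & 0xF         # 0 = standard query
    return not is_response and opcode == 0 and qdcount == 1 and ancount == 0


# A query for example.com, type A, class IN, with recursion desired.
question = b"\x07example\x03com\x00" + struct.pack("!HH", 1, 1)
query = DNS_HEADER.pack(0x1234, 0x0100, 1, 0, 0, 0) + question
response = DNS_HEADER.pack(0x1234, 0x8180, 1, 0, 0, 0) + question
junk = b"GET / HTTP/1.1\r\n\r\n"

for name, pkt in [("query", query), ("response", response), ("junk", junk)]:
    print(name, looks_like_dns_query(pkt))
```

The point of a check this cheap is that it can run before any real DNS processing, leaving the resolution platform to spend its cycles on well-formed queries.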
Remember, your platform capacity isn't just reliant on the size of your bandwidth or your ability to filter out layer three and four DDoS attacks; you actually have to ANSWER the DNS question from an end point (recursive) in the midst of all of the chaos. The Athena Load Balancer also handles all of our health check and internal routing protocol communication with our routers. Removing points of failure is critical in designing a highly resilient network platform, so the fewer devices we have in the path from the end user to the content we are serving, the better.
Conclusion and What's Next
Where are we going from an R&D perspective? Well, attacks aren't going away, and they certainly aren't getting smaller or less complex, so we need to make sure we continue to innovate on the mitigation side. In addition, we are constantly tracking the new technologies and specifications that customers are using. The explosion of mobile, the next version of HTTP, and the custom protocols customers are developing themselves are all things our engineers and product folks are keeping a close eye on.
We (and the industry as a whole) are also focusing on how to enable greater detection of different attacks, i.e., how can we better spot an attack in all of the noise to start the mitigation process sooner? I will save that topic for another time and another blog post. Keep an eye out for my next post on the ATLAS platform. It's not just for DNS anymore!