The Fallacies of Distributed Computing Reborn - The Cloud Era
This post was originally written for New Relic in January, 2011
Seventeen years ago, long before Sun Microsystems was putting the “dot in dot-com”, L. Peter Deutsch, then a Fellow at Sun, wrote The Fallacies of Distributed Computing, a classic and often quoted list of assumptions that programmers may falsely believe when building distributed systems.
The fallacies are:
- The network is reliable
- Latency is zero
- Bandwidth is infinite
- The network is secure
- Topology doesn't change
- There is one administrator
- Transport cost is zero
- The network is homogeneous
TL;DR
The Fallacies of Distributed Computing applies directly to web applications, and is especially important when developing applications hosted in the cloud.
What is a Distributed System?
A collection of independent computers that appears to its users as a single coherent system
- Tanenbaum and Steen: Distributed Systems: Principles and Paradigms
You know you have a distributed system when the crash of a computer you’ve never heard of stops you from getting any work done.
- Leslie Lamport : Security Engineering: A Guide to Building Dependable Distributed Systems
A distributed system is an application that executes a collection of protocols to coordinate the actions of multiple processes on a network, such that all components cooperate together to perform a single or small set of related tasks.
- Introduction to Distributed System Design - Google Code University
Web Apps Are Distributed Systems
Web applications are composed of multiple systems working closely together. Proxy servers, cache servers, load balancers, web servers, application servers and databases are all likely to be involved in processing a single web request. Many web applications also leverage other web-APIs. An application may show your friends on Facebook, allow you to post to Twitter, or store some files on Amazon’s S3 storage service.
Web Apps in The Cloud
Ignore the fallacies at your own peril. For a little while, you might pretend that some of the fallacies don’t apply to you when you host applications in datacenters with your own hardware. Hosting applications in the cloud provides no such grace period. Failure will happen. Hosting in the cloud, while being the easiest way to scale a web application, means facing up to these fallacies immediately.
When Netflix moved to the cloud, they were quick to realize that they could not ignore the fallacies:
“I knew to expect higher rates of individual instance failure in AWS, but I hadn’t thought through some of these sorts of implications.”
“AWS networking has more variable latency. We’ve had to be much more structured about “over the wire” interactions, even as we’ve transitioned to a more highly distributed architecture.”
“Co-tenancy can introduce variance in throughput at any level of the stack. You’ve got to either be willing to abandon any specific subtask, or manage your resources within AWS to avoid co-tenancy where you must.”
“We’re designing each distributed system to expect and tolerate failure from other systems on which it depends.”
- John Ciancutti, Vice President of Personalization Technology at Netflix, Inc
The Cloud vs. The Fallacies
The Fallacies of Distributed Computing are even more important to consider when building web applications in the cloud. While it seems obvious that the cloud highlights these fallacies more than ever (as Netflix noted recently), not everyone agrees. Tim Bray is currently an Android Developer Advocate at Google, former Director of Web Technologies at Sun and editor of the W3C XML specification. I was surprised to find a post from 2009, The Web vs. The Fallacies, where Tim suggests that the Fallacies are much less important to systems built on the web. I was even more surprised that he still agrees with that assertion when I asked him. Let’s see if I can change Tim’s mind.
Disputing The Fallacies: The Cloud Era
1. The network is reliable
When you consume web APIs from Facebook, Twitter or Amazon, “the network” is the internet. How reliable, within a given performance window, is the aggregate of all the web services you interact with? Tim Bray suggests “if a GET
gets a network blowup, just do it again”. Sure, we all do that. If a page loads really slowly and appears to hang, we just refresh. Do you do that with each of your services and API calls?
The network is not reliable. When you add up all the points of failure for all the various services and APIs you use, the odds of failure are not just high, they are a given. Instead, build an app that can function at reduced capacity when a given service is offline.
2. Latency is zero
Try as we might, there just isn’t a way to improve the speed of light. Serving customers half-way around the globe affects performance in big ways. Moving services closer to your customers via cloud availability zones and content delivery networks can help tremendously. Ignore latency online and you’re likely throwing away customers from half the globe.
3. Bandwidth is infinite
Not long ago everyone had high-speed internet access at home. (Not everyone, of course. The digital divide is real.) Wireless hotspots filled every home and apartment and the web felt fast. A few years ago this started to change. Starting with the iPhone, web applications were being used not just over high-speed internet connections, but over phone networks. Using web apps on mobile devices and tablets reminds us that bandwidth problems did not disappear with the dial-up modem. Stay mindful of how much data you’re shipping across the wire and all your users will be grateful. For a great read on how The Fallacies apply to mobile development, check out this great series of posts by our friends at Carbon Five on iPhone Distributed Computing Fallacies.
4. The network is secure
Network security can be a full-time job for a prominent web business. Transport layer security is great, except when we leak security tokens over insecure connections. Firesheep brought some necessary attention last year to the prevalence of private security tokens littering insecure connections, ripe for snooping by the guy in the corner of your favorite coffee shop sipping chai. The network is not secure, and the more we move our lives online the higher the risk of exposure. Keep a security mindset when developing your apps and keep your users safe.
5. Topology doesn’t change
One of the biggest benefits of moving applications to the cloud is the ability to change topology at will. Add new app servers. Upgrade (and possibly change the architecture of) the CPU on your database server in the middle of the day. Add reverse proxy servers, cache servers, change CDN providers, migrate availability zones. Topology changes all the time. Depending on a static infrastructure design not only limits your ability to respond to change, it increases the likelihood of site-wide outages.
6. There is one administrator
Of course there isn’t just one administrator. Even with applications hosted in your own private datacenter, your applications are likely interacting with systems outside your administrative control. These systems may have performance, availability or security issues that you have no direct influence over. Staying mindful that these systems are beyond your control can help you ensure they have minimal impact on your services when they are unresponsive.
7. Transport cost is zero
Not only is transport cost not zero, it’s priced like any other commodity and can be purchased per transaction, per gigabyte, per compute hour, etc. Cloud storage costs (and transport of that storage) are a major component of application hosting costs, just ask anyone doing video streaming online how close to zero their transport costs are.
8. The network is homogeneous
This fallacy was added to the original seven by James Gosling, creator of Java, in 1997. The best part of this fallacy is that it might actually be true. Almost. Between REST and JSON APIs, have you ever had to think about the implementation of an external service, beyond satisfying your curiosity? You’re much more likely to see libraries producing invalid messages than you are operating systems playing a role.
Building performant distributed web applications is hard
The Fallacies were coined 17 years ago, but they apply just as well today. Back then folks might have been fooled into actually believing them, but we’re smarter than that, right? Knowing what gotchas await our web applications, most notably those hosted in the cloud, can help us build better, faster and more performant applications. Isn’t that what we all want?