Episode #29 - Introduction to Amazon Web Services (AWS)


About Episode - Duration: 9 minutes, Published: 2014-07-19

In this episode, I wanted to give you an introduction to Amazon Web Services (commonly referred to as AWS). AWS is one of the premier cloud providers, which is drastically changing the way many think about IT. This introductory episode lays the foundation for more advanced AWS episodes to come.

Download: mp4 or webm

Get notified about future content via the mailing list, follow @jweissig_ on Twitter for episode updates, or use the RSS feed.

Links, Code, and Transcript

So, what is “cloud computing” anyways? Well, on the AWS website, they refer to it as “the on-demand delivery of IT resources via the Internet with pay-as-you-go pricing”. So, what does that actually mean? Well, AWS will provide you with remote access to a nearly unlimited supply of storage, compute, and database resources for an hourly price. AWS also provides a diverse set of services, things like memcache, queueing, and mail, just to name a few, on a pay-as-you-go model too. Massive online companies are using AWS to run their businesses, and for good reason. You can literally have your website hosted alongside Netflix, Reddit, or Airbnb, for example. These companies are using the exact same APIs and web interfaces as you.

The AWS website actually does a fantastic job of explaining what “cloud computing” is, so I am not going to go too crazy on that front. But what I will do is review three examples of pain points from my career that would have been pleasures if AWS had been available.

In this first example, I was the sysadmin for a fast-growing startup, where we typically had a mixture of hardware on-site, hardware co-located in a data centre, and dedicated machines where the hardware was looked after by a hosting provider.

The company would send out email campaigns for most holidays, things like Valentine's Day, Mother's Day, and Christmas. The result was a large influx of traffic, typically 30 to 40 times what we would normally see, which would dissipate over the following week. Logistically, we would get the campaign schedule and provision boxes by contacting dedicated hosting providers, companies like ServerBeach or The Planet (now called SoftLayer), and request dedicated machines to augment our existing capacity. Typically, you would pay roughly a couple hundred dollars per box, for the month, and have the machine provisioned within a day or so. The key thing is, we would pay for the month, even though we only needed those machines for maybe a week, while we handled the extra traffic.

So, let's say we have campaigns X and Y, and they have sharp spikes where the traffic picks up and then dwindles over the following week. This dotted line marks the capacity you have online at all times to deal with minor traffic spikes. But to deal with these big email campaigns, we purchased extra capacity on a monthly basis, which would cover the spikes. There was some voodoo involved with estimating how large the spikes would actually be. Sometimes you are right, and sometimes not. It can be costly on either side of the estimate: with too little capacity, it will take days to provision new machines, and the traffic spike will be gone; with too much capacity, you are blowing thousands of dollars on traffic that never arrived.
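To put some rough numbers on that trade-off, here is a back-of-the-envelope sketch. The monthly figure echoes the “couple hundred dollars per box” mentioned above; the hourly rate and box counts are purely illustrative assumptions, not real prices.

```python
# Rough cost comparison: monthly dedicated boxes vs hourly cloud instances
# for a one-week traffic spike. All figures are illustrative assumptions.

DEDICATED_MONTHLY = 200.0  # roughly a couple hundred dollars per box, per month
CLOUD_HOURLY = 0.25        # assumed hourly rate for a comparable instance

def dedicated_cost(boxes):
    # You pay for the full month even if the spike only lasts a week.
    return boxes * DEDICATED_MONTHLY

def cloud_cost(boxes, hours):
    # You pay only for the hours the instances are actually running.
    return boxes * hours * CLOUD_HOURLY

boxes = 10
spike_hours = 7 * 24  # one week of extra capacity

print(dedicated_cost(boxes))           # 2000.0
print(cloud_cost(boxes, spike_hours))  # 420.0
```

The exact numbers do not matter; the point is that paying by the month for a one-week spike bakes waste into every campaign, and the guesswork only makes it worse.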

With AWS this problem is much easier to deal with, as you can provision extra capacity through the AWS console using EC2 and have it online within minutes, or use Auto Scaling groups, so that extra capacity comes online without any user intervention. This ultimately leads to a graph that looks something like this, where you can provision a bunch of capacity before a campaign, then within hours (since AWS charges by the hour) start killing off machines; the same goes for the second spike. This leads to much less waste and a better return on your campaigns. Also, with Auto Scaling groups, you can easily deal with traffic spikes that come out of nowhere.
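As a sketch of what an Auto Scaling group is doing for you conceptually: desired capacity tracks the incoming load, clamped between a configured minimum and maximum. This is a toy model of the idea, not the actual AWS API, and the instance counts and per-instance capacity are made-up assumptions.

```python
# Toy model of an auto scaling policy: desired capacity follows the load,
# clamped between a minimum (your always-on baseline) and a maximum.
# This mimics what an AWS Auto Scaling group does; it is NOT the real API.

MIN_INSTANCES = 2              # baseline capacity (the dotted line)
MAX_INSTANCES = 40             # upper bound so a spike cannot bankrupt you
REQUESTS_PER_INSTANCE = 1000   # assumed capacity of a single instance

def desired_capacity(requests_per_minute):
    # Ceiling division: how many instances the current load needs.
    needed = -(-requests_per_minute // REQUESTS_PER_INSTANCE)
    return max(MIN_INSTANCES, min(MAX_INSTANCES, needed))

# Normal traffic, a 30-40x campaign spike, then the dwindle afterwards:
for load in (1500, 50000, 8000, 1500):
    print(load, desired_capacity(load))
```

Running this walks through a campaign: capacity jumps from the baseline of 2 up to the cap of 40 during the spike, then drifts back down as traffic dwindles, with no human in the loop.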

In the second example, I wanted to talk about content delivery networks. Going back almost 10 years, I was working for a very popular digital imaging company. We had a big product launch coming down the pipe, where Microsoft and CNET’s download.com were going to feature our product. Up until that point, we were doing software distribution through several ftp servers. The decision was made to use the Akamai Content Delivery Network, commonly referred to as a CDN, to improve the download process. A CDN is basically the idea of caching frequently accessed content close to the end user. So if users in North America, Europe, or Asia are all downloading the same content, it will typically be cached closer to them, to reduce the transit times, and ultimately improve user experience and download times. The process to configure the CDN back then was as costly as it was complex, but it ended in a huge success, and we used a large percentage of Akamai's available capacity for those few days. I vividly remember being on a telecon with Akamai engineers working out how we would get access to their proprietary logs to view our download statistics. Back then, this was not a common practice.

Fast forward to today, and there are many CDN providers. Using AWS you can deploy your content to a CDN in minutes with just a couple clicks. In fact, you are actually watching this video served off the AWS CDN, called CloudFront. So, viewers in North America, Europe, or Asia will typically be hitting a cached copy which is hosted geographically close to them. This service is provided on a pay-as-you-go basis, and there is no Ops overhead on my side; I just configure it and forget about it.
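The core CDN idea described above can be sketched in a few lines: route each request to the edge location nearest the viewer, and let each edge pull from the origin once on a cache miss. The region and edge names here are invented for illustration; real CloudFront routing is far more sophisticated.

```python
# Toy illustration of the CDN idea: serve each request from the edge
# location nearest the viewer; on a cache miss, pull from the origin once.
# Region and edge names are made up for illustration.

EDGES = {"north-america": "edge-us", "europe": "edge-eu", "asia": "edge-ap"}

cache = {}          # (edge, path) -> cached content
origin_fetches = 0  # how many times the origin actually got hit

def fetch(region, path):
    global origin_fetches
    edge = EDGES.get(region, "origin-server")
    key = (edge, path)
    if key not in cache:
        origin_fetches += 1              # cache miss: one trip to the origin
        cache[key] = f"content-of:{path}"
    return edge, cache[key]

# Two viewers in Europe, one in Asia, all grabbing the same video:
print(fetch("europe", "/episode-29.mp4"))
print(fetch("europe", "/episode-29.mp4"))  # second request is a cache hit
print(fetch("asia", "/episode-29.mp4"))
print(origin_fetches)                      # origin was only hit twice
```

Each edge caches independently, which is exactly why transit times drop: after the first viewer in a region, everyone nearby gets the local copy.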

In this final example, I wanted to talk about how stressful it can be to purchase expensive hardware, in the hope that it will help alleviate growing pains. At a company I was working for, we were hitting the limits of our database servers, and we needed to move to better hardware quickly. The problem is, how much time will this new hardware buy you, and how much breathing room will it get you? You have limited options when the machines cost tens of thousands of dollars. What I am getting at is, testing an idea is expensive, risky, and total guesswork. It is the last thing you want to be doing with tens of thousands of dollars. So, you spec out boxes, using some voodoo to predict the memory, cpu, and disk requirements, but in your gut, you do not actually know it will solve your problem, or by how much, without testing it first. So, in reality you are totally gambling. This is stressful. If this fails, it's on you. Also, what about the delay between speccing the machine, ordering it, and when it actually arrives? Then you move it down to the co-location facility, rack it up, do the cabling and labelling, install the OS and software, test, migrate, etc. All this takes time. It could be months or more before you have upgraded boxes in place. In this case, IT is definitely slowing the company down!

Back in episode #27, we used an all-SSD box, with 32 CPUs, for several hours, to test some ideas. This machine likely cost many thousands of dollars, but we were able to rent it for about 7 bucks an hour. This is light years ahead of the situation I just told you about, where it was extremely stressful to test ideas. In the previous example, by the time you finally install the new database server, you are almost ready to start planning the next upgrade! With AWS you are free to test your ideas almost instantly, on beefy hardware, for an hourly price. If you use AWS to run your database servers, you can upgrade hardware without contracts, without Ops staff responding to hardware issues, and without racking and stacking, cabling, or labelling. AWS allows you to focus on your applications and abstracts away the hardware and many of the issues that come with supporting it. It also allows you to experiment quickly without massive stress.
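The economics of that rent-vs-buy decision are worth spelling out. The $7/hour figure comes from the episode; the purchase price and session length below are assumptions for illustration only.

```python
# Back-of-the-envelope: renting the episode #27 test box vs buying one.
# The $7/hour rate comes from the episode; the purchase price and test
# duration are illustrative assumptions.

HOURLY_RATE = 7.0         # dollars per hour for the 32-CPU all-SSD box
PURCHASE_PRICE = 15000.0  # assumed cost to buy comparable hardware outright

def rental_cost(hours):
    return hours * HOURLY_RATE

test_session = 4  # a few hours of experimenting
print(rental_cost(test_session))       # 28.0 -- the cost of testing an idea
print(PURCHASE_PRICE / HOURLY_RATE)    # hours of rental before buying breaks even
```

Under these assumptions, you could run a few hours of experiments for the price of lunch, and you would need over two thousand hours of rental before buying the box even started to pay off. That is why testing ideas stops being a gamble.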

Personally, this is why I love AWS as a sysadmin. Many of the hardware and capacity issues I have faced would have been dead simple with AWS. There is no up-front cost, no contracts to sign, no large delays in provisioning new capacity, and no headaches with long procurement cycles or hardware life-cycle issues like getting rid of old hardware or replacing and troubleshooting existing hardware. And what is really nice is that you do not need to support the infrastructure in a computer room: things like internet uplinks, power, cooling, physical security, etc.

If you are interested in trying out AWS, they have a Free Tier, which will allow you to explore the console and see what types of services they offer. This is extremely useful for getting an idea of how things work. In future episodes, I plan to talk about each service in detail, review common architectures, and look at how we can manage system operations inside the cloud through automation software.

I know there was not anything too technical in this episode, but I just wanted to lay the foundation for future episodes, where we will dig into the technical details.