How to look good and save money while setting up scaling policies in AWS

Setting up scaling can be dead simple or heavily complicated. Chances are that if you agree with either of those statements, you are doing something wrong!
Scaling an application effectively has many facets, and it usually starts with the application itself; a DevOps engineer or a developer can’t do this alone. In this post I’m going to assume you already have scaling in place. Instead of covering the basics, I’m going to talk about how (and where) to leverage AWS spot instances in your scaling strategy to cut costs while reducing the associated risk to near zero.

Let’s get some basic facts first.

What are spot instances?

Spot instances are a way for AWS to sell its excess capacity. If not enough people are running on-demand or reserved instances at a given point in time, the hardware sits unused (and unbilled) and goes to waste. AWS (and Google Cloud, for that matter) therefore offers a bid-based pricing strategy: you define a “max amount” that you are willing to pay for an instance, and if the spot price for that instance type goes above that limit, the server goes away. You always pay the published spot rate, irrespective of the max limit that you define. In my experience the average spot rate for any instance over a month is at least 50% cheaper than the on-demand rate (sometimes even more), which makes it significantly cheaper than even full-upfront 3-year reserved instances, without any long-term commitment or upfront investment.
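To make the billing rule concrete, here is a minimal sketch in Python. The hourly prices and the max price are made-up numbers, and `run_spot` is a hypothetical helper, not an AWS API. It only illustrates that you are billed the spot rate rather than your max, and that the instance is lost once the spot rate exceeds the max.

```python
from typing import List, Optional, Tuple


def run_spot(hourly_prices: List[float], max_price: float) -> Tuple[float, Optional[int]]:
    """Simulate one spot instance over a series of hourly spot prices.

    Returns (total cost, index of the hour in which the instance was lost),
    or (total cost, None) if the instance survived the whole series.
    """
    total = 0.0
    for hour, price in enumerate(hourly_prices):
        if price > max_price:
            # Spot price went above our limit: the server goes away.
            return total, hour
        # We pay the spot rate for this hour, not max_price.
        total += price
    return total, None


# Made-up prices in USD per hour; the spike in hour 3 exceeds the limit.
prices = [0.03, 0.04, 0.05, 0.12, 0.04]
cost, lost_at = run_spot(prices, max_price=0.10)
print(f"paid {cost:.2f} USD, lost at hour {lost_at}")
```

Note that the max price of 0.10 never shows up in the bill; it only decides when the instance is taken away.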

The catch with spot instances is the pricing strategy itself: by its very nature, these instances can go away at any point, typically whenever there is a sudden high demand for instances, spot or otherwise. Applications thus need to be stateless (or store state in central infrastructure such as DynamoDB or Redis) in order to handle the volatility of instances. Statelessness is a prerequisite for any good scaling strategy, and I assume you already have it with the regular on-demand scaling that you may be doing. To make sure that in-flight requests do not get severed as instances go down (spot or otherwise), you should use ELB’s connection draining.

What is Connection Draining?

Connection draining is a feature of ELB (the AWS load balancer service). When a server is about to be removed, either because of an auto scale group action or because it has become unhealthy, the ELB stops sending it new requests while letting it keep its existing connections, which in turn “drains” the connections out of the outgoing server. You can set connection draining to as long as 60 minutes; however, do note that you shouldn’t expect more than two minutes for spot instances.
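As a rough illustration, the draining settings for a classic ELB can be expressed as a small attribute structure. The helper below is a hypothetical sketch that only builds and validates that structure; the load balancer name in the comment is a placeholder, and the 120 second timeout is my assumption, chosen to match the two minutes you can expect on spot instances.

```python
def connection_draining_attributes(timeout_seconds: int) -> dict:
    """Build the attribute structure that enables connection draining.

    The timeout must lie between 1 second and 3600 seconds (60 minutes).
    """
    if not 1 <= timeout_seconds <= 3600:
        raise ValueError("timeout must be between 1 and 3600 seconds")
    return {"ConnectionDraining": {"Enabled": True, "Timeout": timeout_seconds}}


# 120 seconds: no point draining longer than a spot instance will stay alive.
attrs = connection_draining_attributes(120)
print(attrs)

# With boto3 installed and credentials configured, the structure could be
# applied to a classic ELB roughly like this ("my-elb" is a placeholder):
#   import boto3
#   boto3.client("elb").modify_load_balancer_attributes(
#       LoadBalancerName="my-elb", LoadBalancerAttributes=attrs)
```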

A Typical Scaling Strategy

A typical scaling strategy is to find the metric of pivot (I just coined this term) for the application, record it at a relatively high resolution to make better decisions (one data point per minute is generally enough), and then select the points at which scale up and scale down should happen. As an example, suppose the application is CPU intensive, i.e. as load increases the RAM usage levels off or the server has enough of it, while the CPU usage increases with the load (ideally linearly). You can then set 70% CPU for X minutes as the threshold for scaling up (via CloudWatch alarms) and something like less than 40% for X minutes as the threshold for scaling down. Typically, a minimum of 2 servers is kept, spread across two availability zones, for the redundancy that a 24×7 uptime guarantee needs. These servers are generally a mixture of on-demand and reserved instances. Generally, the number of servers your application needs at its lowest load of the day is reserved for a year to lower costs, since reserved instances are charged irrespective of usage.
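The threshold rule can be sketched in a few lines of Python. This is a simplified stand-in for what a CloudWatch alarm does, not its actual evaluation logic; `scaling_decision` and the choice of X = 3 minutes are assumptions made for illustration.

```python
from typing import List


def scaling_decision(cpu_per_minute: List[float], window: int,
                     up: float = 70.0, down: float = 40.0) -> int:
    """Decide on a scaling action from one-minute CPU samples.

    Returns +1 (scale up) if every sample in the last `window` minutes is
    above `up`, -1 (scale down) if every sample is below `down`, else 0.
    """
    if len(cpu_per_minute) < window:
        return 0  # not enough data yet, hold
    recent = cpu_per_minute[-window:]
    if all(sample > up for sample in recent):
        return 1
    if all(sample < down for sample in recent):
        return -1
    return 0


# X = 3 minutes (assumed): CPU has stayed above 70% for the last 3 samples.
print(scaling_decision([50, 72, 75, 80], window=3))
```

Requiring the whole window to breach the threshold is what keeps a single noisy minute from adding or removing servers.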

Hybrid Scaling Strategy with Spot, Reserved and on-demand Instances

Coming to the point, the way to save money in the cloud is to have a hybrid scaling strategy: a typical scaling strategy (like the one stated above) in one auto scale group, with reserved instances covering whatever minimum runs during a typical day, combined with another auto scaling group which produces spot instances in a predictable fashion.

There are two ways of doing this. One is described in an AWS blog post here: https://aws.amazon.com/blogs/compute/an-ec2-spot-architecture-for-web-applications

The blog post suggests setting different metric thresholds for the two auto scaling groups, for example (my numbers) 70%/30% for scale up/down on the spot auto scaling group and something like 80%/40% on the on-demand group, so that the maximum number of spot instances stay in existence. If spot prices go above your max price (typically set equal to the on-demand price) and the spot instances are lost, the CPU will climb to the higher limit, causing the on-demand auto scale group to kick in and add instances.
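A small sketch shows how the two sets of thresholds interact. It uses my example numbers from above; `group_actions` is a hypothetical helper, and it looks at a single CPU reading, ignoring the "for X minutes" part for brevity.

```python
from typing import Dict, Tuple

# (scale-up threshold, scale-down threshold) in percent CPU per group.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "spot": (70.0, 30.0),
    "on_demand": (80.0, 40.0),
}


def group_actions(cpu: float) -> Dict[str, str]:
    """Return what each auto scaling group would do at a given CPU level."""
    actions = {}
    for group, (up, down) in THRESHOLDS.items():
        if cpu > up:
            actions[group] = "scale_out"
        elif cpu < down:
            actions[group] = "scale_in"
        else:
            actions[group] = "hold"
    return actions


for cpu in (85, 75, 35, 25):
    print(cpu, group_actions(cpu))
```

At 75% only the spot group grows, and at 35% only the on-demand group shrinks, which is how the setup favours keeping spot instances around. The on-demand group reacts only once CPU passes 80%, i.e. when spot capacity cannot keep up or has been lost.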

I feel that the above strategy works only if the variance of the load is low. In our case the load varies from 1330 requests/min to 6300 requests/min, which makes the stated strategy dangerous. It fails when spot prices go up, because the on-demand instances then grow more slowly than I want, which may cause the application experience to deteriorate due to increased latencies.

What has worked for me is a time-based spot instance schedule that follows the load pattern: in any given hour we add spot servers according to the load that we expect, which keeps the CPU low and prevents the on-demand auto scale group from scaling up. When spot instance prices go up and we lose the spot servers, the on-demand group kicks in and scales with ease.
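Here is a minimal sketch of how such a schedule could be derived. The per-instance capacity of 350 requests/min (at a comfortably low CPU), the baseline of 2 on-demand servers, and the hourly load figures other than our 1330 and 6300 extremes are all assumptions for illustration.

```python
from typing import Dict


def spot_schedule(expected_rpm_by_hour: Dict[int, int],
                  rpm_per_instance: int,
                  baseline_instances: int) -> Dict[int, int]:
    """Map each hour to the number of spot instances to run in that hour.

    Total servers needed is the expected load divided by per-instance
    capacity, rounded up; the always-on baseline is then subtracted.
    """
    schedule = {}
    for hour, rpm in expected_rpm_by_hour.items():
        needed = -(-rpm // rpm_per_instance)  # integer ceiling division
        schedule[hour] = max(0, needed - baseline_instances)
    return schedule


# Expected requests/min at 03:00, 12:00 and 20:00 (illustrative values).
expected = {3: 1330, 12: 6300, 20: 4000}
print(spot_schedule(expected, rpm_per_instance=350, baseline_instances=2))
```

Each entry would then become a scheduled action on the spot auto scaling group for that hour.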

There are two problems with scaling with spot instances:

Manual Adjustments of Spot Instance Schedules: Applicable to time based spot instance strategy.
The Flash Sale Predicament: Applicable to both metric and time based strategy for spot instances.

The Flash Sale Predicament

A flash sale is when spot prices suddenly go up and all spot instances shut down. In practice this has happened to us when AWS was experiencing issues in its EC2 service and raised spot prices to impractical levels in order to shut spot instances down and make way for on-demand and reserved instances.

Imagine that when this happens you are running a fleet of 20 web servers, 18 spot and 2 on-demand. When all 18 go down at once, all your traffic goes to those 2 servers. No matter how well your application can throttle requests, an increase like this is likely to make the application unresponsive, making your instances unhealthy almost immediately. Your auto scale strategy will add servers slowly (remember, add Y instances if CPU > 70% for X minutes?), only once every X minutes. But what happens if the newly added servers also become unresponsive before more are added? A recurring loop of disaster!
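To put rough numbers on this, here is a small worked calculation. Using 6300 requests/min as the total load at the moment of the flash sale is an assumption (it is the peak figure mentioned earlier), and the traffic is assumed to be spread evenly across servers.

```python
def per_server_load(total_rpm: float, servers: int) -> float:
    """Requests per minute each server handles under an even spread."""
    return total_rpm / servers


TOTAL_RPM = 6300  # assumed load when the spot instances disappear

before = per_server_load(TOTAL_RPM, 20)  # 18 spot + 2 on-demand
after = per_server_load(TOTAL_RPM, 2)    # only the 2 on-demand remain

print(f"before: {before:.0f} rpm/server, after: {after:.0f} rpm/server")
print(f"each surviving server sees {after / before:.0f}x its previous load")
```

Whatever the absolute load is, losing 18 of 20 servers multiplies the load on each survivor by ten.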

To handle the flash sale predicament it is wise to put in a fail-safe that scales up much faster when the CPU goes berserk, for example adding 6 instances when CPU = 100% for 3 minutes, or similar. The specific strategy will depend on the nature of the load and the application, though.
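A back-of-the-envelope comparison shows why the fail-safe matters. The regular policy values (Y = 2 instances every X = 5 minutes) are assumptions, since the post leaves X and Y open. The sketch also ignores instance boot time and cooldowns, and assumes the alarm fires on every step.

```python
import math


def minutes_to_recover(lost: int, add_per_step: int, minutes_per_step: int) -> int:
    """Minutes until `lost` instances are replaced, adding a fixed number
    of instances once per evaluation period."""
    steps = math.ceil(lost / add_per_step)
    return steps * minutes_per_step


LOST_SPOT_INSTANCES = 18  # the fleet example above

# Regular policy (assumed values): add 2 instances every 5 minutes.
regular = minutes_to_recover(LOST_SPOT_INSTANCES, add_per_step=2, minutes_per_step=5)

# Fail-safe policy: add 6 instances when CPU = 100% for 3 minutes.
fail_safe = minutes_to_recover(LOST_SPOT_INSTANCES, add_per_step=6, minutes_per_step=3)

print(f"regular: {regular} min, fail-safe: {fail_safe} min")
```

Under these assumptions the regular policy needs 45 minutes to replace the lost capacity, while the fail-safe needs 9.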

Manual Adjustments of Spot Instance Schedules

This is an issue introduced by the time-based spot instance strategy and is probably why AWS does not recommend this approach. The basic cloud mantra is to do everything automatically. However, if the load pattern of your application changes frequently, manual intervention will be required from time to time to adjust the auto scale schedules so that they add the appropriate number of spot instances in any given hour, and of course manual adjustments can only be so accurate.

I faced this problem myself, and I could not accept the AWS-recommended approach of a stricter on-demand scaling policy (it’s too risky for me). So I am building an open source project that measures past metrics and predicts a spot schedule for the day. You can check it out here: https://github.com/maingi4/spotOptimizer and its wiki will get you started: https://github.com/maingi4/spotOptimizer/wiki. I am doing this for my company, but they allowed me to make it open source.

Which Applications are fit for Spot Servers?

Typically web servers (UI or API) are ideal candidates, provided they are stateless or store state in a central redundant store such as DynamoDB or Redis. Databases are not a good fit. Even if you run a NoSQL database with data sharded across several servers and a good replication factor, you should steer clear of spot instances for it. Whenever a DB server drops out of the cluster, the remaining servers need to re-balance the data, which puts heavy load on your databases, and you really don’t want a whole bunch of DB servers going down at once. Who knows, the servers actively or passively carrying a particular shard of data might all coincidentally be spot instances.