Cost of Machine Learning
Exactly how much does deep learning cost? And are those prices fixed, or can they be optimized? Let me compare some cloud hardware and get down to dollars and cents to uncover some answers.
The ultimate goal of deep learning is to create a good model. This requires a lot of data, patience, and some luck. Essentially, what we are doing is running many experiments on the training data. The amount of training data for a task is pretty much fixed; the number of experiments depends on how many different hyperparameters you want to try.
When talking about cost, what really matters is the cost of training one epoch. This is the cost of running my training algorithm on every example in my training set exactly once.
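As a back-of-the-envelope sketch (the numbers below are placeholders, not measurements), the per-epoch cost is simply the machine's hourly price multiplied by the wall-clock time of one epoch:

```python
def epoch_cost(hours_per_epoch, price_per_hour):
    """Dollar cost of one full pass over the training set on a rented machine."""
    return hours_per_epoch * price_per_hour

# Example with made-up numbers: a 30-minute epoch on a $0.90/hour instance.
print("$%.2f per epoch" % epoch_cost(30 / 60.0, 0.90))  # prints $0.45 per epoch
```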
As defined, the cost of one epoch is very problem specific. It is also model specific. Thus, when we talk about cost, it is specific to the problem we are solving. Each problem therefore needs its own cost analysis, and the cheapest platform for model1 may not be the cheapest for model2.
I will use my TensorFlow port of an LSTM CNN language model to run the benchmarks, because I am familiar with this model and it is similar to the problem I am currently solving.
This model needs 25 epochs to fully train. That will take anywhere from 1 to 20 hours, depending on the hardware.
I am not patient enough to re-run the complete training. Instead, I will only time the first epoch.
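Here is a minimal sketch of what that timing looks like; run_one_epoch is a hypothetical stand-in for the model's actual training loop, not the real benchmark script:

```python
import time

def run_one_epoch():
    """Hypothetical stand-in for one full pass of the training loop over the data."""
    time.sleep(1)  # a real epoch takes minutes to hours

start = time.time()
run_one_epoch()
hours_for_first_epoch = (time.time() - start) / 3600.0

print("first epoch: %.4f hours" % hours_for_first_epoch)
print("estimated full training (25 epochs): %.1f hours" % (25 * hours_for_first_epoch))
```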
I used the following AWS EC2 machines (prices quoted as of Nov 2016):
c4.8xlarge (32 CPUs, no GPU) at $1.675/hour
g2.2xlarge (8 CPUs, one K520 GPU) at $0.65/hour
p2.xlarge (4 CPUs, one K80 GPU) at $0.90/hour
Training time and cost for one epoch for each machine are summarized in the table below.
The p2.xlarge is six times less expensive per epoch than the next contender, even though it is not the cheapest machine to rent per hour.
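Here is only the skeleton of that comparison; the hourly prices are the EC2 rates quoted above, while the epoch durations are deliberately left blank rather than invented, to be filled in from your own first-epoch timings:

```python
# EC2 hourly prices quoted above. The epoch durations are left as None:
# fill them in with measured first-epoch times before comparing.
machines = [
    ("c4.8xlarge", 1.675, None),  # (name, $/hour, measured hours per epoch)
    ("g2.2xlarge", 0.65,  None),
    ("p2.xlarge",  0.90,  None),
]

for name, price_per_hour, hours_per_epoch in machines:
    if hours_per_epoch is None:
        print("%-11s needs a first-epoch timing" % name)
    else:
        print("%-11s $%.2f per epoch" % (name, price_per_hour * hours_per_epoch))
```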
Utilization, Utilization, Utilization
Can we do better? Sure!
The default training parameters were selected assuming 4 GB of GPU memory. By changing the batch size parameter, we change the amount of memory required for training.
Increasing the batch size proportionally decreases the number of batches per epoch, while the processing time per batch should stay roughly unchanged.
Ideally, we choose a batch size that uses 100% of the GPU memory.
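A minimal illustration of that proportionality (the dataset size here is made up):

```python
# Made-up training-set size, just to show the scaling: doubling the batch size
# halves the number of batches per epoch. If the time per batch stays roughly
# constant, epoch time -- and therefore epoch cost -- drops with it.
num_examples = 1000000  # hypothetical

for batch_size in (20, 40, 80, 160):
    batches_per_epoch = num_examples // batch_size
    print("batch_size=%3d -> %d batches per epoch" % (batch_size, batches_per_epoch))
```

In practice this means increasing the batch size while watching GPU memory usage (for example with nvidia-smi) until it is nearly full, leaving a small margin to avoid out-of-memory errors.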
Here is an extension to the table above:
So, by choosing the correct hardware and tweaking the training procedure to make full use of it, we reduced the cost of training from 60 cents/epoch to 4 cents/epoch. Over the full 25-epoch run, that is roughly $15 versus $1.
This should make the accounting department happy :)