Forecasting Your AWS GPU Cloud Spend for Deep Learning

Author: Josh Patterson
Date: 2/24/2021


In this post we look at how to forecast your AWS GPU spend for Deep Learning workloads. To do this we cover the following topics:

  1. How a typical user consumes GPU hours over the course of a month
  2. The types of GPU users in an organization
  3. General resource requirements for working with different types of models
  4. What kinds of GPUs the users typically train models on
  5. Building a spreadsheet model to forecast monthly and yearly GPU spend

Why is this a compelling exercise?

Organizations are over budget on their cloud spend by an average of 23 percent.

Beyond that, an organization can easily waste 30% of its cloud spend if it isn't careful. This challenge is only getting bigger, with organizations expecting to spend 47% more on the cloud in 2021. Organizations are struggling to control costs because, in many cases, they are spending more than they expected. Although cloud providers offer nice discounts for future commitments, these only work out if you actually use the resources you commit to, which makes resource forecasting a useful exercise.

In this post we focus on a specific type of workload to give you a better sense of how to realistically forecast your GPU spend for Deep Learning training. We're going to focus purely on modeling GPU usage. There are other services a team may use, but for the purposes of this analysis we are going to leave those AWS components (outside of SageMaker) out of the cost model for now.

How Does a Data Scientist Use Public Cloud GPU Hours?

The foundation of modeling GPU usage on the cloud is documenting how our prototypical data scientist works. To profile the deep learning habits of a typical organization's data scientists, we'll consider the following areas:

  1. Identify the specific types of Deep Learning workloads they run and how many resources those workloads require
  2. Note the different types of public cloud GPU options on AWS
  3. Understand any other tools on AWS, such as SageMaker, that may be relevant to building our cost model

This information will give us the context needed to model how our organization's data scientists will consume AWS GPU resources.

Resource Requirements for Different Types of Deep Learning Workflows

We'll use Tim Dettmers' blog article on "Which GPU for Deep Learning" as a good starting place for both Deep Learning tasks and the memory requirements per task type (summarized as a small Python lookup in the sketch after this list):

  • Using pretrained transformers; training a small transformer from scratch: >= 11 GB
  • Training a large transformer or convolutional nets in research / production: >= 24 GB
  • Prototyping neural networks (either transformer or convolutional nets): >= 10 GB
  • Kaggle competitions: >= 8 GB
  • Applying computer vision: >= 10 GB
  • Neural networks for video: 24 GB
  • Reinforcement learning: >= 10 GB (plus a strong deep learning desktop with the largest Threadripper or EPYC CPU you can afford)
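
For reference later in the post, here's the same guidance captured as a small Python lookup table (a minimal sketch; the key names are just our shorthand labels, and the values are minimum GB of GPU RAM):

```python
# Rough minimum GPU RAM (GB) per workload type, per the guidance above.
MIN_GPU_RAM_GB = {
    "pretrained transformers / small transformer from scratch": 11,
    "large transformer or convnet (research/production)": 24,
    "prototyping neural networks": 10,
    "kaggle competitions": 8,
    "applied computer vision": 10,
    "neural networks for video": 24,
    "reinforcement learning": 10,
}
```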

Many times (especially on Twitter) the discussion around Deep Learning focuses only on the academic research side of training and development. In our case we're focused on the prototypical Fortune 500 company's data science group, so the workloads are going to be more diverse (read: not everyone is training GPT-3).

For the purposes of modeling an enterprise data science group doing deep learning on AWS GPUs, we will split our users into two groups:

  1. Power Users
  2. Regular Users

We'll say that our "regular users" are running workloads that need from 8GB up to 16GB of GPU RAM (applying pre-trained networks and working in Jupyter notebooks on "Kaggle"-class problems).

Beyond that, we'll say that some of our "power users" may want to do things such as train a new transformer or train convolutional neural networks, and that will require 24GB of GPU RAM or more in many cases.

Here's a quick note on the relationship between GPUs and model accuracy:

"We want to make a clear distinction around what we gain from GPUs, as many times this seems to be conflated in machine learning marketing. We define “predictive accuracy of model” as how accurate the model’s predictions are with respect to the task at hand (and there are multiple ways to rate models). Another facet of model training is “training speed,” which is how long it takes us to train for a number of epochs (passes over an entire input training dataset) or to a specific accuracy/metric. We call out these two specific definitions because we want to make sure you understand:

  1. GPUs will not make your model more accurate.
  2. However, GPUs will allow us to train a model faster and find better models (which may be more accurate) faster.

So GPUs can indirectly help model accuracy, but we still want to set expectations here."

O'Reilly's "Kubeflow Operations Guide", Patterson, Katzenellenbogen, Harris (2020)

Now that we've established our groups of users, what kinds of workloads they want to do, and also what some minimum GPU RAM requirements are for those workloads, we can move on and see what kind of AWS instances are offered that will support these workloads.

Understanding the Different Types of Public Cloud GPU Options

In this section we're going to document which GPUs are offered in which instance types on AWS (and which GPUs are not offered at all). Using this table of GPUs, we can then compare their RAM and relative power to determine which are best suited for our enterprise deep learning team's use cases.

There are many different types of GPU chipsets offered by NVIDIA, and it can get confusing as to which GPU is best for your workload. On top of that, AWS offers different GPUs in different instance types, so you need a map of instance type to GPU type as well. Additionally, NVIDIA won't allow certain types of GPUs to be used in a data center, so this further filters down which GPUs are offered by AWS.

Here we're looking at the enterprise / data center as the context for our GPU usage, so we'll begin by listing the GPUs you won't find as public cloud instances (because NVIDIA does not rate them as suitable for data center use):

  • Titan V
  • Titan RTX
  • RTX 2080 Ti
  • RTX 2080

With those out of the way, the NVIDIA GPUs you'll commonly find on most public clouds, each with an associated AWS instance type, are the M60, K80, T4, V100, and A100, with the new A100 ("Ampere") design being the most powerful GPU in the NVIDIA line today. Beyond that, we give the following notes on the relative power of each chipset.

  • M60s are targeted at virtual workstations for graphics work and should be considered a last option for Deep Learning in this group
  • The K80 backs the p2 instances (p2.xlarge) on AWS and is roughly 1/5 as powerful as the V100
  • The T4 chipset (G4 AWS instances) is focused on inference workloads
  • The V100 GPU is the same series used in the NVIDIA DGX-1 system
  • The A100 is only offered (as of today) in an 8x GPU configuration

Our regular users need from 8GB to 16GB of GPU RAM for their workloads, as described above, and they would use either a T4 (g4 instance) or a V100 (16GB GPU RAM) as their main training GPU. A V100 is roughly 3x more powerful than a T4 GPU, for reference. Both the p3 and g4 instances support larger models (via more GPU RAM), high throughput, and low-latency access to CUDA, and both support mixed-precision training as well. Given that we can "just wait longer" in many cases for training, the gating mechanism for choosing an instance type ends up being "enough GPU memory for our use case".

Given that most users will go for more power if it's available, we imagine most of our users will opt for the more powerful p3.2xlarge instance, but we'll cap them at a single-GPU profile to keep costs down.
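
As a rough sketch of that "enough GPU memory" gating logic, here's how one might pick an instance from a small catalog. The GPU RAM figures come from the AWS instance tables later in this post; the pick_instance helper is our own illustration, not an AWS API:

```python
# Hypothetical catalog of candidate instances -> total GPU RAM in GB
# (figures from the g4/p3 tables later in this post).
INSTANCE_GPU_RAM_GB = {
    "g4dn.xlarge": 16,  # 1x T4
    "p3.2xlarge": 16,   # 1x V100
    "p3.8xlarge": 64,   # 4x V100
}

def pick_instance(required_gpu_ram_gb: int) -> str:
    """Return the smallest catalog instance with enough GPU RAM for the workload."""
    candidates = sorted(INSTANCE_GPU_RAM_GB.items(), key=lambda kv: kv[1])
    for instance, ram_gb in candidates:
        if ram_gb >= required_gpu_ram_gb:
            return instance
    raise ValueError("No instance in the catalog has enough GPU RAM")

print(pick_instance(16))  # regular users -> a 16GB single-GPU instance
print(pick_instance(24))  # power users   -> p3.8xlarge (64GB across 4 GPUs)
```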

Our power users need a bit more RAM, so we'll give them the p3.8xlarge instance with 64GB of GPU RAM. In the next section we'll show the full table of p3 instances and their pricing. Let's now build up a hypothetical group of data scientists based on our profiles and calculate the team's GPU spend.

Building the GPU Spend Forecast

At this point in our process we know what workloads we are running and which AWS GPU instances we'll base our spend forecast on, so now we need to define a group of data scientists and estimate how many GPU hours they consume in the cloud each month.

Once we have our total monthly hours per group, we can forecast out our yearly AWS GPU spend.

Profiling Our Data Science Team

As we mentioned before, we have two groups of users: regular users and power users. For the purposes of this exercise, let's say we have 10 regular users and 5 power users in our overall data science group.

Model training workflows can take anywhere from seconds up to days to run, depending on the complexity of the model, the amount of data, and other factors. The regular users' jobs will tend to take less time to train because they are less complex.

The power users we've queried say some of their GPU-based training jobs can take multiple days, while others take only a few hours. We will set their budget at 900 GPU hours per month, as in some cases they are training multiple models at the same time.

We'll set our regular users' GPU hours at 40% of the power users', which has them consuming 360 GPU hours per month.

It's worth mentioning that you are charged for the hours the instances are allocated, not the hours spent actually training, so it's easy for a team to accrue cloud usage hours faster than they think.

AWS G4 Instance Pricing

In the table below we show AWS' pricing for the g4 instances.

It's worth noting that all single-GPU, T4-based g4 instances have 16GB of GPU RAM, as this is not completely clear from the g4 instance page.

| Instance Size | GPUs (Tesla T4) | GPU Memory (GiB) | vCPUs | Memory (GB) | Network Bandwidth (Gbps) | EBS Bandwidth (Gbps) | On-Demand Price/hr* | 1-yr Reserved Instance Effective Hourly* | 3-yr Reserved Instance Effective Hourly* | Blended Price (Avg of On-Demand and 1-yr Reserved) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| g4dn.xlarge | 1 | 16 | 4 | 16 | Up to 25 | Up to 3.5 | $0.526 | $0.316 | $0.210 | $0.421 |
| g4dn.2xlarge | 1 | 16 | 8 | 32 | Up to 25 | Up to 3.5 | $0.752 | $0.452 | $0.300 | $0.602 |
| g4dn.4xlarge | 1 | 16 | 16 | 48 | Up to 25 | 4.75 | $1.204 | $0.722 | $0.482 | $0.963 |
| g4dn.8xlarge | 1 | 16 | 32 | 128 | 50 | 9.5 | $2.176 | $1.306 | $0.870 | $1.741 |
| g4dn.16xlarge | 1 | 16 | 64 | 256 | 50 | 9.5 | $4.352 | $2.612 | $1.740 | $3.482 |
| g4dn.12xlarge (multi-GPU) | 4 | 64 | 48 | 192 | 50 | 9.5 | $3.912 | $2.348 | $1.564 | $3.130 |

Given the limited GPU memory ceiling (16GB outside of the g4dn.12xlarge) and the T4 being roughly 1/3 as powerful as the V100, the g4 family ends up being most appropriate for our "Kaggle"-class users who are exploring notebooks or doing inference with pre-trained models.

AWS P3 Instance Pricing

In the table below we show AWS' pricing for the p3 instances.

| Instance Size | GPUs (Tesla V100) | GPU Peer to Peer | GPU Memory (GB) | vCPUs | Memory (GB) | Network Bandwidth | EBS Bandwidth | On-Demand Price/hr* | 1-yr Reserved Instance Effective Hourly* | 3-yr Reserved Instance Effective Hourly* | Blended Price (Avg of On-Demand and 1-yr Reserved) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| p3.2xlarge | 1 | N/A | 16 | 8 | 61 | Up to 10 Gbps | 1.5 Gbps | $3.06 | $1.99 | $1.05 | $2.53 |
| p3.8xlarge | 4 | NVLink | 64 | 32 | 244 | 10 Gbps | 7 Gbps | $12.24 | $7.96 | $4.19 | $10.10 |
| p3.16xlarge | 8 | NVLink | 128 | 64 | 488 | 25 Gbps | 14 Gbps | $24.48 | $15.91 | $8.39 | $20.20 |
| p3dn.24xlarge | 8 | NVLink | 256 | 96 | 768 | 100 Gbps | 19 Gbps | $31.22 | $18.30 | $9.64 | $24.76 |

We've added a column at the end where we've averaged the On-Demand instance price and the 1-Year Reserved Instance price. This blended hourly cost is what we use for instances in this cost model: with good forecasting, some AWS resources will be purchased as 1-Year Reserved Instances at the lower price, but we also want to account for users who will end up on On-Demand instances, so we average the two prices together.
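
As a quick sanity check on that column, the blended figure is just the arithmetic mean of the two hourly prices; for example, using the p3 numbers above:

```python
def blended_price(on_demand_hourly: float, reserved_1yr_hourly: float) -> float:
    """Average of the On-Demand and 1-yr Reserved effective hourly prices."""
    return (on_demand_hourly + reserved_1yr_hourly) / 2

print(blended_price(3.06, 1.99))   # p3.2xlarge -> 2.525, listed as $2.53
print(blended_price(12.24, 7.96))  # p3.8xlarge -> 10.1,  listed as $10.10
```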

AWS EC2 and SageMaker

In some cases a data scientist may choose to simply send a container to a cloud instance and run it from the command-line. However, modern machine learning tooling uses dashboards, model metadata tracking, and other libraries.

AWS offers SageMaker as their machine learning platform in this case. AWS SageMaker is a development platform for machine learning practitioners that offers a fully-managed environment for building, training, and deploying machine learning models at scale. SageMaker offers a Jupyter Notebook instance that allows the user to train a model and further use SageMaker’s API to drive other AWS services for tasks such as model deployment or hyperparameter tuning. All of these services are available via a high-level Python API in SageMaker.

Under the hood SageMaker uses Docker images and S3 storage for training and model inference. You can also provide a Docker image to SageMaker containing your machine learning training code in any framework or language you want to use.
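
As a rough illustration of what that looks like in practice (not part of the cost model itself), a container-based training job can be launched through the SageMaker Python SDK along these lines; the image URI, IAM role, and S3 paths below are placeholders:

```python
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()

# Placeholder values -- substitute your own container image, IAM role, and S3 buckets.
estimator = Estimator(
    image_uri="<account>.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",
    role="arn:aws:iam::<account>:role/MySageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.p3.2xlarge",  # single V100 with 16GB of GPU RAM
    output_path="s3://my-bucket/model-artifacts/",
    sagemaker_session=session,
)

# Training data is pulled from S3; you are billed for the time the training
# instance runs, which is what the cost model below tries to capture.
estimator.fit({"training": "s3://my-bucket/training-data/"})
```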

AWS does not provide as clear a pricing model for SageMaker as it does for traditional EC2 instances, which makes the spend harder to forecast. For the purposes of this document and pricing model, we assume SageMaker introduces a 40% increase in cost compared to running a regular EC2 instance.

At this point we have enough information to build our GPU cost forecast, so let's get to it.

Calculating the Forecast

For both groups, we take their projected GPU hours per month and calculate the GPU hours per year (Monthly GPU Hours x 12 months). For the power users we are allocating p3.8xlarge instances (the regular users are priced on the single-GPU p3.2xlarge), and we're using the blended cost (the averaged On-Demand and 1-Year Reserved price) to calculate the yearly cost per user. From that, we multiply the user cost by 1.4x to add the SageMaker overhead cost per year. We can see the math displayed in the spreadsheet screenshot below.



Once we have a cost per user per year, we multiply that number by the "Number of Users" in each group and add both of those numbers together.

This gives us a yearly AWS cost for GPUs (with SageMaker) of $916,574.00.
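
If you'd rather see the spreadsheet math as code, here is a minimal Python sketch of the same calculation, using the blended hourly prices from the tables above and the usage assumptions stated earlier (it lands within rounding of the figure above):

```python
SAGEMAKER_OVERHEAD = 1.4  # assumed 40% premium over raw EC2, per the section above

# group name: (number of users, allocated hours per user per month, blended $/hr)
groups = {
    "regular users on p3.2xlarge": (10, 360, 2.53),
    "power users on p3.8xlarge": (5, 900, 10.10),
}

total_yearly_cost = 0.0
for name, (num_users, hours_per_month, hourly_rate) in groups.items():
    hours_per_year = hours_per_month * 12
    cost_per_user = hours_per_year * hourly_rate * SAGEMAKER_OVERHEAD
    group_cost = cost_per_user * num_users
    total_yearly_cost += group_cost
    print(f"{name}: ${cost_per_user:,.2f} per user/year, ${group_cost:,.2f} per group/year")

print(f"Total yearly AWS GPU spend (with SageMaker): ${total_yearly_cost:,.2f}")
```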

If you want to run a quick variation on the above calculation, check out our online GPU cost calculator.

Other Considerations

Now that we've walked through the complete sizing exercise, we'll note a few other relevant topics that can impact your cloud spending decisions.

Some accounting departments will want a certain amount of spend to be a capital expense (CapEx) as opposed to an operational expense (OpEx). Using cloud services is purely an operational expense, for reference. The reason accounting likes some expenses to be CapEx is that the hardware can be depreciated over time, helping pay for the cost through reduced taxable income.

Another aspect of using the cloud is how the billing is perceived. Previously your company dealt with many different enterprise vendors for different components. Once you start consolidating infrastructure services on public cloud, cost consolidates into a single vendor's line item, and accounting may flag that single-vendor increase as anomalous (and in some cases you really are using more cloud services than anticipated). Combine this with the fact that you've moved a non-trivial amount of capital expense over to the operational expense column, and you have an interesting conversation ahead with accounting. It's not good or bad, but it will change how accounting approaches things. From a pure cost perspective, these are aspects of using cloud services you'll need to consider as you find your way in the public cloud landscape.

The cost discussion is often intense, but many times IT will fire back "but the cloud is easier!"

In many cases this is true. For a real-time application, cloud services are designed to be redundant, fault tolerant, and geographically aware out of the box. For other workloads, such as deep learning training, many of these benefits are less useful because you are mostly looking for pure GPU horsepower.

Lastly, we'll reiterate that you are charged for what you forget to turn off, and those GPU hours can quickly pile up from notebook servers that sit idle. Another hidden cost is data transferred between AWS availability zones, which is not a cost you incur in an on-premises data center. We call these out to illustrate the need to be aware of the true cost of the agility of public cloud services.

Summary

As an AWS partner, we hope this article was useful for your organization, or at least made you more aware of how to think about public cloud cost analysis.

Did you like the article? Do you think we are making a poor estimation somewhere in our calculations? Reach out and tell us.

Free GPU Cloud Spend Analysis

Would you like a customized cloud GPU spend analysis for your organization? Reach out and we’ll provide your company this GPU usage analysis report free of charge.