Using The TensorFlow Estimator Design Pattern
Author: Josh Patterson
Date: April 19th, 2019
This tutorial covers the basics of how to use TensorFlow's Estimator API to write modeling code that runs consistently across multiple execution modes. In a previous article we looked at how to run a pre-built TensorFlow program in distributed mode on Kubeflow. However, that TensorFlow code was rather complex, as it forced the user to deal with many concerns beyond the model training itself. In this tutorial the reader will learn:
- What is the Estimator API?
- Why is it relevant?
- Sample code that models the Iris dataset locally with an Estimator
Let's jump right into "What is the Estimator API?"
Introduction to TensorFlow's Estimator API
For newer readers who aren't familiar with the landscape of machine learning tooling, we'll start off by defining TensorFlow:
"TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications."
Building on that definition, the TensorFlow Estimator API is a high-level TensorFlow API that makes machine learning programming easier across different execution modes (e.g., "local", "distributed"). Historically, TensorFlow coding has involved many low-level details, such as placing specific operations on specific GPUs. Estimators also make model implementations easier to share between data scientists. In addition, Estimators build the TensorFlow graph for you, so there is no explicit Session to manage. Many data scientists do not want to deal with these types of details, so Estimators make things considerably simpler. Ultimately, the same code produces consistent results whether it runs locally or in the cloud in distributed mode.
Estimators provide a standard way to deal with the following actions:
- training
- evaluation
- prediction
- export for serving
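To make these actions concrete, here is a minimal sketch that runs training, evaluation, and prediction through one pre-built Estimator. It assumes TensorFlow 1.x, or a 2.x release before 2.16 (where tf.estimator was removed), and uses a tiny synthetic dataset rather than real data; the choice of LinearClassifier is ours, purely for brevity. Export for serving additionally needs a serving input function, so it is left out here.

```python
import numpy as np
import tensorflow as tf  # assumes TF 1.x, or TF 2.x before 2.16 (tf.estimator was removed in 2.16)

# Synthetic toy data (purely illustrative): the label is 1 when x > 0, else 0.
x = np.linspace(-1.0, 1.0, 200).astype(np.float32)
y = (x > 0).astype(np.int32)

def train_input_fn():
    # Shuffle and repeat forever; the `steps` argument to train() bounds training.
    return tf.data.Dataset.from_tensor_slices(({"x": x}, y)).shuffle(200).repeat().batch(20)

def eval_input_fn():
    # One ordered pass over (features, labels).
    return tf.data.Dataset.from_tensor_slices(({"x": x}, y)).batch(20)

def predict_input_fn():
    # Features only: prediction needs no labels.
    return tf.data.Dataset.from_tensor_slices({"x": x}).batch(20)

# A pre-built Estimator; with no model_dir it checkpoints to a temporary directory.
estimator = tf.estimator.LinearClassifier(
    feature_columns=[tf.feature_column.numeric_column("x")])

estimator.train(input_fn=train_input_fn, steps=200)               # training
metrics = estimator.evaluate(input_fn=eval_input_fn)              # dict with "accuracy", "loss", ...
predictions = list(estimator.predict(input_fn=predict_input_fn))  # one dict per example
```

Note that no graph or Session appears anywhere in this code: each of the three calls builds its own graph internally and restores the latest checkpoint from the model directory.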
There are many pre-built Estimators already in the TensorFlow library, but you may write your own custom Estimator as well. Any Estimator, whether built-in or a custom one we create, is based on the tf.estimator.Estimator class.
Keras Support with Estimators
TensorFlow now supports converting a Keras model into an Estimator, speeding up your model development. This is done by defining a model with tf.keras.Model and then converting the model to a tf.estimator.Estimator object with the tf.keras.estimator.model_to_estimator() method. Once we've got an Estimator representation of the Keras model, we can train the model in the same way we'd train any other Estimator.
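As a rough sketch of that conversion (assuming TensorFlow 1.x, or 2.x before 2.16), the snippet below builds a small Sequential model sized for the Iris data, compiles it, and wraps it as an Estimator. The layer sizes, optimizer, and loss are our own illustrative choices, not taken from the example repository.

```python
import tempfile
import tensorflow as tf  # assumes TF 1.x, or TF 2.x before 2.16 (tf.estimator was removed in 2.16)

# A small Keras classifier for 4 input features and 3 classes (sizes are illustrative).
keras_model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# model_to_estimator() expects a compiled model: the loss and optimizer set here
# become the Estimator's training objective.
keras_model.compile(optimizer="adam",
                    loss="sparse_categorical_crossentropy",
                    metrics=["accuracy"])

# Wrap the Keras model in a tf.estimator.Estimator that writes to a fresh directory.
estimator = tf.keras.estimator.model_to_estimator(
    keras_model=keras_model, model_dir=tempfile.mkdtemp())
```

From there, estimator.train(...), estimator.evaluate(...), and estimator.predict(...) take input functions just as a pre-built Estimator does; if the input function returns a feature dictionary, its keys need to match the Keras model's input layer names.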
Deprecation of the Experiment Class
Previously we might have used the Experiment class for building TensorFlow training code, but the Experiment class has since been deprecated. The Estimator class should now be used directly wherever we would have used the Experiment class, and it appears to be a better design pattern as well.
Writing TensorFlow Code with Estimators
The primary steps necessary to write TensorFlow training code with Estimators are:
- Build your Estimator model or use a pre-built one already in the TensorFlow library
- Define how data is fed into the model for both the training and test datasets (these input functions are often set up the same way)
- Define training and evaluation specifications (TrainSpec and EvalSpec, respectively) to be passed to tf.estimator.train_and_evaluate(...). The EvalSpec can also include information on how to export your trained model for prediction (serving).
The tf.estimator.train_and_evaluate(...) method provides a consistent interface for training locally or in the cloud, in either a non-distributed or a distributed fashion. Check out the TensorFlow documentation for more details. In the next section we show an example of the Estimator API in practice.
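As an illustration of the export option mentioned above, the sketch below attaches a tf.estimator.LatestExporter to an EvalSpec so that a SavedModel is written after evaluation. It assumes TensorFlow 1.x or 2.x before 2.16; the feature names, the exporter name iris_exporter, and the placeholder input function are hypothetical.

```python
import tensorflow as tf  # assumes TF 1.x, or TF 2.x before 2.16

# Hypothetical feature names for the four Iris measurements.
FEATURE_KEYS = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]
feature_columns = [tf.feature_column.numeric_column(key) for key in FEATURE_KEYS]

# Serving input function that parses serialized tf.Example protos into features.
feature_spec = tf.feature_column.make_parse_example_spec(feature_columns)
serving_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)

# Export a SavedModel after evaluation, keeping only the 3 most recent exports.
exporter = tf.estimator.LatestExporter(
    name="iris_exporter",  # hypothetical name; used as a subdirectory under model_dir/export/
    serving_input_receiver_fn=serving_input_fn,
    exports_to_keep=3)

def eval_input_fn():
    """Placeholder evaluation input function (hypothetical) over two fake rows."""
    features = {key: [5.0, 6.0] for key in FEATURE_KEYS}
    return tf.data.Dataset.from_tensor_slices((features, [0, 1])).batch(2)

eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn, exporters=[exporter])
```

When this EvalSpec is passed to tf.estimator.train_and_evaluate(...), exports should appear under model_dir/export/iris_exporter/.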
Example TensorFlow Code for Modeling the Iris Dataset
We include below a basic TensorFlow Estimator API example code listing. This example models the canonical Iris dataset. While it is not a complex deep learning model, the Iris dataset is simple and well understood, so the example lets us see the Estimator API in action without the distractions of a more complex model.
In the sections below, we provide commentary on the following areas of the code listing:
- Loading the initial dataset and creating feature columns
- Configuring our TensorFlow job with the RunConfig
- Setting up an Estimator with the DNNClassifier
- Setting up the TrainSpec
- Setting up the EvalSpec
- Training the model
Loading the Iris Dataset
We use the included iris_data.py utility functions to download and load the Iris dataset locally, as seen in the code snippet below (from the program listing above):
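import tensorflow as tf
import iris_data  # helper module included in the example repository

# Fetch the data
(train_x, train_y), (test_x, test_y) = iris_data.load_data()

# Feature columns describe how to use the input.
my_feature_columns = []
for key in train_x.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))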
The iris_data.py utilities do a few things under the hood:
- download the dataset
- read the CSV datasets with Pandas into dataframes
- split each dataset into features and labels
- return separate train and test datasets
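The steps above can be approximated with a few lines of Pandas. This is a minimal sketch, not the repository's iris_data.py: to keep it self-contained it skips the download and parses a small in-memory CSV. The column names follow TensorFlow's Iris examples, and the sample rows (including the metadata header line, which mimics the layout of the downloaded files) are illustrative.

```python
import io
import pandas as pd

# Column names as used in TensorFlow's Iris examples (an assumption for this sketch).
CSV_COLUMN_NAMES = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth", "Species"]

# Illustrative rows in the same layout as the downloaded CSVs: the first line is a
# metadata header that header=0 discards in favor of our own column names.
SAMPLE_CSV = """4,4,setosa,versicolor,virginica
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,3.1,1.5,0.1,0
5.7,3.8,1.7,0.3,0
"""

def load_split(csv_source, y_name="Species"):
    """Read one CSV into a DataFrame and split it into (features, labels)."""
    df = pd.read_csv(csv_source, names=CSV_COLUMN_NAMES, header=0)
    labels = df.pop(y_name)  # removes the label column from df and returns it
    return df, labels

train_x, train_y = load_split(io.StringIO(SAMPLE_CSV))
print(train_x.shape)  # (4, 4)
```

Calling load_split once per file (train and test) gives the (train_x, train_y), (test_x, test_y) pair that the snippet above expects.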
Once we have our data, we can move on and configure our modeling job with the RunConfig class.
Configuration with RunConfig
The RunConfig class is part of the Estimator API in TensorFlow, and it specifies the configuration for an Estimator run. We can see the RunConfig class in action in our code in the snippet below:
config = tf.estimator.RunConfig(
    model_dir="/tmp/tf_estimator_iris_model",
    save_checkpoints_steps=100)
This configuration primarily sets the directory where we save our model checkpoints (model_dir="/tmp/tf_estimator_iris_model") and how often we checkpoint the model (save_checkpoints_steps=100). All of the properties of the RunConfig class, along with its constructor arguments, are listed in the TensorFlow documentation.
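One property we did not set is worth knowing about because it shows up in the console output at the end of this tutorial: log_step_count_steps, which defaults to 100 and controls how often the training loop logs the loss. The sketch below (assuming TensorFlow 1.x or 2.x before 2.16) builds the same configuration and reads back a few properties, including two defaults.

```python
import tensorflow as tf  # assumes TF 1.x, or TF 2.x before 2.16

config = tf.estimator.RunConfig(
    model_dir="/tmp/tf_estimator_iris_model",  # where checkpoints and summaries are written
    save_checkpoints_steps=100)                 # write a checkpoint every 100 training steps

# Explicitly set values.
print(config.model_dir, config.save_checkpoints_steps)
# Defaults we left alone: loss is logged every log_step_count_steps steps,
# and only the keep_checkpoint_max most recent checkpoints are kept on disk.
print(config.log_step_count_steps, config.keep_checkpoint_max)
```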
There are other properties that can be set for RunConfig, but we'll save those for a future article. Now that we've looked at how to configure our job, let's move on to creating the Estimator itself.
Setting up an Estimator with the DNNClassifier
For this example we're going to build a small multi-layer perceptron neural network to model our Iris dataset.
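# Build 2 hidden layer DNN with 10, 10 units respectively.
estimator = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    # Two hidden layers of 10 nodes each.
    hidden_units=[10, 10],
    # The model must choose between 3 classes.
    n_classes=3,
    config=config)  # RunConfig from the previous section (model_dir, checkpointing)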
In the code above, we instantiate the DNNClassifier class with our feature columns, the configuration that determines where the model is saved, and the size of the hidden layers (10 nodes per layer). Finally, we tell TensorFlow that the model must choose among 3 output classes. Let's now move on to setting up the TrainSpec for the Estimator.
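To get a feel for how small this network is, we can count its trainable parameters by hand. The sketch assumes each layer is fully connected with a bias term, which matches how we understand DNNClassifier builds its hidden and logits layers; it is plain Python and does not call TensorFlow. With 4 input features, two hidden layers of 10 units, and 3 output classes, the count is (4×10 + 10) + (10×10 + 10) + (10×3 + 3) = 50 + 110 + 33 = 193.

```python
def count_dense_params(layer_sizes):
    """Count weights plus biases in a fully connected network.

    layer_sizes lists the width of every layer, starting with the input features.
    """
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix + bias vector
    return total

# 4 Iris features -> 10 -> 10 hidden units -> 3 class logits
print(count_dense_params([4, 10, 10, 3]))  # 193
```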
TrainSpec and Data Input
train_input_fn = lambda: iris_data.train_input_fn(train_x, train_y,
                                                  batch_size=100)  # assumed batch size
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn,
                                    max_steps=1000)  # the run shown below stops at step 1000
In the code above we can see two statements. The first builds a train_input_fn with Python's lambda keyword, wrapping the train_input_fn(...) function from our iris_data.py utilities. The lambda is needed because the Estimator calls the input function itself, without our data arguments, so we bind our training data (train_x, train_y) and the remaining arguments ahead of time.
The second statement passes the train_input_fn we created to the TrainSpec class in the Estimator API. The Estimator will use this TrainSpec instance when we train the model in a moment.
Oftentimes we will not have a pre-made input function for our Estimator.
To write a custom input function for a TrainSpec, check out the TensorFlow documentation on datasets for Estimators. Now we'll move on to evaluation with Estimators.
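For reference, a custom training input function usually follows the pattern below: convert the in-memory features and labels into a tf.data.Dataset, then shuffle, repeat, and batch it. This is a hedged sketch assuming TensorFlow 1.x or 2.x before 2.16; my_train_input_fn, the made-up rows, and the shuffle buffer size are illustrative, and the repository's own iris_data.py may differ.

```python
import tensorflow as tf  # assumes TF 1.x, or TF 2.x before 2.16

def my_train_input_fn(features, labels, batch_size):
    """Hypothetical custom training input function.

    features: a dict mapping each feature name to a list/array/Series of values.
    labels:   the matching list/array/Series of integer class ids.
    """
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    # Shuffle within a 1,000-example buffer, repeat indefinitely (TrainSpec.max_steps
    # is what stops training), and group consecutive examples into batches.
    return dataset.shuffle(1000).repeat().batch(batch_size)

# Two made-up rows, only to show the expected input structure.
features = {"SepalLength": [5.1, 6.4], "SepalWidth": [3.5, 2.8],
            "PetalLength": [1.4, 5.6], "PetalWidth": [0.2, 2.2]}
labels = [0, 2]

train_spec = tf.estimator.TrainSpec(
    input_fn=lambda: my_train_input_fn(features, labels, batch_size=2),
    max_steps=10)
```

Because the dataset repeats forever, max_steps (rather than the size of the data) decides when training ends.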
EvalSpec and Model Evaluation
# Evaluate the model.
eval_input_fn = lambda: iris_data.eval_input_fn(test_x, test_y,
                                                batch_size=100)  # assumed batch size
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn)  # other EvalSpec arguments left at defaults
As with the TrainSpec, we use the lambda keyword in Python to create an evaluation input function, which is then passed directly as the input_fn parameter of the EvalSpec.
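An evaluation input function differs from the training one in two ways: it should neither shuffle nor repeat, so that evaluation makes exactly one pass over the test data. The sketch below (TensorFlow 1.x or 2.x before 2.16 assumed; my_eval_input_fn and the rows are hypothetical) also lets labels be omitted, which makes the same function usable with predict().

```python
import tensorflow as tf  # assumes TF 1.x, or TF 2.x before 2.16

def my_eval_input_fn(features, labels=None, batch_size=100):
    """Hypothetical evaluation/prediction input function.

    No shuffle and no repeat: evaluation should make exactly one pass over the
    test set. Passing labels=None yields features only, which suits predict().
    """
    inputs = dict(features) if labels is None else (dict(features), labels)
    return tf.data.Dataset.from_tensor_slices(inputs).batch(batch_size)

# Made-up test rows, only to show the expected input structure.
test_features = {"SepalLength": [5.9, 6.9], "SepalWidth": [3.0, 3.1],
                 "PetalLength": [4.2, 5.4], "PetalWidth": [1.5, 2.1]}
test_labels = [1, 2]

# steps=None tells the evaluator to run until the input function is exhausted.
eval_spec = tf.estimator.EvalSpec(
    input_fn=lambda: my_eval_input_fn(test_features, test_labels),
    steps=None)
```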
Training Method for Estimators
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
Lastly, we'll highlight how it all comes together for training an Estimator. We pass the Estimator, TrainSpec, and EvalSpec as parameters to the tf.estimator.train_and_evaluate(...) function. Veterans of TensorFlow programming will instantly recognize that this function encapsulates a number of steps in a training run and is far easier for data scientists to use consistently. Another interesting aspect of this function is that it can train TensorFlow models in a distributed fashion without changes to the training code; the cluster layout is read from the TF_CONFIG environment variable instead.
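Putting the pieces together, the sketch below runs tf.estimator.train_and_evaluate end to end on random stand-in data (not the Iris dataset) so it finishes in seconds. It assumes TensorFlow 1.x or 2.x before 2.16; the step counts, batch size, and throttle_secs=0 (evaluate whenever a new checkpoint is available) are our own choices for a quick local run.

```python
import tempfile
import numpy as np
import tensorflow as tf  # assumes TF 1.x, or TF 2.x before 2.16

FEATURES = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]

# Random stand-in data (NOT the Iris dataset): 60 rows, 4 features, 3 classes.
rng = np.random.RandomState(0)
x = rng.rand(60, 4).astype(np.float32)
y = rng.randint(0, 3, size=60).astype(np.int32)

def input_fn(training):
    features = {name: x[:, i] for i, name in enumerate(FEATURES)}
    dataset = tf.data.Dataset.from_tensor_slices((features, y))
    if training:
        dataset = dataset.shuffle(60).repeat()  # training runs until max_steps
    return dataset.batch(20)

model_dir = tempfile.mkdtemp()
estimator = tf.estimator.DNNClassifier(
    feature_columns=[tf.feature_column.numeric_column(n) for n in FEATURES],
    hidden_units=[10, 10],
    n_classes=3,
    config=tf.estimator.RunConfig(model_dir=model_dir, save_checkpoints_steps=25))

train_spec = tf.estimator.TrainSpec(input_fn=lambda: input_fn(True), max_steps=50)
eval_spec = tf.estimator.EvalSpec(input_fn=lambda: input_fn(False), steps=None,
                                  throttle_secs=0)

# The same call runs locally here, or on a cluster when TF_CONFIG is set.
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
print(tf.train.latest_checkpoint(model_dir))  # path ending in model.ckpt-50
```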
Running Our Estimator TensorFlow Program
To run this example application, first we need to pull the code down from GitHub:
git clone https://github.com/pattersonconsulting/tensorflow_estimator_examples.git
Next, change into the new project directory that was just created:
cd tensorflow_estimator_examples
The user needs to account for dependency management when running any Python program. The two common options are:
Once you have at least TensorFlow 1.12.0 installed locally, you should be able to run the example with the command:
The console output should look similar to the output shown below:
INFO:tensorflow:loss = 10.430826, step = 401 (0.862 sec)
INFO:tensorflow:loss = 6.974675, step = 501 (0.675 sec)
INFO:tensorflow:loss = 5.9006176, step = 601 (0.621 sec)
INFO:tensorflow:loss = 3.3194108, step = 701 (0.654 sec)
INFO:tensorflow:loss = 7.634383, step = 801 (0.596 sec)
INFO:tensorflow:loss = 5.5086565, step = 901 (0.568 sec)
INFO:tensorflow:Saving checkpoints for 1000 into /tmp/tf_estimator_iris_model/model.ckpt.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-04-18-19:00:17
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tf_estimator_iris_model/model.ckpt-1000
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-04-18-19:00:18
INFO:tensorflow:Saving dict for global step 1000: accuracy = 0.93333334, average_loss = 0.06142591, global_step = 1000, loss = 1.8427773
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 1000: /tmp/tf_estimator_iris_model/model.ckpt-1000
INFO:tensorflow:Loss for final step: 2.909092.
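If you want to check these final metrics programmatically, the "Saving dict" line is easy to parse. The sketch below is plain Python. Interpreting the numbers assumes the standard 30-example Iris test split: an accuracy of 0.93333334 corresponds to 28 of 30 test examples, and the reported loss appears to be summed over the single evaluation batch, so it equals roughly 30 times average_loss.

```python
def parse_eval_metrics(log_line):
    """Parse 'key = value' pairs from a 'Saving dict for global step' log line."""
    _, _, pairs = log_line.partition(": ")  # drop the 'INFO:tensorflow:... step N' prefix
    metrics = {}
    for pair in pairs.split(", "):
        key, value = pair.split(" = ")
        metrics[key] = float(value)
    return metrics

line = ("INFO:tensorflow:Saving dict for global step 1000: accuracy = 0.93333334, "
        "average_loss = 0.06142591, global_step = 1000, loss = 1.8427773")
metrics = parse_eval_metrics(line)
print(metrics["accuracy"])  # 0.93333334
```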
In this example we walked through the new Estimator API for TensorFlow and highlighted some of its core concepts. We hope the reader enjoyed the walkthrough. In future articles we'll take a look at how this example can be extended to distributed TensorFlow and then further executed on systems such as Kubeflow for on-premise/cloud/hybrid operations.
If you'd like further help with topics such as:
- General machine learning education
- Advanced deep learning modeling
- Enterprise machine learning infrastructure
please reach out to us and say hello.