Setting Up Kubeflow 0.3.5 on Google Kubernetes Engine

Date: February 25th, 2019

This tutorial shows how to install Kubeflow 0.3.5 and get it working on GKE in under 30 minutes. For a review of why Kubeflow is compelling in the current machine learning infrastructure landscape, check out our report.

Tasks in this tutorial:

Get the latest Python
Set up a GCloud account
Set up the GCloud SDK
Create a Kubernetes cluster on GKE
Install kubectl locally
Install Ksonnet locally
Set up a GitHub token
Use Ksonnet to deploy Kubeflow to our cluster on GCP

Sign up for Google Cloud Platform

If you don't already have an account on Google Cloud Platform (GCloud), you can sign up for a free trial.

Installing Google Cloud SDK

Before you install Google Cloud's SDK, upgrade Python to the latest version to avoid issues (e.g., SSL errors). Once Python is updated, follow the instructions for getting the GCloud SDK working.

For Mac OSX users, a simple way to do this is to use the interactive installer.

Once you have the GCloud SDK working, log into your GCloud account from the command line with the gcloud auth tool:

gcloud auth login

This opens your web browser, where an OAuth consent screen asks you to allow the GCloud tools to access your Google Cloud Platform account.
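Before moving on, it can help to confirm which account gcloud is actually using. The sketch below wraps `gcloud auth list` in a small function; the name `check_gcloud_auth` is hypothetical, while `--filter` and `--format` are standard gcloud flags.

```shell
# Hypothetical helper: report the active gcloud account, or fail with a hint.
check_gcloud_auth() {
  local account
  # Print only the account whose status is ACTIVE (empty if nobody is logged in)
  account="$(gcloud auth list --filter=status:ACTIVE --format='value(account)')"
  if [ -z "$account" ]; then
    echo "No active gcloud account; run 'gcloud auth login' first." >&2
    return 1
  fi
  echo "Authenticated as: $account"
}
```

Run `check_gcloud_auth` after logging in. It prints the account that subsequent gcloud commands will act as.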

Standing Up a Kubernetes Cluster with GKE

We need a project on Google Cloud Platform to organize our resources. If you don't have one yet, create it from the Google Cloud console.

We also need to enable the Kubernetes Engine API for the project: https://console.cloud.google.com/apis/library/container.googleapis.com
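The same API can also be enabled from the command line. This is a minimal sketch, assuming an authenticated gcloud. The helper name `ensure_gke_api` is hypothetical, and the `config.name` format field is an assumption about the output of `gcloud services list` that you should check against your SDK version. It enables the API only when it is not already on.

```shell
# Hypothetical helper: enable the Kubernetes Engine API for a project
# unless it is already enabled. Assumes gcloud is authenticated.
ensure_gke_api() {
  local project_id="$1"
  local api="container.googleapis.com"
  # List the project's enabled APIs by name and look for an exact match
  if gcloud services list --enabled --project "$project_id" \
       --format='value(config.name)' | grep -qxF "$api"; then
    echo "$api already enabled for $project_id"
  else
    gcloud services enable "$api" --project "$project_id"
  fi
}
```

For example: `ensure_gke_api kubeflow-3-5-project`.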

Once you have the project created, check to make sure it shows up from the command line with the command:

gcloud projects list

This command lists all of the projects in our Google Cloud Platform account. The output should look similar to the following:

PROJECT_ID               NAME                   PROJECT_NUMBER
kubeflow-3-5-project     kubeflow-3-5-project   781217520374
kubeflow-codelab-229018  kubeflow-codelab       919126119217
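A project's display name and its ID can differ. In the listing above, `kubeflow-codelab` has the ID `kubeflow-codelab-229018`. A small, hypothetical helper can look up the ID from the name. The `--filter="name=..."` expression is an assumption worth checking against `gcloud topic filters`.

```shell
# Hypothetical helper: resolve a project's ID from its display NAME.
project_id_for_name() {
  local name="$1" id
  # value(projectId) prints bare IDs with no header row
  id="$(gcloud projects list --filter="name=$name" --format='value(projectId)')"
  if [ -z "$id" ]; then
    echo "No project named '$name' found" >&2
    return 1
  fi
  echo "$id"
}
```

With the listing above, `PROJECT_ID="$(project_id_for_name kubeflow-codelab)"` would set `PROJECT_ID` to `kubeflow-codelab-229018`.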

Now we need to create a Kubernetes cluster for our project on Google Cloud Platform. First, set the current working project from the command line, then create the cluster:

PROJECT_ID=kubeflow-3-5-project
gcloud config set project $PROJECT_ID

gcloud container clusters create [your-cluster-name-here] \
      --zone us-central1-a --machine-type n1-standard-2

Note that a project's name and its project ID are not always the same, so be careful. From the command line we almost always want the project ID. GCP may take 3-5 minutes to finish setting up the Kubernetes cluster.
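Instead of guessing when provisioning has finished, you can poll the cluster's status until GKE reports `RUNNING`. This is a sketch under stated assumptions. `wait_for_cluster` and the `POLL_SECONDS` variable are hypothetical. `gcloud container clusters describe` with `--format='value(status)'` is used to read the status field.

```shell
# Hypothetical helper: poll until the GKE cluster reports RUNNING.
# Args: cluster name, zone, optional max attempts (default 30).
# POLL_SECONDS (default 20) sets the pause between attempts.
wait_for_cluster() {
  local cluster="$1" zone="$2" tries="${3:-30}" status i
  for i in $(seq 1 "$tries"); do
    status="$(gcloud container clusters describe "$cluster" \
      --zone "$zone" --format='value(status)')"
    echo "[$i/$tries] $cluster: ${status:-UNKNOWN}"
    if [ "$status" = "RUNNING" ]; then
      return 0
    fi
    sleep "${POLL_SECONDS:-20}"
  done
  echo "Timed out waiting for $cluster" >&2
  return 1
}
```

For example: `wait_for_cluster [your-cluster-name-here] us-central1-a`. With the defaults it gives up after about 10 minutes.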

Installing Kubectl

kubectl controls the Kubernetes cluster manager and is a command line interface for running commands against Kubernetes clusters. We use kubectl to deploy and manage applications on Kubernetes. Using kubectl, we can:

inspect cluster resources
create components
delete components
update components

For a more complete list of functions in kubectl, check out this cheatsheet.

An easy way to install kubectl on OSX is to use the brew command:

brew install kubernetes-cli

Once we have kubectl, we need permission for it to talk to our remote managed kubernetes cluster on GCP. We get the credentials for our new kubernetes cluster with the command:

gcloud container clusters get-credentials [your-cluster-name-here] --zone us-central1-a

This command writes a context into our local ~/.kube/config file so kubectl knows which cluster we're currently working with. If you work with multiple clusters, their context information is also stored in this file.
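When several contexts share one kubeconfig, it is easy to end up pointed at the wrong cluster. A quick check of the current context is therefore worthwhile. This sketch assumes gcloud's usual naming scheme for GKE contexts, `gke_<project>_<zone>_<cluster>`; the `ks init` output later in this tutorial shows one such name. `expected_gke_context` and `check_kube_context` are hypothetical helpers.

```shell
# Hypothetical helper: build the context name gcloud uses for a GKE cluster
# (assumed format: gke_<project>_<zone>_<cluster>).
expected_gke_context() {
  echo "gke_${1}_${2}_${3}"
}

# Hypothetical helper: fail unless kubectl's current context is that cluster.
# Args: project ID, zone, cluster name.
check_kube_context() {
  local expected current
  expected="$(expected_gke_context "$1" "$2" "$3")"
  current="$(kubectl config current-context)"
  if [ "$current" != "$expected" ]; then
    echo "kubectl is using '$current', expected '$expected'" >&2
    echo "Switch with: kubectl config use-context $expected" >&2
    return 1
  fi
  echo "Using context $current"
}
```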

Once we can connect to our kubernetes cluster with kubectl, we can check out the status of the running cluster with the command:

kubectl cluster-info

We should see output similar to below:

Kubernetes master is running at https://31.239.115.73
GLBCDefaultBackend is running at https://31.239.115.73/api/v1/namespaces/kube-system/services/default-http-backend:http/proxy
Heapster is running at https://31.239.115.73/api/v1/namespaces/kube-system/services/heapster/proxy
KubeDNS is running at https://31.239.115.73/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Metrics-server is running at https://31.239.115.73/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
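`kubectl cluster-info` only shows the control plane. `kubectl get nodes` shows whether the worker nodes are healthy. Below is a minimal sketch that counts nodes whose status is exactly `Ready`. It assumes the default table layout of `kubectl get nodes`, in which STATUS is the second column. `count_ready_nodes` is a hypothetical name.

```shell
# Hypothetical helper: count nodes whose STATUS column is exactly "Ready".
count_ready_nodes() {
  # --no-headers drops the header row; column 2 is STATUS
  kubectl get nodes --no-headers | awk '$2 == "Ready" { n++ } END { print n + 0 }'
}
```

The `gcloud container clusters create` command above does not pass `--num-nodes`, so GKE's default of three nodes applies. Expect `3` once the cluster has settled.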

Installing Ksonnet

Ksonnet is a CLI-supported framework for extensible Kubernetes configurations. It provides an organizational structure and specialized features for managing configurations across different clusters and environments. For this demo we'll use Ksonnet (the ks command) to install a specific version of Kubeflow (v0.3.5) and deploy it to our new managed GKE cluster. Let's take a quick look at some key terminology used with Ksonnet applications.

A Ksonnet environment is a unique location to deploy our application to. It consists of:

a unique name
the kubernetes cluster address
the cluster's namespace
the kubernetes API version

You can customize different cluster deployments with environments.

A Ksonnet prototype is an object that describes a set of Kubernetes resources in an abstract way, together with the parameters used to configure them. For example, Kubeflow provides a prototype for tf-job-operator, which we'll use later in this article.

A Ksonnet component is a specific implementation of a prototype. We create a component by "filling in" the prototype's parameters. A component can be deployed directly to a Kubernetes cluster, or it can be rendered as standard Kubernetes YAML files. Each environment can use different parameter values for the same component.
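To see the "render as standard YAML" side of a component, you can ask Ksonnet to print the manifests without applying them. You need an application with components first; we create one with `ks init` later in this tutorial. This sketch uses `ks show <env> -c <component>` to write one YAML file per component. The helper name `export_component_yaml` and the output layout are hypothetical.

```shell
# Hypothetical helper: render components to plain Kubernetes YAML files.
# Run it from inside a ksonnet application directory.
# Args: environment, output directory, then one or more component names.
export_component_yaml() {
  local env="$1" outdir="$2" c
  shift 2
  mkdir -p "$outdir"
  for c in "$@"; do
    # 'ks show' prints the rendered manifests without touching the cluster
    ks show "$env" -c "$c" > "$outdir/$c.yaml" || return 1
  done
}
```

For example: `export_component_yaml default ./rendered ambassador tf-job-operator`. The resulting files can be reviewed, or applied with `kubectl apply -f` if you ever need to work without Ksonnet.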

To install Ksonnet on OSX, use brew:

brew install ks

This installs Ksonnet locally. Ksonnet pulls the Kubeflow code from GitHub and uses our kubeconfig credentials (the same ones kubectl uses) to deploy it to our GKE cluster.
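At this point three command-line tools should be on your PATH: gcloud, kubectl, and ks. A small, hypothetical `require_tools` function can confirm that before you continue. It relies only on the POSIX `command -v` builtin.

```shell
# Hypothetical helper: check that every named tool is on PATH.
# Prints each missing tool to stderr and returns non-zero if any is missing.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `require_tools gcloud kubectl ks && echo "all tools found"`.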

Setting Up a Github Auth Token

We need to give Ksonnet a GitHub personal access token; otherwise we quickly run into GitHub API rate limits, because Ksonnet fetches registries and packages through the GitHub API. Create a personal access token in your GitHub account settings. Ksonnet uses it in place of a password when it calls the GitHub API over HTTPS. Once you have created the token, set it as an environment variable with the command:

export GITHUB_TOKEN=[your-github-token]
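Typing the token directly into an `export` command leaves it in your shell history. One alternative is a hypothetical `read_github_token` function. It prompts for the token, turns off terminal echo while you type or paste, and exports the result as `GITHUB_TOKEN`.

```shell
# Hypothetical helper: prompt for a GitHub token without echoing it,
# then export it as GITHUB_TOKEN for ksonnet to pick up.
read_github_token() {
  local token
  printf 'GitHub token: ' >&2
  # Disable terminal echo only when stdin is a real terminal
  if [ -t 0 ]; then stty -echo; fi
  IFS= read -r token
  if [ -t 0 ]; then stty echo; fi
  printf '\n' >&2
  if [ -z "$token" ]; then
    echo "No token entered" >&2
    return 1
  fi
  export GITHUB_TOKEN="$token"
}
```

Call it in your current shell rather than in a subshell, or the export is lost. To check that GitHub accepts the token, `curl -s -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/rate_limit` reports the rate limit that applies to it.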

Now we are ready to use Ksonnet and deploy Kubeflow to our cluster.

CLI Install Steps for Kubeflow v0.3.5

Use Ksonnet's ks command to initialize your new Kubernetes application.

ks init [app-name]
cd [app-name]

The output on the screen should look similar to what you see below.

INFO Using context "gke_kubeflow-3-5-project_us-central1-a_kf-3-5-k8s" from kubeconfig file "/Users/josh/.kube/config"
INFO Creating environment "default" with namespace "default", pointing to "version:v1.10.11" cluster at address "https://35.239.115.73"
INFO Generating ksonnet-lib data at path '/Users/josh/Documents/workspace/PattersonConsulting/kubeflow_gke_test/jp-gke-kf-3-5/lib/ksonnet-lib/v1.10.11'

As we can see above, ksonnet found our GKE cluster context in our kubeconfig file and was able to configure our new ksonnet application to use it. Ksonnet also initialized our application directory with some template code. Now we need to customize this application code with the Kubeflow codebase from github.

At this point we need to add Kubeflow to our custom Ksonnet project. To do this, we add the Kubeflow repository to our project as a registry and then pull individual Kubeflow packages into our local project. Specifically, we add the Kubeflow v0.3.5 registry so that Ksonnet knows where to download the code from. We do this with the following command:

ks registry add kubeflow github.com/kubeflow/kubeflow/tree/v0.3.5/kubeflow

Next we want to install packages from this github repository in our local ksonnet application. We want to work with a specific version of Kubeflow for this tutorial, so we'll specify the v0.3.5 version of Kubeflow with the command below.

ks pkg install kubeflow/core@v0.3.5

The console won't report much, but you should see the following output:

INFO Retrieved 38 files

We should now be able to see the installed package in our ./app.yaml file and also in our ./vendor directory.
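The registry URL and the package both hard-code `v0.3.5`, so it is easy to let them drift apart when editing. One option is to keep the version in a single variable. In this sketch, `KF_VERSION` and `add_kubeflow_packages` are hypothetical names. Only the v0.3.5 tag is covered by this tutorial; other Kubeflow releases may not use the same package names or layout.

```shell
# Hypothetical helper: register the Kubeflow registry and install the core
# package for one pinned version. KF_VERSION defaults to this tutorial's tag.
KF_VERSION="${KF_VERSION:-v0.3.5}"

add_kubeflow_packages() {
  # Same two commands as above, with the tag taken from KF_VERSION
  ks registry add kubeflow "github.com/kubeflow/kubeflow/tree/${KF_VERSION}/kubeflow" || return 1
  ks pkg install "kubeflow/core@${KF_VERSION}"
}
```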

Now we need to generate the components for the application from the package we installed from the v0.3.5 Kubeflow codebase on github. We generate components with the following commands:

ks generate ambassador ambassador
ks generate jupyterhub jupyterhub
ks generate centraldashboard centraldashboard
ks generate tf-job-operator tf-job-operator

These components give us a minimal install of Kubeflow. The console output for each of the commands above will look similar to what we see below.

INFO Writing component at '/Users/josh/Documents/workspace/PattersonConsulting/kubeflow_gke_test/jp-gke-kf-3-5/components/ambassador.jsonnet'

Each of these commands writes a component file (such as ambassador.jsonnet in the output above) into the components directory of our local Ksonnet application.
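Each `ks generate` command above uses the same name for the prototype and the component, so the four commands collapse naturally into a loop. This is a sketch; `KF_COMPONENTS` and `generate_components` are hypothetical names.

```shell
# Hypothetical helper: generate the four minimal Kubeflow components in one loop.
KF_COMPONENTS="ambassador jupyterhub centraldashboard tf-job-operator"

generate_components() {
  local c
  # Unquoted on purpose: split the space-separated list into words
  for c in $KF_COMPONENTS; do
    # ks generate <prototype> <component-name>
    ks generate "$c" "$c" || return 1
  done
}
```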

Finally, we want to send these components to our kubernetes cluster. We apply local Kubernetes manifests (components) to remote clusters with the following commands:

# Create all the core components
ks apply default -c ambassador
ks apply default -c jupyterhub
ks apply default -c centraldashboard
ks apply default -c tf-job-operator

Each of these commands prints output to the console similar to what we show below:

INFO Applying services default.ambassador
INFO Creating non-existent services default.ambassador
INFO Applying services default.statsd-sink
INFO Creating non-existent services default.statsd-sink
INFO Applying services default.ambassador-admin
INFO Creating non-existent services default.ambassador-admin
INFO Applying roles default.ambassador
INFO Creating non-existent roles default.ambassador
INFO Applying serviceaccounts default.ambassador
INFO Creating non-existent serviceaccounts default.ambassador
INFO Applying rolebindings default.ambassador
INFO Creating non-existent rolebindings default.ambassador
INFO Applying services default.k8s-dashboard
INFO Creating non-existent services default.k8s-dashboard
INFO Applying deployments default.ambassador
INFO Creating non-existent deployments default.ambassador

As we can see above, the Kubeflow components were installed on GKE remotely. We'll confirm those components are on the kubernetes cluster in a moment, but first we want to take a quick look at what was deployed.
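If you rerun these steps often, the four `ks apply` commands can be wrapped in a helper that stops at the first failure, so a broken component is easy to spot. `apply_components` is a hypothetical name for this sketch.

```shell
# Hypothetical helper: apply components to one ksonnet environment in order,
# stopping at the first one that fails.
# Args: environment, then one or more component names.
apply_components() {
  local env="$1" c
  shift
  for c in "$@"; do
    echo "Applying $c to $env"
    ks apply "$env" -c "$c" || return 1
  done
}
```

For example: `apply_components default ambassador jupyterhub centraldashboard tf-job-operator`. Running `ks apply default` with no `-c` flag instead applies every component in the application at once.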

Kubeflow 0.3.5 Component Details

Quick notes on each component are listed below.

ambassador: Kubeflow uses Ambassador as a central point of authentication and routing for its services
jupyterhub: a multi-user hub that spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server
centraldashboard: a central navigation UI that makes it easy to move between the components in Kubeflow
tf-job-operator: a Kubernetes custom resource (TFJob) and operator for TensorFlow jobs

Effectively these components work together to provide a scalable and secure system for running machine learning jobs (notebook-based jobs and also jobs outside of notebooks). Given the rise of Kubernetes as an enterprise platform management system, it makes a lot of sense to have a way to manage our machine learning workloads in a similar manner.

The system also gives data scientists flexibility: on Kubeflow they can use the library and language of their choice, inside or outside a notebook. This makes it a compelling offering, because a data scientist can move a workload built on a laptop to an on-premise enterprise cloud or to a public cloud and take advantage of more hardware. This fits the common pattern in which machine learning jobs are prototyped on a laptop and, once validated, moved to a more powerful system for training and then model deployment.

At this point, we have installed the basic version of Kubeflow 0.3.5 on our remote GKE kubernetes cluster.

Confirming Kubeflow is Operational

To confirm our cluster is operational and the components are running, try the following command:

kubectl get services

We should see a list of services that matches the components we just installed on our cluster.

NAME               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
ambassador         ClusterIP   10.31.244.0     <none>        80/TCP     6m
ambassador-admin   ClusterIP   10.31.245.255   <none>        8877/TCP   6m
centraldashboard   ClusterIP   10.31.251.36    <none>        80/TCP     4m
k8s-dashboard      ClusterIP   10.31.252.214   <none>        443/TCP    6m
kubernetes         ClusterIP   10.31.240.1     <none>        443/TCP    6d
statsd-sink        ClusterIP   10.31.242.245   <none>        9102/TCP   6m
tf-hub-0           ClusterIP   None            <none>        8000/TCP   5m
tf-hub-lb          ClusterIP   10.31.243.236   <none>        80/TCP     5m
tf-job-dashboard   ClusterIP   10.31.249.94    <none>        80/TCP     4m
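Instead of reading the table by eye, you can script the check. This is a sketch that assumes the service names shown in the listing above and the default table layout, where NAME is the first column. `check_kubeflow_services` is a hypothetical name.

```shell
# Hypothetical helper: confirm that the services from the minimal install exist.
# Expected names come from the 'kubectl get services' listing above.
check_kubeflow_services() {
  local names svc missing=0
  # --no-headers drops the header row; the first column is the service NAME
  names="$(kubectl get services --no-headers | awk '{ print $1 }')"
  for svc in ambassador centraldashboard tf-hub-lb tf-job-dashboard; do
    if printf '%s\n' "$names" | grep -qxF "$svc"; then
      echo "ok: $svc"
    else
      echo "missing: $svc" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

If all four report ok, you can reach the Kubeflow UI through Ambassador, which routes traffic to the other services. This assumes the default namespace used throughout this tutorial. `kubectl port-forward svc/ambassador 8080:80` forwards local port 8080 to the ambassador service, after which the central dashboard should be reachable at http://localhost:8080/.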