This tutorial shows how to get Kubeflow 0.3.5 installed and working on GKE in under 30 minutes. For a review of why Kubeflow is compelling in the current machine learning infrastructure landscape, check out our report.
Tasks in this tutorial:
Get the latest Python
Set up a GCloud account
Set up the GCloud SDK
Create a Kubernetes cluster running on GKE
Install kubectl locally
Install Ksonnet locally
Set up a GitHub token
Use Ksonnet to deploy Kubeflow to our cluster on GCP
Before you install Google Cloud's SDK, make sure to upgrade Python to the latest version to avoid issues (e.g., SSL issues).
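A quick way to see which Python the shell picks up, and which OpenSSL build it links against, is sketched below. The interpreter name python3 is an assumption; on some systems the command is simply python.

```shell
# Report the Python interpreter version and the OpenSSL build it links against.
# Assumes the interpreter is available as python3; adjust if yours is python.
if command -v python3 >/dev/null 2>&1; then
  PY_VERSION="$(python3 --version 2>&1)"
  SSL_VERSION="$(python3 -c 'import ssl; print(ssl.OPENSSL_VERSION)' 2>/dev/null || echo unknown)"
else
  PY_VERSION="python3 not found"
  SSL_VERSION="unknown"
fi
echo "Python:  ${PY_VERSION}"
echo "OpenSSL: ${SSL_VERSION}"
```

If the OpenSSL line reports a very old build, upgrading Python before installing the SDK is likely worthwhile.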
After you have python updated, take a look at the instructions for getting the GCloud SDK working.
Note that the name of the project and the project ID may not be exactly the same, so be careful. Most of the time we want to use the Project_ID of our project when working from the command line. It may take 3-5 minutes for the system to complete the kubernetes cluster setup on GCP.
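The sketch below shows one way to point gcloud at the project by its ID and then create the cluster. The project ID, zone, and cluster name are inferred from the kubeconfig context that appears in the ks init output later in this tutorial, and they stand in for your own values. The guard simply skips the calls when gcloud is not installed.

```shell
# Values inferred from the kubeconfig context shown later in this tutorial;
# replace them with your own project ID, zone, and cluster name.
PROJECT_ID="kubeflow-3-5-project"
ZONE="us-central1-a"
CLUSTER_NAME="kf-3-5-k8s"

if command -v gcloud >/dev/null 2>&1; then
  # The listing has separate PROJECT_ID and NAME columns; use the ID below.
  gcloud projects list
  gcloud config set project "${PROJECT_ID}"
  # Cluster creation can take a few minutes to complete.
  gcloud container clusters create "${CLUSTER_NAME}" --zone "${ZONE}"
else
  echo "gcloud not found; install the GCloud SDK first" >&2
fi
```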
kubectl controls the Kubernetes cluster manager and is a command line interface for running commands against Kubernetes clusters.
We use kubectl to deploy and manage applications on Kubernetes. Using kubectl, we can, for example, inspect cluster resources.
For a more complete list of functions in kubectl, check out this cheatsheet.
An easy way to install kubectl on OSX is to use the brew command:
brew install kubernetes-cli
Once we have kubectl, we need permission for it to talk to our remote managed kubernetes cluster on GCP.
We get the credentials for our new kubernetes cluster with the command:
gcloud container clusters get-credentials [cluster-name] --zone [zone] --project [project-id]
This command writes a context into our local ~/.kube/config file so kubectl knows where to look for the current cluster we're working with. In some cases, you will be working with multiple clusters, and their context information will also be stored in this file.
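When several clusters share the same kubeconfig file, it helps to check which context is active before deploying anything. The sketch below assumes the GKE naming pattern gke_[project]_[zone]_[cluster], which matches the context name in the ks init output later in this tutorial; the three values are placeholders for your own.

```shell
# Placeholder values; the resulting context name matches the one that appears
# in the ks init output later in this tutorial.
PROJECT_ID="kubeflow-3-5-project"
ZONE="us-central1-a"
CLUSTER_NAME="kf-3-5-k8s"
CONTEXT="gke_${PROJECT_ID}_${ZONE}_${CLUSTER_NAME}"

if command -v kubectl >/dev/null 2>&1; then
  kubectl config get-contexts              # every context stored in the kubeconfig file
  kubectl config current-context           # the context kubectl is using right now
  kubectl config use-context "${CONTEXT}"  # switch to the new GKE cluster
else
  echo "kubectl not found; install it first" >&2
fi
```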
Once we can connect to our kubernetes cluster with kubectl, we can check out the status of the running cluster with the command:
kubectl cluster-info
We should see output similar to below:
Kubernetes master is running at https://126.96.36.199
GLBCDefaultBackend is running at https://188.8.131.52/api/v1/namespaces/kube-system/services/default-http-backend:http/proxy
Heapster is running at https://184.108.40.206/api/v1/namespaces/kube-system/services/heapster/proxy
KubeDNS is running at https://220.127.116.11/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Metrics-server is running at https://18.104.22.168/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
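With the cluster reachable, a few read-only kubectl commands give a fuller picture of what is running. This is a minimal sketch of inspecting cluster resources; the choice of the kube-system namespace simply mirrors the service URLs in the output above.

```shell
# Namespace taken from the service URLs in the cluster-info output above.
NAMESPACE="kube-system"

if command -v kubectl >/dev/null 2>&1; then
  kubectl get nodes                                # worker nodes and their status
  kubectl get pods --all-namespaces                # every pod in the cluster
  kubectl get services --namespace "${NAMESPACE}"  # system services such as kube-dns
else
  echo "kubectl not found; install it first" >&2
fi
```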
Ksonnet is a CLI-supported framework for extensible Kubernetes configurations. Ksonnet provides an organizational structure and specialized features for managing configurations across different clusters and environments. For this demo we'll use Ksonnet (the ks command) to install a specific version of Kubeflow (v0.3.5) and deploy it to our new managed GKE cluster. Let's take a quick look at some key terminology used with ksonnet applications.
A Ksonnet environment is a unique location to deploy our application to. It consists of:
a unique name
the kubernetes cluster address
the cluster's namespace
the kubernetes API version
You can customize different cluster deployments with environments.
A Ksonnet prototype is an object that describes a set of kubernetes resources in an abstract way. This object also includes associated parameters for these resources. An example of this is how Kubeflow has a prototype for tf-job-operator as we'll see later in this article.
A Ksonnet component is a specific implementation of a prototype. We create a component by "filling in" the parameters for the prototype. A component can be deployed to a kubernetes cluster and can also directly generate standard Kubernetes YAML files. Each environment may be customized with different parameters for a component.
An easy way to install Ksonnet on OSX is to use brew with the command:
brew install ks
This installs ksonnet locally. Ksonnet will pull code from GitHub and use the cluster context kubectl is configured with to deploy the application code (Kubeflow) to our GKE cluster.
Setting Up a Github Auth Token
We need to use a GitHub personal access token with ksonnet; otherwise we quickly run into GitHub API limits.
We create a personal access token and use it in place of a password when ksonnet performs operations over HTTPS against GitHub's API. Once you have logged in and created the token, set it as an environment variable with the command:
export GITHUB_TOKEN=[your-token]
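One way to confirm the token is visible to the shell, and to see the rate limit GitHub applies to it, is sketched below. The placeholder string is hypothetical, and the curl call is skipped until a real token is present in the environment.

```shell
# Placeholder value is hypothetical; supply your real token via the environment
# and avoid committing it to version control.
export GITHUB_TOKEN="${GITHUB_TOKEN:-replace-with-your-token}"

if [ "${GITHUB_TOKEN}" = "replace-with-your-token" ]; then
  echo "GITHUB_TOKEN is still the placeholder; set a real token first" >&2
elif command -v curl >/dev/null 2>&1; then
  # Ask GitHub which rate limit applies to requests made with this token.
  curl -s -H "Authorization: token ${GITHUB_TOKEN}" https://api.github.com/rate_limit
fi
```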
Now we are ready to use Ksonnet and deploy Kubeflow to our cluster.
CLI Install Steps for Kubeflow v0.3.5
Use Ksonnet's ks command to initialize your new kubernetes application.
ks init [app-name]
The output on the screen should look similar to what you see below.
INFO Using context "gke_kubeflow-3-5-project_us-central1-a_kf-3-5-k8s" from kubeconfig file "/Users/josh/.kube/config"
INFO Creating environment "default" with namespace "default", pointing to "version:v1.10.11" cluster at address "https://22.214.171.124"
INFO Generating ksonnet-lib data at path '/Users/josh/Documents/workspace/PattersonConsulting/kubeflow_gke_test/jp-gke-kf-3-5/lib/ksonnet-lib/v1.10.11'
As we can see above, ksonnet found our GKE cluster context in our kubeconfig file and was able to configure our new ksonnet application to use it. Ksonnet also initialized our application directory with some template code. Now we need to customize this application code with the Kubeflow codebase from github.
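To see the environment that ks init created, the following sketch can be run from the directory where ks init was executed. The application name is hypothetical and mirrors the directory in the output above.

```shell
# Hypothetical application name, mirroring the directory in the output above.
APP_NAME="jp-gke-kf-3-5"
ENVIRONMENT="default"

if command -v ks >/dev/null 2>&1 && [ -d "${APP_NAME}" ]; then
  # Run in a subshell so the caller's working directory is left unchanged.
  (
    cd "${APP_NAME}" &&
    ks env list &&                      # environments known to this application
    ks env describe "${ENVIRONMENT}"    # cluster address and namespace for one of them
  )
else
  echo "ks not found, or ${APP_NAME} has not been created with ks init yet" >&2
fi
```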
At this point we need to add Kubeflow to our custom ksonnet project. To do this we need to add the Kubeflow repository to our project and then pull the individual kubeflow packages into our local project.
Specifically, we need to add Kubeflow 0.3.5 as a ksonnet registry so that ksonnet knows where to look to download the code. We do this with the ks registry add command.
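A sketch of the registry step follows. The registry name kubeflow and the registry URI are assumptions based on the layout of the kubeflow/kubeflow GitHub repository at the v0.3.5 tag; neither is stated in the text above, so check them against the Kubeflow documentation for this release.

```shell
# Assumptions: the registry is named "kubeflow" and the URI follows the layout
# of the kubeflow/kubeflow GitHub repository at the v0.3.5 tag.
VERSION="v0.3.5"
REGISTRY_NAME="kubeflow"
REGISTRY_URI="github.com/kubeflow/kubeflow/tree/${VERSION}/kubeflow"

# ksonnet commands must run inside the application directory (where app.yaml lives).
if command -v ks >/dev/null 2>&1 && [ -f app.yaml ]; then
  ks registry add "${REGISTRY_NAME}" "${REGISTRY_URI}"
  ks registry list    # the new registry should now appear in the listing
else
  echo "run this from inside the ksonnet app directory with ks installed" >&2
fi
```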
Next we want to install packages from this github repository in our local ksonnet application. We want to work with a specific version of Kubeflow for this tutorial, so we'll specify the v0.3.5 version of Kubeflow with the command below.
ks pkg install [registry-name]/[package-name]@v0.3.5
The console won't report much, but you should see the following output:
INFO Retrieved 38 files
We should now be able to see the installed package in our ./app.yaml file and also in our ./vendor directory.
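One concrete form of the install step, followed by the checks described above, might look like the sketch below. The package name kubeflow/core is an assumption about how the Kubeflow repository was organized at this release and is not confirmed by the text above. If the package is already installed, skip the install line.

```shell
# Assumption: the package is "core" in a registry named "kubeflow".
VERSION="v0.3.5"
PACKAGE="kubeflow/core"
PACKAGE_SPEC="${PACKAGE}@${VERSION}"

if command -v ks >/dev/null 2>&1 && [ -f app.yaml ]; then
  ks pkg install "${PACKAGE_SPEC}"
  ks pkg list     # installed packages are marked in the listing
  ls vendor       # the downloaded package code lands under ./vendor
  cat app.yaml    # the application manifest records the installed package
else
  echo "run this from inside the ksonnet app directory with ks installed" >&2
fi
```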
Now we need to generate the components for the application from the package we installed from the v0.3.5 Kubeflow codebase on github. We generate each component with the ks generate command and then deploy it to the cluster with the ks apply command.
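The generate and apply steps can be sketched as two loops. The component names come from the component notes later in this tutorial plus the tf-job-operator prototype mentioned earlier; the assumption that each prototype shares its name with its component is ours, so adjust the list to the prototypes your installed package actually provides.

```shell
# Assumption: each prototype has the same name as the component we create from it.
COMPONENTS="ambassador jupyterhub centraldashboard tf-job-operator"
ENVIRONMENT="default"

if command -v ks >/dev/null 2>&1 && [ -f app.yaml ]; then
  for component in ${COMPONENTS}; do
    # ks generate <prototype-name> <component-name>
    ks generate "${component}" "${component}"
  done
  for component in ${COMPONENTS}; do
    # ks apply <environment-name> -c <component-name>
    ks apply "${ENVIRONMENT}" -c "${component}"
  done
else
  echo "run this from inside the ksonnet app directory with ks installed" >&2
fi
```

Generating everything before applying anything keeps a failed generate from leaving the cluster half deployed.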
As each component is applied to the cluster, we will see console output similar to what we show below:
INFO Applying services default.ambassador
INFO Creating non-existent services default.ambassador
INFO Applying services default.statsd-sink
INFO Creating non-existent services default.statsd-sink
INFO Applying services default.ambassador-admin
INFO Creating non-existent services default.ambassador-admin
INFO Applying roles default.ambassador
INFO Creating non-existent roles default.ambassador
INFO Applying serviceaccounts default.ambassador
INFO Creating non-existent serviceaccounts default.ambassador
INFO Applying rolebindings default.ambassador
INFO Creating non-existent rolebindings default.ambassador
INFO Applying services default.k8s-dashboard
INFO Creating non-existent services default.k8s-dashboard
INFO Applying deployments default.ambassador
INFO Creating non-existent deployments default.ambassador
As we can see above, the Kubeflow components were installed on GKE remotely. We'll confirm those components are on the kubernetes cluster in a moment, but first we want to take a quick look at what was deployed.
Kubeflow 0.3.5 Component Details
Quick notes on each component are listed below.
ambassador: the Kubeflow project uses Ambassador as a central point of authentication and routing for its services
jupyterhub: a multi-user hub that spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server
centraldashboard: a central navigation UI that makes it easy to navigate the components in Kubeflow
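The components listed above can also be inspected from inside the ksonnet application directory. The sketch below uses ambassador as the example component and the default environment; both are illustrative choices.

```shell
# Illustrative choices: inspect the ambassador component in the default environment.
COMPONENT="ambassador"
ENVIRONMENT="default"

if command -v ks >/dev/null 2>&1 && [ -f app.yaml ]; then
  ks component list                           # components generated in this app
  ks param list "${COMPONENT}"                # parameters filled in for the component
  ks show "${ENVIRONMENT}" -c "${COMPONENT}"  # the Kubernetes manifests it expands to
else
  echo "run this from inside the ksonnet app directory with ks installed" >&2
fi
```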
Effectively these components work together to provide a scalable and secure system for running machine learning jobs (notebook-based jobs and also jobs outside of notebooks). Given the rise of Kubernetes as an enterprise platform management system, it makes a lot of sense to have a way to manage our machine learning workloads in a similar manner.
The system also gives data scientists flexibility: they can use the library and language of their choice, in a notebook or outside of one, on Kubeflow. This quickly becomes compelling because a data scientist can move a workload from their laptop to an on-premise enterprise cloud or to a public cloud and leverage more hardware. This fits the pattern where machine learning jobs are typically prototyped on a laptop and, once validated, moved to a more powerful system for training and then model deployment.
At this point, we have installed the basic version of Kubeflow 0.3.5 on our remote GKE kubernetes cluster.
Confirming Kubeflow is Operational
To confirm our cluster is operational and the components are running, try the following command:
kubectl get services
We should see a list of components running that match the components we just installed on our cluster.
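For a more targeted check, the sketch below looks up the services named in the apply output earlier in this tutorial and then lists the pods behind them. The default namespace matches the one ksonnet reported when the application was initialized.

```shell
# Service names and namespace are taken from the apply output shown earlier.
NAMESPACE="default"
SERVICES="ambassador ambassador-admin statsd-sink k8s-dashboard"

if command -v kubectl >/dev/null 2>&1; then
  for service in ${SERVICES}; do
    kubectl get service "${service}" --namespace "${NAMESPACE}"
  done
  # Pods should reach the Running state once their images have been pulled.
  kubectl get pods --namespace "${NAMESPACE}"
else
  echo "kubectl not found; install it first" >&2
fi
```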