Over the past month, I’ve been working to add the capability for Compute Studio (C/S) to host plots made with Dash/Plotly and Bokeh. This work also paves the way for supporting Shiny and other plotting libraries. For quite some time, I've admired how easy and fast it is to create valuable tools for exploring data using Bokeh or Dash. It makes me incredibly excited that C/S can now host apps created with them. This is also a way for C/S to meet developers where they are by supporting the tools that they already use every day. In turn, C/S offers them the capability to share their work with colleagues, friends, or the public at large.
As always, publishing apps on C/S is free, but either the app developer, app sponsor, or app user must pay for the compute costs. One of the requirements when building this out was to make sure that the apps are only running when they are being used. This resulted in a low-cost way for developers to deploy and share their Bokeh and Dash/Plotly creations.
If you read this and are interested in publishing your own app on C/S, head on over to the Publish page. Not sure where to get started? Send me an email at hank at compute.studio and I'll walk you through the publishing process.
One of the things that I’ve learned in the short time that I’ve been working with Kubernetes is that you should start simple and check your progress often. So before I started trying to automate anything with the Kubernetes Python API, I created a couple of basic YAML files describing the Ingress objects needed to serve my example project.
Once I got this to work, I knew that I was going in the right direction and could more confidently write the Python code required to automatically create the necessary Kubernetes objects.
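To make the "start simple" step concrete, here is a minimal sketch of what such hand-written manifests might contain, expressed as plain Python dicts so they can later be handed to the Kubernetes Python API. The function name `build_manifests`, the resource name, the image, and the port are illustrative assumptions (8050 is Dash's default port); none of them come from the C/S codebase.

```python
import json


def build_manifests(name, image, port=8050, namespace="default"):
    """Return (deployment, service) manifests as plain dicts.

    All names and values here are hypothetical placeholders.
    """
    labels = {"app": name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The Service finds the Deployment's pods through the shared labels.
            "selector": labels,
            "ports": [{"port": 80, "targetPort": port}],
        },
    }
    return deployment, service


deployment, service = build_manifests("hdoupe-myviz", "example/myviz:v1")
# kubectl accepts JSON as well as YAML, so the dicts can be dumped and applied by hand.
print(json.dumps(deployment, indent=2))
print(json.dumps(service, indent=2))
```

Keeping the manifests as data makes it easy to check them by hand first and only later wrap them in a class that creates and deletes them automatically.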
Next, I translated the YAML files into a new
Server class in the
package to be used to automatically create, delete, and check the
status of visualization deployments and services.
    from cs_workers.models.clients import Server

    viz = Server(
        owner="hdoupe",
        title="myviz",
        tag="v1",
        callable_name="start_dash_server",
        incluster=False,
    )

    # Create a visualization
    viz.create()

    # Check visualization status
    viz.ready_stats()

    # Delete visualization
    viz.delete()
After starting at the low level of just getting the visualization server running and viewing it in my browser, I needed to see it embedded on a C/S webpage. Beyond just being exciting, this was a helpful sanity check to make sure that embedding it as an IFrame was the right approach.
Next, it was time to share some early progress. I deployed it to the C/S development site and got a big message from my browser saying that I had a problem, specifically an HTTPS problem. For good reason, you can’t embed a link on a site with HTTPS turned on if the embedded link does not support HTTPS. So, I set about learning how to add TLS to a service in a Kubernetes cluster.
Traefik is a routing service that can be deployed several ways including with Docker and Kubernetes. Traefik routes traffic from your domain to your services based on simple rules such as:
Host(`viz.compute.studio`) && PathPrefix(`/hdoupe/ccc-widget/dash/`)
which corresponds to this URL:
Further, it’s straightforward to integrate with your DNS provider and Let’s Encrypt to make your routes secure with TLS and HTTPS.
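As a small illustration of how a rule like the one above relates to URLs, the sketch below builds the rule string and checks whether a URL would fall under it. The helper names and the `/{owner}/{title}/dash/` path layout are assumptions inferred from the example rule, and `rule_matches` is a deliberately simplified stand-in for Traefik's real matching logic.

```python
from urllib.parse import urlsplit


def viz_path_prefix(owner, title, kind="dash"):
    """Hypothetical path layout inferred from the example rule."""
    return f"/{owner}/{title}/{kind}/"


def traefik_rule(host, path_prefix):
    """Build a Traefik rule string combining a Host and a PathPrefix matcher."""
    return f"Host(`{host}`) && PathPrefix(`{path_prefix}`)"


def rule_matches(host, path_prefix, url):
    """Simplified check: same host and the URL path starts with the prefix."""
    parts = urlsplit(url)
    return parts.hostname == host and parts.path.startswith(path_prefix)


prefix = viz_path_prefix("hdoupe", "ccc-widget")
print(traefik_rule("viz.compute.studio", prefix))
print(rule_matches("viz.compute.studio", prefix,
                   "https://viz.compute.studio/hdoupe/ccc-widget/dash/"))
```

Because each visualization gets its own path prefix, one hostname can front many deployments, and adding or removing a route does not affect the others.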
One of the cool things about open-source software is that you can learn a lot just by following a project and seeing how they approach problems. For example, I learned a lot about Traefik by reading about how JupyterHub and Dask-Gateway use it. This was an enormous help when setting Traefik up for this project.
I’ll spare the technical details for this section, but the
take-away is that the viz was live at
and rendering correctly as an IFrame on the development site.
Now that the cluster is more-or-less ready to start serving visualizations, it’s time to bring in the brains of the operation, aka the frontend of C/S. A key design decision for C/S is that the compute cluster is simple but reliable. The frontend is where most data is stored and where higher-level cluster automation logic lives.
There are three data sources that are important for this project:
frame-ancestors CSP to match the website where the viz will be embedded.
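For the `frame-ancestors` piece, a minimal sketch of building the response header might look like the following. The function name and the `https://compute.studio` origin in the usage line are illustrative assumptions, not taken from the C/S codebase.

```python
def frame_ancestors_header(*allowed_origins):
    """Build a Content-Security-Policy header restricting who may embed the page.

    With no origins given, embedding is disallowed everywhere ('none').
    """
    sources = " ".join(allowed_origins) if allowed_origins else "'none'"
    return {"Content-Security-Policy": f"frame-ancestors {sources}"}


# Hypothetical usage: only allow the site that hosts the IFrame to embed the viz.
print(frame_ancestors_header("https://compute.studio"))
```

The browser enforces this policy, so a viz served with it can only be framed by the listed origins.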
We now have the data we need to answer some key questions, and we have the operations needed to act on these answers. The general lifecycle of a visualization is as follows:
A user goes to the visualization page.
API endpoint letting it know the deployment is still in use. The
snippet only “phones home” while
is True. If the user navigates to another tab, opens another
window, or their computer falls asleep, the document loses focus
and the page no longer phones home.
Another user comes to the visualization page.
Is there a deployment running? Yes! The timestamp of the deployment’s most recent full page load is updated, and the visualization page continues to phone home as long as the user’s webpage has focus.
It’s been some time since we’ve heard from the viz.
Is anyone still using this deployment?
Every 15 minutes a process runs that asks this for every live
deployment. If there hasn’t been a full page load or ping in
over 30 minutes, the deployment is spun down. The frontend does
this by doing a
request to a REST API endpoint on the compute cluster.
These rules will need some tweaking. Bots caused problems early on
because their constant pinging of the viz page meant that the
deployment was never spun down. This was corrected to some extent by
adding the viz pages to the
file. However, there are more measures that can be taken in the
future to mitigate this.
The check for deleting deployments could also be tweaked by setting a different stale-after time limit for the most recent load time and most recent ping time.
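The spin-down rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the C/S implementation: the `Deployment` dataclass, its field names, and the function names are hypothetical, and the two limits default to the 30 minutes mentioned above but can be set independently, as suggested.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record the frontend keeps for each live deployment."""
    name: str
    last_loaded_at: datetime                 # most recent full page load
    last_ping_at: Optional[datetime] = None  # most recent "phone home" ping


def is_stale(dep, now,
             load_limit=timedelta(minutes=30),
             ping_limit=timedelta(minutes=30)):
    """A deployment is stale only if both the last load and last ping are too old."""
    load_is_old = now - dep.last_loaded_at > load_limit
    ping_is_old = dep.last_ping_at is None or now - dep.last_ping_at > ping_limit
    return load_is_old and ping_is_old


def stale_deployments(deployments, now, **limits):
    """Return the deployments that the periodic job should spin down."""
    return [dep for dep in deployments if is_stale(dep, now, **limits)]


now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
live = [
    Deployment("in-use", now - timedelta(minutes=45), now - timedelta(minutes=5)),
    Deployment("idle", now - timedelta(minutes=45), now - timedelta(minutes=40)),
]
print([dep.name for dep in stale_deployments(live, now)])
```

A job that runs every 15 minutes would call `stale_deployments` and then ask the compute cluster to remove each result. Passing a shorter `ping_limit` than `load_limit` is one way to try the separate time limits.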
Now that some of the Kubernetes resources are controlled by a REST API endpoint, I needed to make sure that not just anyone can manipulate the compute cluster. After some research, it seemed that there were two approaches for authenticating requests to the compute cluster: OAuth 2.0 and JSON Web Tokens (JWTs). I noticed that JupyterHub uses OAuth, but for our situation, adding OAuth seemed like it would add a great deal of complexity to the architecture of C/S since it required an external authentication service. In the end, the simplest way forward was a JWT approach where the frontend and compute cluster use a securely stored shared secret to encode and decode JWTs. When the cluster receives a request, it checks that it can decode the JWT in the request header with the shared secret and rejects the request otherwise.
Time to deploy! The capability to host Bokeh and Dash/Plotly apps has been live for about 10 days now. Deployments are being spun up and down quite smoothly. This isn’t to say that there haven’t been any bugs. However, the primary components of serving apps to users and spinning them up and down quickly have worked as expected.
Here are the numbers for the first Dash app published on Compute Studio: It’s been live for 10 days, has been deployed 54 times and has been running for about 25% of the time during this period. On average, each deployment is live for about an hour.
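As a quick check that these numbers are consistent with each other, the arithmetic can be worked through in a few lines. The variable names are just for illustration; the inputs are the figures quoted above.

```python
days_live = 10
deployments = 54
fraction_running = 0.25

hours_live = days_live * 24                    # 240 hours in the period
hours_running = hours_live * fraction_running  # 60 hours with a live deployment
avg_hours_per_deployment = hours_running / deployments

print(f"{avg_hours_per_deployment:.2f} hours per deployment")
```

Sixty running hours spread over 54 deployments comes to a little over an hour each, which matches the "about an hour" figure.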
Over the past few months, I have spent most of my time improving Compute Studio’s infrastructure to be more scalable, more maintainable, and cheaper. Each of these projects has pushed me to learn a different part of the Kubernetes API. By the end of each project, I feel like I have gained at least twice as much Kubernetes and general infrastructure knowledge as I had at the beginning. It’s been an immensely rewarding (and at times frustrating) experience.
This project had two major pain points and I hope that others learn from them:
IngressRoute object, which is a Custom Resource Definition. You are looking for the
CustomObjectsApi, and here’s an example of how to use it with Traefik’s
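A minimal sketch of that pattern follows, assuming Traefik v2's `traefik.containo.us/v1alpha1` CRD group (newer Traefik releases use a different group) and an entry point named `websecure`. The function names, resource names, and the certificate resolver name are hypothetical; `create_namespaced_custom_object` is the generic call that the Kubernetes Python client's `CustomObjectsApi` provides for custom resources.

```python
def ingressroute_body(name, namespace, host, path_prefix,
                      service_name, service_port, cert_resolver=None):
    """Build an IngressRoute manifest as a plain dict (Traefik v2 CRD assumed)."""
    body = {
        "apiVersion": "traefik.containo.us/v1alpha1",
        "kind": "IngressRoute",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "entryPoints": ["websecure"],  # assumed entry point name
            "routes": [
                {
                    "match": f"Host(`{host}`) && PathPrefix(`{path_prefix}`)",
                    "kind": "Rule",
                    "services": [{"name": service_name, "port": service_port}],
                }
            ],
        },
    }
    if cert_resolver:
        body["spec"]["tls"] = {"certResolver": cert_resolver}
    return body


def create_ingressroute(body, incluster=False):
    """Submit the manifest through CustomObjectsApi (requires cluster access)."""
    # Imported here so the manifest builder works without the client installed.
    from kubernetes import client, config

    if incluster:
        config.load_incluster_config()
    else:
        config.load_kube_config()
    api = client.CustomObjectsApi()
    return api.create_namespaced_custom_object(
        group="traefik.containo.us",
        version="v1alpha1",
        namespace=body["metadata"]["namespace"],
        plural="ingressroutes",
        body=body,
    )


body = ingressroute_body(
    name="hdoupe-ccc-widget",
    namespace="default",
    host="viz.compute.studio",
    path_prefix="/hdoupe/ccc-widget/dash/",
    service_name="hdoupe-ccc-widget",
    service_port=80,
    cert_resolver="letsencrypt",
)
print(body["spec"]["routes"][0]["match"])
```

Unlike built-in objects such as Deployments, custom resources have no typed client class, so the group, version, and plural name have to be passed explicitly.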
Thanks for reading! As always, I’d love to hear your feedback either about the visualizations or the engineering behind them. Please feel welcome to email me at hank at compute.studio, join our community chat, or open an issue at the compute-studio repo.