gcloud-monitor

gcloud-monitor is a repository of tasks that monitor and restart preemptible Google Cloud Platform Compute Engine instances to maximize availability.

gcloud-monitor was spun out of the streetcred project, born of necessity during its deployment. streetcred compiles various traffic indicators from LTA’s DataMall every 15 minutes and makes the historic data publicly available on Google Cloud Storage. I’m running streetcred on a Google Compute Engine instance. The monthly free instance was not powerful enough to complete the traffic image processing tasks, so the next best way to manage costs (while using more compute) was preemptible instances. They’re decent, with a couple of drawbacks, and gcloud-monitor was designed to address those drawbacks and make preemptible instances almost perfect for this use case.

What is a preemptible instance?

The major difference between preemptible and regular Compute Engine instances is that a preemptible instance may be stopped at any time if other tasks require the compute resources. In addition, preemptible instances run for a maximum of 24 hours, after which they are terminated. In exchange for this severe limitation, preemptible instances are considerably cheaper to run than regular instances.

What tasks are ideal for preemptible instances?

Since it is not possible to predict exactly when a machine may go down, such instances are only advisable for fault-tolerant jobs that will not be catastrophically affected by a shutdown in the middle of a run. Furthermore, these instances are not covered by any SLA, so they are better suited to non-critical tasks where cost savings take priority (such as streetcred).

Next, here are 4 reasons why I created gcloud-monitor:

1. Preemptible instances are cheaper

As mentioned, preemptible instances are much cheaper than regular instances. How much cheaper? According to Google, savings are up to 80%. For example, an n1-standard-1 instance (1 vCPU, 3.75 GB memory) running in Iowa would cost US$24.27 a month as a regular instance and US$7.30 a month as a preemptible instance. That’s almost 70% cheaper, which makes a lot of difference for a weekend project. Put another way, with the same budget I could run the project roughly three times as long.
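
To make the comparison concrete, here is the arithmetic behind those figures as a small Python sketch. The two prices are the ones quoted above; the variable names are just for illustration.

```python
# Prices quoted above for an n1-standard-1 instance in Iowa (US$).
regular_cost = 24.27
preemptible_cost = 7.30

# Fraction of the regular price saved by going preemptible.
savings = 1 - preemptible_cost / regular_cost

# How much longer the same budget lasts on a preemptible instance.
runtime_ratio = regular_cost / preemptible_cost

print(f"Savings: {savings:.1%}")                     # Savings: 69.9%
print(f"Same budget lasts {runtime_ratio:.2f}x as long")  # 3.32x
```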

2. Preemptible instances last for a maximum of 24 hours

Next, the drawback. In practice, preemptible instances do not seem to be terminated often because the resources are needed elsewhere. In my few weeks of experimentation, terminations were always triggered by reaching the end of the 24 hour period. These instances are not deleted, just stopped. All I needed was a trigger to start them back up again.

3. Preemptible instances do not self-restart

While there is an automatic restart option for terminations that are not user-initiated, that option is not available for preemptible instances. And so came the first iteration of gcloud-monitor, which polled the Compute Engine API every minute to check whether the instance in question was running and, if not, sent a second call to start it.
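
For a feel of what such a check-and-restart loop can look like, here is a minimal Python sketch. It is not the code in the repository: it assumes the gcloud CLI is installed and authenticated on the monitoring machine and shells out to it rather than calling the API directly. The function names, and the run, sleep and cycles parameters (there so the loop can be exercised without touching a real instance), are all my own for illustration.

```python
import subprocess
import time


def instance_status(name, zone, run=subprocess.run):
    """Return the instance status string, e.g. RUNNING or TERMINATED."""
    result = run(
        ["gcloud", "compute", "instances", "describe", name,
         "--zone", zone, "--format=value(status)"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def ensure_running(name, zone, run=subprocess.run):
    """Start the instance if it is stopped. Returns True if a start was sent."""
    # Compute Engine reports a stopped instance as TERMINATED.
    if instance_status(name, zone, run) == "TERMINATED":
        run(["gcloud", "compute", "instances", "start", name, "--zone", zone],
            check=True)
        return True
    return False


def monitor(name, zone, interval_seconds=60, cycles=None,
            run=subprocess.run, sleep=time.sleep):
    """Check the instance every interval_seconds; loop forever if cycles is None."""
    done = 0
    while cycles is None or done < cycles:
        ensure_running(name, zone, run)
        sleep(interval_seconds)
        done += 1
```

Calling monitor("my-instance", "us-central1-a") (both names are placeholders) would check once a minute. The sketch only sends a start when the status is TERMINATED, rather than whenever it is not RUNNING, so that it does not fire while the instance is still in the middle of stopping.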

4. Preemptible instances do not terminate at a fixed time

Finally, although an instance can last as long as 24 hours, there are no guarantees as to when a termination will occur. Factoring in the additional minute required to detect the stop and restart the instance, the termination time actually shifts from day to day. streetcred compiles data on road conditions around Singapore every 15 minutes, so eventually the shutdown collided with a collection run and, while nothing crashed, that collection was missed. Yet the total time an instance is down can be as little as a minute a day, so in theory it is possible to get a relatively reliable instance whose terminations do not interfere with running tasks. To achieve this, I added a second function to gcloud-monitor, which stops the instance at a fixed time every day, outside a data collection run, so that the 24 hour counter resets at a time of my choosing.
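
Picking that fixed time comes down to finding a slot between collection runs. The Python sketch below shows one way to check a candidate time. Only the 15 minute interval comes from streetcred; the assumptions that runs start on the quarter hour, that a run takes about 5 minutes, and that a stop and restart takes about 2 minutes are placeholders I made up for the example, as are the function names.

```python
from datetime import time

COLLECTION_INTERVAL_MIN = 15  # streetcred collects every 15 minutes
ASSUMED_RUN_MIN = 5           # hypothetical: how long one collection run takes
ASSUMED_RESTART_MIN = 2       # hypothetical: time to stop and restart the instance


def minutes_into_interval(t):
    """Minutes elapsed since the most recent quarter-hour boundary."""
    return (t.hour * 60 + t.minute) % COLLECTION_INTERVAL_MIN


def is_safe_reset_time(t, run_minutes=ASSUMED_RUN_MIN,
                       restart_minutes=ASSUMED_RESTART_MIN):
    """True if a reset starting at t avoids the current and the next run."""
    offset = minutes_into_interval(t)
    after_current_run = offset >= run_minutes
    back_before_next_run = offset + restart_minutes <= COLLECTION_INTERVAL_MIN
    return after_current_run and back_before_next_run


print(is_safe_reset_time(time(3, 7)))   # True: run finished, next is 8 minutes away
print(is_safe_reset_time(time(3, 0)))   # False: a collection run is starting
print(is_safe_reset_time(time(3, 14)))  # False: not back up before 03:15
```

With a safe time chosen, the daily reset itself could be as simple as a scheduled gcloud compute instances stop, leaving the regular check to bring the instance back up.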

But wait, where does gcloud-monitor run?

For the monitoring to work, gcloud-monitor had to run on a separate instance that was not preemptible. Fortunately, gcloud-monitor needs very few resources and can run on the instance provided under the always free tier, so it does not add to the cost of the project.

And that brings gcloud-monitor to where it is today! I found it really helpful in allowing me to make the most of the preemptible instance, and I hope it will be helpful to some of you too. Here’s a link to the GitHub repository.
