Event-driven multicluster autoscaling with Calisti

Cody Kai Hartsook

Friday, April 22nd, 2022

Multi-cluster service meshes have emerged as an architecture pattern to enable high availability, fault isolation, and failover for distributed applications. In my experience, this setup can also empower teams to run services across various cloud providers and on-premise resources. Why would this be advantageous? Aside from mitigating vendor lock-in, it lets teams take advantage of differences in infrastructure resource availability, scalability, and cost. Within a hybrid-cloud mesh, for instance, we can manage the tradeoff between scalability and cost by splitting or replicating workloads across on-prem and cloud-hosted clusters.

In this blog, we will explore a dynamic, or event-driven, method for replicating workloads across a multi-cluster service mesh. We will create an event-driven autoscaler, utilizing service-mesh properties and APIs from Calisti in addition to Kubernetes’ client-go. This model may be of particular interest for hybrid cloud meshes to implement cloud-bursting, where demand spikes trigger a burst of on-prem services into the cloud. With this in mind, we will design the autoscaler for a primary/peer service mesh setup. Building upon the concept of Kubernetes' horizontal-pod-autoscaler, we can ingest host-level as well as application-level metrics to inform scaling events.

Section 1: Multi-Cluster Mesh Setup

Let us begin by installing Calisti on our primary and peer clusters, creating a multi-cluster, single mesh control plane.

On our peer cluster:

smm install -a

On our primary cluster:

smm install -a
smm istio cluster attach <PEER_CLUSTER_KUBECONFIG_FILE>
smm istio cluster status

Check out the Calisti docs for more installation and usage details. Once both clusters are attached, forming a single mesh, we can deploy an application spanning the mesh. As we will see, cross-cluster service discovery enables microservices in both clusters to communicate with each other. This is a consequence of the namespace sameness concept, which dictates that all services across the clusters are shared by default.

smm demoapp install -s frontpage,catalog,bookings,postgresql

smm -c <PEER_CLUSTER_KUBECONFIG_FILE> demoapp install -s movies,payments,notifications,analytics,database,mysql --peer

Now that our mesh is set up, we can turn to implementing the autoscaler. As mentioned, we will utilize two primary building blocks: Calisti and Kubernetes’ client-go. We will define the autoscaler as a control loop with a listener, informer, and work queue. To utilize service-level metrics, the listener will periodically query Calisti’s graphql API. The informer will then compare these metrics to scaling and reconciliation policies to determine whether our microservices need to be replicated across the mesh.
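Before moving on to policies, here is a minimal sketch of that listener/informer/work-queue shape. All names here (event, fetchRPS, the 100-rps threshold) are illustrative placeholders, not Calisti APIs; the real listener would issue the graphql queries shown later.

```go
package main

import (
	"fmt"
	"time"
)

// event is a scaling action produced by the informer for the worker.
type event struct{ app, action string }

// fetchRPS is a stand-in for querying Calisti's graphql metrics API.
func fetchRPS() float64 { return 130 }

func main() {
	queue := make(chan event, 16)

	// listener: poll service metrics on an interval.
	go func() {
		ticker := time.NewTicker(200 * time.Millisecond)
		defer ticker.Stop()
		for range ticker.C {
			rps := fetchRPS()
			// informer: compare metrics to the policy and enqueue work.
			if rps > 100 {
				queue <- event{app: "bookings", action: "burst"}
			}
		}
	}()

	// worker: drain the queue and perform replication.
	for ev := range queue {
		fmt.Printf("handling %s for app=%s\n", ev.action, ev.app)
		return // one event is enough for this sketch
	}
}
```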

Section 2: Policies

Before we implement these components, we will first define an event policy configuration. Since we aim to prioritize resource availability and cost, our policies could take into account three levels of metrics: provider metrics (cost), host metrics (resource availability), and service metrics (health). For the following examples, we will create a policy for service metrics, namely service requests-per-second.

kind: multi-cluster-autoscaler
metadata:
  namespace: smm-demo
spec:
  groupVersionKind:
    kind: Deployment

  selector:
    app: bookings

  policy:
    type: "rps-burst"
    burst-value: 100
    reconcile-value: 60
    throttle-delay: 120

This event policy indicates that our control loop should watch requests-per-second metrics for the bookings microservice, of type Deployment. The event is triggered if over 100 requests-per-second (rps) are measured for a period of 120 seconds, at which point the controller should replicate, or burst, bookings into the peer cluster. Conversely, if the bookings deployment receives less than 60 rps for 120 seconds, the traffic and microservice should be scaled back to the primary cluster. Note that these policies could be implemented as Kubernetes custom-resource-definitions, but for simplicity, we will stick with yaml configs.

Section 3: Metrics

For all host and service level metrics we can utilize Calisti's graphql API. For instance, we can retrieve the requests-per-second received by the bookings microservice by sending a query to http://127.0.0.1:50500/api/graphql. Note that the Calisti dashboard must be running to access the API via localhost.

{
  service(name: "bookings", namespace: "smm-demo") {
    metrics(evaluationDurationSeconds: 5) {
      latencyP50
      rps
    }
  }
}
...
{
  "data": {
    "service": {
      "metrics": {
        "latencyP50": 0.08447085452073383,
        "rps": 30.281078348468693
      }
    }
  }
}

If our informer determines the policy is met, enqueuing an event, we will employ our own multi-cluster replication controller to replicate or reconcile runtime objects across clusters. Additionally, we will create a Calisti virtual service and route rule to split application traffic accordingly. The core of the autoscaler implementation lies in the Kubernetes multi-cluster replication controller. We will discuss two implementations: one that solely utilizes Kubernetes’ client-go scaffolding, and one that builds upon Calisti's internal cluster-registry-controller.

Section 4: Multi-Cluster Replication Controller

Let’s first look at how we can create a multi-cluster replication controller using Kubernetes’ client-go library. Our control flow will be as follows: upon replication or scale-out, retrieve the desired runtime spec from the primary cluster, then apply the in-memory spec to the peer cluster. Upon scale-back, simply remove the resources from the peer cluster. For the following example, we will show sample code from the Deployment multi-cluster replication handler. Note that the implementation is practically identical for all k8s core types, given their respective clientset interfaces.

Given an app label or deployment name from the informer, we can retrieve the desired runtime object spec.

func (d *DeploymentHandler) GetDeploymentsByAppLabel(
	cl *kubernetes.Clientset,
	ns string,
	app string) (*app.DeploymentList, error) {

	client := cl.AppsV1().Deployments(ns)
	deployments, err := client.List(context.TODO(), metav1.ListOptions{
		LabelSelector: fmt.Sprintf("app=%s", app),
	})

	if err != nil {...}

	return deployments, nil
}

We then must be able to create a deployment given the in-memory specification that was retrieved.

func (d *DeploymentHandler) createDeployment(
    cl *kubernetes.Clientset,
    ns string,
    deployment *app.Deployment,
    ) error {

    client := cl.AppsV1().Deployments(ns)

    _, err := client.Create(context.TODO(),
        deployment, metav1.CreateOptions{})

    if err != nil {...}

    ctx, cancel := context.WithTimeout(context.Background(), time.Second*60)
    defer cancel()

    // signal when pods are available
    err = WaitForRCPods(cl, ctx, deployment.Spec.Template.Labels["app"], ns, int(*deployment.Spec.Replicas))
    log.Println("all replicas up")
    return err
}

Providing a clientset and deployment spec, we create the deployment and wait for all pod replicas to be available, or interrupt after 60 seconds. We can ensure pods are up by watching the status of pods that belong to the k8s replication controller, in this case, the deployment.

watch, err := cl.CoreV1().Pods(ns).Watch(context.TODO(), metav1.ListOptions{
    LabelSelector: fmt.Sprintf("app=%s",
      rcLabel),
  })
...
for event := range watch.ResultChan() {
        p, ok := event.Object.(*cv1.Pod)
        if !ok {...}

        // check status of pods
        switch p.Status.Phase {
        case "Pending":
            ...
        case "Running":
            ...
        }
}

We will now tie these two primary functionalities together to complete the cross-cluster replication. First, retrieve the desired spec using the primary cluster’s clientset, then apply the spec to the peer cluster using the peer cluster’s clientset.

func (d *DeploymentHandler) Replicate(
    clSource,
    clTarget *kubernetes.Clientset,
    ns string,
    application string) []error {

    deployments, err := d.GetDeploymentsByAppLabel(clSource, ns, application)
    if err != nil {...}

    for _, deployment := range deployments.Items {
        deepCpy := deployment.DeepCopy()
        deepCpy.ResourceVersion = ""  // could add uuid tag or peer-cluster id

        err = d.createDeployment(clTarget, ns, deepCpy)
        if err != nil {...}
    }

    return nil
}

Each time a runtime object is replicated to a peer cluster, we must also replicate any corresponding services. This will enable our virtual service to seamlessly split traffic between the primary and replicated services. Service type replication can be done in the same manner as our Deployment handler examples.

With these client methods, we can dynamically move Kubernetes resources between participating clusters. As mentioned, we can also achieve the cross-cluster replication functionality by building a control layer on top of Calisti’s internal cluster-registry-controller. The registry controller is responsible for synchronizing Kubernetes resources across clusters according to certain rules, defined by a custom-resource-definition (CRD). For instance, the following ResourceSyncRule CRD may be used to synchronize or copy the matched Secret to all participating clusters.

apiVersion: clusterregistry.k8s.cisco.com/v1alpha1
kind: ResourceSyncRule
metadata:
  name: test-secret-sink
spec:
  groupVersionKind:
    kind: Secret
    version: v1
  rules:
  - match:
    - objectKey:
        name: test-secret
        namespace: cluster-registry

Using these rules, we can redefine our multi-cluster replication control flow. Upon replication or scale-out, create a ResourceSyncRule for the desired runtime objects and associated services on the primary cluster. This will synchronize these objects to the peer cluster(s). Upon scale back, remove the ResourceSyncRules on the primary cluster and remove the associated resources on the peer cluster.

To utilize the cluster-registry-controller for replication we will first generate a clientset for the cluster-registry public CRDs using Kubernetes’ code-generator. This will give us a type-safe method for listing, creating, and deleting the defined custom-resource-definitions. With the generated ResourceSyncRule clientset, creating and deleting the CRD is no different from core Kubernetes objects.

ruleSpec := clusterregistryv1alpha1.ResourceSyncRule{...}

rule, err := ruleCRDClient.Create(context.TODO(), ruleSpec, metav1.CreateOptions{})

The cluster-registry-controller takes on the burden of replicating resources into the peer cluster, but we will still use client-go to remove objects upon reconcile. We can reuse a helper function, shared by both implementations, to retrieve the correct deletion function for all resource types.

func GetDeleter(cl *cls.Clientsets, kind, ns string) (func(context.Context, string, metav1.DeleteOptions) error, error) {
	var deleter func(context.Context, string, metav1.DeleteOptions) error

	switch kind {
	case "ResourceSyncRule":
		deleter = cl.ResourceSyncRuleV1(ns).Delete
	case "Deployment":
		deleter = cl.AppsV1().Deployments(ns).Delete
	case "Statefulset":
		deleter = cl.AppsV1().StatefulSets(ns).Delete
	case "Daemonset":
		deleter = cl.AppsV1().DaemonSets(ns).Delete
	case "Pod":
		deleter = cl.CoreV1().Pods(ns).Delete
	case "Service":
		deleter = cl.CoreV1().Services(ns).Delete
	default:
		return nil, fmt.Errorf("unsupported kind: %v", kind)
	}
	return deleter, nil
}

Upon a reconcile policy trigger, we call the deleter for the ResourceSyncRule in the primary cluster and the replicated core type resources in the peer cluster.

func (r *ResourceSyncHandler) Reconcile(clPrimary, clPeer *cls.Clientsets, resourceName, kind, ns string) error {
	deleter, err := GetDeleter(clPrimary, "ResourceSyncRule", ns)
	if err != nil {...}

	ruleName := rulePrefix + resourceName
	err = deleter(context.TODO(), ruleName, metav1.DeleteOptions{})
	if err != nil {...}

	deleter, err = GetDeleter(clPeer, kind, ns)
	if err != nil {...}
	err = deleter(context.TODO(), resourceName, metav1.DeleteOptions{})
	return err
}

Section 5: Traffic Shifting

The final piece of this autoscaler implementation is traffic shifting. When a policy is met and resources are replicated, we will create a Calisti virtual service. The virtual service will split traffic between the microservice in the primary cluster and the replicated version in the peer cluster. We can define destination weights to tell the virtual service how much traffic to send to each of the two microservices. This can be accomplished with a graphql mutation query. Here we have a sample mutation query that creates a virtual service with two service destinations and their weights. Note that these services are in separate clusters.

applyHTTPRoute(
    input: {
      selector: {
        namespace: "smm-demo"
        hosts: ["bookings"]
      }
      rule: {
        route: [
          {
            destination: { host: "bookings", port: { number: 8080 } }
            weight: 75
          }
          {
            destination: { host: "bookings-repl", port: { number: 8080 } }
            weight: 25
          }
        ]
      }
    }
  )

In our autoscaler controller, we can either use a Go graphql client or marshal this query into JSON and send an HTTP POST request to the Calisti graphql API. Note that when sending a request to the API outside of the graphql console, we will need to provide the authentication cookie generated by the smm dashboard command. To acquire this cookie, we can inspect any request from the Calisti UI to the Calisti graphql API.
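As a sketch of the second option, assuming the API accepts the standard `{"query": ...}` JSON envelope and a session cookie header, the request could be built as follows. The helper name and the cookie value are placeholders; copy the real cookie from a request observed in the Calisti UI.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// newMutationRequest wraps a graphql mutation in a JSON body and attaches
// the dashboard auth cookie. The cookie string here is a placeholder.
func newMutationRequest(url, mutation, cookie string) (*http.Request, error) {
	body, err := json.Marshal(map[string]string{"query": mutation})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Cookie", cookie)
	return req, nil
}

func main() {
	// the mutation body is elided; the applyHTTPRoute input follows the sample above
	mutation := `mutation { applyHTTPRoute(input: { ... }) }`
	req, err := newMutationRequest(
		"http://127.0.0.1:50500/api/graphql", mutation, "auth=<COOKIE_FROM_DASHBOARD>")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
	// the request would then be sent with http.DefaultClient.Do(req)
}
```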

Once the virtual service takes effect, we should see the replicated microservice appear in the service mesh as it handles application traffic.

Section 6: Autoscaler Demo

Now that we have defined the core components, let’s run the completed autoscaler and apply a sample requests-per-second event policy for the bookings microservice. For demonstration purposes, we will choose rps values relative to Calisti's demo-app traffic generator: a burst-value of 40 rps and a reconcile-value of 20 rps. Prior to execution, we can confirm that the bookings microservice is within our primary cluster.

replctl apply bookings-controller.yaml

To quickly test the controller we can force a cross-cluster replication event by generating additional load to the bookings service via Calisti’s per-service HTTP load generator. Specifying the service, port, and method, we will generate 100 requests-per-second for a period of 30 seconds.

Checking the controller logs, we should eventually see an event triggered as the service’s 5-second average for requests-per-second surpasses the burst-value.

…
burst triggered for app=bookings
2022/03/27 14:21:55 deployments created and being evaluated
2022/03/27 14:21:55 Waiting for 2 pods to be running.
2022/03/27 14:21:56 pod status: Pending
2022/03/27 14:21:56 pod status: Pending
2022/03/27 14:21:56 pod status: Pending
2022/03/27 14:21:56 pod status: Pending
2022/03/27 14:21:58 pod status: Pending
2022/03/27 14:21:59 pod status: Pending
2022/03/27 14:22:01 all replicas up
2022/03/27 14:22:01 setting v-service route...
2022/03/27 14:22:01 route set.

We can verify that the virtual service and destination rules were added to the bookings services.

We can see that there is a route policy splitting traffic between the bookings service and the bookings-repl service in the peer cluster. If we again check the topology of the mesh, we should see the new bookings deployment in the peer cluster.

The topology confirms that the new deployment is up and is routing traffic to downstream microservices in the peer cluster.

Since we added a short burst of artificial HTTP load, the received requests-per-second will eventually fall back below our event policy’s reconcile-value. This will trigger a reconcile or scale-back event, removing the traffic rule and deployment from the peer cluster.

Summary

This blog highlighted how a service mesh framework, namely Calisti, can be leveraged to dynamically scale or replicate services across a multi-cluster mesh. Using Calisti's graphql API we were able to seamlessly extract service level metrics to inform scaling events. Furthermore, utilizing Kubernetes' client-go and Calisti's cluster-registry-controller, we were able to replicate and reconcile Kubernetes objects across clusters. This is intended to be a starting point for anyone interested in service meshes, cloud-native automation, and of course, Kubernetes.

References
Calisti docs

k8s-client-go

cluster-registry-api
