
Horizontal Pod Autoscaling in Kubernetes


In this article, you’ll learn how to perform horizontal pod autoscaling in Kubernetes based on resource metrics such as CPU and memory (RAM), or on custom metrics. In Kubernetes, Horizontal Pod Autoscaling (HPA) is configured through the HorizontalPodAutoscaler API resource.

Horizontal Pod Autoscaler

The Horizontal Pod Autoscaler automatically adjusts the number of pod replicas of a workload, such as a Deployment, based on observed metrics.

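The autoscaling/v2 API version is stable (since Kubernetes 1.23) and is the recommended version. In addition to CPU utilization-based autoscaling, it supports memory, multiple metrics, and custom and external metrics. The older autoscaling/v1 supports only a CPU utilization target.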

Supported API versions for your cluster can be found using the following command:

$ kubectl api-versions | grep autoscaling

The command lists every autoscaling API version that your cluster supports; in my cluster, all three versions were listed.
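If you want to script this check, for example before applying an autoscaling/v2 manifest, a yes/no answer is more convenient than reading the list. Below is a minimal sketch; `has_api_version` is a hypothetical helper (not a kubectl feature) that reads the output of `kubectl api-versions` from standard input, so it can be tried without a cluster.

```shell
#!/usr/bin/env bash
# has_api_version is a hypothetical helper, not part of kubectl.
# It succeeds (exit status 0) when the exact API version given as $1
# appears as a whole line on standard input.
has_api_version() {
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
  grep -qxF -- "$1"
}

# Usage against a live cluster (requires kubectl and cluster access):
#   if kubectl api-versions | has_api_version autoscaling/v2; then
#     echo "autoscaling/v2 is available"
#   fi
```

Matching the whole line (`-x`) avoids a false positive such as `autoscaling/v2beta2` satisfying a check for `autoscaling/v2`.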


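Note: The Horizontal Pod Autoscaler relies on Metrics Server (or another provider of the Kubernetes Metrics API) being installed in the cluster. Metrics Server collects container resource metrics, such as CPU and memory usage, from the kubelets and exposes them through the Metrics API.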


How to Use Horizontal Pod Autoscaling in Kubernetes

Step-1: Installation of Metrics Server

1.1) Using kubectl

You can check whether Metrics Server is installed with the following command:

$ kubectl top pods

error: Metrics API not available

As you can see, Metrics Server isn’t installed in my cluster yet.

If Metrics Server is not installed, run the following command to install it:

$ kubectl apply -f <>

serviceaccount/metrics-server created
...
service/metrics-server created
deployment.apps/metrics-server created
...

1.2) Using Helm (Preferred)

Set up Helm first if you haven’t already.

Add the Metrics Server Helm repository to your local Helm repositories:

$ helm repo add metrics-server <>

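Now install Metrics Server with Helm. If your kubelets use self-signed serving certificates (common on local clusters such as minikube), add the --kubelet-insecure-tls argument, which skips verification of the kubelet certificates: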

$ helm upgrade --install metrics-server metrics-server/metrics-server
# OR
$ helm upgrade --install --set 'args={--kubelet-insecure-tls}' metrics-server metrics-server/metrics-server

Release "metrics-server" does not exist. Installing it now.
NAME: metrics-server
LAST DEPLOYED: Fri Aug 19 17:25:27 2022
NAMESPACE: linkmi-dev
STATUS: deployed
* Metrics Server                                                      *
  Chart version: 3.8.2
  App version:   0.6.1
  Image tag:

1.3) Verify Installation

Wait a minute or so for the first metrics to be collected, then run the top command again:

$ kubectl top pods

NAME                              CPU(cores)   MEMORY(bytes)   
metrics-server-85584546c9-bgdbk   6m           14Mi

Check nodes:

$ kubectl top nodes

NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
minikube   177m         8%     1539Mi          41%

Step-2: Set Up Horizontal Pod AutoScaling

The HorizontalPodAutoscaler object is available in more than one API version. This article uses autoscaling/v2, the recommended version at the time of writing, because it lets us define separate metrics for CPU and memory; autoscaling/v1 only supports a CPU utilization target.

Note: You can use your own application to try out autoscaling, or check out my other article on how to deploy a Python application in Kubernetes.

2.1) Create a deployment

First, create a Deployment that runs the Nginx image (demo-server-deployment.yaml):

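apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-servers
  labels:
    app: demo-servers
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-servers
  template:
    metadata:
      labels:
        app: demo-servers
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 150m
          requests:
            cpu: 75m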
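The HPA's `Utilization` target for CPU is measured as a percentage of the container's CPU request, not of its limit, which is why the Deployment sets `resources.requests.cpu`. According to the Kubernetes documentation, if a pod's containers lack the relevant request, utilization is undefined and the autoscaler takes no action for that metric. As a rough illustration, here is a hypothetical shell helper that does the same arithmetic for a single pod, in millicores with truncating integer division; the real HPA averages across all pods of the target.

```shell
#!/usr/bin/env bash
# cpu_utilization_pct is a hypothetical helper, not part of kubectl.
# $1: current CPU usage in millicores (e.g. 30 for "30m")
# $2: CPU request in millicores (e.g. 75 for "75m")
# Prints usage as a whole-number percentage of the request (truncated).
cpu_utilization_pct() {
  local usage_m=$1 request_m=$2
  if [ "$request_m" -le 0 ]; then
    echo "error: a positive CPU request is required" >&2
    return 1
  fi
  echo $(( usage_m * 100 / request_m ))
}

# Example: a pod using 30m against the 75m request above.
cpu_utilization_pct 30 75   # prints 40
```

At 40% that pod would already be well above the 15% CPU target used in the HPA manifest.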

2.2) Create a service


apiVersion: v1
kind: Service
metadata:
  labels:
    app: demo-servers
  name: demo-servers
spec:
  ports:
  - name: demo-servers-port
    port: 80
  selector:
    app: demo-servers
  sessionAffinity: None
  type: NodePort

2.3) Create HorizontalPodAutoscaler with autoscaling/v2 API Version

Finally, create a HorizontalPodAutoscaler that targets the demo-servers Deployment, using the autoscaling/v2 API version. It defines two metrics: average CPU utilization, as a percentage of the pods’ CPU request, and average memory usage per pod.


apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-servers
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-servers
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 15
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 20Mi
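With more than one metric, the Kubernetes documentation says the HPA computes a desired replica count for each metric separately and uses the largest, and the result stays within `minReplicas` and `maxReplicas`. The sketch below approximates that rule with hypothetical helper names, applying the documented per-metric formula `ceil(currentReplicas * currentValue / targetValue)`. It leaves out the controller's refinements (tolerance, handling of unready pods, scaling behavior policies), so treat it as an approximation only.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that approximate the HPA calculation; not part of kubectl.

# ceil_div A B: integer ceiling of A / B for non-negative A and positive B.
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# desired_for_metric CURRENT_REPLICAS CURRENT_VALUE TARGET_VALUE
# Both values must use the same unit (percent for CPU, bytes for memory).
desired_for_metric() {
  ceil_div $(( $1 * $2 )) "$3"
}

# hpa_desired MIN MAX CURRENT_REPLICAS "CURRENT:TARGET" ["CURRENT:TARGET" ...]
# Takes the largest per-metric recommendation, then clamps it to [MIN, MAX].
hpa_desired() {
  local min=$1 max=$2 current=$3 best=0 pair want
  shift 3
  for pair in "$@"; do
    want=$(desired_for_metric "$current" "${pair%%:*}" "${pair##*:}")
    if [ "$want" -gt "$best" ]; then best=$want; fi
  done
  if [ "$best" -lt "$min" ]; then best=$min; fi
  if [ "$best" -gt "$max" ]; then best=$max; fi
  echo "$best"
}
```

For example, `hpa_desired 1 5 1 46:15 5242880:20971520` compares a CPU reading of 46% against the 15% target with a hypothetical memory reading of 5Mi (5242880 bytes) against 20Mi (20971520 bytes); CPU asks for 4 replicas and memory for 1, so the sketch prints 4.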

Let’s check the HPA entries.

$ kubectl get hpa demo-servers

NAME           REFERENCE                 TARGETS                MINPODS   MAXPODS   REPLICAS   AGE
demo-servers   Deployment/demo-servers   7286784/20Mi, 0%/15%   1         5         1          11m
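The memory column mixes units: the current value is shown in bytes (7286784) while the target is shown as 20Mi. One Mi (mebibyte) is 1024 × 1024 = 1048576 bytes, so 20Mi is 20971520 bytes and 7286784 bytes is roughly 6.95Mi. A pair of hypothetical conversion helpers makes the comparison easier to read:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading HPA memory values; not part of kubectl.
# 1Mi (mebibyte) = 1024 * 1024 = 1048576 bytes.

# mi_to_bytes N: convert N mebibytes to bytes (integer arithmetic).
mi_to_bytes() {
  echo $(( $1 * 1048576 ))
}

# bytes_to_mi N: convert N bytes to mebibytes, printed with two decimals.
bytes_to_mi() {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1048576 }'
}

mi_to_bytes 20        # prints 20971520 (the target in the HPA manifest)
bytes_to_mi 7286784   # prints 6.95 (the current value in the output above)
```

So at this point the pod is using roughly a third of its 20Mi memory target.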

We can also use the describe subcommand to gather more information.

$ kubectl describe hpa demo-servers

Name:                                                  demo-servers
Namespace:                                             linkmi-dev
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Fri, 19 Aug 2022 19:59:33 +0545
Reference:                                             Deployment/demo-servers
Metrics:                                               ( current / target )
  resource memory on pods:                             7286784 / 20Mi
  resource cpu on pods  (as a percentage of request):  0% (0) / 15%
Min replicas:                                          1
Max replicas:                                          5
Deployment pods:                                       1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  True    TooManyReplicas      the desired replica count is more than the maximum replica count
Events:
  Type     Reason                        Age                From                       Message
  ----     ------                        ----               ----                       -------
  Normal   SuccessfulRescale             29m                 horizontal-pod-autoscaler  New size: 1; reason: All metrics below target

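On some clusters, for example on-premises servers where Metrics Server reports errors reaching the kubelets, you may also need to pass --kubelet-preferred-address-types=InternalIP,ExternalIP to Metrics Server; this flag sets which node address types it tries, in order, when connecting to kubelets. Because Helm’s --set list syntax treats commas as separators, the comma inside that flag’s value has to be escaped with a backslash:

$ helm upgrade --install --set 'args={--kubelet-insecure-tls,--kubelet-preferred-address-types=InternalIP\,ExternalIP}' metrics-server metrics-server/metrics-server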

Now run the command again:

$ kubectl get hpa demo-servers

NAME           REFERENCE                 TARGETS                MINPODS   MAXPODS   REPLICAS   AGE
demo-servers   Deployment/demo-servers   7286784/20Mi, 0%/15%   1         5         1          23m
$ kubectl describe hpa demo-servers

Name:                                                  demo-servers
Namespace:                                             linkmi-dev
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Fri, 19 Aug 2022 19:59:33 +0545
Reference:                                             Deployment/demo-servers
Metrics:                                               ( current / target )
  resource memory on pods:                             9420800 / 20Mi
  resource cpu on pods  (as a percentage of request):  0% (0) / 15%
Min replicas:                                          1
Max replicas:                                          5
Deployment pods:                                       2 current / 2 desired
Conditions:
  Type            Status  Reason                   Message
  ----            ------  ------                   -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  True    TooManyReplicas      the desired replica count is more than the maximum replica count
Events:
  Type     Reason                        Age                 From                       Message
  ----     ------                        ----                ----                       -------
  Normal   SuccessfulRescale             9s                horizontal-pod-autoscaler  New size: 1; reason: All metrics below target

Step-3: Horizontal Pod AutoScaling in Operation

To check whether the HPA works, let’s generate some traffic to the demo servers. I’m using hey, an HTTP load generator; you can also download it to your machine with curl or wget.

Port forward our demo-servers service:

$ kubectl port-forward svc/demo-servers 8080:80

Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80

Now run hey from your local shell with -n 10000 -c 5, which sends 10,000 requests in total using five concurrent workers:

$ hey -n 10000 -c 5 http://localhost:8080/

To see the effects of the load, let’s check the HPA entry.

$ kubectl get hpa demo-servers

NAME           REFERENCE                 TARGETS                MINPODS   MAXPODS   REPLICAS   AGE
demo-servers   Deployment/demo-servers   5260416/10Mi, 46%/15%   1         5         1          43m

At this point, CPU utilization (46%) is well above the 15% target.

After a short delay (the HPA control loop runs every 15 seconds by default), the Horizontal Pod Autoscaler fetches the new pod metrics and calculates how many replicas it needs to scale up or down to.
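The Kubernetes documentation describes the core of this calculation with the formula below. Plugging in the reading shown above (one replica at 46% CPU utilization against a 15% target) gives the CPU metric's recommendation. This is a simplified view: the controller also applies a tolerance (0.1 by default) and other refinements, so the observed replica counts need not match a single calculation exactly.

```latex
\begin{aligned}
\text{desiredReplicas}
  &= \left\lceil \text{currentReplicas} \times
     \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil \\
  &= \left\lceil 1 \times \frac{46}{15} \right\rceil
   = \left\lceil 3.0\overline{6} \right\rceil
   = 4
\end{aligned}
```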

$ kubectl get hpa demo-servers

NAME           REFERENCE                 TARGETS                MINPODS   MAXPODS   REPLICAS   AGE
demo-servers   Deployment/demo-servers   35260416/10Mi,57%/15%   1         5         5          45m

Autoscaling is in effect: the Deployment has been scaled out to 5 replicas, the configured maximum.

We can take a more detailed look using the describe subcommand.

$ kubectl describe hpa demo-servers

The Conditions and Events sections of the output are the most useful parts for troubleshooting and for understanding the HPA’s behavior.

Name:                                                  demo-servers
Namespace:                                             linkmi-dev
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Fri, 19 Aug 2022 19:59:33 +0545
Reference:                                             Deployment/demo-servers
Metrics:                                               ( current / target )
  resource memory on pods:                             3579904 / 10Mi
  resource cpu on pods  (as a percentage of request):  34% (21m) / 15%
Min replicas:                                          1
Max replicas:                                          5
Deployment pods:                                       5 current / 5 desired
Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  True    TooManyReplicas      the desired replica count is more than the maximum replica count
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  2m1s  horizontal-pod-autoscaler  New size: 3; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  1m1s  horizontal-pod-autoscaler  New size: 4; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  5s    horizontal-pod-autoscaler  New size: 5; reason: cpu resource utilization (percentage of request) above target

Also, we can check the deployment object to see events and several other fields related to autoscaling.

$ kubectl describe deployments demo-servers
Name:                   demo-servers
Namespace:              linkmi-dev
CreationTimestamp:      Fri, 19 Aug 2022 19:54:12 +0545
Labels:                 app=demo-servers
Annotations:            deployment.kubernetes.io/revision: 3
Selector:               app=demo-servers
Replicas:               5 desired | 5 updated | 5 total | 5 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=demo-servers
  Containers:
   nginx:
    Image:      nginx
    Port:       80/TCP
    Host Port:  0/TCP
    Limits:
      cpu:  50m
    Requests:
      cpu:        25m
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  <none>
NewReplicaSet:   demo-servers-747744bc5c (5/5 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  4m50s  deployment-controller  Scaled up replica set demo-servers-77cbb55d6 to 3
  Normal  ScalingReplicaSet  3m50s  deployment-controller  Scaled up replica set demo-servers-77cbb55d6 to 4
  Normal  ScalingReplicaSet  2m49s  deployment-controller  Scaled up replica set demo-servers-77cbb55d6 to 5

Here are all the replicas created.

$ kubectl get pods

NAME                              READY   STATUS    RESTARTS   AGE
demo-servers-747744bc5c-bhfm9     1/1     Running   0          11m
demo-servers-747744bc5c-k9tjs     1/1     Running   0          21m
demo-servers-747744bc5c-bhfm9     1/1     Running   0          21m
demo-servers-747744bc5c-k9tjs     1/1     Running   0          22m
demo-servers-747744bc5c-k9tjs     1/1     Running   0          22m
metrics-server-768c786db5-xgsdc   1/1     Running   0          64m

That’s it.


In this article, we discussed how to configure Horizontal Pod Autoscaling (HPA) and how to scale a Deployment on CPU and memory metrics while observing those metrics.

Thank YOU!
