In this article, you’ll learn how to perform horizontal pod autoscaling in Kubernetes based on resource metrics such as CPU and memory, or on other custom metrics. In Kubernetes, HPA (Horizontal Pod Autoscaling) is configured through the HorizontalPodAutoscaler API resource.
Horizontal Pod Autoscaler
The Horizontal Pod Autoscaler scales the number of pods of an application based on resource metrics.
API version autoscaling/v2 is the stable and default version; this version of the API supports CPU utilization-based autoscaling, multiple metrics, and custom and external metrics.
Supported API versions for your cluster can be found using the following command:
$ kubectl api-versions | grep autoscaling
An output similar to the following will be displayed, listing all supported versions; in this case, we see that four versions are supported.
autoscaling/v1
autoscaling/v2
autoscaling/v2beta1
autoscaling/v2beta2
Note: The Horizontal Pod Autoscaler requires a Metrics Server installed in the Kubernetes cluster. The Metrics Server collects container resource metrics such as RAM and CPU usage.
Prerequisites
- Kubernetes Cluster set up with one of these — Minikube, OpenShift Developer Sandbox, MicroK8s, etc.
How to Use Horizontal Pod Autoscaling in Kubernetes
Step-1: Installation of Metrics Server
1.1) Using kubectl
You can check whether the Metrics Server is installed by using the following command:
$ kubectl top pods
error: Metrics API not available
As you can see, I haven’t installed a Metrics Server in my cluster yet.
If you haven’t installed it yet, run the following command to install Metrics Server:
$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
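Before moving on, you can confirm that the aggregated metrics API was registered and that the Metrics Server Deployment is ready (the upstream manifest installs it into the kube-system namespace):
$ kubectl get apiservice v1beta1.metrics.k8s.io
$ kubectl -n kube-system get deployment metrics-server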
1.2) Using Helm (Preferred)
Set up Helm first; for the chart’s configurable values, refer to the metrics-server Helm chart documentation.
Add the Metrics-Server Helm repository to your local repo:
$ helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
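Refresh your local chart index so the latest chart version is picked up:
$ helm repo update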
Now, install Metrics-Server using Helm (if your cluster uses self-signed certificates, you have to add the --kubelet-insecure-tls argument):
$ helm upgrade --install metrics-server metrics-server/metrics-server
# OR
$ helm upgrade --install --set 'args={--kubelet-insecure-tls}' metrics-server metrics-server/metrics-server
Release "metrics-server" does not exist. Installing it now.
NAME: metrics-server
LAST DEPLOYED: Fri Aug 19 17:25:27 2022
NAMESPACE: linkmi-dev
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
***********************************************************************
* Metrics Server *
***********************************************************************
Chart version: 3.8.2
App version: 0.6.1
Image tag: k8s.gcr.io/metrics-server/metrics-server:v0.6.1
***********************************************************************
1.3) Verify Installation
Wait for a while and try the top command again:
$ kubectl top pods
NAME CPU(cores) MEMORY(bytes)
metrics-server-85584546c9-bgdbk 6m 14Mi
Check nodes:
$ kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
minikube 177m 8% 1539Mi 41%
Step-2: Set Up Horizontal Pod AutoScaling
As we have two stable API versions of this object, it would be good to examine both; however, autoscaling/v2 is the recommended version to use at the time of writing, because it lets us define two different metrics for CPU and memory.
Note: You can use your own application to experiment with autoscaling, or you can check out my other article on how to deploy a Python application in Kubernetes.
2.1) Create a deployment
First, let’s create a server Deployment using the Nginx image (demo-server-deployment.yaml):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-servers
  labels:
    app: demo-servers
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-servers
  template:
    metadata:
      labels:
        app: demo-servers
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
          resources:
            limits:
              cpu: 150m
            requests:
              cpu: 75m
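Apply the manifest to create the Deployment (using the file name given above):
$ kubectl apply -f demo-server-deployment.yaml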
2.2) Create a service
demo-server-service.yaml:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: demo-servers
  name: demo-servers
  namespace: default
spec:
  ports:
    - name: demo-servers-port
      port: 80
  selector:
    app: demo-servers
  sessionAffinity: None
  type: NodePort
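Apply the Service manifest as well:
$ kubectl apply -f demo-server-service.yaml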
2.3) Create HorizontalPodAutoscaler with the autoscaling/v2 API Version
Now let’s create the scaling configuration using the autoscaling/v2 API version: a HorizontalPodAutoscaler targeting the demo-servers Deployment. Here we have defined two metrics, one for CPU and one for memory.
demo-server-autoscaler.yaml:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-servers
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-servers
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 15
    - type: Resource
      resource:
        name: memory
        target:
          type: AverageValue
          averageValue: 20Mi
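Apply the autoscaler manifest:
$ kubectl apply -f demo-server-autoscaler.yaml
As a side note, a CPU-only HPA could also be created imperatively with kubectl autoscale deployment demo-servers --cpu-percent=15 --min=1 --max=5, but that form can’t express the additional memory metric.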
Let’s check the HPA entries.
$ kubectl get hpa demo-servers
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
demo-servers Deployment/demo-servers 7286784/20Mi, 0%/15% 1 5 1 11m
We can also use the describe subcommand to gather more information.
$ kubectl describe hpa demo-servers
Name: demo-servers
Namespace: linkmi-dev
Labels: <none>
Annotations: <none>
CreationTimestamp: Fri, 19 Aug 2022 19:59:33 +0545
Reference: Deployment/demo-servers
Metrics: ( current / target )
resource memory on pods: 7286784 / 20Mi
resource cpu on pods (as a percentage of request): 0% (0) / 15%
Min replicas: 1
Max replicas: 5
Deployment pods: 1 current / 1 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ScaleDownStabilized recent recommendations were higher than current one, applying the highest recent recommendation
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited True TooManyReplicas the desired replica count is more than the maximum replica count
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulRescale 29m horizontal-pod-autoscaler New size: 1; reason: All metrics below target
Sometimes, you may need to add --kubelet-preferred-address-types=InternalIP,ExternalIP to the Metrics Server arguments (for example, if you’re running an on-premises cluster and the Metrics Server throws kubelet address errors), i.e.:
$ helm upgrade --install --set 'args={--kubelet-insecure-tls,--kubelet-preferred-address-types=InternalIP\,ExternalIP}' metrics-server metrics-server/metrics-server
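To verify that the extra arguments were picked up, you can inspect the Metrics Server Deployment in the namespace you installed it into (linkmi-dev in the Helm output above):
$ kubectl get deployment metrics-server -o jsonpath='{.spec.template.spec.containers[0].args}'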
Now run the command again:
$ kubectl get hpa demo-servers
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
demo-servers Deployment/demo-servers 7286784/20Mi, 0%/15% 1 5 1 23m
$ kubectl describe hpa demo-servers
Name: demo-servers
Namespace: linkmi-dev
Labels: <none>
Annotations: <none>
CreationTimestamp: Fri, 19 Aug 2022 19:59:33 +0545
Reference: Deployment/demo-servers
Metrics: ( current / target )
resource memory on pods: 9420800 / 20Mi
resource cpu on pods (as a percentage of request): 0% (0) / 15%
Min replicas: 1
Max replicas: 5
Deployment pods: 2 current / 2 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ScaleDownStabilized recent recommendations were higher than current one, applying the highest recent recommendation
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited True TooManyReplicas the desired replica count is more than the maximum replica count
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulRescale 9s horizontal-pod-autoscaler New size: 1; reason: All metrics below target
Step-3: Horizontal Pod AutoScaling in Operation
Now, to check whether our HPA works, let’s generate some traffic load on our demo servers. For the load, I’m using Hey, a load generator; you can also install it on your machine with curl/wget commands.
Port forward our demo-servers service:
$ kubectl port-forward svc/demo-servers 8080:80
Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80
Now, run the hey command from your local shell with -n 10000 -c 5, meaning it should send 10,000 requests with five workers concurrently.
$ hey -n 10000 -c 5 http://localhost:8080/
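If you’d rather not install hey, a rough in-cluster alternative is to loop wget from a throwaway busybox pod against the Service (run it in the same namespace as the demo-servers Service; no port-forward needed):
$ kubectl run load-generator --rm -it --image=busybox --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://demo-servers; done"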
To see the effects of the load, let’s check the HPA entry.
$ kubectl get hpa demo-servers
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
demo-servers Deployment/demo-servers 5260416/10Mi, 46%/15% 1 5 1 43m
Here, at this point, we can see that CPU and memory usage have increased dramatically.
After a short delay, the Horizontal Pod Autoscaler gets the new metrics for the pods and calculates the number of replicas it needs to scale up or down.
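Roughly speaking, the controller scales on the ratio between the current and target metric values (this is the documented algorithm in simplified form; stabilization windows and tolerances also apply):
desiredReplicas = ceil( currentReplicas * ( currentMetricValue / desiredMetricValue ) )
For example, one replica running at 46% CPU against a 15% target gives ceil(1 * 46 / 15) = 4, and subsequent evaluations can push the count further, capped at maxReplicas.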
$ kubectl get hpa demo-servers
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
demo-servers Deployment/demo-servers 35260416/10Mi,57%/15% 1 5 5 45m
Autoscaling is in effect; a total of 5 replicas are created.
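If you want to follow the scaling as it happens, you can watch the HPA while the load is running:
$ kubectl get hpa demo-servers -w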
We can take a more detailed look using the describe subcommand.
$ kubectl describe hpa demo-servers
The Conditions and Events fields are crucial for troubleshooting and understanding the behavior of the HPA.
Name: demo-servers
Namespace: linkmi-dev
Labels: <none>
Annotations: <none>
CreationTimestamp: Fri, 19 Aug 2022 19:59:33 +0545
Reference: Deployment/demo-servers
Metrics: ( current / target )
resource memory on pods: 3579904 / 10Mi
resource cpu on pods (as a percentage of request): 34% (21m) / 15%
Min replicas: 1
Max replicas: 5
Deployment pods: 5 current / 5 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ScaleDownStabilized recent recommendations were higher than current one, applying the highest recent recommendation
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited True TooManyReplicas the desired replica count is more than the maximum replica count
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulRescale 2m1s horizontal-pod-autoscaler New size: 3; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 1m1s horizontal-pod-autoscaler New size: 4; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 5s horizontal-pod-autoscaler New size: 5; reason: cpu resource utilization (percentage of request) above target
Also, we can check the deployment object to see events and several other fields related to autoscaling.
$ kubectl describe deployments demo-servers
Name: demo-servers
Namespace: linkmi-dev
CreationTimestamp: Fri, 19 Aug 2022 19:54:12 +0545
Labels: app=demo-servers
Annotations: deployment.kubernetes.io/revision: 3
Selector: app=demo-servers
Replicas: 5 desired | 5 updated | 5 total | 5 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=demo-servers
Containers:
nginx:
Image: nginx
Port: 80/TCP
Host Port: 0/TCP
Limits:
cpu: 50m
Requests:
cpu: 25m
Environment: <none>
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: demo-servers-747744bc5c (5/5 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 4m50s deployment-controller Scaled up replica set demo-servers-77cbb55d6 to 3
Normal ScalingReplicaSet 3m50s deployment-controller Scaled up replica set demo-servers-77cbb55d6 to 4
Normal ScalingReplicaSet 2m49s deployment-controller Scaled up replica set demo-servers-77cbb55d6 to 5
Here are all the replicas created.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
demo-servers-747744bc5c-bhfm9 1/1 Running 0 11m
demo-servers-747744bc5c-k9tjs 1/1 Running 0 21m
demo-servers-747744bc5c-bhfm9 1/1 Running 0 21m
demo-servers-747744bc5c-k9tjs 1/1 Running 0 22m
demo-servers-747744bc5c-k9tjs 1/1 Running 0 22m
metrics-server-768c786db5-xgsdc 1/1 Running 0 64m
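Once you’re done experimenting, you can clean up the demo resources (assuming the manifest file names used above):
$ kubectl delete -f demo-server-autoscaler.yaml -f demo-server-service.yaml -f demo-server-deployment.yaml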
That’s it.
Conclusion
In this article, we discussed how to configure Horizontal Pod Autoscaling (HPA) and how to scale a Deployment on multiple resource metrics (CPU and memory) while observing the metrics in action.
Thank YOU!