
Kubernetes cluster sizing overview


Application requirement: static workloads [ Deployment / ReplicationController / StatefulSet ]

Dynamic pods: pod creation is handled automatically [ ReplicationController / Deployment ], based on the application requirement

 


The master node shouldn't run any application workloads

Kubernetes datastore – etcd [ quorum requires an odd member count: 1, 3, 5, or 7 master nodes ]
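As a quick check, here is a minimal Python sketch of the majority rule behind those member counts (quorum = floor(n/2) + 1):

```python
# Quorum and fault tolerance for the etcd member counts noted above.
def quorum(members: int) -> int:
    # Majority needed for etcd writes to succeed.
    return members // 2 + 1

for members in (1, 3, 5, 7):
    q = quorum(members)
    print(f"{members} members: quorum = {q}, tolerated failures = {members - q}")
# 1 members: quorum = 1, tolerated failures = 0
# 3 members: quorum = 2, tolerated failures = 1
# 5 members: quorum = 3, tolerated failures = 2
# 7 members: quorum = 4, tolerated failures = 3
```

This is also why even member counts are avoided: 4 members tolerate no more failures than 3.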

1 master node + etcd on the same node [ 16 vCPU + 64 GB RAM ] – Linux VM [ kubeadm ] [ Rancher tool ]

[ A backup master node (16 vCPU + 64 GB RAM) is kept ready; if the active master goes down, we can enable it, similar to VMware FT ]

1 worker node := Linux VM [ kubeadm ]

1 pod = order service (the example application below)

Container default = 110 containers per node [ Docker ]; [ 0.5 vCPU * 110 containers ] = 55 vCPU

Container default from the application team [ memory = 1 GB RAM ] = 110 * 1 GB RAM = 110 GB RAM
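A minimal sketch of that per-node arithmetic, assuming the Kubernetes default of 110 pods per node and the per-container figures above:

```python
# Per-node capacity needed if every one of the 110 default pod slots is
# used by a container sized to the application team's assumptions.
MAX_PODS_PER_NODE = 110    # Kubernetes default (kubelet --max-pods)
CPU_PER_CONTAINER = 0.5    # vCPU per container (assumption from the notes)
MEM_PER_CONTAINER = 1.0    # GB RAM per container (assumption from the notes)

required_cpu = MAX_PODS_PER_NODE * CPU_PER_CONTAINER   # 55.0 vCPU
required_mem = MAX_PODS_PER_NODE * MEM_PER_CONTAINER   # 110.0 GB
print(f"Fully packed node: {required_cpu} vCPU, {required_mem} GB RAM")
```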

Volumes = persistent storage volumes [ across multiple datacenters ] = NFS storage as the persistent disk

Local disk = 300 GB = local volume [ PersistentVolume ]

VMFS storage – SDRS cluster – StorageClass – PersistentVolume

Network = NSX-T [ Docker network drivers: bridge (default) | host | overlay (multi-host) ]

 

Worker node:

8 vCPU, 32 GB RAM, local disk = 300 GB [ on-demand ]

Network:

VM = worker node = 10.10.10.10 [ subnet mask 255.255.255.0 ], gateway 10.10.10.1

Pods: 100 containers = 172.1.1.1 – 172.1.1.100 [ subnet mask 255.255.254.0 ]

Node IP: [ external IP range 10.10.10.150 to 10.10.10.200 = reserved for pod exposure ]

Node 1 = pod 1 / pod 2 exposed through the node IP (10.10.10.10)
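These ranges are easy to sanity-check with Python's ipaddress module; a small sketch (the 172.1.1.x pod range with mask 255.255.254.0 is the /23 network 172.1.0.0/23):

```python
import ipaddress

# Node and pod networks from the notes above.
node_net = ipaddress.ip_network("10.10.10.0/24")   # mask 255.255.255.0
pod_net = ipaddress.ip_network("172.1.0.0/23")     # mask 255.255.254.0

print(node_net.num_addresses - 2)   # 254 usable node addresses
print(pod_net.num_addresses - 2)    # 510 usable pod addresses, > 110 pods

# External range reserved for pod exposure (51 addresses).
start = ipaddress.ip_address("10.10.10.150")
reserved = [start + i for i in range(51)]
print(reserved[0], "-", reserved[-1])   # 10.10.10.150 - 10.10.10.200
```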

kube-proxy: IP

kubelet: IP

kube-dns: IP

/var/log = 10 GB

/var/lib/docker = 30 GB

/home = 20 GB

/ (root) = 20 GB

===========================================

Master

VM – master node = 10.10.10.10 [ subnet mask 255.255.255.0 ], gateway 10.10.10.1

etcd endpoints [ 1 etcd = reserved IPs 10.10.10.200 to 10.10.10.203 ]

Scheduler: IP

Controller manager: IP

API server: IP

kube-dns: IP

Flannel / Calico: IP

Fluentd (logging): IP

==================================

Prometheus / Grafana dashboard monitoring:

Splunk setup = application logging / Kubernetes monitoring, queried with regex commands

 

Node 1: 10.10.10.10 = VM, subnet mask = 255.255.255.0

Containers = 110 IP addresses: 172.x.x.x, each with its own gateway and subnet mask

 

 

5 worker nodes

Pods: 8 cores per node, 1 pod needs 2 CPUs, so 4 pods fit per node
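The same bin-packing arithmetic as a quick Python check:

```python
# How many 2-vCPU pods fit on one 8-core worker, and across 5 workers.
cores_per_node, cpu_per_pod, nodes = 8, 2, 5
pods_per_node = cores_per_node // cpu_per_pod   # 4 pods per node
print(pods_per_node, pods_per_node * nodes)     # 4 per node, 20 total
```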

 

Order pod

Inventory check

Payment gateway

Database pod

API

=====================================================

Kubernetes Cluster Hardware Recommendations

Overview

This document covers the minimal hardware recommendations for the Kublr Platform and Kublr Kubernetes clusters. After reading it, you can proceed with deploying the Kublr Platform and Kubernetes clusters.

Kublr Kubernetes Cluster Requirements

Role | Minimal required memory | Minimal required CPU (cores) | Components
--- | --- | --- | ---
Master node | 2 GB | 1.5 | Kublr-Kubernetes master components (k8s-core, cert-updater, fluentd, kube-addon-manager, rescheduler, network, etcd, proxy, kubelet)
Worker node | 700 MB | 0.5 | Kublr-Kubernetes worker components (fluentd, dns, proxy, network, kubelet)
Centralized monitoring agent * | 2 GB | 0.7 | Prometheus. We recommend a 2 GB limit for a typical installation of a managed cluster with 8 worker nodes and 40 pods per node (320 pods total). The retention period for the Prometheus agent is 1 hour.

Kublr Platform Feature Requirements

Feature | Required memory | Required CPU
--- | --- | ---
Control Plane | 1.9 GB | 1.2
Centralized monitoring | 5 GB | 1.2
Centralized logging | 11 GB | 1.4
k8s core components | 0.5 GB | 0.15

Kublr Platform Deployment Example

A single-master Kubernetes cluster with one to two worker nodes, using all of Kublr's features (two workers for basic reliability).

For a minimal Kublr Platform installation you should have one master node with 4 GB memory and 2 CPUs, plus worker node(s) with a total of 10 GB + 1 GB × (number of nodes) of memory and 4.4 + 0.5 × (number of nodes) CPU cores.
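A small sketch that tabulates this formula for one and two worker nodes:

```python
# Minimal Kublr Platform worker sizing, per the formula above.
def minimal_worker_totals(num_workers):
    memory_gb = 10 + 1 * num_workers      # total worker memory, GB
    cpu_cores = 4.4 + 0.5 * num_workers   # total worker CPU cores
    return memory_gb, cpu_cores

for n in (1, 2):
    mem, cpu = minimal_worker_totals(n)
    print(f"{n} worker(s): {mem} GB memory, {cpu} CPU cores in total")
# 1 worker(s): 11 GB memory, 4.9 CPU cores in total
# 2 worker(s): 12 GB memory, 5.4 CPU cores in total
```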

Please note: We do not recommend using this configuration in production but this configuration is suitable to start exploring the Kublr Platform.

Provider | Master Instance Type | Worker Instance Type
--- | --- | ---
Amazon Web Services | t2.large/t3.large (2 vCPU, 4 GB) | 2 × t2/t3 xlarge (4 vCPU, 16 GB)
Google Cloud Platform | n1-standard-2 (2 vCPU, 7.5 GB) | 2 × n1-standard-4 (4 vCPU, 15 GB)
Microsoft Azure | A2 v2 (2 vCPU, 4 GB) | 2 × A8 v2 (8 vCPU, 16 GB)
On-Premises | 2 vCPU, 5 GB | 2 × VM (3 vCPU, 16 GB)

Workload Example

Master node: Kublr-Kubernetes master components (2 GB, 1.5 vCPU).

Worker node 1: Kublr-Kubernetes worker components (0.7 GB, 0.5 vCPU), Feature: Control Plane (1.9 GB, 1.2 vCPU), Feature: Centralized monitoring (5 GB, 1.2 vCPU), Feature: k8s core components (0.5 GB, 0.15 vCPU), Feature: Centralized logging (11 GB, 1.4 vCPU).

Worker node 2: Kublr-Kubernetes worker components (0.7 GB, 0.5 vCPU), Feature: Centralized logging (11 GB, 1.4 vCPU).

Self-Hosted Features

Kublr has several self-hosted features that can be installed separately in Kublr-Kubernetes clusters.

Feature | Required memory | Required CPU
--- | --- | ---
Self-hosted logging | 9 GB | 1
Self-hosted monitoring | 2.8 GB | 1.4

Calculating Needed Memory and CPU Availability for Business Applications

Note: By default Kublr disables scheduling business applications on the master, which can be modified. Thus, we use only worker nodes in our formula.

Available memory = (number of nodes) × (memory per node) - (number of nodes) × 0.7 GB - (has Self-hosted logging) × 9 GB - (has Self-hosted monitoring) × 2.8 GB - 0.4 GB - 2 GB (centralized monitoring agent per cluster).

Available CPU = (number of nodes) × (vCPU per node) - (number of nodes) × 0.5 - (has Self-hosted logging) × 1 - (has Self-hosted monitoring) × 1.4 - 0.1 - 0.7 (centralized monitoring agent per cluster).

Example

Suppose a user wants to create a Kublr-Kubernetes cluster with 5 n1-standard-4 nodes (on Google Cloud Platform) with Self-hosted logging enabled but Self-hosted monitoring disabled. Then:

  • Available memory = 5 × 15 - 5 × 0.7 - 1 × 9 - 0 × 2.8 - 0.4 - 2 = 60.1 GB.

  • Available CPU = 5 × 4 - 5 × 0.5 - 1 × 1 - 0 × 1.4 - 0.1 - 0.7 = 15.7 vCPUs.
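The same formulas as a small Python function, reproducing the example above (using the 2.8 GB Self-hosted monitoring figure from the table):

```python
# Available capacity for business applications, per the formulas above.
def available_resources(nodes, mem_per_node_gb, vcpu_per_node,
                        logging=False, monitoring=False):
    mem = (nodes * mem_per_node_gb - nodes * 0.7
           - (9 if logging else 0) - (2.8 if monitoring else 0)
           - 0.4 - 2)      # 2 GB: centralized monitoring agent per cluster
    cpu = (nodes * vcpu_per_node - nodes * 0.5
           - (1 if logging else 0) - (1.4 if monitoring else 0)
           - 0.1 - 0.7)    # 0.7: centralized monitoring agent per cluster
    return mem, cpu

# 5 x n1-standard-4 (4 vCPU, 15 GB), Self-hosted logging enabled.
mem, cpu = available_resources(5, 15, 4, logging=True, monitoring=False)
print(f"{mem:.1f} GB, {cpu:.1f} vCPU")   # 60.1 GB, 15.7 vCPU
```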

Note: You will use the centralized monitoring available in the Kublr Platform instead of Self-hosted monitoring.

Total Required Disk calculation for Prometheus

To plan the disk capacity of a Prometheus server, you can use the rough formula:

RequiredDiskSpaceInBytes = RetentionPeriodInSeconds * IngestedSamplesPerSecond * BytesPerSample

RetentionPeriodInSeconds = 7 days by default (7 * 24 * 3600)

BytesPerSample = 2 bytes, in accordance with the Prometheus documentation (http://prometheus.io/docs/prometheus/latest/storage/)

IngestedSamples can be calculated as follows:

IngestedSamples = IngestedSamplesPerKublrPlatform + Sum(IngestedSamplesPerKublrCluster)

IngestedSamplesPerKublrPlatform = (IngestedSamplesPerMasterNode * NumOfMasterNodes) + (IngestedSamplesPerWorkingNode * NumOfWorkingNodes) + IngestedSamplesPerControlPlane

IngestedSamplesPerKublrCluster = (IngestedSamplesPerMasterNode * NumOfMasterNodes) + (IngestedSamplesPerWorkingNode * NumOfWorkingNodes) + Sum(IngestedSamplesPerUserApplication)

IngestedSamplesPerMasterNode = 1000 samples can be used for a regular Kublr cluster installation

IngestedSamplesPerWorkingNode = 500 samples can be used for a regular Kublr cluster installation

IngestedSamplesPerControlPlane = 2500 samples can be used for a regular Kublr ControlPlane deployment

IngestedSamplesPerUserApplication = should be estimated by the user
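A rough sketch of this calculation in Python, using the suggested per-node sample rates; the 3-master, 5-worker platform below is a hypothetical illustration, with user-application samples set to zero:

```python
# Prometheus disk sizing, per the rough formula above.
RETENTION_S = 7 * 24 * 3600   # default retention: 7 days
BYTES_PER_SAMPLE = 2          # per the Prometheus storage documentation

def samples_per_second(masters, workers, extra=0):
    # 1000 samples/s per master node, 500 per working node.
    return masters * 1000 + workers * 500 + extra

# Kublr Platform itself: nodes plus the ControlPlane (2500 samples/s).
platform = samples_per_second(masters=3, workers=5, extra=2500)
required_bytes = RETENTION_S * platform * BYTES_PER_SAMPLE
print(f"{required_bytes / 1024**3:.1f} GiB")   # ~9.0 GiB for this example
```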

Total Required Disk calculation for Elasticsearch

To plan the disk capacity of Elasticsearch, you can use the rough formula:

RequiredDiskSpaceInGBytes = 4 * NumberOfElasticsearchMasterNodes + (0.7 * NumberOfPlatformMasterNodes + 0.5 * NumberOfPlatformWorkingNodes + 0.7 * NumberOfAllClusterMasterNodes + 0.07 * NumberOfAllClusterWorkingNodes + AllClustersDailyPayload) * (CuratorPeriod + 1) * SafetyFactor

AllClustersDailyPayload = Ratio * SizeOfAllLogsGeneratedByNonKublrContainers

The recommended Ratio is 7 for an average log record size of 132 bytes (we have established a ratio of 9.5 for an average log record size of 49 bytes).

Default CuratorPeriod = 2, which means Curator will delete indexes older than 2 days. To change this, please refer to https://docs.kublr.com/logging/#5-change-parameters-to-collect-logs-for-more-than-2-days

For example, let's calculate the required space for a platform (with 3 master nodes and 2 worker nodes) and two clusters created by the platform (each cluster has 3 master nodes and 5 worker nodes), each deployed with a business application that generates 3.4 GB of logs every day. CuratorPeriod (the log cleaning period) will be 14 days. Let's use a SafetyFactor of 1.3 (+30% over the minimal calculated disk space to compensate for calculation errors).

AllClustersDailyPayload = 7 * (3.4 * 2) = 47.6

RequiredDiskSpaceInGBytes = 4 * 3 + (0.7 * 3 + 0.5 * 2 + 0.7 * 6 + 0.07 * 10 + 47.6) * (14 + 1) * 1.3 = 1096.2

To plan the disk capacity of a self-hosted Elasticsearch, you can use the rough formula:

RequiredDiskSpaceInGBytes = 4 * NumberOfElasticsearchMasterNodes + (0.5 * NumberOfClusterMasterNodes + 0.4 * NumberOfClusterWorkingNodes + DailyPayload) * (CuratorPeriod + 1) * SafetyFactor
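A sketch reproducing the worked platform example above (3 Elasticsearch master nodes; a platform with 3 masters and 2 workers; two clusters with 3 masters and 5 workers each; 3.4 GB of logs per cluster per day; CuratorPeriod = 14; SafetyFactor = 1.3):

```python
# Elasticsearch disk sizing for a Kublr Platform, per the formula above.
def es_disk_gb(es_masters, platform_masters, platform_workers,
               all_cluster_masters, all_cluster_workers,
               daily_payload_gb, curator_period, safety_factor):
    daily = (0.7 * platform_masters + 0.5 * platform_workers
             + 0.7 * all_cluster_masters + 0.07 * all_cluster_workers
             + daily_payload_gb)
    return 4 * es_masters + daily * (curator_period + 1) * safety_factor

payload = 7 * (3.4 * 2)   # Ratio * logs from two clusters = 47.6 GB/day
print(round(es_disk_gb(3, 3, 2, 6, 10, payload, 14, 1.3), 1))  # 1096.2
```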

Elasticsearch configuration recommendations

The default number of Master/Data/Client nodes is 1/1/1. It is highly recommended to use 3 or more master nodes in production.

Please review the Elasticsearch memory recommendations. The default heap size for a data node is 3072m. To change it, override the elasticsearch.data.heapSize value during cluster creation. Additional Elasticsearch environment variables can be provided by setting elasticsearch.cluster.env values.

According to load tests, 100 pods (each generating one 16 KB record every second) raise the CPU consumption of an Elasticsearch data node to 0.4. For 100 pods generating 10-50 records of 132 bytes every second, the CPU consumption of the Elasticsearch data node
