Kubernetes Interview Questions & Answers (2025)

50+ Kubernetes interview questions and answers from basic to advanced — covering Pods, Deployments, Services, Networking, RBAC, Helm, Autoscaling, Security, and real-world troubleshooting scenarios.

January 20, 2025 34 min read DB

Kubernetes (K8s) is an open-source container orchestration platform originally built by Google. It automates the deployment, scaling, and management of containerized applications.

Why use Kubernetes?

  • Self-healing — restarts failed containers, replaces and reschedules pods
  • Auto-scaling — scales pods up/down based on CPU, memory, or custom metrics
  • Rolling updates — deploy new versions with zero downtime
  • Load balancing — distributes traffic across pod replicas automatically
  • Storage orchestration — mounts local or cloud storage automatically
  • Secret management — stores and manages sensitive configuration data

Core architecture:

Control Plane                    Worker Nodes
┌─────────────────┐              ┌────────────────────┐
│  kube-apiserver  │◄────────────│  kubelet           │
│  etcd            │             │  kube-proxy        │
│  kube-scheduler  │             │  container runtime │
│  controller-mgr  │             │  (containerd)      │
└─────────────────┘              └────────────────────┘

Component                 Role
kube-apiserver            Entry point for all API requests
etcd                      Distributed key-value store (cluster source of truth)
kube-scheduler            Assigns pods to nodes based on resources
kube-controller-manager   Runs reconciliation loops (ReplicaSet, Node, etc.)
kubelet                   Node agent that ensures containers run as specified
kube-proxy                Maintains network rules for Service IPs

A Pod is the smallest deployable unit in Kubernetes. It wraps one or more containers that share the same network namespace, IP address, and storage volumes.

Key differences:

Aspect       Container           Pod
Unit         Single process      One or more containers
Network      Own namespace       Shared namespace (containers use localhost)
Storage      Own filesystem      Shared volumes between containers
Managed by   Docker/containerd   Kubernetes

Single-container Pod (most common):

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.25
    ports:
    - containerPort: 80
    resources:
      requests:
        cpu: "100m"
        memory: "128Mi"
      limits:
        cpu: "250m"
        memory: "256Mi"

Multi-container Pod (sidecar pattern):

apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  containers:
  - name: app           # Main container, writes logs to the shared volume
    image: my-app:v1
    ports:
    - containerPort: 8080
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
  - name: log-shipper   # Sidecar: reads logs from the shared volume
    image: fluent/fluent-bit:latest
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
  volumes:
  - name: logs
    emptyDir: {}

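Pods are ephemeral. The kubelet restarts crashed containers inside a pod according to its restartPolicy, but if the pod itself is deleted or evicted, or its node fails, nothing recreates it unless a controller (Deployment, StatefulSet, DaemonSet) manages it.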

Each workload type serves a different purpose:

Kind          Use For                       Pod Names               Storage                  Ordering
Deployment    Stateless apps (APIs, web)    Random                  Ephemeral                Any order
StatefulSet   Stateful apps (DBs, queues)   Stable (app-0, app-1)   Persistent PVC per pod   Ordered (0→1→2)
DaemonSet     One pod per node (agents)     One per node            Node-local               N/A
Job           Run-to-completion tasks       Random                  Optional                 N/A
CronJob       Scheduled recurring tasks     Random                  Optional                 N/A

Deployment — stateless, rolling updates:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
      - name: api
        image: my-api:v2

StatefulSet — stable identity for databases:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres-headless
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
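      containers:
      - name: postgres
        image: postgres:15
        volumeMounts:
        - name: data               # Matches the volumeClaimTemplates entry below
          mountPath: /var/lib/postgresql/data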
  volumeClaimTemplates:          # Each pod gets its own PVC
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi

DaemonSet — one pod per node (log agents, monitoring):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd:v1.16

A Service gives pods a stable network endpoint. Pods come and go, but the Service IP stays constant.

Type           Accessible From           Port Range    Use Case
ClusterIP      Inside cluster only       Any           Internal service-to-service
NodePort       Outside via NodeIP:Port   30000–32767   Dev/test access
LoadBalancer   External via cloud LB     Any           Production external traffic
ExternalName   Inside cluster            N/A           Alias to external DNS name

ClusterIP (default):

apiVersion: v1
kind: Service
metadata:
  name: backend-svc
spec:
  type: ClusterIP       # default — can omit this line
  selector:
    app: backend
  ports:
  - port: 80            # Service port
    targetPort: 8080    # Container port

LoadBalancer (production — AWS NLB example):

apiVersion: v1
kind: Service
metadata:
  name: web-svc
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
  - port: 443
    targetPort: 8080
    protocol: TCP

Headless Service (StatefulSets — direct pod DNS):

apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
spec:
  clusterIP: None       # Makes it headless
  selector:
    app: postgres
  ports:
  - port: 5432
# DNS: postgres-0.postgres-headless.default.svc.cluster.local
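
To make the stable identities concrete, here is a small Python sketch (not Kubernetes code) that derives the names the postgres StatefulSet above would get. The helper name `statefulset_identities` is hypothetical, and the sketch assumes the default cluster domain `cluster.local`. The naming patterns are the standard ones: pods are `<statefulset>-<ordinal>`, PVCs are `<claim-template>-<pod>`, and per-pod DNS records go through the headless Service.

```python
def statefulset_identities(name, service, replicas, claim_template,
                           namespace="default", cluster_domain="cluster.local"):
    """Return (pod name, PVC name, pod DNS name) for each replica ordinal."""
    identities = []
    for ordinal in range(replicas):
        pod = f"{name}-{ordinal}"
        pvc = f"{claim_template}-{pod}"
        dns = f"{pod}.{service}.{namespace}.svc.{cluster_domain}"
        identities.append((pod, pvc, dns))
    return identities


for pod, pvc, dns in statefulset_identities("postgres", "postgres-headless", 3, "data"):
    print(pod, pvc, dns)
```

Because the names depend only on the ordinal, a rescheduled `postgres-1` reattaches to `data-postgres-1` and keeps the same DNS name.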

ConfigMap stores non-sensitive configuration. Secret stores sensitive data (base64-encoded, can be encrypted at rest).

Create ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  APP_ENV: "production"
  LOG_LEVEL: "info"
  config.yaml: |
    server:
      port: 8080
      timeout: 30s    

Create Secret:

apiVersion: v1
kind: Secret
metadata:
  name: db-secret
type: Opaque
stringData:              # stringData auto-encodes to base64
  DB_HOST: "postgres.prod.svc.cluster.local"
  DB_USER: "appuser"
  DB_PASS: "supersecretpassword"

Use in Pod — as environment variables:

spec:
  containers:
  - name: app
    image: my-app:v1
    env:
    - name: APP_ENV
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: APP_ENV
    - name: DB_PASS
      valueFrom:
        secretKeyRef:
          name: db-secret
          key: DB_PASS
    envFrom:             # Load ALL keys from ConfigMap as env vars
    - configMapRef:
        name: app-config

Use in Pod — as mounted files:

spec:
  containers:
  - name: app
    image: my-app:v1
    volumeMounts:
    - name: config-vol
      mountPath: /etc/app/config.yaml
      subPath: config.yaml
  volumes:
  - name: config-vol
    configMap:
      name: app-config

# Quick commands
kubectl create configmap app-config --from-file=config.yaml
kubectl create secret generic db-secret --from-literal=DB_PASS=mysecret
kubectl get secret db-secret -o jsonpath='{.data.DB_PASS}' | base64 -d

A Namespace is a virtual cluster within a physical cluster. It provides a mechanism for isolating groups of resources.

Default namespaces:

kubectl get namespaces
# NAME              STATUS   AGE
# default           Active   30d   ← where resources go if not specified
# kube-system       Active   30d   ← Kubernetes system components
# kube-public       Active   30d   ← publicly readable resources
# kube-node-lease   Active   30d   ← node heartbeat objects

Create and use a namespace:

# Create namespace
kubectl create namespace production
kubectl create namespace staging

# Deploy to a specific namespace
kubectl apply -f deployment.yaml -n production

# Set default namespace for your context
kubectl config set-context --current --namespace=production

# View common resources in a namespace (pods, services, deployments, ...; not ConfigMaps, Secrets, or Ingresses)
kubectl get all -n production

# View resources across ALL namespaces
kubectl get pods --all-namespaces
kubectl get pods -A   # shorthand

Namespace with resource quotas:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
    services: "10"

Key facts:

  • Some resources are cluster-scoped (Nodes, PersistentVolumes, ClusterRoles) — they do NOT belong to a namespace
  • DNS format: service-name.namespace.svc.cluster.local
  • ResourceQuotas and LimitRanges are set per namespace

Requests = minimum guaranteed resources (used for scheduling). Limits = maximum allowed resources (enforced at runtime).

spec:
  containers:
  - name: app
    image: my-app:v1
    resources:
      requests:
        cpu: "250m"       # 0.25 CPU cores — guaranteed
        memory: "256Mi"   # 256 MiB — guaranteed
      limits:
        cpu: "500m"       # 0.5 CPU cores — maximum
        memory: "512Mi"   # 512 MiB — maximum

What happens when limits are exceeded:

Resource   Exceeded Behavior
CPU        Container is throttled: slowed down but NOT killed
Memory     Container is OOMKilled (exit code 137): killed immediately

# Check if a pod was OOMKilled
kubectl describe pod <pod-name>
# Look for: OOMKilled in the Last State section

# Check exit code
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
# 137 = killed by SIGKILL (128 + 9), typically OOMKilled; 1 = app crash; 0 = success
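
Exit codes above 128 follow the common "128 + signal number" convention, which is where 137 comes from. The sketch below is an illustrative Python helper (the name `describe_exit_code` is made up for this example) that decodes such codes. It assumes a Unix-like system where signal 9 is SIGKILL. Note that 137 only says the process was killed with SIGKILL; the OOMKilled reason in `kubectl describe` is what confirms a memory kill.

```python
import signal


def describe_exit_code(code):
    """Explain a container exit code using the 128 + signal-number convention."""
    if code == 0:
        return "success"
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name}"
    return "application error"


for code in (0, 1, 137, 143):
    print(code, describe_exit_code(code))
```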

CPU units:

1 CPU    = 1 vCPU = 1 AWS vCPU = 1 GCP Core = 1000m
500m     = 0.5 CPU
100m     = 0.1 CPU (a common small request; the finest granularity is 1m)
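
As a rough illustration of these units, the following Python sketch converts quantity strings into plain numbers. The function names `parse_cpu` and `parse_memory` are hypothetical, and the sketch handles only the forms used in this article (millicores, whole or decimal cores, and the binary suffixes Ki, Mi, Gi), not the full Kubernetes quantity grammar.

```python
def parse_cpu(quantity):
    """Convert a CPU quantity such as "250m" or "1.5" to a number of cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity):
    """Convert a memory quantity with a binary suffix (Ki, Mi, Gi) to bytes."""
    units = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


print(parse_cpu("250m"))      # 0.25
print(parse_memory("256Mi"))  # 268435456
```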

Quality of Service (QoS) classes:

Guaranteed  → every container sets requests == limits for both CPU and memory (evicted last)
Burstable   → at least one request or limit is set, but the pod is not Guaranteed (middle)
BestEffort  → no requests or limits set (evicted first)

# Check QoS class of a pod
kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'
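
The classification rules can be sketched in a few lines of Python. This is a simplified model, not the kubelet's implementation: `qos_class` is a hypothetical helper, containers are plain dicts, and quantities are compared as literal strings (so "0.5" and "500m" would wrongly count as different).

```python
def qos_class(containers):
    """Classify a pod from its containers' resources.

    Each container is a dict such as:
    {"requests": {"cpu": "250m", "memory": "256Mi"}, "limits": {...}}
    """
    any_set = False
    guaranteed = True
    for c in containers:
        requests = c.get("requests", {})
        limits = c.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit when a limit is set.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


burstable = {"requests": {"cpu": "250m", "memory": "256Mi"},
             "limits": {"cpu": "500m", "memory": "512Mi"}}
print(qos_class([burstable]))  # Burstable
```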

PersistentVolume (PV) — a piece of storage provisioned in the cluster (like an AWS EBS volume). PersistentVolumeClaim (PVC) — a request for storage by a user/pod (like requesting a specific size).

Developer creates PVC  →  K8s finds/creates matching PV  →  Pod mounts PVC

Dynamic provisioning (most common — cloud environments):

# 1. StorageClass defines HOW storage is created
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
reclaimPolicy: Delete
allowVolumeExpansion: true

---
# 2. PVC requests storage (PV created automatically)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-pvc
spec:
  accessModes:
  - ReadWriteOnce      # RWO = one node at a time
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 50Gi

---
# 3. Pod mounts the PVC
apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
  - name: postgres
    image: postgres:15
    volumeMounts:
    - name: data
      mountPath: /var/lib/postgresql/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: database-pvc

Access modes:

Mode            Short   Description
ReadWriteOnce   RWO     Read/write by one node
ReadOnlyMany    ROX     Read-only by many nodes
ReadWriteMany   RWX     Read/write by many nodes (NFS, EFS)

kubectl get pv               # List all PersistentVolumes
kubectl get pvc              # List all PersistentVolumeClaims
kubectl describe pvc my-pvc  # Check binding status

A Service exposes pods inside the cluster or with a single external IP/port. An Ingress is an API object that manages HTTP/HTTPS routing to multiple services from a single entry point — like a smart router.

Internet → Ingress Controller (NGINX/ALB) → Ingress Rules → Services → Pods

Without Ingress — need a LoadBalancer per service (expensive):

app.com    → LoadBalancer 1 ($$$) → Service A
api.com    → LoadBalancer 2 ($$$) → Service B
admin.com  → LoadBalancer 3 ($$$) → Service C

With Ingress — one LoadBalancer, smart routing:

                    ┌── /        → Service A (frontend)
LoadBalancer ───────┼── /api/    → Service B (backend)
                    └── /admin/  → Service C (admin)

Ingress resource:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - yourdomain.com
    secretName: tls-cert-secret
  rules:
  - host: yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend-svc
            port:
              number: 80
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: backend-svc
            port:
              number: 80    # Service port of backend-svc (its targetPort is 8080)
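
How `pathType: Prefix` picks a backend is easy to get wrong, so here is a small Python model of the rule: matching is done per path element (split on `/`), and the longest matching path wins. The functions `prefix_matches` and `pick_backend` are illustrative names for this sketch, not part of any Ingress controller.

```python
def prefix_matches(rule_path, request_path):
    """pathType: Prefix compares whole path elements, not raw string prefixes."""
    rule = [p for p in rule_path.split("/") if p]
    request = [p for p in request_path.split("/") if p]
    return request[:len(rule)] == rule


def pick_backend(rules, request_path):
    """Return the backend of the longest matching rule, or None."""
    matches = [(path, backend) for path, backend in rules
               if prefix_matches(path, request_path)]
    if not matches:
        return None
    return max(matches, key=lambda m: len(m[0]))[1]


rules = [("/", "frontend-svc"), ("/api", "backend-svc")]
print(pick_backend(rules, "/api/users"))   # backend-svc
print(pick_backend(rules, "/apiv2"))       # frontend-svc
print(pick_backend(rules, "/index.html"))  # frontend-svc
```

The `/apiv2` case is the one to remember: it shares a string prefix with `/api` but not a path element, so it falls through to the `/` rule.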

Install NGINX Ingress Controller (most common):

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace

Kubernetes networking follows 3 fundamental rules:

  1. Every pod gets its own IP address
  2. Pods can communicate with all other pods without NAT
  3. Nodes can communicate with all pods without NAT

CNI (Container Network Interface) plugins implement these rules:

CNI Plugin    Use Case              Features
Calico        Widely used           NetworkPolicy, BGP, eBPF
Flannel       Simple, lightweight   Basic overlay network
Cilium        High performance      eBPF, L7 policies, observability
Weave         Easy setup            Optional traffic encryption
AWS VPC CNI   EKS native            Pods get real VPC IPs

Pod-to-Pod communication:

Same node:    Pod A → veth → cbr0 bridge → veth → Pod B
Across nodes: Pod A → veth → cbr0 → eth0 → [overlay/BGP] → eth0 → cbr0 → Pod B

DNS resolution in the cluster:

# Pod DNS format
<pod-ip-dashes>.<namespace>.pod.cluster.local
# Example: 10-0-0-1.default.pod.cluster.local

# Service DNS format
<service-name>.<namespace>.svc.cluster.local
# Example: my-svc.production.svc.cluster.local

# Test DNS from inside a pod
kubectl run dns-test --image=busybox --rm -it --restart=Never -- nslookup my-svc.production

NetworkPolicy — restrict traffic between pods:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-netpol
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend         # Apply to backend pods
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend    # Only allow traffic from frontend pods
    ports:
    - protocol: TCP
      port: 8080
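  egress:
  - to:
    - podSelector:
        matchLabels:
          app: postgres    # Backend can only talk to postgres...
    ports:
    - protocol: TCP
      port: 5432
  - to:                    # ...and to cluster DNS, or Service names stop resolving
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns   # Label used by CoreDNS in most clusters
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53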

RBAC (Role-Based Access Control) controls who can do what in Kubernetes. It uses 4 objects:

Object               Scope          Purpose
Role                 Namespace      Defines permissions within a namespace
ClusterRole          Cluster-wide   Defines cluster-wide permissions
RoleBinding          Namespace      Binds Role/ClusterRole to users/groups/SAs in a namespace
ClusterRoleBinding   Cluster-wide   Binds ClusterRole cluster-wide

Complete read-only setup for a developer:

# 1. Role — read-only in 'development' namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer-readonly
  namespace: development
rules:
- apiGroups: [""]                    # Core API group
  resources: ["pods", "pods/log", "services", "configmaps", "endpoints"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets", "statefulsets", "daemonsets"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["batch"]
  resources: ["jobs", "cronjobs"]
  verbs: ["get", "list", "watch"]

---
# 2. RoleBinding — attach role to a user
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-readonly-binding
  namespace: development
subjects:
- kind: User
  name: john@company.com   # IAM user or OIDC user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer-readonly
  apiGroup: rbac.authorization.k8s.io

Test permissions:

# Check what a user can do
kubectl auth can-i list pods -n development --as=john@company.com
# yes

kubectl auth can-i delete pods -n development --as=john@company.com
# no

# List all permissions for a user
kubectl auth can-i --list -n development --as=john@company.com
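
To see why `list pods` is allowed and `delete pods` is not, it helps to model how rules are evaluated. The Python sketch below is a deliberately simplified illustration (it ignores resourceNames, non-resource URLs, and bindings), and `rule_allows` and `can_i` are hypothetical helper names. It does capture the key property: RBAC is purely additive, so a request is allowed if any rule matches and denied otherwise.

```python
def rule_allows(rule, api_group, resource, verb):
    """True when a single RBAC rule covers the request ("*" is a wildcard)."""
    def covers(allowed, value):
        return "*" in allowed or value in allowed
    return (covers(rule["apiGroups"], api_group)
            and covers(rule["resources"], resource)
            and covers(rule["verbs"], verb))


def can_i(rules, api_group, resource, verb):
    """RBAC is additive: any matching rule grants access, nothing denies."""
    return any(rule_allows(r, api_group, resource, verb) for r in rules)


developer_readonly = [
    {"apiGroups": [""],
     "resources": ["pods", "pods/log", "services", "configmaps", "endpoints"],
     "verbs": ["get", "list", "watch"]},
    {"apiGroups": ["apps"],
     "resources": ["deployments", "replicasets", "statefulsets", "daemonsets"],
     "verbs": ["get", "list", "watch"]},
]

print(can_i(developer_readonly, "", "pods", "list"))    # True
print(can_i(developer_readonly, "", "pods", "delete"))  # False
```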

HPA automatically adjusts the number of Pod replicas based on observed metrics (CPU, memory, or custom).

How it works:

Metrics Server → HPA Controller (polls every 15s) → adjusts replica count

Requirements:

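  • Metrics Server must be installed in the cluster for CPU and memory metrics
  • Custom metrics (such as requests per second) need a custom metrics API adapter, for example Prometheus Adapter
  • Pods must define resource requests (otherwise HPA cannot calculate utilization)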
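
The core scaling decision is a single formula: desired = ceil(currentReplicas × currentMetric / targetMetric). The Python sketch below works through it. The function name `desired_replicas` is made up for this example, and the 10% tolerance is assumed to be the controller default; stabilization windows and scaling policies are left out.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=None, tolerance=0.1):
    """desired = ceil(current_replicas * current_value / target_value)."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no change
    desired = math.ceil(current_replicas * ratio)
    desired = max(desired, min_replicas)
    if max_replicas is not None:
        desired = min(desired, max_replicas)
    return desired


# 4 pods averaging 90% CPU against a 70% target
print(desired_replicas(4, 90, 70, min_replicas=2, max_replicas=20))  # 6
```

With several metrics configured, the HPA computes a desired count per metric and uses the largest one.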

Complete HPA with CPU + Memory + Custom metric:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
  # Scale on CPU utilization
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70    # Target 70% CPU across all pods
  # Scale on Memory
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 512Mi       # Target 512Mi per pod
  # Scale on custom metric (requests per second)
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
  # Scaling behaviour (prevent flapping)
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60      # Act on the lowest recommendation from the last 60s
      policies:
      - type: Pods
        value: 4                          # Add at most 4 pods per 60s period
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300     # Act on the highest recommendation from the last 5 min
      policies:
      - type: Percent
        value: 10                         # Remove at most 10% of pods per 60s period
        periodSeconds: 60

# Install Metrics Server (required for HPA)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Watch HPA in action
kubectl get hpa -n production -w

# Check current metrics
kubectl top pods -n production

Helm is the package manager for Kubernetes. It lets you define, install, and upgrade complex Kubernetes applications using charts (packaged YAML templates).

Without Helm — manually apply 10+ YAML files:

kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f ingress.yaml
kubectl apply -f configmap.yaml
kubectl apply -f secret.yaml
# ...and more

With Helm — one command:

helm install my-app ./my-chart --namespace production

Chart structure:

my-chart/
├── Chart.yaml         ← Chart metadata (name, version, description)
├── values.yaml        ← Default configuration values
├── templates/
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   ├── configmap.yaml
│   └── _helpers.tpl   ← Reusable template functions
└── charts/            ← Dependency charts

Key Helm commands:

# Add a chart repository
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

# Search for charts
helm search repo nginx

# Install a chart
helm install my-nginx bitnami/nginx \
  --namespace web \
  --create-namespace \
  --set replicaCount=3 \
  --set service.type=LoadBalancer

# Install with custom values file
helm install my-nginx bitnami/nginx -f custom-values.yaml

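# Upgrade a release (pass the namespace the release was installed into)
helm upgrade my-nginx bitnami/nginx -n web --set image.tag=1.25

# Roll back to revision 1 (omit the revision number to return to the previous release)
helm rollback my-nginx 1 -n web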

# List all releases
helm list --all-namespaces

# Uninstall a release
helm uninstall my-nginx -n web

# Render templates without installing (dry-run)
helm template my-nginx bitnami/nginx
helm install my-nginx bitnami/nginx --dry-run --debug

Taints are applied to nodes to repel pods from being scheduled there. Tolerations are applied to pods to allow them to schedule on tainted nodes.

Taint on node  →  repels pods that do not tolerate it
Toleration on pod  →  "I can tolerate this taint, schedule me here"

Taint effects:

Effect             Behavior
NoSchedule         New pods without a matching toleration won’t schedule here
PreferNoSchedule   Scheduler tries to avoid the node, but placement there is still possible
NoExecute          New pods won’t schedule AND existing pods without a matching toleration are evicted

Real-world example — dedicated GPU nodes:

# 1. Taint the GPU node
kubectl taint nodes gpu-node-1 dedicated=gpu:NoSchedule

# 2. Label the node so the nodeSelector in step 5 can match it (taints are not labels)
kubectl label nodes gpu-node-1 dedicated=gpu

# 3. View taints on a node
kubectl describe node gpu-node-1 | grep -i taint

# 4. To remove the taint later, append a minus sign
kubectl taint nodes gpu-node-1 dedicated=gpu:NoSchedule-

# 5. Pod with toleration can schedule on GPU node
apiVersion: v1
kind: Pod
metadata:
  name: ml-training
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  nodeSelector:
    dedicated: gpu           # Also use nodeSelector to ATTRACT to GPU node
  containers:
  - name: training
    image: tensorflow/tensorflow:latest-gpu
    resources:
      limits:
        nvidia.com/gpu: 1

Common real-world taints:

# Taint spot instances (avoid running critical workloads)
kubectl taint nodes spot-node-1 spot=true:NoSchedule

# Taint nodes for maintenance
kubectl taint nodes node-1 maintenance=true:NoExecute

# Dedicated nodes for monitoring stack
kubectl taint nodes monitoring-node dedicated=monitoring:NoSchedule
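
The matching rules between taints and tolerations can be modeled in a short Python sketch. This is an illustration of the documented semantics, not scheduler code; `tolerates` and `can_schedule` are hypothetical names, and PreferNoSchedule is treated as a soft preference that never blocks scheduling.

```python
def tolerates(toleration, taint):
    """Return True if one toleration matches one taint."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        # An empty key with Exists tolerates every taint.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def can_schedule(pod_tolerations, node_taints):
    """A pod fits if every NoSchedule/NoExecute taint is tolerated."""
    hard = [t for t in node_taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in pod_tolerations) for taint in hard)


gpu_taint = {"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}
ml_toleration = {"key": "dedicated", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"}

print(can_schedule([ml_toleration], [gpu_taint]))  # True
print(can_schedule([], [gpu_taint]))               # False
```

A toleration only permits scheduling on the tainted node; it does not steer the pod there, which is why the GPU example pairs it with a nodeSelector.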

Kubernetes uses three types of probes to monitor container health:

Probe       Question                        Failure Action                                          When Used
Liveness    Is the app alive?               Restart container                                       App is stuck/deadlocked
Readiness   Is the app ready for traffic?   Remove from Service endpoints                           App still initializing
Startup     Has the app started?            Restart container (overrides liveness during startup)   Slow-starting apps

Complete probe configuration:

spec:
  containers:
  - name: api
    image: my-api:v1
    ports:
    - containerPort: 8080

    # Startup probe — give app 5 mins to start (30 * 10s)
    # Liveness/Readiness are paused until startup succeeds
    startupProbe:
      httpGet:
        path: /health/startup
        port: 8080
      failureThreshold: 30
      periodSeconds: 10

    # Liveness probe — restart if app is dead
    livenessProbe:
      httpGet:
        path: /health/live
        port: 8080
      initialDelaySeconds: 10   # Wait 10s after container starts
      periodSeconds: 15         # Check every 15s
      failureThreshold: 3       # Restart after 3 failures
      timeoutSeconds: 5         # Fail if no response in 5s

    # Readiness probe — remove from load balancer if not ready
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
      failureThreshold: 3
      successThreshold: 1       # Need 1 success to be marked ready

Probe types:

# HTTP GET — most common
httpGet:
  path: /healthz
  port: 8080
  httpHeaders:
  - name: X-Health-Check
    value: "true"

# TCP Socket — for non-HTTP services
tcpSocket:
  port: 5432     # Just checks if port is open

# Exec command — run a command inside container
exec:
  command:
  - /bin/sh
  - -c
  - "redis-cli ping | grep PONG"

A ServiceAccount provides an identity for processes running inside a Pod to interact with the Kubernetes API. Every pod automatically gets the default ServiceAccount if not specified.

Why use custom ServiceAccounts?

  • Grant specific pods only the permissions they need (least privilege)
  • Use with IRSA (IAM Roles for Service Accounts) on EKS for AWS access
  • Audit trail — know which pod made which API call

Create a ServiceAccount with RBAC:

# 1. Create ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: deployment-manager
  namespace: production

---
# 2. Create Role with needed permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-role
  namespace: production
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "update", "patch"]

---
# 3. Bind the Role to the ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployment-role-binding
  namespace: production
subjects:
- kind: ServiceAccount
  name: deployment-manager
  namespace: production
roleRef:
  kind: Role
  name: deployment-role
  apiGroup: rbac.authorization.k8s.io

---
# 4. Use ServiceAccount in Pod
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  serviceAccountName: deployment-manager    # Attach the SA here
  automountServiceAccountToken: true
  containers:
  - name: app
    image: my-app:v1

EKS IRSA — give pods AWS IAM permissions:

# Associate ServiceAccount with IAM role (no access keys needed in pods!)
eksctl create iamserviceaccount \
  --cluster my-cluster \
  --namespace production \
  --name s3-access-sa \
  --attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
  --approve

Rolling update replaces old pods with new ones gradually — ensuring zero downtime by keeping some old pods running until new ones are ready.

Old pods: [v1] [v1] [v1]
Step 1:   [v1] [v1] [v2]   ← 1 new pod added
Step 2:   [v1] [v2] [v2]   ← 1 old pod removed
Step 3:   [v2] [v2] [v2]   ← update complete

Configure rolling update strategy:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2          # Max 2 extra pods above desired count
      maxUnavailable: 0    # Never go below 6 available pods (zero downtime)
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web
        image: web-app:v2.0
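
The effect of maxSurge and maxUnavailable is easiest to see as two bounds on the pod count during a rollout. The Python sketch below computes them; `resolve` and `rollout_bounds` are illustrative names. It follows the documented rounding for percentage values: maxSurge rounds up and maxUnavailable rounds down.

```python
import math


def resolve(value, replicas, round_up):
    """Resolve an int or a percentage string such as "25%" to a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        exact = replicas * int(value[:-1]) / 100
        return math.ceil(exact) if round_up else math.floor(exact)
    return int(value)


def rollout_bounds(replicas, max_surge, max_unavailable):
    """Return (minimum available pods, maximum total pods) during a rollout."""
    surge = resolve(max_surge, replicas, round_up=True)               # surge rounds up
    unavailable = resolve(max_unavailable, replicas, round_up=False)  # unavailable rounds down
    return replicas - unavailable, replicas + surge


print(rollout_bounds(6, 2, 0))           # (6, 8)
print(rollout_bounds(10, "25%", "25%"))  # (8, 13)
```

For the web-app Deployment above, this means between 6 and 8 pods exist at any point in the rollout, so the cluster needs headroom for 2 extra pods.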

Perform and monitor an update:

# Update the image
kubectl set image deployment/web-app web=web-app:v2.0 -n production

# Or edit deployment directly
kubectl edit deployment web-app -n production

# Watch rollout progress
kubectl rollout status deployment/web-app -n production
# Waiting for deployment "web-app" rollout to finish: 2 out of 6 new replicas...

# View rollout history
kubectl rollout history deployment/web-app -n production

# View details of a specific revision
kubectl rollout history deployment/web-app -n production --revision=2

Rollback:

# Rollback to previous version immediately
kubectl rollout undo deployment/web-app -n production

# Rollback to a specific revision
kubectl rollout undo deployment/web-app -n production --to-revision=3

# Pause a rollout (freeze mid-rollout)
kubectl rollout pause deployment/web-app -n production

# Resume a paused rollout
kubectl rollout resume deployment/web-app -n production

Recreate strategy (causes downtime; for apps that cannot run two versions side by side):

strategy:
  type: Recreate    # All old pods killed, then all new pods started

Init Containers run to completion before any application containers start. They run sequentially — each must succeed before the next starts.

Use cases:

  • Wait for a dependency (database) to be ready
  • Download configuration/secrets before app starts
  • Set file permissions on shared volumes
  • Run database migrations before app starts

Real-world example — wait for DB + run migrations:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      initContainers:

      # Init 1 — wait for database to be ready
      - name: wait-for-db
        image: busybox:1.36
        command:
        - /bin/sh
        - -c
        - |
          until nc -z postgres-svc 5432; do
            echo "Waiting for database..."
            sleep 5
          done
          echo "Database is ready!"          

      # Init 2 — run database migrations
      - name: run-migrations
        image: my-app:v2.0
        command: ["python", "manage.py", "migrate"]
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: DATABASE_URL

      # Main application container starts ONLY after both inits succeed
      containers:
      - name: web-app
        image: my-app:v2.0
        ports:
        - containerPort: 8000

Key differences from regular containers:

Aspect       Init Container           Regular Container
Runs         Before app containers    Parallel with other containers
Completion   Must finish (exit 0)     Runs indefinitely
Restart      Restarts until success   Based on restartPolicy
Probes       No liveness/readiness    Supported

A PodDisruptionBudget (PDB) limits the number of pods that can be simultaneously unavailable during voluntary disruptions (node drains, upgrades) — ensuring your app stays highly available during maintenance.

Voluntary disruptions (PDB protects against these):

  • kubectl drain node during upgrades
  • Cluster autoscaler scaling down nodes
  • Node pool upgrades

Involuntary disruptions (PDB does NOT protect):

  • Node hardware failure
  • Kernel panic
  • OOMKill

Create a PDB:

# Option 1: minAvailable — always keep at least N pods running
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
  namespace: production
spec:
  minAvailable: 2           # Always keep at least 2 pods up
  selector:
    matchLabels:
      app: web-app

---
# Option 2: maxUnavailable — allow at most N pods to be down
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
  namespace: production
spec:
  maxUnavailable: 1         # Allow at most 1 pod to be unavailable
  selector:
    matchLabels:
      app: api-server

---
# Option 3: percentage
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: backend-pdb
spec:
  minAvailable: "80%"       # Always keep 80% of pods available
  selector:
    matchLabels:
      app: backend

# View PDBs
kubectl get pdb -n production

# Check PDB status
kubectl describe pdb web-app-pdb -n production
# Shows: Allowed disruptions, Current pods, etc.

# Drain a node (respects PDBs)
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
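
The "Allowed disruptions" figure shown by `kubectl describe pdb` can be reproduced with simple arithmetic. The Python sketch below is a simplified model with a hypothetical function name, `allowed_disruptions`. It assumes percentages are resolved against the expected pod count and rounded up, as the Kubernetes documentation describes.

```python
import math


def allowed_disruptions(expected_pods, healthy_pods, min_available=None, max_unavailable=None):
    """Voluntary evictions a PDB currently permits (never below zero)."""
    def resolve(value):
        # Percentages are resolved against the expected pod count, rounding up.
        if isinstance(value, str) and value.endswith("%"):
            return math.ceil(expected_pods * int(value[:-1]) / 100)
        return int(value)

    if min_available is not None:
        desired_healthy = resolve(min_available)
    else:
        desired_healthy = expected_pods - resolve(max_unavailable)
    return max(0, healthy_pods - desired_healthy)


print(allowed_disruptions(5, 5, min_available="80%"))  # 1
print(allowed_disruptions(3, 3, min_available=2))      # 1
print(allowed_disruptions(3, 2, max_unavailable=1))    # 0
```

When the result is 0, `kubectl drain` waits and retries the eviction instead of removing the pod, which is how a too-strict PDB (for example minAvailable equal to the replica count) can block node maintenance.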

By default, Kubernetes Secrets are stored base64-encoded (NOT encrypted) in etcd. This means anyone with etcd access can read them.
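
A quick way to convince yourself that base64 offers no protection is to round-trip a value with Python's standard library. The password is the sample value from the Secret manifest earlier in this article.

```python
import base64

encoded = base64.b64encode(b"supersecretpassword").decode()
print(encoded)

# Decoding needs no key at all, which is why base64 is not encryption.
decoded = base64.b64decode(encoded).decode()
print(decoded)
```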

Best practices for Kubernetes Secret management:

1. Enable Encryption at Rest:

# /etc/kubernetes/encryption-config.yaml (on API server)
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded-32-byte-key>
  - identity: {}   # Fallback for unencrypted data

2. Use External Secret Managers (recommended for production):

# External Secrets Operator — syncs AWS Secrets Manager → K8s Secret
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secretsmanager
    kind: ClusterSecretStore
  target:
    name: db-secret              # Creates this K8s Secret
  data:
  - secretKey: DB_PASSWORD
    remoteRef:
      key: prod/myapp/database   # AWS Secrets Manager path
      property: password

3. RBAC to limit Secret access:

# Only allow specific service accounts to read secrets
rules:
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: ["db-secret", "api-key"]  # Specific secrets only
  verbs: ["get"]

4. Never do these:

# ❌ Never put secrets in ConfigMaps
# ❌ Never commit secrets to Git
# ❌ Never echo secrets in logs
# ❌ Never use environment variables for very sensitive data
#    (they can appear in process lists)

# ✅ Mount secrets as files instead
volumeMounts:
- name: db-credentials
  mountPath: /etc/secrets
  readOnly: true
volumes:
- name: db-credentials
  secret:
    secretName: db-secret
    defaultMode: 0400    # Read-only for owner only

# Find Secret-related events (for example failed mounts); events do not record who read a Secret
kubectl get events | grep -i secret
# For a real access trail, enable audit logging in kube-apiserver

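A pod in Pending has been accepted by the cluster but is not yet running. Most often the scheduler cannot find a suitable node because of a resource or constraint issue; a pod that is already scheduled also stays Pending while its images are being pulled.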

Systematic diagnosis:

# Step 1 — describe the pod (most important command)
kubectl describe pod <pod-name> -n <namespace>
# Look at the 'Events' section at the bottom

# Step 2 — check node resources
kubectl describe nodes | grep -A 5 "Allocated resources"
kubectl top nodes

# Step 3 — check if PVC is bound (if pod mounts one)
kubectl get pvc -n <namespace>

Common causes and fixes:

Cause 1: Insufficient CPU/Memory:

# Events will show:
# "0/3 nodes are available: 3 Insufficient cpu"

# Fix: Scale up node group or reduce resource requests
kubectl get nodes
kubectl describe node <node> | grep -A 10 "Allocated resources"

# Check what's using resources
kubectl top pods --all-namespaces --sort-by=cpu

Cause 2: No nodes match nodeSelector/Affinity:

# Events: "0/3 nodes are available: 3 node(s) didn't match Pod's node affinity"

# Check node labels
kubectl get nodes --show-labels

# Add missing label to node
kubectl label node <node-name> disktype=ssd

Cause 3: PVC not bound:

# Events: "persistentvolumeclaim not found" or PVC stuck in Pending

kubectl describe pvc <pvc-name>
# Check if StorageClass exists
kubectl get storageclass

Cause 4: Taint not tolerated:

# Events: "0/3 nodes are available: 3 node(s) had untolerated taint"

kubectl describe nodes | grep -i taint
# Add toleration to pod spec

Cause 5: Too many pods on nodes (maxPods limit):

# The kubelet default is 110 pods per node (managed clusters often set a different maxPods)
kubectl describe node <node> | grep "Non-terminated Pods"

Quick diagnosis script:

# List all pending pods, then show the scheduler's reasons
kubectl get pods --all-namespaces --field-selector=status.phase=Pending
kubectl get events --all-namespaces --field-selector=reason=FailedScheduling

CrashLoopBackOff = the container starts, crashes, Kubernetes restarts it — in a loop. The backoff time doubles each time (10s → 20s → 40s → … up to 5 min).
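
The doubling schedule is easy to see in a few lines of Python. This is only an illustration of the arithmetic described above (10s base, 5 min cap); the function name is made up and this is not kubelet code.

```python
def crashloop_backoff_delays(
    restarts: int, base_seconds: int = 10, cap_seconds: int = 300
) -> list[int]:
    """Delay before each restart: doubles from base_seconds, capped at cap_seconds."""
    return [min(base_seconds * 2 ** attempt, cap_seconds) for attempt in range(restarts)]


print(crashloop_backoff_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

After the sixth restart every further attempt waits the full five minutes, which is why a crash-looping pod can look idle for long stretches.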

Systematic diagnosis:

# Step 1 — describe pod for events and exit codes
kubectl describe pod <pod-name> -n <namespace>

# Step 2 — current logs (may be empty if app crashes immediately)
kubectl logs <pod-name> -n <namespace>

# Step 3 — PREVIOUS container logs (before the crash) — most useful
kubectl logs <pod-name> -n <namespace> --previous

# Step 4 — check exit code
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'

Exit code reference:

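| Exit Code | Meaning | Fix |
|---|---|---|
| 0 | Success (not a crash issue) | Check restart policy |
| 1 | App error | Check app logs |
| 137 | SIGKILL (128 + 9), most often OOMKilled (Out of Memory) | Increase memory limit |
| 139 | Segfault (128 + 11, SIGSEGV) | Bug in app or wrong binary |
| 143 | SIGTERM (128 + 15), graceful shutdown | Check if liveness probe is too aggressive |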
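
Exit codes above 128 follow the common convention of 128 plus the signal number, which is where 137, 139, and 143 come from. A small Python sketch of that decoding, assuming a Linux host for the signal names (the function name is illustrative):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a container exit code using the 128 + signal-number convention."""
    if code == 0:
        return "exited normally"
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform
            name = f"signal {signum}"
        return f"terminated by {name}"
    return "application error"


for code in (0, 1, 137, 139, 143):
    print(code, describe_exit_code(code))
```

Keep in mind that 137 only says the process received SIGKILL; confirm an OOM kill by checking that the container's last state reason is OOMKilled.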

Common fixes:

# Fix OOMKill (exit 137) — increase memory limit
kubectl patch deployment <name> -p \
  '{"spec":{"template":{"spec":{"containers":[{"name":"app","resources":{"limits":{"memory":"1Gi"}}}]}}}}'

# Fix: App can't connect to database
# Check if DB service is reachable from pod
kubectl exec -it <pod-name> -- nc -zv postgres-svc 5432

# Fix: Wrong image command — override to debug
kubectl run debug-pod \
  --image=<same-image> \
  --restart=Never \
  --rm -it \
  --command -- /bin/sh

# Fix: Liveness probe killing app too early — increase initialDelaySeconds
livenessProbe:
  initialDelaySeconds: 60    # Give app more time to start
  failureThreshold: 5

# Fix: Missing environment variable or secret
kubectl exec -it <pod-name> -- env | grep DB_   # Check env vars
kubectl describe pod <pod-name> | grep -A 5 "Environment"

Zero-downtime deployments require a combination of correct deployment strategy, pod lifecycle hooks, and health probes.

Complete zero-downtime deployment configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2          # Can temporarily have 8 pods (6+2)
      maxUnavailable: 0    # Never drop below 6 healthy pods
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      # 1. Give pods time to finish in-flight requests before shutdown
      terminationGracePeriodSeconds: 60

      containers:
      - name: web
        image: web-app:v2.0
        ports:
        - containerPort: 8080

        # 2. Readiness probe — pod only gets traffic when truly ready
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          failureThreshold: 3

        # 3. Liveness probe — restart if pod is dead
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 15

        # 4. preStop hook — wait for traffic to drain before shutdown
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - sleep 15    # Wait 15s for load balancer to remove pod from rotation

        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
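
To see what maxSurge and maxUnavailable allow during a rollout, the arithmetic can be sketched in Python. The helper names are made up; the rounding follows the documented behaviour that a percentage maxSurge rounds up and a percentage maxUnavailable rounds down.

```python
def _resolve(value, replicas: int, round_up: bool) -> int:
    """Turn an absolute count or a percentage string such as "25%" into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = int(value[:-1]) * replicas
        # Integer ceiling/floor division avoids floating point rounding surprises
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rolling_update_bounds(replicas: int, max_surge, max_unavailable) -> tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during a rollout."""
    surge = _resolve(max_surge, replicas, round_up=True)
    unavailable = _resolve(max_unavailable, replicas, round_up=False)
    return replicas - unavailable, replicas + surge


print(rolling_update_bounds(6, 2, 0))          # (6, 8), the manifest above
print(rolling_update_bounds(6, "25%", "25%"))  # (5, 8), the Deployment defaults
```

With maxUnavailable: 0 the rollout can only progress by surging, so the cluster needs spare capacity for the extra pods.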

Canary deployment pattern:

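# Both Deployments sit behind one Service, so traffic splits roughly by replica count
# Deploy v2 to ~10% of traffic first
kubectl scale deployment web-app-v1 --replicas=9    # 9 old pods
kubectl scale deployment web-app-v2 --replicas=1    # 1 new pod (~10%)

# If v2 is healthy, shift the ratio gradually
kubectl scale deployment web-app-v2 --replicas=5
kubectl scale deployment web-app-v1 --replicas=5    # 5 new + 5 old (~50%)
kubectl scale deployment web-app-v2 --replicas=10   # Full capacity on v2
kubectl scale deployment web-app-v1 --replicas=0    # Remove old (100% on v2)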
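
With replica-based canaries, the canary's share depends on both replica counts, not just the new one. A quick Python sketch of the arithmetic, assuming a single Service that spreads requests evenly across all pods (the function name is illustrative):

```python
def canary_share(stable_replicas: int, canary_replicas: int) -> float:
    """Approximate fraction of requests reaching the canary pods.

    Assumes one Service selects both Deployments and balances evenly per pod.
    """
    total = stable_replicas + canary_replicas
    if total == 0:
        raise ValueError("no replicas are running")
    return canary_replicas / total


print(f"{canary_share(9, 1):.0%}")  # 10%
print(f"{canary_share(5, 5):.0%}")  # 50%
print(f"{canary_share(9, 5):.0%}")  # 36%
```

For precise percentages independent of replica counts, a service mesh or an ingress controller with weighted routing is the usual choice.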

Blue-Green deployment:

# Switch Service selector from blue to green instantly
kubectl patch service web-svc \
  -p '{"spec":{"selector":{"version":"v2"}}}'

# Rollback instantly by switching back
kubectl patch service web-svc \
  -p '{"spec":{"selector":{"version":"v1"}}}'

etcd is a distributed, consistent key-value store that serves as Kubernetes’ source of truth. Every object (pods, services, configmaps, secrets) is stored in etcd.

Architecture:

All cluster state stored in etcd:
/registry/pods/default/my-pod
/registry/services/default/my-svc
/registry/deployments/production/web-app
/registry/secrets/default/db-secret

etcd uses the Raft consensus algorithm:

  • Requires a quorum (majority) to function: floor(n/2) + 1
  • 3 members → can tolerate 1 failure
  • 5 members → can tolerate 2 failures
  • Always use odd numbers of etcd members

| Cluster Size | Quorum | Tolerable Failures |
|---|---|---|
| 1 | 1 | 0 |
| 3 | 2 | 1 |
| 5 | 3 | 2 |
| 7 | 4 | 3 |
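
The quorum numbers can be reproduced with two one-line functions. The sketch below (function names are illustrative) also shows why even member counts are avoided: going from 3 to 4 members raises the quorum without tolerating any extra failure.

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with the given member count."""
    return members // 2 + 1


def tolerable_failures(members: int) -> int:
    """How many members can fail while a quorum still exists."""
    return members - quorum(members)


for n in (1, 2, 3, 4, 5, 7):
    print(f"members={n} quorum={quorum(n)} tolerates={tolerable_failures(n)}")
```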

What happens when etcd goes down:

etcd down → API server cannot read/write state
           → No new pods can be scheduled
           → Existing pods keep running (kubelet works independently)
           → kubectl commands fail
           → New deployments fail

Backup etcd (critical for disaster recovery):

# Take an etcd snapshot
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot-$(date +%Y%m%d).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify the snapshot
ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-snapshot-$(date +%Y%m%d).db

# Restore from snapshot (on etcd 3.5+, etcdutl snapshot restore is the preferred tool)
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir=/var/lib/etcd-restored

# Automate daily backups via CronJob
kubectl apply -f etcd-backup-cronjob.yaml

Best practices:

  • Run etcd on dedicated nodes, separate from worker nodes
  • Use SSDs, since etcd is I/O intensive
  • Monitor etcd disk latency (p99 WAL fsync should stay under roughly 10ms)
  • Take snapshots before every cluster upgrade

Kubernetes security is a defence-in-depth approach with multiple layers:

Layer 1 — API Server security:

# Restrict anonymous access
--anonymous-auth=false

# Enable audit logging
--audit-log-path=/var/log/kubernetes/audit.log
--audit-policy-file=/etc/kubernetes/audit-policy.yaml

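# Disable insecure port (clusters older than v1.24 only; the flag was removed in v1.24, where the insecure port no longer exists)
--insecure-port=0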

Layer 2 — RBAC (least privilege):

# Never use cluster-admin in applications
# Create minimal roles per service

# Audit RBAC permissions
kubectl auth can-i --list --as=system:serviceaccount:production:my-sa

Layer 3 — Network Policies (zero-trust networking):

# Deny all traffic by default, then allow explicitly
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
  namespace: production
spec:
  podSelector: {}          # Applies to ALL pods in namespace
  policyTypes:
  - Ingress
  - Egress
  # No rules = deny all

Layer 4 — Pod Security (Security Context):

spec:
  securityContext:
    runAsNonRoot: true       # Never run as root
    runAsUser: 1000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault   # Enable seccomp filtering

  containers:
  - name: app
    image: my-app:v1
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true    # Root filesystem is read-only; mount a volume for writable paths
      capabilities:
        drop:
        - ALL                          # Drop all Linux capabilities
        add:
        - NET_BIND_SERVICE             # Add only what you need

Layer 5 — Image Security:

# Scan images before pushing
trivy image my-app:v1
grype my-app:v1

# Use Image Policy Webhook to block vulnerable images
# Use private registry — never use :latest tag in production

Layer 6 — Secrets Management:

# Enable encryption at rest for etcd
# Use External Secrets Operator with AWS Secrets Manager / Vault
# Rotate secrets regularly

Layer 7 — Runtime Security:

# Use Falco for runtime threat detection
helm install falco falcosecurity/falco \
  --namespace falco-system \
  --create-namespace

# Falco detects: shell in containers, privilege escalation,
# unexpected network connections, file system changes

Pod Security Admission (replaces PodSecurityPolicy, which was removed in Kubernetes v1.25):

# Label namespace to enforce security standards
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted    # Most strict
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

A Kubernetes Operator is a method of packaging, deploying, and managing a Kubernetes application using Custom Resource Definitions (CRDs) and custom controllers that encode operational knowledge.

The Operator pattern:

Human Operator knowledge → encoded in → Custom Controller
                                           ↓
CRD (custom resource) → Controller reconciles → Desired state
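
The heart of the pattern is a reconcile function that compares desired state with observed state and returns the actions needed to close the gap. The toy Python sketch below illustrates the idea only; the naming scheme and action strings are invented, and real operators are normally written with client libraries such as controller-runtime rather than like this.

```python
def reconcile(name: str, desired_replicas: int, running: list[str]) -> list[str]:
    """Return the actions needed to move the observed pods to the desired state.

    Pods are assumed to be named <name>-0, <name>-1, ... (a hypothetical scheme).
    """
    wanted = {f"{name}-{i}" for i in range(desired_replicas)}
    current = set(running)
    actions = [f"create {pod}" for pod in sorted(wanted - current)]
    actions += [f"delete {pod}" for pod in sorted(current - wanted)]
    return actions


print(reconcile("my-database", 3, ["my-database-0"]))
# ['create my-database-1', 'create my-database-2']
print(reconcile("my-database", 3, ["my-database-0", "my-database-1", "my-database-2"]))
# []
```

Calling the function again once the actions have been applied returns an empty list, which is the idempotence a controller loop relies on.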

When to build a Kubernetes Operator:

  • Managing stateful applications (databases, message queues)
  • Automating complex operational tasks (backups, upgrades, failover)
  • When your app needs more than Deployment/StatefulSet
  • Encoding domain-specific knowledge (e.g., how to scale a database cluster)

Example CRD — custom database resource:

# 1. Define the Custom Resource Definition
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: postgresclusters.db.example.com
spec:
  group: db.example.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              replicas:
                type: integer
              version:
                type: string
              backupSchedule:
                type: string
  scope: Namespaced
  names:
    plural: postgresclusters
    singular: postgrescluster
    kind: PostgresCluster

---
# 2. Use the custom resource (like any K8s object now)
apiVersion: db.example.com/v1
kind: PostgresCluster
metadata:
  name: my-database
  namespace: production
spec:
  replicas: 3
  version: "15.2"
  backupSchedule: "0 2 * * *"    # Operator handles backups automatically

Popular real-world operators:

# Install cert-manager operator (manages TLS certificates)
# Newer chart versions use crds.enabled=true in place of the deprecated installCRDs
helm install cert-manager jetstack/cert-manager --set installCRDs=true

# Install Prometheus operator (manages monitoring stack)
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack

# Install Strimzi operator (manages Kafka clusters)
helm install strimzi-kafka-operator strimzi/strimzi-kafka-operator

Build your own operator:

# Use Operator SDK (most popular framework)
operator-sdk init --domain example.com --repo github.com/example/my-operator
operator-sdk create api --group apps --version v1 --kind MyApp --resource --controller

# Or use Kubebuilder
kubebuilder init --domain example.com
kubebuilder create api --group apps --version v1 --kind MyApp

Three mechanisms control resource usage at different levels:

1. ResourceQuota — limits resources for an entire namespace:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    # Compute resources
    requests.cpu: "20"           # Max 20 CPU cores requested in namespace
    requests.memory: 40Gi        # Max 40Gi memory requested
    limits.cpu: "40"
    limits.memory: 80Gi

    # Object count limits
    pods: "100"
    services: "20"
    persistentvolumeclaims: "30"
    secrets: "50"
    configmaps: "50"

    # Storage limits
    requests.storage: "500Gi"
    fast-ssd.storageclass.storage.k8s.io/requests.storage: "200Gi"
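
A quota is enforced at admission time: a new pod is rejected if its requests would push the namespace total past the hard limit. The Python sketch below mimics that check for requests.cpu only. It is a simplification with made-up helper names, and it handles just plain cores ("2", "0.5") and millicores ("500m").

```python
def parse_cpu(quantity: str) -> int:
    """Convert a CPU quantity such as "500m" or "2" into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def fits_cpu_quota(hard: str, used: str, requested: str) -> bool:
    """True if adding `requested` keeps the namespace within its requests.cpu quota."""
    return parse_cpu(used) + parse_cpu(requested) <= parse_cpu(hard)


print(fits_cpu_quota("20", "19500m", "500m"))  # True: lands exactly on the 20-core limit
print(fits_cpu_quota("20", "19500m", "1"))     # False: would exceed the quota
```

Current usage against each limit is visible with kubectl describe resourcequota in the namespace.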

2. LimitRange — sets defaults and limits per Pod/Container:

apiVersion: v1
kind: LimitRange
metadata:
  name: production-limitrange
  namespace: production
spec:
  limits:
  # Container-level defaults and max/min
  - type: Container
    default:               # Default LIMIT if not specified
      cpu: "500m"
      memory: "256Mi"
    defaultRequest:        # Default REQUEST if not specified
      cpu: "100m"
      memory: "128Mi"
    max:                   # Maximum any container can request
      cpu: "4"
      memory: "8Gi"
    min:                   # Minimum any container must request
      cpu: "50m"
      memory: "64Mi"

  # Pod-level maximum
  - type: Pod
    max:
      cpu: "8"
      memory: "16Gi"

  # PVC storage limits
  - type: PersistentVolumeClaim
    max:
      storage: "100Gi"
    min:
      storage: "1Gi"

3. PriorityClass (controls scheduling order and preemption, and influences eviction ranking during resource pressure):

# High priority — for critical system workloads
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "Critical production services"

---
# Low priority — for batch jobs
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 100
description: "Batch jobs and non-critical workloads"

---
# Use in Pod
spec:
  priorityClassName: high-priority    # This pod won't be evicted first
  containers:
  - name: critical-app
    image: my-app:v1

Eviction order during node pressure:

BestEffort (no requests/limits) → evicted FIRST
Burstable (requests < limits)   → evicted SECOND
Guaranteed (requests = limits)  → evicted LAST
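
Kubernetes derives the QoS class from the requests and limits of every container in the pod. The Python sketch below approximates those rules. It is a simplification with a hypothetical function name: it compares quantity strings literally, so "1" and "1000m" would count as different, and it ignores init containers.

```python
def qos_class(containers: list[dict]) -> str:
    """Classify a pod as BestEffort, Burstable, or Guaranteed."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([{}]))  # BestEffort
print(qos_class([{"requests": {"cpu": "250m", "memory": "256Mi"},
                  "limits": {"cpu": "500m", "memory": "512Mi"}}]))  # Burstable
print(qos_class([{"limits": {"cpu": "1", "memory": "1Gi"}}]))  # Guaranteed
```

The real value is recorded on the pod itself under status.qosClass, so this logic is only useful for reasoning about a manifest before applying it.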

GitOps is a deployment methodology where Git is the single source of truth for cluster state. ArgoCD continuously syncs the cluster to match what’s in Git.

GitOps principles:

  1. Entire system described declaratively in Git
  2. Desired state versioned in Git
  3. Approved changes automatically applied to the cluster
  4. Software agents ensure correctness and alert on divergence
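
The fourth principle, detecting divergence, boils down to comparing the desired state from Git with the live state in the cluster. The Python sketch below shows that comparison on flat dictionaries. It is purely illustrative: the function name and sample fields are invented, and Argo CD's actual diffing is far more involved.

```python
def detect_drift(desired: dict, live: dict) -> dict:
    """Return {field: (desired_value, live_value)} for every field that differs."""
    drift = {}
    for key in desired.keys() | live.keys():
        if desired.get(key) != live.get(key):
            drift[key] = (desired.get(key), live.get(key))
    return drift


desired = {"image": "web-app:v2.5", "replicas": 6}
live = {"image": "web-app:v2.5", "replicas": 4, "debug": "true"}  # manual edits
print(detect_drift(desired, live))
```

An agent with self-healing enabled would revert the replica count, and with pruning enabled it would remove the field that is absent from Git.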

Install ArgoCD:

kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Get initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d

# Port-forward to access UI
kubectl port-forward svc/argocd-server -n argocd 8080:443

Create an ArgoCD Application:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web-app-production
  namespace: argocd
spec:
  project: default

  # Source — where your manifests live in Git
  source:
    repoURL: https://github.com/myorg/k8s-configs
    targetRevision: main
    path: apps/web-app/production

  # Destination — where to deploy in the cluster
  destination:
    server: https://kubernetes.default.svc
    namespace: production

  # Sync policy — auto-sync when Git changes
  syncPolicy:
    automated:
      prune: true         # Delete resources removed from Git
      selfHeal: true      # Revert manual changes to cluster
    syncOptions:
    - CreateNamespace=true
    - PrunePropagationPolicy=foreground
    retry:
      limit: 5
      backoff:
        duration: 5s
        maxDuration: 3m
        factor: 2

GitOps workflow:

# Developer makes a change
git checkout -b feature/update-image
# Edit apps/web-app/production/deployment.yaml and change the image tag
git commit -am "deploy: bump web-app to v2.5"
git push origin feature/update-image

# Create PR → review → merge to main
# ArgoCD detects the change within about 3 minutes (its default polling interval)
# ArgoCD applies the change to cluster automatically

# Check sync status
argocd app get web-app-production
argocd app sync web-app-production    # Manual sync if needed
argocd app history web-app-production # Deployment history

NotReady means the control plane cannot communicate with the node or the node’s conditions are failing.

Immediate diagnosis:

# Step 1 — check node status and conditions
kubectl get nodes
kubectl describe node <node-name>

# Look for conditions:
# Ready = False/Unknown
# MemoryPressure = True
# DiskPressure = True
# PIDPressure = True
# NetworkUnavailable = True
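
When checking many nodes, the same condition rules can be applied to the JSON from kubectl get node <node-name> -o json. A minimal Python sketch, with a hypothetical function name and a hand-written sample in place of real cluster output:

```python
def unhealthy_conditions(node: dict) -> list[str]:
    """List node conditions that indicate a problem.

    Ready must be "True"; every other condition is a problem when "True".
    """
    problems = []
    for condition in node.get("status", {}).get("conditions", []):
        ctype, status = condition["type"], condition["status"]
        if ctype == "Ready":
            if status != "True":
                problems.append(f"Ready={status}")
        elif status == "True":
            problems.append(f"{ctype}=True")
    return problems


sample_node = {  # trimmed, hand-written example of a node object
    "status": {
        "conditions": [
            {"type": "MemoryPressure", "status": "False"},
            {"type": "DiskPressure", "status": "True"},
            {"type": "Ready", "status": "Unknown"},
        ]
    }
}
print(unhealthy_conditions(sample_node))  # ['DiskPressure=True', 'Ready=Unknown']
```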

Step 2 — SSH into the node and check:

# Check kubelet status (most common cause)
sudo systemctl status kubelet
sudo journalctl -u kubelet --no-pager -n 50

# Common kubelet errors:
# "failed to get node info" → network issue
# "certificate expired" → renew kubelet certificates
# "PLEG is not healthy" → pod lifecycle event generator issues (often disk pressure)

# Check node resources
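df -h              # Disk usage (DiskPressure by default when nodefs has <10% or imagefs <15% free)
free -m            # Memory (MemoryPressure by default when <100Mi is available)
top                # CPU and process check
ps -eLf | wc -l    # Thread count (PIDPressure depends on kernel.pid_max and kubelet eviction settings)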

Fix common causes:

# Fix 1: kubelet not running
sudo systemctl restart kubelet

# Fix 2: Disk pressure — clean up
docker system prune -af         # Clean Docker images/containers
crictl rmi --prune              # Clean containerd images
sudo journalctl --vacuum-size=500M    # Clean journal logs

# Fix 3: Certificate expired (kubeadm clusters; before v1.20 the command was "kubeadm alpha certs")
sudo kubeadm certs renew all
sudo systemctl restart kubelet

# Fix 4: Network plugin not running
kubectl get pods -n kube-system | grep -E "calico|flannel|cilium"
kubectl delete pod -n kube-system <broken-cni-pod>   # Restart CNI pod

# Fix 5: Node has too many pods — eviction happening
kubectl describe node <node> | grep -i "eviction\|pressure"

Cordon and drain a problematic node:

# Prevent new pods from scheduling on this node
kubectl cordon <node-name>

# Move existing pods to other nodes
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --force

# After fixing the node, uncordon it
kubectl uncordon <node-name>

A production-grade Kubernetes architecture for high-traffic needs to address availability, security, scalability, and observability.

Cluster architecture:

                    ┌─────────────────────────────────┐
                    │   CONTROL PLANE (HA)             │
                    │   3x master nodes (multi-AZ)     │
                    │   etcd cluster (separate nodes)  │
                    └──────────────┬──────────────────┘
                                   │
         ┌─────────────────────────┼─────────────────────────┐
         ▼                         ▼                          ▼
  ┌─────────────┐          ┌─────────────┐           ┌─────────────┐
  │ AZ-1 Nodes  │          │ AZ-2 Nodes  │           │ AZ-3 Nodes  │
  │ App workers │          │ App workers │           │ App workers │
  │ GPU nodes   │          │ GPU nodes   │           │ Spot nodes  │
  └─────────────┘          └─────────────┘           └─────────────┘

Node pool strategy:

# System node pool — control plane components
nodePool: system
  instanceType: m5.xlarge
  count: 3
  taints: [CriticalAddonsOnly=true:NoSchedule]

# Application node pool — production workloads (on-demand)
nodePool: app-ondemand
  instanceType: m5.2xlarge
  minCount: 6
  maxCount: 50
  availabilityZones: [us-east-1a, us-east-1b, us-east-1c]

# Spot node pool: batch/non-critical workloads (often far cheaper than on-demand, but reclaimable at short notice)
nodePool: app-spot
  instanceTypes: [m5.2xlarge, m5.4xlarge, m5a.2xlarge]
  spot: true
  minCount: 0
  maxCount: 100

Production deployment configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production
spec:
  replicas: 9              # 3 per AZ
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 3
      maxUnavailable: 0
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      # Spread across zones and nodes
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: web-app
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: web-app
      # Don't schedule on spot nodes (critical app); the label key below is provider-specific, so check your nodes' labels
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node.kubernetes.io/lifecycle
                operator: NotIn
                values: [spot]
      containers:
      - name: web-app
        image: web-app:v3.0
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "1"
            memory: "1Gi"
        readinessProbe:
          httpGet: {path: /ready, port: 8080}
          periodSeconds: 5
        livenessProbe:
          httpGet: {path: /health, port: 8080}
          periodSeconds: 15
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 15"]
      terminationGracePeriodSeconds: 60
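
The zone constraint with maxSkew: 1 and DoNotSchedule can be read as: a new pod may land in a zone only if that zone's pod count, after adding the pod, exceeds the emptiest zone by at most maxSkew. A simplified Python sketch of that rule follows. The function name is made up, and it ignores node affinity, taints, and minDomains.

```python
def schedulable_zones(pods_per_zone: dict[str, int], max_skew: int = 1) -> list[str]:
    """Zones where one more matching pod keeps the skew within max_skew."""
    global_min = min(pods_per_zone.values())
    return sorted(
        zone
        for zone, count in pods_per_zone.items()
        if (count + 1) - global_min <= max_skew
    )


print(schedulable_zones({"us-east-1a": 3, "us-east-1b": 3, "us-east-1c": 2}))
# ['us-east-1c']
print(schedulable_zones({"us-east-1a": 3, "us-east-1b": 3, "us-east-1c": 3}))
# ['us-east-1a', 'us-east-1b', 'us-east-1c']
```

If the only permitted zone has no capacity, the pod stays Pending under DoNotSchedule, which is the trade-off for a strict even spread.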

Observability stack:

# Metrics — Prometheus + Grafana
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack

# Logs — EFK or Loki stack
helm install loki grafana/loki-stack --set grafana.enabled=false

# Tracing — Jaeger or Tempo
helm install jaeger jaegertracing/jaeger

# Alerts — configure PagerDuty/Slack in Alertmanager

Key SLOs to monitor:

# Error rate < 0.1%
# P99 latency < 200ms
# Availability > 99.9%
# Pod restart rate < 1/hour
# Node CPU < 70%
# Node Memory < 80%
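
An availability target translates directly into a downtime budget, which is often easier to reason about when setting alerts. A short Python sketch of the arithmetic, assuming a 30-day window (the window and function name are illustrative choices):

```python
def downtime_budget_minutes(availability_percent: float, window_days: int = 30) -> float:
    """Minutes of allowed downtime in the window for a given availability target."""
    return (100 - availability_percent) / 100 * window_days * 24 * 60


print(round(downtime_budget_minutes(99.9), 1))   # 43.2
print(round(downtime_budget_minutes(99.99), 2))  # 4.32
```

At 99.9% the budget is roughly 43 minutes per 30 days, so a single slow rollback can consume most of it.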
