
Kubernetes in Production: The Best Practices That Actually Matter

Move beyond tutorials and learn the Kubernetes production patterns that keep clusters reliable at scale. Covers resource management, security hardening, observability, GitOps, and disaster recovery.

March 18, 2026 · 9 min read · By CloudaQube Team

[Image: Kubernetes production cluster architecture with monitoring and security layers]

The Gap Between Tutorials and Production

Deploying an Nginx pod to a local Kubernetes cluster takes five minutes. Running a production Kubernetes platform that handles real traffic, real failures, and real security threats is an entirely different discipline.

Kubernetes is the number one searched DevOps topic on Pluralsight, with over 4 million tutorial views on YouTube. But here's the problem: most content stops at kubectl apply and never addresses what happens when your cluster is serving 10,000 requests per second at 3 AM and a node goes down.

This guide covers the production patterns that separate demo clusters from reliable infrastructure. Every recommendation here comes from real operational experience — the kind of knowledge that takes years of on-call rotations to accumulate.

Resource Management: The First Thing That Breaks

Resource misconfiguration is the number one cause of Kubernetes production incidents. Either pods don't get enough resources and degrade under load, or they get too much and starve other workloads.

Always Set Resource Requests

Every container in every pod should have resource requests defined. Requests tell the scheduler how much CPU and memory the container needs, and the scheduler uses this to place pods on nodes with sufficient capacity.

resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    memory: "512Mi"

Set memory limits. Be careful with CPU limits. Memory limits prevent a single misbehaving container from OOM-killing everything on the node. CPU limits are more controversial — they can cause throttling even when the node has spare CPU capacity. Many production teams set CPU requests but omit CPU limits, relying on the scheduler for fair distribution.

Use LimitRanges and ResourceQuotas

In multi-tenant clusters, enforce guardrails at the namespace level:

  • LimitRanges set default requests/limits for containers that don't specify them and cap maximum resource claims per container.
  • ResourceQuotas cap total resource consumption per namespace so one team can't monopolize the cluster.
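A minimal sketch of both guardrails for a namespace (the quota numbers are illustrative — size them to your team's actual footprint):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: production
spec:
  limits:
  - type: Container
    defaultRequest:        # applied when a container omits requests
      cpu: "100m"
      memory: "128Mi"
    default:               # applied when a container omits limits
      memory: "256Mi"
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"     # total CPU requests across the namespace
    requests.memory: 40Gi
    limits.memory: 80Gi
```

With these in place, a pod that omits requests still gets sensible defaults instead of falling into BestEffort.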

The Pod Without Requests

A pod whose containers set no requests or limits falls into the BestEffort QoS class. It's the first to be evicted when a node runs low on resources. In production, this means your most important workloads can be killed because someone forgot a few lines of YAML. Use LimitRanges to set defaults and never let a pod run without requests.

Horizontal Pod Autoscaling

Static replica counts don't survive traffic spikes. Configure HPA for any workload with variable demand:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Set minReplicas to at least 3 for high-availability workloads: three replicas let you lose a pod to a node failure while a rolling update temporarily takes another out of service, and still keep serving traffic. Scale based on CPU or custom metrics depending on your workload.

Security Hardening

A default Kubernetes cluster is not secure. The defaults prioritize ease of use over security. Production clusters need deliberate hardening.

Pod Security Standards

Enforce Pod Security Standards at the namespace level to prevent pods from running with dangerous privileges:

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

The restricted profile prevents running as root, disallows privilege escalation, requires dropping all Linux capabilities, mandates a seccomp profile, and blocks host namespaces and privileged containers. Start with baseline if restricted is too disruptive for your existing workloads.

Network Policies

By default, every pod can talk to every other pod. In production, implement deny-by-default network policies and explicitly allow only the traffic your application needs:

# Default deny all ingress in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress

Then add specific policies to allow legitimate traffic paths. This limits the blast radius of a compromised pod.
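For example, an allow policy for one legitimate path might look like this (the app labels and port are placeholders — substitute your own):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api          # the pods receiving traffic
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend  # the only pods allowed to connect
    ports:
    - protocol: TCP
      port: 8080
```

Combined with the default-deny policy above, only frontend pods can reach the API pods, and only on port 8080.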

RBAC Best Practices

  • Never use cluster-admin for workloads. Create namespace-scoped Roles with the minimum necessary permissions.
  • Use ServiceAccounts per workload, not the default ServiceAccount. Disable automounting of ServiceAccount tokens for pods that don't need API access.
  • Audit RBAC regularly. Use kubectl auth can-i --list --as=system:serviceaccount:ns:sa to verify actual permissions.
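A namespace-scoped least-privilege pair might look like this sketch (names and the granted resources are illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: api-config-reader
  namespace: production
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch"]   # read-only, nothing more
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: api-config-reader-binding
  namespace: production
subjects:
- kind: ServiceAccount
  name: api              # dedicated ServiceAccount for this workload
  namespace: production
roleRef:
  kind: Role
  name: api-config-reader
  apiGroup: rbac.authorization.k8s.io
```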

Image Security

Only pull images from trusted registries. Use image digests (image: nginx@sha256:...) instead of mutable tags in production. Run an admission controller like Kyverno or OPA Gatekeeper to enforce image policies cluster-wide. A compromised image tag is one of the easiest attack vectors in Kubernetes.
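As a hedged sketch, a Kyverno ClusterPolicy enforcing digest pinning could look roughly like this (policy name and message are illustrative; check the Kyverno docs for the version you run):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-image-digest
spec:
  validationFailureAction: Enforce   # reject non-compliant pods at admission
  rules:
  - name: require-digest
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Container images must be pinned by digest (image@sha256:...)."
      pattern:
        spec:
          containers:
          - image: "*@sha256:*"
```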

Observability: You Can't Fix What You Can't See

Production Kubernetes requires three pillars of observability: metrics, logs, and traces.

Metrics with Prometheus

Prometheus is the de facto standard for Kubernetes metrics. Deploy it with the kube-prometheus-stack Helm chart, which includes Prometheus, Grafana, and pre-built dashboards for cluster and workload monitoring.

Key metrics to alert on:

  • Node: CPU utilization > 80%, memory utilization > 85%, disk pressure, not-ready status
  • Pod: Restart count increasing, OOMKilled events, pending pods > 5 minutes
  • Application: Request latency p99, error rate, request rate (the RED method)
  • Cluster: API server latency, etcd leader changes, scheduler failures
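With the kube-prometheus-stack, alerts like these are expressed as PrometheusRule resources. A sketch for the pod-restart alert, assuming kube-state-metrics metric names:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pod-restart-alerts
spec:
  groups:
  - name: pods
    rules:
    - alert: PodRestartingFrequently
      # more than 3 restarts in the last 15 minutes
      expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.pod }} in {{ $labels.namespace }} is restarting frequently"
```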

For a complete walkthrough of setting up Prometheus and Grafana on Kubernetes, see our Kubernetes monitoring guide.

Structured Logging

Ensure all applications log in JSON format. Deploy a log aggregation stack (Loki, EFK, or a managed service) to centralize logs across all pods. Key practices:

  • Include correlation IDs in every log line for distributed tracing.
  • Set log levels appropriately — INFO in production, DEBUG only when troubleshooting.
  • Don't log to files inside containers. Write to stdout/stderr and let the Kubernetes logging pipeline handle collection.
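Put together, a structured log line written to stdout might look like this (field names are illustrative — pick a schema and keep it consistent across services):

```json
{"ts":"2026-03-18T03:12:45Z","level":"info","service":"api","trace_id":"4bf92f3577b34da6","msg":"order created","pod":"api-7d9f8-xk2lp"}
```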

Distributed Tracing

For microservice architectures, deploy OpenTelemetry collectors to capture distributed traces. Traces show you the full request path across services and pinpoint exactly where latency or failures originate.

GitOps: Declarative Cluster Management

Managing production clusters with kubectl apply from a laptop doesn't scale. GitOps uses Git as the single source of truth for cluster state.

How GitOps Works

  1. All Kubernetes manifests live in a Git repository.
  2. A GitOps operator (Argo CD or Flux) watches the repository.
  3. When manifests change in Git, the operator applies the changes to the cluster.
  4. The operator continuously reconciles cluster state with the Git repository.
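With Argo CD, steps 2–4 are captured in a single Application resource. A sketch (the repo URL and paths are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/deploy-repo   # placeholder repo
    targetRevision: main
    path: apps/api
  destination:
    server: https://kubernetes.default.svc            # the local cluster
    namespace: production
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift back to Git state
```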

Why GitOps Matters for Production

  • Audit trail: Every change is a Git commit with an author, timestamp, and description.
  • Rollback: Reverting a bad deployment is git revert. No need to remember what the previous state looked like.
  • Consistency: No manual kubectl commands that create drift between what's in Git and what's running.
  • Access control: Developers submit pull requests. Only the GitOps operator has cluster write access.

Argo CD vs. Flux

Both are mature, CNCF-graduated projects. Argo CD has a web UI for visualization and is slightly easier to get started with. Flux is more modular and integrates tightly with Helm and Kustomize. Pick either — the GitOps pattern matters more than the tool choice.

High Availability and Disaster Recovery

Control Plane HA

  • Run at least 3 control plane nodes across different availability zones.
  • Use an external etcd cluster or ensure etcd runs on all control plane nodes with proper backup schedules.
  • Place an internal load balancer in front of the API servers.

Application-Level HA

  • Run at least 3 replicas for critical workloads.
  • Use pod anti-affinity to spread replicas across nodes and AZs.
  • Configure Pod Disruption Budgets to prevent too many pods from being evicted simultaneously during node maintenance.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api
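The anti-affinity recommendation above can be sketched as a snippet inside the Deployment's pod template (the app label is assumed to match your replicas):

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: api
        # spread replicas across availability zones;
        # use kubernetes.io/hostname to spread across nodes instead
        topologyKey: topology.kubernetes.io/zone
```

Preferred (soft) anti-affinity still schedules pods when zones are full; use requiredDuringSchedulingIgnoredDuringExecution only if co-location is never acceptable.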

etcd Backup Strategy

Back up etcd automatically on a schedule. Store backups outside the cluster (S3, GCS, Azure Blob). Test restoration quarterly. An etcd failure without a backup means rebuilding the entire cluster from scratch.
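A manual snapshot looks roughly like this — run it on a control plane node; the endpoint and certificate paths shown are typical kubeadm defaults and may differ in your cluster:

```shell
ETCDCTL_API=3 etcdctl snapshot save /var/backups/etcd-$(date +%F).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
```

In practice, wrap this in a CronJob or your backup tooling and ship the snapshot file to object storage.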

Upgrade Strategy

Kubernetes releases a new minor version every four months, and each version is supported for 14 months. Staying current is non-negotiable — running unsupported versions means no security patches.

Upgrade Safely

  1. Read the changelog. Every release has deprecations and breaking changes.
  2. Upgrade non-production first. Validate your workloads work on the new version before touching production.
  3. Upgrade one minor version at a time. Never skip versions (e.g., 1.28 → 1.30).
  4. Use node pool rolling updates. Drain and replace nodes one at a time to maintain availability.
  5. Run admission webhook dry-runs to catch manifest incompatibilities before applying.

Production Readiness Checklist

Before going to production, verify every item:

  • Resource requests and limits set on all containers
  • LimitRanges and ResourceQuotas configured per namespace
  • HPA configured for variable workloads
  • Pod Security Standards enforced
  • Network policies in place (default deny + explicit allows)
  • RBAC configured with least privilege
  • Images pulled from trusted registries with digest pinning
  • Prometheus metrics and alerting configured
  • Centralized logging deployed
  • GitOps operator managing deployments
  • etcd backup schedule configured and tested
  • Pod Disruption Budgets set for critical workloads
  • Pod anti-affinity spreading replicas across AZs
  • Ingress with TLS termination configured
  • Secrets encrypted at rest in etcd

Conclusion

Running Kubernetes in production is an ongoing discipline, not a one-time setup. The patterns in this guide — resource management, security hardening, observability, GitOps, and HA — form the foundation that everything else builds on.

Start with resource requests and security hardening. Add observability so you can see what's happening. Implement GitOps to manage change safely. Then refine your HA and DR strategy based on your specific availability requirements.

The teams that run Kubernetes successfully treat it as a platform, not just a deployment target. They invest in tooling, automation, and operational practices that make the platform reliable for every team that deploys to it. If you're preparing to validate your Kubernetes skills with a certification, our CKA study guide covers the exam-specific knowledge that complements these production practices.

Want to practice this hands-on?

CloudaQube generates complete labs from a simple description. Try it free.

Get Started Free

CloudaQube Team

Cloud Infrastructure Engineers

Level up your cloud skills

Get hands-on with AI-generated labs tailored to your skill level. Practice AWS, Azure, Kubernetes, and more.

Start Learning Free