Bits of .NET

Daily micro-tips for C#, SQL, performance, and scalable backend engineering.

Kubernetes Monitoring: The Prometheus + Grafana Stack That Saved Our Production

- 23.02.26 - ErcanOPAK

🚨 The 3 AM Wake-Up Call

Your Kubernetes cluster is down. Users are angry. You have no idea what happened. This is why Fortune 500 companies spend $500K/year on observability.

The Complete Observability Stack

📊 The Three Pillars

  1. Metrics – What’s happening right now (Prometheus)
  2. Logs – What happened in the past (Loki)
  3. Traces – Why requests are slow (Jaeger)

⚡ Quick Install (Helm)

# Add the prometheus-community chart repo and refresh the local index
helm repo add prometheus-community \
  https://prometheus-community.github.io/helm-charts
helm repo update

# Install the Prometheus + Grafana stack as release "monitoring"
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace

# Get the Grafana admin password (the secret is named <release>-grafana)
kubectl get secret -n monitoring monitoring-grafana \
  -o jsonpath="{.data.admin-password}" | base64 --decode

# Port-forward Grafana, then open http://localhost:3000
kubectl port-forward -n monitoring \
  svc/monitoring-grafana 3000:80

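Result: Prometheus, Alertmanager, and Grafana running in about 5 minutes, with pre-built Kubernetes dashboards included. This chart covers the metrics pillar only; Loki and Jaeger are installed separately.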

🎯 Critical Alerts

  • Pod restarts > 5 in 10 min
  • Memory > 90% used
  • CPU throttling detected
  • Disk > 85% full
  • Node not ready > 2 min
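
The alerts above can be sketched as a PrometheusRule manifest. This is a starting point, not a tuned rule set: the metric names come from kube-state-metrics, node-exporter, and cAdvisor as shipped with kube-prometheus-stack, while the 25% throttling threshold, the 5m hold times, the severities, and the `release: monitoring` label (assumed to match the Helm release so the operator picks the rule up) are assumptions to adjust for your cluster.

```shell
# Write a PrometheusRule manifest covering the five alerts listed above.
# Values marked "assumed" are illustrative, not taken from the post.
cat > critical-alerts.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: critical-alerts
  namespace: monitoring
  labels:
    release: monitoring   # assumed: must match the operator's rule selector
spec:
  groups:
    - name: critical.rules
      rules:
        - alert: PodRestartingTooOften
          expr: increase(kube_pod_container_status_restarts_total[10m]) > 5
          labels:
            severity: critical
        - alert: NodeMemoryHigh
          expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
          for: 5m   # assumed hold time
          labels:
            severity: warning
        - alert: ContainerCpuThrottled
          # assumed threshold: more than 25% of CFS periods throttled
          expr: |
            rate(container_cpu_cfs_throttled_periods_total[5m])
              / rate(container_cpu_cfs_periods_total[5m]) > 0.25
          for: 5m
          labels:
            severity: warning
        - alert: DiskAlmostFull
          expr: |
            (1 - node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
              / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) * 100 > 85
          for: 5m   # assumed hold time
          labels:
            severity: warning
        - alert: NodeNotReady
          expr: kube_node_status_condition{condition="Ready",status="true"} == 0
          for: 2m
          labels:
            severity: critical
EOF

# Apply it once kubectl points at the right cluster:
# kubectl apply -f critical-alerts.yaml
```

After applying, the rules should show up under Status > Rules in the Prometheus UI. If they do not, the label on the manifest probably does not match the operator's rule selector.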

📈 Key Metrics

  • Request rate (QPS)
  • Error rate (4xx, 5xx)
  • Latency (p50, p95, p99)
  • Saturation (CPU, RAM)
  • Availability (uptime %)
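
As a rough sketch, the metrics above map to PromQL queries like the ones below, sent to the Prometheus HTTP API with curl. The metric names `http_requests_total` and `http_request_duration_seconds_bucket`, the `status` label, and the `my-app` job name are hypothetical and depend on how your application is instrumented. The `prom_query` helper and the localhost URL are also assumptions for illustration.

```shell
# Prometheus base URL. Assumes a local port-forward such as:
#   kubectl port-forward -n monitoring svc/monitoring-kube-prometheus-prometheus 9090:9090
# (service name assumed; confirm with: kubectl get svc -n monitoring)
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Request rate (QPS) over the last 5 minutes. Metric name is hypothetical.
QPS_QUERY='sum(rate(http_requests_total[5m]))'

# Share of requests that ended in 4xx or 5xx. The "status" label is hypothetical.
ERROR_RATE_QUERY='sum(rate(http_requests_total{status=~"[45].."}[5m])) / sum(rate(http_requests_total[5m]))'

# p99 latency from a histogram; swap 0.99 for 0.5 or 0.95 to get p50 or p95.
P99_QUERY='histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))'

# CPU saturation: fraction of node CPU time that was not idle (node-exporter metric).
CPU_SATURATION_QUERY='1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m]))'

# Availability: percentage of successful scrapes over 30 days. Job name is hypothetical.
AVAILABILITY_QUERY='avg_over_time(up{job="my-app"}[30d]) * 100'

# Run an instant query against the Prometheus HTTP API and print the JSON response.
prom_query() {
  curl -sS --get "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Usage, once the port-forward is running:
# prom_query "$QPS_QUERY"
```

The same expressions can be pasted straight into a Grafana panel or the Prometheus expression browser.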

🔥 Production War Stories

The Memory Leak That Cost $50K

Pod memory slowly climbing. Grafana showed it weeks before the crash. We ignored it. Pod OOMKilled during Black Friday. Lost $50K in sales. Now we have alerts.
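
One way to alert on this kind of slow climb is PromQL's `predict_linear`, which fits a line to recent samples and projects it forward. The sketch below is not the rule we actually run. The 1-day lookback, the 7-day projection, the 30m hold time, and the `release: monitoring` label are all assumptions, and it only covers containers that have a memory limit set.

```shell
# Write a PrometheusRule that fires when memory usage is trending toward the limit.
cat > memory-leak-alert.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: memory-leak-alert
  namespace: monitoring
  labels:
    release: monitoring   # assumed: must match the operator's rule selector
spec:
  groups:
    - name: memory-growth.rules
      rules:
        - alert: ContainerMemoryWillHitLimit
          # Fit a line to the last 1d of usage and project it 7d ahead (both windows assumed).
          expr: |
            predict_linear(container_memory_working_set_bytes{container!=""}[1d], 7 * 24 * 3600)
              > on (namespace, pod, container)
            kube_pod_container_resource_limits{resource="memory"}
          for: 30m
          labels:
            severity: warning
          annotations:
            summary: "{{ $labels.namespace }}/{{ $labels.pod }} is on track to hit its memory limit"
EOF

# kubectl apply -f memory-leak-alert.yaml
```

A long lookback smooths out normal garbage-collection sawtooth patterns, at the cost of reacting more slowly to a fast leak.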

The DDoS We Caught in 60 Seconds

Request spike alert fired. Dashboard showed 10,000% traffic increase from one IP range. Blocked at CDN level. Attack neutralized before impacting users. Total downtime: 0 seconds.
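
A request spike alert along these lines could compare the current request rate with the rate one hour earlier. This is a sketch under stated assumptions: `http_requests_total` is a hypothetical application metric, and the 10x factor, the 2m rate window, and the 1m hold time are illustrative values, not the ones from our setup. Prometheus also does not normally label requests by client IP, so narrowing the spike to an IP range would happen in CDN or ingress logs.

```shell
# Write a PrometheusRule that fires when traffic jumps well above its recent baseline.
cat > traffic-spike-alert.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: traffic-spike-alert
  namespace: monitoring
  labels:
    release: monitoring   # assumed: must match the operator's rule selector
spec:
  groups:
    - name: traffic.rules
      rules:
        - alert: RequestRateSpike
          # Current request rate vs. the same rate one hour ago (factor of 10 assumed).
          expr: |
            sum(rate(http_requests_total[2m]))
              > 10 * sum(rate(http_requests_total[2m] offset 1h))
          for: 1m
          labels:
            severity: critical
EOF

# kubectl apply -f traffic-spike-alert.yaml
```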

Dashboard        | Purpose                 | When to Check
Cluster Overview | Overall health          | Daily morning check
Node Metrics     | Hardware utilization    | Capacity planning
Pod Metrics      | Application performance | During deployments
API Server       | Kubernetes health       | When things feel slow

“Before Prometheus: We found out about outages from angry customers. After Prometheus: We fix issues before users notice them.”

— DevOps Lead, SaaS company
