K8sGPT was introduced in spring 2023 and accepted into the CNCF Sandbox in December of the same year. Written in Go, it works as either a standalone CLI binary or a Kubernetes operator that runs continuously inside the cluster. The CLI approach is straightforward: run k8sgpt analyze --explain and the tool scans the cluster, collects diagnostic data from resource statuses and events, sends anonymized context to the configured AI backend, and returns explanations with specific kubectl commands to resolve each issue. Without the --explain flag, K8sGPT still provides structured diagnostic output using its internal analyzers — essentially codified SRE playbooks — without making any AI calls at all.
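The two modes of the CLI workflow look roughly like this; flag names follow the project's documented CLI, though exact options may differ between versions:

```shell
# Deterministic analysis only: internal analyzers run, no AI backend is contacted
k8sgpt analyze

# AI-assisted mode: scoped to one namespace and one resource type,
# returns explanations plus suggested kubectl remediation steps
k8sgpt analyze --explain --namespace production --filter Pod

# Mask cluster-specific identifiers before context is sent to the backend
k8sgpt analyze --explain --anonymize
```

The `--filter` flag restricts the run to specific analyzers, which keeps both scan time and AI token usage down on large clusters.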
The built-in analyzers cover core Kubernetes resources: Pods, Deployments, ReplicaSets, StatefulSets, Services, Ingress, PersistentVolumeClaims, CronJobs, and Nodes. Beyond these defaults, K8sGPT integrates with Trivy for security vulnerability scanning across container images in the cluster, and with AWS Controllers for Kubernetes to analyze AWS resources managed via CRDs. The AI backend options are broad — OpenAI, Azure OpenAI, Google Gemini and Vertex AI, Amazon Bedrock and SageMaker, Cohere, Hugging Face, IBM watsonx.ai, and local models through Ollama or LocalAI for air-gapped environments where no data can leave the network.
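Backends are registered once and reused across runs. A sketch of the setup, assuming the documented `auth` and `integration` subcommands (model names here are illustrative):

```shell
# Register a hosted backend; the API key is read from an environment variable
k8sgpt auth add --backend openai --model gpt-4o --password "$OPENAI_API_KEY"

# Point at a local model server instead, for air-gapped environments
k8sgpt auth add --backend localai --model llama3 --baseurl http://localhost:8080/v1

# Enable the Trivy integration, adding vulnerability analyzers to the scan
k8sgpt integration activate trivy
```

Once a local backend is active, no diagnostic context leaves the network; the trade-off is that explanation quality depends on the local model.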
When deployed as a Kubernetes operator, K8sGPT runs in the background and writes its findings to Result custom resources, enabling integration with existing monitoring stacks such as Prometheus and Alertmanager for automated alerting on detected issues. The operator mode suits production environments that need continuous cluster health monitoring rather than ad-hoc troubleshooting. Installation is available through Homebrew, apt/apk packages, Windows binaries, and Helm charts for the operator. The latest release is v0.4.31, and the sister project Sympozium extends the concept to managing AI agents within Kubernetes clusters.
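A minimal operator deployment via Helm might look like the following, assuming the chart repository and release names from the project's documentation (namespace and release name are illustrative):

```shell
# Add the K8sGPT Helm repository and install the operator
helm repo add k8sgpt https://charts.k8sgpt.ai/
helm repo update
helm install k8sgpt-operator k8sgpt/k8sgpt-operator \
  --namespace k8sgpt-operator-system --create-namespace

# The operator writes its analysis into Result custom resources,
# which can be listed and scraped like any other Kubernetes object
kubectl get results -n k8sgpt-operator-system -o json
```

Because results land in ordinary custom resources, existing tooling such as Prometheus exporters or Alertmanager rules can consume them without K8sGPT-specific plumbing.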