49 tools tagged
Cloud-native distributed vector search engine built for Kubernetes with automatic indexing and horizontal scaling.
Vald is a highly scalable distributed approximate nearest neighbor (ANN) vector search engine designed for cloud-native, Kubernetes-based architectures. Maintained by LY Corporation and listed in the CNCF Landscape, it uses the NGT algorithm (developed at Yahoo Japan), supports automatic incremental index backup, and handles billion-scale datasets across loosely coupled microservice components that scale horizontally via Helm.
AI Lakehouse with Feature Store for real-time ML
Hopsworks is a data-intensive AI platform combining a Python-centric Feature Store with MLOps capabilities for production ML systems. It provides sub-millisecond feature retrieval powered by RonDB, dual offline and online storage for batch and real-time inference, experiment tracking, a model registry, and deployment pipelines. Available as a managed cloud service on AWS, Azure, and GCP, self-hosted on Kubernetes, or as a serverless platform.
Kernel-space host intrusion detection system
Elkeid is ByteDance's open-source HIDS for hosts, containers, Kubernetes, and serverless workloads. Its kernel-level data collection via Kprobe hooks captures process lineage, privilege escalation attempts, file access patterns, and network connections with minimal overhead. Includes an Agent for telemetry, Detector for rule evaluation, Controller for policy management, and a Dashboard for alerts and investigation.
Container-based CI/CD automation system
Concourse is an open-source CI/CD system built on composable primitives: resources for external artifacts, tasks for containerized work units, and jobs for orchestration. Pipelines are declared in YAML and kept under version control, every task runs in an isolated container, and stateless workers enable horizontal scaling. Deployable via BOSH, Helm, Docker Compose, or a standalone binary across any infrastructure.
High-performance S3-compatible object storage
MinIO is a high-performance, S3-compatible object storage server designed for AI, machine learning, and data-intensive workloads. Written in Go, it delivers industry-leading throughput for both read and write operations while maintaining full compatibility with the Amazon S3 API. MinIO includes an embedded web console for bucket management, a command-line client, and supports erasure coding, bitrot protection, and encryption at rest for enterprise-grade data durability.
Cloud-native POSIX filesystem on object storage
JuiceFS is a high-performance distributed POSIX filesystem built on object storage like S3 and metadata engines like Redis or MySQL. It enables seamless data sharing across thousands of clients with low latency and elastic throughput. JuiceFS ships with a Kubernetes CSI driver, Hadoop SDK compatibility, and FUSE mount support for AI training, big data analytics, and shared storage workloads. Apache 2.0 licensed with 13K+ GitHub stars.
GitHub's Kubernetes controller for autoscaling GitHub Actions runners
actions-runner-controller (ARC) is GitHub's official Kubernetes controller for managing self-hosted GitHub Actions runners. It automatically scales runner pods up and down based on workflow demand, provisioning runners when jobs queue and terminating them when complete. Supports runner groups, custom runner images, and organization-level runner management. Over 6,100 GitHub stars.
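The scale-up and scale-down behavior described above can be pictured as a clamp on current demand. The sketch below is an illustration only and not ARC's actual controller logic; `desired_runners`, `min_runners`, and `max_runners` are hypothetical names chosen for this example.

```python
def desired_runners(queued_jobs: int, running_jobs: int,
                    min_runners: int = 0, max_runners: int = 10) -> int:
    """Return how many runner pods to keep, clamped to [min_runners, max_runners].

    Hypothetical helper: demand is modeled as queued plus running jobs,
    which is an assumption of this sketch, not ARC's documented formula.
    """
    if min_runners > max_runners:
        raise ValueError("min_runners must not exceed max_runners")
    # Negative inputs are treated as zero demand.
    demand = max(queued_jobs, 0) + max(running_jobs, 0)
    return max(min_runners, min(demand, max_runners))
```

With no jobs the count falls back to `min_runners`, and a burst of queued jobs is capped at `max_runners`.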
Lightweight Kubernetes distribution for edge, IoT, and development
k3s is a CNCF Sandbox lightweight Kubernetes distribution packaged as a single binary under 100MB. Created by Rancher Labs and now maintained by SUSE, it strips non-essential components and bundles containerd, Flannel, CoreDNS, and Traefik into a minimal but fully conformant K8s distribution. Ideal for edge computing, IoT, ARM devices, and local development environments.
Leading open-source service mesh for Kubernetes microservices
Istio is the most widely adopted open-source service mesh for Kubernetes, providing traffic management, security, and observability for microservice architectures. It uses Envoy proxy sidecars to intercept and manage service-to-service communication with mutual TLS, fine-grained traffic routing, circuit breaking, and distributed tracing. CNCF Graduated project used in production by Google, IBM, and Salesforce.
Heroku-like PaaS built on Kubernetes with YC backing
Porter is a YC-backed platform-as-a-service that provides a Heroku-like deployment experience on top of Kubernetes. It abstracts away cluster management while giving teams full access to underlying infrastructure when needed. Supports deploying from Git repos, Docker images, or Helm charts with automatic HTTPS, scaling, and preview environments. Runs on AWS, GCP, or Azure.
CNCF Sandbox Kubernetes alert enrichment and automation platform
Robusta is a CNCF Sandbox project that enriches Kubernetes alerts with diagnostic context and automates remediation workflows. It intercepts Prometheus alerts, attaches relevant logs, pod status, resource metrics, and troubleshooting suggestions before delivering them to Slack, Teams, or PagerDuty. Supports custom playbooks for automated incident response and AI-powered root cause analysis.
Kubernetes ChatOps bot for Slack, Teams, and Discord
Botkube is a Kubernetes ChatOps platform that brings cluster management into Slack, Microsoft Teams, and Discord. It provides real-time alerts for cluster events, enables kubectl command execution from chat, and supports automated workflows triggered by Kubernetes resource changes. It features a plugin architecture for extensibility and RBAC-based access control for team collaboration.
AI-powered SRE agent for Kubernetes troubleshooting
Metoro is an AI SRE platform for Kubernetes that combines observability with autonomous troubleshooting. Its Guardian agent monitors cluster health, correlates metrics, logs, and traces to identify root causes, and suggests remediation actions. Features an MCP server for integration with AI coding agents and natural language querying of infrastructure state.
Kubernetes cost monitoring and optimization platform
Kubecost provides real-time cost monitoring and optimization for Kubernetes clusters. It allocates infrastructure costs to namespaces, deployments, pods, and labels. Now owned by IBM, it is widely used for Kubernetes cost visibility. Features include savings recommendations, budget alerts, cluster right-sizing, and multi-cluster cost aggregation across AWS, GCP, and Azure.
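To make the idea of per-namespace cost allocation concrete, here is a minimal sketch that splits one node's hourly price across namespaces in proportion to pod CPU requests. This is a deliberate simplification and not Kubecost's actual model; `allocate_cost` and its inputs are hypothetical.

```python
from collections import defaultdict

def allocate_cost(node_hourly_cost: float,
                  pod_cpu_requests: dict[tuple[str, str], float]) -> dict[str, float]:
    """Split a node's hourly cost across namespaces by share of CPU requests.

    pod_cpu_requests maps (namespace, pod_name) -> requested CPU cores.
    Assumption of this sketch: CPU requests are the only cost driver.
    """
    total_cpu = sum(pod_cpu_requests.values())
    if total_cpu <= 0:
        return {}
    costs: dict[str, float] = defaultdict(float)
    for (namespace, _pod), cpu in pod_cpu_requests.items():
        costs[namespace] += node_hourly_cost * cpu / total_cpu
    return dict(costs)
```

For example, two 1-core pods in `web` and one 2-core pod in `batch` on a node costing 1.00 per hour would each be charged half of the node's cost per namespace.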
Zero-friction single-binary Kubernetes distribution by Mirantis
k0s is a lightweight, CNCF-certified Kubernetes distribution packaged as a single binary with zero host dependencies. Backed by Mirantis, it simplifies cluster deployment by bundling all required components into one executable that works on any Linux system. Supports x86-64, ARM64, and ARMv7 architectures with automatic upgrades and a built-in control plane load balancer.
eBPF-based networking, security, and observability for Kubernetes
Cilium is a CNCF Graduated project that provides networking, security, and observability for Kubernetes using eBPF technology. It replaces kube-proxy with efficient eBPF-based load balancing, enforces L3-L7 network policies using identity-based security, and includes Hubble for network flow observability and Tetragon for runtime security enforcement. Adopted by Google GKE, AWS EKS Anywhere, and Azure AKS.
Autonomous Kubernetes and GPU infrastructure optimization
ScaleOps provides autonomous real-time management of Kubernetes and GPU infrastructure, reducing cloud costs by up to 80 percent without manual configuration. Backed by $130 million in Series C funding at an $800 million valuation, it serves enterprises including Adobe, Wiz, DocuSign, and Salesforce. The platform continuously rightsizes pods, optimizes replicas, manages nodes, and allocates GPUs based on live workload demand rather than static configurations.
Kubernetes-native framework for DevOps AI agents
kagent is a Kubernetes-native AI agent framework developed at Solo.io and accepted into the CNCF Sandbox. It provides a structured environment for running DevOps-focused agents directly within Kubernetes clusters, with a dedicated kmcp toolkit for cloud-native operations. Unlike general-purpose agent frameworks, kagent targets platform engineers and SREs who need AI assistance with cluster management, troubleshooting, and infrastructure automation workflows.
Open-source control plane for AI workloads across multi-cloud GPU infrastructure
dstack is an open-source platform that orchestrates AI training and inference workloads across heterogeneous GPU infrastructure spanning multiple clouds, Kubernetes clusters, and bare-metal servers. It abstracts away cloud-specific APIs so teams define GPU requirements declaratively and dstack automatically provisions the cheapest available resources from AWS, GCP, Azure, Lambda, or on-premises hardware.
AI-native API gateway by Alibaba with MCP server hosting and LLM routing
Higress is an open-source AI-native API gateway developed by Alibaba that combines traditional API management with LLM-specific capabilities like token-based rate limiting, model routing, prompt caching, and MCP server hosting. Built on Envoy and Istio, it provides enterprise-grade traffic management while natively understanding AI workload patterns including streaming responses, long-lived connections, and multi-model fallback chains.
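Token-based rate limiting differs from request counting in that each call is charged by the number of LLM tokens it consumes. The sketch below shows the general idea with a refilling budget; it is not Higress code or configuration, and `TokenBudgetLimiter` and its parameters are hypothetical names.

```python
import time

class TokenBudgetLimiter:
    """Refilling budget measured in LLM tokens rather than requests."""

    def __init__(self, tokens_per_minute: int, clock=time.monotonic):
        self.capacity = float(tokens_per_minute)
        self.refill_per_second = tokens_per_minute / 60.0
        self.available = self.capacity  # start with a full budget
        self.clock = clock              # injectable so tests can control time
        self.last = clock()

    def allow(self, llm_tokens: int) -> bool:
        """Charge llm_tokens against the budget; return False if it does not fit."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill proportionally to elapsed time, never beyond capacity.
        self.available = min(self.capacity,
                             self.available + elapsed * self.refill_per_second)
        if llm_tokens <= self.available:
            self.available -= llm_tokens
            return True
        return False
```

A caller would typically keep one limiter per consumer or API key and reject or queue requests when `allow` returns `False`.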
Trusted runtime environments for AI agents in production infrastructure
Teleport Beams provides cryptographically verified, policy-gated access for AI agents to interact with production infrastructure including servers, Kubernetes clusters, and databases. Launched at KubeCon EU 2026, Beams extends Teleport's zero-trust access platform with agent-specific runtime controls, audit trails, and policy enforcement to ensure AI agents operate within defined boundaries when deployed in production environments.
High-performance S3-compatible object storage built in Rust
RustFS is an open-source distributed object storage system built entirely in Rust, offering 2.3x faster performance than MinIO for small object payloads. It provides full S3 API compatibility, enabling seamless migration from MinIO, Ceph, and AWS S3 with existing SDKs and CLI tools. Released under the Apache 2.0 license, it avoids MinIO's restrictive AGPL terms. Features include a distributed architecture, erasure coding, WORM compliance, encryption via RustyVault, and a web management console.
Free and open-source Kubernetes IDE for managing clusters visually
Freelens is a free open-source Kubernetes IDE that provides a visual desktop interface for managing clusters, workloads, and configurations. Forked from the original Lens project after its licensing change, Freelens offers the same powerful cluster management experience with real-time monitoring, log viewing, and resource editing under the MIT license.
Kubernetes-native distributed LLM inference stack
llm-d is an open-source Kubernetes-native stack for distributed LLM inference with cache-aware routing and disaggregated serving. It separates prefill and decode stages across different GPU pools for optimal resource utilization, routes requests to nodes with warm KV caches, and integrates with vLLM as the serving engine. Apache-2.0 licensed with 2,900+ GitHub stars.
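Cache-aware routing can be sketched as "send the request to the replica whose cached prompts share the longest prefix with it, breaking ties by load". The code below illustrates that idea only; it is not llm-d's scheduler, and `pick_replica`, the `cached` and `load` fields, and the token lists are hypothetical.

```python
def shared_prefix_len(a: list[int], b: list[int]) -> int:
    """Number of leading tokens that two token sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def pick_replica(prompt_tokens: list[int], replicas: dict[str, dict]) -> str:
    """Choose the replica with the longest cached prefix, then the lowest load.

    replicas maps name -> {"cached": [token lists], "load": int}.
    """
    if not replicas:
        raise ValueError("no replicas available")

    def score(name: str):
        info = replicas[name]
        best = max((shared_prefix_len(prompt_tokens, c) for c in info["cached"]),
                   default=0)
        # Longer prefix wins; lower load, then name, break ties deterministically.
        return (-best, info["load"], name)

    return min(replicas, key=score)
```

When no replica has a matching prefix, the choice falls back to the least-loaded replica.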