43 tools tagged
Autonomous Kubernetes and GPU infrastructure optimization
ScaleOps provides autonomous, real-time management of Kubernetes and GPU infrastructure and claims to cut cloud costs by up to 80 percent without manual configuration. Backed by $130 million in Series C funding at an $800 million valuation, it serves enterprises including Adobe, Wiz, DocuSign, and Salesforce. The platform continuously rightsizes pods, optimizes replica counts, manages nodes, and allocates GPUs based on live workload demand rather than static configurations.
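Rightsizing from live demand generally means deriving resource requests from observed usage instead of a fixed value. The Python sketch below is a generic illustration under stated assumptions, not ScaleOps's algorithm: the function `recommend_cpu_request`, the 95th-percentile target, the 15% headroom, and the 10-millicore floor are all hypothetical choices.

```python
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile (pct in 0..100) of a non-empty sample list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def recommend_cpu_request(usage_millicores: list[float], pct: float = 95.0,
                          headroom: float = 0.15, floor: int = 10) -> int:
    """Hypothetical rightsizing rule: a high percentile of observed CPU usage
    plus headroom, rounded up to whole millicores and never below `floor`."""
    target = percentile(usage_millicores, pct) * (1 + headroom)
    return max(floor, math.ceil(target))

# Twenty samples from 100m to 290m: p95 is 280m, plus 25% headroom -> 350m.
samples = list(range(100, 300, 10))
print(recommend_cpu_request(samples, headroom=0.25))
```

A percentile rather than the maximum keeps a single spike from inflating the request; a real controller would also need to decide how often to apply changes and how to handle memory, which cannot be throttled the way CPU can.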
Kubernetes-native framework for DevOps AI agents
kagent is a Kubernetes-native AI agent framework developed at Solo.io and accepted into the CNCF sandbox. It provides a structured environment for running DevOps-focused agents directly within Kubernetes clusters, with a dedicated kmcp toolkit for cloud-native operations. Unlike general-purpose agent frameworks, kagent targets platform engineers and SREs who need AI assistance with cluster management, troubleshooting, and infrastructure automation workflows.
Open-source control plane for AI workloads across multi-cloud GPU infrastructure
dstack is an open-source platform that orchestrates AI training and inference workloads across heterogeneous GPU infrastructure spanning multiple clouds, Kubernetes clusters, and bare-metal servers. It abstracts away cloud-specific APIs so teams define GPU requirements declaratively and dstack automatically provisions the cheapest available resources from AWS, GCP, Azure, Lambda, or on-premises hardware.
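The core idea of "declare the GPU requirement, provision the cheapest match" can be shown as a filter-then-minimize over available offers. This is a sketch only: `Offer`, `GpuRequirement`, `cheapest_offer`, and the prices in the example are hypothetical and do not reflect dstack's configuration format, API, or real cloud pricing.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Offer:
    provider: str          # e.g. "aws", "lambda", "on-prem" (illustrative)
    gpu: str               # GPU model name
    gpu_count: int
    gpu_memory_gb: int     # memory per GPU
    price_per_hour: float  # hypothetical price

@dataclass(frozen=True)
class GpuRequirement:
    min_gpu_memory_gb: int
    gpu_count: int = 1

def cheapest_offer(req: GpuRequirement, offers: list[Offer]) -> Optional[Offer]:
    """Return the lowest-priced offer that satisfies the requirement, or None."""
    matching = [o for o in offers
                if o.gpu_count >= req.gpu_count
                and o.gpu_memory_gb >= req.min_gpu_memory_gb]
    return min(matching, key=lambda o: o.price_per_hour, default=None)

offers = [
    Offer("aws", "A100", 1, 80, 4.0),
    Offer("lambda", "A100", 1, 80, 1.5),
    Offer("gcp", "L4", 1, 24, 0.8),
]
print(cheapest_offer(GpuRequirement(min_gpu_memory_gb=40), offers))
```

The cheapest L4 offer is skipped because it does not meet the 40 GB requirement, so the cheaper of the two A100 offers wins.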
AI-native API gateway by Alibaba with MCP server hosting and LLM routing
Higress is an open-source AI-native API gateway developed by Alibaba that combines traditional API management with LLM-specific capabilities like token-based rate limiting, model routing, prompt caching, and MCP server hosting. Built on Envoy and Istio, it provides enterprise-grade traffic management while natively understanding AI workload patterns including streaming responses, long-lived connections, and multi-model fallback chains.
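Token-based rate limiting differs from classic request counting: each request is charged by the number of LLM tokens it consumes. A token bucket per consumer captures the idea. This is an illustrative sketch, not Higress's plugin API; `TokenBudgetLimiter` is a hypothetical name, and the sketch assumes the token count is known when the request is admitted (a real gateway often has to estimate it or settle the charge after a streamed response completes).

```python
import time
from typing import Callable

class TokenBudgetLimiter:
    """Per-consumer token bucket that charges requests by LLM token count."""

    def __init__(self, tokens_per_minute: int,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = float(tokens_per_minute)
        self.refill_per_sec = tokens_per_minute / 60.0
        self.clock = clock
        # consumer -> (tokens currently available, timestamp of last update)
        self.buckets: dict[str, tuple[float, float]] = {}

    def allow(self, consumer: str, llm_tokens: int) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(consumer, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        if llm_tokens <= tokens:
            self.buckets[consumer] = (tokens - llm_tokens, now)
            return True
        self.buckets[consumer] = (tokens, now)
        return False
```

With a budget of 600 tokens per minute, a 500-token request succeeds and leaves 100 tokens; a following 200-token request is rejected until about 10 seconds of refill (10 tokens per second) have accumulated.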
Trusted runtime environments for AI agents in production infrastructure
Teleport Beams provides cryptographically verified, policy-gated access for AI agents to interact with production infrastructure including servers, Kubernetes clusters, and databases. Launched at KubeCon EU 2026, Beams extends Teleport's zero-trust access platform with agent-specific runtime controls, audit trails, and policy enforcement to ensure AI agents operate within defined boundaries when deployed in production environments.
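The "policy-gated, audited" part of this model can be illustrated with a default-deny check that records every decision. This sketch covers only the policy decision and audit record; it is not Teleport's API, omits the cryptographic identity verification entirely, and the names `Policy`, `Gate`, and the `"k8s:staging/*"` resource-naming scheme are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from fnmatch import fnmatchcase

@dataclass(frozen=True)
class Policy:
    agent: str
    allowed_resources: tuple[str, ...]  # glob patterns, e.g. "k8s:staging/*"
    allowed_actions: tuple[str, ...]

@dataclass
class Gate:
    policies: list[Policy]
    audit_log: list[dict] = field(default_factory=list)

    def authorize(self, agent: str, action: str, resource: str) -> bool:
        """Default-deny: allow only if a policy for this agent matches both the
        action and the resource. Every decision is appended to the audit log."""
        allowed = any(
            p.agent == agent
            and action in p.allowed_actions
            and any(fnmatchcase(resource, pat) for pat in p.allowed_resources)
            for p in self.policies
        )
        self.audit_log.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "agent": agent, "action": action,
            "resource": resource, "allowed": allowed,
        })
        return allowed
```

Logging denials as well as approvals is the important design choice: an audit trail that only records successful actions cannot show an agent probing outside its boundaries.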
High-performance S3-compatible object storage built in Rust
RustFS is an open-source distributed object storage system written entirely in Rust; its developers report 2.3x faster performance than MinIO for small-object workloads. It provides full S3 API compatibility, so existing S3 SDKs and CLI tools keep working when migrating from MinIO, Ceph, or AWS S3. It is released under the Apache 2.0 license, avoiding the more restrictive AGPL terms that MinIO uses. Features include a distributed architecture, erasure coding, WORM compliance, encryption via RustyVault, and a web management console.
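Erasure coding splits an object into data shards plus parity shards so lost shards can be rebuilt from the survivors. Production object stores typically use Reed-Solomon codes that tolerate several simultaneous losses; the sketch below deliberately uses the simplest erasure code, a single XOR parity shard, which tolerates exactly one lost shard. It is not RustFS's implementation, and the function names are hypothetical.

```python
from functools import reduce
from typing import Optional

def split_into_shards(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-length data shards, zero-padding the tail."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\x00")
    return [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> list[bytes]:
    """Return k data shards followed by one XOR parity shard."""
    shards = split_into_shards(data, k)
    return shards + [reduce(xor_bytes, shards)]

def recover(shards: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one missing shard (None) by XOR-ing all survivors."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost shard")
    shards = list(shards)
    if missing:
        survivors = [s for s in shards if s is not None]
        shards[missing[0]] = reduce(xor_bytes, survivors)
    return shards

original = b"hello erasure coding"
stored = encode(original, k=4)
damaged = list(stored)
damaged[1] = None  # simulate a failed disk
restored = recover(damaged)
print(b"".join(restored[:4]).rstrip(b"\x00"))
```

Because parity is the XOR of all data shards, XOR-ing the parity with the surviving data shards cancels them out and leaves exactly the missing one.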
Free and open-source Kubernetes IDE for managing clusters visually
Freelens is a free open-source Kubernetes IDE that provides a visual desktop interface for managing clusters, workloads, and configurations. Forked from the original Lens project after its licensing change, Freelens offers the same powerful cluster management experience with real-time monitoring, log viewing, and resource editing under the MIT license.
Kubernetes-native distributed LLM inference stack
llm-d is an open-source Kubernetes-native stack for distributed LLM inference with cache-aware routing and disaggregated serving. It separates prefill and decode stages across different GPU pools for optimal resource utilization, routes requests to nodes with warm KV caches, and integrates with vLLM as the serving engine. Apache-2.0 licensed with 2,900+ GitHub stars.
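Cache-aware routing sends a request to the replica that already holds the longest prefix of its prompt in KV cache, so prefill work is skipped for the shared part. The sketch below illustrates the idea with chained block hashes, where each hash identifies an entire prefix; it is not llm-d's scheduler, and `BLOCK_SIZE`, `pick_node`, and the load tie-breaker are hypothetical simplifications.

```python
import hashlib

BLOCK_SIZE = 4  # tokens per KV-cache block (illustrative; real engines use larger blocks)

def prefix_block_hashes(tokens: list[int], block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash each full block chained with its predecessor, so each hash
    identifies the whole prefix up to and including that block."""
    hashes, prev = [], ""
    for start in range(0, len(tokens) - block_size + 1, block_size):
        block = tokens[start:start + block_size]
        prev = hashlib.sha256((prev + str(block)).encode()).hexdigest()
        hashes.append(prev)
    return hashes

def cached_prefix_len(hashes: list[str], node_cache: set[str]) -> int:
    """Number of leading blocks already present in a node's cache."""
    n = 0
    for h in hashes:
        if h not in node_cache:
            break
        n += 1
    return n

def pick_node(tokens: list[int], caches: dict[str, set[str]],
              load: dict[str, int]) -> str:
    """Prefer the longest warm prefix; break ties by lowest current load."""
    hashes = prefix_block_hashes(tokens)
    return max(caches, key=lambda node: (cached_prefix_len(hashes, caches[node]),
                                          -load[node]))

system_prompt = list(range(8))
request = system_prompt + [100, 101, 102, 103]
caches = {"gpu-a": set(prefix_block_hashes(system_prompt)), "gpu-b": set()}
print(pick_node(request, caches, {"gpu-a": 5, "gpu-b": 0}))  # gpu-a holds 2 warm blocks
```

Routing purely on cache hits can overload one replica, which is why the sketch folds load into the key; a real scheduler would weigh the two more carefully than a strict tie-break.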
Kubernetes-native cloud infrastructure control plane
Crossplane is a CNCF graduated open-source project that extends Kubernetes to manage cloud infrastructure through declarative APIs. Platform teams compose custom infrastructure abstractions as Compositions and publish them as self-service APIs. It provisions resources across AWS, Azure, GCP, and 200+ other providers directly from kubectl. It is used by 450+ organizations and has 11,000+ GitHub stars.
Cloud native runtime security for Kubernetes
Falco is a CNCF graduated open-source runtime security tool that detects unexpected behavior and threats across containers, Kubernetes, and cloud workloads in real time. Originally created by Sysdig, Falco monitors Linux kernel syscalls using eBPF and applies customizable detection rules to alert on malicious activity like container escapes, cryptojacking, unauthorized file access, and anomalous network connections. It supports 50+ alert output channels including SIEM integration.
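Falco expresses detections as rules with a condition, an output template, and a priority, evaluated against a stream of kernel events. The Python sketch below only mimics that shape to show the idea; it is not Falco's rule language or engine. Events are assumed to be pre-parsed dicts keyed by Falco-style field names such as `proc.name`, and the single rule is a simplified take on a shell-in-container detection.

```python
from dataclasses import dataclass
from typing import Callable

Event = dict[str, str]  # field name -> value, e.g. {"proc.name": "bash"}

@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[Event], bool]
    output: str    # template with %field placeholders
    priority: str

SHELLS = {"bash", "sh", "zsh", "ash"}

RULES = [
    Rule(
        name="Shell spawned in container",
        # Processes outside any container carry container.id == "host".
        condition=lambda e: (e.get("evt.type") == "execve"
                             and e.get("container.id", "host") != "host"
                             and e.get("proc.name") in SHELLS),
        output="Shell %proc.name started in container %container.id",
        priority="WARNING",
    ),
]

def render(template: str, event: Event) -> str:
    """Substitute %field placeholders, longest field names first."""
    for key in sorted(event, key=len, reverse=True):
        template = template.replace("%" + key, event[key])
    return template

def evaluate(event: Event, rules: list[Rule] = RULES) -> list[str]:
    """Return one alert line per rule whose condition matches the event."""
    return [f"{r.priority}: {render(r.output, event)}"
            for r in rules if r.condition(event)]

print(evaluate({"evt.type": "execve", "container.id": "3f2a", "proc.name": "bash"}))
```

Keeping conditions as pure predicates over event fields is what makes such rules cheap to evaluate on every syscall and easy to customize without touching the collection layer.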
AI group-buying for AWS cost reduction
Pump is a YC-backed platform that uses AI and group-buying power to automate AWS cost reduction, claiming up to 60% savings on compute through collective purchasing of Reserved Instances and Savings Plans. By pooling demand across multiple customers, Pump negotiates volume discounts that individual organizations cannot access, providing enterprise-level pricing to startups and mid-market companies.
AI-managed spot instances for production workloads
Xosphere automates the use of AWS Spot Instances for production workloads, using ML to select instance types based on availability and cost-performance balance. It installs in about 10 minutes via CloudFormation and aims to pair high availability with spot pricing, automatically handling instance selection, interruptions, and failover for teams seeking significant compute cost savings.
Kubernetes dashboard with 360-degree visibility
Devtron is an open-source Kubernetes management dashboard that provides a 360-degree view of cluster resources with fine-grained RBAC for multi-cluster environments. An upcoming agentic AI feature is intended to automate debugging and cluster optimization, while the current platform offers centralized visibility, GitOps-based deployment workflows, and security policy enforcement across distributed Kubernetes infrastructure.
Open-source MLOps platform for Kubernetes
Kubeflow is a CNCF open-source MLOps platform with 14,000+ GitHub stars for deploying and managing machine learning workflows on Kubernetes. It provides notebooks for experimentation, scalable training pipelines with distributed computing support, model serving with autoscaling, and comprehensive pipeline orchestration for teams running AI/ML workloads in cloud-native environments.
AI copilot for the Lens Kubernetes IDE
Lens Prism is an AI copilot integrated into the Lens Kubernetes IDE (billed as the world's most popular Kubernetes desktop client) that troubleshoots clusters, explains errors in plain English, and helps manage multi-cluster environments visually. It targets developers who prefer visual tools over the CLI, providing AI-powered debugging and cluster management within a familiar desktop interface.
Agentic DevOps automation via ChatOps
Kubiya is an agentic automation platform for DevOps and platform teams that uses specialized agents with connectors for Kubernetes, AWS, GitHub, Jira, and Terraform to automate operational tasks through Slack or web portals. It provides Terraform module support for infrastructure-as-code configuration and manages agent behaviors with policy-based controls for enterprise-grade governance.
Unified multi-cloud cost management with MegaBill
Finout unifies cloud costs from AWS, Azure, GCP, Snowflake, and other providers into a single MegaBill dashboard with AI-based anomaly detection for flagging unusual spend patterns. Priced at approximately 1% of cloud spend, it solves the multi-tool cost fragmentation problem for organizations managing complex infrastructure budgets across multiple cloud and SaaS providers.
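Spend anomaly detection can be as simple as comparing each day's cost with a trailing baseline. Finout does not document its model here, so the following is a generic stand-in under stated assumptions: `flag_spend_anomalies`, the 7-day window, and the 3-sigma threshold are hypothetical.

```python
from statistics import mean, stdev

def flag_spend_anomalies(daily_spend: list[float], window: int = 7,
                         threshold: float = 3.0) -> list[int]:
    """Return indices of days whose spend deviates from the trailing `window`
    days by more than `threshold` standard deviations (a z-score test)."""
    anomalies = []
    for i in range(window, len(daily_spend)):
        history = daily_spend[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is unusual.
            if daily_spend[i] != mu:
                anomalies.append(i)
            continue
        if abs(daily_spend[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

spend = [100, 102, 98, 101, 99, 100, 103, 250, 101]
print(flag_spend_anomalies(spend))  # day 7 stands out
```

One weakness is visible even in this small example: once a spike enters the trailing window it inflates the baseline, so production systems usually add seasonality handling or robust statistics such as the median.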
Cloud cost intelligence mapped to business units
CloudZero is a cost intelligence platform that maps cloud spend to engineering teams, product lines, and business units using AI-driven anomaly detection. It provides engineering-friendly insights that help developers understand the cost impact of their code changes, with per-commit cost tracking through CI/CD integration and flexible multi-cloud support across AWS, GCP, and Azure.
Autonomous cloud discount management with ML
ProsperOps uses machine learning to continuously optimize cloud commitment coverage, including Savings Plans and Reserved Instances, and reports Effective Savings Rates of 40% or more on AWS, GCP, and Azure. It provides autonomous discount management with a performance-based pricing model in which ProsperOps charges a percentage of the savings it generates, aligning its fee with the value actually delivered.
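Effective Savings Rate compares what compute actually cost with what the same usage would have cost at on-demand prices. The sketch below assumes the common definition ESR = (on-demand-equivalent cost minus actual cost) / on-demand-equivalent cost, and pairs it with a savings-share fee; the 25% share in the example is a hypothetical rate, not ProsperOps's pricing.

```python
def effective_savings_rate(on_demand_equivalent: float, actual_cost: float) -> float:
    """ESR = (on-demand-equivalent cost - actual cost) / on-demand-equivalent cost.
    `actual_cost` includes commitment charges plus any remaining on-demand usage."""
    if on_demand_equivalent <= 0:
        raise ValueError("on-demand equivalent cost must be positive")
    return (on_demand_equivalent - actual_cost) / on_demand_equivalent

def performance_fee(on_demand_equivalent: float, actual_cost: float,
                    share: float) -> float:
    """Fee charged as a fraction `share` of the savings generated (hypothetical rate)."""
    return share * max(0.0, on_demand_equivalent - actual_cost)

# $100,000 of usage at on-demand prices that actually cost $60,000.
print(effective_savings_rate(100_000, 60_000))   # 0.4, i.e. a 40% ESR
print(performance_fee(100_000, 60_000, 0.25))    # 10000.0
```

Because ESR is measured against the on-demand baseline, it rewards both deeper discounts and higher commitment coverage, while idle (unused) commitments raise actual cost and pull the rate down.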
Kubernetes troubleshooting with event context
Komodor is a Kubernetes troubleshooting platform that extracts event and change context from clusters, correlating deployments, config changes, and infrastructure events to quickly identify the root cause of pod failures. Its Slack integration delivers incident context directly into team channels, helping SRE and platform teams reduce mean time to resolution by connecting the dots between what changed and what broke.
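Correlating "what changed" with "what broke" amounts to looking back from a failure over a time window and ranking the changes that could plausibly be related. The sketch below is illustrative only and not Komodor's data model: `ChangeEvent`, `suspect_changes`, the 30-minute lookback, and the convention that `workload="*"` marks a cluster-wide change are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class ChangeEvent:
    at: datetime
    workload: str   # affected workload, or "*" for cluster-wide changes
    kind: str       # e.g. "deploy", "configmap", "node" (illustrative)
    detail: str

def suspect_changes(failure_at: datetime, workload: str,
                    changes: list[ChangeEvent],
                    lookback: timedelta = timedelta(minutes=30)) -> list[ChangeEvent]:
    """Changes to the failing workload, or cluster-wide changes, within
    `lookback` before the failure, most recent first."""
    window_start = failure_at - lookback
    related = [c for c in changes
               if window_start <= c.at <= failure_at
               and c.workload in (workload, "*")]
    return sorted(related, key=lambda c: c.at, reverse=True)

t0 = datetime(2024, 1, 1, 12, 0)
changes = [
    ChangeEvent(t0 - timedelta(minutes=5), "checkout", "deploy", "image v2"),
    ChangeEvent(t0 - timedelta(hours=2), "checkout", "deploy", "image v1"),
    ChangeEvent(t0 - timedelta(minutes=10), "*", "node", "node drained"),
    ChangeEvent(t0 - timedelta(minutes=1), "search", "configmap", "flag flip"),
]
print([c.detail for c in suspect_changes(t0, "checkout", changes)])
```

The two-hour-old deploy and the unrelated workload's config change are filtered out, leaving the most recent deploy first and the node drain as a secondary suspect.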
Cloud-native observability with AI correlation
Middleware is a cloud-native observability platform that provides real-time insights into Kubernetes environments using AI to correlate metrics, logs, and traces for faster troubleshooting. It simplifies the debugging of complex microservice clusters by automatically connecting related signals across distributed systems, with a freemium model accessible to teams of all sizes.
Lightweight virtual Kubernetes clusters
vCluster creates lightweight, isolated virtual Kubernetes clusters inside physical host clusters, letting teams run sandboxed environments for development, testing, and AI agent experimentation without provisioning separate infrastructure. Each virtual cluster has its own control plane, including its own API server, and its own resource isolation while sharing the host's underlying compute, which the project says can cut infrastructure costs by up to 90% compared with provisioning full clusters.
Open-source Kubernetes cost monitoring (CNCF)
OpenCost is a CNCF open-source project for real-time Kubernetes cost monitoring that maps cloud spend directly to namespaces, deployments, pods, and labels. It provides granular cost allocation across teams and projects without vendor lock-in and supports AWS, GCP, Azure, and on-premises clusters, positioning itself as the open-source standard for FinOps visibility in cloud-native environments.
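Allocating a node's cost to namespaces requires deciding what each workload "consumed". A common approach charges each pod for the larger of its resource request and its actual usage, and reports the remainder as idle. The CPU-only sketch below illustrates that approach; it is a simplification rather than OpenCost's implementation (which also prices memory, GPUs, storage, and network), and `allocate_node_cost` and the pod dict keys are hypothetical.

```python
from collections import defaultdict

def allocate_node_cost(node_hourly_cost: float, node_cpu_cores: float,
                       pods: list[dict]) -> dict[str, float]:
    """Split one node's hourly cost across namespaces by CPU.

    Each pod is charged for max(request, usage) cores at the node's per-core
    price; unclaimed capacity is reported under "__idle__".
    """
    price_per_core = node_hourly_cost / node_cpu_cores
    costs: dict[str, float] = defaultdict(float)
    allocated = 0.0
    for pod in pods:
        cores = max(pod["cpu_request"], pod["cpu_usage"])
        costs[pod["namespace"]] += cores * price_per_core
        allocated += cores
    costs["__idle__"] = max(0.0, node_cpu_cores - allocated) * price_per_core
    return dict(costs)

pods = [
    {"namespace": "checkout", "cpu_request": 2.0, "cpu_usage": 1.5},
    {"namespace": "search", "cpu_request": 1.0, "cpu_usage": 3.0},
    {"namespace": "checkout", "cpu_request": 0.5, "cpu_usage": 0.25},
]
print(allocate_node_cost(8.0, 8.0, pods))
```

Charging the larger of request and usage means a team pays for capacity it reserved even if unused, and also for bursting above its request; surfacing idle cost separately keeps the per-namespace totals honest about waste.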
Autonomous Kubernetes management and predictive scaling
Sedai provides an autonomous control layer for Kubernetes that right-sizes workloads, remediates anomalies, and performs predictive autoscaling ahead of traffic demand. Managing over $3B in annual cloud spend for enterprises including Palo Alto Networks, it builds behavioral models to scale pods before demand arrives rather than reacting after performance degrades.
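Predictive autoscaling sizes replicas for forecast demand instead of current demand. Sedai's behavioral models are not public here, so the sketch below substitutes a deliberately simple least-squares linear trend; `linear_forecast`, `replicas_for`, the per-pod throughput figure, and the headroom and replica bounds are hypothetical.

```python
import math

def linear_forecast(history: list[float], steps_ahead: int = 1) -> float:
    """Least-squares linear trend over `history`, extrapolated `steps_ahead`."""
    n = len(history)
    if n < 2:
        return history[-1]
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    denom = sum((x - x_mean) ** 2 for x in range(n))
    slope = sum((x - x_mean) * (y - y_mean)
                for x, y in zip(range(n), history)) / denom
    return y_mean + slope * (n - 1 + steps_ahead - x_mean)

def replicas_for(history_rps: list[float], rps_per_pod: float,
                 steps_ahead: int = 1, headroom: float = 0.2,
                 min_replicas: int = 1, max_replicas: int = 100) -> int:
    """Replicas needed for forecast load plus headroom, clamped to bounds."""
    forecast = max(0.0, linear_forecast(history_rps, steps_ahead))
    needed = math.ceil(forecast * (1 + headroom) / rps_per_pod)
    return min(max_replicas, max(min_replicas, needed))

# Traffic rising by 100 rps per interval: the next interval is forecast at 500 rps.
print(replicas_for([100, 200, 300, 400], rps_per_pod=100, headroom=0.25))  # 7
```

A reactive autoscaler seeing 400 rps would provision for 400; scaling to the 500 rps forecast ahead of time is the difference the description points to, at the risk of over-provisioning when the trend breaks.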