CUA addresses what may be the most critical infrastructure gap in the AI agent ecosystem: letting agents interact with operating systems and desktop applications without compromising host security. The platform creates ephemeral, sandboxed virtual machines in which AI agents can take screenshots, control the mouse and keyboard, execute shell commands, and manage files, all inside an isolated environment that protects the host system from agent misbehavior.
The architecture is built around three complementary components. CuaBot is a multi-agent CLI that lets developers run any agent (Claude Code, OpenClaw, or a custom implementation) inside a sandbox with H.265 video streaming and shared clipboard support. The Cua Agent SDK provides a Python framework for building observe-reason-act loops with budget limits and trajectory recording. Cua-Bench offers standardized benchmarks from OSWorld, ScreenSpot, and Windows Arena for evaluating agent performance.
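To make the observe-reason-act pattern concrete, here is a minimal sketch of such a loop with a spending cap and a recorded trajectory. It does not use the Cua Agent SDK: `Step`, `Trajectory`, `run_agent`, and `make_toy_env` are hypothetical names invented for this illustration, and a simple counter stands in for a real sandbox.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Step:
    observation: str
    action: str
    cost: float


@dataclass
class Trajectory:
    steps: List[Step] = field(default_factory=list)

    @property
    def total_cost(self) -> float:
        return sum(step.cost for step in self.steps)


def run_agent(
    observe: Callable[[], str],
    reason: Callable[[str], Tuple[str, float]],
    act: Callable[[str], None],
    max_budget: float,
    max_steps: int = 50,
) -> Trajectory:
    """Hypothetical loop: stops on "done", on budget exhaustion, or at max_steps."""
    trajectory = Trajectory()
    for _ in range(max_steps):
        observation = observe()             # a screenshot in a real sandbox
        action, cost = reason(observation)  # a model call and its estimated cost
        if trajectory.total_cost + cost > max_budget:
            break  # the step that would exceed the budget is neither recorded nor run
        trajectory.steps.append(Step(observation, action, cost))
        if action == "done":
            break
        act(action)                         # a click or keystroke in a real sandbox
    return trajectory


def make_toy_env(goal: int = 3):
    """Toy stand-in for a sandbox: a counter the agent increments up to `goal`."""
    state = {"n": 0}

    def observe() -> str:
        return f"counter={state['n']}"

    def reason(observation: str) -> Tuple[str, float]:
        n = int(observation.split("=")[1])
        return ("done" if n >= goal else "increment", 0.01)

    def act(action: str) -> None:
        if action == "increment":
            state["n"] += 1

    return observe, reason, act


if __name__ == "__main__":
    trajectory = run_agent(*make_toy_env(), max_budget=1.0)
    for step in trajectory.steps:
        print(step.observation, "->", step.action)
```

Keeping the budget check before the action is recorded means the saved trajectory never contains a step the agent did not actually take, which matters if trajectories are later replayed or used for training.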
Cross-platform support spans Linux, Windows, macOS, and Android. Docker containers handle lightweight Linux environments. QEMU provides cross-platform virtualization for Windows and Linux. Apple's Virtualization.framework delivers near-native macOS performance on Apple Silicon, at 97% of native CPU speed. Android support extends to mobile testing scenarios. The unified Computer SDK abstracts OS-specific details, so agent logic written once runs across all platforms.
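The "write once, run on every platform" idea can be sketched with a small abstract interface. This is an illustration only, not the Computer SDK's actual API: the `Computer`, `LinuxComputer`, and `MacComputer` classes and the `paste_greeting` function are hypothetical, and the backends merely record events instead of driving a real VM. The one OS-specific detail shown is the paste shortcut, which differs between Linux (Ctrl+V) and macOS (Cmd+V).

```python
from abc import ABC, abstractmethod
from typing import List


class Computer(ABC):
    """Hypothetical common interface; each backend hides its OS-specific details."""

    def __init__(self) -> None:
        self.events: List[str] = []  # recorded instead of sent to a real machine

    @abstractmethod
    def paste_shortcut(self) -> str:
        """Return the key combination that pastes on this OS."""

    def type_text(self, text: str) -> None:
        self.events.append(f"type:{text}")

    def paste(self) -> None:
        self.events.append(f"key:{self.paste_shortcut()}")


class LinuxComputer(Computer):
    def paste_shortcut(self) -> str:
        return "ctrl+v"


class MacComputer(Computer):
    def paste_shortcut(self) -> str:
        return "cmd+v"


def paste_greeting(computer: Computer) -> List[str]:
    """Agent logic written once against the interface, unaware of the OS."""
    computer.type_text("hello")
    computer.paste()
    return computer.events


if __name__ == "__main__":
    for backend in (LinuxComputer(), MacComputer()):
        print(type(backend).__name__, paste_greeting(backend))
```

The agent function never branches on the operating system; adding a Windows or Android backend in this sketch would mean adding one subclass and leaving `paste_greeting` untouched.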
Model flexibility through LiteLLM integration means developers are not locked into any single AI provider. CUA works with Anthropic Claude, OpenAI GPT, Google Gemini, Microsoft models, Alibaba Qwen, and local models through Ollama and LM Studio. Because the provider is a configuration choice rather than a code dependency, agents can switch models as the LLM landscape shifts without being rewritten.
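As a rough illustration of what provider-agnostic routing looks like, the sketch below dispatches on a provider-prefixed model string of the form `provider/model-name`, the convention LiteLLM uses for many providers. It deliberately does not import LiteLLM or call any real API: `ModelRouter`, `split_model_id`, the fake backends, and the model names are all hypothetical placeholders.

```python
from typing import Callable, Dict, Tuple

# A backend takes (model_name, prompt) and returns the completion text.
Backend = Callable[[str, str], str]


def split_model_id(model_id: str) -> Tuple[str, str]:
    """Split 'provider/model-name' into its two parts."""
    provider, sep, name = model_id.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return provider, name


class ModelRouter:
    """Hypothetical router: agent code passes a model string, never a provider SDK."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, provider: str, backend: Backend) -> None:
        self._backends[provider] = backend

    def complete(self, model_id: str, prompt: str) -> str:
        provider, name = split_model_id(model_id)
        if provider not in self._backends:
            raise ValueError(f"no backend registered for provider {provider!r}")
        return self._backends[provider](name, prompt)


if __name__ == "__main__":
    router = ModelRouter()
    # Fake backends standing in for a hosted API and a local model server.
    router.register("hosted", lambda name, prompt: f"[hosted:{name}] {prompt}")
    router.register("local", lambda name, prompt: f"[local:{name}] {prompt}")

    # Swapping providers is a one-string change; model names are placeholders.
    for model_id in ("hosted/example-large", "local/example-small"):
        print(router.complete(model_id, "click the Save button"))
```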
The benchmarking infrastructure deserves special attention. Cua-Bench lets developers run thousands of agent trajectories in parallel across hundreds of sandboxes, with programmatic rewards, oracle solutions, and a reinforcement learning dataloader. Trajectories can be exported for training, creating a virtuous cycle in which agent evaluation directly feeds model improvement. This positions CUA not just as a runtime platform but as research infrastructure.
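The interplay of parallel rollouts, programmatic rewards, and oracle solutions can be sketched in a few lines. Nothing here is Cua-Bench's API: `Task`, `reward`, `oracle_policy`, `weak_policy`, and `evaluate` are hypothetical names, a thread pool stands in for a fleet of sandboxes, and each "rollout" is reduced to a policy returning a final state.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Task:
    name: str
    target: int  # the final state that counts as success


def reward(task: Task, final_state: int) -> float:
    """Programmatic reward: computed by a check, not by human grading."""
    return 1.0 if final_state == task.target else 0.0


def oracle_policy(task: Task) -> int:
    """Known-good solution, used to confirm the task and reward check are sound."""
    return task.target


def weak_policy(task: Task) -> int:
    """Toy agent that only reaches the target when it is an even number."""
    return task.target if task.target % 2 == 0 else task.target - 1


def evaluate(
    tasks: List[Task], policy: Callable[[Task], int], max_workers: int = 4
) -> List[float]:
    """Run one rollout per task in parallel; results keep the order of `tasks`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda task: reward(task, policy(task)), tasks))


if __name__ == "__main__":
    tasks = [Task(f"task-{i}", target=i) for i in range(6)]
    oracle_scores = evaluate(tasks, oracle_policy)
    agent_scores = evaluate(tasks, weak_policy)
    print("oracle success rate:", sum(oracle_scores) / len(tasks))
    print("agent success rate:", sum(agent_scores) / len(tasks))
```

Running the oracle first is a useful sanity check in this kind of setup: if the oracle does not score 1.0 on every task, the fault lies in the task or the reward function rather than in the agent under test.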
MCP server integration exposes CUA sandboxes as tools accessible from Claude Desktop, Cursor, or any MCP-compatible client. An engineer can ask Claude to perform a complex desktop task, and Claude orchestrates a CUA sandbox to execute it, bridging conversational AI and autonomous desktop automation.
Lume, the macOS VM management component, targets Apple Silicon hosts. Because it uses Apple's Virtualization.framework rather than emulation, VMs get hardware-accelerated graphics, networking, and file sharing with near-native performance. Sandbox state can be saved and restored, with hot-start in under one second, enabling rapid iteration during agent development.
The cloud offering complements self-hosted deployment. Cloud sandboxes support any OS with hot-start capability, and the free tier allows initial experimentation without infrastructure setup. Pro plans start at $10 per month with transparent per-resource billing for CPU, memory, and disk.