Lemonade is a local AI serving platform developed by AMD and its open-source community that brings multi-modal inference to developer workstations and consumer PCs. Unlike generic local LLM runners, Lemonade is built to take advantage of AMD's hardware stack where available. Current public copy emphasizes Ryzen AI, Radeon, and Strix Halo PCs, with CPU, GPU, and NPU backends selected for the workload. The practical value is hardware-aware local AI rather than a one-size-fits-all runtime.
The platform ships as a lightweight C++ binary with installers for Windows, Linux, and macOS, plus Docker images for containerized workflows. At its core sits a local server with OpenAI, Anthropic, and Ollama-compatible API surfaces, so many applications can switch to Lemonade by changing endpoint configuration. Beyond chat completion, Lemonade supports image generation via Stable Diffusion, speech-to-text via Whisper, text-to-speech via Kokoro, and embedding and reranking endpoints. Experimental cloud offload can route selected work to compatible providers when local execution is not enough.
Lemonade bundles a desktop application with a visual model manager for browsing, downloading, and organizing models from Hugging Face. The built-in chat, image generation, and speech interfaces let developers prototype and test locally before wiring up external applications. For CLI users the lemonade command provides model benchmarking, accuracy testing, and memory profiling. The project is Apache 2.0 licensed and actively maintained by AMD engineers and community contributors.
