Lemonade is a local AI serving platform developed by AMD and its open-source community that brings multi-modal inference to developer workstations and consumer PCs. Unlike generic local LLM runners, Lemonade is built from the ground up to exploit AMD's full hardware stack: Ryzen AI NPUs handle efficient prompt processing, Radeon GPUs drive fast token generation, and CPUs serve as a universal fallback. On Ryzen AI 300-series machines, the hybrid NPU-plus-GPU execution mode combines the two for throughput and power efficiency that GPU-only execution cannot match on the same hardware.
The platform ships as a lightweight C++ binary with one-click installers for Windows, Linux, and macOS, plus Docker images for containerized workflows. At its core sits an OpenAI-compatible REST API, which means any application already targeting OpenAI endpoints can switch to Lemonade by changing a single base URL. Validated integrations include VS Code Copilot, Open WebUI, Continue, n8n, and Dify. Beyond chat completions, Lemonade supports image generation via Stable Diffusion, speech-to-text via Whisper, text-to-speech via Kokoro, and embedding and reranking endpoints.
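Because the API mirrors OpenAI's wire format, pointing an existing client at Lemonade amounts to swapping the base URL. The sketch below builds a standard chat-completions request using only the Python standard library; the localhost port, the /api/v1 path, and the model name are assumptions about a default install, so check your server's startup output for the actual address and available models.

```python
import json
import urllib.request

# Assumed default address for a local Lemonade server; adjust to match
# what your install reports on startup.
LEMONADE_BASE_URL = "http://localhost:8000/api/v1"


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request against any base URL."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The request shape is identical to what an OpenAI client sends;
# only the URL differs. The model name here is a placeholder.
req = build_chat_request(LEMONADE_BASE_URL, "some-local-model", "Hello!")
```

With the server running, dispatching the request is just urllib.request.urlopen(req); equivalently, any OpenAI SDK works unmodified once its base_url is set to the Lemonade address.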
Lemonade bundles a desktop application with a visual model manager for browsing, downloading, and organizing models from Hugging Face. The built-in chat, image generation, and speech interfaces let developers prototype and test locally before wiring up external applications. For CLI users, the lemonade command provides model benchmarking, accuracy testing, and memory profiling. The project is Apache 2.0 licensed and actively maintained by AMD engineers and community contributors.