LiteRT-LM is Google's production-ready, open-source inference framework for deploying large language models on edge devices. It is the same engine that powers Gemini Nano across Google products, including Chrome, Chromebook Plus, Pixel Watch, and Android system features. The framework supports cross-platform deployment on Android, iOS, Web, desktop, and IoT devices such as the Raspberry Pi, with hardware acceleration via GPU and NPU backends for maximum on-device performance.
Beyond basic text generation, LiteRT-LM provides built-in support for multi-modal inputs including vision and audio, enabling developers to build sophisticated on-device AI applications. The framework also includes function calling and tool use APIs, allowing agentic workflows where on-device models can invoke external tools and services without cloud roundtrips. A streaming API delivers token-by-token output for responsive user experiences, while the underlying C++ interface gives advanced developers full control over custom inference pipelines.
LiteRT-LM works with Gemini Nano models available through Google AI Edge and supports quantized model formats optimized for constrained hardware. The framework handles model loading, memory management, and hardware-specific optimizations automatically, abstracting away the complexity of running LLMs on diverse device configurations. As an open-source project maintained by Google AI Edge, it receives regular updates aligned with Android and Chrome platform releases. For developers building privacy-sensitive, offline-capable, or latency-critical AI features, LiteRT-LM provides a battle-tested path from prototyping to production-scale edge deployment.