LiteRT-LM

Google's production on-device LLM inference framework

Open Source

LiteRT-LM is Google's official open-source framework for running large language models on-device across Android, iOS, Web, Desktop, and Raspberry Pi. It is already deployed in Chrome and on Pixel hardware, where it provides production-grade on-device LLM inference. The project has 1.4K+ GitHub stars and is licensed under Apache 2.0.

LiteRT-LM is Google's production-ready, open-source inference framework for deploying large language models on edge devices. It is the same engine that powers Gemini Nano across Google products including Chrome, Chromebook Plus, Pixel Watch, and Android system features. The framework supports cross-platform deployment on Android, iOS, Web, Desktop, and IoT devices like Raspberry Pi, with GPU and NPU hardware acceleration for on-device performance.

Beyond basic text generation, LiteRT-LM provides built-in support for multi-modal inputs including vision and audio, enabling developers to build sophisticated on-device AI applications. The framework also includes function calling and tool use APIs, which allow agentic workflows where on-device models invoke external tools and services without cloud round trips. A streaming API delivers token-by-token output for responsive user experiences, while the underlying C++ interface gives advanced developers full control over custom inference pipelines.
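As a rough illustration of how streaming output and tool calling fit together, the sketch below consumes chunks from a token stream and, if the finished text parses as a tool request, runs a local function. Everything here is hypothetical: `fake_token_stream`, `TOOLS`, `run_turn`, and the JSON call format are illustrative stand-ins, not the LiteRT-LM API.

```python
import json
from typing import Callable, Dict, Iterator


def fake_token_stream(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for an on-device model (NOT the LiteRT-LM API).

    Yields a canned reply in small chunks to mimic token-by-token streaming.
    """
    reply = '{"tool": "get_battery_level", "args": {}}'
    for i in range(0, len(reply), 8):
        yield reply[i:i + 8]


# Hypothetical registry of tools that run locally on the device.
TOOLS: Dict[str, Callable[..., str]] = {
    "get_battery_level": lambda: "87%",
}


def run_turn(prompt: str) -> str:
    chunks = []
    for chunk in fake_token_stream(prompt):
        chunks.append(chunk)  # a UI could render each chunk as it arrives
    text = "".join(chunks)

    # Assumed convention: a tool request is a JSON object naming the tool.
    try:
        call = json.loads(text)
    except json.JSONDecodeError:
        return text  # plain text answer, no tool requested
    if not isinstance(call, dict) or call.get("tool") not in TOOLS:
        return text
    return TOOLS[call["tool"]](**call.get("args", {}))


print(run_turn("How much battery is left?"))  # prints 87%
```

The tool runs in the same process as the model, which is the point of on-device function calling: no network request is needed to answer the question.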

LiteRT-LM works with Gemini Nano models available through Google AI Edge and supports quantized model formats optimized for constrained hardware. The framework handles model loading, memory management, and hardware-specific optimizations automatically, abstracting away the complexity of running LLMs on diverse device configurations. As an open-source project maintained by Google AI Edge, it receives regular updates aligned with Android and Chrome platform releases. For developers building privacy-sensitive, offline-capable, or latency-critical AI features, LiteRT-LM provides a battle-tested path from prototyping to production-scale edge deployment.
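To see why quantized formats matter on constrained hardware, a back-of-the-envelope estimate of weight storage helps. The sketch below assumes a hypothetical 2-billion-parameter model and counts only the weights; it ignores activations, the KV cache, and file-format overhead, so real footprints will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GiB: parameters x bits / 8 bytes."""
    return num_params * bits_per_weight / 8 / 2**30


# Hypothetical 2B-parameter model, chosen only for illustration.
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gib(2e9, bits):.2f} GiB")
# prints:
# 16-bit: 3.73 GiB
# 8-bit: 1.86 GiB
# 4-bit: 0.93 GiB
```

Halving the bits per weight halves the weight storage, which is often the difference between a model fitting in a phone's memory budget or not.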

Pricing

Free and open-source (Apache 2.0)

Platforms

Android, iOS, Web, Desktop, Raspberry Pi

Alternatives

llama.cpp

High-performance local LLM inference in C/C++

llama.cpp is the foundational C/C++ library for local LLM inference on consumer hardware, with 75K+ GitHub stars. It provides optimized CPU and GPU inference for quantized models in GGUF format and supports LLaMA, Mistral, Phi, Gemma, and most open-weight families. Features include 2-bit to 8-bit quantization for reduced memory use, multi-GPU support, context extension, grammar-constrained output, and an OpenAI-compatible API server. It is the engine behind Ollama and LM Studio.

Open Source

ONNX Runtime

Cross-platform high-performance ML inference engine

ONNX Runtime is Microsoft's open-source inference engine for machine learning models in ONNX format. It delivers cross-platform acceleration via execution providers for NVIDIA CUDA, TensorRT, DirectML, CoreML, OpenVINO, and more, and it supports training acceleration, quantization, and GenAI workloads. It is used in production across Windows, Azure, Office 365, and thousands of applications, and offers a pip-installable Python package alongside native C++, C#, and Java APIs.

Open Source

ExecuTorch

PyTorch on-device AI for mobile and edge devices

ExecuTorch is PyTorch's official solution for deploying AI models on mobile, embedded, and edge devices. It features a 50KB base runtime, 12+ hardware backends including Apple CoreML, Qualcomm QNN, ARM, and Vulkan, and native PyTorch export without format conversions. It powers Meta's on-device AI across Instagram, WhatsApp, Quest 3, and Ray-Ban Smart Glasses, and supports LLMs, vision, speech, and multimodal models.

Open Source

TensorFlow Lite

Google's lightweight ML framework for mobile and embedded

TensorFlow Lite is Google's lightweight ML framework for deploying models on mobile and embedded devices. It supports quantization and runs on Android, iOS, Linux, and microcontrollers, with hardware acceleration via GPU, Hexagon DSP, and CoreML delegates. It also provides pre-trained models and tools for converting models from TensorFlow and JAX. It powers on-device ML in billions of Google app installations.

Open Source

Related Tools

KubeAI

Kubernetes operator for serving AI inference workloads

KubeAI is an Apache-2.0 Kubernetes operator for deploying and scaling AI inference workloads, including LLMs, embeddings, reranking, and speech-to-text. It gives platform teams OpenAI-compatible endpoints, model proxy/controller primitives, model caching, scale-from-zero behavior, and cluster-native resource management for self-hosted inference on Kubernetes.

Open Source

Deep Lake

AI data runtime for multimodal datasets and vector search

Deep Lake is an open-source AI data runtime from Activeloop for storing, versioning, and querying multimodal data and embeddings. It fits teams building RAG, training, evaluation, or dataset-heavy agent workflows that need a bridge between vector search, structured metadata, and large image, text, audio, or video collections.

Open Source

SeekDB

AI-native state store with hybrid vector and full-text search

SeekDB is an open-source AI-native state store from the OceanBase ecosystem that combines MySQL-compatible data access with hybrid vector and full-text retrieval. It targets agent and AI application teams that need embedded or server deployment, copy-on-write style sandboxes, and searchable state without gluing together several separate storage layers.

Open Source

CLIProxyAPI

Self-hosted proxy API for routing AI CLI accounts into OpenAI-compatible endpoints

CLIProxyAPI is an open-source Go proxy server that wraps Gemini CLI, Claude Code, OpenAI Codex, Grok Build, and related CLI account flows behind OpenAI/Gemini/Claude-compatible API endpoints. Use it carefully: it can touch OAuth sessions, auth files, logs, and provider account policies, so production use needs credential and ToS review.

Open Source, Telemetry

OpenHuman

Local-first personal AI agent with memory trees, desktop integrations, and private workspace context

OpenHuman is an open-source, local-first personal AI agent from TinyHumans. It combines a desktop app, persistent memory trees, Obsidian-compatible storage, OAuth integrations, and local model support into a private assistant harness. It is most interesting for users who want agentic workflows and long-term memory without handing every context detail to a fully cloud-hosted assistant.

Open Source, Telemetry

DenchClaw

Local AI CRM and workflow automation on OpenClaw

DenchClaw is a local AI CRM and workflow automation app built on OpenClaw. It runs on a Mac at localhost, lets users chat with local business data, and focuses on lead enrichment, founder/customer research, and outreach automation. It belongs beside local AI, workflow automation, and OpenClaw-style personal-agent tools rather than pure coding IDEs.

Open Source