MediaPipe is Google's production-grade framework for deploying machine learning models directly on user devices without requiring network connectivity or cloud processing. Originally developed internally at Google for products like Google Lens, Nest cameras, and YouTube, it was open-sourced to give developers access to the same optimized ML pipeline infrastructure. The framework handles the entire lifecycle from model optimization and quantization to efficient inference on CPUs, GPUs, and specialized accelerators.
The framework ships with ready-to-use solutions covering the most common on-device ML tasks: face detection and face mesh with 468 landmarks, hand tracking with 21 landmarks per hand, full-body pose estimation, object detection and tracking, image segmentation, text classification and embedding, and audio classification. Each solution includes pre-trained models optimized for real-time performance on mobile hardware. MediaPipe also supports custom model deployment through its Tasks API, which wraps TensorFlow Lite models with the pre-processing (for example, image resizing and normalization) and post-processing (for example, decoding raw output tensors into landmarks or labels) that each task needs.
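As a rough illustration of how a packaged solution is used from Python, the sketch below runs the hand landmarker on a single image and converts its normalized landmarks to pixel coordinates. It assumes the `mediapipe` package is installed and that a hand landmarker `.task` model bundle has been downloaded separately. The function names `to_pixels` and `detect_hands`, the constant `NUM_HAND_LANDMARKS`, and both file paths are illustrative choices, not part of MediaPipe itself.

```python
from typing import List, Sequence, Tuple

# 21 landmarks per hand, as described above (name is illustrative).
NUM_HAND_LANDMARKS = 21


def to_pixels(
    landmarks: Sequence[Tuple[float, float]], width: int, height: int
) -> List[Tuple[int, int]]:
    """Convert normalized (x, y) landmarks to integer pixel coordinates."""
    pixels = []
    for x, y in landmarks:
        # Clamp so points predicted slightly outside the frame stay in bounds.
        px = min(max(int(x * width), 0), width - 1)
        py = min(max(int(y * height), 0), height - 1)
        pixels.append((px, py))
    return pixels


def detect_hands(image_path: str, model_path: str, num_hands: int = 2):
    """Return one list of pixel landmarks per detected hand.

    Assumption: `model_path` points to a downloaded hand landmarker
    `.task` bundle and `image_path` to a readable image file.
    """
    # Imported lazily so to_pixels() works without MediaPipe installed.
    import mediapipe as mp
    from mediapipe.tasks import python as mp_python
    from mediapipe.tasks.python import vision

    options = vision.HandLandmarkerOptions(
        base_options=mp_python.BaseOptions(model_asset_path=model_path),
        num_hands=num_hands,
    )
    image = mp.Image.create_from_file(image_path)
    # The task object handles pre- and post-processing around the model.
    with vision.HandLandmarker.create_from_options(options) as landmarker:
        result = landmarker.detect(image)

    return [
        to_pixels([(lm.x, lm.y) for lm in hand], image.width, image.height)
        for hand in result.hand_landmarks
    ]
```

Calling `detect_hands("hand.jpg", "hand_landmarker.task")` (both paths hypothetical) would return a list with one entry per detected hand. The MediaPipe import is kept inside the function so the coordinate helper can be reused and tested on its own.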
With over 34,000 GitHub stars, MediaPipe has become a widely used toolkit for developers building privacy-preserving, low-latency AI features into applications. Recent updates added on-device LLM inference capabilities and support for running Google's Gemma models locally. The framework is released under the Apache-2.0 license and is available for Android, iOS, web browsers via WebAssembly, and Python desktop applications.