TensorFlow Lite is Google's established framework for on-device machine learning, providing a lightweight runtime optimized for mobile phones, embedded Linux systems, and microcontrollers. The framework converts trained models from TensorFlow and JAX into a compact FlatBuffer format (.tflite) optimized for size and loading speed on resource-constrained devices. Post-training quantization tools reduce model size and inference latency by converting float32 weights to int8 or float16 with minimal accuracy loss.
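The int8 conversion above is based on an affine mapping between real values and integers (real_value = scale × (quantized − zero_point)). The sketch below illustrates that arithmetic for one per-tensor variant; the helper names are illustrative, not TFLite APIs, and TFLite's actual quantization spec also includes symmetric and per-axis schemes.

```python
# Illustrative affine int8 quantization, assuming a per-tensor scale and
# zero-point. Helper names here are hypothetical, not part of TFLite.

def quantize_int8(values):
    """Map float32 values to int8 using a per-tensor scale and zero-point."""
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # representable range must include 0
    scale = (hi - lo) / 255.0 or 1.0      # int8 spans 256 levels; avoid 0 scale
    zero_point = round(-128 - lo / scale) # integer that maps back to real 0.0
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover approximate float32 values: r = scale * (q - zero_point)."""
    return [scale * (v - zero_point) for v in q]

weights = [-0.7, 0.0, 0.31, 1.25]
q, scale, zp = quantize_int8(weights)
approx = dequantize_int8(q, scale, zp)
```

Each dequantized value differs from the original by at most half a quantization step (scale / 2), which is why accuracy loss stays small when a tensor's dynamic range is modest.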
Hardware acceleration is handled through a delegate system that dispatches operations to specialized hardware when available. The GPU delegate accelerates inference on mobile GPUs across Android and iOS, the Hexagon delegate targets Qualcomm DSPs, the Core ML delegate leverages Apple's Neural Engine, and the NNAPI delegate provides Android's standard neural network acceleration interface. For microcontrollers, TensorFlow Lite Micro provides a stripped-down runtime that can run models in as little as 16 KB of memory.
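Because any given delegate may be missing on a particular device, apps typically probe delegates in priority order and fall back to the plain CPU kernels. A minimal sketch of that pattern, with placeholder loader functions (real delegate loading goes through APIs such as `tf.lite.experimental.load_delegate`, which needs the device's delegate library):

```python
# Hypothetical delegate-fallback pattern: try the most specialized
# accelerator first, then fall back to CPU. The loaders below are
# placeholders, not real TFLite calls.

def pick_backend(delegate_loaders):
    """Try each (name, loader) pair in priority order; fall back to CPU."""
    for name, load in delegate_loaders:
        try:
            delegate = load()
            # In a real app: tf.lite.Interpreter(model_path=...,
            #                                    experimental_delegates=[delegate])
            return name, delegate
        except OSError:
            continue  # delegate library missing on this device; try the next
    return "cpu", None

def load_gpu():
    # Placeholder standing in for a device without a usable GPU delegate.
    raise OSError("GPU delegate not available")

def load_nnapi():
    # Placeholder standing in for a successfully loaded delegate handle.
    return object()

backend, delegate = pick_backend([("gpu", load_gpu), ("nnapi", load_nnapi)])
```

On this simulated device the GPU probe fails and the NNAPI handle is used; a device with neither would silently land on the CPU path, which is the behavior apps generally want from graceful degradation.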
TensorFlow Lite powers on-device ML across Google's product suite and billions of third-party app installations. It provides Java, Swift, Objective-C, C++, and Python APIs, along with a growing library of pre-trained models for common tasks like image classification, object detection, text classification, and pose estimation. The framework is open-source under Apache 2.0 and integrates with Android Studio and Xcode for native mobile development. For developers targeting the broadest possible device reach with on-device ML, TensorFlow Lite offers the most mature and widely deployed edge ML runtime available.