ExecuTorch is PyTorch's unified edge AI runtime, enabling deployment of AI models on devices ranging from smartphones to microcontrollers with a base runtime footprint of roughly 50KB. Developed collaboratively by Meta, Arm, Apple, and Qualcomm, it reached version 1.0 in late 2025, marking its transition from experimental to production-stable. Its key innovation is direct model export from PyTorch: no ONNX, TFLite, or other intermediate-format conversion is needed, which preserves model semantics and eliminates the error-prone conversion step that plagues other edge deployment workflows.
The framework supports more than a dozen hardware backends through a delegate system, including Apple Core ML and Metal for iOS, Qualcomm QNN for Snapdragon, Arm Ethos-U for microcontrollers, Vulkan for cross-platform GPU acceleration, and XNNPACK for optimized CPU inference. Built-in quantization via torchao supports 8-bit, 4-bit, and dynamic schemes to reduce model size and inference latency. ExecuTorch already powers billions of on-device inferences at Meta across Instagram, WhatsApp, Quest 3 VR, and Ray-Ban Meta Smart Glasses.
ExecuTorch supports deploying LLMs such as Llama 3.2, Qwen 3, and Phi-4-mini, along with vision, speech, and multimodal models, on edge devices. It provides developer tools including the ETDump profiler, the ETRecord inspector, and selective build for stripping unused operators from the runtime binary. The project is fully open-source under the BSD-3-Clause license, installable via pip, and integrates with Hugging Face through Optimum-ExecuTorch for transformer model deployment. For mobile developers, it offers native Swift/Objective-C APIs on iOS and Java/Kotlin APIs with Android Studio integration.