# inference-engine

7 tools tagged

showing 7 of 7 tools

React Native ExecuTorch

On-device AI inference for React Native apps

Declarative framework for running AI models on-device in React Native applications, powered by Meta ExecuTorch runtime. Supports LLMs including Llama 3.2, computer vision, OCR, embeddings, and vision-language models on iOS 17+ and Android 13+. Developed by Software Mansion with pre-built optimized models, custom model export support, and privacy-first inference without any cloud dependency for mobile AI development.

open-sourceOpen Source

Nexa SDK

Cross-platform on-device AI model runtime

Nexa SDK enables running frontier LLMs and multimodal models locally across PC, mobile, IoT, and wearables with automatic hardware acceleration for GPU, NPU, and CPU. It supports Qwen, Gemma, Llama, DeepSeek models with Python/C++ desktop SDKs, Android/iOS mobile SDKs, and Docker for edge deployment. Includes an OpenAI-compatible API server with chat and function calling support.

open-sourceOpen Source

NCNN

High-performance mobile neural network inference

NCNN is Tencent's high-performance neural network inference framework optimized for mobile and embedded platforms. It features pure C++ with zero dependencies, ARM NEON assembly optimization, Vulkan GPU acceleration, and sophisticated memory management for minimal footprint. Supports importing models from PyTorch, ONNX, Caffe, TensorFlow, and Keras with 8-bit quantization and half-precision storage for efficient on-device deployment across Android, iOS, and Linux.

open-sourceOpen Source

RunAnywhere SDK

Cross-platform on-device AI inference SDK

RunAnywhere SDK is a production-ready toolkit for running AI models entirely on-device across iOS, macOS, Android, Web, React Native, and Flutter. It provides a unified C++ core with platform-specific bindings for LLM text generation via llama.cpp, vision-language models, Whisper speech-to-text, Piper text-to-speech, and on-device image generation. All processing stays local with zero cloud dependency, ensuring privacy and low latency for mobile and edge AI applications.

open-sourceOpen Source

MNN

Lightweight mobile and edge AI inference engine

MNN is a lightweight, high-performance deep learning inference engine developed by Alibaba and battle-tested across 30+ Alibaba apps including Taobao, DingTalk, and Youku. It supports TensorFlow, ONNX, PyTorch, and Caffe models with optimized backends for CPU, GPU, and NPU on mobile and edge devices. MNN includes on-device LLM inference, an OpenCV-like image processing library, and Python bindings for rapid prototyping. Apache 2.0 licensed with 15K+ stars.

open-sourceOpen Source

ONNX Runtime

Cross-platform high-performance ML inference engine

ONNX Runtime is Microsoft's open-source inference engine for machine learning models in ONNX format. It delivers cross-platform acceleration via execution providers for NVIDIA CUDA, TensorRT, DirectML, CoreML, OpenVINO, and more. Supports training acceleration, quantization, and GenAI workloads. Used in production across Windows, Azure, Office 365, and thousands of applications with pip-installable Python and native C++/C#/Java APIs.

open-sourceOpen Source

OpenVINO

Intel's open-source AI inference optimization toolkit

OpenVINO is Intel's open-source toolkit for optimizing and deploying AI inference across CPUs, GPUs, and NPUs. It supports models from PyTorch, TensorFlow, ONNX, and TFLite, providing graph optimizations, quantization, and hardware-specific acceleration. The toolkit includes a GenAI API for LLM deployment and runs on Intel, ARM, and x86 platforms for edge, desktop, and cloud inference workloads.

open-sourceOpen Source