Running AI models and agents locally without cloud dependencies
YC-backed multimodal RAG platform for documents, images, and video
Morphik is a YC-backed multimodal RAG platform that ingests and retrieves information from documents, images, tables, and video content. It processes complex document layouts, including charts, diagrams, and multi-column formats, that traditional text-only RAG systems handle poorly. It provides API-first integration for building knowledge bases that understand visual as well as textual information.
Multilingual emotional text-to-speech with 80+ language support
Fish Speech is an open-source text-to-speech system supporting 80+ languages with emotional expression, zero-shot voice cloning, and real-time streaming. It generates natural speech with controllable emotions, speaking styles, and prosody. Features a web interface, API server, and integration with AI agent frameworks for voice-enabled applications. Over 29,000 GitHub stars.
Open-source voice cloning and text-to-speech with few-shot learning
GPT-SoVITS is an open-source voice cloning and text-to-speech system that generates natural-sounding speech from just a few seconds of reference audio. It combines GPT-style language modeling with SoVITS voice synthesis for zero-shot and few-shot voice cloning across multiple languages. Supports Chinese, English, Japanese, Korean, and Cantonese with over 56,000 GitHub stars.
Local model inference engine with OpenAI-compatible API and web UI
Xinference is a local inference engine that runs LLMs, embedding models, image generation, and audio models with an OpenAI-compatible API. It provides a web dashboard for model management, supports vLLM, llama.cpp, and transformers backends, and handles multi-GPU deployment automatically. Supports 100+ models including Qwen, Llama, Mistral, and DeepSeek with over 9,200 GitHub stars.
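A minimal sketch of the typical workflow, assuming Xinference is installed via pip and running on its default port 9997; the model name and launch parameters are illustrative:

```shell
# Start a local Xinference server (default port 9997)
xinference-local --host 127.0.0.1 --port 9997

# From another terminal, launch a model onto the server
xinference launch --model-name qwen2.5-instruct --size-in-billions 7 --model-format ggufv2

# Query it through the OpenAI-compatible endpoint
curl http://127.0.0.1:9997/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5-instruct", "messages": [{"role": "user", "content": "Hello"}]}'
```

Because the API mirrors OpenAI's, existing OpenAI client libraries can be pointed at the local base URL without code changes.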
ModelScope's fine-tuning framework supporting 600+ models
ms-swift is ModelScope's open-source framework for fine-tuning over 600 large language and multimodal models. It supports SFT, DPO, RLHF, LoRA, QLoRA, and full fine-tuning with a web UI and CLI interface. Optimized for the Chinese AI ecosystem with native ModelScope Hub integration alongside Hugging Face support. Over 13,500 GitHub stars.
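A hedged sketch of a LoRA fine-tuning run with the ms-swift CLI; the model and dataset identifiers below are illustrative examples, not requirements:

```shell
# LoRA fine-tune a chat model via supervised fine-tuning (SFT)
swift sft \
    --model Qwen/Qwen2.5-7B-Instruct \
    --train_type lora \
    --dataset AI-ModelScope/alpaca-gpt4-data-en \
    --output_dir output
```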
All-in-one embeddings database with RAG, search, and agent capabilities
txtai is a self-contained AI search and RAG platform that combines vector embeddings, semantic search, LLM pipelines, and agent workflows in a single Python library. It handles embedding generation, similarity search, extractive QA, summarization, translation, and custom pipelines without external dependencies. It runs entirely locally, has over 12,400 GitHub stars, and is Apache 2.0 licensed.
OpenAI's open-source speech recognition model for any language
Whisper is OpenAI's open-source automatic speech recognition model trained on 680,000 hours of multilingual audio data. It supports transcription and translation across 99 languages with robust handling of accents, background noise, and technical vocabulary. Available in multiple model sizes from tiny (39M) to large (1.5B parameters) for balancing accuracy and speed.
Meta's official PyTorch library for LLM fine-tuning
torchtune is Meta's official PyTorch-native library for fine-tuning large language models. It provides composable building blocks for training recipes covering LoRA, QLoRA, full fine-tuning, DPO, and knowledge distillation. Supports Llama, Mistral, Gemma, Qwen, and Phi model families with distributed training across multiple GPUs. Designed as a hackable, dependency-minimal alternative to higher-level frameworks.
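A sketch of the recipe-driven workflow, assuming torchtune is installed and you have access to the gated Llama weights; the model and config names are examples from torchtune's built-in recipes:

```shell
# Download model weights, then run a built-in LoRA recipe on a single GPU
tune download meta-llama/Llama-3.2-1B-Instruct --output-dir /tmp/Llama-3.2-1B-Instruct
tune run lora_finetune_single_device --config llama3_2/1B_lora_single_device
```

Recipes are plain Python files, so a copied config can be edited directly rather than configured through framework abstractions.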
Unified framework for fine-tuning 100+ large language models
LLaMA-Factory is an open-source toolkit providing a unified interface for fine-tuning over 100 LLMs and vision-language models. It supports SFT, RLHF with PPO and DPO, LoRA and QLoRA for memory-efficient training, and continuous pre-training. The LLaMA Board web UI enables no-code configuration, while CLI and YAML workflows serve advanced users. Integrates with Hugging Face, ModelScope, vLLM, and SGLang for model deployment.
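A sketch of the YAML-driven CLI workflow, assuming LLaMA-Factory is installed from source; the example configs shown ship with the repository:

```shell
# LoRA SFT from a bundled example config, then chat with the result
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
```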
Production RAG engine with hybrid search and knowledge graphs
R2R is a production-grade RAG engine from SciPhi AI that combines hybrid search with knowledge graph extraction and agentic retrieval capabilities. It provides a complete pipeline from document ingestion through retrieval and generation, supporting vector, keyword, and graph-based search strategies. The managed API and self-hosted options make it accessible for both rapid prototyping and production deployments requiring advanced retrieval beyond simple vector similarity.
RAG-based document QA with multi-user support and agent reasoning
Kotaemon is an open-source RAG-powered document question-answering interface backed by Cinnamon AI. It supports multi-user workspaces with access controls, advanced retrieval pipelines including hybrid search and knowledge graph extraction, and agentic reasoning for complex multi-step queries. The web UI handles PDFs, Office documents, and images with citations pointing to exact source passages, making it suitable for both individual research and team knowledge management.
On-device AI inference engine for mobile and wearable applications
Cactus is a YC-backed open-source inference engine built specifically for running LLMs, vision models, and embeddings on smartphones, tablets, and wearable devices. It provides native SDKs for iOS, Android, Flutter, and React Native with optimized ARM CPU and Apple NPU execution paths. Cactus claims the fastest inference speeds on ARM processors, with 10x lower RAM usage compared to generic runtimes, enabling privacy-first AI applications that run entirely on-device.
Microsoft's framework for running 1-bit large language models on consumer CPUs
BitNet is Microsoft's official inference framework for 1-bit quantized large language models that enables running models with up to 100 billion parameters on standard consumer CPUs without requiring a GPU. By leveraging extreme quantization where weights use only 1.58 bits on average, BitNet achieves dramatic reductions in memory footprint and computational cost while maintaining competitive output quality for many practical use cases.
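A hedged sketch of running a 1-bit model on CPU with bitnet.cpp; exact script names and flags may differ across versions, so treat this as the general shape rather than exact commands:

```shell
# Clone and set up bitnet.cpp, then run a 1.58-bit quantized model on CPU
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s
python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
    -p "You are a helpful assistant" -cnv
```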
Run frontier AI models across a cluster of everyday devices
exo turns a collection of everyday devices — laptops, desktops, phones — into a unified AI compute cluster capable of running large language models that no single device could handle alone. It automatically partitions models across available hardware using dynamic model sharding, supports heterogeneous device types including Apple Silicon, NVIDIA, and AMD GPUs, and communicates over standard networking without requiring specialized interconnects.
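A sketch of a two-device setup, assuming exo is installed from source on each machine; the model name and default port are illustrative and may vary by release:

```shell
# On each device on the same network (exo auto-discovers peers):
git clone https://github.com/exo-explore/exo.git
cd exo && pip install -e .
exo

# Any node then serves a ChatGPT-compatible API (default port 52415)
curl http://localhost:52415/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.2-3b", "messages": [{"role": "user", "content": "Hello"}]}'
```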
AMD's open-source local LLM server with GPU and NPU acceleration
Lemonade is AMD's open-source local AI serving platform that runs LLMs, image generation, speech recognition, and text-to-speech directly on your hardware. Built in lightweight C++, it automatically detects and configures optimal CPU, GPU, and NPU backends. Lemonade exposes an OpenAI-compatible API so existing applications work without code changes, and ships with a desktop app for model management and testing. Supports GGUF, ONNX, and SafeTensors across Windows, Linux, macOS, and Docker.
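A hedged sketch of serving and querying a model with Lemonade; the port, endpoint path, and model name below are illustrative and may differ by version:

```shell
# Start the local server, then point any OpenAI client at it
lemonade-server serve

curl http://localhost:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Llama-3.2-1B-Instruct-Hybrid", "messages": [{"role": "user", "content": "Hi"}]}'
```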
Persistent memory plugin for Claude Code with automatic context injection
Claude-Mem is a persistent memory plugin for Claude Code with 44,000+ GitHub stars that captures session context and injects it into future sessions. It features progressive disclosure with token cost visibility, automatic compression, and privacy controls with private tags to manage what gets remembered across coding sessions.
Single-file memory layer replacing complex RAG for AI agents
Memvid is an open-source single-file memory system for AI agents with 13,700+ GitHub stars. It replaces complex RAG infrastructure with instant retrieval from portable .mv2 files, claiming 35% accuracy improvement over state-of-the-art on LoCoMo benchmarks with 0.025ms P50 latency. Available for Python, Node.js, Rust, and CLI.
Enterprise-grade RAG and MCP knowledge base with one-click deployment
MaxKB is an enterprise-grade RAG platform with 21,000+ GitHub stars from the 1Panel team. It provides one-click deployment of knowledge bases with built-in LLM integration, MCP support, and a streamlined approach to document ingestion and retrieval that prioritizes operational simplicity over configuration complexity.
No-code knowledge base platform with visual AI workflow and built-in RAG
FastGPT is an open-source no-code AI knowledge base platform with 27,000+ GitHub stars and 500,000+ users worldwide. It combines visual workflow orchestration, built-in RAG pipelines, QA-pair extraction, and API-aligned completions into a single deployable stack that installs with a one-line Docker command and runs in as little as 2GB of RAM.
2x faster LLM fine-tuning with 70% less VRAM on a single GPU
Unsloth is an open-source framework for fine-tuning large language models up to 2x faster while using 70% less VRAM. Built with custom Triton kernels, it supports 500+ model architectures including Llama 4, Qwen 3, and DeepSeek on consumer NVIDIA GPUs. Unsloth Studio adds a no-code web UI for dataset creation, training observability, model comparison, and GGUF export for Ollama and vLLM deployment.
Microsoft's open-source frontier voice AI for long-form multi-speaker audio
VibeVoice is Microsoft's open-source voice AI family with both TTS and speech recognition models. The TTS model generates up to 90 minutes of expressive multi-speaker audio with 4 distinct voices. VibeVoice-ASR transcribes 60-minute recordings in a single pass with speaker identification and timestamps. Built on continuous speech tokenizers at 7.5 Hz and next-token diffusion, it compresses audio 80x more efficiently than Encodec while preserving fidelity.
One-command local coding agent that auto-detects your hardware and picks the best model
hf-agents is a Hugging Face CLI extension that detects your hardware, recommends the best GGUF model using llmfit, and launches a local coding agent in a single command. It collapses the multi-step local LLM setup into hf agents run pi, automatically handling hardware profiling, model download, inference server startup, and coding agent activation.
First commercially viable 1-bit LLMs that are 14x smaller and 8x faster
PrismML Bonsai delivers the first commercially viable 1-bit large language models with 8B, 4B, and 1.7B parameter variants. The 8B model runs in just 1GB of RAM versus 16GB for standard FP16 models, achieving 44 tokens per second on an iPhone. Backed by $16.25M from Khosla Ventures and released under Apache 2.0, Bonsai makes capable LLMs practical for edge devices and resource-constrained environments.
Find which AI models actually run on your hardware in one command
llmfit is a Rust-based terminal tool that matches over 200 LLMs from 30+ providers against your exact hardware specs. The interactive TUI scores each model on fit, speed, VRAM usage, and context length, helping you avoid downloading models that won't run on your machine. It supports Ollama, llama.cpp, MLX, Docker Model Runner, and LM Studio backends.