Loading...
Loading...
Running AI models and agents locally without cloud dependencies
Showing 24 of 99 tools
Persistent memory layer for AI coding agents — keeps Claude Code, Codex, Cursor, and any MCP agent in context across sessions
agentmemory is an open-source MCP server that gives AI coding agents persistent, cross-session memory. Built on hybrid vector-graph search, it achieves 95.2% recall on the LongMemEval-S benchmark while using up to 92% fewer context tokens than naive context injection. Works out of the box with Claude Code, Codex, Cursor, Windsurf, Cline, OpenCode, Kilo Code, Hermes, and any MCP client through 51 MCP tools plus 12 hooks and 4 skills.
Self-evolving local computer agent with a reusable skill tree
GenericAgent is a minimal, self-evolving autonomous agent in roughly 3K lines of Python that gives LLMs system-level control of a local computer. It writes files, runs shell commands, and browses the web, but its defining feature is skill crystallization: successful task runs are saved as reusable skills inside a growing skill tree that cuts token cost on repeats.
Wafer-scale inference at thousands of tokens per second
Cerebras Inference serves open-weight LLMs like Llama, Qwen, and GPT-OSS on wafer-scale CS-3 chips through an OpenAI-compatible API, benchmarking between 1,800 and 2,600 output tokens per second on Llama 3.1 8B and several hundred on 70B models. A free tier offers one million tokens per day with no credit card, while paid pay-per-token pricing starts at $0.04 per million tokens for the smaller Llama models.
Human-in-the-loop web agent you can co-pilot in real time
Magentic-UI is a Microsoft Research web agent with a human-in-the-loop interface for browsing, coding, and file tasks. It plans multi-step actions, asks for approval before executing, and lets users co-pilot by taking over the browser mid-task. Built on AutoGen, it runs a team of specialized agents for web browsing, file handling, and code execution with full action transparency and safety guardrails.
One desktop app for every LLM — private, cross-platform, extensible
Chatbox is a cross-platform desktop AI client supporting OpenAI, Claude, Gemini, DeepSeek, and local models via Ollama. All chat data stays on-device, making it ideal for privacy-conscious developers. Features include document analysis, code assistance with syntax highlighting, image generation, web search, and a local knowledge base for private Q&A. Available on Windows, macOS, Linux, Android, iOS, and web.
On-device AI inference for React Native apps
Declarative framework for running AI models on-device in React Native applications, powered by Meta ExecuTorch runtime. Supports LLMs including Llama 3.2, computer vision, OCR, embeddings, and vision-language models on iOS 17+ and Android 13+. Developed by Software Mansion with pre-built optimized models, custom model export support, and privacy-first inference without any cloud dependency for mobile AI development.
Cross-platform on-device AI model runtime
Nexa SDK enables running frontier LLMs and multimodal models locally across PC, mobile, IoT, and wearables with automatic hardware acceleration for GPU, NPU, and CPU. It supports Qwen, Gemma, Llama, DeepSeek models with Python/C++ desktop SDKs, Android/iOS mobile SDKs, and Docker for edge deployment. Includes an OpenAI-compatible API server with chat and function calling support.
Free local AI code completion for VSCode
Twinny is a free, open-source AI code completion extension for VS Code that works with any OpenAI API-compatible endpoint including local models via Ollama. It provides real-time inline suggestions, a sidebar chat for code discussion, workspace embeddings for context-aware completions, and a decentralized P2P inference network. A privacy-first alternative to GitHub Copilot with zero licensing cost.
High-performance mobile neural network inference
NCNN is Tencent's high-performance neural network inference framework optimized for mobile and embedded platforms. It features pure C++ with zero dependencies, ARM NEON assembly optimization, Vulkan GPU acceleration, and sophisticated memory management for minimal footprint. Supports importing models from PyTorch, ONNX, Caffe, TensorFlow, and Keras with 8-bit quantization and half-precision storage for efficient on-device deployment across Android, iOS, and Linux.
Cross-platform on-device AI inference SDK
RunAnywhere SDK is a production-ready toolkit for running AI models entirely on-device across iOS, macOS, Android, Web, React Native, and Flutter. It provides a unified C++ core with platform-specific bindings for LLM text generation via llama.cpp, vision-language models, Whisper speech-to-text, Piper text-to-speech, and on-device image generation. All processing stays local with zero cloud dependency, ensuring privacy and low latency for mobile and edge AI applications.
All-in-one multimodal RAG framework
RAG-Anything is an all-in-one multimodal RAG framework from the University of Hong Kong that processes text, images, tables, and equations through a unified pipeline built on LightRAG. It constructs multi-modal knowledge graphs by extracting multimodal entities and establishing cross-modal relationships. The VLM-Enhanced Query mode integrates visual content into large language models for deeper document understanding beyond plain text retrieval.
Open-source toolkit for audio, music, and speech generation
Amphion is an open-source audio generation toolkit from OpenMMLab designed for reproducible research in speech synthesis, voice conversion, singing voice synthesis, and text-to-audio generation. It implements state-of-the-art models including MaskGCT, DualCodec, VITS, and VALL-E with built-in architecture visualizations for educational use. The project ships with the Emilia-Large dataset of 200,000 hours of speech data and includes multiple vocoders and evaluation metrics for benchmarking.
Tokenizer-free multilingual TTS with voice cloning
VoxCPM is an open-source text-to-speech system from OpenBMB generating continuous speech across 30 languages without traditional tokenization. Its 2B parameter end-to-end diffusion architecture produces 48kHz studio-quality audio with natural prosody and emotion. Key capabilities include voice design from text descriptions, few-shot voice cloning, and multilingual synthesis without language-specific modules. The Apache 2.0 project has 8,700 GitHub stars.
Offline speech recognition for 20+ languages
Vosk is an offline speech recognition toolkit supporting 20+ languages with compact 50MB models that run on Raspberry Pi, Android, iOS, and servers. It provides streaming API with zero-latency response, speaker identification, and reconfigurable vocabulary. Vosk offers bindings for Python, Java, Node.js, C#, Go, and Rust. Unlike cloud-based alternatives, all processing happens locally with no internet required. Apache 2.0 licensed with 14K+ GitHub stars.
Lightweight mobile and edge AI inference engine
MNN is a lightweight, high-performance deep learning inference engine developed by Alibaba and battle-tested across 30+ Alibaba apps including Taobao, DingTalk, and Youku. It supports TensorFlow, ONNX, PyTorch, and Caffe models with optimized backends for CPU, GPU, and NPU on mobile and edge devices. MNN includes on-device LLM inference, an OpenCV-like image processing library, and Python bindings for rapid prototyping. Apache 2.0 licensed with 15K+ stars.
Open-source deep learning text-to-speech toolkit
Coqui TTS is an open-source deep learning toolkit for text-to-speech synthesis, originally built by former Mozilla TTS engineers. It supports multi-speaker and multilingual synthesis, voice cloning from just six seconds of audio, and ships pre-trained models for 20+ languages. After Coqui shut down in 2023, the Idiap Research Institute forked and actively maintains it. With 45K+ GitHub stars, it remains the most popular open-source TTS framework in Python.
Blazing fast Solidity development toolkit
Foundry is a blazing-fast portable toolkit for Ethereum smart contract development written in Rust. It provides Forge for testing and deploying Solidity contracts with native Solidity tests that run orders of magnitude faster than JavaScript-based alternatives, Cast for interacting with EVM chains from the command line, Anvil as a local testnet node, and Chisel as a Solidity REPL. Foundry has become the standard toolchain for serious Solidity development.
On-device hybrid search engine for your docs and notes
QMD is an on-device search engine built by Tobi Lütke (Shopify CEO) that indexes markdown notes, meeting transcripts, and documentation locally. It combines BM25 full-text search, vector semantic search, and LLM-powered re-ranking into a single hybrid pipeline. Ships with a built-in MCP server for seamless integration with Claude Code, Cursor, and other AI editors. All processing happens on your machine via node-llama-cpp with GGUF models — zero cloud dependency.
Run open-source LLMs on your phone, fully offline and private
Google AI Edge Gallery is an open-source mobile app that lets you download and run large language models like Gemma directly on Android and iOS devices with zero cloud dependency. Built on MediaPipe and LiteRT, it features AI chat with reasoning mode, multimodal image analysis, real-time audio transcription, and autonomous agent skills—all running entirely on-device for complete privacy. A reference implementation for developers building offline-first AI experiences.
Hot-swap between local LLM models via OpenAI-compatible API
llama-swap is an open-source tool that manages multiple local LLM models behind a single OpenAI-compatible API endpoint. It automatically loads and unloads models on demand, letting developers hot-swap between different models without restarting services. With 3.1K+ GitHub stars, it solves the common pain point of running multiple specialized models on limited hardware.
Google's production on-device LLM inference framework
LiteRT-LM is Google's official open-source framework for running large language models on-device across Android, iOS, Web, Desktop, and Raspberry Pi. Already deployed in Chrome and Pixel hardware, it provides production-grade on-device LLM inference with 1.4K+ GitHub stars. Apache 2.0 licensed.
Run and fine-tune Vision Language Models locally on Mac
Open-source Python package for running and fine-tuning Vision Language Models locally on Mac using Apple's MLX framework. Supports multimodal inference with images, audio, and video across Qwen, DeepSeek, Phi, and Gemma architectures. Features OpenAI-compatible API server, Gradio chat UI, and KV cache optimization. 3.8K+ GitHub stars.
YC-backed multimodal RAG platform for documents, images, and video
Morphik is a YC-backed multimodal RAG platform that ingests and retrieves information from documents, images, tables, and video content. It processes complex document layouts including charts, diagrams, and multi-column formats that traditional text-only RAG systems handle poorly. Provides API-first integration for building knowledge bases that understand visual as well as textual information.
Multilingual emotional text-to-speech with 80+ language support
Fish Speech is an open-source text-to-speech system supporting 80+ languages with emotional expression, zero-shot voice cloning, and real-time streaming. It generates natural speech with controllable emotions, speaking styles, and prosody. Features a web interface, API server, and integration with AI agent frameworks for voice-enabled applications. Over 29,000 GitHub stars.