Best tools for Local AI Workflows

Running AI models and agents locally without cloud dependencies

Showing 24 of 108 tools

Accomplish Coworker

Open-source desktop AI coworker for browsing and code execution.

Accomplish Coworker is an MIT-licensed open-source AI coworker that runs on the desktop, combining computer-use style browsing with code execution so agents can research, implement, run, and debug workflows in one local environment.

open-source | Open Source | Telemetry

Safari MCP Server

Apple's Safari-native MCP server for web debugging agents

Safari MCP Server is Apple's safaridriver-based MCP server in Safari Technology Preview, giving compatible coding agents local access to Safari page content, console logs, network requests, screenshots, JavaScript evaluation, interactions, viewport controls, and accessibility/performance checks.

free | Telemetry

Headroom

Context compression for LLM apps and coding agents

Headroom is an Apache-2.0 context compression layer for LLM apps and coding agents. It compresses tool output, logs, files, RAG chunks, and agent history through a local library, proxy, wrapper, or MCP server, with retrieval hooks for bringing originals back when needed. Treat its savings numbers as Headroom-reported benchmarks, not independent aicoolies measurements.

open-source | Open Source | Telemetry

Codebase Memory MCP

Codebase knowledge graph MCP server for AI coding agents

Codebase Memory MCP is an MIT-licensed MCP server that turns a repository into a persistent code knowledge graph for AI coding agents. It gives Claude Code, Cursor, Codex-style agents, and other MCP clients structural queries for functions, classes, call chains, routes, and architecture, helping them explore large projects without repeatedly rereading files or relying only on broad search.

open-source | Open Source | Telemetry

KubeAI

Kubernetes operator for serving AI inference workloads

KubeAI is an Apache-2.0 Kubernetes operator for deploying and scaling AI inference workloads, including LLMs, embeddings, reranking, and speech-to-text. It gives platform teams OpenAI-compatible endpoints, model proxy/controller primitives, model caching, scale-from-zero behavior, and cluster-native resource management for self-hosted inference on Kubernetes.

open-source | Open Source

BeeAI Framework

Python and TypeScript framework for production multi-agent systems

BeeAI Framework is an Apache-2.0 toolkit for building production-ready AI agents and multi-agent systems in Python and TypeScript. Its docs cover agents, tools, RAG, memory, workflows, backend providers, serving, and A2A/MCP integration surfaces, making it a vendor-neutral option for teams comparing LangGraph, CrewAI, Mastra, and related agent runtimes.

open-source | Open Source | Telemetry

Windows-MCP

MCP server for controlling Windows desktops through UIAutomation

Windows-MCP is an open-source MCP server for giving AI agents structured access to Windows desktop automation. It focuses on UIAutomation, snapshots, input control, and Windows-specific app workflows, making it different from general filesystem or shell MCP servers.

open-source | Open Source

Reasonix

DeepSeek-native terminal coding agent with a Go rewrite and MCP support

Reasonix is an open-source terminal coding agent built around DeepSeek workflows, with a newer Go-based 1.0 line, MCP integration, repository-aware code understanding, and BYOK model usage. It fits developers who want a DeepSeek-first CLI agent rather than a Claude- or OpenAI-native workflow.

open-source | Open Source

BrowserOS

Open-source agentic browser that runs local AI agents in your browsing workflow.

BrowserOS is a privacy-first, open-source agentic browser for running AI assistants locally inside real browsing sessions instead of handing every task to a remote cloud browser.

open-source | Open Source

agentmemory

Persistent memory layer for AI coding agents — keeps Claude Code, Codex, Cursor, and any MCP agent in context across sessions

agentmemory is an open-source MCP server that gives AI coding agents persistent, cross-session memory. Built on hybrid vector-graph search, it reports 95.2% recall on the LongMemEval-S benchmark while using up to 92% fewer context tokens than naive context injection. It works out of the box with Claude Code, Codex, Cursor, Windsurf, Cline, OpenCode, Kilo Code, Hermes, and any MCP client through 51 MCP tools, 12 hooks, and 4 skills.

open-source | Open Source

GenericAgent

Self-evolving local computer agent with a reusable skill tree

GenericAgent is a minimal, self-evolving autonomous agent, grown from a 3.3K-line seed and a ~3K core loop, that gives LLMs system-level control of a local computer. It writes files, runs shell commands, browses the web, and uses keyboard/mouse/screen/mobile tools, while skill crystallization saves successful runs into a reusable skill tree that cuts token cost on repeats.

open-source | Open Source

Cerebras

Wafer-scale inference at thousands of tokens per second

Cerebras Inference serves open-weight LLMs like Llama, Qwen, and GPT-OSS on wafer-scale CS-3 chips through an OpenAI-compatible API, benchmarking between 1,800 and 2,600 output tokens per second on Llama 3.1 8B and several hundred on 70B models. A free tier offers one million tokens per day with no credit card, while pay-per-token pricing starts at $0.04 per million tokens for the smaller Llama models.

freemium

Magentic-UI

Human-in-the-loop web agent you can co-pilot in real time

Magentic-UI is a Microsoft Research web agent with a human-in-the-loop interface for browsing, coding, and file tasks. It plans multi-step actions, asks for approval before executing, and lets users co-pilot by taking over the browser mid-task. Built on AutoGen, it runs a team of specialized agents for web browsing, file handling, and code execution with full action transparency and safety guardrails.

free | Open Source

Chatbox

One desktop app for every LLM — private, cross-platform, extensible

Chatbox is a cross-platform desktop AI client supporting OpenAI, Claude, Gemini, DeepSeek, and local models via Ollama. All chat data stays on-device, making it ideal for privacy-conscious developers. Features include document analysis, code assistance with syntax highlighting, image generation, web search, and a local knowledge base for private Q&A. Available on Windows, macOS, Linux, Android, iOS, and web.

freemium | Open Source

React Native ExecuTorch

On-device AI inference for React Native apps

React Native ExecuTorch is a declarative framework for running AI models on-device in React Native applications, powered by Meta's ExecuTorch runtime. It supports LLMs including Llama 3.2, computer vision, OCR, embeddings, and vision-language models on iOS 17+ and Android 13+. Developed by Software Mansion, it offers pre-built optimized models, custom model export support, and privacy-first inference with no cloud dependency.

open-source | Open Source

Nexa SDK

Cross-platform on-device AI model runtime

Nexa SDK runs frontier LLMs and multimodal models locally across PC, mobile, IoT, and wearables with automatic hardware acceleration for GPU, NPU, and CPU. It supports Qwen, Gemma, Llama, and DeepSeek models with Python/C++ desktop SDKs, Android/iOS mobile SDKs, and Docker for edge deployment. It also includes an OpenAI-compatible API server with chat and function calling support.

open-source | Open Source

Twinny

Free local AI code completion for VS Code

Twinny is a free, open-source AI code completion extension for VS Code that works with any OpenAI API-compatible endpoint including local models via Ollama. It provides real-time inline suggestions, a sidebar chat for code discussion, workspace embeddings for context-aware completions, and a decentralized P2P inference network. A privacy-first alternative to GitHub Copilot with zero licensing cost.

open-source | Open Source

NCNN

High-performance mobile neural network inference

NCNN is Tencent's high-performance neural network inference framework optimized for mobile and embedded platforms. It features pure C++ with zero dependencies, ARM NEON assembly optimization, Vulkan GPU acceleration, and sophisticated memory management for minimal footprint. Supports importing models from PyTorch, ONNX, Caffe, TensorFlow, and Keras with 8-bit quantization and half-precision storage for efficient on-device deployment across Android, iOS, and Linux.

open-source | Open Source

RunAnywhere SDK

Cross-platform on-device AI inference SDK

RunAnywhere SDK is a production-ready toolkit for running AI models entirely on-device across iOS, macOS, Android, Web, React Native, and Flutter. It provides a unified C++ core with platform-specific bindings for LLM text generation via llama.cpp, vision-language models, Whisper speech-to-text, Piper text-to-speech, and on-device image generation. All processing stays local with zero cloud dependency, ensuring privacy and low latency for mobile and edge AI applications.

open-source | Open Source

RAG-Anything

All-in-one multimodal RAG framework

RAG-Anything is an all-in-one multimodal RAG framework from the University of Hong Kong that processes text, images, tables, and equations through a unified pipeline built on LightRAG. It constructs multi-modal knowledge graphs by extracting multimodal entities and establishing cross-modal relationships. The VLM-Enhanced Query mode integrates visual content into large language models for deeper document understanding beyond plain text retrieval.

open-source | Open Source

Amphion

Open-source toolkit for audio, music, and speech generation

Amphion is an open-source audio generation toolkit from OpenMMLab designed for reproducible research in speech synthesis, voice conversion, singing voice synthesis, and text-to-audio generation. It implements state-of-the-art models including MaskGCT, DualCodec, VITS, and VALL-E with built-in architecture visualizations for educational use. The project ships with the Emilia-Large dataset of 200,000 hours of speech data and includes multiple vocoders and evaluation metrics for benchmarking.

open-source | Open Source

VoxCPM

Tokenizer-free multilingual TTS with voice cloning

VoxCPM is an open-source text-to-speech system from OpenBMB that generates continuous speech across 30 languages without traditional tokenization. Its 2B-parameter end-to-end diffusion architecture produces 48 kHz studio-quality audio with natural prosody and emotion. Key capabilities include voice design from text descriptions, few-shot voice cloning, and multilingual synthesis without language-specific modules. The Apache 2.0 project has 8,700 GitHub stars.

open-source | Open Source

Vosk

Offline speech recognition for 20+ languages

Vosk is an offline speech recognition toolkit supporting 20+ languages with compact 50 MB models that run on Raspberry Pi, Android, iOS, and servers. It provides a streaming API with zero-latency response, speaker identification, and reconfigurable vocabulary. Vosk offers bindings for Python, Java, Node.js, C#, Go, and Rust. Unlike cloud-based alternatives, all processing happens locally with no internet connection required. It is Apache 2.0 licensed with 14K+ GitHub stars.

open-source | Open Source

MNN

Lightweight mobile and edge AI inference engine

MNN is a lightweight, high-performance deep learning inference engine developed by Alibaba and battle-tested across 30+ Alibaba apps including Taobao, DingTalk, and Youku. It supports TensorFlow, ONNX, PyTorch, and Caffe models with optimized backends for CPU, GPU, and NPU on mobile and edge devices. MNN includes on-device LLM inference, an OpenCV-like image processing library, and Python bindings for rapid prototyping. It is Apache 2.0 licensed with 15K+ GitHub stars.

open-source | Open Source