11 tools tagged
Showing 11 of 11 tools
Cross-platform on-device AI model runtime
Nexa SDK enables running frontier LLMs and multimodal models locally across PC, mobile, IoT, and wearables with automatic hardware acceleration for GPU, NPU, and CPU. It supports Qwen, Gemma, Llama, DeepSeek models with Python/C++ desktop SDKs, Android/iOS mobile SDKs, and Docker for edge deployment. Includes an OpenAI-compatible API server with chat and function calling support.
Free local AI code completion for VSCode
Twinny is a free, open-source AI code completion extension for VS Code that works with any OpenAI API-compatible endpoint including local models via Ollama. It provides real-time inline suggestions, a sidebar chat for code discussion, workspace embeddings for context-aware completions, and a decentralized P2P inference network. A privacy-first alternative to GitHub Copilot with zero licensing cost.
Cross-platform on-device AI inference SDK
RunAnywhere SDK is a production-ready toolkit for running AI models entirely on-device across iOS, macOS, Android, Web, React Native, and Flutter. It provides a unified C++ core with platform-specific bindings for LLM text generation via llama.cpp, vision-language models, Whisper speech-to-text, Piper text-to-speech, and on-device image generation. All processing stays local with zero cloud dependency, ensuring privacy and low latency for mobile and edge AI applications.
Container-native local AI model serving with Podman
RamaLama is an open-source tool that containerizes AI model inference using Podman or Docker, eliminating host system configuration complexity. It auto-detects GPUs (NVIDIA, AMD, Intel, Apple Silicon), pulls models from HuggingFace, Ollama, and OCI registries, and runs them in isolated rootless containers with read-only mounts and network isolation. Developed under the Containers project (Red Hat ecosystem), it brings familiar container workflows to local LLM serving.
Personal AI agent in your terminal with local tool access
gptme is one of the earliest terminal-based AI agent CLIs, launched in spring 2023. It equips a personal AI agent with local tools to run shell commands, write and edit code, browse the web via Playwright, and use vision capabilities. Supports MCP integration and an extensible plugin system for building persistent autonomous agents. The reference agent Bob has completed over 1700 sessions.
On-device hybrid search engine for your docs and notes
QMD is an on-device search engine built by Tobi Lütke (Shopify CEO) that indexes markdown notes, meeting transcripts, and documentation locally. It combines BM25 full-text search, vector semantic search, and LLM-powered re-ranking into a single hybrid pipeline. Ships with a built-in MCP server for seamless integration with Claude Code, Cursor, and other AI editors. All processing happens on your machine via node-llama-cpp with GGUF models — zero cloud dependency.
Self-hosted AI accounting for freelancers and small teams
TaxHacker is an open-source, self-hosted AI accounting app that automatically extracts financial data from receipts, invoices, and bank statements using LLMs. It supports 170+ currencies and 14 cryptocurrencies with historical exchange rate conversion, multi-project accounting, and custom AI extraction fields. Works with OpenAI, Gemini, Mistral, or local models via Ollama—deploy with Docker and keep all financial data under your control.
Hot-swap between local LLM models via OpenAI-compatible API
llama-swap is an open-source tool that manages multiple local LLM models behind a single OpenAI-compatible API endpoint. It automatically loads and unloads models on demand, letting developers hot-swap between different models without restarting services. With 3.1K+ GitHub stars, it solves the common pain point of running multiple specialized models on limited hardware.
Run LLMs as a single portable executable file
Llamafile by Mozilla packages a complete LLM — model weights, inference engine, and OpenAI-compatible API server — into a single executable file that runs on Mac, Windows, Linux, FreeBSD, and OpenBSD with no installation. Built on llama.cpp and Cosmopolitan Libc for cross-platform portability, it delivers GPU-accelerated inference when available and falls back to optimized CPU execution. Supports GGUF models with a built-in web chat UI and REST API for integration.
Run LLMs natively on any device with ML compilation
MLC LLM is an open-source engine for deploying large language models natively across diverse platforms using machine learning compilation. It runs models on NVIDIA/AMD GPUs, Apple Silicon, mobile devices, and browsers via WebGPU without cloud dependencies. Features include OpenAI-compatible API, quantization support, and optimized backends for CUDA, Metal, Vulkan, and WebAssembly.
Offline-first AI assistant for local inference
Jan is an open-source offline-first AI assistant with 25K+ GitHub stars running LLMs locally without sending data externally. Features a ChatGPT-like interface with one-click model downloads from Hugging Face, conversation management, customizable prompts, and an OpenAI-compatible local API server. Supports GGUF models via llama.cpp with GPU acceleration on NVIDIA and Apple Silicon. Built with Electron for macOS, Windows, and Linux with full data privacy.