# offline-capable
10 tools tagged
Showing 10 of 10 tools
Google AI Edge Gallery
Run open-source LLMs on your phone, fully offline and private
Google AI Edge Gallery is an open-source mobile app that lets you download and run large language models like Gemma directly on Android and iOS devices with zero cloud dependency. Built on MediaPipe and LiteRT, it features AI chat with reasoning mode, multimodal image analysis, real-time audio transcription, and autonomous agent skills—all running entirely on-device for complete privacy. A reference implementation for developers building offline-first AI experiences.
Cactus
On-device AI inference engine for mobile and wearable applications
Cactus is a YC-backed low-latency AI engine for mobile and wearable devices that runs LLMs, transcription, embedding, and TTS models locally. It achieves 16-20 tok/sec on older devices and 70+ tok/sec on flagships with ARM SIMD kernels optimized for Snapdragon, Apple, and MediaTek processors. Supports Qwen, Gemma, Llama, DeepSeek with Flutter, React Native, and Kotlin SDKs.
BitNet
Microsoft's framework for running 1-bit large language models on consumer CPUs
BitNet is Microsoft's official inference framework for 1-bit quantized large language models that enables running models with up to 100 billion parameters on standard consumer CPUs without requiring a GPU. By leveraging extreme quantization where weights use only 1.58 bits on average, BitNet achieves dramatic reductions in memory footprint and computational cost while maintaining competitive output quality for many practical use cases.
Lemonade
AMD's open-source local LLM server with GPU and NPU acceleration
Lemonade is AMD's open-source local AI serving platform for LLMs, image generation, speech recognition, and text-to-speech on your own hardware. Built in lightweight C++, it can detect CPU, GPU, and NPU backends and is extra optimized for Ryzen AI, Radeon, and Strix Halo PCs. Lemonade exposes OpenAI, Anthropic, and Ollama-compatible APIs, ships with a desktop model manager, and supports source-confirmed GGUF, FLM, and ONNX models across Windows, Linux, macOS, and Docker.
Hyprnote
Local-first AI notepad for meetings and voice notes
Hyprnote is a local-first AI notepad designed for capturing and processing meeting notes and voice recordings. It runs entirely on-device for privacy, transcribes audio using local models, and generates structured summaries, action items, and follow-ups. Built with Rust and Tauri for native desktop performance. Over 8,000 GitHub stars with strong privacy-focused community adoption.
Open WebUI
Self-hosted AI platform with ChatGPT-like interface for local and cloud LLMs.
Extensible, self-hosted AI platform with 290M+ Docker pulls and 124K+ GitHub stars. Supports Ollama, OpenAI-compatible APIs, and any Chat Completions backend. Features built-in RAG, multi-user RBAC, voice/video calls, Python function workspace, model builder, and web browsing. Runs entirely offline with enterprise features including SSO and audit logging.
LM Studio
Run local LLMs with an intuitive desktop GUI and OpenAI-compatible API server.
Free desktop application by Element Labs for discovering, downloading, and running open-source LLMs locally. Features a curated Hugging Face model browser, side-by-side model comparison, parameter tuning, and an OpenAI-compatible API server on localhost:1234. Powered by llama.cpp with Metal acceleration for Apple Silicon.
Neovim
Hyperextensible Vim-based text editor
Modern fork of Vim with Lua-based plugin architecture and built-in LSP support. Lightweight yet endlessly extensible, it serves as the foundation for AI-enhanced development workflows through plugins like Avante, Codecompanion, and Copilot.lua. With 85k+ GitHub stars, it's the terminal editor of choice for developers who value speed, customization, and keyboard-driven efficiency.
Ollama
Run LLMs locally with one command
Tool for running large language models locally on your machine with a simple CLI interface. Download and run Llama 3, Mistral, Gemma, Phi, Code Llama, and dozens of other open-source models with a single command. Features model management, GPU acceleration (NVIDIA/AMD/Apple Silicon), OpenAI-compatible API server, Modelfile for customization, and multi-model switching. Ideal for offline AI development, privacy-sensitive use cases, and local testing. 120K+ GitHub stars.
Alacritty
A fast, cross-platform terminal
Self-described 'fastest terminal emulator in existence' — a GPU-accelerated, cross-platform terminal written in Rust focused on simplicity and performance. No tabs, splits, or built-in multiplexer — designed to pair with tmux or Zellij. Configured via YAML with a minimal feature set that prioritizes speed above all else. Supports true color, Vi mode, regex search, and clickable URLs. Available on macOS, Linux, Windows, and BSD. 57K+ GitHub stars.