Vosk

Offline speech recognition for 20+ languages

Open Source

Vosk is an offline speech recognition toolkit supporting 20+ languages with compact 50MB models that run on Raspberry Pi, Android, iOS, and servers. It provides a streaming API with zero-latency responses, speaker identification, and a reconfigurable vocabulary. Vosk offers bindings for Python, Java, Node.js, C#, C++, Go, and Rust. Unlike cloud-based alternatives, it processes all audio locally, with no internet connection required. It is Apache 2.0 licensed and has 14K+ GitHub stars.

Vosk is an offline speech recognition toolkit developed by Alpha Cephei that brings accurate, real-time transcription to devices ranging from Raspberry Pi boards to production servers without requiring internet connectivity. The toolkit supports over 20 languages, including English, Chinese, German, French, Spanish, Japanese, Russian, and Arabic, with compact models of around 50 MB that fit comfortably in memory-constrained environments. Vosk combines Kaldi-based acoustic models with efficient language models, aiming for recognition quality comparable to cloud services while keeping all audio data on the device.

The streaming API is one of Vosk's key differentiators: it returns partial recognition results with near-zero latency as audio arrives, rather than waiting for complete utterances. This makes it suitable for real-time applications such as voice assistants, live captioning, and interactive voice response systems. Vosk also includes speaker identification to distinguish between multiple speakers, and on models that support it, the vocabulary can be reconfigured at runtime to improve accuracy on domain-specific terminology without retraining the underlying model.
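The streaming flow described above can be sketched with Vosk's Python binding. This is a minimal sketch, assuming the `vosk` package is installed, a model has been downloaded and unpacked to a local directory, and the input is an uncompressed 16-bit mono PCM WAV file; the helper names `iter_chunks`, `grammar_json`, and `transcribe` are hypothetical. `KaldiRecognizer.AcceptWaveform` returns true when an utterance ends, at which point `Result()` holds the final text, while `PartialResult()` exposes the running hypothesis. Passing a JSON phrase list as the third `KaldiRecognizer` argument restricts the vocabulary, which only some models (typically the smaller ones) accept.

```python
import json
import sys
import wave


def iter_chunks(wf, frames_per_chunk=4000):
    """Hypothetical helper: yield raw PCM chunks from an open wave.Wave_read."""
    while True:
        data = wf.readframes(frames_per_chunk)
        if not data:
            break
        yield data


def grammar_json(phrases):
    """Hypothetical helper: build a JSON phrase list for KaldiRecognizer.

    "[unk]" lets the recognizer report out-of-vocabulary speech instead of
    forcing it onto one of the listed phrases.
    """
    return json.dumps(list(phrases) + ["[unk]"])


def transcribe(wav_path, model_dir, phrases=None):
    """Hypothetical helper: stream a WAV file through Vosk, printing results."""
    # Imported lazily so the helpers above work without vosk installed.
    from vosk import KaldiRecognizer, Model

    with wave.open(wav_path, "rb") as wf:
        # Vosk's examples expect uncompressed 16-bit mono PCM.
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
            raise ValueError("expected 16-bit mono PCM WAV")

        model = Model(model_dir)
        if phrases:
            # Restrict recognition to the given phrases (the model must support it).
            rec = KaldiRecognizer(model, wf.getframerate(), grammar_json(phrases))
        else:
            rec = KaldiRecognizer(model, wf.getframerate())

        for chunk in iter_chunks(wf):
            if rec.AcceptWaveform(chunk):
                # An utterance ended: Result() holds its final text.
                print("final:", json.loads(rec.Result()).get("text", ""))
            else:
                # Mid-utterance: PartialResult() holds the current hypothesis.
                print("partial:", json.loads(rec.PartialResult()).get("partial", ""))

        # Flush whatever audio remains after the last chunk.
        print("final:", json.loads(rec.FinalResult()).get("text", ""))


if __name__ == "__main__" and len(sys.argv) == 3:
    transcribe(sys.argv[1], sys.argv[2])
```

Run it as `python transcribe.py audio.wav path/to/model` (placeholder paths). Reading 4000 frames per chunk mirrors the upstream examples; smaller chunks yield partial results more often at the cost of more calls.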

With over 14,000 GitHub stars, Vosk provides official bindings for Python, Java, Node.js, C#, C++, Go, and Rust, making it usable from most mainstream technology stacks. The project is used in smart home platforms, transcription workflows, and accessibility tools where cloud dependency is unacceptable due to privacy, latency, or connectivity constraints. For developers who need speech recognition without sending audio to third-party servers, Vosk offers a mature and well-documented alternative to cloud-only solutions.

Pricing

Free, open-source under Apache 2.0 license

Platforms

Python, Java, Node.js, C#, C++, Go, Rust; Raspberry Pi, Android, iOS, servers

Related Tools

Marqo

Embedding-first search and discovery engine for AI-powered product experiences.

Marqo is an open-source tensor search engine that combines embedding generation and vector search in a single API, removing the need to manage separate embedding pipelines and vector databases. Built for product discovery and multi-modal search, it lets teams index text, images, and structured data together, returning ranked results based on semantic similarity rather than keyword overlap.

freemium

Magika

AI-powered file-type detection at Google scale

Magika is an open-source, AI-powered file-type detection tool from Google that uses a custom deep-learning model of only a few megabytes to identify more than 200 binary and textual content types in milliseconds, even on a single CPU. Magika ships as a CLI, a Python package, a JavaScript/TypeScript library, and an ONNX model. It achieves around 99% accuracy on its test set and is already used at Google scale across Gmail, Drive, and Safe Browsing, as well as by VirusTotal and abuse.ch.

free, Open Source

Zep

Context engineering platform for AI agents with temporal knowledge graphs

Zep is a context engineering platform that assembles relationship-aware context for AI agents from conversations, business data, documents, and events. It maintains a temporal knowledge graph that automatically extracts entities and relationships, tracking how context evolves over time. Zep delivers formatted context blocks optimized for LLMs with sub-200ms latency, integrating with LangChain, LlamaIndex, AutoGen, and Google ADK through Python, TypeScript, and Go SDKs.

freemium

Hindsight

Agent memory system that learns, not just remembers

Hindsight is an agent memory system that enables AI agents to learn from experience rather than just store conversations. It organizes memories into three biomimetic categories: World knowledge for facts, Experiences for agent events, and Mental Models for learned understanding. The system provides retain, recall, and reflect operations backed by a temporal knowledge graph with parallel retrieval strategies including semantic, keyword, graph traversal, and temporal search.

freemium, Open Source

Weights & Biases

ML experiment tracking and model monitoring

Weights & Biases is an AI developer platform for experiment tracking, model monitoring, and ML workflow orchestration. Weave extends W&B with LLMOps capabilities for prompt engineering, evaluation, and deployment. The platform enables teams to track experiments, monitor model performance in production, manage datasets, log LLM application traces, and collaborate on ML projects through visualization dashboards and automated logging, with enterprise SSO and RBAC for compliance needs.

freemium

Labelbox

Data factory for AI teams and model training

Labelbox is a comprehensive data platform for AI teams working on reinforcement learning, evaluations, robotics, and human feedback workflows. Core capabilities include RL data generation with knowledge-work rubrics, custom evaluations for private benchmarks and model comparisons, robotics data with full-stack video and trajectories, and an expert network of 1.5M+ knowledge workers, including 50K+ PhDs. Labelbox says it is trusted by 80% of leading AI labs for production data operations.

paid