Coqui TTS

Open-source deep learning text-to-speech toolkit

Coqui TTS is an open-source deep learning toolkit for text-to-speech synthesis, originally built by former Mozilla TTS engineers. It supports multi-speaker and multilingual synthesis as well as voice cloning from just six seconds of audio, and it ships pre-trained models for 20+ languages. After Coqui shut down in 2023, the Idiap Research Institute forked the project and now actively maintains it. With 45K+ GitHub stars, it remains one of the most popular open-source TTS frameworks in Python.

Coqui TTS is a comprehensive deep learning framework for text-to-speech synthesis that grew out of Mozilla's pioneering TTS research program. The toolkit provides end-to-end neural speech synthesis with support for multiple architectures including Tacotron2, VITS, GlowTTS, and the flagship XTTS model family. XTTS in particular enables high-quality voice cloning from as little as six seconds of reference audio across 17 languages, making it one of the most capable open-source voice cloning systems available today. The library ships with dozens of pre-trained models that can be used directly or fine-tuned on custom voice datasets.
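As a rough illustration of the XTTS voice-cloning workflow described above, the following sketch prepares and runs a cloning request through the library's Python API. The helper names `build_clone_request` and `clone_voice` and the file names are hypothetical and chosen for this example; the model identifier and the `tts_to_file` call follow the project's published usage but should be checked against the installed version.

```python
from pathlib import Path

# Model identifier for XTTS v2 as listed by the project (verify with your install).
XTTS_MODEL = "tts_models/multilingual/multi-dataset/xtts_v2"


def build_clone_request(text: str, speaker_wav: str, language: str = "en",
                        out_path: str = "cloned.wav") -> dict:
    """Validate inputs and return keyword arguments for TTS.tts_to_file.

    Hypothetical helper: it only assembles arguments and needs no TTS install.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "text": text,
        "speaker_wav": str(Path(speaker_wav)),  # a few seconds of reference audio
        "language": language,                   # language code, e.g. "en" or "de"
        "file_path": str(Path(out_path)),       # where the synthesized WAV is written
    }


def clone_voice(request: dict) -> str:
    """Synthesize speech in the reference speaker's voice and return the output path."""
    # Imported lazily so that build_clone_request works without the library.
    from TTS.api import TTS

    tts = TTS(XTTS_MODEL)  # model weights are downloaded on first use
    tts.tts_to_file(**request)
    return request["file_path"]
```

Calling `clone_voice(build_clone_request("Hello there.", "reference.wav"))` would, assuming the package is installed and `reference.wav` exists, write the cloned speech to `cloned.wav`.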

After Coqui the company ceased operations in late 2023, the Idiap Research Institute in Switzerland assumed maintainership of the codebase, ensuring continued bug fixes, security patches, and compatibility updates with newer Python and PyTorch versions. The community fork at github.com/idiap/coqui-ai-TTS receives regular releases on PyPI under the coqui-tts package name. The original repository at github.com/coqui-ai/TTS retains its 45,000+ stars and serves as the historical reference, while active development continues under Idiap with contributions from researchers and developers worldwide.

Coqui TTS fits into workflows ranging from podcast production and audiobook narration to accessibility applications, game dialogue systems, and automated customer service. It runs entirely locally with no API calls required, making it suitable for privacy-sensitive deployments and offline environments where cloud TTS services are not an option. The framework provides a straightforward Python API, a command-line interface for quick synthesis tasks, and a built-in web server for interactive testing and demos. For teams evaluating text-to-speech solutions, Coqui TTS offers a rare combination of production-grade quality, multilingual support, and zero licensing costs.
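For the command-line interface mentioned above, a quick synthesis task might be scripted as in the sketch below. The helper `build_tts_command` is hypothetical; the `tts` executable and its `--text`, `--out_path`, and `--model_name` options are assumed to match the installed release, and the model name is only an example from the project's model list.

```python
import shlex
import shutil
import subprocess


def build_tts_command(text, out_path="output.wav", model_name=None):
    """Return the argument list for a one-off synthesis with the `tts` CLI."""
    cmd = ["tts", "--text", text, "--out_path", out_path]
    if model_name:
        # Without --model_name the CLI falls back to its default model.
        cmd += ["--model_name", model_name]
    return cmd


def run_tts(cmd):
    """Run the command if the `tts` executable is on PATH; return True on success."""
    if shutil.which(cmd[0]) is None:
        return False  # package not installed in this environment
    return subprocess.run(cmd, check=False).returncode == 0


cmd = build_tts_command(
    "Hello world",
    out_path="hello.wav",
    model_name="tts_models/en/ljspeech/tacotron2-DDC",  # example model name
)
print(shlex.join(cmd))  # shell-safe form of the command, for copy and paste
```

Building the command as a list rather than a single string avoids shell-quoting problems when the text contains spaces or punctuation.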

Pricing

Free and open source under the MPL-2.0 license

Platforms

Python package (pip), Linux, macOS, Windows

Alternatives

Fish Speech

Multilingual emotional text-to-speech with 80+ language support

Fish Speech is an open-source text-to-speech system supporting 80+ languages with emotional expression, zero-shot voice cloning, and real-time streaming. It generates natural speech with controllable emotions, speaking styles, and prosody. It features a web interface, an API server, and integration with AI agent frameworks for voice-enabled applications. The project has over 29,000 GitHub stars.

Open Source

GPT-SoVITS

Open-source voice cloning and text-to-speech with few-shot learning

GPT-SoVITS is an open-source voice cloning and text-to-speech system that generates natural-sounding speech from just a few seconds of reference audio. It combines GPT-style language modeling with SoVITS voice synthesis for zero-shot and few-shot voice cloning across multiple languages. It supports Chinese, English, Japanese, Korean, and Cantonese, and has over 56,000 GitHub stars.

Open Source

VoxCPM

Tokenizer-free multilingual TTS with voice cloning

VoxCPM is an open-source text-to-speech system from OpenBMB that generates continuous speech across 30 languages without traditional tokenization. Its 2B-parameter end-to-end diffusion architecture produces 48 kHz studio-quality audio with natural prosody and emotion. Key capabilities include voice design from text descriptions, few-shot voice cloning, and multilingual synthesis without language-specific modules. The Apache 2.0 project has 8,700 GitHub stars.

Open Source

Amphion

Open-source toolkit for audio, music, and speech generation

Amphion is an open-source audio generation toolkit from OpenMMLab designed for reproducible research in speech synthesis, voice conversion, singing voice synthesis, and text-to-audio generation. It implements state-of-the-art models including MaskGCT, DualCodec, VITS, and VALL-E with built-in architecture visualizations for educational use. The project ships with the Emilia-Large dataset of 200,000 hours of speech data and includes multiple vocoders and evaluation metrics for benchmarking.

Open Source

Related Tools

DenchClaw

Local AI CRM and workflow automation on OpenClaw

DenchClaw is a local AI CRM and workflow automation app built on OpenClaw. It runs on a Mac at localhost, lets users chat with local business data, and focuses on lead enrichment, founder/customer research, and outreach automation. It belongs beside local AI, workflow automation, and OpenClaw-style personal-agent tools rather than pure coding IDEs.

Open Source

Traceway

OpenTelemetry-native observability with AI tracing, logs, traces, metrics, and session replay — self-hosted in 90 seconds.

Traceway is an open-source, OpenTelemetry-native observability platform that combines logs, traces, metrics, exceptions, session replay, and AI tracing in a single self-hosted system. MIT licensed with no open-core restrictions, it deploys in 90 seconds via Docker Compose and accepts OTLP/HTTP from any OTel SDK without a Collector or per-language vendor SDK.

Open Source

Marqo

Embedding-first search and discovery engine for AI-powered product experiences.

Marqo is an open-source tensor search engine that combines embedding generation and vector search in a single API, removing the need to manage separate embedding pipelines and vector databases. Built for product discovery and multi-modal search, it lets teams index text, images, and structured data together, returning ranked results based on semantic similarity rather than keyword overlap.

Freemium

Freestyle

Sandboxes for coding agents — Linux VMs, Git, and deploys in one box

Freestyle is YC-backed sandbox infrastructure built for AI coding agents, shipping secure Linux VMs with nested virtualization, Git servers, and one-click web deploys. It lets agents run real workloads, branch repos, and deploy apps under short-lived identities while billing only for active compute. Used in production by vly.ai, Rork, and Vibeflow.

Freemium

OpenSRE

Open-source toolkit for building AI SRE incident response agents

OpenSRE is an open-source Python toolkit from Tracer Cloud for building AI SRE agents that investigate and respond to production incidents. It ships with connectors to Prometheus, Grafana, Kubernetes and incident platforms, plus a simulation harness that replays past incidents so teams can benchmark agent accuracy before trusting it on live pager rotations.

Open Source

Magika

AI-powered file-type detection at Google scale

Magika is an open-source, AI-powered file-type detection tool from Google. It uses a custom deep-learning model of only a few megabytes to identify more than 200 binary and textual content types in milliseconds, even on a single CPU. Magika ships as a CLI, a Python package, a JavaScript/TypeScript library, and an ONNX model. It achieves around 99% accuracy on its test set and is already used at Google scale across Gmail, Drive, and Safe Browsing, as well as by VirusTotal and abuse.ch.

Free, Open Source