PersonaPlex is a real-time, full-duplex speech-to-speech conversational model developed by NVIDIA Research that enables developers to build AI voice assistants with controllable personas. Unlike traditional cascaded voice pipelines that chain separate ASR, LLM, and TTS models, PersonaPlex operates as a single end-to-end model that listens and speaks simultaneously, producing natural conversational dynamics such as interruptions, backchanneling, and turn-taking. The model is trained on a combination of synthetic and real conversation data, yielding low-latency responses that feel human rather than robotic.
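To make the architectural contrast concrete, here is a minimal toy simulation (not the PersonaPlex API; all function names and the "frame" policy are illustrative stand-ins): a cascade must consume the whole utterance before its ASR → LLM → TTS chain can produce anything, while a full-duplex model takes one step per audio frame, so it can emit output, or react to an interruption, before the user finishes speaking.

```python
# Conceptual sketch only -- contrasts turn-level (cascaded) processing
# with frame-level (full-duplex) processing. Strings stand in for audio.

def cascaded_turn(user_frames):
    """Cascade: waits for the complete utterance, then responds."""
    transcript = "".join(user_frames)   # stand-in for ASR
    reply_text = f"re:{transcript}"     # stand-in for the LLM
    return list(reply_text)             # stand-in for TTS output frames

def full_duplex(user_frames):
    """Full duplex: one input frame in, one output frame out, every step.
    Emitting a 'silence' frame ('~') is how the model yields the turn."""
    out = []
    for frame in user_frames:
        # Toy policy: stay silent until the utterance-final frame.
        out.append("!" if frame == "?" else "~")
    return out

user = list("hello?")
print(cascaded_turn(user))  # reply exists only after the whole utterance
print(full_duplex(user))    # output is produced alongside the input
```

The point of the sketch is the loop shape: in the full-duplex case, input and output share a single per-frame loop, which is what makes interruptions and backchanneling possible.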
The core innovation is persona control through two mechanisms: a text-based role prompt that defines the assistant's personality, knowledge, and conversational style, and audio-based voice conditioning that selects from pre-packaged voice embeddings. NVIDIA ships eight natural-sounding voice embeddings, labeled NATF0-NATF3 (female) and NATM0-NATM3 (male), plus a variety set with more diverse vocal characteristics. This dual control lets developers create distinct AI characters that maintain a consistent voice and personality across long interactions.
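The pairing of the two controls can be sketched as follows. This is a hypothetical illustration, not PersonaPlex's actual API: the `Persona` class and `VOICES` table are invented here, and the embedding values are placeholders; only the NATF0-3/NATM0-3 naming scheme comes from the release.

```python
# Hypothetical sketch of the dual persona controls: one text role prompt
# plus one pre-packaged voice embedding, held fixed for a character.
from dataclasses import dataclass

# Voice IDs mirror NVIDIA's naming scheme (NATF0-3 female, NATM0-3 male);
# the values are placeholder strings, not real embedding vectors.
VOICES = {name: f"<embedding:{name}>"
          for name in [f"NATF{i}" for i in range(4)]
                    + [f"NATM{i}" for i in range(4)]}

@dataclass(frozen=True)
class Persona:
    role_prompt: str   # text control: personality, knowledge, style
    voice_id: str      # audio control: which packaged voice embedding

    def voice_embedding(self):
        if self.voice_id not in VOICES:
            raise KeyError(f"unknown voice: {self.voice_id}")
        return VOICES[self.voice_id]

# One distinct character = one role prompt + one voice, reused every turn.
concierge = Persona(
    role_prompt="You are a calm hotel concierge who answers briefly.",
    voice_id="NATF2",
)
print(concierge.voice_embedding())
```

Freezing the dataclass reflects the design goal stated above: the prompt/voice pair stays constant so the character remains consistent across a long interaction.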
Released in January 2026, PersonaPlex builds on the Moshi architecture and weights, with NVIDIA's additions focused on persona conditioning and voice diversity. The project has accumulated over 8,000 GitHub stars and includes a companion Hugging Face model (personaplex-7b-v1). The code is MIT-licensed, while the model weights fall under NVIDIA's Open Model License, making the release usable for both research and commercial applications in voice agent development, interactive gaming NPCs, and customer-facing conversational interfaces.