2 tools tagged
NVIDIA's optimized AI model serving platform
Triton Inference Server is NVIDIA's open-source inference serving platform. It deploys AI models from TensorRT, PyTorch, ONNX, TensorFlow, OpenVINO, Python, and other frameworks across cloud, data center, and edge environments. It supports dynamic batching, model ensembles, and concurrent model execution on GPUs and CPUs, and it handles real-time, streaming, and batch inference patterns. It also includes Model Analyzer for profiling and Model Navigator for automated optimization.
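To illustrate the idea behind dynamic batching mentioned above, here is a minimal conceptual sketch in plain Python. It is not Triton's scheduler and does not use Triton's API; `Request`, `form_batches`, `run_batched`, `max_batch_size`, and `max_delay_ms` are hypothetical names chosen for this example. The assumption is a simple policy: a batch closes when it is full or when the next request arrives too long after the first request in the batch.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Request:
    arrival_ms: float  # hypothetical arrival timestamp in milliseconds
    payload: float     # stand-in for an input tensor


def form_batches(
    requests: Sequence[Request],
    max_batch_size: int,
    max_delay_ms: float,
) -> List[List[Request]]:
    """Group requests into batches (illustrative policy, not Triton's).

    A batch is closed when it already holds max_batch_size requests, or
    when the next request arrived more than max_delay_ms after the first
    request in the current batch.
    """
    batches: List[List[Request]] = []
    current: List[Request] = []
    for req in sorted(requests, key=lambda r: r.arrival_ms):
        is_full = len(current) >= max_batch_size
        waited_too_long = bool(current) and (
            req.arrival_ms - current[0].arrival_ms > max_delay_ms
        )
        if current and (is_full or waited_too_long):
            batches.append(current)
            current = []
        current.append(req)
    if current:
        batches.append(current)
    return batches


def run_batched(
    requests: Sequence[Request],
    model: Callable[[List[float]], List[float]],
    max_batch_size: int = 4,
    max_delay_ms: float = 5.0,
) -> List[float]:
    """Call the model once per batch and return results in arrival order."""
    results: List[float] = []
    for batch in form_batches(requests, max_batch_size, max_delay_ms):
        results.extend(model([r.payload for r in batch]))
    return results
```

For example, four requests arriving at 0, 1, 2, and 20 ms with `max_batch_size=2` and `max_delay_ms=5.0` would be grouped into batches of two, one, and one, so the model is called three times instead of four. A real server makes this trade-off between latency and throughput with far more sophistication.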
NVIDIA's real-time persona-driven voice dialogue model
PersonaPlex is NVIDIA's open-source, full-duplex speech-to-speech conversational AI model. It enables persona control through text-based role prompts and audio-based voice conditioning. Built on the Moshi architecture, it produces natural, low-latency spoken interactions with a consistent persona across conversations. The model supports multiple pre-packaged voice embeddings for both natural and varied speaking styles, making it suitable for building interactive voice agents and assistants.