BentoML standardizes the path from trained model to production API. Package models with preprocessing and serving logic into portable Bento archives that deploy consistently anywhere.
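The bundling idea can be sketched in plain Python: a Bento packages preprocessing, the model, and serving logic behind one entrypoint. The class and token-scoring "model" below are hypothetical stand-ins for illustration; a real service would be defined with BentoML's own service and API decorators rather than a bare class.

```python
# Plain-Python sketch of the pattern a Bento captures: preprocessing,
# a model, and serving logic bundled behind a single entrypoint.
# Illustrative only -- not BentoML's actual API.

class SentimentBundle:
    """Hypothetical bundle pairing preprocessing with a stand-in model."""

    POSITIVE_WORDS = {"good", "great", "excellent"}

    def _preprocess(self, text: str) -> list[str]:
        # Normalize and tokenize the raw request payload.
        return text.lower().split()

    def _model(self, tokens: list[str]) -> float:
        # Stand-in for a trained model: fraction of positive tokens.
        if not tokens:
            return 0.0
        hits = sum(1 for t in tokens if t in self.POSITIVE_WORDS)
        return hits / len(tokens)

    def predict(self, text: str) -> dict:
        # The one entrypoint an auto-generated endpoint would expose.
        score = self._model(self._preprocess(text))
        return {"label": "positive" if score >= 0.5 else "negative",
                "score": score}

bundle = SentimentBundle()
print(bundle.predict("Great food, great service"))
```

Because the preprocessing travels with the model, every deployment target applies the same transformation, which is what makes the archive portable.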
BentoML auto-generates REST and gRPC endpoints with OpenAPI documentation. Adaptive batching groups individual requests into batched model calls for efficient GPU utilization. Multi-model inference pipelines chain models in sequence or run them in parallel.
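The core idea behind adaptive batching can be sketched as follows: queued requests are grouped into one model call, bounded by a maximum batch size (a production server also bounds how long a request may wait). This is a minimal illustration of the technique, not BentoML's implementation; the function names are hypothetical.

```python
# Minimal sketch of request batching: a backlog of individual requests
# is served in a few batched model calls instead of one call per request.
# Illustrative only -- real adaptive batching also enforces a latency budget.

def batch_requests(pending: list[float], max_batch_size: int) -> list[list[float]]:
    """Split a backlog of individual requests into model-sized batches."""
    return [pending[i:i + max_batch_size]
            for i in range(0, len(pending), max_batch_size)]

def model_batch(xs: list[float]) -> list[float]:
    # Stand-in for one batched model call (e.g. a single GPU forward pass).
    return [x * 2 for x in xs]

# Eight queued requests are served in three batched calls instead of eight.
backlog = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
batches = batch_requests(backlog, max_batch_size=3)
results = [y for batch in batches for y in model_batch(batch)]
print(len(batches), results)
```

Fewer, larger calls amortize per-call overhead and keep the GPU busy, which is why batching matters most for accelerator-backed models.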
BentoML supports PyTorch, TensorFlow, scikit-learn, XGBoost, and Hugging Face models, as well as any custom Python model. Built-in containerization produces Docker images that deploy to any infrastructure.
BentoCloud adds managed deployment with auto-scaling, GPU scheduling, and monitoring, while the open-source BentoML framework can be self-hosted anywhere.