Container-native local AI model serving with Podman
RamaLama is an open-source tool that containerizes AI model inference using Podman or Docker, so no inference runtime or GPU libraries need to be installed and configured on the host. It auto-detects GPUs (NVIDIA, AMD, Intel, Apple Silicon), pulls models from Hugging Face, Ollama, and OCI registries, and runs them in isolated rootless containers with read-only model mounts and network isolation. Developed under the Containers project (Red Hat ecosystem), it brings familiar container workflows to local LLM serving.
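A minimal workflow sketch (the pull, run, and serve subcommands are part of the CLI; the model reference and port value here are illustrative and may vary by version):

    $ ramalama pull ollama://tinyllama       # fetch a model into local storage (ollama:// is one supported transport)
    $ ramalama run tinyllama                 # interactive chat, executed inside a rootless container
    $ ramalama serve --port 8080 tinyllama   # expose the model over a local HTTP endpoint

Because each command runs in its own container, the model and inference stack leave no footprint on the host beyond the pulled image and model files.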