This stack assembles the most capable open-source tools for every stage of the LLM fine-tuning pipeline. LLaMA-Factory serves as the central orchestration hub, providing a unified web UI and CLI that supports over 100 model architectures and every major training methodology, from supervised fine-tuning through preference alignment with DPO, PPO, and ORPO.
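As a minimal sketch of how such a run is typically launched: a small QLoRA SFT config is written out and handed to the CLI. The key names below follow LLaMA-Factory's published YAML examples, but treat them (and the model and dataset names) as assumptions to verify against the current documentation.

```python
# Sketch of a minimal LLaMA-Factory QLoRA run config. Key names mirror
# the project's YAML examples but are assumptions; check the current
# LLaMA-Factory docs for the exact schema.
sft_config = {
    "model_name_or_path": "meta-llama/Llama-3.1-8B-Instruct",  # hypothetical model choice
    "stage": "sft",                  # supervised fine-tuning
    "finetuning_type": "lora",
    "quantization_bit": 4,           # QLoRA: 4-bit quantized base weights
    "dataset": "alpaca_en_demo",     # hypothetical dataset name
    "output_dir": "saves/llama3-qlora",
    "per_device_train_batch_size": 2,
    "learning_rate": 1e-4,
    "num_train_epochs": 3.0,
}

# Write the config as simple "key: value" YAML lines (no external deps).
with open("sft_config.yaml", "w") as f:
    for key, value in sft_config.items():
        f.write(f"{key}: {value}\n")

# The run would then be launched from the shell with:
#   llamafactory-cli train sft_config.yaml
```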
Unsloth acts as the speed multiplier layer, delivering 2-5x faster training with up to 80% less VRAM through custom GPU kernels. LLaMA-Factory natively integrates Unsloth as an acceleration backend, meaning teams can use the familiar LLaMA Board interface while getting Unsloth's kernel optimizations automatically applied to their training runs.
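Because Unsloth is wired in as a backend, switching it on is typically a single configuration flag rather than new code. A sketch, assuming the `use_unsloth` option from LLaMA-Factory's examples:

```python
# Sketch: extending a LLaMA-Factory training config to route LoRA
# training through the Unsloth backend. The `use_unsloth` flag mirrors
# the project's documented option; treat it as an assumption and verify
# against the current LLaMA-Factory docs.
base_config = {
    "model_name_or_path": "meta-llama/Llama-3.1-8B-Instruct",  # hypothetical
    "stage": "sft",
    "finetuning_type": "lora",
}

accelerated_config = {**base_config, "use_unsloth": True}

# Everything else (dataset, LoRA rank, optimizer settings) stays the
# same: the kernel optimizations are applied transparently by the backend.
```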
For teams that need deeper control over the training pipeline or prefer working directly with PyTorch, torchtune provides Meta's official fine-tuning library with composable building blocks and minimal abstractions. Its recipes for LoRA, QLoRA, DPO, and knowledge distillation offer a transparent alternative when LLaMA-Factory's higher-level interface does not expose the specific customization needed.
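To make the LoRA recipe concrete: LoRA freezes the pretrained weight W and learns a low-rank correction scaled by alpha/r, so the forward pass becomes y = Wx + (alpha/r)·B(Ax). A dependency-free numeric sketch of that computation with toy dimensions (this illustrates the math, not torchtune's actual API):

```python
# Toy illustration of the LoRA update that recipes like torchtune's
# lora_finetune implement: y = W x + (alpha / r) * B (A x).
# Pure Python with toy dimensions; torchtune's real modules use PyTorch.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha, r):
    """Frozen weight W plus trainable low-rank correction B @ A, scaled by alpha/r."""
    base = matvec(W, x)                 # frozen pretrained path
    low_rank = matvec(B, matvec(A, x))  # rank-r adapter path
    scale = alpha / r
    return [b + scale * l for b, l in zip(base, low_rank)]

# 2x2 frozen weight (identity), rank-1 adapter (A: 1x2, B: 2x1).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[1.0], [1.0]]
y = lora_forward(W, A, B, [2.0, 3.0], alpha=2.0, r=1)
print(y)  # base [2, 3] plus 2 * (2 + 3) on each coordinate -> [12.0, 13.0]
```

Only A and B are trained, which is why the adapter adds a tiny fraction of the base model's parameter count.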
Ray handles the distributed compute dimension, scaling training runs across multiple GPUs and nodes when single-machine approaches hit their limits. Its integration with PyTorch's native distributed training primitives and DeepSpeed enables near-linear scaling of training throughput, while autoscaling clusters dynamically provision resources to match workload demands.
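A sketch of what that looks like with Ray Train: the per-worker training function stays ordinary PyTorch, and scale-out is a configuration value. The `TorchTrainer` and `ScalingConfig` names follow Ray's public API, but this is an illustrative outline, not a drop-in script.

```python
# Sketch of scaling a PyTorch training loop with Ray Train.

def build_trainer(train_loop, num_workers=4):
    """Construct a distributed trainer; requires `pip install "ray[train]"`."""
    from ray.train import ScalingConfig          # deferred import: needs Ray installed
    from ray.train.torch import TorchTrainer

    return TorchTrainer(
        train_loop,                               # ordinary per-worker training function
        scaling_config=ScalingConfig(num_workers=num_workers, use_gpu=True),
    )

# The scaling knob is just data: going from one GPU to a cluster means
# raising num_workers, not rewriting the training loop.
scaling = {"num_workers": 4, "use_gpu": True}

# On a real cluster: build_trainer(my_train_loop, **scaling_kwargs).fit()
```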
Whisper rounds out the stack by enabling speech-to-text data preparation for voice-related fine-tuning tasks. Teams building custom voice assistants or speech-enabled applications can use Whisper to transcribe and prepare audio datasets that feed into the fine-tuning pipeline, creating a complete workflow from raw audio to a deployed language model.
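A sketch of that preparation step: transcribe each clip with Whisper, then wrap the text in instruction-tuning records. The `load_model`/`transcribe` calls follow openai-whisper's public API; the record layout is a hypothetical choice, and real datasets would fill in target outputs.

```python
import json

# Sketch: turn audio clips into instruction-tuning records via Whisper.

def transcribe(path, model_size="base"):
    """Transcribe one audio file; requires `pip install openai-whisper` and ffmpeg."""
    import whisper  # deferred so the formatting logic below runs without it
    model = whisper.load_model(model_size)
    return model.transcribe(path)["text"]

def to_record(transcript, instruction="Respond to the user's spoken request."):
    """Wrap a raw transcript in a simple (hypothetical) instruction-tuning record."""
    return {"instruction": instruction, "input": transcript.strip(), "output": ""}

# The formatting step is pure Python and runs anywhere:
record = to_record("  turn off the kitchen lights  ")
line = json.dumps(record)  # one JSONL line for the fine-tuning dataset
```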
The entire stack runs on open-source software under permissive licenses, keeping the cost limited to compute infrastructure. Teams can start with a single consumer GPU using Unsloth-accelerated QLoRA through LLaMA-Factory, then scale to multi-GPU clusters with Ray as their training needs grow, without changing the fundamental toolchain.