LLaMA-Factory

Unified framework for fine-tuning 100+ large language models

LLaMA-Factory is an open-source toolkit providing a unified interface for fine-tuning over 100 LLMs and vision-language models. It supports SFT, RLHF with PPO and DPO, LoRA and QLoRA for memory-efficient training, and continuous pre-training. The LLaMA Board web UI enables no-code configuration, while CLI and YAML workflows serve advanced users. It integrates with Hugging Face, ModelScope, vLLM, and SGLang for model deployment.

A detailed review by the aicoolies team

LLaMA-Factory has become a widely adopted open-source fine-tuning framework in the LLM ecosystem, with more than 72K GitHub stars and a peer-reviewed ACL 2024 publication. The toolkit removes much of the boilerplate involved in adapting large language models to custom datasets, offering a single interface that spans LLaMA, Mistral, Qwen, Gemma, DeepSeek, ChatGLM, and dozens of other model families. Its support for LoRA and QLoRA with 2/3/4/5/6/8-bit quantization makes it possible to fine-tune large models on consumer-grade GPUs, lowering the barrier to entry for teams without enterprise compute clusters.
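To make the memory argument concrete, here is a rough back-of-the-envelope sketch. It is not taken from LLaMA-Factory; the function names and the 7B-parameter, 4096-wide example are assumptions chosen for illustration. It estimates the memory taken by the weights alone at a given bit-width and counts the extra trainable parameters a LoRA adapter of a given rank adds to one weight matrix. Activations, optimizer state, and quantization metadata are ignored, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bits / 8 / 1e9


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """LoRA adds two low-rank matrices: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    params = 7e9  # hypothetical 7B-parameter model
    for bits in (16, 8, 4, 2):
        print(f"{bits:>2}-bit weights: {weight_memory_gb(params, bits):.2f} GB")

    # Hypothetical 4096 x 4096 projection matrix with a rank-8 adapter.
    full = 4096 * 4096
    lora = lora_trainable_params(4096, 4096, rank=8)
    print(f"full matrix: {full} params, LoRA adapter: {lora} params "
          f"({100 * lora / full:.2f}% of the matrix)")
```

Under these assumptions the frozen base weights shrink with the bit-width while the adapter stays a small fraction of each matrix it is attached to, which is the combination QLoRA relies on.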

The framework covers the main modern training methods: supervised fine-tuning for instruction following, DPO and KTO for preference alignment, PPO for reinforcement learning from human feedback, and ORPO, which combines supervised fine-tuning and preference optimization in a single objective. Recent 2025 updates added OFT and OFTv2 orthogonal fine-tuning methods, SGLang as an inference backend, multimodal model support including audio understanding, and compatibility with Llama 4, Qwen3, and InternVL3. FlashAttention-2, DeepSpeed, and GaLore integrations further optimize training throughput and memory efficiency.
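As an illustration of what preference alignment optimizes, the following is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. It is not LLaMA-Factory's implementation; the function names, the example log-probabilities, and the default `beta` are assumptions for illustration. The inputs are the summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair of responses."""
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


if __name__ == "__main__":
    # Policy identical to the reference: zero margin.
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy now prefers the chosen response more than the reference does.
    print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

The loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model, so no separate reward model is needed.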

LLaMA-Factory stands out through exceptional developer experience. The LLaMA Board web interface provides a Gradio-powered dashboard for configuring datasets, selecting training methods, setting hyperparameters, and monitoring experiments through integrated TensorBoard and Weights & Biases tracking. The CLI accepts YAML configuration files with extensive examples for every supported scenario. Trained models can be exported to Hugging Face Hub, served through an OpenAI-compatible API endpoint, or deployed via vLLM and SGLang workers for high-throughput inference.
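A YAML-driven LoRA SFT run might look roughly like the sketch below, which renders a flat config file from Python and prints the command that would start training. The keys and the `llamafactory-cli train` invocation are modeled on the project's published examples, but the model ID, dataset name, output path, and hyperparameter values here are assumptions; check them against the examples shipped with the version you install.

```python
from pathlib import Path

# Assumed values for illustration; verify key names against the installed version.
CONFIG = {
    "model_name_or_path": "meta-llama/Meta-Llama-3-8B-Instruct",
    "stage": "sft",
    "do_train": True,
    "finetuning_type": "lora",
    "lora_rank": 8,
    "lora_target": "all",
    "dataset": "alpaca_en_demo",
    "template": "llama3",
    "cutoff_len": 2048,
    "output_dir": "saves/llama3-8b/lora/sft",
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "learning_rate": 1.0e-4,
    "num_train_epochs": 3.0,
    "bf16": True,
}


def render_flat_yaml(config: dict) -> str:
    """Render a flat dict of scalars as YAML without third-party packages."""
    lines = []
    for key, value in config.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # YAML-style booleans
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    path = Path("lora_sft.yaml")  # hypothetical file name
    path.write_text(render_flat_yaml(CONFIG), encoding="utf-8")
    print(path.read_text(encoding="utf-8"))
    print(f"Then run: llamafactory-cli train {path}")
```

Keeping the run definition in one file makes experiments easy to diff and reproduce, which is the main appeal of the YAML workflow over clicking through the web UI.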

Pricing

Free and open-source under the Apache 2.0 license

Platforms

Python, Linux, macOS, Windows (CUDA GPUs recommended)

Related Tools

Deep Lake

AI data runtime for multimodal datasets and vector search

Deep Lake is an open-source AI data runtime from Activeloop for storing, versioning, and querying multimodal data and embeddings. It fits teams building RAG, training, evaluation, or dataset-heavy agent workflows that need a bridge between vector search, structured metadata, and large image, text, audio, or video collections.

Open Source

SeekDB

AI-native state store with hybrid vector and full-text search

SeekDB is an open-source AI-native state store from the OceanBase ecosystem that combines MySQL-compatible data access with hybrid vector and full-text retrieval. It targets agent and AI application teams that need embedded or server deployment, copy-on-write style sandboxes, and searchable state without gluing together several separate storage layers.

Open Source

Marqo

Embedding-first search and discovery engine for AI-powered product experiences.

Marqo is an open-source tensor search engine that combines embedding generation and vector search in a single API, removing the need to manage separate embedding pipelines and vector databases. Built for product discovery and multi-modal search, it lets teams index text, images, and structured data together, returning ranked results based on semantic similarity rather than keyword overlap.

Freemium

Magika

AI-powered file-type detection at Google scale

Magika is an open-source, AI-powered file-type detection tool from Google. It uses a custom deep-learning model of only a few megabytes to identify more than 200 binary and textual content types in milliseconds, even on a single CPU. Magika ships as a CLI, Python package, JavaScript/TypeScript library, and an ONNX model, achieves around 99% accuracy on its test set, and is already used at Google scale across Gmail, Drive, and Safe Browsing as well as by VirusTotal and abuse.ch.

Free, Open Source

Zep

Context engineering platform for AI agents with temporal knowledge graphs

Zep is a context engineering platform that assembles relationship-aware context for AI agents from conversations, business data, documents, and events. It maintains a temporal knowledge graph that automatically extracts entities and relationships, tracking how context evolves over time. Zep delivers formatted context blocks optimized for LLMs with sub-200ms latency, integrating with LangChain, LlamaIndex, AutoGen, and Google ADK through Python, TypeScript, and Go SDKs.

Freemium

Hindsight

Agent memory system that learns, not just remembers

Hindsight is an agent memory system that enables AI agents to learn from experience rather than just store conversations. It organizes memories into three biomimetic categories: World knowledge for facts, Experiences for agent events, and Mental Models for learned understanding. The system provides retain, recall, and reflect operations backed by a temporal knowledge graph with parallel retrieval strategies including semantic, keyword, graph traversal, and temporal search.

Freemium, Open Source
