Training, fine-tuning, and evaluating AI/ML models for specific use cases
Autonomous scientific discovery via agentic tree search
AI Scientist v2 is Sakana AI's open-source system for fully autonomous scientific research using LLM-powered agentic tree search. It generates hypotheses, designs experiments, writes and executes code, analyzes results, and produces publishable manuscripts without human intervention. The system uses progressive exploration with backtracking to navigate the research space efficiently.
Lightweight C++ inference for Google Gemma models
gemma.cpp is Google's standalone C++ inference engine built for running Gemma language models without Python or CUDA dependencies. It provides optimized CPU inference using SIMD instructions via the Highway library, supports Gemma 2 and Gemma 3 models, and runs on x86 and ARM architectures. Designed for embedded systems, edge devices, and server deployments that need minimal overhead.
State-of-the-art open-source code language models
DeepSeek Coder is a family of open-source code language models trained from scratch on 2 trillion tokens of code and natural language data. Available in sizes from 1B to 33B parameters, these models support 80+ programming languages with 16K context windows and fill-in-the-middle capabilities. DeepSeek Coder outperforms CodeLlama-34B on HumanEval and MBPP benchmarks while remaining available for commercial use.
On-device ML solutions for mobile and edge AI
MediaPipe is Google's open-source framework for building on-device machine learning pipelines across mobile, web, desktop, and edge platforms. It provides pre-built solutions for face detection, hand tracking, pose estimation, object detection, image classification, text classification, and on-device LLM inference. MediaPipe runs entirely locally without cloud dependencies, supporting Android, iOS, Python, and web browsers.
Deep learning optimization for distributed training
DeepSpeed is Microsoft's open-source deep learning optimization library that makes distributed training and inference easy, efficient, and effective. Its ZeRO optimizer eliminates memory redundancies across data-parallel processes, enabling training of models with trillions of parameters. DeepSpeed supports 3D parallelism combining data, pipeline, and tensor parallelism, along with mixed precision training, gradient checkpointing, and CPU/NVMe offloading for memory-constrained environments.
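The memory savings from ZeRO's stages can be sketched with back-of-the-envelope arithmetic. This pure-Python illustration (not DeepSpeed's API) uses the standard mixed-precision Adam accounting of 2 bytes/parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer states:

```python
# Per-GPU memory for model states under ZeRO, following the byte
# accounting from the ZeRO paper: stage 1 shards optimizer states,
# stage 2 also shards gradients, stage 3 also shards parameters.

def zero_bytes_per_param(stage: int, n_gpus: int) -> float:
    """Per-GPU bytes per parameter at a given ZeRO stage (0-3)."""
    params, grads, optim = 2.0, 2.0, 12.0
    if stage >= 1:
        optim /= n_gpus      # ZeRO-1: shard optimizer states
    if stage >= 2:
        grads /= n_gpus      # ZeRO-2: also shard gradients
    if stage >= 3:
        params /= n_gpus     # ZeRO-3: also shard the parameters
    return params + grads + optim

def model_state_gb(n_params: float, stage: int, n_gpus: int) -> float:
    return n_params * zero_bytes_per_param(stage, n_gpus) / 1e9

# A 7B-parameter model on 64 data-parallel GPUs: ZeRO-0 needs ~112 GB
# per GPU for model states alone, ZeRO-3 only ~1.75 GB.
```

This accounting covers model states only; activations and temporary buffers (which DeepSpeed addresses with gradient checkpointing and offloading) come on top.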
DeepSeek's FP8 general matrix multiplication kernels for efficient inference
DeepGEMM is DeepSeek's open-source library of FP8 matrix multiplication CUDA kernels optimized for LLM inference and training on modern NVIDIA GPUs. It provides efficient GEMM operations using 8-bit floating point precision that reduce memory bandwidth requirements while maintaining model accuracy. Designed for integration into inference engines and training frameworks. Over 6,300 GitHub stars.
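The core idea behind scaled low-precision GEMM can be shown in a few lines of pure Python (this is a conceptual sketch, not DeepGEMM's CUDA API; a uniform int8 grid stands in for FP8's nonuniform grid):

```python
# Scaled low-precision matmul: quantize inputs to a coarse 8-bit grid
# with a per-tensor scale, multiply on the grid, rescale the result.

def quantize(mat, levels=127):
    """Map a matrix onto a symmetric integer grid with one scale factor."""
    amax = max(abs(x) for row in mat for x in row) or 1.0
    scale = amax / levels
    q = [[round(x / scale) for x in row] for row in mat]
    return q, scale

def gemm_lowprec(a, b):
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    n, k, m = len(qa), len(qb), len(qb[0])
    # Integer matmul, then a single rescale by the product of the two
    # scales -- mirroring how FP8 GEMMs accumulate in higher precision.
    return [[sum(qa[i][t] * qb[t][j] for t in range(k)) * sa * sb
             for j in range(m)] for i in range(n)]

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
approx = gemm_lowprec(a, b)   # close to the exact product [[19, 22], [43, 50]]
```

The real library adds fine-grained (per-block) scaling and Hopper/Blackwell-specific kernel tricks, but the quantize-multiply-rescale structure is the same.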
DeepSeek's expert-parallel communication library for MoE model training
DeepEP is DeepSeek's open-source communication library optimized for expert-parallel training of Mixture-of-Experts models. It provides efficient GPU-to-GPU data routing for distributing tokens to expert networks across multiple devices during MoE model training and inference. Enables the distributed expert parallelism that powers DeepSeek's competitive model efficiency. Over 9,100 GitHub stars.
Multilingual emotional text-to-speech with 80+ language support
Fish Speech is an open-source text-to-speech system supporting 80+ languages with emotional expression, zero-shot voice cloning, and real-time streaming. It generates natural speech with controllable emotions, speaking styles, and prosody. Features a web interface, API server, and integration with AI agent frameworks for voice-enabled applications. Over 29,000 GitHub stars.
Open-source voice cloning and text-to-speech with few-shot learning
GPT-SoVITS is an open-source voice cloning and text-to-speech system that generates natural-sounding speech from just a few seconds of reference audio. It combines GPT-style language modeling with SoVITS voice synthesis for zero-shot and few-shot voice cloning across multiple languages. Supports Chinese, English, Japanese, Korean, and Cantonese with over 56,000 GitHub stars.
DeepSeek's optimized attention kernel for Multi-Head Latent Attention
FlashMLA is DeepSeek's open-source CUDA kernel implementing efficient Multi-Head Latent Attention, the attention mechanism used in DeepSeek-V2 and V3 models. It provides optimized GPU kernels that significantly reduce memory usage and improve inference speed for MLA-based architectures. Represents DeepSeek's contribution to open AI infrastructure with over 12,600 GitHub stars.
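The memory win from MLA comes from caching one compressed latent per token instead of full per-head keys and values. This illustrative arithmetic (not the CUDA kernel; the dimensions are hypothetical round numbers, not DeepSeek-V3's exact configuration) shows the scale of the savings:

```python
# KV-cache size: standard multi-head attention vs. a latent-attention
# scheme that caches one shared latent vector per token per layer.

def mha_kv_bytes(n_layers, n_heads, head_dim, seq_len, dtype_bytes=2):
    # Standard attention caches K and V for every head.
    return 2 * n_layers * n_heads * head_dim * seq_len * dtype_bytes

def mla_kv_bytes(n_layers, latent_dim, seq_len, dtype_bytes=2):
    # MLA caches a single compressed latent, decompressed on the fly.
    return n_layers * latent_dim * seq_len * dtype_bytes

layers, heads, hd, ctx = 60, 128, 128, 32_768
full = mha_kv_bytes(layers, heads, hd, ctx)   # ~128 GB at fp16
latent = mla_kv_bytes(layers, 576, ctx)       # ~2.3 GB
ratio = full / latent                         # roughly 57x smaller
```

The trade-off is extra compute to reproject the latent into keys and values at attention time, which is exactly what a fused kernel like FlashMLA makes cheap.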
Local model inference engine with OpenAI-compatible API and web UI
Xinference is a local inference engine that runs LLMs, embedding models, image generation, and audio models with an OpenAI-compatible API. It provides a web dashboard for model management, supports vLLM, llama.cpp, and transformers backends, and handles multi-GPU deployment automatically. Supports 100+ models including Qwen, Llama, Mistral, and DeepSeek with over 9,200 GitHub stars.
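Because the server speaks the OpenAI wire format, any OpenAI-style client can talk to it. A minimal stdlib-only sketch of the request shape (the host, port, and model name here are assumptions for illustration):

```python
# Build a chat-completions request for an OpenAI-compatible endpoint
# such as a local Xinference server, using only the standard library.
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str):
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("http://localhost:9997", "qwen2.5-instruct",
                         "Summarize LoRA in one sentence.")
# urllib.request.urlopen(req) would send it once a server is running.
```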
ModelScope's fine-tuning framework supporting 600+ models
ms-swift is ModelScope's open-source framework for fine-tuning over 600 large language and multimodal models. It supports SFT, DPO, RLHF, LoRA, QLoRA, and full fine-tuning with a web UI and CLI interface. Optimized for the Chinese AI ecosystem with native ModelScope Hub integration alongside Hugging Face support. Over 13,500 GitHub stars.
End-to-end open-source platform for training and evaluating foundation models
Oumi is an end-to-end open-source platform for training, fine-tuning, and evaluating foundation models at any scale. It covers data preparation, distributed training, reinforcement learning from human feedback, evaluation benchmarks, and model deployment in a unified framework. Supports training from scratch to post-training alignment with over 9,100 GitHub stars.
Intelligent model router that balances cost and quality across LLM providers
RouteLLM by LMSYS routes LLM requests to the most cost-effective model that can handle each query's complexity. It uses learned routing models to classify whether a query needs a powerful expensive model or can be handled by a cheaper alternative, reducing costs by up to 85% while maintaining quality. Supports OpenAI, Anthropic, and other providers through an OpenAI-compatible API.
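The routing idea can be illustrated with a toy heuristic (RouteLLM itself uses learned routers trained on preference data, not rules like these; the model names and threshold below are hypothetical):

```python
# Toy cost-aware router: score a query's apparent complexity and send
# only hard queries to the expensive model.

HARD_HINTS = ("prove", "derive", "debug", "multi-step", "optimize")

def complexity_score(query: str) -> float:
    q = query.lower()
    hint_score = sum(h in q for h in HARD_HINTS) / len(HARD_HINTS)
    length_score = min(len(q.split()) / 100, 1.0)
    return max(hint_score, length_score)

def route(query: str, threshold: float = 0.15) -> str:
    return "strong-model" if complexity_score(query) >= threshold else "cheap-model"

route("What is the capital of France?")                   # -> cheap-model
route("Prove that the sum of two even numbers is even.")  # -> strong-model
```

The interesting engineering in RouteLLM is learning that threshold function from data so the router generalizes across query distributions.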
Multi-LoRA inference server for serving hundreds of fine-tuned models
LoRAX is an inference server that serves hundreds of fine-tuned LoRA models from a single base model deployment. It dynamically loads and unloads LoRA adapters on demand, sharing the base model's GPU memory across all adapters. Built on text-generation-inference with an OpenAI-compatible API, it enables multi-tenant model serving without per-model GPU allocation. Over 3,700 GitHub stars.
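The math that makes multi-LoRA serving cheap is that every adapter is a low-rank delta applied on top of one shared base weight, so serving N tenants stores the base matrix once plus N small factor pairs. A pure-Python sketch (not the LoRAX API; adapter names are made up):

```python
# LoRA forward pass: y = W x + scale * B (A x), with rank(A), rank(B)
# much smaller than the base dimension.

def matvec(m, v):
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]

def lora_forward(base_w, adapter, x, scale=1.0):
    a, b = adapter                       # A: r x d_in, B: d_out x r
    delta = matvec(b, matvec(a, x))      # low-rank update B(Ax)
    return [base + scale * d for base, d in zip(matvec(base_w, x), delta)]

base_w = [[1.0, 0.0], [0.0, 1.0]]        # one shared 2x2 base (identity here)
adapters = {                             # two rank-1 adapters sharing it
    "customer-a": ([[1.0, 0.0]], [[0.0], [1.0]]),
    "customer-b": ([[0.0, 1.0]], [[1.0], [0.0]]),
}
x = [2.0, 3.0]
ya = lora_forward(base_w, adapters["customer-a"], x)   # [2.0, 5.0]
yb = lora_forward(base_w, adapters["customer-b"], x)   # [5.0, 3.0]
```

A server like LoRAX batches requests for different adapters through the shared base matmul and applies each request's tiny B(Ax) delta separately.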
OpenAI's open-source speech recognition model for any language
Whisper is OpenAI's open-source automatic speech recognition model trained on 680,000 hours of multilingual audio data. It supports transcription and translation across 99 languages with robust handling of accents, background noise, and technical vocabulary. Available in multiple model sizes from tiny (39M) to large (1.5B parameters) for balancing accuracy and speed.
Meta's official PyTorch library for LLM fine-tuning
torchtune is Meta's official PyTorch-native library for fine-tuning large language models. It provides composable building blocks for training recipes covering LoRA, QLoRA, full fine-tuning, DPO, and knowledge distillation. Supports Llama, Mistral, Gemma, Qwen, and Phi model families with distributed training across multiple GPUs. Designed as a hackable, dependency-minimal alternative to higher-level frameworks.
Serverless GPU compute platform for AI inference and training
Modal is a serverless compute platform that lets developers run AI workloads on GPUs with a Python-first SDK. Functions deploy with simple decorators, auto-scale from zero to thousands of containers, and are billed per second of actual use. Supports LLM inference, fine-tuning, batch processing, and sandboxed environments. Used by Meta, Scale AI, and Harvey. Valued at $1.1B after an $87M Series B.
Build and share ML web apps in Python with a few lines of code
Gradio is an open-source Python library for creating interactive web interfaces for machine learning models. It supports any data type including images, audio, video, 3D objects, and dataframes. Apps deploy for free on Hugging Face Spaces with auto-scaling, or can be shared via temporary public links. Gradio 5 adds server-side rendering, WebRTC streaming, and an AI Playground for generating apps.
Distributed AI compute engine for scaling Python and ML workloads
Ray is an open-source distributed computing framework built for scaling AI and Python applications from a laptop to thousands of GPUs. It provides libraries for distributed training, hyperparameter tuning, model serving, reinforcement learning, and data processing under a single unified API. Used by OpenAI for ChatGPT training, as well as by Uber, Shopify, and Instacart. Maintained by Anyscale and part of the PyTorch Foundation.
Unified framework for fine-tuning 100+ large language models
LLaMA-Factory is an open-source toolkit providing a unified interface for fine-tuning over 100 LLMs and vision-language models. It supports SFT, RLHF with PPO and DPO, LoRA and QLoRA for memory-efficient training, and continuous pre-training. The LLaMA Board web UI enables no-code configuration, while CLI and YAML workflows serve advanced users. Integrates with Hugging Face, ModelScope, vLLM, and SGLang for model deployment.
Python toolkit for assessing and mitigating ML model fairness issues
Fairlearn is a Microsoft-backed open-source Python toolkit that helps developers assess and improve the fairness of machine learning models. It provides metrics for measuring disparity across groups defined by sensitive features, mitigation algorithms that reduce unfairness while maintaining model performance, and an interactive visualization dashboard for exploring fairness-accuracy trade-offs. Integrated with scikit-learn and Azure ML's Responsible AI dashboard.
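One of the core disparity metrics Fairlearn computes can be shown in plain Python (this is an illustration of the metric's definition, not Fairlearn's own API, which wraps it in `MetricFrame` and scikit-learn conventions):

```python
# Demographic parity difference: the gap between the highest and lowest
# selection rate (fraction of positive predictions) across groups
# defined by a sensitive feature.

def selection_rates(y_pred, groups):
    totals, positives = {}, {}
    for pred, g in zip(y_pred, groups):
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + pred
    return {g: positives[g] / totals[g] for g in totals}

def demographic_parity_difference(y_pred, groups):
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())

y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
demographic_parity_difference(y_pred, groups)   # 0.75 - 0.25 = 0.5
```

A value of 0 means both groups are selected at equal rates; Fairlearn's mitigation algorithms search for models that shrink this gap with minimal accuracy loss.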
Open-source control plane for AI workloads across multi-cloud GPU infrastructure
dstack is an open-source platform that orchestrates AI training and inference workloads across heterogeneous GPU infrastructure spanning multiple clouds, Kubernetes clusters, and bare-metal servers. It abstracts away cloud-specific APIs so teams define GPU requirements declaratively and dstack automatically provisions the cheapest available resources from AWS, GCP, Azure, Lambda, or on-premises hardware.
Microsoft's framework for running 1-bit large language models on consumer CPUs
BitNet is Microsoft's official inference framework for 1-bit quantized large language models that enables running models with up to 100 billion parameters on standard consumer CPUs without requiring a GPU. By leveraging extreme quantization where weights use only 1.58 bits on average, BitNet achieves dramatic reductions in memory footprint and computational cost while maintaining competitive output quality for many practical use cases.
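The "1.58 bits" comes from constraining weights to {-1, 0, +1}, which carries log2(3) ≈ 1.58 bits of information per weight. A pure-Python sketch of the absmean rounding rule from the BitNet b1.58 paper (this illustrates the quantization idea, not the bitnet.cpp runtime):

```python
# Ternarize weights with a per-tensor scale, then compute a
# multiply-free inner product: only adds, subtracts, and skips.

def ternarize(weights):
    gamma = sum(abs(w) for w in weights) / len(weights)   # mean |w| scale
    q = [max(-1, min(1, round(w / (gamma + 1e-8)))) for w in weights]
    return q, gamma

def dot_ternary(q, gamma, x):
    acc = 0.0
    for w, xi in zip(q, x):
        if w == 1:
            acc += xi
        elif w == -1:
            acc -= xi        # w == 0 contributes nothing
    return gamma * acc

w = [0.9, -0.8, 0.05, 1.1]
q, gamma = ternarize(w)      # q = [1, -1, 0, 1]
```

Because the inner loop never multiplies, the kernel reduces to additions over packed ternary weights, which is why CPUs without any matrix hardware can keep up.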