Run and fine-tune Vision Language Models locally on Mac
Open-source Python package for running and fine-tuning Vision Language Models locally on Mac using Apple's MLX framework. Supports multimodal inference with images, audio, and video across Qwen, DeepSeek, Phi, and Gemma architectures. Features OpenAI-compatible API server, Gradio chat UI, and KV cache optimization. 3.8K+ GitHub stars.