Instructor is the most popular Python library for extracting structured, validated data from large language models, with over 3 million monthly downloads and support across Python, TypeScript, Go, Ruby, Elixir, and Rust. It solves the challenge of getting reliable, schema-conformant outputs from LLMs by using Pydantic models to define output schemas and automatically handling validation, retries, and error correction when the model output does not match the expected structure. Instructor provides a thin, zero-cost abstraction that patches existing LLM client libraries rather than replacing them, preserving full access to the underlying API features.
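The core pattern can be sketched as follows. This is a minimal illustration, not a complete Instructor program: the client call is shown only in comments because it requires an API key, and the model string is a placeholder. The runnable part demonstrates the Pydantic validation step that Instructor applies to raw model output, including type coercion.

```python
from pydantic import BaseModel

# The output schema is an ordinary Pydantic model.
class UserInfo(BaseModel):
    name: str
    age: int

# With Instructor, the schema is passed as response_model, roughly:
#   import instructor
#   client = instructor.from_provider("openai/gpt-4o-mini")  # placeholder model
#   user = client.chat.completions.create(
#       response_model=UserInfo,
#       messages=[{"role": "user", "content": "John is 25 years old"}],
#   )
# Under the hood, the raw model output is validated against the schema:
raw = {"name": "John", "age": "25"}     # age arrives as a string from the LLM
user = UserInfo.model_validate(raw)     # Pydantic coerces "25" -> 25
print(user.age)                         # 25
```

If the raw output cannot be coerced to match the schema, Pydantic raises a `ValidationError`, which is what triggers Instructor's retry machinery described below.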
Instructor differentiates itself with automatic retry logic that feeds validation errors back to the model for self-correction, semantic validation for checking outputs against criteria richer than simple type checks, and streaming support for processing structured data as it arrives. The library supports 15+ providers including OpenAI, Anthropic, Google Gemini, Mistral, Cohere, Ollama, and DeepSeek through a unified from_provider() interface. Recent integrations with the OpenAI Responses API, comprehensive multi-modal support, and the llms.txt specification for AI-readable documentation keep Instructor at the forefront of structured output tooling.
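The retry-on-validation-error loop can be illustrated with a stdlib-only simulation. Everything here is a hypothetical stand-in, not Instructor internals: `fake_llm` simulates a model that corrects itself on the second attempt, `validate` plays the role of Pydantic, and `MAX_RETRIES` mimics the configurable retry limit.

```python
import json

MAX_RETRIES = 3  # hypothetical limit; in Instructor the retry count is configurable

def validate(payload: dict) -> list[str]:
    """Return a list of schema errors; empty when the payload conforms."""
    errors = []
    if not isinstance(payload.get("name"), str):
        errors.append("name must be a string")
    if not isinstance(payload.get("age"), int):
        errors.append("age must be an integer")
    return errors

def fake_llm(prompt: str, attempt: int) -> str:
    """Stand-in for a model call: returns bad JSON first, then a corrected reply."""
    if attempt == 0:
        return json.dumps({"name": "John", "age": "twenty-five"})
    return json.dumps({"name": "John", "age": 25})

def extract(prompt: str) -> dict:
    for attempt in range(MAX_RETRIES):
        payload = json.loads(fake_llm(prompt, attempt))
        errors = validate(payload)
        if not errors:
            return payload
        # This is the key move: the validation errors are appended to the
        # conversation so the model can self-correct on the next attempt.
        prompt += "\nFix these validation errors: " + "; ".join(errors)
    raise ValueError("schema validation failed after retries")

result = extract("Extract the user from: John is 25 years old")
print(result)  # {'name': 'John', 'age': 25}
```

The first attempt fails validation ("age must be an integer"), the error text is fed back into the prompt, and the second attempt conforms, so the loop returns after two calls.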
Instructor is designed for developers and data engineers who need to extract structured information from LLM responses in production applications, from simple classification tasks to complex multi-field extraction pipelines. It integrates seamlessly with existing OpenAI, Anthropic, and other provider client libraries, making it easy to add structured output capabilities to any existing LLM workflow with minimal code changes. The library is particularly well-suited for data pipelines, content classification, entity extraction, and any use case where LLM outputs need to conform to a predefined schema with guarantees of type safety and validation.
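For classification-style tasks like those above, constraining a field to a fixed label set is enough to guarantee the output stays on-schema. A minimal sketch, assuming the `pydantic` package; the `TicketLabel` model and its categories are invented for illustration:

```python
from typing import Literal
from pydantic import BaseModel, ValidationError

# A hypothetical classification schema: Literal restricts the model
# to exactly these labels, so off-schema answers fail validation.
class TicketLabel(BaseModel):
    category: Literal["billing", "bug", "feature_request"]
    confidence: float

# A conforming output validates cleanly...
label = TicketLabel.model_validate({"category": "bug", "confidence": 0.92})
print(label.category)  # bug

# ...while an out-of-vocabulary label is rejected, which in an
# Instructor pipeline would trigger a retry rather than bad data downstream.
try:
    TicketLabel.model_validate({"category": "spam", "confidence": 0.5})
except ValidationError:
    print("off-schema label rejected")
```

Used as a `response_model`, a schema like this turns free-form LLM text into a typed value the rest of the pipeline can trust.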