Guidance is a structured generation framework from Microsoft that constrains large language model outputs at the token level during decoding. Instead of relying on prompt engineering and hoping the model returns valid JSON, developers define schemas, regex patterns, or context-free grammars, and Guidance masks any token that would violate them at each decoding step, so the output conforms by construction. This approach eliminates parsing failures, retry loops, and the brittle post-processing code that typically surrounds LLM calls in production pipelines.
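To make the token-level idea concrete, here is a minimal sketch in plain Python. It does not use the Guidance API. The vocabulary, the `toy_logits` scoring function, and the helper names are all hypothetical stand-ins, and a real implementation would work on tokenizer IDs and model logits. The point is only that tokens which cannot lead to a valid output are removed before one is picked.

```python
# Toy vocabulary (assumption); a real tokenizer has tens of thousands of entries.
VOCAB = ["pos", "neg", "itive", "ative", "neutral", "maybe", " ", "!"]


def toy_logits(prefix: str) -> dict[str, float]:
    """Hypothetical stand-in for a model: one score per vocabulary token.

    It deliberately prefers "maybe", which is not an allowed answer, to show
    that the mask (not the model) enforces the constraint.
    """
    scores = {tok: 0.0 for tok in VOCAB}
    scores["maybe"] = 5.0
    scores["neg"] = 2.0
    scores["ative"] = 1.0
    return scores


def allowed_tokens(prefix: str, choices: list[str]) -> list[str]:
    """Tokens that keep prefix + token a prefix of at least one allowed choice."""
    return [
        tok for tok in VOCAB
        if any(choice.startswith(prefix + tok) for choice in choices)
    ]


def constrained_decode(choices: list[str]) -> str:
    """Greedy decoding where only constraint-preserving tokens may be chosen."""
    out = ""
    while out not in choices:
        legal = allowed_tokens(out, choices)
        if not legal:
            raise ValueError(f"vocabulary cannot complete any choice from {out!r}")
        scores = toy_logits(out)
        # Mask step: illegal tokens never compete, however high their score.
        out += max(legal, key=lambda tok: scores[tok])
    return out


result = constrained_decode(["positive", "negative", "neutral"])
print(result)
```

Even though the toy model scores "maybe" highest, that token is never a candidate, so the result is always one of the allowed strings and no retry or post-hoc validation is needed.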
The library integrates directly with popular model runtimes including llama.cpp, HuggingFace Transformers, and vLLM for local inference, plus remote providers like OpenAI and Anthropic via their APIs. Its Python-native syntax lets developers interleave free-form generation with hard constraints in a single program. For example, a program can generate a product description and then force the model to output a strictly typed JSON object with price, category, and rating fields. Advanced features include stateful multi-turn control flow, subgrammar composition, and token healing for cleaner outputs.
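The product example can be sketched as follows, again in plain Python rather than the Guidance API. The program writes the fixed JSON structure itself and lets the "model" fill only the value slots, each under its own rule. `TOY_MODEL_WISHES`, `SLOT_RULES`, and `fill_slot` are hypothetical names. Filtering characters from a canned string is a simplification of real token masking, where the model would instead choose among the tokens still allowed.

```python
import json
import re

CATEGORIES = ["electronics", "home", "toys"]

# Per slot: (is this a valid partial value?, is this a valid finished value?)
SLOT_RULES = {
    "price": (
        lambda s: re.fullmatch(r"(0|[1-9]\d{0,3})(\.\d{0,2})?", s) is not None,
        lambda s: re.fullmatch(r"(0|[1-9]\d{0,3})\.\d{2}", s) is not None,
    ),
    "category": (
        lambda s: any(c.startswith(s) for c in CATEGORIES),
        lambda s: s in CATEGORIES,
    ),
    "rating": (
        lambda s: s in {"1", "2", "3", "4", "5"},
        lambda s: s in {"1", "2", "3", "4", "5"},
    ),
}

# Hypothetical stand-in for a model: the raw text it "wants" to emit per slot.
TOY_MODEL_WISHES = {
    "description": "A compact wireless speaker with 12-hour battery life.",
    "price": "$49.50 or so",
    "category": "electronics, probably",
    "rating": "about 4 stars",
}


def fill_slot(name: str) -> str:
    """Keep only characters that leave the value a valid prefix; stop when complete."""
    is_prefix, is_complete = SLOT_RULES[name]
    value = ""
    for ch in TOY_MODEL_WISHES[name]:
        if is_prefix(value + ch):
            value += ch
            if is_complete(value):
                return value
    raise ValueError(f"toy model could not complete slot {name!r}")


def run_program() -> tuple[str, dict]:
    # Free-form part: taken from the model without any constraint.
    description = TOY_MODEL_WISHES["description"]
    # Constrained part: the program, not the model, writes every brace,
    # key, quote, and comma; the model only fills the three value slots.
    raw = (
        '{"price": ' + fill_slot("price")
        + ', "category": "' + fill_slot("category")
        + '", "rating": ' + fill_slot("rating") + "}"
    )
    return description, json.loads(raw)


description, product = run_program()
print(description)
print(product)
```

Because the structural characters never pass through the model, `json.loads` cannot fail on a missing brace or a stray comment, and each field already has the expected type when downstream code receives it.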
Guidance is open source under the MIT license, with 19,000+ GitHub stars and strong adoption in enterprise AI pipelines. It is particularly valuable for agent tool-calling, function routing, and any workflow where downstream code depends on predictable LLM output structure. The library also works alongside LangChain, LlamaIndex, and custom orchestration layers, making it a foundational building block for reliable AI applications.
