PurpleLlama provides a comprehensive toolkit for LLM security that goes beyond simple content filtering. Llama Guard is a family of models purpose-trained for safety classification — they evaluate prompts and responses against configurable safety taxonomies and return structured verdicts. Unlike rule-based filters, Llama Guard understands context and nuance, reducing both false positives and bypasses. The latest Llama Guard 4 extends this to multimodal inputs.
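Llama Guard's verdict arrives as plain text: a first line reading "safe" or "unsafe", and, when unsafe, a second line listing the violated taxonomy codes (e.g. "S1,S9" in Llama Guard 3's S1–S14 scheme). A minimal parser sketch for that output format — the `Verdict` dataclass and function name are my own, not part of the library:

```python
from dataclasses import dataclass, field

@dataclass
class Verdict:
    safe: bool
    categories: list = field(default_factory=list)  # e.g. ["S1", "S9"]

def parse_verdict(output: str) -> Verdict:
    """Parse a Llama Guard-style verdict string into a structured result."""
    lines = [ln.strip() for ln in output.strip().splitlines() if ln.strip()]
    if not lines or lines[0].lower() == "safe":
        return Verdict(safe=True)
    # When unsafe, the next line holds comma-separated category codes.
    cats = lines[1].split(",") if len(lines) > 1 else []
    return Verdict(safe=False, categories=[c.strip() for c in cats])
```

Feeding `"unsafe\nS1,S9"` through `parse_verdict` yields a blocked verdict with categories `["S1", "S9"]`, which downstream code can log or act on without string matching at every call site.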
LlamaFirewall implements defense-in-depth with multiple protection layers: prompt injection detection using PromptGuard, agent misalignment monitoring for tool-calling scenarios, and output content scanning. CodeShield specifically targets insecure code generation, detecting common vulnerabilities (SQL injection, XSS, buffer overflows) in LLM-generated code before it reaches production. CyberSecEval provides standardized benchmarks for measuring an LLM's cybersecurity risks, such as its propensity to generate insecure code or to comply with requests that aid cyberattacks.
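The layered design can be sketched as a chain of scanners where any layer may veto the text. This is a minimal illustration of the defense-in-depth idea, not LlamaFirewall's actual API: the scanner functions and regex rules below are naive stand-ins for PromptGuard's model-based injection detection and CodeShield's insecure-pattern analysis.

```python
import re
from typing import Callable, List, Optional

# A scanner inspects text and returns a block reason, or None to pass.
Scanner = Callable[[str], Optional[str]]

def injection_scanner(text: str) -> Optional[str]:
    # Naive stand-in for PromptGuard-style injection detection.
    if re.search(r"ignore (all )?previous instructions", text, re.I):
        return "possible prompt injection"
    return None

def insecure_code_scanner(text: str) -> Optional[str]:
    # Naive stand-in for CodeShield-style checks: flags SQL built
    # via string formatting, a classic injection-prone pattern.
    if re.search(r'execute\(\s*["\'].*%s.*["\']\s*%', text):
        return "SQL query built via string formatting"
    return None

def run_firewall(text: str, layers: List[Scanner]) -> Optional[str]:
    for layer in layers:
        reason = layer(text)
        if reason:
            return reason  # first objecting layer blocks the text
    return None  # all layers passed

blocked = run_firewall(
    'cursor.execute("SELECT * FROM users WHERE id = %s" % uid)',
    [injection_scanner, insecure_code_scanner],
)
```

The ordering mirrors the real system's intent: cheap input checks run before output-side code scanning, and each layer stays independently replaceable.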
The suite is released under a custom open license and has accumulated 4,100+ GitHub stars. All models run locally without external API calls, making them suitable for air-gapped and regulated environments. Compared to Guardrails AI (which validates structured outputs) or NeMo Guardrails (which controls conversation flows), PurpleLlama focuses specifically on safety classification and security evaluation, using purpose-trained models rather than rule-based validation.