aicoolies logo

PurpleLlama

Meta's open-source LLM security suite with Llama Guard and CodeShield

Share
open-sourceOpen Source
Visit Website →

PurpleLlama is Meta's open-source suite of tools for evaluating and improving LLM safety. It includes Llama Guard models for input/output content safety classification, LlamaFirewall for multi-layer defense, CodeShield for insecure code detection, and CyberSecEval benchmarks for measuring LLM security. Llama Guard 4 supports multimodal safety across text and images. 4,100+ GitHub stars, backed by Meta AI with 44+ contributors.

We have a review for this tool

A detailed review by the aicoolies team — click to read

PurpleLlama provides a comprehensive toolkit for LLM security that goes beyond simple content filtering. Llama Guard is a family of models purpose-trained for safety classification — they evaluate prompts and responses against configurable safety taxonomies and return structured verdicts. Unlike rule-based filters, Llama Guard understands context and nuance, reducing both false positives and bypasses. The latest Llama Guard 4 extends this to multimodal inputs.

LlamaFirewall implements defense-in-depth with multiple protection layers: prompt injection detection using PromptGuard, agent misalignment monitoring for tool-calling scenarios, and output content scanning. CodeShield specifically targets insecure code generation, detecting common vulnerabilities (SQL injection, XSS, buffer overflows) in LLM-generated code before it reaches production. CyberSecEval provides standardized benchmarks for measuring how well an LLM resists generating harmful content.

The suite is released under a custom open license with 4,100+ GitHub stars. All models run locally without external API calls, making them suitable for air-gapped and regulated environments. Compared to Guardrails AI (which validates structured outputs) or NeMo Guardrails (which controls conversation flows), PurpleLlama focuses specifically on safety classification and security evaluation with purpose-trained models rather than rule-based validation.

Pricing

Free and open-source (custom Meta license)

Platforms

Python, runs locally, models downloadable from HuggingFace

Categories

Tags

Use Cases

Alternatives

Guardrails AI logo

Guardrails AI

Validate and structure LLM outputs with composable Guards

Guardrails AI is an open-source Python and JavaScript framework for validating and structuring LLM outputs using composable Guards built from a Hub of pre-built validators. It handles structured data extraction with Pydantic models, content safety checks including toxicity, PII detection, competitor mentions, and bias filtering, plus automatic re-prompting when validation fails. The Guardrails Hub offers dozens of validators from regex matching to hallucination detection via LLM judges.

free

NeMo Guardrails

Programmable safety rails for LLM applications

NeMo Guardrails is NVIDIA's open-source toolkit for adding programmable safety rails to LLM applications. It supports five guardrail types — input, dialog, retrieval, execution, and output rails — covering content safety, jailbreak detection, topic control, PII masking, hallucination detection, and fact-checking. The toolkit uses Colang, a domain-specific language for defining conversational constraints, and integrates with OpenAI, Azure, Anthropic, HuggingFace, and LangChain/LangGraph.

free
garak logo

garak

NVIDIA's LLM vulnerability scanner and red-teaming tool

garak is NVIDIA's open-source LLM vulnerability scanner for red-teaming AI models and applications. Probes for prompt injection, data leakage, hallucination, toxicity, encoding-based attacks, and dozens of other vulnerability categories. Runs automated attack sequences against any LLM endpoint and generates detailed vulnerability reports. Features a modular probe/detector architecture that is extensible with custom attack patterns. Named after the Star Trek character known for deception.

open-sourceOpen Source

Related Tools

Agent Governance Toolkit logo

Agent Governance Toolkit

Microsoft’s public-preview runtime governance toolkit for policy, identity, sandboxing, audit, and MCP security around AI agents.

Agent Governance Toolkit is Microsoft’s MIT-licensed public-preview toolkit for governing AI agent runtimes. It adds policy enforcement, zero-trust identity, execution sandboxing, audit, reliability, and MCP security-gateway patterns around tool calls and autonomous actions, helping platform teams move beyond prompt-only guardrails while preserving architecture review requirements.

open-sourceOpen SourceTelemetry
Baz logo

Baz

Telemetry-aware AI code reviewer that checks how pull requests may affect real services.

Baz is an AI code-review platform focused on production-aware pull requests. Instead of only reading the diff, Baz connects code changes to application telemetry so reviewers can understand what endpoints, services, and runtime behavior may be affected. That makes it a useful complement to existing AI PR bots when the question is not just whether a change looks correct, but whether it could break a live system.

freemiumTelemetry
rampart

Rampart

Microsoft’s pytest-native red teaming framework for turning AI agent safety findings into CI tests.

RAMPART is an open-source Microsoft framework for safety and security testing of agentic AI applications. It brings red-team findings into a pytest-native workflow so teams can turn prompt injection, unsafe tool use, and behavioral boundary failures into repeatable regression tests. The strongest aicoolies angle is developer workflow: RAMPART makes agent safety part of CI/CD instead of a one-off security review.

open-sourceOpen Source
Statewright logo

Statewright

State-machine guardrails for controlling which tools AI coding agents can use at each phase.

Statewright is a guardrail layer for AI coding agents that uses explicit state machines to control what an agent can do at each stage of a workflow. Instead of relying only on prompt instructions, teams can model phases such as plan, implement, test, and review, then constrain tool access for clients like Claude Code, Codex, Cursor, opencode, and related MCP workflows.

open-sourceOpen Source
Magika logo

Magika

AI-powered file-type detection at Google scale

Open-source AI-powered file-type detection tool from Google that uses a custom deep-learning model under a few megabytes to identify more than 200 binary and textual content types in milliseconds, even on a single CPU. Magika ships as a CLI, Python package, JavaScript/TypeScript library, and an ONNX model, achieves around 99% accuracy on its test set, and is already used at Google scale across Gmail, Drive, and Safe Browsing as well as by VirusTotal and abuse.ch.

freeOpen Source
Trent AI logo

Trent AI

Agentic AI security posture management

Trent AI is a specialized security platform for agentic AI applications providing AI Security Posture Management that compounds with every development cycle. Scans, judges, mitigates, and evaluates AI agent security detecting threats traditional tools miss including prompt injection attacks, tool misuse, unintended autonomous actions, data exfiltration through agent chains, and privilege escalation. Offers continuous assessment with remediation plan execution through Claude Code.

paid

Used in Stacks

Comparisons