aicoolies logo

ps-fuzz Review: Open-Source Prompt Security Fuzzer for Hardening LLM System Prompts

ps-fuzz (Prompt Security Fuzzer) is an open-source tool from Prompt Security that tests GenAI application system prompts against 16 different dynamic LLM-based attacks across 16 LLM providers. It provides interactive and CLI modes for iterative prompt hardening with multi-threaded testing. The tool dynamically adapts attacks based on your prompt's context and domain. Free and open-source with a community-driven approach to expanding attack types.

Reviewed by Raşit Akyol on March 31, 2026

Share
Overall
72
Speed
70
Privacy
85
Dev Experience
74

What ps-fuzz Does

ps-fuzz, the Prompt Security Fuzzer, is an open-source security testing tool built by Prompt Security specifically for GenAI applications. Its core function is to assess the security of your application's system prompt by simulating various LLM-based attacks and reporting which ones succeed in breaking through your prompt's defenses. The tool then provides a security evaluation that guides developers in strengthening their system prompts iteratively.

Attack Types and Dynamic Fuzzing

The fuzzer supports 16 different attack types that cover the major categories of prompt-based vulnerabilities. These include AIM Jailbreak which probes ethical compliance through roleplay scenarios, DAN (Do Anything Now) jailbreaks that test resilience against unrestricted persona adoption, Amnesia attacks that attempt to make the LLM forget system instructions, Typoglycemia attacks that exploit text processing by omitting characters, and System Prompt Stealer attempts to extract internal configuration. Each attack type is dynamically tailored to your application's specific context.

What sets ps-fuzz apart from static prompt injection datasets is its dynamic testing approach. Rather than firing the same generic payloads at every application, the fuzzer reads your system prompt, understands its context and domain, and adapts its attack generation accordingly. This produces more realistic and meaningful test results because the attacks are contextually relevant to what your application actually does.

Provider Support and Modes

The tool supports 16 different LLM providers including OpenAI, Anthropic, Azure OpenAI, Google PaLM, Cohere, and many others. This LLM-agnostic design means teams can test against whichever provider they are deploying to production, ensuring the security evaluation reflects real-world behavior. Configuration is handled through environment variables for API keys, with a .env file option for convenience.

ps-fuzz offers both interactive and CLI modes. The interactive Playground mode is particularly valuable because it lets developers iterate on their system prompt in real time, running the fuzzer after each modification to see if hardening attempts actually improve security. The CLI mode supports automation and CI/CD integration with multi-threaded testing for faster execution. Custom benchmarks can be loaded from CSV files for organization-specific attack scenarios.

Reporting and Installation

Results are reported in three categories: Broken (attacks that the LLM succumbed to), Resilient (attacks the LLM resisted), and Errors (inconclusive results). This clear categorization makes it straightforward to identify which attack vectors your system prompt is vulnerable to and prioritize hardening efforts. The tool includes example system prompts of varying security strengths for benchmarking purposes.

Installation is simple via pip as a Python package. The extensible architecture allows anyone to contribute new attack types by following a straightforward pattern: create a new Python file in the attacks directory, implement a test class, and register it. The project actively encourages community contributions of novel attack techniques, maintaining a growing library of increasingly sophisticated test scenarios.

Limitations and Complementary Use

There are important limitations to understand. As with all prompt fuzzing tools, ps-fuzz tests a finite set of known attack patterns against the essentially infinite attack surface of natural language. A clean fuzzing report does not guarantee prompt security; it means the prompt resisted the specific attacks tested. The tool consumes LLM tokens during testing, which can add up during extensive fuzzing campaigns with multiple attack types and iterations.

ps-fuzz works well as a complement to runtime guardrail solutions like NeMo Guardrails or LLM Guard. While those tools provide real-time protection during inference, ps-fuzz operates at development time to strengthen the system prompt itself. Used together, they create a layered defense where the prompt is hardened against known attacks and runtime guardrails catch novel or zero-day attack patterns.

The Bottom Line

For the current state of LLM security tooling, ps-fuzz provides genuine value at zero cost. The fact that it dynamically adapts attacks rather than relying purely on static payloads puts it ahead of many alternatives, though it is worth noting that more comprehensive commercial platforms like Promptfoo offer deeper customization and broader vulnerability coverage. As a free first step in prompt security testing, ps-fuzz is hard to beat.

Pros

  • Dynamic attack adaptation tailors test scenarios to your specific system prompt context and domain rather than using generic static payloads
  • Supports 16 LLM providers including OpenAI Anthropic Azure Cohere and Google PaLM enabling testing against your actual production provider
  • Interactive Playground mode enables iterative prompt hardening with immediate feedback on whether changes improve security
  • 16 built-in attack types covering jailbreaks prompt stealing amnesia typoglycemia ethical compliance and more
  • Multi-threaded CLI mode supports CI/CD integration and automated testing with custom benchmarks from CSV files
  • Completely free and open-source with community-driven attack library that grows as contributors add new techniques
  • Clean extensible architecture makes adding custom attack types straightforward following a simple Python class pattern

Cons

  • Testing a finite set of known attack patterns against the infinite attack surface of natural language means clean results do not guarantee security
  • Consumes LLM API tokens during fuzzing campaigns which can accumulate significant costs during thorough multi-attack multi-iteration testing
  • Smaller attack library compared to commercial alternatives like Promptfoo which offer broader vulnerability coverage and deeper customization
  • Results can be noisy with inconclusive errors category that requires manual interpretation to determine actual security implications
  • No built-in reporting dashboard or historical tracking of prompt security improvements across development iterations

Verdict

ps-fuzz fills an important gap in the AI security toolchain by providing a structured way to test system prompts against known attack patterns before deploying LLM applications to production. The dynamic adaptation of attacks based on your specific prompt context is genuinely more useful than static payload libraries, though the tool is still limited by the fundamental challenge that LLM attack surfaces are vastly larger than traditional injection vectors. The interactive Playground mode for iterative prompt hardening is the standout feature, letting teams strengthen their prompts through multiple rounds of testing. As a free, open-source tool it should be part of every LLM application development workflow, but teams should understand it as one layer of defense rather than a comprehensive security solution.

View ps-fuzz on aicoolies

Pricing, platforms, and community stacks — explore the full tool page

Alternatives to ps-fuzz

PromptLayer logo

PromptLayer

Prompt registry, observability, and evaluation workflows for LLM applications.

PromptLayer is a prompt management, observability, and evaluation platform for LLM applications. Teams use its Prompt Registry, visual editor, request logs, Tables, evaluations, Tool Registry, and Skill Collections to version prompts, replay requests, compare variants, run datasets, and ship prompt changes without redeploying code. Pricing starts with Free $0 for 5 users and 2.5K requests/month, Pro $49/month, Team $500/month, and Enterprise custom.

freemium
Guardrails AI logo

Guardrails AI

Validate and structure LLM outputs with composable Guards

Guardrails AI is an open-source Python and JavaScript framework for validating and structuring LLM outputs using composable Guards built from a Hub of pre-built validators. It handles structured data extraction with Pydantic models, content safety checks including toxicity, PII detection, competitor mentions, and bias filtering, plus automatic re-prompting when validation fails. The Guardrails Hub offers dozens of validators from regex matching to hallucination detection via LLM judges.

free

NeMo Guardrails

Programmable safety rails for LLM applications

NeMo Guardrails is NVIDIA's open-source toolkit for adding programmable safety rails to LLM applications. It supports five guardrail types — input, dialog, retrieval, execution, and output rails — covering content safety, jailbreak detection, topic control, PII masking, hallucination detection, and fact-checking. The toolkit uses Colang, a domain-specific language for defining conversational constraints, and integrates with OpenAI, Azure, Anthropic, HuggingFace, and LangChain/LangGraph.

free