aicoolies logo
rampart

Rampart

Microsoft’s pytest-native red teaming framework for turning AI agent safety findings into CI tests.

Share
open-sourceOpen Source
Visit Website →

RAMPART is an open-source Microsoft framework for safety and security testing of agentic AI applications. It brings red-team findings into a pytest-native workflow so teams can turn prompt injection, unsafe tool use, and behavioral boundary failures into repeatable regression tests. The strongest aicoolies angle is developer workflow: RAMPART makes agent safety part of CI/CD instead of a one-off security review.

RAMPART is Microsoft’s pytest-native framework for safety and security testing of agentic AI applications. Its key idea is to make agent red teaming look more like normal engineering work: define scenarios, run tests, capture failures and keep those checks in CI. That is a useful shift for teams building agents that browse, call tools, write files or interact with untrusted data sources, because safety can become part of the same loop as unit tests and pull requests.

The framework is especially relevant because agent failures are often behavioral and probabilistic rather than simple string-output bugs. RAMPART is designed to help teams turn prompt injection, unsafe tool use, data exfiltration and boundary-violation findings into repeatable regression tests. Its Microsoft/PyRIT lineage and pytest shape make it easier to connect security research with code-owned test suites, statistical evaluation and pull request gates.

RAMPART is not a substitute for thoughtful threat modeling or expert red team review. Poor scenarios will still create false confidence, and model-based testing can introduce cost and flakiness if it is not designed carefully. The value is in making agent safety continuous: developers can keep known failures from returning and add new adversarial cases as the agent gains more tools and autonomy. It belongs beside tools like garak, PyRIT and Promptfoo rather than replacing every security workflow.

Pricing

Free and open source under MIT; teams still pay for any models, infrastructure, or CI resources used during test generation and evaluation.

Platforms

Python/pytest-native framework with GitHub repository, documentation site, PyRIT lineage, and CI/CD fit for agent applications.

Categories

Tags

Use Cases

Alternatives

garak logo

garak

NVIDIA's LLM vulnerability scanner and red-teaming tool

garak is NVIDIA's open-source LLM vulnerability scanner for red-teaming AI models and applications. Probes for prompt injection, data leakage, hallucination, toxicity, encoding-based attacks, and dozens of other vulnerability categories. Runs automated attack sequences against any LLM endpoint and generates detailed vulnerability reports. Features a modular probe/detector architecture that is extensible with custom attack patterns. Named after the Star Trek character known for deception.

open-sourceOpen Source

PyRIT

Microsoft's automated red teaming framework for AI systems

PyRIT (Python Risk Identification Toolkit) is Microsoft's open-source framework for automated red teaming of generative AI systems. It enables security researchers to probe LLMs for jailbreaks, prompt injection, content safety bypasses, and harmful output generation using multi-turn attack strategies, scoring engines, and orchestrated adversarial workflows. Supports multiple target models and integrates with Azure AI services.

open-sourceOpen Source
Promptfoo logo

Promptfoo

LLM testing and evaluation toolkit

Open-source tool for testing, evaluating, and red-teaming LLM applications. Promptfoo lets developers define test cases, run prompts across multiple models and configurations, and score outputs with built-in metrics like factuality, relevance, and toxicity. Includes red-teaming for jailbreak and hallucination detection plus CI/CD integration for automated prompt regression testing.

open-sourceOpen Source
Guardrails AI logo

Guardrails AI

Validate and structure LLM outputs with composable Guards

Guardrails AI is an open-source Python and JavaScript framework for validating and structuring LLM outputs using composable Guards built from a Hub of pre-built validators. It handles structured data extraction with Pydantic models, content safety checks including toxicity, PII detection, competitor mentions, and bias filtering, plus automatic re-prompting when validation fails. The Guardrails Hub offers dozens of validators from regex matching to hallucination detection via LLM judges.

free

Related Tools

Baz logo

Baz

Telemetry-aware AI code reviewer that checks how pull requests may affect real services.

Baz is an AI code-review platform focused on production-aware pull requests. Instead of only reading the diff, Baz connects code changes to application telemetry so reviewers can understand what endpoints, services, and runtime behavior may be affected. That makes it a useful complement to existing AI PR bots when the question is not just whether a change looks correct, but whether it could break a live system.

freemiumTelemetry
Statewright logo

Statewright

State-machine guardrails for controlling which tools AI coding agents can use at each phase.

Statewright is a guardrail layer for AI coding agents that uses explicit state machines to control what an agent can do at each stage of a workflow. Instead of relying only on prompt instructions, teams can model phases such as plan, implement, test, and review, then constrain tool access for clients like Claude Code, Codex, Cursor, opencode, and related MCP workflows.

open-sourceOpen Source
OpenHuman logo

OpenHuman

Local-first personal AI agent with memory trees, desktop integrations, and private workspace context.

OpenHuman is an open-source, local-first personal AI agent from TinyHumans. It combines a desktop app, persistent memory trees, Obsidian-compatible storage, OAuth integrations, and local model support into a private assistant harness. It is most interesting for users who want agentic workflows and long-term memory without handing every context detail to a fully cloud-hosted assistant.

open-sourceOpen SourceTelemetry
Unabyss logo

Unabyss

MCP-native personal context vault for keeping AI agents aligned with your work, voice, and projects.

Unabyss is a personal context headquarters for AI agents. It syncs sources such as email, Slack, Notion, Drive, meetings, and professional profiles into structured context files that can be served to MCP-capable clients. The strongest angle is not generic note taking; it is permissioned, reusable context for Claude, Cursor, custom agents, and other tools that otherwise need the same background explained repeatedly.

freemiumTelemetry
Re_gent logo

Re_gent

Version control for AI coding-agent actions

Re_gent is an open-source version-control layer for AI coding-agent activity. Instead of only reviewing the final Git diff, it records what the agent attempted, changed, and executed along the way so teams can trace, undo, and govern autonomous coding work. It fits Claude Code, Codex, Cursor, and multi-agent teams that need an audit trail between prompt and pull request.

open-sourceOpen Source

agentmemory

Persistent memory layer for AI coding agents — keeps Claude Code, Codex, Cursor, and any MCP agent in context across sessions

agentmemory is an open-source MCP server that gives AI coding agents persistent, cross-session memory. Built on hybrid vector-graph search, it achieves 95.2% recall on the LongMemEval-S benchmark while using up to 92% fewer context tokens than naive context injection. Works out of the box with Claude Code, Codex, Cursor, Windsurf, Cline, OpenCode, Kilo Code, Hermes, and any MCP client through 51 MCP tools plus 12 hooks and 4 skills.

open-sourceOpen Source