Superagent is an open-source safety SDK designed to protect AI agents from prompt injection attacks, data leakage, and malicious tool calls in production environments. Originally launched as Safety Agent before rebranding in early 2026, the project provides four core modules that work together as a defense-in-depth layer: Guard intercepts and blocks prompt injections and unsafe tool invocations, Redact automatically strips PII, PHI, and secrets from agent inputs and outputs, Scan analyzes GitHub repositories for AI-targeted supply chain attacks, and Test enables automated red-team evaluations of agent robustness.
The SDK ships with open-weight guard models available on HuggingFace in three sizes from 0.6B to 4B parameters, enabling teams to choose between speed and accuracy based on their latency requirements. At the smallest model size, Guard processes requests in 50-100 milliseconds, making it viable for real-time protection without noticeable user impact. The framework integrates with any LLM provider including OpenAI, Anthropic, Google, Groq, and AWS Bedrock, requiring minimal code changes to add safety layers to existing agent architectures.
Backed by Y Combinator, Superagent has attracted 6,500 GitHub stars and provides TypeScript and Python SDKs alongside a CLI tool and MCP server integration. The MIT-licensed project addresses a growing need in the AI ecosystem as more teams deploy autonomous agents that interact with external tools, APIs, and user data. Its modular design means teams can adopt individual components like Guard or Redact independently before committing to the full safety stack.