What This Stack Does
PurpleLlama provides Meta's Llama Guard models for content safety classification and CodeShield for detecting insecure code. Agentic Radar scans agent workflows for vulnerabilities, including those involving MCP. DeepEval offers pytest-native LLM testing with 50+ metrics. Promptfoo adds CLI-first evaluation with red-teaming attack generation. TruLens provides experiment tracking with the RAG Triad metrics (context relevance, groundedness, and answer relevance). Langfuse delivers observability for production monitoring.
The Bottom Line
Deploy PurpleLlama for content safety, Agentic Radar for agent architecture auditing, DeepEval and Promptfoo for pre-deployment testing, TruLens for experiment tracking, and Langfuse for production observability. Together, these tools cover the lifecycle from development testing through production monitoring.
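The role-to-tool assignment above can be written down as a small lookup table, which makes it easy to check whether a given role has a tool behind it. This is a minimal sketch in plain Python: the role and tool names come from the text, while `STACK`, `tools_for`, and `uncovered` are hypothetical names introduced for illustration and are not part of any of these tools' APIs.

```python
# Hypothetical role-to-tool map. Roles and tool names are taken from the text;
# the data structure and helper functions are illustrative assumptions only.
STACK: dict[str, list[str]] = {
    "content safety": ["PurpleLlama"],
    "agent architecture auditing": ["Agentic Radar"],
    "pre-deployment testing": ["DeepEval", "Promptfoo"],
    "experiment tracking": ["TruLens"],
    "production observability": ["Langfuse"],
}


def tools_for(role: str) -> list[str]:
    """Return the tools assigned to a role, or an empty list if none."""
    return list(STACK.get(role, []))


def uncovered(required_roles: list[str]) -> list[str]:
    """Return the required roles that have no tool assigned."""
    return [role for role in required_roles if not STACK.get(role)]


if __name__ == "__main__":
    # Print the stack as one "role: tools" line per role.
    for role, tools in STACK.items():
        print(f"{role}: {', '.join(tools)}")
```

Calling `uncovered` with a team's own list of required roles would surface any role the stack does not address. Such a role would need a tool chosen separately.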