What is Promptfoo?
Promptfoo is an open-source CLI and library for evaluating and red-teaming LLM applications. It helps teams move from ad-hoc prompt testing to repeatable evaluation workflows that can run locally, in CI, or as part of release gates. The core use cases are prompt regression testing, model comparison, assertion-based checks, security probes, and reviewing outputs through a web interface. It is best understood as the testing layer of an LLMOps stack rather than a full production observability platform.
How Promptfoo works
Promptfoo configurations define prompts, providers, test cases, and optional assertions. A typical workflow is to create a config, run an eval, inspect the comparison matrix in the web view, and commit the test suite alongside application code. That structure makes prompt and model changes easier to review in pull requests. Instead of relying on a developer's memory of whether a prompt used to work, teams can define expected behavior and rerun the same checks whenever prompts, retrieval logic, model versions, or provider settings change.
Evaluation and regression testing
Promptfoo is especially useful as a pre-deployment quality gate. Teams can test prompt variants, compare providers, validate JSON structure, check for expected facts, and use custom assertions for application-specific behavior. This is valuable for RAG systems, support bots, extraction pipelines, and agents where small prompt changes can cause major output regressions. The caveat is that Promptfoo does not magically create good evals; teams still need representative datasets, clear pass/fail criteria, and periodic human review of test quality.
Red teaming and security testing
Promptfoo also includes a red-team workflow for simulated adversarial inputs. Its documentation positions this for finding vulnerabilities before deployment, including many vulnerability categories and test paths through HTTP APIs, browsers, or direct model access. This is useful for prompt injection, jailbreak, data leakage, harmful content, and policy-bypass scenarios. Red-team output should be treated as a risk signal and regression source, not a replacement for human security review or domain-specific threat modeling.
CI/CD and engineering workflow
Promptfoo's strongest fit is engineering-driven LLM development. Config files can live in source control, run in GitHub Actions or other CI systems, and produce artifacts that help reviewers understand behavior changes. A practical setup is to run a fast regression suite on pull requests, a larger nightly eval suite, and scheduled red-team scans for sensitive workflows. This makes LLM behavior part of the software delivery pipeline instead of a manual QA step that happens after release.
Where Promptfoo is weaker
Promptfoo is not primarily a production trace analytics or user-feedback platform. It can help evaluate prompts and models, but teams that need request logging, trace search, feedback capture, labeling, datasets, and production-to-eval loops will usually pair it with Langfuse, Helicone, Confident AI, or custom telemetry. LLM-as-judge and red-team runs also consume model/API tokens, so large eval suites need budget control and sampling strategy.