Testing & QA
Test runners, visual regression, E2E testing, and QA automation
Safari MCP Server
Apple's Safari-native MCP server for web debugging agents
Safari MCP Server is Apple's safaridriver-based MCP server in Safari Technology Preview, giving compatible coding agents local access to Safari page content, console logs, network requests, screenshots, JavaScript evaluation, interactions, viewport controls, and accessibility/performance checks.
Rampart
Microsoft’s pytest-native red teaming framework for turning AI agent safety findings into CI tests.
RAMPART is an open-source Microsoft framework for safety and security testing of agentic AI applications. It brings red-team findings into a pytest-native workflow so teams can turn prompt injection, unsafe tool use, and behavioral boundary failures into repeatable regression tests. Its main appeal is developer workflow: RAMPART makes agent safety part of CI/CD instead of a one-off security review.
Requestly
One tool for intercepting, mocking, and replaying HTTP — acquired by BrowserStack
Requestly is a BrowserStack-backed API client, HTTP interceptor, mock server, and session replay tool for frontend and QA teams. The current product is commercial and centered on the API client, while the legacy interceptor code remains open source under AGPLv3. The free plan covers individual workflows, and Pro is listed at $12/user/month billed monthly or $9/user/month billed annually for collaborative QA and frontend debugging teams.
Browserbase
Headless browser cloud built for AI agents
Browserbase is cloud infrastructure that runs headless Chromium browsers on demand for AI agents and automation workflows, exposing Playwright, Puppeteer, and Selenium endpoints with built-in session replay, residential proxies, CAPTCHA solving, and stealth fingerprints. It also hosts Stagehand and a Model Gateway, letting teams build browser-using agents without maintaining their own fleet of Kubernetes-managed Chromium instances.
Anchor Browser
Cloud browser infrastructure for AI agents
Anchor Browser provides secure cloud-managed browser infrastructure for computer-use agents. Teams deploy humanized Chromium instances that can access any website, with bot-detection evasion and authentication support. It features OmniConnect for authentication lifecycle management, Web Action Cache for deterministic workflows, and built-in VPN infrastructure. A free tier is available, and paid plans are advertised as supporting millions of concurrent browser sessions for scalable agent automation.
RagaAI Catalyst
AI testing and evaluation for agents and LLM apps
RagaAI Catalyst is a Python SDK for observability, monitoring, and evaluation of LLM and agentic applications. It provides agent tracing with execution graph visualization, a self-hosted dashboard with analytics, synthetic data generation, a multi-metric evaluation framework, and guardrail management. It is built for teams running production RAG systems and AI agents who need systematic testing, debugging, and performance optimization workflows.
Laminar
Open-source observability for AI agents
Laminar is an open-source observability platform for AI agents providing tracing, evaluation, and analytics for LLM applications. It integrates with Vercel AI SDK, LangChain, OpenAI, and Anthropic with a single line of code. Features include OpenTelemetry-native SDKs, an extensible evaluation framework with CI/CD support, SQL access to traces and metrics, and a visual debugging timeline for agent reasoning and actions.
Great Expectations
Data quality validation framework for Python
Great Expectations is an open-source Python framework for validating, documenting, and profiling data quality. Teams define expectations as expressive unit tests for their data using an intuitive API, then validate datasets against those rules in CI/CD pipelines or production workflows. It connects to pandas, Spark, and SQL sources, generates data documentation automatically, and integrates with orchestrators like Airflow and Prefect for continuous data quality monitoring.
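The idea of "expectations as unit tests for data" can be illustrated with a minimal sketch in plain Python. This is not the Great Expectations API: the helper `expect_column_values`, the result keys, and the `orders` sample data are all hypothetical and exist only to show the shape of a declarative data check.

```python
from typing import Any, Callable

Row = dict[str, Any]


def expect_column_values(
    rows: list[Row], column: str, predicate: Callable[[Any], bool]
) -> dict[str, Any]:
    """Hypothetical helper: check every value in `column` against `predicate`."""
    unexpected = [row.get(column) for row in rows if not predicate(row.get(column))]
    return {
        "column": column,
        "success": not unexpected,
        "unexpected_count": len(unexpected),
        "unexpected_values": unexpected,
    }


# Made-up sample data for illustration only.
orders = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": -5.00},
    {"order_id": 3, "amount": None},
]

# Two expectations: IDs are never null, amounts are present and non-negative.
not_null = expect_column_values(orders, "order_id", lambda v: v is not None)
non_negative = expect_column_values(
    orders, "amount", lambda v: v is not None and v >= 0
)
```

A CI job could fail the build whenever any result has `success` set to `False`, which is the same role a failing unit test plays for code.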
Lychee
Fast async link checker written in Rust
Lychee is a fast, asynchronous link checker written in Rust that finds broken URLs and email addresses in Markdown, HTML, reStructuredText, and websites. Available as a CLI tool, Rust library, and GitHub Action, it validates links with configurable concurrency, rate limiting, and retry logic. It supports GitHub token authentication to avoid API rate limits and can check both internal file links and external HTTP endpoints across entire repositories or websites.
Lost Pixel
Open-source visual regression testing tool
Lost Pixel is an open-source visual regression testing tool that serves as an alternative to Percy and Chromatic. It captures and compares screenshots of UI components and application pages across Storybook, Ladle, Histoire, and custom screenshot sources like Cypress or Playwright. Integrated directly into GitHub Actions pipelines, it detects unintended visual changes before they reach production, with a free SaaS tier available for open-source projects.
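The core comparison step behind screenshot-based visual regression testing can be sketched as a pixel diff with a tolerance. The code below is a simplified illustration, not Lost Pixel's implementation: images are plain nested lists of RGB tuples, and the function names and the 1% default threshold are assumptions.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def diff_ratio(baseline: Image, candidate: Image) -> float:
    """Return the fraction of pixels that differ between two same-sized images."""
    if len(baseline) != len(candidate) or any(
        len(a) != len(b) for a, b in zip(baseline, candidate)
    ):
        raise ValueError("images must have identical dimensions")
    total = sum(len(row) for row in baseline)
    changed = sum(
        1
        for row_a, row_b in zip(baseline, candidate)
        for px_a, px_b in zip(row_a, row_b)
        if px_a != px_b
    )
    return changed / total if total else 0.0


def has_regression(baseline: Image, candidate: Image, threshold: float = 0.01) -> bool:
    """Flag a visual change when more than `threshold` of the pixels differ."""
    return diff_ratio(baseline, candidate) > threshold


WHITE, RED = (255, 255, 255), (255, 0, 0)
baseline = [[WHITE] * 4 for _ in range(4)]   # 4x4 all-white "screenshot"
candidate = [row[:] for row in baseline]
candidate[1][2] = RED                        # one pixel changed out of 16
```

Real tools typically work on decoded image files and often add anti-aliasing tolerance, which this sketch leaves out.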
Agentic Security
LLM vulnerability scanner and red teaming kit
Agentic Security is an open-source vulnerability scanner for LLM agent workflows that tests AI systems against jailbreaks, fuzzing, and multimodal attacks. It probes weaknesses across text, image, and audio inputs through multi-step jailbreak simulations, randomized stress testing, and reinforcement learning-powered adaptive attacks. The toolkit connects directly to LLM APIs for high-volume real-world attack scenarios, helping developers identify and patch safety gaps before deployment.
Cleanlab
AI-powered data quality for ML datasets
Cleanlab is a data-centric AI library that automatically detects and fixes label errors, outliers, and data quality issues in machine learning datasets. It works with any ML model and any data type, including text, image, tabular, and audio data, by analyzing model predictions to identify mislabeled examples, near-duplicates, and ambiguous data points. Cleanlab helps teams improve model accuracy by cleaning training data rather than tuning model architecture.
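To make "analyzing model predictions to identify mislabeled examples" concrete, here is a simplified thresholding heuristic in plain Python. It is a sketch of the general idea, not Cleanlab's API or its exact algorithm: the function `find_label_issues` as written here, the per-class threshold rule, and the toy data are assumptions for illustration.

```python
def find_label_issues(labels: list[int], pred_probs: list[list[float]]) -> list[int]:
    """Return indices whose given label disagrees with a confident model prediction."""
    num_classes = len(pred_probs[0])

    # Per-class threshold: mean predicted probability of class k
    # over the examples that are labeled k.
    thresholds = []
    for k in range(num_classes):
        probs_k = [p[k] for p, y in zip(pred_probs, labels) if y == k]
        thresholds.append(sum(probs_k) / len(probs_k) if probs_k else 1.0)

    issues = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes the model is "confident" about for this example.
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        if confident:
            likely = max(confident, key=lambda k: p[k])
            if likely != y:
                issues.append(i)
    return issues


# Toy data: example 2 is labeled 0, but the model strongly predicts class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
suspects = find_label_issues(labels, pred_probs)
```

In practice the predicted probabilities should come from out-of-sample predictions (for example via cross-validation), otherwise the model may simply have memorized the wrong labels.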
Foundry
Blazing fast Solidity development toolkit
Foundry is a fast, portable toolkit for Ethereum smart contract development written in Rust. It provides Forge for testing and deploying Solidity contracts with native Solidity tests that run orders of magnitude faster than JavaScript-based alternatives, Cast for interacting with EVM chains from the command line, Anvil as a local testnet node, and Chisel as a Solidity REPL. Foundry has become a widely adopted toolchain for Solidity development.
reviewdog
Automated code review for any linter on CI
reviewdog is an open-source automated code review tool that integrates any linter or static analysis tool with GitHub, GitLab, Bitbucket, and Gitea pull requests. It parses output in errorformat, Checkstyle XML, SARIF, and JSON formats and posts inline review comments on changed lines only. It works with GitHub Actions, Travis CI, CircleCI, GitLab CI, and Jenkins, and supports 40+ languages through a universal linter adapter architecture.
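The "comments on changed lines only" behavior amounts to parsing linter output and intersecting it with the lines touched by a diff. The sketch below shows that filtering step in plain Python. It is not reviewdog's code: the `path:line:col: message` pattern is one common linter output shape, and the function names, sample output, and `changed` mapping are hypothetical.

```python
import re

# Matches lines such as "app/models.py:12:5: E501 line too long" (column optional).
FINDING = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+):(?:(?P<col>\d+):)?\s*(?P<msg>.*)$"
)


def parse_findings(output: str) -> list[dict]:
    """Turn raw linter output into a list of {path, line, message} records."""
    findings = []
    for raw in output.splitlines():
        match = FINDING.match(raw.strip())
        if match:
            findings.append(
                {
                    "path": match["path"],
                    "line": int(match["line"]),
                    "message": match["msg"],
                }
            )
    return findings


def on_changed_lines(findings: list[dict], changed: dict[str, set[int]]) -> list[dict]:
    """Keep only findings that fall on lines modified in the pull request."""
    return [f for f in findings if f["line"] in changed.get(f["path"], set())]


# Made-up linter output and a made-up set of changed lines per file.
linter_output = """\
app/models.py:12:5: E501 line too long
app/models.py:40:1: F401 unused import
app/views.py:7:9: W291 trailing whitespace
"""
changed = {"app/models.py": {12, 13}, "app/views.py": {30}}

comments = on_changed_lines(parse_findings(linter_output), changed)
```

Only the finding on `app/models.py` line 12 survives the filter, so pre-existing issues elsewhere in the file do not clutter the review.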
Browserless
Headless browsers in Docker for automation at scale
Browserless is a headless browser-as-a-service platform that deploys Chrome, Firefox, and WebKit in Docker containers for web scraping, testing, and AI agent automation. It provides Puppeteer and Playwright-compatible APIs, a built-in MCP server for connecting AI assistants to browser automation, screenshot and PDF generation, and connection pooling for high-concurrency workloads. Available as self-hosted open source or managed cloud.
ty
Extremely fast Python type checker written in Rust
ty is an extremely fast Python type checker built in Rust by Astral, the team behind Ruff and uv. It performs full type inference, supports PEP 695 type parameter syntax, and checks Python code orders of magnitude faster than mypy or pyright. ty completes the Astral Python toolchain alongside Ruff for linting and uv for package management, giving developers a unified Rust-powered development experience.
Playwright MCP
Microsoft's MCP server for structured browser automation by AI agents
Playwright MCP is Microsoft's Model Context Protocol server that enables AI agents to automate web browsers through structured tool calls. It exposes Playwright's browser automation capabilities as MCP tools for navigation, clicks, forms, extraction, and screenshots. The Microsoft-maintained repo has 30K+ GitHub stars and is a durable default for structured browser interaction in agent workflows.
qodo-cover
AI-powered test generation agent for automated code coverage improvement
qodo-cover (formerly Cover Agent) is an open-source AI agent that automatically generates meaningful unit tests to improve code coverage. It analyzes existing code and test patterns to produce tests that follow project conventions and target uncovered branches. It works iteratively: each generated test is run, and tests that fail are discarded. The project is MIT licensed and has over 5,300 GitHub stars.
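The generate, run, and discard loop described above can be sketched in a few lines of Python. This is an illustration of the filtering idea only, not qodo-cover's implementation: the candidate tests below are hand-written stand-ins for model-generated ones, and `slugify` and `keep_passing` are hypothetical names.

```python
from typing import Callable


def slugify(text: str) -> str:
    """Code under test: lowercase the text and join words with hyphens."""
    return "-".join(text.lower().split())


# Stand-ins for generated candidate tests; the last one encodes a wrong expectation.
def candidate_basic() -> None:
    assert slugify("Hello World") == "hello-world"


def candidate_whitespace() -> None:
    assert slugify("  a   b ") == "a-b"


def candidate_wrong() -> None:
    assert slugify("Hello World") == "hello_world"


def keep_passing(candidates: list[Callable[[], None]]) -> list[str]:
    """Run each candidate and keep only the names of those that pass."""
    kept = []
    for test in candidates:
        try:
            test()
        except Exception:
            continue  # discard candidates that fail or crash
        kept.append(test.__name__)
    return kept


accepted = keep_passing([candidate_basic, candidate_whitespace, candidate_wrong])
```

A fuller version would also measure coverage after each accepted test and keep only candidates that increase it; that step is omitted here.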
Krkn
CNCF Sandbox chaos engineering framework for Kubernetes resilience
Krkn is a CNCF Sandbox chaos engineering tool that tests Kubernetes cluster resilience by injecting controlled failures. It simulates pod kills, node failures, network partitions, CPU/memory pressure, and zone outages. Krkn-AI adds AI-powered scenario generation that suggests chaos experiments based on cluster topology. Krkn also supports CI/CD integration for automated resilience testing in deployment pipelines.
LambdaTest KaneAI
GenAI-powered test agent with natural language test authoring
KaneAI is LambdaTest's GenAI-powered test automation agent that creates, evolves, and debugs tests from natural language descriptions. It generates test scripts for multiple frameworks, including Selenium, Playwright, and Cypress, from plain English instructions. It also offers intelligent test maintenance that automatically updates tests when the application UI changes, and two-way editing between natural language and code.
Ghost Inspector
Codeless browser testing with visual test recorder and scheduling
Ghost Inspector provides codeless browser testing through a visual recorder that captures user interactions and converts them into automated test suites. Tests run on managed infrastructure with scheduled execution, CI/CD integration, and Slack notifications. It also offers visual comparison for UI regression detection, API testing, and folders and tags for organizing large test suites.
testRigor
AI-powered E2E testing with plain English test authoring
testRigor enables end-to-end test creation in plain English without coding or element selectors. Tests describe user actions in natural language, such as 'click on the Submit button', and testRigor's AI interprets and executes them across web, mobile, and API. Self-healing tests automatically adapt to UI changes. It supports cross-browser testing, visual validation, and integration with CI/CD pipelines.
Schemathesis
Property-based API fuzz testing from OpenAPI and GraphQL schemas
Schemathesis automatically generates test cases from OpenAPI and GraphQL schemas to find crashes, validation errors, and specification violations in APIs. It uses property-based testing and fuzzing techniques to explore edge cases that manually written tests miss. It is available as a CLI tool and a Python library with CI/CD integration, and supports authentication, custom checks, stateful testing, and JUnit XML and Allure reports. The project has 3.4K+ GitHub stars.
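Schema-driven fuzzing can be illustrated with a small standard-library sketch: generate random inputs that conform to a schema, call the handler, and record any crash. This does not use Schemathesis or its API. The tiny `SCHEMA` dictionary, the `handler` with its deliberate bug, and the `fuzz` helper are all hypothetical.

```python
import random
import string

# A tiny stand-in for a request schema (not real OpenAPI).
SCHEMA = {
    "name": {"type": "string", "maxLength": 8},
    "age": {"type": "integer", "minimum": 0, "maximum": 150},
}


def generate(schema: dict, rng: random.Random) -> dict:
    """Build one random payload that satisfies the schema constraints."""
    payload = {}
    for field, spec in schema.items():
        if spec["type"] == "integer":
            payload[field] = rng.randint(spec["minimum"], spec["maximum"])
        else:
            length = rng.randint(0, spec["maxLength"])
            payload[field] = "".join(
                rng.choice(string.printable) for _ in range(length)
            )
    return payload


def handler(payload: dict) -> dict:
    """Hypothetical endpoint with a bug: it crashes on an empty name."""
    initial = payload["name"][0]  # IndexError when name == ""
    return {"status": 200, "initial": initial}


def fuzz(schema: dict, target, runs: int = 500, seed: int = 0) -> list[tuple[dict, str]]:
    """Call `target` with schema-valid random payloads and collect crashes."""
    rng = random.Random(seed)
    failures = []
    for _ in range(runs):
        payload = generate(schema, rng)
        try:
            target(payload)
        except Exception as exc:
            failures.append((payload, type(exc).__name__))
    return failures


failures = fuzz(SCHEMA, handler)
```

Every recorded failure here is a schema-valid payload with an empty `name`, the kind of edge case a hand-written test suite can easily overlook. Dedicated tools add input shrinking, response validation against the schema, and stateful sequences on top of this basic loop.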
Mago
Blazing-fast PHP linter, formatter, and static analyzer in Rust
Mago is a comprehensive PHP toolchain written in Rust that unifies linting, formatting, and static analysis into a single binary. It enforces PER-CS formatting standards, catches code smells with 100+ lint rules, and performs deep type inference for semantic analysis. Inspired by Clippy and OXC from the Rust ecosystem, it delivers performance orders of magnitude faster than PHPStan and Psalm while requiring no PHP runtime to execute.