Writing tests, visual regression, and automated QA workflows
Production-grade browser automation with AI self-healing and Playwright code ownership
Intuned is a code-first browser automation platform that turns natural language prompts into production-ready Playwright code, deploys it, and self-heals it when target sites change. It supports TypeScript and Python and integrates with Anthropic Computer Use, OpenAI CUA, Stagehand, Browser-Use, and Gemini Computer Use. Built-in features include stealth, captcha solving, auth session management, and scheduled runs with concurrency control. There is no vendor lock-in: you own the code.
One tool for intercepting, mocking, and replaying HTTP (acquired by BrowserStack)
Requestly is an open-source HTTP interceptor, API client, and session replay tool that lets developers modify, mock, and debug network traffic without leaving the browser. Acquired by BrowserStack and trusted by 200,000+ developers, it bundles a Chrome extension, a full API client, mock servers, and shareable session captures into one product with both free and commercial tiers.
Self-evolution engine for AI agents with auditable updates
Evolver is an open-source self-evolution engine for AI agents that turns run logs into auditable, reviewable updates via its Genome Evolution Protocol. Instead of ad hoc prompt tweaking, teams collect traces and Evolver proposes versioned diffs to prompts, tools and workflows that engineers can approve, reject or roll back like code.
Official Chrome DevTools MCP server for coding agents
chrome-devtools-mcp is the Chrome DevTools team's official MCP server that lets coding agents control and inspect a live Chrome browser with first-party Chrome DevTools Protocol fidelity. It exposes Network inspection, Performance traces, Lighthouse audits, console output, and structured DOM snapshots as typed MCP tools, so agents can debug real pages and run reliable web performance investigations without resorting to brittle DOM scraping.
Headless browser cloud built for AI agents
Browserbase is cloud infrastructure that runs headless Chromium browsers on demand for AI agents and automation workflows, exposing Playwright, Puppeteer, and Selenium endpoints with built-in session replay, residential proxies, CAPTCHA solving, and stealth fingerprints. It also hosts Stagehand and a Model Gateway, letting teams build browser-using agents without maintaining their own fleet of Kubernetes-managed Chromium instances.
AI testing and evaluation for agents and LLM apps
RagaAI Catalyst is a comprehensive Python SDK for observability, monitoring, and evaluation of LLM and agentic applications. Provides agent tracing with execution graph visualization, self-hosted dashboard with analytics, synthetic data generation, multi-metric evaluation framework, and guardrail management. Built for teams running production RAG systems and AI agents who need systematic testing, debugging, and performance optimization workflows.
Open-source observability for AI agents
Laminar is an open-source observability platform for AI agents providing tracing, evaluation, and analytics for LLM applications. It integrates with Vercel AI SDK, LangChain, OpenAI, and Anthropic with a single line of code. Features include OpenTelemetry-native SDKs, an extensible evaluation framework with CI/CD support, SQL access to traces and metrics, and a visual debugging timeline for agent reasoning and actions.
Data quality validation framework for Python
Great Expectations is an open-source Python framework for validating, documenting, and profiling data quality. Teams define expectations as expressive unit tests for their data using an intuitive API, then validate datasets against those rules in CI/CD pipelines or production workflows. It connects to pandas, Spark, and SQL sources, generates data documentation automatically, and integrates with orchestrators like Airflow and Prefect for continuous data quality monitoring.
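The "expectations as unit tests for data" idea can be illustrated with a minimal plain-Python sketch. This is not the Great Expectations API: `not_null`, `between`, and `validate` are hypothetical helpers that show the concept of declarative rules checked against a batch of rows, with a single pass/fail result a CI step could act on.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Row = dict[str, Any]


@dataclass
class Result:
    description: str
    unexpected: list[Row] = field(default_factory=list)  # rows that violated the rule

    @property
    def success(self) -> bool:
        return not self.unexpected


Check = Callable[[list[Row]], Result]


def not_null(column: str) -> Check:
    """Hypothetical rule: every row has a non-null value in `column`."""
    def check(rows: list[Row]) -> Result:
        return Result(f"{column} is not null", [r for r in rows if r.get(column) is None])
    return check


def between(column: str, low: float, high: float) -> Check:
    """Hypothetical rule: non-null values of `column` lie in [low, high]."""
    def check(rows: list[Row]) -> Result:
        bad = [r for r in rows if r.get(column) is not None and not low <= r[column] <= high]
        return Result(f"{low} <= {column} <= {high}", bad)
    return check


def validate(rows: list[Row], checks: list[Check]) -> tuple[bool, list[Result]]:
    """Run every rule; the batch passes only if all rules pass."""
    results = [check(rows) for check in checks]
    return all(r.success for r in results), results


if __name__ == "__main__":
    orders = [{"id": 1, "amount": 20.0}, {"id": 2, "amount": None}, {"id": 3, "amount": -5.0}]
    ok, results = validate(orders, [not_null("amount"), between("amount", 0, 10_000)])
    for r in results:
        print(r.description, "->", "pass" if r.success else f"fail: {r.unexpected}")
```

Returning the offending rows rather than a bare boolean mirrors the documentation angle: a failed check explains itself.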
Fast async link checker written in Rust
Lychee is a fast, asynchronous link checker written in Rust that finds broken URLs and email addresses in Markdown, HTML, reStructuredText, and websites. Available as a CLI tool, Rust library, and GitHub Action, it validates links with configurable concurrency, rate limiting, and retry logic. It supports GitHub token authentication to avoid GitHub API rate limits and can check both internal file links and external HTTP endpoints across entire repositories or websites.
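The combination of bounded concurrency and retry logic can be sketched in Python with `asyncio`. This shows the general pattern, not Lychee's Rust implementation. The `fetch` callable is a hypothetical stand-in for an HTTP client, which lets the example run offline. Rate limiting is not shown.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, Optional

# Hypothetical fetcher: returns an HTTP status code, or raises OSError on network errors.
Fetch = Callable[[str], Awaitable[int]]


async def check_links(
    urls: Iterable[str],
    fetch: Fetch,
    max_concurrency: int = 8,
    retries: int = 2,
    backoff: float = 0.5,
) -> dict[str, bool]:
    """Return {url: is_ok}. Retries only on network errors and 5xx responses."""
    sem = asyncio.Semaphore(max_concurrency)  # caps the number of in-flight requests

    async def check(url: str) -> tuple[str, bool]:
        status: Optional[int] = None
        for attempt in range(retries + 1):
            async with sem:  # hold a slot only while the request is running
                try:
                    status = await fetch(url)
                except OSError:
                    status = None
            if status is not None and status < 500:
                break  # 2xx/3xx/4xx answers are definitive, so don't retry
            if attempt < retries:
                await asyncio.sleep(backoff * 2 ** attempt)  # exponential backoff
        return url, status is not None and 200 <= status < 400

    unique = list(dict.fromkeys(urls))  # check each URL once, preserving order
    return dict(await asyncio.gather(*(check(u) for u in unique)))
```

The semaphore is released during the backoff sleep, so a slow, flaky host does not block slots that other links could use.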
Open-source visual regression testing tool
Lost Pixel is an open-source visual regression testing tool that serves as an alternative to Percy and Chromatic. It captures and compares screenshots of UI components and application pages across Storybook, Ladle, Histoire, and custom screenshot sources like Cypress or Playwright. Integrated directly into GitHub Actions pipelines, it detects unintended visual changes before they reach production, with a free SaaS tier available for open-source projects.
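The core comparison step in visual regression testing can be sketched as a pixel diff. This is a generic approach, not Lost Pixel's diffing engine. It assumes screenshots are already decoded into grids of RGB tuples. The per-channel `tolerance` absorbs anti-aliasing noise, and `max_ratio` is the budget of changed pixels allowed before a change counts as a regression. Both defaults are arbitrary illustrative values.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def diff_ratio(baseline: Image, current: Image, tolerance: int = 0) -> float:
    """Fraction of pixels whose largest channel difference exceeds `tolerance`."""
    if len(baseline) != len(current) or any(len(a) != len(b) for a, b in zip(baseline, current)):
        return 1.0  # dimensions changed: treat as a full mismatch
    total = changed = 0
    for row_a, row_b in zip(baseline, current):
        for pa, pb in zip(row_a, row_b):
            total += 1
            if max(abs(ca - cb) for ca, cb in zip(pa, pb)) > tolerance:
                changed += 1
    return changed / total if total else 0.0


def is_regression(baseline: Image, current: Image, tolerance: int = 8, max_ratio: float = 0.001) -> bool:
    """Flag the screenshot pair if more than `max_ratio` of pixels changed visibly."""
    return diff_ratio(baseline, current, tolerance) > max_ratio
```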
LLM vulnerability scanner and red teaming kit
Agentic Security is an open-source vulnerability scanner for LLM agent workflows that tests AI systems against jailbreaks, fuzzing, and multimodal attacks. It probes weaknesses across text, image, and audio inputs through multi-step jailbreak simulations, randomized stress testing, and adaptive attacks powered by reinforcement learning. The toolkit connects directly to LLM APIs for high-volume real-world attack scenarios, helping developers identify and patch safety gaps before deployment.
Microsoft's MCP server for structured browser automation by AI agents
Playwright MCP is Microsoft's Model Context Protocol server that enables AI agents to automate web browsers through structured tool calls. It exposes Playwright's browser automation capabilities as MCP tools that LLMs can invoke for navigating pages, clicking elements, filling forms, extracting content, and taking screenshots. Provides structured, reliable browser interaction for AI agent workflows.
AI-powered test generation agent for automated code coverage improvement
qodo-cover (formerly Cover Agent) is an open-source AI agent that automatically generates meaningful unit tests to improve code coverage. It analyzes existing code and test patterns to produce tests that follow project conventions and target uncovered branches. Uses an iterative approach where generated tests are verified by running them, discarding those that fail. MIT licensed with over 5,300 GitHub stars.
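The iterative generate, run, and discard loop can be sketched as follows. This is not qodo-cover's code. `generate` and `run` are hypothetical callables standing in for the LLM and the project's test runner. Keeping only passing tests that also add newly covered lines is an extra assumption beyond the description above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class RunResult:
    passed: bool
    covered_lines: frozenset[int]


def improve_coverage(
    generate: Callable[[frozenset[int]], Iterable[str]],  # hypothetical LLM-backed generator
    run: Callable[[str], RunResult],                      # hypothetical test runner
    total_lines: int,
    target: float = 0.8,
    max_iterations: int = 5,
) -> tuple[list[str], float]:
    """Return the kept tests and the final line-coverage ratio."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    kept: list[str] = []
    covered: frozenset[int] = frozenset()
    for _ in range(max_iterations):
        if len(covered) / total_lines >= target:
            break  # coverage goal reached
        for candidate in generate(covered):  # the generator sees what is already covered
            result = run(candidate)
            if not result.passed:
                continue  # discard tests that fail
            if result.covered_lines - covered:  # assumption: drop tests that add no coverage
                kept.append(candidate)
                covered |= result.covered_lines
    return kept, len(covered) / total_lines
```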
CNCF Sandbox chaos engineering framework for Kubernetes resilience
Krkn is a CNCF Sandbox chaos engineering tool that tests Kubernetes cluster resilience by injecting controlled failures. It simulates pod kills, node failures, network partitions, CPU/memory pressure, and zone outages. Krkn-AI adds AI-powered scenario generation that suggests chaos experiments based on cluster topology. Supports CI/CD integration for automated resilience testing in deployment pipelines.
GenAI-powered test agent with natural language test authoring
KaneAI is LambdaTest's GenAI-powered test automation agent that creates, evolves, and debugs tests from natural language descriptions. It generates test scripts in multiple frameworks including Selenium, Playwright, and Cypress from plain English instructions. Features intelligent test maintenance that automatically updates tests when application UI changes and two-way editing between natural language and code.
Codeless browser testing with visual test recorder and scheduling
Ghost Inspector provides codeless browser testing through a visual recorder that captures user interactions and converts them into automated test suites. Tests run on managed infrastructure with scheduled execution, CI/CD integration, and Slack notifications. Features visual comparison for UI regression detection, API testing, and test organization with folders and tags for managing large test suites.
AI-powered E2E testing with plain English test authoring
testRigor enables end-to-end test creation in plain English without coding or element selectors. Tests describe user actions in natural language like 'click on the Submit button' and testRigor's AI interprets and executes them across web, mobile, and API. Self-healing tests automatically adapt to UI changes. Supports cross-browser testing, visual validation, and integration with CI/CD pipelines.
Property-based API fuzz testing from OpenAPI and GraphQL schemas
Schemathesis automatically generates test cases from OpenAPI and GraphQL schemas to find crashes, validation errors, and specification violations in APIs. It uses property-based testing and fuzzing techniques to explore edge cases that manual test writing misses. CLI tool and Python library with CI/CD integration. Over 3,200 GitHub stars with support for authentication, custom checks, and stateful testing.
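Schemathesis builds on the Hypothesis library. The sketch below swaps in plain `random` to show the underlying idea without dependencies. It derives inputs from a schema, biases them toward boundary values, sends them to an endpoint, and checks a property: schema-valid input must never produce a 5xx. The mini schema format and the `handler` callable are hypothetical simplifications, not Schemathesis's API.

```python
import random
from typing import Any, Callable


def generate(schema: dict, rng: random.Random) -> Any:
    """Produce one value that conforms to a tiny JSON-Schema-like dict."""
    kind = schema["type"]
    if kind == "integer":
        lo, hi = schema.get("minimum", -2**31), schema.get("maximum", 2**31 - 1)
        return rng.choice([lo, hi, rng.randint(lo, hi)])  # bias toward the boundaries
    if kind == "string":
        length = rng.randint(0, schema.get("maxLength", 20))
        return "".join(rng.choice("aZ0 _-é") for _ in range(length))
    if kind == "object":
        return {name: generate(sub, rng) for name, sub in schema["properties"].items()}
    raise ValueError(f"unsupported type: {kind}")


def fuzz(handler: Callable[[Any], int], schema: dict, runs: int = 200, seed: int = 0) -> list:
    """Call `handler` (returns an HTTP status) with generated payloads; collect crashes."""
    rng = random.Random(seed)  # fixed seed makes failures reproducible
    failures = []
    for _ in range(runs):
        payload = generate(schema, rng)
        if handler(payload) >= 500:  # property: valid input never yields a server error
            failures.append(payload)
    return failures
```

Boundary bias matters because values such as the declared minimum of 0 are exactly where division or indexing bugs tend to hide.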
Blazing-fast PHP linter, formatter, and static analyzer in Rust
Mago is a comprehensive PHP toolchain written in Rust that unifies linting, formatting, and static analysis into a single binary. It enforces PER-CS formatting standards, catches code smells with 100+ lint rules, and performs deep type inference for semantic analysis. Inspired by Clippy and OXC from the Rust ecosystem, it runs orders of magnitude faster than PHPStan and Psalm and requires no PHP runtime to execute.
Browser automation CLI built for AI agents by Vercel Labs
Agent Browser is a Rust-based browser automation CLI designed specifically for AI agent workflows rather than traditional testing. Developed by Vercel Labs, it provides semantic element selection through a refs system, accessibility tree snapshots, session persistence, and authentication vaults. Unlike Playwright or Puppeteer, which target test automation, Agent Browser optimizes for token efficiency and deterministic element selection, giving LLMs reliable browser interaction capabilities.
Benchmark for evaluating AI coding agents on real GitHub issues
SWE-bench is a benchmark from Princeton NLP that evaluates AI coding agents by testing their ability to resolve real GitHub issues from popular open-source projects. Each task provides an issue description and repository state, and the agent must produce a working patch that passes the project's test suite. With 4,600+ GitHub stars, it has become the standard yardstick for comparing autonomous coding tools like Devin, Claude Code, and OpenHands.
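In the SWE-bench harness, "passes the test suite" has a specific meaning. Each task lists FAIL_TO_PASS tests, which the patch must make pass, and PASS_TO_PASS tests, which must keep passing. The sketch below shows that resolution check. It assumes per-test statuses have already been parsed from the test log into a dict. The function names are hypothetical, not the harness's own.

```python
def is_resolved(test_status: dict[str, str], fail_to_pass: list[str], pass_to_pass: list[str]) -> bool:
    """Resolved only if the target tests now pass and no previously passing test broke."""
    def passed(test: str) -> bool:
        return test_status.get(test) == "PASSED"  # a test missing from the log counts as failed
    return all(passed(t) for t in fail_to_pass) and all(passed(t) for t in pass_to_pass)


def resolve_rate(outcomes: list[bool]) -> float:
    """Share of tasks resolved; the headline number used to compare agents."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0
```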
Python toolkit for assessing and mitigating ML model fairness issues
Fairlearn is a Microsoft-backed open-source Python toolkit that helps developers assess and improve the fairness of machine learning models. It provides metrics for measuring disparity across groups defined by sensitive features, mitigation algorithms that reduce unfairness while maintaining model performance, and an interactive visualization dashboard for exploring fairness-accuracy trade-offs. Integrated with scikit-learn and Azure ML's Responsible AI dashboard.
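Demographic parity difference is one common disparity metric. It is the gap between the highest and lowest selection rate (the share of positive predictions) across the groups of a sensitive feature. The sketch computes it by hand in plain Python to show the arithmetic. Fairlearn ships its own implementation in `fairlearn.metrics`, and the helper names here are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def selection_rates(y_pred: Sequence[int], groups: Sequence[Hashable]) -> dict:
    """Share of positive (1) predictions within each sensitive-feature group."""
    if len(y_pred) != len(groups):
        raise ValueError("y_pred and groups must have the same length")
    positives: dict = defaultdict(int)
    totals: dict = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        positives[group] += int(pred == 1)
        totals[group] += 1
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(y_pred: Sequence[int], groups: Sequence[Hashable]) -> float:
    """Max minus min selection rate; 0.0 means every group is selected at the same rate."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())
```

For example, if group "a" receives positive predictions 3 times out of 4 and group "b" once out of 4, the rates are 0.75 and 0.25 and the gap is 0.5.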
Find unused files, dependencies, and exports in JavaScript and TypeScript projects
Knip is an open-source CLI tool that detects unused files, dependencies, devDependencies, and exports in JavaScript and TypeScript codebases. It analyzes the full dependency graph to identify dead code that accumulates over time — especially relevant for AI-generated codebases where unused artifacts pile up faster than manual cleanup can handle. With over 10,800 GitHub stars, it has become a standard code hygiene tool in the JS/TS ecosystem.
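The dependency-graph analysis rests on reachability. Start from the entry files, follow imports, and flag any file that is never reached or any export that no reachable file imports. The sketch below works over a hand-built import graph. The `Module` structure and file names are hypothetical. It is not Knip's implementation: a real tool parses source files and also checks `package.json` dependencies, which this skips.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Module:
    exports: set[str] = field(default_factory=set)
    imports: dict[str, set[str]] = field(default_factory=dict)  # imported path -> names used


def find_unused(modules: dict[str, Module], entries: list[str]) -> tuple[list[str], dict[str, list[str]]]:
    """Return (unreachable files, {reachable file: exports nobody imports})."""
    reachable, queue = set(entries), deque(entries)
    while queue:  # breadth-first walk of local imports
        for dep in modules[queue.popleft()].imports:
            if dep in modules and dep not in reachable:  # ignore external packages
                reachable.add(dep)
                queue.append(dep)

    used: dict[str, set[str]] = {path: set() for path in modules}
    for path in reachable:  # only imports from live code count as usage
        for dep, names in modules[path].imports.items():
            if dep in used:
                used[dep] |= names

    unused_exports = {
        path: sorted(modules[path].exports - used[path])
        for path in sorted(reachable - set(entries))  # entry exports are the public surface
        if modules[path].exports - used[path]
    }
    return sorted(set(modules) - reachable), unused_exports
```

Counting only imports from reachable files is what lets dead code cascade. An export used solely by an orphaned file is itself reported as unused.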
Open-source sandboxes and SDKs for AI agents that control desktops
CUA is an open-source infrastructure platform for building, benchmarking, and deploying AI agents that autonomously control full desktop environments. It provides secure sandboxed VMs across macOS, Linux, Windows, and Android with a unified Python SDK for screenshots, mouse/keyboard control, shell commands, and file I/O. Includes CuaBot CLI for running agents in sandboxes, Cua-Bench for standardized evaluation, and Lume for near-native macOS virtualization on Apple Silicon.