Test runners, visual regression, E2E testing, and QA automation
Reusable computer vision tools for developers
Supervision is an open-source Python toolkit by Roboflow providing reusable CV utilities for detection, tracking, annotation, and dataset management. It works with any model including YOLO and Hugging Face via a standardized Detections class. Features include 20+ annotators, ByteTrack object tracking, zone counting, speed estimation, and dataset conversion between COCO, YOLO, and Pascal VOC formats.
AI-powered test generation agent for automated code coverage improvement
qodo-cover (formerly Cover Agent) is an open-source AI agent that automatically generates meaningful unit tests to improve code coverage. It analyzes existing code and test patterns to produce tests that follow project conventions and target uncovered branches. It works iteratively: each generated test is run, and tests that fail are discarded. MIT licensed with over 5,300 GitHub stars.
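The generate-then-verify idea can be sketched in a few lines of Python. This is only an illustration of the filtering step, not qodo-cover's actual implementation: the candidate tests are hard-coded strings standing in for model output, and `run_in_sandbox`, `keep_passing_tests`, and the `add` function under test are hypothetical names. A real tool would also check that each kept test increases coverage, which is omitted here.

```python
from typing import Callable, Iterable, List


def run_in_sandbox(source: str) -> bool:
    """Run one candidate test; return True only if it finishes without raising.

    Hypothetical stand-in for a real test runner. The namespace holds the
    code under test (a toy `add` function).
    """
    namespace = {"add": lambda a, b: a + b}
    try:
        exec(source, namespace)
        return True
    except Exception:
        # AssertionError, NameError, etc. all count as a failed candidate.
        return False


def keep_passing_tests(
    candidates: Iterable[str], run_test: Callable[[str], bool]
) -> List[str]:
    """Keep the candidates that pass when executed, discard the rest."""
    return [source for source in candidates if run_test(source)]


# Stand-ins for generated tests (assumed, not real model output).
candidates = [
    "assert add(2, 3) == 5",
    "assert add(2, 3) == 6",          # wrong expectation, will be discarded
    "assert add(-1, 1) == 0",
    "assert undefined_name(1) == 1",  # references a missing name, discarded
]

accepted = keep_passing_tests(candidates, run_in_sandbox)
print(accepted)
```

Filtering by execution means a hallucinated or incorrect test never reaches the suite, at the cost of one test run per candidate.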
CNCF Sandbox chaos engineering framework for Kubernetes resilience
Krkn is a CNCF Sandbox chaos engineering tool that tests Kubernetes cluster resilience by injecting controlled failures. It simulates pod kills, node failures, network partitions, CPU/memory pressure, and zone outages. Krkn-AI adds AI-powered scenario generation that suggests chaos experiments based on cluster topology. Supports CI/CD integration for automated resilience testing in deployment pipelines.
GenAI-powered test agent with natural language test authoring
KaneAI is LambdaTest's GenAI-powered test automation agent that creates, evolves, and debugs tests from natural language descriptions. It generates test scripts in multiple frameworks including Selenium, Playwright, and Cypress from plain English instructions. Features intelligent test maintenance that automatically updates tests when application UI changes and two-way editing between natural language and code.
Codeless browser testing with visual test recorder and scheduling
Ghost Inspector provides codeless browser testing through a visual recorder that captures user interactions and converts them into automated test suites. Tests run on managed infrastructure with scheduled execution, CI/CD integration, and Slack notifications. Features visual comparison for UI regression detection, API testing, and test organization with folders and tags for managing large test suites.
AI-powered E2E testing with plain English test authoring
testRigor enables end-to-end test creation in plain English without coding or element selectors. Tests describe user actions in natural language, such as 'click on the Submit button', and testRigor's AI interprets and executes them across web, mobile, and API. Self-healing tests automatically adapt to UI changes. Supports cross-browser testing, visual validation, and integration with CI/CD pipelines.
Property-based API fuzz testing from OpenAPI and GraphQL schemas
Schemathesis automatically generates test cases from OpenAPI and GraphQL schemas to find crashes, validation errors, and specification violations in APIs. It uses property-based testing and fuzzing techniques to explore edge cases that manual test writing misses. CLI tool and Python library with CI/CD integration. Over 3,200 GitHub stars with support for authentication, custom checks, and stateful testing.
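To make the idea of schema-driven fuzzing concrete, here is a minimal standard-library sketch. It does not use Schemathesis or its API: `SCHEMA` is a simplified, hypothetical stand-in for an OpenAPI request schema, `handler` is a made-up endpoint with a deliberate bug, and the generator simply biases toward boundary values. The property being checked is "no valid input causes a server error".

```python
import random

# Hypothetical, heavily simplified request schema (not real OpenAPI).
SCHEMA = {
    "quantity": {"type": "integer", "minimum": 0, "maximum": 1000},
    "unit_price": {"type": "integer", "minimum": 0, "maximum": 1000},
}


def generate(schema, rng):
    """Build one schema-valid payload, favoring boundary values."""
    payload = {}
    for name, spec in schema.items():
        lo, hi = spec["minimum"], spec["maximum"]
        payload[name] = rng.choice([lo, hi, rng.randint(lo, hi)])
    return payload


def handler(payload):
    """Made-up endpoint: crashes when quantity is 0 (division by zero)."""
    total = payload["quantity"] * payload["unit_price"]
    return {"status": 200, "average": total / payload["quantity"]}


def fuzz(schema, target, runs=200, seed=0):
    """Return every generated payload that crashed or produced a 5xx status."""
    rng = random.Random(seed)
    failures = []
    for _ in range(runs):
        payload = generate(schema, rng)
        try:
            ok = target(payload)["status"] < 500
        except Exception:
            ok = False  # an unhandled exception is treated like a 500
        if not ok:
            failures.append(payload)
    return failures


failures = fuzz(SCHEMA, handler)
print(f"{len(failures)} failing payloads, e.g. {failures[0]}")
```

The schema permits `quantity = 0`, so a generator that explores the allowed range finds the crash without anyone writing that case by hand.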
Blazing-fast PHP linter, formatter, and static analyzer in Rust
Mago is a comprehensive PHP toolchain written in Rust that unifies linting, formatting, and static analysis into a single binary. It enforces PER-CS formatting standards, catches code smells with 100+ lint rules, and performs deep type inference for semantic analysis. Inspired by Clippy and OXC from the Rust ecosystem, it is reported to run orders of magnitude faster than PHPStan and Psalm, and it requires no PHP runtime to execute.
Browser automation CLI built for AI agents by Vercel Labs
Agent Browser is a Rust-based browser automation CLI designed specifically for AI agent workflows rather than traditional testing. Developed by Vercel Labs, it provides semantic element selection through a refs system, accessibility tree snapshots, session persistence, and authentication vaults. Unlike Playwright or Puppeteer, which target test automation, Agent Browser optimizes for token efficiency and deterministic element selection, giving LLMs reliable browser interaction capabilities.
Benchmark for evaluating AI coding agents on real GitHub issues
SWE-bench is a benchmark from Princeton NLP that evaluates AI coding agents by testing their ability to resolve real GitHub issues from popular open-source projects. Each task provides an issue description and repository state, and the agent must produce a working patch that passes the project's test suite. With 4,600+ GitHub stars, it has become the standard yardstick for comparing autonomous coding tools like Devin, Claude Code, and OpenHands.
Python toolkit for assessing and mitigating ML model fairness issues
Fairlearn is a Microsoft-backed open-source Python toolkit that helps developers assess and improve the fairness of machine learning models. It provides metrics for measuring disparity across groups defined by sensitive features, mitigation algorithms that reduce unfairness while maintaining model performance, and an interactive visualization dashboard for exploring fairness-accuracy trade-offs. Integrated with scikit-learn and Azure ML's Responsible AI dashboard.
Find unused files, dependencies, and exports in JavaScript and TypeScript projects
Knip is an open-source CLI tool that detects unused files, dependencies, devDependencies, and exports in JavaScript and TypeScript codebases. It analyzes the full dependency graph to identify dead code that accumulates over time — especially relevant for AI-generated codebases where unused artifacts pile up faster than manual cleanup can handle. With over 10,800 GitHub stars, it has become a standard code hygiene tool in the JS/TS ecosystem.
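The core of dependency-graph analysis can be illustrated as a reachability search from the project's entry points. The sketch below is an assumption about the general technique, not Knip's implementation: the `graph` dictionary is a hand-written stand-in for import edges that a real tool would extract by parsing source files, and `find_unused` is a hypothetical helper name.

```python
from collections import deque


def find_unused(import_graph, entry_points):
    """Return files that no entry point reaches through import edges."""
    reachable = set()
    queue = deque(entry_points)
    while queue:
        module = queue.popleft()
        if module in reachable:
            continue
        reachable.add(module)
        queue.extend(import_graph.get(module, []))
    return sorted(set(import_graph) - reachable)


# Hypothetical project: file -> files it imports.
graph = {
    "src/index.ts": ["src/app.ts", "src/utils/format.ts"],
    "src/app.ts": ["src/utils/format.ts"],
    "src/utils/format.ts": [],
    "src/legacy/old-api.ts": ["src/legacy/helpers.ts"],
    "src/legacy/helpers.ts": [],
}

unused = find_unused(graph, ["src/index.ts"])
print(unused)
```

Note that `src/legacy/helpers.ts` is imported by another file yet still counts as unused, because its only importer is itself unreachable. This is why a full graph traversal catches more than a simple "is this file imported anywhere" check.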
Free open-source local AWS emulator as a drop-in LocalStack replacement
Floci is a free open-source AWS emulator designed as a lightweight drop-in replacement for LocalStack Community Edition. It runs on port 4566 with the same endpoint conventions, supporting S3, SQS, DynamoDB, RDS, ElastiCache, API Gateway, Cognito, IAM, and 20+ other services. The Docker image is 90 MB versus LocalStack's 1 GB and starts in 24 ms.
Bayesian git bisection for finding commits that caused flaky tests
Git Bayesect applies Bayesian inference to git bisection, solving the problem of finding commits that introduced non-deterministic bugs like flaky tests. Unlike standard git bisect, which requires binary pass/fail results, Git Bayesect handles probabilistic outcomes, where a test passes on some runs and fails on others, and uses entropy minimization to narrow down the culprit commit efficiently.
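A rough sketch of how Bayesian bisection with entropy minimization can work is shown below. It is not Git Bayesect's code. It assumes a simple noise model with two known failure rates (`p_fail_good` for commits before the culprit, `p_fail_bad` for the culprit and later commits), assumes exactly one culprit exists in the range, and simulates the flaky test with a seeded random generator instead of running git. All function names are hypothetical.

```python
import math
import random


def entropy(dist):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def update(belief, tested, failed, p_fail_good, p_fail_bad):
    """Bayes update of P(culprit = c) after one test run at commit `tested`."""
    posterior = []
    for culprit, prior in enumerate(belief):
        # Commits at or after the culprit contain the bug.
        p_fail = p_fail_bad if tested >= culprit else p_fail_good
        likelihood = p_fail if failed else 1.0 - p_fail
        posterior.append(prior * likelihood)
    total = sum(posterior)
    return [p / total for p in posterior]


def expected_entropy(belief, tested, p_fail_good, p_fail_bad):
    """Posterior entropy averaged over both outcomes of testing `tested`."""
    p_fail = sum(
        prior * (p_fail_bad if tested >= culprit else p_fail_good)
        for culprit, prior in enumerate(belief)
    )
    h_fail = entropy(update(belief, tested, True, p_fail_good, p_fail_bad))
    h_pass = entropy(update(belief, tested, False, p_fail_good, p_fail_bad))
    return p_fail * h_fail + (1.0 - p_fail) * h_pass


def bayesect(n_commits, run_test, p_fail_good=0.05, p_fail_bad=0.5,
             confidence=0.95, max_runs=500):
    """Test the most informative commit until one culprit is confident enough."""
    belief = [1.0 / n_commits] * n_commits  # uniform prior over culprits
    for _ in range(max_runs):
        if max(belief) >= confidence:
            break
        tested = min(
            range(n_commits),
            key=lambda i: expected_entropy(belief, i, p_fail_good, p_fail_bad),
        )
        belief = update(belief, tested, run_test(tested), p_fail_good, p_fail_bad)
    culprit = max(range(n_commits), key=belief.__getitem__)
    return culprit, belief


# Simulated flaky test: returns True when the run fails (assumed setup).
rng = random.Random(7)
TRUE_CULPRIT = 13


def flaky_test(commit):
    p_fail = 0.5 if commit >= TRUE_CULPRIT else 0.05
    return rng.random() < p_fail


culprit, belief = bayesect(20, flaky_test)
print(f"most likely culprit: commit {culprit} (p = {belief[culprit]:.2f})")
```

Because a single pass on a buggy commit is weak evidence, the same commit may be tested several times. The stopping rule is a posterior threshold rather than a fixed number of steps, so the answer is probabilistic and can occasionally be wrong.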
Static linter that catches production bugs in AI-generated code
prodlint is a zero-config static analysis tool with 52 rules targeting production bugs that AI coding tools consistently produce. It catches hallucinated npm imports, missing authentication checks, Prisma writes outside transactions, exposed secrets via NEXT_PUBLIC prefixes, and other patterns specific to code generated by Cursor, Claude Code, Bolt, and v0. Runs in one second via npx with no configuration needed.
AI-powered vision-driven UI automation for web, Android, and iOS
Midscene.js is an open-source UI automation framework from ByteDance's Web Infra team that uses vision-based AI models to understand and interact with interfaces. It replaces fragile CSS selectors with natural language descriptions, supporting web browsers via Playwright and Puppeteer, Android via ADB, and iOS via WebDriverAgent from a unified JavaScript SDK.
Google's vulnerability scanner using the OSV database
OSV-Scanner is Google's official open-source vulnerability scanner that checks your project's dependencies against the OSV.dev database — the largest open vulnerability database covering all major ecosystems. Written in Go, it supports lockfiles from npm, pip, Maven, Cargo, Go modules, and more, providing actionable remediation guidance and CI/CD integration for automated security scanning.
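As an illustration of what checking dependencies against OSV.dev involves, the sketch below turns pinned `requirements.txt` lines into a request body in the shape accepted by the public OSV batch query endpoint (`POST https://api.osv.dev/v1/querybatch`). It is not OSV-Scanner's code: the parser is deliberately simplistic (only `name==version` pins), the helper names and sample package list are hypothetical, and the network call itself is left out.

```python
import json


def parse_pinned_requirements(text):
    """Extract (name, version) pairs from `name==version` lines only."""
    packages = []
    for raw in text.splitlines():
        # Drop comments and environment markers, then surrounding whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue  # unpinned or empty lines are skipped in this sketch
        name, version = line.split("==", 1)
        packages.append((name.strip(), version.strip()))
    return packages


def build_osv_batch_query(packages, ecosystem="PyPI"):
    """Build a JSON-serializable body for the OSV batch query endpoint."""
    return {
        "queries": [
            {"package": {"name": name, "ecosystem": ecosystem}, "version": version}
            for name, version in packages
        ]
    }


# Hypothetical lockfile contents used only for illustration.
REQUIREMENTS = """\
jinja2==2.4.1  # pinned
requests>=2.0
flask==2.0.1
"""

query = build_osv_batch_query(parse_pinned_requirements(REQUIREMENTS))
print(json.dumps(query, indent=2))
```

A real scanner also resolves transitive dependencies from the lockfile and handles each ecosystem's version syntax, which this sketch does not attempt.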
Free MIT-licensed drop-in replacement for LocalStack
MiniStack is a free, MIT-licensed drop-in replacement for LocalStack that emulates 33 AWS services using real infrastructure — actual Postgres for RDS, real Redis for ElastiCache, real Docker for ECS — rather than faking API responses. Born from LocalStack's surprise paywall in March 2026, it starts in 2 seconds, idles at 30MB RAM versus LocalStack's ~500MB, and runs real services for accurate local AWS development.
Autonomous AI pentester for web apps and APIs
Shannon is an autonomous AI-powered penetration testing tool that achieves a 96.15% success rate on the XBOW benchmark — significantly above the industry average. Using a multi-agent pipeline built on Anthropic's Agent SDK and Playwright, it performs reconnaissance, vulnerability analysis, exploitation, and reporting on web applications and APIs, having discovered 7 zero-day vulnerabilities in production software.
Autonomous AI agent for end-to-end software testing
TestSprite is an AI testing agent that autonomously generates test plans, writes test code, executes tests, and reports results without human intervention. Ranked #1 on Product Hunt for Developer Tools in 2025, it handles test creation from natural language requirements through execution and regression tracking. Credit-based SaaS model with 121K+ monthly visits.
AI-powered E2E test generation and maintenance platform
Octomind is an AI-powered testing platform that automatically generates, runs, and maintains end-to-end Playwright tests for web applications. It observes user flows, creates test cases from natural language descriptions, and self-heals tests when UI changes would break traditional selectors. Backed by $4.8M seed funding from Paua Ventures with enterprise production deployments.
Open-source LLM red-teaming framework with 40+ attack types
DeepTeam is an open-source red-teaming framework for systematically testing LLM applications against 40+ adversarial attack types. It covers OWASP Top 10 for LLMs including jailbreaks, prompt injection, PII leakage, and hallucination attacks. Built as the sister project of DeepEval for security testing alongside evaluation. Apache-2.0 licensed.
Open-source codeless test automation with NLP-based test creation
Testsigma is an open-source codeless test automation platform where tests are written in plain English using NLP-based interpretation. It supports web, mobile, and API testing with self-healing test maintenance that adapts to UI changes automatically. The Apache 2.0 licensed Community Edition is free with full functionality. The cloud edition adds parallel execution, integrations, and team management. Active development with regular releases through 2026.
Unified API, performance, and contract testing DSL
Karate is an open-source testing framework that unifies API testing, performance testing, UI automation, and contract testing in a single BDD-style DSL. Write tests in plain Gherkin-like syntax without any Java knowledge. Built-in assertions, data-driven testing, parallel execution, and HTML reports. 8,200+ GitHub stars, MIT licensed. 7+ years of active development with Global 2000 enterprise adoption for comprehensive API quality assurance.