11 tools tagged
ByteDance's open-source multimodal desktop agent with vision-based GUI automation
UI-TARS Desktop is ByteDance's open-source multimodal AI agent that automates desktop and browser interactions using computer vision rather than DOM selectors or accessibility APIs. Powered by the UI-TARS vision model, it can understand and operate any graphical interface by looking at screenshots, making it capable of automating applications that traditional browser automation tools cannot reach, including native desktop apps and complex web UIs.
Browser-native frontend coding agent for production codebases
Stagewise is an open-source frontend coding agent that runs directly in the browser on localhost. Backed by Y Combinator (S25) and with 6,500+ GitHub stars, it lets developers and designers point at elements in a live web application and prompt changes, with full devtools and console access, bridging the gap between visual editing and production codebase modification.
Microsoft's open-source frontier voice AI for long-form multi-speaker audio
VibeVoice is Microsoft's open-source voice AI family with both TTS and speech recognition models. The TTS model generates up to 90 minutes of expressive multi-speaker audio with up to 4 distinct speakers. VibeVoice-ASR transcribes 60-minute recordings in a single pass with speaker identification and timestamps. Built on continuous speech tokenizers running at 7.5 Hz and on next-token diffusion, it compresses audio 80x more efficiently than Encodec while preserving fidelity.
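To put the 7.5 Hz figure in perspective, the short calculation below estimates how many tokenizer frames the stated audio lengths correspond to. It is a back-of-the-envelope sketch, not VibeVoice code: the function name is hypothetical, and it assumes one frame per tokenizer step with no padding or overhead.

```python
# Rough sequence-length arithmetic for a 7.5 Hz speech tokenizer.
# Assumption: one frame per tokenizer step, no padding or special tokens.
FRAME_RATE_HZ = 7.5  # frame rate quoted in the description above


def frames_for_minutes(minutes: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Return the number of tokenizer frames covering `minutes` of audio."""
    return round(minutes * 60 * frame_rate_hz)


tts_frames = frames_for_minutes(90)  # longest TTS output mentioned above
asr_frames = frames_for_minutes(60)  # longest single-pass ASR input mentioned above
print(f"90 min of audio: {tts_frames} frames")
print(f"60 min of audio: {asr_frames} frames")
```

At this rate even a 90-minute recording stays in the tens of thousands of frames, which is why a low frame rate matters for long-form generation.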
High-performance open-source web crawler optimized for AI pipelines
Crawl4AI is an open-source Python web crawler built specifically for AI and data pipeline use cases. It features parallel crawling, heuristic-based content extraction, cosine similarity chunking for LLM context optimization, and multiple output formats including LLM-ready markdown. It frequently appears on GitHub trending and is used by teams building large-scale RAG datasets and training corpora.
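The general idea behind cosine similarity chunking can be sketched in a few lines of standard-library Python. This is not Crawl4AI's API or implementation: the function names, the bag-of-words vectors, and the `min_score` threshold are all assumptions chosen for illustration, and a real pipeline would more likely use embeddings or TF-IDF.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_chunks(text: str, query: str, min_score: float = 0.1) -> list[str]:
    """Split text into paragraph chunks and keep those similar to the query."""
    chunks = [p.strip() for p in text.split("\n\n") if p.strip()]
    query_vec = bag_of_words(query)
    return [c for c in chunks if cosine_similarity(bag_of_words(c), query_vec) >= min_score]


page = (
    "Pricing plans start at ten dollars.\n\n"
    "Our team loves hiking.\n\n"
    "Enterprise pricing is custom."
)
for chunk in select_chunks(page, "pricing plans"):
    print(chunk)
```

Filtering chunks this way keeps only the passages relevant to a query, so less irrelevant page text ends up in the LLM context.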
Open-source browser infrastructure for AI agents at scale
Steel is an open-source browser API purpose-built for AI agents, providing managed headless browser sessions with anti-bot bypass, proxy rotation, CAPTCHA solving, and session persistence. It handles the infrastructure layer that browser automation agents like Browser Use and Stagehand run on top of. It can be self-hosted or used as a cloud service, and has over 6,000 GitHub stars.
Open-source generalist AI agent for browser and code tasks
Suna is an open-source generalist AI agent that can autonomously browse the web, write and execute code, manage files, and interact with external services. It features a real-time browser automation engine, an isolated code execution sandbox, and integrations with popular APIs. It is designed as an open-source alternative to commercial AI agent platforms and has over 9,000 GitHub stars and a fast-growing community.
Cloud browser automation via MCP for scalable testing
Browserbase MCP Server provides cloud-hosted browser automation capabilities to AI agents through the Model Context Protocol. Unlike local browser automation, it runs browsers in Browserbase's cloud infrastructure for scalable web navigation, data extraction, form filling, and screenshot capture without local resource constraints.
Automate local Chrome browser via MCP
BrowserMCP is an MCP server that enables AI agents to automate a local Chrome browser — navigating pages, clicking elements, filling forms, extracting content, and taking screenshots. It gives coding agents the ability to interact with web applications the way a human would, directly from Claude Desktop, Cursor, or any MCP client.
A better way to use the internet
Arc is a Chromium-based browser by The Browser Company that reimagines the browser UI with a sidebar for tabs, Spaces for context separation (work/personal), Boosts for custom CSS on any website, split views, Easels for collecting web content, and built-in ad blocking. It features Arc Max AI for page summaries, tab renaming, and 5-second previews. Available on macOS, Windows, and iOS, it is designed for power users who want to organize their browsing experience. It is privacy-focused, with no data collection.
Identify web technologies
Technology detection tool that identifies the tech stack behind any website — frameworks, CMS, analytics, CDNs, payment processors, and 1,500+ other technologies. Available as a browser extension (Chrome, Firefox, Edge), API, CLI, and bulk lookup service. Useful for competitive analysis, sales prospecting, lead enrichment, and technology trend research. Acquired by Crimson Hexagon (now Brandwatch). Free browser extension with paid API and enterprise tiers for large-scale lookups.
The browser for ambitious developers
Standalone browser built specifically for web developers and designers that shows multiple synchronized viewports side by side for responsive design testing. Features include an accessibility inspector (WCAG compliance checking), visual regression testing via screenshots, layout debugging overlays, a meta tag validator, social media preview cards, and a color contrast checker. It reduces the need for constant resizing and device switching during front-end development. Available on macOS, Windows, and Linux.
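As an illustration of what a color contrast checker computes, the sketch below implements the contrast ratio formula from WCAG 2.x. It is not this browser's code: the function names are hypothetical, and only the formula and the AA thresholds (4.5:1 for normal text, 3:1 for large text) come from the WCAG specification.

```python
# Sketch of the WCAG 2.x contrast ratio calculation.
# Colors are (R, G, B) tuples with 8-bit channels (0-255).

def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance as defined by WCAG 2.x."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """Contrast ratio between two colors, from 1.0 (identical) to 21.0."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg: tuple[int, int, int], bg: tuple[int, int, int], large_text: bool = False) -> bool:
    """Check the WCAG AA threshold: 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))
print(passes_aa((0, 0, 0), (255, 255, 255)))
```

Black on white gives the maximum possible ratio, and identical foreground and background colors give the minimum of 1.0.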