MCP servers, OpenAPI specs, and GraphQL endpoints all share the same problem when wired into agentic workflows: every session loads full schemas upfront, and those schemas can consume thousands of tokens before any real work begins. In long Claude Code, Cursor, or Codex sessions with multiple servers configured, that overhead compounds across every turn — even when the agent only ever calls one or two tools. mcp2cli turns any of these surfaces into a standard command-line interface at runtime, with no code generation and no persistent server. Tools and arguments load on demand through `--list` and `--help` flags, so the agent fetches only the schema fragments it actually uses, only when it needs them.
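A minimal sketch of that flow, assuming an invocation shape where the endpoint is passed positionally (the `--list` and `--help` flags come from the project's description; the endpoint URL, tool name, and tool arguments here are illustrative, not confirmed syntax):

```sh
# Step 1: fetch tool names only; no argument schemas enter the context yet.
uvx mcp2cli https://mcp.example.com/mcp --list

# Step 2: pull the argument schema for just the one tool the agent needs.
uvx mcp2cli https://mcp.example.com/mcp search-docs --help

# Step 3: invoke it. Only these two small schema fragments were ever loaded.
uvx mcp2cli https://mcp.example.com/mcp search-docs --query "rate limits"
```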
The project's headline claim is concrete: 96 to 99 percent fewer tokens than native MCP tool calls in typical workloads. mcp2cli supports MCP over HTTP, SSE, and stdio transports, OpenAPI specs as local files or remote URLs, and GraphQL endpoints with introspection-driven query generation. OAuth is built in across all three modes: authorization code with PKCE for browser logins, client credentials for machine-to-machine flows, and automatic token refresh persisted under `~/.cache/mcp2cli/oauth/`. A `bake` subcommand lets developers freeze a configured connection into a named alias (`mcp2cli @petstore list-pets`), and `bake install` writes a standalone wrapper into `~/.local/bin` so an agent can call the baked tool as if it were a native binary.
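A sketch of that bake workflow, under the same caveat: the `@petstore` alias call and the `~/.local/bin` wrapper are described by the project, while the argument order for `bake` and `bake install` and the wrapper's name are assumptions here:

```sh
# Freeze a configured connection into a named alias (argument shape assumed).
uvx mcp2cli bake petstore https://petstore.example.com/openapi.json

# Call through the alias; schemas still load on demand via --list and --help.
uvx mcp2cli @petstore list-pets

# Write a standalone wrapper into ~/.local/bin (assumed to take the alias
# name), then call the baked tool as if it were a native binary.
uvx mcp2cli bake install petstore
petstore list-pets
```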
mcp2cli surfaced on Hacker News in May 2026 with 144 points and over a hundred comments, driven largely by the "MCP tax" framing that landed with teams already tracking per-session token costs. The project is MIT licensed, written in Python, distributed via PyPI (`uvx mcp2cli` or `uv tool install mcp2cli`), and ships an installable AI agent skill that teaches Claude Code, Cursor, and Codex how to invoke it. With more than 2,100 GitHub stars and active commits in the past week, it represents one of the first production-ready answers to a problem that grows worse as MCP adoption deepens — and as agent sessions get longer, token-efficient tool invocation is likely to become a default expectation rather than an optimization.
