aicoolies logo

# llm-api

23 tools tagged

Showing 23 of 23 tools

xAI Python SDK logo

xAI Python SDK

Official Python SDK for the xAI API

The xAI Python SDK is the official Python client for the xAI API, giving developers a direct way to build Grok-powered apps without relying on community proxies or unofficial wrappers. It supports synchronous and asynchronous Python clients for chat completions, streaming responses, function/tool calling, and multimodal workflows, making it a clean fit for backend services, agents, notebooks, and developer tools that need programmatic xAI access.

open-sourceOpen Source
Cerebras logo

Cerebras

Wafer-scale inference at thousands of tokens per second

Cerebras Inference serves open-weight LLMs like Llama, Qwen, and GPT-OSS on wafer-scale CS-3 chips through an OpenAI-compatible API, benchmarking between 1,800 and 2,600 output tokens per second on Llama 3.1 8B and several hundred on 70B models. A free tier offers one million tokens per day with no credit card, while paid pay-per-token pricing starts at $0.04 per million tokens for the smaller Llama models.

freemium
new-api logo

New API

Unified LLM API gateway and proxy hub

New API is an open-source multi-tenant AI gateway that aggregates and distributes LLM API requests across providers like OpenAI, Claude, and Gemini through a unified proxy interface. It cross-converts requests into OpenAI-compatible, Claude-compatible, or Gemini-compatible formats, with built-in channel management, quota control, token-based authentication, and billing capabilities. Deploy via Docker with SQLite or MySQL for centralized model management.

open-sourceOpen Source
Tokscale logo

Tokscale

CLI token usage tracker for AI coding agents

Tokscale is a CLI tool that tracks token usage and costs across AI coding agents including Claude Code, Codex, OpenCode, Gemini CLI, Cursor, and more. Built with a native Rust core for high-performance processing, it provides detailed breakdowns of input, output, cache, and reasoning tokens with real-time pricing calculations via LiteLLM data. Features include interactive 2D/3D contribution graphs, web visualization dashboards, global leaderboards, and JSON export for cost analysis.

open-sourceOpen Source
DeepInfra logo

DeepInfra

Cost-effective AI inference platform with 86+ models from $0.02/M tokens

DeepInfra is an AI inference platform offering 86+ LLM models with pricing starting at $0.02 per million tokens. Backed by $20.6M in funding including an $18M Series A from Felicis Ventures, it provides OpenAI-compatible endpoints for models including DeepSeek, Llama, and Mistral with pay-as-you-go pricing.

api-usage-based
TensorZero logo

TensorZero

Open-source LLM gateway with built-in optimization and A/B testing

TensorZero is an open-source LLMOps platform in Rust that unifies an LLM gateway, observability, prompt optimization, and A/B experimentation in a single binary. It routes requests across providers with sub-millisecond P99 latency at 10K+ QPS while capturing structured data for continuous improvement. Supports dynamic in-context learning, fine-tuning workflows, and production feedback loops. Backed by $7.3M seed funding, 11K+ GitHub stars.

open-sourceOpen Source
Tavily logo

Tavily

Real-time search API built for AI agents

Tavily is an AI-native search API that provides real-time web search, content extraction, and crawling capabilities specifically designed for LLM applications and autonomous agents. It returns structured, citation-ready results optimized for RAG workflows with built-in safety features including prompt injection protection and PII leak prevention. Acquired by Nebius in 2026, Tavily integrates with LangChain, LlamaIndex, and major agent frameworks, serving over one million developers worldwide.

freemiumOpen Source
LM Studio logo

LM Studio

Run local LLMs with an intuitive desktop GUI and OpenAI-compatible API server.

Free desktop application by Element Labs for discovering, downloading, and running open-source LLMs locally. Features a curated Hugging Face model browser, side-by-side model comparison, parameter tuning, and an OpenAI-compatible API server on localhost:1234. Powered by llama.cpp with Metal acceleration for Apple Silicon.

free
OpenAI Assistants API logo

OpenAI Assistants API

Thread-based AI assistant API with tools and file support

OpenAI's platform API for building stateful AI assistants. Manages conversation threads, supports function calling, code interpreter, and file search (RAG) out of the box. Usage-based pricing makes it accessible for startups and enterprises alike, with built-in memory and tool orchestration for production-grade conversational applications.

api-usage-based
OpenAI API logo

OpenAI API

API for GPT-5 family models, multimodal generation, embeddings, and agents

Official API platform for the GPT-5 family, reasoning/thinking variants, multimodal generation, speech, embeddings, and agent workflows. Features the Responses API, tool calling, structured outputs, batch processing, fine-tuning, and SDK support. It remains one of the most widely integrated AI APIs in the developer ecosystem, but model choice, retention settings, rate limits, and pricing tiers require active governance in production.

api-usage-based
Anthropic API logo

Anthropic API

Direct API access to Claude models with tool use

Official API for Claude models including Opus, Sonnet, and Haiku. Supports tool use, computer use, extended thinking, and batch processing. Features prompt caching, streaming, and Messages API with vision capabilities. Known for strong performance on complex reasoning tasks, nuanced instruction following, and safety-conscious design that makes it trusted for enterprise and production applications.

api-usage-based
Google Vertex AI logo

Google Vertex AI

Google Cloud ML platform with Gemini and custom models

Google Cloud's end-to-end ML platform with Gemini models, Model Garden featuring 150+ models, AutoML, and custom training pipelines. Features Vertex AI Search, Conversation, and Agent Builder for enterprise AI applications. The comprehensive platform for organizations building production AI systems at scale within the Google Cloud ecosystem, with enterprise governance and compliance built in.

api-usage-based
Azure OpenAI logo

Azure OpenAI

OpenAI models with Azure enterprise security

Microsoft's enterprise gateway to OpenAI models — GPT-5-family models, reasoning variants, real-time/audio options, and Azure-hosted governance — with Azure security, compliance, and global infrastructure. Azure OpenAI is designed for teams that need OpenAI capability inside existing Microsoft cloud controls.

api-usage-based
AWS Bedrock logo

AWS Bedrock

Managed foundation models on AWS

Fully managed AWS service providing enterprise access to 100+ foundation models from Anthropic, Meta, Mistral, Cohere, and Amazon's Nova family through a single API. Bedrock includes AgentCore for agent runtime, Knowledge Bases for RAG, Guardrails blocking 88% of harmful content, plus Model Distillation, Prompt Caching, and Intelligent Prompt Routing for cost optimization.

api-usage-based
Cohere logo

Cohere

Enterprise AI for text generation, search, and RAG

Enterprise-focused AI platform from former Google Brain researchers offering Command (chat), Embed (semantic search), and Rerank (result ordering) model families. Cohere Embed v4 supports 100+ languages with multimodal text/image inputs, North agent workspace processes documents and spreadsheets, and Model Vault enables secure VPC or on-premises deployment for regulated enterprises.

freemium
Hugging Face logo

Hugging Face

The GitHub of ML — model hub, datasets, and inference

Open-source platform for building, sharing, and deploying machine learning models and datasets. Hosts 500k+ models, 100k+ datasets, and Spaces for interactive demos. The central hub of the open-source AI ecosystem, providing model discovery, inference APIs, and collaborative tools that make it the GitHub of machine learning for researchers and developers worldwide.

freemiumOpen Source
Replicate logo

Replicate

Run and deploy ML models via API with simple pricing

Cloud platform that lets developers run thousands of open-source and proprietary public ML models through a simple API without managing GPUs or infrastructure. Replicate hosts models for image, text, audio, and video, supports Cog-based custom deployments and private models, and now operates as a distinct Cloudflare brand with pay-by-time or input/output pricing depending on the model.

api-usage-based
Fireworks AI logo

Fireworks AI

Production-grade inference with serverless and on-demand GPUs

High-performance inference platform serving open-source and custom AI models at global scale, processing 13+ trillion tokens daily at ~180K requests per second. Fireworks AI delivers 1,000+ tokens per second on large models through quantization-aware tuning and adaptive speculation, with serverless, fine-tuning, and dedicated GPU options across text, image, and audio modalities.

freemium
Groq logo

Groq

Ultra-fast LPU inference for open-weight models

Groq is an AI inference provider built around custom Language Processing Unit (LPU) hardware for low-latency open-weight model serving. GroqCloud exposes an OpenAI-compatible API for Llama, GPT-OSS, Qwen, Kimi, DeepSeek, Gemma, Whisper, and related models, with high token-throughput positioning, model-specific rate limits, and usage-based pricing.

freemium
Together AI logo

Together AI

Open-weight inference, fine-tuning, and GPU-cloud platform

Together AI is a cloud platform for running, fine-tuning, batching, and training open-weight AI models. It supports serverless inference, dedicated endpoints, LoRA and full fine-tuning, GPU clusters, code-execution sandboxes, and async batch jobs up to 30B tokens per model. Current docs list fast-moving families such as Qwen, Kimi, GLM, GPT-OSS, DeepSeek, Llama, MiniMax, and Mistral.

api-usage-based
OpenRouter logo

OpenRouter

Unified API gateway for 200+ AI models

Unified API gateway providing access to 500+ AI models from leading providers through a single OpenAI-compatible interface. OpenRouter eliminates the need to manage separate keys, billing, and integrations across providers like OpenAI, Anthropic, Google, and Meta, with built-in plugins for web search, PDF processing, automatic fallback routing, and per-model cost tracking.

api-usage-based
DeepSeek logo

DeepSeek

Low-cost reasoning and coding models with V4 API options

Chinese AI research lab developing low-cost reasoning and coding models with a fast-moving hosted API surface. Current API docs foreground DeepSeek V4 Flash and V4 Pro with thinking/non-thinking modes, OpenAI- and Anthropic-compatible endpoints, 1M context, JSON output, tool calls, and chat-prefix/FIM options. Free chat assistant and API access are available, while open-weight/self-hosting claims should be checked against current model repositories.

freemiumTelemetry
Mistral AI logo

Mistral AI

Open-weight frontier lab with Vibe, Studio, and a European AI cloud

Mistral AI is the French frontier-AI lab behind open-weight and commercial models, Mistral Vibe (formerly Le Chat), Studio, agentic coding, and the European-hosted Mistral Compute cloud. It gives developers an EU-centered alternative across API, assistant, agent-platform, and sovereign-infrastructure workflows, with model-specific licensing and pricing that should be checked per workload.

freemium