API Pricing & Free Tier Matrix

What it is

The API Pricing & Free Tier Matrix is a consolidated reference for the costs and free access policies of major Large Language Model (LLM) providers and AI platforms. It tracks pricing for tokens, developer program benefits, and specific model-level quotas.

What problem it solves

LLM pricing is notoriously complex, with costs varying by several orders of magnitude between "mini" and "frontier" models. Furthermore, free tiers are often hidden or poorly documented. This matrix allows developers to perform a "budget-first" architectural selection, choosing models that fit their financial constraints and usage patterns.

Where it fits in the stack

This document spans two layers of the stack analysis: Layer 1 (Providers) and Layer 2 (Models). It provides the economic context for the tools documented in docs/tools/providers/ and docs/tools/ai_knowledge/.

Typical use cases

  • Budgeting: Estimating the monthly cost of running a specific agentic workflow (e.g., using GPT-5.4 mini for routine tasks).
  • Free-Tier Hunting: Identifying which providers offer enough free credits to build and test a prototype without a credit card.
  • Provider Switching: Comparing the "Intelligence-per-Dollar" value when deciding whether to migrate from one provider to another (e.g., Anthropic to Google).
  • Quota Management: Checking Requests Per Minute (RPM) and Tokens Per Day (TPD) limits for free tiers to avoid service interruptions.
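For the budgeting use case, monthly spend is token volume multiplied by the per-MTok rate. The sketch below assumes a constant daily workload and uses the GPT-5.4 mini list prices quoted in the OpenAI table on this page ($0.75/MTok input, $4.50/MTok output). The workload figures are hypothetical, and cached-input, batch, or tier discounts are deliberately ignored, so treat the result as a rough upper-bound estimate rather than a quote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """List price in USD per million tokens (MTok)."""
    name: str
    input_per_mtok: float
    output_per_mtok: float


# Rates copied from the OpenAI table on this page; re-check before budgeting.
GPT_5_4_MINI = ModelPrice("GPT-5.4 mini", input_per_mtok=0.75, output_per_mtok=4.50)


def monthly_cost(price: ModelPrice, calls_per_day: int, input_tokens_per_call: int,
                 output_tokens_per_call: int, days: int = 30) -> float:
    """Estimate USD spend for a constant daily workload (no discounts applied)."""
    input_tokens = calls_per_day * input_tokens_per_call * days
    output_tokens = calls_per_day * output_tokens_per_call * days
    return (input_tokens * price.input_per_mtok
            + output_tokens * price.output_per_mtok) / 1_000_000


# Hypothetical agent workload: 2,000 calls/day, 3,000 input + 500 output tokens each.
cost = monthly_cost(GPT_5_4_MINI, calls_per_day=2_000,
                    input_tokens_per_call=3_000, output_tokens_per_call=500)
print(f"{GPT_5_4_MINI.name}: ${cost:,.2f}/month")  # GPT-5.4 mini: $270.00/month
```

Because output tokens cost 6x as much as input tokens here, the 500 output tokens per call account for half of the $270 in this example; trimming verbose agent output is often the cheapest optimization.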

Strengths

  • Consolidated: Aggregates data from over 30 providers in a single view.
  • Agent-Optimized: Specifically highlights "mini" models and high-speed providers ideal for autonomous agents.
  • Evidence-Based: Includes direct links to official documentation for every claim.
  • Automated: Uses scripts to maintain a "Capability Capacity Summary" of known free tokens.

Limitations

  • High Volatility: Prices and free-tier availability can change weekly.
  • Regional Variation: Some free tiers or pricing models may not be available in all jurisdictions.
  • Tier Complexity: Many providers use opaque "usage tiers" (Tier 1-5) that affect limits beyond simple price-per-token.

When to use it

  • Use it during the design phase of an AI application to select a model based on cost-efficiency.
  • Use it when you hit a rate limit and need to find an alternative provider with higher free-tier capacity.
  • Use it to check whether a "free trial" mentioned in a tutorial is still active.

When not to use it

  • Do not use it as a legally binding price list; always verify with the provider's official dashboard before committing to large-scale spend.
  • Do not use it for consumer-facing chat app pricing (e.g., ChatGPT Plus vs. Claude Pro) unless it specifically mentions API benefits.

Getting started

  1. Identify your primary requirement (e.g., "Coding", "Fast", or "Budget").
  2. Consult the Capability Capacity Summary to see which models currently offer the best free-tier value.
  3. Use the Canonical pricing matrix to jump to the official pricing and documentation for your chosen provider.
  4. Check the Model-level quota tracker for specific RPM/TPD limits.
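Steps 1, 2, and 4 amount to filtering the quota tracker by capability tag and free-tier status, then preferring the largest known daily cap. The following is a minimal sketch, assuming the relevant rows have been copied by hand into a hypothetical ModelRow record; nothing on this page exports such a structure, and ModelRow and candidates are illustrative names only.

```python
from __future__ import annotations

from dataclasses import dataclass

# Statuses from the status legend that imply at least some no-cost access.
FREE_STATUSES = {"Yes", "Partial"}


@dataclass(frozen=True)
class ModelRow:
    """One hand-transcribed row from the quota tracker (hypothetical structure)."""
    provider: str
    model: str
    free_status: str             # provider-level: Yes / Partial / No / Unclear
    tags: frozenset[str]         # capability tags, e.g. CODE, FAST, BUDGET
    daily_token_cap: int | None  # None when the cap is n/p, plan, tier, ...


def candidates(rows: list[ModelRow], capability: str, free_only: bool = True) -> list[ModelRow]:
    """Rows carrying `capability`, largest known daily cap first, unknown caps last."""
    matches = [
        row for row in rows
        if capability in row.tags and (not free_only or row.free_status in FREE_STATUSES)
    ]
    return sorted(matches, key=lambda r: (r.daily_token_cap is None, -(r.daily_token_cap or 0)))


# A small subset copied from the tables on this page.
ROWS = [
    ModelRow("Groq", "Qwen3 32B", "Yes", frozenset({"FAST", "CODE", "OPEN", "BUDGET"}), 500_000),
    ModelRow("Cerebras", "Qwen3 Coder 235B", "Yes", frozenset({"FAST", "CODE", "OPEN"}), 1_000_000),
    ModelRow("Groq", "Compound AI (groq/compound)", "Yes", frozenset({"FAST", "REASON", "CODE"}), None),
    ModelRow("Together AI", "Llama 4 Maverick", "No", frozenset({"CODE", "REASON", "OPEN"}), None),
    ModelRow("OpenAI", "GPT-5.4 mini", "No", frozenset({"CODE", "FAST", "BUDGET"}), None),
]

for row in candidates(ROWS, "CODE"):
    print(row.provider, row.model, row.daily_token_cap)
```

free_status here is the provider-level value from the canonical matrix, so a Yes does not mean every model is free: Gemini 3.1 Pro Preview is paid-only even though Google's row is Yes. Step 4 (reading the exact per-model quota) still has to be done against the tracker rows.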

Status legend

  • Yes = official free tier/trial access is currently documented.
  • Partial = limited free usage exists (for example, selected models/features).
  • No = no current free trial/tier is documented.
  • Unclear = pricing/billing docs do not clearly confirm a standing free tier.

Canonical pricing matrix (last verified: 2026-05-14)

| Provider / Platform | Official links | Free tier / trial | Evidence summary |
|---|---|---|---|
| OpenAI | Docs · Pricing | No | Usage-priced API; current pricing centers on GPT-5.5, GPT-5.4, and GPT-5.4 mini. |
| Anthropic (Claude API) | Docs · Pricing | Yes | New users receive small starter API credits. Current API pricing lists Claude Opus 4.7/4.6/4.5, Sonnet 4.6/4.5, and Haiku 4.5. |
| Google Gemini Developer API | Docs · Pricing | Yes | Pricing page documents a Free plan with selected free input/output token access; Gemini 3.1 Pro Preview is paid-only, while Gemini 3.1 Flash-Lite Preview has free rows. Gemini 4 Maverick preview free for dev accounts. |
| OpenRouter | Docs · Pricing | Yes | Free plan and free-model routing are documented. |
| xAI (Grok API) | Docs · Pricing | Yes | Docs mention monthly free requests/credits. |
| Z.ai (GLM API) | Docs · Pricing | Yes | New users can claim free API token packages. |
| Alibaba DashScope (Qwen APIs) | Docs · Pricing | Yes | Many models show temporary free quota periods. |
| Cohere | Docs · Pricing | Yes | Trial API keys are free and rate-limited. |
| Mistral AI | Docs · Pricing | Yes | Experiment plan supports free API testing. |
| Together AI | Docs · Pricing | No | Billing docs indicate paid credits are required. |
| Groq | Docs · Pricing | Yes | Free API plan and model-level limits are documented; read exact current limits from the account limits page for production budgeting. |
| Kiro | Docs · Pricing | Yes | Perpetual free tier (50 credits/mo) + 500 bonus credits. |
| Fireworks AI | Docs · Pricing | Yes | Public pricing notes starter free credits. |
| Replicate | Docs · Pricing | Partial | Some models can be run free before billing. |
| DeepSeek API | Docs · Pricing | Unclear | Granted balances are mentioned; a fixed free tier is unclear. |
| Perplexity API | Docs · Pricing | No | Purchased credits and top-up requirements are documented. |
| AI21 | Docs · Pricing | Yes | Pricing page advertises free trial credits. |
| Abacus.AI | Docs · Pricing | Yes | Free trial and ChatLLM free access are documented. |
| Voyage AI | Docs · Pricing | Unclear | Paid rates are clear; a standing free tier is not explicit. |
| Cloudflare Workers AI | Docs · Pricing | Yes | Free plan includes daily usage. |
| Hugging Face Inference Providers | Docs · Pricing | Yes | Monthly included inference credits by account tier. |
| Cerebras Inference | Docs · Pricing | Yes | Pricing references free-tier usage/credits. |
| NVIDIA API Catalog | Docs · Pricing | Yes | Starter credits are referenced publicly. |
| SambaNova Cloud | Docs · Pricing | Unclear | Public page shows paid plans; no stable free policy. |
| AWS Bedrock | Docs · Pricing | No | Metered pay-as-you-go pricing. |
| Amazon Q | Docs · Pricing | Yes | Free tier for individuals/developers is documented. |
| Azure OpenAI Service | Docs · Pricing | Unclear | Metered service; only account-level cloud credits may apply. |
| Vertex AI (Gemini via GCP) | Docs · Pricing | Unclear | Metered pricing; no persistent API free-tier statement. |
| OCI Generative AI | Docs · Pricing | Unclear | Paid rates are public; free tier not clearly documented. |
| MiniMax | Docs · Pricing | Yes | Coding Plan provides a low-cost entry tier; trial credits available. |
| Moonshot AI | Docs · Pricing | Partial | Trial credits are typically granted to new developer accounts. |

Enterprise Productivity Suite (2026 Benchmarks)

| Tool | Pricing Model | Free Tier / Trial | Notes |
|---|---|---|---|
| Fyxer AI | ~$30-$50/user/mo | 14-day Trial | Executive admin & inbox management. |
| Glean | Enterprise Quote | No | Unified search across 100+ SaaS apps. |
| Hebbia | Institutional Quote | No | High-precision analytical search for Finance/Law. |
| tl;dv | Freemium | Yes | Meeting recorder with free unlimited tier. |

Developer Program Plans

These are the core subscription plans for developers that bundle AI access, cloud credits, and other professional benefits.

| Program / Plan | Cost | AI Access & Quotas | Cloud Credits & Benefits |
|---|---|---|---|
| Google Developer Program — Standard | Free | 10 Firebase Studio workspaces; Gemini Code Assist (Basic); Gemini CLI (60 RPM / 1000 RPD) | Monthly Google Skills credits (via GEAR); community access; private previews. |
| Google Developer Program — Premium | $24.99/mo or $299/yr | 30 Firebase Studio workspaces; Gemini Code Assist (Higher); Gemini CLI (120 RPM / 1500 RPD) | $45/mo ($550/yr) GenAI/Cloud credit; $500 bonus credit upon certification; 1 Cloud cert voucher; expert consultation. |
| Google Developer Program — Enterprise | Preview | Gemini Code Assist Enterprise; Gemini CLI (120 RPM / 2000 RPD) | $150/mo Google Cloud credit; centralized purchasing; developer sandboxes. |

Model-level quota tracker (expanded list)

This section is grouped by provider with compact four-column tables for narrower screens.

  • Verified = core limits are visible in official docs.
  • Partially verified = provider free-tier stance is verified, but model-level quotas are dynamic or not fully published.
  • Unverified = values come from community reports or account-specific observations not explicitly documented.

The Quality ratings in the model notes describe code-generation quality; they are subjective and community-assessed, not an official benchmark metric.

Quick jump: Alibaba DashScope (Qwen) · Google Gemini · OpenAI · Anthropic Claude · Groq · Kiro · Together AI · Hugging Face · Mistral AI · DeepSeek · Cohere · OpenRouter · Cerebras · xAI Grok · MiniMax · Moonshot AI

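Quotas format is context / RPM / RPD / TPM / daily token cap, where RPM = requests per minute, RPD = requests per day, and TPM = tokens per minute. n/p means "not published." paid marks a paid-only model; plan, tier, credit, and provider mean the limit depends on the account's plan, usage tier, credit balance, or routed provider rather than a fixed published value.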
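Because every Quotas cell follows the same five-field layout, it can be turned into numbers mechanically. A minimal sketch, assuming K = 1,000 and M = 1,000,000 (the convention the capacity math on this page uses, even though context windows are often 2^n tokens); parse_quota and parse_value are hypothetical helpers, not part of the page's generator script.

```python
from __future__ import annotations

import re

FIELDS = ("context", "rpm", "rpd", "tpm", "daily_token_cap")
# Assumption: K = 1,000 and M = 1,000,000, matching the capacity math on this page.
_MULTIPLIER = {"": 1, "K": 1_000, "M": 1_000_000}
_NUMBER = re.compile(r"(\d[\d,]*)([KM]?)")


def parse_value(raw: str) -> int | None:
    """'128K' -> 128000, '14,400' -> 14400; placeholders like n/p, plan, tier -> None."""
    match = _NUMBER.fullmatch(raw.strip())
    if match is None:
        return None
    digits, suffix = match.groups()
    return int(digits.replace(",", "")) * _MULTIPLIER[suffix]


def parse_quota(cell: str) -> dict[str, int | None]:
    """Split a 'context / RPM / RPD / TPM / daily token cap' cell into numeric fields."""
    # Split on ' / ' (with spaces) so the placeholder 'n/p' stays intact.
    parts = cell.split(" / ")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {cell!r}")
    return {name: parse_value(part) for name, part in zip(FIELDS, parts)}


print(parse_quota("128K / 30 / 1000 / 12K / 250K"))
# {'context': 128000, 'rpm': 30, 'rpd': 1000, 'tpm': 12000, 'daily_token_cap': 250000}
print(parse_quota("256K / n/p / n/p / n/p / n/p"))
# {'context': 256000, 'rpm': None, 'rpd': None, 'tpm': None, 'daily_token_cap': None}
```

Mixed cells such as "64K free, 131K paid" or "128K (paid)" deliberately fall back to None, so they still need a manual read rather than a guessed number.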

Capability tags:

  • CODE code generation and refactoring tasks.
  • VERIFY cross-checking, factual validation, and test review.
  • REASON complex reasoning and multi-step planning.
  • LONGCTX long documents, large prompts, and retrieval-heavy workflows.
  • FAST low-latency interactions and interactive agent loops.
  • BUDGET better free-tier value or lower-cost experimentation.
  • OPEN open-weight/open-model ecosystem affinity.

Capability Capacity Summary (auto-generated)

These summaries are generated from the model rows on this page using scripts/update_api_pricing_capability_summary.py. Only rows with a numeric daily token cap are included in the capacity math.
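The 80% shortlist below follows a greedy rule: sort the known daily caps in descending order and keep adding models until their combined cap reaches 80% of the total for that capability. This is a sketch of that rule, not the contents of scripts/update_api_pricing_capability_summary.py. The caps are the CODE-tagged rows on this page that have a numeric daily token cap (K = 1,000); shortlist and CODING_CAPS are illustrative names.

```python
from __future__ import annotations


def shortlist(caps: dict[str, int], target: float = 0.80) -> tuple[list[str], float, int]:
    """Greedily pick the largest daily caps until they cover `target` of the total.

    Returns (chosen model names, coverage fraction, total known daily cap).
    """
    total = sum(caps.values())
    chosen: list[str] = []
    covered = 0
    # sorted(..., reverse=True) is stable, so ties keep their input order.
    for name, cap in sorted(caps.items(), key=lambda item: item[1], reverse=True):
        if total and covered / total >= target:
            break
        chosen.append(name)
        covered += cap
    return chosen, (covered / total if total else 0.0), total


# CODE-tagged rows with a numeric daily token cap, copied from this page.
CODING_CAPS = {
    "Cerebras: Llama 4 Maverick 400B": 1_000_000,
    "Cerebras: Qwen3 Coder 235B": 1_000_000,
    "Groq: Llama 4 Maverick 17B": 500_000,
    "Groq: Qwen3 32B": 500_000,
    "Groq: Llama 4 Scout 17B": 500_000,
    "Groq: Llama 4 Maverick 70B": 250_000,
    "Groq: GPT OSS 120B": 200_000,
    "Groq: Llama 3.3 70B": 100_000,
}

models, coverage, total = shortlist(CODING_CAPS)
print(len(models), f"{coverage:.1%}", total)  # 5 86.4% 4050000
```

The tables round the Coding total to 4M, but the exact total is 4.05M; the five selected models and the 86.4% coverage come out the same either way.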

Leaderboard By Capability (known daily token caps)

| Capability | Top models | Highest known daily cap | Known models |
|---|---|---|---|
| Coding | Cerebras — Llama 4 Maverick 400B (1M); Cerebras — Qwen3 Coder 235B (1M); Groq — Llama 4 Maverick 17B (500K) | 1M | 8 |
| Verification | n/a | n/a | 0 |
| Reasoning | Groq — GPT OSS 120B (200K) | 200K | 1 |
| Long-context | n/a | n/a | 0 |
| Low-latency | Cerebras — Llama 4 Maverick 400B (1M); Cerebras — Qwen3 Coder 235B (1M); Cerebras — Llama 3.1 8B (1M) | 1M | 8 |
| Budget/free-value | Cerebras — Llama 3.1 8B (1M); Groq — Llama 4 Maverick 17B (500K); Groq — Qwen3 32B (500K) | 1M | 6 |
| Open-model ecosystem | Cerebras — Llama 4 Maverick 400B (1M); Cerebras — Qwen3 Coder 235B (1M); Cerebras — Llama 3.1 8B (1M) | 1M | 8 |

80% Shortlist (known-cap coverage)

| Capability | Models to reach >=80% of known capacity | Coverage | Total known daily cap |
|---|---|---|---|
| Coding | Cerebras — Llama 4 Maverick 400B (1M); Cerebras — Qwen3 Coder 235B (1M); Groq — Llama 4 Maverick 17B (500K); Groq — Qwen3 32B (500K); Groq — Llama 4 Scout 17B (500K) | 86.4% | 4M |
| Verification | n/a | n/a | n/a |
| Reasoning | Groq — GPT OSS 120B (200K) | 100.0% | 200K |
| Long-context | n/a | n/a | n/a |
| Low-latency | Cerebras — Llama 4 Maverick 400B (1M); Cerebras — Qwen3 Coder 235B (1M); Cerebras — Llama 3.1 8B (1M); Groq — Llama 4 Maverick 17B (500K); Groq — Qwen3 32B (500K) | 82.5% | 4.8M |
| Budget/free-value | Cerebras — Llama 3.1 8B (1M); Groq — Llama 4 Maverick 17B (500K); Groq — Qwen3 32B (500K); Groq — Llama 4 Scout 17B (500K) | 87.7% | 2.9M |
| Open-model ecosystem | Cerebras — Llama 4 Maverick 400B (1M); Cerebras — Qwen3 Coder 235B (1M); Cerebras — Llama 3.1 8B (1M); Groq — Llama 4 Maverick 17B (500K); Groq — Qwen3 32B (500K) | 82.5% | 4.8M |

Fast Recommendation (80% rule, known-cap data)

| Goal | Recommended free-first models | Why this set |
|---|---|---|
| Coding | Cerebras — Llama 4 Maverick 400B; Cerebras — Qwen3 Coder 235B; Groq — Llama 4 Maverick 17B; Groq — Qwen3 32B; Groq — Llama 4 Scout 17B | Reaches 86.4% of known daily capacity (4M total known). |
| Verification | n/a | No numeric daily-cap data available for this capability. |
| Reasoning | Groq — GPT OSS 120B | Reaches 100.0% of known daily capacity (200K total known). |

Alibaba DashScope (Qwen)

| Model | Quotas | Verification | Summary |
|---|---|---|---|
| Qwen3.6-35B-A3B | 262K / plan / plan / plan / plan | Verified | CODE REASON OPEN · Account: DashScope. Latest frontier variant with 3B active parameters. |
| Qwen3.5-35B-A3B | 128K / plan / plan / plan / plan | Verified | CODE REASON OPEN · Account: DashScope. SOTA agentic coding (37.8% SWE-bench Verified Hard). |

Google Gemini

| Model | Quotas | Verification | Summary |
|---|---|---|---|
| Gemini 3.1 Pro Preview | model / paid / paid / paid / paid | Verified | CODE VERIFY REASON LONGCTX · Account: Google. Paid-only Pro preview; $2/MTok input and $12/MTok output for prompts up to 200K tokens. |
| Gemini 3.1 Flash-Lite Preview | model / plan / plan / plan / n/p | Verified | FAST LONGCTX BUDGET · Account: Google. Free input/output rows documented for Standard/Batch/Flex/Priority; paid Standard is $0.25/MTok text-image-video input and $1.50/MTok output. |
| Gemini Embedding | model / plan / plan / plan / n/p | Verified | BUDGET LONGCTX · Account: Google. Gemini Embedding is available on free and paid tiers; paid input is $0.15/MTok. |

OpenAI

| Model | Quotas | Verification | Summary |
|---|---|---|---|
| GPT-5.5 | model / tier / tier / tier / tier | Verified | CODE VERIFY REASON · Account: OpenAI. Frontier coding/professional-work model; $5/MTok input and $30/MTok output. No standing public free API tier documented. |
| GPT-5.4 | model / tier / tier / tier / tier | Verified | CODE FAST REASON · Account: OpenAI. More affordable coding/professional-work model; $2.50/MTok input and $15/MTok output. |
| GPT-5.4 mini | model / tier / tier / tier / tier | Verified | CODE FAST BUDGET · Account: OpenAI. Mini model for coding, computer use, and subagents; $0.75/MTok input and $4.50/MTok output. |

Anthropic Claude

| Model | Quotas | Verification | Summary |
|---|---|---|---|
| Claude Opus 4.7 | model / tier / tier / tier / tier | Verified | CODE VERIFY REASON LONGCTX · Account: Anthropic. Current Opus line; $5/MTok input and $25/MTok output on the Claude API. |
| Claude Sonnet 4.6 | model / tier / tier / tier / tier | Verified | CODE FAST LONGCTX · Account: Anthropic. Current Sonnet line; $3/MTok input and $15/MTok output. |
| Claude Haiku 4.5 | model / tier / tier / tier / tier | Verified | FAST BUDGET VERIFY · Account: Anthropic. Current Haiku line; $1/MTok input and $5/MTok output. |

Groq

| Model | Quotas | Verification | Summary |
|---|---|---|---|
| Llama 4 Maverick 70B | 128K / 30 / 1000 / 12K / 250K | Verified | FAST CODE OPEN BUDGET · Account: Groq (no credit card for free tier). Quality: Excellent. Standard for June 2026. |
| Llama 3.3 70B (llama-3.3-70b-versatile) | 128K / 30 / 1000 / 12K / 100K | Verified | FAST CODE OPEN BUDGET · Account: Groq (no credit card for free tier). Quality: Very Good. Official row differs from older community numbers. |
| Llama 4 Maverick 17B | 128K / 30 / 1000 / 6K / 500K | Verified | FAST CODE OPEN BUDGET · Account: Groq (no credit card for free tier). Quality: Good. Fast inference; revision limits can change. |
| Qwen3 32B | 128K / 60 / 1000 / 6K / 500K | Verified | FAST CODE OPEN BUDGET · Account: Groq (no credit card for free tier). Quality: Good. Current official free-plan row uses qwen/qwen3-32b. |
| Llama 4 Scout 17B | 128K / 30 / 1000 / 30K / 500K | Verified | FAST CODE OPEN BUDGET · Account: Groq. 16E MoE variant optimized for low-latency tasks; official free-plan TPM is 30K. |
| GPT OSS 120B | 128K / 30 / 1000 / 8K / 200K | Verified | CODE REASON · Account: Groq. High-performance open-weights model; official free-plan TPD is 200K. |
| Kimi K2 (kimi-k2-0905) | 256K / n/p / n/p / n/p / n/p | Verified | LONGCTX REASON · Account: Groq. 1T-parameter MoE with 256K context. |
| Compound AI (groq/compound) | 128K / 30 / 250 / 70K / n/p | Verified | FAST REASON CODE · Account: Groq (no credit card for free tier). Quality: Good. Official docs do not publish TPD. |
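For the quota-management use case, a client can refuse calls locally before a free-tier limit produces errors. The sketch below is a sliding-window guard that takes the RPM, RPD, TPM, and daily token cap from a row above (here the Llama 4 Maverick 70B row). It assumes each limit is counted over a rolling window; the actual reset behavior is provider-defined and not documented on this page, so treat it as a conservative approximation. FreeTierLimiter is a hypothetical helper, not part of any Groq SDK.

```python
from __future__ import annotations

import time
from collections import deque
from typing import Callable, Optional


class FreeTierLimiter:
    """Client-side sliding-window guard for RPM, RPD, TPM and a daily token cap."""

    def __init__(self, rpm: int, rpd: int, tpm: int, tpd: Optional[int] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm, self.rpd, self.tpm, self.tpd = rpm, rpd, tpm, tpd
        self.clock = clock
        self.last_minute: deque[tuple[float, int]] = deque()  # (timestamp, tokens)
        self.last_day: deque[tuple[float, int]] = deque()     # (timestamp, tokens)

    def _evict(self, now: float) -> None:
        # Drop entries that have aged out of the 60 s and 24 h windows.
        while self.last_minute and now - self.last_minute[0][0] >= 60:
            self.last_minute.popleft()
        while self.last_day and now - self.last_day[0][0] >= 86_400:
            self.last_day.popleft()

    def try_acquire(self, tokens: int) -> bool:
        """Record the call and return True only if it fits every configured limit."""
        now = self.clock()
        self._evict(now)
        minute_tokens = sum(t for _, t in self.last_minute)
        day_tokens = sum(t for _, t in self.last_day)
        if (len(self.last_minute) >= self.rpm
                or len(self.last_day) >= self.rpd
                or minute_tokens + tokens > self.tpm
                or (self.tpd is not None and day_tokens + tokens > self.tpd)):
            return False
        self.last_minute.append((now, tokens))
        self.last_day.append((now, tokens))
        return True


# Limits from the Llama 4 Maverick 70B row: 30 RPM / 1000 RPD / 12K TPM / 250K per day.
limiter = FreeTierLimiter(rpm=30, rpd=1000, tpm=12_000, tpd=250_000)
if not limiter.try_acquire(tokens=4_000):
    print("Would exceed a free-tier limit; back off or switch providers.")
```

The tokens argument has to be an estimate made before sending (prompt size plus the maximum output requested), since the true count is only known after the response. For rows with n/p in the daily-cap column, pass tpd=None.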

Kiro

| Model | Quotas | Verification | Summary |
|---|---|---|---|
| Auto (Frontier Mix) | n/p / n/p / n/p / n/p / 50 credits/mo | Verified | CODE FAST BUDGET · Account: Kiro. Mixed agent using frontier models. Free tier includes 50 monthly credits. |

Together AI

| Model | Quotas | Verification | Summary |
|---|---|---|---|
| Llama 4 Maverick | 131K / tier / tier / tier / tier | Verified | CODE REASON OPEN · Account: Together + paid credits. Quality: Very Good. No standing free trial in current docs. |
| DeepSeek V3.1 | 64K / tier / tier / tier / tier | Verified | CODE REASON OPEN · Account: Together + paid credits. Quality: Excellent. "$100 signup credits" not confirmed in current docs. |
| Mistral Small 3 | 128K / tier / tier / tier / tier | Verified | CODE OPEN · Account: Together + paid credits. Quality: Good. Limits are spend/account-tier dependent. |

Hugging Face

| Model | Quotas | Verification | Summary |
|---|---|---|---|
| Various open models | varies / provider / provider / provider / credit-based | Verified | OPEN BUDGET CODE · Account: Hugging Face. Quality: Varies. Limits depend on routed provider and plan. |
| Pro-tier routed providers | varies / higher / provider / provider / credit-based | Verified | OPEN CODE VERIFY · Account: Hugging Face Pro. Quality: Very Good. Pro includes higher monthly credits. |

Mistral AI

| Model | Quotas | Verification | Summary |
|---|---|---|---|
| Mistral Nemo 12B | model / plan / plan / plan / plan | Partially verified | CODE OPEN BUDGET · Account: Mistral (Experiment/Scale). Quality: Good. Free Experiment plan exists; quotas are dynamic. |
| Mistral Small 3.1 | 128K / plan / plan / plan / plan | Partially verified | CODE OPEN · Account: Mistral (Experiment/Scale). Quality: Good. Access/limits depend on plan tier. |
| Codestral | 32K / plan / plan / plan / plan | Partially verified | CODE VERIFY OPEN · Account: Mistral (Experiment/Scale). Quality: Excellent. Code-oriented with plan gating. |
| Mistral Large 3 | 128K / plan / plan / plan / plan | Partially verified | CODE REASON VERIFY · Account: Mistral (Experiment/Scale). Quality: Excellent. Most capable tier usually paid. |

DeepSeek

| Model | Quotas | Verification | Summary |
|---|---|---|---|
| DeepSeek V3.2 (deepseek-chat) | 128K / n/p / n/p / n/p / n/p | Partially verified | CODE REASON BUDGET · Account: DeepSeek. Quality: Excellent. Pricing is public; fixed free quotas are not. |
| DeepSeek R1 / reasoner | 128K / n/p / n/p / n/p / n/p | Unverified | REASON VERIFY CODE · Account: DeepSeek. Quality: Excellent. "5M signup tokens" is not confirmed in official docs. |

Cohere

| Model | Quotas | Verification | Summary |
|---|---|---|---|
| Command R7B | 128K / 20 / ~1,000/mo / endpoint / 1,000/mo | Verified | VERIFY FAST · Account: Cohere trial key. Quality: Good. Free trial usage is heavily rate-limited. |
| Command R+ | 128K / 20 / ~1,000/mo / endpoint / 1,000/mo | Verified | VERIFY FAST CODE · Account: Cohere trial key. Quality: Very Good. Trial cap is account-wide per month. |

OpenRouter

| Model | Quotas | Verification | Summary |
|---|---|---|---|
| Qwen3 Coder 480B (:free variant when available) | model / 20 / 50d (<$10) or 1000d (>= $10) / n/p / n/p | Verified | CODE OPEN BUDGET · Account: OpenRouter. Quality: Excellent. Free limits are account-plan based. |
| GPT-OSS-120B (:free variant when available) | model / 20 / 50d or 1000d (>= $10) / n/p / n/p | Verified | CODE OPEN BUDGET · Account: OpenRouter. Quality: Very Good. Free variants can rotate. |
| Llama 3.3 70B (:free variant when available) | model / 20 / 50d or 1000d (>= $10) / n/p / n/p | Verified | CODE OPEN BUDGET · Account: OpenRouter. Quality: Very Good. Free router pool is dynamic. |
| Mistral Small 3.1 (:free variant when available) | model / 20 / 50d or 1000d (>= $10) / n/p / n/p | Verified | CODE OPEN BUDGET · Account: OpenRouter. Quality: Good. Best for low-volume testing. |
| DeepSeek R1 (:free variant when available) | model / 20 / 50d or 1000d (>= $10) / n/p / n/p | Verified | REASON VERIFY BUDGET · Account: OpenRouter. Quality: Excellent. Current docs differ from older community RPD values. |
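The RPD column above switches between 50 and 1,000 requests per day depending on a $10 threshold. A minimal sketch of that rule, assuming the threshold is measured against credits purchased on the account (check OpenRouter's current limits documentation for the exact wording); free_requests_per_day and days_to_run are hypothetical helper names.

```python
import math


def free_requests_per_day(credits_purchased_usd: float) -> int:
    """Daily allowance for ':free' variants per the rows above: 50 below $10, 1000 at $10+."""
    return 1000 if credits_purchased_usd >= 10 else 50


def days_to_run(total_requests: int, credits_purchased_usd: float) -> int:
    """Minimum number of days needed to send `total_requests` free-variant calls."""
    return math.ceil(total_requests / free_requests_per_day(credits_purchased_usd))


# A hypothetical 2,000-request evaluation run.
print(days_to_run(2_000, credits_purchased_usd=0))   # 40
print(days_to_run(2_000, credits_purchased_usd=10))  # 2
```

The 20 RPM column still applies on top of the daily allowance: at 20 requests per minute, even the full 1,000-request day takes at least 50 minutes to use.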

Cerebras

| Model | Quotas | Verification | Summary |
|---|---|---|---|
| Llama 4 Maverick 400B | 128K (paid) / 30 / 14,400 / 60K / 1M | Partially verified | FAST CODE OPEN · Account: Cerebras. Quality: Very Good. Context/limits vary by tier and model page. |
| Qwen3 Coder 235B | 64K free, 131K paid / 30 / 14,400 / 60K / 1M | Partially verified | FAST CODE OPEN · Account: Cerebras. Quality: Excellent. Verify live limits on model page. |
| Llama 3.1 8B | 8K free, 32K paid / 30 / 14,400 / 60K / 1M | Verified | FAST BUDGET OPEN · Account: Cerebras. Quality: Good. Free-tier limits are explicitly published. |

xAI Grok

| Model | Quotas | Verification | Summary |
|---|---|---|---|
| Grok 4.1 Fast | model / credit / credit / credit / credit | Unverified | REASON VERIFY · Account: xAI. Quality: Very Good. Promotional credits may exist; fixed "$25 startup credits" not consistently documented. |

MiniMax

| Model | Quotas | Verification | Summary |
|---|---|---|---|
| MiniMax-M2.5 (Coding Plan Starter) | 200K / 40 prompts per 5h / n/p / n/p / n/p | Verified | CODE BUDGET REASON · Account: MiniMax. Quality: Excellent. Optimized for coding. Fixed-fee subscription. |
| MiniMax-M2.5 (Pay-as-you-go) | 200K / plan / plan / plan / plan | Verified | CODE FAST · Account: MiniMax. Quality: Excellent. Competitive pricing (RMB 2.1 input / 8.4 output per 1M tokens). |

Moonshot AI

| Model | Quotas | Verification | Summary |
|---|---|---|---|
| moonshot-v1-128k | 128K / tier / tier / tier / tier | Partially verified | LONGCTX REASON · Account: Moonshot AI. Quality: Very Good. Known for pioneering long-context stability. |

Model Intelligence-per-Dollar Value (2026 Triage)

Based on community analysis (April 2026), models are categorized by their efficiency relative to cost:

  • Top Intelligence: GPT-5.4, Gemini 3.1 Pro. Use these for high-stakes reasoning where cost is secondary.
  • Best Value: MiMo-V2-Flash. Offers the highest intelligence-per-dollar for high-volume tasks.
  • Balanced: GLM-5, Kimi K2.5, Gemini 3 Flash. These models offer a strong middle ground for general-purpose agentic loops.
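The triage above is qualitative. For the paid models whose per-MTok rates appear on this page, the cost side of the comparison can be reduced to a single blended price. A minimal sketch, assuming a 3:1 input-to-output token mix (a common convention, not something this page prescribes) and the Gemini 3.1 Pro Preview rate for prompts up to 200K tokens. MiMo-V2-Flash, GLM-5, and Kimi K2.5 are omitted because their prices are not listed here, and the score passed to value_per_dollar is whatever benchmark you trust; none is supplied.

```python
# USD per MTok (input, output), copied from the tables on this page.
PRICES = {
    "GPT-5.5": (5.00, 30.00),
    "GPT-5.4": (2.50, 15.00),
    "GPT-5.4 mini": (0.75, 4.50),
    "Claude Opus 4.7": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Gemini 3.1 Pro Preview (<=200K prompt)": (2.00, 12.00),
    "Gemini 3.1 Flash-Lite Preview (paid)": (0.25, 1.50),
}


def blended_price(input_per_mtok: float, output_per_mtok: float,
                  input_share: float = 0.75) -> float:
    """Weighted USD per MTok for a workload where `input_share` of tokens are input."""
    return input_share * input_per_mtok + (1 - input_share) * output_per_mtok


def value_per_dollar(score: float, blended: float) -> float:
    """Benchmark score per blended USD/MTok; `score` comes from your own benchmark."""
    return score / blended


# Cheapest blended price first.
for model, (inp, out) in sorted(PRICES.items(), key=lambda kv: blended_price(*kv[1])):
    print(f"{model:40s} ${blended_price(inp, out):.2f}/MTok")
```

Lowering input_share models output-heavy agent loops; it penalizes models with a high output rate, such as GPT-5.5 at $30/MTok, more than input-heavy retrieval workloads do.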

Related

  • AI Tooling Landscape — The architectural overview of the entire AI stack.
  • Model Classes — Understanding the different "tiers" of models (Frontier, Performance, Mini).
  • OpenRouter — The primary provider used in this repo for multi-model access.
  • Groq — Recommended for ultra-fast, low-latency free-tier inference.
  • Mistral — A key open-weights provider with a strong experiment plan.
  • DeepSeek — High-value Chinese provider often leading on cost-per-token.
  • Benchmarking — How we verify the "Intelligence" part of the value equation.

Contribution Metadata

  • Last reviewed: 2026-06-07
  • Confidence: high