API Pricing & Free Tier Matrix

What it is

The API Pricing & Free Tier Matrix is a consolidated reference for the costs and free access policies of major Large Language Model (LLM) providers and AI platforms. It tracks pricing for tokens, developer program benefits, and specific model-level quotas.

What problem it solves

LLM pricing is complex: per-token costs differ by an order of magnitude or more between "mini" and "frontier" models (for example, $0.25/MTok input for Gemini 3.1 Flash-Lite Preview versus $5/MTok for GPT-5.5 in the tables on this page). Free tiers are also often hidden or poorly documented. This matrix lets developers make a "budget-first" architectural selection, choosing models that fit their financial constraints and usage patterns.

Where it fits in the stack

This document belongs to the Layer 1: Providers and Layer 2: Models analysis layers. It provides the economic context for the tools documented in docs/tools/providers/ and docs/tools/ai_knowledge/.

Typical use cases

Budgeting: Estimating the monthly cost of running a specific agentic workflow (e.g., using GPT-5.4 mini for routine tasks).
Free-Tier Hunting: Identifying which providers offer enough free credits to build and test a prototype without a credit card.
Provider Switching: Comparing the "Intelligence-per-Dollar" value when deciding whether to migrate from one provider to another (e.g., Anthropic to Google).
Quota Management: Checking Requests Per Minute (RPM) and Tokens Per Day (TPD) limits for free tiers to avoid service interruptions.
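
As a rough illustration of the Budgeting use case, the sketch below estimates monthly spend from per-MTok list prices. `TokenPrice`, `monthly_cost_usd`, and the workload figures are hypothetical names and numbers chosen for the example; the GPT-5.4 mini rates are the ones listed in the OpenAI table on this page. It ignores caching, batch discounts, and free-tier allowances.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """List price in USD per million tokens (MTok)."""
    input_per_mtok: float
    output_per_mtok: float


def monthly_cost_usd(price: TokenPrice, input_tokens_per_day: int,
                     output_tokens_per_day: int, days: int = 30) -> float:
    """Estimate monthly spend from average daily token volumes."""
    daily = (input_tokens_per_day / 1_000_000 * price.input_per_mtok
             + output_tokens_per_day / 1_000_000 * price.output_per_mtok)
    return round(daily * days, 2)


# GPT-5.4 mini list prices as given in the OpenAI table on this page.
GPT_5_4_MINI = TokenPrice(input_per_mtok=0.75, output_per_mtok=4.50)

# Hypothetical agent workload: 2M input and 0.5M output tokens per day.
estimate = monthly_cost_usd(GPT_5_4_MINI, 2_000_000, 500_000)
print(f"Estimated monthly cost: ${estimate:.2f}")  # prints $112.50
```

Output tokens dominate this estimate ($2.25 of the $3.75 daily cost) even though they are a quarter of the input volume, which is why the input/output split matters when budgeting.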

Strengths

Consolidated: Aggregates data from over 30 providers in a single view.
Agent-Optimized: Specifically highlights "mini" models and high-speed providers ideal for autonomous agents.
Evidence-Based: Includes direct links to official documentation for every claim.
Automated: Uses scripts to maintain a "Capability Capacity Summary" of known free tokens.

Limitations

High Volatility: Prices and free-tier availability can change weekly.
Regional Variation: Some free tiers or pricing models may not be available in all jurisdictions.
Tier Complexity: Many providers use opaque "usage tiers" (Tier 1-5) that affect limits beyond simple price-per-token.

When to use it

Use it during the design phase of an AI application to select a model based on cost-efficiency.
Use it when you hit a rate limit and need to find an alternative provider with higher free-tier capacity.
Use it to verify whether a "free trial" mentioned in a tutorial is still active.

When not to use it

Do not use it as a legally binding price list; always verify with the provider's official dashboard before committing to large-scale spend.
Do not use it for consumer-facing chat app pricing (e.g., ChatGPT Plus vs. Claude Pro) unless it specifically mentions API benefits.

Getting started

Identify your primary requirement (e.g., "Coding", "Fast", or "Budget").
Consult the Capability Capacity Summary to see which models currently offer the best free-tier value.
Use the Canonical pricing matrix to jump to the official pricing and documentation for your chosen provider.
Check the Model-level quota tracker for specific RPM/TPD limits.

Status legend

Yes = official free tier/trial access is currently documented.
Partial = limited free usage exists (for example, selected models/features).
No = no current free trial/tier is documented.
Unclear = pricing/billing docs do not clearly confirm a standing free tier.
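
One way to work with the legend programmatically is to treat the four statuses as an enumeration and filter providers by it. The sketch below is illustrative only: `FreeTier`, `MATRIX_SAMPLE`, and `prototype_candidates` are hypothetical names, and the sample holds just a handful of rows copied from the canonical pricing matrix on this page.

```python
from enum import Enum


class FreeTier(Enum):
    YES = "Yes"
    PARTIAL = "Partial"
    NO = "No"
    UNCLEAR = "Unclear"


# A few rows copied from the canonical pricing matrix (not the full list).
MATRIX_SAMPLE = {
    "OpenAI": FreeTier.NO,
    "Groq": FreeTier.YES,
    "Replicate": FreeTier.PARTIAL,
    "DeepSeek API": FreeTier.UNCLEAR,
    "Mistral AI": FreeTier.YES,
    "Together AI": FreeTier.NO,
}


def prototype_candidates(matrix, include_partial=False):
    """Return providers whose free tier is documented, sorted by name."""
    accepted = {FreeTier.YES}
    if include_partial:
        accepted.add(FreeTier.PARTIAL)
    return sorted(name for name, status in matrix.items() if status in accepted)


print(prototype_candidates(MATRIX_SAMPLE))
print(prototype_candidates(MATRIX_SAMPLE, include_partial=True))
```

Treating Unclear as a non-match is a deliberately conservative choice here: a provider only counts as a free-tier candidate when the documentation says so.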

Canonical pricing matrix (last verified: 2026-05-14)

Provider / Platform	Official links	Free tier / trial	Evidence summary
OpenAI	Docs · Pricing	No	Usage-priced API; current pricing centers GPT-5.5, GPT-5.4, and GPT-5.4 mini.
Anthropic (Claude API)	Docs · Pricing	Yes	New users receive small starter API credits. Current API pricing lists Claude Opus 4.7/4.6/4.5, Sonnet 4.6/4.5, and Haiku 4.5.
Google Gemini Developer API	Docs · Pricing	Yes	Pricing page documents a Free plan with selected free input/output token access; Gemini 3.1 Pro Preview is paid-only, while Gemini 3.1 Flash-Lite Preview has free rows.
OpenRouter	Docs · Pricing	Yes	Free plan and free-model routing are documented.
xAI (Grok API)	Docs · Pricing	Yes	Docs mention monthly free requests/credits.
Z.ai (GLM API)	Docs · Pricing	Yes	New users can claim free API token packages.
Alibaba DashScope (Qwen APIs)	Docs · Pricing	Yes	Many models show temporary free quota periods.
Cohere	Docs · Pricing	Yes	Trial API keys are free and rate-limited.
Mistral AI	Docs · Pricing	Yes	Experiment plan supports free API testing.
Together AI	Docs · Pricing	No	Billing docs indicate paid credits are required.
Groq	Docs · Pricing	Yes	Free API plan and model-level limits are documented; exact current limits should be read from the account limits page for production budgeting.
Kiro	Docs · Pricing	Yes	Perpetual free tier (50 credits/mo) + 500 bonus credits.
Fireworks AI	Docs · Pricing	Yes	Public pricing notes starter free credits.
Replicate	Docs · Pricing	Partial	Some models can be run free before billing.
DeepSeek API	Docs · Pricing	Unclear	Granted balances are mentioned, fixed free tier unclear.
Perplexity API	Docs · Pricing	No	Purchased credits and top-up requirements documented.
AI21	Docs · Pricing	Yes	Pricing page advertises free trial credits.
Abacus.AI	Docs · Pricing	Yes	Free trial and ChatLLM free access documented.
Voyage AI	Docs · Pricing	Unclear	Paid rates are clear; standing free tier not explicit.
Cloudflare Workers AI	Docs · Pricing	Yes	Free plan includes daily usage.
Hugging Face Inference Providers	Docs · Pricing	Yes	Monthly included inference credits by account tier.
Cerebras Inference	Docs · Pricing	Yes	Pricing references free-tier usage/credits.
NVIDIA API Catalog	Docs · Pricing	Yes	Starter credits are referenced publicly.
SambaNova Cloud	Docs · Pricing	Unclear	Public page shows paid plans; no stable free policy.
AWS Bedrock	Docs · Pricing	No	Metered pay-as-you-go pricing.
Amazon Q	Docs · Pricing	Yes	Free tier for individuals/developers is documented.
Azure OpenAI Service	Docs · Pricing	Unclear	Metered service; only account-level cloud credits may apply.
Vertex AI (Gemini via GCP)	Docs · Pricing	Unclear	Metered pricing; no persistent API free-tier statement.
OCI Generative AI	Docs · Pricing	Unclear	Paid rates are public; free tier not clearly documented.
MiniMax	Docs · Pricing	Yes	Coding Plan provides a low-cost entry tier; trial credits available.
Moonshot AI	Docs · Pricing	Partial	Trial credits are typically granted to new developer accounts.

Enterprise Productivity Suite (2026 Benchmarks)

Tool	Pricing Model	Free Tier / Trial	Notes
Fyxer AI	~$30-$50/user/mo	14-day Trial	Executive admin & inbox management.
Glean	Enterprise Quote	No	Unified search across 100+ SaaS apps.
Hebbia	Institutional Quote	No	High-precision analytical search for Finance/Law.
tl;dv	Freemium	Yes	Meeting recorder with free unlimited tier.

Developer Program Plans

These are the core subscription plans for developers that bundle AI access, cloud credits, and other professional benefits.

Program / Plan	Cost	AI Access & Quotas	Cloud Credits & Benefits
Google Developer Program — Standard	Free	10 Firebase Studio workspaces; Gemini Code Assist (Basic); Gemini CLI (60 RPM / 1000 RPD)	Monthly Google Skills credits (via GEAR); community access; private previews.
Google Developer Program — Premium	$24.99/mo or $299/yr	30 Firebase Studio workspaces; Gemini Code Assist (Higher); Gemini CLI (120 RPM / 1500 RPD)	$45/mo ($550/yr) GenAI/Cloud credit; $500 bonus credit upon certification; 1 Cloud cert voucher; expert consultation.
Google Developer Program — Enterprise	Preview	Gemini Code Assist Enterprise; Gemini CLI (120 RPM / 2000 RPD)	$150/mo Google Cloud credit; centralized purchasing; developer sandboxes.

Model-level quota tracker (expanded list)

This section is grouped by provider with compact four-column tables for narrower screens.

Verified = core limits are visible in official docs.
Partially verified = provider free-tier stance is verified, but model-level quotas are dynamic or not fully published.
Unverified = values come from community reports or account-specific observations not explicitly documented.

Code Generation Quality is subjective and treated as community-assessed, not an official benchmark metric.

Quick jump: Google Gemini · OpenAI · Anthropic Claude · Groq · Together AI · Hugging Face · Mistral AI · DeepSeek · Cohere · OpenRouter · Cerebras · xAI Grok · Kiro

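The Quotas column format is context / RPM / RPD / TPM / daily token cap, where RPM is requests per minute, RPD is requests per day, and TPM is tokens per minute. n/p means "not published."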
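
The five-field Quotas format can be parsed mechanically. The following is a minimal sketch, not the repository's own parser: `parse_amount` and `parse_quotas` are hypothetical helpers, and the sketch assumes that K and M mean 1,000 and 1,000,000 and that any non-numeric entry (n/p, plan, tier, annotated values such as `128K (paid)`) should be treated as unknown.

```python
from typing import Optional

FIELDS = ("context", "rpm", "rpd", "tpm", "daily_token_cap")
SUFFIXES = {"K": 1_000, "M": 1_000_000}  # assumption: decimal multipliers


def parse_amount(text: str) -> Optional[int]:
    """Parse '30', '12K', '14,400', or '1M' into an int; return None otherwise."""
    cleaned = text.strip().replace(",", "")
    multiplier = 1
    if cleaned and cleaned[-1].upper() in SUFFIXES:
        multiplier = SUFFIXES[cleaned[-1].upper()]
        cleaned = cleaned[:-1]
    try:
        return round(float(cleaned) * multiplier)
    except (ValueError, OverflowError):
        return None  # 'n/p', 'plan', 'tier', '~1000mo', '50 credits', ...


def parse_quotas(cell: str) -> dict:
    """Split a Quotas cell on ' / ' (with spaces, so 'n/p' stays intact)."""
    parts = [part.strip() for part in cell.strip("` \t").split(" / ")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {cell!r}")
    return {name: parse_amount(part) for name, part in zip(FIELDS, parts)}


print(parse_quotas("`128K / 30 / 1000 / 12K / 250K`"))
print(parse_quotas("`256K / n/p / n/p / n/p / n/p`"))
```

Splitting on the spaced separator rather than a bare slash is what keeps `n/p` from being broken into two fields.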

Capability tags:

CODE code generation and refactoring tasks.
VERIFY cross-checking, factual validation, and test review.
REASON complex reasoning and multi-step planning.
LONGCTX long documents, large prompts, and retrieval-heavy workflows.
FAST low-latency interactions and interactive agent loops.
BUDGET better free-tier value or lower-cost experimentation.
OPEN open-weight/open-model ecosystem affinity.
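
In the per-provider tables, each Summary cell begins with its capability tags, so they can be read off as the leading run of known tag words. A small sketch of that idea follows; `KNOWN_TAGS` mirrors the list above, while `leading_tags` is a hypothetical helper name.

```python
KNOWN_TAGS = ("CODE", "VERIFY", "REASON", "LONGCTX", "FAST", "BUDGET", "OPEN")


def leading_tags(summary: str) -> list[str]:
    """Collect capability tags from the start of a Summary cell."""
    tags = []
    for word in summary.split():
        if word not in KNOWN_TAGS:
            break  # tags only appear before the 'Account: ...' text
        tags.append(word)
    return tags


print(leading_tags("FAST CODE OPEN BUDGET Account: Groq (no CC for free tier). Quality: Good."))
print(leading_tags("CODE REASON Account: Groq. High performance open-weights model."))
```

Stopping at the first non-tag word avoids false matches on ordinary prose later in the cell.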

Capability Capacity Summary (auto-generated)

These summaries are generated from the model rows on this page using scripts/update_api_pricing_capability_summary.py. Only rows with a numeric daily token cap are included in the capacity math.

Leaderboard By Capability (known daily token caps)

Capability	Top models	Highest known daily cap	Known models
Coding	Cerebras — Llama 4 Maverick 400B (1M); Cerebras — Qwen3 Coder 235B (1M); Groq — Llama 4 Maverick 17B (500K)	1M	8
Verification	n/a	n/a	0
Reasoning	Groq — GPT OSS 120B (200K)	200K	1
Long-context	n/a	n/a	0
Low-latency	Cerebras — Llama 4 Maverick 400B (1M); Cerebras — Qwen3 Coder 235B (1M); Cerebras — Llama 3.1 8B (1M)	1M	8
Budget/free-value	Cerebras — Llama 3.1 8B (1M); Groq — Llama 4 Maverick 17B (500K); Groq — Qwen3 32B (500K)	1M	6
Open-model ecosystem	Cerebras — Llama 4 Maverick 400B (1M); Cerebras — Qwen3 Coder 235B (1M); Cerebras — Llama 3.1 8B (1M)	1M	8
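
The implementation of scripts/update_api_pricing_capability_summary.py is not shown on this page. As an independent sketch of the capacity math, the code below groups rows by capability tag, skips rows without a numeric daily token cap, and reproduces the "Known models" and "Highest known daily cap" columns. `ROWS` and `summarize` are hypothetical names; the data is copied from the Groq and Cerebras tables on this page.

```python
from collections import defaultdict

# (provider, model, capability tags, daily token cap); None stands for "n/p".
GROQ_TAGS = {"FAST", "CODE", "OPEN", "BUDGET"}
ROWS = [
    ("Groq", "Llama 4 Maverick 70B", GROQ_TAGS, 250_000),
    ("Groq", "Llama 3.3 70B", GROQ_TAGS, 100_000),
    ("Groq", "Llama 4 Maverick 17B", GROQ_TAGS, 500_000),
    ("Groq", "Qwen3 32B", GROQ_TAGS, 500_000),
    ("Groq", "Llama 4 Scout 17B", GROQ_TAGS, 500_000),
    ("Groq", "GPT OSS 120B", {"CODE", "REASON"}, 200_000),
    ("Groq", "Compound AI", {"FAST", "REASON", "CODE"}, None),
    ("Cerebras", "Llama 4 Maverick 400B", {"FAST", "CODE", "OPEN"}, 1_000_000),
    ("Cerebras", "Qwen3 Coder 235B", {"FAST", "CODE", "OPEN"}, 1_000_000),
    ("Cerebras", "Llama 3.1 8B", {"FAST", "BUDGET", "OPEN"}, 1_000_000),
]


def summarize(rows):
    """Per tag: count of rows with a known cap, the highest cap, and the total."""
    stats = defaultdict(lambda: {"known_models": 0, "highest_cap": 0, "total_cap": 0})
    for _provider, _model, tags, cap in rows:
        if cap is None:
            continue  # only numeric daily caps enter the capacity math
        for tag in tags:
            entry = stats[tag]
            entry["known_models"] += 1
            entry["highest_cap"] = max(entry["highest_cap"], cap)
            entry["total_cap"] += cap
    return dict(stats)


summary = summarize(ROWS)
print(summary["CODE"])    # 8 known models, highest cap 1,000,000
print(summary["REASON"])  # 1 known model, highest cap 200,000
```

Capabilities with no numeric caps, such as Verification and Long-context, simply do not appear in the result, which corresponds to the n/a rows in the leaderboard.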

80% Shortlist (known-cap coverage)

Capability	Models to reach >=80% of known capacity	Coverage	Total known daily cap
Coding	Cerebras — Llama 4 Maverick 400B (1M); Cerebras — Qwen3 Coder 235B (1M); Groq — Llama 4 Maverick 17B (500K); Groq — Qwen3 32B (500K); Groq — Llama 4 Scout 17B (500K)	86.4%	4M
Verification	n/a	n/a	n/a
Reasoning	Groq — GPT OSS 120B (200K)	100.0%	200K
Long-context	n/a	n/a	n/a
Low-latency	Cerebras — Llama 4 Maverick 400B (1M); Cerebras — Qwen3 Coder 235B (1M); Cerebras — Llama 3.1 8B (1M); Groq — Llama 4 Maverick 17B (500K); Groq — Qwen3 32B (500K)	82.5%	4.8M
Budget/free-value	Cerebras — Llama 3.1 8B (1M); Groq — Llama 4 Maverick 17B (500K); Groq — Qwen3 32B (500K); Groq — Llama 4 Scout 17B (500K)	87.7%	2.9M
Open-model ecosystem	Cerebras — Llama 4 Maverick 400B (1M); Cerebras — Qwen3 Coder 235B (1M); Cerebras — Llama 3.1 8B (1M); Groq — Llama 4 Maverick 17B (500K); Groq — Qwen3 32B (500K)	82.5%	4.8M
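
The coverage figures in this table are consistent with a simple greedy rule: sort models by daily cap, largest first, and keep adding models until at least 80% of the total known capacity is covered. That rule is inferred from the published numbers rather than taken from the generator script, and `shortlist` is a hypothetical function name. The caps are copied from the Groq and Cerebras tables on this page.

```python
def shortlist(caps: dict[str, int], target: float = 0.80) -> tuple[list[str], float]:
    """Greedily pick the largest caps until `target` of total capacity is covered."""
    total = sum(caps.values())
    if total == 0:
        return [], 0.0
    picked, covered = [], 0
    # sorted() is stable, so models with equal caps keep their listed order.
    for model, cap in sorted(caps.items(), key=lambda item: item[1], reverse=True):
        if covered >= target * total:
            break
        picked.append(model)
        covered += cap
    return picked, round(100 * covered / total, 1)


# Daily token caps for rows tagged CODE.
CODING_CAPS = {
    "Cerebras: Llama 4 Maverick 400B": 1_000_000,
    "Cerebras: Qwen3 Coder 235B": 1_000_000,
    "Groq: Llama 4 Maverick 17B": 500_000,
    "Groq: Qwen3 32B": 500_000,
    "Groq: Llama 4 Scout 17B": 500_000,
    "Groq: Llama 4 Maverick 70B": 250_000,
    "Groq: GPT OSS 120B": 200_000,
    "Groq: Llama 3.3 70B": 100_000,
}

models, coverage = shortlist(CODING_CAPS)
print(len(models), coverage)  # prints: 5 86.4
```

Five models cover 3.5M of the 4.05M known coding capacity, which matches the 86.4% shown in the Coding row.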

Fast Recommendation (80% rule, known-cap data)

Goal	Recommended free-first models	Why this set
Coding	Cerebras — Llama 4 Maverick 400B; Cerebras — Qwen3 Coder 235B; Groq — Llama 4 Maverick 17B; Groq — Qwen3 32B; Groq — Llama 4 Scout 17B	Reaches 86.4% of known daily capacity (4M total known).
Verification	n/a	No numeric daily-cap data available for this capability.
Reasoning	Groq — GPT OSS 120B	Reaches 100.0% of known daily capacity (200K total known).

Alibaba DashScope (Qwen)

Model	Quotas	Verification	Summary
Qwen3.6-35B-A3B	`262K / plan / plan / plan / plan`	Verified	CODE REASON OPEN Account: DashScope. Latest frontier variant with 3B active parameters.
Qwen3.5-35B-A3B	`128K / plan / plan / plan / plan`	Verified	CODE REASON OPEN Account: DashScope. SOTA agentic coding (37.8% SWE-bench Verified Hard).

Google Gemini

Model	Quotas	Verification	Summary
Gemini 3.1 Pro Preview	`model / paid / paid / paid / paid`	Verified	CODE VERIFY REASON LONGCTX Account: Google. Paid-only Pro preview; $2/MTok input and $12/MTok output for prompts up to 200K tokens.
Gemini 3.1 Flash-Lite Preview	`model / plan / plan / plan / n/p`	Verified	FAST LONGCTX BUDGET Account: Google. Free input/output rows documented for Standard/Batch/Flex/Priority; paid Standard is $0.25/MTok text-image-video input and $1.50/MTok output.
Gemini Embedding	`model / plan / plan / plan / n/p`	Verified	BUDGET LONGCTX Account: Google. Gemini Embedding is available on free and paid tiers; paid input is $0.15/MTok.

OpenAI

Model	Quotas	Verification	Summary
GPT-5.5	`model / tier / tier / tier / tier`	Verified	CODE VERIFY REASON Account: OpenAI. Frontier coding/professional-work model; $5/MTok input and $30/MTok output. No standing public free API tier documented.
GPT-5.4	`model / tier / tier / tier / tier`	Verified	CODE FAST REASON Account: OpenAI. More affordable coding/professional-work model; $2.50/MTok input and $15/MTok output.
GPT-5.4 mini	`model / tier / tier / tier / tier`	Verified	CODE FAST BUDGET Account: OpenAI. Mini model for coding, computer use, and subagents; $0.75/MTok input and $4.50/MTok output.

Anthropic Claude

Model	Quotas	Verification	Summary
Claude Opus 4.7	`model / tier / tier / tier / tier`	Verified	CODE VERIFY REASON LONGCTX Account: Anthropic. Current Opus line; $5/MTok input and $25/MTok output on the Claude API.
Claude Sonnet 4.6	`model / tier / tier / tier / tier`	Verified	CODE FAST LONGCTX Account: Anthropic. Current Sonnet line; $3/MTok input and $15/MTok output.
Claude Haiku 4.5	`model / tier / tier / tier / tier`	Verified	FAST BUDGET VERIFY Account: Anthropic. Current Haiku line; $1/MTok input and $5/MTok output.
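
The list prices in the Google Gemini, OpenAI, and Anthropic tables above lend themselves to a "budget-first" filter: price a fixed workload against every model and keep those that fit. The sketch below is a simplified illustration; `LIST_PRICES` and `models_within_budget` are hypothetical names, the workload and budget are made up, and the calculation ignores free tiers, caching, batch pricing, and the higher Gemini 3.1 Pro rate for prompts above 200K tokens.

```python
# (input $/MTok, output $/MTok) copied from the tables on this page.
LIST_PRICES = {
    "Gemini 3.1 Pro Preview": (2.00, 12.00),
    "Gemini 3.1 Flash-Lite Preview": (0.25, 1.50),
    "GPT-5.5": (5.00, 30.00),
    "GPT-5.4": (2.50, 15.00),
    "GPT-5.4 mini": (0.75, 4.50),
    "Claude Opus 4.7": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
}


def models_within_budget(prices, input_mtok, output_mtok, budget_usd):
    """Return (model, cost) pairs that fit the budget, cheapest first."""
    costs = {
        model: input_mtok * price_in + output_mtok * price_out
        for model, (price_in, price_out) in prices.items()
    }
    affordable = [(model, cost) for model, cost in costs.items() if cost <= budget_usd]
    return sorted(affordable, key=lambda pair: pair[1])


# Hypothetical monthly workload: 100 MTok input, 20 MTok output, $250 budget.
fits = models_within_budget(LIST_PRICES, 100, 20, 250)
for model, cost in fits:
    print(f"{model}: ${cost:.2f}")
```

For this workload only the three smallest models fit: Gemini 3.1 Flash-Lite Preview ($55), GPT-5.4 mini ($165), and Claude Haiku 4.5 ($200).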

Groq

Model	Quotas	Verification	Summary
Llama 4 Maverick 70B	`128K / 30 / 1000 / 12K / 250K`	Verified	FAST CODE OPEN BUDGET Account: Groq (no CC for free tier). Quality: Excellent. Standard for June 2026.
Llama 3.3 70B (llama-3.3-70b-versatile)	`128K / 30 / 1000 / 12K / 100K`	Verified	FAST CODE OPEN BUDGET Account: Groq (no CC for free tier). Quality: Very Good. Official row differs from older community numbers.
Llama 4 Maverick 17B	`128K / 30 / 1000 / 6K / 500K`	Verified	FAST CODE OPEN BUDGET Account: Groq (no CC for free tier). Quality: Good. Fast inference; revision limits can change.
Qwen3 32B	`128K / 60 / 1000 / 6K / 500K`	Verified	FAST CODE OPEN BUDGET Account: Groq (no CC for free tier). Quality: Good. Current official free-plan row uses `qwen/qwen3-32b`.
Llama 4 Scout 17B	`128K / 30 / 1000 / 30K / 500K`	Verified	FAST CODE OPEN BUDGET Account: Groq. 16E MoE variant optimized for low-latency tasks; official free-plan TPM is 30K.
GPT OSS 120B	`128K / 30 / 1000 / 8K / 200K`	Verified	CODE REASON Account: Groq. High performance open-weights model; official free-plan TPD is 200K.
Kimi K2 (kimi-k2-0905)	`256K / n/p / n/p / n/p / n/p`	Verified	LONGCTX REASON Account: Groq. 1T parameter MoE with 256K context.
Compound AI (groq/compound)	`128K / 30 / 250 / 70K / n/p`	Verified	FAST REASON CODE Account: Groq (no CC for free tier). Quality: Good. Official docs do not publish TPD.
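
For quota planning, the per-minute and per-day limits in a row interact: the daily token cap can be exhausted long before the day ends if traffic runs at the full TPM rate. The arithmetic below is a small sketch using the Llama 4 Scout 17B row above; the two function names are hypothetical.

```python
def minutes_until_daily_cap(tpm: int, daily_token_cap: int) -> float:
    """Minutes of sustained max-TPM traffic before the daily token cap is hit."""
    return daily_token_cap / tpm


def tokens_per_request_budget(rpd: int, daily_token_cap: int) -> float:
    """Average tokens each request may use if all daily requests are spent."""
    return daily_token_cap / rpd


# Llama 4 Scout 17B row: 30 RPM / 1000 RPD / 30K TPM / 500K daily token cap.
scout_minutes = minutes_until_daily_cap(tpm=30_000, daily_token_cap=500_000)
scout_per_request = tokens_per_request_budget(rpd=1_000, daily_token_cap=500_000)
print(f"{scout_minutes:.1f} min at full TPM, {scout_per_request:.0f} tokens/request")
# prints: 16.7 min at full TPM, 500 tokens/request
```

In other words, on these free-tier numbers the daily token cap, not the per-minute limit, is the binding constraint for any sustained agent loop.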

Kiro

Model	Quotas	Verification	Summary
Auto (Frontier Mix)	`n/p / n/p / n/p / n/p / 50 credits`	Verified	CODE FAST BUDGET Account: Kiro. Mixed agent using frontier models. Free tier includes 50 monthly credits.

Together AI

Model	Quotas	Verification	Summary
Llama 4 Maverick	`131K / tier / tier / tier / tier`	Verified	CODE REASON OPEN Account: Together + paid credits. Quality: Very Good. No standing free trial in current docs.
DeepSeek V3.1	`64K / tier / tier / tier / tier`	Verified	CODE REASON OPEN Account: Together + paid credits. Quality: Excellent. "$100 signup credits" not confirmed in current docs.
Mistral Small 3	`128K / tier / tier / tier / tier`	Verified	CODE OPEN Account: Together + paid credits. Quality: Good. Limits are spend/account-tier dependent.

Hugging Face

Model	Quotas	Verification	Summary
Various open models	`varies / provider / provider / provider / credit-based`	Verified	OPEN BUDGET CODE Account: Hugging Face. Quality: Varies. Limits depend on routed provider and plan.
Pro-tier routed providers	`varies / higher / provider / provider / credit-based`	Verified	OPEN CODE VERIFY Account: Hugging Face Pro. Quality: Very Good. Pro includes higher monthly credits.

Mistral AI

Model	Quotas	Verification	Summary
Mistral Nemo 12B	`model / plan / plan / plan / plan`	Partially verified	CODE OPEN BUDGET Account: Mistral (Experiment/Scale). Quality: Good. Free Experiment plan exists; quotas are dynamic.
Mistral Small 3.1	`128K / plan / plan / plan / plan`	Partially verified	CODE OPEN Account: Mistral (Experiment/Scale). Quality: Good. Access/limits depend on plan tier.
Codestral	`32K / plan / plan / plan / plan`	Partially verified	CODE VERIFY OPEN Account: Mistral (Experiment/Scale). Quality: Excellent. Code-oriented with plan gating.
Mistral Large 3	`128K / plan / plan / plan / plan`	Partially verified	CODE REASON VERIFY Account: Mistral (Experiment/Scale). Quality: Excellent. Most capable tier usually paid.

DeepSeek

Model	Quotas	Verification	Summary
DeepSeek V3.2 (deepseek-chat)	`128K / n/p / n/p / n/p / n/p`	Partially verified	CODE REASON BUDGET Account: DeepSeek. Quality: Excellent. Pricing is public; fixed free quotas are not.
DeepSeek R1 / reasoner	`128K / n/p / n/p / n/p / n/p`	Unverified	REASON VERIFY CODE Account: DeepSeek. Quality: Excellent. "5M signup tokens" is not confirmed in official docs.

Cohere

Model	Quotas	Verification	Summary
Command R7B	`128K / 20 / ~1000mo / endpoint / 1000mo`	Verified	VERIFY FAST Account: Cohere trial key. Quality: Good. Free trial usage is heavily rate-limited.
Command R+	`128K / 20 / ~1000mo / endpoint / 1000mo`	Verified	VERIFY FAST CODE Account: Cohere trial key. Quality: Very Good. Trial cap is account-wide per month.

OpenRouter

Model	Quotas	Verification	Summary
Qwen3 Coder 480B (`:free` variant when available)	`model / 20 / 50d (<$10) or 1000d (>= $10) / n/p / n/p`	Verified	CODE OPEN BUDGET Account: OpenRouter. Quality: Excellent. Free limits are account-plan based.
GPT-OSS-120B (`:free` variant when available)	`model / 20 / 50d or 1000d (>= $10) / n/p / n/p`	Verified	CODE OPEN BUDGET Account: OpenRouter. Quality: Very Good. Free variants can rotate.
Llama 3.3 70B (`:free` variant when available)	`model / 20 / 50d or 1000d (>= $10) / n/p / n/p`	Verified	CODE OPEN BUDGET Account: OpenRouter. Quality: Very Good. Free router pool is dynamic.
Mistral Small 3.1 (`:free` variant when available)	`model / 20 / 50d or 1000d (>= $10) / n/p / n/p`	Verified	CODE OPEN BUDGET Account: OpenRouter. Quality: Good. Best for low-volume testing.
DeepSeek R1 (`:free` variant when available)	`model / 20 / 50d or 1000d (>= $10) / n/p / n/p`	Verified	REASON VERIFY BUDGET Account: OpenRouter. Quality: Excellent. Current docs differ from older community RPD values.
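
The RPD column for the `:free` variants is a two-step threshold: 50 requests per day below $10 and 1000 per day at $10 or more. The sketch below encodes that rule and shows how it affects the time needed for a fixed batch of requests. It assumes the $10 figure refers to credits purchased on the account, which this page does not spell out; both function names are hypothetical.

```python
import math


def openrouter_free_rpd(credits_purchased_usd: float) -> int:
    """Daily request limit for ':free' variants, per the table above."""
    # Assumption: the $10 threshold refers to credits purchased on the account.
    return 1000 if credits_purchased_usd >= 10 else 50


def days_to_complete(total_requests: int, credits_purchased_usd: float) -> int:
    """Whole days needed to send `total_requests` at the daily limit."""
    return math.ceil(total_requests / openrouter_free_rpd(credits_purchased_usd))


# Hypothetical evaluation suite of 2,400 requests.
print(days_to_complete(2_400, credits_purchased_usd=0))   # prints 48
print(days_to_complete(2_400, credits_purchased_usd=10))  # prints 3
```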

Cerebras

Model	Quotas	Verification	Summary
Llama 4 Maverick 400B	`128K (paid) / 30 / 14,400 / 60K / 1M`	Partially verified	FAST CODE OPEN Account: Cerebras. Quality: Very Good. Context/limits vary by tier and model page.
Qwen3 Coder 235B	`64K free, 131K paid / 30 / 14,400 / 60K / 1M`	Partially verified	FAST CODE OPEN Account: Cerebras. Quality: Excellent. Verify live limits on model page.
Llama 3.1 8B	`8K free, 32K paid / 30 / 14,400 / 60K / 1M`	Verified	FAST BUDGET OPEN Account: Cerebras. Quality: Good. Free-tier limits are explicitly published.

xAI Grok

Model	Quotas	Verification	Summary
Grok 4.1 Fast	`model / credit / credit / credit / credit`	Unverified	REASON VERIFY Account: xAI. Quality: Very Good. Promotional credits may exist; fixed "$25 startup credits" not consistently documented.

MiniMax

Model	Quotas	Verification	Summary
MiniMax-M2.5 (Coding Plan Starter)	`200K / 40 prompts per 5h / n/p / n/p / n/p`	Verified	CODE BUDGET REASON Account: MiniMax. Quality: Excellent. Optimized for coding. Fixed-fee subscription.
MiniMax-M2.5 (Pay-as-you-go)	`200K / plan / plan / plan / plan`	Verified	CODE FAST Account: MiniMax. Quality: Excellent. Competitive RMB pricing (2.1/8.4 per 1M tokens).

Moonshot AI

Model	Quotas	Verification	Summary
moonshot-v1-128k	`128K / tier / tier / tier / tier`	Partially verified	LONGCTX REASON Account: Moonshot AI. Quality: Very Good. Known for pioneering long-context stability.

Model Intelligence-per-Dollar Value (2026 Triage)

Based on community analysis (April 2026), models are categorized by their efficiency relative to cost:

Top Intelligence: GPT-5.4, Gemini 3.1 Pro. Use these for high-stakes reasoning where cost is secondary.
Best Value: MiMo-V2-Flash. Offers the highest intelligence-per-dollar for high-volume tasks.
Balanced: GLM-5, Kimi K2.5, Gemini 3 Flash. These models offer a strong middle ground for general-purpose agentic loops.

Related pages

AI Tooling Landscape — The architectural overview of the entire AI stack.
Model Classes — Understanding the different "tiers" of models (Frontier, Performance, Mini).
OpenRouter — The primary provider used in this repo for multi-model access.
Groq — Recommended for ultra-fast, low-latency free-tier inference.
Mistral — A key open-weights provider with a strong experiment plan.
DeepSeek — High-value Chinese provider often leading on cost-per-token.
Benchmarking — How we verify the "Intelligence" part of the value equation.

Contribution Metadata

Last reviewed: 2026-06-07
Confidence: high