Chat / Instruct Models | Inference Hub

Gemini 3 Flash

Google

LLM

Fast and efficient Gemini 3 model for high-throughput workloads.

1M ctx

6 providers

$0.0010 /MTok input

from Muapi

Gemma 3 4B

Google

LLM

Compact open-weight model for edge and mobile deployment.

4B 32k ctx Open

3 providers

$0.020 /MTok input

from Together AI

Gemma 3 12B

Google

LLM

Mid-size open-weight Gemma model with vision support.

12B 128k ctx Open

4 providers

$0.040 /MTok input

from OpenRouter

Qwen 3.5 9B

Alibaba

LLM

Compact Qwen 3.5 for single-GPU deployment.

9B 128k ctx Open

2 providers

$0.040 /MTok input

from DeepInfra

GPT-5 Nano

OpenAI

LLM

Ultra-lightweight GPT-5 for high-speed, low-cost text generation.

128k ctx

4 providers

$0.050 /MTok input

from Replicate

Qwen 3 8B

Alibaba

LLM

Compact Qwen 3 for edge and single-GPU deployment. Open weight.

8B 128k ctx Open

6 providers

$0.050 /MTok input

from OpenRouter

GLM 4.7

Zhipu AI

LLM

Optimized for coding, reasoning, and tool use.

128k ctx

12 providers

$0.060 /MTok input

from DeepInfra

Nova Lite

Amazon

LLM

Amazon Nova Lite for fast, cost-efficient tasks.

128k ctx

1 provider

$0.060 /MTok input

from OpenRouter

GPT OSS 120B

OpenAI

LLM

Open-weight 117B MoE model (5.1B active) with reasoning performance close to o4-mini. Apache 2.0 licensed; runs on a single 80GB GPU.

117B MoE (5.1B active) 131k ctx Open

7 providers

$0.070 /MTok input

from Venice AI

Gemma 3 27B

Google

LLM

Largest Gemma 3 model with strong reasoning and instruction following.

27B 128k ctx Open

5 providers

$0.080 /MTok input

from OpenRouter

Llama 4 Scout

Qwen 3 32B

Alibaba

LLM

Mid-size Qwen 3 with strong coding and math capabilities. Open weight.

32B 128k ctx Open

6 providers

$0.080 /MTok input

from OpenRouter

Gemini 2.5 Flash

Google

LLM

Speed-optimized Gemini with strong reasoning and multimodal capabilities.

1M ctx

5 providers

$0.090 /MTok input

from KIE AI

Mistral Small 4

Mistral AI

LLM

Unified model combining fast instruct, deep reasoning, and multimodal chat. 119B params.

119B 256k ctx Open

4 providers

$0.090 /MTok input

from Parasail

Gemini 2.0 Flash

Google

LLM

Fast and efficient Gemini model for high-throughput workloads.

1M ctx

2 providers

$0.100 /MTok input

from OpenRouter

Llama 3.3 70B

Ministral 3 8B

Mistral AI

LLM

Edge-optimized model with vision support. Apache 2.0 licensed.

8B 128k ctx Open

3 providers

$0.100 /MTok input

from Mistral AI

Qwen 3 235B

Alibaba

LLM

Largest Qwen 3 model with hybrid thinking modes for flexible reasoning control.

235B MoE 128k ctx Open

10 providers

$0.100 /MTok input

from Parasail

Qwen 3.5 35B

Alibaba

LLM

Mid-size Qwen 3.5 MoE model with 35B total, 3B active parameters.

35B MoE (3B active) 128k ctx Open

3 providers

$0.100 /MTok input

from Alibaba Cloud

MiniMax M2.5

MiniMax

LLM

MiniMax general-purpose LLM with competitive reasoning and coding capabilities.

128k ctx

6 providers

$0.118 /MTok input

from OpenRouter

Gemma 4 27B

Google

LLM

Most capable open Gemma model with best intelligence-per-parameter.

27B 128k ctx Open

4 providers

$0.130 /MTok input

from Parasail

GLM 4.5

Zhipu AI

LLM

Strong reasoning and coding with 106B total, 12B active MoE architecture.

106B MoE (12B active) 128k ctx Open

5 providers

$0.140 /MTok input

from SiliconFlow

DeepSeek V3.1

DeepSeek

LLM

Updated DeepSeek V3 with improved coding and reasoning performance.

671B MoE 128k ctx Open

7 providers

$0.150 /MTok input

from OpenRouter

GPT-4o Mini

OpenAI

LLM

Cost-efficient smaller GPT-4o variant for lightweight tasks.

128k ctx

4 providers

$0.150 /MTok input

from Replicate

Llama 4 Maverick

Kimi K2

Moonshot AI

LLM

State-of-the-art 1T MoE model with 32B active parameters. Strong coding and agentic capabilities.

1T MoE (32B active) 128k ctx Open

9 providers

$0.195 /MTok input

from AIMLAPI

DeepSeek V3

DeepSeek

LLM

Open-weight 671B MoE model with strong coding and reasoning at low cost.

671B MoE 128k ctx Open

10 providers

$0.200 /MTok input

from OpenRouter

GPT-5.4 Nano

OpenAI

LLM

Ultra-lightweight GPT-5.4 for high-speed, low-cost tasks.

128k ctx

2 providers

$0.200 /MTok input

from OpenRouter

Grok 3 Mini

xAI

LLM

Lightweight Grok optimized for cost-efficient reasoning.

2M ctx

3 providers

$0.200 /MTok input

from xAI

Qwen 3.5 72B

Alibaba

LLM

Native multimodal Qwen with text, image, and video processing.

72B 128k ctx Open

2 providers

$0.200 /MTok input

from Alibaba Cloud

Kimi K2.5

Moonshot AI

LLM

Open-weight multimodal model with agent swarm mode supporting up to 100 parallel sub-agents.

128k ctx Open

12 providers

$0.230 /MTok input

from SiliconFlow

GPT-5 Mini

OpenAI

LLM

Compact GPT-5 variant for lightweight tasks and rapid prototyping.

128k ctx

4 providers

$0.250 /MTok input

from Replicate

DeepSeek V3.2

DeepSeek

LLM

Latest DeepSeek V3 with improved reasoning and coding. 671B MoE (37B active), MIT licensed, 164K context.

671B MoE (37B active) 164k ctx Open

10 providers

$0.260 /MTok input

from OpenRouter

DeepSeek R1

DeepSeek

LLM

Reasoning-focused model with chain-of-thought capabilities rivaling o1.

671B MoE 128k ctx Open

8 providers

$0.280 /MTok input

from DeepSeek

Qwen 3.5 122B

Alibaba

LLM

Large Qwen 3.5 MoE model with 122B total, 10B active parameters.

122B MoE (10B active) 128k ctx Open

3 providers

$0.290 /MTok input

from DeepInfra

MiniMax M2.7

MiniMax

LLM

Latest MiniMax general-purpose LLM with improved reasoning.

128k ctx

3 providers

$0.300 /MTok input

from OpenRouter

Claude 4.5 Haiku

Anthropic

LLM

Latest Haiku tier with improved capabilities at fast speed and low cost.

200k ctx

6 providers

$0.350 /MTok input

from KIE AI

Gemini 2.5 Pro

Google

LLM

High-capability Gemini model for complex reasoning and coding tasks.

1M ctx

3 providers

$0.380 /MTok input

from KIE AI

GLM 4.6

Zhipu AI

LLM

Open-source frontier model with 355B parameters. MIT licensed.

355B 128k ctx Open

5 providers

$0.390 /MTok input

from SiliconFlow

GPT-5.4 Mini

OpenAI

LLM

Compact GPT-5.4 variant balancing capability and cost.

128k ctx

3 providers

$0.400 /MTok input

from WaveSpeed AI

Qwen 3.6 Plus

Alibaba

LLM

Alibaba's latest flagship with 1M context and advanced agentic coding.

1M ctx

2 providers

$0.400 /MTok input

from Alibaba Cloud

GPT-5.2

OpenAI

LLM

GPT-5.2 general-purpose model.

128k ctx

4 providers

$0.440 /MTok input

from KIE AI

DeepSeek R1 0528

DeepSeek

LLM

Updated R1 with improved reasoning accuracy and reduced hallucination.

671B MoE 128k ctx Open

8 providers

$0.450 /MTok input

from OpenRouter

Gemini 3 Pro

Google

LLM

High-capability Gemini 3 model. Deprecated in favor of 3.1 Pro.

1M ctx

2 providers

$0.500 /MTok input

from KIE AI

GPT-5 Codex

OpenAI

LLM

GPT-5 variant optimized for code generation and software engineering.

128k ctx

2 providers

$0.500 /MTok input

from KIE AI

GPT-5.1 Codex

OpenAI

LLM

GPT-5.1 code-optimized variant.

128k ctx

2 providers

$0.500 /MTok input

from KIE AI

Qwen 3.5 397B

Alibaba

LLM

Largest Qwen 3.5 MoE model with 397B total, 17B active parameters.

397B MoE (17B active) 128k ctx Open

3 providers

$0.540 /MTok input

from DeepInfra

GPT-5.2 Codex

OpenAI

LLM

GPT-5.2 code-optimized variant.

128k ctx

2 providers

$0.700 /MTok input

from KIE AI

GPT-5.3 Codex

OpenAI

LLM

GPT-5.3 code-optimized variant.

128k ctx

2 providers

$0.700 /MTok input

from KIE AI

GPT-5.4

OpenAI

LLM

OpenAI's latest frontier model combining reasoning, coding, and agentic workflows.

128k ctx

7 providers

$0.700 /MTok input

from KIE AI

GPT-5.4 Codex

OpenAI

LLM

Latest GPT-5.4 code-optimized variant with industry-leading coding capabilities.

128k ctx

2 providers

$0.700 /MTok input

from KIE AI

GLM 5

Zhipu AI

LLM

Frontier 744B model trained on Huawei Ascend chips. Open source with strong agentic capabilities.

744B 128k ctx Open

11 providers

$0.720 /MTok input

from OpenRouter

Claude 3.5 Haiku

Anthropic

LLM

Fast and affordable Claude model for high-throughput tasks.

200k ctx

4 providers

$0.800 /MTok input

from OpenRouter

Nova Pro

Amazon

LLM

Amazon Nova Pro for balanced capability and cost.

128k ctx

1 provider

$0.800 /MTok input

from OpenRouter

o4 Mini

OpenAI

LLM

Lightweight reasoning model balancing chain-of-thought rigor with speed and cost efficiency.

200k ctx

4 providers

$1.00 /MTok input

from Replicate

Claude 4.5 Sonnet

Anthropic

LLM

High-capability Claude model balancing intelligence and speed.

200k ctx

5 providers

$1.05 /MTok input

from KIE AI

Claude 4.6 Sonnet

Anthropic

LLM

Latest Sonnet with Opus-tier capabilities at Sonnet pricing.

200k ctx

9 providers

$1.05 /MTok input

from KIE AI

Mistral Large 3

Mistral AI

LLM

Mistral's most capable model. 675B MoE with 41B active parameters.

675B MoE (41B active) 128k ctx Open

3 providers

$1.20 /MTok input

from Fireworks AI

Qwen 3 Max

Alibaba

LLM

Alibaba's highest-capability Qwen 3 model.

128k ctx

4 providers

$1.20 /MTok input

from DeepInfra

Qwen 3 Max Thinking

Alibaba

LLM

Qwen 3 Max with extended reasoning and chain-of-thought capabilities.

128k ctx

2 providers

$1.20 /MTok input

from DeepInfra

Gemini 3.1 Pro

Google

LLM

Google's current flagship model with top benchmark scores and 1M context.

1M ctx

5 providers

$1.25 /MTok input

from WaveSpeed AI

GLM 5.1

Z.ai

LLM

Coding-focused frontier model scoring 94% of Claude Opus 4.6. 744B MoE trained on Huawei Ascend 910B. Ranked #1 among open-source models on SWE-Bench Pro.

744B MoE (40B active) 203k ctx

3 providers

$1.40 /MTok input

from Fireworks AI

Claude 4.5 Opus

Anthropic

LLM

Previous Opus generation with strong reasoning and coding.

200k ctx

3 providers

$1.75 /MTok input

from KIE AI

Claude 4.6 Opus

Anthropic

LLM

Anthropic's most capable model with 1M token context and advanced reasoning.

1M ctx

9 providers

$1.75 /MTok input

from KIE AI

Grok 4

xAI

LLM

Latest Grok with improved instruction following and reduced hallucination.

2M ctx

3 providers

$2.00 /MTok input

from OpenRouter

o3

OpenAI

LLM

Reasoning-focused model with step-by-step deliberation for complex math, coding, and science tasks.

200k ctx

3 providers

$2.00 /MTok input

from OpenRouter

Command A

Cohere

LLM

Cohere's latest flagship model for enterprise RAG, tool use, and agents.

256k ctx

2 providers

$2.50 /MTok input

from OpenRouter

Command R+

Cohere

LLM

Scalable enterprise model optimized for RAG and multilingual tasks.

128k ctx

3 providers

$2.50 /MTok input

from OpenRouter

GPT-4o

OpenAI

LLM

OpenAI's flagship multimodal model with strong reasoning, coding, and vision capabilities.

128k ctx

4 providers

$2.50 /MTok input

from Replicate

Nova Premier

Amazon

LLM

Amazon's most capable LLM for complex reasoning and enterprise tasks.

128k ctx

1 provider

$2.50 /MTok input

from OpenRouter

Grok 3

xAI

LLM

xAI's flagship LLM trained on 200K+ GPUs with real-time web and X integration.

2M ctx

1 provider

$3.00 /MTok input

from OpenRouter

o3 Pro

OpenAI

LLM

Most capable reasoning model in OpenAI's lineup with extended thinking for maximum reliability.

200k ctx

2 providers

$20.00 /MTok input

from OpenRouter

GPT-5.4 Pro

OpenAI

LLM

Highest-capability GPT-5.4 tier with maximum reasoning depth. Premium pricing.

128k ctx

3 providers

$30.00 /MTok input

from OpenRouter

Gemma 4 12B

Google

LLM

Latest Gemma generation optimized for reasoning and agentic workflows.

12B 128k ctx Open

No providers yet