Vision Models | Inference Hub

Gemini 3 Flash

Google

LLM

Fast and efficient Gemini 3 model for high-throughput workloads.

1M ctx

6 providers

$0.0010 /MTok input

from Muapi

Gemma 3 12B

Google

LLM

Mid-size open-weight Gemma model with vision support.

12B 128k ctx Open

4 providers

$0.050 /MTok input

from OpenRouter

Gemma 3 4B

Google

LLM

Compact open-weight model for edge and mobile deployment.

4B 32k ctx Open

3 providers

$0.050 /MTok input

from OpenRouter

Qwen 3.5 9B

Alibaba

LLM

Compact Qwen 3.5 for single-GPU deployment.

9B 128k ctx Open

3 providers

$0.050 /MTok input

from Alibaba Cloud

Nova Lite

Amazon

LLM

Amazon Nova Lite for fast, cost-efficient tasks.

128k ctx

1 provider

$0.060 /MTok input

from OpenRouter

Gemma 3 27B

Google

LLM

Largest Gemma 3 model with strong reasoning and instruction following.

27B 128k ctx Open

5 providers

$0.080 /MTok input

from OpenRouter

Gemini 2.5 Flash

Google

LLM

Speed-optimized Gemini with strong reasoning and multimodal capabilities.

1M ctx

5 providers

$0.090 /MTok input

from KIE AI

Mistral Small 4

Mistral AI

LLM

Unified model combining fast instruct, deep reasoning, and multimodal chat. 119B params.

119B 256k ctx Open

4 providers

$0.090 /MTok input

from Parasail

Gemini 2.0 Flash

Google

LLM

Fast and efficient Gemini model for high-throughput workloads.

1M ctx

2 providers

$0.100 /MTok input

from OpenRouter

Llama 4 Scout

Qwen 3.5 35B

Alibaba

LLM

Mid-size Qwen 3.5 MoE model with 35B total, 3B active parameters.

35B MoE (3B active) 128k ctx Open

3 providers

$0.100 /MTok input

from Alibaba Cloud

Qwen3 VL 32B Instruct

Alibaba

Multimodal

Qwen3 VL 32B Instruct, an open-weight Alibaba model available via OpenRouter.

262k ctx Open

1 provider

$0.104 /MTok input

from OpenRouter

Qwen3 VL 8B Instruct

Alibaba

Multimodal

Qwen3 VL 8B Instruct, an open-weight Alibaba model available via OpenRouter.

256k ctx Open

1 provider

$0.117 /MTok input

from OpenRouter

Qwen3 VL 8B Thinking

Alibaba

Multimodal

Qwen3 VL 8B Thinking, an open-weight Alibaba model available via OpenRouter.

256k ctx Open

1 provider

$0.117 /MTok input

from OpenRouter

Gemma 4 27B

Google

LLM

Most capable open Gemma model, with the best intelligence per parameter.

27B 128k ctx Open

4 providers

$0.120 /MTok input

from OpenRouter

Qwen3 VL 30B A3B Instruct

Alibaba

Multimodal

Qwen3 VL 30B A3B Instruct, an open-weight Alibaba model available via OpenRouter.

262k ctx Open

1 provider

$0.130 /MTok input

from OpenRouter

Qwen3 VL 30B A3B Thinking

Alibaba

Multimodal

Qwen3 VL 30B A3B Thinking, an open-weight Alibaba model available via OpenRouter.

131k ctx Open

1 provider

$0.130 /MTok input

from OpenRouter

GPT-4o Mini

OpenAI

LLM

Cost-efficient smaller GPT-4o variant for lightweight tasks.

128k ctx

4 providers

$0.150 /MTok input

from Replicate

Llama 4 Maverick

Ministral 3 8B

Mistral AI

LLM

Edge-optimized model with vision support. Apache 2.0 licensed.

8B 128k ctx Open

3 providers

$0.150 /MTok input

from OpenRouter

Grok 3 Mini

xAI

LLM

Lightweight Grok optimized for cost-efficient reasoning.

2M ctx

3 providers

$0.200 /MTok input

from xAI

Qwen 3.5 72B

Alibaba

LLM

Native multimodal Qwen with text, image, and video processing.

72B 128k ctx Open

2 providers

$0.200 /MTok input

from Alibaba Cloud

Qwen3 VL 235B A22B Instruct

Alibaba

Multimodal

Qwen3 VL 235B A22B Instruct, an open-weight Alibaba model available via OpenRouter.

262k ctx Open

1 provider

$0.200 /MTok input

from OpenRouter

GPT-5 Mini

OpenAI

LLM

Compact GPT-5 variant for lightweight tasks and rapid prototyping.

128k ctx

4 providers

$0.250 /MTok input

from Replicate

Qwen3 VL 235B A22B Thinking

Alibaba

Multimodal

Qwen3 VL 235B A22B Thinking, an open-weight Alibaba model available via OpenRouter.

131k ctx Open

1 provider

$0.260 /MTok input

from OpenRouter

Qwen 3.5 122B

Alibaba

LLM

Large Qwen 3.5 MoE model with 122B total, 10B active parameters.

122B MoE (10B active) 128k ctx Open

3 providers

$0.290 /MTok input

from DeepInfra

GLM 4.6V

Z.ai

Multimodal

GLM 4.6V, an open-weight Z.ai model available via OpenRouter.

131k ctx Open

2 providers

$0.300 /MTok input

from OpenRouter

Qwen 3.6 Plus

Alibaba

LLM

Alibaba's latest flagship with 1M context and advanced agentic coding.

1M ctx

3 providers

$0.325 /MTok input

from OpenRouter

Claude 4.5 Haiku

Anthropic

LLM

Latest Haiku tier with improved capabilities at fast speed and low cost.

200k ctx

6 providers

$0.350 /MTok input

from KIE AI

Kimi K2.5

Moonshot AI

LLM

Open-weight multimodal model with agent swarm mode supporting up to 100 parallel sub-agents.

1T MoE (32B active) 262k ctx Open Modified MIT

12 providers

$0.375 /MTok input

from OpenRouter

Gemini 2.5 Pro

Google

LLM

High-capability Gemini model for complex reasoning and coding tasks.

1M ctx

3 providers

$0.380 /MTok input

from KIE AI

GPT-5.2

OpenAI

LLM

GPT-5.2 general-purpose model.

128k ctx

5 providers

$0.440 /MTok input

from KIE AI

Qwen 3.5 397B

Alibaba

LLM

Largest Qwen 3.5 MoE model with 397B total, 17B active parameters.

397B MoE (17B active) 128k ctx Open

4 providers

$0.450 /MTok input

from DeepInfra

Gemini 3 Pro

Google

LLM

High-capability Gemini 3 model. Deprecated in favor of 3.1 Pro.

1M ctx

2 providers

$0.500 /MTok input

from KIE AI

Mistral Large 3

Mistral AI

LLM

Mistral's most capable model. 675B MoE with 41B active parameters.

675B MoE (41B active) 128k ctx Open

3 providers

$0.500 /MTok input

from Mistral AI

GLM 4.5V

Z.ai

Multimodal

GLM 4.5V, an open-weight Z.ai model available via OpenRouter.

66k ctx Open

2 providers

$0.600 /MTok input

from OpenRouter

Kimi K2.6

Moonshot AI

LLM

Moonshot AI's open-weight flagship: natively multimodal 1T MoE model (32B active) with text, image, and video input, thinking and non-thinking modes, and 256K context. State-of-the-art open-source coding and agentic performance (SWE-Bench Verified 80.2); Agent Swarm scales to 300 parallel sub-agents.

1T MoE (32B active) 262k ctx Open Modified MIT

13 providers

$0.660 /MTok input

from OpenRouter

GPT-5.4

OpenAI

LLM

OpenAI's latest frontier model combining reasoning, coding, and agentic workflows.

128k ctx

7 providers

$0.700 /MTok input

from KIE AI

Kimi K2.7 Code

Moonshot AI

Code

Coding-focused agentic model built on Kimi K2.6 with ~30% lower thinking-token usage. 1T MoE (32B active), native INT4 quantization, 256K context, text/image/video input. Forces thinking and preserve_thinking modes; tuned for long-horizon software engineering and agent workflows.

1T MoE (32B active) 262k ctx Open Modified MIT

14 providers

$0.740 /MTok input

from OpenRouter

GPT-5.4 Mini

OpenAI

LLM

Compact GPT-5.4 variant balancing capability and cost.

128k ctx

3 providers

$0.750 /MTok input

from OpenRouter

Claude 3.5 Haiku

Anthropic

LLM

Fast and affordable Claude model for high-throughput tasks.

200k ctx

4 providers

$0.800 /MTok input

from OpenRouter

Nova Pro

Amazon

LLM

Amazon Nova Pro for balanced capability and cost.

128k ctx

1 provider

$0.800 /MTok input

from OpenRouter

o4 Mini

OpenAI

LLM

Lightweight reasoning model balancing chain-of-thought rigor with speed and cost efficiency.

200k ctx

4 providers

$1.00 /MTok input

from Replicate

Claude 4.5 Sonnet

Anthropic

LLM

High-capability Claude model balancing intelligence and speed.

200k ctx

5 providers

$1.05 /MTok input

from KIE AI

Claude 4.6 Sonnet

Anthropic

LLM

Latest Sonnet with Opus-tier capabilities at Sonnet pricing.

200k ctx

9 providers

$1.05 /MTok input

from KIE AI

GLM 5V Turbo

Z.ai

Multimodal

GLM 5V Turbo, an open-weight Z.ai model available via OpenRouter.

203k ctx Open

1 provider

$1.20 /MTok input

from OpenRouter

Grok 4

xAI

LLM

Latest Grok with improved instruction following and reduced hallucination.

2M ctx

3 providers

$1.25 /MTok input

from OpenRouter

Claude 4.5 Opus

Anthropic

LLM

Previous Opus generation with strong reasoning and coding.

200k ctx

3 providers

$1.75 /MTok input

from KIE AI

Claude 4.6 Opus

Anthropic

LLM

Anthropic's most capable model with 1M token context and advanced reasoning.

1M ctx

9 providers

$1.75 /MTok input

from KIE AI

Gemini 3.1 Pro

Google

LLM

Google's current flagship model with top benchmark scores and 1M context.

1M ctx

5 providers

$2.00 /MTok input

from Replicate

o3

OpenAI

LLM

Reasoning-focused model with step-by-step deliberation for complex math, coding, and science tasks.

200k ctx

3 providers

$2.00 /MTok input

from OpenRouter

GPT-4o

OpenAI

LLM

OpenAI's flagship multimodal model with strong reasoning, coding, and vision capabilities.

128k ctx

4 providers

$2.50 /MTok input

from Replicate

Nova Premier

Amazon

LLM

Amazon's most capable LLM for complex reasoning and enterprise tasks.

128k ctx

1 provider

$2.50 /MTok input

from OpenRouter

Grok 3

xAI

LLM

xAI's flagship LLM trained on 200K+ GPUs with real-time web and X integration.

2M ctx

1 provider

$3.00 /MTok input

from OpenRouter

o3 Pro

OpenAI

LLM

Most capable reasoning model in OpenAI's lineup with extended thinking for maximum reliability.

200k ctx

2 providers

$20.00 /MTok input

from OpenRouter

GPT-5.4 Pro

OpenAI

LLM

Highest capability GPT-5.4 tier with maximum reasoning depth. Premium pricing.

128k ctx

3 providers

$30.00 /MTok input

from OpenRouter

Gemma 4 12B

Google

LLM

Latest Gemma generation optimized for reasoning and agentic workflows.

12B 128k ctx Open

No providers yet
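
Prices in this listing are quoted per million input tokens (/MTok input). As a rough sketch of how such a figure translates into a bill, the snippet below multiplies a token count by a listed rate. The function name `estimate_input_cost` and the `LISTED_INPUT_PRICES` table are hypothetical helpers for illustration, not part of any provider API. The table copies only three of the lowest listed prices from the cards above. Output-token pricing is not shown on this page, so it is left out.

```python
from decimal import Decimal

# Lowest listed input prices in USD per million tokens, copied from the cards above.
# This table is illustrative only; it is not fetched from any API.
LISTED_INPUT_PRICES = {
    "Gemma 3 12B": Decimal("0.050"),
    "GPT-4o Mini": Decimal("0.150"),
    "GPT-4o": Decimal("2.50"),
}

TOKENS_PER_MTOK = Decimal(1_000_000)


def estimate_input_cost(input_tokens, price_per_mtok):
    """Return the estimated input cost in USD for a given token count.

    price_per_mtok is the listed "/MTok input" figure; it may be passed
    as a Decimal or as a string such as "2.50".
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return Decimal(price_per_mtok) * Decimal(input_tokens) / TOKENS_PER_MTOK


if __name__ == "__main__":
    # Compare the same 250,000-token workload across the sample models.
    for model, price in LISTED_INPUT_PRICES.items():
        cost = estimate_input_cost(250_000, price)
        print(f"{model}: ${cost}")
```

Decimal is used instead of float so that small per-token amounts do not pick up binary rounding noise. Actual charges depend on the provider chosen and on output tokens, which this sketch does not model.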