Groq LLMs

Inference provider operating LPU hardware for ultra-low-latency open-weight serving.

Groq hosts Llama, DeepSeek, Qwen, Mistral, and other open weights. Its differentiator is throughput: tokens per second far higher than GPU-based providers, which matters for streaming UIs and agent loops. Prices below reflect Groq Cloud list rates.

Founded: 2016 · HQ: Mountain View, USA · Docs: console.groq.com ↗

All Groq models

Model	Family	Context	Input $/M	Output $/M	Released	Status

Comparisons with other providers

The most-searched comparisons involving Groq models:

Working with the Groq API

Documentation lives at https://console.groq.com/docs. Before paying for any call, count input tokens with the appropriate counter:

OpenAI Token Counter — for GPT-* and o-series models
Claude Token Counter — for Anthropic Claude models
Gemini Token Counter — for Google Gemini models