Groq LLMs
Inference provider operating LPU hardware for ultra-low-latency open-weight serving.
Groq hosts Llama, DeepSeek, Qwen, Mistral, and other open weights. Its differentiator is throughput: tokens per second far higher than GPU-based providers, which matters for streaming UIs and agent loops. Prices below reflect Groq Cloud list rates.
Founded: 2016 · HQ: Mountain View, USA · Docs: console.groq.com ↗
All Groq models
| Model | Family | Context | Input $/M | Output $/M | Released | Status |
|---|
Comparisons with other providers
The most-searched comparisons involving Groq models:
Working with the Groq API
Documentation lives at https://console.groq.com/docs. Before paying for any call, count input tokens with the appropriate counter:
- OpenAI Token Counter — for GPT-* and o-series models
- Claude Token Counter — for Anthropic Claude models
- Gemini Token Counter — for Google Gemini models