Groq LLMs

Inference provider operating LPU hardware for ultra-low-latency open-weight serving.

Groq hosts Llama, DeepSeek, Qwen, Mistral, and other open-weight models. Its differentiator is throughput: it serves far more tokens per second than GPU-based providers, which matters for streaming UIs and agent loops. Prices below reflect Groq Cloud list rates.

Founded: 2016 · HQ: Mountain View, USA · Docs: console.groq.com

All Groq models

Model | Family | Context | Input $/M | Output $/M | Released | Status

Comparisons with other providers

The most-searched comparisons involving Groq models:

Working with the Groq API

Documentation lives at https://console.groq.com/docs. Before paying for any call, count input tokens with the appropriate counter:
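
As a rough illustration of that pre-flight check, the sketch below estimates the input cost of a prompt from a token count and a per-million-token price. Everything named here is an assumption for illustration: `approx_token_count` is a crude heuristic of about 4 characters per token, not a real tokenizer, and the price is a placeholder rather than a Groq list rate. For an accurate count, pass in a counter built on the tokenizer that matches the model you are calling.

```python
import math
from typing import Callable


def approx_token_count(text: str) -> int:
    """Crude estimate: roughly 4 characters per token.

    This is an assumption for illustration only. Real counts depend on
    the tokenizer of the specific model (Llama, Qwen, Mistral, ...).
    """
    if not text:
        return 0
    return math.ceil(len(text) / 4)


def estimate_input_cost(
    text: str,
    input_price_per_million: float,
    count_tokens: Callable[[str], int] = approx_token_count,
) -> float:
    """Return the estimated input cost in dollars for one prompt.

    input_price_per_million is the "Input $/M" figure for the model.
    count_tokens can be swapped for a model-specific counter.
    """
    tokens = count_tokens(text)
    return tokens * input_price_per_million / 1_000_000


if __name__ == "__main__":
    prompt = "Summarize the following report in three bullet points."
    hypothetical_price = 0.50  # placeholder $/M input tokens, not a real rate
    tokens = approx_token_count(prompt)
    cost = estimate_input_cost(prompt, hypothetical_price)
    print(f"~{tokens} tokens, estimated input cost ${cost:.8f}")
```

Keeping the counter as a parameter means the same cost arithmetic works whether the count comes from this heuristic or from a model-specific tokenizer.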