Best LLM for Every Task

There is no single "best" language model — the right pick depends on the workload. Each guide below ranks models for one specific task using benchmark scores, capability fit and current list pricing, with a clear rationale for every rank. 20 tasks covered, updated as new models ship.

Pick by task

Best LLM for Agentic Workflows
Current top pick: Claude Opus 4.7
Best LLM for Audio Understanding
Best Open-Weight LLM
Current top pick: DeepSeek-V3
Cheapest LLM with Vision
Current top pick: GPT-5 Nano
Best LLM for Code Editing & Refactoring
Best LLM for Code Generation
Current top pick: Claude Opus 4.7
Best LLM for Content Moderation
Best LLM for Creative Writing
Best LLM for Customer Support
Current top pick: GPT-5 Mini
Best LLM for Data Extraction
Best LLM for Document Q&A / RAG
Current top pick: Command R+
Best LLM for Hard Reasoning Tasks
Best LLM for High-Throughput Inference
Current top pick: Llama 3.3 70B
Best LLM for Long Context Summarization
Current top pick: Gemini 2.5 Pro
Best LLM for Math & STEM Reasoning
Current top pick: GPT-5
Best LLM for Multilingual Chat
Best LLM for On-Device / Edge
Best LLM for Text Classification
Best LLM for Translation
Best LLM for Vision / Image Analysis
Current top pick: GPT-5

How these rankings are built

Each task weights its inputs differently: "code generation" leans on coding benchmarks and price, while "customer support" weights price and latency over peak quality. Composite scores combine those weighted inputs with capability fit (tool use, vision, function calling). The output is an ordered list with reasons, not a single verdict, so you can override it when you have constraints the ranking can't model.

Reproduce the math with your own assumptions: compare any two models with the comparison tool, browse the full model database, or estimate spend with the cost calculator.
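For a rough idea of the arithmetic behind a spend estimate, the sketch below assumes per-token pricing quoted per million tokens, with separate input and output rates. The function name, the traffic figures, and the prices are hypothetical placeholders, not list prices for any model and not the cost calculator's actual logic.

```python
def estimate_spend(
    requests: int,
    input_tokens_per_request: int,
    output_tokens_per_request: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimated spend for a batch of requests, in the same currency as the prices."""
    cost_per_request_times_million = (
        input_tokens_per_request * input_price_per_million
        + output_tokens_per_request * output_price_per_million
    )
    # Divide once at the end to convert from "per million tokens" pricing.
    return requests * cost_per_request_times_million / 1_000_000


# Placeholder workload: 100,000 requests, 1,000 input and 500 output tokens each,
# at made-up rates of 1.00 (input) and 4.00 (output) per million tokens.
example_spend = estimate_spend(100_000, 1_000, 500, 1.00, 4.00)
```

Substitute your own request volume, token counts, and the current list prices of the models you are comparing.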