Best LLM for Every Task
There is no single "best" language model; the right pick depends on the workload. Each guide below ranks models for one specific task using benchmark scores, capability fit, and current list pricing, with a clear rationale for every rank. The guides cover 20 tasks and are updated as new models ship.
Pick by task
- Best LLM for Agentic Workflows
  Current top pick: Claude Opus 4.7
- Best LLM for Audio Understanding
- Best LLM for Best Open-Weight Model
  Current top pick: DeepSeek-V3
- Best LLM for Cheapest Model with Vision
  Current top pick: GPT-5 Nano
- Best LLM for Code Editing & Refactoring
- Best LLM for Code Generation
  Current top pick: Claude Opus 4.7
- Best LLM for Content Moderation
- Best LLM for Creative Writing
- Best LLM for Customer Support
  Current top pick: GPT-5 Mini
- Best LLM for Data Extraction
- Best LLM for Document Q&A / RAG
  Current top pick: Command R+
- Best LLM for Hard Reasoning Tasks
- Best LLM for High-Throughput Inference
  Current top pick: Llama 3.3 70B
- Best LLM for Long Context Summarization
  Current top pick: Gemini 2.5 Pro
- Best LLM for Math & STEM Reasoning
  Current top pick: GPT-5
- Best LLM for Multilingual Chat
- Best LLM for On-Device / Edge
- Best LLM for Text Classification
- Best LLM for Translation
- Best LLM for Vision / Image Analysis
  Current top pick: GPT-5
How these rankings are built
Each task weights its inputs differently: "code generation" leans on coding benchmarks and price, while "customer support" weights price and latency over peak quality. Composite scores combine those weighted benchmarks with capability fit (tool use, vision, function calling). The output is an ordered list with reasons, not a single verdict, so you can override it on constraints the ranking can't model.
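To make the weighting idea concrete, here is a minimal Python sketch of one way a per-task composite score could be computed. It is an illustration under stated assumptions, not the formula these rankings actually use: the `Model` class, the model names, the benchmark values, the weights, and the choice to multiply the weighted average by a capability-fit fraction are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    benchmarks: dict[str, float]  # hypothetical scores, normalized to 0..1 (higher is better)
    capabilities: set[str]        # e.g. {"tool_use", "vision"}


def composite_score(model: Model, weights: dict[str, float], required_caps: set[str]) -> float:
    """Weighted average of benchmark scores, scaled by capability fit.

    Assumption: capability fit is the fraction of required capabilities
    the model supports, applied as a multiplier.
    """
    total_weight = sum(weights.values())
    weighted = sum(w * model.benchmarks.get(k, 0.0) for k, w in weights.items()) / total_weight
    fit = len(required_caps & model.capabilities) / len(required_caps) if required_caps else 1.0
    return weighted * fit


def rank(models: list[Model], weights: dict[str, float], required_caps: set[str]) -> list[tuple[float, str]]:
    """Return (score, name) pairs ordered from best to worst."""
    scored = [(composite_score(m, weights, required_caps), m.name) for m in models]
    return sorted(scored, reverse=True)


# Hypothetical models with made-up, illustrative numbers.
models = [
    Model("model-a", {"coding": 0.9, "price": 0.2}, {"tool_use"}),
    Model("model-b", {"coding": 0.7, "price": 0.9}, {"tool_use", "vision"}),
]

# A quality-heavy task profile versus a price-heavy one.
quality_heavy = rank(models, {"coding": 0.9, "price": 0.1}, {"tool_use"})
price_heavy = rank(models, {"coding": 0.3, "price": 0.7}, {"tool_use"})
print(quality_heavy[0][1], price_heavy[0][1])
```

Changing the weights or the required capabilities reorders the list, which is the sense in which the same models can rank differently from one task to the next.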
Reproduce the math with your own assumptions: compare any two models with the comparison tool, browse the full model database, or estimate spend with the cost calculator.
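For a rough sense of the arithmetic behind a spend estimate, the following Python sketch assumes pricing is quoted per million input and output tokens. The function name, the traffic figures, and the prices are hypothetical placeholders, not real list prices for any model, and this is not the cost calculator's own implementation.

```python
def estimate_monthly_cost(
    requests_per_month: int,
    input_tokens_per_request: int,
    output_tokens_per_request: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate monthly spend, assuming per-million-token pricing."""
    cost_units_per_request = (
        input_tokens_per_request * input_price_per_million
        + output_tokens_per_request * output_price_per_million
    )
    # Divide once at the end to convert from per-million-token prices.
    return requests_per_month * cost_units_per_request / 1_000_000


# Hypothetical workload and placeholder prices (not real list prices).
monthly = estimate_monthly_cost(
    requests_per_month=100_000,
    input_tokens_per_request=1_000,
    output_tokens_per_request=500,
    input_price_per_million=1.0,
    output_price_per_million=4.0,
)
print(f"Estimated monthly spend: ${monthly:,.2f}")
```

Output tokens are often priced higher than input tokens, so a workload with long responses can cost noticeably more than its request count alone suggests.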