Vision Token Calculator
Images are billed as tokens, and every provider counts them differently. Enter an image's width and height to see how many input tokens it costs on GPT-4o, Claude, and Gemini, each computed with that provider's documented tiling rules, then convert the result into a dollar figure. This is useful when images, not text, dominate a vision-heavy app's bill. All calculations run in your browser.
Token counts use each provider's documented tiling: OpenAI uses 512px tiles after resizing; Claude uses roughly (w×h)/750 after capping the image at 1568px on the long edge and about 1.15MP in area; Gemini uses 768px crops at 258 tokens each, with a flat 258 tokens for images up to 384px on both sides. GPT-4o-mini multiplies the OpenAI token count by a model-specific factor.
How to use the Vision Token Calculator
Type the image's pixel dimensions and the number of images you send per request (or over whatever period you are estimating costs for). For OpenAI, choose the detail level: low is a flat 85 tokens regardless of size, while high tiles the image and costs far more for large pictures. The table updates with the token count for each provider.
To get a dollar figure, pick the provider you are billing on and enter its input price per million tokens. Image tokens are charged at the same input rate as text. The cost line multiplies the provider's per-image token count by your image count and by the price, then divides by one million. Because the three providers tile images so differently, the cheapest one for a given image size is not always obvious; the table lets you compare at a glance.
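As a rough illustration of the cost line described above, the arithmetic can be sketched in Python. The function name `image_cost_usd` and the $2.50 price in the example are hypothetical, chosen only for illustration; substitute your provider's current input price.

```python
def image_cost_usd(tokens_per_image: int, image_count: int,
                   price_per_million_usd: float) -> float:
    """Cost of sending `image_count` images, each costing `tokens_per_image`
    input tokens, at a given input price per one million tokens."""
    total_tokens = tokens_per_image * image_count
    return total_tokens * price_per_million_usd / 1_000_000


# Example: 10,000 images at 765 tokens each, with a hypothetical
# input price of $2.50 per million tokens.
cost = image_cost_usd(765, 10_000, 2.50)
print(f"${cost:.2f}")
```

The per-image token count is the only provider-specific input here; everything else is plain multiplication.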
How each provider counts image tokens
OpenAI (GPT-4o) uses a tile model. In low detail an image is a flat 85 tokens. In high detail the image is first scaled to fit within 2048×2048, then scaled so its shortest side is 768px, then divided into 512×512 tiles. The cost is 85 base tokens plus 170 per tile, so a 1024×1024 image is resized to 768×768, covers four tiles, and works out to 85 + 4 × 170 = 765 tokens. GPT-4o-mini uses the same tiling but multiplies the result by a large model-specific factor, so an image consumes far more tokens relative to text on the mini model.
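The high-detail steps above can be sketched as follows. This is a minimal illustration, not OpenAI's implementation: the function name is hypothetical, dimensions are rounded to whole pixels after each resize, and the sketch assumes images already smaller than a limit are not upscaled.

```python
import math


def openai_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate GPT-4o input tokens for one image (illustrative sketch)."""
    if detail == "low":
        return 85  # flat cost regardless of size

    w, h = width, height

    # Step 1: shrink to fit within 2048x2048, keeping the aspect ratio.
    if max(w, h) > 2048:
        scale = 2048 / max(w, h)
        w, h = round(w * scale), round(h * scale)

    # Step 2: shrink so the shortest side is 768px.
    # Assumption: images whose shortest side is already <= 768px stay as-is.
    if min(w, h) > 768:
        scale = 768 / min(w, h)
        w, h = round(w * scale), round(h * scale)

    # Step 3: count the 512x512 tiles needed to cover the resized image.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)

    # 85 base tokens plus 170 per tile.
    return 85 + 170 * tiles


print(openai_image_tokens(1024, 1024))          # 765
print(openai_image_tokens(1024, 1024, "low"))   # 85
```

The GPT-4o-mini multiplier is left out because the text does not state its value.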
Claude uses a simple area formula: tokens are approximately width times height divided by 750. Before counting, Claude resizes any image whose long edge exceeds 1568px or whose area exceeds about 1.15 megapixels down to those limits, so very large images are capped rather than billed in full. A 1092×1092 image is near Claude's effective maximum at roughly 1590 tokens; smaller images cost proportionally less.
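A minimal sketch of the area formula and the caps described above, assuming the resize preserves aspect ratio and applies whichever limit is stricter. The function name and parameters are hypothetical, and because the area limit is only approximate ("about 1.15 megapixels"), results near the cap may differ by a few percent from published figures.

```python
import math


def claude_image_tokens(width: int, height: int,
                        max_edge: int = 1568,
                        max_pixels: int = 1_150_000) -> int:
    """Estimate Claude input tokens for one image (illustrative sketch)."""
    # Scale factor needed to satisfy each limit; never upscale (cap at 1.0).
    edge_scale = max_edge / max(width, height)
    area_scale = math.sqrt(max_pixels / (width * height))
    scale = min(1.0, edge_scale, area_scale)

    # Tokens are approximately (resized width * resized height) / 750.
    resized_area = (width * scale) * (height * scale)
    return round(resized_area / 750)


print(claude_image_tokens(1000, 750))    # small image, billed as-is
print(claude_image_tokens(4000, 3000))   # large image, capped by area
```

With the default 1.15MP limit, a 1092×1092 image is shrunk slightly before counting; passing a looser limit such as `max_pixels=1_200_000` leaves it unresized and reproduces the roughly 1590 tokens quoted above.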
Gemini tiles too, but with 768px crops worth 258 tokens each. An image where both sides are 384px or smaller is a flat 258 tokens. Larger images are divided into 768px tiles, each adding 258 tokens. The practical upshot is that the three schemes diverge sharply: small thumbnails are cheapest on OpenAI's low detail, mid-size images are often cheapest on Claude, and the right choice for a high-volume vision workload can change the bill by a large multiple. Always confirm against each provider's current documentation, since these formulas are periodically revised.
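Gemini's rule as described above can be sketched with a fixed 768px grid. This is a simplified reading: it assumes tiles are counted by ceiling division along each axis, whereas the actual service may choose crop sizes differently, so treat the output as an estimate. The function name is hypothetical.

```python
import math


def gemini_image_tokens(width: int, height: int,
                        tile: int = 768,
                        tokens_per_tile: int = 258,
                        small_edge: int = 384) -> int:
    """Estimate Gemini input tokens for one image (illustrative sketch)."""
    # Images with both sides at or below 384px cost a flat 258 tokens.
    if width <= small_edge and height <= small_edge:
        return tokens_per_tile

    # Assumption: larger images are covered by a grid of 768px tiles,
    # each adding 258 tokens.
    tiles = math.ceil(width / tile) * math.ceil(height / tile)
    return tiles * tokens_per_tile


print(gemini_image_tokens(300, 300))     # 258
print(gemini_image_tokens(1536, 768))    # 516
```

Running the same dimensions through each provider's estimate is one way to see where the schemes diverge for your typical image size.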
Common use cases
- Budgeting a vision app. Estimate the per-image token cost before you process thousands of screenshots, receipts, or photos.
- Choosing a provider. Compare GPT-4o, Claude, and Gemini for your typical image size to find the cheapest per image.
- Deciding on detail level. See how much OpenAI's low-detail mode saves when full resolution is not needed.
- Sizing image preprocessing. Find the resolution at which downscaling stops reducing tokens because the provider already caps it.