Unicode Inspector

Inspect every code point in a string: hex value, decimal value, UTF-8 byte sequence, Unicode category, and visual form. Useful for debugging text encoding issues, finding invisible characters (zero-width space, BOM), or comparing visually similar characters (Cyrillic а, U+0430, vs Latin a, U+0061).

How to use the Unicode Inspector

Paste text. Every code point shows on its own row with hex code, decimal, glyph, UTF-8 byte sequence, and category. The summary at top gives character count, code-point count, and UTF-8 byte size — these can differ for strings with multi-code-point characters (like emoji with skin-tone modifiers).
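As a rough illustration of what one row per code point involves, here is a minimal JavaScript sketch. It is not the tool's actual implementation: the names inspectCodePoints and categoryOf are hypothetical, and it assumes a runtime with TextEncoder and Unicode property escapes in regular expressions (current browsers and Node.js).

```javascript
// Hypothetical sketch, not the tool's own code.
// Two-letter Unicode general categories, each paired with a regex that tests for it.
const CATEGORY_TESTS = [
  "Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Mc", "Me", "Nd", "Nl", "No",
  "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po", "Sm", "Sc", "Sk", "So",
  "Zs", "Zl", "Zp", "Cc", "Cf", "Cs", "Co", "Cn",
].map((name) => [name, new RegExp("^\\p{" + name + "}$", "u")]);

// Return the general category of a single code point (falls back to "Cn", unassigned).
function categoryOf(ch) {
  const hit = CATEGORY_TESTS.find(([, re]) => re.test(ch));
  return hit ? hit[0] : "Cn";
}

// Build one row per code point: glyph, hex, decimal, UTF-8 bytes, category.
function inspectCodePoints(text) {
  const encoder = new TextEncoder(); // always encodes to UTF-8
  const rows = [];
  // for...of walks a string by code point, not by UTF-16 code unit.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    // Note: TextEncoder replaces a lone surrogate with U+FFFD before encoding.
    const bytes = Array.from(encoder.encode(ch), (b) =>
      b.toString(16).toUpperCase().padStart(2, "0")
    );
    rows.push({
      glyph: ch,
      hex: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
      decimal: cp,
      utf8: bytes.join(" "),
      category: categoryOf(ch),
    });
  }
  return rows;
}
```

Calling inspectCodePoints("a\u200B") returns two rows: one for the Latin a and one for the otherwise invisible zero-width space, which is how a listing like this makes hidden characters easy to spot.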

About code points, characters, and bytes

One "character" you see is often several code points, each of which can take several UTF-8 bytes. The emoji "👨‍👩‍👧" looks like one glyph but is 5 code points (U+1F468 U+200D U+1F469 U+200D U+1F467: three emoji joined by two zero-width joiners), which take 18 UTF-8 bytes and 8 UTF-16 code units. String.length in JavaScript counts UTF-16 code units, not code points or characters, which is why '👋'.length === 2.
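The different ways of counting a string can be compared side by side. The sketch below is illustrative only: countUnits is a hypothetical helper, and it assumes a runtime that provides TextEncoder and Intl.Segmenter (recent browsers and Node.js 16 or later).

```javascript
// Hypothetical helper: count one string four different ways.
function countUnits(text) {
  // Intl.Segmenter with "grapheme" granularity approximates user-perceived characters.
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    utf16CodeUnits: text.length,                      // what String.length reports
    codePoints: Array.from(text).length,              // Array.from iterates by code point
    utf8Bytes: new TextEncoder().encode(text).length, // size when stored as UTF-8
    graphemes: Array.from(segmenter.segment(text)).length,
  };
}
```

For example, countUnits("\u{1F44B}") (the waving hand) reports 2 UTF-16 code units, 1 code point, 4 UTF-8 bytes, and 1 grapheme.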

This tool helps debug when those numbers diverge in unexpected ways — for example, when a database column with a length limit silently truncates emoji, or when a tokenizer splits a character in a surprising place.
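To make the truncation case concrete, here is one possible way to cut a string to a UTF-8 byte budget without splitting a code point in half. It is a sketch under stated assumptions: truncateUtf8 is a hypothetical name, and the limit is assumed to be measured in UTF-8 bytes, which varies between databases and column types.

```javascript
// Hypothetical helper: keep as many whole code points as fit in maxBytes of UTF-8.
function truncateUtf8(text, maxBytes) {
  const encoder = new TextEncoder();
  let out = "";
  let used = 0;
  // Iterate by code point so a surrogate pair is never cut in the middle.
  for (const ch of text) {
    const size = encoder.encode(ch).length; // 1 to 4 bytes per code point
    if (used + size > maxBytes) break;
    out += ch;
    used += size;
  }
  return out;
}
```

This only protects individual code points. A multi-code-point emoji can still be cut between its parts (for instance, leaving a trailing zero-width joiner), so truncating on grapheme boundaries would need a segmenter such as Intl.Segmenter instead.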