Unicode Normalizer (NFC, NFD, NFKC, NFKD)

The character "é" can be encoded two ways in Unicode: as U+00E9 (a single precomposed code point) or as U+0065 + U+0301 (lowercase e followed by a combining acute accent). The two render identically but are different code-point sequences, and therefore different bytes, so equality comparisons fail unless both sides are normalized to the same form. This tool converts text between the four standard normalization forms and shows the actual code points.
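The mismatch described above is easy to reproduce. The following is a minimal sketch using Python's standard unicodedata module; the variable names are illustrative only, and the strings are written with escapes so the source file's own encoding cannot change the result.

```python
import unicodedata

composed = "\u00e9"       # é as one precomposed code point
decomposed = "e\u0301"    # e followed by a combining acute accent

# The two strings render the same but are not equal as written.
raw_equal = composed == decomposed

# After normalizing both sides to the same form, they compare equal.
nfc_equal = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(raw_equal, nfc_equal)
```

Any of the four forms works for the comparison, as long as both sides use the same one.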

How to use the Unicode Normalizer (NFC, NFD, NFKC, NFKD)

Type or paste any Unicode text. The tool normalizes to all four standard forms (NFC / NFD / NFKC / NFKD) and shows the byte length and code-point sequence for each — making it easy to spot when "the same string" differs in encoding.
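A report of this kind can be approximated in a few lines of Python. This is a sketch, not the tool's own code: the function name describe is hypothetical, and the byte length is assumed to be measured in UTF-8, which the text above does not specify.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def describe(text):
    """Return the normalized text, UTF-8 byte length and code points per form."""
    report = {}
    for form in FORMS:
        normalized = unicodedata.normalize(form, text)
        report[form] = {
            "text": normalized,
            # Assumption: byte length is counted in UTF-8.
            "utf8_bytes": len(normalized.encode("utf-8")),
            "code_points": [f"U+{ord(ch):04X}" for ch in normalized],
        }
    return report

for form, info in describe("\u00e9").items():
    print(form, info["utf8_bytes"], " ".join(info["code_points"]))
```

For the single character é, the composed forms report one code point and the decomposed forms report two, which is exactly the kind of difference the comparison view is meant to expose.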

About Unicode Normalizer (NFC, NFD, NFKC, NFKD)

Unicode defines four normalization forms, two canonical (NFC, NFD) and two compatibility (NFKC, NFKD):

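  • NFC (Normalization Form Canonical Composition): composed, so é stays as one code point (U+00E9). This is the form the W3C recommends for web content and the one most text is already in.
  • NFD (Normalization Form Canonical Decomposition): decomposed, so é becomes e + combining acute accent (U+0065 U+0301). The macOS HFS+ filesystem stores file names in a variant of NFD.
  • NFKC (Normalization Form Compatibility Composition): like NFC, but also replaces compatibility characters with their plain equivalents (½ → 1⁄2, ﬁ ligature → fi). This replacement is lossy and cannot be undone.
  • NFKD (Normalization Form Compatibility Decomposition): like NFD, but also applies the same compatibility replacements.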
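The difference between the canonical and compatibility forms can be seen by normalizing a few sample characters. This is a small sketch with Python's unicodedata module; the sample set and the helper code_points are chosen here for illustration.

```python
import unicodedata

# Illustrative samples: a ligature, a vulgar fraction, and a plain accented letter.
samples = {
    "ligature": "\ufb01",   # ﬁ (LATIN SMALL LIGATURE FI)
    "one half": "\u00bd",   # ½ (VULGAR FRACTION ONE HALF)
    "e-acute": "\u00e9",    # é
}

def code_points(text):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

canonical = {name: unicodedata.normalize("NFC", s) for name, s in samples.items()}
folded = {name: unicodedata.normalize("NFKC", s) for name, s in samples.items()}

for name in samples:
    print(f"{name}: NFC {code_points(canonical[name])} | NFKC {code_points(folded[name])}")
```

NFC leaves the ligature and the fraction untouched, while NFKC rewrites them. The fraction becomes 1, U+2044 (fraction slash), 2, so the result is still not the same as 1/2 typed with an ASCII slash.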

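Use cases: NFC for storage and transmission; NFD for accent-insensitive comparison (decompose, then drop the combining marks); NFKC for search and matching where the ﬁ ligature should match fi; NFKD as the fullest decomposition before case folding or accent stripping. Normalization by itself does not change case, so case-insensitive matching needs a separate case-folding step. Also, NFKC turns ½ into 1⁄2 with a fraction slash (U+2044), so it still will not match 1/2 typed with an ASCII slash unless the slash is mapped separately.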
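One place where the decomposed form is directly useful is accent-insensitive comparison: once the accents are separate code points, they can be filtered out. The sketch below assumes that dropping every nonspacing mark (general category Mn) is acceptable for the text in question, which is not true for all scripts; strip_accents is a hypothetical helper name.

```python
import unicodedata

def strip_accents(text):
    """Decompose to NFD, then drop nonspacing combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

# Both encodings of "café" reduce to the same unaccented string.
print(strip_accents("caf\u00e9"), strip_accents("cafe\u0301"))
```

Characters with no canonical decomposition, such as ø or ß, pass through unchanged, so this is a convenience for matching rather than a general transliteration.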

Common use cases

  • Database storage — normalize to NFC before insert to ensure consistent matching.
  • String comparison — when "café" might be NFC or NFD; normalize both sides.
  • macOS file paths — APFS / HFS+ may return NFD; normalize before comparing to user input.
  • Search indexes — NFKC + casefold for permissive matching.
  • Debugging — figure out why two visually-identical strings aren't equal.
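For the search-index case, a matching key can be built by combining NFKC with case folding. The following is a simplified sketch rather than the full caseless-matching procedure defined by the Unicode Standard; search_key is a hypothetical name, and the second normalization pass is there because case folding can produce text that is no longer normalized.

```python
import unicodedata

def search_key(text):
    """Build a permissive matching key: NFKC, case fold, then NFKC again."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return unicodedata.normalize("NFKC", folded)

# Ligatures, case differences and composed/decomposed accents collapse to one key.
print(search_key("\ufb01le") == search_key("FILE"))
print(search_key("Stra\u00dfe") == search_key("STRASSE"))
print(search_key("caf\u00e9") == search_key("cafe\u0301"))
```

A key like this is meant for lookup only. The original text is normally stored unchanged (for example in NFC), since compatibility folding discards information.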