HTML Entity Encoder + Decoder

HTML entities (sequences such as &amp;, &lt;, and &#39;) appear all over scraped HTML, JSON-escaped strings, and database dumps. Reading them in raw form is painful, and converting them at the command line means firing up Python. This tool works in both directions: paste encoded HTML to get the decoded text, or paste plain text to get safely encoded HTML. It supports named entities (&amp;), decimal references (&#39;), and hex references (&#x27;).

How to use the HTML Entity Encoder + Decoder

Pick Decode or Encode. For decoding, paste any text containing HTML entities — output appears instantly. For encoding, paste plain text; the output shows the safely-encoded HTML version. Encode styles control how aggressive the encoding is — "minimal" only escapes the five mandatory characters (&, <, >, ", '); the others go further.
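
As a rough illustration of how encode styles can differ, here is a minimal Python sketch. The function name encode and the style name "numeric" are assumptions made for this example, not the tool's actual options; only the "minimal" behavior (escaping the five characters) comes from the description above.

```python
import html


def encode(text: str, style: str = "minimal") -> str:
    """Encode text for HTML. Style names are illustrative, not the tool's own."""
    # html.escape with quote=True covers the five characters: & < > " '
    escaped = html.escape(text, quote=True)
    if style == "minimal":
        return escaped
    if style == "numeric":
        # Hypothetical stricter style: also turn every non-ASCII character
        # into a decimal numeric reference.
        return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in escaped)
    raise ValueError(f"unknown style: {style}")


print(encode('café <b> & "q"'))
print(encode('café <b> & "q"', style="numeric"))
```

Note that Python's html.escape writes the apostrophe as &#x27; rather than &#39;. Both decode to the same character.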

About HTML Entity Encoder + Decoder

HTML entities are escape sequences used to represent characters that would otherwise have special meaning in HTML (<, >, &) or that are awkward to type (©, the non-breaking space, math symbols, emoji). Three forms exist:

  • Named entities: &amp;, &lt;, &eacute;. HTML5 standardizes more than 2,000 named entities; HTML 4 defined a much smaller set of about 250.
  • Decimal numeric references: &#39; (apostrophe, code point 39).
  • Hexadecimal numeric references: &#x27; (same apostrophe, hex code 27).

Almost any Unicode code point can be expressed via a numeric reference (&#x1F600; = 😀); surrogates and a few control characters are the exceptions. Modern HTML treats all three forms equivalently. The HTML specification requires the terminating semicolon: &amp without one is a parse error, although browsers still accept it for a set of legacy names for backward compatibility.
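
The three forms can be compared with Python's standard html.unescape, used here purely as a stand-in for any spec-following decoder (it is not the tool's own code). The last sample is a legacy name with the semicolon left off.

```python
import html

# One sample per form, plus a legacy name with the semicolon missing.
samples = ["&amp;", "&#39;", "&#x27;", "&#x1F600;", "&amp"]

decoded = {s: html.unescape(s) for s in samples}
for source, result in decoded.items():
    print(f"{source!r} -> {result!r}")
```

The decimal and hex apostrophe references decode to the same character, and the semicolon-less &amp is still accepted, mirroring browser behavior.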

Encoding matters for XSS protection: any user-supplied string interpolated into HTML markup should escape at minimum &, <, and >. If interpolating into attribute values, also escape " and '. Modern UI frameworks (React, Vue, Angular) escape interpolated text by default; bare string concatenation into innerHTML is where bugs creep in.
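
A minimal sketch of the escaping rule, assuming a server-side Python string template. The helper name render_greeting and the markup are hypothetical; html.escape with quote=True is the standard-library call that escapes all five characters, so the same value is safe in both text and quoted attribute positions.

```python
import html


def render_greeting(name: str) -> str:
    """Hypothetical helper: interpolate user input into text and an attribute."""
    # quote=True also escapes " and ', which matters inside attribute values.
    safe = html.escape(name, quote=True)
    return f'<p class="greeting" title="{safe}">Hello, {safe}!</p>'


print(render_greeting("<script>alert(1)</script>"))
print(render_greeting('" onmouseover="alert(1)'))
```

This covers HTML text and quoted attribute values only. Values placed inside script blocks, URLs, or CSS need context-specific encoding instead.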

Common use cases

  • Reading scraped HTML — convert &amp; back to & for clean text extraction.
  • Debugging XSS — see exactly what an attacker-supplied string would render as.
  • JSON cleanup — APIs that double-encode HTML entities inside JSON strings.
  • Email templates — encoding user-supplied names safely for HTML email bodies.
  • CSV exports — decoding entities back to plain text before loading into Excel.
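
For the JSON cleanup case, one possible approach to double-encoded entities is to decode repeatedly until the text stops changing. The sketch below assumes Python's html.unescape; the function name decode_fully, the max_rounds cap, and the sample payload are all made up for illustration.

```python
import html
import json


def decode_fully(text: str, max_rounds: int = 5) -> str:
    """Decode entities repeatedly until the text is stable (or the cap is hit)."""
    for _ in range(max_rounds):
        decoded = html.unescape(text)
        if decoded == text:
            break
        text = decoded
    return text


# Hypothetical API payload with a double-encoded ampersand.
payload = '{"title": "Tom &amp;amp; Jerry"}'
title = json.loads(payload)["title"]
print(decode_fully(title))
```

Repeated decoding is a heuristic: text that was meant to show a literal &amp; will also be collapsed to &, so it is best applied only to fields known to be over-encoded.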

Frequently asked questions

What if a named entity isn't recognized?

Decoder leaves it as-is. Encoder only emits names it knows are valid HTML5.
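
The same policy can be sketched with Python's standard library, as an analogy rather than the tool's implementation. html.unescape leaves an unknown name such as &foobar; untouched, and the hypothetical encode_named below emits a name only when it appears in html.entities.codepoint2name (the HTML 4 table, smaller than the HTML5 set), falling back to a numeric reference otherwise.

```python
import html
from html.entities import codepoint2name

# Decoding: the unknown name is left alone, the known one is converted.
print(html.unescape("&foobar; &copy;"))


def encode_named(text: str) -> str:
    """Hypothetical encoder: use a name only when one is known to be valid."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        if name is not None:
            out.append(f"&{name};")
        elif ch == "'":
            # The HTML 4 table has no name for the apostrophe.
            out.append("&#39;")
        elif ord(ch) > 127:
            # No known name: fall back to a decimal numeric reference.
            out.append(f"&#{ord(ch)};")
        else:
            out.append(ch)
    return "".join(out)


print(encode_named("© é 😀 x<y"))
```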

Why does "minimal" only escape 5 characters?

Those are the characters that have special meaning in HTML. The others (é, ©, emoji) render correctly as raw UTF-8 in modern browsers — encoding them is optional and mostly historical.

Decoding speed?

O(n) — fast enough for any reasonable input. Browser DOM parser handles the entity table.

Does it handle CDATA / numeric references with leading zeros?

Yes — &#0039; decodes to apostrophe, same as &#39;.
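
Leading zeros are harmless because the digits are parsed as a number before being mapped to a character. The hand-rolled decoder below is only a sketch of that idea in Python (the tool itself is described as relying on the browser's parser); the name decode_numeric_refs is an assumption for this example, and it handles numeric references only.

```python
import re

# Matches &#NNN; (decimal) or &#xHHH; (hexadecimal), semicolon required.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def decode_numeric_refs(text: str) -> str:
    """Decode numeric character references; named entities are left alone."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.group(1), match.group(2)
        # int() ignores leading zeros, so "0039" and "39" give the same value.
        code = int(hex_digits, 16) if hex_digits else int(dec_digits, 10)
        try:
            return chr(code)
        except (ValueError, OverflowError):
            # Not a valid code point: keep the original text.
            return match.group(0)

    return _NUMERIC_REF.sub(replace, text)


print(decode_numeric_refs("&#0039; &#39; &#x0027;"))
```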