HTML Entity Encoder + Decoder
HTML entities (those &amp;, &lt;, &#39; sequences) appear all over scraped HTML, JSON-escaped strings, and database dumps. Reading them raw is painful, and converting them at the command line means firing up Python. This tool works in both directions: paste encoded HTML to get the decoded text, or paste plain text to get safely encoded HTML. It supports named entities (&amp;), decimal references (&#39;), and hexadecimal references (&#x27;).
How to use the HTML Entity Encoder + Decoder
Pick Decode or Encode. For decoding, paste any text containing HTML entities — output appears instantly. For encoding, paste plain text; the output shows the safely-encoded HTML version. Encode styles control how aggressive the encoding is — "minimal" only escapes the five mandatory characters (&, <, >, ", '); the others go further.
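A minimal sketch of what a "minimal" encode style could look like, written in Python (the tool's own implementation is not shown here). `encode_minimal` and `encode_ascii_only` are hypothetical names, and the second one assumes a more aggressive style that also turns every non-ASCII character into a decimal numeric reference; it may or may not match any of the tool's other styles.

```python
# Hypothetical encoders; the tool's actual styles are not documented here.
MINIMAL_MAP = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
}
# str.translate maps each character in a single pass, so the "&" produced
# by a replacement is never escaped a second time.
_TABLE = str.maketrans(MINIMAL_MAP)


def encode_minimal(text: str) -> str:
    """Escape only the five HTML-significant characters; keep the rest as UTF-8."""
    return text.translate(_TABLE)


def encode_ascii_only(text: str) -> str:
    """Assumed aggressive style: minimal escaping, then every non-ASCII
    character becomes a decimal numeric reference."""
    return encode_minimal(text).encode("ascii", "xmlcharrefreplace").decode("ascii")


print(encode_minimal('Café <b>"Tom\'s"</b> & 😀'))
# → Café &lt;b&gt;&quot;Tom&#39;s&quot;&lt;/b&gt; &amp; 😀
print(encode_ascii_only("Café 😀"))
# → Caf&#233; &#128512;
```

The minimal version leaves é and the emoji untouched, which is safe in a UTF-8 page; the ASCII-only variant is useful only when the output must survive a non-UTF-8 channel.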
About HTML Entity Encoder + Decoder
HTML entities are escape sequences used to represent characters that would otherwise have special meaning in HTML (<, >, &) or that are awkward to type (©, €, math symbols, emoji). Three forms exist:
- Named entities: &amp;, &lt;, &eacute;. HTML5 standardizes more than 2,000 named references; HTML 4 defined only 252.
- Decimal numeric references: &#39; (apostrophe, code point 39).
- Hexadecimal numeric references: &#x27; (the same apostrophe, code point 0x27).
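To see the equivalence, Python's standard-library `html.unescape` (used here as a stand-in decoder, not necessarily what this tool runs) maps all three forms of the same character to one result:

```python
import html

# The same character written three ways: named, decimal, hexadecimal.
forms = ["&eacute;", "&#233;", "&#xE9;"]
decoded = [html.unescape(f) for f in forms]
print(decoded)  # ['é', 'é', 'é']

# The apostrophe from the list above: decimal 39 and hex 27 are one code point.
print(html.unescape("&#39;") == html.unescape("&#x27;") == "'")  # True
```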
Any Unicode code point can be written as a numeric reference (&#128512; = 😀). HTML treats all three forms equivalently. The terminating semicolon is required by the spec: &amp without a semicolon is a parse error, although browsers still decode it (and a handful of other legacy entities) for backward compatibility.
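The numeric forms can be derived directly from a character's code point. The sketch below uses a hypothetical `numeric_refs` helper to produce both forms, and also shows that Python's `html.unescape`, which follows the HTML5 parsing rules, still accepts a legacy entity like &amp without its semicolon:

```python
import html


def numeric_refs(ch: str) -> tuple[str, str]:
    """Return the decimal and hexadecimal numeric references for one character."""
    cp = ord(ch)
    return f"&#{cp};", f"&#x{cp:X};"


print(numeric_refs("😀"))  # ('&#128512;', '&#x1F600;')

# Legacy named entities such as &amp are decoded even without the semicolon.
print(html.unescape("fish &amp chips"))  # fish & chips
```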
Encoding matters for XSS protection: any user-supplied string interpolated into HTML markup should escape at minimum &, <, and >. When interpolating into attribute values, also escape " and '. Modern frameworks (React, Vue, Angular) escape interpolated text automatically; bugs creep in with bare string concatenation into innerHTML.
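A sketch of the safe pattern using Python's `html.escape` with `quote=True`, which escapes &, <, >, " and ' (the apostrophe becomes &#x27;, which is equally valid as &#39;). The `render_greeting` template is hypothetical:

```python
import html


def render_greeting(name: str) -> str:
    """Hypothetical template: put untrusted input into text and a quoted attribute."""
    safe = html.escape(name, quote=True)  # escapes & < > " '
    return f'<p title="{safe}">Hello, {safe}!</p>'


payload = '"><script>alert(1)</script>'
print(render_greeting(payload))
```

Because both the text node and the quoted attribute receive the escaped value, the payload renders as literal text instead of closing the attribute and opening a script tag.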
Common use cases
- Reading scraped HTML: convert &amp; back to & for clean text extraction.
- Debugging XSS: see exactly what an attacker-supplied string would render as.
- JSON cleanup: undo HTML entities that some APIs double-encode inside JSON strings.
- Email templates: encode user-supplied names safely for HTML email bodies.
- CSV exports: decode entities back to plain text before loading into Excel.
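For the JSON cleanup case, one approach is to decode repeatedly until the string stops changing. This is a sketch with a hypothetical `unescape_fully` helper built on Python's `html.unescape`; it also collapses text that legitimately contained a literal &amp;, so it only suits data you know was over-encoded.

```python
import html
import json


def unescape_fully(text: str, max_rounds: int = 5) -> str:
    """Hypothetical helper: decode entities until the text stops changing,
    with a round limit as a safety net."""
    for _ in range(max_rounds):
        decoded = html.unescape(text)
        if decoded == text:
            break
        text = decoded
    return text


record = json.loads('{"title": "Tom &amp;amp; Jerry"}')
print(unescape_fully(record["title"]))  # Tom & Jerry
```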
Frequently asked questions
What if a named entity isn't recognized?
Why does "minimal" only escape 5 characters?
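Because those five are the only characters with special meaning in HTML markup and attribute values. Everything else (é, ©, emoji) renders correctly as raw UTF-8 in modern browsers, so encoding it is optional and mostly historical.
Decoding speed?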
Does it handle CDATA / numeric references with leading zeros?
Numeric references with leading zeros decode normally: &#0039; decodes to an apostrophe, same as &#39;.
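For comparison, Python's `html.unescape` (a reference decoder here, not this tool's code) treats padded and unpadded numeric references the same way:

```python
import html

# Leading zeros in decimal and hex references do not change the code point.
for ref in ["&#39;", "&#0039;", "&#x27;", "&#x0027;"]:
    assert html.unescape(ref) == "'"
print("all four decode to an apostrophe")
```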