Regex Unicode Property Escapes Reference

A practical reference for Unicode property escapes in regular expressions — \p{L} for any letter, \p{N} for any number, script properties like \p{Script=Greek}, and many more. Click any entry to load it into the live tester, type your own text, and watch the matches highlight instantly. Everything runs in your browser.

How to use the Regex Unicode Property Escapes Reference

Type a regular expression in the Pattern box using one or more Unicode property escapes, then type or paste into the Test text box. Every match is highlighted live in the panel below, with a running count. The u flag is applied automatically because property escapes only work in Unicode mode; matching is global so you see all matches, not just the first. If the pattern is invalid the error is shown and the highlight clears.
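The behaviour described above can be approximated in a few lines. This is only a sketch of the idea, not the tool's actual source: the helper name findMatches and the shape of its return value are assumptions made for illustration.

```javascript
// Hypothetical helper: compile a user-supplied pattern with the global and
// Unicode flags, then collect every match with its position.
function findMatches(pattern, text) {
  let regex;
  try {
    regex = new RegExp(pattern, "gu");
  } catch (err) {
    // An invalid pattern throws a SyntaxError: report it, highlight nothing.
    return { error: err.message, matches: [] };
  }
  const matches = [];
  for (const m of text.matchAll(regex)) {
    // Ignore zero-length matches so there is always something to highlight.
    if (m[0].length > 0) matches.push({ index: m.index, text: m[0] });
  }
  return { error: null, matches };
}

const result = findMatches("\\p{Lu}", "Ada Lovelace, Αθήνα");
console.log(result.matches.length); // 3 (A, L, and the Greek capital Α)
console.log(findMatches("(", "abc").error !== null); // true
```

The running count is simply the length of the matches array, and the error branch is what lets a tester clear the highlight when the pattern does not compile.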

Below the tester is a reference table of the most useful properties — general categories such as letters, numbers, and punctuation; the case sub-categories; whitespace and marks; and a few script examples. Click any row to drop that escape straight into the pattern box so you can see what it matches in your sample. Combine them as you would any regex: \p{Lu}\p{Ll}+ finds capitalised words, [\p{L}\p{N}]+ matches "word" characters across every language, and \P{ASCII} (capital P negates) finds anything outside plain ASCII.
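The three combinations mentioned above can be tried directly in JavaScript. The sample sentence is made up for illustration.

```javascript
const sample = "Anna met Ολγα in Москва on 12 May.";

// An uppercase letter followed by one or more lowercase letters.
const capitalised = sample.match(/\p{Lu}\p{Ll}+/gu);
console.log(capitalised); // ["Anna", "Ολγα", "Москва", "May"]

// Runs of letters and numbers in any script.
const words = sample.match(/[\p{L}\p{N}]+/gu);
console.log(words.length); // 8

// Every character outside plain ASCII (capital P negates the property).
const nonAscii = sample.match(/\P{ASCII}/gu);
console.log(nonAscii.length); // 10
```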

It all runs in your browser, so matching is instant, works offline, and nothing you paste is uploaded anywhere.

What Unicode property escapes are

A Unicode property escape is a regular-expression construct that matches characters by what they are according to the Unicode standard, rather than by listing them. Written \p{Property}, or \P{Property} for its negation, it lets you say "any letter", "any decimal digit", or "any character in the Greek script" without enumerating thousands of code points. Every character in Unicode is tagged with a set of properties: a general category (is it a letter, a number, punctuation, a symbol?), the script it belongs to, whether it is whitespace, and many more. Property escapes simply ask the regex engine to consult those tags, which makes patterns both shorter and far more accurate for international text.

The contrast with the old ASCII shorthands is the whole point. For decades \d meant "0 through 9", \w meant "A-Z, a-z, 0-9 and underscore", and [a-zA-Z] was how everyone matched "a letter". Those work for English and break everywhere else: they miss accented letters, Greek, Cyrillic, Arabic, the digits used in many scripts, and the entire CJK range. \p{L} matches a letter in any writing system, \p{N} matches a number character anywhere, and [\p{L}\p{N}_] is the honest, Unicode-aware version of \w (add \p{M} if your text may contain combining marks). The general categories come as one- or two-letter codes: L for letters, with sub-categories such as Lu (uppercase), Ll (lowercase), and Lt (titlecase); N for numbers; P for punctuation; S for symbols; M for combining marks; and Z for separators. Using the two-letter form narrows the match, while the single letter matches the whole family.
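A side-by-side comparison makes the difference concrete. The sample text is invented; note that in JavaScript \d stays ASCII-only even when the u flag is set, so \p{Nd} is used for digits in other scripts.

```javascript
const text = "Straße φως Москва 東京 ٤٢";

// ASCII-only classes split or skip anything outside A-Z and 0-9.
const asciiLetters = text.match(/[a-zA-Z]+/g);
console.log(asciiLetters); // ["Stra", "e"]
const asciiDigits = text.match(/\d+/g);
console.log(asciiDigits); // null (the Arabic-Indic digits are not 0-9)

// Property escapes follow the Unicode categories instead.
const unicodeLetters = text.match(/\p{L}+/gu);
console.log(unicodeLetters); // ["Straße", "φως", "Москва", "東京"]
const unicodeDigits = text.match(/\p{Nd}+/gu);
console.log(unicodeDigits); // ["٤٢"]

// A Unicode-aware stand-in for \w+.
const wordLike = text.match(/[\p{L}\p{N}_]+/gu);
console.log(wordLike.length); // 5
```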

Beyond categories you can match by script, using \p{Script=Greek} or its short form \p{sc=Greek} (and likewise Latin, Han, Cyrillic, Arabic, Hiragana and the rest), which is invaluable for detecting or splitting mixed-language content. There are also binary properties like \p{White_Space}, \p{Alphabetic}, and \p{Emoji}. A few practical notes make these reliable in practice. In JavaScript the regex must carry the u (or v) flag; without it \p{L} is not an error but silently matches the literal text p{L}, which is why this tool always applies the flag. Other engines such as PCRE, Java, .NET, and Python's third-party regex module (the standard-library re module does not support \p) offer property escapes with their own spelling differences. Capital \P{...} negates a property, and properties combine freely inside character classes. Once you reach for them, patterns that used to be long, fragile, and English-only become short, readable, and genuinely global.
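Binary properties are used the same way as categories. The sketch below uses made-up sample strings and shows one caveat worth knowing: \p{Emoji} also covers ASCII digits, "#" and "*", so \p{Emoji_Presentation} is shown as one possible stricter alternative, depending on what you need to match.

```javascript
// \p{White_Space} covers more than the ASCII space, for example
// U+00A0 (no-break space) and U+2003 (em space).
const spaced = "a\u00A0b\u2003c d";
const collapsed = spaced.replace(/\p{White_Space}+/gu, " ");
console.log(collapsed); // "a b c d"

// \p{Alphabetic} is true for letters in any script.
console.log(/^\p{Alphabetic}+$/u.test("Москва")); // true

// Digits carry the Emoji property because of keycap sequences.
console.log(/\p{Emoji}/u.test("7"));               // true
console.log(/\p{Emoji_Presentation}/u.test("7"));  // false
console.log(/\p{Emoji_Presentation}/u.test("🎉")); // true
```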

Common use cases

International validation. Match names, words, or identifiers in any language with \p{L} instead of [a-zA-Z].
Cleaning text. Strip everything outside a category, e.g. remove non-letters with \P{L}.
Script detection. Find or separate Greek, Cyrillic, Arabic, or CJK runs in mixed content.
Learning. See exactly which characters a property matches by highlighting them in your own text.
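The validation and cleaning cases above might look like this in practice. Both function names are hypothetical, and the name rule (letters with optional combining marks, separated by a single space, apostrophe, or hyphen) is an assumption chosen for the example rather than a recommended policy.

```javascript
// Hypothetical validator: one or more words made of letters (plus any
// combining marks), separated by a single space, apostrophe, or hyphen.
function isPlausibleName(input) {
  return /^\p{L}[\p{L}\p{M}]*(?:[ '-]\p{L}[\p{L}\p{M}]*)*$/u.test(input);
}

// Hypothetical cleaner: remove every character that is not a letter.
function stripNonLetters(input) {
  return input.replace(/\P{L}+/gu, "");
}

console.log(isPlausibleName("Мария Склодовская-Кюри")); // true
console.log(isPlausibleName("O'Brien"));                // true
console.log(isPlausibleName("R2-D2"));                  // false
console.log(stripNonLetters("Straße 42, Москва!"));     // "StraßeМосква"
```

Because \P{L} also removes combining marks, text stored in decomposed form may lose its accents; replacing [^\p{L}\p{M}]+ instead is one way to keep them.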

Frequently asked questions

Why is the u flag forced on?

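In JavaScript, \p{...} property escapes are only recognised when the regular expression uses the u (Unicode) or v flag. Without either flag the pattern still compiles, but \p is treated as a plain escaped p, so \p{L} matches the literal text p{L} rather than a letter. The tester always adds u so the escapes behave as intended.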
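A quick check of how the flag changes the meaning of the same pattern in JavaScript engines that follow the web-compatibility grammar (browsers and Node.js do). The variable names are illustrative only.

```javascript
// Without the u flag, \p is just an escaped "p" and {L} is literal text.
const withoutFlag = /\p{L}/;
console.log(withoutFlag.test("ж"));    // false
console.log(withoutFlag.test("p{L}")); // true

// With the u flag, \p{L} is a real property escape.
const withFlag = /\p{L}/u;
console.log(withFlag.test("ж"));       // true

// In Unicode mode an unknown property name is rejected at compile time.
let compileError = null;
try {
  new RegExp("\\p{NotAProperty}", "u");
} catch (err) {
  compileError = err;
}
console.log(compileError instanceof SyntaxError); // true
```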

What is the difference between \p and \P?

Lowercase \p{Prop} matches characters that have the property; uppercase \P{Prop} matches characters that do not. So \p{L} is "any letter" and \P{L} is "anything that is not a letter".
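As a small illustration, every character falls on exactly one side of the \p / \P split. The sample string is invented.

```javascript
// Spread the string into code points, then sort each one by property.
const chars = [..."Год 2024!"];
const letters = chars.filter((c) => /^\p{L}$/u.test(c));
const nonLetters = chars.filter((c) => /^\P{L}$/u.test(c));

console.log(letters.join(""));  // "Год"
console.log(nonLetters.length); // 6 (the space, four digits, and "!")

// \P{L} is equivalent to the negated class [^\p{L}].
const viaNegatedClass = chars.filter((c) => /^[^\p{L}]$/u.test(c));
console.log(viaNegatedClass.length === nonLetters.length); // true
```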

How do I match a specific script?

Use \p{Script=Name} or the short form \p{sc=Name}, for example \p{Script=Greek} or \p{sc=Han}. Script names follow Unicode, such as Latin, Cyrillic, Arabic, Hiragana, and Katakana.
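For example, the long and short forms can be used interchangeably to pull script-specific runs out of mixed text. The sample string and the byScript object are made up for illustration.

```javascript
const mixed = "Hello Привет αβγ こんにちは";

// Each pattern collects consecutive characters of a single script.
const byScript = {
  Latin: mixed.match(/\p{Script=Latin}+/gu),
  Cyrillic: mixed.match(/\p{sc=Cyrillic}+/gu),
  Greek: mixed.match(/\p{Script=Greek}+/gu),
  Hiragana: mixed.match(/\p{sc=Hiragana}+/gu),
};

console.log(byScript.Latin);    // ["Hello"]
console.log(byScript.Cyrillic); // ["Привет"]
console.log(byScript.Greek);    // ["αβγ"]
console.log(byScript.Hiragana); // ["こんにちは"]

// A simple detector: does the text contain any Han characters?
console.log(/\p{sc=Han}/u.test(mixed)); // false
console.log(/\p{sc=Han}/u.test("東京")); // true
```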

Do these work outside JavaScript?

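Yes, most modern engines support property escapes, including PCRE, Java, .NET, Ruby, and Python via the third-party regex module (not the built-in re module), though flag names and some property spellings differ. The general categories shown here are the most portable; script and binary property support and syntax vary more, so check your engine's documentation. PCRE, for example, writes scripts as \p{Greek}, and Java accepts \p{IsGreek}.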

Embed this tool on your site

Free to embed, no attribution required (but appreciated). Paste this where you want the tool to appear:

<iframe src="https://codeswap.net/regex/regex-unicode-property-reference/?embed=1" width="100%" height="520" loading="lazy" style="border:1px solid #e5e7eb;border-radius:8px" title="Regex Unicode Property Escapes Reference"></iframe>
<p style="font-size:13px">Tool by <a href="https://codeswap.net/regex/regex-unicode-property-reference/">Regex Unicode Property Escapes Reference — Codeswap</a></p>