Word List to Regex

Have a list of words and need one regex that matches any of them? Paste the list and this tool builds it: every term is safely escaped, optionally wrapped in word boundaries, and — if you like — compressed by merging shared prefixes into a compact pattern. It even confirms that each term still matches the result. Everything runs locally.


How to use Word List to Regex

Paste your terms — one per line or separated by commas. The tool trims whitespace, drops blanks, removes duplicates, and builds a single pattern that matches any term. Special regex characters in your words (dots, parentheses, plus signs, and so on) are escaped automatically, so a term like node.js matches literally rather than as a pattern.
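The clean-up steps described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's own code: the function names parse_terms and plain_alternation are made up for this example, and Python's re.escape stands in for whatever escaping routine the tool uses.

```python
import re

def parse_terms(raw: str) -> list[str]:
    """Split on newlines or commas, trim whitespace, drop blanks,
    and remove duplicates while keeping first-seen order."""
    seen: dict[str, None] = {}
    for piece in re.split(r"[,\n]", raw):
        term = piece.strip()
        if term:
            seen.setdefault(term, None)
    return list(seen)

def plain_alternation(terms: list[str]) -> str:
    """Escape every term so it matches literally, then join with |."""
    return "|".join(re.escape(t) for t in terms)

# Hypothetical input mixing commas, blank lines, stray spaces, and a duplicate.
terms = parse_terms("node.js, react\n\n  vue ,react")
pattern = plain_alternation(terms)
print(terms)    # ['node.js', 'react', 'vue']
print(pattern)  # node\.js|react|vue
```

Because the dot is escaped, the pattern accepts node.js but not a look-alike such as nodeXjs.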

Optimize merges words that share a prefix into a compact form: cat, car, can become ca[nrt], and do, dog, done become do(?:ne|g)?. The result is shorter and usually cheaper for the engine to evaluate. Turn it off to get a plain a|b|c alternation with the longest terms placed first. Add word boundaries wraps the pattern in \b so it only matches whole words. Capturing group uses ( ) instead of the default non-capturing (?: ). Case-insensitive prefixes the output with a note about the case-insensitive flag and runs the match check with that flag applied.

The line under the output confirms that every input term still matches the generated regex, so you can trust it before pasting it into your code. It all runs in your browser.
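If you want to repeat that check in your own code, a minimal sketch could look like the following. The helper name unmatched_terms is hypothetical, and the tool's own check may work differently; this version simply requires each term to match the pattern in full.

```python
import re

def unmatched_terms(pattern: str, terms: list[str], ignore_case: bool = False) -> list[str]:
    """Return the terms that the pattern fails to match in full."""
    flags = re.IGNORECASE if ignore_case else 0
    compiled = re.compile(pattern, flags)
    return [t for t in terms if compiled.fullmatch(t) is None]

terms = ["do", "dog", "done"]
good = unmatched_terms(r"do(?:ne|g)?", terms)  # every term is covered
bad = unmatched_terms(r"do(?:ne|g)", terms)    # the trailing ? was dropped
print(good)  # []
print(bad)   # ['do']
```

An empty list means the pattern is safe to use for the given terms; anything else names the terms that were lost.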

From a list to a correct alternation

The obvious way to match a set of words is alternation: join them with the | operator to get foo|bar|baz. Two things make this harder than it looks, and both are easy to get wrong by hand. The first is escaping. Regex treats many punctuation characters as operators, so a literal term containing a dot, plus sign, question mark, or parentheses must be escaped or it will match the wrong thing. Used as a pattern, c++ is a syntax error in JavaScript (some other engines read the second + as a possessive quantifier instead), and example.com would also match exampleXcom. A generator escapes every metacharacter so each term matches exactly the literal text you typed.
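The escaping problem is easy to reproduce. The short Python sketch below uses the standard library's re.escape as a stand-in for the tool's escaping step, which is an assumption made for illustration only.

```python
import re

unescaped = re.compile("example.com")             # the dot matches any character
escaped = re.compile(re.escape("example.com"))    # the dot matches only a dot

wrong_hit = unescaped.fullmatch("exampleXcom") is not None
still_wrong = escaped.fullmatch("exampleXcom") is not None
literal_hit = escaped.fullmatch("example.com") is not None

print(wrong_hit)         # True: the unescaped pattern accepts a look-alike
print(still_wrong)       # False
print(literal_hit)       # True
print(re.escape("c++"))  # c\+\+
```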

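The second is ordering. In backtracking engines (JavaScript, Python, PCRE, and most others), alternation tries branches left to right and takes the first one that lets the match succeed. If you write for|foreach against the text "foreach", the engine matches for first and leaves "each" behind. Putting longer alternatives first (foreach|for) avoids this surprise, which is why the plain output sorts terms from longest to shortest. With word boundaries on both sides, the engine backtracks into the longer branch when the terms consist only of word characters, so the problem largely disappears. Ordering still matters when a shorter term ends at a word boundary inside a longer one, as with new york and new york city.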
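The effect of branch order can be seen directly in Python, whose re module is a backtracking engine of this kind. The helper longest_first is a hypothetical name for the sort described above, not the tool's actual function.

```python
import re

text = "foreach"
short_first = re.search(r"for|foreach", text).group()
long_first = re.search(r"foreach|for", text).group()
print(short_first)  # for
print(long_first)   # foreach

def longest_first(terms: list[str]) -> list[str]:
    """Sort longest to shortest; terms of equal length keep their input order."""
    return sorted(terms, key=len, reverse=True)

ordered = longest_first(["for", "foreach", "in"])
print(ordered)  # ['foreach', 'for', 'in']
```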

The optimization option goes further by building a trie (a tree of shared character prefixes) and emitting a regex that factors out common beginnings. Instead of cat|car|can you get ca[nrt]; instead of do|dog|done you get do(?:ne|g)?. The set of matched strings is identical, but the pattern is shorter and the engine usually does less backtracking, because a shared prefix is tested once rather than once per branch. This is the same trick used by tools that compile large keyword lists (spam filters, syntax highlighters, tokenizers) into a single fast pattern. Whichever form you choose, wrapping the result in \b word boundaries keeps it matching whole words, so a list containing "cat" does not also fire inside "category."
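To make the trie idea concrete, here is a small Python sketch under stated assumptions. It is one way to factor out shared prefixes, not necessarily how the tool does it, and the names build_trie, trie_to_regex, and END are invented for this example. It reproduces the two patterns quoted above.

```python
import re

END = ""  # marker key meaning "a term ends here"; no real character is empty

def build_trie(terms):
    """Nest one dict per character so shared prefixes share nodes."""
    root = {}
    for term in terms:
        node = root
        for ch in term:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root

def trie_to_regex(node):
    """Emit a pattern for everything below this node."""
    ends_here = END in node
    singles, branches = [], []
    for ch in sorted(k for k in node if k != END):
        tail = trie_to_regex(node[ch])
        if tail:
            branches.append(re.escape(ch) + tail)  # character plus a longer tail
        else:
            singles.append(re.escape(ch))          # final character of a term
    alternatives = list(branches)
    if len(singles) == 1:
        alternatives.append(singles[0])
    elif singles:
        alternatives.append("[" + "".join(singles) + "]")  # merge into a class
    if not alternatives:
        return ""
    if len(alternatives) == 1 and (not branches or not ends_here):
        body = alternatives[0]  # no group needed: atomic, or not optional
    else:
        body = "(?:" + "|".join(alternatives) + ")"
    # A term that ends here makes everything after it optional.
    return body + "?" if ends_here else body

print(trie_to_regex(build_trie(["cat", "car", "can"])))  # ca[nrt]
print(trie_to_regex(build_trie(["do", "dog", "done"])))  # do(?:ne|g)?
```

Alternatives produced at the same node always start with different characters, so their order inside the group does not change what matches, and the greedy ? prefers the longer term.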

Common use cases

Keyword filters. Build one pattern that matches any word in a blocklist or allowlist.
Find & replace. Generate a regex to search many terms at once in an editor or codebase.
Tokenizers and highlighters. Compile a list of keywords into a compact, fast alternation.
Validation. Match an input against a fixed set of allowed values with a single expression.

Frequently asked questions

Does it escape special characters in my words?

Yes. Every regex metacharacter — dots, plus signs, parentheses, brackets, and so on — is escaped, so each term matches literally rather than being interpreted as a pattern.

What does the Optimize option do?

It merges terms that share a prefix into a compact trie-based pattern, like turning cat|car|can into ca[nrt]. The set of matched words is identical, but the regex is shorter and usually faster.

Why are longer words placed first without optimization?

Alternation matches left to right and stops at the first hit. Ordering longer terms first prevents a shorter prefix (like "for") from matching before a longer term ("foreach").

Will it match whole words only?

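If you enable word boundaries, the pattern is wrapped in \b so it matches whole words. Without that, terms can match inside larger words. Because \b marks a transition between a word character (letter, digit, or underscore) and anything else, a term that begins or ends with punctuation, such as c++, may fail to match at that edge when boundaries are on.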
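A quick Python check shows both the benefit of word boundaries and their main limitation. The patterns below are written by hand for illustration; how the tool itself treats terms that end in punctuation is not stated here.

```python
import re

unbounded = re.compile(r"(?:cat|dog)")
bounded = re.compile(r"\b(?:cat|dog)\b")

inside_word = unbounded.search("category") is not None
inside_word_bounded = bounded.search("category") is not None
whole_word = bounded.search("a cat sat").group()

print(inside_word)          # True: "cat" fires inside "category"
print(inside_word_bounded)  # False
print(whole_word)           # cat

# \b needs a word character on exactly one side. After "c++" there is a
# plus sign followed by a space, so the trailing \b cannot be satisfied.
punctuated = re.search(r"\bc\+\+\b", "I like c++ a lot")
print(punctuated)           # None
```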

Embed this tool on your site

Free to embed, no attribution required (but appreciated). Paste this where you want the tool to appear:

<iframe src="https://codeswap.net/regex/word-list-to-regex/?embed=1" width="100%" height="520" loading="lazy" style="border:1px solid #e5e7eb;border-radius:8px" title="Word List to Regex"></iframe>
<p style="font-size:13px">Tool by <a href="https://codeswap.net/regex/word-list-to-regex/">Word List to Regex — Codeswap</a></p>