Word List to Regex
Have a list of words and need one regex that matches any of them? Paste the list and this tool builds it: every term is safely escaped, optionally wrapped in word boundaries, and, if you like, compressed by merging shared prefixes into a compact pattern. It even confirms that each term still matches the result. Everything runs locally.
How to use the Word List to Regex
Paste your terms, one per line or separated by commas. The tool trims whitespace, drops blanks, removes duplicates, and builds a single pattern that matches any term. Special regex characters in your words (dots, parentheses, plus signs, and so on) are escaped automatically, so a term like node.js matches only the literal text node.js instead of letting the dot stand for any character.
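The clean-up step described above can be sketched as follows. This is a minimal illustration, not the tool's own code: it assumes newlines and commas are interchangeable separators, that duplicate detection is case-sensitive, and that the first occurrence of a term keeps its position. The function name parse_terms is hypothetical.

```python
import re

def parse_terms(text: str) -> list[str]:
    """Hypothetical helper: split on newlines or commas, trim, drop blanks, dedupe."""
    seen = set()
    terms = []
    for raw in re.split(r"[\n,]", text):
        term = raw.strip()  # also removes a trailing \r from Windows line endings
        if term and term not in seen:  # skip blanks and repeats, keep first occurrence
            seen.add(term)
            terms.append(term)
    return terms

print(parse_terms("node.js, c++\n  cat ,, cat\n\nfoo(bar)"))
# ['node.js', 'c++', 'cat', 'foo(bar)']
```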
Optimize merges words that share a prefix into a compact form: cat, car, can become ca[nrt], and do, dog, done become do(?:ne|g)?. The result is shorter and can be faster for the engine to match. Turn it off to get a plain a|b|c alternation with the longest terms placed first. Add word boundaries places \b at both ends of the pattern so it only matches whole words. Capturing group uses ( ) instead of the default non-capturing (?: ), and Case-insensitive notes the i flag alongside the pattern and checks the terms case-insensitively.
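A rough sketch of how the grouping, word-boundary, and case-insensitive options could combine around a plain alternation. The helper wrap_pattern is hypothetical, and Python's re.IGNORECASE stands in for the i flag you would use in JavaScript; the tool's exact output format is an assumption.

```python
import re

def wrap_pattern(core, word_boundaries=False, capturing=False):
    """Hypothetical helper: group an alternation, optionally between word boundaries."""
    group = f"({core})" if capturing else f"(?:{core})"
    return rf"\b{group}\b" if word_boundaries else group

terms = ["cat", "car", "can"]
# Plain mode: escape each term, longest first (alphabetical among equal lengths).
core = "|".join(re.escape(t) for t in sorted(terms, key=lambda t: (-len(t), t)))
pattern = wrap_pattern(core, word_boundaries=True)
regex = re.compile(pattern, re.IGNORECASE)  # Python's equivalent of the i flag

print(pattern)                              # \b(?:can|car|cat)\b
print(regex.findall("Cat, category, CAR"))  # ['Cat', 'CAR']
```

Because the trailing \b rejects "cat" inside "category", only the two whole words come back.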
The line under the output confirms that every input term still matches the generated regex, so you can check that the pattern covers your whole list before pasting it into your code. It all runs in your browser.
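A check like this can be written in a few lines. The sketch below assumes each term must match the pattern in full (the tool may use a looser test), and the name unmatched_terms is hypothetical. It also shows why such a check is worth running: word boundaries silently break terms that end in a symbol.

```python
import re

def unmatched_terms(pattern, terms, ignore_case=False):
    """Hypothetical helper: return terms the pattern does not match in full (empty = all pass)."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    return [t for t in terms if regex.fullmatch(t) is None]

# node.js passes, but c++ fails: \b needs a word character on exactly one side,
# and after the final "+" there is only the end of the string.
print(unmatched_terms(r"\b(?:c\+\+|node\.js)\b", ["c++", "node.js"]))  # ['c++']
```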
From a list to a correct alternation
The obvious way to match a set of words is alternation: join them with the | operator to get foo|bar|baz. Two things make this harder than it looks, and both are easy to get wrong by hand. The first is escaping. Regex treats many punctuation characters as operators, so a literal term containing a dot, plus sign, question mark, or parentheses must be escaped or it will match the wrong thing. Unescaped, c++ is a syntax error in JavaScript (and in engines with possessive quantifiers it silently means something else), while example.com also matches exampleXcom because the dot matches any character. A generator escapes every metacharacter so each term matches exactly the literal text you typed.
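The escaping problem is easy to reproduce. This sketch uses Python's re.escape (Python 3.7 or later, which escapes only regex-special characters) as a stand-in for whatever escaping routine the tool uses internally.

```python
import re

terms = ["node.js", "c++", "example.com"]

# Unescaped, the dot is a wildcard, so "example.com" also matches "exampleXcom".
unescaped_hit = re.search("example.com", "exampleXcom") is not None

# re.escape backslash-escapes metacharacters, so each term is matched literally.
escaped = [re.escape(t) for t in terms]
literal = re.compile("|".join(escaped))

print(unescaped_hit)                                # True
print("|".join(escaped))                            # node\.js|c\+\+|example\.com
print(literal.search("exampleXcom"))                # None
print(literal.search("I write c++ daily").group())  # c++
```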
The second is ordering. In backtracking engines such as those in JavaScript, Python, and PCRE, alternation tries branches left to right and stops at the first one that lets the overall match succeed. If you write for|foreach against the text "foreach", the engine matches for first and leaves "each" behind. Putting longer alternatives first (foreach|for) avoids this surprise, which is why the plain output sorts terms from longest to shortest. Word boundaries reduce the problem, because a trailing \b rejects for when more letters follow and forces the engine to try foreach. Ordering still matters, though, whenever a shorter term ends at a word boundary inside a longer one, as with new and new york.
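The ordering effect, and the longest-first fix, can be shown directly. Python's re is a backtracking engine, so it behaves like JavaScript here. The helper plain_alternation is hypothetical, and breaking ties alphabetically is an assumption made only to keep the output deterministic.

```python
import re

# Leftmost branch wins: with "for" first, the longer keyword is never reached.
print(re.match("for|foreach", "foreach").group())  # for
print(re.match("foreach|for", "foreach").group())  # foreach

# Word boundaries rescue for/foreach, but not new/"new york" (a space follows "new").
print(re.search(r"\b(?:for|foreach)\b", "foreach").group())    # foreach
print(re.search(r"\b(?:new|new york)\b", "new york").group())  # new

def plain_alternation(terms):
    """Hypothetical helper: escape each term and join them longest-first."""
    ordered = sorted(terms, key=lambda t: (-len(t), t))
    return "|".join(re.escape(t) for t in ordered)

print(plain_alternation(["for", "foreach", "do"]))  # foreach|for|do
```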
The optimization option goes further by building a trie (a tree of shared character prefixes) and emitting a regex that factors out common beginnings. Instead of cat|car|can you get ca[nrt]; instead of do|dog|done you get do(?:ne|g)?. Both forms match exactly the same words, but the optimized pattern is shorter, and the engine can match a shared prefix once instead of re-reading it for every alternative that starts with it. This is a common technique for compiling large keyword lists, such as those in spam filters, syntax highlighters, and tokenizers, into a single pattern. Whichever form you choose, wrapping the result in \b word boundaries keeps it matching whole words, so a list containing "cat" does not also fire inside "category". Terms that begin or end with a symbol, such as c++, are the exception: \b only marks a transition between a word character and a non-word character, so it does not behave as a whole-word guard there.
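The prefix-trie idea can be sketched in a few dozen lines. This is an illustration of the general technique, not the tool's actual implementation: the names build_trie, emit, and optimized_pattern are hypothetical, a sentinel object marks where a term ends, and unlike more thorough optimizers it does not merge shared suffixes or pull single characters out of mixed groups.

```python
import re

END = object()  # sentinel key: "a term ends at this node"

def build_trie(terms):
    """Hypothetical helper: nested dicts keyed by character."""
    root = {}
    for term in terms:
        node = root
        for ch in term:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def emit(node):
    """Return a regex for all suffixes stored below `node`."""
    alts = []  # (regex, is_single_char)
    for ch in sorted(k for k in node if k is not END):
        suffix = emit(node[ch])
        alts.append((re.escape(ch) + suffix, suffix == ""))
    if not alts:
        return ""
    if len(alts) > 1 and all(single for _, single in alts):
        body, atomic = "[" + "".join(a for a, _ in alts) + "]", True  # e.g. [nrt]
    elif len(alts) == 1:
        body, atomic = alts[0]  # one char is atomic; a longer sequence is not
    else:
        # Branches start with different characters, so their order cannot change
        # what matches; longest-first just keeps the output stable.
        ordered = sorted((a for a, _ in alts), key=lambda a: (-len(a), a))
        body, atomic = "(?:" + "|".join(ordered) + ")", True
    if END in node:  # a term ends here, so everything below is optional
        return body + "?" if atomic else "(?:" + body + ")?"
    return body

def optimized_pattern(terms):
    return emit(build_trie(terms))

print(optimized_pattern(["cat", "car", "can"]))  # ca[nrt]
print(optimized_pattern(["do", "dog", "done"]))  # do(?:ne|g)?
```

Because the optional suffix group is greedy, do(?:ne|g)? tries "ne" and "g" before settling for "do", so the trie form also sidesteps the ordering problem without an explicit sort.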
Common use cases
- Keyword filters. Build one pattern that matches any word in a blocklist or allowlist.
- Find & replace. Generate a regex to search many terms at once in an editor or codebase.
- Tokenizers and highlighters. Compile a list of keywords into a compact, fast alternation.
- Validation. Match an input against a fixed set of allowed values with a single expression.
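The validation and keyword-filter cases above differ mainly in anchoring. A minimal sketch, with a hypothetical allowlist and blocklist: validation anchors both ends (here via Python's fullmatch; in JavaScript you would write ^(?:...)$), while a filter searches anywhere but uses \b so that only whole words count.

```python
import re

ALLOWED = ["draft", "published", "archived"]  # hypothetical allowlist
allowed_re = re.compile("(?:" + "|".join(re.escape(v) for v in ALLOWED) + ")")

def is_allowed(value):
    # fullmatch anchors both ends, so "drafts" or "unpublished" are rejected.
    return allowed_re.fullmatch(value) is not None

BLOCKED = ["spam", "scam"]  # hypothetical blocklist
blocked_re = re.compile(r"\b(?:" + "|".join(map(re.escape, BLOCKED)) + r")\b", re.IGNORECASE)

print(is_allowed("draft"), is_allowed("drafts"))       # True False
print(blocked_re.search("This is SPAM.") is not None)  # True
print(blocked_re.search("spammer") is not None)        # False
```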