Regex Match Extractor

When you need every email, URL, or ID buried in a wall of text, a regex is the fastest tool, but reading the results off a highlighted match view is tedious. This extractor runs your pattern and gives you just the matches, one per line, ready to copy. Choose whole matches or a specific capture group, with options to sort and remove duplicates.

How to use the Regex Match Extractor

Enter a pattern, paste your text, and the matches appear on the right immediately, one per line, with a live count. Toggle ignore case and multiline as needed. If your pattern has capture groups, pick which one to output — Whole match returns the full hit, while Group 1/2/3 returns just that captured part, which is handy for pulling, say, only the domain out of an email or the ID out of a longer string.
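The behavior described above can be approximated in a few lines. This is a minimal sketch, not the tool's actual source: the function name extractMatches, the ExtractOptions fields, and the sample email pattern are all assumptions made for illustration.

```typescript
// Hypothetical option names; the tool's real internals are not shown in this article.
interface ExtractOptions {
  ignoreCase?: boolean;
  multiline?: boolean;
  group?: number; // 0 = whole match, 1..n = capture group
}

function extractMatches(pattern: string, text: string, options: ExtractOptions = {}): string[] {
  const { ignoreCase = false, multiline = false, group = 0 } = options;
  // "g" is always set so every match is collected, not just the first.
  const flags = "g" + (ignoreCase ? "i" : "") + (multiline ? "m" : "");
  const regex = new RegExp(pattern, flags);
  const results: string[] = [];
  for (const match of Array.from(text.matchAll(regex))) {
    const value: string | undefined = match[group];
    // A group that did not take part in the match is undefined; skip it.
    if (value !== undefined) results.push(value);
  }
  return results;
}

const sample = "Contact ana@example.com or bob@test.org for access.";
// Deliberately loose email pattern; group 1 captures the domain.
const emailPattern = "[\\w.+-]+@([\\w-]+\\.[\\w.]+)";

const wholeMatches = extractMatches(emailPattern, sample);
const domains = extractMatches(emailPattern, sample, { group: 1 });

console.log(wholeMatches.join("\n"));
console.log(domains.join("\n"));
```

With group left at 0 the output is the two full addresses; with group 1 it is just the two domains.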

Unique removes duplicate results (on by default, since extraction usually wants a distinct set), and Sort orders them alphabetically. The default example extracts email addresses and, with Unique on, collapses the repeated address into a single entry. Use Copy list to grab everything at once.
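Both post-processing steps are small. The sketch below shows one plausible way to implement them; the helper names uniqueInOrder and sortAlphabetically are hypothetical, and the comparison the tool actually uses for sorting is not documented here, so localeCompare is an assumption.

```typescript
function uniqueInOrder(items: string[]): string[] {
  // A Set remembers insertion order, so the first occurrence of each value is kept.
  return Array.from(new Set(items));
}

function sortAlphabetically(items: string[]): string[] {
  // Copy first so the caller's array is not mutated.
  // localeCompare is one reasonable ordering; the tool's exact comparison is an assumption.
  return [...items].sort((a, b) => a.localeCompare(b));
}

const rawMatches = ["bob@test.org", "ana@example.com", "bob@test.org"];
const distinct = uniqueInOrder(rawMatches);
const sorted = sortAlphabetically(distinct);

console.log(distinct.join("\n"));
console.log(sorted.join("\n"));
```

Deduplicating before sorting keeps the sort working on the smaller list, although the final result is the same in either order.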

The pattern is matched globally with your browser's own regex engine. Nothing is uploaded, so it is safe to extract data from internal logs, exports, or documents.

Extracting data with regular expressions

Extraction is one of the most practical uses of regular expressions. Instead of testing whether text matches a pattern, you scan the whole input and collect every place it does. A global match walks through the string and returns each hit in order, which turns an unstructured blob — a log file, an email thread, a scraped page, a CSV cell — into a tidy list of just the parts you care about. Common targets are email addresses, URLs, IP addresses, phone numbers, hashtags, ticket IDs, and numeric values.
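The difference between testing and extracting is easy to see side by side. The snippet below is an illustrative sketch: the log lines are invented sample data, and the IPv4 pattern is deliberately loose (it does not check that each number is 255 or less).

```typescript
// Invented sample input for illustration.
const logText = [
  "10:01 login from 192.168.0.12",
  "10:02 timeout",
  "10:03 login from 10.0.0.7 via 192.168.0.12",
].join("\n");

// Loose IPv4 shape: four dot-separated runs of 1 to 3 digits.
const ipv4Source = "\\b\\d{1,3}(?:\\.\\d{1,3}){3}\\b";

// Testing answers only yes or no.
const containsIp: boolean = new RegExp(ipv4Source).test(logText);

// Extraction collects every hit, in the order it appears in the text.
const allIps: string[] = Array.from(
  logText.matchAll(new RegExp(ipv4Source, "g")),
  (m) => m[0],
);

console.log(containsIp);
console.log(allIps.join("\n"));
```

The timestamps are skipped because a colon, not a dot, follows their digits, and the repeated address appears twice because no deduplication has been applied yet.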

Capture groups make extraction sharper. A pattern can match a broad context but capture only the piece you want: match href="([^"]*)" across a page and output group 1 to get just the URLs, or match a key-value line and capture the value. Choosing the group to emit means you do not have to post-process the results to trim away surrounding text. That is why this tool lets you select which group becomes the output line.
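Both examples from the paragraph above can be sketched as follows. The HTML fragment, the config text, and the exact patterns are assumptions chosen for illustration; real markup may use single quotes or unquoted attributes, which this pattern does not cover.

```typescript
// Invented sample inputs.
const html = '<a href="https://example.com/docs">Docs</a> <a class="x" href="/pricing">Pricing</a>';
const config = "host=localhost\nport=8080";

// Match the whole attribute, but emit only what is inside the quotes (group 1).
const urls = Array.from(html.matchAll(/href="([^"]*)"/g), (m) => m[1]);

// Match a whole key=value line, but emit only the value (group 1).
// The "m" flag lets ^ and $ anchor at each line rather than the whole input.
const values = Array.from(config.matchAll(/^\w+=(.*)$/gm), (m) => m[1]);

console.log(urls.join("\n"));
console.log(values.join("\n"));
```

In both cases the surrounding context (href=, the quotes, the key name) guides the match but never reaches the output.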

Two refinements turn raw matches into usable data: deduplication and sorting. Real text repeats things, such as the same address or the same domain, so a distinct set is usually what you actually want, and sorting makes a long list scannable and diff-friendly. Keep in mind that regular expressions are pattern matchers, not parsers: they are well suited to well-shaped tokens like emails and IDs, but for deeply nested structures such as HTML or JSON a real parser is safer. For the everyday job of pulling repeated tokens out of text, a global regex with a capture group is hard to beat.

Common use cases

  • Harvesting contacts. Pull every email address or phone number out of a document or export into a clean list.
  • Collecting URLs. Extract all links from HTML or text, optionally capturing just the domain or path.
  • Mining logs. Grab all IDs, IPs, timestamps, or error codes that match a pattern for further analysis.
  • Building datasets. Turn messy source text into a sorted, deduplicated column ready to paste into a spreadsheet.

Frequently asked questions

How do I extract only part of each match?

Add a capture group to your pattern and select Group 1, 2, or 3 as the output. The tool then emits just that captured substring instead of the whole match.

Does it remove duplicates?

Yes, when the Unique option is on (the default). It keeps the first occurrence of each distinct result. Turn it off to keep every match including repeats.

Why are some expected matches missing?

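Check case sensitivity and the multiline flag (which controls whether ^ and $ match at every line boundary or only at the start and end of the whole text), and confirm your pattern is not too strict. Extraction always runs globally, so every non-overlapping match in the text is returned; a candidate that overlaps an earlier match is skipped, because scanning resumes where the previous match ended.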
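The three usual causes can be reproduced directly. This is a small sketch with invented sample text; findAll is a hypothetical helper, not part of the tool.

```typescript
// Hypothetical helper: collect whole matches for a pattern and flag string.
// The flags must include "g", because matchAll throws without it.
function findAll(source: string, flags: string, text: string): string[] {
  return Array.from(text.matchAll(new RegExp(source, flags)), (m) => m[0]);
}

const lines = "ERROR disk full\nINFO ok\nERROR timeout";

// 1. Case sensitivity: "error" does not match "ERROR" unless "i" is set.
const caseSensitive = findAll("error", "g", lines);
const caseInsensitive = findAll("error", "gi", lines);

// 2. Multiline: without "m", ^ anchors only at the very start of the input.
const withoutMultiline = findAll("^ERROR", "g", lines);
const withMultiline = findAll("^ERROR", "gm", lines);

// 3. Overlap: once "aba" is consumed at index 0, scanning resumes at index 3,
//    so the second "aba" (starting at index 2) is never seen.
const nonOverlapping = findAll("aba", "g", "ababa");

// A common workaround is a zero-width lookahead that captures instead of consuming.
const overlapping = Array.from("ababa".matchAll(/(?=(aba))/g), (m) => m[1]);

console.log(caseSensitive.length, caseInsensitive.length);
console.log(withoutMultiline.length, withMultiline.length);
console.log(nonOverlapping.length, overlapping.length);
```

If overlapping hits matter, the lookahead form combined with Group 1 as the output is one way to recover them.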

Can I extract from HTML reliably?

For simple tokens like URLs or attributes, yes. For nested or malformed HTML structure, a real parser is more robust — regex is best for well-shaped patterns.