Extract URLs, Emails, Phones from Text

You have a wall of text — meeting notes, scraped HTML, an email thread, log output — and you want just the URLs (or just the emails, or just the phone numbers). Paste it here and get a deduplicated, validated list of each type. Useful for building contact lists from documents, extracting links from blog posts, or sanitizing data before analysis.

How to use Extract URLs, Emails, Phones from Text

Paste text. The tool extracts emails, URLs, and phone numbers separately. Toggle dedupe, sort, and output format. Each extracted list has a "Copy" button.
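The dedupe, sort, and output format toggles can be pictured as a small post-processing step over each extracted list. The sketch below is illustrative only: the function names tidy and render, the case-sensitive dedupe, and the three output formats are assumptions, not a description of the tool's actual code.

```python
import json


def tidy(items, dedupe=True, sort=False):
    """Apply the dedupe and sort toggles to a list of extracted strings."""
    result = list(items)
    if dedupe:
        # dict.fromkeys keeps the first occurrence and preserves input order.
        result = list(dict.fromkeys(result))
    if sort:
        # Case-insensitive alphabetical order.
        result = sorted(result, key=str.lower)
    return result


def render(items, fmt="lines"):
    """Render a list in one of three assumed output formats."""
    if fmt == "csv":
        return ",".join(items)
    if fmt == "json":
        return json.dumps(items)
    return "\n".join(items)


emails = ["b@x.com", "a@x.com", "b@x.com"]
print(render(tidy(emails, dedupe=True, sort=True)))
```

With dedupe on and sort off, the list keeps the order in which items first appeared in the pasted text, which is usually what you want when auditing a document.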

About Extract URLs, Emails, Phones from Text

Email extraction uses a permissive RFC 5322-compatible regex (handles dots, plus-aliasing, subdomains). URL extraction matches http://, https://, and bare www. domains, with path / query / fragment components. Phone number extraction uses libphonenumber-style heuristics adapted for regex — recognizes North American 10-digit formats, international with + country code, and common separators (spaces, dashes, parentheses, dots).
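As a rough illustration of the email and URL matching described above, here is a minimal sketch. The patterns EMAIL_RE and URL_RE are simplified stand-ins written for this example, not the tool's actual expressions, and the trailing-punctuation cleanup is an assumption about how sentence-ending characters might be handled.

```python
import re

# Permissive email pattern: dots and plus-aliasing in the local part,
# one or more subdomains, and an alphabetic top-level domain.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)

# URLs starting with http://, https://, or a bare "www." prefix,
# running up to the next whitespace, quote, or angle bracket.
URL_RE = re.compile(r"(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)


def extract_emails(text):
    return EMAIL_RE.findall(text)


def extract_urls(text):
    # Strip punctuation that usually belongs to the sentence, not the URL.
    # This also trims a closing parenthesis, which is wrong for the rare
    # URL that legitimately ends in one.
    return [m.rstrip(".,;:!?)") for m in URL_RE.findall(text)]


sample = (
    "Contact jane.doe+news@mail.example.co.uk. "
    "See https://example.com/a/b?q=1#top, or www.example.org."
)
print(extract_emails(sample))
print(extract_urls(sample))
```

The URL pattern is deliberately greedy and relies on the cleanup step, which keeps the regex short at the cost of a few edge cases.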

Phone number detection is the hardest of the three because the format varies wildly by country (no fixed length, no fixed separators). The implementation flags candidates with 10+ digits in a phone-shaped pattern, then validates the country code if present. This catches the vast majority of real phone numbers but will have some false positives in numeric-heavy text (UUIDs, IDs).
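The two-stage approach described above (find phone-shaped candidates, then validate) might look roughly like the following sketch. Everything here is an assumption made for illustration: the candidate pattern, the 15-digit upper bound, and especially KNOWN_COUNTRY_CODES, which is a tiny sample rather than a real country-code table.

```python
import re

# Candidate: optional "+", optional "(", then digits mixed with the
# common separators (spaces, dashes, parentheses, dots).
CANDIDATE_RE = re.compile(r"(?<![\w-])\+?\(?\d[\d\s().-]{7,}\d(?![\w-])")

# Tiny illustrative sample of country calling codes, NOT a complete list.
KNOWN_COUNTRY_CODES = ("1", "33", "44", "49", "61", "81", "91")


def extract_phones(text):
    phones = []
    for match in CANDIDATE_RE.finditer(text):
        raw = match.group()
        digits = re.sub(r"\D", "", raw)
        # Keep only candidates with a plausible number of digits.
        if not 10 <= len(digits) <= 15:
            continue
        # If a "+" prefix is present, require a known country code.
        if raw.startswith("+") and not digits.startswith(KNOWN_COUNTRY_CODES):
            continue
        phones.append(raw)
    return phones


text = (
    "Call (415) 555-2010 or +44 20 7946 0958, fax 415.555.2010. "
    "Ticket 2024-0115-00042."
)
print(extract_phones(text))
```

The ticket number in the sample passes both checks because it has 13 digits and phone-like separators, which is exactly the kind of false positive to expect in numeric-heavy text. A limitation of this sketch is that two numbers separated only by whitespace merge into one over-long candidate and are dropped.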

Common use cases

  • Building contact lists — extract emails from a notes doc to import into a CRM.
  • Auditing scraped content — find every URL in a blog post you copied.
  • Data sanitization — remove URLs / emails / phones from sensitive text before sharing.
  • Log analysis — pull out URLs accessed from raw log text.
  • SEO research — extract competitor links from an article.
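For the data sanitization case, the same kind of patterns can drive replacement instead of extraction. The sketch below is a hypothetical redact helper with simplified patterns and made-up placeholder labels; it is not how the tool itself works.

```python
import re

# Order matters: emails and URLs are replaced first so that digits inside
# them are not picked up by the phone pattern afterwards.
PATTERNS = [
    ("[EMAIL]", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    (
        "[URL]",
        # The final character class keeps sentence punctuation out of the match.
        re.compile(
            r"(?:https?://|www\.)[^\s<>\"']*[^\s<>\"'.,;:!?)]", re.IGNORECASE
        ),
    ),
    ("[PHONE]", re.compile(r"(?<![\w-])\+?\(?\d[\d\s().-]{7,}\d(?![\w-])")),
]


def redact(text):
    """Replace each email, URL, and phone-shaped match with a placeholder."""
    for placeholder, pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


print(redact("Mail bob@example.com, see https://example.com/docs, call 415-555-2010."))
```

The phone pattern here skips the digit-count check on purpose: when redacting, over-matching a long number is usually safer than leaving a real phone number in the text.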

Frequently asked questions

Does it find phone numbers in all formats?

It handles the most common formats, such as (415) 555-2010, +44 20 7946 0958, 415-555-2010, and 415.555.2010. It may miss country-specific formats with unusual separators.

What about URLs without protocol?

URLs that start with "www." are recognized. Bare domains like example.com, with neither a protocol nor www, aren't extracted because they produce too many false positives in normal text.
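To see why bare domains are risky, compare a "www." pattern with a hypothetical bare-domain pattern on ordinary prose. Both patterns below are assumptions written for this illustration; BARE_RE in particular is not something the tool uses.

```python
import re

# Matches hosts that start with "www." followed by at least two labels.
WWW_RE = re.compile(r"\bwww\.[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+", re.IGNORECASE)

# Hypothetical bare-domain pattern: any "name.tld"-shaped token.
BARE_RE = re.compile(r"\b[A-Za-z0-9-]+\.[A-Za-z]{2,}\b")


def find_www(text):
    return WWW_RE.findall(text)


def find_bare(text):
    return BARE_RE.findall(text)


print(find_www("Visit www.example.com."))
print(find_bare("Open report.pdf and main.py, then visit example.org."))
```

The bare pattern picks up report.pdf and main.py alongside example.org, because file names look just like domains. Filtering by a list of known top-level domains would reduce the noise but not remove it, since extensions such as .py are also valid country-code TLDs.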

Are extracted emails validated?

Only with basic regex validation, which checks the syntax of the address. To find out whether an email is actually deliverable, you need an SMTP probe or a verification service such as Hunter or NeverBounce.