Extract URLs from Text
Pull every http/https URL out of any text blob into a deduped JSON array
What is the URL Extractor?
You have a 200-line email thread, a Slack channel export, a log file, a customer-support transcript, or some markdown notes — and you want every URL it mentions, in order, deduped, ready to paste somewhere else. Maybe you're audit-trailing what links a customer clicked. Maybe you're sweeping a doc for outbound references before publishing. Maybe you're feeding the URLs into a crawler. Whatever it is, you don't want to scroll and copy-paste 40 times. Drop the text in the left panel and the right panel gives you a JSON array of every URL it found.
The extraction is built on a simple regex that matches http:// and https:// followed by any non-whitespace characters. Each match is then trimmed of trailing punctuation that's almost never part of a URL (period, comma, semicolon, closing paren or bracket — the stuff that comes from sentences ending in a link), and validated through the URL constructor. If the constructor accepts it, it's real; if it throws, it's skipped. Duplicates are removed while preserving document order — first occurrence wins. The approach matches what the WHATWG URL Standard calls "URL parsing" and lines up with how JavaScript regexes commonly handle URL detection.
Output is a JSON array of strings, in the order the URLs appeared. If there were no URLs in the text, you get [] — empty array, no error, no toast. The whole thing runs in your browser. Nothing is uploaded, nothing is logged. RFC 3986 is the classic spec for URL syntax; the browser's URL constructor implements the WHATWG URL Standard, which builds on it.
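If you want the same behavior in your own scripts, here's a minimal sketch of that pipeline in plain JavaScript. The tool's exact regex and trim set may differ; treat this as the shape of the approach, not its source.

```js
// Sketch of the extract, trim, validate, dedupe pipeline described above.
// Assumption: the trim set here is . , ; ) ] and the tool's may differ slightly.
function extractUrls(text) {
  const matches = text.match(/https?:\/\/\S+/g) ?? [];
  const seen = new Set();
  const urls = [];
  for (let candidate of matches) {
    candidate = candidate.replace(/[.,;)\]]+$/, ""); // strip trailing sentence punctuation
    try {
      new URL(candidate); // throws on malformed input
    } catch {
      continue; // not parseable: skip silently
    }
    if (!seen.has(candidate)) {
      seen.add(candidate); // first occurrence wins
      urls.push(candidate);
    }
  }
  return urls;
}

console.log(extractUrls("Docs at https://docs.example.com/api."));
// → ["https://docs.example.com/api"]
```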
How to Use the Extractor
Three steps. Each one matches a button on this page.
Paste the Text or Load the Sample
Drop your text into the left panel — an email body, a log file, a transcript, an article, anything. Click Sample to load a realistic example: an internal team note with four URLs scattered through it. Sample input:
Hi team — please review the order at https://api.shop.example.com/v1/orders/ORD-1001
and also check the dashboard https://admin.shop.example.com/dashboard?tab=orders.
The customer (Ava Chen) reached out via http://support.shop.example.com/tickets/T-4521 — see also our docs at https://docs.shop.example.com/api/orders.

The extractor only catches http:// and https:// URLs. Bare domains like shop.example.com without a scheme are skipped (they're ambiguous — could be a hostname, a filename, or just text).
Read the URL Array
The right panel shows a JSON array of URLs in document order, with duplicates removed. Trailing punctuation that came from the surrounding sentence is stripped. Each URL is validated through the URL constructor — anything malformed is silently skipped, so the array only contains real, parseable URLs.
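For the sample input above, the right panel would show something like:

```json
[
  "https://api.shop.example.com/v1/orders/ORD-1001",
  "https://admin.shop.example.com/dashboard?tab=orders",
  "http://support.shop.example.com/tickets/T-4521",
  "https://docs.shop.example.com/api/orders"
]
```

Note the trailing periods after ?tab=orders and /api/orders are gone: they were sentence punctuation, not URL.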
Copy or Download
Click Copy to send the JSON to your clipboard, or Download to save it as a .json file. Minify compacts the array onto one line if you need it for a log entry. Use Clear on the input to start with a fresh blob.
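If you're curious what Minify amounts to, it's almost certainly just the JSON.stringify spacing argument; this is a guess at the mechanism, not a peek at the source:

```js
const urls = ["https://docs.shop.example.com/api/orders"];
JSON.stringify(urls, null, 2); // pretty-printed: one URL per line
JSON.stringify(urls);          // minified: a single line, fits in a log entry
```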
When You'd Actually Use This
Auditing customer-support tickets
A customer pastes a long email thread into a ticket. The agent (Marco Rivera) needs to know every URL the customer is referencing. Drop the email body in here, get the array, click through each one. Spares you the eyestrain of scrolling and the risk of missing one.
Crawling a Slack or Discord export
You exported a channel and want to feed every linked resource into your link checker or archive bot. The export is JSON or HTML, but the URLs are mixed in with text, emoji, and metadata. Paste the whole thing in here and you have a clean URL list ready for fetch() in a loop.
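A sketch of that loop, runnable in Node 18+ where fetch is global (doing this from a browser page would hit CORS on cross-origin checks). The urls array is a placeholder for the extractor's output:

```js
// Run as an ES module (.mjs) so top-level await is allowed.
const urls = ["https://example.com/", "https://example.org/docs"]; // paste the extractor's output here

for (const url of urls) {
  try {
    // HEAD skips the response body; some servers reject it, so fall back to GET if you see 405s
    const res = await fetch(url, { method: "HEAD" });
    if (!res.ok) console.warn(`${res.status} ${url}`);
  } catch {
    console.error(`unreachable: ${url}`);
  }
}
```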
Pre-publish link audits for blog posts and docs
Priya Patel is about to publish a 4,000-word post and wants to verify every outbound link still works. Paste the markdown source in, get the URL array, run them through a checker. Google's crawlability guidance assumes your links work — broken ones hurt rankings.
Pulling URLs out of structured logs
Application logs often embed URLs in error messages, request traces, or referrer fields. If you're investigating an incident and need to see all the URLs that flowed through a 5-minute window, paste the log slice and dedupe. Often that single deduped list is the fastest way to spot the one weird URL that started the cascade.
Common Questions
Does it catch ftp:// or mailto: or other schemes?
No — only http:// and https://. Web links are what people usually mean by "extract URLs," and the regex is intentionally tight to avoid false positives. If you need other schemes, the WHATWG URL Standard supports them but the matching gets fuzzier (mailto: addresses can look like text, ftp: is rare in modern usage). Open a feature request if you actually have the use case.
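That said, if you do want ftp://, the pattern is a one-character-class change you can run yourself (mailto: has no // and would need a different shape entirely). A hedged sketch:

```js
const text = "mirror at ftp://ftp.example.org/pub/ and docs at https://docs.example.org/";
// Looser than the tool's pattern: also accepts ftp://, with the false-positive risk that entails
const matches = text.match(/(?:https?|ftp):\/\/\S+/g) ?? [];
// ["ftp://ftp.example.org/pub/", "https://docs.example.org/"]
```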
How does it handle markdown links like [text](https://example.com)?
Cleanly. The regex finds https://example.com inside the parentheses and the trailing-punctuation trim strips the closing paren. So you get the bare URL out of markdown, BBCode, HTML <a href>, and most other surrounding syntaxes without needing format-specific parsing.
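You can watch that happen in miniature (same regex and trim set as the sketch earlier, which may differ from the tool's exact characters):

```js
const raw = "[docs](https://example.com/guide)".match(/https?:\/\/\S+/)[0];
// raw === "https://example.com/guide)"  (the closing paren rode along)
const url = raw.replace(/[.,;)\]]+$/, "");
// url === "https://example.com/guide"
```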
Are duplicates kept in order or deduped?
Deduped, with first-occurrence order preserved. So if your text mentions https://shop.example.com three times, it appears once in the output, at the position of the first occurrence. The dedupe is a simple Set filter — see MDN on Set.
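One way to write it is a single line, because Sets iterate in insertion order:

```js
const matches = ["https://shop.example.com", "https://a.example.com", "https://shop.example.com"];
const deduped = [...new Set(matches)];
// ["https://shop.example.com", "https://a.example.com"] (first occurrence wins)
```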
What about URLs with weird characters — pipes, brackets, parens?
Pipes (|) and brackets technically need percent-encoding per RFC 3986, but they show up raw in real-world URLs, and the regex keeps them: it grabs every non-whitespace character, so a match only ends at a space or line break. Closing brackets and parens at the very end of a match get trimmed because they're much more likely to be sentence punctuation than part of the URL. If you have a real URL ending in ), the URL constructor would accept it, but the trim can't tell a genuine trailing paren from sentence punctuation and strips it anyway; re-add the paren by hand if you know it belongs.
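Concretely, with a made-up wiki-style URL:

```js
// The URL constructor accepts a paren-terminated URL just fine:
new URL("https://en.example.org/wiki/Tree_(data_structure)"); // no throw
// ...but the trailing-punctuation trim runs first and strips the final paren:
"https://en.example.org/wiki/Tree_(data_structure)".replace(/[.,;)\]]+$/, "");
// returns "https://en.example.org/wiki/Tree_(data_structure"
```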
Is there a size limit?
Yes — 1 MB. That's a few hundred pages of plain text, more than enough for any email thread, transcript, or log slice you'd realistically paste. If you have a multi-megabyte file, split it into chunks or run it through grep/ripgrep on the command line first.
Does it work offline?
Yes. Everything runs in your browser — the regex match, the URL validation, the dedupe. The only network traffic is the initial page load. Once the page is open you can disconnect and the tool keeps working. The URL constructor is built into every modern browser per the URL API.
Other URL & Text Tools
Extracting is one operation. Here's what else pairs naturally with it: