What is Remove Duplicate Lines?
Remove Duplicate Lines is a browser-based tool that strips exact-duplicate lines from a pasted list while keeping the first occurrence of each line in its original position. It returns the result instantly, without sending any data to a server.
Remove Duplicate Lines
Strip exact duplicate lines from any list. Keeps the first occurrence in its original position.
TLDR
The tool walks the input line by line. Each line is kept the first time it appears; later duplicates are dropped. Order of first occurrence is preserved.
How to use this tool
- Paste your list. Drop in the email list, URL list, tag list, or any line-per-item content.
- Press Dedupe. Duplicates disappear; first occurrences stay in their original positions.
- Copy the unique list. Paste back into your spreadsheet, mailing tool, or wherever you need a clean list.
- Lowercase first for case-insensitive dedupe. If 'apple' and 'Apple' should match, run through the lowercase converter before deduping.
Real-world scenarios where this tool helps
Email list cleaning
Paste a list of addresses from multiple sources and get a unique list out.
URL deduplication
Combine link lists from several pages and dedupe before crawling.
Tag list cleanup
Tag exports pasted from a CMS often contain near-duplicates; this tool catches only the exact ones.
Log triage
Reduce a noisy log to its distinct lines for pattern review.
Vocabulary lists
Build a unique word list by pasting word-per-line input and deduplicating. The tool does not count occurrences, so word frequencies need a separate step.
What this tool does
- Splits the input on LF and CRLF line endings.
- Keeps each unique line the first time it appears.
- Drops every later occurrence of an exact match (case-sensitive, whitespace-sensitive).
- Preserves the original order of first occurrences.
- Joins the survivors back with LF.
What it does NOT do
- Does not match case-insensitively. 'apple' and 'Apple' are different lines.
- Does not normalize whitespace. 'apple ' (trailing space) and 'apple' are different lines.
- Does not sort the output - order matches first-appearance order.
- Does not detect fuzzy duplicates - typos slip through.
- Does not save anything.
About duplicate line removal
Deduplicating a list sits behind a surprising amount of everyday plumbing: SQL SELECT DISTINCT, Unix sort -u, Python set(), and spreadsheet "Remove duplicates" all do the same job at different layers. This page is the browser-only version. You paste a list, the tool keeps one copy of each unique line in the order it first appeared, and the output is yours to copy. No upload, no API call, no length cap.
The dedupe key is the raw line string. Two lines are the same only if every character matches exactly, including case, accents, leading and trailing whitespace, and zero-width characters. That is deliberate: loose matching is destructive, so it is left as a separate step that you control.
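If looser matching is wanted, one way to keep it under your control is to separate the comparison key from the text that is kept. The sketch below is an illustration, not this tool's code: `dedupeBy` and `looseKey` are hypothetical names, and trimming plus lowercasing is only one assumed meaning of "loose".

```javascript
// Hypothetical helper: dedupe on a derived key, but keep the original text
// of the first line that produced each key.
function dedupeBy(text, keyFn) {
  const seen = new Set();
  const kept = [];
  for (const line of text.split(/\r?\n/)) {
    const key = keyFn(line);
    if (seen.has(key)) continue; // a line with this key was already kept
    seen.add(key);
    kept.push(line);
  }
  return kept.join('\n');
}

// Assumed notion of "loose": ignore surrounding whitespace and case.
const looseKey = (line) => line.trim().toLowerCase();

const sample = 'apple\nApple \napple';
const exact = dedupeBy(sample, (line) => line); // identity key: exact matching
const loose = dedupeBy(sample, looseKey);       // normalised key: loose matching
```

With the identity key, 'apple' and 'Apple ' both survive, which mirrors the exact-match rule described above. With `looseKey`, all three lines collapse to the first 'apple', and the kept text is still the original, unmodified line.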
How the algorithm works
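function dedupeLines(input) {
  const seen = new Set(); // lines already emitted
  const output = [];
  for (const line of input.split(/\r?\n/)) { // split on LF or CRLF
    if (seen.has(line)) continue; // later duplicate: drop
    seen.add(line);
    output.push(line);
  }
  return output.join('\n');
}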
Cost is O(n) time and O(k) memory, where n is input lines and k is unique lines. Set lookup is amortised constant time because modern engines back Set with a hash table. A 100,000 line list dedupes in well under a second; the bottleneck is textarea repaint, not the matching loop.
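To check the timing claim on your own machine, a rough measurement can be sketched as follows. The synthetic data (100,000 lines drawn from 1,000 distinct values) and all variable names are assumptions made for illustration; the loop only approximates the matching step and leaves out textarea repaint, so results will vary by browser and hardware.

```javascript
// Build 100,000 synthetic lines drawn from 1,000 distinct values (assumed test data).
const lines = [];
for (let i = 0; i < 100000; i++) {
  lines.push('item-' + (i % 1000));
}
const bigInput = lines.join('\n');

const start = Date.now();

// Same single-pass approach: one Set lookup per line.
const seenLines = new Set();
const keptLines = [];
for (const line of bigInput.split(/\r?\n/)) {
  if (!seenLines.has(line)) {
    seenLines.add(line);
    keptLines.push(line);
  }
}

const elapsedMs = Date.now() - start;
console.log(keptLines.length + ' unique lines in ' + elapsedMs + ' ms');
```

Here n is 100,000 and k is 1,000, so the Set never holds more than 1,000 entries regardless of how long the input grows.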
Worked example
Input (10 lines, mixed case, 6 unique):
alice@example.com
bob@example.com
alice@example.com
carol@example.com
Alice@example.com
bob@example.com
dave@example.com
alice@example.com
eve@example.com
carol@example.com
Line 1 is new, kept. Line 2 is new, kept. Line 3 matches line 1 byte for byte, dropped. Line 4 is new, kept. Line 5 (Alice with uppercase A) differs by one byte and is kept as a separate row. Lines 6, 8 and 10 match earlier ones and are dropped. Lines 7 and 9 are new.
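The walk-through above can be reproduced in a few lines of JavaScript. This is a sketch rather than the tool's own source: it relies on the fact that a JavaScript `Set` iterates in insertion order, so spreading it back into an array keeps each first occurrence in place. The variable names are illustrative.

```javascript
const input = [
  'alice@example.com',
  'bob@example.com',
  'alice@example.com',
  'carol@example.com',
  'Alice@example.com',
  'bob@example.com',
  'dave@example.com',
  'alice@example.com',
  'eve@example.com',
  'carol@example.com',
].join('\n');

// A Set keeps insertion order, so duplicates vanish and first occurrences stay put.
const unique = [...new Set(input.split(/\r?\n/))];

console.log(unique.length); // 6
console.log(unique.join('\n'));
// alice@example.com
// bob@example.com
// carol@example.com
// Alice@example.com
// dave@example.com
// eve@example.com
```

Note that 'Alice@example.com' survives alongside 'alice@example.com' because the comparison is exact.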
Dedupe options across common tools
| Tool | Syntax | Case | Order |
|---|---|---|---|
| This page | paste and click | sensitive | preserved |
| Unix sort | sort -u file | sensitive | sorted (locale order) |
| awk one-liner | awk '!s[$0]++' | sensitive | preserved |
| Python | list(dict.fromkeys(x)) | sensitive | preserved (3.7+) |
| Excel | Data, Remove Duplicates | insensitive default | preserved |
| SQL | SELECT DISTINCT col | collation dep. | not guaranteed |
Common mistakes and pitfalls
- Assuming case-insensitive dedupe. Lowercase your list first if case should not matter. The local part of an email address is technically case-sensitive under the SMTP standard (the domain part is not), but most mail providers treat it as case-insensitive in practice; pick a rule before deduping.
- Ignoring trailing whitespace. Spreadsheet paste, Word copy, and many CMS exports carry trailing spaces that are invisible to the eye but break exact-match dedupe. Trim each line first if survivors look duplicated.
- Treating dedupe as deduplication-by-meaning. 'New York City', 'NYC', and 'New York, NY' are three different strings and all survive. Aliases need a lookup table, not a hash set.
- Pasting CSV rows and treating them as lines. A quoted field with an embedded newline is one logical row but multiple physical lines, so the row gets split and duplicates pass through.
- Mixing Unicode normalisation forms. 'e' + combining acute accent (NFD) and 'é' precomposed (NFC) look identical but hash differently. If two sources contribute the same word and the tool keeps both, run them through a normaliser.
- Forgetting the order rule. First occurrence wins. If you sort your input before pasting, the kept copy will be whichever sorted first. If you pasted in chronological order, the earliest event wins.
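The normalisation pitfall in the list above is easy to see in code. The sketch below uses the standard `String.prototype.normalize` method; `normaliseLines` is a hypothetical helper name, and choosing NFC over NFD as the target form is an assumption (either works as long as every line uses the same one).

```javascript
const decomposed = 'cafe\u0301'; // 'e' followed by U+0301 combining acute accent (NFD)
const precomposed = 'caf\u00e9'; // single code point U+00E9 (NFC)

// The two strings render identically but are not equal.
console.log(decomposed === precomposed); // false

// After normalising both to the same form, they match.
console.log(decomposed.normalize('NFC') === precomposed.normalize('NFC')); // true

// Hypothetical pre-processing step: normalise every line before deduping.
function normaliseLines(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.normalize('NFC'))
    .join('\n');
}
```

Running a list through a step like `normaliseLines` before pasting it in makes visually identical words compare as equal.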
Frequently asked questions
Is the match case-sensitive?
Yes. The dedupe key is the literal line string, so 'apple' and 'Apple' are different lines. If you want case-insensitive dedupe, run the input through a lowercase converter first, then paste the result back here.
What about trailing whitespace?
It counts. 'apple ' (with a trailing space) and 'apple' are different lines. Spreadsheet exports often add invisible trailing spaces, so trim first if your duplicates are not getting caught.
Does it sort the output?
No. The tool preserves the order of first occurrence. The line that appears earliest in the input survives; later duplicates are dropped. To sort the result, paste into a spreadsheet column or use a follow-up sort tool.
Will it work on very long lists?
Yes. The algorithm is O(n) with a hash set: it touches each line once and looks up duplicates in constant time. Lists of 100,000+ lines run in well under a second on a modern laptop, limited mostly by textarea paint speed in the browser.
Can I dedupe by partial match?
No. This tool matches whole lines exactly. For fuzzy or partial matches (collapsing 'New York City' and 'NYC' to one row, or matching by the first 50 characters), use a script with regex or a fuzzy-match library.
What if my list has CSV-style embedded newlines?
Embedded newlines inside quoted CSV fields will break the line-based logic, because each physical line break is treated as a row boundary. Parse with a real CSV reader (Excel, Google Sheets, Python csv module) first, dedupe in the parsed form, then export back.
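For small jobs where a full CSV reader is not to hand, the row-boundary problem can be illustrated with a quote-aware splitter. This is a minimal sketch, not a complete CSV parser: `splitCsvRows` is a hypothetical name, it assumes fields are quoted with double quotes and that a literal quote is escaped by doubling it, and it compares raw row text without parsing individual fields.

```javascript
// Split CSV text into logical rows, treating line breaks inside double-quoted
// fields as part of the field rather than as row boundaries.
function splitCsvRows(text) {
  const rows = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === '"') {
      // An escaped quote ("") toggles twice, leaving the state unchanged.
      inQuotes = !inQuotes;
      current += ch;
    } else if (!inQuotes && (ch === '\n' || ch === '\r')) {
      if (ch === '\r' && text[i + 1] === '\n') i++; // treat CRLF as one break
      rows.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  if (current !== '') rows.push(current); // final row without a trailing break
  return rows;
}

const csv = 'id,note\n1,"line one\nline two"\n1,"line one\nline two"\n2,plain';
const logicalRows = splitCsvRows(csv);      // 4 logical rows, not 6 physical lines
const uniqueRows = [...new Set(logicalRows)]; // 3 rows after dedupe
```

Because the comparison is on raw row text, rows that differ only in quoting still count as different; a real CSV library remains the safer choice for anything beyond simple exports.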
Will it handle Unicode and emoji correctly?
Yes. The matcher uses JavaScript string equality, which compares strings code unit by code unit. Emoji, accented characters, and non-Latin scripts (Cyrillic, CJK, Arabic, Devanagari) all dedupe correctly as long as the underlying code-point sequences are identical. Normalisation forms (NFC vs NFD) can produce visually identical strings that hash differently, so paste from one source when possible or normalise first.
