CSV Deduplicator

Drop a CSV in, pick the columns that define a duplicate, get a deduplicated output. Choose case-insensitive matching, whitespace trimming, or both for forgiving comparison. Useful for cleaning email lists, removing repeated rows from log exports, or normalising address data before import.

How to use the CSV Deduplicator

Paste a CSV (the first row is treated as the header). Check the columns that should define uniqueness; the remaining columns travel with the row but don’t affect duplicate detection. Toggle case-insensitive matching and whitespace trimming for more forgiving comparison (e.g., for email deduplication, an address with a trailing space should equal the same address without it). Pick whether to keep the first or last occurrence; last is useful when later rows have more recent data.
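The key-building step described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the names make_key and dedupe_keep_first, the dict-per-row layout, and the sample addresses are all hypothetical.

```python
def make_key(row, key_columns, trim=True, ignore_case=True):
    """Build the comparison key for one row (a dict of column -> value)."""
    parts = []
    for col in key_columns:
        value = row[col]
        if trim:
            value = value.strip()       # ignore leading/trailing whitespace
        if ignore_case:
            value = value.casefold()    # case-insensitive comparison
        parts.append(value)
    return tuple(parts)


def dedupe_keep_first(rows, key_columns, trim=True, ignore_case=True):
    """Keep the first row seen for each key, in original order."""
    seen = set()
    kept = []
    for row in rows:
        key = make_key(row, key_columns, trim, ignore_case)
        if key not in seen:
            seen.add(key)
            kept.append(row)            # the original row, not the normalised key
    return kept


# Made-up sample data: the first two rows differ only by case and a trailing space.
rows = [
    {"email": "Ann@Example.com ", "name": "Ann"},
    {"email": "ann@example.com", "name": "Ann B."},
    {"email": "bob@example.com", "name": "Bob"},
]
result = dedupe_keep_first(rows, ["email"])
strict = dedupe_keep_first(rows, ["email"], trim=False, ignore_case=False)
```

With both toggles on, the two Ann rows collapse into the first one; with both off, all three rows are treated as distinct.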

About CSV Deduplicator

Deduplicating a CSV in Excel is doable but error-prone: the Remove Duplicates dialog wants the right columns selected, gives you no control over case sensitivity or whitespace, and opening the file can silently mangle long numeric IDs by converting them to scientific notation. Command-line sort -u works for whole-line dedup, but it reorders the rows, and keying on a single field with -t and -k splits on every comma, so quoted fields break it. This tool gives you precise control: pick the dedup key, control trim and case, and choose first-vs-last semantics.

The output preserves every column and the original row order (minus the dropped duplicates). The header row passes through unchanged. RFC 4180 quoting is respected on both input and output, so fields containing commas or newlines survive intact. A status line tells you how many input rows there were and how many remain after dedup so you can sanity-check the result.
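To make the round-trip behaviour concrete, here is a rough sketch using Python's standard csv module, which handles quoted commas and embedded newlines. The function name dedupe_csv_text and the wording of the status string are assumptions for illustration; the tool's own status line may read differently.

```python
import csv
import io


def dedupe_csv_text(text, key_columns):
    """Dedupe CSV text by the named columns; return (output_text, status)."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)                       # first row is the header
    indices = [header.index(col) for col in key_columns]

    seen = set()
    kept = []
    total = 0
    for row in reader:
        total += 1
        key = tuple(row[i] for i in indices)
        if key not in seen:
            seen.add(key)
            kept.append(row)                    # order of first occurrences is preserved

    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\r\n")
    writer.writerow(header)                     # header passes through unchanged
    writer.writerows(kept)
    status = f"{total} rows in, {len(kept)} rows out"   # hypothetical wording
    return out.getvalue(), status


# Made-up input: one field has a comma, another has an embedded newline.
text = 'id,note\r\n1,"hello, world"\r\n2,"line one\nline two"\r\n1,"repeat"\r\n'
output, status = dedupe_csv_text(text, ["id"])
```

The quoted fields come back out quoted, and the status string gives the before and after row counts (header excluded).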

Common use cases

  • Email list cleaning — dedup by email (case-insensitive) before sending a campaign.
  • Log file dedup — collapse repeated rows from a multi-source log export.
  • Address normalisation — dedup by (street, city, postcode) with trim+case-insensitive matching.
  • Survey response dedup — keep the last response per respondent ID when participants resubmit.

Frequently asked questions

Does it modify the kept row in any way?

No. The dedup key is normalised for comparison only; the row written out is the original row, untouched. If you want normalised output, do that as a separate pass.

What if I don't pick any columns?

It dedups on whole-row equality (every column, original case, no trim). Effectively the same as sort -u minus the sort.
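A small sketch of that fallback, assuming rows are lists of strings and that an empty column selection is represented by an empty tuple (both are illustrative choices, as are the names row_key and unique_rows):

```python
def row_key(row, key_indices):
    """Return the comparison key; an empty selection means the whole row."""
    if not key_indices:
        return tuple(row)                       # every column, original case, no trim
    return tuple(row[i] for i in key_indices)


def unique_rows(rows, key_indices=()):
    """Keep the first occurrence of each key without reordering."""
    seen = set()
    kept = []
    for row in rows:
        key = row_key(row, key_indices)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Made-up rows: the third is an exact repeat, the fourth differs only by case.
rows = [["b", "2"], ["a", "1"], ["b", "2"], ["B", "2"]]
result = unique_rows(rows)
```

Unlike sort -u, the surviving rows stay in their input order, and the row that differs only by case is kept because no normalisation is applied.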

How does "Keep last" work with large files?

It still processes in one pass but stores the last-seen index for every key. Memory usage is proportional to the number of unique keys, not the file size.
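One way to picture the last-seen index is the sketch below. It assumes the rows are already in memory as lists of strings and filters them after the scan; how the tool itself emits the output is not stated here, and the name dedupe_keep_last is hypothetical.

```python
def dedupe_keep_last(rows, key_indices):
    """Keep the last row seen for each key, in original order."""
    last_seen = {}                              # key -> index of its latest row
    for index, row in enumerate(rows):
        key = tuple(row[i] for i in key_indices)
        last_seen[key] = index                  # later rows overwrite earlier ones

    keep = set(last_seen.values())
    return [row for index, row in enumerate(rows) if index in keep]


# Made-up survey data: respondent r1 resubmits.
rows = [["r1", "yes"], ["r2", "no"], ["r1", "maybe"]]
result = dedupe_keep_last(rows, [0])
```

The dictionary holds one entry per unique key, which is why the bookkeeping grows with the number of distinct keys rather than the number of rows.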