Text | May 24, 2026

How to Remove Duplicate Lines from Text — Methods and Free Tool

Learn multiple methods to remove duplicate lines from text files and strings, including command line tools, programming approaches, and our free online deduplication tool.

Why Remove Duplicate Lines?

Duplicate lines in text files cause problems across many scenarios:

Data cleaning: Removing duplicate entries from CSV exports or log files
List management: Deduplicating email lists, IP addresses, or domain lists
Code cleanup: Removing duplicate import statements or configuration entries
Log analysis: Filtering repeated log entries for cleaner analysis
SEO and content: Ensuring unique meta tags, keywords, or URLs

Method 1: Using FreeToolJet's Remove Duplicate Lines Tool

Our Remove Duplicate Lines tool is the easiest way to deduplicate text:

Step-by-Step Guide

Open the Remove Duplicate Lines tool
Paste your text into the input area (or upload a file)
Choose your options (case sensitivity, whitespace trimming, output order)
Click "Remove Duplicates"
Copy the cleaned text or download as a file

Features

Instant results: No page refresh, real-time processing
Case sensitivity options: Control how matching works
Whitespace handling: Optionally trim spaces before comparing
Preserve order: Keep first occurrence order (or sort alphabetically)
Statistics: See how many duplicates were removed
Client-side only: Your text never leaves your browser
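
To make the options above concrete, here is a rough Python sketch of how case sensitivity, whitespace trimming, ordering, and the duplicate count could fit together. It is an illustration only, not the tool's actual implementation; the function name `dedupe_lines` and its parameters are assumptions made for this example.

```python
def dedupe_lines(text, case_sensitive=True, trim=False, sort_output=False):
    """Return (unique_lines, removed_count) for the given text.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    seen = set()
    unique = []
    lines = text.splitlines()
    for line in lines:
        # Build the comparison key according to the selected options.
        key = line.strip() if trim else line
        if not case_sensitive:
            key = key.casefold()
        if key not in seen:
            seen.add(key)
            unique.append(line)  # keep the first occurrence as written
    if sort_output:
        unique.sort()
    return unique, len(lines) - len(unique)


cleaned, removed = dedupe_lines(
    "Apple\napple \nBanana\nApple", case_sensitive=False, trim=True
)
print(cleaned, removed)
```

The key used for comparison is kept separate from the line that gets written out, so trimming or case folding changes what counts as a duplicate without altering the surviving text.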

Method 2: Command Line Tools

Using `sort` and `uniq` (Linux/macOS)

The classic Unix approach:

    # Remove duplicates, keep sorted output
    sort input.txt | uniq > output.txt

    # Remove duplicates, keep original order (preserve first occurrence)
    awk '!seen[$0]++' input.txt > output.txt

    # Case-insensitive deduplication
    sort -f input.txt | uniq -i > output.txt

    # Count occurrences before removing
    sort input.txt | uniq -c > with_counts.txt
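
The reason `sort` comes before `uniq` is that `uniq` only collapses identical lines that sit next to each other. The Python sketch below mimics that behavior with `itertools.groupby`; the function name `uniq` is just a label chosen for this illustration.

```python
from itertools import groupby


def uniq(lines):
    """Collapse runs of identical adjacent lines, similar to the uniq command."""
    return [line for line, _run in groupby(lines)]


lines = ["b", "a", "b", "b", "a"]
print(uniq(lines))          # only the adjacent pair of "b" is collapsed
print(uniq(sorted(lines)))  # sorting first makes every duplicate adjacent
```

Without sorting, the repeated "a" and the first "b" survive because their copies are not neighbors.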

Using PowerShell (Windows)

    # Remove duplicates, preserve original order (first occurrence wins)
    $lines = Get-Content input.txt
    $lines | Select-Object -Unique | Out-File output.txt

    # Case-insensitive (note: the output is lowercased)
    (Get-Content input.txt).ToLower() | Select-Object -Unique

Method 3: Text Editors

VS Code

Open your file
Press Ctrl+Shift+P (or Cmd+Shift+P on Mac)
Type "Sort Lines Ascending" and run it
Press Ctrl+H to open Find/Replace
Enable regex mode (.* button)
Find: ^(.*)(\n\1)+$
Replace: $1
Click "Replace All"
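
If you want to check what that Find/Replace pattern does before running it on a real file, the same expression can be tried in Python. This is a sketch under the assumption that Python's `re.MULTILINE` mode is a close enough stand-in for the editor's line-anchored matching; `collapse_adjacent` is a name invented for the example.

```python
import re

# Same pattern as the Find field above; MULTILINE makes ^ and $ match per line.
ADJACENT_DUPLICATES = re.compile(r"^(.*)(\n\1)+$", re.MULTILINE)


def collapse_adjacent(text):
    """Replace each run of identical adjacent lines with a single copy."""
    return ADJACENT_DUPLICATES.sub(r"\1", text)


sorted_text = "apple\napple\nbanana\ncherry\ncherry\ncherry"
unsorted_text = "apple\nbanana\napple"
print(collapse_adjacent(sorted_text))
print(collapse_adjacent(unsorted_text))
```

The second call leaves its input unchanged, which is why the sort step has to come first: the pattern only sees duplicates that are directly below each other.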

Sublime Text

Open file
Edit → Sort Lines
Edit → Permute Lines → Unique

Vim

    " Sort and remove duplicates
    :sort u

    " Remove consecutive duplicates without sorting (non-adjacent duplicates remain)
    :g/^\(.*\)$\n\1$/d

Method 4: Programming Languages

Python

    # Method 1: Using dict.fromkeys() (preserves order, Python 3.7+)
    with open('input.txt', 'r') as f:
        lines = f.readlines()

    unique_lines = list(dict.fromkeys(lines))

    with open('output.txt', 'w') as f:
        f.writelines(unique_lines)

    # Method 2: Using set (doesn't preserve order)
    with open('input.txt', 'r') as f:
        unique_lines = set(f.readlines())

    with open('output.txt', 'w') as f:
        f.writelines(unique_lines)

    # Method 3: Case-insensitive, preserving order of first occurrence
    def remove_duplicates_preserve_order(lines, case_sensitive=False):
        seen = set()
        result = []
        for line in lines:
            compare_line = line if case_sensitive else line.lower()
            if compare_line not in seen:
                seen.add(compare_line)
                result.append(line)
        return result

JavaScript/Node.js

    // Method 1: Using Set (a Set iterates in insertion order, so first occurrences keep their order)
    const fs = require('fs');
    const lines = fs.readFileSync('input.txt', 'utf8').split('\n');
    const unique = [...new Set(lines)];
    fs.writeFileSync('output.txt', unique.join('\n'));

    // Method 2: Explicit filter with optional case-insensitive matching
    function removeDuplicates(lines, caseSensitive = true) {
      const seen = new Set();
      return lines.filter(line => {
        const key = caseSensitive ? line : line.toLowerCase();
        if (seen.has(key)) return false;
        seen.add(key);
        return true;
      });
    }

    // Reuses `lines` from Method 1; a second `const lines` would be a redeclaration error
    const uniqueIgnoringCase = removeDuplicates(lines, false); // case-insensitive
    fs.writeFileSync('output.txt', uniqueIgnoringCase.join('\n'));

Go

    package main

    import (
        "bufio"
        "fmt"
        "os"
        "strings"
    )

    // removeDuplicates keeps the first occurrence of each line.
    func removeDuplicates(lines []string, caseSensitive bool) []string {
        seen := make(map[string]bool)
        var result []string
        for _, line := range lines {
            key := line
            if !caseSensitive {
                key = strings.ToLower(line)
            }
            if !seen[key] {
                seen[key] = true
                result = append(result, line)
            }
        }
        return result
    }

    func main() {
        file, err := os.Open("input.txt")
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        defer file.Close()

        // Read every line into memory
        var lines []string
        scanner := bufio.NewScanner(file)
        for scanner.Scan() {
            lines = append(lines, scanner.Text())
        }

        unique := removeDuplicates(lines, true)

        output, err := os.Create("output.txt")
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        defer output.Close()

        // Buffered writes; Flush pushes the remaining bytes to disk
        writer := bufio.NewWriter(output)
        for _, line := range unique {
            fmt.Fprintln(writer, line)
        }
        writer.Flush()
    }

Advanced Deduplication Scenarios

Remove Duplicate Lines Based on a Column

For CSV or tabular data, you might want to deduplicate based on a specific column:

    import csv

    def remove_duplicates_by_column(input_file, output_file, column_index):
        seen = set()
        # newline='' lets the csv module handle line endings itself
        with open(input_file, 'r', newline='') as infile, open(output_file, 'w', newline='') as outfile:
            reader = csv.reader(infile)
            writer = csv.writer(outfile)
            for row in reader:
                key = row[column_index]
                if key not in seen:
                    seen.add(key)
                    writer.writerow(row)

    # Remove duplicates based on first column (index 0)
    remove_duplicates_by_column('data.csv', 'cleaned.csv', 0)

Remove Near-Duplicates (Fuzzy Matching)

For lines that are similar but not identical:

    from difflib import SequenceMatcher

    def is_similar(line1, line2, threshold=0.9):
        return SequenceMatcher(None, line1, line2).ratio() > threshold

    def remove_near_duplicates(lines, threshold=0.9):
        # Compares each line against every line kept so far, so this is O(n^2)
        result = []
        for line in lines:
            if not any(is_similar(line, existing, threshold) for existing in result):
                result.append(line)
        return result

Remove Duplicate Lines with Count

Sometimes you want to know how many times each line appeared:

    from collections import Counter

    with open('input.txt', 'r') as f:
        lines = f.readlines()

    counts = Counter(lines)

    for line, count in counts.items():
        print(f"{count}: {line.strip()}")

Performance Considerations

When processing large files:

Method	Memory Usage	Speed	Preserves Order
`sort | uniq`	Low (sort spills to temporary files on disk)	Fast	No
`awk '!seen[$0]++'`	Medium (every unique line is held in memory)	Fast	Yes
Python `set()`	High	Very Fast	No
Python `dict.fromkeys()`	High	Very Fast	Yes

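For very large files (GBs): Use streaming approaches like awk, or process the file in chunks. Keep in mind that awk still holds every unique line in memory, so when the unique lines themselves exceed available RAM, `sort | uniq` is the safer choice because sort spills to temporary files on disk.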
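
A streaming approach in Python might look like the sketch below. It reads one line at a time and remembers a fixed-size digest of each line instead of the line itself, which keeps memory per unique line small. The function name `dedupe_stream` and the 16-byte BLAKE2 digest are choices made for this example; with a digest there is a theoretical (extremely small) chance that two different lines collide and one is dropped.

```python
import hashlib


def dedupe_stream(in_path, out_path):
    """Copy in_path to out_path line by line, skipping lines already seen."""
    seen = set()
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        for line in src:
            # Compare lines without their line ending; store a 16-byte digest
            # rather than the full line to limit memory use.
            content = line.rstrip(b"\r\n")
            digest = hashlib.blake2b(content, digest_size=16).digest()
            if digest not in seen:
                seen.add(digest)
                dst.write(line)
```

Calling `dedupe_stream("input.txt", "output.txt")` preserves first-occurrence order. Memory still grows with the number of unique lines, just more slowly than when whole lines are stored.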

Common Pitfalls

Whitespace differences: "hello" and "hello " are different lines

Line ending differences: \n vs \r\n

Case sensitivity: "Hello" and "hello" are different

Empty lines: Multiple blank lines may be considered duplicates

Unicode normalization: Accented characters can have multiple representations
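
Most of these pitfalls come down to deciding what "the same line" means before comparing. One way to handle them is to build a normalized comparison key, as in this sketch. The helper names `comparison_key` and `dedupe_normalized` are hypothetical, and the choices made here (NFC normalization, keeping blank lines) are assumptions you may want to change for your data.

```python
import unicodedata


def comparison_key(line, ignore_case=True):
    """Build a key so that visually identical lines compare equal."""
    key = line.rstrip("\r\n")                # \n vs \r\n line endings
    key = key.strip()                        # leading/trailing whitespace
    key = unicodedata.normalize("NFC", key)  # composed vs decomposed accents
    if ignore_case:
        key = key.casefold()
    return key


def dedupe_normalized(lines, keep_blank=True):
    seen = set()
    result = []
    for line in lines:
        key = comparison_key(line)
        if key == "" and keep_blank:
            result.append(line)  # blank lines are kept, not deduplicated
            continue
        if key not in seen:
            seen.add(key)
            result.append(line)
    return result


sample = ["hello\n", "hello \r\n", "Hello\n", "\n", "caf\u00e9\n", "cafe\u0301\n", "\n"]
print(dedupe_normalized(sample))
```

Here the three spellings of "hello" and the two encodings of "café" each collapse to their first occurrence, while both blank lines survive because `keep_blank` is left on.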

When to Use Each Method

Scenario	Recommended Method
Quick one-time cleanup	FreeToolJet Remove Duplicate Lines tool
Large files (GBs)	`awk '!seen[$0]++'` or streaming Python
Part of a data pipeline	Python script with proper error handling
In a text editor	VS Code / Sublime Text / Vim commands
Windows without WSL	PowerShell
Preserve order	FreeToolJet tool or `awk` method
Case-insensitive	FreeToolJet tool or `sort -f | uniq -i`

Related Tools

Remove Duplicate Lines — Remove duplicates from text
Word Counter — Count words, characters, and lines
Text Case Converter — Change text case (affects matching)
Diff Checker — Compare two texts for differences
