How to Remove Duplicate Lines from Text — Methods and Free Tool
Learn multiple methods to remove duplicate lines from text files and strings, including command line tools, programming approaches, and our free online deduplication tool.
Why Remove Duplicate Lines?
Duplicate lines in text files cause problems across many scenarios:
- Data cleaning: Removing duplicate entries from CSV exports or log files
- List management: Deduplicating email lists, IP addresses, or domain lists
- Code cleanup: Removing duplicate import statements or configuration entries
- Log analysis: Filtering repeated log entries for cleaner analysis
- SEO and content: Ensuring unique meta tags, keywords, or URLs
Method 1: Using FreeToolJet's Remove Duplicate Lines Tool
Our Remove Duplicate Lines tool is the easiest way to deduplicate text:
Step-by-Step Guide
- Open the Remove Duplicate Lines tool
- Paste your text into the input area (or upload a file)
- Choose your options (case sensitivity, whitespace trimming, output order)
- Click "Remove Duplicates"
- Copy the cleaned text or download as a file
Features
- Instant results: No page refresh, real-time processing
- Case sensitivity options: Control how matching works
- Whitespace handling: Optionally trim spaces before comparing
- Preserve order: Keep first occurrence order (or sort alphabetically)
- Statistics: See how many duplicates were removed
- Client-side only: Your text never leaves your browser
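To make the options above concrete, here is a minimal Python sketch of how case sensitivity, whitespace trimming, ordering, and the duplicate count might fit together. It is not the tool's actual implementation; the function name `dedupe` and its parameters are hypothetical and chosen only for illustration.

```python
def dedupe(text, case_sensitive=True, trim=False, sort_output=False):
    """Return (unique_lines, removed_count) for the given text.

    Hypothetical helper: the parameter names mirror the options listed
    above and are not taken from any real tool.
    """
    seen = set()
    unique = []
    lines = text.splitlines()
    for line in lines:
        # Build the comparison key; the original line is what gets kept.
        key = line.strip() if trim else line
        if not case_sensitive:
            key = key.casefold()
        if key not in seen:
            seen.add(key)
            unique.append(line)
    if sort_output:
        unique.sort()
    return unique, len(lines) - len(unique)


unique, removed = dedupe("a\nA\na \nb", case_sensitive=False, trim=True)
print(unique, removed)  # ['a', 'b'] 2
```

The first occurrence of each line is kept as written; only the comparison key is trimmed or case-folded.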
Method 2: Command Line Tools
Using sort and uniq (Linux/macOS)
The classic Unix approach:
# Remove duplicates, keep sorted output
sort input.txt | uniq > output.txt

# Remove duplicates, keep original order (preserve first occurrence)
awk '!seen[$0]++' input.txt > output.txt

# Case-insensitive deduplication
sort -f input.txt | uniq -i > output.txt

# Count occurrences before removing
sort input.txt | uniq -c > with_counts.txt
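The reason `sort` comes before `uniq` is that `uniq` only collapses duplicates that sit next to each other. A rough Python analogue, using `itertools.groupby` as a stand-in for `uniq` (the helper name `uniq` here is just for illustration), shows the difference:

```python
from itertools import groupby


def uniq(lines):
    """Collapse runs of identical adjacent lines, similar to `uniq`."""
    return [key for key, _ in groupby(lines)]


lines = ["b", "a", "b", "b", "a"]
print(uniq(lines))          # ['b', 'a', 'b', 'a']  (non-adjacent repeats survive)
print(uniq(sorted(lines)))  # ['a', 'b']
```

Without sorting, repeated lines that are separated by other lines are left in place, which is why the awk approach is used when the original order matters.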
Using PowerShell (Windows)
# Remove duplicates, preserve original order
$lines = Get-Content input.txt
$lines | Select-Object -Unique | Out-File output.txt

# Case-insensitive (note: the output is lowercased)
(Get-Content input.txt).ToLower() | Select-Object -Unique
Method 3: Text Editors
VS Code
- Open your file
- Press Ctrl+Shift+P (or Cmd+Shift+P on Mac)
- Type "Sort Lines Ascending" and run it
- Press Ctrl+H to open Find/Replace
- Enable regex mode (.* button)
- Find: ^(.*)(\n\1)+$
- Replace: $1
- Click "Replace All"
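To see what the find/replace pattern does, the same regex can be tried in Python. This is only an illustration: it assumes Python's `re` module in multiline mode behaves like the editor's regex engine for this particular pattern, and the sample text is made up.

```python
import re

# Already sorted, so identical lines are adjacent.
text = "apple\napple\nbanana\ncherry\ncherry\ncherry"

# ^(.*) captures a whole line; (\n\1)+ matches one or more following
# copies of that same line; $ ensures the last copy ends at a line end.
pattern = re.compile(r"^(.*)(\n\1)+$", re.MULTILINE)

# Replace each run of identical lines with a single copy.
print(pattern.sub(r"\1", text))
# apple
# banana
# cherry
```

Because the pattern only matches consecutive copies, it relies on the sort step having been run first.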
Sublime Text
- Open file
- Edit → Sort Lines
- Edit → Permute Lines → Unique
Vim
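# Sort and remove duplicates
:sort u

# Remove adjacent duplicate lines without sorting
:g/^\(.*\)$\n\1$/d

# Remove all duplicates without sorting (preserve order; requires awk on your PATH)
:%!awk '\!seen[$0]++'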
Method 4: Programming Languages
Python
# Method 1: Using dict.fromkeys() (preserves order, Python 3.7+)
with open('input.txt', 'r') as f:
    lines = f.readlines()
unique_lines = list(dict.fromkeys(lines))
with open('output.txt', 'w') as f:
    f.writelines(unique_lines)

# Method 2: Using set (doesn't preserve order)
with open('input.txt', 'r') as f:
    unique_lines = set(f.readlines())
with open('output.txt', 'w') as f:
    f.writelines(unique_lines)

# Method 3: Case-insensitive, preserving order of first occurrence
def remove_duplicates_preserve_order(lines, case_sensitive=False):
    seen = set()
    result = []
    for line in lines:
        # Compare on the lowercased line, but keep the original text
        compare_line = line if case_sensitive else line.lower()
        if compare_line not in seen:
            seen.add(compare_line)
            result.append(line)
    return result
JavaScript/Node.js
// Method 1: Using Set (a JavaScript Set keeps insertion order,
// so the first occurrence of each line stays in place)
const fs = require('fs');
const lines = fs.readFileSync('input.txt', 'utf8').split('\n');
const unique = [...new Set(lines)];
fs.writeFileSync('output.txt', unique.join('\n'));

// Method 2: Preserve order, with optional case-insensitive matching
function removeDuplicates(lines, caseSensitive = true) {
  const seen = new Set();
  return lines.filter(line => {
    const key = caseSensitive ? line : line.toLowerCase();
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const uniqueIgnoringCase = removeDuplicates(lines, false); // case-insensitive
fs.writeFileSync('output.txt', uniqueIgnoringCase.join('\n'));
Go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

// removeDuplicates keeps the first occurrence of each line, in order.
func removeDuplicates(lines []string, caseSensitive bool) []string {
	seen := make(map[string]bool)
	var result []string
	for _, line := range lines {
		key := line
		if !caseSensitive {
			key = strings.ToLower(line)
		}
		if !seen[key] {
			seen[key] = true
			result = append(result, line)
		}
	}
	return result
}

func main() {
	file, err := os.Open("input.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	var lines []string
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		lines = append(lines, scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}

	unique := removeDuplicates(lines, true)

	output, err := os.Create("output.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer output.Close()

	writer := bufio.NewWriter(output)
	for _, line := range unique {
		fmt.Fprintln(writer, line)
	}
	if err := writer.Flush(); err != nil {
		log.Fatal(err)
	}
}
Advanced Deduplication Scenarios
Remove Duplicate Lines Based on a Column
For CSV or tabular data, you might want to deduplicate based on a specific column:
import csv

def remove_duplicates_by_column(input_file, output_file, column_index):
    seen = set()
    # newline='' lets the csv module handle line endings itself
    with open(input_file, 'r', newline='') as infile, \
         open(output_file, 'w', newline='') as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        for row in reader:
            key = row[column_index]
            if key not in seen:
                seen.add(key)
                writer.writerow(row)

# Remove duplicates based on first column (index 0)
remove_duplicates_by_column('data.csv', 'cleaned.csv', 0)
Remove Near-Duplicates (Fuzzy Matching)
For lines that are similar but not identical:
from difflib import SequenceMatcher

def is_similar(line1, line2, threshold=0.9):
    return SequenceMatcher(None, line1, line2).ratio() > threshold

def remove_near_duplicates(lines, threshold=0.9):
    # Compares each line against every kept line, so runtime grows
    # quadratically with the number of unique lines
    result = []
    for line in lines:
        if not any(is_similar(line, existing, threshold) for existing in result):
            result.append(line)
    return result
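To get a feel for what a threshold of 0.9 means, it helps to look at the raw similarity scores. The sketch below uses `difflib.SequenceMatcher` from the standard library, whose `ratio()` is 2 × matching characters ÷ total characters in both strings; the helper name `similarity` and the sample lines are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return difflib's ratio: 2 * matches / (len(a) + len(b))."""
    return SequenceMatcher(None, a, b).ratio()


near = similarity("hello world", "hello world!")
far = similarity("hello world", "goodbye world")

print(f"{near:.3f}")          # 0.957  (22 matching chars out of 23)
print(near > 0.9, far > 0.9)  # True False
```

Lines that differ by a single trailing character clear the 0.9 threshold, while lines that only share a word do not. Lower thresholds catch more variants but also raise the risk of merging lines that are genuinely different.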
Remove Duplicate Lines with Count
Sometimes you want to know how many times each line appeared:
from collections import Counter

with open('input.txt', 'r') as f:
    lines = f.readlines()

counts = Counter(lines)
for line, count in counts.items():
    print(f"{count}: {line.strip()}")
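If only the lines that actually repeat are of interest, the counts can be filtered and ordered by frequency. This is a small sketch using `Counter.most_common()` on an in-memory sample list (the data is made up); it is roughly what `sort | uniq -c` followed by a numeric sort would show, restricted to counts above one.

```python
from collections import Counter

lines = ["a", "b", "a", "c", "a", "b"]
counts = Counter(lines)

# most_common() orders entries from highest to lowest count.
duplicated = [(line, n) for line, n in counts.most_common() if n > 1]
print(duplicated)  # [('a', 3), ('b', 2)]
```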
Performance Considerations
When processing large files:
| Method | Memory Usage | Speed | Preserves Order |
|---|---|---|---|
| `sort \| uniq` | Low (external sort, spills to disk) | Fast | No |
| `awk '!seen[$0]++'` | Medium (grows with unique lines) | Fast | Yes |
| Python `set()` | High | Very Fast | No |
| Python `dict.fromkeys()` | High | Very Fast | Yes |

For very large files (GBs): Use line-by-line approaches like awk, which never load the whole file at once but still hold every unique line in memory, or process the file in chunks.
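A line-by-line approach can also be written in Python. The sketch below reads one line at a time and remembers a 16-byte hash of each line instead of the line itself, which keeps memory lower when lines are long. The function name `dedupe_stream` is hypothetical, and relying on a hash assumes that collisions are rare enough to ignore for your data.

```python
import hashlib
import io


def dedupe_stream(src, dst):
    """Copy lines from src to dst, skipping lines already seen.

    Stores a 16-byte BLAKE2b digest per unique line rather than the
    full text. Returns the number of duplicate lines skipped.
    """
    seen = set()
    removed = 0
    for line in src:
        # Hash the line without its trailing newline so a final line
        # lacking a newline still matches earlier copies.
        data = line.rstrip("\n").encode("utf-8")
        digest = hashlib.blake2b(data, digest_size=16).digest()
        if digest in seen:
            removed += 1
            continue
        seen.add(digest)
        dst.write(line)
    return removed


# In-memory demo; with real files, pass objects returned by open().
src = io.StringIO("a\nb\na\nc\nb\n")
dst = io.StringIO()
print(dedupe_stream(src, dst))  # 2
print(dst.getvalue())           # a, b, c on separate lines
```

Order of first occurrence is preserved, and memory grows with the number of unique lines rather than with the file size.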
Common Pitfalls
- Whitespace differences: "hello" and "hello " are different lines
- Line ending differences: \n vs \r\n
- Case sensitivity: "Hello" and "hello" are different
- Empty lines: Multiple blank lines may be considered duplicates
- Unicode normalization: Accented characters can have multiple representations
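Most of these pitfalls can be handled by comparing a normalized key instead of the raw line. The following is a minimal sketch under that assumption; `normalize_key` and `dedupe_normalized` are illustrative names, and whether to ignore case or surrounding whitespace depends on your data.

```python
import unicodedata


def normalize_key(line, ignore_case=True):
    """Build a comparison key that smooths over common differences."""
    key = line.rstrip("\r\n")                # \n vs \r\n
    key = key.strip()                        # leading/trailing whitespace
    key = unicodedata.normalize("NFC", key)  # composed vs decomposed accents
    if ignore_case:
        key = key.casefold()
    return key


def dedupe_normalized(lines):
    """Keep the first line for each normalized key, in original order."""
    seen = set()
    result = []
    for line in lines:
        key = normalize_key(line)
        if key not in seen:
            seen.add(key)
            result.append(line)
    return result


# "caf\u00e9" uses a single accented character; "cafe\u0301" uses
# "e" plus a combining accent. Both normalize to the same key.
lines = ["hello\n", "hello \r\n", "Hello\n", "caf\u00e9\n", "cafe\u0301\n"]
print(dedupe_normalized(lines))
```

Only the first "hello" line and the first "café" line are kept. The original text of each kept line is written unchanged; normalization is applied only to the key used for matching.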
When to Use Each Method
| Scenario | Recommended Method |
|---|---|
| Quick one-time cleanup | FreeToolJet Remove Duplicate Lines tool |
| Large files (GBs) | `awk '!seen[$0]++'` or streaming Python |
| Part of a data pipeline | Python script with proper error handling |
| In a text editor | VS Code / Sublime Text / Vim commands |
| Windows without WSL | PowerShell |
| Preserve order | FreeToolJet tool or awk method |
| Case-insensitive | FreeToolJet tool or `sort -f \| uniq -i` |
Related Tools
- Remove Duplicate Lines — Remove duplicates from text
- Word Counter — Count words, characters, and lines
- Text Case Converter — Change text case (affects matching)
- Diff Checker — Compare two texts for differences