How Duplicate Detection Works
Sorty identifies duplicate files using content-based hashing (SHA-256), not just filename matching. This means:
Files with identical content are grouped, regardless of name
Renamed files are detected as duplicates
Files in different locations are found
Safe deletion with recovery options
Duplicates are found by computing a cryptographic hash of file contents, so files are grouped only when their bytes match exactly (SHA-256 collisions are astronomically unlikely).
Detection Methods
Sorty offers three comparison methods:
Most Accurate
Computes a SHA-256 hash of the entire file contents (requires CryptoKit):
import CryptoKit
import Foundation

public static func computeSHA256(for url: URL) -> String? {
    guard let data = try? Data(contentsOf: url) else { return nil }
    let hash = SHA256.hash(data: data)
    return hash.map { String(format: "%02x", $0) }.joined()
}
Pros:
100% accurate
Detects renamed files
Cryptographically secure
Cons:
Slower for large files
CPU-intensive
Good Balance
Groups by filename and size:
let key = "\(file.name)_\(file.size)"
Pros:
Very fast
Good for large batches
Low CPU usage
Cons:
May miss renamed duplicates
False positives possible (same name + size, different content)
Strictest
Groups by filename, size, and modification date:
let timestamp = file.modificationDate?.timeIntervalSince1970 ?? 0
let key = "\(file.name)_\(file.size)_\(Int(timestamp))"
Pros:
Reduces false positives from Fast mode
Still very quick
Cons:
Misses duplicates with different timestamps
Not useful for files copied at different times
Use Exact mode for critical files. Use Fast mode for quick scans of large directories.
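The three grouping strategies above can be sketched side by side. This is a minimal illustration, not Sorty's actual code: the `FileItem` here is a stripped-down stand-in, and Exact mode is represented by a placeholder string where the real scanner would use the SHA-256 of the file contents.

```swift
import Foundation

// Stripped-down FileItem for illustration; the real type has more fields.
struct FileItem {
    let name: String
    let size: Int64
    let modificationDate: Date?
}

enum ComparisonMethod {
    case exact, fast, metadata
}

// Builds the grouping key each mode uses. Two files are considered
// duplicates when their keys collide.
func groupingKey(for file: FileItem, method: ComparisonMethod) -> String {
    switch method {
    case .exact:
        // Placeholder: the real scanner hashes the file contents here.
        return "sha256-of-contents"
    case .fast:
        return "\(file.name)_\(file.size)"
    case .metadata:
        let timestamp = file.modificationDate?.timeIntervalSince1970 ?? 0
        return "\(file.name)_\(file.size)_\(Int(timestamp))"
    }
}
```

Fast mode's key shows why false positives are possible: two different files named photo.jpg with identical sizes collide even though their contents differ.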
Duplicate Groups
Files are grouped by hash:
public struct DuplicateGroup: Identifiable {
    public let id: UUID
    public let hash: String
    public let files: [FileItem]
    public let totalSize: Int64
    public let potentialSavings: Int64 // totalSize minus one kept copy

    public var duplicateCount: Int {
        max(0, files.count - 1)
    }
}
Example Group
Group: 3 files (2 duplicates)
Hash: a7f3b9c8...
Total Size: 15.3 MB
Potential Savings: 10.2 MB (keeping 1, removing 2)
Files:
1. /Users/me/Downloads/photo.jpg (5.1 MB) ← Oldest
2. /Users/me/Desktop/photo.jpg (5.1 MB)
3. /Users/me/Photos/photo.jpg (5.1 MB)
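The arithmetic behind a group like this is straightforward; `DuplicateGroup` stores the results precomputed. A sketch assuming three identical 5.1 MB copies:

```swift
import Foundation

// Three byte-identical copies of the same photo, sizes in bytes.
let fileSizes: [Int64] = [5_100_000, 5_100_000, 5_100_000]

let totalSize = fileSizes.reduce(0, +)          // all copies combined
let potentialSavings = totalSize - fileSizes[0] // keep one, delete the rest
let duplicateCount = max(0, fileSizes.count - 1)

print(totalSize, potentialSavings, duplicateCount) // 15300000 10200000 2
```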
Semantic Duplicates
In addition to exact matches, Sorty can detect semantic duplicates (similar but not identical):
public struct SemanticDuplicateGroup: Identifiable {
    public let files: [FileItem]
    public let similarity: Double // 0.0-1.0
    public let groupType: GroupType
    public let recommendation: DuplicateRecommendation
}

public enum GroupType: String {
    case nearDuplicate = "Near Duplicate"
    case versionedFile = "Versioned File"
    case resizedImage = "Resized Image"
    case reencoded = "Re-encoded"
}
Semantic Detection Methods
Resized Image
Detects images that are the same picture at different resolutions:
Compare aspect ratios
Check EXIF similarity
Visual similarity (if vision mode enabled)
Example:
photo_original.jpg (4032x3024)
photo_thumbnail.jpg (800x600)
Similarity: 95%
Versioned File
Identifies file versions by name patterns:
file_v1.pdf, file_v2.pdf
report_draft.docx, report_final.docx
design_2025-01-01.psd, design_2025-01-15.psd
Recommendation: Keep newest or largest
Near Duplicate
Finds files with minor differences:
Slight edits
Cropped images
Compressed versions
Requires manual review.
Semantic Similarity Threshold
Configure in Settings → Duplicates:
public var semanticSimilarityThreshold: Double = 0.85 // 85% similar
Lower threshold = more matches (higher false positive rate). Higher threshold = fewer matches (more conservative).
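The scan code later in this page reads the threshold through `normalizedSemanticSimilarityThreshold`, while the stored setting is an integer from 0 to 100. A sketch of how that mapping could work; the struct name and the clamping are assumptions for illustration:

```swift
// Illustrative settings type; the real DuplicateSettings has more fields.
struct DuplicateSettingsSketch {
    var semanticSimilarityThreshold: Int = 85 // 0-100, as stored in settings

    // Map the stored 0-100 value onto the 0.0-1.0 similarity scale.
    // Clamping out-of-range values is an assumption, not confirmed behavior.
    var normalizedSemanticSimilarityThreshold: Double {
        Double(min(max(semanticSimilarityThreshold, 0), 100)) / 100.0
    }
}

let settings = DuplicateSettingsSketch()
// A pair is reported only when its similarity clears the threshold:
let similarity = 0.92
let isMatch = similarity >= settings.normalizedSemanticSimilarityThreshold
```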
Unified Duplicate Groups
Exact and semantic duplicates are presented together:
public enum UnifiedDuplicateGroup: Identifiable {
    case exact(DuplicateGroup)
    case semantic(SemanticDuplicateGroup)

    public var confidenceLevel: ConfidenceLevel {
        switch self {
        case .exact:
            return .high
        case .semantic(let group):
            if group.similarity >= 0.98 { return .high }
            else if group.similarity >= 0.90 { return .medium }
            else { return .low }
        }
    }
}
public enum ConfidenceLevel: String {
    case high = "Safe to Merge"
    case medium = "Review Suggested"
    case low = "Manual Review"
}
Safe Deletion
When enabled (recommended), “deleted” duplicates aren’t immediately removed:
Mark for Deletion
Files are flagged but not deleted from disk.
Track in History
Deletion is recorded in organization history.
Restore if Needed
Go to History → find the cleanup session → click Restore.
Confirm Deletion
Only after confirmation are files permanently removed.
Disabling Safe Deletion means files are immediately sent to Trash and cannot be recovered through Sorty.
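The four-step flow above is essentially a small state machine. A hypothetical sketch; the type names are illustrative, and the real implementation tracks this through organization history records rather than an explicit enum:

```swift
import Foundation

// Hypothetical states a "deleted" duplicate passes through under Safe Deletion.
enum DeletionState {
    case markedForDeletion(Date) // step 1: flagged, file still on disk
    case restored                // step 3: recovered via History → Restore
    case confirmed               // step 4: permanently removed
}

struct PendingDeletion {
    let path: String
    var state: DeletionState
}

var pending = PendingDeletion(path: "/Users/me/Desktop/photo.jpg",
                              state: .markedForDeletion(Date()))
// Nothing touches the disk until the user confirms:
pending.state = .confirmed
```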
Bulk Operations
Quick actions for managing duplicates:
Delete All (Keep Newest)
Keeps the most recently modified file:
let newest = group.files.max { ($0.modificationDate ?? .distantPast) < ($1.modificationDate ?? .distantPast) }
let toDelete = group.files.filter { $0.id != newest?.id }
Delete All (Keep Oldest)
Keeps the original (oldest) file:
let oldest = group.files.min { ($0.creationDate ?? .distantFuture) < ($1.creationDate ?? .distantFuture) }
let toDelete = group.files.filter { $0.id != oldest?.id }
Delete All (Keep Largest)
Keeps the file with the largest size (e.g., highest quality image):
let largest = group.files.max { $0.size < $1.size }
let toDelete = group.files.filter { $0.id != largest?.id }
Custom Selection
Manually select which files to keep/delete:
Review each duplicate group
Select files to delete (checkboxes)
Click Delete Selected
Duplicate Detection Manager
Manages the scanning process:
@MainActor
public class DuplicateDetectionManager: ObservableObject {
    @Published public var state: DuplicateScanState = .idle
    @Published public var duplicateGroups: [DuplicateGroup] = []
    @Published public var semanticGroups: [SemanticDuplicateGroup] = []
    @Published public var scanProgress: Double = 0

    public var totalDuplicates: Int {
        duplicateGroups.reduce(0) { $0 + $1.duplicateCount }
    }

    public var potentialSavings: Int64 {
        duplicateGroups.reduce(0) { $0 + $1.potentialSavings }
    }
}
Scan Process
Preparing
Initialize scan state, clear previous results.
Computing Hashes
Calculate SHA-256 for each file:
for i in 0..<files.count {
    if files[i].sha256Hash == nil {
        files[i].sha256Hash = HashUtility.computeSHA256(
            for: URL(fileURLWithPath: files[i].path)
        )
    }
    scanProgress = Double(i + 1) / Double(files.count)
}
Grouping
Group files by hash:
var hashGroups: [String: [FileItem]] = [:]
for file in files {
    guard let hash = file.sha256Hash else { continue }
    hashGroups[hash, default: []].append(file)
}

let duplicates = hashGroups
    .filter { $0.value.count > 1 }
    .map { DuplicateGroup(hash: $0.key, files: $0.value) }
    .sorted { $0.potentialSavings > $1.potentialSavings }
Semantic Analysis (Optional)
If enabled, run semantic duplicate detection:
if settings.includeSemanticDuplicates {
    let semanticDetector = SemanticDuplicateDetector(
        similarityThreshold: settings.normalizedSemanticSimilarityThreshold
    )
    semanticGroups = await semanticDetector.findSemanticDuplicates(in: files)
}
Complete
Update state, display results.
Scan Settings
public struct DuplicateSettings: Codable {
    public var comparisonMethod: ComparisonMethod = .exact
    public var includeSemanticDuplicates: Bool = false
    public var semanticSimilarityThreshold: Int = 85 // 0-100
    public var safeDeletion: Bool = true
    public var scanHiddenFiles: Bool = false
}

public enum ComparisonMethod: String, Codable {
    case exact = "Exact (SHA-256)"
    case fast = "Fast (Name + Size)"
    case metadata = "Metadata (Name + Size + Date)"
}
Hash Caching
Hashes are cached in FileItem.sha256Hash:
if files[i].sha256Hash == nil {
    files[i].sha256Hash = computeHash()
}
Re-scanning the same directory uses cached hashes, making subsequent scans much faster.
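One caveat with any hash cache is staleness: a cached hash is only valid while the file is unchanged. A sketch of a cache keyed by path plus modification time, so an edited file automatically misses the cache; the `HashCache` type and key format are illustrative assumptions, not Sorty's actual implementation:

```swift
import Foundation

// Illustrative cache: the key combines path and modification time, so a
// changed file gets a fresh hash while unchanged files reuse the cached one.
struct HashCache {
    private var storage: [String: String] = [:] // key -> SHA-256 hex

    private func key(path: String, modified: Date?) -> String {
        "\(path)#\(Int(modified?.timeIntervalSince1970 ?? 0))"
    }

    mutating func hash(path: String, modified: Date?,
                       compute: () -> String) -> String {
        let k = key(path: path, modified: modified)
        if let cached = storage[k] { return cached }
        let fresh = compute()
        storage[k] = fresh
        return fresh
    }
}

var cache = HashCache()
var computations = 0
let h1 = cache.hash(path: "/tmp/a.txt", modified: nil) { computations += 1; return "abc" }
let h2 = cache.hash(path: "/tmp/a.txt", modified: nil) { computations += 1; return "abc" }
// The second lookup hits the cache; the hash was computed once.
```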
Incremental Progress
UI updates are yielded periodically:
if i % 10 == 0 {
    await Task.yield() // Let the UI update
}
Cancellation Support
Scans can be cancelled mid-process:
if Task.isCancelled {
    isScanning = false
    state = .idle
    return
}
Potential Savings Display
Formatted savings with human-readable units:
public var formattedSavings: String {
    ByteCountFormatter.string(fromByteCount: potentialSavings, countStyle: .file)
}
// Examples:
// 1.5 GB
// 245.3 MB
// 12.8 KB
Integration with Organization
Duplicate detection runs automatically during organization:
private func duplicateDetectionPhase(files: [FileItem]) async throws -> ([FileItem], String) {
    updateProgress(0.21, stage: "Checking for duplicates...")
    let detector = DuplicateDetector()
    var updatedFiles = files

    if updatedFiles.contains(where: { $0.sha256Hash == nil }) {
        await detector.computeHashes(for: &updatedFiles)
    }

    let duplicates = await detector.findDuplicates(in: updatedFiles)
    await MainActor.run {
        self.detectedDuplicates = duplicates
    }

    if aiConfig?.detectDuplicates ?? true {
        return (updatedFiles, PromptContextHelper.duplicateContext(from: duplicates))
    }
    return (updatedFiles, "")
}
Duplicate Context in AI Prompt
Duplicates are included in the AI’s organization context:
DUPLICATE FILES DETECTED:
- 5 groups of duplicate files found
- Total potential savings: 2.3 GB
Group 1: photo.jpg (3 copies)
/Downloads/photo.jpg
/Desktop/photo.jpg
/Photos/photo.jpg
Consider organizing these files together and removing duplicates.
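A context block like the one above could be assembled as follows. This is a sketch only: the real builder is `PromptContextHelper.duplicateContext`, whose exact output format may differ, and the `Group` type here is a minimal stand-in.

```swift
// Minimal stand-in for a duplicate group, for illustration only.
struct Group {
    let name: String
    let paths: [String]
}

// Assemble a plain-text duplicate summary for the AI organization prompt.
func duplicateContext(groups: [Group], savings: String) -> String {
    var lines = ["DUPLICATE FILES DETECTED:",
                 "- \(groups.count) groups of duplicate files found",
                 "- Total potential savings: \(savings)"]
    for (i, g) in groups.enumerated() {
        lines.append("Group \(i + 1): \(g.name) (\(g.paths.count) copies)")
        lines.append(contentsOf: g.paths.map { "  \($0)" })
    }
    lines.append("Consider organizing these files together and removing duplicates.")
    return lines.joined(separator: "\n")
}
```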
Duplicate Handling Strategies
Keep in Place
Leave all copies where they are, flag for manual review.
Merge to One Location
Move all duplicates to a single folder:
Duplicates/
├── photo.jpg (original)
├── photo (1).jpg ← marked for deletion
└── photo (2).jpg ← marked for deletion
Smart Keep
Automatically keep the “best” copy:
Highest quality (largest size)
Most recently modified
In preferred location (e.g., organized folder vs. Downloads)
Archive Duplicates
Move duplicates to Archives/Duplicates-[Date]/ for review.
CLI Commands
# Scan for duplicates
sorty duplicates /path/to/scan
# Auto-start scan
sorty duplicates /path/to/scan --auto
# Use specific comparison method
sorty duplicates /path/to/scan --method exact
sorty duplicates /path/to/scan --method fast
Deeplinks
sorty://duplicates
Open the duplicates view
sorty://duplicates?path=/Users/me/Downloads
Scan a specific path
sorty://duplicates?path=/Users/me/Downloads&autostart=true
Auto-start the scan
Related
Workspace Health: Monitor directory health and clutter
File Organization: AI-powered intelligent organization
The Learnings: Learn from your organization habits