Question 1

What is record matching?

Accepted Answer

Record matching is the process of comparing two or more data records to determine whether they refer to the same real-world entity. Techniques range from exact string matching to fuzzy matching and machine learning-based approaches, and each strikes a different balance between precision (few false matches) and recall (few missed matches).
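
To make the precision and recall trade-off concrete, here is a minimal sketch that scores two hypothetical matchers against a hand-labeled set of true matches. The record IDs, the `strict` and `loose` result sets, and the function name are illustrative assumptions, not output from any real system.

```python
def precision_recall(predicted: set, actual: set) -> tuple[float, float]:
    """Precision: share of predicted matches that are correct.
    Recall: share of true matches that were found."""
    true_positives = len(predicted & actual)
    # Convention assumed here: an empty set scores 1.0 rather than dividing by zero.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(actual) if actual else 1.0
    return precision, recall


# Hypothetical ground truth: pairs of record IDs that really are the same entity.
actual = {(1, 2), (3, 4), (5, 6)}

# A strict matcher (for example, exact string equality) finds few pairs, all correct.
strict = {(1, 2)}

# A loose matcher finds every true pair but also proposes two false ones.
loose = {(1, 2), (3, 4), (5, 6), (1, 3), (2, 6)}

strict_scores = precision_recall(strict, actual)
loose_scores = precision_recall(loose, actual)
```

In this toy data the strict matcher has perfect precision but finds only one of three true matches, while the loose matcher finds all three at the cost of two false positives.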

Question 2

What is fuzzy matching?

Accepted Answer

Fuzzy matching compares records that are similar but not identical, which lets it handle typos, abbreviations, nickname variants, and formatting differences. Common algorithms include Levenshtein distance (edit distance), Jaro-Winkler similarity (which gives extra weight to a shared prefix), and phonetic matching (Soundex, Metaphone).
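
As an illustration of the first algorithm in that list, the sketch below computes Levenshtein distance with the standard dynamic-programming approach and scales it to a 0 to 1 similarity. The `similarity` helper and its normalization (trim, lowercase, divide by the longer length) are assumptions made for this example; real systems often normalize differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance scaled to 0..1; 1.0 means identical after trimming and lowercasing."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


typo_score = similarity("Jonathan", "Jonathon")        # one substitution in 8 characters
formatting_score = similarity("Jon Smith", "jon smith ")  # differs only in case and spacing
```

Here `typo_score` is 0.875 and `formatting_score` is 1.0. Note that plain edit distance does not capture nickname variants such as "Bob" and "Robert"; those usually need a lookup table or another technique.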

Question 3

How does ML-based record matching work?

Accepted Answer

Machine learning-based matching trains a model on labeled pairs of records (match/non-match) to learn which field combinations best predict a true match. Because the model weighs all fields together rather than applying one rule at a time, it often handles ambiguous cases better than rule-based approaches, especially at scale.
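
A minimal sketch of that idea, assuming a deliberately simple model: each labeled pair is turned into per-field similarity features, and a small logistic regression is fitted by gradient descent using only the standard library. The records, field names, feature choices, and hyperparameters are all made up for illustration; production systems typically use far more training data and an established ML library.

```python
import math
from difflib import SequenceMatcher


def text_sim(a: str, b: str) -> float:
    """Case-insensitive string similarity in 0..1 from the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def features(a: dict, b: dict) -> list[float]:
    """One similarity feature per field: name, email (exact), company."""
    return [
        text_sim(a["name"], b["name"]),
        1.0 if a["email"].lower() == b["email"].lower() else 0.0,
        text_sim(a["company"], b["company"]),
    ]


def predict(weights, bias, x) -> float:
    """Probability that the pair described by feature vector x is a match."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, epochs=5000, lr=0.5):
    """Fit logistic regression with full-batch gradient descent."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, label in zip(X, y):
            error = predict(weights, bias, x) - label
            for k, xi in enumerate(x):
                grad_w[k] += error * xi
            grad_b += error
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


def rec(name, email, company):
    return {"name": name, "email": email, "company": company}


# Hypothetical labeled pairs: 1 = same person, 0 = different people.
LABELED_PAIRS = [
    (rec("Jon Smith", "jon@acme.example", "Acme Inc"),
     rec("Jonathan Smith", "jon@acme.example", "Acme"), 1),
    (rec("Maria Garcia", "mgarcia@globex.example", "Globex"),
     rec("Maria Garcia", "maria.garcia@globex.example", "Globex Corp"), 1),
    (rec("Wei Chen", "wei@initech.example", "Initech"),
     rec("Wei Chen", "wei@initech.example", "Initech LLC"), 1),
    (rec("Jon Smith", "jon@acme.example", "Acme Inc"),
     rec("Joan Smythe", "joan@umbrella.example", "Umbrella"), 0),
    (rec("Maria Garcia", "mgarcia@globex.example", "Globex"),
     rec("Mario Garza", "mario@hooli.example", "Hooli"), 0),
    (rec("Wei Chen", "wei@initech.example", "Initech"),
     rec("Wendy Cho", "wcho@stark.example", "Stark"), 0),
]

X = [features(a, b) for a, b, _ in LABELED_PAIRS]
y = [label for _, _, label in LABELED_PAIRS]
weights, bias = train(X, y)
```

After training, `predict(weights, bias, features(record_a, record_b))` returns a match probability for a new pair. Notice that name similarity alone would not separate these examples ("Jon Smith" and "Joan Smythe" look alike as strings); the model has to lean on email and company as well, which is the point of learning from all fields together.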

Question 4

What is blocking in record matching?

Accepted Answer

Blocking is a performance optimization that groups records into buckets (by first letter, zip code, company, etc.) and only compares records within the same block. Without blocking, every record must be compared with every other record, which is computationally infeasible at scale: a million records would require roughly 500 billion pairwise comparisons.
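
The sketch below shows both halves of that argument: the number of pairs without blocking, n(n-1)/2, and a simple blocking pass that only yields pairs sharing a key. The sample records and the choice of zip code as the blocking key are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def all_pairs_count(n: int) -> int:
    """Number of unordered pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def candidate_pairs(records, block_key):
    """Yield only the pairs of records that fall into the same block."""
    blocks = defaultdict(list)
    for record in records:
        blocks[block_key(record)].append(record)
    for block in blocks.values():
        yield from combinations(block, 2)


# Hypothetical sample data, blocked by zip code.
records = [
    {"id": 1, "name": "Jon Smith", "zip": "94107"},
    {"id": 2, "name": "Jonathan Smith", "zip": "94107"},
    {"id": 3, "name": "Maria Garcia", "zip": "10001"},
    {"id": 4, "name": "M. Garcia", "zip": "10001"},
    {"id": 5, "name": "Wei Chen", "zip": "10001"},
]

pairs = list(candidate_pairs(records, block_key=lambda r: r["zip"]))
without_blocking = all_pairs_count(len(records))  # 10 pairs
with_blocking = len(pairs)                        # 1 + 3 = 4 pairs
million_record_pairs = all_pairs_count(1_000_000)  # 499,999,500,000 pairs
```

The trade-off is that two records for the same entity are never compared if they land in different blocks, for example when one of them has a mistyped zip code, so the blocking key should be chosen from fields that are usually reliable.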

Question 5

How does Salmon perform record matching?

Accepted Answer

Salmon combines exact, fuzzy, and ML-based matching across multiple fields: name, email, company, title, and social profiles. Cross-source verification against live web data adds a second confirmation layer. Every match carries a confidence score, so teams can set thresholds for automatic merging versus human review.
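
The threshold idea can be sketched generically as follows. This is not Salmon's API: the function name, the return labels, and the 0.9 and 0.6 cutoffs are hypothetical values chosen for illustration, and a team would tune them against its own data.

```python
def route_match(confidence: float,
                auto_merge_at: float = 0.9,
                review_at: float = 0.6) -> str:
    """Decide what to do with a candidate match based on its confidence score.

    Thresholds are illustrative assumptions, not recommended defaults.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= auto_merge_at:
        return "auto_merge"    # confident enough to merge without a person
    if confidence >= review_at:
        return "human_review"  # plausible match, queue it for a reviewer
    return "no_match"          # treat the records as separate entities


decisions = [route_match(score) for score in (0.97, 0.72, 0.31)]
```

Raising `auto_merge_at` reduces wrong merges at the cost of a longer review queue; lowering `review_at` catches more borderline matches but gives reviewers more non-matches to reject.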
