Your CRM has 800,000 contacts. Your sales team touches 5% of them. The rest is a slowly decaying swamp of stale records, fat-fingered entries, and the occasional Russian bot that hit your web forms three years ago.
I'm not exaggerating. In a recent conversation, the head of data engineering at a global biotech company walked me through his reality: 800,000 contacts, 200 salespeople, and data flowing in from somewhere between 10 and 20 unique sources, including web forms, marketing automation, CMS integrations, and third-party vendors. A decade of accumulation with a team a third to half the size it should be.
His CRM is held together by five business analysts and three devs. The system needs two to three times that. So they triage. And data quality sits at the bottom of the priority list, underneath feature releases, system migrations, and the latest AI initiative that just got greenlit because someone in leadership heard the word "copilot."
The Perception Threshold
Here's the part nobody talks about. The data doesn't have to be perfect. It has to be good enough that your sales team doesn't revolt.
This same leader described his threshold: keep the error rate under 2-3%, and nobody notices. But the moment it drifts past 5%, salespeople start screaming bloody murder. They don't say "5% of my data is wrong." They say "the whole system is garbage." They discount the 95% that's right along with the 5% that isn't.
That's the psychology of bad data. Trust doesn't degrade linearly with the error rate. It collapses all at once. And once you lose the sales team, you're not getting them back with a dashboard or a memo.
So the real job of a data team at scale isn't achieving perfection — it's managing perceptions. Keep the error rate invisible and everyone thinks the CRM works. Let it slip and suddenly you're defending your entire system's existence at a quarterly review.
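To make the threshold concrete, here is a minimal sketch of how a team might estimate where it stands from a hand-audited sample. The function names and zone labels are made up for illustration, the 3% and 5% cutoffs come from the numbers quoted above, and the Wilson score interval is one common choice for a sample proportion, not something the biotech team described using.

```python
import math

def audit_error_rate(sample_size: int, errors_found: int, z: float = 1.96):
    """Estimate the error rate from an audited sample.

    Returns (point_estimate, lower, upper), where lower/upper come from
    the Wilson score interval (z = 1.96 is roughly 95% confidence).
    """
    if sample_size <= 0 or not 0 <= errors_found <= sample_size:
        raise ValueError("need sample_size > 0 and 0 <= errors_found <= sample_size")
    p = errors_found / sample_size
    denom = 1 + z**2 / sample_size
    centre = (p + z**2 / (2 * sample_size)) / denom
    half = z * math.sqrt(p * (1 - p) / sample_size
                         + z**2 / (4 * sample_size**2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)

def trust_zone(rate: float, quiet: float = 0.03, revolt: float = 0.05) -> str:
    """Map an error rate to a zone. Labels are hypothetical; the cutoffs
    are the 2-3% and 5% figures from the conversation above."""
    if rate <= quiet:
        return "invisible"
    if rate <= revolt:
        return "drifting"
    return "revolt"

# Example: audit 500 randomly chosen records, find 10 bad ones.
estimate, low, high = audit_error_rate(500, 10)
# Classify on the upper bound to stay on the cautious side.
zone = trust_zone(high)
```

Classifying on the upper bound rather than the point estimate is a deliberate choice: a 2% sample result from 500 records is still compatible with a true rate above 3%, so a small audit can look healthier than the CRM really is.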
What's Actually in There
When you look under the hood at a CRM that's been accumulating data for a decade, the archaeology is bleak.
There are contacts that were created when a first-year postdoc filled out a form in 2014. Nobody's talked to them since. The salesperson who owned that record left five years ago. But the contact is still there, sitting in a segment, maybe even getting drip emails.
There are bot submissions that slipped past form validation — Russian bots, spam farms, whatever automated thing decided to fill out your "request a demo" page at 3am. The team finds these periodically and laughs about it, but nobody has bandwidth to go hunt them systematically.
There's data from third-party vendors that arrived pre-stale. One company told me their Dun & Bradstreet data was nine months out of date — organizations listed as active that had already shut down. Another invested millions in Sales Navigator over several years and discovered the fundamental flaw: nobody puts their work email into LinkedIn. The data looked rich. It was practically useless for outreach at scale.
And then there's the stuff that's just hard to reconcile. Your ERP system uses one naming convention. Your CRM uses another. Marketing automation uses whatever the prospect typed into a form. A single university shows up six different ways across your systems. Finance calls the customer an account. Sales calls the customer a person. The CRM says both, neither, and twelve duplicates.
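As a rough illustration of the "one university, six spellings" problem, the sketch below collapses name variants into a shared key. The abbreviation table, the stopword list, and the "University of Northfield" example are all made up for this post, and real reconciliation across an ERP, a CRM, and marketing automation needs far more than this.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical lookup tables; a real list would be much longer.
ABBREVIATIONS = {"univ": "university", "inst": "institute", "dept": "department"}
STOPWORDS = {"the", "of", "at", "and"}

def org_key(name: str) -> str:
    """Build an order-insensitive key from an organization name."""
    # Strip accents so "Université" and "Universite" compare equal.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase and turn punctuation into spaces.
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    tokens = [tok for tok in tokens if tok not in STOPWORDS]
    # Sorting the unique tokens ignores word order.
    return " ".join(sorted(set(tokens)))

def group_variants(names):
    """Group raw names that share the same key."""
    groups = defaultdict(list)
    for name in names:
        groups[org_key(name)].append(name)
    return dict(groups)

variants = [
    "University of Northfield",
    "Univ. of Northfield",
    "Northfield University",
    "NORTHFIELD UNIV",
    "The University of Northfield",
    "northfield, university of",
]
groups = group_variants(variants)
```

All six variants land under the single key `northfield university`. The same trick will also merge institutions that really are different (a "University of X" and an "X University" are not always the same place), so treat the groups as candidates for review, not as automatic merges.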
Why Cleanup Doesn't Stick
The instinct is always a cleanup project. Deduplicate the database. Hire a contractor. Run a batch enrichment job. And it works — for about three months. Then new records start flowing in, old records start decaying again, and you're back where you started.
The biotech leader I spoke with put it perfectly: his users keep asking why they're still seeing garbage in the system. His answer is that the salesperson didn't pay enough attention when they created the record. The system doesn't have a self-correcting mechanism. It just accumulates entropy.
This is why batch enrichment is a treadmill. You're not solving the problem. You're resetting a timer.
What Actually Works
What works is verification that runs continuously — not on a schedule, not once a quarter, not as a project. The same way your reps instinctively Google a contact before picking up the phone, except automated and running across your entire CRM.
That's what we built Salmon to do. We check records against the live web as they're accessed. Company still active? Person still at that job? Email domain still matches the employer? We surface the signals that tell you whether a record is current or stale — and we do it in real time, not six months after the data went bad.
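To show the shape of those three checks, here is a simplified sketch. It is not Salmon's implementation: the `Contact` and `Observation` types and the signal names are hypothetical, and the hard part (actually looking a person or company up on the live web) is assumed to have already happened and is passed in as an `Observation`.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Contact:
    email: str
    employer_domain: str  # domain the CRM has on file for the employer

@dataclass
class Observation:
    """What a (hypothetical) live lookup found. None means 'unknown'."""
    company_active: Optional[bool]
    current_employer_domain: Optional[str]

def _norm(domain: str) -> str:
    return domain.strip().lower()

def _email_domain(email: str) -> str:
    return _norm(email.rsplit("@", 1)[-1])

def _same_org(domain: str, employer: str) -> bool:
    """True if domain equals the employer domain or is a subdomain of it."""
    domain, employer = _norm(domain), _norm(employer)
    return domain == employer or domain.endswith("." + employer)

def stale_signals(contact: Contact, seen: Observation) -> List[str]:
    """Return the reasons a record looks stale; an empty list means none found."""
    signals = []
    # Company still active?
    if seen.company_active is False:
        signals.append("company_inactive")
    # Person still at that job?
    if (seen.current_employer_domain is not None
            and not _same_org(seen.current_employer_domain, contact.employer_domain)):
        signals.append("person_moved")
    # Email domain still matches the employer?
    if not _same_org(_email_domain(contact.email), contact.employer_domain):
        signals.append("email_domain_mismatch")
    return signals

fresh = stale_signals(Contact("jane@lab.example.edu", "example.edu"),
                      Observation(True, "example.edu"))
stale = stale_signals(Contact("jane@gmail.com", "example.edu"),
                      Observation(False, "other.org"))
```

Unknown observations (`None`) deliberately produce no signal: a failed lookup says nothing about whether the record is good, so it shouldn't count against the record.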
The goal isn't perfect data. The goal is staying below that 2-3% threshold where your sales team trusts the system and spends their time selling instead of Googling.
If your CRM has more contacts than your team can realistically maintain, we should talk. Get in touch.