Don't Fix Bad Data Quality, Do This Instead


People don’t know what they mean when they talk about data quality.

Photo by No Revisions on Unsplash

A few years ago, our data platform team set out to pinpoint the primary concerns of our data users. We ran a survey among the people interacting with our data platform and, unsurprisingly, the main concern they highlighted was data quality.

Our initial response, characteristic of our engineering mindset, was to build data quality tooling. We introduced an internal tool named Contessa. Despite being somewhat cumbersome and requiring significant manual configuration, Contessa made it possible to check the standard dimensions of data quality: consistency, timeliness, validity, uniqueness, accuracy and completeness. After running the tool for a couple of months with hundreds of data quality checks, we concluded that:

  • Data quality checks often helped data users discover sooner that the data was compromised and could not be relied upon.
  • Despite the frequent execution of data quality checks, there was no noticeable improvement in the subjective perception of data quality.
  • For a significant portion of issues, notably those identified by automated data quality checks such as consistency or validity, no corrective action was ever taken.
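
To make the kinds of checks mentioned above more concrete, here is a minimal Python sketch of four of the listed dimensions (completeness, uniqueness, validity and timeliness). It is not how Contessa was implemented; the record layout, field names and thresholds are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical order records; the field names are assumptions for illustration.
orders = [
    {"order_id": 1, "price": 120.0, "created_at": datetime(2024, 1, 1, 10, 0)},
    {"order_id": 2, "price": None, "created_at": datetime(2024, 1, 1, 11, 0)},
    {"order_id": 2, "price": -5.0, "created_at": datetime(2024, 1, 1, 12, 0)},
]


def completeness(rows, field):
    """Share of rows where `field` is not None."""
    return sum(r[field] is not None for r in rows) / len(rows)


def uniqueness(rows, field):
    """Number of distinct values of `field` divided by the number of rows."""
    return len({r[field] for r in rows}) / len(rows)


def validity(rows, field, is_valid):
    """Share of non-null values of `field` that pass the `is_valid` predicate."""
    values = [r[field] for r in rows if r[field] is not None]
    return sum(is_valid(v) for v in values) / len(values) if values else 1.0


def timeliness(rows, field, now, max_age):
    """True if the newest timestamp in `field` is no older than `max_age`."""
    return now - max(r[field] for r in rows) <= max_age


report = {
    "completeness(price)": completeness(orders, "price"),
    "uniqueness(order_id)": uniqueness(orders, "order_id"),
    "validity(price >= 0)": validity(orders, "price", lambda p: p >= 0),
    "timeliness(created_at)": timeliness(
        orders, "created_at",
        now=datetime(2024, 1, 1, 18, 0),  # fixed "now" keeps the example deterministic
        max_age=timedelta(hours=24),
    ),
}

for check, result in report.items():
    print(check, result)
```

Checks like these can flag a missing price or a duplicated order ID, but they say nothing about whether the dataset contains what a particular user actually needs.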

Surveys and objective measurements are useful tools, but nothing can replace a conversation over coffee and cake, as Caroline Carruthers writes in her book, “The Chief Data Officer’s Playbook”. Indeed, I recommend this to anybody, as one-on-one conversations helped us uncover another crucial angle of the situation. Some of these conversations unfolded as follows:

“Hey, you say that data quality is poor. What do you mean by that?”

#1 Pricing business analyst: “We’re working on setting the price for ancillary product X. In the dataset we use, we’re missing data on the actual revenue from product X per order. We have this dataset, but it contains only the expected value of the revenue from X at the time of purchase. We can also see the actual revenue per product, but not at order granularity.”

