Data quality has been widely discussed over the past year. The growing adoption of data contracts, data products, and data observability tools shows data practitioners’ commitment to providing high-quality data to their users. We all love to see this!
One essential building block of data solutions is the data test. It is one of the most fundamental and practical ways to validate data quality, and it is explicitly or implicitly embedded in many data solutions.
While data testing has yielded significant benefits for data teams, it also raises the question of how to maximize its value, because having more tests doesn’t necessarily mean having higher data quality. In this article, I want to show you some approaches to designing data tests. Hopefully, they can shed some light on that question.
It’s worth noting that these approaches are meant to be combined; find the balance that works best for you.
Quality > Quantity
I’m one of those people who love creating tests because they give me more confidence in my solutions. With a background in software engineering, I once lived by the motto “The more tests, the merrier”. I was always excited about data frameworks that made data tests easy to create.
However, I underestimated the side effects of having an excessive number of data tests. (Is there even a side effect? YES!) Let’s first understand the distinction between data tests and unit tests (i.e. logic tests). In short, a unit test is meant to validate the correctness of the code logic we’ve written. The more unit tests we have, the more confident we are in handling edge cases. A data test, however, goes beyond the code logic: it also examines the quality of the source data, data pipeline configurations, upstream dependencies, and so on. The possible metrics are endless and can be overwhelming. It’s tempting to create numerous tests just in case, but they don’t always bring value and might introduce unnecessary noise. For example, let’s face…