IQI® Wissen

Big Data Quality: What’s right, what’s wrong?

Large environments such as the internet reveal human wisdom and creativity. As the number of data sources and data producers grows, the number of differing opinions, synonyms, and inconsistencies usually grows with it. Big data sets are therefore by nature full of differing opinions and inconsistencies. But what is right or wrong from this perspective? Often there is no single truth, for several reasons. For example, data may be used in different contexts for different purposes and may therefore have multiple truths. Moreover, data may refer to different points in time. Last but not least, there are some truths that we may not be aware of yet. When I analyzed the genders of persons documented in Wikipedia, I discovered the values "Nerd", "Puppet", and "Cylon", which were associated with fictional characters from TV shows. We could argue about whether all characters should be classified as male or female. However, reaching a general truth about this is neither possible nor feasible without suppressing the creativity of another group of people. Moreover, enforcing one would harm the community and would most likely amount to censorship.
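To make the Wikipedia observation a little more concrete, here is a minimal Python sketch of how such values could be surfaced without deciding who is right. The sample records, the field name `gender`, the `EXPECTED` set, and the helper functions are all assumptions made up for illustration; this is not the analysis I actually ran.

```python
from collections import Counter

# Hypothetical sample records. The three unusual values are the ones
# mentioned above; the names and counts are invented for illustration.
records = [
    {"name": "Person 1", "gender": "female"},
    {"name": "Person 2", "gender": "male"},
    {"name": "Person 3", "gender": "male"},
    {"name": "Character 1", "gender": "Nerd"},
    {"name": "Character 2", "gender": "Puppet"},
    {"name": "Character 3", "gender": "Cylon"},
]

# What ONE consumer expects to see. This is an assumption tied to a
# specific purpose, not a general truth about the data.
EXPECTED = {"male", "female"}


def profile(rows, field):
    """Count how often each value of `field` occurs, without changing the rows."""
    return Counter(row.get(field) for row in rows)


def unexpected_values(counts, expected):
    """Return the values outside the expected set together with their counts."""
    return {value: n for value, n in counts.items() if value not in expected}


counts = profile(records, "gender")
flagged = unexpected_values(counts, EXPECTED)

for value, n in flagged.items():
    print(f"unexpected value {value!r} occurs {n} time(s)")
```

The point of the sketch is that the unexpected values are only reported, not overwritten or deleted, so the original data stays available for anyone who reads it differently.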

When we analyze big data sets, however, we should be aware of the variety of truths that we will most likely face and take them into account when interpreting the data. It is perfectly fine to apply a specific perspective when consuming data for a particular purpose, but caution is required when applying subjective truths to general big data cleansing. What are your experiences with and opinions about data quality in big data environments?
