We all live with data. Regardless of their origin, format or meaning, data depict the world around us. From the accounting department's spreadsheets to the last text message you received, data take many forms. But can we understand a single datum on its own? Without a name or a description, it would be difficult to grasp the meaning of the data we use every day. To understand data, we need other data: metadata. What is metadata? What is it used for? How does it differ from regular data?
What is metadata?
Metadata is all the data we use to describe and understand other data. Like data, it comes in various forms. For example, the creator's identity, the time of the last update or the names of contributors are pieces of information we use to qualify data. Many others exist; in fact, it would be impossible to list every possible type of metadata.
However, some standardization initiatives have emerged. The Dublin Core defines a set of metadata elements for describing web pages: subject, contributor, source, and so on. Standardized as ISO 15836, it lists 15 elements for describing web resources efficiently. But while the Dublin Core qualifies web pages well, it would poorly describe other kinds of data, such as a sales database.
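To make the idea of a fixed element set concrete, here is a minimal sketch that stores a Dublin Core record as a plain Python dictionary and rejects any field outside the 15 elements. The element names are those of the Dublin Core Metadata Element Set; the helper `make_record` and all example values (author, date, sales fields) are hypothetical, and the sketch simplifies by allowing only one value per element, whereas Dublin Core elements may be repeated.

```python
# A minimal sketch: a Dublin Core record as a plain dictionary.
# The 15 element names come from the Dublin Core Metadata Element Set;
# make_record and every example value below are hypothetical.
# Simplification: one value per element (Dublin Core allows repeats).

DUBLIN_CORE_ELEMENTS = (
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
)


def make_record(**fields: str) -> dict[str, str]:
    """Build a record, rejecting any field outside the 15 Dublin Core elements."""
    unknown = set(fields) - set(DUBLIN_CORE_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return dict(fields)


# Describing a web page fits the element set well.
page = make_record(
    title="What is metadata?",
    creator="Jane Doe",   # hypothetical author
    date="2020-01-15",    # hypothetical date
    format="text/html",
    language="en",
)

# The natural metadata of a sales table does not map onto the element set.
try:
    make_record(title="Q1 sales", currency="EUR", unit_price_column="price")
except ValueError as err:
    print(err)  # → not Dublin Core elements: ['currency', 'unit_price_column']
```

The failing call illustrates the point of the paragraph: a generic standard covers web pages, but domain-specific metadata such as a currency or a column's meaning needs its own reference.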
Each kind of data has its own metadata. To understand a text message, we need to know its author and recipient. For a temperature reading, the location, time and sensor model are relevant. Ultimately, data users need to build their own metadata references.
What are the differences between metadata and data?
No legal definition separates the two concepts; in practice, purpose determines the classification. But do we even need this split? Very often, metadata can be considered full-fledged data, and it can sometimes deliver more relevant knowledge than regular data. In the case of phone calls, operators traditionally consider the content of the conversation as data, while the names of the parties, the time of the call and its duration are metadata. Yet gathering the metadata of all of a person's calls and analyzing it can reveal accurate information about that person's close relationships. In such an analysis, metadata is processed no differently from any other data.
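The phone-call example can be sketched in a few lines of Python. The records below are hypothetical and contain only metadata (parties, start time, duration), never the conversation itself. Ranking contacts by total call time is an assumed, deliberately simple proxy for closeness, not an established method; the point is only that ordinary aggregation over metadata already says something about a person's relationships.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    """Metadata of a single call: no conversation content is stored."""
    caller: str
    callee: str
    start: datetime
    duration_s: int


def closest_contacts(records: list[CallRecord], person: str,
                     top: int = 3) -> list[tuple[str, int]]:
    """Rank a person's contacts by total call time, using metadata only.

    Total duration as a proxy for closeness is an illustrative assumption.
    """
    total: Counter[str] = Counter()
    for r in records:
        if r.caller == person:
            total[r.callee] += r.duration_s
        elif r.callee == person:
            total[r.caller] += r.duration_s
    return total.most_common(top)


# Hypothetical call log.
calls = [
    CallRecord("alice", "bob", datetime(2020, 3, 1, 21, 5), 1800),
    CallRecord("carol", "alice", datetime(2020, 3, 2, 8, 30), 120),
    CallRecord("alice", "bob", datetime(2020, 3, 3, 22, 10), 2400),
    CallRecord("alice", "dave", datetime(2020, 3, 4, 12, 0), 300),
]

print(closest_contacts(calls, "alice"))
# → [('bob', 4200), ('dave', 300), ('carol', 120)]
```

Nothing in `closest_contacts` treats the fields as "metadata": they are counted and summed like any other data, which is exactly the point made above.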
Analyzing data without its metadata would be a mistake. The real challenge is to retrieve and organize this ancillary data in a clear, structured way. This step begins at the very start of the analysis process, even before data profiling. Master Data Management practices cover this area. However, metadata should not be an end in itself: its purpose is to provide a better understanding of data and to enable relevant, accurate analysis.