Metadata is Not Neutral

As you begin to consider how metadata is relevant to you, and how learning about it might change some of your practices, it’s important to remember that metadata is not neutral. Human beings are unavoidably biased, and bias creeps into every metadata decision. To complicate things, the metadata standards we use reflect the worldviews and value systems of the people who created them.

The formalized metadata that we have today started with library cataloguing systems, which were created by American and European men and have racism, misogyny, and all the other -isms that do not serve us embedded in them.

For example, in the Dewey Decimal system, the most widely used library classification system, the 200s are the class for religion, but nearly everything from 200 to 289 is devoted to Christianity, with only 290-299 reserved for all of the world’s other religions. In addition, Indigenous creation stories are shelved with fairy tales. These are only two examples of problematic categorization. There is a lot of work happening in libraries to dismantle these structural biases, but it is time-consuming and difficult work. Going forward, inclusive metadata requires inclusion.

[Image: library card catalogue]

In contemporary metadata for digital media and environments, there are many metadata schemas and encoding systems to meet different needs. Some have become formal international standards, governed by standards organizations, and some are created and maintained by other organizations of all sizes. The values and intentions of those who create metadata systems are embedded in those systems.

Dublin Core is one of the most ubiquitous metadata frameworks. It originated at a workshop on metadata semantics held in Dublin, Ohio, in March 1995. At this event, sponsored by OCLC and the National Center for Supercomputing Applications, more than 50 people discussed how a core set of semantics for web-based resources would be extremely useful for categorizing the web for easier search and retrieval.

[Image: letters spelling S E O]

Schema.org, an ostensibly open vocabulary for web encoding, was founded in 2011 by Google, Microsoft, Yahoo, and Yandex (the dominant search engine in Russia) in their effort to create an entirely machine-readable web. It is now managed and primarily maintained by Google.

Schema.org was created for SEO, e-commerce, and web crawling, and it is the foundation of Google’s ability to monetize search behaviour. It helps Google maintain its Knowledge Graph, which launched in 2012 and by May 2020 had grown to 500 billion facts about 5 billion entities with linked relationships.

The flip side of all the metadata that powers the internet and the electronic resources we use is that those very systems can and do pervasively generate, collect, store, and share metadata about us. The metadata that is collected and shared about us is aggregated by large companies, which then know a lot about our internet use, our preferences, our digital tools, our purchases, and our health, among other things. The decisions about what metadata to collect and how to use it are made by the people who control the tools.

Metadata can also be used for negative purposes. Google will not index, or will de-index, websites it deems inappropriate, making them difficult to discover. Zoom has recently cancelled academic presentations that it claims violated its terms of service, without notifying the event hosts. One event featured a speaker on Palestinian human rights. Many of the spaces that we treat as public are not, and our public discourse happens mostly in private spaces under corporate control. Sharing metadata can sometimes put an organization or activity at risk of being cancelled by corporate interests, and can also put individuals in harm’s way by facilitating targeted harassment and abuse.

Metadata is not neutral

If you approach metadata with critical caution, you can use it effectively to keep your content organized and discoverable while avoiding unintended consequences.