Metadata Is Not Neutral

How Bias Creeps into Metadata Decisions

As you begin to consider how metadata is relevant to you, and how learning about it might change some of your practices, it’s important to remember that metadata is not neutral. Human beings are unavoidably biased, and that bias creeps into every metadata decision. To complicate things, the metadata standards we use reflect the worldviews and value systems of the people who created them.

The formalized metadata we have today began with library cataloguing systems, which were created by American and European men and carry embedded racism, misogyny, and all the other -isms that do not serve us.

For example, in the Dewey Decimal Classification, the most widely used library classification system, 200 is the class number for religion. However, the numbers from 220 to 289 are devoted to Christianity, leaving only 290 to 299 for all of the rest of the world’s religions. In addition, Indigenous creation stories are shelved with fairy tales. These are only two examples of problematic categorization. Libraries are doing a lot of work to dismantle these structural biases, but it is time-consuming and difficult. Going forward, inclusive metadata requires inclusion: the people and communities being described need a say in how they are described.

In contemporary metadata for digital media and environments, there are many metadata schemas and encoding systems to meet different needs. Some have become formal international standards governed by standards organizations. Others are created and maintained by organizations of all sizes. The values and intentions of those who create metadata systems are embedded in those systems.

Dublin Core is one of the most ubiquitous metadata frameworks. It originated at a workshop on metadata semantics held in Dublin, Ohio, in March 1995. At this event, co-sponsored by OCLC and the National Center for Supercomputing Applications (NCSA), more than 50 participants discussed how a core set of semantics for describing web-based resources would be extremely useful. Such a set would help categorize the web and make information on it easier to search and retrieve.

Schema.org, an ostensibly open vocabulary for web encoding, was launched in 2011 by Google, Microsoft, and Yahoo as part of their effort to create an entirely machine-readable web. Yandex (the dominant search engine in Russia) joined later that year. Today it is managed and primarily maintained by Google.

Schema.org was created for search engine optimization (SEO), e-commerce, and web crawling, and it is the foundation of Google’s ability to monetize search behaviour. It helps Google maintain its Knowledge Graph, which launched in 2012. By May 2020, the Knowledge Graph had grown to 500 billion facts about 5 billion entities and the relationships between them.


Thinking Critically About Metadata

Thinking critically about metadata is much like thinking critically about any other communication or transmission of information. It is important to start with why you are doing this work, rather than jumping straight to how.

Metadata describes people, objects, and other things, and our biases and concerns will also end up expressed in our metadata.

For example, if you are collecting information about people, there can be implications for those people. Once information is collected, it can be stored, shared, deleted, or compromised, and so can your metadata. Your metadata also needs to describe things accurately, so choose it carefully.

To continue with this example, if you are collecting information about gender, the first question to ask is: is it appropriate, or even necessary, to collect it at all? If it is necessary, this is where your metadata practices come into play.

What vocabulary will you use, and what choices will you offer, to indicate gender? Are they respectful to the people involved?

This introduces the need for a controlled vocabulary in some circumstances, but not in others. A controlled vocabulary provides a consistent way to describe data. Examples include subject headings, thesauri, ontologies, and taxonomies. For a gender question on a form, you could limit answers to controlled vocabulary terms in a dropdown menu. Alternatively, you could include a free-text field so people can describe their gender in their own words. Both choices have implications for your data and metadata, as well as for the people involved.
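The trade-off between a dropdown and a free-text field can be sketched in a few lines of Python. This is a minimal illustration, not a recommended design. The helper `record_gender` is hypothetical, and the option labels in `GENDER_OPTIONS` are placeholders rather than a suggested vocabulary. The sketch combines both approaches. It checks the dropdown value against a controlled vocabulary and keeps any self-described wording in a separate field.

```python
# Placeholder vocabulary for illustration only; a real project should
# develop its terms together with the people being described.
GENDER_OPTIONS = frozenset({
    "woman",
    "man",
    "non-binary",
    "self-describe",
    "prefer not to say",
})


def record_gender(choice, self_description=None):
    """Build a metadata entry from a dropdown choice and optional free text.

    `record_gender` is a hypothetical helper, not part of any library.
    """
    if choice not in GENDER_OPTIONS:
        # Controlled vocabulary: reject anything outside the agreed terms.
        raise ValueError(f"{choice!r} is not in the controlled vocabulary")
    entry = {"gender": choice}
    if choice == "self-describe" and self_description:
        # Free text is stored as typed (minus surrounding spaces) in its own
        # field, so the controlled value stays consistent for searching.
        entry["gender_self_described"] = self_description.strip()
    return entry


print(record_gender("prefer not to say"))
print(record_gender("self-describe", "  genderfluid "))
```

Storing the two answers separately is one possible choice. The controlled `gender` value can be searched and counted reliably. The person’s own wording is preserved as given rather than mapped onto a term they did not choose.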

To be useful, most metadata needs to be standardized to some degree. This includes agreeing on language, spelling, date format, and so on. If everyone uses a different standard, it can be very difficult to compare one set of data with another. A key component of metadata is the schema. A metadata schema is the overall structure for the metadata. It describes how the metadata is set up, and it usually sets standards for common components such as dates, names, and places. There are also discipline-specific schemas that address the particular elements a discipline needs.
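Agreeing on a date format is the easiest of these standards to show in code. The Python sketch below rewrites a few common ways of writing a date into ISO 8601 (`YYYY-MM-DD`). The list of accepted input formats and the helper name `to_iso_date` are assumptions made for illustration. In particular, the sketch assumes that slash-separated dates are written day-first (`DD/MM/YYYY`), which is exactly the kind of ambiguity a shared standard removes.

```python
from datetime import datetime

# Hypothetical set of input formats this project accepts. Slash dates are
# assumed to be day-first (DD/MM/YYYY). Month names are matched using the
# default "C" locale, so they must be written in English.
INPUT_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y", "%d %B %Y")


def to_iso_date(raw):
    """Return `raw` rewritten as an ISO 8601 date string (YYYY-MM-DD)."""
    text = raw.strip()
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"unrecognized date: {raw!r}")


for raw in ("1995-03-05", "05/03/1995", "March 5, 1995", "5 March 1995"):
    print(raw, "->", to_iso_date(raw))
```

Once every record uses one unambiguous format, dates can be compared and sorted as plain text. Dublin Core’s own guidance for its `date` element likewise recommends ISO 8601 or a published profile of it.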

Always keep the users’ perspective in mind

Pick a schema that will make sense to the users who will access and use your data, as well as to those who manage and preserve it.

Adopt or Adapt?

Generally, you should be able to find a metadata schema and standard to suit your needs. When you find one, use it. If you find one that is close to what you need, but not quite right, you can customize it. Extend it by adding new fields, or shorten it by removing fields you will never need.
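To sketch adapting rather than inventing, the Python below starts from the 15 elements of the Dublin Core Metadata Element Set. It removes a few elements a project might never use and adds one local field. The removed elements, the added field `performance_venue`, the helper `unknown_fields`, and the sample record values are all hypothetical choices made for illustration.

```python
# The 15 elements of the Dublin Core Metadata Element Set (DCMES 1.1).
DUBLIN_CORE = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})

# Hypothetical local profile: shorten the schema by dropping elements this
# project will never fill in, and extend it with a field it needs.
NEVER_USED = frozenset({"coverage", "relation", "source"})
LOCAL_FIELDS = frozenset({"performance_venue"})
PROFILE = (DUBLIN_CORE - NEVER_USED) | LOCAL_FIELDS


def unknown_fields(record):
    """Return, sorted, the field names in `record` that PROFILE does not define."""
    return sorted(set(record) - PROFILE)


# Hypothetical sample record for illustration.
record = {
    "title": "Spring Recital",
    "creator": "Example Ensemble",
    "date": "2020-05-01",
    "performance_venue": "Main Hall",
    "source": "program booklet",
}
print(unknown_fields(record))  # ['source']
```

The local field sits alongside the standard element names. For that reason, a project sharing records beyond its own walls would typically document it, or give it a distinct prefix, so that others can tell which fields come from Dublin Core and which were added locally.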

There are many types of metadata standards and schemas. Some are generic, while others are domain-specific. Generic ones such as Dublin Core tend to be easy to use and widely adopted, but they often need to be expanded to cover more specific information. Domain-specific schemas have a much richer vocabulary and structure, but they tend to be highly specialized and understandable only to users in that field.

By now, you must be wondering how to set up your own metadata system. Sarah Stang gives us pointers on the next page.

We acknowledge the support of the Canada Council for the Arts
