Common Metadata Languages

Machine Readable Cataloguing (MARC)

The most widely used metadata language in the library community far predates XML and RDF technologies and indeed modern metadata formats in general. MAchine Readable Cataloging (MARC) first arose in 1968 out of a pilot project at the Library of Congress to experiment with distributing the information on catalog cards to libraries in machine readable form. Since then, it has become entrenched as the metadata format underlying online library catalogs and as the way libraries share records with each other. OCLC’s WorldCat database, used for sharing records among libraries and allowing users to find holdings across multiple institutions, holds nearly 380 million MARC bibliographic records as of July 2016. MARC is standardized as ANSI/NISO Z39.2 Information Interchange Format and ISO 2709 Information and documentation—Format for information exchange.

Dublin Core

The Dublin Core Metadata Element Set (DCMES) grew out of a 1995 meeting in Dublin, Ohio, that was focused on metadata for networked electronic information. Attendees were tasked with identifying a core set of features common to most types of digital information. In this first meeting, 13 core elements were defined, which soon grew to the 15 elements known as DCMES today. These are: contributor, coverage, creator, date, description, format, identifier, language, publisher, relation, rights, source, subject, title, and type. This set, also known as “simple Dublin Core” or simply Dublin Core (DC), is standardized as ISO 15836 and ANSI/NISO Z39.85, both titled The Dublin Core Metadata Element Set.

Schema.org

Schema.org is an initiative launched in 2011 by the world’s major search engines to create and support a common set of schemas for structured data markup on web pages, in email messages, and on other web objects. This structured data helps search engines understand published content and is a common tool in search engine optimization (SEO). The schemas are a set of “types,” each associated with a set of properties, and the types are arranged in a hierarchy. The vocabulary currently consists of 841 types, 1,369 properties, and 352 enumeration values. You use the vocabulary along with the Microdata, RDFa, or JSON-LD formats to add information to your web content. Although specific elements, such as dates, use standardized forms, Schema.org metadata is itself not standardized by any of the international standardization bodies and is essentially a privately managed set of guidelines. A note on the name: the term “schema” pre-dates the development of Schema.org and is the general term for any metadata schema, so if you see the word on its own it is not necessarily referring to Schema.org.
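To make the pairing of the search-engine vocabulary with JSON-LD concrete, here is a minimal sketch in Python. It assumes the Schema.org types Book and Person and the properties name, author, and datePublished; the book, the author, and the helper function names are invented for illustration and do not come from the text above.

```python
import json


def build_book_jsonld(name, author, date_published):
    """Describe a book using Schema.org terms, as a plain Python dict."""
    return {
        "@context": "https://schema.org",   # tells consumers which vocabulary is in use
        "@type": "Book",                    # a type from the vocabulary's hierarchy
        "name": name,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,    # dates use a standardized (ISO 8601) form
    }


def to_script_tag(data):
    """Wrap the JSON-LD in the script element used to embed it in a web page."""
    payload = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical record, made up for this example.
record = build_book_jsonld("An Example Book", "Jane Doe", "2016-07-01")
snippet = to_script_tag(record)
print(snippet)
```

The printed snippet could be pasted into the head of an HTML page. Microdata and RDFa express the same types and properties as attributes on existing HTML elements instead of a separate script block.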

Exchangeable Image File Format (EXIF)

The EXchangeable Image File Format (Exif) is something of a misnomer, as it is not a file format but rather a tag structure for metadata embedded within digital image files. It emerged from the Japanese digital camera industry and is currently supported by nearly all digital camera and smartphone manufacturers. Various software packages for editing and sharing images, including Photoshop and Flickr, support Exif. However, many social media sites, such as Instagram and Facebook, strip Exif metadata from shared images, though they often read and store the geolocation information from the Exif metadata before doing so. The TIFF and JPEG file formats support embedded Exif, but JPEG2000, PNG, and GIF do not. Exif stores mostly technical metadata about the image, but there is also a supplementary section for metadata about embedded audio. The Exif specification includes metadata elements such as pixel dimensions, date and time taken, ISO setting, aperture, white balance, and information on the lens used.

Visual Resources Association Core (VRA CORE)

The visual resources community supports disciplines, such as art history, that rely upon discovering and using reproductions of works of art and architecture. A professional organization for that community, the Visual Resources Association (VRA), has developed the VRA Core metadata vocabulary for recording information about these works of art and specific representations of them. The VRA Core is represented as two different XML Schemas, a restricted version that strictly enforces the use of certain pre-specified values in the “type” attribute on many VRA Core XML elements, and an unrestricted version that allows free text to be used for these values.
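The difference between the restricted and unrestricted versions can be sketched as a check on the “type” attribute. The example below is a simplified illustration, not the actual VRA Core schemas: the sample record omits namespaces, the titles are invented, and the set of allowed values is a hypothetical stand-in for the real controlled list.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the controlled list a restricted schema would enforce.
ILLUSTRATIVE_TITLE_TYPES = {"descriptive", "inscribed", "translated", "other"}

# Simplified, namespace-free record invented for this example.
SAMPLE = """<work>
  <titleSet>
    <title type="descriptive">Portrait of a Woman</title>
    <title type="nickname">The Lady in Blue</title>
  </titleSet>
</work>"""


def invalid_types(xml_text, restricted=True):
    """Return the 'type' values that a restricted check would reject.

    In unrestricted mode any free-text value is accepted, so nothing is rejected.
    """
    if not restricted:
        return []
    root = ET.fromstring(xml_text)
    return [
        title.get("type")
        for title in root.iter("title")
        if title.get("type") not in ILLUSTRATIVE_TITLE_TYPES
    ]


print(invalid_types(SAMPLE, restricted=True))
print(invalid_types(SAMPLE, restricted=False))
```

In practice this enforcement comes from validating a record against the chosen XML Schema rather than from hand-written checks; the function only mimics the effect.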

Text Encoding Initiative (TEI)

The Text Encoding Initiative (TEI) is what you’ll commonly see used in digital humanities work. It is a markup language for machine-readable texts of all types, including prose, verse, drama, transcripts of spoken-word performances, dictionaries, and manuscripts and other primary sources. It is an extraordinarily large markup language, and as such is not designed to be used as a whole in any given implementation. Instead, its source XML definitions can be rendered as XML DTDs, XML schemas, or RELAX NG schemas by picking which modules (groups of related elements intended for a specific purpose) should be included for a specific project.
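As a rough illustration of how small a working subset of TEI can be, the sketch below parses a tiny TEI document that uses only a header and a few verse elements (lg for a line group, l for a line). The document content and the function names are invented for this example; only the element names and the TEI namespace are taken from TEI itself.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A minimal TEI document, invented for illustration.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample verse (invented for illustration)</title></titleStmt>
      <publicationStmt><p>Unpublished example.</p></publicationStmt>
      <sourceDesc><p>Born digital.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <lg type="couplet">
        <l n="1">First invented line of verse,</l>
        <l n="2">second invented line of verse.</l>
      </lg>
    </body>
  </text>
</TEI>"""


def title_of(tei_xml):
    """Read the title recorded in the TEI header."""
    root = ET.fromstring(tei_xml)
    return root.find(".//tei:titleStmt/tei:title", TEI_NS).text


def verse_lines(tei_xml):
    """Return (line number, text) pairs for every verse line in a line group."""
    root = ET.fromstring(tei_xml)
    return [
        (line.get("n"), "".join(line.itertext()).strip())
        for line in root.iterfind(".//tei:lg/tei:l", TEI_NS)
    ]


print(title_of(SAMPLE))
for number, text in verse_lines(SAMPLE):
    print(number, text)
```

A project that encodes only verse would typically generate a schema that permits just this kind of subset, leaving out elements for dictionaries, manuscripts, and other material it does not need.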

Music Encoding Initiative (MEI)

The Music Encoding Initiative (MEI) is an encoding format for musical notation closely based on the TEI. It is an XML language, and borrows from the TEI community its organization into modules, its methods for customization for specific projects, and its use of ODD (One Document Does it all), the format in which the schema and its customizations are specified.

MEI supports several of the most commonly used forms of musical notation, including common Western music notation (the form with which most readers will be familiar), mensural and neumatic notation, and tablature for guitar and lute. In addition to a header for metadata about the musical score being described, MEI has features for all symbols needed in each of the supported notation formats, including elements for scores, parts, staffs, key signatures, clefs, measures, bar lines, notes, and chords. Like TEI, MEI also supports the encoding of analytical and editorial structures.
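To show what a few of these elements look like in practice, here is a small sketch that parses a one-measure MEI fragment in common Western notation. The fragment itself is invented for illustration, and the function name is hypothetical; the element names (measure, staff, layer, note, chord) and attributes (pname for pitch name, oct for octave, dur for duration) are assumed to follow MEI’s usual conventions.

```python
import xml.etree.ElementTree as ET

MEI_NS = {"mei": "http://www.music-encoding.org/ns/mei"}

# One measure: two quarter notes followed by a half-note chord (invented example).
SAMPLE = """<measure xmlns="http://www.music-encoding.org/ns/mei" n="1">
  <staff n="1">
    <layer n="1">
      <note pname="c" oct="4" dur="4"/>
      <note pname="e" oct="4" dur="4"/>
      <chord dur="2">
        <note pname="g" oct="4"/>
        <note pname="c" oct="5"/>
      </chord>
    </layer>
  </staff>
</measure>"""


def describe_layer(mei_xml):
    """Return (pitches, duration) pairs for each note or chord in the first layer."""
    root = ET.fromstring(mei_xml)
    layer = root.find(".//mei:layer", MEI_NS)
    events = []
    for child in layer:
        tag = child.tag.split("}")[-1]  # drop the namespace prefix from the tag name
        if tag == "note":
            events.append((child.get("pname") + child.get("oct"), child.get("dur")))
        elif tag == "chord":
            # Notes inside a chord share the duration given on the chord element.
            pitches = "+".join(
                n.get("pname") + n.get("oct")
                for n in child.findall("mei:note", MEI_NS)
            )
            events.append((pitches, child.get("dur")))
    return events


print(describe_layer(SAMPLE))
```

A complete MEI file would wrap such measures in a score with a header, staff definitions, clefs, and key signatures; the fragment leaves those out to keep the structure easy to see.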

We acknowledge the support of the Canada Council for the Arts
