Metadata is only useful if it is understandable to the software applications and people that use it. To aid in this understanding, organizations frequently predefine metadata sets to meet certain needs, and publish these definitions for system designers (and sometimes end users) to consult. Metadata vocabularies are known as schemas, element sets, or sometimes formats. A schema defines the elements that make up a valid item in that format, along with the attributes each element can take, in what order they can appear, and how many times they can appear.
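As a rough illustration of what a schema constrains, the sketch below checks a record against a small, made-up schema. The element names (`title`, `creator`, `date`), the `ElementRule` class, and the `validate` function are all hypothetical and not drawn from any published standard. It covers only element names, order, and occurrence counts; attributes are left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ElementRule:
    """Rule for one element in a hypothetical schema."""
    name: str
    min_occurs: int = 0               # 0 means the element is optional
    max_occurs: Optional[int] = 1     # None means repeatable without limit


# Hypothetical schema: the order of the rules is the required element order.
SCHEMA = [
    ElementRule("title", min_occurs=1, max_occurs=1),
    ElementRule("creator", min_occurs=1, max_occurs=None),
    ElementRule("date", min_occurs=0, max_occurs=1),
]


def validate(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record is valid.

    `record` is a list of (element_name, value) pairs in document order.
    """
    problems = []
    rules = {rule.name: rule for rule in schema}
    position = {rule.name: i for i, rule in enumerate(schema)}
    names = [name for name, _ in record]

    # Elements that the schema does not define at all.
    for name in names:
        if name not in rules:
            problems.append(f"unknown element: {name}")

    # How many times each defined element appears.
    for rule in schema:
        count = names.count(rule.name)
        if count < rule.min_occurs:
            problems.append(
                f"{rule.name}: expected at least {rule.min_occurs}, found {count}"
            )
        if rule.max_occurs is not None and count > rule.max_occurs:
            problems.append(
                f"{rule.name}: expected at most {rule.max_occurs}, found {count}"
            )

    # Known elements must appear in the same order as in the schema.
    known = [position[name] for name in names if name in rules]
    if known != sorted(known):
        problems.append("elements are out of order")

    return problems
```

Calling `validate([("title", "A"), ("creator", "B")])` returns an empty list, while a record with no `creator` element returns a single message describing the missing element. Real schema languages express the same kinds of constraints declaratively rather than in application code.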
Schemas can be formally standardized through organizations such as the International Organization for Standardization (ISO), the National Information Standards Organization (NISO), or the World Wide Web Consortium (W3C). Leading industry or community bodies, such as the Library of Congress, often serve as organizational homes and maintainers for XML-based metadata standards, endorsing them for use in their target communities.
When you are doing metadata work for internal management, you only need metadata that serves your purpose, so you don’t need to adopt the complete, complex systems described in the documentation for standardized metadata. However, for interoperability or usefulness outside your organization, it is a good idea to look at these standards and use the parts that are relevant to you, so that your metadata is consistent with that of others in your field or industry.
If you are submitting your metadata to a repository or larger system, it will have a standard format that you are required to follow, so some reformatting of your metadata may be needed. Generally, the repository will provide a template.
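The reformatting step can be pictured as a field-by-field mapping from your internal names to the repository's names. The sketch below assumes an invented internal record and an invented repository template; `CROSSWALK`, `REQUIRED_BY_REPOSITORY`, and `to_repository_format` are hypothetical names used only for illustration.

```python
# Hypothetical mapping from internal field names to a repository's field names.
CROSSWALK = {
    "name": "title",
    "made_by": "creator",
    "created_on": "date",
}

# Hypothetical list of fields that the repository's template requires.
REQUIRED_BY_REPOSITORY = ["title", "creator"]


def to_repository_format(internal_record):
    """Rename internal fields to repository fields and report any gaps.

    Returns a tuple (converted, unmapped, missing):
      converted - the record using the repository's field names
      unmapped  - internal fields with no equivalent in the crosswalk
      missing   - required repository fields the record does not supply
    """
    converted = {}
    unmapped = []
    for field, value in internal_record.items():
        target = CROSSWALK.get(field)
        if target is None:
            unmapped.append(field)
        else:
            converted[target] = value
    missing = [f for f in REQUIRED_BY_REPOSITORY if f not in converted]
    return converted, unmapped, missing


record = {"name": "Harbour survey", "made_by": "J. Smith", "shelf": "B2"}
converted, unmapped, missing = to_repository_format(record)
```

Reporting unmapped and missing fields separately, rather than silently dropping them, makes it easier to see which internal information the repository cannot hold and which required information still has to be supplied.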
In addition to standardizing syntax, metadata designers often wish to standardize metadata through control of the actual values used. One way in which this is done is through the use of controlled vocabularies. A controlled vocabulary is a predetermined list of terms on a certain topic or of a certain type. These lists typically identify one preferred word or phrase for a given concept, and sometimes provide mappings from other terms for the concept to the preferred one. They also frequently define (often hierarchical) relationships among terms.
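A controlled vocabulary of this kind can be modeled, in a very reduced form, as a set of preferred terms, a mapping from variant terms to preferred ones, and a broader-term link for the hierarchy. The terms and the function names `preferred_term` and `broader_chain` below are invented for illustration and do not come from any real vocabulary.

```python
# Hypothetical vocabulary: each preferred term maps to its broader term,
# or to None when it sits at the top of the hierarchy.
BROADER = {
    "Animals": None,
    "Cats": "Animals",
    "Dogs": "Animals",
}

# Hypothetical mappings from variant terms (lowercase) to the preferred term.
VARIANTS = {
    "felines": "Cats",
    "house cats": "Cats",
    "canines": "Dogs",
}


def preferred_term(term):
    """Return the preferred term for `term`, or None if it is not in the vocabulary."""
    # Ignore case and extra whitespace when matching.
    key = " ".join(term.split()).casefold()
    for preferred in BROADER:
        if preferred.casefold() == key:
            return preferred
    return VARIANTS.get(key)


def broader_chain(term):
    """Return the broader terms above a preferred term, nearest first."""
    chain = []
    parent = BROADER.get(term)
    while parent is not None:
        chain.append(parent)
        parent = BROADER.get(parent)
    return chain
```

With these definitions, `preferred_term("House  cats")` resolves to `"Cats"`, an unknown term such as `"Lizards"` resolves to `None`, and `broader_chain("Cats")` gives `["Animals"]`. Returning `None` for unknown terms leaves it to the caller to decide whether to reject the value or flag it for review.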
Controlled vocabularies can be exceedingly simple, using only a dozen or so terms, or much more robust, including as many as tens of thousands. Examples of controlled vocabularies include Internet MIME types, Spotify genres, the Book Industry Standards and Communications (BISAC) vocabulary, and Library of Congress Subject Headings (LCSH). Those developed and maintained by the cultural heritage community, such as LCSH, tend to be among the most robust controlled vocabularies in common use.
Controlled vocabularies enforce consistency, which is useful for grouping similar objects and reduces incorrect labelling. They can be challenging to keep up to date when cultural uses of words change, and some older controlled vocabulary taxonomies (sets of terms and their hierarchy) reflect outdated values. Updating an out-of-date controlled vocabulary affects discoverability and information organization, so it needs to be done carefully.
A second method for standardizing the values that appear in metadata is the use of content standards, which are sets of guidelines that dictate how textual values in metadata should be structured. In the cultural heritage community, they commonly take the form of formal guideline documents. In other communities, where they can be known as style guides, they tend to be briefer and more informal.
Content standards typically cover topics such as where the information to be recorded should be found; what punctuation, capitalization, and abbreviations should be used; and how to make decisions about which information to record. They sometimes define and dictate the use of small controlled vocabularies as well. One example of a content standard is the Wikipedia Manual of Style guidelines for Infoboxes.
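Some of the mechanical rules in a content standard, such as those for punctuation, capitalization, and abbreviations, can be applied in software. The sketch below encodes four rules from a made-up style guide for title values; the rules, the `EXPANSIONS` table, and the `apply_title_rules` function are assumptions for illustration and are not taken from any real content standard.

```python
# Hypothetical abbreviations that this made-up style guide says to spell out.
EXPANSIONS = {"vol.": "volume", "no.": "number"}


def apply_title_rules(raw):
    """Apply four illustrative style rules to a title string."""
    # Rule 1: collapse runs of whitespace and trim the ends.
    words = raw.split()
    # Rule 2: spell out listed abbreviations (matched case-insensitively).
    words = [EXPANSIONS.get(word.casefold(), word) for word in words]
    text = " ".join(words)
    # Rule 3: remove any trailing full stops.
    text = text.rstrip(".")
    # Rule 4: capitalize the first character and leave the rest untouched.
    if text:
        text = text[0].upper() + text[1:]
    return text
```

For instance, `apply_title_rules("  annual report,  Vol. 3. ")` yields `"Annual report, volume 3"`. Rules about where to find information or which information to record call for human judgment and are not captured by a function like this.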