Draft Best Practices for Language Tags in Bibliographic Linked Data
June 30, 2016
Language tags are used in RDF to record the language, script, region and other characteristics of text strings. Unlike MARC, which records language codes at the level of the whole bibliographic record, RDF attaches a language tag to each individual literal value. This allows a great deal of useful specificity: for example, a Russian title and an English note in the same description can each carry their own tag.
The current standard for language tags is the Internet Engineering Task Force’s Request for Comments 5646 (IETF RFC 5646), dated September 2009, which is part of Best Current Practice 47 (BCP 47). (Despite the name, an IETF “Request for Comments” is a published, final document, not a draft.)
IETF RFC 5646
https://tools.ietf.org/html/bcp47
Useful explanation of how to apply language tags from W3C
https://www.w3.org/International/articles/language-tags/
IANA Language Subtag Registry (use Ctrl-F to search it)
http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
Key points from the standard
Language subtags are either two or three letters long.
Script and region subtags should be omitted when they add no distinguishing value.
Redundant and grandfathered tags should be avoided.
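The positional structure behind these key points can be illustrated with a small parser. This is a minimal sketch, not a full RFC 5646 validator: the pattern below (an assumption of this example) accepts only the shapes this document recommends, namely a two- or three-letter language subtag, an optional four-letter script, an optional region (two letters or three digits), and any variants. The function name parse_tag is hypothetical.

```python
import re

# Simplified pattern for the tags this document recommends (assumption:
# extlang, extension, private-use and grandfathered tags are deliberately
# NOT matched, so they come back as None).
LANGTAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"                     # language: 2-3 letters
    r"(?:-(?P<script>[A-Za-z]{4}))?"                   # optional script: 4 letters
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"          # optional region: 2 letters or 3 digits
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)"  # variants
)


def parse_tag(tag: str):
    """Split a tag into its parts, or return None if it is outside the subset."""
    m = LANGTAG.fullmatch(tag)
    if m is None:
        return None
    variants = m.group("variants")
    return {
        "language": m.group("language"),
        "script": m.group("script"),
        "region": m.group("region"),
        # "-alalc97" -> ["", "alalc97"] -> ["alalc97"]
        "variants": variants.split("-")[1:] if variants else [],
    }
```

For example, parse_tag("uz-Cyrl") reports the script Cyrl and no region, while an extlang form such as zh-yue is rejected.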
Best practices for bibliographic data
Capitalization
Follow the capitalization of subtags given in the IANA registry.
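RFC 5646 treats tags as case-insensitive, but it recommends a formatting convention that the registry also follows: lowercase language subtags, title-case script subtags, uppercase region subtags, and lowercase for everything else, including anything after a single-character (singleton) subtag. A minimal sketch of that convention, with a hypothetical function name format_tag:

```python
def format_tag(tag: str) -> str:
    """Apply the RFC 5646 case conventions to a hyphen-separated tag."""
    out = []
    after_singleton = False
    for i, sub in enumerate(tag.split("-")):
        if len(sub) == 1:
            # Singletons ("x", "u", ...) and everything after them stay lowercase.
            after_singleton = True
            out.append(sub.lower())
        elif i == 0 or after_singleton:
            out.append(sub.lower())          # language subtag, or extension content
        elif len(sub) == 2:
            out.append(sub.upper())          # region, e.g. "US"
        elif len(sub) == 4:
            out.append(sub.title())          # script, e.g. "Cyrl" (digits unchanged)
        else:
            out.append(sub.lower())          # variants and other subtags
    return "-".join(out)
```

Because case carries no meaning in a tag, this only normalizes presentation; it does not check that the subtags exist in the registry.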
Extended Language Subtags
Do not use extended language (extlang) subtags; use the equivalent primary language subtag instead (for example, yue rather than zh-yue).
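In the registry, each extlang record lists the language subtag it must follow (its Prefix), and the same code is also registered as a primary language subtag. Replacing an extlang form therefore amounts to dropping the prefix. The sketch below assumes a tiny hand-copied excerpt in place of the full registry; the table and the function name drop_extlang are hypothetical.

```python
# Hypothetical excerpt: extlang subtag -> the language subtag it follows.
# A real tool would load every extlang record from the IANA registry.
EXTLANG_PREFIX = {"cmn": "zh", "yue": "zh"}


def drop_extlang(tag: str) -> str:
    """Replace a prefix-extlang pair (e.g. zh-yue) with the extlang's own subtag."""
    subtags = tag.split("-")
    if len(subtags) >= 2 and EXTLANG_PREFIX.get(subtags[1].lower()) == subtags[0].lower():
        return "-".join(subtags[1:])   # keep any script, region, or variants
    return tag
```

So zh-yue becomes yue and zh-cmn-Hant becomes cmn-Hant, while tags without an extlang pass through unchanged.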
Script Subtags
Do not add a script subtag for the primary script of a language. In particular, check the Suppress-Script field in the IANA registry: when a language’s record has this field, it names the script whose subtag should be omitted.
Subtag: fr
Description: French
Suppress-Script: Latn
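The excerpt above is in the registry’s own plain-text record format: “Field-Name: value” lines, records separated by lines containing only “%%”, and long values folded onto continuation lines that begin with whitespace. A minimal sketch for reading that format and looking up Suppress-Script follows; the function names and the abbreviated SAMPLE text are assumptions of this example, not part of the registry.

```python
# Abbreviated sample in the registry's record format (assumption: a real tool
# would read the full file published by IANA, which uses the same layout).
SAMPLE = """\
Type: language
Subtag: fr
Description: French
Suppress-Script: Latn
%%
Type: language
Subtag: uz
Description: Uzbek
"""


def parse_registry(text: str) -> list[dict[str, list[str]]]:
    """Parse records; each field maps to a list because fields may repeat."""
    records, record, last = [], {}, None
    for line in text.splitlines():
        if line.strip() == "%%":                      # record separator
            if record:
                records.append(record)
            record, last = {}, None
        elif line[:1].isspace() and line.strip() and last is not None:
            record[last][-1] += " " + line.strip()    # folded continuation line
        elif ":" in line:
            name, _, body = line.partition(":")
            last = name.strip()
            record.setdefault(last, []).append(body.strip())
    if record:
        records.append(record)
    return records


def suppress_script(records, language: str):
    """Return the Suppress-Script value for a language subtag, or None."""
    for rec in records:
        if rec.get("Type") == ["language"] and rec.get("Subtag") == [language]:
            return rec.get("Suppress-Script", [None])[0]
    return None
```

With this sample, French reports Latn, while Uzbek has no Suppress-Script field and reports None.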
When text is in a script other than the primary one of a language, or a language routinely uses more than one script, add a script subtag.
Uzbek uses both Latin and Cyrillic scripts:
uz-Cyrl
uz-Latn
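Removing a redundant script subtag can be automated once Suppress-Script values are known; adding a script subtag cannot, because that depends on the script the text is actually written in. The sketch below therefore handles removal only. The small SUPPRESS_SCRIPT table is a hypothetical excerpt (Uzbek is deliberately absent because its registry record has no Suppress-Script), and apply_script_rule is a hypothetical name.

```python
# Hypothetical excerpt of Suppress-Script values from the IANA registry.
SUPPRESS_SCRIPT = {"en": "Latn", "fr": "Latn", "ru": "Cyrl"}


def apply_script_rule(tag: str) -> str:
    """Drop a script subtag that merely repeats the language's primary script."""
    subtags = tag.split("-")
    if len(subtags) >= 2 and len(subtags[1]) == 4 and subtags[1].isalpha():
        primary = SUPPRESS_SCRIPT.get(subtags[0].lower(), "")
        if primary.lower() == subtags[1].lower():
            del subtags[1]   # e.g. fr-Latn -> fr
    return "-".join(subtags)
```

Both Uzbek tags keep their script subtag, since neither script is suppressed for that language.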
Region Subtags
Language tags for English as the “language of cataloging”, used with notes and other non-transcription properties, should not have a region subtag since RDA is an international cataloging code and bibliographic data are distributed worldwide.
"Includes bibliographies and index"@en
NOT
"Includes bibliographies and index"@en-US
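The region rule can be applied mechanically when generating notes: strip any region subtag, then attach the tag to the literal. This is a minimal sketch; the function names are hypothetical, and N-Triples is used only as one convenient RDF syntax for showing where the tag goes.

```python
def strip_region(tag: str) -> str:
    """Remove a region subtag (2 letters or 3 digits) if present."""
    subtags = tag.split("-")
    # The region follows the language and an optional 4-letter script subtag.
    has_script = len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha()
    pos = 2 if has_script else 1
    if len(subtags) > pos:
        sub = subtags[pos]
        if (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            del subtags[pos]
    return "-".join(subtags)


def ntriples_literal(text: str, tag: str) -> str:
    """Serialize a language-tagged literal, escaping backslashes first."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{tag}'
```

For example, ntriples_literal("Includes bibliographies and index", strip_region("en-US")) yields the recommended form tagged @en.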
Variant Subtags
For text romanized according to the 1997 edition of the ALA-LC tables, add the variant subtag “alalc97”.
"Neotpravlennoe pis’mo"@ru-Latn-alalc97
There is currently no registered variant subtag for later revisions of the ALA-LC tables.
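Because the variant marks data as ALA-LC romanization, a consumer can use it to find romanized strings, for instance to pair them with non-Latin script forms. A minimal sketch, with the hypothetical function name is_alalc97; it checks only the subtags before any extension or private-use singleton, since a variant cannot appear after one.

```python
def is_alalc97(tag: str) -> bool:
    """True if the tag carries the alalc97 variant (case-insensitive)."""
    for sub in tag.split("-")[1:]:
        if len(sub) == 1:
            break   # extension or private-use section starts; no variants follow
        if sub.lower() == "alalc97":
            return True
    return False
```

This recognizes only the 1997 tables, since no later edition has its own subtag.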
Extension and Private-Use Subtags
Do not use extension or private-use subtags.
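Extension and private-use sections always begin with a singleton, a subtag of a single letter or digit (“x” for private use, other singletons such as “u” for extensions), so a quick check for this rule is to look for any one-character subtag. A minimal sketch with the hypothetical name has_extension_or_private_use; note that it also flags irregular grandfathered tags such as i-klingon, which should be avoided anyway.

```python
def has_extension_or_private_use(tag: str) -> bool:
    """True if any subtag is a singleton (one letter or digit)."""
    return any(len(sub) == 1 for sub in tag.split("-"))
```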