Wikipedia:Geographical names

Source: Wikipedia, the free encyclopedia.

Wikipedia has over 700,000 articles about geographical entities such as villages, districts, lakes, rivers, mountains and protected areas. Their infoboxes vary considerably in layout and the information they support. The article title holds the common English form but the article may also give the common names used in the local language(s), official names, former names, other names and nicknames. Non-Latin script may be followed by a romanized or phonetic form.

All non-English forms of a name should be marked up so they are rendered correctly by a screen reader. This essay proposes standard ways to gather, validate and format the different names in the article text and in infoboxes, and outlines a migration approach. The core proposal is to adapt all the geographical entity infoboxes to use a standard child template, {{infobox geonames}}, which will undertake validation and formatting of the names.

Current situation

There are several hundred geo-infoboxes used in over 700,000 articles about geographical entities. As of February 2022 {{Infobox settlement}} was used in over 543,000 articles, {{Infobox river}} in 28,870, {{Infobox mountain}} in 26,448, {{Infobox building}} in 24,502, and so on down to a long tail of infoboxes like {{Infobox Tibetan Buddhist monastery}} (286 articles) or {{Infobox dive site}} (18 articles). As shown in #Sample infobox templates (below) the infoboxes are very inconsistent in the name-related parameters they accept, and as shown in #Current usage examples (below) they are also very inconsistent in the format they render.

Non-English names are common even in countries where English is the national language. A place in California might have former names in Spanish and indigenous languages. A place in England may have former names in Common Brittonic or Old English. In France, there may be variants of local names in Breton, Occitan or Corsican. India has a wealth of languages and scripts. Because support for non-English names is inconsistent, editors may struggle with the default formatting and resort to workarounds such as these:

  • |native_name = {{nobold|四国}}
  • |native_name = {{lang|tr|Anadolu Selçuklu Devleti}} {{lang|fa|سلجوقیان روم}} Saljūqiyān-i Rūm

Introducing standard validation and formatting for names in all geo-infoboxes will give a more consistent reader experience, reduce accessibility problems with screen readers, and make life easier for editors.

Proposed guidelines

1. Articles about geographical entities may provide extensive information about names, including the different types of name, etymology, pronunciation, non-Latin script, romanization and so on. However, the information does not have to all be crammed into the infobox and the lead sentence. As illustrated in the article on the Nile, it may be relegated to a section on naming.
2. Any non-English name in Latin script should be rendered in italics with proper HTML mark-up for a screen reader, and the language should be shown before the name,
  • if it is to be rendered in the native language by a screen reader, and/or
  • if readers will want to know what language the name is in.

Example: German: München · Bavarian: Minga

3. If a non-English name in Latin script may be rendered in English pronunciation, and readers will not be particularly interested in the language, the language need not be identified.

Example: Eboracum · Eoforwic · Jorvik · Everwic. These former names for York are from obsolete languages with uncertain pronunciation.

4. Names in non-Latin script may be followed by an italicized romanized or phonetic form if relevant, and the language should be identified.

Example: Russian: Москва [Moskva]

5. A list of names of the same type in an infobox should be formatted as a horizontal list if it will fit on one line. Otherwise it should be formatted as a simple vertical list. Thus:
    French: Bruxelles · Dutch: Brussel
But
    Brussels-Capital Region
    French: Région de Bruxelles-Capitale
    Dutch: Brussels Hoofdstedelijk Gewest
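The choice between a horizontal and a vertical list in guideline 5 could be made mechanically. The following is only a rough sketch: the function name layout_names, the 40-character width limit and the " · " separator are assumptions made for illustration, not part of the proposal.

```python
def layout_names(entries: list[str], max_width: int = 40, sep: str = " · ") -> str:
    """Join names on one line if they fit, otherwise put one name per line.

    max_width is an assumed stand-in for "fits on one line" in an infobox.
    """
    one_line = sep.join(entries)
    if len(one_line) <= max_width:
        return one_line
    return "\n".join(entries)


short = layout_names(["French: Bruxelles", "Dutch: Brussel"])
long = layout_names([
    "Brussels-Capital Region",
    "French: Région de Bruxelles-Capitale",
    "Dutch: Brussels Hoofdstedelijk Gewest",
])
print(short)
print(long)
```

A real template would have to estimate rendered width rather than count characters, since scripts and fonts differ in how much space they take.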

Identifying languages

Non-English names are often formatted using {{lang}} or {{native name}}. However, both these templates require a two- or three-letter ISO 639 language code. Many editors do not know what these codes are, and many former place names are in languages that do not have an ISO code, such as the Mouheneener language. Sometimes the language is unknown. An explorer may have recorded what the "natives" called the place, but failed to record the natives' ethnic group.

The solution is to enhance the {{lang}} and {{native name}} templates, or create a new {{lang2}} template to allow the full names of languages as an alternative to the ISO code. Thus {{lang2|German|München}} and {{lang2|de|München}} should both be accepted and render the same result. {{infobox geonames}} would implement the same logic.

  • If a language is not found in the list of ISO codes that gives corresponding language names, check for it in a list of language names that gives corresponding ISO codes
  • The second list may include languages such as Chirr, Phuthi or Erzgebirgisch with ISO code "mis", meaning they have no ISO code
  • Both lists will also include the name of the Wikipedia article for the language, for use as a link
  • If the language is not known, use the language code "und"
  • Use the ISO code for HTML tagging and the corresponding language name for display purposes
  • Flag articles with unrecognized languages for manual follow-up
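The lookup steps above might be sketched as follows. This is a minimal illustration, not the actual template logic: the tables hold only a handful of entries, and the names Language, BY_CODE, BY_NAME and resolve_language are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    code: str     # used for HTML tagging
    name: str     # used for display
    article: str  # Wikipedia article to link to


# Illustrative entries only; real lists would be far longer.
UNKNOWN = Language("und", "Unknown", "")
BY_CODE = {
    "de": Language("de", "German", "German language"),
    "ar": Language("ar", "Arabic", "Arabic language"),
}
# The name list also covers languages without their own code ("mis").
BY_NAME = {lang.name.casefold(): lang for lang in BY_CODE.values()}
BY_NAME["phuthi"] = Language("mis", "Phuthi", "Phuthi language")


def resolve_language(value: str) -> tuple[Language, bool]:
    """Return (language, needs_follow_up) for a code or a full language name."""
    key = value.strip()
    if not key:
        return UNKNOWN, False            # language not known: use "und"
    lang = BY_CODE.get(key.lower())      # first try the list keyed by ISO code
    if lang is None:
        lang = BY_NAME.get(key.casefold())  # then the list keyed by name
    if lang is None:
        return UNKNOWN, True             # unrecognized: flag for manual follow-up
    return lang, False
```

With these tables, resolve_language("de") and resolve_language("German") return the same entry, which is the behaviour proposed for {{lang2}}.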

The enhanced or new template should also accept and display a romanised or phonetic version of the name. E.g.

{{lang2|ar|بَغْدَاد|baɣˈdaːd}} or {{lang2|Arabic|بَغْدَاد|baɣˈdaːd}}

would render

Arabic: بَغْدَاد [baɣˈdaːd], with the non-Latin name tagged with the HTML attribute lang="ar".
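As a rough sketch of the rendering described here and in the proposed guidelines, the function below wraps a name in a span carrying the language code, italicizes Latin-script names, and appends an italicized romanized form in square brackets. The function names and the exact HTML are assumptions; the real templates may emit different mark-up.

```python
import html
import unicodedata


def is_latin(text: str) -> bool:
    """True if every alphabetic character in text is a Latin letter."""
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and all(
        unicodedata.name(ch, "").startswith("LATIN") for ch in letters
    )


def render_name(code: str, language: str, name: str, roman: str = "") -> str:
    """Render 'Language: name [roman]' with the name tagged by language code."""
    inner = html.escape(name)
    if is_latin(name):
        inner = f"<i>{inner}</i>"  # Latin-script foreign names are italicized
    out = f'{html.escape(language)}: <span lang="{html.escape(code)}">{inner}</span>'
    if roman:
        out += f" [<i>{html.escape(roman)}</i>]"  # romanized or phonetic form
    return out


print(render_name("de", "German", "München"))
print(render_name("ru", "Russian", "Москва", "Moskva"))
```

The display name is passed in separately so that the same function serves whether the editor supplied a code or a full language name.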

Standard infobox parameters

See #Sample infobox templates (below) for parameters used in different infoboxes. Assuming the parameter names used in {{infobox settlement}} will prevail, and that official names, native names and other names can all have languages and may all have Romanized forms, the parameters could be

Alternative 1: Explicit

|name                =
|official_name       =
|official_name_lang  =     
|official_name_roman =     
<!--           Use |official_name2 = |official_name_lang2 = |official_name_roman2 = etc. for additional names, up to five -->
|native_name         =     
|native_name_lang    =     
|native_name_roman   =     
<!--           Use |native_name2 = |native_name_lang2 = |native_name_roman2 = etc. for additional names, up to five -->
|former_name         =
|former_name_lang    =     
|former_name_roman   =     
<!--           Use |former_name2 = |former_name_lang2 = |former_name_roman2 = etc. for additional names, up to five -->
|other_name          =
|other_name_lang     =     
|other_name_roman    =     
<!--           Use |other_name2 = |other_name_lang2 = |other_name_roman2 = etc. for additional names, up to five -->
|nickname            =
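To illustrate how a template could gather the numbered parameters of the explicit alternative, here is a small sketch. It assumes the suffix convention shown in the comments above (|official_name2 =, |official_name_lang2 =, and so on) and the limit of five; the names NameEntry and collect_names are hypothetical.

```python
from typing import NamedTuple


class NameEntry(NamedTuple):
    name: str
    lang: str
    roman: str


MAX_NAMES = 5  # "up to five" additional-name slots per name type


def collect_names(params: dict[str, str], kind: str) -> list[NameEntry]:
    """Collect kind, kind2, ... with their _lang and _roman companions."""
    entries = []
    for i in range(1, MAX_NAMES + 1):
        suffix = "" if i == 1 else str(i)
        name = params.get(f"{kind}{suffix}", "").strip()
        if not name:
            continue  # skip unused slots
        entries.append(NameEntry(
            name,
            params.get(f"{kind}_lang{suffix}", "").strip(),
            params.get(f"{kind}_roman{suffix}", "").strip(),
        ))
    return entries


params = {
    "official_name": "Région de Bruxelles-Capitale",
    "official_name_lang": "French",
    "official_name2": "Brussels Hoofdstedelijk Gewest",
    "official_name_lang2": "Dutch",
}
print(collect_names(params, "official_name"))
```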

Alternative 2: Templated

|name                =    
|official_name       =    <!-- {{lang2|<language>|<name>|<roman form>}} or 
                               {{lang2 list |lang1=<language>|name1=<name> |roman1=<roman form> |lang2=<language>|name2=<name> |roman2=<roman form> ... }} -->
|native_name         =    <!-- {{lang2|<language>|<name>|<roman form>}} or 
                               {{lang2 list |lang1=<language>|name1=<name> |roman1=<roman form> |lang2=<language>|name2=<name> |roman2=<roman form> ... }} -->
|former_name         =    <!-- {{lang2|<language>|<name>|<roman form>}} or 
                               {{lang2 list |lang1=<language>|name1=<name> |roman1=<roman form> |lang2=<language>|name2=<name> |roman2=<roman form> ... }} -->
|other_name          =    <!-- {{lang2|<language>|<name>|<roman form>}} or  
                               {{lang2 list |lang1=<language>|name1=<name> |roman1=<roman form> |lang2=<language>|name2=<name> |roman2=<roman form> ... }} -->
|nickname            =

Comparison of alternatives

In both alternatives the editor must enter the same information:

|official_name = name
|official_name_lang = language
|official_name_roman = roman form

or

|official_name = {{lang2|language | name | roman form}}

The first format is probably slightly easier for novice editors, who may be put off by the curly brackets and vertical bars in the second form. Articles about major geographical entities like Cairo, Brahmaputra River or Mount Everest attract seasoned editors who can deal with formatting issues. But the majority of geographical articles are stubs like Orto, Corse-du-Sud, Maquan River or Klinkit Creek Peak, where the editors may find even a simple infobox a bit of a challenge.

The first form also makes it easier to ensure that languages are rendered correctly, since the {{infobox geonames}} template can see and validate all the parameters, for example checking for unusual characters in a name such as ":" or "(" that may indicate attempts to pre-format them. With the second approach {{infobox geonames}} can only see the result rendered by {{lang2}}, and cannot be sure that only the correct formatting template has been used. This essay therefore recommends the first, explicit alternative.
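The check for pre-formatted names could look something like the sketch below. Only ":" and "(" are named in the text; the other characters in SUSPECT_CHARS are assumptions about what template or HTML mark-up would leave behind, and the function name is hypothetical.

```python
# ":" and "(" come from the essay; the rest are assumed signs of
# template, link or HTML mark-up having been typed into a name.
SUSPECT_CHARS = ":(){}[]<>|"


def find_preformatting(name: str) -> list[str]:
    """Return the suspect characters found in a name, sorted and de-duplicated."""
    return sorted({ch for ch in name if ch in SUSPECT_CHARS})


for value in ["Anadolu Selçuklu Devleti", "{{nobold|四国}}", "German: München"]:
    print(value, "->", find_preformatting(value))
```

A non-empty result would not prove the name is wrong, so it is better suited to populating a tracking category than to rejecting the value outright.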

Rendered layout

See #Current usage examples for the various ways in which geographical infoboxes render name information. There is no reason why they should be so inconsistent. The obvious way to standardize collection, validation and rendering of name data is to use a child infobox that can be shared by all the geographical entity infoboxes. To demonstrate, {{Infobox geonames parent}} embeds child {{infobox geonames}}, which formats the names. This is just a crude mock-up of the alternative 2 format, with no real validation and formatting, but illustrates the concept. The code below renders the result shown after it.

{{Infobox geonames parent
  |name=Article name
  |native_name = Native name or names
  |official_name = List of official names
  |former_name= Former names
  |other_name= Other names
  |nickname= Nicknames  
  |image=File:Przełęcz Karkonoska - panorama.jpg
  |otherdata=Specialized information about the geographical entity
}}

Rendered result:

Article name
Native name or names
Official: List of official names
Formerly: Former names
Variants: Other names
Nickname: Nicknames
Other data: Specialized information about the geographical entity

This is a rough first cut. The format rendered by {{infobox geonames}} should be carefully reviewed and adjusted. Logic must be added to validate the languages and ensure that names, languages, non-Latin scripts and lists of names are formatted correctly, and titles must be pluralized as needed. But once this is done, the standard validations and formatting will then be picked up automatically by all geo-infoboxes that embed {{infobox geonames}}.

General migration approach

{{lang}}, {{native name}} etc. should be enhanced to support language names as an alternative to language codes, and to support romanized or phonetic forms. This can be done at any time, and will have no impact on existing articles.

Migration to a more standard way of collecting, validating and formatting names can be done infobox by infobox.

  • Every effort should be made to minimize disruption.
  • A geo-infobox change that introduces red error messages in the text of many articles where there were no error messages before is unacceptable.
  • The preferred approach is to flag issues using a hidden tracking category, and allow gnomes to work through the flagged formatting, replacing it with the new standard. Once almost all the non-standard formatting has been eliminated, the geo-infobox may start to render red error messages.

Two types of change may be introduced independently:

  1. The geo-infobox is changed to use the new {{infobox geonames}}
  2. The geo-infobox is changed to eliminate non-standard parameter names

Converting to {{infobox geonames}}

  • The first step for each geo-infobox is to obtain agreement on its talk page and associated project talk page to migrate to the standard {{infobox geonames}}
  • A version of the geo-infobox using {{infobox geonames}} is prepared and carefully tested
  • This version will use the standard parameter names, but will also accept variants to provide backward compatibility
  • Assuming no problems, the standardized geo-infobox template will be cut into production, passing "mode=transition" to {{infobox geonames}}. In this mode, {{infobox geonames}} will populate tracking categories with error messages, but will attempt to format the data provided, and will not generate red error messages.
  • Once the tracking categories have mostly been cleared, the geo-infobox will start passing "mode=strict" to {{infobox geonames}}. In this mode, {{infobox geonames}} will generate red error messages.
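The difference between the two modes can be sketched as below. The category name, the RenderResult type and the error mark-up are all assumptions for illustration, as is the choice to keep populating the tracking category in strict mode.

```python
import html
from dataclasses import dataclass, field

# Hypothetical category name; the essay does not specify one.
TRACKING_CATEGORY = "Geonames infoboxes with formatting issues"


@dataclass
class RenderResult:
    text: str
    categories: list[str] = field(default_factory=list)


def report_issues(formatted: str, issues: list[str], mode: str) -> RenderResult:
    """Attach issues to the output according to mode ('transition' or 'strict')."""
    if mode not in ("transition", "strict"):
        raise ValueError(f"unknown mode: {mode!r}")
    result = RenderResult(formatted)
    if not issues:
        return result
    # Both modes track the article (assumed for strict mode).
    result.categories.append(TRACKING_CATEGORY)
    if mode == "strict":
        # Only strict mode shows visible red error messages.
        result.text += "".join(
            f' <strong class="error">{html.escape(msg)}</strong>' for msg in issues
        )
    return result
```

In transition mode the reader sees only the best-effort formatting, so switching an infobox over does not suddenly fill articles with red text.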

Standardizing parameter names

In the long run, it will be easier for editors if all geo-infoboxes use the same names for the same parameters.

  • The geo-infobox passes {{infobox geonames}} parameters with the standard names, but also passes the old parameter names:
    |other_name={{{other_name|{{{name_other|}}} }}}
  • The documentation is changed to show both parameter names:
    |other_name=      <!-- or |name_other = -->
  • At some point, the old name is deprecated, with articles that use it put into maintenance categories
  • Gnomes work through changing to the standard parameter names
  • Eventually the old parameter names are dropped, and flagged as errors when the article is in edit mode
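The fallback from old to standard parameter names could be modelled roughly as follows. The alias table is only an illustration drawn from variants listed in #Sample infobox templates, and the function name is hypothetical. Unlike the wikitext fallback shown above, this sketch also treats an empty standard parameter as missing, which is an assumption.

```python
# Standard name -> old variants seen in existing geo-infoboxes (illustrative).
ALIASES = {
    "other_name": ["name_other", "alt_name"],
    "official_name": ["name_official"],
    "native_name": ["nativename", "local_name"],
}


def standardize(params: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Return (parameters under standard names, old names that were used)."""
    out: dict[str, str] = {}
    deprecated: list[str] = []
    for standard, old_names in ALIASES.items():
        value = params.get(standard, "").strip()
        for old in old_names:
            old_value = params.get(old, "").strip()
            if old_value:
                deprecated.append(old)  # candidate for a maintenance category
                if not value:
                    value = old_value   # fall back only if standard is empty
        if value:
            out[standard] = value
    return out, deprecated


print(standardize({"name_other": "Pol-e Kaisar"}))
```

The list of old names in use is what would drive the maintenance categories during the deprecation period.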

Providing support for the standard parameter names is important. Removing variant usage is less important, and should not be allowed to get in the way of the main thrust to standardize name validation and formatting.

Appendices

Sample infobox templates

See Category:Place infobox templates for the complete set.

Type | Template | Example | Count[a] | Parameters
Divisions
Continent | {{Infobox continent}} | Africa | 56 | title
Island | {{Infobox islands}} | Borneo | 8,317 | name, native_name (or local_name), native_name_link[b], native_name_lang, sobriquet (or nickname), etymology
Country | {{Infobox country}} | Albania | 5,769 | name, conventional_long_name, common_name, native_name, linking_name
Settlement | {{Infobox settlement}} | Brussels | 543,470 | name, official_name, other_name, native_name, native_name_lang, etymology, nickname
Structures
Airport | {{Infobox airport}} | Frankfurt Airport | 15,543 | name, nativename, nativename-a (non-western characters), nativename-r (Romanized)
Amusement park | {{Infobox amusement park}} | Epcot | 1,027 | name, previous_names
Ancient site | {{Infobox ancient site}} | Nineveh | 4,653 | name, native_name, native_name_lang, alternate_name
Bridge | {{Infobox bridge}} | Band-e Kaisar | 5,684 | name, native_name, native_name_lang, official_name, other_name, named_for
Building | {{Infobox building}} | Palace of Versailles | 24,502 | name, native_name, native_name_lang, former_names, alternate_names, etymology
Cemetery | {{Infobox cemetery}} | Glasnevin Cemetery | 1,416 | name, native_name, native_name_lang
Church | {{Infobox church}} | Durham Cathedral | 13,394 | name, fullname, other name, native_name, native_name_lang, former name
Dam | {{Infobox dam}} | Red Bluff Diversion Dam | 4,159 | name, name_official
Dzong | {{Infobox Tibetan Buddhist monastery}} | Potala Palace | 286 | name + language specifics[c]
Hindu temple | {{Infobox Hindu temple}} | Meenakshi Temple, Madurai | 2,274 | name, native_name, native_name_lang
Historic site | {{Infobox historic site}} | Diocletian's Palace | 10,063 | name, native_name, native_language, native_name2, native_language2, native_name3, native_language3, other_name, etymology
Power station | {{Infobox power station}} | Ekibastuz GRES-2 Power Station | 2,852 | name, name_official
Natural geography
Mountain | {{Infobox mountain}} | Central Eastern Alps | 26,448 | name, other_name, etymology, nickname, native_name, native_name_lang, translation, pronunciation, authority
Body of water | {{Infobox body of water}} | Lake Sevan | 17,050 | name, native_name, other_name
River | {{Infobox river}} | Nile | 28,870 | name, native_name, name_other, name_etymology, nickname
Canal | {{Infobox canal}} | Royal Canal | 584 | name
Glacier | {{Infobox glacier}} | Vatnajökull | 1,622 | name, other_name
Landform | {{Infobox landform}} | Pongo de Manseriche | 1,147 | name, other_name
Mountain pass | {{Infobox mountain pass}} | Khunjerab Pass | 1,303 | name, other_name
Stratigraphic unit | {{Infobox rockunit}} | Burgess Shale | 6,326 | name
Valley | {{Infobox valley}} | Alay Valley | 737 | name, other_name, native_name, translation
Waterfall | {{Infobox waterfall}} | Angel Falls | 1,345 | name
Ecology, parks etc.
Ecoregion | {{Infobox ecoregion}} | Alto Paraná Atlantic forests | 919 | name
Park | {{Infobox park}} | Park Güell | 6,693 | name, alt_name, native_name, native_name_lang
Protected area | {{Infobox protected area}} | Gran Paradiso National Park | 13,312 | name, alt_name
Site of Special Scientific Interest | {{Infobox Site of Special Scientific Interest}} | Lundy | 2,052 | name
Trail | {{Infobox hiking trail}} | The Ridgeway | 1,164 | name
World Heritage Site | {{Infobox UNESCO World Heritage Site}} | Park Güell | 1,587 | WHS, Official_name
Zoo | {{Infobox zoo}} | Baghdad Zoo | 1,229 | name

Current usage examples

The examples below are taken from articles as of February 2022, with the infoboxes edited to remove information other than names, and to show a standard image. They illustrate the varied visual styles and approaches to presenting names, partly imposed by the infobox templates, and partly chosen by the editors.

Island

Borneo
Kalimantan

Borneo ([…] Java, west of Sulawesi, and east of Sumatra.

Country

Republic of Albania
Republika e Shqipërisë (Albanian)
Location of Albania

Albania ([…] land borders with Montenegro to the northwest, Kosovo to the northeast, North Macedonia to the east and Greece to the south. Tirana is its capital and largest city, followed by Durrës, Vlorë and Shkodër.

Settlement

Brussels
  • Brussels-Capital Region
  • Région de Bruxelles-Capitale (French)
  • Brussels Hoofdstedelijk Gewest (Dutch)
Nicknames: Capital of Europe, Comic City

Brussels ([…] GDP per capita. The five times larger metropolitan area of Brussels comprises over 2.5 million people, which makes it the largest in Belgium. It is also part of a large conurbation extending towards Ghent, Antwerp, Leuven and Walloon Brabant, home to over 5 million people.

Airport

Frankfurt Airport
Flughafen Frankfurt Main

Frankfurt Airport ([…] The airport covers an area of 2,300 hectares (5,683 acres) of land and features two passenger terminals with capacity for approximately 65 million passengers per year; four runways; and extensive logistics and maintenance facilities.

Ancient site

Nineveh
نَيْنَوَىٰ

Nineveh ([…] romanized: Nīnwē; Akkadian: 𒌷𒉌𒉡𒀀 URUNI.NU.A Ninua) was an ancient Assyrian city of Upper Mesopotamia, located on the outskirts of Mosul in modern-day northern Iraq. It is located on the eastern bank of the Tigris River and was the capital and largest city of the Neo-Assyrian Empire, as well as the largest city in the world for several decades. Today, it is a common name for the half of Mosul that lies on the eastern bank of the Tigris, and the country's Nineveh Governorate takes its name from it.

Bridge

Band-e Kaisar

بند قیصر,
Other name(s): Pol-e Kaisar, Bridge of Valerian, Shadirwan

The Band-e Kaisar ([…] Persian territory. Its dual-purpose design exerted a profound influence on Iranian civil engineering and was instrumental in developing Sassanid water management techniques.

Building

Historic site

Historical Complex of Split with the Palace of Diocletian
Native name
Croatian: Povijesna jezgra grada Splita s Dioklecijanovom palačom

Diocletian's Palace (Croatian: Dioklecijanova palača, pronounced [diɔklɛt͡sijǎːnɔʋa pǎlat͡ʃa]) is an ancient palace built for the Roman emperor Diocletian at the turn of the fourth century AD, which today forms about half the old town of Split, Croatia. While it is referred to as a "palace" because of its intended use as the retirement residence of Diocletian, the term can be misleading as the structure is massive and more resembles a large fortress: about half of it was for Diocletian's personal use, and the rest housed the military garrison.

Mountain

Central Eastern Alps

The Central Eastern Alps (German: Zentralalpen or Zentrale Ostalpen), also referred to as Austrian Central Alps (German: Österreichische Zentralalpen) or just Central Alps, comprise the main chain of the Eastern Alps in Austria and the adjacent regions of Switzerland, Liechtenstein, Italy and Slovenia. South of them are the Southern Limestone Alps.

Body of water

Lake Sevan
Սևանա լիճ (Armenian)

Lake Sevan ([…] Hrazdan River, while the remaining 90% evaporates.

River

Valley

Alay Valley
Naming
Native name: Алай өрөөнү (Kyrgyz)

The Alay Valley ([…] Irkestam border crossing to China.

Notes

  a. Transclusion count as of February 2022
  b. Link to the article about the language used for the native name
  c. Infobox Tibetan Buddhist monastery collects the following parameters for native name: |t=ཇོ་ཁང་ |w=Jo-khang |to = {{{to}}} |ipa={{IPA|{{{ipa}}}}} |z={{{z}}} |thdl=thdl |e={{{e}}} |tc=大昭寺 |s={{{s}}} |p=Dàzhāosì