Wikipedia:Reliability of GNIS data

Source: Wikipedia, the free encyclopedia.

Wikipedia has thousands of "populated place" stubs which were mass-created from the United States government's Geographic Names Information System (GNIS) database. Unfortunately, a major flaw has been found in this source: GNIS has labeled many locations as "populated places" in error rather than as a locale or another more accurate category. There are countless instances of discrepancies between the GNIS and print versions of the National Gazetteer, a publication of the USGS with the same entries. This means that everything from small homesteads to railroad junctions to river crossings has been mislabeled as a "populated place".

Feature classes

The Geographic Names Information System (GNIS) is the official repository for place names in the United States, with a database of over 2 million natural and man-made features.[1] Entries are compiled from sources such as atlases, gazetteers and topo maps.

Each place is assigned an official name and a "feature class" such as Park, School, Dam, Populated Place or Locale. Locale is meant to encompass miscellaneous human-made features such as battlefields, campgrounds, farms, railroad sidings, windmills, etc. However, since the topo maps that provide the bulk of GNIS entries do not clearly distinguish between locale-type features and cities/towns/villages/hamlets, many of these were incorrectly transcribed as "populated places", a label that is supposed to apply to "... a named community with a permanent human population, usually not incorporated and with no legal boundaries, ranging from rural clustered buildings to large cities and every size in between." That's right: Many of our "populated place" articles are only labelled as such because an employee poring over a map missed a subtle difference in typeface.

It's difficult to prove that there was never a human settlement at a given location, but in many cases it's been found that the place name has only been used in conjunction with a railroad siding, ranch, windmill or other feature. For example, Haberman, NY was the location of a train station built to serve the Haberman Manufacturing Company in Queens, and the USGS employee who added the location to the database failed to recognize the subtle difference in spacing which was used to distinguish a train station from a community on the topo map. This particular error doesn't seem to have been repeated by Wikipedia since we already had a Haberman station article based on a different source, but it did appear in other GNIS-derived sources such as Google Maps.[2]
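As a rough illustration of how such entries might be triaged for manual review, here is a minimal sketch. It assumes a pipe-delimited export with hypothetical column names (`feature_id`, `feature_name`, `feature_class`) and a hypothetical keyword list; neither is taken from USGS documentation, and the sample rows are made up. A keyword match only suggests that an entry deserves a closer look at other sources; it proves nothing about the place.

```python
import csv
import io

# Hypothetical keyword list: words that hint at a locale-type feature
# rather than a settled community. An assumption for illustration only,
# not a USGS or Wikipedia standard.
LOCALE_HINTS = ("siding", "ranch", "windmill", "junction", "station", "crossing", "dump")


def flag_for_review(rows):
    """Return names of "Populated Place" records whose name hints at a locale."""
    flagged = []
    for row in rows:
        if row["feature_class"] != "Populated Place":
            continue  # only the suspect feature class is examined
        words = row["feature_name"].lower().split()
        if any(hint in words for hint in LOCALE_HINTS):
            flagged.append(row["feature_name"])
    return flagged


# Made-up sample rows; the column names and IDs are assumed, not real GNIS data.
SAMPLE = """feature_id|feature_name|feature_class
1|Jones Windmill|Populated Place
2|Smith Ranch|Locale
3|Springfield|Populated Place
4|Jolly Dump|Populated Place
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="|"))
print(flag_for_review(rows))
```

Entries that the sketch flags still need to be checked against maps, gazetteers and local histories before any conclusion is drawn.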

Propagation of errors

Errors quickly propagate to other online sources which rely on GNIS for location data. Our AfD for Jolly Dump, South Dakota shows that it was never anything more than a place where railroad cars were loaded and unloaded, yet a Google search brings up the "Things to do in Jolly Dump" Facebook page, a list of nearby FedEx locations, a "Populated Place Profile" with coordinates and elevation copied from GNIS, nearby hotels ("lastminute.com has a fantastic range of hotels in Jolly Dump, with everything from cheap hotels to luxurious five star accommodation available!"), a weather forecast and daylight saving time information. Although this type of coverage is sometimes presented as evidence of notability, it doesn't meet our "significant coverage" requirement since it's simply copied from another source by an automated program. Wikipedia also forms a link in this chain of errors: When we describe a place as an "unincorporated community", a label that is often completely unsourced, Google Maps copies it as a description of the place.
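To make the mechanism concrete, the following sketch shows how a single database record can be expanded into several page titles with no independent verification. The templates and the record layout are hypothetical; how any particular website actually builds its pages is not known from the sources cited here.

```python
# Hypothetical page templates; an assumption for illustration, not a
# description of how any specific website works.
TEMPLATES = (
    "Things to do in {name}",
    "Hotels in {name}",
    "Weather forecast for {name}",
    "{name} Populated Place Profile",
)


def generate_titles(record):
    """Fill every template with the record's name; nothing checks the place itself."""
    return [template.format(name=record["name"]) for template in TEMPLATES]


# One record is enough to produce one page per template.
record = {"name": "Jolly Dump", "feature_class": "Populated Place"}
titles = generate_titles(record)
for title in titles:
    print(title)
```

Every title traces back to the same record, so the number of pages says nothing about how many independent sources exist.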

GNIS itself has been found to propagate questionable information from other sources. Most entries were taken from the most detailed USGS topographic maps (1:24000 or 1:25000 scale), but we have also found entries copied from NOAA navigational charts, from Forest Service maps, from promotional maps, from Rand-McNally atlases, from books of place names, and even from a philately journal, as well as items copied from less detailed topographic maps. One can readily deduce that these entries do not appear on the detailed topographic maps, which already adds an element of doubt; in the case of the nautical charts, which can be verified online, we have found that the charts were sometimes misread and sometimes bore name labels on shore which could not be reconciled with other maps. Promotional maps tend to list non-notable subdivisions; other sources report 4th class post offices, which were typically just a place in a store or railroad station or even a private residence where people could come to post and pick up their mail.

Official standards

Although GNIS provides the official name of a place, the "feature class" labels do not carry the same official standing. They're simply used for "efficient data search and retrieval purposes" and "have no status as standards".[1] In fact, GNIS specifically does not involve itself in such geographic minutiae as the differences between hills and mountains, lakes and ponds or rivers and creeks.[3] As editors we need to be aware of the purpose and shortcomings of GNIS, using it as a resource where it excels (name and coordinates) while relying on other sources for notability and feature type. After all, our research and editorial discretion are what distinguish Wikipedia from machine-generated gazetteers such as Hometown Locator.

Feature classes abandoned in 2014

In 2017 the USGS made this announcement:

Data Content: Since GNIS staff has been unable to maintain Domestic administrative names for quite some time (since October 1, 2014), these records will be archived from GNIS database and will no longer be available through the GNIS search application. The following feature classes will be archived: Airport, Bridge, Building, Cemetery, Church, Dam, Forest, Harbor, Hospital, Mine, Oilfield, Park, Post Office, Reserve, School, Tower, Trail, Tunnel, and Well.

Wikipedia articles that were bulk-added in earlier years from these archived records now link to blank records in the search interface (https://edits.nationalmap.gov/apps/gaz-domestic/public/search/names) for the "gaz-domestic" (NGNDB) database.

Reliability of locations

While GNIS coordinates are generally considered accurate (notwithstanding several AfD discussions that were derailed by what turned out to be a single-digit typing error by a data entry clerk), they may not be appropriate for use on Wikipedia. This is because Wikipedia's rules for choosing coordinates differ from the rules GNIS followed.

  • Per Wikipedia:WikiProject Geographical coordinates/Linear, Wikipedia wants the mid-point of linear features. However, the rules for the GNIS data compilation were that the primary coördinate be the "mouth" of the feature and secondary coördinates be any point on the feature as long as it indicated what (other) map(s) the feature crossed.[4][5]
  • Per Wikipedia:WikiProject Geographical coordinates#Which coordinates to use, Wikipedia wants the centres of towns and cities. In Payne's own word in the USGS report on GNIS phase 1, the selection of a coördinate for a big town or city is "subjective", and the GNIS rule was, in contrast, to pick a prominent civic feature (town hall, main intersection, main public library, and so forth) rather than attempt a geometric centre.[4]
  • While in phase 1 coördinates were read straight from the markers on the maps, in phase 2 coördinates were interpolated, using contour lines.

Further complicating this is that there were alternative forms of the database that substituted coördinate information from the National Map database.
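The difference between the two conventions for linear features can be sketched as follows. The path is a made-up three-point stream, the function names are hypothetical, and interpolating latitude and longitude linearly within a segment is a simplifying assumption that is only reasonable for short segments. The sketch compares the "mouth" (taken here as the last point of the path) with the point halfway along the path by distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def midpoint_along(path):
    """Return the point halfway along the path, measured by distance."""
    segments = list(zip(path, path[1:]))
    lengths = [haversine_km(p, q) for p, q in segments]
    remaining = sum(lengths) / 2
    for (p, q), seg in zip(segments, lengths):
        if seg > 0 and remaining <= seg:
            t = remaining / seg  # fraction of this segment still to travel
            return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
        remaining -= seg
    return path[-1]  # degenerate path (single point or zero length)


# Made-up stream, listed from source to mouth.
path = [(40.0, -105.0), (40.0, -104.9), (40.0, -104.8)]
mouth = path[-1]              # the GNIS-style primary coordinate
mid = midpoint_along(path)    # the mid-point that Wikipedia asks for
gap_km = haversine_km(mouth, mid)
print(mid, round(gap_km, 1))
```

Even for this short path the two conventions give points several kilometres apart, and for a long river the gap can be far larger.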

FAQ

  • Q: Aren't government sources always reliable?
  • A: They're generally accurate, but like any reliable source they're susceptible to errors.
  • Q: What's the harm in keeping these stubs?
  • A: Wikipedia is a trusted source that many organizations rely on. For example, some of these places appear on Google Maps with descriptions such as "Jones Windmill is an unincorporated community in Smith County", even though the "unincorporated community" designation has never appeared in a reliable source - it was applied by a Wikipedia editor, based on their own interpretation of an erroneous "populated place" label. When we keep these stubs, we play an active role in creating and propagating false information.
  • Q: But it returned 6,000 Google search results - There's even a FedEx office there!
  • A: Many websites use GNIS for automated location data. When you search for real estate listings, store locations or weather reports, the name is used to mark a point on a map and return the requested information. The source isn't saying that the location is notable, probably doesn't do business there and most likely isn't even aware of its existence.
  • Q: If it's listed in GNIS, wouldn't that make it a "populated, legally recognized place" and therefore presumed notable per WP:GEOLAND?
  • A: According to the USGS, "populated place" is a designation for places that are generally not legally defined or recognized: "An entry with Feature Class = Populated Place represents a named community with a permanent human population, usually not incorporated and with no legal boundaries, ranging from rural clustered buildings to large cities and every size in between. The boundaries of most communities classified as Populated Place are subjective and cannot be determined." Wikipedia doesn't have a specific definition of what qualifies as a "legally recognized populated place", but repeated discussions have concluded that simply being listed in a government database or appearing on a map does not meet the requirement.

Relevant AfDs

To illustrate the range of misidentified places, here is a list of AfD discussions of GNIS "populated places":

Further reading

Cleanup efforts

Books to check against

There are usually Arcadia Publishing books for a particular locality. Arcadia books are not the be-all and end-all, but they do point the way and are generally the work of local historians who have already done the poring over old maps, records, and photographs for us. Arcadia (and other local history) books helped sort out Robert, California (AfD discussion) and Escalle, Larkspur, California; helped identify what Salminas Resort, California (AfD discussion) actually was; and conversely strengthened the cases against the likes of Ettawa Springs, California (AfD discussion). All of these were two-sentence GNIS-only stubs at the time of deletion nomination, all claiming "unincorporated community".

Many states have books or other collections of place names compiled separately from GNIS, generally with greater detail. These works are frequently cited as references in GNIS itself, though the quality of the interpretation in the latter tends to be spotty.

Gazetteers
These are useful for telling whether an "unincorporated community" that is just a dot nowadays is a historical post-town/post-village or only a post office; that then might be found in local county/state histories. Lippincott's, in particular, has a uniform scheme for this. Take care about dates, of course.

References

  1. ^ a b "Principles, Policies and Procedures" (PDF). Reston, VA: United States Board on Geographic Names, Domestic Names Committee. December 2016.
  2. ^ Schultz, Isaac (15 October 2019). "The Brief, Baffling Life of an Accidental New York Neighborhood". Atlas Obscura. Retrieved 9 May 2020.
  3. ^ "How Do I?". U.S. Board on Geographic Names. United States Geological Survey. Retrieved 9 May 2020.
  4. ^ a b Payne 1983, p. 5.
  5. ^ Payne 1985, p. 7.
  6. ^ Swanson 2014, p. 195.
  7. ^ Manca 2012, pp. 161–162.
  8. ^ USBLM, p. 106.
  9. ^ Harriman v. Brown, 8th Leigh. 697.
  10. ^ JSTOR 1109094.

See also