Wikipedia:Overcategorization
This page documents an English Wikipedia consensus. When in doubt, discuss first on the talk page.
This page in a nutshell: Do not create categories for every single verifiable fact in articles. This only makes the category system more crowded and less easy to navigate.
Categorization is a Wikipedia feature used to group pages for ease of navigation, and correlating similar information. However, not every
To address these concerns, this page lists types of categories that should generally be avoided. Based on existing guidelines and precedent at Wikipedia:Categories for discussion, such categories, if created, are likely to be deleted.
Non-defining characteristics
One of the central goals of the categorization system is to categorize articles
Categorization by non-defining characteristics should be avoided. It is sometimes difficult to know whether or not a particular characteristic is "defining" for any given topic, and there is no one definition that can apply to all situations. However, the following suggestions or rules-of-thumb may be helpful:
- a defining characteristic is one that reliable, secondary sources commonly and consistently define, in prose, the subject as having. For example: "Subject is an adjective noun ..." or "Subject, an adjective noun, ...". If such examples are common, each of adjective and noun may be deemed to be "defining" for subject.
- if the characteristic would not be appropriate to mention in the lead section of an article (determined without regard to whether it is mentioned in the lead), it is probably not defining;
- if the characteristic falls within any of the forms of overcategorization mentioned on this page, it is probably not defining.
Often, users can become confused between the standards of
It is recommended to name or rename categories to have as little vagueness as possible, discouraging non-defining articles from being added. If you have just invented a subcategory on the spot that lacks a main article, it may not be a defining attribute. Examples include:
- Physicians instead of Medically-skilled people
- Quadcopters instead of Fast-moving drones
- Fiction about robots instead of Robots in fiction
In disputed cases, the
Trivial characteristics
Avoid categorizing topics by characteristics that are unrelated or wholly peripheral to the topic's notability.
For biographical articles, it is usual to categorize by such aspects as their career, origins, and major accomplishments. In contrast, someone's tastes in food, their favorite holiday destination, or the number of tattoos they have would be considered
Also avoid categorizing people by information associated with a person's death, such as the age at which the person died, the place of the person's death, or by whether the person still had unreleased or unpublished work at the time of their death.
Subjective inclusion criteria
Adjectives which imply a subjective, vague, or
Arbitrary inclusion criteria
There is no particular reason for choosing "7%", "$30,000", or the 100th episode as cutoff points in these cases. Likewise, a school district with 3,800 students is not meaningfully different from one with 4,100 students. A better way of representing this kind of information is to make it a list, either in an existing article, or as a separate list, such as "List of school districts in (region) by size". Note that Wikipedia allows a table to be made sortable by any column.
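As a rough illustration of why a sortable list serves readers better than a cutoff-based category, the sketch below contrasts the two. The district names, enrollments, and the 4,000-student cutoff are all invented for this example and do not come from any real category.

```python
# Hypothetical school districts and enrollments, invented for illustration.
districts = [
    ("District A", 4100),
    ("District B", 3800),
    ("District C", 12500),
    ("District D", 950),
]

CUTOFF = 4000  # an arbitrary threshold, as a cutoff-based category would use


def in_cutoff_category(enrollment, cutoff=CUTOFF):
    """Return True if a district would land in the cutoff-based category."""
    return enrollment >= cutoff


def sorted_by_size(rows):
    """Return the rows ordered from largest to smallest enrollment."""
    return sorted(rows, key=lambda row: row[1], reverse=True)


# The category separates District A (4,100) from District B (3,800),
# even though the two are close in size.
category_members = [name for name, size in districts if in_cutoff_category(size)]

# The sorted list keeps every district and its actual size, so readers can
# draw their own line wherever it is useful to them.
ranked = sorted_by_size(districts)

print(category_members)
print(ranked)
```

The category keeps only a yes-or-no answer relative to the cutoff, while the list preserves the underlying figures and their order.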
Intersection by year or time period
Categorizing by year (or group of years, such as by decade, by century, or even by historical era) is not generally considered an #ARBITRARY division for categorization.
However, avoid creating a
Similarly, if two or more by year categories have a large
In addition, people are categorized by time period only if their activity in that time period is a #DEFINING characteristic.
For example:
- a writer who lived from 1850 to 1910 and wrote their only work in 1908 should be categorized under Category:20th-century writers. They did no notable writing in the 19th century, so should not be included in Category:19th-century writers
- an English soldier born in 1590 and notable for military service in the 1620s should not be categorized in Category:People of the Tudor period, since their defining characteristic relates to years after the Tudor period ended in 1603.
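The writer example above can be sketched as a small calculation. This is only an illustration, not part of the guideline: the function names are hypothetical, and it assumes the convention that the 20th century runs from 1901 to 2000.

```python
def century_of(year):
    """Return the ordinal century of a CE year (1908 -> 20, 1900 -> 19).

    Assumes the convention that the Nth century ends in year N * 100.
    """
    return (year - 1) // 100 + 1


def centuries_spanned(first_year, last_year):
    """Return every century touched by the inclusive range of years."""
    return list(range(century_of(first_year), century_of(last_year) + 1))


# The writer lived from 1850 to 1910 but was only active as a writer in 1908.
by_lifespan = centuries_spanned(1850, 1910)  # centuries the person lived in
by_activity = centuries_spanned(1908, 1908)  # centuries of notable activity

print(by_lifespan)
print(by_activity)
```

Categorizing by the years of notable activity yields only the 20th century, whereas categorizing by lifespan would also add the 19th century, in which the writer did no notable writing.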
While people may be categorized by the year of their birth and year of death, do not categorize people by day or month of birth or death. (See also list of CFD examples here.)
When categorizing by time period, clearly state the inclusion criteria at the top of the category. For example, This category is for politicians who were active in the 19th century is not the same as This category is for politicians who were born in the 19th century.
Intersection by location
Categorizing by the
However, avoid sub-categorizing subjects by location if that location does not have any relevant bearing on the subjects' other characteristics. For example, quarterbacks' careers are not defined merely by the specific state that they once lived in (unless they played for a team within that state).
People should not be categorized by place of residence, if the person has never resided in that place. The place of residence of
And while the place of a person's birth may seem significant from the perspective of local studies, it is rarely defining from the perspective of the individual. The place of death is not normally categorized – consider using a list if this relates to a specific place or event. If it is relevant to identify the place of burial (either from the perspective of the person or the burial place), then someone buried in a less notable cemetery, or in a place with just a few notable burials, should be recorded in a list within the article about the burial place. However, if the burial place is notable in its own right and has too many other notable people to list, then such burials may be categorized.
Narrow intersection
Categories which intersect two (or more) topics or characteristics can result in very narrow categories with few members. Such categories should only be created when both parent categories are large enough for
- For example, if an article is in category "A" and in category "B", a category "A and B" does not necessarily need to be created for this article.
- Similarly, while an article in categories A, B, and C could potentially be placed in categories "A and B", "B and C", and "A and C", creating a "triple intersection" of categories A, B, and C should generally be avoided.
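To see how quickly intersection categories can multiply, the following minimal sketch enumerates every possible intersection of two or more characteristics. The function name is hypothetical and the letters stand in for arbitrary categories.

```python
from itertools import combinations


def possible_intersections(characteristics):
    """Return every combination of two or more of the given characteristics."""
    result = []
    for size in range(2, len(characteristics) + 1):
        result.extend(combinations(characteristics, size))
    return result


# Three characteristics allow three pairwise intersections plus one triple.
three = possible_intersections(["A", "B", "C"])
print(three)

# Five characteristics already allow 2**5 - 5 - 1 = 26 intersections.
five = possible_intersections(["A", "B", "C", "D", "E"])
print(len(five))
```

Each added characteristic roughly doubles the number of possible intersections, while the number of articles that fit any one of them tends to shrink.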
Miscellaneous categories
It is not necessary to completely empty every parent category into sub-categories, so do not categorize articles into "miscellaneous", "other", "not otherwise specified", or "remainder" categories. Such articles will have little in common. If there are some articles that don't fit appropriately into any of the sub-categories, then leave the articles in the parent category.
Mostly overlapping or duplicative
If a category is mostly duplicative or overlapping with another category (such as the coverage of "crime" and "crime history"), or if two categories' names are similar enough to have nearly identical inclusion criteria (such as "denial" and "skepticism"), it is generally better to merge the subjects to a single category, and re-categorize any articles or categories which might no longer meet the criteria of the unified target category.
It might also be appropriate to create lists to provide clarity and to detail each of the instances.
Avoid categorizing by a subject's name when it is a non-defining characteristic of the subject, or by characteristics of the name rather than the subject itself.
For example, a category for unrelated people who happen to be named "Jackson" would be inappropriate. However, categorization may be appropriate if the categorized subjects are directly related, as in a category grouping articles directly related to a specific Jackson family, such as Category:Jackson family (show business).
When considering grouping subjects that share a name, a disambiguation page might be a possible alternative solution.
By being associated with
The problem with saying that something is "associated" with something else is that it can be a #SUBJECTIVE and vague determination. Determining what degree or nature of "association" with a particular subject is necessary to qualify for inclusion in such a category can also be subjective and vague, and any threshold set may fail #ARBITRARY.
However, it may be appropriate to have categories whose title clearly conveys a specific and defined relationship to a specific subject, such as Category:Obama family or Category:Obama administration personnel.
By opinion or preference of an issue or topic
Avoid categorizing people by their personal opinions, even if a reliable source can be found for the opinions. This includes supporters or critics of an issue, personal preferences (such as liking or disliking
Please note, however, the distinction between holding an opinion and being an
Potential candidates and nominees
Award recipients
A category of award recipients should exist only if receiving the award is a #DEFINING characteristic for the large majority of its notable recipients. Likewise, a recipient of an award should be added to a category of award recipients only if receiving the award is a defining characteristic of the recipient.
Per
Published list
Books, magazines, websites, and other such publications regularly publish lists of the "top 10" (or some other number) in any particular field. Such lists tend to be #SUBJECTIVE and may be somewhat arbitrary. Some particularly well-known and unique lists such as the Billboard charts may constitute exceptions, although creating categories for them may risk violating the publisher's copyright or trademark.
Venues by event
Avoid categorizing locations by the events or event types that have been held there, such as arenas that have hosted specific sports events or concerts, convention centers that have hosted specific conventions or meetings, or cities featured in specific television shows that film at multiple locations.
Likewise, avoid categorizing events by their hosting locations. Many notable locations (e.g. Madison Square Garden) have hosted so many sports events and conventions over time that categories listing all such events would not be readable.
However, categories that indicate how a specific facility is regularly used in a specific and notable way for some or all of the year (such as Category:National Basketball Association venues) may sometimes be appropriate.
Performers by performance
Avoid categorizing performers by their performances. Examples of "performers" include (but are not limited to)
This includes categorizing a production by performers' performances. For example, just as we shouldn't categorize a performer by action or appearance, we shouldn't categorize a production by a performer's action or appearance in that production.
Performers by action or appearance
Avoid categorizing performers by some action they may have performed (such as a "
Performers by role or composition
- Performers who have portrayed <character name>
- Performers who have portrayed <a type of character>
- Performers who have performed <a specific work>
Avoid categories which categorize performers by their portrayal of a role. This includes:
- Portraying a specific character (such as Hamlet or Batman), including characters based upon real people from history or legend (such as King Arthur or Steve Jobs), and also non-human characters (such as Lassie or Kermit the Frog)
- Portraying, tribute acts.)
- Portraying a "type" of character (such as dead, female, gay, homeless, queen, old, president, religious, Scottish, wealthy, etc.) This also includes archetypes, stereotypes, and stock characters.
- Performing a specific work (such as "To be or not to be" from Hamlet (the play), "Why did the chicken cross the road?" (a joke), etc.).
This also includes
Similarly, avoid categorizing artists based on producers, film directors or other artists they have worked with (such as "George Martin musicians" or "Steven Spielberg actors"). Performers are defined by their body of work, not by the people they have #ASSOCIATED with professionally. For example, Tom Hanks is distinguished by his performances as an actor, not by the fact that he has appeared in Steven Spielberg's films.
Performers by production or performance venue
- Performers who have performed at <location>
- Performers who have performed on <production>
Avoid categorizing performers by an appearance at an event or other performance venue. This also includes categorization by performance—even for permanent or recurring roles—in any specific radio, television, film, or theatrical production (such as The Jack Benny Program, M*A*S*H, Star Wars, or The Phantom of the Opera).
Note also that performers should not be categorized into a general category which groups topics about a particular performance venue or production (e.g. Category:Star Trek), when the specific performance category would be deleted (e.g. Category:Star Trek script writers).
Role or composition by performer
- <Characters> who have been portrayed by a specific performer
- <Types of characters> which have been portrayed by a specific performer
- <Works> which have been portrayed by a specific performer
Avoid categorizing characters or specific works by the performers who have portrayed them or appeared in them. A typical film or television series has many actors in various roles, so categorizing by actor results in needless clutter. Similarly, some roles, particularly animated ones like Woody Woodpecker and historical/mythological figures like Hercules, have been performed by multiple actors, and being performed by a particular actor is seldom a defining trait for such roles.
Notes
- ^ in declarative statements, rather than table or list form
- ^ Per this RfC
See also
- m:Help:Sorting and Help:Sortable tables
- feature requests which seek to be an alternative way to address overcategorization.