Wikipedia:Version 1.0 Editorial Team/Article selection
The process for article selection for
The
The bot computes a numeric score for each article. Articles that have a score over a certain threshold (which will change from one release to the next) will be included in the release version. The threshold for Version 0.8 has been set at 1240. This page describes the algorithm that the selection bot uses to assign scores.
Older tests are described at Wikipedia:Version 1.0 Editorial Team/Selection trials.
Selection technique
The bot generates a score for each article in each project that has assessed the article. The overall article score consists of two components, the importance score and the quality score:
Overall_article_score = Importance_score + Quality_score.
An article will have one overall score for each project that assesses the article. The highest score given to an article by any project will determine whether the article is included in a release version.
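The selection rule above can be sketched as follows. This is a minimal illustration, assuming the importance and quality scores for each project have already been computed; the function names, the dictionary layout, and the example numbers are hypothetical and are not taken from the bot's code.

```python
# Threshold for Version 0.8, as stated above; it changes between releases.
THRESHOLD_V0_8 = 1240


def overall_scores(project_scores):
    """Map each project to Importance_score + Quality_score.

    `project_scores` is a hypothetical dict:
    project name -> (importance_score, quality_score).
    """
    return {project: imp + qual for project, (imp, qual) in project_scores.items()}


def is_selected(project_scores, threshold=THRESHOLD_V0_8):
    """An article is included if its highest per-project score is over the threshold."""
    scores = overall_scores(project_scores)
    return bool(scores) and max(scores.values()) > threshold


# Illustrative numbers only, not real assessment data.
example = {"Project A": (900, 300), "Project B": (1000, 400)}
print(overall_scores(example))
print(is_selected(example))
```

The comparison is strict because the text says a score must be over the threshold; how the bot treats a score exactly equal to the threshold is not stated here.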
Importance score
In most cases, the overall importance score is obtained by adding points based on the importance assigned by the WikiProject and points based on external interest in the article:
Importance_score = Assessed_importance_points + External_interest_points.
Some WikiProjects, such as
Importance_score = External_interest_points * (4/3).
This formula is also used for articles whose importance is marked as 'Unknown-Class' or 'Unassessed-Class'.
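The choice between the two formulas can be sketched like this. It is a minimal example, assuming the assessed importance points and external interest points are supplied by the caller; the function name and the use of a rating string to pick the formula are illustrative assumptions.

```python
# Importance ratings that carry base points (see the table under
# "Assessed importance points"). Any other rating is treated as unassessed.
ASSESSED_RATINGS = {"Top", "High", "Mid", "Low"}


def importance_score(rating, assessed_importance_points, external_interest_points):
    """Return the importance score for one article in one project.

    Assessed articles use the additive formula; unassessed articles
    (for example 'Unknown-Class') use the 4/3 formula instead.
    """
    if rating in ASSESSED_RATINGS:
        return assessed_importance_points + external_interest_points
    return external_interest_points * 4 / 3


print(importance_score("Top", 450, 600))
print(importance_score("Unknown-Class", 0, 600))
```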
Assessed importance points
The assessed importance of an article is used to assign points based on the WikiProject itself and the importance rating assigned to the article:
Assessed_importance_points = Base_importance_points + WikiProject_scope_points.
The base importance points are taken from the following table.
Rating | Top | High | Mid | Low |
Points | 400 | 300 | 200 | 100 |
If the importance is not assessed, the 4/3 formula is used, and the base importance points are not used in the final score calculation. In this case, the WikiProject scope points also do not count towards the final score.
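As a small sketch of the table lookup, assuming the WikiProject scope points are already known, the assessed importance points could be computed as below. Returning `None` for an unassessed article is an illustrative convention, signalling that the caller should fall back to the 4/3 formula.

```python
# Base importance points from the table above.
BASE_IMPORTANCE_POINTS = {"Top": 400, "High": 300, "Mid": 200, "Low": 100}


def assessed_importance_points(rating, wikiproject_scope_points):
    """Return Base_importance_points + WikiProject_scope_points.

    Returns None when the rating is not in the table, since neither
    component counts towards the final score in that case.
    """
    base = BASE_IMPORTANCE_POINTS.get(rating)
    if base is None:
        return None
    return base + wikiproject_scope_points


print(assessed_importance_points("High", 50))
print(assessed_importance_points("Unknown-Class", 50))
```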
WikiProject scope points
WikiProject scope points are used to compensate for the difference in scope between WikiProjects. For example, the Geography WikiProject has a very broad scope, while the Åland WikiProject has a much narrower scope.
The WikiProject scope points are typically based on the external interest points, defined below, for the Top-Importance article that best represents the scope of the project. For example, Wikipedia:WikiProject Chicago is best represented by the article Chicago.
Some projects cover several subjects, either explicitly (
In other cases, there is no single article that adequately represents the entire project, or the "representative" article has a much lower score than major topics within that subject. In such cases, a selection of two or three Top-Importance articles that lie at the core of the subject matter may be used. For example, the articles Jimi Hendrix and Eric Clapton were selected for Wikipedia:WikiProject Guitarists.
To compute the WikiProject score when multiple articles are considered, the page view counts, incoming page links, and interwiki links for all the articles are totaled, and then used as if they were the data for a single article in the formula for external interest points given below. This results in a raw score. The distribution of raw scores for Wikipedia 0.7 is shown in the following table.
Percentile | 10% below | 25% below | 50% below | 75% below | 90% below |
Raw score | 785 | 900 | 1025 | 1130 | 1200 |
The WikiProject scope points are obtained by subtracting 1000 from the raw score and dividing the resulting number by 2.
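To show what this conversion means in practice, the sketch below applies it to the percentile values from the Wikipedia 0.7 table above. The function name is hypothetical; the arithmetic follows the rule as stated.

```python
def wikiproject_scope_points(raw_score):
    """Convert a project's raw score into WikiProject scope points."""
    return (raw_score - 1000) / 2


# Raw scores at the percentiles listed in the table above.
percentile_raw_scores = {"10%": 785, "25%": 900, "50%": 1025, "75%": 1130, "90%": 1200}

for percentile, raw in percentile_raw_scores.items():
    print(percentile, wikiproject_scope_points(raw))
```

Under this rule a project at the median raw score receives 12.5 scope points, while a project at the 10th percentile receives a negative adjustment of -107.5.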
Task forces and child projects
Many WikiProjects, such as
External interest points
These points measure the external interest in an article, independent of the ratings assigned by the WikiProject. The points combine the number of page views (hitcount), the number of incoming internal links, and the number of incoming interwiki links from Wikipedias in other languages:
External_interest_points = 50 * log10(hitcount) + 100 * log10(internal_links) + 250 * log10(interwiki_links)
The counts of page views, pagelinks, and interwiki links for all pages that redirect to a given article are added to the article's own counts before the external interest points are computed.
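The formula and the redirect rule can be sketched together as below. The tuple layout `(hitcount, internal_links, interwiki_links)` and the function names are illustrative. The text does not say how a count of zero is handled, so clamping counts to a minimum of 1 (which contributes zero points) is an assumption made only to keep the logarithm defined.

```python
import math


def _log10_count(count):
    # Assumption: counts below 1 are clamped to 1 so that log10 is defined.
    # The source does not specify how the bot handles zero counts.
    return math.log10(max(count, 1))


def external_interest_points(hitcount, internal_links, interwiki_links):
    """Apply the external interest formula to one set of counts."""
    return (50 * _log10_count(hitcount)
            + 100 * _log10_count(internal_links)
            + 250 * _log10_count(interwiki_links))


def total_counts(article, redirects):
    """Add the counts of all redirect pages to the article's own counts.

    Each page is a tuple (hitcount, internal_links, interwiki_links).
    """
    pages = [article, *redirects]
    return tuple(sum(page[i] for page in pages) for i in range(3))


# Illustrative counts only, not real data.
article = (900, 95, 9)
redirects = [(100, 5, 1)]
points = external_interest_points(*total_counts(article, redirects))
print(round(points, 1))
```

Because the counts are summed before the logarithms are taken, a redirect adds to the article's totals rather than earning points of its own.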
The hitcount data is obtained from http://dammit.lt/wikistats/ (this is the same data used by http://stats.grok.se). From this data, a list of daily hitcounts over a period of several weeks is formed. For each article, the highest 20 percent and lowest 20 percent of these daily hitcounts are discarded, and the remaining data points are averaged (see truncated mean). The resulting statistic is used as a measure of the typical daily page views of the article. The hit statistics displayed in the selection bot stats on the toolserver are actually monthly hitcounts.
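The truncated mean described above can be sketched as follows. The function name is hypothetical, and rounding the number of discarded values down when 20 percent of the list is not a whole number is an assumption; the text does not say how the bot rounds.

```python
def truncated_mean(daily_hitcounts, trim_percent=20):
    """Average the daily hitcounts after discarding the highest and lowest values.

    `trim_percent` is the share discarded from each end (20 percent above).
    """
    values = sorted(daily_hitcounts)
    # Assumption: the number of values dropped from each end is rounded down.
    k = len(values) * trim_percent // 100
    kept = values[k:len(values) - k]
    if not kept:
        raise ValueError("no hitcount data left after trimming")
    return sum(kept) / len(kept)


# Illustrative data: one day with an unusual traffic spike.
print(truncated_mean([1, 2, 3, 4, 1000]))
```

Discarding both tails keeps a single day of unusually high or low traffic from dominating the estimate of typical daily page views.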
Quality score
The quality score for an article in a project is based on the quality rating assigned by the WikiProject, according to the following table.
Rating | FA | FL | A | GA | B | C | Start | Other |
Points | 500 | 500 | 400 | 400 | 300 | 225 | 150 | 0 |
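The table above amounts to a simple lookup, sketched below. The function name is hypothetical, and the rating strings passed in are assumed to match the column headings; any rating not listed falls under "Other" and scores 0.

```python
# Quality points from the table above.
QUALITY_POINTS = {
    "FA": 500,
    "FL": 500,
    "A": 400,
    "GA": 400,
    "B": 300,
    "C": 225,
    "Start": 150,
}


def quality_score(rating):
    """Return the quality points for a rating; unlisted ratings count as 'Other'."""
    return QUALITY_POINTS.get(rating, 0)


print(quality_score("GA"))
print(quality_score("Stub"))
```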