Wikipedia:Typo Team/moss

Source: Wikipedia, the free encyclopedia.

The moss project seeks to find and remove the furry green typos that have been growing on Wikipedia articles. It uses a Python script named moss, written by User:Beland, to automatically find misspellings, mistakes in English grammar, violations of the Wikipedia:Manual of Style, and confusing or broken wiki markup.

Dearth to tyops!

QUICK LINK TO THE BEST PAGE FOR NEW PARTICIPANTS

About misspellings

How the lists are made

The moss spell checker is run against a recent set of database dumps, which are generated on the 1st and 20th of every month (but take a few days to process). All the articles in the English Wikipedia are examined. The following are ignored:

  • Text inside references, templates, tables, quotation marks, sections like "External links" and "Works", and some other weird places.
  • Capitalized words (which are presumed to be correctly-spelled proper nouns)
  • Words that appear in titles in the English Wiktionary (which has definitions of all words in all languages, excluding proper nouns and systematic words like chemical names and large numbers)
  • Words that appear in titles in the English Wikipedia (which explains some things that don't appear in the dictionary)
  • Words that appear in titles in Wikispecies (which has many technical words that don't appear in the dictionary or encyclopedia)
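The ignore rules above can be sketched as a simple filter. This is a minimal illustration, not the actual moss code: the three title sets are tiny hypothetical stand-ins for the real dump-derived lists, titles are assumed to be stored in lower case, and only straight double quotation marks are handled.

```python
import re

# Hypothetical stand-ins for the title lists; the real ones come from
# database dumps and are far larger.
WIKTIONARY_TITLES = {"says", "the", "moss", "is", "green", "and", "on"}
WIKIPEDIA_TITLES = {"typo"}
WIKISPECIES_TITLES = {"bryophyta"}

WORD_RE = re.compile(r"[A-Za-z]+")
QUOTED_RE = re.compile(r'"[^"]*"')  # simplification: straight double quotes only


def candidate_misspellings(text):
    """Return the words that survive the ignore rules, in order of appearance."""
    # Ignore text inside quotation marks.
    text = QUOTED_RE.sub(" ", text)
    known = WIKTIONARY_TITLES | WIKIPEDIA_TITLES | WIKISPECIES_TITLES
    candidates = []
    for word in WORD_RE.findall(text):
        if word[0].isupper():
            continue  # capitalized: presumed to be a proper noun
        if word in known:
            continue  # appears as a title in one of the reference projects
        candidates.append(word)
    return candidates


print(candidate_misspellings('Beland says the moss is grean and "furrry" on bryophyta'))
```

In this sketch "Beland" is skipped as a capitalized word, "furrry" is skipped because it is quoted, and only "grean" is reported.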

Many mistakes are not (yet) caught:

  • Improper addition of 's (possessives are not added to Wiktionary, so these are excluded systematically)
  • Incorrect capitalization
  • Incorrect multi-word phrases
  • Wrong word used in context
  • Non-English language words not tagged with {{lang}} or where an English misspelling happens to be the same as a word in another language. (These are counted as correct spellings if they are in the English Wiktionary, which lists words in all languages – only the definitions are restricted to English.)
  • Other situations listed in #False negatives below

2023 statistics

See also: Older statistics
Dump (moss version) Parse failures (articles + articles with MOS:STRAIGHT violations) TOTAL (instances) A BC BW C D H HB HL L ME N P T/ T1 TE TF TS U Z
2023-01-01 (c2370a5) 161163 + 29891 1187870 10615 83981 534264 8233 0 1498 4601 110 1975 179206 1905 5 2229 41525 6115 198814 97810 1428 13556
2023-01-20 (36ce94e) 161298 + 29949 1182833 10598 83813 534411 8235 0 1525 4965 116 1958 178578 1889 6 2196 38722 6055 198441 96321 1402 13602
2023-02-01 (90a97fc) 161048 + 29944 1180485 10602 83842 534121 8245 0 1500 5011 111 1936 178163 1862 6 2183 38247 6050 197047 96542 1392 13625
2023-02-20 (f606b45) 161111 + 30009 1180176 10609 83664 534782 8249 0 1509 5224 108 1930 177709 1861 4 2071 37810 5997 196478 97105 1383 13683
2023-03-01 (75cbca7) 161224 + 30095 1179378 10613 83570 534792 8206 0 1510 5286 100 1918 177568 1860 5 2076 37445 5970 196360 97010 1382 13707
2023-03-20 (56a3811) 161344 + 30169 1177045 10566 83245 535523 8214 0 1509 5202 99 1911 176955 1861 5 2092 36281 5811 196309 96321 1361 13780
2023-04-01 (no run)
2023-04-20 (57a4619) 161810 + 30162 1178156 10577 83076 536215 8241 0 1541 5473 105 1904 175853 2043 5 2049 36561 5740 196528 96979 1370 13896
2023-05-01 (77de75d) 162001 + 30150 1171871 10418 82887 536140 8170 0 1535 4633 98 1890 173066 2028 5 2050 36282 5781 195082 96960 1361 13485
2023-05-20 (73bb66d) 162329 + 30138 1171817 10379 82480 536386 8161 0 1470 4913 88 1890 171905 2037 0 2064 36364 5817 195132 97814 1367 13550
2023-05-20 (d0a8560) 163084 + 29893 1170266 10186 81955 529811 8192 0 1473 4902 89 1879 173759 2042 1 2064 38044 5842 194194 100920 1366 13547
2023-06-01 (040dd4d) 163371 + 29818 1169150 10189 81451 529652 8200 0 1474 5163 90 1895 172815 2031 1 2052 37997 5827 193963 101375 1365 13610
2023-06-20 (50a82ce) 163664 + 29771 1169732 10189 81086 529892 8232 0 1519 5624 86 1879 171891 2050 1 2059 38342 5785 194184 101817 1364 13732
2023-07-01 (8533535) 163877 + 29747 1169420 10201 80978 529664 8242 0 1564 5806 83 1873 171484 2042 3 2061 38446 5814 193933 102073 1373 13780
2023-07-20 (9812c05) 164115 + 29742 1170482 10174 80456 529875 8255 0 1553 5943 80 1872 171720 2036 3 2057 38956 5806 194057 102367 1361 13911
2023-08-01 (7468187) 164308 + 29748 1170928 10136 80230 529739 8249 0 1549 6036 79 1873 171743 2037 5 2061 39182 5811 194411 102497 1351 13939
2023-08-20 (7170d29) 164473 + 29635 1171932 10148 80137 529804 8263 0 1556 6132 80 1874 171627 2048 8 2062 39280 5856 194769 102930 1344 14014
Dump (moss version) Parse failures (articles + articles with MOS:STRAIGHT violations) TOTAL (instances) A BC BW C D H HB HL L ME N P T+gcld3_broken T/ T1 TS U Z
2023-09-01 (8c03bd1)* 164600 + 29593 1173119 10135 80154 530301 8245 0 1567 5692 87 1875 171823 2061 9 200991 2057 39595 103147 1337 14043
2023-09-20 (8c03bd1)* 164777 + 29611 1173098 10183 80123 530578 8240 0 1583 4775 85 1870 171711 2064 8 201138 2064 39874 103376 1339 14087
2023-10-01 (d531b95)* 164779 + 29586 1173193 10164 80017 530906 8238 0 1577 4719 87 1860 171300 2061 9 201083 2047 39886 103784 1328 14127
2023-10-20 (9c53721)* 164889 + 29667 1173548 10178 79977 531174 8243 138 1584 4762 87 1860 171070 2048 11 201277 2042 39910 103702 1323 14162
2023-11-01 (9c53721)* 165069 + 29668 1174710 10164 79988 531412 8252 138 1577 4738 90 1844 171440 2033 11 201449 2059 40250 103724 1338 14203
2023-11-20 (1edb851)* 165362 + 29748 1177078 10196 79995 531684 8262 138 1597 4859 93 1856 171957 2034 10 202060 2054 40847 103797 1323 14316
2023-12-01 (1edb851)* 165429 + 29788 1179043 10208 79941 531789 8294 138 1610 4950 93 1867 172253 2028 12 202513 2056 41284 104336 1310 14361
2023-12-20 (1edb851)* 165685 + 29862 1180181 10205 79762 531632 8362 138 1603 4895 103 1868 172415 2022 12 203189 2042 41499 104750 1301 14383

* Due to software issues, language detection wasn't working for this run.

2024 statistics

Dump (moss version) Parse failures (articles + articles with MOS:STRAIGHT violations) TOTAL (instances) A BC BW C D H HB HL L ME N P T+gcld3_broken T/ T1 TS U Z
2024-01-01 (1edb851)* 165792 + 29766 1180781 10226 79927 531362 8352 0 1628 4917 100 1865 172474 2027 9 203478 2043 41749 104903 1301 14420
2024-01-20 (2caa23a)* 165661 + 29837 1180491 10237 79493 531501 8345 0 1624 4127 103 1858 172622 2019 9 203838 2044 41878 105071 1298 14424
2024-02-01 (3242653)* 165836 + 29834 1181230 10245 79246 531803 8337 0 1629 4120 103 1858 172799 2024 8 204049 2043 42002 105240 1287 14437
2024-02-20 (10d0c37)* 165885 + 29901 1182750 10251 78915 531861 8343 1 1630 4043 114 1849 173461 2015 10 204251 2045 42357 105827 1286 14491
2024-03-01 (9ccfa0d)* 166045 + 29975 1182428 10255 78805 531778 8362 0 1638 4041 112 1854 173370 2030 24 203994 2037 42461 105848 1299 14520
2024-03-20 (460959f)* 166141 + 30055 1185611 10292 78621 532345 8424 0 1631 4237 116 1858 173672 2045 25 204545 2049 42870 106954 1278 14649

* Due to software issues, language detection wasn't working for this run.
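To show how a row of the table above maps onto the column codes, here is a minimal parsing sketch. The function name and the regular expression are illustrative assumptions about this plain-text layout and are not part of moss. For the 2024-03-20 row used here, the per-category counts add up to the TOTAL column; whether that holds for every row is not stated on this page.

```python
import re

# Column codes copied from the 2024 table header.
CODES = ["A", "BC", "BW", "C", "D", "H", "HB", "HL", "L", "ME", "N", "P",
         "T+gcld3_broken", "T/", "T1", "TS", "U", "Z"]

# date (version)[*] parse_failures + straight_violations total counts...
ROW_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2}) \((\w+)\)\*? (\d+) \+ (\d+) (\d+) (.+)"
)


def parse_row(line):
    """Split one statistics row into named fields (hypothetical helper)."""
    match = ROW_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a statistics row: {line!r}")
    dump, version, failures, straight, total, rest = match.groups()
    values = [int(v) for v in rest.split()]
    if len(values) != len(CODES):
        raise ValueError("column count does not match the header")
    return {
        "dump": dump,
        "version": version,
        "parse_failures": int(failures),
        "straight_violations": int(straight),
        "total": int(total),
        "counts": dict(zip(CODES, values)),
    }


ROW = ("2024-03-20 (460959f)* 166141 + 30055 1185611 10292 78621 532345 8424 "
       "0 1631 4237 116 1858 173672 2045 25 204545 2049 42870 106954 1278 14649")
stats = parse_row(ROW)
print(stats["counts"]["T1"], sum(stats["counts"].values()) == stats["total"])
```

Rows such as "2023-04-01 (no run)" do not match the pattern and raise ValueError, so a caller would need to skip them.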

Typo classification legend

Reporting symbol Explanation
Parse failure Mismatched punctuation; spell checker is unsure which words to ignore, so the whole page is skipped
A mAth
BC Bad Characters (not allowed by Manual of Style)
BW Bad Words (not allowed by Manual of Style)
C Chemistry words
D DNA sequence
H HTML/XML/SGML tag
HB Known bad HTML tag, like <font>
HL Bad HTML-like linking, like <http://...>
L Probable Romanization (transLiteration)
ME Probable coMpound, English (with and without dash) - need to be added to Wiktionary
N A-Z plus numbers and hyphens
P Patterns (e.g. rhyme schemes - Beland fixes these)
T/ Suspected MOS:SLASH violation
T1 Edit distance 1 from common English word
TE AI thinks it's trying to be English
TF AI thinks it's trying to be a non-English language (Foreign to English Wikipedia), sorted by language (e.g. TF+el)
TS Missing or extra whitespace or dash (or new compound). Currently included if there is a period (TS+DOT), comma (TS+COMMA), or extra space (TS+EXTRA). Missing bracket (TS+BRACKET) needs code improvements to be reliable, and the remainder of TS needs sorting.
U URL
Z Decimal fraction missing leading Zero
I Definitely not English (International) due to accents or mixed with punctuation (other than hyphen)
MI Probable coMpound, non-English (International) in English Wiktionary (both A-Z and non-ASCII characters, with and without dash)
ML Probable coMpound, transLiteration
MW Probable coMpound, found in non-English Wiktionary
R Regular word (A-Z only) not near a common English word
T2 Edit distance 2 from common English word
T3 Edit distance 3 from common English word
W Not in English Wiktionary, in non-English Wiktionary
  • red = Probably need to fix
  • yellow = Unsorted - need code improvements to sort into likely vs. unlikely typos or subtypes that can be usefully processed.
  • blue = Probably OK (but may need to verify)
  • bold = actively working on fixing
  • grey = no longer used
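A few of the categories in the legend can be illustrated with a short sketch. It makes several assumptions that this page does not confirm: "edit distance" is taken to mean Levenshtein distance, the list of common English words is a three-word hypothetical sample, and the order of the checks is arbitrary. It is not the classifier moss actually uses.

```python
import re

# Hypothetical sample; the real list of common English words is not given here.
COMMON_WORDS = ["because", "government", "received"]

# Z: a decimal fraction written without its leading zero, such as ".5".
MISSING_ZERO_RE = re.compile(r"\.\d+")


def edit_distance(a, b):
    """Levenshtein distance between a and b, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def classify(token):
    """Return a legend code for token, or None if it is a known common word."""
    if MISSING_ZERO_RE.fullmatch(token):
        return "Z"
    nearest = min(edit_distance(token, word) for word in COMMON_WORDS)
    if nearest == 0:
        return None
    if nearest <= 3:
        return f"T{nearest}"
    return "R"  # regular word not near a common English word


for token in [".5", "goverment", "recieved", "xylophonist"]:
    print(token, classify(token))
```

Under plain Levenshtein distance a transposition such as "recieved" counts as two edits, so it lands in T2 instead of T1. A distance measure that treats transpositions as a single edit would sort it differently.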

Instructions for editors

Just like a regular spell checker, sometimes a word that's highlighted is really a misspelling and should be changed, but sometimes it is a correct spelling that needs to be added to the spell checker's dictionary (which in this case is the English Wiktionary and Wikispecies). For the lists below, here's how you can help: