Wikipedia:Wikipedia Signpost/2020-09-27/News and notes

Source: Wikipedia, the free encyclopedia.
News and notes

More large-scale errors at a "small" wiki

Large-scale errors at Malagasy Wiktionary

Growth of Malagasy Wiktionary, 99.23% due to bot edits

A small-wiki audit of the Malagasy Wiktionary (mg.wikt), which has the second-largest number of entries of any Wiktionary (over 6.1 million), found that a large number of its pages had been created by automated translation. Bot-Jagwar is a bot account run by Jagwar, the only admin on the project who has made edits. His bot has made more than 22 million edits there (and counting). Jagwar also has a secondary bot account, Bot-Jagwar II, which has made a further 6,976 edits. Another major bot on mg.wikt, making exactly the same type of edit, is Ikotobaity, which Lohataona ran until 2017. It made 2,456,748 edits and has been inactive since 20 October 2017. These three bots have created 6,076,769 new mainspace pages, or 99.23% of all mainspace pages on mg.wikt. (Jagwar also made bot edits from his main account, so the true number of bot-created entries is likely around 50,000 higher.)

In a blog post, Jagwar detailed the history of his bot and of mg.wikt. The bot began editing in 2010 at a rate of 50,000 edits per day, initially simply importing foreign words from other Wiktionaries. After the wiki reached 200,000 pages in 2011, he wrote a script that "upload[ed] the word forms of that language", which propelled Malagasy Wiktionary to become the third-largest Wiktionary. In 2012, Jagwar developed a more refined script that uses natural language processing (NLP) and automated translation to generate new entries with no human intervention or oversight. In the blog post, he estimated the translation error rate at under 5%, though he admitted he had "no precise idea" of the actual figure.

There is no active editing community, and Jagwar is the sole active admin on the site. Jagwar himself has made only six edits in the last 90 days, only three of them in mainspace. The audit found various mistakes in the entries. In a random sample of 100 non-Malagasy entries, the auditor judged 49 to be "unusable", 29 "partially usable", and only 22 "fully correct and usable" (though even these may have minor errors). Of the Malagasy entries, the report noted:

There are 41,902 entries categorised as lacking any definition, most of which seem to be Malagasy entries, and around 30,000 of which are the result of the definitions being removed due to copyright violation many years ago. Although there are 1,150,182 Malagasy entries in total, most of these are inflected forms, which can generally be safely created by bots. These definitionless entries are not strictly speaking incorrect, but a definition is the most central function of a dictionary, so these entries fail to be a useful part of the dictionary as a whole.

The bots also made 218,156 edits at the Cherokee Wiktionary (chr.wikt) from 2012 to 2014 and 127,389 edits at the Kurdish Wiktionary (ku.wikt) from 2012 to 2013. The audit concluded that "Even an editing community of the size of the biggest Wiktionary, en.wikt, would not be able to clean up after these bots by hand". It strongly recommended deleting all non-Malagasy entries, removing translation sections, and telling the bot owners to stop creating entries automatically; it weakly recommended deleting all definitionless entries. – adapted by Eddie891 from Large-scale errors at Malagasy Wiktionary, written by Metaknowledge, with help from Surjection, AryamanA, Erutuon, and Smashhoof, along with input from a fluent speaker of Malagasy who wishes to remain anonymous.

Inline parenthetical citations deprecated

A watchlist notice for the RfC was placed on 29 August after a discussion determined that the RfC was sufficiently high-profile.

In closing the discussion, Seraphimblade noted that roughly 71% of participants had supported the proposal and that there was consensus only to deprecate "parenthetical style citations directly inlined into articles", rather than …

The close left the WP:CITEVAR guideline needing an update, though as of The Signpost's publication deadline, the form of that update was still under discussion. Before the RfC, CITEVAR specifically stated that "editors should not attempt to change an article's established citation style merely on the grounds of personal preference" and cited a 2006 Arbitration Committee decision that "Wikipedia does not mandate styles in many different areas", including citation style. E

More news

Brief notes