Wikipedia:Duplication detector

This is an information page.

It is not one of Wikipedia's policies or guidelines; rather, its purpose is to explain certain aspects of Wikipedia's norms, customs, technicalities, or practices. It may reflect differing levels of consensus and vetting.

Shortcut

WP:dupdet

The duplication detector is a tool used to compare any two web pages to identify text that has been copied from one to the other. It can compare two Wikipedia pages to one another, two versions of a Wikipedia page to one another, a Wikipedia page (current or old revision) to an external page, or two external pages to one another. Duplication detector locates passages in which the text on the two pages is the same. The minimum number of words required for a match is adjustable and is set by default to 2.
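The idea of matching passages of at least a given number of words can be illustrated with a minimal sketch. This is not the tool's actual code; the function names `words` and `shared_phrases` and the tokenizing rule are assumptions made for illustration only.

```python
import re

def words(text):
    """Lowercase the text and split it into word tokens (assumed rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def shared_phrases(text_a, text_b, min_words=2):
    """Return every run of exactly min_words words that occurs in both texts."""
    def ngrams(tokens):
        # All consecutive word sequences of length min_words.
        return {tuple(tokens[i:i + min_words])
                for i in range(len(tokens) - min_words + 1)}
    common = ngrams(words(text_a)) & ngrams(words(text_b))
    return {" ".join(gram) for gram in common}

a = "The quick brown fox jumps over the lazy dog."
b = "A quick brown fox leaped over the lazy cat."
print(sorted(shared_phrases(a, b, min_words=3)))
# ['over the lazy', 'quick brown fox']
```

With the default of 2 words, the same pair of sentences yields four shorter matches, which shows why a low minimum produces more, and noisier, results.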

Usage

The tool is frequently used in checking accuracy.

To use the tool, supply the URLs of the two pages to compare (or, in the advanced version, upload either document from your computer). It supports text, HTML, and PDF documents. For other types of documents, check Google's cache for an HTML version by doing a Google search for "cache:URL". To make the tool run faster on very large documents, increase the minimum number of words to at least 3. For source documents containing scattered numerals, you may have to check "Remove numbers" to get the best matches. You also have the option of removing quotations from matches.
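Why removing numbers can improve matches is easiest to see with a small sketch. How the tool itself implements "Remove numbers" is not documented here, so the `tokenize` function and its `remove_numbers` flag below are hypothetical stand-ins.

```python
import re

def tokenize(text, remove_numbers=False):
    """Split text into lowercase word tokens, optionally dropping pure numerals."""
    tokens = re.findall(r"\w+", text.lower())
    if remove_numbers:
        # Drop tokens made up only of digits, such as footnote markers or years.
        tokens = [t for t in tokens if not t.isdigit()]
    return tokens

with_markers = "The bridge was completed in 1932 [1] and spans 503 metres [2]."
plain = "The bridge was completed in 1932 and spans 503 metres."

print(tokenize(with_markers) == tokenize(plain))              # False
print(tokenize(with_markers, True) == tokenize(plain, True))  # True
```

In this example the footnote markers interrupt otherwise identical word sequences; once numerals are dropped, the two sentences tokenize identically.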

Duplication detector can see article text hidden by templates like {{copyvio}}, since the text is still in the HTML page source, but it cannot see text that has been removed from the page. To check removed text, use the URL of an old revision that still contains it.
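Old revisions of a Wikipedia page are addressed through the `oldid` parameter of `index.php`. The following sketch builds such a URL; the helper name `old_revision_url`, the article title, and the revision number are placeholders chosen for illustration.

```python
from urllib.parse import urlencode

def old_revision_url(title, oldid, site="https://en.wikipedia.org"):
    """Build a permanent link to one specific revision of a page."""
    # MediaWiki titles use underscores in place of spaces.
    query = urlencode({"title": title.replace(" ", "_"), "oldid": oldid})
    return f"{site}/w/index.php?{query}"

# Placeholder title and revision ID, not a real revision.
print(old_revision_url("Example article", 123456789))
# https://en.wikipedia.org/w/index.php?title=Example_article&oldid=123456789
```

The revision ID of an old version can be found in the page history, where each timestamp links to a URL of this form.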

For evaluating copyright or plagiarism

Duplication detector is best at finding literal duplication, and larger strings of numbers are indicative of extensive passages copied verbatim. It can also assist in detecting close paraphrasing, but human judgment is always required. When text matches light up, read and compare the passages with identical text to see whether the copied passages are uncreative and set in text that is, overall, sufficiently rewritten.
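A rough way to see why longer verbatim runs deserve the closest look is to rank matching runs by length. The sketch below uses Python's standard `difflib.SequenceMatcher` on word lists; it is a swapped-in technique for illustration, not the tool's own algorithm, and the function name `matching_runs` is an assumption. Note that `get_matching_blocks` returns one in-order alignment, so passages that were reordered between the two texts may be missed.

```python
from difflib import SequenceMatcher

def matching_runs(text_a, text_b, min_words=2):
    """Return (length, phrase) pairs for aligned matching word runs, longest first."""
    a, b = text_a.lower().split(), text_b.lower().split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    runs = [(blk.size, " ".join(a[blk.a:blk.a + blk.size]))
            for blk in matcher.get_matching_blocks()
            if blk.size >= min_words]
    return sorted(runs, reverse=True)

a = "the committee met on monday and approved the annual budget without debate"
b = "on tuesday the committee met and approved the annual budget after long debate"
for size, phrase in matching_runs(a, b):
    print(size, phrase)
# 5 and approved the annual budget
# 3 the committee met
```

The ranking only points a reviewer at candidate passages; whether a five-word run is a common phrase or copied creative expression remains a human judgment.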

See WP:CV101.

License

The tool is released under the Simplified BSD License.

External links

The Duplication Detector on Toolforge
