Spider trap

A spider trap (or crawler trap) is a set of web pages that may intentionally or unintentionally be used to cause a web crawler or search bot to make an infinite number of requests or cause a poorly constructed crawler to crash.[1] Web crawlers are also called web spiders, from which the name is derived. Spider traps may be created to "catch" spambots or other crawlers that waste a website's bandwidth. They may also be created unintentionally by calendars that use dynamic pages with links that continually point to the next day or year.

Common techniques used are:

  - creation of indefinitely deep directory structures like http://example.com/bar/foo/bar/foo/bar/foo/bar/...
  - dynamic pages that produce an unbounded number of documents for a web crawler to follow, such as calendars or algorithmically generated language poetry[2] (a minimal sketch of the calendar case follows below)
  - documents filled with a large number of characters, crashing the lexical analyzer parsing the document
  - documents with session IDs based on required cookies

There is no algorithm to detect all spider traps. Some classes of traps can be detected automatically, but new, unrecognized traps arise quickly.
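
To illustrate the unintentional calendar case, here is a minimal self-contained sketch; the server, port, and URL scheme are hypothetical, not taken from the cited sources. Every page links to the "next day", so a naive crawler that follows every link it finds never exhausts the site.

```python
import datetime
from http.server import BaseHTTPRequestHandler, HTTPServer

class CalendarHandler(BaseHTTPRequestHandler):
    """Serves a page for any date; each page links to the next day."""

    def do_GET(self):
        # Parse a date out of a path like /calendar/2019-10-16,
        # falling back to today for any other path.
        try:
            day = datetime.date.fromisoformat(self.path.rsplit("/", 1)[-1])
        except ValueError:
            day = datetime.date.today()
        nxt = day + datetime.timedelta(days=1)
        # The "next day" link always resolves, so the chain of
        # crawlable URLs is effectively infinite: a spider trap.
        body = (f"<h1>{day}</h1>"
                f'<a href="/calendar/{nxt}">next day</a>').encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), CalendarHandler).serve_forever()
```

In practice, crawlers typically defend against such pages with blanket caps (maximum URL depth, maximum pages per host) rather than by recognizing the trap itself, consistent with there being no general detection algorithm.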

Politeness

A spider trap causes a web crawler to enter something like an infinite loop,[3] which wastes the spider's resources,[4] lowers its productivity, and, in the case of a poorly written crawler, can crash the program. Polite spiders alternate requests between different hosts, and do not request documents from the same server more than once every several seconds,[5] meaning that a "polite" web crawler is affected to a much lesser degree than an "impolite" crawler.[citation needed]
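
The per-host delay can be sketched as follows; the function name and the five-second interval are illustrative assumptions rather than anything prescribed by the cited sources. A crawler that calls wait_politely() before each request never contacts the same host more than once per interval, so even if it wanders into a trap it drains only a trickle of the site's bandwidth.

```python
import time
import urllib.parse

CRAWL_DELAY = 5.0      # seconds between requests to the same host (assumed)
_last_request = {}     # host -> monotonic time of the most recent request

def wait_politely(url):
    """Sleep just long enough to honor the per-host crawl delay."""
    host = urllib.parse.urlsplit(url).netloc
    elapsed = time.monotonic() - _last_request.get(host, float("-inf"))
    if elapsed < CRAWL_DELAY:
        time.sleep(CRAWL_DELAY - elapsed)
    _last_request[host] = time.monotonic()
```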

In addition, sites with spider traps usually have a robots.txt file telling bots not to go to the trap, so a legitimate "polite" bot would not fall into the trap, whereas an "impolite" bot which disregards the robots.txt settings would be affected by the trap.[6]
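
A "polite" bot's robots.txt check can be sketched with Python's standard urllib.robotparser; the user-agent string and the Disallow rule shown are assumptions for illustration. A bot that skips URLs failing can_fetch() never enters the fenced-off trap, while a bot that ignores robots.txt crawls straight into it.

```python
import urllib.parse
import urllib.robotparser

def allowed(url, user_agent="ExampleBot"):
    """Return True if the site's robots.txt permits fetching url."""
    parts = urllib.parse.urlsplit(url)
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()  # fetches and parses the site's robots.txt
    return rp.can_fetch(user_agent, url)

# A site closing off a trap directory might publish a robots.txt like:
#
#   User-agent: *
#   Disallow: /calendar/
```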

References

  1. ^ ""What is a Spider Trap?"". Techopedia. 27 November 2017. Retrieved 2018-05-29.
  2. ^ Neil M Hennessy. "The Sweetest Poison, or The Discovery of L=A=N=G=U=A=G=E Poetry on the Web". Retrieved 2013-09-26.
  3. ^ "Portent". Portent. 2016-02-03. Retrieved 2019-10-16.
  4. ^ "How to Set Up a robots.txt to Control Search Engine Spiders (thesitewizard.com)". www.thesitewizard.com. Retrieved 2019-10-16.
  5. ^ "Building a Polite Web Crawler". The DEV Community. 13 April 2019. Retrieved 2019-10-16.
  6. ^ J Media Group (2017-10-12). "Closing a spider trap: fix crawl inefficiencies". Retrieved 2019-10-16.