Spider trap
A spider trap (or crawler trap) is a set of web pages that may intentionally or unintentionally be used to cause a web crawler or search bot to make an infinite number of requests or cause a poorly constructed crawler to crash.[1]
Common techniques used are:
- Creation of indefinitely deep directory structures, such as
http://example.com/bar/foo/bar/foo/bar/foo/bar/...
- Dynamic pages that produce an unbounded number of documents for a web crawler to follow. Examples include calendars and algorithmically generated language poetry.[2] (A minimal sketch of a calendar trap follows this list.)
- Documents filled with a large number of characters, crashing the lexical analyzer parsing the document.
- Documents with session IDs based on required cookies.
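The calendar case can be made concrete with a short sketch. The following hypothetical handler (the URL scheme and all names are assumptions for illustration, not taken from any real site) serves a page for every date, each linking to the next day, so a crawler that follows links naively never exhausts the site:

```python
# Minimal sketch of an unintentional calendar-style spider trap, using only
# the Python standard library. Every generated page links to the "next" day,
# so the link graph is infinite even though no single page is malicious.
import datetime
import re
from http.server import BaseHTTPRequestHandler, HTTPServer

class CalendarHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expect paths such as /calendar/2019-10-16; default to today.
        match = re.fullmatch(r"/calendar/(\d{4}-\d{2}-\d{2})", self.path)
        try:
            day = (datetime.date.fromisoformat(match.group(1))
                   if match else datetime.date.today())
        except ValueError:
            day = datetime.date.today()
        nxt = day + datetime.timedelta(days=1)
        body = (f"<html><body><h1>{day}</h1>"
                f'<a href="/calendar/{nxt}">next day</a>'
                f"</body></html>").encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), CalendarHandler).serve_forever()
```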
There is no algorithm to detect all spider traps. Some classes of traps can be detected automatically, but new, unrecognized traps arise quickly.
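As a sketch of what such automatic detection can look like, the following URL-level heuristics flag two of the trap classes above: excessive path depth and cyclically repeating path segments. The thresholds and the function name are assumptions chosen for illustration, not a standard:

```python
# Hedged sketch of two URL-level trap heuristics. Thresholds are assumed.
from urllib.parse import urlparse

MAX_DEPTH = 16     # assumed limit on path depth
MAX_REPEATS = 3    # assumed limit on back-to-back repeats of a segment pair

def looks_like_trap(url: str) -> bool:
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) > MAX_DEPTH:
        return True
    # Count how often the trailing pair of segments repeats back-to-back,
    # catching cycles such as /bar/foo/bar/foo/...
    if len(segments) >= 2:
        pair = segments[-2:]
        repeats = 0
        i = len(segments) - 2
        while i >= 0 and segments[i:i + 2] == pair:
            repeats += 1
            i -= 2
        if repeats >= MAX_REPEATS:
            return True
    return False

assert looks_like_trap("http://example.com/bar/foo/bar/foo/bar/foo/bar/foo")
assert not looks_like_trap("http://example.com/docs/api/index.html")
```

Heuristics like these produce false positives (some legitimate sites have deep or repetitive paths), which is one reason no single check catches every trap.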
Politeness
A spider trap causes a web crawler to enter something like an infinite loop,[3] which wastes the spider's resources,[4] lowers its productivity, and, in the case of a poorly written crawler, can crash the program. Polite spiders alternate requests between different hosts, and do not request documents from the same server more than once every several seconds,[5] meaning that a "polite" web crawler is affected to a much lesser degree than an "impolite" crawler.[citation needed]
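A minimal sketch of the per-host delay described above, assuming a single-threaded crawler (the five-second interval and the helper name are illustrative, not a fixed standard):

```python
# Sketch of the per-host politeness delay: remember when each host was last
# contacted and wait until MIN_INTERVAL has elapsed before hitting it again.
import time
from urllib.parse import urlparse

MIN_INTERVAL = 5.0   # assumed minimum seconds between requests to one host
_last_hit = {}       # host -> time.monotonic() of the last request

def wait_politely(url):
    """Block until it is polite to request the given URL's host again."""
    host = urlparse(url).netloc
    elapsed = time.monotonic() - _last_hit.get(host, float("-inf"))
    if elapsed < MIN_INTERVAL:
        time.sleep(MIN_INTERVAL - elapsed)
    _last_hit[host] = time.monotonic()
```

A crawler would call wait_politely(url) immediately before each fetch; interleaving URLs from different hosts in the frontier keeps overall throughput high while each individual server sees only a slow, polite client, which bounds the damage a trap on any one host can do.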
In addition, sites with spider traps usually have a robots.txt file that tells robots not to enter the trap, so a legitimate "polite" robot does not fall into the trap, whereas an "impolite" robot that disregards the robots.txt settings is affected by it.[6]
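For example, a site could fence off the infinite /bar/foo/... directory trap above with a robots.txt rule, and a polite crawler could honor it using Python's standard urllib.robotparser; the rule itself is a hypothetical configuration:

```python
# Sketch of a hypothetical robots.txt that fences off the /bar/ trap,
# checked with Python's standard-library robots.txt parser.
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /bar/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A polite crawler checks every candidate URL before fetching it.
print(rp.can_fetch("*", "http://example.com/bar/foo/bar/foo/"))  # False: trap is off-limits
print(rp.can_fetch("*", "http://example.com/index.html"))        # True: normal page
```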
See also
- Robots exclusion standard
- Web crawler
References
- ^ ""What is a Spider Trap?"". Techopedia. 27 November 2017. Retrieved 2018-05-29.
- ^ Neil M Hennessy. "The Sweetest Poison, or The Discovery of L=A=N=G=U=A=G=E Poetry on the Web". Retrieved 2013-09-26.
- ^ "Portent". Portent. 2016-02-03. Retrieved 2019-10-16.
- ^ "How to Set Up a robots.txt to Control Search Engine Spiders (thesitewizard.com)". www.thesitewizard.com. Retrieved 2019-10-16.
- ^ "Building a Polite Web Crawler". The DEV Community. 13 April 2019. Retrieved 2019-10-16.
- ^ J Media Group (2017-10-12). "Closing a spider trap: fix crawl inefficiencies". Retrieved 2019-10-16.