Spamdexing
Spamdexing (also known as search engine spam, search engine poisoning,
Spamdexing could be considered part of search engine optimization,[4] although many search engine optimization methods improve the quality and appearance of website content and serve content useful to many users.[5]
Overview
Search engines use a variety of
Common spamdexing techniques can be classified into two broad classes: content spam[5] (or term spam) and link spam.[3]
History
The earliest known reference
The problem arises when site operators load their Web pages with hundreds of extraneous terms so search engines will list them among legitimate addresses. The process is called "spamdexing," a combination of spamming—the Internet term for sending users unsolicited information—and "indexing."[2]
Content spam
These techniques involve altering the logical view that a search engine has of the page's contents. They all target variants of the vector space model for information retrieval on text collections.
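To make the vector-space idea concrete, here is a minimal sketch that scores a two-word query against an ordinary page and a keyword-stuffed page using raw term-frequency vectors and cosine similarity. The weighting is an assumption chosen for simplicity (real engines use more elaborate schemes and many non-textual signals), and the example texts are invented.

```python
from __future__ import annotations

import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Raw term-frequency vector built from lowercase, whitespace-split tokens."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = tf_vector("cheap flights")
honest = tf_vector("compare airline fares and book cheap flights to europe")
# Invented stuffed page: the query terms repeated 20 times plus a call to action.
stuffed = tf_vector("cheap flights " * 20 + "buy now")

print(f"honest:  {cosine(query, honest):.3f}")   # 0.471
print(f"stuffed: {cosine(query, stuffed):.3f}")  # 0.999
```

Because cosine similarity rewards a page whose term distribution is concentrated on the query terms, repetition alone pushes the stuffed page's score close to 1.0 under this naive model.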
Keyword stuffing
Keyword stuffing involves the calculated placement of keywords within a page to raise the keyword count, variety, and density of the page. This makes the page appear relevant to a web crawler, so that it is more likely to be found. For example, a promoter of a Ponzi scheme who wants to attract web surfers to a site advertising the scam might place hidden text appropriate for a fan page of a popular music group on that site, hoping that the page will be listed as a fan site and receive many visits from music lovers. Older versions of indexing programs simply counted how often a keyword appeared and used that count to determine relevance. Most modern search engines can analyze a page for keyword stuffing and determine whether the keyword frequency is consistent with other sites created specifically to attract search engine traffic. Also, large webpages are truncated, so that massive dictionary lists cannot be indexed on a single webpage.[citation needed] (However, spammers can circumvent this size limitation merely by setting up multiple webpages, either independently or linked to each other.)
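Keyword density is usually described as the share of a page's words taken up by a given term. The sketch below is a crude, hypothetical stuffing check: it assumes whitespace tokenization, a small hand-picked stopword list, and an arbitrary 10% threshold, none of which reflect any search engine's published rules. Modern engines compare frequencies against statistics from other pages rather than applying a fixed cutoff.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical values chosen for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "is"}
STUFFING_THRESHOLD = 0.10


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    tokens = (t.strip(".,!?;:\"'()").lower() for t in text.split())
    return [t for t in tokens if t]


def keyword_density(text: str) -> dict[str, float]:
    """Share of all tokens taken up by each distinct token."""
    tokens = tokenize(text)
    if not tokens:
        return {}
    counts = Counter(tokens)
    return {term: n / len(tokens) for term, n in counts.items()}


def suspicious_terms(text: str, threshold: float = STUFFING_THRESHOLD) -> list[str]:
    """Non-stopword terms whose density exceeds the threshold, densest first."""
    density = keyword_density(text)
    flagged = [t for t, d in density.items() if d > threshold and t not in STOPWORDS]
    return sorted(flagged, key=lambda t: density[t], reverse=True)


page = "Best band tickets. Band tickets, band posters, band news. Band band band."
print(suspicious_terms(page))  # ['band', 'tickets']
```

On very short texts nearly every word exceeds a 10% share, so a fixed threshold like this is only meaningful for longer pages.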
Hidden or invisible text
Unrelated
Meta-tag stuffing
This involves repeating keywords in the
Doorway pages
"Gateway" or doorway pages are low-quality web pages created with very little content, instead stuffed with very similar keywords and phrases. They are designed to rank highly within the search results but serve no purpose to visitors looking for information. A doorway page will generally have "click here to enter" on the page; autoforwarding can also be used for this purpose. In 2006, Google removed vehicle manufacturer BMW's German site, BMW.de, from its index for using doorway pages.[9]
Scraper sites
Article spinning
Article spinning involves rewriting existing articles, as opposed to merely scraping content from other sites, to avoid penalties imposed by search engines for duplicate content. This process is undertaken by hired writers[citation needed] or automated using a thesaurus database or a neural network.
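Duplicate-content detection is often explained with word n-grams ("shingles") compared by Jaccard similarity. A minimal sketch, assuming 3-word shingles and invented sentences (a textbook technique, not any search engine's documented method), shows why even a single synonym substitution sharply lowers the measured overlap that spinning aims to defeat.

```python
from __future__ import annotations


def shingles(text: str, w: int = 3) -> set[tuple[str, ...]]:
    """All runs of w consecutive lowercase words."""
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (1.0 if both empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


original = "the quick brown fox jumps over the lazy dog"
copied = "the quick brown fox jumps over the lazy dog"
spun = "the quick brown fox leaps over the lazy dog"  # one synonym swapped

print(jaccard(shingles(original), shingles(copied)))  # 1.0
print(jaccard(shingles(original), shingles(spun)))    # 0.4
```

Swapping one word removes every shingle that contains it, so here similarity drops from 1.0 for the verbatim copy to 0.4 for the lightly spun version.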
Machine translation
Similar to article spinning, some sites use machine translation to render their content in several languages with no human editing, resulting in unintelligible text that is nonetheless indexed by search engines and thereby attracts traffic.
Link spam
Link spam is defined as links between pages that are present for reasons other than merit.
Link farms
Link farms are tightly knit networks of websites that link to each other for the sole purpose of exploiting search engine ranking algorithms. They are also known facetiously as mutual admiration societies.[11] Use of link farms has greatly declined since the launch of Google's first Panda Update in February 2011, which introduced significant improvements to its spam-detection algorithm.
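The effect a link farm exploits can be seen in a toy PageRank computation. This sketch uses the simplified power-iteration form of PageRank with a damping factor of 0.85, spreads the rank of pages without outlinks evenly, and runs on a small invented graph; it illustrates the idea under those assumptions and is not how any production ranker works. Two pages, "target" and "rival", start out symmetric; adding three farm pages that link to one another and to "target" lifts it above its rival.

```python
from __future__ import annotations


def pagerank(graph: dict[str, list[str]], d: float = 0.85, iters: int = 100) -> dict[str, float]:
    """Power-iteration PageRank; graph maps each page to the pages it links to."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Every page receives the "random jump" share (1 - d) / n.
        new = {v: (1.0 - d) / n for v in nodes}
        for v, outs in graph.items():
            if outs:
                # Split this page's rank evenly across its outgoing links.
                share = d * rank[v] / len(outs)
                for u in outs:
                    new[u] += share
            else:
                # Dangling page: spread its rank over all pages.
                for u in nodes:
                    new[u] += d * rank[v] / n
        rank = new
    return rank


honest_web = {
    "hub": ["target", "rival"],
    "target": ["hub"],
    "rival": ["hub"],
}

# Three hypothetical farm pages link to each other and to "target".
farmed_web = dict(honest_web)
farmed_web.update({
    "farm1": ["farm2", "farm3", "target"],
    "farm2": ["farm1", "farm3", "target"],
    "farm3": ["farm1", "farm2", "target"],
})

before = pagerank(honest_web)
after = pagerank(farmed_web)
print(f"without farm: target={before['target']:.3f} rival={before['rival']:.3f}")
print(f"with farm:    target={after['target']:.3f} rival={after['rival']:.3f}")
```

The farm pages receive no links from the rest of the graph, yet the rank they collect from random jumps is funneled to "target", which is exactly the kind of structure link-spam detection tries to recognize.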
Private blog networks
Private blog networks (PBNs) are groups of authoritative websites used as a source of contextual links that point to the owner's main website in order to achieve a higher search engine ranking. Owners of PBN websites use expired domains or auction domains that have backlinks from high-authority websites. Since 2014, Google has targeted and penalized PBN users on several occasions with massive deindexing campaigns.[12]
Hidden links
Putting
Sybil attack
A
Spam blogs
Spam blogs are blogs created solely for commercial promotion and the passage of link authority to target sites. These "splogs" are often designed to look like legitimate websites, but on close inspection they are often written using spinning software or are so poorly written that the content is barely readable. They are similar in nature to link farms.[15][16]
Guest blog spam
Guest blog spam is the practice of placing guest blogs on websites for the sole purpose of gaining a link to another website or websites. These are often confused with legitimate forms of guest blogging undertaken for motives other than placing links. The technique became widely known when Matt Cutts publicly declared "war" against this form of link spam.[17]
Buying expired domains
Some link spammers use expired-domain crawler software or monitor DNS records for domains that will expire soon, then buy them when they expire and replace the pages with links to their own pages. It is possible, but not confirmed, that Google resets the link data on expired domains.[citation needed] To retain all of the domain's previous Google ranking data, a buyer is advised to acquire the domain before it is "dropped".
Some of these techniques may be applied for creating a
Using world-writable pages
Web sites that can be edited by users can be used by spamdexers to insert links to spam sites if the appropriate anti-spam measures are not taken.
Automated spambots can rapidly make the user-editable portion of a site unusable. Programmers have developed a variety of automated spam prevention techniques to block or at least slow down spambots.
Spam in blogs
Spam in blogs is the placing or solicitation of links randomly on other sites, with a desired keyword placed in the hyperlinked text of the inbound link. Guest books, forums, blogs, and any site that accepts visitors' comments are particular targets and are often victims of drive-by spamming, in which automated software creates nonsense posts with links that are usually irrelevant and unwanted.
Comment spam
Comment spam is a form of link spam that has arisen in web pages that allow dynamic user editing such as
Wiki spam
Wiki spam occurs when a spammer uses the open editability of wiki systems to place links from the wiki site to the spam site.
Referrer log spamming
Countermeasures
Because of the large amount of spam posted to user-editable webpages, Google proposed a "nofollow" value for the rel attribute of links. A link-based ranking algorithm, such as Google's PageRank, will not use a link to increase the score of the linked website if the link carries the nofollow value. This ensures that spam links posted to user-editable websites will not raise the linked sites' ranking with search engines. Nofollow is used by several major websites, including WordPress, Blogger and Wikipedia.[citation needed]
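A link-based ranker can honor nofollow by discarding marked links before building its link graph. The following sketch uses Python's standard html.parser module to separate followed links from rel="nofollow" links in a comment; the HTML and URLs are invented, and treating any rel value that contains the nofollow token as unfollowed is an assumption about how a ranker might interpret the attribute.

```python
from __future__ import annotations

from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects <a href> targets, split by whether rel contains "nofollow"."""

    def __init__(self) -> None:
        super().__init__()
        self.followed: list[str] = []
        self.nofollow: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        # rel is a space-separated list of tokens, e.g. rel="nofollow".
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)  # ignored when computing link scores
        else:
            self.followed.append(href)  # passes ranking credit


comment_html = """
<p>Great post! <a href="https://spam.example/pills" rel="nofollow">cheap pills</a></p>
<p>See also <a href="https://docs.example/guide">the guide</a>.</p>
"""

parser = LinkExtractor()
parser.feed(comment_html)
parser.close()
print("counted:", parser.followed)
print("ignored:", parser.nofollow)
```

Only the links in `followed` would be added as edges to a link graph such as the one PageRank iterates over.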
Other types
Mirror websites
A
URL redirection
Cloaking
Countermeasures
Page omission by search engine
Spamdexed pages are sometimes eliminated from search results by the search engine.
Page omission by user
Users can employ search operators for filtering. For Google, a keyword preceded by "-" (minus) omits from the search results any sites that contain the keyword in their pages or in their URLs. For example, the search "-<unwanted site>" will eliminate sites that contain the word "<unwanted site>" in their pages and pages whose URL contains "<unwanted site>".
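The exclusion behavior described for the minus operator can be emulated over a list of results. In this minimal sketch the Result record, its url and text fields, and the sample data are hypothetical, and matching is a simplified case-insensitive substring test rather than Google's actual word matching.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Result:
    url: str
    text: str


def parse_query(query: str) -> tuple[list[str], list[str]]:
    """Split a query into ordinary terms and '-'-prefixed excluded terms."""
    include: list[str] = []
    exclude: list[str] = []
    for token in query.split():
        if token.startswith("-") and len(token) > 1:
            exclude.append(token[1:].lower())
        else:
            include.append(token.lower())
    return include, exclude


def apply_exclusions(results: list[Result], query: str) -> list[Result]:
    """Drop results whose URL or page text contains any excluded term."""
    _, exclude = parse_query(query)
    kept = []
    for r in results:
        haystack = (r.url + " " + r.text).lower()
        if not any(term in haystack for term in exclude):
            kept.append(r)
    return kept


results = [
    Result("https://spamfarm.example/cheap-flights", "cheap flights cheap flights"),
    Result("https://airline.example/deals", "Compare fares on cheap flights"),
]
filtered = apply_exclusions(results, "cheap flights -spamfarm")
print([r.url for r in filtered])  # ['https://airline.example/deals']
```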
Users could also use the Google Chrome extension "Personal Blocklist (by Google)", launched by Google in 2011 as part of its countermeasures against content farming.[20] Using the extension, users could block a specific page or set of pages from appearing in their search results. As of 2021, the original extension appears to have been removed, although extensions with similar functionality may be used.
Possible countermeasures against search-redirection poisoning that redirects users to illegal internet pharmacies include notifying the operators of vulnerable legitimate domains. Furthermore, manual evaluation of search engine results pages (SERPs), previously published link-based and content-based algorithms, and tailor-made automatic detection and classification engines can be used as benchmarks for effectively identifying pharma scam campaigns.[21]
See also
- Adversarial information retrieval
- Index (search engine) – overview of search engine indexing technology
- TrustRank
- Web scraping
- Microsoft SmartScreen
- Microsoft Defender
References
- SearchEngineLand, Danny Sullivan's video explanation of Search Engine Spam, October 2008. Archived 2008-12-17 at the Wayback Machine. Retrieved 2008-11-13.
- "Word Spy - spamdexing" (definition), March 2003, webpage: WordSpy-spamdexing. Archived 2014-07-18 at the Wayback Machine.
- ISBN 1-59593-046-9. Archived (PDF) from the original on 2020-02-15. Retrieved 2007-10-05.
- ISSN 1468-4527.
- ISBN 1-59593-323-9.
- "SEO basics: what is black hat SEO?". IONOS Digitalguide. Retrieved 2022-08-22.
- Smarty, Ann (2008-12-17). "What Is BlackHat SEO? 5 Definitions". Search Engine Journal. Archived from the original on 2012-06-21. Retrieved 2012-07-05.
- Montti, Roger (2020-10-03). "Everything You Need to Know About Hidden Text & SEO". Search Engine Journal. Archived from the original on 2021-11-22. Retrieved 2021-11-22.
- The NY Times. Archived from the original on 2012-07-23. Retrieved 2012-07-03.
- Davison, Brian (2000). "Recognizing Nepotistic Links on the Web" (PDF). AAAI-2000 workshop on Artificial Intelligence for Web Search. Boston: AAAI Press. pp. 23–28. Archived (PDF) from the original on 2007-04-18. Retrieved 2007-10-23.
- "Search Engines: Technology, Society, and Business - Marti Hearst, Aug 29, 2005" (PDF). berkeley.edu. Archived (PDF) from the original on 2007-07-08. Retrieved 2007-08-01.
- "Google Targets Sites Using Private Blog Networks With Manual Action Ranking Penalties". Search Engine Land. 2014-09-23. Archived from the original on 2016-11-22. Retrieved 2016-12-12.
- OCLC 570440.
- OCLC 318353755.
- ISSN 0738-4602.
- .
- "The decay and fall of guest blogging for SEO". mattcutts.com. 2014-01-20. Archived from the original on 2015-02-03. Retrieved 2015-01-11.
- Mishne, Gilad; Carmel, David; Lempel, Ronny (2005). "Blocking Blog Spam with Language Model Disagreement" (PDF). Proceedings of the First International Workshop on Adversarial Information Retrieval on the Web. Archived (PDF) from the original on 2011-07-21. Retrieved 2007-10-24.
- "Sneaky redirects - Search Console Help". support.google.com. Archived from the original on 2015-05-18. Retrieved 2015-05-14.
- "New: Block Sites From Google Results Using Chrome's "Personal Blocklist" - Search Engine Land". searchengineland.com. 2011-02-14. Archived from the original on 2017-10-06. Retrieved 2017-10-06.
- PMID 36346655.