Search results

  1. robots.txt - Wikipedia

    en.wikipedia.org/wiki/Robots.txt

    robots.txt is the filename used for implementing the Robots Exclusion Protocol, a standard used by websites to indicate to visiting web crawlers and other web robots ... (A sketch of how a crawler consults this file appears after these results.)

  2. MediaWiki:Robots.txt - Wikipedia

    en.wikipedia.org/wiki/MediaWiki:Robots.txt

    This page was last edited on 25 November 2024, at 18:30 (UTC). Text is available under the Creative Commons Attribution-ShareAlike 4.0 License; additional terms may apply.

  3. Wikipedia

    en.wikipedia.org/robots.txt

    # robots.txt for http://www.wikipedia.org/ and friends
    #
    # Please note: There are a lot of pages on this site, and there are
    # some misbehaved spiders out there that ...
    (A sketch for fetching and printing the live file appears after these results.)

  4. Wikipedia:Controlling search engine indexing

    en.wikipedia.org/wiki/Wikipedia:Controlling...

    MediaWiki:Robots.txt forbids analytic tools from visiting sensitive or potentially sensitive types of pages, primarily in the Wikipedia namespace – for example, deletion debates. A side effect of not visiting is normally that a page cannot be indexed. Where possible, you should also use __NOINDEX__ for those pages. (A sketch of checking a page's rendered robots meta tag appears after these results.)

  5. Wikipedia:Reader's index to Wikipedia - Wikipedia

    en.wikipedia.org/wiki/Wikipedia:Reader's_index_to...

    Robots.txt file – specifies search engines that are not allowed to crawl all or part of Wikipedia, as well as pages/namespaces that are not to be indexed by any search engine; MediaWiki:Robots.txt – direct editing of robots.txt; Wikipedia:Talk pages not indexed by Google (feature request) Wikipedia:Requests for comment/NOINDEX; Tools:

  6. MediaWiki talk:Robots.txt - Wikipedia

    en.wikipedia.org/wiki/MediaWiki_talk:Robots.txt

    MediaWiki:Robots.txt provides the Robots.txt file for English Wikipedia, telling search engines not to index the specified pages. See the documentation of {{NOINDEX}} for a survey of noindexing methods.

  7. Common Crawl - Wikipedia

    en.wikipedia.org/wiki/Common_Crawl

    The organization's crawlers respect nofollow and robots.txt policies. Open source code for processing Common Crawl's data set is publicly available. The Common Crawl dataset includes copyrighted work and is distributed from the US under fair use claims. (A sketch of honoring nofollow when extracting links appears after these results.)

  8. Help:Using archive.today - Wikipedia

    en.wikipedia.org/wiki/Help:Using_archive.today

    The use of robots.txt for this purpose is essentially a hack that has led to unintended consequences. For example, when a domain is hijacked or changes ownership and the new owner adds a robots.txt, archive providers are triggered into blocking the display of archives of the original site, even though the old site never had a robots.txt.
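
Result 1 describes the Robots Exclusion Protocol. Below is a minimal sketch of how a well-behaved crawler might consult a site's robots.txt before fetching a page, using Python's standard-library urllib.robotparser; the Wikipedia URL and the catch-all "*" user-agent are illustrative choices, not part of the protocol.

    # Ask whether a crawler may fetch a given URL, per the site's
    # robots.txt (the Robots Exclusion Protocol).
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://en.wikipedia.org/robots.txt")
    rp.read()  # download and parse the robots.txt file

    # "*" stands in for any user-agent the file does not name explicitly.
    print(rp.can_fetch("*", "https://en.wikipedia.org/wiki/Robots.txt"))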
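
Result 3 quotes the opening comment of Wikipedia's own robots.txt, which the snippet truncates. Here is a sketch for inspecting the live file directly; the line count is arbitrary, and the output will change as the file is edited.

    # Fetch Wikipedia's robots.txt and print its first few lines.
    from urllib.request import urlopen

    with urlopen("https://en.wikipedia.org/robots.txt") as resp:
        for line in resp.read().decode("utf-8").splitlines()[:10]:
            print(line)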
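
Result 4 recommends pairing robots.txt exclusion with __NOINDEX__. One observable effect of __NOINDEX__ is a robots meta tag in the page's rendered HTML; the sketch below looks for that tag. The target page and the exact tag shape are assumptions about MediaWiki's rendered output.

    # Look for <meta name="robots" content="...noindex..."> in a page's
    # HTML, the rendered effect one would expect from __NOINDEX__.
    from html.parser import HTMLParser
    from urllib.request import urlopen

    class RobotsMetaFinder(HTMLParser):
        def __init__(self):
            super().__init__()
            self.noindex = False

        def handle_starttag(self, tag, attrs):
            a = dict(attrs)
            if tag == "meta" and a.get("name") == "robots":
                self.noindex = "noindex" in (a.get("content") or "")

    url = "https://en.wikipedia.org/wiki/Special:Random"  # illustrative page
    finder = RobotsMetaFinder()
    with urlopen(url) as resp:
        finder.feed(resp.read().decode("utf-8", errors="replace"))
    print(finder.noindex)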
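
Result 7 notes that Common Crawl's crawlers respect nofollow in addition to robots.txt. The sketch below shows one plausible reading of "respecting nofollow" when extracting links; it is an illustration of the convention, not Common Crawl's actual code.

    # Collect only links whose rel attribute does not include "nofollow".
    from html.parser import HTMLParser

    class FollowableLinks(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            a = dict(attrs)
            rel = (a.get("rel") or "").split()
            if tag == "a" and a.get("href") and "nofollow" not in rel:
                self.links.append(a["href"])

    p = FollowableLinks()
    p.feed('<a href="/keep">x</a> <a rel="nofollow" href="/skip">y</a>')
    print(p.links)  # ['/keep']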