Last Updated on April 16, 2021 by Amit

What is a web crawler?

A web crawler, also called a spider or spiderbot and often shortened to crawler, is a type of Internet bot that systematically browses the World Wide Web, typically for the purpose of web indexing (web spidering).
Search engines such as Google and Yahoo use web crawling software that automatically visits domains on the World Wide Web (those that allow indexing) in order to index them.

How to block web crawlers on my site?

If you do not want your domain to appear in search results, or you want to keep it out of web indexes, you can block the user agents that visit your website for the purpose of crawling.

One of the quickest ways to block web crawlers is URL rewriting: you can add a RewriteRule to your .htaccess file.

Block “Google web crawler” with RewriteRule

To block Google's web crawler (Googlebot), you can use the following rule:

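RewriteEngine on
# Match any User-Agent containing "googlebot" ([NC] = case-insensitive)
RewriteCond %{HTTP_USER_AGENT} googlebot [NC]
# Leave the URL unchanged (-) and answer with 403 Forbidden ([F])
RewriteRule .* - [F]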

If you add this code to the .htaccess file in your site's root folder, it will deny all requests from Google's web crawler. Googlebot will get a 403 Forbidden error when it visits any page on your domain.
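
If you are not sure whether mod_rewrite is enabled on your server, a common precaution is to wrap the rule in an IfModule block. The following is only a sketch, assuming an Apache server that allows .htaccess overrides; it is the same rule as above with a guard around it.

```apache
# Only apply the rule when mod_rewrite is loaded
<IfModule mod_rewrite.c>
    RewriteEngine on
    # Case-insensitive match on the User-Agent header
    RewriteCond %{HTTP_USER_AGENT} googlebot [NC]
    # Answer every matching request with 403 Forbidden
    RewriteRule .* - [F]
</IfModule>
```

Keep in mind the trade-off: if the module is missing, the block is skipped silently, so the site keeps working but the crawler is not blocked.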

Block multiple web crawlers using RewriteRule

You can block more than one crawler bot with a single RewriteRule.

RewriteEngine on
# The pipe (|) means "or": match a User-Agent containing any of these names
RewriteCond %{HTTP_USER_AGENT} googlebot|yahoo|bing [NC]
RewriteRule .* - [F]

The above rule will block the Google, Yahoo and Bing search bots from your site. To add more bots, just separate them with a pipe character (|) in the RewriteCond.
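
As an illustration of extending the list, the sketch below adds two more names to the same condition. The names baiduspider and yandex are examples chosen for this sketch, not part of the original rule; check each crawler's documented user-agent string before relying on it.

```apache
RewriteEngine on
# baiduspider and yandex are illustrative additions (assumed names)
RewriteCond %{HTTP_USER_AGENT} googlebot|yahoo|bing|baiduspider|yandex [NC]
# Answer every matching request with 403 Forbidden
RewriteRule .* - [F]
```

Because the pattern is matched anywhere in the User-Agent header, a short name such as bing also matches longer strings that contain it, so choose names that are specific enough not to catch ordinary browsers.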
