How to Block Bots to Access Website

Block Bots to Access Website

What are bots:-

For those not familiar with the term Bots, are basically computer programs that surf multiple websites to perform a variety of automated tasks. It’s short for robots.

Examples of bots include those used by the search engines. Those bots retrieve a copy of your web page so that they can include relevant terms from that page in their search index.

Not all bots are benign however. Some bots go through your website looking for web forms and email addresses to send you spam.

Other bots probe your website for security vulnerabilities.

Inorder to block all bots from accessing your site, you should create a robots.txt file with the following content.

User-agent: *
Disallow: /

Note: Blocking all bots will get your site deindexed from search engines.

Prevent malicious bots from visiting Website. 

Targeted attack towards website is common that malicious bots will access your web site attempting to gain access via brute force force.  This in turn will commonly result in multiple script executions and will greatly increase the resource usage of the hosting account.

Its always good to have protection in order to avoid issues caused by such unauthorized access.

  • Add additional plugin for your application that prevents such attacks. For example a nice option for WordPress is the Wordfence security plugin.
  • Use a set of .htaccess rules that will prevent malicious requests towards your website. A good one that you can use out of the box for pretty much any general case is the 6G Firewall.
    The exact rules you can add to your .htaccess are also included below:

# 6G FIREWALL/BLACKLIST
# @ https://perishablepress.com/6g/

# 6G:[QUERY STRINGS]
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{QUERY_STRING} (eval\() [NC,OR]
RewriteCond %{QUERY_STRING} (127\.0\.0\.1) [NC,OR]
RewriteCond %{QUERY_STRING} ([a-z0-9]{2000}) [NC,OR]
RewriteCond %{QUERY_STRING} (javascript:)(.*)(;) [NC,OR]
RewriteCond %{QUERY_STRING} (base64_encode)(.*)(\() [NC,OR]
RewriteCond %{QUERY_STRING} (GLOBALS|REQUEST)(=|\[|%) [NC,OR]
RewriteCond %{QUERY_STRING} (<|%3C)(.*)script(.*)(>|%3) [NC,OR]
RewriteCond %{QUERY_STRING} (\\|\.\.\.|\.\./|~|`|<|>|\|) [NC,OR]
RewriteCond %{QUERY_STRING} (boot\.ini|etc/passwd|self/environ) [NC,OR]
RewriteCond %{QUERY_STRING} (thumbs?(_editor|open)?|tim(thumb)?)\.php [NC,OR]
RewriteCond %{QUERY_STRING} (\’|\”)(.*)(drop|insert|md5|select|union) [NC]
RewriteRule .* – [F]
</IfModule>

# 6G:[REQUEST METHOD]
<IfModule mod_rewrite.c>
RewriteCond %{REQUEST_METHOD} ^(connect|debug|delete|move|put|trace|track) [NC]
RewriteRule .* – [F]
</IfModule>

# 6G:[REFERRERS]
<IfModule mod_rewrite.c>
RewriteCond %{HTTP_REFERER} ([a-z0-9]{2000}) [NC,OR]
RewriteCond %{HTTP_REFERER} (semalt.com|todaperfeita) [NC]
RewriteRule .* – [F]
</IfModule>

# 6G:[REQUEST STRINGS]
<IfModule mod_alias.c>
RedirectMatch 403 (?i)([a-z0-9]{2000})
RedirectMatch 403 (?i)(https?|ftp|php):/
RedirectMatch 403 (?i)(base64_encode)(.*)(\()
RedirectMatch 403 (?i)(=\\\’|=\\%27|/\\\’/?)\.
RedirectMatch 403 (?i)/(\$(\&)?|\*|\”|\.|,|&|&amp;?)/?$
RedirectMatch 403 (?i)(\{0\}|\(/\(|\.\.\.|\+\+\+|\\\”\\\”)
RedirectMatch 403 (?i)(~|`|<|>|:|;|,|%|\\|\s|\{|\}|\[|\]|\|)
RedirectMatch 403 (?i)/(=|\$&|_mm|cgi-|etc/passwd|muieblack)
RedirectMatch 403 (?i)(&pws=0|_vti_|\(null\)|\{\$itemURL\}|echo(.*)kae|etc/passwd|eval\(|self/environ)
RedirectMatch 403 (?i)\.(aspx?|bash|bak?|cfg|cgi|dll|exe|git|hg|ini|jsp|log|mdb|out|sql|svn|swp|tar|rar|rdf)$
RedirectMatch 403 (?i)/(^$|(wp-)?config|mobiquo|phpinfo|shell|sqlpatch|thumb|thumb_editor|thumbopen|timthumb|webshell)\.php
</IfModule>

# 6G:[USER AGENTS]
<IfModule mod_setenvif.c>
SetEnvIfNoCase User-Agent ([a-z0-9]{2000}) bad_bot
SetEnvIfNoCase User-Agent (archive.org|binlar|casper|checkpriv|choppy|clshttp|cmsworld|diavol|dotbot|extract|feedfinder|flicky|g00g1e|harvest|heritrix|httrack|kmccrew|loader|miner|nikto|nutch|planetwork|postrank|purebot|pycurl|python|seekerspider|siclab|skygrid|sqlmap|sucker|turnit|vikspider|winhttp|xxxyy|youda|zmeu|zune) bad_bot
<limit GET POST PUT>
Order Allow,Deny
Allow from All
Deny from env=bad_bot
</limit>
</IfModule>

# 6G:[BAD IPS]
<Limit GET HEAD OPTIONS POST PUT>
Order Allow,Deny
Allow from All
# uncomment/edit/repeat next line to block IPs
# Deny from 123.456.789
</Limit>

To implement: include the entire 6G Blacklist in the root .htaccess file of your site. Remember to backup your original .htaccess file before making any changes. Then test your pages thoroughly while enjoying a delicious beverage. If you encounter any issues, please read the troubleshooting tips and the section on reporting bugs. As always, feel free to share feedback and ask any questions in the comment section