Forum Moderators: coopster & phranque


A Close to perfect .htaccess ban list

         

toolman

3:30 am on Oct 23, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's the latest rendition of my favorite ongoing artwork... my beloved .htaccess file. I've become quite fond of my little buddy, the .htaccess file, and I love the power it gives me to exclude vermin, pestoids and undesirable entities from my web sites.

Gorufu, littleman, Air, SugarKane? Do you guys see any errors or better ways to do this? Anybody got a bot to add before I stick this in every site I manage?

Feel free to use this on your own site and start blocking bots too.

(the top part is left out)

<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org$
RewriteRule .* - [F]
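To check which user agents a list like this actually catches, the [OR]-chained user-agent conditions can be mimicked outside Apache. This is a minimal Python sketch. It assumes Python's `re` handles these simple anchored patterns the same way mod_rewrite does, which it should for `^`, `.`, `.*` and character classes, but it is not Apache itself. `is_forbidden` is a hypothetical helper name. Without the [NC] flag every condition is case-sensitive, which is why the list spells out `^[Ww]eb[Bb]andit` by hand.

```python
import re

# Patterns copied from the user-agent RewriteCond lines above (no [NC] flag,
# so matching is case-sensitive, just like the original conditions).
BANNED_UA_PATTERNS = [
    r"^EmailSiphon", r"^EmailWolf", r"^ExtractorPro", r"^Mozilla.*NEWT",
    r"^Crescent", r"^CherryPicker", r"^[Ww]eb[Bb]andit", r"^WebEMailExtrac.*",
    r"^NICErsPRO", r"^Teleport", r"^Zeus.*Webster", r"^Microsoft.URL",
    r"^Wget", r"^LinkWalker", r"^sitecheck.internetseer.com", r"^ia_archiver",
    r"^DIIbot", r"^psbot", r"^EmailCollector",
]
COMPILED = [re.compile(p) for p in BANNED_UA_PATTERNS]


def is_forbidden(user_agent: str) -> bool:
    """Mimic conditions joined with [OR]: a single matching pattern triggers [F]."""
    return any(rx.search(user_agent) for rx in COMPILED)


if __name__ == "__main__":
    for ua in ["Wget/1.8.1", "webbandit/4.00.0", "WEBBANDIT/4.0",
               "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"]:
        print(ua, "->", "403" if is_forbidden(ua) else "allowed")
```

A user agent that only contains a banned name further along the string, such as `Mozilla/5.0 Wget`, is not caught, because every pattern is anchored with `^`.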

SomeCallMeTim

1:40 am on Jan 9, 2003 (gmt 0)

10+ Year Member



Has anyone in this thread ever considered that these apps can all change their user-agent string and send a fake referrer themselves?

That's why we went with the Spambot Trap solution. The Google cache of the page is at:

cached page [216.239.39.100]

It is working nicely for us.
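The user-agent and referrer headers are supplied by the client and can be faked, so a trap keys on behaviour instead of on those headers. The linked Spambot Trap page isn't reproduced here. The sketch below only illustrates the general idea under assumed, hypothetical names. A path (here `/trap/`) is listed under `Disallow:` in robots.txt and linked where no human would follow it, so any client that requests it is ignoring robots.txt, and its IP goes on a ban list. In a real server the ban list would have to persist between requests and processes; here it simply lives in memory.

```python
from dataclasses import dataclass, field

TRAP_PREFIX = "/trap/"  # hypothetical trap path, disallowed in robots.txt


@dataclass
class TrapFilter:
    banned_ips: set = field(default_factory=set)  # IPs caught by the trap

    def handle(self, client_ip: str, path: str) -> int:
        """Return the HTTP status to send; the user-agent string is never consulted."""
        if client_ip in self.banned_ips:
            return 403
        if path.startswith(TRAP_PREFIX):
            # Only a client ignoring robots.txt (or following a hidden link) lands here.
            self.banned_ips.add(client_ip)
            return 403
        return 200
```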

hakre

3:33 am on Jan 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Nice trap. These traps should also generate pages with dozens of bogus email addresses to pollute these robots' databases. If they can't get enough, feed them to death ;-)

pmkpmk

8:17 am on Jan 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Has anybody tried Sugarplum? www.devin.com/sugarplum/

"Sugarplum employs a combination of Apache's mod_rewrite URL rewriting rules and perl code. It combines several anti-spambot tactics, including fictitious (but RFC822-compliant) email address poisoning, injection with the addresses of known spammers (let them all spam each other), deterministic output, and "teergrube" spamtrap addressing.

Sugarplum tries to be very difficult to detect automatically, leaving no signature characteristics in its output, and may be grafted in at any point in a webserver's document tree, even passing itself off as a static HTML file. It can optionally operate deterministically, producing the same output on many requests of the same URL, making it difficult to detect by comparison of multiple HTTP requests.

Friday, 09/27/2002: Sugarplum 0.9.8 is available. This is a major revision, based on a "two years hence" review of evolved spammer tactics, countermeasure viability, and various public feedback. This release is much quicker, easier to install and maintain, and about half the size. See the changelog for details. "
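The "deterministic output" idea in that description can be illustrated by seeding a random generator from the requested URL, so repeated requests for the same URL get identical poison. This is a Python sketch, not Sugarplum's Perl code. The helper names are hypothetical, and it does not attempt Sugarplum's other tactics, such as injecting known spammers' addresses. As a design choice of this sketch, the fake addresses use domains under the `.invalid` TLD, which RFC 2606 reserves, so they can never reach a real mailbox.

```python
import hashlib
import random
import string

# Hypothetical poison domains; .invalid is reserved by RFC 2606 and never resolves.
POISON_DOMAINS = ["mail.example.invalid", "corp.example.invalid", "isp.example.invalid"]


def poison_addresses(url: str, count: int = 12) -> list[str]:
    """Fake but well-formed addresses that are identical for every request of `url`."""
    seed = int.from_bytes(hashlib.sha256(url.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)  # private generator, seeded only from the URL
    addresses = []
    for _ in range(count):
        local = "".join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(5, 10)))
        addresses.append(f"{local}@{rng.choice(POISON_DOMAINS)}")
    return addresses


def poison_page(url: str) -> str:
    """Wrap the addresses in a bare HTML page of mailto: links."""
    links = "\n".join(f'<a href="mailto:{a}">{a}</a><br>' for a in poison_addresses(url))
    return f"<html><body>\n{links}\n</body></html>"
```

Because the seed depends only on the URL, comparing two fetches of the same page reveals nothing, which is the detection problem the Sugarplum description mentions.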

xlcus

2:47 am on Jan 12, 2003 (gmt 0)

10+ Year Member



Slightly off topic, but a related subject...
If you're trying to block crawlers and bots that rapidly hit your server and put it under heavy load, and you have access to PHP, you might want to take a look at the script I posted to this thread [webmasterworld.com].

It identifies, on the fly, web crawlers that request pages rapidly, without needing a blacklist of bots.
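The general technique of flagging a client by request rate rather than by name can be sketched as a per-IP sliding window. This Python sketch is not xlcus's PHP script. The class name and both thresholds are hypothetical, and the state is kept in memory, whereas a real deployment would need storage shared between server processes.

```python
import time
from collections import defaultdict, deque
from typing import Optional

WINDOW_SECONDS = 10.0  # hypothetical: look at the last 10 seconds of requests
MAX_REQUESTS = 20      # hypothetical: more than this inside the window = crawler


class RateDetector:
    def __init__(self, window: float = WINDOW_SECONDS, limit: int = MAX_REQUESTS):
        self.window = window
        self.limit = limit
        self.hits = defaultdict(deque)  # client IP -> timestamps of recent requests

    def is_crawler(self, ip: str, now: Optional[float] = None) -> bool:
        """Record one request from `ip` and report whether it exceeds the limit."""
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        q.append(now)
        # Drop timestamps that have slid out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        return len(q) > self.limit
```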

neslon

9:45 pm on Feb 8, 2003 (gmt 0)



What a great thread! I've incorporated your "latest and greatest" list into my own, very out-of-date list of harvesters and bandwidth-suckers. The list I worked from had a few entries that yours doesn't, though these may be obsolete:

RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^SearchExpress [OR]
RewriteCond %{HTTP_USER_AGENT} ^ZyBorg [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebBandit [OR]

I was surprised that nobody mentioned mod_throttle [snert.com] for those who run their own Apache servers. I've only just started playing with it, but it seems to be a tremendous tool, even if only for seeing at a glance, in real time, who's eating all the bandwidth. It also has facilities for delaying or refusing requests from client IPs that make too many requests. The one downside seems to be somewhat skimpy documentation.

At any rate, this is a great site (my first post here) and a tremendous resource. Thanks!

jatar_k

10:18 pm on Feb 8, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Welcome to WebmasterWorld neslon

andreasfriedrich

8:38 pm on Feb 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



mod_throttle, Apache::SpeedLimit, et al. are mentioned in quite a few threads around here. Try either the site search or Google to locate them :).

WindSun

4:41 pm on Feb 12, 2003 (gmt 0)

10+ Year Member



This seems to work for blocking most of the email harvesters, but I am not sure it is the most efficient way to do it (in .htaccess):

SetEnvIfNoCase User-Agent "EmailCollector/1.0" spam_bot
SetEnvIfNoCase User-Agent "EmailSiphon" spam_bot
SetEnvIfNoCase User-Agent "EmailWolf 1.00" spam_bot
SetEnvIfNoCase User-Agent "ExtractorPro" spam_bot
SetEnvIfNoCase User-Agent "Crescent Internet ToolPak HTTP OLE Control v.1.0" spam_bot
SetEnvIfNoCase User-Agent "Mozilla/2.0 (compatible; NEWT ActiveX; Win32)" spam_bot
SetEnvIfNoCase User-Agent "CherryPicker/1.0" spam_bot
SetEnvIfNoCase User-Agent "CherryPickerSE/1.0" spam_bot
SetEnvIfNoCase User-Agent "CherryPickerElite/1.0" spam_bot
SetEnvIfNoCase User-Agent "NICErsPRO" spam_bot
SetEnvIfNoCase User-Agent "WebBandit/2.1" spam_bot
SetEnvIfNoCase User-Agent "WebBandit/3.50" spam_bot
SetEnvIfNoCase User-Agent "webbandit/4.00.0" spam_bot
SetEnvIfNoCase User-Agent "WebEMailExtractor/1.0B" spam_bot
SetEnvIfNoCase User-Agent "autoemailspider" spam_bot
Order Allow,Deny
Allow from all
Deny from env=spam_bot
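SetEnvIfNoCase treats its quoted value as a regular expression, not a literal string, and matches it case-insensitively anywhere in the header. That matters for user agents copied verbatim from logs. A `.` matches any character, and unescaped parentheses form a group instead of matching literal `(` and `)`. This quick check uses Python's `re`, assuming its syntax agrees with Apache's for these simple patterns. `matches` is a hypothetical helper.

```python
import re

ua = "Mozilla/2.0 (compatible; NEWT ActiveX; Win32)"

unescaped = r"Mozilla/2.0 (compatible; NEWT ActiveX; Win32)"
escaped = r"Mozilla/2\.0 \(compatible; NEWT ActiveX; Win32\)"


def matches(pattern: str, user_agent: str) -> bool:
    # Like SetEnvIfNoCase: case-insensitive regex search anywhere in the value.
    return re.search(pattern, user_agent, re.IGNORECASE) is not None


print(matches(unescaped, ua))  # False: the parentheses are a group, not literal characters
print(matches(escaped, ua))    # True
```

For the same reason, dropping the version suffix, for example using just `WebBandit`, would cover all three WebBandit lines with one entry.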

Panicschat

9:30 pm on Feb 14, 2003 (gmt 0)

10+ Year Member


I have been reading through this thread and have found it extremely interesting and useful. I particularly like the helpful content from Superman, Toolman and Key_Master.

I have been mucking around with my .htaccess for some months, trying to block people who have been doing various nefarious things like hotlinking, downloading my content to display on other sites, and grabbing my entire web site.

Hotlinking is taken care of. I have a small, separate .htaccess in each subdirectory of the root directory that reads as follows:

RewriteEngine on

RewriteCond %{HTTP_REFERER} !^http://mydomain.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mydomain.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://myotherdomain.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.myotherdomain.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.myotherdomain.org/index.html/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mydomain.org/newindex.htm/.*$ [NC]
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ http://www.mydomain.org/403.shtml [R,NC]

That works just fine. No problems with that at all. Note that it allows access to my pictures from both of my domains.
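The negated referer conditions above are implicitly ANDed, so an image request is redirected only when the Referer matches none of the allowed prefixes. This Python sketch models the same decision with the placeholder domains from the post. `redirect_target` is a hypothetical helper. The `index.html` and `newindex.htm` lines are left out because the `www.` patterns already cover those URLs. As written, a request with an empty Referer, which some browsers and proxies send, also fails every allowed pattern and is redirected.

```python
import re
from typing import Optional

# Allowed referer patterns from the RewriteCond lines (all [NC], so case-insensitive).
ALLOWED_REFERERS = [
    r"^http://mydomain.org/.*$",
    r"^http://www.mydomain.org/.*$",
    r"^http://myotherdomain.org/.*$",
    r"^http://www.myotherdomain.org/.*$",
]
IMAGE_RX = re.compile(r".*\.(jpg|jpeg|gif|png|bmp)$", re.IGNORECASE)


def redirect_target(path: str, referer: str) -> Optional[str]:
    """Return the redirect URL for a blocked hotlink, or None to serve the file."""
    if not IMAGE_RX.match(path):
        return None
    # Every '!' condition must hold: the referer matches none of the allowed patterns.
    if all(not re.search(p, referer, re.IGNORECASE) for p in ALLOWED_REFERERS):
        return "http://www.mydomain.org/403.shtml"
    return None
```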

In my root directory I have the following .htaccess file. Obviously most of my nefarious visitors are local. Yes, I am blocking out whole ISPs, which will affect a huge number of visitors, but that's okay, as it is part of my intention. I am working on reducing the number of IP addresses listed by adding the machine names that correspond to the IP ranges. Trust me, that does work.

I have a couple of questions:
Is there a way to write this so that people do get to see the 403 error page? Currently they don't see it.
I know I can use something like "deny from 61.95.30." but can I also use "deny from 61.95."? Note that the second one has only two octets of the IP address.

Also, before I go, here's something useful for all of you who are effectively blocking people who steal your web site content. Ever thought those people will just go to the Google cache and steal the content from there? This line in the head of your HTML pages stops Google from offering a cached copy:
<META NAME="ROBOTS" CONTENT="NOARCHIVE">

ErrorDocument 403 /403.shtml

<Limit GET>
order allow,deny
deny from 61.95.30.
deny from 63.148.99.
deny from 64.12.183.
deny from 64.229.81.
deny from 65.92.21.
deny from 65.94.39.
deny from 65.95.181.
deny from 65.95.185.
deny from 128.250.6.
deny from 128.250.9.
deny from 128.250.15.
deny from 128.250.16.
deny from 129.78.64.
deny from 139.134.64.
deny from 144.135.25.
deny from 147.188.192.
deny from 195.239.232.
deny from 202.12.144.
deny from 203.40.140.
deny from 203.40.160.
deny from 203.40.161.
deny from 203.40.162.
# ....(many more of these starting with 203.)
deny from 204.83.211.
deny from 205.191.171.
deny from 207.44.200.
deny from 207.156.7.
deny from 207.172.11.
deny from 209.90.147.
deny from 209.178.220.
deny from 210.49.20.
deny from 210.49.21.
deny from 210.49.22.
deny from 210.50.16.
deny from 211.28.51.
deny from 211.28.96.
deny from 211.28.219.
deny from 212.95.252.
deny from 216.12.216
deny from 216.16.1.
deny from 216.218.129.
deny from .adnp.net.au
deny from .alphalink.com.au
deny from .comindico.com.au
deny from .csu.edu.au
deny from .da.uu.net
deny from .gil.com.au
deny from .iprimus.net.au
deny from .labyrinth.net.au
deny from .netspace.net.au
deny from .nsw.bigpond.net.au
deny from .optusnet.com.au
deny from .ozemail.com.au
deny from .sympatico.edu.ca
deny from .tmns.net.au
deny from .usyd.edu.au
deny from .vic.bigpond.net.au
allow from all
</Limit>

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon
RewriteRule /*$ http://www.crimestoppers.com.au/ [L,R]
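Apache's access-control documentation describes a partial IP address as the first 1 to 3 bytes of an address, used for subnet restriction. It says a (partial) domain name matches hosts whose names match or end in that string, counting only complete name components, and that hostname rules make Apache do a double-reverse DNS lookup on the client. The Python sketch below models only the string comparison, with no DNS, and uses hypothetical helper names. It treats `216.12.216` (no trailing dot) the same as `216.12.216.`, which is an assumption about how Apache parses these prefixes.

```python
def ip_rule_matches(rule: str, client_ip: str) -> bool:
    """Partial-IP rule such as '61.95.' or '61.95.30.': compare whole leading octets."""
    rule_octets = [o for o in rule.split(".") if o]  # trailing dot treated as optional (assumption)
    return client_ip.split(".")[:len(rule_octets)] == rule_octets


def host_rule_matches(rule: str, client_host: str) -> bool:
    """Domain rule such as '.optusnet.com.au': suffix match on whole name components."""
    rule, host = rule.lower(), client_host.lower()
    if not host.endswith(rule):
        return False
    # 'apache.org' must not match 'fooapache.org': require a component boundary.
    return len(host) == len(rule) or rule.startswith(".") or host[-len(rule) - 1] == "."
```

Under those rules, `deny from 61.95.` would cover every address from 61.95.0.0 to 61.95.255.255.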

lorax

5:12 am on Feb 15, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My gawd, forget to look at a thread and see what happens. Great stuff has happened since I last read this thread.

Looks like I got here too late to look at the Spambot trap.

This 243-message thread spans 25 pages.