Gorufu, littleman, Air, SugarKane? Do you guys see any errors or better ways to do this? Anybody got a bot to add before I stick this in every site I manage?
Feel free to use this on your own site and start blocking bots too.
(the top part is left out)
<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org$
RewriteRule .* - [F]
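On the "better ways" question: the long run of [OR] conditions can probably be folded into a single alternation. This is only a sketch, not a tested replacement. It lists a handful of the names from above, and the [NC] flag (case-insensitive matching) is my own assumption about what is wanted.

```apache
RewriteEngine on
# Same block-by-User-Agent idea, folded into one alternation.
# Only some of the names from the list above are shown; extend as needed.
RewriteCond %{HTTP_USER_AGENT} ^(EmailSiphon|EmailWolf|ExtractorPro|CherryPicker|EmailCollector) [NC]
RewriteRule .* - [F]
```

One condition is easier to keep sorted and avoids the classic mistake of leaving a trailing [OR] on the last line.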
Has anyone in this thread ever considered that all of these apps can change their User-Agent string and add a referrer by themselves?
Which is why we went with the Spambot Trap solution. The Google cache of the page is at:
cached page [216.239.39.100]
It is working nicely for us.
"Sugarplum employs a combination of Apache's mod_rewrite URL rewriting rules and perl code. It combines several anti-spambot tactics, including fictitious (but RFC822-compliant) email address poisoning, injection with the addresses of known spammers (let them all spam each other), deterministic output, and "teergrube" spamtrap addressing.
Sugarplum tries to be very difficult to detect automatically, leaving no signature characteristics in its output, and may be grafted in at any point in a webserver's document tree, even passing itself off as a static HTML file. It can optionally operate deterministically, producing the same output on many requests of the same URL, making it difficult to detect by comparison of multiple HTTP requests.
Friday, 09/27/2002: Sugarplum 0.9.8 is available. This is a major revision, based on a "two years hence" review of evolved spammer tactics, countermeasure viability, and various public feedback. This release is much quicker, easier to install and maintain, and about half the size. See the changelog for details. "
It identifies, on the fly, WebCrawlers rapidly requesting pages without the need for a black-list of bots.
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^SearchExpress [OR]
RewriteCond %{HTTP_USER_AGENT} ^ZyBorg [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebBandit [OR]
I was surprised that there was no mention of mod_throttle [snert.com] for those who run their own Apache servers. I've only just started playing with it, but it seems to be an absolutely tremendous tool, even if only for pinpointing, in real time and at a glance, who's eating all the bandwidth. There are also facilities for delaying or refusing requests from client IPs that make too many requests. The one downside seems to be slightly skimpy documentation.
At any rate, this is a great site (my first post here) and a tremendous resource. Thanks!
SetEnvIfNoCase User-Agent "EmailCollector/1.0" spam_bot
SetEnvIfNoCase User-Agent "EmailSiphon" spam_bot
SetEnvIfNoCase User-Agent "EmailWolf 1.00" spam_bot
SetEnvIfNoCase User-Agent "ExtractorPro" spam_bot
SetEnvIfNoCase User-Agent "Crescent Internet ToolPak HTTP OLE Control v.1.0" spam_bot
SetEnvIfNoCase User-Agent "Mozilla/2.0 \(compatible; NEWT ActiveX; Win32\)" spam_bot
SetEnvIfNoCase User-Agent "CherryPicker/1.0" spam_bot
SetEnvIfNoCase User-Agent "CherryPickerSE/1.0" spam_bot
SetEnvIfNoCase User-Agent "CherryPickerElite/1.0" spam_bot
SetEnvIfNoCase User-Agent "NICErsPRO" spam_bot
SetEnvIfNoCase User-Agent "WebBandit/2.1" spam_bot
SetEnvIfNoCase User-Agent "WebBandit/3.50" spam_bot
SetEnvIfNoCase User-Agent "webbandit/4.00.0" spam_bot
SetEnvIfNoCase User-Agent "WebEMailExtractor/1.0B" spam_bot
SetEnvIfNoCase User-Agent "autoemailspider" spam_bot
Order Allow,Deny
Allow from all
Deny from env=spam_bot
I have been mucking around with my .htaccess for some months, trying to block people who have been doing various nefarious things like hotlinking, downloading my content to display on other sites and grabbing my entire web site.
Hotlinking is taken care of. I have a little separate .htaccess in each subdirectory of the root directory that reads as follows:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://mydomain.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mydomain.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://myotherdomain.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.myotherdomain.org/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.myotherdomain.org/index.html/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mydomain.org/newindex.htm/.*$ [NC]
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ http://www.mydomain.org/403.shtml [R,NC]
That works just fine. No problems with that at all. Note that it allows access to my pictures from both of my domains.
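For anyone adapting this, here is a more compact variant as a rough sketch only. It makes two assumptions that are not in my live file: requests with an empty referer are let through (some browsers and proxies strip it), and the request is refused with [F] instead of being redirected to the 403 page.

```apache
RewriteEngine on
# Assumption: let requests with no referer through
RewriteCond %{HTTP_REFERER} !^$
# Both domains, with or without www, in one pattern
RewriteCond %{HTTP_REFERER} !^http://(www\.)?(mydomain|myotherdomain)\.org/ [NC]
# Refuse image requests that come from anywhere else
RewriteRule \.(jpe?g|gif|png|bmp)$ - [F,NC]
```

Dropping the empty-referer line makes it stricter, at the cost of blocking visitors whose browser sends no referer at all.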
In my root directory I have the following .htaccess file. Obviously most of my nefarious visitors are locals. Yes, I am blocking out whole ISPs, which will affect a huge number of visitors, but that's okay, as it is part of my intention. I am working on reducing the number of IP addresses listed by adding machine names that correspond to IP ranges. Trust me, that does work.
I have a couple of questions:
Is there a way to write this so that people do get to see the 403 error? Currently they don't see it.
I know I can use something like "deny from 61.95.30." but can I also use "deny from 61.95."? Note the second one just has two ip numbers.
Also, before I go, here's something useful for all of you who are effectively blocking out people who steal your web site content. Ever thought those people will just go to the Google cache and steal content from there? Then this line in your HTML head will fix that:
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
ErrorDocument 403 403.shtml
<Limit GET>
order allow,deny
deny from 61.95.30.
deny from 63.148.99.
deny from 64.12.183.
deny from 64.229.81.
deny from 65.92.21.
deny from 65.94.39.
deny from 65.95.181.
deny from 65.95.185.
deny from 128.250.6.
deny from 128.250.9.
deny from 128.250.15.
deny from 128.250.16.
deny from 129.78.64.
deny from 139.134.64.
deny from 144.135.25.
deny from 147.188.192.
deny from 195.239.232.
deny from 202.12.144.
deny from 203.40.140.
deny from 203.40.160.
deny from 203.40.161.
deny from 203.40.162.
....(many more of these starting with 203.)
deny from 204.83.211.
deny from 205.191.171.
deny from 207.44.200.
deny from 207.156.7.
deny from 207.172.11.
deny from 209.90.147.
deny from 209.178.220.
deny from 210.49.20.
deny from 210.49.21.
deny from 210.49.22.
deny from 210.50.16.
deny from 211.28.51.
deny from 211.28.96.
deny from 211.28.219.
deny from 212.95.252.
deny from 216.12.216
deny from 216.16.1.
deny from 216.218.129.
deny from .adnp.net.au
deny from .alphalink.com.au
deny from .comindico.com.au
deny from .csu.edu.au
deny from .da.uu.net
deny from .gil.com.au
deny from .iprimus.net.au
deny from .labyrinth.net.au
deny from .netspace.net.au
deny from .nsw.bigpond.net.au
deny from .optusnet.com.au
deny from .ozemail.com.au
deny from .sympatico.edu.ca
deny from .tmns.net.au
deny from .usyd.edu.au
deny from .vic.bigpond.net.au
allow from all
</Limit>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon
RewriteRule /*$ http://www.crimestoppers.com.au/ [L,R]
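On the two questions asked above, here is a minimal sketch and not a tested drop-in, assuming the same allow/deny style of access control used in the file. A likely reason the 403 page never shows is that denied hosts are also denied the error page itself, and that 403.shtml is given without a leading slash. Partial addresses are accepted too: the Apache documentation describes the first one to three bytes of an IP address as a valid argument, and shows the form without a trailing dot.

```apache
# Root-relative path so Apache treats this as a local document
ErrorDocument 403 /403.shtml

<Limit GET>
order allow,deny
allow from all
# Partial address: blocks every host in 61.95.x.x
deny from 61.95
</Limit>

# Let denied visitors still fetch the error page itself
<Files 403.shtml>
order allow,deny
allow from all
</Files>
```

With Order allow,deny the position of the allow and deny lines does not matter. The directives are evaluated as groups, and a matching deny wins.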