Forum Moderators: coopster & phranque


A Close to perfect .htaccess ban list

         

toolman

3:30 am on Oct 23, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's the latest rendition of my favorite ongoing artwork....my beloved .htaccess file. I've become quite fond of my little buddy, the .htaccess file, and I love the power it gives me to exclude vermin, pestoids and undesirable entities from my web sites.

Gorufu, littleman, Air, SugarKane? Do you guys see any errors or better ways to do this....anybody got a bot to add....before I stick this in every site I manage?

Feel free to use this on your own site and start blocking bots too.

(the top part is left out)

<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
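RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org$
# RewriteRule patterns are tested against the URL path, not the referer,
# so match every path here and let the RewriteCond above do the filtering
RewriteRule .* - [F]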
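
A shorter way to write the same kind of rule is to fold several names into one pattern. This is only a sketch, assuming mod_rewrite is enabled and using four names picked from the list above; the [NC] flag (case-insensitive matching) is an addition that the original list does not use.

```apache
RewriteEngine On
# One condition with alternation instead of one [OR] line per bot
RewriteCond %{HTTP_USER_AGENT} ^(EmailSiphon|EmailWolf|ExtractorPro|EmailCollector) [NC]
# Answer 403 Forbidden for any URL requested by a matching agent
RewriteRule .* - [F]
```

The long form is easier to diff and to comment out line by line, so which one to use is a matter of taste.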

pmkpmk

11:35 am on Nov 15, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Unfortunately I still don't have it working (see earlier posts), and I haven't had the time yet to look into it further (the fate of a part-time webmaster).

But there's a new issue: has anybody ever heard of a user-agent calling itself "GraphicBrain.com"?

This special agent seems to download the whole site (which - in theory - I don't mind) but it produces such long logfile entries that my logfile analyzer crashes :-(

Example of ONE(!) logfile line:
212.113.xx.yy - - [01/Aug/2002:06:13:40 +0200] "GET / HTTP/1.0" 200 7931 "-" "GraphicBrain.com" "visitid=3D48ADCB000031EB604E6FEE; KeyWordCookie=GIFTS%2CFLOWERS%2CTRAVEL; ASPSESSIONIDGGGGQRGV=GAOLJNCCJGCBKNPENHBGALON; ASPSESSIONIDGGGQGGDP=HKPPOOGDHBLFPPCILOFEAOHK; ASPSESSIONIDGQGQGNFK=ENFBNHFAEGFHDBNAAAINIKPO; ASPSESSIONIDGQQQGVUY=OJDHAPPBPLCDABCNFPGBJNAL; ASPSESSIONIDGGGQGVUY=HBDOOEECBEBPBADJPMFACJLD; ARPT=IQKKVWSINT3CKMYJ; ASPSESSIONIDQGGGQHOQ=NLAMCPEADENDOOMECNBCAPDO; CFGLOBALS=HITCOUNT%3D1%23LASTVISIT%3D%7Bts+%272002%2D07%2D31+23%3A53%3A13%27%7D%23 TIMECREATED%3D%7Bts+%272002%2D07%2D31+23%3A53%3A13%27%7D%23; CFID=426530; CFTOKEN=39863655; ASPSESSIONIDQGQGQMGG=MIHOHNGDKMOANCPDMCJNKKKE; ASPSESSIONIDQGQGGLCG=HEJLGBDCEGMHCKMEEIECDOBB; ASPSESSIONIDQGQGGWUC=EKDLICBAJCHGOCJIADFCHDLP; RQFW={9762A7AC-D44A-4B43-AA6D-6688B4D7C48B}; ASPSESSIONIDGGQQQMTK=DFNEFLLBHKAPIBACMJOKOCBH; ASPSESSIONIDQGGQGOBG=GFFNPGMCMGBKDOJICDCGMEMF; WEBTRENDS_ID=212.113.82.197-2086124832.29505808; EGSOFT_ID=212.113.82.197-591707536.29505809; SappiUserID=471577; ASPSESSIONIDQQQQQJCO=MBPICOPCHEHEAFOJHMPBLFBE; ASPSESSIONIDGGQQGOOY=OCPLHOPCPHDLHOGHKCAJLECE; ASPSESSIONIDQQQGQGAB=HGMMAHPBONOHNEGIJICNGLCL"

[edited by: jatar_k at 5:07 pm (utc) on Nov. 15, 2002]
[edit reason] fixed side scroll [/edit]

pmkpmk

11:51 am on Nov 15, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Andreas,

in reply to message number 161 [webmasterworld.com...]

Well, as soon as I have the .htaccess in the subdirectory of the virtual server, Apache won't reload the config or restart - it exits with an error.

The requested URI should be on the same virtual server. Config-wise, I've taken Apache's default config, and my modifications were mostly in the virtual hosts section.

I'm a bit reluctant to post uncensored config files and logfile excerpts here in this public space, but to the best of my knowledge (which may not be much) I think I did it right.

I guess it's only one little configuration directive which is faulty or missing.

Since I have full root privileges, I'm not limited to .htaccess but can make changes to other parts of the config as well. As I mentioned in another post, I'm only trying to block email harvesters.

So what would be your recommendation?

[edited by: jatar_k at 5:10 pm (utc) on Nov. 15, 2002]
[edit reason] fixed link and sidescroll [/edit]

andreasfriedrich

1:46 pm on Nov 15, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi pmkpmk,

I'm in a hurry right now and need to catch a train in half an hour. I'll get back to you tomorrow unless somebody else has already helped you solve your problem.

Andreas

pmkpmk

10:22 am on Nov 19, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks to Andreas Friedrich, we solved my mysterious problem!

Andreas found out that I had this directive:

<Files index.html>
Options -FollowSymLinks +Includes
</Files>

in my httpd.conf. Even though, according to the documentation, the "Options" line should be ignored, it actually isn't.

After removing the "-FollowSymLinks" from the statement, everything works as it should.
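
For reference, mod_rewrite refuses to run per-directory (.htaccess) rules unless FollowSymLinks or SymLinksIfOwnerMatch is in effect for the request, which would explain why the "-FollowSymLinks" got in the way. A sketch of the section after the fix, assuming nothing else in httpd.conf switches the option off again:

```apache
# Keep server-side includes for index.html, but leave FollowSymLinks
# as inherited so that .htaccess rewrite rules are still permitted
<Files index.html>
Options +Includes
</Files>
```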

pmkpmk

11:34 am on Nov 19, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Aren't you harassed by the "e-mail spyder" (www.emailspyder.com)?

It has the user-agent "Microsoft URL Control" and - well - spiders for email addresses.

RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL\ Control [NC,OR]

pmkpmk

1:15 pm on Nov 19, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The second line of defense (slightly off topic, but worth mentioning):

If you're lucky enough to run a mailserver under your own control, you can add a second line of defense: realtime blacklists (sometimes also called realtime blocklists, or RBLs) let your mailserver block potential spam at the moment the spammer tries to deliver it. For EACH incoming email, the mailserver checks at least one of these RBLs. If the sender's IP address is on the list, delivery is cancelled instantly, even BEFORE the mail data is transferred to your server. There's a multitude of RBLs out there. Our server checks EACH incoming message against 5 different RBLs. Some of our users - including myself - post-check their messages against further RBLs. I, for example, have all messages coming from Russia/China/Korea/Malaysia etc. tagged with the prefix "**SPAM**". This second (and third) line of defense makes life a lot easier!

pmkpmk

11:52 am on Nov 25, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As mentioned above, thanks to Andreas Friedrich everything works fine now, and I typically "catch" 1-2 mail harvesters per day that way. I downloaded a few of those bugs myself to get a feeling for how they work.

And now the $1.000.000 prize question is: what stops a programmer of these bugs from "stealing" the user-agent string of - say - IE5.0?

Am I right in thinking that a bot camouflaging itself as IE50 would be COMPLETELY invisible to .htaccess rewrite rules?

Andy_White

4:42 pm on Nov 28, 2002 (gmt 0)

10+ Year Member



Hi,

I've been reading through this thread and having used htaccess to secure areas of other websites I thought I'd test out the concepts on a dormant web site on my server.

But when I add the file which contains :-

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]

I find I can't get into any of the pages.

The site is a virtual site on a server I have root access to and I've checked the httpd.conf to see that rewriteengine is on in each of the virtual sites.

Can anybody suggest what I'm doing wrong?

Andy

Later update

I've checked my server logs and I'm getting the message:-
RewriteEngine not allowed here

So now I'm really confused

Later still update

Solved it: I needed to amend access.conf to allow overrides for FileInfo (AllowOverride FileInfo).
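
In httpd.conf (or access.conf on older setups) the change looks roughly like this. The directory path is a made-up placeholder; FileInfo is the override class that covers the mod_rewrite directives.

```apache
# Hypothetical document root of the virtual site
<Directory "/home/sites/example/htdocs">
    # Lets .htaccess files use RewriteEngine, RewriteCond and RewriteRule
    AllowOverride FileInfo
</Directory>
```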

okidata

6:06 pm on Nov 29, 2002 (gmt 0)



Hi All,

I'm very impressed with the knowledge shown in this thread! I've read it at least once but I still have a question.

What if you want to ban certain countries using RewriteCond? How do I do that?

Right now I'm using:

deny from .at
deny from .bg
etc...

The problem with that is that it even denies my error pages, so I'd like to switch over to RewriteCond instead so that I can give them a page with the reason why they can't reach my site.

Another question... does anyone know how I test to see if the country ban is working correctly? wannabrowser.com works great for referrers but has no provisions for testing from offshore or from a specific IP location.

Thanks for the help.

Cheers,
Dennis
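
One possible shape for this with mod_rewrite, as a rough sketch rather than a tested recipe: /blocked.html is a hypothetical explanation page, and the two country suffixes are the ones from the deny lines above. As far as I know, %{REMOTE_HOST} only holds a host name when the server resolves client addresses (HostnameLookups On in the server config); otherwise it holds the IP address and the condition never matches.

```apache
RewriteEngine On
# Let the explanation page itself through so the rule does not loop
RewriteCond %{REQUEST_URI} !^/blocked\.html$
# Match clients whose resolved host name ends in one of the listed TLDs
RewriteCond %{REMOTE_HOST} \.(at|bg)$ [NC]
# Serve the explanation page instead of the requested URL
RewriteRule .* /blocked.html [L]
```

Visitors whose address has no reverse DNS entry, or resolves to a .com or .net name, will slip past a rule like this.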

upside

10:34 pm on Dec 3, 2002 (gmt 0)

10+ Year Member



okidata, I've been wondering something similar. My ban list takes the form of:

SetEnvIf Remote_Addr ^12\.40\.85\. getout
SetEnvIfNoCase User-Agent ^Microsoft.URL getout

<Limit GET POST>
order allow,deny
allow from all
deny from env=getout
</Limit>

This is working fine but how can I show a custom error message without implementing this all using mod_rewrite? Also how can I do a redirect if getout is set? Thanks.
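
A sketch of one way to approach both questions, with /blocked.html as a hypothetical page name. The first part keeps the existing SetEnvIf and <Limit> lines and only adds a custom 403 page, exempting that page from the ban so it can actually be delivered:

```apache
# Show a custom page for 403 responses
ErrorDocument 403 /blocked.html
# The error page must stay reachable for banned clients too
<Files blocked.html>
order allow,deny
allow from all
</Files>
```

For a redirect, the variable set by SetEnvIf can be tested from mod_rewrite. This would replace the <Limit> block rather than sit next to it, since a denied request never gets as far as the rewrite rules:

```apache
RewriteEngine On
# SetEnvIf without an explicit value sets the variable to 1
RewriteCond %{ENV:getout} =1
# Do not redirect the target page to itself
RewriteCond %{REQUEST_URI} !^/blocked\.html$
RewriteRule .* /blocked.html [R,L]
```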

This 243 message thread spans 25 pages: 243