Forum Moderators: coopster & phranque


A Close to perfect .htaccess ban list


toolman

3:30 am on Oct 23, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's the latest rendition of my favorite ongoing artwork: my beloved .htaccess file. I've become quite fond of my little buddy, the .htaccess file, and I love the power it gives me to exclude vermin, pestoids and undesirable entities from my web sites.

Gorufu, littleman, Air, SugarKane? Do you guys see any errors or better ways to do this? Anybody got a bot to add before I stick this in every site I manage?

Feel free to use this on your own site and start blocking bots too.

(the top part is left out)

<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
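# Forbid requests referred from www.iaea.org (any page on that site).
# RewriteRule patterns match the URL path, never "http://...", so the
# original negated pattern matched every request; ".*" says so directly.
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org
RewriteRule .* - [F]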
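To see how the block of User-Agent conditions above behaves, here is a rough Python model of it (a sketch, not how Apache itself is implemented). It uses only a subset of the patterns. Each RewriteCond pattern is a regular expression tested against the User-Agent header, and chaining conditions with [OR] means the final RewriteRule ... [F] returns 403 Forbidden if any one of them matches. The names BANNED_UA_PATTERNS and is_forbidden are made up for illustration.

```python
import re

# A subset of the anchored User-Agent patterns from the ban list above.
# The leading ^ means "the User-Agent starts with this".
BANNED_UA_PATTERNS = [
    r"^EmailSiphon",
    r"^EmailWolf",
    r"^ExtractorPro",
    r"^Mozilla.*NEWT",
    r"^[Ww]eb[Bb]andit",
    r"^Microsoft.URL",   # unescaped "." matches any character, including a space
    r"^Wget",
    r"^EmailCollector",
]


def is_forbidden(user_agent: str, patterns=BANNED_UA_PATTERNS) -> bool:
    """Model of RewriteCond lines joined by [OR]: the [F] rule fires
    if ANY pattern matches the User-Agent (case-sensitive, like the originals)."""
    return any(re.search(p, user_agent) for p in patterns)


print(is_forbidden("Wget/1.8.1"))
print(is_forbidden("Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"))
```

For example, a User-Agent such as "Microsoft URL Control" is caught because the dot in ^Microsoft.URL matches the space, while "wget/1.8.1" slips through because the patterns are case-sensitive; that is why WebBandit is written with [Ww] and [Bb]. Adding mod_rewrite's [NC] (no case) flag to a RewriteCond is the other way to handle this.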

WolfHawk

6:29 am on Mar 7, 2003 (gmt 0)



Hello everyone...

I'm new to both this forum and Perl & CGI scripts. I've been doing a lot of late-night studying to learn it all quickly, but it's just not possible to learn what I need to know in the short amount of time I have.

To make a long story short and to the point, I'm setting up my first web site and while searching for information about keeping nasty bots away from my site, I found this forum.

The information and knowledge I've come across in this thread is spectacular. However, when I added the rewrite script I found here to my current .htaccess file, I ran into a problem. The script does a great job of keeping nasty bots away and gives me an easy way to ban visitors not only by browser and name but also by IP address, yet for some unknown reason every time I click the submit button on any of my online submission forms I get a 403 error message.

This is the only problem that adding rewrite rules appears to be causing. I went through a process of elimination by removing each rewrite rule one at a time until I only had the

RewriteEngine On
RewriteRule ^.* - [F,L]

portion left and I still kept getting a 403 error message every time I clicked on the submission button on any of my forms. Once I removed the remaining section of the rewrite script, my forms began functioning again.

Can any one help me out with this?

For your information, when I first opened my .htaccess file I found the following content already in it. I'm aware this was already brought up in this thread, but I couldn't find any response to the previous similar inquiry from veenerz...

# -FrontPage-

IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*

<Limit GET POST>
order deny,allow
deny from all
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>
AuthName www.mydomainname.com
AuthUserFile /the/path/to/a/file.here
AuthGroupFile /and/the/path/to/another/file.here

The rewrite script was easy for me to configure and use, but this "mod_access" stuff with order deny,allow etc. I just can't understand or figure out.
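The FrontPage block is easier to read once the Order rule is spelled out. The Python function below is a simplified model of how Apache 1.3/2.2 mod_access evaluates Order for a single client (Apache 2.4 moved these directives to mod_access_compat in favor of Require). It takes only two booleans: whether the client matches any Deny line and whether it matches any Allow line. The function name is hypothetical.

```python
def access_allowed(order: str, matches_deny: bool, matches_allow: bool) -> bool:
    """Simplified model of mod_access 'Order' evaluation for one client.

    "deny,allow": Deny is evaluated first, then Allow; the default is allow,
                  so a matching Allow overrides a matching Deny.
    "allow,deny": Allow is evaluated first, then Deny; the default is deny,
                  so a matching Deny overrides a matching Allow.
    """
    order = order.lower()
    if order == "deny,allow":
        allowed = True              # default: allow
        if matches_deny:
            allowed = False
        if matches_allow:
            allowed = True          # Allow is evaluated last, so it wins
        return allowed
    if order == "allow,deny":
        allowed = False             # default: deny
        if matches_allow:
            allowed = True
        if matches_deny:
            allowed = False         # Deny is evaluated last, so it wins
        return allowed
    raise ValueError(f"unsupported Order: {order!r}")


# <Limit GET POST>: every client matches both "deny from all" and "allow from all".
print(access_allowed("deny,allow", True, True))
# <Limit PUT DELETE>: every client matches only "deny from all".
print(access_allowed("deny,allow", True, False))
```

Plugging in the FrontPage lines: inside <Limit GET POST>, the "allow from all" line is evaluated last and wins, so GET and POST are allowed for everyone. Inside <Limit PUT DELETE> there is only "deny from all", so PUT and DELETE are refused for everyone.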

Any and all assistance will be greatly appreciated.

Wolf

DerekT

5:24 am on Mar 8, 2003 (gmt 0)

10+ Year Member



If anyone would like to block "Web Copiers" or "Offline Browsers" without needing to update an .htaccess file, visit this thread for a great PHP solution.

[webmasterworld.com...]

It monitors page requests and if a user requests too many within a set timeframe, they are given a custom 503 message.
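The idea described above (count each visitor's requests within a time window and refuse them once they exceed a limit) can be sketched in Python. This is not the PHP script from the linked thread. The class name, the default of 10 requests per 5 seconds, and the in-memory sliding window are all assumptions for illustration.

```python
import time
from collections import defaultdict, deque


class RequestRateLimiter:
    """Hypothetical in-memory limiter: at most max_requests per
    window_seconds for each client IP (sliding window)."""

    def __init__(self, max_requests=10, window_seconds=5.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str) -> bool:
        """Return True to serve the page, False to send the 503 message.
        Rejected requests are not recorded, so the block lifts once
        older requests slide out of the window."""
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that are older than the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = RequestRateLimiter(max_requests=3, window_seconds=1.0)
print([limiter.allow("10.0.0.1") for _ in range(4)])
```

Because a CGI or PHP script normally starts fresh on every request, an in-memory dictionary like this only works inside a long-running server process; a per-request script has to keep its counters somewhere persistent, such as files on disk.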

Initially I used a long .htaccess file to block these programs. However, it didn't always work, and I always had to add USER_AGENTs to the file when new programs were released. It also doesn't protect against these programs when people change their USER_AGENT to IE or Netscape.

Once I placed this script on my site, I caught 8 different (unique) people over a 24-hour period trying to leech my site. They all had normal browser USER_AGENT settings, so an .htaccess file wouldn't help. Since my site is all PHP and mySQL generated, this copying really hit my server hard. Some were requesting up to 17 pages a second!

Now that they are caught in realtime, my server is performing much better and my regular visitors are very happy.

If you visit the thread, note a few changes I added to ensure Googlebot is exempted from the limits and can request as many pages as it wishes.

StopSpam

5:31 pm on Mar 12, 2003 (gmt 0)

10+ Year Member



I recently saw a post from someone who had written Perl code that could block a robot based on the number of attempts to connect to the server. The only problem with it: it blocked Google from indexing as well.

I can no longer find that post. Can anyone send me the URL of the post on this forum? I'd like to check the code again.

I have spent a whole day trying to find it with site search, but I can't find it ;-(

I saw it a few days back...

Oaf357

6:01 pm on Mar 12, 2003 (gmt 0)

10+ Year Member



Can someone display the "latest" version of their .htaccess file, please.

jatar_k

6:32 pm on Mar 12, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Was it this one, StopSpam?
Blocking badly behaved runaway WebCrawlers [webmasterworld.com]

It is PHP though; I am not sure which one you mean.

and Welcome to WebmasterWorld WolfHawk. :)

StopSpam

6:57 pm on Mar 12, 2003 (gmt 0)

10+ Year Member



jatar_k, thank you very much....
I think this is indeed the post I was looking for.
I had already given up on finding it again...

So really, thanks.
From now on I'll flag posts that I find interesting
so I can easily find them again.

;-)

DerekT

7:35 pm on Mar 12, 2003 (gmt 0)

10+ Year Member



StopSpam

If you had looked at my post, you would have seen the reference to the code.

StopSpam

7:56 pm on Mar 12, 2003 (gmt 0)

10+ Year Member



You are right, sorry; credit to you as well, you found it first.

;-) I had read your message, but I guess my mind was somewhere else when I wrote the reply and I forgot about you. Sorry.

thx

I'm trying to write code that blocks a bot by IP when it makes multiple connections to a site, but I don't want to use a separate data file for the counters... I want to keep it in the code...

DerekT

8:09 pm on Mar 12, 2003 (gmt 0)

10+ Year Member



StopSpam,

You could use a single flat text file and load it into an array, but I haven't tried that. That would also probably use more CPU threads than writing separate files per IP. Another possibility would be a mySQL database, but you would have even more overhead with reads/writes to the database under heavy load.

I have been hit really hard by these programs since my site hosts over 20,000 images and movies. I changed the line in the code to use 4,096 IP MD5 hashes instead of the 256 the script has by default, and have had no performance problems.
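One plausible reading of "4,096 IP MD5 hashes instead of 256" is that each visitor's IP is hashed with MD5 and the leading hex digits of the hash choose which counter file to use: two hex digits give 16^2 = 256 possible files and three give 16^3 = 4,096. The Python sketch below shows only that bucketing step. It is an assumption about how the PHP script works, not taken from it, and bucket_for_ip is a hypothetical name.

```python
import hashlib


def bucket_for_ip(ip: str, hex_digits: int = 2) -> str:
    """Map an IP to one of 16**hex_digits buckets using the leading hex
    digits of its MD5 hash (2 digits -> 256 buckets, 3 -> 4,096)."""
    return hashlib.md5(ip.encode("ascii")).hexdigest()[:hex_digits]


for digits in (2, 3):
    print(digits, "hex digits ->", 16 ** digits, "buckets, e.g.",
          bucket_for_ip("192.168.0.1", digits))
```

More buckets means fewer IPs share each counter file, so there is less contention when many requests arrive at once.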

I even customized the 503 page that is displayed. The page explains why they are viewing the message (to prevent leeching, slow performance for regular visitors, etc.) and even has a JavaScript countdown timer that starts at 60 and, when it reaches 0, forwards them to the page/image/movie they originally requested.
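A rough Python sketch of such a "slow down" page follows. It assumes a generic (status, headers, body) response shape rather than any particular framework. It sends the standard Retry-After header with the 503 status and uses an HTML meta refresh to return the visitor to the original URL after 60 seconds (a swap for the JavaScript redirect described above), keeping a small script only to display the countdown. The function name is hypothetical.

```python
import html


def build_503_response(original_url: str, wait_seconds: int = 60):
    """Return (status, headers, body) for a hypothetical 'slow down' page
    that sends the visitor back to original_url after wait_seconds."""
    safe_url = html.escape(original_url, quote=True)  # escape &, <, >, quotes
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        # Standard HTTP header telling clients how long to wait before retrying.
        ("Retry-After", str(wait_seconds)),
    ]
    # Doubled braces {{ }} are literal braces for the JavaScript.
    body = f"""<html><head><title>Slow down</title>
<meta http-equiv="refresh" content="{wait_seconds};url={safe_url}"></head>
<body><p>Too many requests in a short time. This limit keeps the site
fast for regular visitors and discourages leeching.</p>
<p>You will be returned to <a href="{safe_url}">{safe_url}</a> in
<span id="n">{wait_seconds}</span> seconds.</p>
<script>
var n = {wait_seconds};
setInterval(function () {{
  if (n > 0) {{ n--; document.getElementById("n").textContent = n; }}
}}, 1000);
</script></body></html>"""
    return "503 Service Unavailable", headers, body


status, headers, body = build_503_response("/movies/clip.mpg")
print(status, headers)
```

The meta refresh still sends the visitor back even if JavaScript is disabled; the script only updates the visible number.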

StopSpam

8:33 pm on Mar 12, 2003 (gmt 0)

10+ Year Member



Wow, I am impressed by how well it works for you...

What I want is to write a piece of Perl code to stop brute-force attacks on a password-protected directory.
Let's say that after 10 wrong attempts the script takes action against the user, forwarding them to a different page or something like that.
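A minimal Python sketch of the lockout idea: count failed login attempts per IP and treat the IP as locked once it reaches 10 failures within a time window, at which point the caller can serve a different page. The class name, the 15-minute window and the in-memory storage are assumptions. It also assumes the login form is handled by your own script; with Apache's built-in Basic authentication (AuthUserFile in .htaccess) the server checks the password itself, so a script never sees the failed attempts.

```python
import time


class LoginLockout:
    """Hypothetical sketch: an IP is locked after max_failures failed
    logins within window_seconds; the caller decides what to show it."""

    def __init__(self, max_failures=10, window_seconds=900.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.clock = clock
        self._failures = {}  # ip -> list of failure timestamps

    def _recent(self, ip):
        """Keep only failures inside the window and return that stored list."""
        now = self.clock()
        recent = [t for t in self._failures.get(ip, []) if now - t < self.window_seconds]
        self._failures[ip] = recent
        return recent

    def is_locked(self, ip: str) -> bool:
        return len(self._recent(ip)) >= self.max_failures

    def record_failure(self, ip: str) -> None:
        self._recent(ip).append(self.clock())

    def record_success(self, ip: str) -> None:
        self._failures.pop(ip, None)  # a correct password clears the count


lockout = LoginLockout()
for _ in range(10):
    lockout.record_failure("10.0.0.1")
print(lockout.is_locked("10.0.0.1"))
```

In a per-request CGI script, the failure lists would have to live in a file or database rather than in memory, for the same reason as the rate limiter discussed earlier in the thread.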

This 243-message thread spans 25 pages.