Microsoft, AS8075. Google, AS396982.

         

shawnb61

7:00 pm on Apr 8, 2026 (gmt 0)

I've been getting absolutely slammed by Microsoft & Google over the last few weeks. A mix of crawls & hack attempts. It even looks like they're trying to get past captchas with repeated attempts. Lots of the kiddie-scripter stuff, e.g., looking for wp files, especially unprotected utilities. Looking for variants of backup folders & files. Very high volume.

To my knowledge, these are not end-user ISPs.

Some of the Google AS396982 traffic self-identifies in the useragent as Palo Alto Networks. This is the useragent:
Hello from Palo Alto Networks, find out more about our scans in https://docs-cortex.paloaltonetworks.com/r/1/Cortex-Xpanse/Scanning-activity

Looking at the MS traffic, AS8075, it is 100%:
- AI crawlers, e.g., OpenAI
- hack attempts
- bingbot

Looking at my Google Cloud traffic, AS396982, it is 100%:
- really poorly written, buggy bots, e.g., ShapBot, SleepBot
- hack attempts
- I have one active user accessing our site from within GoogleCloud; must be using their work computer

I have read that AS396982 is sometimes used by googlebot, though I haven't seen it.

In all seriousness... Why should we let ANYBODY from AS8075 or AS396982 access our sites, other than googlebot or bingbot? Looks like a pretty good way to drop a lot of AI bots & hackers...

[edited by: not2easy at 7:16 pm (utc) on Apr 8, 2026]
[edit reason] disabled UA link [/edit]

SumGuy

1:26 am on Apr 9, 2026 (gmt 0)

Palo Alto Networks, as well as Cato Networks, Zscaler and Netskope, are so-called "zero-trust" cloud security providers, I think mainly or exclusively for corporate use. You can probably add Cloudflare and Fortinet to that list. From a webmaster's POV, you'll see hits from their IP space and you don't really know where or who the end users are, or whether they are bots or people. I've been in the habit lately of blocking their IP space because I've seen more rogue / malicious hits from them than legit use.

But hits from the general IP space of Google and MSFT, like dozens of requests for strange files associated with WordPress hacks, have been happening for a long time. I block entire /16 IP blocks when I see them (the blocks are permanent). Outside of googlebot and bingbot, all their IP space is of no consequence for me, and maybe for most website operators. OpenAI and Anthropic's ClaudeBot use Microsoft IPs; I forget whose IPs Perplexity uses. Of course Apple's bot is using Apple's IP addresses. I never see abuse from Apple's IP range, but then again they don't really rent out their IP space, do they? Grokbot is a total mystery.
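For anyone who wants to script the /16 habit described above, here is a minimal sketch in Python using only the standard `ipaddress` module. The function name and the sample address (taken from a documentation range) are made up for illustration; it simply widens one offending IPv4 address to the /16 that contains it.

```python
import ipaddress

def enclosing_slash16(ip: str) -> str:
    """Return the /16 network (as CIDR text) that contains an IPv4 address."""
    # strict=False accepts a host address and masks off the host bits
    return str(ipaddress.ip_network(f"{ip}/16", strict=False))

print(enclosing_slash16("198.51.100.23"))  # 198.51.0.0/16
```

The resulting CIDR is what would go into the router or firewall block list. A /16 covers 65,536 addresses, so it is a blunt tool by design.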

shawnb61

3:55 pm on Apr 9, 2026 (gmt 0)

Thanks. Very helpful.

I don't think I've seen this volume of hackish behavior from Google & Microsoft until recently. I generally ignore the noise.

I think I'm taking a "zero-datacenter-trust" stance going forward. If they don't serve up actual users, block the whole ASN. Why sit around & wait for them to misbehave further before blocking them? No users there anyway...

Then the real problem remains where the traffic is mixed - where I have valid, registered, active users - and bots. Contabo, Datacamp. Oh yeah, Brazil...

All I wanna do is run our little guitar FX website. Sheesh...

SumGuy

12:36 am on Apr 10, 2026 (gmt 0)

When it comes to the 40-odd million IPs (IPv4, that is; my server doesn't operate on IPv6) that Google or Microsoft have, you just can't block them entirely, or you'll be blocking bingbot, googlebot, ChatGPT/OpenAI, maybe ClaudeBot and Perplexity.

Yes, there's Contabo and Hetzner and Datacamp and DigitalOcean and dozens of others that are safe to block.

Brazil - so far the only country that I've gone out of my way to identify and block entirely. I do all IP blocking in my router, so when I block Brazil, that means I've closed the door on spam, port scanning, http, everything that's coming from there trying to knock on my door. The number of ASNs assigned to Brazil is off the charts. Why do they have so many internet network entities?

shawnb61

3:51 am on Apr 10, 2026 (gmt 0)

Not impossible, though. Two ways you can block a whole ASN...

The first is by identifying all the CIDRs. AS8075 has ~967 ipv4 CIDRs & AS396982 has ~515 ipv4 CIDRs. So, yeah, that requires hundreds of lines in your .htaccess. In some cases, these can be reduced - sometimes by a lot - by using a cidr list cleaning script that removes overlaps & properly handles consecutive ranges. Some of these ASN CIDR lists are grossly inefficient; it depends on the source.
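A CIDR list cleaning script of the kind mentioned above can be quite short. This is a minimal sketch, not the script used here: it relies on Python's standard `ipaddress.collapse_addresses`, the function names are illustrative, and the sample ranges are documentation addresses rather than real AS8075 or AS396982 data. It assumes an IPv4-only list (mixing IPv4 and IPv6 in one call raises a `TypeError`).

```python
import ipaddress

def clean_cidrs(cidrs):
    """Drop overlaps/duplicates and merge adjacent ranges in a list of IPv4 CIDRs."""
    nets = [ipaddress.ip_network(c.strip(), strict=False) for c in cidrs if c.strip()]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

def total_addresses(cidrs):
    """Count the addresses covered by an already-cleaned CIDR list."""
    return sum(ipaddress.ip_network(c).num_addresses for c in cidrs)

# Two adjacent /25s, one /26 already covered by them, and an unrelated /24
raw = ["192.0.2.0/25", "192.0.2.128/25", "192.0.2.64/26", "198.51.100.0/24"]
cleaned = clean_cidrs(raw)
print(cleaned)                   # ['192.0.2.0/24', '198.51.100.0/24']
print(total_addresses(cleaned))  # 512
```

Running a raw ASN list through something like this before pasting it into .htaccess shows how much of the list was redundant, and the address count gives a rough idea of how much IP space is being blocked.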

The second method is by the ASN itself. If your host allows geo-blocking by country, I'd double-check all the available env variables. They will/can often expose the ASN as well; if they don't, ask them. My host does.

If you have access to the ASN in an env variable, it only takes one line in your htaccess to block the whole ASN.

Either way, you can block the ASN but allow googlebot/bingbot in .htaccess. I just implemented that today, & will monitor going forward.

# Your normal list of scumbags...
BrowserMatchNoCase scumbag1 bad_bot
BrowserMatchNoCase scumbag2 bad_bot
SetEnvIf MM_ASN "^210644$" bad_bot
SetEnvIf MM_ASN "^211590$" bad_bot
# Updated 4/9/26 - Block ms & goo, other than bingbot & googlebot
BrowserMatch Googlebot good_goo_bot
SetEnvIf MM_ASN "^396982$" bad_goo_asn
BrowserMatch bingbot good_ms_bot
SetEnvIf MM_ASN "^8075$" bad_ms_asn
# Everything inside the outer block must pass. "Require all granted" is there
# because a block holding only "Require not" lines can't grant access by itself.
<RequireAll>
    Require all granted
    Require not env bad_bot
    # Google: pass if the UA says googlebot OR the request is not from the goo ASN
    <RequireAny>
        Require env good_goo_bot
        <RequireAll>
            Require all granted
            Require not env bad_goo_asn
        </RequireAll>
    </RequireAny>
    # Microsoft: pass if the UA says bingbot OR the request is not from the ms ASN
    <RequireAny>
        Require env good_ms_bot
        <RequireAll>
            Require all granted
            Require not env bad_ms_asn
        </RequireAll>
    </RequireAny>
</RequireAll>

(Note: I had tried doing this in a much simpler fashion via expressions, but it didn't work in all my environments, wamp & unix. The above worked in all my environments...)

SumGuy

1:49 pm on Apr 10, 2026 (gmt 0)

I seem to recall, but I could be wrong, that googlebot or bingbot or both don't always identify themselves in the user-agent. Sometimes they use a generic browser UA.
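If that is a concern, crawlers can be checked by IP instead of by user-agent. Google and Microsoft both describe a reverse-then-forward DNS check for their crawlers, and the sketch below follows that idea. The hostname suffixes are assumptions based on that published guidance and should be confirmed against the current documentation; the function and parameter names are illustrative only. It is IPv4-only, and the resolver functions are passed in so the logic can be tried without live lookups.

```python
import socket

# Assumed suffixes; confirm against the crawler operators' current documentation.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")
BING_SUFFIXES = (".search.msn.com",)

def _reverse(ip):
    """PTR lookup: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]

def _forward(host):
    """A lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]

def is_verified_crawler(ip, suffixes, reverse=_reverse, forward=_forward):
    """True if ip reverse-resolves to a host under one of the suffixes
    and that host forward-resolves back to the same ip."""
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(tuple(suffixes)):
            return False
        return ip in forward(host)
    except OSError:  # covers socket.herror / socket.gaierror (lookup failed)
        return False
```

Calling `is_verified_crawler(ip, GOOGLE_SUFFIXES)` on addresses pulled from the access log would show whether any Google-owned crawler IPs are arriving with a generic browser UA. The suffixes start with a dot so that a hostname such as `googlebot.com.evil.example` does not pass.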

shawnb61

2:59 pm on Apr 10, 2026 (gmt 0)

No errors reported in GSC, so I'm running with it.

shawnb61

2:09 am on May 14, 2026 (gmt 0)

Note I had to add a few new things to this logic, so here's the latest & greatest:
- Exceptions for GoogleImageProxy, Google-InspectionTool, DuckAssistBot
- Added AS15169

This really has stopped a LOT of unidentified crawlers, AI bots & kiddie scripters. No impacts to Google, Bing or DuckDuckGo.


# Substitute your normal list of scumbags, ID'd as bad_bot here...
BrowserMatchNoCase scumbag1 bad_bot
BrowserMatchNoCase scumbag2 bad_bot
SetEnvIf MM_ASN "^210644$" bad_bot
SetEnvIf MM_ASN "^211590$" bad_bot
# Updated 4/23/26 - Block ms & goo, other than bingbot & googlebot & duckduck
BrowserMatch Googlebot good_goo_bot
BrowserMatch GoogleImageProxy good_goo_bot
BrowserMatch Google-InspectionTool good_goo_bot
SetEnvIf MM_ASN "^396982$" bad_goo_asn
SetEnvIf MM_ASN "^15169$" bad_goo_asn
BrowserMatch bingbot good_ms_bot
BrowserMatch DuckAssistBot good_ms_bot
SetEnvIf MM_ASN "^8075$" bad_ms_asn
# Everything inside the outer block must pass. "Require all granted" is there
# because a block holding only "Require not" lines can't grant access by itself.
<RequireAll>
    Require all granted
    Require not env bad_bot
    # Google: pass if the UA is on the good list OR the request is not from a goo ASN
    <RequireAny>
        Require env good_goo_bot
        <RequireAll>
            Require all granted
            Require not env bad_goo_asn
        </RequireAll>
    </RequireAny>
    # Microsoft: pass if the UA is on the good list OR the request is not from the ms ASN
    <RequireAny>
        Require env good_ms_bot
        <RequireAll>
            Require all granted
            Require not env bad_ms_asn
        </RequireAll>
    </RequireAny>
</RequireAll>
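
To sanity-check the nested Require blocks before deploying, the same decision can be modelled in a few lines of Python. This is only a sketch of the logic as written above, not of Apache itself: the function name is made up, the `scumbag1`/`scumbag2` patterns are the placeholders from the rules, and the ASN is assumed to arrive as a plain numeric string, as it would in `MM_ASN`.

```python
import re

# BrowserMatch is a case-sensitive regex search on the User-Agent
GOOD_GOO_UA = re.compile(r"Googlebot|GoogleImageProxy|Google-InspectionTool")
GOOD_MS_UA = re.compile(r"bingbot|DuckAssistBot")
# BrowserMatchNoCase ignores case; these are the placeholder names from the rules
BAD_UA = re.compile(r"scumbag1|scumbag2", re.IGNORECASE)

BAD_ASNS = {"210644", "211590"}
GOO_ASNS = {"396982", "15169"}
MS_ASNS = {"8075"}

def is_allowed(user_agent: str, asn: str) -> bool:
    """Mirror the RequireAll / RequireAny nesting from the rules above."""
    bad_bot = bool(BAD_UA.search(user_agent)) or asn in BAD_ASNS
    good_goo_bot = bool(GOOD_GOO_UA.search(user_agent))
    good_ms_bot = bool(GOOD_MS_UA.search(user_agent))
    bad_goo_asn = asn in GOO_ASNS
    bad_ms_asn = asn in MS_ASNS
    return (not bad_bot
            and (good_goo_bot or not bad_goo_asn)
            and (good_ms_bot or not bad_ms_asn))

print(is_allowed("Mozilla/5.0 (compatible; Googlebot/2.1)", "15169"))  # True
print(is_allowed("Mozilla/5.0 (Windows NT 10.0)", "15169"))            # False
print(is_allowed("Mozilla/5.0 (compatible; bingbot/2.0)", "396982"))   # False
```

The last case shows that each exception only applies to its own ASN group: a bingbot user-agent arriving from a Google ASN is still refused. The model also makes it plain that the exceptions trust the user-agent string, so anything in those ASNs that claims to be Googlebot or bingbot gets through.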