
what do they want?

         

lucy24

9:23 pm on Jan 29, 2026 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Granted, not so much an ID question but--my favorite subject--a robot psychology question:

For many months now, I've noticed a particular robotic behavior: a cluster of requests for some random page (html only), in the range of 5-20 in rapid succession, each from a different IP and UA. I tend to doubt any kind of DDoS exploit, as there would be more of them, closer together, likely resulting in a 429* code. Best guess: infected human machines, as most come from broadband IP ranges all over the world--with a slightly higher proportion of countries that I don't ordinarily see much of--with the occasional colo/server thrown into the mix. Since there is no unifying feature, no distinctive headers, all I can do is temporarily block the IP (generally /24 for 3 months), with the happy result that at least half of any given cluster gets a 403.

Question: What the ### do they want? Why isn't it enough to request a page just once?

* The 429 response only started showing up in logs a couple of years ago, probably as a byproduct of one of the host's periodic server changes. I've never asked, but I think they are true 429, “too many requests”, as would happen if you bombarded a lot of different sites living on the same server.
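One way to surface these bursts in an access log is to group requests by path and flag runs where several distinct IPs hit the same page in quick succession. This is a minimal sketch, assuming the log has already been parsed into hypothetical `Hit` records; the 10-second gap and the 5-address threshold are illustrative guesses based on the "5-20 in rapid succession" figure above, not tuned values.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    """One parsed access-log entry (hypothetical structure)."""
    timestamp: float  # seconds since the epoch
    ip: str
    user_agent: str
    path: str


def find_clusters(hits, max_gap=10.0, min_distinct_ips=5):
    """Return (path, run) pairs where a run of requests for the same path,
    each within max_gap seconds of the previous one, came from at least
    min_distinct_ips different addresses."""
    by_path = defaultdict(list)
    for hit in sorted(hits, key=lambda h: h.timestamp):
        by_path[hit.path].append(hit)

    clusters = []
    for path, path_hits in by_path.items():
        run = [path_hits[0]]
        for hit in path_hits[1:]:
            if hit.timestamp - run[-1].timestamp <= max_gap:
                run.append(hit)  # still part of the same burst
            else:
                if len({h.ip for h in run}) >= min_distinct_ips:
                    clusters.append((path, run))
                run = [hit]  # start a new burst
        if len({h.ip for h in run}) >= min_distinct_ips:
            clusters.append((path, run))
    return clusters


# Usage: six different addresses fetch the same page one second apart.
hits = [Hit(1000.0 + i, f"198.51.100.{i}", f"UA-{i}", "/page.html") for i in range(6)]
hits.append(Hit(1200.0, "203.0.113.7", "UA-x", "/other.html"))
for path, run in find_clusters(hits):
    print(path, len(run))  # prints: /page.html 6
```

Tracking distinct IPs rather than raw request counts keeps one over-eager visitor from looking like a cluster.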

Kendo

1:40 am on Jan 30, 2026 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Is the request query string the same in each case, or are they using different variables?

lucy24

2:57 am on Jan 30, 2026 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



n/a, they're just html pages, no parameters. So it's the identical request every time, with the twist that they often start by requesting
/directory/subdir
and then upon receiving the standard redirect, immediately follow with
/directory/subdir/
(Does this mean that extensionless URLs are now so common in-general that they take this as the default, even though they have no reason to think this is the correct form?)
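If you want to count how often that slashless-then-slashed pattern happens, a small scan over the log will do. This is a sketch, assuming hits are reduced to (timestamp, ip, path) tuples and that the same client (same IP) follows the redirect; the 5-second window is an arbitrary choice.

```python
def slash_retries(hits, max_delay=5.0):
    """Find requests for a path without a trailing slash that are followed,
    from the same IP within max_delay seconds, by the same path plus "/".

    hits: iterable of (timestamp, ip, path) tuples.
    Returns a list of (ip, path, delay_in_seconds).
    """
    ordered = sorted(hits)  # tuples sort by timestamp first
    pairs = []
    for i, (t1, ip1, path1) in enumerate(ordered):
        if path1.endswith("/"):
            continue
        for t2, ip2, path2 in ordered[i + 1:]:
            if t2 - t1 > max_delay:
                break  # every later hit is even further away in time
            if ip2 == ip1 and path2 == path1 + "/":
                pairs.append((ip1, path1, t2 - t1))
                break
    return pairs


hits = [
    (0.0, "198.51.100.7", "/directory/subdir"),
    (0.5, "198.51.100.7", "/directory/subdir/"),
    (2.0, "203.0.113.9", "/elsewhere"),
]
print(slash_retries(hits))  # [('198.51.100.7', '/directory/subdir', 0.5)]
```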

tangor

7:00 am on Jan 30, 2026 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



By going extensionless these new ecologically ethical bots are conserving scarce electron resources.

What do they want? Whatever they can get!

lucy24

5:47 pm on Jan 30, 2026 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Call me old-fashioned, but when I see an extensionless URL my first impulse is to tell them to go back in the server and put some clothes on.

Kendo

11:42 pm on Jan 30, 2026 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



my favorite subject--a robot psychology question

Nothing groundbreaking, but there are articles on Robopsychology.

Kendo

11:45 pm on Jan 31, 2026 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



robopsychology

Some things that a psycho sleuth might ask:

1. what is your favorite color?
2. what is your favorite food?
3. what is your favorite band?
4. what is your favorite pastime?
5. how was your childhood?
6. do you come from a small or large family?
7. did you have both brothers and sisters?
8. what was your highest level of education?
9. why did you leave school?
10. what was your ambition?
11. what is your political preference?
12. what is your sexual preference?
13. what is your favorite sex position?
14. what is your star sign and time of birth?

lucy24

12:11 am on Feb 1, 2026 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sadly, I know perfectly well that when I speak of robot psychology, I'm really speaking of botrunners’ motivation, whether pecuniary or other.

Kendo

10:04 pm on Feb 2, 2026 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I doubt if such bots would have a personality.

SumGuy

2:09 am on Feb 3, 2026 (gmt 0)

5+ Year Member Top Contributors Of The Month



So you're seeing multiple requests for the same page, from different IPs with different UAs.

Do these hits have any referers? What about the Accept-Language string: anything common about that? For the UAs that are Chrome, are you seeing Chrome version numbers that are not current? Current right now is 144. I treat anything under 142 as a bot. I see a lot of really old Chrome versions, but the really hard ones to deal with right now are 143 and 144.
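The version cutoff described here can be expressed as a small check on the User-Agent string. This is a sketch, assuming the cutoff of 142 quoted above; note that Chromium-based browsers such as Edge and Opera also carry a `Chrome/NNN` token, so they fall under the same rule.

```python
import re

# First "Chrome/<major>." token in the UA; this also matches "HeadlessChrome/".
CHROME_MAJOR = re.compile(r"Chrome/(\d+)\.")


def chrome_major(user_agent):
    """Return the Chrome major version claimed by a User-Agent, or None."""
    match = CHROME_MAJOR.search(user_agent)
    return int(match.group(1)) if match else None


def claims_outdated_chrome(user_agent, min_major=142):
    """True if the UA claims a Chrome release older than min_major."""
    major = chrome_major(user_agent)
    return major is not None and major < min_major


old = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
       "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
print(chrome_major(old), claims_outdated_chrome(old))  # 120 True
```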

Bots are using residential proxies / VPNs. Or at least one bot is. I see these hits daily; my detection rate is pretty good, and they get my "I think you're a bot" page.

My situation is that the proxified bots are way more interested in pdf files than html files, and they tend to have a null referer. I do some testing on the Accept-Language header, specifically looking for values that contain zh or zh-cn (yes, that's another common fingerprint for what I'm seeing).
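Here is a sketch of the combined test described here: a pdf request, no referer, and an Accept-Language that includes Chinese. Requiring all three at once is an assumption about how the checks are combined; the post only says each is part of the fingerprint. Treating `-` as an empty referer follows Apache's log convention.

```python
def accept_language_tags(header):
    """Split an Accept-Language header into lowercase tags, dropping q-values."""
    if not header:
        return []
    return [part.split(";")[0].strip().lower() for part in header.split(",") if part.strip()]


def suspicious_pdf_request(path, referer, accept_language):
    """Assumed rule: pdf + no referer + a zh / zh-* language tag."""
    wants_pdf = path.lower().endswith(".pdf")
    no_referer = referer in (None, "", "-")
    chinese = any(tag == "zh" or tag.startswith("zh-")
                  for tag in accept_language_tags(accept_language))
    return wants_pdf and no_referer and chinese


print(suspicious_pdf_request("/docs/a.pdf", "-", "zh-CN,zh;q=0.9"))  # True
```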

Try throwing some of your IPs at spur.us and see if they come back as proxies.

lucy24

3:06 am on Feb 3, 2026 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do these hits have any referers? What about the Accept-Language string: anything common about that? For the UAs that are Chrome, are you seeing Chrome version numbers that are not current? Current right now is 144. I treat anything under 142 as a bot. I see a lot of really old Chrome versions, but the really hard ones to deal with right now are 143 and 144.
Generally no referer, but sometimes Google (I assume for verisimilitude, as I've always seen this in some robots).

Language is often
Accept-Language: en-US,en;q=0.9
which unfortunately isn't robot-diagnostic. Same goes for things like headers in the Sec-Ch group: they all tend to be the same ... but so are plenty of legitimate humans.

I tend to be fairly generous with browser version, since there are legitimate reasons for a human not to upgrade. (No, I do not allow, say, MSIE-anything, or two-digit Firefox.) But lately robots have been fond of claiming to be very recent Chrome. In fact, if they're infected human machines they probably are very recent Chrome. Makes it easier for the botrunner, since they don't have to run out and keep buying a new disguise.

Come to think of it, that's how many (biological) viruses work, isn't it? Hide inside something that has every right to be there, and you're home free.

SumGuy

3:56 am on Feb 3, 2026 (gmt 0)

5+ Year Member Top Contributors Of The Month



They're not infected machines. Lots of proxies offer credits or even cash for using your residential IP. Others just use (i.e., resell) your IP when you join their "anonymity" network. The bots using these proxies generate their own UAs. No need to infect a residential system when the owner joins willingly. I've seen some IPs that are part of 6 or 8 different proxy networks.

Test a few of your IPs on spur, or post them here and I'll tell you which proxy networks they're part of.

lucy24

7:09 am on Feb 3, 2026 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They're not infected machines. Lots of proxies offer credits or even cash for using your residential IP.
File under: Today I Learned ... and very much wish I hadn't. There are hundreds of them, probably thousands by now. F'rinstance, in the last few days of January I had these from 73 alone (officially the whole /8 is Comcast):

73.84.42.12 (twice, which is unusual)
73.90.127.251
73.118.94.254
73.128.54.99
73.229.212.246
73.237.0.109
73.238.12.124

... except, uh, isn't there a Forums rule about posting the full /32?

:: looking around apprehensively ::

In any case, the original question still stands: what do the robots gain by scooping up multiple copies of the same html? Or is it that they expect many requests to be blocked, so they deploy the shotgun method?

SumGuy

11:52 pm on Feb 3, 2026 (gmt 0)

5+ Year Member Top Contributors Of The Month



These are all "callback" proxies; one of them is also a geo-mismatch proxy, meaning the user is in a different country, in this case Indonesia, and I find that to be a common user country when I see it. I router-block all Indonesian IPs.

I've obfuscated the IPs; you can do the same in your post if you like.

As a test, I went back to spur with these IPs a second time, changed one of the octets (usually the last) by a single number and queried it; they all came back as negative.

Here are the proxy networks associated with each of your IPs:

73.84.42.x 1, 2, 3, 4, 5
73.90.127.x 1, 3, 4, 5, 6
73.118.94.x 3, 7, 1, 4, 8, 5, 2, 9
73.128.54.x 4, 10, 1, 11, 3
73.229.212.x 3, 1, 4, 2, 5, 8 (geo-mismatch Indonesia)
73.237.0.x 3, 4, 1
73.238.12.x 4, 3, 1

1 IPIDEA_PROXY
2 LUMINATI_PROXY
3 NETNUT_PROXY
4 OXYLABS_PROXY
5 PLAINPROXIES_PROXY
6 PROXYRACK_PROXY
7 YILU_PROXY
8 NIMBLEWAY_PROXY
9 INFATICA_PROXY
10 RAYOBYTE_PROXY
11 EARNFM_PROXY
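Transcribing the table above makes it easy to see which networks the seven addresses share. This is only set arithmetic on the posted data; it says nothing about which network actually carried any given request.

```python
from collections import Counter

NETWORKS = {
    1: "IPIDEA_PROXY", 2: "LUMINATI_PROXY", 3: "NETNUT_PROXY",
    4: "OXYLABS_PROXY", 5: "PLAINPROXIES_PROXY", 6: "PROXYRACK_PROXY",
    7: "YILU_PROXY", 8: "NIMBLEWAY_PROXY", 9: "INFATICA_PROXY",
    10: "RAYOBYTE_PROXY", 11: "EARNFM_PROXY",
}

# Network numbers per address, copied from the table above.
MEMBERSHIP = {
    "73.84.42.x": [1, 2, 3, 4, 5],
    "73.90.127.x": [1, 3, 4, 5, 6],
    "73.118.94.x": [3, 7, 1, 4, 8, 5, 2, 9],
    "73.128.54.x": [4, 10, 1, 11, 3],
    "73.229.212.x": [3, 1, 4, 2, 5, 8],
    "73.237.0.x": [3, 4, 1],
    "73.238.12.x": [4, 3, 1],
}

# Networks that every listed address belongs to.
shared = set.intersection(*(set(ids) for ids in MEMBERSHIP.values()))
print(sorted(NETWORKS[n] for n in shared))
# ['IPIDEA_PROXY', 'NETNUT_PROXY', 'OXYLABS_PROXY']

# How many of the seven addresses each network appears for.
counts = Counter(n for ids in MEMBERSHIP.values() for n in ids)
for n, count in counts.most_common():
    print(f"{NETWORKS[n]:<20} {count}")
```

In this sample, IPIDEA, NetNut and Oxylabs show up for all seven addresses.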

lucy24

7:37 am on Feb 4, 2026 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Holy ### that's a lot. (A lot of what, though? Greedy humans who are also too stupid and/or too amoral to care what their IP is being used for?) And where did they all spring up from? This particular pattern* only showed up about 4-6 months ago.

changed one of the octets (usually the last) by a single number and queried it, they all came back as negative.
Meaning that in each case it's a specific /32 that's misbehaving, while the other 255 tenants of the /24 will be perfectly legitimate and law-abiding?


* To say nothing of two other themes that showed up around the same time, also vexing but less common: (1) robots that request all supporting files except images--sometimes even the favicon--though they don't act on javascript, and (2) robots that only request the html, but are then immediately followed by something in the 34 range (blocked) requesting all supporting files including images, for all the good it does them.

SumGuy

2:18 pm on Feb 4, 2026 (gmt 0)

5+ Year Member Top Contributors Of The Month



Yes, I wanted to test spur to see if it was being overly broad in classifying the IP in question as a proxy, as in perhaps grouping addresses into /31, /30 or even /24 bins. But no, it's very specific: if a.b.c.45 is a proxy, a.b.c.46 was not. This would be expected for residential IPs, where customers are practically always assigned a single IP.

And thanks to today's residential IP assignment strategy, these "dynamically" assigned IPs rarely change and can essentially be treated as static. I do see a fair number of IPs belonging to SpaceX, but many more belonging to Comcast, Verizon, Shaw, Bell, BT (UK), DT (Germany), Italian providers, etc.

I suspect the monetization of these proxy schemes depends heavily on selling access to the network to third parties (perhaps several layers of third parties). The end use could be corporate cloaking of employee internet activity (though there are "cloud security" providers for that), but more likely these are black hats and scrapers (I've seen these things try POST and HEAD). I suspect various actors in China rely heavily on them to access Western websites.

Likely a lot of content scraping by undocumented / unknown AI systems for training and whatnot. I keep asking whether anyone has ever seen xAI / Grok or DeepSeek bots show up as user-agents...

lucy24

5:05 pm on Feb 4, 2026 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks. Now that I know this, I can block by exact /32--i.e. the very thing that we normally tell people is a waste of time--and not worry about the offender's law-abiding neighbor being undeservedly blocked. Except, of course, where the IP turns out to be either a server farm or a country I have no interest in. Those can happily remain at /22 on up to, oh, /14 or so. (Anything bigger and it's safe to say they are already blocked.)
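Python's standard `ipaddress` module handles a mixed list like this, with single residential offenders as exact /32 entries next to much wider server-farm ranges. This is a sketch; the addresses below are documentation and private-range placeholders, not real offenders.

```python
import ipaddress


def build_blocklist(entries):
    """Parse block entries; a bare address becomes a single-host network
    (/32 for IPv4, /128 for IPv6)."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_blocked(ip, blocklist):
    """True if ip falls inside any blocked network.
    An IPv4/IPv6 mismatch simply doesn't match that entry."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in blocklist)


blocklist = build_blocklist([
    "192.0.2.45",    # one residential proxy exit, blocked as an exact /32
    "10.8.0.0/14",   # placeholder for a whole server-farm range
])
print(is_blocked("192.0.2.45", blocklist))   # True
print(is_blocked("192.0.2.46", blocklist))   # False: the neighbor is untouched
print(is_blocked("10.10.3.7", blocklist))    # True: inside 10.8.0.0-10.11.255.255
```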

SumGuy

12:22 am on Feb 5, 2026 (gmt 0)

5+ Year Member Top Contributors Of The Month



My opinion: don't block any of these if they're coming from residential IP space. It's a waste of time; you'll never get a second hit from any of these IPs. These proxy outfits boast of having tens of millions of IPs in their networks, and some claim over 100 million.

Server farms or data centers, yes: I block entire ASNs and go fishing into their peers for good measure.

lucy24

2:17 am on Feb 5, 2026 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



you'll never get a second hit from any of these IPs
You would think so, but when I re-check after three months, up to half of the offenders have visited again, sometimes repeatedly. I mean, of course, the ones from human broadband ranges; the colos/server farms are permanently blocked and I never think of them again.

SumGuy

4:28 am on Feb 5, 2026 (gmt 0)

5+ Year Member Top Contributors Of The Month



" but when I re-check after three months, up to half of the offenders have visited again, sometimes repeatedly"

Maybe it's a question of volume, and I don't see it because I don't get a lot of traffic.

The ones that have made multiple hits over 3 months, do they have the same UA? Or maybe the same except for a Chrome version that has inched up over that time?

For me, these proxies are hitting my PDF files. It's easy enough to have a general block for UAs containing Chrome versions 139 and under, but I'm also now specifically blocking requests for PDF files where there is no referer AND the UA includes Chrome/144. Naturally I'm teasing out Googlebot and Bingbot so they don't get hit by this strategy.
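Those two rules might look roughly like this. It is only a sketch: the Googlebot/Bingbot carve-out here is a plain substring match on the UA, which anyone can spoof (both search engines document reverse-DNS verification for genuine crawlers), and the post doesn't say how the crawlers are actually teased out.

```python
import re

CHROME_MAJOR = re.compile(r"Chrome/(\d+)\.")

# Assumption: crawlers are recognised by UA substring only (spoofable).
CRAWLER_TOKENS = ("Googlebot", "bingbot")


def should_block(path, referer, user_agent):
    """Apply the two rules described above to a single request."""
    if any(token in user_agent for token in CRAWLER_TOKENS):
        return False
    match = CHROME_MAJOR.search(user_agent)
    if match is None:
        return False  # not claiming to be Chrome; other rules would apply
    major = int(match.group(1))
    if major <= 139:
        return True  # general block on old Chrome versions
    no_referer = referer in (None, "", "-")
    return path.lower().endswith(".pdf") and no_referer and major == 144


ua144 = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
         "(KHTML, like Gecko) Chrome/144.0.0.0 Safari/537.36")
print(should_block("/files/report.pdf", "-", ua144))  # True
```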

lucy24

6:28 am on Feb 5, 2026 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



do they have the same UA? Or maybe the same except for a Chrome version that has inched up over that time?
Meh. Never bothered to check. But Chrome does seem to be the current favorite.

where there is no referer AND the UA includes Chrome/144
Yup. I've blocked some exceedingly recent Chrome versions if they have neither a referer nor a Piwik cookie. And I must say, Chrome isn't helping with its current gimmick of only giving version numbers in the form 144.0.0.0, which just screams out fake. (Firefox seems to have picked up the habit too, at least starting from “141.0”, which might be 141.anything.)
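Here is a sketch of the "recent Chrome with neither a referer nor a Piwik cookie" rule. The cookie names are an assumption: the Piwik/Matomo JavaScript tracker normally sets cookies named `_pk_id.*` and `_pk_ses.*`, but a customised install may differ. The floor of 143 is a placeholder for "exceedingly recent".

```python
import re

CHROME_MAJOR = re.compile(r"Chrome/(\d+)\.")
PIWIK_PREFIXES = ("_pk_id", "_pk_ses")  # assumed tracker cookie names


def has_piwik_cookie(cookie_header):
    """True if the Cookie header carries a Piwik/Matomo visitor or session cookie."""
    names = (part.split("=", 1)[0].strip() for part in (cookie_header or "").split(";"))
    return any(name.startswith(PIWIK_PREFIXES) for name in names)


def suspect_recent_chrome(user_agent, referer, cookie_header, min_major=143):
    """Flag very recent Chrome UAs that arrive with no referer and no Piwik cookie."""
    match = CHROME_MAJOR.search(user_agent)
    if match is None or int(match.group(1)) < min_major:
        return False
    no_referer = referer in (None, "", "-")
    return no_referer and not has_piwik_cookie(cookie_header)
```

A returning human usually carries the tracker cookie, so requiring both signals to be absent keeps the rule narrow.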

SumGuy

12:19 am on Feb 6, 2026 (gmt 0)

5+ Year Member Top Contributors Of The Month



Here's today's example, from checking the overnight logs this morning: 7 hits from widely different IPs, all using this exact UA:

Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1

From these networks:

NTT DOCOMO BUSINESS (ipoe.ocn.ne.jp)
Bravoport Ltd (Ukraine) AS44827 (this AS is going into router block-list)
dyn.orange.be
business.telecomitalia.it
flora-cable-modem.wabash.net
Cablevision (dyn.optonline.net)
fixed.kpn.net

They are all associated with proxy networks. Most of them with only 1 or 2 networks, the common one being Oxylabs, but a couple of them were not on Oxylabs. That's why I don't think it's the network or the proxy-IP owner (or their proxy device) that determines the UA: it's the client using it. My rule for detecting these as bots was to look for "Version/13.0.3 Mobile/15E148" in the UA.

I have a single hit using that UA in 2022, 5 in 2023, 24 over the next few years, but the vast majority have come since last November, and they are somewhat common this month. That is clearly a bogus UA, whether you test for the entire string or just the Version / Mobile part I mention above.
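The substring rule, plus a per-year tally like the one described above, could look like this. It is a sketch assuming the log has been reduced to (ISO date, User-Agent) pairs; the sample entries are made up for illustration.

```python
from collections import Counter

BOGUS_MARKER = "Version/13.0.3 Mobile/15E148"


def is_bogus_ios(user_agent):
    """Match on the Version/Mobile fragment rather than the full string."""
    return BOGUS_MARKER in user_agent


def hits_per_year(entries):
    """entries: iterable of (iso_date, user_agent) pairs, e.g. ("2026-02-05", "...")."""
    return Counter(date[:4] for date, ua in entries if is_bogus_ios(ua))


ua = ("Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 "
      "(KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1")
sample = [("2023-04-01", ua), ("2026-02-05", ua), ("2026-02-06", ua),
          ("2026-02-06", "Mozilla/5.0 (X11; Linux x86_64; rv:147.0) Gecko/20100101 Firefox/147.0")]
print(hits_per_year(sample))  # Counter({'2026': 2, '2023': 1})
```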

lucy24

1:49 am on Feb 6, 2026 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



iPhone OS 13_2_3
Thirteen?! Aren't they somewhere in the twenties by now?

:: quick run to logs ::

Aha. Both iOS and iPhone OS jumped straight from 18 to 26.* There are plenty of humans at 18; down into the lower teens becomes increasingly rare. So I guess it depends on how many of your legitimate human users are stuck with older devices.

* My laptop, which is newer than the desktop, recently went to 26, as did my new iPad. The desktop remains at 13.
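If you'd rather cut off by claimed iOS version than by one exact string, the major version can be pulled out of either the iPhone form (`CPU iPhone OS 13_2_3`) or the iPad form (`CPU OS 18_1`). This is a sketch; the cutoff of 15 is a placeholder, since where to draw the line depends on how many real visitors are on older devices.

```python
import re

# "CPU iPhone OS 13_2_3 like Mac OS X" (iPhone) or "CPU OS 18_1 like Mac OS X" (iPad)
IOS_VERSION = re.compile(r"(?:iPhone OS|CPU OS) (\d+)_")


def ios_major(user_agent):
    """Return the iOS/iPadOS major version claimed by a UA, or None."""
    match = IOS_VERSION.search(user_agent)
    return int(match.group(1)) if match else None


def claims_old_ios(user_agent, min_major=15):
    """True if the UA claims an iOS major version below the (hypothetical) cutoff."""
    major = ios_major(user_agent)
    return major is not None and major < min_major
```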

SumGuy

2:32 pm on Feb 7, 2026 (gmt 0)

5+ Year Member Top Contributors Of The Month



Check out this story (dated just a few days ago):

[cybersecurefox.com...]

Google’s Threat Intelligence Group (GTIG), working with multiple industry partners, has disrupted IPIDEA, one of the world’s largest residential proxy services. The operation disabled key command-and-control domains, disrupted traffic routing through infected devices, and exposed the malicious SDKs used to silently conscript user devices into a vast proxy botnet.

lucy24

5:56 pm on Feb 7, 2026 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yay! ! !

Now let's see if my logs get at all smaller over the coming weeks. (Yes, that's how bad it is. On my small site, Apache access logs normally top out around 1MB. Lately it's been 2 and up, with a to-date record of
:: shuffling papers ::
6.1 MB near the end of January.)

tangor

9:20 pm on Feb 7, 2026 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I, too, have larger log files, but since the vast majority are 403s my BANDWIDTH has remained fairly constant. Filtering out 403s and looking back (Jan 2024, Jan 2025, Jan 2026) shows a "growth" of about 5% for 2024-2025 and 11% for 2026. Historically the 5% is near average for this 30-year-old site. Obviously I have not yet caught all the bad actors. (Sigh)
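The "filter out 403s, then compare" step can be done straight from an Apache access log. This is a sketch assuming the common or combined log format, where the status code and response size (`%>s %b`) follow the quoted request line; `%b` is logged as `-` when no body was sent. The sample lines are made up.

```python
import re

# Status code and size fields right after the quoted request line, e.g.
# ... "GET /a.html HTTP/1.1" 200 5120 ...
STATUS_AND_SIZE = re.compile(r'" (\d{3}) (\d+|-)')


def bytes_excluding_403(lines):
    """Sum response bytes for every request that was not answered with 403."""
    total = 0
    for line in lines:
        match = STATUS_AND_SIZE.search(line)
        if match is None or match.group(1) == "403":
            continue  # unparseable line, or a blocked request
        if match.group(2) != "-":
            total += int(match.group(2))
    return total


def growth_percent(previous, current):
    """Year-over-year change, in percent."""
    return (current - previous) * 100 / previous


log = [
    '192.0.2.1 - - [07/Feb/2026:10:00:00 +0000] "GET /a.html HTTP/1.1" 200 5120 "-" "UA"',
    '192.0.2.2 - - [07/Feb/2026:10:00:01 +0000] "GET /b.html HTTP/1.1" 403 199 "-" "UA"',
    '192.0.2.3 - - [07/Feb/2026:10:00:02 +0000] "GET /c.html HTTP/1.1" 304 - "-" "UA"',
]
print(bytes_excluding_403(log))  # 5120
```

Running `bytes_excluding_403` over each January's log and feeding the totals to `growth_percent` gives the year-over-year figure.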

blend27

4:35 pm on Feb 14, 2026 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@SumGuy
For the UAs that are Chrome, are you seeing Chrome version numbers that are not current? Current right now is 144. I treat anything under 142 as a bot. I see a lot of really old Chrome versions, but the really hard ones to deal with right now are 143 and 144.


Normal Chrome sends the "priority" header on Windows 10 machines; check it out: [chromestatus.com...] ....

Just my rusty 2c observations....

lucy24

6:46 pm on Feb 14, 2026 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Normal Chrome sends "priority" header

:: business with logged headers ::

Also other browsers, including non-chrome webkit (Safari, iOS) and Firefox. Proportion of requests including a Priority: header ranges from less than 1/3 (on a normal day) to more like 2/3 (on a bloated, bot-heavy day), suggesting there may be something diagnostic in that header.

Content of the header varies among
Priority: (u=[0-7], )?i
“u=\d” by itself (without i) exists, but is exceedingly rare; I see it only in requests from, trala, bad_range (i.e. blocked anyway). u=3 is supposed to be the default, but is very rare, as are the other odd numbers.

Here is Mozilla's version [developer.mozilla.org], which explains more about what it's supposed to mean.

Note: Servers are expected to ignore directives on this header that they do not understand.
Seems eminently reasonable--not just here but for all headers--though I reserve the right to simply block nonsense headers. In fact I have a bot_header environment variable for this very reason.
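For anyone logging this header, here is a sketch of reading its two parameters as RFC 9218 defines them: urgency `u` from 0 (most urgent) to 7, defaulting to 3, and the boolean `i` (incremental), defaulting to false. It is a simplified parser, not a full Structured Fields implementation, and in line with the note above it ignores members it doesn't recognise.

```python
def parse_priority(header):
    """Parse a Priority header value into (urgency, incremental).

    Missing header or missing members fall back to the RFC 9218 defaults
    (urgency 3, incremental False). Unknown keys and out-of-range urgency
    values are ignored.
    """
    urgency, incremental = 3, False
    if not header:
        return urgency, incremental
    for member in header.split(","):
        key, _, value = member.strip().partition("=")
        if key == "u" and value.isdigit() and 0 <= int(value) <= 7:
            urgency = int(value)
        elif key == "i":
            # a bare "i" is the Structured Fields boolean true; "i=?0" is false
            incremental = value in ("", "?1")
    return urgency, incremental


print(parse_priority("u=0, i"))  # (0, True)
print(parse_priority(None))      # (3, False)
```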

blend27

1:42 pm on Feb 15, 2026 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



--RE: from MDN: If the header is not specified in the request, a default priority is assumed.....

Assumption is the mother of fun 403s. But others! Bot others! "Also other browsers"... hey, they do, they do, don't they?

What I'm saying is that the wild bots currently pretending to be Chrome 1[1-4][2-9] on Win10 are hug-able by the header!

Oh, and we love us some Tencent at AM hours.

Look much more.

added: btw full /32 thingy was it!
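Taken literally, blend27's suggestion is to flag requests whose UA claims Chrome on Windows 10 (with a major version matching `1[1-4][2-9]`) but which carry no Priority header. This is only a sketch: that pattern skips versions ending in 0 or 1 (120, 121, 140, 141, ...), not every poster in this thread sees Priority from Chrome on Windows, and Chrome on Windows 11 also reports `Windows NT 10.0`, so test it against your own logs before blocking anything.

```python
import re

# blend27's version pattern: Chrome 112-119, 122-129, 132-139, 142-149.
CHROME_CANDIDATE = re.compile(r"Chrome/1[1-4][2-9]\.")


def chrome_win10_without_priority(headers):
    """headers: dict of request headers keyed by lowercase name."""
    user_agent = headers.get("user-agent", "")
    if "Windows NT 10.0" not in user_agent:
        return False
    if not CHROME_CANDIDATE.search(user_agent):
        return False
    return "priority" not in headers
```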

SumGuy

3:38 pm on Feb 17, 2026 (gmt 0)

5+ Year Member Top Contributors Of The Month



Regarding the Priority request header: I added it to my logging config a week ago, and the only value I'm seeing is "u=0,i", and not very often.

I'm seeing it with Mac OS X user agents for Chrome and Firefox, and I guess (when you don't see those) Safari.

I'm also seeing it with Firefox (147) on Win-10 but otherwise not with Win-10 + chrome.

But more to the point, the Priority header is of zero use in identifying a proxy. I had a major proxy hit yesterday; I blocked about 25% of their requests, but the Luminati proxy is evading detection. It's the Chrome 143 / 144 / 145 on Windoze that's getting past me. The hits using Mac UAs I can handle.

With my setup I don't see the raw HTTP requests, so I don't know what other request headers I could be looking for. I have to add each header to my logging list; I can choose from a standard list, but if a header isn't in the list then I have to spell it out. Is anyone looking at this in detail? Is there an X-Forwarded-For header that's in use?
This 81-message thread spans 3 pages.