Checking the logs for one of my customers, I noticed an increase in activity from unusual addresses. I looked up the IP addresses and they appear to belong to bingbot.
The strange thing is that they are requesting URLs, including query strings, that are only generated behind a logon, i.e. when the user is logged in.
Fortunately every call into the system has a check for a logged in user before actioning.
My big issue here is that these URLs can only have come from extracting links from a user's browser. Are they really doing this now rather than just crawling?
I suppose what we now need to watch for are any calls that don't check for a valid user before actioning, as Microsoft could be triggering all sorts of actions!
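As a rough illustration of the kind of check described above, here is a minimal sketch in Python. It is not the application's actual code: the `session` dictionary, the `user_id` key, and the `delete_order` handler are all hypothetical names used only to show the pattern of refusing to act when no user is logged in.

```python
from functools import wraps


class NotLoggedIn(Exception):
    """Raised when a request arrives without a logged-in user."""


def require_login(handler):
    """Wrap a handler so it only runs for a session with a logged-in user.

    Assumption: the session is a dict-like object and a truthy "user_id"
    entry means the user is logged in. Both are hypothetical.
    """
    @wraps(handler)
    def wrapper(session, *args, **kwargs):
        if not session.get("user_id"):
            # A crawler replaying a private URL lands here and nothing is actioned.
            raise NotLoggedIn("login required")
        return handler(session, *args, **kwargs)
    return wrapper


@require_login
def delete_order(session, order_id):
    # Hypothetical action that must never run for an anonymous request.
    return f"order {order_id} deleted by {session['user_id']}"
```

With a wrapper like this, a handler that forgets the check is easier to spot, because every handler that performs an action should carry the decorator.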
Has anyone seen this happening, and is there any guidance on blocking it?
I don't think that's what's happening. Although Microsoft surely has access to browser history and navigation from Edge and IE, I doubt they share that information. People would be pretty incensed about that, and it's easy to catch because it has to go out over a network connection. I think we'd know about it, and I seriously doubt that's where those links come from.
My guess is there's a leak in the application somewhere with a back link into the private links, or perhaps somebody shared links somewhere public.
+++ Rick ---
That was my initial thought also.
However, when I look at the session records and request log, the IP address shows Microsoft Bingbot as the owner.
Also, looking at the browser in the request, I am seeing this:
Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
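A user agent string can be spoofed, so it may be worth confirming the requests really come from Bing. Bing's webmaster guidance describes a reverse DNS check: the IP should resolve to a host name ending in search.msn.com, and that host name should resolve back to the same IP. The sketch below assumes that rule; the function name and the injectable lookup parameters are hypothetical and exist only so the logic can be exercised without a network.

```python
import socket


def is_verified_bingbot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip passes a reverse-then-forward DNS check for Bingbot.

    Assumption: genuine Bingbot hosts end in ".search.msn.com".
    The two lookup parameters are hypothetical hooks; by default the
    standard socket resolver is used.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])

    try:
        host = reverse_lookup(ip)
    except OSError:
        # No reverse record (or lookup failure): cannot be verified.
        return False

    if not host.lower().rstrip(".").endswith(".search.msn.com"):
        return False

    try:
        # The forward lookup must lead back to the original address.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Calling `is_verified_bingbot(ip)` for each suspicious address in the request log would separate genuine crawler traffic from something merely claiming to be bingbot.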
My main concern is that these aren't random hits to any page. They are fully formed requests that are only possible by clicking links in pages that would have been returned only to a signed-in user.
I just can't get my head round how this is possible unless earlier user activity is being replayed. I don't think it's being transferred from the client PC.
The weird thing is that if it came from an IP address belonging to a random user, rather than the couple of IP addresses registered to Microsoft, I might accept foul play.
Looking yesterday, the particular client was closed, yet hundreds of requests were fired at the server, all from Microsoft IP addresses.
Still confused!
The first thing to do is add a robots.txt that excludes any virtuals you don't want included in robot searches.
If it's a legitimate Microsoft bot it'll respect that as will other major search engines.
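To illustrate, here is a small sketch that holds an example robots.txt as a string and uses Python's standard `urllib.robotparser` to confirm how a compliant crawler would interpret it. The `/private/` and `/admin/` paths and the example.com URLs are placeholders, not the real virtuals; substitute whatever should stay out of search engines.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content. The paths are hypothetical placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /admin/
"""


def crawler_may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if a robots.txt-respecting crawler may request url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A logged-in style URL under an excluded path, and a public page.
    print(crawler_may_fetch("bingbot", "https://example.com/private/report?id=1"))
    print(crawler_may_fetch("bingbot", "https://example.com/index.html"))
```

Note that robots.txt is only a request to well-behaved crawlers. It does not replace the login check on each call.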
+++ Rick ---
It sounds to me like Rick said: someone posted the URL of the page to some public space. Now Bing is trying to learn all it can from the address. The robots.txt suggestion is a very good next step.
Michael Hogan (Ideate Hosting)
In addition to what Rick suggested (which you should DEFINITELY do first), I have found that many browser helper plug-ins share user-accessed pages with their respective search engines. I got bitten by that behavior years ago, when one of my customers complained that their 'private' documents were accessible via a search engine.
I replaced all direct links to files with a webconnect process that checks credentials before sending files.
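The general shape of that approach, sketched in Python rather than the actual webconnect code, might look like the following. Everything here is an assumption for illustration: the `session` dictionary, the `user_id` key, the function name, and the idea of a single protected folder.

```python
from pathlib import Path


class AccessDenied(Exception):
    """Raised when a file request must not be served."""


def send_protected_file(session, requested_name, protected_root):
    """Return the file's bytes only for a logged-in user.

    Assumptions (hypothetical): session is dict-like with a "user_id"
    entry when logged in, and all private files live under protected_root.
    """
    if not session.get("user_id"):
        raise AccessDenied("login required")

    root = Path(protected_root).resolve()
    target = (root / requested_name).resolve()

    # Refuse names that escape the protected folder (e.g. "../something").
    if root not in target.parents:
        raise AccessDenied("invalid file name")

    return target.read_bytes()
```

Because the files are no longer reachable through a direct link, a crawler that learns the URL gets nothing unless it also presents valid credentials.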