wcerrors inquiry - West Wind Technologies Support

Web Connection

wcerrors inquiry

wcerrors inquiry
Richard Kaye
All

Jun 6, 2024 @ 09:42am

Hi Rick,

Periodically I go strolling through the wcerrors log and was curious about events such as this:

2024-06-05 14:00:37.510 - 283725983912_25 - Server Execution Error
An item with the same key has already been added.
---
   at System.ThrowHelper.ThrowArgumentException(ExceptionResource resource)
   at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)
   at System.Collections.Generic.Dictionary`2.Add(TKey key, TValue value)
   at Westwind.WebConnection.ComServerManager.GetServer(WebConnectionHandler handler)
   at Westwind.WebConnection.ComProcessor.ProcessRequest() - /login.usm

I've got 90 of these on one of my servers, and 78 on mthe other, going back to April. Is this something I should be digging into more deeply?

TIA

re: wcerrors inquiry
Rick Strahl
Richard Kaye

6X20X0O23

Jun 6, 2024 @ 03:24pm

It's an internal error message - there's nothing you can do about that. I suspect this is caused when servers are hanging or 'multi' loading.

Might be related to the issue of the forced reload we discussed some time ago perhaps?

Just for kicks I checked the logs in my 3 running apps and I see none of these errors. I do believe there were some code fixes around an issue some time ago, so perhaps you're not on latest?

+++ Rick ---

re: wcerrors inquiry
Richard Kaye
Rick Strahl

6X305J4EG

Jun 7, 2024 @ 02:34am

Thanks. Smells like some sort of race condition to me as well.

I'm using 7.40. I found that the experimental 7.41 DLL caused more problems than it solved and shared that info with you.

re: wcerrors inquiry
Rick Strahl
Richard Kaye

6X30DI6U3

Jun 7, 2024 @ 06:18am

That issue would have been fixed before the experimental stuff.

I'm running on 7.41 and haven't run into any issues but then I only run two server instances on all of my applications. Any issues that crop up are more likely to occur when you have many servers where the shutdown can take a long time. I think the only place this would be a problem is with code updates where the timing between shutdown and startup is critical.

+++ Rick ---

re: wcerrors inquiry
Richard Kaye
Rick Strahl

6X30JU0JP

Jun 7, 2024 @ 09:15am

FWIW I run on 2 servers with 8 instances on each. After running with the 7.41 DLL for a couple of weeks I rolled back to 7.40. I sent you a whole bunch of stuff at the time but never heard back so don't know if what I shared was helpful or garbage. I can go hunt that stuff down again if you want.

re: wcerrors inquiry
Rick Strahl
Richard Kaye

6X30PS5GH

Jun 7, 2024 @ 12:01pm

I didn't have time to look at that at the time. Did now but there's nothing really helpful in your description unfortunately.

I took another look at the code, and in all reality there was a bug that effectively didn't change anything 😄 I accidentally released the list of servers before calling the ForcedUnload functionality - so that would never have seen any instances and so done nothing.

I've added that back in and now the shutdown process is blazing fast - and actually kills any instances that may still be running. I've also parallized the COM unload operation so that should be much quicker with many instances - instead of running one after the other they now release side by side. I've also added a little more of a wait before killing so that COM servers have chance to complete the COM shutdown before going to the hard process kill operation.

public bool UnloadServers()
{
    Parallel.For(0, ServerCount, (x) =>
    {
        try
        {
            UnloadServer(x);
        }
        catch { }
    });

    // unload other servers running
    ForcedUnloadServers();

    Servers.Clear();

    return true;
}

So it now waits for a second (in addition to the processing time before actual forced release goes out).

Now when I run my server with 15 instances and load/unload, and upload new code it always works.

There was definitely some weirdness before, but I think that was always the case because it relied purely on the COM shutdown process which does not always work if a server is still running or hanging.

I've updated the WebConnectionExperimental.zip give that a shot.

re: wcerrors inquiry
Richard Kaye
Rick Strahl

6X30R5CPG

Jun 7, 2024 @ 12:39pm

Sorry that what I sent you was so incomprehensible. I will just say that based on what I saw at the time, there was some condition that resulted in dozens of EXEs being spawned.

My current update workflow with 7.40 and 2 servers is:

switch one server into file mode
wait a few seconds for the file instances to unload
upload the new EXE
swap
switch back to COM mode.

Rinse and repeat on the second server.

My assumption has been that the file instances drain naturally when I do this. I will say that I still occasionally see 9 EXEs in the process list after an update. If this is a dumb way to push an update, I'm wide open to any recommendations you have.

I'm a little concerned about the sledgehammer approach you mention as sometimes there is a relatively long running process which I would prefer to finish naturally before the EXE is terminated. How one can tell is, of course, the hard part.

I will check the new release next week. Not going to go rolling this out on a Friday night. 😃

Thanks again for taking the time to dig into this. As always, it's much appreciated.

re: wcerrors inquiry
Rick Strahl
Richard Kaye

6X30SRHA1

Jun 7, 2024 @ 01:25pm

If you're going to do that the way you can do this more cleanly is:

Upload (this doesn't affect anything)
Explicit set Hold Requests
Wait a few seconds
Swap Servers
Unset Hold Requests (most likely automatic because the swap unsets it)

The hold requests is done by the swap, but it's immediately before the unload so it may not have run out servers that might still be processing. This is usually the issue: Servers that are processing won't unload and they tend to hang up the unload process and they may die in weird ways.

The new version also does individual exception handlers, so if one fails the others are still processed instead of just exiting on the first fail. That likely was part of the problem.

IAC, give the new one a try it fixes a number of issues that were a problem before. I don't think the 7.41 version actually made things worse because it in effect didn't change anything. Maybe just coincidence that it failed.

I ran about 50 tests today where the server is under full load, then unload/reload servers and upload new versions. I didn't hang up once. It did before.

The only thing I saw hanging up twice was when I hot swapped the Web Connection module in place. Weird because that should recycle the App Pool but something locked the server when the server was uber busy.

Give the new DLL a shot and see what you get.

re: wcerrors inquiry
Richard Kaye
Rick Strahl

6X60VWGIC

Jun 10, 2024 @ 04:53am

Happy Monday, Rick.

I've got the new DLL on my stage environment and trying out a few things first. So far so good although it's not really an apples to apples comparison. My stage env only has one server with 4 instances.

Thanks for the update workflow. My concern with that is it puts up the maintenance message when requests are held. You'd be amazed at how that can trigger a user if they happen to see that, even if the maintenance window is only a few seconds. 😃 OTOH if that's the cleanest approach I may just have to put up with cranky customers.

BTW I noticed that experimental zip still has the wwpdf update we had discussed back in April. I don't know if you've made any changes to that since but there is a potential bug (imo) in the FINALLY (line 1265). The call to the Finalize method of the oPDF object should be bracketed inside IF this.lSuccessful. Or it needs a var test since there are other reasons that lSuccessful could be .f. Or it should be moved into the TRY block. Or... 😃

I'll update this thread again once I have the v8 DLL in place in my production environment.

re: wcerrors inquiry
Rick Strahl
Richard Kaye

6X60W6BWN

Jun 10, 2024 @ 05:00am

As I was going through this I actually noticed that the Hot Swap Process internally triggers the server hold, so it's always been there when you do the update - just a lot quicker because it basically just wraps the unload/copy operation.

If there are problems with updates due to race conditions (?) then manually holding requests first, waiting a couple of seconds and then updating will improve the likelyhood that everything is shut down and there are no requests coming while we're updating.

Ultimately doing hot swapping always carries the risk of requests getting thrown out or getting cut off (if they are too slow). If that's a worry then you have to do a more traditional A/B swapping mechanism.

+++ Rick ---

re: wcerrors inquiry
Richard Kaye
Rick Strahl

6X60WEPGT

Jun 10, 2024 @ 05:07am

Somebody's up early or perhaps in Oregon instead of Hawaii. 😃

Understood. Thanks again.

re: wcerrors inquiry
Richard Kaye
Rick Strahl

6X61BBC4X

Jun 10, 2024 @ 12:04pm

I've deployed the 8.0 DLL. I will mention that on one server I saw 41 COM instances right after copying in the new DLL. I switched it to file mode via the admin UI which cleared them out. I then switched back to COM mode and 9 instances spun up. The other server did not display this seemingly momentary oddity.

I have an EXE update which I will push out sometime tomorrow.

re: wcerrors inquiry
Rick Strahl
Richard Kaye

6X70J7AUW

Jun 11, 2024 @ 08:57am

Hmmm...

That's possible if you were updating and the existing instances were still running because the in place update may cause the app to shutdown without properly releasing instances. It's supposed to as the AppDomain Process shuts down, but I've found that in update sitatuations it often doesn't fire all the events.

The way the new updater works - unload should always unload all EXEs now unless for some reason the unload hangs up completely.

Process goes like this:

Unload called
Servers are collected
Parallel COM Unload using equiv of COM Release
When all have completed Kill Servers is called
Kill servers grabs the COM ProgId, finds the EXE name then kills all EXEs that match

The last step should always kill ALL instances of the COM (and any other) EXEs that are running.

re: wcerrors inquiry
Richard Kaye
Rick Strahl

6X70JBXJ7

Jun 11, 2024 @ 09:01am

First updates rolled out today. I wish I had a better way to illustrate what I observed other than this lengthy word salad.

First update I did I followed my old process (switch to file mode, wait for drain, switch to COM). On the first server I ended up with 10 EXEs running. I used the Load Com Servers button to get it back down to the expected number. On the second server when I switched back to COM mode, it only loaded up 3 instances instead of 8. (In my experience, when the first COM request comes in WWC starts up the number as set in the server count.) I reloaded the COM instances to get it to the proper count.

Had another update to rollout shortly after so I tried the suggested workflow (use Upload button, return to admin, put requests on hold, wait a few seconds and then swap). After clicking the Update Server button on the first server I clicked the Return to admin link and got an error. (Sorry I don't have that documented better.) When I checked the EXE count it was over 40. Used the browser back button to get back to the admin page and tried switching to COM mode; it hung. I used the back button again and all the instances drained. I was then able to switch back to COM mode and got it settled down.

Maybe the reason I'm having so much fun is "hardware" related? FWIW this is the VM spec:

Anyway, I'm going to see if I can document this better when I push my next updates.

re: wcerrors inquiry
Rick Strahl
Richard Kaye

6X70JG4DD

Jun 11, 2024 @ 09:04am

One other thing that perhaps can be done with this is to limit the period of time waiting for the COM release to fire. Now that the release calls are parallel maybe I can see about allowing say a max of 5 seconds to wait for shutdown, and if that fails, just go on with the hard shutdown.

If you still see issues with shutdown the problem most likely is caused by

The fixes from the last update should fix a number of things that were definitely wrong before:

A failed server shut down jumped out of the shutdown sequence which would result in the rest of the servers not getting shut down. Duh! Now if a server fails we log it, and move on to the next.
Failed to call the Kill Process functionality. Duh!

FWIW, this was the old behavior that has been that way for a long time so this has always been a problem. The bail out after the first fail probably was the worst issue responsible for many left over servers.

All that said, hard shut downs are hard on the COM system. In theory COM servers should shut down when a COM release (Marshal.Release()) is requested. If that's not happening due to hanging references inside of the server instances, each time they are hard killed they leave behind some COM debris taken from a special pool of STA memory that is used to manage the STA instances (a system pool that COM controls). If that pool gets depleted things get weird. Under normal operation that's never a problem, but if it happens often enough without the app getting recycled there can be problems...

That might be why I can't duplicate the behaviors you've described (even before the changes) because I typically run relatively small servers and run them for short run high load tests. Because servers are not heavy and do a lot they are unlikely to have internal memory leaks or unreleased references that might cause the server to not shut down, and the servers never run long enough to cause a problem. As a general rule it's a good idea to have your Application Pools configured to recycle at least once a day (during off-peak hours preferrably) to forestall any issues.

re: wcerrors inquiry
Richard Kaye
Rick Strahl

6X70PM1X4

Jun 11, 2024 @ 11:57am

Thanks again. I don't know if there are any other clues to be gathered from Windows logs, etc but I'll just continue to pay closer attention each time I push an update.

As to recycling app pools, I've got users from the US to Europe to Australia so it's a little tough to find time outside of everybody's work day. 😃

re: wcerrors inquiry
Richard Kaye
Rick Strahl

6X70PPAQG

Jun 11, 2024 @ 11:59am

FWIW I just checked the app pool recycle settings and it is set to recycle every 1740 minutes. (Not a clue where that number comes from...)