Hi Rick,
On occasion I run into situations where my app ends up with more file instances loaded than COM instances, presumably due to some app error that doesn't properly terminate the request. The main pain is that when I use the swap button for updates from the COM page, the EXE cannot be swapped because of the orphaned instance(s). Leaving aside how that happens, when we last spoke you mentioned some possible changes to the admin COM page that would handle orphaned file instances more gracefully, and I was wondering if that is still in your plans.
TIA
Orphaned instances can occur if IIS hard crashes or fails to fire the termination events that the handler keys off of to shut down instances. It should be rare, but unfortunately it does happen.
The easiest way to do this now is:
- Switch to File Mode
- Unload all Servers
- Switch back to COM
File mode unload physically kills the EXE instances. I don't remember what we talked about but I think what I must have meant 😄 was to allow explicitly killing the EXEs in a separate way. I think that logic is already built into the COM shutdown routine, but I think it's not working quite correctly. An explicit Kill operation would have to be used most likely.
+++ Rick ---
That's exactly what you meant. I just expressed it poorly. 😃
My current theory is that this happens when XFRX fails at creating a PDF. This is based on a 0-byte file with a hard lock being left behind. To get rid of that file I have to flip over to file mode and kill the instance. It would be great if the COM UI and swap server functionality could include this explicit kill handling. This raises another question: why does the COM UI allow terminating an individual instance but the file UI does not?
It also makes me wonder how Martina handles exceptions, and whether there's a way to throw those exceptions back to WWC so that it can end the request cleanly.
> This raises another question: why does the COM UI allow terminating an individual instance but the file UI does not?
Because we know the individual Process IDs in COM, and we don't in file mode. COM captures the ProcessId as part of the object creation and stores it. File mode just creates processes and we have no idea which instance is which.
I have to look at the COM shutdown code again but if I remember correctly, the unload functionality first tries to release the COM instance and if that fails then uses a hard kill operation on the process id.
The problem is that the hung instances are usually not the ones that are actively running but ones that are orphaned. So the only way to kill those is to kill all processes that share the same EXE name. File mode basically does that for unload, but COM mode does not because in theory we always have an instance and a process id to kill. Unfortunately, shutdown is not always orderly and it can happen that IIS/ApplicationPool don't shut down properly.
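The difference between tracked and orphaned instances can be sketched in a few lines of Python. This is only an illustration, not Web Connection's code: the function names are hypothetical, and it assumes the host keeps the process IDs it captured at COM creation and that the Windows `tasklist` command is available to list everything running under the EXE name.

```python
import csv
import io
import subprocess


def parse_tasklist_csv(text):
    """Return PIDs from `tasklist /FO CSV /NH` output (second column is the PID)."""
    pids = []
    for row in csv.reader(io.StringIO(text)):
        # The "no tasks found" INFO message parses as a single field and is skipped.
        if len(row) >= 2 and row[1].strip().isdigit():
            pids.append(int(row[1]))
    return pids


def find_orphans(running_pids, tracked_pids):
    """PIDs running under the EXE name that the COM pool has no record of."""
    return sorted(set(running_pids) - set(tracked_pids))


def list_pids(image_name):
    """Windows only: ask tasklist for every process with this image name."""
    out = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_tasklist_csv(out)
```

Anything `find_orphans` returns is a process the COM pool cannot release or kill by ID, which is why a name-based kill is the only way to reach it.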
So for COM I think we'd need another option to explicitly K-k-kill all servers (as in K-ken is c-coming to k-k-kill me 😄)
+++ Rick ---
Now that I think about it, another option might be to run through the 'normal' COM shutdown and then at the very end of it get a process list and kill all the remnants just like in file mode.
The only issue with that is that it might cause problems in reload scenarios because the kill operations would be in the background. I have to play with that.
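One way to picture the idea and the reload concern together is the Python sketch below. It is a sketch under assumptions, not the actual implementation: `release_all`, `list_pids`, and `kill_pid` are hypothetical hooks supplied by the caller. The point it illustrates is waiting until the leftover processes are really gone before any reload is allowed to start.

```python
import time


def unload_then_sweep(release_all, list_pids, kill_pid, timeout=10.0, poll=0.25,
                      sleep=time.sleep, clock=time.monotonic):
    """Release COM servers, kill leftovers, and block until none remain.

    All three callables are hypothetical hooks:
      release_all()  performs the orderly COM release
      list_pids()    returns PIDs still running under the server EXE name
      kill_pid(pid)  hard-kills a single process
    Returns True once the process list is empty, False on timeout.
    """
    release_all()

    # Sweep whatever survived the orderly release, as file mode does.
    for pid in list_pids():
        kill_pid(pid)

    # Kills complete asynchronously, so poll before letting a reload proceed.
    deadline = clock() + timeout
    while list_pids():
        if clock() >= deadline:
            return False
        sleep(poll)
    return True
```

A caller would start new instances only when this returns True, so a fresh server is never mistaken for a remnant and killed by the sweep.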
Definitely agree it would be nice to solve this issue - I run into this from time to time as well although it's pretty rare these days. Most commonly I think this is caused by server AppPool timeouts and the server shutting down before Web Connection is done shutting everything down.
If I can be helpful in any way, just ask.
I've created a pre-release version of the Web Connection module and Web Connection server that releases servers in COM and then explicitly kills any remaining processes.
The module and Web Connection server are in the Web Connection Experimental update here:
- [Web Connection Experimental](https://west-wind.com/files/WebConnectionExperimental.zip)
I've tested this locally. The easiest way to test it is on a local machine:
- Run the server (I'm using the Web Connection Server)
- Turn on COM mode
- Run a request
- Servers pop up
- Kill the server Console Window
- Re-launch Web Connection (restarts the server)
- Hit a request
- Existing instances unload, new instances start up
You can try different variations of this with Load/Unload etc. In some situations the servers auto-kill on start up as cached requests will trigger a new load cycle.
If you use IIS, you can recycle the App Pool instead of killing the server console window, but that will actually shut down the instances, so it isn't a true crash test. To simulate a hard crash you can kill w3wp.exe in Task Manager, but there may be several running if other Web sites are active. Finding the right one can be tricky (you can look at the Command Line with Process Explorer).
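If you'd rather script the lookup, here is a small Python sketch. It assumes the worker's command line carries the usual `-ap "PoolName"` argument and that you have already collected (PID, command line) pairs by some other means such as Process Explorer. The function names and the pool name in the usage line are hypothetical.

```python
import re

# IIS worker processes are normally started with -ap "<application pool name>".
_AP_PATTERN = re.compile(r'-ap\s+"([^"]+)"', re.IGNORECASE)


def app_pool_from_command_line(command_line):
    """Return the application pool name from a w3wp.exe command line, or None."""
    match = _AP_PATTERN.search(command_line)
    return match.group(1) if match else None


def workers_for_pool(processes, pool_name):
    """Filter (pid, command_line) pairs down to the PIDs serving pool_name."""
    wanted = pool_name.lower()
    return [
        pid for pid, command_line in processes
        if (app_pool_from_command_line(command_line) or "").lower() == wanted
    ]
```

For example, `workers_for_pool(pairs, "WebConnection")` would return only the PIDs whose command line names that pool, leaving the workers of other sites alone.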
In my tests this is working well. If this works this will be a welcome enhancement to Web Connection.
Let me know if you run into any problems.
+++ Rick ---
Thanks, Rick! It's a bit tricky with a production environment, of course, but I'll let you know.
My tests in my stage environment appeared to work as expected. To recap what I did, my stage server runs 4 COM instances. I loaded a 5th by launching the EXE on the server, then used the now slightly mis-labelled "Unload Com Servers" button. 😃 I also tested by uploading a new build and swapping; again with an extra EXE launched from the server itself. All 5 went away in both tests. This is great and will reduce the potential problems when deploying updates. Thanks again!
I've rolled out several updates today. Other than the initial push of the DLL, when it seemed to go a little bonkers on one of my web servers, it's worked flawlessly.
Cheers!
All in all, this seems to work well. I have on a handful of occasions seen EXE instances seemingly multiplying like rabbits after a swap. Sometimes reloading COM instances works, but other times I have to flip back to file mode and use that to really clear them out. I'm afraid I don't always notice when this happens, so I don't have any other details to correlate it with. I can check the standard logs if you think that might offer you some clues.
[C:\webconnectionprojects\rfc3]rfc3proclist.bat
pslist v1.3 - Sysinternals PsList
Copyright (C) 2000-2012 Mark Russinovich
Sysinternals - www.sysinternals.com
Process information for prd-rfc3-01:
Name Pid Pri Thd Hnd Priv CPU Time Elapsed Time
Rfc3 8084 8 7 224 4624 0:00:00.109 0:00:08.717
Rfc3 4212 8 7 224 4636 0:00:00.125 0:00:08.627
Rfc3 4888 8 7 224 4620 0:00:00.125 0:00:08.535
Rfc3 10016 8 7 224 4620 0:00:00.109 0:00:08.443
Rfc3 3164 8 7 224 4608 0:00:00.140 0:00:08.350
Rfc3 1668 8 7 224 4628 0:00:00.125 0:00:08.258
Rfc3 8316 8 7 224 4636 0:00:00.109 0:00:08.165
Rfc3 15116 8 7 224 4624 0:00:00.125 0:00:08.071
Rfc3 5620 8 7 224 4640 0:00:00.140 0:00:07.975
Rfc3 12320 8 7 224 4604 0:00:00.125 0:00:07.881
Rfc3 9872 8 7 224 4612 0:00:00.125 0:00:07.785
Rfc3 8380 8 7 224 4620 0:00:00.125 0:00:07.691
Rfc3 14572 8 7 224 4624 0:00:00.109 0:00:07.595
Rfc3 15160 8 7 224 4612 0:00:00.125 0:00:07.497
Rfc3 14228 8 7 224 4620 0:00:00.125 0:00:07.399
Rfc3 11968 8 7 224 4608 0:00:00.109 0:00:07.298
Rfc3 7480 8 7 224 4624 0:00:00.125 0:00:07.199
Rfc3 10224 8 7 224 4604 0:00:00.125 0:00:07.099
Rfc3 13000 8 7 224 4628 0:00:00.125 0:00:07.000
Rfc3 13040 8 7 224 4640 0:00:00.140 0:00:06.899
Rfc3 8064 8 7 224 4640 0:00:00.140 0:00:06.787
Rfc3 9196 8 7 224 4604 0:00:00.125 0:00:06.687
Rfc3 5148 8 7 224 4612 0:00:00.125 0:00:06.587
Rfc3 12352 8 7 224 4612 0:00:00.125 0:00:06.484
Rfc3 1120 8 7 224 4620 0:00:00.125 0:00:06.377
Rfc3 11156 8 7 224 4604 0:00:00.140 0:00:06.272
Rfc3 14148 8 7 224 4608 0:00:00.171 0:00:06.169
Rfc3 10976 8 7 224 4612 0:00:00.109 0:00:06.062
Rfc3 15688 8 7 224 4620 0:00:00.125 0:00:05.955
Rfc3 14620 8 7 224 4608 0:00:00.140 0:00:05.850
Rfc3 13480 8 7 224 4616 0:00:00.125 0:00:05.743
Rfc3 6560 8 7 224 4616 0:00:00.125 0:00:05.634
Rfc3 14224 8 7 224 4612 0:00:00.109 0:00:05.528
Rfc3 16300 8 7 224 4624 0:00:00.140 0:00:05.422
Rfc3 13972 8 7 224 4616 0:00:00.125 0:00:05.310
Rfc3 6720 8 7 224 4612 0:00:00.125 0:00:05.200
Rfc3 11460 8 7 224 4616 0:00:00.109 0:00:05.091
Rfc3 16124 8 7 224 4652 0:00:00.125 0:00:04.981
Rfc3 15376 8 7 224 4620 0:00:00.187 0:00:04.869
Rfc3 9112 8 7 224 4628 0:00:00.171 0:00:04.760
Rfc3 10668 8 7 224 4616 0:00:00.140 0:00:04.645
Rfc3 10060 8 7 224 4628 0:00:00.156 0:00:04.535
Rfc3 5240 8 7 224 4620 0:00:00.125 0:00:04.423
Rfc3 15628 8 7 224 4632 0:00:00.140 0:00:04.309
Rfc3 8720 8 7 224 4624 0:00:00.125 0:00:04.194
Rfc3 15884 8 7 224 4636 0:00:00.140 0:00:04.080
Rfc3 14812 8 7 224 4632 0:00:00.140 0:00:03.966
Rfc3 12956 8 7 224 4620 0:00:00.140 0:00:03.851
Rfc3 9812 8 7 224 4624 0:00:00.140 0:00:03.735
Rfc3 10312 8 7 224 4636 0:00:00.140 0:00:03.621
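A listing like the one above (50 rows, started roughly a tenth of a second apart) is easy to check automatically. The Python sketch below parses PsList's text output and reports how many instances exceed the expected pool size. The function names are hypothetical, the expected count is whatever the server is configured for, and the parser assumes process names without spaces.

```python
def _seconds(stamp):
    """Convert an h:mm:ss.mmm stamp such as 0:00:08.717 to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_pslist(text):
    """Extract name, PID, and elapsed seconds from pslist process rows."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have 8 columns with a numeric PID; banner and header lines do not.
        if len(parts) == 8 and parts[1].isdigit():
            rows.append({
                "name": parts[0],
                "pid": int(parts[1]),
                "elapsed": _seconds(parts[7]),
            })
    return rows


def surplus_instances(rows, expected):
    """How many running instances exceed the configured pool size."""
    return max(0, len(rows) - expected)
```

Running `surplus_instances(parse_pslist(output), expected)` from a scheduled task would flag a runaway spawn without anyone having to notice it by eye.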
Hi Rick,
After using this for a couple of weeks, I do think I'm running into that race condition you described earlier in the thread. Respawning the COM instances is getting into a tussle with the killing of orphaned EXEs. I sent a direct email with logs, etc. a couple of days ago.