So I believe I've found a known issue with Windows Server 2003 Network Load Balancing (NLB) and VMware.
VMware has an article reporting that NLB on Windows Server 2003 does not behave as expected on their platform. Basically, the article says that in unicast mode the NLB will send traffic to only one of the servers, not all of them. They give you two fixes: use multicast mode, or reconfigure your port groups (or vSwitches) to stop sending RARP packets (the Notify Switches setting). Interestingly, the environment these servers are hosted in lets me neither run multicast nor disable Notify Switches. I'll have to take that up with the Network Team, or perhaps look into a hardware load balancer.
So here's where that leaves me: these servers are not supposed to go down (hence the NLB), but they go down every day at 4 AM (fun call). That is when the backups run, so I'm moving the backup time to see if the problem follows.
What appears to be happening is that when snapshots are taken for the backups, the NLB freaks out at the single dropped ping. Currently all the backups run at once, so every server hiccups at the same moment, taking the NLB down for a reported 45 minutes (no idea why it lasts so long). If the issue follows the backups, staggering them might solve the problem by letting the NLB roll from one server to another.
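To make the staggering idea concrete, here's a rough Python sketch of how the start times could be planned so that no two servers are snapshotted at once. The server names, the 4 AM start, the gap, and the backup window length are all hypothetical placeholders, not values from my environment; it also assumes every backup takes roughly the same amount of time.

```python
from datetime import datetime, timedelta


def stagger_backups(servers, first_start, gap_minutes):
    """Give each server its own backup start time, gap_minutes apart.

    Server names and timings are hypothetical placeholders.
    """
    schedule = {}
    for i, name in enumerate(servers):
        schedule[name] = first_start + timedelta(minutes=i * gap_minutes)
    return schedule


def overlaps(schedule, window_minutes):
    """Return pairs of servers whose backup windows overlap.

    Assumes every backup lasts the same window_minutes, so after sorting
    by start time it is enough to compare each server with the next one.
    """
    items = sorted(schedule.items(), key=lambda kv: kv[1])
    clashes = []
    for (a, start_a), (b, start_b) in zip(items, items[1:]):
        if start_a + timedelta(minutes=window_minutes) > start_b:
            clashes.append((a, b))
    return clashes


if __name__ == "__main__":
    # Hypothetical nodes behind the NLB, first backup at 4 AM, one hour apart.
    plan = stagger_backups(["web01", "web02", "web03"],
                           datetime(2024, 1, 1, 4, 0), gap_minutes=60)
    for server, start in plan.items():
        print(server, start.strftime("%H:%M"))
    # With an assumed 30-minute backup window, nothing should overlap.
    print("overlapping pairs:", overlaps(plan, window_minutes=30))
```

The point is just that at any moment only one node should be mid-snapshot, so the others stay up and the NLB can keep serving from them while it converges.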
I'll keep you posted on what I find.