LISTSERV 16.0 - OUTAGE Archives

At 10:47PM CST, on 9 Nov 2012, one host, vmf4, in our VDC cluster #1 had
an unrecoverable hardware fault detected, and it halted.

VMware HA kicked in and restarted whatever virtual-machines affected
by the hardware dying back up on other hosts in the cluster
automatically, and they would have been back in service as quick as
they could boot.

We'll keep machines off vmf4 while we run hardware tests on it this
weekend, and bring it back into the cluster at the end of the testing.

I have a feeling this is a one time cosmic ray failure type event,
with all the redundancy in the world, there is still some one event
that seems to take down hardware from time to time.

If you have any problems or questions please let us know at
[log in to unmask], or call us up at 612-337-6340.

Thank you.

--
Doug McIntyre                            <[log in to unmask]>
          -- ipHouse/Goldengate/Bitstream/ProNS --
       Network Engineer/Provisioning/Jack of all Trades