
Today, 1/16/2015, starting at 11:50 CST and at intervals until 13:30
CST, our inbound and outbound email was slow and delayed because
several events happened at once.

The primary cause was that we lost one of the VMware hosts handling
our server clusters due to hardware issues.

Normally this wouldn't cause a blip, as we've engineered capacity to
handle these sorts of problems. For some reason, though, this event
and the subsequent VMware vMotion events took two of our database
clusters offline. Those clusters should have been redundant and
stayed up through all the hardware problems.

Once one inbound-only database cluster was restored, the subsequent
vMotion events took out the primary identity database cluster. By
13:30, we had enough database nodes cleaned and online to support our
server needs.

One current project is to evaluate migrating the older cluster onto
newer, more resilient technology, as we've seen some issues with
the older cluster technology being sensitive to vMotion events.

As of 13:30, all email was flowing cleanly, queued mail had been
delivered to where it needed to go, and everything was running
optimally.

If you still have any problems or questions, please let us know at
[log in to unmask], or call us at 612-337-6340.

We'll be monitoring the system to ensure it doesn't have any further
issues.

Thank you.
--
Doug McIntyre                            <[log in to unmask]>
          -- ipHouse/Goldengate/Bitstream/ProNS --
       Network Engineer/Provisioning/Jack of all Trades