Services affected (warnings or errors):
Virtualization services hosted on any of our platforms saw an
interruption of disk input/output at approximately 9:36pm Monday
night, May 28th, 2012. The interruption lasted more than 30
seconds while fail-over occurred between the two storage
controllers.
Reason for service degradation:
bug #1: an mpt_sas driver failure caused a kernel panic on
controller #1, which resulted in a kernel core dump
bug #2: HA failure: resources (storage, shared IP address) were not
released from controller #1 to controller #2 until *after* the
system had finished dumping core (writing out a large system core
dump takes a while)
Tegile has addressed both of these bugs with a rapid release of
new patches to the controllers. Controller #1 was patched last
night. Controller #2 will be patched tomorrow night (May 30th,
2012, after 11:15pm).
step 1: a graceful fail-over will be done from controller #2 to
controller #1, which will interrupt disk I/O for ~3-6 seconds
(VMware will take care of the disk I/O queue during this fail-over)
step 2: controller #2 will be patched and rebooted
Please note: bug #2 caused the underlying storage (and networking
for said storage) to be offline for more than 30 seconds, which is
long enough to cause disk I/O timeouts that may require a reboot.
Most server operating systems were unaffected. RHEL 5/6, Ubuntu
10.04/12.04, and Windows Server 2008 were all fine.
Normal fail-over (tested repeatedly earlier this year) is between 3
and 6 seconds in length and should not adversely affect the
availability of any system connected to this storage.
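For anyone who wants to check how a guest would ride out an outage
of either length, here is a minimal Python sketch (an illustration
only, not part of our tooling). It assumes a Linux guest, where each
SCSI disk's command timeout in seconds is exposed at
/sys/block/sdX/device/timeout. The function names are hypothetical,
and the 35-second figure below is only a stand-in for "more than 30
seconds", since the exact outage length was not measured here.

```python
import glob


def read_scsi_timeouts(pattern="/sys/block/sd*/device/timeout"):
    """Return {path: seconds} for every SCSI disk timeout file we can read.

    Assumption: a Linux guest that exposes per-disk timeouts in sysfs.
    On other systems the glob matches nothing and the result is empty.
    """
    timeouts = {}
    for path in glob.glob(pattern):
        try:
            with open(path) as handle:
                timeouts[path] = int(handle.read().strip())
        except (OSError, ValueError):
            # Unreadable or non-numeric entry: skip it rather than guess.
            continue
    return timeouts


def outlasts_timeout(outage_seconds, timeout_seconds):
    """True when a storage outage runs longer than the disk timeout."""
    return outage_seconds > timeout_seconds


if __name__ == "__main__":
    # Hypothetical outage lengths: the top of the normal 3-6 second
    # fail-over range, and a stand-in value for "more than 30 seconds".
    for outage in (6, 35):
        for path, timeout in sorted(read_scsi_timeouts().items()):
            risk = "at risk" if outlasts_timeout(outage, timeout) else "ok"
            print(f"{path}: timeout={timeout}s outage={outage}s -> {risk}")
```

A disk whose timeout is shorter than the outage is the one likely
to log I/O errors; whether that ends in a reboot depends on the
operating system and filesystem.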
Support can be reached Monday through Friday from 8:00am until 6:00pm
via phone at 612-337-6340, or via email at [log in to unmask]
--
Mike Horwath ipHouse - Welcome home! [log in to unmask]
The universe is an island, surrounded by whatever it is
that surrounds universes. - Berkeley Fortune