On 21 May 2017, starting at about 12:15 AM CDT, one of our NetApp storage
filers started to have some major issues, e.g. many bad disks
reported all at once and FC loop path-down issues.
This primarily affected two systems. Some web hosting sites
would have been slow to non-responsive, and retrieving email would
have had some issues during the early hours of Sunday morning.
All incoming email was queued up while the storage system was
having problems, and then delivered normally once it recovered.
We dispatched engineers to the datacenter, who worked overnight
to get it to a stable point. We suspect a bad drive went really bad
and was actively corrupting the buses. We have that drive isolated, and
things are much better. We are currently resilvering the storage to
make it fully redundant again. It will take some time to resilver
everything, so we'll be around the datacenter off and on the rest of
the weekend watching over the process.
The storage system returned to normal operation starting about 6:00 AM CDT,
and we have all services restored now. The affected web hosting sites
and email are flowing at normal volumes right now, and things look good.
We'll be watching over everything closely until this storage system is
back to 100%.
We'll be monitoring everything as we normally do. If you have any
questions please let us know at [log in to unmask]
Doug McIntyre <[log in to unmask]>
-- ipHouse/Goldengate/Bitstream/ProNS --
Network Engineer/Provisioning/Jack of all Trades