Services affected (warnings or errors):
Virtualization customers with SV-series virtual machines, vmForge
VDC, and Enterprise Hosting customers using VMware.
Reason for service degradation:
A process that was repeatedly core dumping filled up the root volume
on the active controller, which caused some system services to fail.
This caused NFS and iSCSI to stop serving traffic and caused some
VMs (anything not Windows Server 2008 R2) to mark their root
volumes read-only.
The pager went off and I was on the systems in under 2 minutes. I
intervened by failing over to the standby controller.
Part of the reason for this issue was that I had enabled snmpd
(for monitoring), and that was the process that was core dumping
(SNMP is currently not supported). What I didn't realize was that
the core dumps were being saved. They could possibly be part of
what filled the root volume, but we can't be 100% sure of that.
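For illustration only (this is not something that was run on the
controllers), a check along the following lines could flag a filling
root volume and show whether saved core dumps are taking up the space.
It is a minimal Python sketch. The directory /var/core, the "core"
file name prefix, and the 80% threshold are assumptions, not details
of the affected systems.

```python
import os
import shutil


def volume_usage_percent(path="/"):
    """Return the percentage of the volume holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def find_core_files(directory, prefix="core"):
    """Return (path, size_in_bytes) pairs for regular files in `directory`
    whose names start with `prefix`, largest first."""
    found = []
    for entry in os.scandir(directory):
        if entry.is_file(follow_symlinks=False) and entry.name.startswith(prefix):
            size = entry.stat(follow_symlinks=False).st_size
            found.append((entry.path, size))
    return sorted(found, key=lambda item: item[1], reverse=True)


def check(path="/", core_dir="/var/core", threshold=80.0):
    """Return a list of warning strings; an empty list means nothing to report.

    `core_dir` and `threshold` are hypothetical defaults for this sketch.
    """
    warnings = []
    percent = volume_usage_percent(path)
    if percent >= threshold:
        warnings.append("%s is %.1f%% full" % (path, percent))
    if os.path.isdir(core_dir):
        cores = find_core_files(core_dir)
        if cores:
            total = sum(size for _, size in cores)
            warnings.append(
                "%d core file(s) in %s using %d bytes" % (len(cores), core_dir, total)
            )
    return warnings


if __name__ == "__main__":
    for line in check():
        print("WARNING:", line)
```

Run from cron or a monitoring agent, a check like this would report
accumulating core files before the volume is full, rather than after
services start failing.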
I'll take most of the blame as I should not have (in hindsight)
enabled an unsupported service without also looking for potential
failure scenarios.
As we saw (and patched up) on the controllers recently, things
should have failed over without human intervention, and this is
being investigated by our vendor.
I apologize for the service interruption and I won't turn on snmpd
again until it is considered to be supported.
I went through and checked all the VMs I could (I don't have access
to everything) to make sure they were operational. In many cases a
simple reboot brought the services back online, but a few VMs needed
their own manual intervention. I emailed the few customers affected
by this.
Support can be reached Monday thru Friday from 8:00am until 6:00pm via
phone at 612-337-6340, or via email at [log in to unmask]
--
Mike Horwath ipHouse - Welcome home! [log in to unmask]
The universe is an island, surrounded by whatever it is
that surrounds universes. - Berkeley Fortune