Services affected (warnings or errors):

    Virtualization customers with SV-series virtual machines, vmForge
    VDC, and Enterprise Hosting customers using VMware.

Reason for service degradation:

    A process that was repeatedly core dumping filled up the root
    volume on the active controller, which caused some system services
    to fail.

    This caused NFS and iSCSI to stop serving traffic and caused some
    VMs (anything not Windows Server 2008 R2) to mark their root
    volumes read-only.

    The pager went off, I was on the systems in under 2 minutes, and I
    intervened by failing over to the standby controller.

    Part of the reason for this issue was that I had enabled snmpd
    (for monitoring), and that was the process that was core dumping
    (SNMP is currently not supported).  What I didn't realize was that
    the core dumps were being saved, which could possibly be part of
    what filled the root volume, but we can't be 100% sure of that.
    I'll take most of the blame as I should not have (in hindsight)
    enabled an unsupported service without also looking for potential
    failure scenarios.
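
    For anyone who wants to watch for the same failure mode (saved
    core dumps eating a root volume), here is a minimal sketch in
    Python.  It is not what runs on our controllers.  The function
    names, the 90% threshold, and the idea that core files start with
    "core" are all assumptions made for illustration.

```python
import os
import shutil


def volume_usage_fraction(path="/"):
    """Return the used fraction (0.0 to 1.0) of the volume holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def core_dump_bytes(directory):
    """Sum the sizes of regular files in `directory` named 'core*'.

    The 'core' prefix is an assumption; the real naming pattern
    depends on how the system is configured to save core dumps.
    """
    total = 0
    for entry in os.scandir(directory):
        if entry.is_file(follow_symlinks=False) and entry.name.startswith("core"):
            total += entry.stat(follow_symlinks=False).st_size
    return total


def should_alert(used_fraction, threshold=0.90):
    """True when usage meets or exceeds the (assumed) alert threshold."""
    return used_fraction >= threshold
```

    Run from cron, something like should_alert(volume_usage_fraction("/"))
    would page before the volume is completely full, and
    core_dump_bytes() on the dump directory would show whether core
    files are the ones taking the space.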

    As we saw (and patched up) on the controllers recently, things
    should have failed over without human intervention and this is
    being investigated by our vendor.

I apologize for the service interruption and I won't turn on snmpd
again until it is considered to be supported.

I went through and checked all the VMs I could (I don't have access to
everything) to make sure they were operational.  In many cases a simple
reboot brought the services back online, but a few VMs needed their own
manual intervention.  I emailed the few customers affected by this.

Support can be reached Monday thru Friday from 8:00am until 6:00pm via
phone at 612-337-6340, or via email at [log in to unmask]

-- 
Mike Horwath      ipHouse - Welcome home!       [log in to unmask]
        The universe is an island, surrounded by whatever it is
        that surrounds universes. - Berkeley Fortune