Hi all, here is an update on this outage.

The original outage was confirmed to be caused by utility maintenance that interrupted power to our suite for 5 seconds.  We are working with the site to have our feeds moved to new facility UPS(s), so we might have to schedule some downtime later this week or early next week, depending on how they plan to do this.  In addition, we are searching for an affordable 208/220V rackmount UPS to protect at least one of our feeds and give us some peace of mind going forward.  I should know more today, and I'll send out an update if we do need to schedule downtime.

-Mark


---------- Forwarded message ----------
From: Mark Calkins
Date: Sun, Apr 23, 2017 at 9:22 AM
Subject: Outage - IX-Denver - Apr 23 2017
To:


Hello IX-Denver members,

Around 2am MDT this morning, IX-Denver lost power on both of its power feeds simultaneously at the 910 15th St. site.  The outage therefore took down both our switching and server infrastructure for a brief period.  It is not yet known how both the A and B feeds went down at the same time; we are working with the site to determine a root cause and will send an update when we have an answer.

The outage persisted for clients of our route server A until early this morning.  The VM did not recover automatically because of an old setting dating all the way back to when the VM for route server A was installed.  Our hypervisor had a configuration to make the VM reachable via VNC, and the subnet this setting referenced no longer existed on the hypervisor after some networking architecture changes earlier this year.  As a result, the VM wouldn't boot, and it did not output much information about which 'device' failure was preventing it from starting.  This was a miss on our part and we apologize; we haven't had to boot this VM in well over a year.  This should be a one-off issue: the configuration has now been removed, and we have been using a different hypervisor schema for all new VMs for some time, so it should not be possible again.

The lengthier route server A outage does highlight a need for, ideally, more bilateral peering between our members and, failing that, more members connected to all three route servers.  We have not done a great job of socializing the new redundant route servers, but we have been soaking them for stability for the last couple of months and it has gone well.  Route servers B and C both came back online immediately after boot, as expected.  I encourage everyone to look into bringing up sessions to the other route servers.  There is information on our technical page ( http://ix-denver.org/?q=technical ) about the new features we support on B and C.  If you meet the minimum requirement of an IRRDB as-set, there is already a session built and ready to go for your network.

Please feel free to contact me if you have any questions.
Thank you,
-Mark


