On 4/7/2018, starting after midnight CDT, two of our four main web
clusters failed to restart Apache properly when the maintenance job
rolled the log files for the day.  Both of these clusters were
restored by 1:40AM CDT.  The main web pool and the web-adv pool were
the two affected.  The old blog and new blog clusters (and other
customer clusters) were not affected.  This caused an outage on some
of our web hosting platforms for our web customers.

The root cause was that we run two different OS releases in
production, FreeBSD 11.1-RELEASE and FreeBSD 10.3-RELEASE.  Neither of
the affected clusters has been upgraded to 11.1 yet; both are on the
older 10.3-RELEASE, which will shortly be end-of-lifed.

The recent Apache security update was tested and rolled out
successfully on 11.1-RELEASE, but testing on 10.3-RELEASE was
overlooked, and the 10.3 build of the update failed to load and deploy
a critical module.

We have hand-deployed the critical module across the board, and
systems are now working at 100% as expected.

In the upcoming week, we will be doing more rolling upgrades to get
all systems onto 11.1-RELEASE.  These upgrades have already been field
tested on other clusters.

More testing will be done to ensure that all required modules and
code are built and deployed correctly.

Thank you.

--
Doug McIntyre                            <[log in to unmask]>
         -- ipHouse/Green Cloud Technologies --
      Network Engineer/Provisioning/Jack of all Trades