Starting with some initial indications around 6:30 PM CDT, but mostly
beginning as the day started today, 7 May 2015, and lasting until
about 3:15 PM CDT, ipHouse experienced a major DSL authentication failure
that affected most of our DSL customers attempting to log in.
Heavy rain also caused several small power outages, brownouts, and DSL
line retrains all around the Twin Cities, which exacerbated the problem.
Since the timing was so close on the heels of yesterday's DSL grooming
cuts, the problem was initially assumed to be fallout from that work, and
one team here was talking to CenturyLink about system-wide ATM DSL
network problems.
In the end, the grooming work wasn't the problem at all; all of the DSL
cuts that had happened were correct and operating perfectly fine.
Instead, it appears that our central database was acting up. In our
troubleshooting, hand queries against the database all returned
correct information, but the RADIUS authentication daemons were not
getting the same data we were. We also found that we were just on the
edge of filling up the maximum data space within the database.
We increased the space the database could handle, but that required a
rolling reconfiguration/reboot of all the nodes, an hour-long process
after each change, and several changes were needed.
Eventually the whole database was consistent and answering correctly
across the board, with the RADIUS daemons handing out the correct
information for every connection instead of just a small percentage.
Our server design shouldn't have allowed this to happen, but just
about anything can fail, no matter how redundantly it is built.
If you still have any problems or questions, please let us know at
[log in to unmask], or call us at 612-337-6340.
Doug McIntyre <[log in to unmask]>
-- ipHouse/Goldengate/Bitstream/ProNS --
Network Engineer/Provisioning/Jack of all Trades