On Apr 26, 2017, at 9:19 AM, Mike Horwath <[log in to unmask]> wrote:
> 
> On Wed, Apr 26, 2017 at 08:07:45AM -0500, Andrew Hoyos wrote:
>> I would suggest that perhaps we look into filtering BGP (tcp/179)
>> with an ACL prior to maintenance start on those specific ports being
>> moved.  Many other IXs are doing this for maintenance as a way to
>> gracefully take things down, and let bilateral and RS sessions time
>> out without killing active traffic. As we've noticed, not all
>> members being moved are bothering to shut down sessions prior, which
>> causes impact to/from those members.  (i.e.:
>> https://ripe67.ripe.net/presentations/374-WH-IXPMaintReduce.pdf)
> 
> Don't even need ACLs.
> 
> Just take down the route servers for the 2 hour period.
> 
> Bilateral are unaffected and they can arrange things anyway with their
> peers.

I’d disagree. The maintenance currently taking place affects more than just the route servers. Plenty of people are doing bilateral peering on MICE, and that *IS* affected by maintenance events like these.

Adding an ACL to the port ensures graceful shutdown/end of traffic, rather than an abrupt drop and hold timer fun.
I’d much rather the person running the maintenance, who controls the ultimate link up/down events, be the one deciding when things start and end, and re-enabling traffic gracefully.
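Here is a rough way to check the hold-timer argument. Once tcp/179 is filtered, no KEEPALIVE or UPDATE reaches the peer, but data-plane traffic keeps flowing. Each session therefore expires at most one hold time after the ACL goes on, and traffic moves to other paths before the link actually drops. The sketch assumes the ACL blocks only BGP. The function name is made up for illustration, and the hold times in the usage line are just common example values.

```python
from datetime import timedelta


def sessions_down_before_start(hold_times_s, lead=timedelta(minutes=10)):
    """Return True if every BGP session should hit hold-timer expiry
    before the maintenance window opens (hypothetical helper).

    The last message from a peer arrived at or before the moment the ACL
    was applied, so each session expires within hold_time seconds of it.
    """
    # A negotiated hold time of 0 disables the hold timer (RFC 4271), so such
    # a session never times out on its own and has to be shut down by hand.
    if any(h == 0 for h in hold_times_s):
        return False
    return all(h <= lead.total_seconds() for h in hold_times_s)


if __name__ == "__main__":
    # Example hold times in seconds; a 10-minute lead covers 90s and 180s.
    print(sessions_down_before_start([90, 180]))
```

With a 10-minute lead, any hold time up to 600 seconds expires before the window opens. A peer with hold time 0 is the one case where filtering tcp/179 alone will not take the session down.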

> Adding another step to the process creates more complications as well,
> and another point of failure if you screw up along the way.

Disagree, adding an ACL to a port is pretty trivial. Apply the (pre-existing) ACL to the port 10 minutes before maintenance starts; remove it when complete. 
Script it into a copy/paste snippet with the port numbers for bonus points and fewer chances of failure.
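Below is a minimal sketch of that copy/paste script. The ACL name `BLOCK-BGP-179` and the `apply-filter` command syntax are hypothetical placeholders, not any particular switch vendor's CLI. The pre-existing ACL is assumed to drop TCP with source or destination port 179 and permit everything else. The 10-minute lead matches the suggestion above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical name of the pre-existing ACL that drops TCP with source or
# destination port 179 (BGP) and permits all other traffic.
ACL_NAME = "BLOCK-BGP-179"

# Placeholder command syntax, NOT a real vendor CLI: adapt to your switch.
APPLY = "interface {port}\n apply-filter {acl}"
REMOVE = "interface {port}\n no apply-filter {acl}"


@dataclass
class Step:
    at: datetime   # when to paste the commands
    commands: str  # copy/paste block covering all affected ports


def maintenance_plan(ports, start, end, lead=timedelta(minutes=10), acl=ACL_NAME):
    """Build two copy/paste steps: filter BGP before the window, unfilter after."""
    apply_cmds = "\n".join(APPLY.format(port=p, acl=acl) for p in ports)
    remove_cmds = "\n".join(REMOVE.format(port=p, acl=acl) for p in ports)
    return [Step(start - lead, apply_cmds), Step(end, remove_cmds)]


if __name__ == "__main__":
    # Port names and window times are made up for illustration.
    plan = maintenance_plan(["et-0/0/1", "et-0/0/7"],
                            start=datetime(2017, 1, 1, 0, 0),
                            end=datetime(2017, 1, 1, 2, 0))
    for step in plan:
        print(step.at.isoformat() + "\n" + step.commands)
```

Generating both the apply and the remove blocks from the same port list means you can't filter a port and then forget to unfilter it when the work is done.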

> Clean shutdown of bird is easier, quicker, and will for sure make the
> multilateral peering not be further affected by bouncing repeatedly.

Yes, great for MLPA, but not for bilateral. 

Lastly, in this *specific* case, this presents issues for other members’ ports that are *NOT* affected by the maintenance, and a loss of traffic for them if they are doing MLPA. Why break everyone and cause a total route server outage when it’s not necessary at all? Yesterday’s maintenance only affected a portion of members. ACLs on member ports would be the cleanest way to minimize outage duration for all members, with the least impact to the IX as a whole.