I’ll also add that this is currently a draft in the IETF GROW WG:
https://tools.ietf.org/html/draft-ietf-grow-bgp-session-culling-01
(and one which I fully support)
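
For anyone who wants to see the shape of it, here is a rough sketch of the culling filter idea from the thread below: drop tcp/179 between peering LAN addresses on the affected ports, permit everything else, and take the filter off when the work is done. Everything in it is an assumption for illustration. The function names, the port labels, and the 192.0.2.0/24 documentation prefix are made up, and the output lines are vendor-neutral text, not real switch CLI.

```python
from ipaddress import ip_network

BGP_PORT = 179  # tcp/179, as discussed in the thread


def culling_rules(peering_lan):
    """Return ordered rules as (action, proto, src, src_port, dst, dst_port).

    BGP is matched in both directions (source port 179 or destination
    port 179) between peering LAN addresses; all other traffic is permitted.
    """
    lan = str(ip_network(peering_lan))
    return [
        ("deny", "tcp", lan, BGP_PORT, lan, None),
        ("deny", "tcp", lan, None, lan, BGP_PORT),
        ("permit", "ip", "any", None, "any", None),
    ]


def render_apply(port, peering_lan):
    """Render illustrative (hypothetical, non-vendor) lines for one port."""
    lines = [f"apply culling filter on {port}:"]
    for action, proto, src, sport, dst, dport in culling_rules(peering_lan):
        s = f"{src} port {sport}" if sport else src
        d = f"{dst} port {dport}" if dport else dst
        lines.append(f"  {action} {proto} from {s} to {d}")
    return lines


def render_remove(port):
    """Render the illustrative line that lifts the filter after maintenance."""
    return [f"remove culling filter on {port}"]


if __name__ == "__main__":
    # Hypothetical port labels for the members being moved.
    for port in ["member-port-1", "member-port-2"]:
        print("\n".join(render_apply(port, "192.0.2.0/24")))
    for port in ["member-port-1", "member-port-2"]:
        print("\n".join(render_remove(port)))
```

The point of generating both the apply and remove steps from one port list is that the same list drives the start and end of the window, so no port gets left filtered by accident.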
--
Andrew Hoyos
> On Apr 26, 2017, at 9:34 AM, Andrew Hoyos wrote:
>
> On Apr 26, 2017, at 9:19 AM, Mike Horwath wrote:
>>
>> On Wed, Apr 26, 2017 at 08:07:45AM -0500, Andrew Hoyos wrote:
>>> I would suggest that perhaps we look into filtering BGP (tcp/179)
>>> with an ACL prior to maintenance start on those specific ports being
>>> moved. Many other IXs are doing this for maintenance as a way to
>>> gracefully take things down, and let bilateral and RS sessions time
>>> out without killing active traffic. As we've noticed, not all
>>> members being moved are bothering to shut down sessions prior, which
>>> causes impact to/from those members. (i.e.:
>>> https://ripe67.ripe.net/presentations/374-WH-IXPMaintReduce.pdf)
>>
>> Don't even need ACLs.
>>
>> Just take down the route servers for the 2 hour period.
>>
>> Bilateral are unaffected and they can arrange things anyway with their
>> peers.
>
> I’d disagree. The maintenance currently taking place affects more than just the route servers. Plenty of people are doing bilateral peering on MICE, and that *IS* affected by maintenance events like these.
>
> Adding an ACL to the port ensures graceful shutdown/end of traffic, rather than an abrupt drop and hold timer fun.
> I’d much rather that someone running the maintenance and in control of the ultimate link up/down events be the one deciding when things are starting/ending and re-enabling traffic gracefully.
>
>> Adding another step to the process creates more complications as well,
>> and another point of failure if you screw up along the way.
>
> Disagree, adding an ACL to a port is pretty trivial. Add the (pre-existing) ACL to the port 10 minutes before maintenance starts. Remove it when complete.
> Script it up into a copy/paste thing with port numbers for bonus points and fewer chances of failure.
>
>> Clean shutdown of bird is easier, quicker, and will for sure make the
>> multilateral peering not be further affected by bouncing repeatedly.
>
> Yes, great for MLPA, but not for bilateral.
>
> Lastly, in this *specific* case, this presents issues for other members’ ports that are *NOT* affected by the maintenance, and a loss of traffic for them if they are doing MLPA. Why break everyone and cause a total route server outage when it’s not necessary at all? Yesterday’s maintenance only affected a portion of members. ACLs on member ports would be the cleanest way to minimize outage duration for all members with the least impact to the IX as a whole.