
I did get more information from Arista about the bug that caused the other cards to restart and the ports to flap a month ago. Their investigation is still in progress, so not everything is published in its final form yet. It comes down to swapping an R2 card into a chassis that is populated mostly with R1 cards.

BUG 916990 
Release Note: When a linecard of one SKU is swapped with a linecard of a different SKU, the SandFap agent corresponding to other linecards on the chassis may restart once.
There is no workaround; the recommendation is to perform the SKU swap during a maintenance window.

In our situation, this is the summary:
The agent restarted when we replaced the 7500-R with a 7500-R2 because an SPPID binding was already present for a different VOQ. The agent restart clears out this binding so that the SPPID can be allocated properly for the 7500-R2 linecard.


More information about the bug: 
The forwarding chip has a Traffic Manager with a destination lookup table that provides a destination based on a globally known ID called the systemPhyPortID (SPPID).
The SPPID uniquely identifies the Virtual Output Queue (VOQ) that the packet will be inserted into and the final forwarding-chip port the packet will be sent toward.
The software binding in the Traffic Manager's destination lookup table for the SPPID of the MeshDrop port was not cleaned up when the 7500-R card was removed.

When the 7500-R2 card was inserted, this SPPID was allocated to a front-panel port. When programming for that front-panel port was triggered on an existing slice agent (the cards that were already present in the switch), the code asserted because a binding was already present for this SPPID with a different VOQ (the MeshDrop port's VOQ), causing the slice agent to restart.
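To make the failure sequence above concrete, here is a toy Python model of it. This is not Arista code: the class names, the SPPID value 100, and the VOQ labels are all hypothetical, and the "restart" is modeled simply as the agent discarding its table and reprogramming it from the desired state. It shows why a stale MeshDrop binding trips the assertion when the same SPPID is reused, and why cleaning up on card removal avoids it.

```python
class BindingConflict(Exception):
    """Hypothetical: raised when an SPPID is already bound to a different VOQ."""


class DestinationTable:
    """Toy model of a Traffic Manager destination lookup table (SPPID -> VOQ)."""

    def __init__(self):
        self.bindings = {}

    def bind(self, sppid, voq):
        existing = self.bindings.get(sppid)
        if existing is not None and existing != voq:
            # Mirrors the reported assert: stale binding to a different VOQ.
            raise BindingConflict(f"SPPID {sppid} already bound to {existing}, not {voq}")
        self.bindings[sppid] = voq

    def unbind(self, sppid):
        self.bindings.pop(sppid, None)


class SliceAgent:
    """Hypothetical per-linecard agent holding its own copy of the table."""

    def __init__(self):
        self.table = DestinationTable()
        self.restarts = 0

    def program(self, sppid, voq, desired):
        try:
            self.table.bind(sppid, voq)
        except BindingConflict:
            # "Restart": throw away all software state and rebuild it
            # from the currently desired SPPID -> VOQ mapping.
            self.restarts += 1
            self.table = DestinationTable()
            for s, v in desired.items():
                self.table.bind(s, v)


def simulate(clean_up_on_removal):
    """Walk through the R -> R2 swap on one existing slice agent."""
    agent = SliceAgent()
    desired = {}
    # Old 7500-R card: its MeshDrop port uses SPPID 100 (arbitrary value).
    desired[100] = "voq-meshdrop"
    agent.program(100, "voq-meshdrop", desired)
    # 7500-R removed; the bug is that the binding may not be cleaned up.
    del desired[100]
    if clean_up_on_removal:
        agent.table.unbind(100)
    # 7500-R2 inserted; SPPID 100 is reallocated to a front-panel port.
    desired[100] = "voq-front-panel"
    agent.program(100, "voq-front-panel", desired)
    return agent
```

Calling `simulate(False)` leaves the agent with one restart, after which SPPID 100 is correctly bound to the front-panel VOQ; `simulate(True)` reaches the same final binding with no restart.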

Jeremy

-----Original Message-----
From: Jeremy Lumby <[log in to unmask]> 
Sent: Wednesday, February 21, 2024 4:54 PM
To: 'MICE Discuss' <[log in to unmask]>
Subject: RE: [MICE-DISCUSS] MICE interface flap

A quick update.  I just heard from Arista support that the interface flap last week was the result of a new bug that they are currently working out the details of.  At the moment I have no further information about the details.

-----Original Message-----
From: Jeremy Lumby <[log in to unmask]> 
Sent: Friday, February 16, 2024 9:44 AM
To: 'MICE Discuss' <[log in to unmask]>
Subject: RE: [MICE-DISCUSS] MICE interface flap

It will be interesting when Arista responds.  It sounds very similar.  The week before we upgraded to 4.30.5M to prepare for the new card (7500R2-36CQ-LC).

-----Original Message-----
From: MICE Discuss <[log in to unmask]> On Behalf Of Frank Bulk
Sent: Friday, February 16, 2024 9:03 AM
To: [log in to unmask]
Subject: Re: [MICE-DISCUSS] MICE interface flap

Thanks for sharing, we definitely saw this, too.

We had a *somewhat* similar issue last year: when we slotted a card into our 7504, the other card rebooted. It turned out to be some kind of mesh bug, Bug ID 765840, resolved in 4.30.1.

Frank

-----Original Message-----
From: MICE Discuss <[log in to unmask]> On Behalf Of Jeremy Lumby
Sent: Wednesday, February 14, 2024 4:57 PM
To: [log in to unmask]
Subject: Re: [MICE-DISCUSS] MICE interface flap

I am still sorting out the details, but it was related to the card swap scheduled for today. The card was in, up, and running without issue for several minutes; however, when I programmed/moved the first link to it, all of the other cards in the core switch started rebooting themselves. I am opening a ticket with Arista since we seem to be stable at the moment.

-----Original Message-----
From: MICE Discuss <[log in to unmask]> On Behalf Of Chris Wopat
Sent: Wednesday, February 14, 2024 4:56 PM
To: [log in to unmask]
Subject: [MICE-DISCUSS] MICE interface flap

WiscNet's remote switch flapped twice within the last half hour:

Feb 14 16:34:16.657 2024  s-minneapolis-hub mib2d[8448]: %DAEMON-4-SNMP_TRAP_LINK_DOWN: ifIndex 540, ifAdminStatus up(1), ifOperStatus down(2), ifName et-0/0/23

Feb 14 16:39:29.456 2024  s-minneapolis-hub mib2d[8448]: %DAEMON-4-SNMP_TRAP_LINK_DOWN: ifIndex 540, ifAdminStatus up(1), ifOperStatus down(2), ifName et-0/0/23

Aggregate traffic definitely shows a hit:

https://link.edgepilot.com/s/5abcf6c9/_Ox0o9rFAUimx0FLu4zJ0g?u=http://micelg.usinternet.com/export/graph_2625.html

Curious whether others experienced flaps, or whether anyone could share logs from the core Arista?

-- 
Chris Wopat
Network Engineer, WiscNet
[log in to unmask]   608-210-3965

