Author Topic: Corporate LAN down due to Cisco bug  (Read 1325 times)

Offline Flintstone

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 958
Corporate LAN down due to Cisco bug
« on: May 17, 2019, 06:09:33 AM »
Hi Guys,

Last week at 07:00 we lost our Corporate LAN at our HQ.  When we arrived at work the car park barrier was up, which is never a good sign at 08:00.  On entering our Office we were inundated with faults and couldn't connect to the Network.  We went to our 'Comms Room' and found our Cisco VSS core switches running at 99% CPU utilisation.  We suspected a loop somewhere, so we started to disconnect links into the core until the CPU recovered.  Once we had a stable LAN we reconnected the links one by one until we hit the issue again.  All links were reconnected except the one to an outbuilding on the Campus.  The Corporate LAN recovered at 08:45.

At the outbuilding we could see that the core switches and edge switches were still running at 99% CPU.  We removed the links to the core switches until we had narrowed it down to a single edge switch.  The outbuilding had an edge Cisco 3850 stack of two switches, with two port-channels connecting into the core switches of that outbuilding.  We rebooted the edge Cisco 3850 stack and disconnected its links into the core of that outbuilding.  We now had a stable LAN at the outbuilding.  On further investigation we discovered we had hit a bug on our Cisco 3850 switches:

bug CSCul30426 (https://quickview.cloudapps.cisco.com/quickview/bug/CSCul30426)

Conditions:
This defect is seen when trying to build a port-channel between a 3850 and any other switch with the following conditions:

1. The native vlan is in the suspended state
or
2. The native vlan is not present in the vlan database.

Workaround:
Once you created the layer two vlan or change it to an "active" state, the port-channel will form as expected.


In our case the native VLAN was not present in the VLAN database on the edge switch.  Instead of a single logical port-channel with two bundled links, we had two separate links connecting into the core switch of that outbuilding.  This created a loop and an associated broadcast storm, which then affected the corporate VSS core switches, driving CPU to 99% and causing Network instability.  To resolve the issue we initially ran the port-channel on a single link.  Later we created the native VLAN in the VLAN database and reconnected the second link of the port-channel.
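
For anyone wanting to audit their own switches for the two conditions in the bug, here is a rough Python sketch of the check. It is only an illustration: the `TrunkPort` structure, the `ports_at_risk` function and the interface names are all made up for this example, and it assumes you have already collected each trunk's native VLAN and the switch's VLAN database (VLAN ID mapped to a state string) by whatever means you normally use.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TrunkPort:
    """Hypothetical record for one trunk interface (not a Cisco data model)."""
    name: str                      # interface name, illustrative only
    native_vlan: int               # native VLAN configured on the trunk
    channel_group: Optional[int]   # port-channel number, or None if unbundled


def ports_at_risk(ports: List[TrunkPort], vlan_db: Dict[int, str]) -> List[str]:
    """Return a message for each port-channel member whose native VLAN is
    missing from the VLAN database or not in the 'active' state."""
    findings = []
    for port in ports:
        if port.channel_group is None:
            continue  # the bug conditions concern port-channel members only
        state = vlan_db.get(port.native_vlan)
        if state is None:
            findings.append(
                f"{port.name}: native VLAN {port.native_vlan} "
                "missing from VLAN database"
            )
        elif state != "active":
            findings.append(
                f"{port.name}: native VLAN {port.native_vlan} is {state}"
            )
    return findings


# Assumed example data: VLAN 99 was never created, VLAN 20 is suspended.
example_ports = [
    TrunkPort("Te1/1/1", 99, 1),
    TrunkPort("Te2/1/1", 99, 1),
    TrunkPort("Gi1/0/5", 10, None),
    TrunkPort("Te1/1/2", 20, 2),
]
example_vlan_db = {10: "active", 20: "suspended"}

for line in ports_at_risk(example_ports, example_vlan_db):
    print(line)
```

Anything this flags is a candidate for the workaround above: create the native VLAN, or set it back to active, before bundling the second link.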

CheerZ :)
« Last Edit: May 17, 2019, 06:12:46 AM by Flintstone »


Offline Michael McNamara

  • Administrator
  • Hero Member
  • *****
  • Posts: 3840
    • michaelfmcnamara
    • Michael McNamara
Re: Corporate LAN down due to Cisco bug
« Reply #1 on: May 17, 2019, 11:32:51 AM »
That's an interesting bug... if I read correctly though, there was an initial mis-configuration that led to the bug, correct?

Cheers!

Offline Flintstone

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 958
Re: Corporate LAN down due to Cisco bug
« Reply #2 on: May 20, 2019, 05:22:58 AM »
Yes, there was a mis-configuration when the switch stack was deployed over a year ago.  It just took that long for the bug to manifest itself.  So everything was working as normal and then, a year later, 'kapow', the Network was down.  I've also noticed that Cisco don't seem to have officially released the bug details to the public?

CheerZ