
Author Topic: ERS 5510e switch stack is now running like a hub!  (Read 2294 times)

Offline za_mkh

  • Rookie
  • **
  • Posts: 13
ERS 5510e switch stack is now running like a hub!
« on: January 15, 2010, 08:49:53 AM »
Hi Folks,

First post ... hope you can help me solve this riddle.

Have two Nortel 5510e switch stacks running FW v5.05.020

Stack A - 7 5510e + 2 BPS 2000 uplinked into STACK A
Stack B - 5 5510e

Stack A and B are connected via a 3 x Fibre MLT (via VLAN 1)

We have multiple VLANs configured across them, with ESX servers connected to each stack via their own dedicated MLT uplinks. All has been working well for a long time. All of a sudden this week, we get calls that people can't print, or copy files, etc. Connectivity is there: when connected to a server, for example, copying files off the server to a local desktop takes seconds, but if we try to put the same files back on the server from the same desktop, it takes 15 minutes before timing out! But on another computer, there are no issues. All very strange. If we move the server via vMotion to another server, some of the PCs experiencing the issue can then communicate normally, while another 1 or 2 then become very slow. Trying to isolate it to a switch is impossible, because plugging the user into another port can leave the problem in place. However, if they use another port that is working, then all is fine. We have looked at cables, NICs, PCs, everything.

Fired up Wireshark on Stack A and found that I am now seeing TCP packets between servers and other clients in addition to the normal broadcast/multicast traffic. We don't have port mirroring enabled, so I should not be seeing TCP traffic between other PCs and servers. Our default VLAN on Stack A is VLAN 1, and through Wireshark I am also seeing traffic between other servers and PCs on the other VLANs, which leads me to believe the entire stack is acting like a hub, or maybe one of the switches has gone faulty.

Running Wireshark on a port connected to Stack B (VLAN 2) shows that Stack B is operating correctly. I see normal broadcast/multicast traffic in addition to the PC's own traffic. I don't see any other traffic.

We have rebooted both stacks, including a hard power off, but the problem remains. We have not changed anything in the last week, nor can we think of anybody mispatching something, though we had a look and everything looks good.

Can anybody give us a pointer on where I could look?

Spanning tree is disabled completely on both Stacks.

Your help will be greatly appreciated.


Offline Michael McNamara

  • Administrator
  • Hero Member
  • *****
  • Posts: 2503
    • Michael McNamara
Re: ERS 5510e switch stack is now running like a hub!
« Reply #1 on: January 15, 2010, 09:26:46 AM »
It sounds like perhaps one of your MLT links is misbehaving; that would explain why some folks can communicate without issue while others are having problems.

Things to examine; Have you looked at the stats for your MLT ports? Any errors? Is the MAC/FDB table normal? If you are seeing traffic you shouldn't see in a packet trace it might indicate that the MAC/FDB table is all messed up.

I would suggest that you disable (unplug) one of the MLT links and see if your problem clears.

It's possible that the switch(es) are confused and may need a restart/reboot.

Let us know how you make out.
We've been helping network engineers, system administrators and technology professionals since June 2009.
If you've found this site useful or helpful, please help me spread the word. Link to us in your blog or homepage - Thanks!

Offline za_mkh

  • Rookie
  • **
  • Posts: 13
Re: ERS 5510e switch stack is now running like a hub!
« Reply #2 on: January 15, 2010, 09:38:40 AM »
Hi Michael,

To the rescue again! :D

We have rebooted both switch stacks twice. The problem has persisted, which is why I'm worried. We just can't work out what it is.

MLT links are not showing any errors whatsoever (using Web Console -> Ethernet Errors Stats) on both stacks

How do I know if the MAC/FDB is normal ... when I look at it ... it looks sort of fine?

Thanks once again

Offline za_mkh

  • Rookie
  • **
  • Posts: 13
Re: ERS 5510e switch stack is now running like a hub!
« Reply #3 on: January 15, 2010, 09:44:46 AM »
Hi Michael,

What I can add is that using DM, the switch stack sometimes goes gray, etc. It's like it has too much to process ... We had this problem a long time ago, but installing the 5.0.5 firmware sorted it out for us.

Offline Michael McNamara

  • Administrator
  • Hero Member
  • *****
  • Posts: 2503
    • Michael McNamara
Re: ERS 5510e switch stack is now running like a hub!
« Reply #4 on: January 15, 2010, 02:56:52 PM »
With a large stack you can sometimes get SNMP timeouts, which cause Nortel's Java Device Manager to go grey.

Are you sure you don't have a loop somewhere in the topology?

Is the output rate high (look at ifOutOctets on the port status) on any of the ports?

Did you try unplugging (admin-down) some of the MLT links? If you have rebooted the switch you most probably don't have a software problem, especially if you were running fine for some previous amount of time.

What you need to do is isolate one specific case and/or problem. Take the source and destination, get their IP addresses, map their MAC addresses on all switches, and map their IP/ARP entries on all switches. Diagram out the possible paths for the packets between the source and destination. If you are traversing an MLT, disable one of the paths - does the problem clear? Restore that path and disable the remaining path/paths - does the problem clear?

You could have a bad MLT path but that would quickly be discovered if you disable that path and all your problems clear.

Good Luck

Offline za_mkh

  • Rookie
  • **
  • Posts: 13
Re: ERS 5510e switch stack is now running like a hub!
« Reply #5 on: January 16, 2010, 12:27:18 PM »
I took your advice after your first reply and individually disabled MLT port members and then enabled them again (while still keeping one enabled to allow for connectivity). I don't know what that did, but it definitely reduced the number of 'rogue' packets that I was seeing, in that the Wireshark capture then only showed me the broadcast/multicast packets and the host/network packets. There was still the occasional packet not destined for my PC that Wireshark picked up.

We thought that a specific switch had / has gone faulty and tried to map all the rogue ports (by looking at ethernet / port error summaries) and they are all across the switch stack. Was hoping they would all point to one switch but alas they did not.

I was thinking there must be a loop somewhere, and tried to activate spanning tree on Stack A (but I cannot create it!). The problem is definitely not with Stack B. Stack B is now running spanning tree (except across the MLT links). When I get to the office on Monday I will look at the ifOutOctets on the port status.

What I have done from reading your blog post on the 5520 is to enable rate limiting across the switch in the hope that it alleviates the problem. As I was not the one who configured this, it is difficult to understand what the original network implementation ideas were and why STP is somehow disabled (but cannot be enabled on Stack A). It says STP1 does not exist, and when I try to create it, it fails.

Thanks for your help. I will update this thread when we find the cause or solution.

Offline Michael McNamara

  • Administrator
  • Hero Member
  • *****
  • Posts: 2503
    • Michael McNamara
Re: ERS 5510e switch stack is now running like a hub!
« Reply #6 on: January 16, 2010, 12:49:28 PM »
I would tend to believe that you have a loop somewhere and perhaps rate-limiting is keeping your network from completely crashing.

You said that you're seeing unicast packets in Wireshark that you should not be seeing on that switch port (where you are performing the packet trace). When you look at the MAC/FDB table, does it indicate that the switch believes that device is physically attached to the port where you are performing your packet trace? The only time you should see those unicast packets is when the switch doesn't have an entry for the device in the MAC/FDB table and floods the packet out to all ports in that VLAN. You already know that you'll see broadcast traffic, so again there's something wrong somewhere.
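
The flooding behaviour described above can be illustrated with a minimal sketch of a single-VLAN learning switch. This is a toy model under stated assumptions, not how the ERS firmware is implemented: the class name, port labels and MAC strings are all made up for illustration.

```python
from typing import Dict, List


class LearningSwitch:
    """Toy model of a single-VLAN learning switch (illustrative only)."""

    def __init__(self, ports: List[str]) -> None:
        self.ports = ports
        self.fdb: Dict[str, str] = {}  # MAC address -> port where it was learned

    def forward(self, src_mac: str, dst_mac: str, in_port: str) -> List[str]:
        """Return the list of ports a unicast frame is sent out of."""
        # Learn (or refresh) the source MAC on the ingress port.
        self.fdb[src_mac] = in_port
        out_port = self.fdb.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination lives on the ingress port; nothing to forward.
            return []
        return [out_port]


sw = LearningSwitch(["1/1", "1/2", "1/3"])
print(sw.forward("aa", "bb", "1/1"))  # "bb" not learned yet, so the frame is flooded
print(sw.forward("bb", "aa", "1/2"))  # "aa" is known, so only its port is used
print(sw.forward("aa", "bb", "1/1"))  # "bb" is now known as well
```

A capture port that keeps seeing other hosts' unicast traffic corresponds to the first case repeating, which is why the MAC/FDB table is the thing to check.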

Your comment about STP not enabling/creating is very interesting. Perhaps that switch is running RSTP/MSTP, in which case the commands would be different. It should be easy to see from JDM.

Hang in there... you just need to find the answer.

Offline za_mkh

  • Rookie
  • **
  • Posts: 13
Re: ERS 5510e switch stack is now running like a hub!
« Reply #7 on: January 19, 2010, 04:09:19 PM »
Hi Michael,

Thought I would give you an update on the issue. Firstly, NDM says STP mode is Nortel so not MSTP or RSTP. Still don't know what happened to STP1 to make it disappear on Stack A. However, we are going to do a firmware upgrade tomorrow and we will see if the STP issue is sorted out.

We also no longer experience the 'hub like' behaviour, but we have ports that experience seriously slow network performance to certain servers, etc. We think we have isolated it to some (3) BPS switches that are uplinked to the 5510e (Stack A via 3 port MLT). It appears that when both stacks were rebooted, the BPS switches were never rebooted. So I think the issue is there. However, moving users out of the BPS and into the 5510e stack resolves only some of their problems. It's a weird one on the whole.

Stay tuned and many thanks again.
But we shall see what a complete reboot of "ALL"
« Last Edit: January 22, 2010, 03:31:38 AM by za_mkh »

Offline Michael McNamara

  • Administrator
  • Hero Member
  • *****
  • Posts: 2503
    • Michael McNamara
Re: ERS 5510e switch stack is now running like a hub!
« Reply #8 on: January 19, 2010, 07:09:27 PM »
Hmmm.... I didn't think that you could stack Business Policy Switches (BPS).

I would suggest a reboot before you upgrade the software to see if the STP issue is resolved. The default mode for Nortel switches is Nortel STP. Nortel's STP is still interoperable with any other device running Spanning Tree, Nortel just behaves differently on MLT links so they call it Nortel STP.

If you want to post the output from the following commands, I'll try to offer some feedback;

 - show spanning-tree config
 - show spanning-tree op-mode
 - show spanning-tree port
 - show spanning-tree vlan
 - show spanning-tree stp 1 config
 - show spanning-tree stp 1 port
 - show spanning-tree stp 1 vlan

 - show autotopology nmm-table

Good Luck!

Offline za_mkh

  • Rookie
  • **
  • Posts: 13
Re: ERS 5510e switch stack is now running like a hub!
« Reply #9 on: January 22, 2010, 04:07:02 AM »
Hi Michael,

The saga is nearly over.

Apologies for the late reply. As will be made clear soon, I was dealing with the fallout yesterday, and now have found some time to update you.

On Wednesday evening, we upgraded the firmware/software to 5.1.2.035 on two stacks (Stack B and Stack C) with no issues. Upgrading the firmware on Stack A would not work: Stack A accepted the diags image but not the software image. We then decided to make another unit in Stack A the base unit and tried that. The new base unit accepted the new software but seemed to push it to only 2 of the 7 switches. Suffice to say, it all came crashing down. The stack could not be seen. What we noticed was that when the stack 'was broken' the spanning tree option appeared again. In the end we had to upgrade each switch individually, and we also rebuilt the entire stack by reconfiguring the uplinks at the back of the switches so that everything cascaded from the new base unit down. We did this by adding one switch at a time to the stack. We were hoping that this would isolate the dodgy switch. But in this case, everything worked even after a full reboot of the stack.

Few things stood out:
Switch 1 (OLD base) did not accept software image initially. After reboot of switch (and disconnected from stack) it did. This made us believe that this switch was faulty. But it appears not.

Switches 4 and 5 had huge amounts of latency. Ping requests to the switch IP (when disconnected from the stack) had 300-500ms response times. On Switch 4, it took 7 minutes just to transfer the diags image across (via TFTP), though when it rebooted it was all fine! After our experience on switch 4, we did a reboot of switch 5 and it upgraded normally (i.e. quickly!)

Once the stack had been rebuilt, the original config still stood, but we lost all MLT information (8 MLT trunks on Stack A) and Switch 5 lost all its config (rate limiting, etc.). I think the MLT information was lost because we were adding one switch at a time to the stack, and since an MLT could span three switches, the Nortel software was clever enough to know it could not be built.

We decided to rebuild the MLTs from scratch (using our backup config file as reference) and all seems to be fine now! Performance issues seem to have disappeared.

We also now have spanning tree on Stack A. Spanning tree has been activated on all switch ports except for MLT links.

It took 7 hours to get this working! We left the office in the early hours of the morning!

Though I have a few more questions that I would like your advice on
..

1) On your blog post http://blog.michaelfmcnamara.com/2009/01/hp-nic-teaming-with-nortel-switches/ you say that MLT links from servers to the switch are now considered "old tech". Our ESX servers have 6 NICs, of which 3 are dedicated to virtual machine guest traffic. These three NICs are bonded into an MLT (trunk) uplinked to the switch, since our ESX servers could host guests belonging to 1 of 3 VLANs. Of course, when we had this issue, my biggest fear was the need to maybe reconfigure all our ESX servers if we had to permanently remove a faulty switch, hence me thinking MLT links are not the way to go. Should I be using Link Aggregation Groups?


2) Our stacks are connected via MLT (fibre) in the following methods
   STACK A -> STACK B (3 Fibre MLT)
   STACK A -> STACK C (2 Fibre MLT)

Our load balance type is Basic. Would we be better off using Advanced for these ISLs? (Our ESX MLTs are Advanced as per the Nortel docs.) I ask because I notice that when there is heavy traffic between Stack A and Stack B, 1 MLT link member seems to be used more than the others. Could this therefore be overloading that switch?

Many thanks once again for all your help.

Kind regards

Offline Michael McNamara

  • Administrator
  • Hero Member
  • *****
  • Posts: 2503
    • Michael McNamara
Re: ERS 5510e switch stack is now running like a hub!
« Reply #10 on: January 22, 2010, 07:29:10 PM »
Hi Za!

I'm happy to hear that you were finally able to resolve the issue.

We use LACP/LAG with our servers because it provides a few benefits over static MLT or PortChannels (Cisco). When using LACP/LAG each link participates in the LACP protocol, so if there is a problem with one side the other is sure to find out about it. With MLT, unless you lose link you won't know there is an issue on the other side of the connection. It's not a big change, but since LACP is now a fairly established standard it makes sense to use it where possible. If your MLT configuration is working for you and you haven't had any issues, I would probably leave it as is.

Your hub and spoke design is fine;  B <-> A <-> C

There are some caveats to trunking/bonding multiple links together in a Layer 2 configuration. You're probably using the default MLT algorithm, which keeps all traffic from the same conversation on the same link; this helps prevent out-of-order packets and other problems. It's less efficient but causes fewer problems. In this configuration, if you FTP a large file from your PC to a server on the other side of an MLT, you'll see the traffic basically all loaded onto just 1 of the paths in the MLT.
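
To make the "same conversation, same link" idea concrete, here is a minimal sketch of hash-based link selection. The hash used here (CRC32 over the source/destination MAC pair) is an assumption chosen for illustration only; it is not Nortel's actual MLT algorithm, and the function name and link labels are hypothetical.

```python
import zlib
from typing import List


def pick_link(src_mac: str, dst_mac: str, links: List[str]) -> str:
    """Pick one trunk member for a conversation.

    Hypothetical hash for illustration; NOT the vendor's real algorithm.
    """
    key = f"{src_mac.lower()}-{dst_mac.lower()}".encode("ascii")
    # The same address pair always hashes to the same index, so every frame
    # of one conversation stays on one link and arrives in order.
    return links[zlib.crc32(key) % len(links)]


members = ["fibre-1", "fibre-2", "fibre-3"]
pc = "00:11:22:33:44:55"
server = "00:aa:bb:cc:dd:ee"
# A single large transfer between one PC and one server uses one member only.
print(pick_link(pc, server, members))
```

The trade-off is the one described above: a single large transfer can never use more than one member's bandwidth, and the load only evens out when many different address pairs are talking at once.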

There's an awful lot of bandwidth in a 1000Mbps connection. I highly doubt you are taxing the hardware or the network; you need a lot of machines to tax a Gigabit network. If you are curious, I would advise you to look into MRTG; it can help graph the utilization of your switch ports and show you just how busy those ports are. We have stacks that are comprised of 8 switches with 200+ devices connected to the stack and we run at 1% or 2% utilization on our 2 1000BaseSX uplinks. Don't get me wrong, there's a lot of bursty traffic - but on average the network has a lot of bandwidth to offer and the majority of machines aren't sitting around cranking out packets, unless they're infected with a virus/trojan, which has happened more than once.

Glad to hear you're back up and working again!

« Last Edit: January 28, 2010, 12:26:48 PM by Michael McNamara »

Offline za_mkh

  • Rookie
  • **
  • Posts: 13
Re: ERS 5510e switch stack is now running like a hub!
« Reply #11 on: January 28, 2010, 12:09:21 PM »
Hi Michael!

I was too quick off the mark! It's a week later and we still experience the issue when some ports have serious connectivity issues to other ports in the same stack!

E.g. port 6/7 has very slow performance writing a 120KB file (over 3 minutes) to a file server located on port 4/40. However, port 6/8 has no issue. All ports are identical and on the same VLAN! But 6/15 could have a similar issue. If I take the users connected to 6/7 and 6/15 and move them to 7/2 and 7/8, their problem is resolved. No reboots etc. required. But then a user on 7/36 could experience a similar issue.

I've cleared the MAC address table and when I look at the CPU stats on the stack, the average utilisation is 23%, and it is the same on Stack B, which is having no problems. You said your utilisation is around 2%. Can you think of a reason why ours is so high? We don't run QoS or any routing protocols.

I've looked at the tech support logs and nothing looks untoward. Everything looks normal? STP is operating fine - nothing is in blocked mode.


I suspect it must be the backplane links connecting the switches together. But I don't know how the traffic gets sent down the backplane... Does switch port 6/7, when trying to reach 4/30, go via switch 5 and then 4, or does it go to 7, then 1, then 2, then 3, then 4? How does it make that decision?
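
For reference, the two candidate paths in that question can be counted with a short sketch. It assumes the units are numbered sequentially around a closed stack ring of 7 switches (as in Stack A); the function name is hypothetical, and the sketch only counts hops. It does not claim to show how the stack firmware actually chooses a direction.

```python
def ring_hops(src_unit: int, dst_unit: int, stack_size: int) -> dict:
    """Hop counts between two units in each direction around a stack ring."""
    if not (1 <= src_unit <= stack_size and 1 <= dst_unit <= stack_size):
        raise ValueError("unit numbers must be between 1 and stack_size")
    # Counting upwards and wrapping at the end of the stack: 6 -> 7 -> 1 -> ...
    ascending = (dst_unit - src_unit) % stack_size
    # Counting downwards: 6 -> 5 -> 4 -> ...
    descending = (src_unit - dst_unit) % stack_size
    return {"ascending": ascending, "descending": descending}


# Unit 6 to unit 4 in a 7-unit ring: 5 hops one way, 2 hops the other.
print(ring_hops(6, 4, 7))
```

With a healthy ring both directions are usable, so a fault on one cascade cable would only touch traffic whose path crosses that cable. That would be consistent with some port pairs being slow while their neighbours are fine.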

Any ideas would be most welcome.

Thanks once again for reading!

Offline Michael McNamara

  • Administrator
  • Hero Member
  • *****
  • Posts: 2503
    • Michael McNamara
Re: ERS 5510e switch stack is now running like a hub!
« Reply #12 on: January 28, 2010, 01:00:57 PM »
I'm sorry to hear that you're still having problems. I've been there on more than one occasion and it can be very frustrating.

I see in the thread that you upgraded to 5.1.2.x software, I've never had any issues with that software although I know that 5.1.5.x is available.

Are you sure you don't have any duplex mismatches on the PCs or servers? If you look at ports 6/7 and 4/40, are there any InErrors for those ports? What's the average per-second ifInOctets and ifOutOctets for those ports?
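
A per-second rate comes from two readings of the same counter taken a known interval apart. The sketch below shows the arithmetic, assuming the standard 32-bit ifInOctets/ifOutOctets counters, which wrap back to zero; the function names and sample values are made up for illustration, and how you collect the readings (SNMP, JDM, CLI) is up to you.

```python
def octets_per_second(first: int, second: int, interval_s: float,
                      counter_bits: int = 32) -> float:
    """Average octets/second between two samples of ifInOctets or ifOutOctets."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    modulus = 2 ** counter_bits
    # Modular subtraction tolerates a single counter wrap between the samples.
    delta = (second - first) % modulus
    return delta / interval_s


def utilization_percent(octets_per_s: float, link_bps: float) -> float:
    """Convert an octet rate into percent utilization of a link."""
    return octets_per_s * 8 / link_bps * 100


# Hypothetical readings taken 60 seconds apart on a gigabit port.
rate = octets_per_second(1_000, 75_001_000, 60)
print(rate, utilization_percent(rate, 1_000_000_000))
```

On a busy gigabit port a 32-bit octet counter can wrap in well under a minute, so keep the sampling interval short enough that at most one wrap can occur between readings.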

I was speaking about the average actual uplink/downlink utilization between the switches. Here's the CPU utilization on a stack I have running 5.1.3.x software;

ERS-5520-PWR#show cpu-utilization
----------------------------------------------------------------
                      CPU Utilization
----------------------------------------------------------------

Unit/ Last 10 Sec, 1 Min, 10 Min, 60 Min, 24 Hrs, System Boot-Up
----------------------------------------------------------------
1          27%     35%    27%     26%     26%     27%
2          35%     33%    25%     24%     24%     26%
3          34%     34%    26%     24%     24%     26%
4          24%     32%    25%     24%     24%     25%
5          29%     32%    25%     25%     25%     26%
6          25%     31%    25%     25%     24%     25%
7          30%     32%    25%     25%     25%     26%
8          24%     32%    25%     25%     24%     25%


How many QoS queues do you have configured and what size buffers do you have set?

ERS-5520-PWR#show qos agent
QoS NVRam Commit Delay: 10 seconds
QoS Queue Set: 2
QoS Buffering: Large
QoS UBP Support Level: Disabled


You can also look at the port statistics for the stack-up and stack-down ports (the cascade ports).


ERS-5520-PWR#show stack port-statistics up-stack unit 1
ERS-5520-PWR#show stack port-statistics down-stack unit 1


You would need to go through every switch in the stack, 1 - 8 (or however many switches you have).
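
To avoid typing those two commands for every unit by hand, a short sketch can generate the full list to paste into the console. The function name is hypothetical; the command strings are the two quoted in this reply, and the unit count is whatever your stack has.

```python
def stack_port_stat_commands(units: int) -> list:
    """Build the up-stack/down-stack statistics commands for units 1..units."""
    commands = []
    for unit in range(1, units + 1):
        for direction in ("up-stack", "down-stack"):
            commands.append(
                f"show stack port-statistics {direction} unit {unit}"
            )
    return commands


# Example for a 7-unit stack.
for line in stack_port_stat_commands(7):
    print(line)
```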

I've had similar issues from time to time but they affected all connectivity to/from various switches in the stack and only when I've added a switch to a stack on the fly or had a switch drop out of the stack for whatever reason and return later. The solution was always a software restart of the entire stack.

Have you opened a support ticket with Nortel?

Offline za_mkh

  • Rookie
  • **
  • Posts: 13
Re: ERS 5510e switch stack is now running like a hub!
« Reply #13 on: January 29, 2010, 06:53:48 AM »
Hi Michael,

Thanks for your continued support and encouragement.

We would have opened a support ticket with Nortel however, our support providers are BT and they have been a nightmare to deal with so we are not taking our chances with them!

I've run the commands you've mentioned and annoyingly ... nothing out of the ordinary. Our QoS output matches yours.

I've decided to look at the InErrors and OutErrors across the stack. Most are at 0 and a few are at 2 or 10, but 2 ports stand out: 1/22 has InErrors of 81098 and 2/25 has 644. They have no OutErrors. I will look at those. The switch stack was rebooted last Friday so the stats are for 1 week. All servers are connected at the correct speeds.
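
Picking the outliers out of a stack-wide error listing can be sketched as below. The counts for 1/22 and 2/25 are the ones quoted above; the other ports are placeholders standing in for the "0, 2 or 10" entries, and both the function name and the threshold of 100 are arbitrary assumptions.

```python
def noisy_ports(in_errors: dict, threshold: int = 100) -> list:
    """Return (port, count) pairs above the threshold, worst first."""
    flagged = [(port, count) for port, count in in_errors.items()
               if count > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# 1/22 and 2/25 are from the post; the remaining ports are placeholders.
sample = {"1/22": 81098, "2/25": 644, "3/1": 2, "3/2": 10, "4/40": 0}
print(noisy_ports(sample))
```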

We are placing blame on Switch 5 in the stack since it was the only one to lose all its config when we did the upgrade. We have three virtual servers running off an MLT split across switches 4 and 5, and we are having performance issues across the stack in accessing them. We will be removing the switch 5 MLT port members from that server to see if that resolves application performance for those three servers. If it does, then it is definitely switch 5 playing 'silly buggers' (I don't know what that means!)

Will let you know how this saga pans out!

Offline za_mkh

  • Rookie
  • **
  • Posts: 13
Re: ERS 5510e switch stack is now running like a hub!
« Reply #14 on: January 20, 2012, 05:47:07 AM »
2 years have passed since this call and, like most posters, I did not update this post when the solution was found. As I have just received a PM asking me what the solution was, I thought I should at least update this post.

After hours of troubleshooting, we finally found that it was a backplane cable that was causing the problem. When we broke the 'redundancy' of the switch up/down backplane links by removing what we thought was the faulty cable, the problem disappeared. We continued to run like that for 4 months with no issues.

We then replaced what we thought was the dodgy backplane cable, and the switch stack continued to work normally.

In the last two years due to building moves, etc the switch stacks were completely rebuilt and I have not looked at network configs since then!

Thanks Michael once again for your help :D