Topic: Stack problem after main unit boot

Offline gomeiban

  • Rookie
  • **
  • Posts: 13
Stack problem after main unit boot
« on: November 21, 2011, 03:40:35 PM »
Hi,
We have a customer with two 5520s running software v6.2.0.200, stacked as the network core.
!Unit# Switch Model     Pluggable Pluggable Pluggable Pluggable SW Version
!                         Port      Port      Port      Port             
!----- ---------------- --------- --------- --------- --------- ----------
!1     5520-24T-PWR     (21) None (22) None (23) None (24) None v6.2.0.200
!2     5520-24T-PWR     (21) None (22) None (23) None (24) None v6.2.0.200
!
Several 4550T-PWR switches serve as access. These are arranged as three stacks of two units, plus some standalone units.

The problem is:
- Sometimes (every 4 to 6 months) the 5520 base unit restarts, and the log shows a Cold Start Trap. The second unit then takes control and everything is fine, but when the main unit comes back and tries to take control of the stack, the two units start fighting over control. The Base LED on both units begins to flash intermittently.
- Workaround: restart both units, and the stack starts up fine.
- This problem also occurred with the previous version, and we upgraded in April. Last Friday the same problem happened again.
- Also last Friday, the same problem happened with a stack of 4550T-PWR: both units tried to take control. Our customer restarted both units and the stack started up fine.

Does anybody know what we can do to have a fully redundant network without restarting all devices?

Regards
Patricio



Offline Michael McNamara

  • Administrator
  • Hero Member
  • *****
  • Posts: 2517
    • Michael McNamara
Re: Stack problem after main unit boot
« Reply #1 on: November 21, 2011, 05:51:06 PM »
When you look at the logs is there any mention regarding the cause of the reboot?

There were a number of software fixes in 6.2.1 that resolved issues causing software exceptions, which can lead the switch to watchdog (reboot itself), so you'll probably be advised to upgrade to 6.2.3 software (latest available release) to see if that helps.

http://www.michaelfmcnamara.com/files/avaya/ers5000_621rn_01.01_ReadMe.pdf

I've seen base units throw software exceptions and reboot. However, I've never seen what you describe as a "fight between them trying to take control". When the base unit goes offline, the stack (or remaining switches) will elect a temporary base unit. This switch will remain the temporary base unit until the stack is rebooted or it goes offline. The flashing base LED is meant to indicate that this switch is acting as a temporary base unit.

Your configuration can impact the failover between switches... what features are you using on your stacks? Layer 2 only, Layer 3, MLT, DMLT, PIM, etc. For example, there was a known bug regarding DMLT: after a base unit failed and then recovered, a DMLT would continue to have issues communicating with the upstream switch, where packets would be lost. I believe this specific issue has been addressed in the latest software release.

What software are you running on your ERS 4500s?

Good Luck!

We've been helping network engineers, system administrators and technology professionals since June 2009.
If you've found this site useful or helpful, please help me spread the word. Link to us in your blog or homepage - Thanks!

Offline gomeiban

  • Rookie
  • **
  • Posts: 13
Re: Stack problem after main unit boot
« Reply #2 on: November 22, 2011, 02:25:04 PM »
Hi,
This stack works as the network core, using L3, some VLANs at L2, and DMLT to the 45xx.

The 45xx are running version 5.1.2.005.

Regarding the 5520 issue you mentioned, this problem occurs with both this version and the previous one.

As for the reset, we don't have any message, just the Cold Start Trap.
Tonight I'm going to perform a complete cleaning maintenance, and after that I'll run some simulations: forced reset of unit 1, soft reset of unit 1, etc.

Regards

Offline Michael McNamara

  • Administrator
  • Hero Member
  • *****
  • Posts: 2517
    • Michael McNamara
Re: Stack problem after main unit boot
« Reply #3 on: November 22, 2011, 06:08:24 PM »
There's usually a message, it's just not always saved to NVRAM so it doesn't survive the reboot.

If you can, attach a PC or dumb terminal (with scrollback buffer) to the console port. After you have an issue, go back and look at all the output from the console port.
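
If the PC on the console port can run Python, one way to keep that output with timestamps is something like the sketch below. It is only a minimal illustration, not a tested tool: `log_console` and its `clock` parameter are names made up for this example, and it does not open the serial port itself. It assumes some terminal program is already reading the console and can hand its text over line by line (for example through a pipe into `sys.stdin`).

```python
import io
from datetime import datetime


def log_console(lines, sink, clock=datetime.now):
    """Copy console lines to sink, prefixing each one with a timestamp.

    lines: any iterable of text lines (sys.stdin works).
    sink:  any writable text file object.
    clock: callable returning a datetime; replaceable for testing.
    Returns the number of lines written.
    """
    count = 0
    for line in lines:
        stamp = clock().strftime("%Y-%m-%d %H:%M:%S")
        sink.write(f"{stamp} {line.rstrip()}\n")
        sink.flush()  # flush per line so a crash of the PC loses as little as possible
        count += 1
    return count


# Demo with canned lines standing in for real console output.
demo = io.StringIO()
written = log_console(
    ["Exception type: Data Access\r\n", 'Task Name "PP"\n'],
    demo,
    clock=lambda: datetime(2011, 11, 22, 18, 0, 0),  # fixed time for the demo
)
print(demo.getvalue(), end="")
```

For real use you would pass `sys.stdin` as `lines` and a file opened in append mode as `sink`, so the log survives across sessions and you can see exactly when the switch printed its last message before the reboot.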

Good Luck!

Offline gomeiban

  • Rookie
  • **
  • Posts: 13
Re: Stack problem after main unit boot
« Reply #4 on: November 23, 2011, 08:04:11 AM »
Hi,
Last night we performed a complete cleaning maintenance on the 5520s.
After that, we performed some tests:
- Unplug/replug the AC power cord on the main unit: stack came back fine.
- Unplug/replug the AC power cord on the second unit: stack came back fine.
- Soft reset of the main unit: stack came back fine.
- Unplug the main unit: everything works fine (redundancy).
- Plug the main unit back in: stack came back fine.

Before maintenance, I checked the log, audit, and last-exception. The logs are fine, but there is a last-exception:

Core-stack#show system last-exception

  Last Saved Exception - Unit# 1
--------------------------------------

bld version: 6.1.0.7 time: (20/May/09 16:04:10) view: (productio)
sysUpTime: 132 days, 22:56:28
Registers:

     R00      R01      R02      R03      R04      R05      R06      R07
00ec6ba8 05856e50 00000000 00000000 00003032 00000000 00000000 05849890
     R08      R09      R10      R11      R12      R13      R14      R15
00000005 00000002 05849890 05856de0 00000000 00000000 00000000 00000000
     R16      R17      R18      R19      R20      R21      R22      R23
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
     R24      R25      R26      R27      R28      R29      R30      R31
00000000 00000000 0000006b 027e02ac 00000000 069f8798 00000002 aaaaaaaa

Exception type: Data Access
Task Name "PP"
  KrnlSt 0, IntCnt 0, TskLckCnt 0, DAR 0xaaaaaaf2, PC 0x00d86a44, SP 0x05856d90
- Exception Stack Trace
  + PC 0x00ec68c0
  + PC 0x00ec6ba8                           
  + PC 0x00ebf3c8
  + PC 0x00eb8388
  + PC 0x00d88fd4
= Total 204 Bytes =

Can you help me with this message?
I think this is the problem: we lose the stack and both units fight to be base.
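
For anyone who wants to pull the key fields out of a dump like this before sending it to support, here is a minimal Python sketch. It is an illustration only: `summarize_exception` is a made-up helper name, the patterns are written against the exact text pasted above and may not match other releases, and the register table is left out of the sample for brevity. One thing it makes easy to spot is that the saved build is 6.1.0.7 while the stack table in the first post reports v6.2.0.200, which suggests (but does not prove) that this exception was recorded under the older software.

```python
import re

# Excerpt of the dump above (register table omitted).
SAMPLE = """\
bld version: 6.1.0.7 time: (20/May/09 16:04:10) view: (productio)
sysUpTime: 132 days, 22:56:28
Exception type: Data Access
Task Name "PP"
  KrnlSt 0, IntCnt 0, TskLckCnt 0, DAR 0xaaaaaaf2, PC 0x00d86a44, SP 0x05856d90
- Exception Stack Trace
  + PC 0x00ec68c0
  + PC 0x00ec6ba8
  + PC 0x00ebf3c8
  + PC 0x00eb8388
  + PC 0x00d88fd4
"""


def summarize_exception(text):
    """Extract build, uptime, exception type, task, and PC values from a dump."""

    def first(pattern):
        # Return the first captured group, or None if the field is missing.
        match = re.search(pattern, text, re.MULTILINE)
        return match.group(1) if match else None

    return {
        "build": first(r"^bld version:\s*(\S+)"),
        "uptime": first(r"^sysUpTime:\s*(.+?)\s*$"),
        "type": first(r"^Exception type:\s*(.+?)\s*$"),
        "task": first(r'^Task Name\s+"([^"]*)"'),
        "fault_pc": first(r",\s*PC\s+(0x[0-9a-fA-F]+)"),
        "trace": re.findall(r"^\s*\+\s*PC\s+(0x[0-9a-fA-F]+)", text, re.MULTILINE),
    }


summary = summarize_exception(SAMPLE)
running = "6.2.0.200"  # version shown in the stack table in the first post
summary["saved_under_running_build"] = summary["build"] == running

for key, value in summary.items():
    print(f"{key}: {value}")
```

The addresses in the trace only mean something to the vendor's developers, who have the symbols for that build, so the summary is a convenience for the ticket and not a diagnosis.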

Regards.

Offline stauftm

  • Full Member
  • ***
  • Posts: 69
Re: Stack problem after main unit boot
« Reply #5 on: November 23, 2011, 03:34:56 PM »
Hi gomeiban

Take a look at my post; you'll see that I was going through something similar with a stack of 5530s that were also running in our core:

http://forums.networkinfrastructure.info/nortel-ethernet-switching/5530-base-unit-rebooting/

I was also getting SW exceptions, but mine happened much more frequently. Sometimes it would reboot within a day, or go as long as a week without a reboot. I work in healthcare and we are open 24x7 so needless to say this was not a fun time troubleshooting.

If you have warranty, it's definitely worth a unit replacement along with the cascade cable(s). Getting to the latest SW and FW is always worthwhile too.

I've had a couple of units get SW exceptions like the one you are seeing, and now we don't even think twice: we replace the units.

Thanks,

Todd

Offline Dominik

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 662
Re: Stack problem after main unit boot
« Reply #6 on: November 24, 2011, 04:35:38 AM »
Hi gomeiban,

I also had a stacking problem, but in my case there were no entries in show system last-exception.

Since you do have output in the last-exception, you can send it to Avaya support.
With the last-exception entries, the developers can figure out what went wrong on your device and can make a fix for this particular bug.

Good Luck
It's always the network...

Offline TankII

  • Jr. Member
  • **
  • Posts: 49
Re: Stack problem after main unit boot
« Reply #7 on: December 01, 2011, 08:11:18 AM »
I had this issue - once.
Now, when I build a fresh stack, I make sure units 1 and 2 are sorted by MAC. The newest MAC gets to be the Base unit.
Stacking starts with the switch setting, then elects by MAC address if the switches are not set.

The others in the stack don't matter as much as the top two units.

In the BPS days, the HW version also mattered. If you had adjacent units that were more than 4 HW revs apart, it would cause a problem after a reboot of any kind. Say you had an HW03 unit in the field, added an HW05 unit later (stack of two), then added an HW11 unit: upon reboot, the HW11 unit would try to assert itself as master, even though it was unit 3 in the stack. Lots of flashing Master lights!

TankII

Offline KT

  • Full Member
  • ***
  • Posts: 67
Re: Stack problem after main unit boot
« Reply #8 on: December 01, 2011, 09:15:10 AM »
Why don't they make our lives easy! Good to learn, and I have had no problems with our customers' existing stacks.

Cheers!