Author Topic: Avaya 4548GT unresponsive - why?  (Read 5941 times)


Offline dstegall

  • Rookie
  • **
  • Posts: 18
Avaya 4548GT unresponsive - why?
« on: May 23, 2013, 12:57:52 PM »
Hello.

Hoping you guys might be able to help me solve a problem.  A switch in a 3-stack became unresponsive this morning, and I'm unable to figure out what happened and why.

Some background
I have a 3-stack of 4548GTs that has been in operation for years.  Rock solid performance for the most part.  Currently running FW: 5.3.0.3 and SW: v5.3.0.008, so a bit out of date.  Switch #1, the top of the stack physically, is the base.  Port 48 on switch 1 and port 48 on switch 2 are in an MLT to a pair of 8606s.  Switch 1 is about 50% utilized, port-wise.  The clients attached to it are mostly a mix of run-of-the-mill Windows 7 and Mac OS computers.

The problem
This morning all the users connected to switch 1 lost their network connection.  No connectivity, could not even ping their local gateway, or other devices on the same VLAN.

Investigation
All three switches were powered, the cascade cables were in place and secured, and there had been no recent configuration changes.  Port lights on switch 1 were blinking normally, so it appeared that traffic was moving.  I could ping the switch's management IP, and both 8606s saw a good SMLT link to the stack.  Neither uplink port showed any port errors, discards, etc.

However, I could not get a console connection on switch 1.

Getting a console on switch 2 and running "show stack-info" revealed switch 1 was missing.  The stack was now only two members, switches 2 and 3.  "show log sort-reverse" had nothing that appeared relevant, except some old messages about being unable to contact an NTP server.  A quick look at spanning tree revealed no ports in blocking mode, though I'm far from an expert there so maybe I missed something.  With nothing else to go on and pressure from users to restore service, I did what all inexperienced admins do and rebooted the entire stack.

This worked. Whatever the problem was with switch 1 was cleared.  I was able to get on switch 1's console and we were back in business.

So, my question is - what happened? 

What things should I be looking at to figure out why anyone connected to switch 1 had no network connectivity for a brief period?


Offline Michael McNamara

  • Administrator
  • Hero Member
  • *****
  • Posts: 3842
    • michaelfmcnamara
    • Michael McNamara
Re: Avaya 4548GT unresponsive - why?
« Reply #1 on: May 23, 2013, 06:33:28 PM »
Were the stack lights all green on the front of the switches?
Was the base/master switch blinking?

You most likely experienced a software bug which caused the 1st switch to fall out of the stack and caused the 2nd switch to be elected the temporary base.

Good Luck!

Offline dstegall

  • Rookie
  • **
  • Posts: 18
Re: Avaya 4548GT unresponsive - why?
« Reply #2 on: May 23, 2013, 06:41:07 PM »
Hi!  Thanks for the reply.

The stack lights were not all green.  The UP stack light on switch 1 was amber, which initially made me think one of the cascade cables was loose or damaged.  I don't recall if base/master was blinking.  I don't think so, but I'm not sure.

Assuming it was a software bug, what kind of conditions might trigger it?  Something from a client/host? 

I've been monitoring the stack post reboot all day and it seems fine.  I have to write a post-mortem on the outage though, and I'm kind of stuck writing "I don't know what happened" at this point, which stinks. 

If it was a software bug, would there be any evidence that might support that?  A log somewhere?  A counter?  Anything?

Thanks for all the help!  I have about 70 4548s and this is the first time anything like this has ever happened.

Offline dstegall

  • Rookie
  • **
  • Posts: 18
Re: Avaya 4548GT unresponsive - why?
« Reply #3 on: May 23, 2013, 06:42:41 PM »
Also, I'd be interested in how any of you more experienced folks would have handled this.  What would you have done?  What commands would you use to ascertain the problem? 

Offline cacids

  • Rookie
  • **
  • Posts: 21
Re: Avaya 4548GT unresponsive - why?
« Reply #4 on: May 23, 2013, 09:44:07 PM »
hi dstegall,

I wouldn't be surprised if the Nortel logs don't show any helpful information. A long time ago we had an issue with our Nortel gear: it didn't generate any alerts, but the neighboring Cisco device generated an alert that the Nortel was down. I think you did the right thing by rebooting; that's the most common remedy for Nortel devices. However, it's a good idea to list your current software versions in a spreadsheet and to study the release notes and Michael's suggestion. List the known bugs in your current software, with a view to upgrading to the latest version. Then you can raise the potential risk of the current firmware with management and ask whether they will allow an upgrade. This won't guarantee that this kind of issue never occurs again, but it should give some relief from existing software bugs.  :D
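The spreadsheet idea could be sketched roughly like this in Python. This is only a minimal sketch under assumptions: the CSV layout, the column names (`name`, `sw_version`), the switch names, and the second version string are all hypothetical, and only `v5.3.0.008` comes from this thread. It assumes version strings compare sensibly as dotted numbers.

```python
import csv
import io
import re


def parse_version(text):
    """Turn a string like 'v5.3.0.008' into a tuple of ints: (5, 3, 0, 8)."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def needs_upgrade(current, target):
    """Return True when the current version sorts below the target version."""
    cur, tgt = parse_version(current), parse_version(target)
    # Pad the shorter tuple with zeros so (5, 6) compares as (5, 6, 0, 0).
    width = max(len(cur), len(tgt))
    cur += (0,) * (width - len(cur))
    tgt += (0,) * (width - len(tgt))
    return cur < tgt


def flag_outdated(csv_text, target):
    """Return names of switches whose 'sw_version' column is below target."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["name"] for row in reader
            if needs_upgrade(row["sw_version"], target)]


# Hypothetical inventory export; names and the second version are made up.
INVENTORY = """name,sw_version
stack-a,v5.3.0.008
stack-b,v5.6.0.010
"""

if __name__ == "__main__":
    print(flag_outdated(INVENTORY, "v5.6"))
```

With an export of all your stacks in that shape, the flagged list is a starting point for the risk write-up to management; it says nothing about which bugs a given release actually fixes, so the release notes still need reading.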

We are running v5.5, and here are the release notes (check out what they have fixed in v5.3), though I'm sure a later version is available.
http://downloads.avaya.com/css/P8/documents/100136935


Update: Out of interest, they have released v5.6 and it looks good. Check out the release notes and presentation:
http://downloads.avaya.com/css/P8/documents/100153547

www.catalysttelecom.com/medialibrary/.../avaya/avaya-ers_4800_v5.ppt



cheers,
« Last Edit: May 23, 2013, 10:04:30 PM by cacids »

Offline Johan Witters

  • Sr. Member
  • ****
  • Posts: 252
    • BKM Networks
Re: Avaya 4548GT unresponsive - why?
« Reply #5 on: May 27, 2013, 07:18:04 AM »
As the serial connection gave no sign of life, it sounds like the control process died.

You might want to try "show system last exception unit all" to see whether the unit registered a software exception. It's possible the switch doesn't accept the command on the release you are running, though.

I'm afraid the stack reboot will have cleared most of the logging, but you can always check whether something was recorded in NVRAM.
Kind regards,

Johan Witters

Network Engineer
BKM NV

Offline dstegall

  • Rookie
  • **
  • Posts: 18
Re: Avaya 4548GT unresponsive - why?
« Reply #6 on: June 07, 2013, 12:54:17 PM »
Thanks everyone for the replies.  I haven't been able to determine the cause, but thankfully the problem has not returned.  While I'd very much like to know what caused it, I'm OK with it being a one-off thing that I'm unlikely to see again.