• May 21, 2012, 08:07:00 AM

Author Topic: Error messages at the console of a 4500 switch  (Read 693 times)


Offline normski

  • Jr. Member
  • **
  • Posts: 36
Error messages at the console of a 4500 switch
« on: November 16, 2011, 11:51:49 AM »
Hi

I've logged into the console of one of our 4500 switches and got loads of these messages:

_soc_xgs3_mem_dma: failed(NAK)
_soc_xgs3_mem_dma: unit 1 mem L2_ENTRY.ipipe0 index 4864-4991 buffer 0x1bf7438
_soc_xgs3_mem_dma: failed(NAK)
_soc_xgs3_mem_dma: unit 1 mem L2_ENTRY.ipipe0 index 4864-4991 buffer 0x1bf7438
_soc_xgs3_mem_dma: failed(NAK)
_soc_xgs3_mem_dma: unit 1 mem L2_ENTRY.ipipe0 index 4864-4991 buffer 0x1bf7438
_soc_xgs3_mem_dma: failed(NAK)
_soc_xgs3_mem_dma: unit 1 mem L2_ENTRY.ipipe0 index 4864-4991 buffer 0x1bf7438
_soc_xgs3_mem_dma: failed(NAK)

It's part of a 2-unit stack. When I plug into the other unit I get a normal prompt.

Any ideas?

Thanks

Normski

 
« Last Edit: November 16, 2011, 06:32:49 PM by normski »
I'd much rather be hillwalking.


Offline Jon Hurtt

  • Sr. Member
  • ****
  • Posts: 125
Re: Error messages at the console of a 4500 switch
« Reply #1 on: November 16, 2011, 01:03:55 PM »
Sounds like an RMA to me... I would contact Avaya Support and start the RMA process. (It looks like something is wrong with memory allocation, but who knows.)

Offline Michael McNamara

  • Administrator
  • Hero Member
  • *****
  • Posts: 2503
    • Michael McNamara
Re: Error messages at the console of a 4500 switch
« Reply #2 on: November 16, 2011, 07:17:23 PM »
As Jon suggests, you should probably contact Avaya (if you have a support contract), but you could certainly try a cold boot to see if the problem is an operational one (a bug, potentially a memory leak).

I would advise you to be careful though: if it's not a bug but a hardware issue, the switch might not pass its POST tests and might fail to boot, leaving you with the task of replacing it immediately if you want to restore service.

Cheers!
We've been helping network engineers, system administrators and technology professionals since June 2009.
If you've found this site useful or helpful, please help me spread the word. Link to us in your blog or homepage - Thanks!

Offline normski

  • Jr. Member
  • **
  • Posts: 36
Re: Error messages at the console of a 4500 switch
« Reply #3 on: November 17, 2011, 08:35:38 AM »
Thanks for the feedback

I've contacted my support provider, who said to do the following in roughly this order:

reboot the switch;
reset the config then reapply;
upgrade to 5.5;

I am concerned that the switch might not come back properly, so it's happening during our at-risk period next week.

I googled bits of the message and came up with this from a Force10 list

https://puck.nether.net/pipermail/force10-nsp/2011-January/000123.html

I'm wondering if Force10 and Avaya get their ASICs/memory from the same supplier, or perhaps they share code?

Normski

Offline Jon Hurtt

  • Sr. Member
  • ****
  • Posts: 125
Re: Error messages at the console of a 4500 switch
« Reply #4 on: November 17, 2011, 10:30:41 AM »
Most vendors get their ASICs from the same provider. I wouldn't doubt that Force10 and Avaya use similar, if not the same, chips. I would agree with the support suggestions... if all of those fail to solve the problem, demand a replacement. They want to make sure it's not a software or faulty-config issue before replacing hardware... good luck.

Offline Michael McNamara

  • Administrator
  • Hero Member
  • *****
  • Posts: 2503
    • Michael McNamara
Re: Error messages at the console of a 4500 switch
« Reply #5 on: November 17, 2011, 09:01:33 PM »
reboot the switch;
reset the config then reapply;
upgrade to 5.5;

Yes, reboot the switch...

Ugh, reapply the configuration? Only in the most dire of situations. I've only had to do this myself maybe once or twice, and only when it became clear the configuration was corrupt, and only with Ethernet Switch 470s in a stack, never the ERS 5000 series.

Yes, upgrade if available...

Good Luck!

Offline normski

  • Jr. Member
  • **
  • Posts: 36
Re: Error messages at the console of a 4500 switch
« Reply #6 on: November 29, 2011, 04:08:35 AM »
Hi

Rebooted the switch in question and the console messages went away.
For the record I was investigating one way calls between stacks - that problem went away too.

Thanks

Normski

Online bylie

  • Sr. Member
  • ****
  • Posts: 120
Re: Error messages at the console of a 4500 switch
« Reply #7 on: November 29, 2011, 09:12:16 AM »
We're having problems which might be similar to what you're seeing, although in our case it's not on our ERS 4500s but on our ERS 2500s. So far 6 ERS 2500s have been spewing the following error message on their serial console (nothing in the logs or when we connected through telnet/SSH, if we could still connect, that is):

Code:
...
Tucana MMU DRAM CG0.M1 CRC error at 0x000010ab
Tucana MMU DRAM CG0.M1 CRC error at 0x00001102
Tucana MMU DRAM CG0.M0 CRC error at 0x00000ff5
...

We RMA'ed all 6 of them and are in the process of getting fresh ones from Avaya. The frustrating part is that we cannot check for this remotely: we essentially have to wait until connectivity to the switch dies (and our monitoring system notifies us) or until the users behind the switch start complaining about (very) bad network performance. My guess is that Avaya got a bad batch of DRAM chips from a third-party vendor and that some of those switches are (slowly) dying and exhibiting this behaviour (we currently have 70+ ERS 2500s on our network, so I'm really keeping my fingers crossed that we're not facing a massive swap operation).
A couple of months ago, when I first saw this, I couldn't make any sense of it either, but we needed those ports immediately, so I just tried a reboot. That seemed to clear the error messages (although the switch failed the internal loopback test on rebooting), and I left it at that because we already had plans to clean out that wiring closet. A couple of weeks ago we did the makeover of that wiring closet, swapped out the bad switches (2 of them, which still failed the internal loopback test) and started an RMA for them. So in hindsight these switches actually still worked (kinda, I guess) and we never received any more complaints, but of course we kept a close eye on them.
We currently test for this in two steps. First we reboot the switch and see whether any of the POST checks fail. If that is all OK but we still suspect the switch, I create a loop on it (in a closed lab environment) to easily generate some traffic, after which some of them again started to spew the above error messages on their serial console (so it might also be traffic and/or buffer/queue related).

What I'm getting at with all of this is that, if I were you, I'd keep a close eye on that switch and, if there is any doubt, just replace and RMA it. This networking stuff is already hard enough sometimes, we really don't need flaky hardware making it even harder ;).
« Last Edit: November 29, 2011, 09:59:48 AM by bylie »

Offline normski

  • Jr. Member
  • **
  • Posts: 36
Re: Error messages at the console of a 4500 switch
« Reply #8 on: November 29, 2011, 12:12:18 PM »
Hi

Thanks for the heads up. The reboot has worked in this case, so I'm leaving it for now. Besides, that's what our support provider has recommended. And I can only get an RMA if I go through their procedures, which to be fair is common-sense troubleshooting.

What worries me more is that the only way to see errors like these is to connect to the serial port on the switch itself. We had an issue in the summer involving wireless and DHCP which may or may not have been caused by this switch. I could have saved myself a few more grey hairs if I had just logged into the console...

Who knows!!!

Normski

Online bylie

  • Sr. Member
  • ****
  • Posts: 120
Re: Error messages at the console of a 4500 switch
« Reply #9 on: November 29, 2011, 05:37:48 PM »
...
What worries me more is that the only way to see errors like these is to connect to the serial port on the switch itself.
...

I totally agree that this makes it hard and frustrating to troubleshoot! I'm left wondering whether the running SW is "aware" of these hardware errors, or at least receives some kind of "message" when they happen. The fact that they only show on the serial console gives me the impression that these error messages do not come from the running SW itself but from something much more low-level. Otherwise it would seem trivial to me to also log them using the normal logging facilities. Maybe this is comparable to the way the kernel sometimes spews messages onto the user's console in the various *nixes, so in this case it might be more of a VxWorks-level event?
« Last Edit: November 29, 2011, 05:39:22 PM by bylie »
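
Since both kinds of message in this thread only appear on the serial console, one workaround is to capture the console output somewhere and scan it for the known signatures. Below is a minimal Python sketch of that idea. It assumes the console text has already been captured line by line (for example by a terminal server or a logging terminal emulator, neither of which is described in this thread). The regular expressions are taken only from the messages quoted above, and the function name `scan_console_log` and the summary labels are made up for illustration.

```python
import re
from collections import Counter

# Signatures copied from the console output quoted in this thread.
DMA_DETAIL = re.compile(
    r"_soc_xgs3_mem_dma: unit (?P<unit>\d+) mem (?P<mem>\S+) "
    r"index (?P<first>\d+)-(?P<last>\d+) buffer (?P<buffer>0x[0-9a-fA-F]+)"
)
DMA_FAIL = re.compile(r"_soc_xgs3_mem_dma: failed\((?P<reason>\w+)\)")
CRC_ERROR = re.compile(
    r"Tucana MMU DRAM (?P<bank>\S+) CRC error at (?P<addr>0x[0-9a-fA-F]+)"
)


def scan_console_log(lines):
    """Count known hardware-error signatures in captured console output.

    `lines` is any iterable of strings. The keys of the returned Counter
    are hypothetical summary labels, not anything the switch prints.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if m := DMA_DETAIL.search(line):
            counts[f"dma unit {m['unit']} {m['mem']} {m['first']}-{m['last']}"] += 1
        elif m := DMA_FAIL.search(line):
            counts[f"dma failed({m['reason']})"] += 1
        elif m := CRC_ERROR.search(line):
            counts[f"crc {m['bank']}"] += 1
    return counts


# Sample input built from the lines posted in this thread.
sample = """\
_soc_xgs3_mem_dma: failed(NAK)
_soc_xgs3_mem_dma: unit 1 mem L2_ENTRY.ipipe0 index 4864-4991 buffer 0x1bf7438
Tucana MMU DRAM CG0.M1 CRC error at 0x000010ab
Tucana MMU DRAM CG0.M1 CRC error at 0x00001102
Tucana MMU DRAM CG0.M0 CRC error at 0x00000ff5
""".splitlines()

for signature, count in scan_console_log(sample).items():
    print(count, signature)
```

A non-empty result could then be fed into whatever monitoring system is already in place. This only helps if something is attached to the serial port permanently, which is exactly the limitation discussed above.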