• May 22, 2012, 08:42:19 PM

Author Topic: Packet loss running SMLT, resolved by adding host route to other default GW... HELP


Offline Phil Crawford

  • Rookie
  • **
  • Posts: 9
Hi All,
I am new to this forum, although I have used it many times for very useful info and assistance.
We support a medium-sized network running 2 x 8600 (single CPU each, joined by an IST) with SMLT to roughly 15 cabs, each holding a 5520 stack of 2 to 6 switches.
Latest SW revisions on all switches.

The problem we have been getting for roughly a year now is that a specific host loses its connection to a server or another host on the network; the easiest way to demonstrate it is with ICMP. We check which core owns the default GW, and it is Core2 (.254). If I add a host route pointing to .253 (Core1), ICMP responds. Take the host route out and ICMP fails. Put it back in and it works; take it out and it doesn't.
After a period, the ICMP responses come back with no intervention at all. The affected traffic could be voice, an informational display screen that stops working, standard host network access, etc. It is very intermittent.
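
To make the add/remove host route test repeatable, the two commands can be generated rather than typed by hand. This is only a minimal sketch: the function name is hypothetical, the addresses in the usage note are documentation placeholders (the thread gives only the last octets .253 and .254), and it assumes the test host runs Linux (`ip route`) or Windows (`route`).

```python
import ipaddress


def host_route_commands(dest, gateway, platform="linux"):
    """Return (add_cmd, delete_cmd) for a /32 host route to dest via gateway.

    Nothing is executed here; the strings are meant to be pasted into a
    shell on the affected host while a ping to dest is running.
    """
    # IPv4Address raises ValueError on malformed input, so typos fail early.
    dest_ip = ipaddress.IPv4Address(dest)
    gw_ip = ipaddress.IPv4Address(gateway)
    if platform == "linux":
        return (f"ip route add {dest_ip}/32 via {gw_ip}",
                f"ip route del {dest_ip}/32")
    if platform == "windows":
        return (f"route add {dest_ip} mask 255.255.255.255 {gw_ip}",
                f"route delete {dest_ip}")
    raise ValueError(f"unsupported platform: {platform}")
```

For example, `host_route_commands("192.0.2.10", "192.0.2.253")` yields the pair of Linux commands for a placeholder server at 192.0.2.10 reached via a placeholder Core1 address.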

The customer's usual quick fix is to disconnect one of the SMLT links from the edge or core, which forces all traffic down the other route.
Unfortunately, we have not had time to gather show tech or FDB/ARP tables while the problem is occurring, because the customer pulls the link to get a quick resolution in place. This customer is a major airport, and they are considering ripping out the Avaya network and replacing it with Cisco.

I had a dig around and can see something similar happening on earlier versions of code, but I will not post it now so as not to steer thoughts down that route. I really need fresh expert eyes to confirm or deny my thoughts.
Avaya have been pretty useless in support, but then I suppose they need outputs from when the problem is occurring. Numerous TAC cases have been opened previously.

I look forward to your input.
Please let me know if you require further info.

Phil


Offline Dominik

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 662
Hi Phil and welcome to the Forum,

Does your problem show up at L2 and L3?
Can you reach your ERS8600 during the problem if you are in the same subnet?
Do you have the problem if you are directly connected to one of the ERS8600s?

Which SW do you run on your ERS5520 and ERS8600?
If I understand you right, the problem can be fixed by disabling one of the uplinks of your edge switches, is that correct?
Does the problem show up on all edge switches or only on some particular switches?


Cheers
It's always the network...

Offline Phil Crawford

  • Rookie
  • **
  • Posts: 9
OK, to start: the issue seems to be at layer 3, as connecting to anything at layer 2 never seems to be an issue.
Any subnet could be affected in theory, although we have only really noticed this on a handful of them.

Software:
8600 - 5.1.5.1
Baystacks - 6.2.1

We have no access ports on the 8600s, so we have not tested directly on them.
If we disable or disconnect one of the SMLT links from the stack, all traffic is forced down one route and the issue is resolved.
The problem seems more apparent in particular areas, but these areas are more densely populated or at least carry more traffic, so that could be a red herring.
If I remember rightly, we can still get to the 8600s.

I suspect an FDB/SMLT/VRRP issue, perhaps even timers.

Cheers

Offline Michael McNamara

  • Administrator
  • Hero Member
  • *****
  • Posts: 2517
    • Michael McNamara
Hi Phil and welcome to the forums!

In general, 5.1.5.1 software for the ERS 8600 is known to be very stable and has few issues, if any at all. On the other hand, 6.2.1 software for the ERS 5520 stacks is another story altogether.

Have you reviewed the design and configuration of the entire network?

Are you using VRRP or RSMLT? If you are using VRRP are you using VRRP Backup Master?

I'm assuming from your description above that when the problem occurs the edge device can not ping the default gateway, is that correct?

You really need additional diagnostic output from both ERS 8600 switches in order to move the troubleshooting process forward.

ERS 8600
show config
show tech
show sys topology


ERS 5520
show running-config
show autotopology nnm-table


If you provide that information to Avaya along with the MAC address of the edge device they should be able to verify the MAC/FDB and ARP table entries (assuming you don't do it yourself without calling them).

If left alone does the problem resolve itself after 5 minutes (300 seconds - default MAC/FDB aging timer)?
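
One way to answer the 300-second question without anyone watching the console is to log periodic ping results and measure how long each outage lasts. The following is a rough sketch under stated assumptions: the sample format (a timestamp in seconds plus a reply flag), the function names, and the 15-second tolerance are all hypothetical. A recurring outage length close to 300 seconds would be consistent with a stale entry aging out, but it would not prove it.

```python
FDB_AGING_SECONDS = 300  # default MAC/FDB aging timer quoted above


def outage_durations(samples):
    """Return [(start, duration)] for every outage that ended in recovery.

    samples: iterable of (timestamp_seconds, reply_received) in time order.
    An outage starts at the first missed reply and ends at the next reply.
    An outage still open at the end of the samples is not reported.
    """
    outages = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                      # first missed reply
        elif ok and start is not None:
            outages.append((start, ts - start))
            start = None                    # recovered
    return outages


def near_aging_timer(duration, tolerance=15):
    """True if an outage length is within tolerance of the aging timer."""
    return abs(duration - FDB_AGING_SECONDS) <= tolerance
```

The measured duration can be no finer than the ping interval, so the tolerance should be at least that large.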

Good Luck!
We've been helping network engineers, system administrators and technology professionals since June 2009.
If you've found this site useful or helpful, please help me spread the word. Link to us in your blog or homepage - Thanks!

Offline Phil Crawford

  • Rookie
  • **
  • Posts: 9
Hi Michael,
We are running VRRP.
I have sent all these outputs multiple times, but not while the issue was present; as I say, the customer usually pulls or disables a link to get a quick resolution.

I was looking at a previous post regarding FDB and ARP timers and increasing the FDB timer to +1 second.
The thing is, I'm not sure whether the problem resolves after 5 minutes or not.

The following post looks very similar, although it points to a software issue:
http://blog.michaelfmcnamara.com/2008/07/ers-8600-software-4160-buggy/

We are running Backup Master too.
We are inclined to change the timers, but I would have thought all this would be cleared up in the software?

Also, what potential issues do we have with the 5520 SW?
We upgraded on advice from Avaya.

Cheers

Offline Michael McNamara

  • Administrator
  • Hero Member
  • *****
  • Posts: 2517
    • Michael McNamara
The issues in 4.1.6.x have long been resolved, although I did observe some issues in 4.1.8.x. I have not yet seen any issues in 5.1.5.1 software similar to those MAC/FDB and ARP issues when running in an IST/SMLT configuration.

How many VLANs do you have? Are you using unique VRRP IDs?

I would suspect some other issue might be at play here... seen it before but as you allude to it can be difficult to track down. I was able to use a CentOS Linux server to script the collection of data. You might want to try that if you have access to a Linux server on the network that has 'Expect' and a few other Unix utilities.
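
The scripted collection described above was done with Expect; the outline below is a Python sketch of the same idea, not that script. The command lists are the ones given earlier in this thread. Everything else is an assumption: `collect`, the device names, and especially `run_command`, which stands in for whatever Telnet/SSH/Expect transport is actually available and is supplied by the caller.

```python
import time

# Diagnostic commands listed earlier in the thread, per device type.
DIAG_COMMANDS = {
    "ers8600": ["show config", "show tech", "show sys topology"],
    "ers5520": ["show running-config", "show autotopology nnm-table"],
}


def collect(devices, run_command, now=time.time):
    """Run every diagnostic command on every device and timestamp the output.

    devices:     list of (name, device_type) pairs, device_type being a key
                 of DIAG_COMMANDS.
    run_command: hypothetical callable (name, command) -> output string;
                 the real login and session handling lives there.
    now:         clock function, injectable so the sketch can be tested.
    """
    report = []
    for name, dev_type in devices:
        for cmd in DIAG_COMMANDS[dev_type]:
            report.append({
                "time": now(),
                "device": name,
                "command": cmd,
                "output": run_command(name, cmd),
            })
    return report
```

Triggering `collect` from a ping monitor the moment replies stop would capture the state before anyone pulls a link.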

As a start to troubleshooting I would advise a reboot of the ERS 8600 core. In the past the issues with 4.1.6.x and 4.1.8.x were generally seen after a new VLAN was created or configuration changed. A reboot of the ERS 8600 core (one at a time is fine) usually resolved the issue for any existing VLANs until you altered the configuration again.

Good Luck!

Offline Phil Crawford

  • Rookie
  • **
  • Posts: 9
Roughly 30 VLANs, pretty much all using VRRP with unique VRIDs.
We have agreed with the client that they will get outputs for us when the problem recurs, so hopefully we will get some valid data.
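
With around 30 VLANs, the unique-VRID claim is easy to double-check mechanically. A small sketch, assuming the VLAN-to-VRID mapping has been copied by hand from the configs into a dict; the function name and the sample numbers are hypothetical, and no real switch output is parsed.

```python
from collections import defaultdict


def duplicate_vrids(vlan_to_vrid):
    """Return {vrid: [vlans]} for every VRID used by more than one VLAN."""
    by_vrid = defaultdict(list)
    for vlan, vrid in vlan_to_vrid.items():
        by_vrid[vrid].append(vlan)
    return {vrid: sorted(vlans)
            for vrid, vlans in by_vrid.items() if len(vlans) > 1}
```

An empty result means every VLAN in the mapping has its own VRID.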

I'm surprised Avaya have not seen this before somewhere; we surely cannot be the only support company seeing this. Another client of ours, a large hospital, has experienced this too. It was not on such a scale, but they know what outputs to provide us if the issue occurs again. We have raised this with TAC previously.

Cheers

Offline Dominik

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 662
I haven't seen a problem like the one you describe on an ERS8600 with SW 5.1.5.1.

Did Avaya support give you any hints about what the issue could be related to?

A possible step might be to use RSMLT instead of VRRP.
RSMLT can give you the same as VRRP and is very stable on SW 5.1.5.1.
At least it cannot make your situation worse than it is.

Cheers
It's always the network...