
Topic: ERS 8600 LACP trouble with Cisco UCS fabric extenders


Offline dstegall

  • Rookie
  • **
  • Posts: 8
ERS 8600 LACP trouble with Cisco UCS fabric extenders
« on: January 20, 2012, 06:34:16 PM »
Hello,

Hoping someone can help me troubleshoot an issue I'm having with LACP and Cisco's UCS.  I am trying to create uplink ports between our Cisco UCS fabric interconnects and our ERS 8600 (code 5.1.2.0) using LACP.  I'm experiencing significant, but random, packet loss on these links.

Here is the config I am using on the ERS 8600 side:

Core1:5# config ethernet 1/12,1/14,2/11,2/13 perform-tagging enable
Core1:5# config ethernet 1/12,1/14,2/11,2/13 lacp key 16
Core1:5# config ethernet 1/12,1/14,2/11,2/13 lacp aggregation true
Core1:5# config ethernet 1/12,1/14,2/11,2/13 lacp timeout short
Core1:5# conf mlt 16 create name "UCS FAB-A LACP LAG"
Core1:5# conf ethernet 1/12,1/14,2/11,2/13 lacp enable
Core1:5# conf mlt 16 lacp enable

Core1:5# config ethernet 1/24,1/26,1/23,1/25 perform-tagging enable
Core1:5# config ethernet 1/24,1/26,1/23,1/25 lacp key 17
Core1:5# config ethernet 1/24,1/26,1/23,1/25 lacp aggregation true
Core1:5# config ethernet 1/24,1/26,1/23,1/25 lacp timeout short
Core1:5# conf mlt 17 create name "UCS FAB-B LACP LAG"
Core1:5# conf ethernet 1/24,1/26,1/23,1/25 lacp enable
Core1:5# conf mlt 17 lacp enable

Core1:5# conf lacp enable

Unfortunately, I can't paste a config from UCS, but the uplink ports are in a LACP bundle using all UCS defaults.

Anyone see any problems with the ERS side?  I think you can create LAGs without specifying an MLT by just specifying the same adminkey on each port in the LAG, but that didn't seem to work at all.
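For readers who want to reproduce the per-LAG command sequence above for other port groups, here is a minimal Python sketch that only formats the same CLI lines shown in this post. The function name `lag_commands` and its parameters are hypothetical helpers, not part of any Avaya tool, and the sketch does not validate the commands against the switch.

```python
def lag_commands(ports, key, mlt_id, name, timeout):
    """Return the ERS 8600 CLI lines for one LACP LAG, in the order posted above.

    ports   -- list of slot/port strings, e.g. ["1/12", "1/14"]
    key     -- LACP admin key assigned to every port in the LAG
    mlt_id  -- MLT ID (the post uses the same number as the key)
    timeout -- "short" or "long"
    """
    if timeout not in ("short", "long"):
        raise ValueError("timeout must be 'short' or 'long'")
    port_list = ",".join(ports)
    return [
        f"config ethernet {port_list} perform-tagging enable",
        f"config ethernet {port_list} lacp key {key}",
        f"config ethernet {port_list} lacp aggregation true",
        f"config ethernet {port_list} lacp timeout {timeout}",
        f'config mlt {mlt_id} create name "{name}"',
        f"config ethernet {port_list} lacp enable",
        f"config mlt {mlt_id} lacp enable",
    ]


if __name__ == "__main__":
    fab_a = lag_commands(["1/12", "1/14", "2/11", "2/13"], 16, 16,
                         "UCS FAB-A LACP LAG", "short")
    for line in fab_a:
        print(line)
```

Calling it a second time with key 17 and the FAB-B ports would produce the second block of commands; the global `conf lacp enable` line is left out because it is issued once per switch, not per LAG.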

Any help appreciated.


Offline Michael McNamara

  • Administrator
  • Hero Member
  • *****
  • Posts: 2503
    • Michael McNamara
Re: ERS 8600 LACP trouble with Cisco UCS fabric extenders
« Reply #1 on: January 20, 2012, 10:27:44 PM »
Hi dstegall and welcome to the forums!

You have a single ERS 8600... you are trying to create two separate LACP trunks?
Are you not trying to create a single LACP trunk (group)?

I'm not sure that Cisco supports LACP short timers; you should try configuring long timers instead.

Good Luck!
« Last Edit: January 23, 2012, 05:37:18 PM by Michael McNamara »
We've been helping network engineers, system administrators and technology professionals since June 2009.
If you've found this site useful or helpful, please help me spread the word. Link to us in your blog or homepage - Thanks!

Offline dstegall

  • Rookie
  • **
  • Posts: 8
Re: ERS 8600 LACP trouble with Cisco UCS fabric extenders
« Reply #2 on: January 20, 2012, 11:02:37 PM »
Hi Michael,

Thanks for the reply!  I really appreciate it!

See the attached drawing. It might help explain what I'm trying to do better than words.  Basically, we have two Cisco fabric extenders and I want to create two separate LACP LAGs between them and one of our 8600s.  We have two 8600s in an IST pairing, with our IDF stacks in RSMLT links (if that matters here).  The second 8600 is in a remote location and will not be connected to the UCS system.



(EDIT: Just realized that I made an error in my graphic.  Port Channel "A" goes with Extender "A" and same with the "B" side.  I reversed it in my graphic.)

What I was seeing after trying the config I posted above is intermittent, significant packet loss on those LACP uplinks. 

I will try long timers.  I am working remote at the moment and have limited access to my network.  What is the CLI syntax for long timers?  "lacp timeout long"?

BTW, this is a fantastic resource for Nortel/Avaya owners.  Can't believe I just discovered this.  I'm sure you don't remember this, but I believe we've actually spoken before. We were one of the first customers to run 8648 GBRS modules and we had some issues.  You invited me to a beta tester group and were extremely helpful in solving our problem!
« Last Edit: January 20, 2012, 11:23:32 PM by dstegall »

Online Dominik

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 661
Re: ERS 8600 LACP trouble with Cisco UCS fabric extenders
« Reply #3 on: January 23, 2012, 06:58:38 AM »
Hi dstegall,

You can change the LACP timer configuration with this command:

config ethernet x/x lacp timeout <long/short>
>>Sets the time-out value to either long or short for this port.

With regular Cisco IOS Catalyst switches there is no need to change these timers.

Do you have discard untagged frames enabled on your ERS 8600?
If it is enabled, all native VLAN (untagged) packets will show up as drops in the statistics.

You can also try a static link aggregation instead of the LACP trunk.
I'm not sure whether it is possible to configure a static EtherChannel on the Cisco UCS Fabric Extender.

Good Luck
It´s always the network...

Offline dstegall

  • Rookie
  • **
  • Posts: 8
Re: ERS 8600 LACP trouble with Cisco UCS fabric extenders
« Reply #4 on: January 23, 2012, 10:45:13 AM »
Hi Dominik,

Thanks for the reply.  Appreciated.

I do not have Discard Untagged Frames enabled on the ports I wish to group into LAGs.  Can you explain why that might help?

Can you explain why link aggregation might work better than LACP here?  And what doc should I be looking at for sample configs?  I have the ERS 8600 Link aggregation doc, but it's a bit confusing to me.  It lists Link Aggregation as MLT with LACP, which is what I was trying in the config I posted earlier.  Can you elaborate or detail a sample config?

Online Dominik

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 661
Re: ERS 8600 LACP trouble with Cisco UCS fabric extenders
« Reply #5 on: January 23, 2012, 11:14:55 AM »
Take a look at the Avaya / Cisco Interoperability Technical Configuration Guide:

http://support.avaya.com/css/P8/documents/100123888

There you can find a sample configuration. The guide uses a Cisco Catalyst device. Your UCS may behave differently in some respects; I'm not sure about that.

Sometimes you see high dropped-packet counters caused by "discard untagged frames" when the other side is sending untagged packets, which in Cisco terms is native VLAN traffic.
In that case the Cisco native VLAN and the ERS 8600 PVID should be set to the same VLAN ID.
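The interaction between untagged (native VLAN) traffic, the port's PVID, and the discard-untagged option can be sketched as a small decision function. This is a simplified model written for illustration, not the ERS 8600's actual forwarding logic; the function name `classify_ingress` is hypothetical, and VLAN membership checks on tagged frames are deliberately left out.

```python
def classify_ingress(frame_vlan, port_pvid, discard_untagged):
    """Return the VLAN an incoming frame is placed in, or None if it is dropped.

    frame_vlan       -- VLAN ID carried in the 802.1Q tag, or None if untagged
    port_pvid        -- default VLAN ID (PVID) of the receiving port
    discard_untagged -- True if the port is set to discard untagged frames
    """
    if frame_vlan is not None:
        # Tagged frames keep the VLAN from their tag.
        return frame_vlan
    if discard_untagged:
        # Untagged frames are counted as drops on this port.
        return None
    # Untagged frames are assigned to the port's PVID.
    return port_pvid


if __name__ == "__main__":
    # Peer sends its native VLAN untagged; the values below are examples only.
    print(classify_ingress(None, 1, True))    # dropped
    print(classify_ingress(None, 1, False))   # lands in VLAN 1
    print(classify_ingress(None, 10, False))  # lands in VLAN 10, not the peer's native VLAN
```

The last case shows why the native VLAN on the Cisco side and the PVID on the ERS side should agree: with a mismatch, nothing is dropped, but the untagged traffic ends up in a different VLAN than the sender intended.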

Cheers


Offline dstegall

  • Rookie
  • **
  • Posts: 8
Re: ERS 8600 LACP trouble with Cisco UCS fabric extenders
« Reply #6 on: January 24, 2012, 08:52:53 PM »
Just in case anyone was interested, or is searching the forum for this fix at a later time, it appears that "long" timers did the trick.  Still testing, but I'm not seeing the same port flap I was seeing previously.

Core1:5# config ethernet 1/12,1/14,2/11,2/13 lacp timeout long

Can anyone either explain the technical difference between short and long timers, or point me to a document that does?  The LACP/MLT/Link Aggregation doc I have mentions them, but doesn't go into any real detail.  I'd like to understand this better.

Also, since I am still testing this solution, I will also mention that a CCIE I was conferring with thinks the problem has something to do with Cisco's IP hash load sharing.  I think I get what that means, and how those hashes are calculated, but I don't see the connection between that and the LACP config on our ERS 8600.  Any Cisco experts care to weigh in?


Offline Michael McNamara

  • Administrator
  • Hero Member
  • *****
  • Posts: 2503
    • Michael McNamara
Re: ERS 8600 LACP trouble with Cisco UCS fabric extenders
« Reply #7 on: January 24, 2012, 09:23:32 PM »
It's really not a Cisco question... it's more a question of standards. Unfortunately, I was unable to quickly find either the IEEE 802.3ad or 802.1AX documents to verify for myself what the timers actually are. I believe they might be 30 seconds, but I'm not sure.

The short and long timers in the Avaya/Nortel configuration refer to the interval at which LACPDUs are transmitted between the switches. The LACPDU frames are used as a heartbeat to verify that both switches are still alive and well.  You essentially end up with a timer mismatch between the two switches. The switch configured with short timers will declare the link down/unavailable because it hasn't received any LACPDU frames in the expected amount of time. This essentially causes the link to flap, as it will eventually receive an LACPDU frame and then put the link back into service.

As I previously mentioned, Cisco doesn't support short LACP timers, so you need to configure Avaya/Nortel switches with long LACP timers.

Cheers!

Offline Michael McNamara

  • Administrator
  • Hero Member
  • *****
  • Posts: 2503
    • Michael McNamara
Re: ERS 8600 LACP trouble with Cisco UCS fabric extenders
« Reply #8 on: January 24, 2012, 09:31:49 PM »
I got my Google working again...

http://standards.ieee.org/getieee802/download/802.1AX-2008.pdf


current_while_timer
This timer is used to detect whether received protocol information has expired. If Actor_Oper_State.LACP_Timeout is set to Short Timeout, the timer is started with the value Short_Timeout_Time. Otherwise, it is started with the value Long_Timeout_Time (see 5.4.4).

All timers specified in this subclause have an implementation tolerance of ± 250 ms.

Fast_Periodic_Time
The number of seconds between periodic transmissions using Short Timeouts.
Value: Integer, 1

Slow_Periodic_Time
The number of seconds between periodic transmissions using Long Timeouts.
Value: Integer, 30

Short_Timeout_Time
The number of seconds before invalidating received LACPDU information when using Short Timeouts (3 × Fast_Periodic_Time).
Value: Integer, 3

Long_Timeout_Time
The number of seconds before invalidating received LACPDU information when using Long Timeouts (3 × Slow_Periodic_Time).
Value: Integer, 90
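To make the effect of a mismatch concrete, here is a rough Python sketch using the four constants quoted above. It models the scenario discussed in this thread: one side invalidates partner information after the short timeout while the other side keeps transmitting at the slow rate. Whether the UCS really transmits this way is an assumption taken from the thread, and the function `expired_seconds` is a hypothetical helper with one-second resolution, not an implementation of the 802.1AX state machines.

```python
# Values quoted from IEEE 802.1AX-2008 (all in seconds).
FAST_PERIODIC_TIME = 1
SLOW_PERIODIC_TIME = 30
SHORT_TIMEOUT_TIME = 3 * FAST_PERIODIC_TIME   # 3
LONG_TIMEOUT_TIME = 3 * SLOW_PERIODIC_TIME    # 90


def expired_seconds(tx_interval, rx_timeout, duration):
    """Count the seconds in [0, duration) during which the receiver treats
    the partner's information as expired.

    Assumes an LACPDU arrives at t = 0, tx_interval, 2 * tx_interval, ...
    and that each arrival restarts the receiver's current_while timer.
    """
    expired = 0
    last_rx = 0
    for t in range(duration):
        if t % tx_interval == 0:
            last_rx = t                  # LACPDU received, timer restarted
        if t - last_rx >= rx_timeout:
            expired += 1                 # no LACPDU within the timeout
    return expired


if __name__ == "__main__":
    # Receiver on short timeout, partner sending every 30 s: mostly expired.
    print(expired_seconds(SLOW_PERIODIC_TIME, SHORT_TIMEOUT_TIME, 90))
    # Receiver on long timeout, partner sending every 30 s: never expires.
    print(expired_seconds(SLOW_PERIODIC_TIME, LONG_TIMEOUT_TIME, 90))
```

Under these assumptions the first case leaves the link usable for only the first 3 seconds after each LACPDU, which would look like a repeating flap; the second case never expires.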


Offline dstegall

  • Rookie
  • **
  • Posts: 8
Re: ERS 8600 LACP trouble with Cisco UCS fabric extenders
« Reply #9 on: January 24, 2012, 10:02:58 PM »
Thank you.  BEST. FORUM. EVER!

So Cisco (or, at least the UCS fabric extenders) default to 90 seconds before invalidating rx'd LACPDUs, and my short timer config was causing a mismatch.  That seems to fix exactly what I was seeing. 

Thank you everyone!

Online Dominik

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 661
Re: ERS 8600 LACP trouble with Cisco UCS fabric extenders
« Reply #10 on: January 25, 2012, 04:08:14 AM »
So it looks as if Cisco's UCS series chassis, which run the same code as the Nexus switches, have a different LACP behavior / implementation than the IOS-based devices.


Offline dstegall

  • Rookie
  • **
  • Posts: 8
Re: ERS 8600 LACP trouble with Cisco UCS fabric extenders
« Reply #11 on: March 22, 2012, 03:27:34 PM »
Hi folks,

So, update.  Maybe this didn't fix my problem after all.  I'm just now getting back to working on the UCS system, after being pulled to work on something else.  The port flap I was seeing previously is definitely gone, but I am getting major packet loss one way.

If I ping to any of our esx hosts residing in the UCS from my workstation, I don't see any loss.

If I ping from an esx host to my workstation, or even to its subnet gateway, I get significant, intermittent packet loss.  Curiously, it's not consistent, though.  One attempt might be totally successful, the next 100% loss.

For example, our esx host has an address of 10.0.1.101, residing on 10.0.1.0/24.

I can ping that address from my workstation, no problems.

If I ping 10.0.1.1, the subnet gateway, from the esx host, I get packet loss.  If I ping from the esx host to one of our corp DNS servers, 1 server will respond, the other won't, even though they are on the same subnet (and they are most definitely both up and functional).  There is no obvious culprit like a misconfigured network adapter or virtual switch.

So I started reading the interop guide for Avaya/Cisco, wondering if my config has an error.  My config is as above.  I am wondering if I might be seeing some kind of load-balancing hash conflict.  The Cisco UCS does src-dst-ip.  I don't know what kind of MLT balancing mode the ERS 8600 uses by default.
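As a rough illustration of how a src-dst-ip hash spreads flows over member links, here is a toy Python sketch. The XOR-and-modulo formula is a stand-in chosen for simplicity and is not claimed to be the hash that Cisco UCS or the ERS 8600 actually uses; the function `pick_link` and the two DNS server addresses are hypothetical (only 10.0.1.101 and 10.0.1.1 appear in this thread).

```python
import ipaddress


def pick_link(src, dst, n_links):
    """Toy src-dst-ip hash: XOR the two IPv4 addresses and take the
    result modulo the number of member links in the bundle."""
    s = int(ipaddress.ip_address(src))
    d = int(ipaddress.ip_address(dst))
    return (s ^ d) % n_links


if __name__ == "__main__":
    host = "10.0.1.101"                                   # esx host from the post
    for dst in ("10.0.1.1", "10.0.2.10", "10.0.2.11"):   # last two are made-up DNS servers
        print(dst, "-> member link", pick_link(host, dst, 4))
```

The point of the sketch is only that each source/destination pair is pinned to one member link, so two servers on the same subnet can ride different links. If one member link were misbehaving, that could produce the "one server answers, the other doesn't" pattern, and it would also fit loss in one direction only, since each switch hashes its own outbound traffic independently.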

Any other ideas what else I could check?  I'm really stumped.  I have a case up on this issue with Avaya, but so far they have not responded.


Offline dstegall

  • Rookie
  • **
  • Posts: 8
Re: ERS 8600 LACP trouble with Cisco UCS fabric extenders
« Reply #12 on: March 22, 2012, 03:40:19 PM »
I also noticed this, but I'm unsure whether it makes a huge difference or not.  Note that the UCS fabric extenders are appliance-like.  I don't know that I can change its port-channel config settings beyond what is available in the Java-based GUI.  For example, there is nowhere I can see to specify a key ID.

Notice the key mismatch:

Core1:5# show ports info lacp actor-admin port 1/12

================================================================================
                                  Actor Admin
================================================================================
INDEX SYS   SYS               KEY   PORT    PORT  STATE
      PRIO  ID                              PRIO
--------------------------------------------------------------------------------
1/12  32768 00:23:0d:b8:10:00 16    0x4b    32768 act long aggr

Core1:5# show ports info lacp partner-oper port 1/12

================================================================================
                              Partner Operational
================================================================================
INDEX SYS   SYS               KEY   PORT    PORT  STATE
      PRIO  ID                              PRIO
--------------------------------------------------------------------------------
1/12  32768 54:7f:ee:39:44:fc 0     0x101   32768 act long aggr sync col dis
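As far as IEEE 802.1AX is concerned, an LACP key only has meaning within the system that assigned it, so an actor key of 16 and a partner key of 0 differing from each other is not by itself a misconfiguration. A more useful check is whether every port intended for the same LAG reports the same partner system ID and partner key. The Python sketch below parses data rows in the column layout shown above and performs that comparison; the helper names are hypothetical, and the second row in the example (port 1/14) is invented for illustration since only port 1/12 was posted.

```python
def parse_lacp_row(row):
    """Split one data row of the 'show ports info lacp ...' output above
    into named fields. Everything after PORT PRIO is treated as state flags."""
    index, sys_prio, sys_id, key, port, port_prio, *state = row.split()
    return {
        "index": index,
        "sys_prio": int(sys_prio),
        "sys_id": sys_id,
        "key": int(key),
        "port": port,
        "port_prio": int(port_prio),
        "state": state,
    }


def same_partner(rows):
    """True if every row reports the same (partner system ID, partner key)."""
    partners = {(r["sys_id"], r["key"]) for r in map(parse_lacp_row, rows)}
    return len(partners) == 1


if __name__ == "__main__":
    partner_rows = [
        "1/12  32768 54:7f:ee:39:44:fc 0     0x101   32768 act long aggr sync col dis",
        # Hypothetical second member port, not taken from the post:
        "1/14  32768 54:7f:ee:39:44:fc 0     0x102   32768 act long aggr sync col dis",
    ]
    print(same_partner(partner_rows))
```

Running the partner-oper command for all four ports of a LAG and feeding the rows to `same_partner` would show whether any port is talking to a different partner system or key than the rest.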

Offline Michael McNamara

  • Administrator
  • Hero Member
  • *****
  • Posts: 2503
    • Michael McNamara
Re: ERS 8600 LACP trouble with Cisco UCS fabric extenders
« Reply #13 on: March 22, 2012, 10:47:29 PM »
I believe the UCS 6100 Series Fabric Interconnects are essentially Cisco Nexus 5000s under the skin, running NX-OS.

Assuming they are in a VPC configuration they should use the same LACP key as long as each port is associated with the same VPC number.

I'm guessing if you run with a single port your issues disappear?

Good Luck!

Offline dstegall

  • Rookie
  • **
  • Posts: 8
Re: ERS 8600 LACP trouble with Cisco UCS fabric extenders
« Reply #14 on: March 23, 2012, 03:27:18 PM »
Thanks for the reply.  Can you clarify two points for me?

1.) Do you mean that the UCS Fabric extenders should reflect the same LACP key as on the Avaya-side if VPC configuration is correct?  Meaning, I should see matching keys on both sides?

2.) By single port, do you mean a single port in the Link Aggregation Group (by pulling out all cables but one), or if I destroy the PortChannel and literally configure it to work over a single port?