
Author Topic: No LACP ID on links, but PDUs being exchanged  (Read 509 times)


Offline darkfader

  • Rookie
  • **
  • Posts: 7
No LACP ID on links, but PDUs being exchanged
« on: April 09, 2017, 01:49:38 PM »

I've got a little issue here...

We're connecting Linux servers using LACP to a switch stack.
There were quite a few issues with VM connectivity, all of which stopped after I removed 3 of the 4 cables from the LACP bundles on both servers.
I'm now trying to figure out what's gone wrong.

The first thing I found was that, following a recent VLAN addition, lacp aggregation enable was no longer set on the server's ports.
I've added it back, but I can see it's not coming up correctly:
No dynamic MLT trunk info is shown, and from the examples in the LACP Configuration examples blog posts I can clearly see this should exist even when connected to non-Avaya gear.
In the LACP status I also see there's no ID for those aggregators. I'll post all of this in the hope that someone knows what I'm missing.


4526GTX#show stack-info
Unit# Switch Model     Pluggable Pluggable Pluggable Pluggable SW Version
                         Port      Port      Port      Port             
----- ---------------- --------- --------- --------- --------- ----------
1     4526GTX          (21) None (22) None (23) None (24) None v5.5.0.003
                       (25) Unsp (26) Unsp
2     4526GTX          (21) None (22) None (23) None (24) None v5.5.0.003
                       (25) Unsp (26) Unsp
3     4548GT           (45) None (46) None (47) None (48) None v5.5.0.003

Linux VM host using Open vSwitch and LACP.

[root@nw01 ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
        Aggregator ID: 2
        Number of ports: 1
        Actor Key: 17
        Partner Key: 12533
        Partner Mac Address: 00:xxx

Slave Interface: eth0
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: xxx
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: xxxx
Aggregator ID: 2
Slave queue ID: 0

I've skipped the other two ports; they're unplugged at the moment but are still members of the bond.
I've also rebooted the VM host I'm showing output from to make sure it's "clean".
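
To compare the per-slave aggregator IDs against the active aggregator without reading the file by eye, something like the following could be used. It is only a minimal sketch: `parse_bonding` and `slaves_in_active_aggregator` are hypothetical helper names, and `SAMPLE` is a trimmed copy of the output above rather than a live read of /proc/net/bonding/bond0.

```python
# Trimmed copy of the bond0 output above (only the lines the parser uses).
SAMPLE = """\
MII Status: up

802.3ad info
Active Aggregator Info:
        Aggregator ID: 2
        Number of ports: 1

Slave Interface: eth0
MII Status: down
Aggregator ID: 1

Slave Interface: eth1
MII Status: up
Aggregator ID: 2
"""


def parse_bonding(text):
    """Return (active_aggregator_id, {slave: {"mii": str, "aggregator": int}})."""
    active = None
    slaves = {}
    current = None  # name of the slave section we are in, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Slave Interface:"):
            current = line.split(":", 1)[1].strip()
            slaves[current] = {}
        elif line.startswith("MII Status:") and current is not None:
            slaves[current]["mii"] = line.split(":", 1)[1].strip()
        elif line.startswith("Aggregator ID:"):
            agg = int(line.split(":", 1)[1])
            if current is None:
                # Seen before any slave section: this is the active aggregator.
                active = agg
            else:
                slaves[current]["aggregator"] = agg
    return active, slaves


def slaves_in_active_aggregator(text):
    """List the slaves whose aggregator ID matches the active aggregator."""
    active, slaves = parse_bonding(text)
    return sorted(n for n, s in slaves.items() if s.get("aggregator") == active)


if __name__ == "__main__":
    print(slaves_in_active_aggregator(SAMPLE))
```

On a real host the text would come from reading /proc/net/bonding/bond0 instead of `SAMPLE`. A slave that is up but sits in a different aggregator than the active one is carrying no traffic for the bond.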

Here we only see the connection to our second switch stack:
4526GTX# show lacp aggr
Aggr ID Trunk Status   Type   Members
------- ----- -------- ------ -------------------
8224    32    Enabled  LA     1/25-26

Here you can see that the LACPDU exchange is happening and that both sides have a good idea who's on the other end.

4526GTX#show lacp debug member 1/1-4 
Unit/Port AggrId TrunkId Rx State      Mux State     Partner Port
--------- ------ ------- ------------- ------------- ------------
1/1       0              PortDisabled  Detached             
1/2       0              PortDisabled  Detached             
1/3       0              Current       Attached      2       
1/4       0              PortDisabled  Detached             

This shows the main difference: ports 1/1-4 have AggId 0 and no Trunk Id, while 1/25-26 have both.

4526GTX#show lacp port 1/1-4,1/25-26
                                       Admin Oper         Trunk Partner
Unit/Port Priority Lacp    A/I Timeout Key   Key   AggId Id    Port    Status
--------- -------- ------- --- ------- ----- ----- ----- ----- ------- -------
1/1       32768    Active  A   Short   245   12533 0                    Down
1/2       32768    Active  A   Short   245   12533 0                    Down
1/3       32768    Active  A   Short   245   12533 0            2       Active
1/4       32768    Active  A   Short   245   12533 0                    Down
1/25      32768    Active  A   Short   1234  17618 8224   32    26      Active
1/26      32768    Active  A   Short   1234  17618 8224   32    25      Active
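
The symptom in this table (a port whose status is Active but whose AggId is still 0) can be picked out mechanically. The following is a rough sketch, assuming the column layout shown above; `parse_lacp_ports` and `active_without_aggregator` are hypothetical names, and the parser relies on the first eight columns and the last column always being filled, since Trunk Id and Partner Port may be blank.

```python
# Copy of the "show lacp port" output above, header included.
SHOW_LACP_PORT = """\
                                       Admin Oper         Trunk Partner
Unit/Port Priority Lacp    A/I Timeout Key   Key   AggId Id    Port    Status
--------- -------- ------- --- ------- ----- ----- ----- ----- ------- -------
1/1       32768    Active  A   Short   245   12533 0                    Down
1/2       32768    Active  A   Short   245   12533 0                    Down
1/3       32768    Active  A   Short   245   12533 0            2       Active
1/4       32768    Active  A   Short   245   12533 0                    Down
1/25      32768    Active  A   Short   1234  17618 8224   32    26      Active
1/26      32768    Active  A   Short   1234  17618 8224   32    25      Active
"""


def parse_lacp_ports(text):
    """Parse data rows; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # Data rows have a numeric priority in the second column.
        if len(tokens) < 9 or not tokens[1].isdigit():
            continue
        rows.append({
            "port": tokens[0],
            "admin_key": int(tokens[5]),
            "oper_key": int(tokens[6]),
            "agg_id": int(tokens[7]),
            "status": tokens[-1],  # last column is always present
        })
    return rows


def active_without_aggregator(text):
    """Ports exchanging LACPDUs (Active) that never got an aggregator ID."""
    return [r["port"] for r in parse_lacp_ports(text)
            if r["status"] == "Active" and r["agg_id"] == 0]


if __name__ == "__main__":
    print(active_without_aggregator(SHOW_LACP_PORT))
```

For the output above this flags 1/3 only; 1/25-26 are Active with AggId 8224 and are left alone.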

The MLT status only contains the 1/25-26 "backbone" link.

4526GTX#show mlt
Id Name             Members                Bpdu   Mode           Status  Type
-- ---------------- ---------------------- ------ -------------- ------- ------
32 Trunk #32        1/25-26                Single DynLag/Basic   Enabled Trunk

I'm not sure where to go with this...
Pretty certain it's not a server side issue.

Here's the VLAN/LACP/port-specific bits of my config.

spanning-tree op-mode rstp
vlan create <numbers> type port
vlan ports 1/1-8 tagging unTagPvidOnly filter-unregistered-frames disable
vlan configcontrol flexible
vlan ports 1/1-8 pvid <number>
vlan configcontrol autopvid

! *** LACP ***
interface fastEthernet ALL
lacp key port 1/1-4 245
lacp key port 1/5-8 246
lacp key port 1/9-10 186
lacp key port 1/11-12 187
lacp key port 1/25-26 1234
lacp timeout-time port 1/1-12,1/25-26,2/25-26,3/13-16 short
lacp mode port 1/1-12,1/25-26,3/13-16 active
lacp aggregation port 1/1-8,1/25-26,2/25-26,3/13-16 enable

(9-12 used to be LACP but aren't right now, so ignore those.
1/1-8 are enabled, though, and should be OK, I think.
1-4 is the server we looked at, 5-8 the other one.)
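
One way to double-check that claim is to expand the port ranges in the LACP lines and compare the sets. This is only a sketch over the excerpt posted above; `expand_ports` and `lacp_sets` are hypothetical helpers, and since the config is an excerpt, a port missing from a set may simply be relying on a default or on a line that wasn't pasted.

```python
# The LACP lines from the config excerpt above.
CONFIG = """\
lacp key port 1/1-4 245
lacp key port 1/5-8 246
lacp key port 1/9-10 186
lacp key port 1/11-12 187
lacp key port 1/25-26 1234
lacp timeout-time port 1/1-12,1/25-26,2/25-26,3/13-16 short
lacp mode port 1/1-12,1/25-26,3/13-16 active
lacp aggregation port 1/1-8,1/25-26,2/25-26,3/13-16 enable
"""


def expand_ports(spec):
    """Expand '1/1-8,1/25-26' into a set of 'unit/port' strings."""
    ports = set()
    for part in spec.split(","):
        unit, rng = part.split("/")
        lo, _, hi = rng.partition("-")
        for p in range(int(lo), int(hi or lo) + 1):
            ports.add(f"{unit}/{p}")
    return ports


def lacp_sets(config):
    """Return (ports with a key, ports in active mode, ports with aggregation enabled)."""
    keyed, active, enabled = set(), set(), set()
    for line in config.splitlines():
        t = line.split()
        if len(t) < 5 or t[0] != "lacp" or t[2] != "port":
            continue
        if t[1] == "key":
            keyed |= expand_ports(t[3])
        elif t[1] == "mode" and t[4] == "active":
            active |= expand_ports(t[3])
        elif t[1] == "aggregation" and t[4] == "enable":
            enabled |= expand_ports(t[3])
    return keyed, active, enabled


if __name__ == "__main__":
    keyed, active, enabled = lacp_sets(CONFIG)
    server_ports = expand_ports("1/1-8")
    print("server ports fully configured:", server_ports <= (keyed & active & enabled))
    print("enabled but no explicit active mode:", sorted(enabled - active))
```

On this excerpt, 1/1-8 have a key, active mode and aggregation enabled, which matches the reading that the server ports look fine on paper.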

! *** RSTP (Phase 2) ***
interface FastEthernet ALL
spanning-tree rstp port 1/1-24 edge-port true

So, I think it should be enabled correctly, but the trunk isn't there.
I have a feeling I might get the trunk back if I shut down the server and plugged in all the cables.
But that's not really dynamic LACP, and it also wouldn't explain some other bad feelings I have about this.
I also have a feeling that if I rebooted my stack everything would be fine, but that doesn't help in the long term, so I won't do it.

I would be really keen on your feedback.
My only other option is to switch to the standards-violating "mode 6" on the Linux end.
That means adaptive load balancing, with no LACP at all.
I've seen this not react reasonably fast when a link is removed.
It gains more throughput (due to layer4 hashing on both ends), but I don't like it a lot.

Any of you got some thoughts on this?

Offline Michael McNamara

  • Administrator
  • Hero Member
  • *****
  • Posts: 3818
Re: No LACP ID on links, but PDUs being exchanged
« Reply #1 on: April 10, 2017, 08:10:02 AM »
In the past you would need to disable LACP before you could add/remove a VLAN... I would check your configuration again; I'm not sure whether newer versions of the software try to disable and re-enable LACP while making the VLAN change.

Offline darkfader

  • Rookie
  • **
  • Posts: 7
Re: No LACP ID on links, but PDUs being exchanged
« Reply #2 on: April 10, 2017, 08:34:11 AM »

Yes, that's what I did:
  • disable LACP
  • add VLANs
  • forgot to re-enable it
  • finally enabled it
  • found it's not working anyway

The problem is that it is enabled but isn't getting an ID, no MLT number is assigned, etc.
« Last Edit: April 10, 2017, 08:36:35 AM by darkfader »

Offline darkfader

  • Rookie
  • **
  • Posts: 7
Re: No LACP ID on links, but PDUs being exchanged
« Reply #3 on: August 15, 2017, 08:40:55 PM »

FYI, I was able to reproduce the issue and fix it:

Set the LACP keys once more on the ports.

So my normal procedure, with a fully configured and working LACP trunk as the starting point, was:
no lacp aggr enable
add new vlans to the ports
lacp aggr enable

Last weekend we had to add some more VLANs on our backbone links (2x10G between the two stacks) and yes, afterwards the show mlt output no longer even listed those two.
Out of paranoia I redid some of the LACP settings, including the LACP admin key.

4526GTX(config)#interface fastEthernet 1/25-26
4526GTX(config-if)#lacp aggregation enable
4526GTX(config-if)#lacp key 42
4526GTX(config-if)#lacp mode active
4526GTX(config-if)#lacp timeout-time short

With that extra step the bug is avoided.
For clarity: in a show running-config you'd still see the LACP key, but something was off, and re-configuring the same key does fix it.
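
The amended procedure could be written down as a small generator for the command list, so the key step doesn't get forgotten. This is just a sketch: `vlan_change_steps` is a hypothetical helper, the command strings are copied from this thread and not checked against the switch documentation, and the VLAN step is left as a placeholder because the exact VLAN commands depend on the change being made.

```python
def vlan_change_steps(ports, lacp_key):
    """Return the CLI lines for a VLAN change on an LACP trunk, key re-applied last.

    ports    -- port range as typed on the switch, e.g. "1/25-26"
    lacp_key -- the admin key already configured on those ports
    """
    return [
        f"interface fastEthernet {ports}",
        "no lacp aggr enable",                  # step 1 of the old procedure
        "! add the new VLANs to the ports here",  # placeholder, not a real command
        "lacp aggregation enable",              # step 3 of the old procedure
        f"lacp key {lacp_key}",                 # the extra step: set the same key again
    ]


if __name__ == "__main__":
    for line in vlan_change_steps("1/25-26", 42):
        print(line)
```

The only difference from the old three-step procedure is the last line, which sets the key again even though it is already in the running config.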
« Last Edit: August 15, 2017, 08:43:40 PM by darkfader »