This past weekend we finally replaced our unstable 5510 stack with a new 5510 stack. The old stack was running 5.1.4 and the new stack was running 6.1.2. After configuring the 6.1.2 stack in a lab environment, we made the swap. We have the advanced license so we can run OSPF alongside our Cisco routers, which also run OSPF. Initially we thought everything was working fine. OSPF neighbors came up with the Cisco routers and firewalls. However, we soon determined that OSPF was "flapping": the adjacencies wouldn't stay up for longer than 10-15 seconds, and the routes were lost each time they dropped.
We opened a case with Nortel and quickly got it escalated. The techs recommended we downgrade from 6.1.2 to 5.1.5, since OSPF had been working properly in our environment with 5.1.4. However, there was no way to keep the config when downgrading from 6.1.2 to 5.1.5, so the config had to be reapplied manually. After the config was applied, the OSPF neighbors wouldn't even go into a Full state. They always went into the ExchangeStart (ExStart) state, and then the Cisco timeout forced them back into the Init state. After gathering some packet captures, we could see the Cisco sending DBDs (database description packets) to the Nortel, but according to show ip ospf ifstats, the Nortel had received none.
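For anyone chasing something similar, here is a rough sketch of how the two failure modes above could be told apart from a log of neighbor state changes. It is only an illustration: the `history` format, the `classify_neighbor` function, and the `flap_threshold` value are all made up for this example and are not output from, or part of, any Nortel or Cisco tool. The state names are the standard OSPF ones (Init, 2-Way, ExStart, Exchange, Loading, Full), written here in upper case without hyphens.

```python
from typing import List, Tuple


def classify_neighbor(history: List[Tuple[float, str]], flap_threshold: int = 2) -> str:
    """Classify one neighbor from a chronological list of (seconds, state).

    Hypothetical helper: the input format is an assumption, not a real CLI dump.
    """
    states = [state.upper() for _, state in history]

    # Count how many times the neighbor dropped out of FULL.
    full_losses = sum(
        1 for prev, cur in zip(states, states[1:])
        if prev == "FULL" and cur != "FULL"
    )
    if full_losses >= flap_threshold:
        return "flapping"

    if "FULL" not in states:
        # Reached ExStart but never completed the database exchange.
        if "EXSTART" in states:
            return "stuck-before-full"
        return "never-started"

    return "stable" if states[-1] == "FULL" else "down"


# What we saw on 6.1.2: the adjacency comes up, then drops every 10-15 seconds.
flapping = [(0, "INIT"), (5, "FULL"), (18, "INIT"), (24, "FULL"), (37, "INIT")]

# What we saw after the downgrade: ExStart, then back to Init, over and over.
stuck = [(0, "INIT"), (2, "EXSTART"), (42, "INIT"), (44, "EXSTART"), (84, "INIT")]

print(classify_neighbor(flapping))
print(classify_neighbor(stuck))
```

The timestamps in the sample data are invented to match the rough timings described in this post; they are not taken from our actual captures.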
There were three Nortel units in the stack. The Cisco devices were plugged into access ports on Units 2 and 3. A tech had the idea to fail the base over to Unit 2 and voila, OSPF neighbors came up with the Cisco devices on Unit 2. The tech thought that the base switch was bad and wanted to replace it again. By that point, we had already been working on the stack replacement for several hours, followed by a 6-7 hour support call. I was mentally exhausted after the 15 hour day and wanted to resume the support case the next day, after I had a chance to replace the base switch.
The next day we replaced the base switch and updated our advanced license file. However, the results were the same. When Unit 1 was the base, OSPF neighbors wouldn't go past the ExchangeStart state. Failing the base over to Unit 2 brought the neighbors up and routing was restored. We finally came to the conclusion that the OSPF neighbors HAD to be on an access port on the base switch.
At least everything is back up and working properly... What a nightmare of a weekend.