Hello everyone,
I found this site while looking for guidance, and any suggestions would be appreciated. I am not very familiar with troubleshooting this device and not well versed in the CLI either; I mostly use JDM.
We have a 7-year-old Passport (ERS now, I guess) 8310. It has five 8348TX, one 8324GTX, and two 8348TX-PWR cards, plus two 8393SF CPU cards. It has not been restarted for several years and is running an old 3.0.2.0 OS release. A few BayStack stacks are connected via GBIC.
For the past several months we have been experiencing layer 3 issues. At random times throughout the day and night, the CPU spikes to 70-90% (normal is 5-10%) for 10-15 seconds. When that happens, layer 3 latency goes way up and any Nortel/Avaya IP phones that are on a different subnet (e.g. a remote site with a BCM, or a different VLAN) reboot.
There is not much in the log (see below), just the ports going up and down as the phones reboot. I have been trying to see which process is spiking, but I can't find out how to do that.
What else should I be looking for? Could it be network traffic? If I graph the chassis in JDM, I don't see anything drastic happening at the time of the CPU spikes. I am collecting some performance metrics with Zenoss, but they are not giving much insight. A copy of some of the counters is below. These cover only a couple of minutes, because the counters (except the absolute value) sometimes start over when there is a CPU spike.
I am not sure whether a reboot would correct the issue, and I would like to know the cause first. I am sure the switch needs an OS upgrade as well, but we do not currently have support, so that is not an option. Could it be too much network traffic for the device? Should we be looking at replacing it? Any help would be greatly appreciated.
Thanks in advance!
Scott
Counters:
Counter          AbsoluteValue  Cumulative  Average/sec           Minimum/sec  Maximum/sec          LastVal/sec
InReceives       948,925,508    2237.0      14.814569536423841    12.2         19.7                 12.8
InHdrErrors      1,651,596      2.0         0.013245033112582781  0.0          0.2                  0.0
InAddrErrors     261,157        0.0         0.0                   0.0          0.0                  0.0
ForwDatagrams    808,229,428    1609.0      10.655629139072847    3.2          14.8                 9.0
InUnknownProtos  0              0.0         0.0                   0.0          0.0                  0.0
InDiscards       6,435,153      2.0         0.013245033112582781  0.0          0.09523809523809523  0.0
InDelivers       41,286,263     330.0       2.185430463576159     1.9          3.0                  1.9
OutRequests      42,500,660     331.0       2.19205298013245      1.9          2.9                  1.9
OutDiscards      0              0.0         0.0                   0.0          0.0                  0.0
OutNoRoutes      0              0.0         0.0                   0.0          0.0                  0.0
FragOKs          0              0.0         0.0                   0.0          0.0                  0.0
FragFails        0              0.0         0.0                   0.0          0.0                  0.0
FragCreates      0              0.0         0.0                   0.0          0.0                  0.0
ReasmReqds       0              0.0         0.0                   0.0          0.0                  0.0
ReasmOKs         0              0.0         0.0                   0.0          0.0                  0.0
ReasmFails       0              0.0         0.0                   0.0          0.0                  0.0
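For anyone who wants to put the absolute counters in proportion, here is a minimal Python sketch that expresses each non-zero counter as a share of InReceives. The numbers are copied from the AbsoluteValue column above; the function name `share_of_received` is made up for illustration, and the script says nothing about what is causing the CPU spikes.

```python
# Absolute values copied from the counter table in this post.
counters = {
    "InReceives": 948_925_508,
    "InHdrErrors": 1_651_596,
    "InAddrErrors": 261_157,
    "ForwDatagrams": 808_229_428,
    "InDiscards": 6_435_153,
    "InDelivers": 41_286_263,
}


def share_of_received(values):
    """Return each counter as a fraction of InReceives (hypothetical helper)."""
    total = values["InReceives"]
    return {
        name: value / total
        for name, value in values.items()
        if name != "InReceives"
    }


# Print one line per counter, formatted as a percentage.
for name, share in share_of_received(counters).items():
    print(f"{name:15s} {share:.4%}")
```

Because these are totals since the counters last started from zero, the shares describe long-run proportions only and will not show what changes during a 10-15 second spike.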
Log:
2012-01-10 08:37:36 Local7.Info 172.16.10.101 CPU5 [01/10/12 09:31:35] SNMP INFO Spanning Tree Topology Change(StgId=1, PortNum=2/13, MacAddr=00:11:f9:b7:c0:01)<000>
2012-01-10 08:57:53 Local7.Info 172.16.10.101 CPU5 [01/10/12 09:51:58] HW INFO portLinkUpEvent starting 01/10/12 09:51:58 on ports 2/27<000>
2012-01-10 08:58:27 Local7.Info 172.16.10.101 CPU5 [01/10/12 09:52:32] SNMP INFO Spanning Tree Topology Change(StgId=1, PortNum=2/27, MacAddr=00:11:f9:b7:c0:01)<000>
2012-01-10 10:47:19 Local7.Info 172.16.10.101 CPU5 [01/10/12 11:41:24] HW INFO portLinkDownEvent starting 01/10/12 11:41:24 on ports 3/20<000>
2012-01-10 10:47:21 Local7.Info 172.16.10.101 CPU5 [01/10/12 11:41:26] HW INFO portLinkUpEvent starting 01/10/12 11:41:26 on ports 3/20<000>
2012-01-10 10:47:49 Local7.Info 172.16.10.101 CPU5 [01/10/12 11:41:55] SNMP INFO Spanning Tree Topology Change(StgId=1, PortNum=3/20, MacAddr=00:11:f9:b7:c0:01)<000>
2012-01-10 10:55:11 Local7.Info 172.16.10.101 CPU5 [01/10/12 11:49:16] HW INFO portLinkDownEvent starting 01/10/12 11:49:16 on ports 8/6<000>
2012-01-10 10:55:14 Local7.Info 172.16.10.101 CPU5 [01/10/12 11:49:19] HW INFO portLinkUpEvent starting 01/10/12 11:49:19 on ports 8/6<000>
2012-01-10 10:55:43 Local7.Info 172.16.10.101 CPU5 [01/10/12 11:49:48] SNMP INFO Spanning Tree Topology Change(StgId=1, PortNum=8/6, MacAddr=00:11:f9:b7:c0:01)<000>
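To line the port flaps up against the CPU spike times, something like the following Python sketch could pair each portLinkDownEvent with the next portLinkUpEvent on the same port. It assumes the log lines look exactly like the ones above and uses the switch's own timestamp (the one in square brackets), not the syslog server's. The names `LINK_EVENT` and `find_flaps` are made up for illustration, and a line listing several ports would be treated as a single port string.

```python
import re
from datetime import datetime

# Captures the bracketed switch timestamp, the direction (Up/Down) and the port.
LINK_EVENT = re.compile(
    r"\[(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] HW INFO "
    r"portLink(Up|Down)Event .* on ports (\S+?)(?:<|$)"
)


def find_flaps(lines):
    """Return (port, down_time, up_time, seconds_down) for each down/up pair."""
    last_down = {}  # port -> time of the most recent unmatched link-down
    flaps = []
    for line in lines:
        match = LINK_EVENT.search(line)
        if not match:
            continue  # e.g. Spanning Tree lines are skipped
        when = datetime.strptime(match.group(1), "%m/%d/%y %H:%M:%S")
        direction, port = match.group(2), match.group(3)
        if direction == "Down":
            last_down[port] = when
        elif port in last_down:
            down = last_down.pop(port)
            flaps.append((port, down, when, (when - down).total_seconds()))
    return flaps


if __name__ == "__main__":
    import sys

    # Usage: python flaps.py < syslog.txt
    for port, down, up, seconds in find_flaps(sys.stdin):
        print(f"{port}: down {down:%H:%M:%S}, up {up:%H:%M:%S}, {seconds:.0f}s")
```

A link-up with no earlier link-down for that port in the input (like port 2/27 above) is ignored, since there is nothing to pair it with.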