Hello,
We are having an issue with our NICs getting a Tx Unit Hang, after which the adapter does not reset correctly. The error messages below are printed to the console continuously and all networking stops.

Connecting via IPMI, I've found that "service network restart" doesn't resolve the issue. The following steps do work: service network stop; rmmod ixgbe; modprobe ixgbe; service network start. Everything then goes back to normal for an unpredictable number of hours (in some cases days) until it happens again.

If anyone has any insight into or history with this issue, I'd love any input. I'm also happy to provide more details where needed.
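
For convenience, the recovery sequence above can be wrapped in a small shell function. This is only a sketch of those four steps: the function name reload_ixgbe and the DRY_RUN switch are made up for illustration, and it has to be run as root from the IPMI or local console, since networking goes down while it runs.

```shell
#!/bin/sh
# Sketch: unload and reload the ixgbe module around a network stop/start.
# reload_ixgbe and DRY_RUN are illustrative names, not part of any tool.
# With DRY_RUN=1 the commands are only printed, not executed.
reload_ixgbe() {
    for cmd in "service network stop" \
               "rmmod ixgbe" \
               "modprobe ixgbe" \
               "service network start"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd"
        else
            # Stop at the first failing step so a failure is not masked.
            $cmd || return 1
        fi
    done
}
```

Setting DRY_RUN=1 before calling reload_ixgbe prints the four commands in order without touching the interfaces, which is a safe way to check the sequence first.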
Thanks,
Matthew
The details:
kernel: 2.6.32-358.6.2.el6
Intel driver versions tested: 3.9.15-k (CentOS stock), 3.17.3 (latest version)
Adapter: Ethernet controller: Intel Corporation 82599EB 10-Gigabit Network Connection (rev 01)
Subsystem: Intel Corporation Ethernet Server Adapter X520-2
The error messages from /var/log/messages (and dmesg):
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: Detected Tx Unit Hang
[kern.err] [kernel: .]: Tx Queue <2>
[kern.err] [kernel: .]: TDH, TDT <0>, <1a>
[kern.err] [kernel: .]: next_to_use <1a>
[kern.err] [kernel: .]: next_to_clean <0>
[kern.err] [kernel: .]: tx_buffer_info[next_to_clean]
[kern.err] [kernel: .]: time_stamp <101fd8552>
[kern.err] [kernel: .]: jiffies <101fd8d43>
[kern.info] [kernel: .]: ixgbe 0000:08:00.1: eth3: tx hang 301 detected on queue 2, resetting adapter
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: Reset adapter
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 0 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 1 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 2 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 3 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 4 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 5 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 6 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 7 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 8 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 9 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 10 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 11 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: master disable timed out
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 0 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 1 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 2 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 3 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 4 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 5 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 6 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 7 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 8 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 9 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 10 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 11 not cleared within the polling period
[kern.info] [kernel: .]: ixgbe 0000:08:00.1: eth3: detected SFP+: 4
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: Reset adapter
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 0 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 1 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 2 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 3 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 4 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 5 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 6 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 7 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 8 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 9 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 10 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 11 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: master disable timed out
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 0 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 1 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 2 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 3 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 4 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 5 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 6 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 7 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 8 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 9 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 10 not cleared within the polling period
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 11 not cleared within the polling period
[kern.info] [kernel: .]: ixgbe 0000:08:00.1: eth3: detected SFP+: 4
[kern.info] [kernel: .]: ixgbe 0000:08:00.1: eth3: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: Detected Tx Unit Hang
[kern.err] [kernel: .]: Tx Queue <2>
[kern.err] [kernel: .]: TDH, TDT <0>, <2>
[kern.err] [kernel: .]: next_to_use <2>
[kern.err] [kernel: .]: next_to_clean <0>
[kern.err] [kernel: .]: tx_buffer_info[next_to_clean]
[kern.err] [kernel: .]: time_stamp <101fd91c6>
[kern.err] [kernel: .]: jiffies <101fd9257>
[kern.info] [kernel: .]: ixgbe 0000:08:00.1: eth3: tx hang 303 detected on queue 2, resetting adapter
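
For reference, the gap between time_stamp and jiffies in the two hang reports above can be computed directly from the hex values in the log. This is a minimal sketch; the helper name jiffies_delta is hypothetical, and converting the result to seconds depends on the kernel's CONFIG_HZ setting, which is not stated in this post.

```shell
#!/bin/sh
# Sketch: elapsed jiffies between the tx_buffer_info time_stamp and the
# jiffies value logged when the hang was detected.
# jiffies_delta is an illustrative helper name.
# $1 = time_stamp (hex, without 0x), $2 = jiffies (hex, without 0x)
jiffies_delta() {
    echo $(( 0x$2 - 0x$1 ))
}

jiffies_delta 101fd8552 101fd8d43   # first report  -> 2033
jiffies_delta 101fd91c6 101fd9257   # second report -> 145
```

Dividing each result by the kernel's HZ value gives the elapsed time in seconds.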