Issue with "Detected Tx Unit Hang" dropping network connections

September 23, 2013, 12:41 pm

Hello,

We are having an issue with our NICs hitting a Tx Unit Hang, after which the adapter does not reset correctly. The error messages below flood the console and all networking stops. Connecting via IPMI, I've found that "service network restart" doesn't resolve the issue. The following sequence does work: service network stop; rmmod ixgbe; modprobe ixgbe; service network start. Everything then goes back to normal for an unpredictable number of hours (or in some cases days) until it happens again. If anyone has any insight into or history with this issue, I'd love any input. I'd also be happy to provide more details where needed.
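
For anyone who wants to repeat the recovery sequence above from an IPMI console, it could be wrapped roughly as follows. This is only a sketch: the helper names run and reload_ixgbe and the DRY_RUN variable are made up for illustration, and only the four commands themselves come from the post. DRY_RUN defaults to 1 so that nothing is touched unless you opt in.

```shell
#!/bin/bash
# Sketch of the manual recovery sequence from the post.
# DRY_RUN (hypothetical switch) defaults to 1, so commands are only printed.
# Set DRY_RUN=0 and run as root to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Stop networking, reload the ixgbe module, start networking again.
# Stops at the first failing step so the module is not reloaded
# while the network service is still up.
reload_ixgbe() {
    run service network stop || return 1
    run rmmod ixgbe || return 1
    run modprobe ixgbe || return 1
    run service network start
}
```

Calling reload_ixgbe with the default settings just lists the four steps; DRY_RUN=0 reload_ixgbe performs them. Note that this only automates the workaround and does nothing about the cause of the hang.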

Thanks,

Matthew

The details:

kernel: 2.6.32-358.6.2.el6

Intel driver versions tested: 3.9.15-k (CentOS stock), 3.17.3 (latest version)

Adaptor: Ethernet controller: Intel Corporation 82599EB 10-Gigabit Network Connection (rev 01)

Subsystem: Intel Corporation Ethernet Server Adapter X520-2

The error messages from /var/log/messages (and dmesg):

[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: Detected Tx Unit Hang

[kern.err] [kernel: .]: Tx Queue <2>

[kern.err] [kernel: .]: TDH, TDT <0>, <1a>

[kern.err] [kernel: .]: next_to_use <1a>

[kern.err] [kernel: .]: next_to_clean <0>

[kern.err] [kernel: .]: tx_buffer_info[next_to_clean]

[kern.err] [kernel: .]: time_stamp <101fd8552>

[kern.err] [kernel: .]: jiffies <101fd8d43>

[kern.info] [kernel: .]: ixgbe 0000:08:00.1: eth3: tx hang 301 detected on queue 2, resetting adapter

[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: Reset adapter

[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 0 not cleared within the polling period

[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 1 not cleared within the polling period

[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 2 not cleared within the polling period

[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 3 not cleared within the polling period

[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 4 not cleared within the polling period

[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 5 not cleared within the polling period

[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 6 not cleared within the polling period

[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 7 not cleared within the polling period

[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 8 not cleared within the polling period

[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 9 not cleared within the polling period

[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 10 not cleared within the polling period

[kern.err] [kernel: .]: ixgbe 0000:08:00.1: eth3: RXDCTL.ENABLE on Rx queue 11 not cleared within the polling period

[kern.err] [kernel: .]: ixgbe 0000:08:00.1: master disable timed out
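
As a reading aid for the hex values in the hang dump above, the arithmetic can be done directly in the shell. The values are copied from the log; HZ=1000 is an assumption about this kernel's tick rate (check CONFIG_HZ on the affected host), and the descriptor count ignores ring wraparound, which does not matter here since TDH is 0.

```shell
#!/bin/bash
# Values copied from the hang dump (hex, angle brackets removed).
time_stamp=0x101fd8552
jiffies=0x101fd8d43
tdh=0x0
tdt=0x1a

# Assumption: 1000 ticks per second. Verify against CONFIG_HZ for the running kernel.
HZ=1000

# Age of the oldest uncleaned Tx buffer when the hang was reported.
stalled_jiffies=$(( jiffies - time_stamp ))
stalled_ms=$(( stalled_jiffies * 1000 / HZ ))

# Distance between the head (TDH) and tail (TDT) of the Tx ring.
pending=$(( tdt - tdh ))

echo "oldest uncleaned buffer age: ${stalled_jiffies} jiffies (~${stalled_ms} ms at HZ=${HZ})"
echo "descriptors between TDH and TDT: ${pending}"
```

Under that assumption the oldest buffer had been waiting about two seconds, with 26 descriptors queued and the head still at 0, which would be consistent with the hardware not having completed anything on queue 2.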