Dell PowerEdge NICs do not shut down at the hardware level after a NIC shutdown in the OS

Published by Valentin on

Earlier this year, during a VMware Cloud Foundation (VCF) deployment on VxRail, something unexpected happened during a standard high availability (HA) test.

At the end of each deployment, a failover test is usually run by shutting down physical NICs one by one to confirm that every layer of the VMware stack correctly handles the loss of a redundant link.

Here is the PowerCLI/ESXCLI command used to bring down a specific NIC on all hosts in a cluster:

Get-Cluster -Name ClusterName | Get-VMHost | ForEach-Object {
    $esxcli = Get-EsxCli -VMHost $_ -V2
    # Only act on hosts that actually have a vmnic5
    if (($esxcli.network.nic.list.Invoke()).Name -contains 'vmnic5') {
        $esxcli.network.nic.down.Invoke(@{nicname = 'vmnic5'})
    }
}

After running this command, the usual process is:

  • Ask the network team to confirm that the corresponding switch port is down, or
  • Check the external monitoring/observability tool.

Then, verify that each part of the stack behaves correctly:

  • vSAN: Check performance and confirm there are no new health alerts.
  • vMotion: Perform a vMotion and verify it completes successfully.
  • Management: Confirm that management connectivity is stable and that pings are not dropped.
  • NSX: Check NSX health and ping VMs on the overlay networks.
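The procedure above (take one NIC down, check every layer, restore, move on) can be sketched as a small test loop. This is only an illustration under stated assumptions: `run_failover_test`, its callbacks and the `vmnic4` name are hypothetical, and the stand-in lambdas would have to be replaced by real actions (for example the PowerCLI command above) and real checks.

```python
from typing import Callable, Dict, Iterable

Check = Callable[[], bool]


def run_failover_test(
    nics: Iterable[str],
    nic_down: Callable[[str], None],
    nic_up: Callable[[str], None],
    checks: Dict[str, Check],
) -> Dict[str, Dict[str, bool]]:
    """Take each NIC down in turn, run every check, then restore the NIC."""
    results: Dict[str, Dict[str, bool]] = {}
    for nic in nics:
        nic_down(nic)
        try:
            results[nic] = {name: check() for name, check in checks.items()}
        finally:
            nic_up(nic)  # always restore the link before testing the next NIC
    return results


# Stand-in callbacks so the sketch runs on its own (hypothetical, not real checks).
events = []
report = run_failover_test(
    nics=["vmnic4", "vmnic5"],
    nic_down=lambda nic: events.append(f"down {nic}"),
    nic_up=lambda nic: events.append(f"up {nic}"),
    checks={
        "switch port down": lambda: True,
        "vSAN healthy": lambda: True,
        "vMotion succeeds": lambda: True,
        "management ping stable": lambda: True,
        "NSX overlay reachable": lambda: True,
    },
)
print(events)
print(report["vmnic5"])
```

Putting "switch port down" first in the checks is deliberate: if that check fails, the remaining results say nothing about real link redundancy.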

This time, though, the surprise was big: the network engineer replied,
“The port is still up on the switch.”

  • From the ESXi point of view, vmnic5 was down.
  • From the switch’s point of view, the physical port was still up.
  • From the iDRAC point of view, the port also stayed up in the network devices filter of the system overview.

So effectively, only the OS thought the link had failed.
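This kind of mismatch is easy to miss when each team only looks at its own view. A minimal sketch of a cross-check, assuming the link state of a port has already been collected from each vantage point (the `LinkViews` structure and its field names are hypothetical and not part of any VMware, Dell or switch API):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkViews:
    """Link state of one port as reported by each layer (True = up)."""
    os_up: bool      # e.g. what ESXi reports for vmnic5
    switch_up: bool  # what the network team sees on the switch port
    idrac_up: bool   # what iDRAC shows for the same port


def logical_only_shutdown(views: LinkViews) -> bool:
    """Return True when the OS sees the link down but the hardware side still sees it up."""
    return (not views.os_up) and (views.switch_up or views.idrac_up)


# State observed during the test described above
observed = LinkViews(os_up=False, switch_up=True, idrac_up=True)
print(logical_only_shutdown(observed))
```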

Debugging and impact

After some debugging, a support case was opened to understand the full impact.

The behaviour was reproduced with:

  • Network adapters
    • Intel E810-C
    • Intel E810-XXV
    • Broadcom NetXtreme E-Series
  • Operating systems
    • VMware ESXi
    • Debian
    • Red Hat Enterprise Linux

Firmware and driver updates were tested, but nothing changed. The OS could logically shut down the interface, but the physical link on the switch stayed up. Monitoring tools and network teams had no visibility of the “failure”.

Root cause: iDRAC parameter

In the end, the issue was caused by a setting in the Dell iDRAC:

  • Parameter: PermitTotalPortShutdown
  • Required value: Enabled

If PermitTotalPortShutdown is not enabled, the operating system can logically disable the NIC, but the hardware will not bring the physical port down. As a result, the switch keeps the interface in an up state.

A few important details:

  • This parameter cannot be configured from the standard iDRAC BIOS setup interface.
  • It must be set using the iDRAC console / Server Configuration Profile (SCP) mechanism.

Once the parameter is correctly configured on one PowerEdge server, the configuration can be exported and then imported to other PowerEdge servers, selecting only the network interface section.

Dell provides a useful video on this process.

How to fix it with Server Configuration Profiles

Step 1: Export the configuration

On the iDRAC of a reference host:

  1. Go to Configuration → Server Configuration Profile → Export.
  2. Choose Save locally and download the configuration file.

This file will contain the NIC configuration, including the PermitTotalPortShutdown parameter once set.
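Before importing the file elsewhere, it can be worth confirming that the export really carries the expected value. The sketch below assumes the export was saved as XML with `Component` elements identified by an `FQDD` attribute and `Attribute` elements identified by `Name`, and that NIC components use a `NIC.` prefix. The sample document, its FQDDs and the `Disabled` value are hypothetical and heavily trimmed, so check them against your own export.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed SCP export; a real file has many more components and attributes.
SAMPLE_SCP = """\
<SystemConfiguration>
  <Component FQDD="NIC.Slot.1-1-1">
    <Attribute Name="PermitTotalPortShutdown">Enabled</Attribute>
  </Component>
  <Component FQDD="NIC.Slot.1-2-1">
    <Attribute Name="PermitTotalPortShutdown">Disabled</Attribute>
  </Component>
  <Component FQDD="BIOS.Setup.1-1">
    <Attribute Name="BootMode">Uefi</Attribute>
  </Component>
</SystemConfiguration>
"""


def port_shutdown_settings(scp_xml: str) -> dict[str, str]:
    """Map each NIC component FQDD to its PermitTotalPortShutdown value."""
    root = ET.fromstring(scp_xml)
    settings = {}
    for component in root.iter("Component"):
        fqdd = component.get("FQDD", "")
        if not fqdd.startswith("NIC."):
            continue  # assumption: NIC components use a "NIC." FQDD prefix
        for attribute in component.findall("Attribute"):
            if attribute.get("Name") == "PermitTotalPortShutdown":
                settings[fqdd] = (attribute.text or "").strip()
    return settings


for fqdd, value in port_shutdown_settings(SAMPLE_SCP).items():
    status = "ok" if value == "Enabled" else "needs attention"
    print(f"{fqdd}: {value} ({status})")
```

For a real file, read its contents and pass them to `port_shutdown_settings` instead of `SAMPLE_SCP`. A NIC port that does not appear in the result at all deserves the same attention as one that is not set to Enabled.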

Step 2: Import the configuration on other hosts

For each additional PowerEdge host:

  1. Put the ESXi host in maintenance mode.
  2. Open the iDRAC and go to Configuration → Server Configuration Profile → Import.
  3. Select the previously exported configuration file.
  4. In the import options, choose only NIC (or the relevant network devices).
  5. Start the import.
    The host will reboot to apply the configuration.

After this change, when the NIC is shut down by the operating system or fails from the OS perspective, the physical port will also go down at the hardware level. The switch will see the port down, and your network monitoring tools and HA tests will behave as expected.
