APD (All-Paths-Down) and PDL (Permanent Device Loss) are two terms you hear often when working in a VMware environment. This is yet another blog post on the subject: how to handle an APD situation, how to avoid such situations, and how to get out of a PDL situation.
An All-Paths-Down (APD) situation occurs when all paths to a device are down. Because nothing indicates whether the device loss is permanent or temporary, the ESXi host keeps trying to re-establish connectivity. APD situations commonly occur when a LUN is incorrectly unpresented from the ESXi/ESX host. Still believing the device is available, the host retries all SCSI commands indefinitely. This affects the management agents, because their commands get no response until the device is accessible again, and as a result the ESXi/ESX host becomes inaccessible or shows as not responding in vCenter Server.
ESXi 5.x and 6.x also recognize a second condition: Permanent Device Loss (PDL).
In a PDL state, ESXi considers the device loss permanent. PDL can be caused by making a LUN inaccessible to a host, either by unmapping it or by deleting it. In this case the storage array informs the host of the PDL state through the SCSI sense data it returns in response to a command. The removal is considered permanent only when all paths report the PDL error. PDL was introduced in 5.1.
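To make the APD/PDL distinction concrete, here is a minimal Python sketch of the decision described above, assuming each path reports whether the target answered and which sense data came back. The names (`PathReport`, `DeviceState`, `classify_device`) are hypothetical and this is not how ESXi is implemented. The single PDL sense code used (sense key 0x5, ASC/ASCQ 0x25/0x00, LOGICAL UNIT NOT SUPPORTED) is one common example; arrays may signal PDL with other codes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple

Sense = Tuple[int, int, int]  # (sense key, ASC, ASCQ)

# Assumed example PDL code: ILLEGAL REQUEST / LOGICAL UNIT NOT SUPPORTED.
# Real arrays may report PDL with other sense codes.
PDL_SENSE_CODES = {(0x5, 0x25, 0x00)}


@dataclass
class PathReport:
    reachable: bool                # did the target answer on this path at all?
    sense: Optional[Sense] = None  # sense data returned with the answer, if any


class DeviceState(Enum):
    OK = "ok"
    APD = "all-paths-down"
    PDL = "permanent-device-loss"


def is_pdl(path: PathReport) -> bool:
    # The array answered, and the answer says the LUN is gone.
    return path.reachable and path.sense in PDL_SENSE_CODES


def classify_device(paths: List[PathReport]) -> DeviceState:
    # PDL only when *every* path carries the PDL error.
    if paths and all(is_pdl(p) for p in paths):
        return DeviceState.PDL
    # Any path that answers normally keeps the device usable.
    if any(p.reachable and not is_pdl(p) for p in paths):
        return DeviceState.OK
    # No usable path and no unanimous PDL: the host cannot tell whether
    # the loss is temporary, so it is treated as APD (and retried).
    return DeviceState.APD


if __name__ == "__main__":
    dead = PathReport(reachable=False)
    gone = PathReport(reachable=True, sense=(0x5, 0x25, 0x00))
    print(classify_device([dead, dead]).value)  # all-paths-down
    print(classify_device([gone, gone]).value)  # permanent-device-loss
    print(classify_device([gone, dead]).value)  # all-paths-down
```

Note the mixed case: one path reporting PDL while another is simply dead is still treated as APD, because the removal only counts as permanent when every path reports PDL.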
In vSphere 6.0, VMware enhanced APD and PDL handling and introduced vSphere VM Component Protection (VMCP). If an APD or PDL condition occurs and a VM is running on a host that has lost connectivity to the VM's datastore, vSphere HA kicks in and restarts the VM on another host that does not have a connectivity issue with the same storage.
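The VMCP behavior can be pictured as a placement decision. The Python sketch below is a simplified model, not the vSphere HA algorithm: `Host`, `VM`, and `restart_target` are hypothetical names, and the sketch only checks datastore reachability, whereas real HA also weighs resources, admission control, and the configured VMCP response.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Host:
    name: str
    # Datastores this host can reach with no APD/PDL condition (hypothetical model).
    healthy_datastores: Set[str] = field(default_factory=set)


@dataclass
class VM:
    name: str
    datastore: str
    host: str


def restart_target(vm: VM, hosts: List[Host]) -> Optional[str]:
    """Hypothetical VMCP-style decision.

    Returns the name of a host to restart `vm` on, or None when no restart
    should happen (the current host still sees the datastore, or no other host does).
    """
    current = next(h for h in hosts if h.name == vm.host)
    if vm.datastore in current.healthy_datastores:
        return None  # no storage problem on this host, nothing to do
    for h in hosts:
        if h.name != vm.host and vm.datastore in h.healthy_datastores:
            return h.name  # first host that still has a healthy path to the datastore
    return None  # every host is affected, so a restart would not help


if __name__ == "__main__":
    hosts = [Host("esx01", set()), Host("esx02", {"ds01"})]
    print(restart_target(VM("app01", "ds01", "esx01"), hosts))  # esx02
```

Picking the first healthy host keeps the sketch short; the point it illustrates is that a restart only makes sense when some other host can still reach the VM's storage.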
There are two variants of PDL, planned and unplanned:
- Planned PDL is when the administrator follows the recommended workflow to remove a storage device (https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2004605)
- Unplanned PDL is when the storage administrator simply removes a storage device at the storage array, without first preparing the ESXi hosts
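For the planned variant, the order of operations is what matters: the host releases the device before the array takes it away. The Python sketch below prints a simplified subset of that workflow as esxcli commands for one host, assuming you already know the datastore label and the device's NAA identifier. The function name and example values are hypothetical, and the KB article lists prerequisites (for example, no running VMs on the datastore) that are not checked here.

```python
import shlex
from typing import List

# Placeholder for the step done outside ESXi (hypothetical wording).
ARRAY_STEP = "# (on the storage array) unmap/unpresent the LUN from this host"


def planned_removal_commands(datastore_label: str, device_id: str) -> List[str]:
    """Hypothetical helper: ordered commands for a planned removal on one ESXi host."""
    label = shlex.quote(datastore_label)
    dev = shlex.quote(device_id)
    return [
        # 1. Unmount the VMFS datastore so the host stops using it.
        f"esxcli storage filesystem unmount -l {label}",
        # 2. Detach the device so the host stops sending I/O to it.
        f"esxcli storage core device set --state=off -d {dev}",
        # 3. Only now remove the LUN on the array side.
        ARRAY_STEP,
        # 4. Rescan so the host notices the LUN is gone.
        "esxcli storage core adapter rescan --all",
    ]


if __name__ == "__main__":
    for cmd in planned_removal_commands("old-datastore", "naa.600a0b80001234"):
        print(cmd)
```

Skipping steps 1 and 2 and jumping straight to step 3 is exactly the unplanned case described above.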
In upcoming articles we will look at VMCP in detail, how to interpret a PDL, and how to resolve it.