A few months ago, my team ran into a problem when we tried to mount a VMFS datastore on an iSCSI-backed LUN to particular hosts in a cluster. At the time, all hosts were running VMware ESXi 6.7 U3.
We had purchased a few new servers and, for some reason, the VMFS datastore that was already connected to the existing servers in the cluster was not showing up properly on the new ones.
The iSCSI device showed up in the storage devices section of the vSphere client, but the datastore column displayed “Not consumed”.
We escalated to VMware support, but they pointed us to the storage vendor, citing logs that showed the connection between the host and the storage being dropped. We also tried formatting the datastore, but that didn’t work either. It just gave an error: “Failed to create VMFS datastore Test – Cannot change the host configuration”.
I decided to set up a nested ESXi host to try to reproduce the issue, since we didn’t have any spare servers to test on. I downloaded the appliances from William Lam’s site, virtuallyghetto: https://www.virtuallyghetto.com/nested-virtualization/nested-esxi-virtual-appliance
I planned to try 6.7 first, then 7.0. For some reason the appliance deployment for 6.7 failed, so I went ahead with the 7.0 appliance.
After installing and configuring the hypervisor as I would a physical server, I went on to configure the iSCSI software adapter as well and voila! I could see the LUN, mount it and create a datastore on it.
I went on to upgrade one of the hosts that had previously failed to connect, and it worked. At that point my conclusion was that the issue, whatever it was, had been fixed in v7.0.
After upgrading a few more hosts, we realized the issue still occurred on some of them. This meant that the issue was not with the version of ESXi.
We took another look at the hosts that were working, compared them with those that were not, and noticed that the MTU was different. Some hosts had 9000 and others had 1500. The MTU was simply not consistent across the cluster.
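With more than a handful of hosts, this kind of comparison is easy to get wrong by eye. Below is a minimal Python sketch of the check, assuming the MTU of each host's storage interface has already been collected by hand (for example from the output of `esxcli network ip interface list`). The host names and values are made up for illustration, and the helper names are my own.

```python
from collections import defaultdict


def group_hosts_by_mtu(host_mtus):
    """Group host names by their configured MTU value."""
    groups = defaultdict(list)
    for host, mtu in host_mtus.items():
        groups[mtu].append(host)
    return {mtu: sorted(hosts) for mtu, hosts in groups.items()}


def find_mtu_outliers(host_mtus):
    """Return hosts whose MTU differs from the most common value.

    Ties are broken in favour of the larger MTU. This is only a
    convention for the sketch, not a statement about which is correct.
    """
    groups = group_hosts_by_mtu(host_mtus)
    if len(groups) <= 1:
        return []  # everything is consistent (or there is no data)
    majority_mtu = max(groups, key=lambda mtu: (len(groups[mtu]), mtu))
    return sorted(
        host
        for mtu, hosts in groups.items()
        if mtu != majority_mtu
        for host in hosts
    )


# Hypothetical inventory: host name -> MTU of the iSCSI VMkernel port.
host_mtus = {
    "esx01": 9000,
    "esx02": 9000,
    "esx03": 1500,
    "esx04": 9000,
    "esx05": 1500,
}

print(group_hosts_by_mtu(host_mtus))
print(find_mtu_outliers(host_mtus))
```

The "majority" value is only a hint about which hosts to look at first. The right MTU is whatever the physical switches and the storage array are configured for, end to end.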
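We changed the MTU on the hosts that were not connecting to the storage so that it was consistent, and that solved the issue. In hindsight this fits the symptoms: small packets such as iSCSI discovery and login fit within any MTU, so the device showed up, while the larger frames used for real I/O can get dropped when the MTU is not the same end to end. It was amazing that after a few weeks of troubleshooting, the issue came down to the MTU.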
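A common way to confirm that a path really carries a given MTU end to end is to ping the storage target with the don't-fragment bit set and the largest payload that fits. The payload is the MTU minus 28 bytes (20 for the IPv4 header, 8 for the ICMP header). Here is a small Python sketch of that arithmetic; it only builds the `vmkping` command line to run on the host. The interface name `vmk1` and the target address are placeholders, not values from our environment.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    overhead = IPV4_HEADER_BYTES + ICMP_HEADER_BYTES
    if mtu <= overhead:
        raise ValueError(f"MTU {mtu} is too small to carry an ICMP echo")
    return mtu - overhead


def vmkping_command(mtu, target_ip, vmk="vmk1"):
    """Build a vmkping command line to run on the ESXi host.

    -I selects the VMkernel interface, -d sets the don't-fragment bit
    and -s sets the payload size. The default vmk name is a placeholder.
    """
    return f"vmkping -I {vmk} -d -s {max_ping_payload(mtu)} {target_ip}"


print(max_ping_payload(9000))
print(max_ping_payload(1500))
# 192.0.2.10 is a documentation address standing in for the iSCSI target.
print(vmkping_command(9000, "192.0.2.10"))
```

If the jumbo-sized ping fails while the 1500-sized one succeeds, something along the path (VMkernel port, vSwitch, physical switch or array) is still at the smaller MTU.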
Anyway, on to the next discovery...