Nutanix - NIC firmware upgrade issue

Recently I noticed that when LCM is used to upgrade NIC firmware, the NIC is upgraded on the first host and the host then gets stuck in the upgrade window (not the Phoenix window!). The behavior is the same whether you select multiple hosts or a single one. I had to press Ctrl+Alt+Delete to restart the host and then run "python /phoenix/reboot_to_host.py" to take the hypervisor out of Phoenix mode.
When I reviewed genesis.out and lcm.log, I could see:

LCM is not able to boot into Phoenix mode after finishing the actual upgrade.
After I opened a support ticket with Nutanix, it seems this is a known issue. It is caused by slower-than-expected initialization of the disks: when mdadm is called to assemble the RAID arrays, the disks are not yet ready.
Because of this, the node boots into an inconsistent Phoenix state and gets stuck with the following errors:

2023-11-02 07:58:44,122Z ERROR 87050992 foundation_manager.py:254 (172.16.140.34, kLcmUpdateOperation, 442ccacb-887d-4e7a-5c53-944c5aede97d) The node [172.16.140.34] is neither in hypervisor nor in phoenix

2023-11-02 07:58:44,123Z ERROR 87050992 foundation_manager.py:569 (172.16.140.34, kLcmUpdateOperation, 442ccacb-887d-4e7a-5c53-944c5aede97d) Phoenix doesnt come back up [172.16.140.34]

2023-11-02 07:58:44,123Z ERROR 87050992 lcm_ops_by_phoenix:1760 (172.16.140.34, kLcmUpdateOperation, 442ccacb-887d-4e7a-5c53-944c5aede97d) Phoenix doesnt come back up [172.16.140.34]

2023-11-02 07:58:44,124Z ERROR 87050992 lcm_ops_by_phoenix:1688 (172.16.140.34, kLcmUpdateOperation, 442ccacb-887d-4e7a-5c53-944c5aede97d) Failed to cold_reboot the node for task_list ['442ccacb-887d-4e7a-5c53-944c5aede97d'].

2023-11-02 07:58:44,124Z ERROR 87050992 lcm_ops_by_phoenix:773 (172.16.140.34, kLcmUpdateOperation, 442ccacb-887d-4e7a-5c53-944c5aede97d) Phoenix doesnt come back up [172.16.140.34]Failed to cold_reboot the node for task_list ['442ccacb-887d-4e7a-5c53-944c5aede97d']
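
If you want to check your own logs for the same signature, here is a minimal Python sketch that scans log lines for the error messages shown above. It is not a Nutanix tool: the function name find_phoenix_failures is hypothetical, and the regular expression is only my assumption about the line layout, based on the excerpt in this post.

```python
import re

# Error fragments copied from the log excerpt in this post.
SIGNATURES = (
    "is neither in hypervisor nor in phoenix",
    "Phoenix doesnt come back up",
    "Failed to cold_reboot the node",
)

# Assumed layout: timestamp, level, numeric id, source:line,
# (node IP, operation, task UUID), then the message text.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}Z)\s+"
    r"(?P<level>[A-Z]+)\s+\S+\s+(?P<source>\S+)\s+"
    r"\((?P<node>[\d.]+),\s*(?P<op>\w+),\s*(?P<task>[0-9a-f-]+)\)\s+"
    r"(?P<msg>.*)$"
)


def find_phoenix_failures(lines):
    """Return (timestamp, node, source, message) for each matching ERROR line."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group("level") != "ERROR":
            continue
        message = match.group("msg")
        if any(signature in message for signature in SIGNATURES):
            hits.append(
                (match.group("ts"), match.group("node"),
                 match.group("source"), message)
            )
    return hits
```

To use it, open a copy of the log file and pass the file object to find_phoenix_failures; every tuple it returns names the node that failed to come back from Phoenix.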

This issue was seen in LCM versions earlier than 2.7 (which was released on 16 November 2023). To resolve it, upgrade LCM to 2.7 (make sure Foundation is up to date) and run LCM once more before retrying the firmware upgrade that failed.
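
If you track LCM versions across several clusters, the "earlier than 2.7" check can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and I am assuming the version is a plain dotted string of numbers that you have already read from Prism yourself.

```python
def parse_version(text):
    """Turn a dotted version string such as '2.6.2' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def lcm_needs_upgrade(current, fixed=(2, 7)):
    """Return True when the given LCM version is older than the fixed release."""
    # Compare numerically, component by component, so that '2.10' is
    # correctly treated as newer than '2.7' (a string comparison would not).
    return parse_version(current)[:len(fixed)] < fixed
```

For example, lcm_needs_upgrade("2.6.2") is True, while lcm_needs_upgrade("2.7") and lcm_needs_upgrade("2.10") are both False.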

Ahmad Jamali,
