I think the scariest thing for a sysadmin is Windows failing to boot, especially after an unnecessary (?) BIOS update. Recently we got two Lenovo ThinkServer TS150s. They're cheap and work really well, and we quickly put one into production since the old server just suddenly up and died. So I was staring at the other server sitting there doing nothing except holding a brand new install of Windows Server 2016, and seeing that its BIOS was dated 2016 while the latest was from 2018, I decided to upgrade the BIOS.
After the flashing was completed and the system rebooted, I got a black screen and this text: Error code 1962 - No operating system found. Well, the BIOS must've been reset to defaults. Went into the BIOS: nope, everything was the same as before. Maybe there are new BIOS options... tried some different configurations... didn't work... tried every possible configuration... didn't work... tried resetting to defaults... tried optimized settings... tried swearing... nothing worked. Luckily I have the other TS150, but since it's in production I had to wait until midnight to take it offline and look at its BIOS. Compared all the settings and they were identical except for the BIOS dates. Tried all possible configurations again just in case I had missed something.
Next, I tried using Linux Live CDs, Windows 10 Live DVDs, and also Windows' own rescue mode to flash different versions of the BIOS. Nothing worked. Tried going back to the original BIOS from 2016 but it wouldn't let me. Apparently there was a security update in 2017 and they disabled going back to older versions.
Next, I used rescue mode and the bootrec command to try to fix the boot sectors. No go.
Gave up. Tried installing Windows from scratch. Nope, Windows complains that it can't be installed to this disk because the hardware may not support booting to this disk. Nooooooooooooooo.
Punched the reset button in frustration. So while I was tearing my hair out again and pondering what to do next, I suddenly saw the familiar screen.
And next thing I knew, Windows Server 2016 had booted up like nothing had happened.
After much head scratching, I discovered the reason it booted: the server was booting from the Windows install DVD, and because I was tearing my hair out and ignoring it, the "Press any key to boot from CD or DVD..." prompt timed out and it automatically booted into Windows. Nothing had changed in the BIOS, the boot sequence was correct, and after testing I confirmed that it would only boot Windows if it first booted from the DVD and the prompt was then left to time out.
So... apparently after any BIOS update, something somewhere got modified in the boot sector and it would no longer boot correctly. But booting from the DVD then letting it time out seemed like a really strange thing, since this suggested the hard drive's boot sector was still functioning properly, it just wouldn't boot as the first boot device. Tried searching Google for this problem and found thousands and thousands of people with similar problems and no real fix except things I've already tried. Most ended up reinstalling Windows, which didn't work for me.
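To make that observation a bit more concrete, here is a toy Python model of the behaviour I was seeing. It is only a sketch of my guess, not how the TS150 firmware or the Windows DVD loader actually work: the names `BootDevice`, `simulate_boot`, `firmware_bootable` and `chains_to` are all made up for illustration, and the assumption is that the firmware refuses to start the RAID volume directly while the DVD's loader can still hand off to it once the prompt times out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BootDevice:
    name: str
    firmware_bootable: bool          # assumption: can the firmware start this device directly?
    has_os: bool = False             # is there an installed OS on this device?
    chains_to: Optional[str] = None  # assumption: device the loader hands off to after a timeout


def simulate_boot(order: List[str], devices: Dict[str, BootDevice]) -> Optional[str]:
    """Return the name of the device whose OS ends up running, or None."""
    for name in order:
        dev = devices[name]
        if not dev.firmware_bootable:
            continue  # firmware skips this entry and tries the next one
        if dev.has_os:
            return dev.name
        # No OS here (e.g. the install DVD with nobody pressing a key):
        # let the loader hand off to its chained device, if that one has an OS.
        if dev.chains_to is not None and devices[dev.chains_to].has_os:
            return dev.chains_to
    return None  # roughly "Error 1962 - No operating system found"


# Hypothetical state after the BIOS update: the RAID volume holds a working
# Windows install, but the firmware no longer boots it as a first-class device.
after_update = {
    "raid": BootDevice("raid", firmware_bootable=False, has_os=True),
    "dvd": BootDevice("dvd", firmware_bootable=True, chains_to="raid"),
}

print(simulate_boot(["raid"], after_update))         # no DVD in the drive
print(simulate_boot(["raid", "dvd"], after_update))  # DVD present, prompt left to time out
```

In this model the disk never stops being bootable; only the direct path from firmware to disk is broken, which matches the symptom that the DVD detour kept working.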
Well, after a week of even more head scratching, I finally came up with a working and reproducible solution (workaround). The ThinkServer came pre-configured with two hard drives set up as a RAID1 array using the onboard Intel RSTe. I'm guessing the problem could be related to the Intel chipset and the RAID array configuration, but a BIOS update should not mess things up this much. Anyway, the fix was to remove one of the hard drives from the RAID array by booting into the Intel RSTe configuration screen and selecting the option "Reset Disks to Non-RAID". Remove one of the drives, then add it back immediately. After that, reboot into Windows using the DVD workaround described above. Once the array had been rebuilt, Windows could boot normally again.
Now let me go test this on my production server.