HP DL20 Gen 9 and CentOS 7.4
File this under "I hate you HP".
In 2016 I set up an HP DL20 Gen 9 server with CentOS 7. Everything worked great through multiple upgrades until September 2017, when I upgraded from CentOS 7.3 to 7.4 and immediately found that the machine would not boot with the latest kernel. Rolling back to the previous kernel let me boot, so I moved on for the time being. Fast forward a couple of months: I decided it was time to tackle the issue, ran the latest updates, and was greeted with the same kernel panic on boot.
Kernel Panic
The first thing I did was update the BIOS. I didn't think it would help, but what the heck. It was old anyway...
I did a little googling and found that this was a known problem with HP servers and CentOS:
https://bugs.centos.org/view.php?id=13943&nbn=1
HP's solution is to disable RAID and turn on AHCI:
https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04655546
I turned on AHCI and rebooted. Good news... The kernel panic is gone. Bad news... now it doesn't see the partitions.
My drives? Where are my drives?!?!
After a bit I realized that the "Recovery" kernel in the boot menu almost worked. All my data was there, but I couldn't see /boot/grub2/grubenv. I think that's because the recovery Initramfs didn't have vfat support. I followed the CentOS instructions to "Create a New Initramfs". Since I had a partially working environment, I didn't need to mount any disks. I ran the following command to generate a new Initramfs for the latest kernel on my machine:
dracut -f /boot/initramfs-3.10.0-693.5.2.el7.x86_64.img 3.10.0-693.5.2.el7.x86_64
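Every dracut command in this post spells out the kernel version twice, once in the image path and once as the target kernel, which makes typos easy. Below is a minimal sketch of a wrapper, assuming bash and the CentOS 7 naming scheme /boot/initramfs-<version>.img. The function name rebuild_initramfs and the DRY_RUN switch are hypothetical and not part of dracut.

```shell
# Hypothetical helper (not part of dracut): rebuild the initramfs for one
# kernel version without typing the version twice.
# Usage: rebuild_initramfs <kernel-version> [extra dracut options...]
# Set DRY_RUN=1 to print the dracut command instead of running it.
rebuild_initramfs() {
    local kver="$1"
    if [ -z "$kver" ]; then
        echo "usage: rebuild_initramfs <kernel-version> [dracut options...]" >&2
        return 1
    fi
    shift
    # Assumed CentOS 7 convention for the image that GRUB loads.
    local image="/boot/initramfs-${kver}.img"
    # Extra options (e.g. --add-drivers ahci) go before -f, matching the
    # command order used in this post.
    local cmd=(dracut "$@" -f "$image" "$kver")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example, `DRY_RUN=1 rebuild_initramfs 3.10.0-693.5.2.el7.x86_64` prints the command above without running it. Dropping DRY_RUN (as root) runs it for real.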
I rebooted and selected this kernel. Nope. Still no go. What was wrong? My drives were there and I could see all my partitions, but I still couldn't boot. I stumbled on this post and realized that my Initramfs must be missing something crucial, so I followed its instructions to add the AHCI driver:
dracut --add-drivers ahci -f /boot/initramfs-3.10.0-693.5.2.el7.x86_64.img 3.10.0-693.5.2.el7.x86_64
Reboot... No luck and no improvement. It's not the lack of AHCI, or at least not just AHCI.
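One way to rule out a silent failure in a step like this is to confirm that the driver actually landed in the image before rebooting. The following is a minimal sketch, assuming bash and dracut's lsinitrd tool (installed with dracut on CentOS 7), which lists the files inside an initramfs. The helper name image_has_module is hypothetical.

```shell
# Hypothetical helper: succeed if a kernel module is packed into an initramfs.
# Usage: image_has_module /boot/initramfs-<kver>.img ahci
image_has_module() {
    local image="$1" module="$2"
    # lsinitrd lists every file in the image. Kernel modules show up as
    # .../<name>.ko, or .../<name>.ko.xz when compressed.
    # The leading slash keeps "ahci" from also matching "libahci".
    lsinitrd "$image" | grep -qE "/${module}\.ko(\.xz)?$"
}
```

For example, `image_has_module /boot/initramfs-3.10.0-693.5.2.el7.x86_64.img ahci && echo present` prints "present" only if the module is in the image. Module names are matched literally, so a driver whose name uses `-` versus `_` has to be spelled the way the file is named.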
After a lot of flailing around, I found that the solution was hostonly=no. I looked in /etc/dracut.conf.d/ and found hpdsa.conf. This file came from the broken HP RAID driver, so I deleted it. Next I created /etc/dracut.conf.d/all.conf with the following contents:
hostonly=no
Then I generated a new Initramfs:
dracut -f /boot/initramfs-3.10.0-693.5.2.el7.x86_64.img 3.10.0-693.5.2.el7.x86_64
This created a 52 MB Initramfs file instead of the usual 23 MB one. Because of the hostonly=no option, it includes everything but the kitchen sink. And it boots!
I should probably dig a little deeper and figure out which driver was missing, but I'll leave that for another day.
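For that deeper dig, one approach is to compare the module lists of a host-only image and the generic one, then look at what only the generic image contains. The sketch below assumes bash and lsinitrd. It also assumes you still have a host-only image under a separate name, for example one regenerated with dracut's --hostonly option into a hypothetical path such as /tmp/initramfs-hostonly.img, because the original was overwritten above. The helper names list_modules and modules_only_in_generic are hypothetical.

```shell
# Hypothetical helpers: find kernel modules present in a generic initramfs
# but absent from a host-only one. These are candidates for the missing driver.

# Reduce lsinitrd's file listing to sorted, unique module names (e.g. "ahci").
list_modules() {
    lsinitrd "$1" | grep -oE '[^/ ]+\.ko(\.xz)?$' | sed -E 's/\.ko(\.xz)?$//' | sort -u
}

# Usage: modules_only_in_generic <host-only image> <generic image>
modules_only_in_generic() {
    # comm -13 prints only the lines unique to the second sorted input.
    comm -13 <(list_modules "$1") <(list_modules "$2")
}
```

For a 52 MB versus 23 MB image the resulting list will likely be long. Storage-related drivers (e.g. under drivers/ata or drivers/scsi in the lsinitrd listing) are the natural place to start.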