DevHeads.net

always update the bootloader during major upgrades

Hi,

This is not a formal proposal; it is for discussion and identifying
liabilities. This email has an x86 GRUB bias only because that's the
bootloader regime I'm most familiar with. I think it should apply to
other archs as well, i.e. their bootloaders shouldn't be permitted to
become stale.

Short version: Fedora should take responsibility for the bootloader
being up to date, by updating it during major version upgrades. This
is already the case on UEFI with conventional installations. I'd like
to make sure it always happens on major version upgrades for BIOS
installations. What's the problem? This would step on any custom
bootloader configuration a user has for multiboot. There is no
reasonable mechanism on BIOS systems to determine if the bootloader is
a Fedora bootloader, and somehow only update a Fedora bootloader and
not touch any other bootloader.

Fedora should be responsible for keeping the bootloader it installs in
~80% of the use cases up to date, and ignore the fallout for the ~20%
who have a custom setup that would be stepped on by a forced
bootloader update. Allowing the bootloader to go stale over time is a
feature and security risk. Stepping on a custom setup is an
inconvenience.

Longer version:

terms:

bootloader - this is the pre-boot bootloader binaries; on BIOS it's
embedded in the MBR, and the MBR gap or BIOS Boot partition, and
modules found in /boot/grub2 on Fedora. On UEFI, this is the
/EFI/fedora/shimx64.efi and /EFI/fedora/grubx64.efi which in the
typical installation are built and signed by the Fedora build system.
Other bootloaders have different names but generally follow a similar
convention. The point here is to distinguish between the "bootloader",
which runs in the pre-boot environment, and the installed package
containing the user space tools that are used to install (or that
contain) the bootloader.

Fedora bootloader - this is a bootloader that is derived from Fedora
packages and effort; as contrasted to merely upstream GRUB, or
Ubuntu's GRUB, etc. as these derivatives can actually be substantially
different from each other.

Discussion:

On BIOS firmware x86 computers, neither gnome-software
(pk-offline-update) nor dnf system-upgrade updates the bootloader.
That is, we
do not run 'grub2-install' either before starting an upgrade, or as a
post install operation following an upgrade. Therefore the bootloader
will become stale if the user does not manually run 'grub2-install'
periodically.
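
For illustration only, here is a minimal Python sketch of what that
missing step amounts to on a BIOS system. It assumes that the presence
of /sys/firmware/efi means the system was booted via UEFI, and it only
builds the 'grub2-install' command rather than running it. The function
names and the /dev/sda device are hypothetical; none of this is part of
an existing Fedora tool.

```python
from pathlib import Path
from typing import List


def booted_via_uefi(sysfs_root: Path = Path("/sys")) -> bool:
    """Return True if the running system appears to have booted via UEFI.

    Assumption: the kernel exposes <sysfs_root>/firmware/efi only on UEFI boots.
    """
    return (sysfs_root / "firmware" / "efi").is_dir()


def bios_reinstall_command(boot_disk: str) -> List[str]:
    """Build (but do not run) the command that rewrites the BIOS bootloader.

    boot_disk is a whole-disk device such as /dev/sda (hypothetical here);
    choosing the right disk is the caller's problem.
    """
    return ["grub2-install", boot_disk]


def planned_bootloader_step(boot_disk: str, sysfs_root: Path = Path("/sys")) -> List[str]:
    """Return the command an upgrade hook might run, or [] on UEFI.

    On UEFI the bootloader files arrive through package updates, so this
    sketch plans nothing there.
    """
    if booted_via_uefi(sysfs_root):
        return []
    return bios_reinstall_command(boot_disk)
```

A hook along these lines would call planned_bootloader_step("/dev/sda")
after the upgrade transaction and run the result only if it is not
empty. Picking the disk is the hard part, which this sketch leaves out.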

On conventional Fedora installations on UEFI computers, the bootloader
is included in the shim and grub2 packages as a file, and is therefore
updated. This happens sporadically within a Fedora release, not only
at major upgrade time. This updating of the bootloader files doesn't
happen on Silverblue on UEFI, so it ends up being subject to the same
problem under discussion.

In the most recent release, Fedora 30, we saw quite a lot of people
run into a problem we knew about before release, directly related to
the bootloader becoming stale. And it only affected BIOS systems (and
Silverblue on both firmware types).
https://fedoraproject.org/wiki/Common_F30_bugs#blscfg-fail
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=1652806

In the bug, you can find some dups as well but also quite a bit of
user aggravation. I'd like to avoid this going forward.

Windows and macOS never involved users in this stuff. They always
asserted complete and total control over the bootloader; there wasn't
even an option to avoid updating it. And I think Fedora needs to do
the same. Just update it. That benefits most users. It's the
responsible thing to do. And those who need custom setups can still
have them. They would only get stepped on at major upgrade time, and
it's decently likely we can warn them in advance.

There are some gotchas I'm already thinking of: MBR gap is too small,
there are multiple drives and it's ambiguous which one should get the
bootloader, and so on. I think it's sane to have a test for reasonable
certainty we can and should update the bootloader, and warn and not
update it in the cases that fail that test.

Thoughts?