encrypted swap with urandom key problem

Hi,

I'm having a heck of a time trying to troubleshoot an indefinite hang
on startup due to setting up crypto swap on one of my laptops. At this
point it seems computer specific. I can't reproduce it on two other
(dissimilar) computers or any qemu-kvm VM.

Without any debugging enabled, the problem happens in perhaps 1 in 10
boots. With debugging enabled (systemd.log_level=debug rd.udev.debug
systemd.debug-shell=1) the ratio reverses: about 9 in 10 boots hang.
But the logs still don't show why.

The first hint of a problem is this huge delay in the journal for cryptsetup:

[ 11.464519] flap.local systemd[577]: Operating on architecture: x86-64
[ 21.710149] flap.local systemd-cryptsetup[606]: Set cipher aes,
mode xts-plain64, key size 256 bits for device
/dev/disk/by-partuuid/688b193f-3b38-4ca5-8b65-2ef61f27ec83.
...
[ 21.777721] flap.local systemd[606]:
systemd-...@cryptswap.service: Executing:
/usr/lib/systemd/systemd-cryptsetup attach cryptswap
/dev/disk/by-partuuid/688b193f-3b38-4ca5-8b65-2ef61f27ec83
/dev/urandom swap,cipher=aes-xts-plain64,size=256
...
[ 22.732131] flap.local systemd[1]: Child 606 (systemd-cryptse) died
(code=exited, status=0/SUCCESS)

So cryptsetup does succeed. In the early debug shell, dmsetup shows the
mapping exists, and blkid shows the /dev/mapper/cryptswap device is
already formatted as swap, so the mkswap command likewise succeeded.

[ 22.732726] flap.local systemd[725]:
systemd-...@cryptswap.service: Executing: /sbin/mkswap
/dev/mapper/cryptswap
...
[ 22.742728] flap.local systemd[1]: Child 725 (mkswap) died
(code=exited, status=0/SUCCESS)

For whatever reason, swapon never happens.

At this point, still in the early debug shell, 'systemctl list-jobs'
shows many services 'waiting' and one device job stuck 'running':

126 dev-mapper-cryptswap.device start running

If at this time I manually 'swapon /dev/mapper/cryptswap' the command
works, and startup resumes.
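
For reference, the stuck job can be picked out of that listing with a
small filter. This is only a sketch: running_jobs is a hypothetical
helper name (not part of systemd), and it assumes the four-column
JOB UNIT TYPE STATE layout of the line quoted above.

```shell
#!/bin/sh
# Hypothetical helper: read 'systemctl list-jobs' output on stdin and print
# the unit name of every job whose STATE column is "running".
# Assumes the column order JOB UNIT TYPE STATE.
running_jobs() {
    awk '$4 == "running" { print $2 }'
}
```

From the debug shell it would be used as 'systemctl list-jobs |
running_jobs', which for the job quoted above should print only
dev-mapper-cryptswap.device.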

Anyway, I'm stuck. Any ideas how to get more information? If it's
hardware specific, could it be a wrong dependency on hwrng and maybe
there just isn't enough entropy? Hence the cryptsetup delay? And then
that delay results in some other race that hangs swapon? How would I
go about showing there is or isn't enough entropy at the time
cryptsetup executes?
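
One rough way to check the entropy question, as a sketch only: sample
the kernel's entropy estimate from the early debug shell while the
cryptsetup job is pending. entropy_sample is a hypothetical helper
name; the default path is the standard procfs file, and the optional
argument exists only so the function can be pointed at another file.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel's current entropy estimate.
# $1 (optional) overrides the file to read; defaults to the procfs entry.
entropy_sample() {
    src="${1:-/proc/sys/kernel/random/entropy_avail}"
    read -r bits < "$src" || return 1
    printf 'entropy_avail=%s\n' "$bits"
}
```

Running 'entropy_sample' once a second in the debug shell, or
redirecting its output to /dev/kmsg so it lands in the journal next to
the cryptsetup messages, would give values to compare against the
timestamps above. A low number during the roughly 10 second gap would
support the entropy theory; a healthy one would point elsewhere.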

I've updated this bug with the latest logs and findings, which should
all match the PIDs and timestamps in this email.

https://bugzilla.redhat.com/show_bug.cgi?id=1691589