DevHeads.net

Fedora 31 System-Wide Change proposal: Switch RPMs to zstd compression

https://fedoraproject.org/wiki/Changes/Switch_RPMs_to_zstd_compression

= Switch RPMs to zstd compression =

== Summary ==
Binary RPMs are currently compressed with xz level 2.
Switching to zstd would increase decompression speed significantly.

== Owner ==
* Name: [[User:dmach| Daniel Mach]]
* Email: ... at redhat dot com

== Detailed Description ==
* The change requires setting a new compression algorithm in the rpm macros, followed by a mass rebuild of all packages.
* The macro for setting the compression is: %define _binary_payload w19.zstdio
* The recommended compression level is 19. Builds will take longer, but the additional compression time is negligible relative to the total build time, and it pays off in a better compression ratio than xz level 2.
* SRPM payload compression should stay at gzip (there is almost no benefit in changing it, because an SRPM's contents are already compressed).

=== Use case: Firefox installation ===
I rebuilt firefox-66.0.5-1.fc30 with zstd level 19.
Then I compared installation times with the original (xz compressed) package:

{| class="wikitable"
|-
! Compression !! Target File System !! Time
|-
| xz level 2 || tmpfs || 8s
|-
| xz level 2 || ext4 on nvme || 11s
|-
| zstd level 19 || tmpfs || 2s
|-
| zstd level 19 || ext4 on nvme || 4s
|-
|}

=== Comparison of compression algorithms and levels ===
The following table shows '''cpio''' and '''compressed cpio''' extraction
times into a tmpfs. Actual RPM decompression times will differ because
of extraction to a real disk and some overhead in the RPM tool
(checks, scriptlets).

{| class="wikitable"
|-
! Compression !! Level !! Size (B) !! Size (GiB) !! Compression time !! Compression time, 4 threads !! Decompression time !! Comment
|-
| CPIO || - || 5016785692 || 4.7 || - || - || - ||
|-
| xz || 2 || 1615017616 || 1.6 || 9m55s || - || 1m36s || slow decompression
|-
| pxz || 2 || 1631869880 || 1.6 || - || 6m11s || 1m38s || slow decompression
|-
| gzip || 9 || 2086354992 || 2.0 || 10m23s || - || 31s || insufficient compression ratio
|-
| bzip2 || 9 || 1889161565 || 1.8 || 8m || - || 2m50s || very slow decompression; compression ratio could be better
|-
| zstd || 3 || 1913536587 || 1.8 || 31s || 29s || 6.5s ||
|-
| zstd || 10 || 1737928978 || 1.7 || 3m27s || 2m34s || 6.3s ||
|-
| zstd || 15 || 1717303256 || 1.7 || 9m37s || 6m34s || 6.3s || identical compression speed to xz; fast decompression; slightly worse compression ratio than xz
|-
| zstd || 17 || 1635525492 || 1.6 || 16m16s || 11m20s || 6.7s ||
|-
| zstd || 19 || 1575843696 || 1.5 || 24m2s || 18m55s || 7.7s ||
|-
|}
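
To make the size comparison easier to read, the byte counts in the table can be turned into percentages. The following is only a small sketch of that arithmetic; the helper names are made up for illustration, and the numbers are copied from the table, not re-measured. On this sample, zstd level 19 works out to roughly 2.4% smaller than xz level 2, and zstd level 15 to roughly 6.3% larger.

```python
# Sizes in bytes, copied from the comparison table (pxz omitted).
UNCOMPRESSED = 5016785692
SIZES = {
    "xz -2": 1615017616,
    "gzip -9": 2086354992,
    "bzip2 -9": 1889161565,
    "zstd -3": 1913536587,
    "zstd -10": 1737928978,
    "zstd -15": 1717303256,
    "zstd -17": 1635525492,
    "zstd -19": 1575843696,
}


def percent_of_original(size: int, original: int = UNCOMPRESSED) -> float:
    """Compressed size as a percentage of the uncompressed cpio archive."""
    return 100.0 * size / original


def percent_vs_xz(size: int) -> float:
    """Size difference relative to xz level 2 (negative means smaller)."""
    xz = SIZES["xz -2"]
    return 100.0 * (size - xz) / xz


if __name__ == "__main__":
    for name, size in SIZES.items():
        print(f"{name:9s} {percent_of_original(size):5.1f}% of cpio, "
              f"{percent_vs_xz(size):+5.1f}% vs xz -2")
```

This covers a single archive only; it says nothing about how the ratios behave across the whole package set.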

== Benefit to Fedora ==
* Faster installations/upgrades of user systems
* Faster koji builds (installations in build roots)
* Faster container builds
* Lower bandwidth on mirrors if we choose the highest compression level

== Scope ==
* Proposal owners: submit a patch to redhat-rpm-config
* Other developers: redhat-rpm-config maintainer: include the patch and make a new build
* Release engineering: [https://pagure.io/releng/issue/8345 #8345] mass rebuild is needed

== Upgrade/compatibility impact ==
* RPM in Fedora already supports zstd compression (since Fedora 28, rpm-4.14.0-0.rc2.5.fc28). No impact on Fedora users is expected.
* Fedora <= 27 and some other distros will not be able to decompress zstd-compressed RPMs.

== How To Test ==
* dnf install <package>
* rpm -q --qf "%{PAYLOADCOMPRESSOR} %{PAYLOADFLAGS}\n" <package>
* expected output: zstd 19

Also the overall system installation time should decrease significantly.
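
The check above could be scripted along these lines. This is a rough sketch, not part of the proposal: the function names are hypothetical, and it assumes the query prints one line shaped like the expected output ("zstd 19") per installed package.

```python
import subprocess
from typing import Tuple


def parse_payload_info(line: str) -> Tuple[str, int]:
    """Split query output such as 'zstd 19' into (compressor, level)."""
    compressor, level = line.split()
    return compressor, int(level)


def query_payload(package: str) -> Tuple[str, int]:
    """Run the rpm query from the test steps for an installed package."""
    out = subprocess.run(
        ["rpm", "-q", "--qf",
         "%{PAYLOADCOMPRESSOR} %{PAYLOADFLAGS}\n", package],
        check=True, capture_output=True, text=True,
    ).stdout
    # Assumption: only the first line matters (one installed version).
    return parse_payload_info(out.splitlines()[0])


def uses_zstd_19(info: Tuple[str, int]) -> bool:
    """True when the payload matches the level this proposal recommends."""
    return info == ("zstd", 19)
```

Calling `uses_zstd_19(query_payload("firefox"))` on a system with rpm installed would then report whether that package was rebuilt with the new setting.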

== User Experience ==
See '''Benefit to Fedora'''

== Dependencies ==
N/A

== Contingency Plan ==
* Contingency mechanism: Not needed, Fedora will stay at current compression.
* Contingency deadline: N/A
* Blocks release? No
* Blocks product? N/A

== Documentation ==
N/A

== Release Notes ==
RPMs have switched to zstd compression level 19.
Users will benefit from faster package decompression.
Users that build their packages will experience slightly longer build times.

Comments

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Igor Gnatenko at 05/30/2019 - 02:39

The last time I was about to propose this, for F29, I did a mass rebuild myself, and
while decompression was faster in most cases, the size was
definitely worse. So "Lower bandwidth on mirrors if we choose
the highest compression level" is definitely in question.

I think before approving such changes, owners need to do mass rebuilds on
their own and provide a graph of changes in size between original
compression format and new one(s).

Just saying it works better on Firefox doesn't sound to me like the way to
go.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Chris Murphy at 05/29/2019 - 21:46

On Wed, May 29, 2019 at 2:20 PM Ben Cotton < ... at redhat dot com> wrote:
Arch has been discussing this change also, with more elaborate test
results. This is the most recent table including --ultra flag to
unlock level 20+
https://lists.archlinux.org/pipermail/arch-dev-public/2019-March/029542.html

The first post
https://lists.archlinux.org/pipermail/arch-dev-public/2019-March/029520.html

Even without dictionary, -T0 will outperform xz. By how much depends
on the resources available at the time.

Phase 2 (out of scope for this feature): if you can figure out a
way to leverage the zstd (training) dictionary feature, that would
increase compression ratios and reduce both compression and
decompression time. The gotcha is that the dictionary must be specified
for both compression and decompression. So you'd need a way for the
RPM metadata to reference a dictionary version, and package the
dictionary with RPM to make sure it's available. You only need one
version, but if future training demonstrates a significant
improvement, you'd want a way to deploy multiple dictionaries, and
differentiate which was used to compress an RPM since packages could
be made with either or none.
https://github.com/facebook/zstd (see "the case for small data compression")
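
The "same dictionary on both sides" constraint can be illustrated without zstd at all. The sketch below swaps in zlib's preset-dictionary support from the Python standard library, which is a different and much simpler mechanism than a trained zstd dictionary; the dictionary contents, sample data, and function names are invented for the example.

```python
import zlib

# Hypothetical shared dictionary: strings expected to recur in payloads.
DICTIONARY = b"/usr/share/doc/ /usr/lib64/ /usr/bin/ LICENSE README.md"
SAMPLE = b"/usr/share/doc/example/README.md /usr/share/doc/example/LICENSE"


def compress_with_dict(data: bytes, zdict: bytes) -> bytes:
    """Compress data with a preset dictionary (zlib, not zstd)."""
    comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Decompress data that was compressed with the same dictionary."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    packed = compress_with_dict(SAMPLE, DICTIONARY)
    print("with dictionary:   ", len(packed), "bytes")
    print("without dictionary:", len(zlib.compress(SAMPLE, 9)), "bytes")
    try:
        # The reader must supply the dictionary too, or this fails.
        zlib.decompressobj().decompress(packed)
    except zlib.error as exc:
        print("no dictionary on the reading side:", exc)
```

The failure in the last step is the reason the RPM metadata would need to say which dictionary version a package was built with.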

Likely also faster openqa installations and testing.

Someone building an RPM locally for local use (or within their
organization) shouldn't get hit with level 19 compression time and
memory requirements. They're probably alright with just the default,
level 3. That's way faster than xz, and compresses better than any of
the zips.

Is there a way to configure different defaults, either on the command
line or with a configuration file? If you don't want to expose all of
the zstd options, even coming up with your own mapping/grouping is
useful: faster=3, better=20 And at some future date, both of them can
use the latest version dictionary automatically.
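
A mapping like that could be as small as the sketch below. Everything in it is hypothetical: the preset names come from the suggestion above, the function is made up, and "better" is set to 19 (the level in the proposal) because level 20 needs the --ultra flag and nothing in this thread establishes that rpm exposes it.

```python
# Hypothetical preset names mapped to zstd levels.
PRESETS = {
    "faster": 3,   # zstd's default level
    "better": 19,  # the level recommended in the proposal
}


def payload_macro(preset: str) -> str:
    """Return an rpm macro line selecting zstd at the preset's level."""
    try:
        level = PRESETS[preset]
    except KeyError:
        raise ValueError(f"unknown preset: {preset!r}") from None
    return f"%define _binary_payload w{level}.zstdio"


if __name__ == "__main__":
    for name in PRESETS:
        print(name, "->", payload_macro(name))
```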

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By James Cassell at 05/29/2019 - 18:32

On Wed, May 29, 2019, at 4:20 PM, Ben Cotton wrote:
Would this help with drpms similar to how it helps with faster yum repo metadata downloads? My biggest problem with drpms is the slow rebuild speed which is usually slower than my download bandwidth. It would be a big win if zstd helps here.

V/r,
James Cassell

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Jonathan Dieter at 05/30/2019 - 02:30

On Wed, 2019-05-29 at 18:32 -0400, James Cassell wrote:
Unfortunately not. The drpm rebuild process involves recompressing the
rpm, so we'd be affected by the compression speed, not the
decompression speed. With zstd compression level > 15, the drpm
rebuild speed would actually slow down (possibly quite significantly).

Jonathan

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Josh Boyer at 05/29/2019 - 17:52

On Wed, May 29, 2019 at 4:20 PM Ben Cotton < ... at redhat dot com> wrote:
The gcc team often does mass rebuilds on the side prior to updating
gcc in Fedora. Would it be possible to do the same or leverage their
rebuild work with the default changed in RPM to see what the true
overall savings is? That would get us a lot more data to see if it's
truly going to benefit the distro in terms of size and installation
speed.

If we did this, wouldn't it make it very difficult to use tools like
mock on RHEL / CentOS 7 to build for Fedora 3x? Or does RHEL 7 RPM
support zstd?

Does MBS's concept of platform modules help us build a module across
the RPM zstd-support boundary? I think it does, but I honestly can't
remember for sure and I'm not aware of the details that go into MBS
performing the build.

This seems wrong. If we get through a mass rebuild (or partial mass
rebuild) and find some ugly unknown issue with zstd compression, we're
going to have to do another mass rebuild to revert everything back,
correct? That should be listed as the Contingency, even if it's
unlikely.

Are we not advocating for a fully successful mass rebuild? Would we
ship the distribution with only a portion (significant or otherwise)
switched to zstd?

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By =?ISO-8859-2?Q?... at 05/30/2019 - 04:38

On 29. 05. 19 at 23:52, Josh Boyer wrote:
Speaking of Mock:
Either the RPM on the host needs to understand the new format/compression **or** the packages in the @buildsys group (including
transitive deps) have to be in the old format - then you can build for Fedora 3x using the bootstrap feature.

Both of them would be painful. But I guess the former is more feasible.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By King InuYasha at 05/29/2019 - 18:05

On Wed, May 29, 2019 at 5:53 PM Josh Boyer < ... at fedoraproject dot org> wrote:
This is news to me, as I've never heard of any "side mass rebuilds".
They're prohibitively expensive to do, which is why we do only one per
release anyway.

I'm pretty sure this would break DeltaRPMs, since none of the drpm
software has been updated to handle zstd compression. Neither drpm nor
deltarpm handle it today.

We're pretty much screwed here. Also, since RHEL 8's rpm package does
not have zstd support compiled in, it too cannot handle the RPMs.

Cf. https://git.centos.org/rpms/rpm/blob/c8/f/SPECS/rpm.spec#_17-18

Why would this help? MBS does nothing useful in this regard. It just
calls Koji to make builds. When built for a specific platform, it'll
use the definitions of that platform. And since the platform maps to
the distro release, it's effectively the same as normal packages.

Yeah, if this turns out bad, we'd need a second mass build to
eliminate packages with zstd compression.

I'd hope not...

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Jonathan Dieter at 05/30/2019 - 02:45

On Wed, 2019-05-29 at 18:05 -0400, Neal Gompa wrote:
I just wanted to point out my post[1] in November where I suggested
using zchunk as the compression format for rpm. IIRC, the main concern
with that proposal was compatibility with RHEL.

The one main advantage using zchunk would have over zstd would be the
ability to completely eliminate drpms, but, as mentioned in that
thread, it would require some changes to the RPM format.

Jonathan

1:
https://lists.fedorahosted.org/archives/list/ ... at lists dot fedoraproject.org/thread/YHKXMJHZW3O6EWA2WYMFWOC22KTVTPLB/

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Chris Murphy at 05/29/2019 - 22:15

On Wed, May 29, 2019 at 4:07 PM Neal Gompa < ... at gmail dot com> wrote:
Is it sane to test this in Rawhide now with just new builds? As things
get rebuilt in rawhide, we should start seeing reduced times in
various installation tests, including the ones openqa does, I think
dozens of times every day.

'dnf info deltarpm' says
URL : http://gitorious.org/deltarpm/deltarpm
which has an expired certificate, but pushing past that it says
current version 3.6 is 5 years old. Is this really maintained or
updatable?

I see compression options in makedeltarpm, and zstd isn't in it. I'm
guessing we'd end up at line 580:

fprintf(stderr, "unknown compression type: %s\n", comp);

https://github.com/rpm-software-management/deltarpm/blob/3.6.1/makedeltarpm.c

Anyway, I think optimizing for this is something rpm-ostree is better
suited for anyway. Due to the significantly faster decompress time of
zstd, whatever advantage there is of deltarpm is rapidly diminished.
Possibly the only way to know this is for someone to update deltarpm
to handle zstd and then test if the savings is still significant
compared to local reprocessing time.

Hence it needs to be configurable. Fedora EPEL RPMs need to be built
with xz. Everything else that's expected to be consumed by Fedora 29
and higher, can use either zstd or xz. I'd expect RHEL built packages
intended for Fedora would use xz, and Fedora's RPM would support that
just fine.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Jonathan Dieter at 05/30/2019 - 02:24

On Wed, 2019-05-29 at 20:15 -0600, Chris Murphy wrote:
Upstream has changed to
https://github.com/rpm-software-management/deltarpm. The code is still
maintained, but there's not much active development. I can't speak for
the upstream maintainer, but I would guess that a PR that adds zstd
support would probably be welcomed.

Jonathan

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By John Reiser at 05/30/2019 - 01:12

Fedora should provide a means to convert from .rpm-with-compression-A
to .rpm-with-compression-B. Already there is 'alien' which converts
between .rpm, .deb, and .tgz.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Josh Boyer at 05/29/2019 - 21:25

On Wed, May 29, 2019 at 6:07 PM Neal Gompa < ... at gmail dot com> wrote:
They do them quite frequently outside of the Fedora infrastructure.

Hm. That is highly unfortunate. I would hope an RFE would be filed
to at least add support in for RHEL 8 if this is approved. Otherwise
we're literally just shooting our distribution ecosystem in the foot
for the benefit of only Fedora.

Right. So it helps take a single module and build the same sources
for all distributions. You get binary artifacts that are a result of
that distribution's toolchain. That was the intent of my question.

I agree it would not help the binary RPMs from the F31/F32 modules run
on a distribution that doesn't support zstd. However, because the
module could be built for such distributions we can still offer it
there. From an end user perspective, they get the application they
wanted. The number of people that want to cross-install RPMs/modules
is going to be proportionally small.

josh

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Jason L Tibbitts III at 05/29/2019 - 17:47

BC> * The change requires setting a new compression algorithm in rpm
BC> macros. Then a mass rebuild of all packages is required.

Technically there is no harm if a mass rebuild is not done; there will
simply be no benefit for packages which aren't rebuilt. Certainly the
change should be made in advance of the mass rebuild, assuming we're
going to do one.

BC> * The recommended compression level is 19. The builds will take
BC> longer, but the additional compression time is negligible in the
BC> total build time and it pays off in better compression ratio than xz
BC> lvl2 has.

That seems different than other results I've seen. According to the
Wikipedia page (https://en.wikipedia.org/wiki/Zstandard) and the
references therein, Ubuntu found that zstd level 19 was faster but with
poorer compression when compared with xz level 2 (which is the same
level that we use now).

I'm not super familiar with zstd but the data presented also implies
that multithreaded compression is available and at no loss to package
size (unlike parallel xz). But looking at the RPM source, I don't see
that the thread count can be specified for zstdio as it can be with xzio
(as in setting %_binary_payload to "w2T16.xzdio"). If I'm understanding
this correctly, it would be really nice if threaded zstd compression
(and decompression) were possible and supported.
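
For readers unfamiliar with these payload strings, here is a rough sketch of how the two examples in this thread ("w19.zstdio" and "w2T16.xzdio") break down. The pattern is inferred only from those two strings, not taken from the RPM source, and the class and function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class PayloadSpec(NamedTuple):
    mode: str                # "w" in both examples from the thread
    level: Optional[int]     # compression level, e.g. 19
    threads: Optional[int]   # value after "T", e.g. 16; None if absent
    io_type: str             # e.g. "zstdio" or "xzdio"


# Assumed shape: mode letter, optional level, optional T<threads>, dot, type.
_PATTERN = re.compile(
    r"^(?P<mode>[rw])(?P<level>\d+)?(?:T(?P<threads>\d+))?\.(?P<io>\w+)$"
)


def parse_payload(spec: str) -> PayloadSpec:
    """Split a payload string into its parts, or raise ValueError."""
    match = _PATTERN.match(spec)
    if match is None:
        raise ValueError(f"unrecognized payload string: {spec!r}")
    level = match.group("level")
    threads = match.group("threads")
    return PayloadSpec(
        mode=match.group("mode"),
        level=int(level) if level is not None else None,
        threads=int(threads) if threads is not None else None,
        io_type=match.group("io"),
    )


if __name__ == "__main__":
    print(parse_payload("w19.zstdio"))
    print(parse_payload("w2T16.xzdio"))
```

Whether a "T" field would be honored for zstdio is exactly the open question raised above; the sketch only shows where such a field sits in the string.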

Finally, note that as far as I can tell, this will render RHEL7 and
older unable to decompress Fedora RPMs. RPM 4.14 seems to be required.

- J<

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Stuart Gathman at 05/29/2019 - 16:28

...

Why? Newly built RPMs will use the new compression, and will rapidly
replace the old compression - with 100% replacement by Fedora 32.