Fedora 31 System-Wide Change proposal: Switch RPMs to zstd compression

https://fedoraproject.org/wiki/Changes/Switch_RPMs_to_zstd_compression

= Switch RPMs to zstd compression =

== Summary ==
Binary RPMs are currently compressed with xz level 2.
Switching to zstd would increase decompression speed significantly.

== Owner ==
* Name: [[User:dmach| Daniel Mach]]
* Email: ... at redhat dot com

== Detailed Description ==
* The change requires setting a new compression algorithm in rpm
macros. Then a mass rebuild of all packages is required.
* The macro for setting the compression is: %define _binary_payload w19.zstdio
* The recommended compression level is 19. Builds will take
longer, but the additional compression time is negligible relative to
the total build time, and it pays off in a better compression ratio
than xz level 2 achieves.
* SRPM payload compression should stay at gzip (there is almost no
benefit in changing it, because the SRPM contents are already
compressed)
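
As a rough illustration of how a payload setting such as `w19.zstdio` is put together, the following Python sketch splits it into a compressor name and a level. The function name `parse_payload` and the lookup table are assumptions made for this example and are not part of rpm; only the simple `w<level>.<io>` form is handled.

```python
import re

# Assumed mapping from rpm I/O backend names to compressor names (illustrative only).
IO_TO_COMPRESSOR = {
    "gzdio": "gzip",
    "bzdio": "bzip2",
    "xzdio": "xz",
    "zstdio": "zstd",
}

def parse_payload(value):
    """Split a simple payload string like 'w19.zstdio' into (compressor, level)."""
    match = re.fullmatch(r"w(\d+)\.([a-z0-9]+)", value)
    if match is None:
        raise ValueError(f"unsupported payload string: {value!r}")
    level, io_name = int(match.group(1)), match.group(2)
    if io_name not in IO_TO_COMPRESSOR:
        raise ValueError(f"unknown I/O backend: {io_name!r}")
    return IO_TO_COMPRESSOR[io_name], level

print(parse_payload("w19.zstdio"))  # the setting proposed in the list above
print(parse_payload("w2.xzdio"))    # assumed spelling of the xz level 2 setting
```

The pair returned for the proposed setting corresponds to the "zstd 19" output that the How To Test section expects from the rpm query.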

=== Use case: Firefox installation ===
I rebuilt firefox-66.0.5-1.fc30 with zstd level 19.
Then I compared installation times with the original (xz compressed) package:

{| class="wikitable"
|-
! Compression !! Target File System !! Time
|-
| xz level 2 || tmpfs || 8s
|-
| xz level 2 || ext4 on nvme || 11s
|-
| zstd level 19 || tmpfs || 2s
|-
| zstd level 19 || ext4 on nvme || 4s
|-
|}

=== Comparison of compression algorithms and levels ===
The following table shows '''cpio''' and '''compressed cpio''' extraction
times into a tmpfs. Actual RPM decompression times will differ because
of extraction to a real disk and some overhead in the RPM tool
(checks, scriptlets).

{| class="wikitable"
|-
! Compression !! Level !! Size B !! Size GiB !! Compression time !! Compression time, 4 threads !! Decompression time !! Comment
|-
| CPIO || - || 5016785692 || 4.7 || - || - || - ||
|-
| xz || 2 || 1615017616 || 1.6 || 9m55s || - || 1m36s || slow decompression
|-
| pxz || 2 || 1631869880 || 1.6 || - || 6m11s || 1m38s || slow decompression
|-
| gzip || 9 || 2086354992 || 2.0 || 10m23s || - || 31s || insufficient compression ratio
|-
| bzip2 || 9 || 1889161565 || 1.8 || 8m || - || 2m50s || very slow decompression; compression ratio could be better
|-
| zstd || 3 || 1913536587 || 1.8 || 31s || 29s || 6.5s ||
|-
| zstd || 10 || 1737928978 || 1.7 || 3m27s || 2m34s || 6.3s ||
|-
| zstd || 15 || 1717303256 || 1.7 || 9m37s || 6m34s || 6.3s || identical compression speed to xz; fast decompression; slightly worse compression ratio than xz
|-
| zstd || 17 || 1635525492 || 1.6 || 16m16s || 11m20s || 6.7s ||
|-
| zstd || 19 || 1575843696 || 1.5 || 24m2s || 18m55s || 7.7s ||
|-
|}
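
To make the trade-offs in the table easier to compare, the figures can be turned into compression ratios and throughput. The following Python sketch does that arithmetic for a few rows; the numbers are copied from the table, while the helper name `summarize` and the choice of rows are only for illustration.

```python
# Rows copied from the table above:
# (name, level, compressed bytes, compression seconds, decompression seconds)
CPIO_BYTES = 5016785692
ROWS = [
    ("xz", 2, 1615017616, 9 * 60 + 55, 96.0),
    ("zstd", 3, 1913536587, 31, 6.5),
    ("zstd", 10, 1737928978, 3 * 60 + 27, 6.3),
    ("zstd", 15, 1717303256, 9 * 60 + 37, 6.3),
    ("zstd", 19, 1575843696, 24 * 60 + 2, 7.7),
]

def summarize(row):
    """Derive ratio and input throughput (MB/s of uncompressed cpio) for one row."""
    name, level, size, comp_s, decomp_s = row
    return {
        "label": f"{name} -{level}",
        "ratio": CPIO_BYTES / size,
        "comp_mb_s": CPIO_BYTES / comp_s / 1e6,
        "decomp_mb_s": CPIO_BYTES / decomp_s / 1e6,
    }

SUMMARY = [summarize(row) for row in ROWS]
for s in SUMMARY:
    print(f"{s['label']:>8}  ratio {s['ratio']:.2f}  "
          f"compress {s['comp_mb_s']:.1f} MB/s  decompress {s['decomp_mb_s']:.0f} MB/s")
```

These are single-threaded tmpfs figures from one data set, so they indicate relative behaviour rather than what any particular package will see.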

== Benefit to Fedora ==
* Faster installations/upgrades of user systems
* Faster koji builds (installations in build roots)
* Faster container builds
* Lower bandwidth on mirrors if we choose the highest compression level

== Scope ==
* Proposal owners: submit a patch to redhat-rpm-config
* Other developers: redhat-rpm-config maintainer: include the patch
and make a new build
* Release engineering: [https://pagure.io/releng/issue/8345 #8345]
mass rebuild is needed

== Upgrade/compatibility impact ==
* RPM in Fedora supports zstd compression already (from Fedora 28,
rpm-4.14.0-0.rc2.5.fc28). No impact on Fedora users is expected.
* Fedora <= 27 and some other distros will not be able to decompress
zstd-compressed RPMs.

== How To Test ==
* dnf install <package>
* rpm -q --qf "%{PAYLOADCOMPRESSOR} %{PAYLOADFLAGS}\n" <package>
* expected output: zstd 19

Also the overall system installation time should decrease significantly.

== User Experience ==
See '''Benefit to Fedora'''

== Dependencies ==
N/A

== Contingency Plan ==
* Contingency mechanism: Not needed, Fedora will stay at current compression.
* Contingency deadline: N/A
* Blocks release? No
* Blocks product? N/A

== Documentation ==
N/A

== Release Notes ==
RPMs have switched to zstd compression level 19.
Users will benefit from faster package decompression.
Users who build their own packages will experience slightly longer build times.

Comments

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Panu Matilainen at 06/03/2019 - 07:07

This is what we always thought with rpmbuild, "no point optimizing
because it'll just get drowned in the noise". However this has gotten to
be a hot topic in the last year or so, with people from different
backgrounds wanting to parallelize various aspects of rpmbuild to speed
it up.

Against that background, going from 9m55s compression time to 24m2s is a
HORRIBLE regression that will eat away all the gains we just managed to
scrape together by parallelizing new things.

Note that rpm doesn't support parallel zstd compression, and while it
does for xz, that's not even utilized in Fedora.

To me the sweet spot between compression efficiency and speed seems
closer to 10 than 19 - yes at a minor loss in space but huge speedup in
both compress and decompress times.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Jason L Tibbitts III at 06/03/2019 - 21:00

PM> Note that rpm doesn't support parallel zstd compression, and while
PM> it does for xz, that's not even utilized in Fedora.

Doing parallel xz compression has a surprising cost in compression ratio
which gets worse as the thread count increases (because it just splits
the input into independent blocks and compresses them separately). I
did start on a feature to have it enabled but then abandoned that after
realizing that it didn't really work as I'd hoped.
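
The effect of splitting the input into independent blocks can be reproduced in a small experiment. The following Python sketch uses the standard `lzma` module and a synthetic payload whose redundancy spans block boundaries; it is a stand-in under those assumptions, not a model of how xz's threaded mode actually chooses its block sizes, and the helper names are made up for this example.

```python
import lzma
import random

def make_payload(block_size=64 * 1024, repeats=8, seed=0):
    """Build data whose redundancy spans blocks: one random block repeated several times."""
    rng = random.Random(seed)
    block = bytes(rng.getrandbits(8) for _ in range(block_size))
    return block * repeats

def compress_whole(data, preset=2):
    """Compressed size when the data goes through a single xz stream."""
    return len(lzma.compress(data, preset=preset))

def compress_in_blocks(data, n_blocks, preset=2):
    """Compressed size when n_blocks equal slices are compressed independently."""
    step = -(-len(data) // n_blocks)  # ceiling division
    return sum(len(lzma.compress(data[i:i + step], preset=preset))
               for i in range(0, len(data), step))

payload = make_payload()
whole = compress_whole(payload)
split = {n: compress_in_blocks(payload, n) for n in (2, 4, 8)}

print("one stream:", whole)
for n, size in split.items():
    print(f"{n} independent blocks:", size)
```

Each independent block starts with an empty history, so repeats that a single stream would have matched against earlier data have to be stored again. This synthetic payload exaggerates the loss on purpose; real package payloads would lose far less per block.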

That said, I do wonder how difficult it would be to do parallel zstd
compression/decompression within RPM. If it were possible then that
might help to obviate some of the downsides.

PM> To me the sweet spot between compression efficiency and speed seems
PM> closer to 10 than 19 - yes at a minor loss in space but huge speedup
PM> in both compress and decompress times.

One problem is that I don't think anyone wants to see any quantifiable
regression in overall package size. Spins still struggle to fit within
fixed media sizes as the package set grows ever larger.

- J<

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Chris Murphy at 06/04/2019 - 17:53

On Mon, Jun 3, 2019 at 7:01 PM Jason L Tibbitts III < ... at math dot uh.edu> wrote:
Which is also why parallel xz compression doesn't produce reproducible results.

At least for small files, and there are many in any distribution,
using a dictionary could well improve compression/decompression
time and compression ratio more than threads do. Adding dictionary
support would help all single-threaded hardware, and even the builders
when zstd's -T0 option finds only 1 or 2 threads available. On the
generic sample set, it's functionally like getting 4 threads' worth of
speed, and the compression ratio also goes up by ~3x. But I have no idea
how that sample set compares to Fedora's files.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Panu Matilainen at 06/05/2019 - 04:10

On 6/5/19 12:53 AM, Chris Murphy wrote:
Yes, but as I mentioned in another email, rpm doesn't compress the files
individually, it compresses them as one big continuous archive. The
dictionary is unlikely to help that (in my quick test yesterday it
actually made it worse)
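
The difference between per-file compression and one continuous archive can be illustrated without zstd itself. The following Python sketch uses zlib's preset-dictionary support (`zdict`) as a stand-in for a trained zstd dictionary, which is a deliberate substitution so that the example runs on the standard library alone. The sample "files" are invented, and the byte counts say nothing about what zstd would achieve on real packages.

```python
import zlib

def deflate(data, zdict=None):
    """Return the compressed size of data, optionally primed with a preset dictionary."""
    if zdict is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=zdict)
    return len(comp.compress(data) + comp.flush())

# Hypothetical small files that share boilerplate, like many unit or config files.
files = [
    (f"[Unit]\nDescription=Example service {i}\nAfter=network.target\n\n"
     f"[Service]\nExecStart=/usr/bin/example-{i}\nRestart=on-failure\n").encode()
    for i in range(200)
]
# The dictionary holds the shared boilerplate (stand-in for a trained dictionary).
dictionary = files[0]

per_file_plain = sum(deflate(f) for f in files)
per_file_dict = sum(deflate(f, dictionary) for f in files)

archive = b"".join(files)  # one continuous stream, as in an rpm payload
stream_plain = deflate(archive)
stream_dict = deflate(archive, dictionary)

print("per file, no dictionary:    ", per_file_plain)
print("per file, with dictionary:  ", per_file_dict)
print("one stream, no dictionary:  ", stream_plain)
print("one stream, with dictionary:", stream_dict)
```

When every file is compressed on its own, the dictionary supplies the shared context that each tiny input lacks. In a single stream the earlier files already play that role, so a dictionary can only help with the first few kilobytes.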

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Chris Murphy at 06/05/2019 - 12:01

On Wed, Jun 5, 2019 at 2:11 AM Panu Matilainen < ... at redhat dot com> wrote:
Sorry about that, I missed it. The --long/windowLog option sounds interesting.

I found this on HN today. While xz is not expressly being used within
Fedora/Red Hat packaging in an archive context, it does seem to have
quite a lot of other potential problems. But I have no idea what
lurking liabilities zstd will have.

http://lzip.nongnu.org/xz_inadequate.html

Tangentially, I think there is room for improvement with LiveOS
delivery, which right now is doing something pathological I haven't
been able to figure out compared to other distro LiveOS's: 100% CPU
usage reported by one of the /dev/loopN processes during startup and
installation. And as it's a single thread, it's a bottleneck.
Every time I do an installation on any computer, fans go to the max. I
don't know that this is xz related, but it might be perturbing things
because decompression is so processor intensive. I'll start a separate
thread.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Zbigniew Jędrzejewski-Szmek at 06/05/2019 - 15:36

On Wed, Jun 05, 2019 at 10:01:22AM -0600, Chris Murphy wrote:
That page should be taken with a grain of salt. IIRC, it was written
by someone who wanted to push their own alternative version, and most of
the critique has been debunked.

Zbyszek

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Chris Murphy at 06/05/2019 - 16:00

On Wed, Jun 5, 2019 at 1:38 PM Zbigniew Jędrzejewski-Szmek
< ... at in dot waw.pl> wrote:
It's a fair point. The HN comments from a year ago and also this week
get into some of that.

https://news.ycombinator.com/item?id=16884832
https://news.ycombinator.com/item?id=20103255

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Kevin Kofler at 06/04/2019 - 17:38

Jason L Tibbitts III wrote:
The RPM compression method is pretty irrelevant to Spin sizes because Spins
are typically live media and so use the live media's compression, not the
RPM compression. All the RPMs are already unpacked and recompressed using
the live media compression technology (currently xz).

Kevin Kofler

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Panu Matilainen at 06/04/2019 - 01:31

On 6/4/19 4:00 AM, Jason L Tibbitts III wrote:
Yup, I know. More than one person has been down that route :)

No idea, except that last I looked, zstd seemed to be the only kid in
town who can do parallel decompression at all. The current zstd support
in rpm is basically just an initial code drop that implements the
barebones compress/decompress functionality. Besides parallel
operations, it'd probably be worth trying to teach it to use a
dictionary for example (the charts at https://github.com/facebook/zstd
are pretty impressive on that)

Sure, everything's a compromise. Personally I find the fixed media
battle one that was long lost already and low in the overall priorities
but that's just me.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Panu Matilainen at 06/04/2019 - 02:22

On 6/4/19 8:31 AM, Panu Matilainen wrote:
...except that, of course, rpm compresses the payload as a whole rather
than at the individual file level, so prebuilt dictionaries aren't that
useful with the payload.

But there are other tuning options that seem more beneficial to the rpm
use-case, such as the zstd cli --long equivalent: in my testcase
compressing with --long at level 14 gives a slightly better compression
rate than level 19 without it, at a fraction of the time (40s vs
2min31s). It uses more memory (there never was a free lunch was there?)
but peanuts to what large compiles can use. At any rate, that is
something to look into.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Kevin Fenzi at 05/30/2019 - 15:06

So, most of my concerns have already been mentioned by other folks in
this thread:

* No rhel7/8 support will annoy people, and also increase burden on
fedora infrastructure since we would have to move our koji hubs to
Fedora instead of RHEL to be able to read the rpms made on builders.
(Or ship a custom rpm, but we have done that before and it's been always
a nightmare).

* This cannot land until we finish sorting out armv7 builder issues.
(see bug https://bugzilla.redhat.com/show_bug.cgi?id=1576593 ).
I am trying to see if we can get away with an f29 userspace and a
specific kernel we think works. Until this is resolved, however, all the
armv7 buildvms are on Fedora 27, so they wouldn't be able to handle
this change.

* The drpm issue is somewhat minor in my mind since we don't produce
very useful drpms right now (due to pungi not having anything more than
the last updates compose to build them against).

So, this definitely needs extra coordination if we decide to go for it.

kevin

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By King InuYasha at 06/14/2019 - 06:50

On Thu, May 30, 2019 at 3:07 PM Kevin Fenzi < ... at scrye dot com> wrote:
There's a bug open for fixing this in RHEL 8:
https://bugzilla.redhat.com/show_bug.cgi?id=1715799

I started looking at making a patch for rpm 4.11.x for RHEL 7, but
it's not trivial...

More broadly, I've just submitted an SR so that openSUSE Tumbleweed
will have support for zstd payloads:
https://build.opensuse.org/request/show/709948

I'll have to see if I can get SLE 15 SP2 (and thus openSUSE Leap 15.2)
to have it turned on, but I think that's unlikely...

I'm not sure if the Open Build Service needs the underlying hosting
rpm package to support zstd or not to handle zstd rpms properly...
Michael, do you know if that's the case?

Were you able to get the f29 userland on f27 kernel to work?

The deltarpm package now supports zstd payloads, as Michael Schroeder
added support yesterday morning and released 3.6.2, which is now in
Rawhide: https://koji.fedoraproject.org/koji/buildinfo?buildID=1287231

At this point, the drpm library is the only blocker for zstd payloads,
since createrepo_c needs to be able to handle zstd drpms.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Aleš Matěj at 06/19/2019 - 06:51

I looked into the drpm library and I should be able to add the zstd support
(and make sure it works with createrepo_c)

Working on it now.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Panu Matilainen at 06/19/2019 - 08:41

On 6/19/19 1:51 PM, Aleš Matěj wrote:
FWIW, as drpm links to librpm anyway, it should be possible for drpm to
just use the file API from rpm to gain support for everything that rpm
does instead of duplicating the effort for all the compression types.

If there's something broken or missing that prevents this from working,
we could always address that...

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Nico Kadel-Garcia at 06/25/2019 - 07:16

On Wed, Jun 19, 2019 at 9:31 AM Panu Matilainen < ... at redhat dot com> wrote:
This whole zstd replacement seems like a hazardous idea, because of
backporting SRPMs to older operating systems for EPEL compilation.
It's possible, but awkward, to chew through the git repos to deduce
which git branch was used and reference that, rather than directly
extract from the SRPM. It would mean that unless this compression is
only applied to limited uses such as drpm, then older OS releases
would not be able to read the modern SRPM by default. Backwards
compatibility is not why people write new software, but broad
accessibility of the source code seems a vital feature to preserve for
what should be rock stable build environments downstream.

I, for one, have done considerable backporting of Fedora SRPMs of
python modules to RHEL and CentOS environments. I'd hate to add
another step to extract them on RHEL or CentOS or to build from them
in "mock". I'd also hate for "rpm2cpio" to break: I hope adding zstd
compatibility to the older versions of that tool, as well, is not
difficult.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Tomas Mraz at 06/25/2019 - 09:20

On Tue, 2019-06-25 at 07:16 -0400, Nico Kadel-Garcia wrote:
This change is strictly about the binary rpm payload compression
method; it is not about SRPMs at all.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Martin Kolman at 06/19/2019 - 07:12

On Wed, 2019-06-19 at 10:51 +0000, Aleš Matěj wrote:

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By King InuYasha at 05/30/2019 - 16:18

On Thu, May 30, 2019 at 3:07 PM Kevin Fenzi < ... at scrye dot com> wrote:
I'm actually okay with the thought of Koji hub moving to Fedora. I'd
rather see most of our infra running on Fedora so that we don't get
kneecapped by RHEL moving too slowly. Our transition to Python 3 was
made way more complicated by the fact our infrastructure ran on RHEL 6
or RHEL 7, where Python 3 wasn't available in a useful manner for a
very long time. Having our own infra run on our distribution that we
have a say in makes a huge difference in being able to move things
forward.

Not that I hate RHEL or anything, but we don't have a say in anything
when it comes to RHEL, and they don't really care about bugs we report
that afflict us that much. Not exactly the most solid foundation to
run a distribution's infrastructure on, wouldn't you say?

That said, I'm less happy about the thought that inspecting Fedora
RPMs on RHEL 8 or openSUSE is going to be a royal pain.
Ecosystem-wise, no one really prepared for a distribution to switch to
zstd so quickly. Thankfully, it's easier to support than things like
modularity, which break the entire way people do things. If we decide
to do this, at least I'll try to get things fixed on the SUSE
side. Maybe someone can push for this to be fixed on the RHEL side as
well?

Ugh, I didn't realize this is still a problem. It _should_ work with
an F30 userspace on the F27 kernel, but that's gross... :(

This feels more like a failing on pungi. We don't have archives or
indexes of what old composes looked like to maintain drpm content?

I agree.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Kevin Fenzi at 05/31/2019 - 16:49

Sure, but it means more work for us due to the updates churn and
upgrades/re-installs. If we do have to do this, I'm going to likely
investigate just moving the hubs into openshift.

Well, I don't find that to really be the case... in the past when we
have needed things urgently many RHEL folks have gone out of their way
to come up with a solution for us.

...snip...

I'm not sure it does, but I can test that combo, given time.

It's out of scope for pungi to keep track of a bunch of composes. It
is really only concerned with the compose it's doing. Likely we need a
higher level script/tool that keeps drpms available from all the
composes where that makes sense. Or perhaps we need pungi to not do drpms
at all, but have something else do them out of band and update when it finishes.

kevin

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Tomas Mraz at 05/31/2019 - 06:09

On Thu, 2019-05-30 at 16:18 -0400, Neal Gompa wrote:
I created this BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1715799

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Adam Jackson at 05/30/2019 - 12:20

The numbers here seem to indicate that you'll have faster koji build
_setup_. But getting compression ratios comparable to xz means spending
(apparently) significantly more time at successful build completion.
That's likely a win overall, especially when we consider the local mock
case of "why is this build failing", where you're likely to iterate
several times until you succeed. Still, it would be nice to see some
more detailed numbers to back that up. For example:

- For the minimal buildroot, what's the difference in download size and
decompression time?
- What's the mean and/or median size of an rpm in Fedora, and what
difference in {de,}compression time would that likely experience?
- Which package's mock buildroot has the largest size (compressed or
not, though it's probably the same either way), and what time
difference would that package experience with this change?

- ajax

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Adam Jackson at 05/30/2019 - 13:41

Just to follow up on this since it was quick to math out. For Fedora
30's x86_64 repo, various "averages" and some nearby binary rpms to
each:

Arithmetic mean: 1347495
-rw-r--r--. 1 ajax ajax 13532128 Feb 9 16:52 texlive-pgfplots-doc-svn47373-25.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 13522512 Feb 17 19:28 Singular-doc-4.1.1p3-4.fc30.x86_64.rpm
-rw-r--r--. 1 ajax ajax 13452180 Feb 7 11:45 asterisk-sounds-core-es-g722-1.6.1-5.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 13411540 Mar 14 10:27 eclipse-dtp-1.14.102-4.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 13358352 Mar 13 05:50 gcc-go-9.0.1-0.10.fc30.x86_64.rpm

Geometric mean: 104613
-rw-r--r--. 1 ajax ajax 104624 Feb 9 16:55 texlive-datetime2-polish-doc-svn36692.1.0-25.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 104624 Feb 3 21:55 usbutils-010-3.fc30.x86_64.rpm
-rw-r--r--. 1 ajax ajax 104612 Aug 17 2018 samtools-libs-0.1.19-16.fc29.x86_64.rpm
-rw-r--r--. 1 ajax ajax 104600 Feb 5 11:43 kf5-khtml-devel-5.55.0-1.fc30.x86_64.rpm
-rw-r--r--. 1 ajax ajax 104588 Feb 2 00:49 objenesis-2.6-4.fc30.noarch.rpm

Median: 71064
-rw-r--r--. 1 ajax ajax 71068 Feb 7 01:26 dagger-1.2.2-10.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 71068 Feb 24 17:02 gnome-shell-extension-system-monitor-applet-36-4.20190224git2583911.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 71064 Feb 15 10:55 cbi-plugins-javadoc-1.1.5-5.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 71064 Mar 11 08:03 opensips-acc-2.4.5-1.fc30.x86_64.rpm
-rw-r--r--. 1 ajax ajax 71060 Feb 2 05:40 libgrss-devel-0.7.0-8.fc30.x86_64.rpm
-rw-r--r--. 1 ajax ajax 71040 Feb 2 23:15 mbuffer-20181119-2.fc30.x86_64.rpm
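
The three "averages" above differ so much because package sizes are heavily skewed: a handful of huge packages pull the arithmetic mean far above the typical package. The following Python sketch shows the same ordering on a small set of hypothetical sizes; the values in `sizes` are invented for illustration and are not taken from the Fedora repository.

```python
import statistics

# Hypothetical package sizes in bytes: mostly small files plus a few very large ones.
sizes = [12_000, 35_000, 71_000, 90_000, 150_000, 400_000, 2_000_000, 900_000_000]

arithmetic = statistics.mean(sizes)            # dominated by the largest package
geometric = statistics.geometric_mean(sizes)   # averages the orders of magnitude
median = statistics.median(sizes)              # the "typical" package

print(f"arithmetic mean: {arithmetic:,.0f}")
print(f"geometric mean:  {geometric:,.0f}")
print(f"median:          {median:,.0f}")
```

For a skewed distribution like this one, the median and geometric mean say more about what most builds experience than the arithmetic mean does.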

So I kind of take it back. Even single-threaded and at zstd level 19
you'll get about 1MB/s of output (according to your sample table in the
change proposal), and something like 90% of packages are below 1MB
compressed, so I'm hard pressed to care about <1s of difference in
compression time for the vast majority of packages.
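
The "about 1MB/s of output" figure can be checked against the proposal's table. The following Python sketch derives the output rate for zstd level 19 and applies it to the three averages listed earlier. It assumes compression time scales linearly with compressed size, which is a simplification, and `estimated_seconds` is a name made up for this example.

```python
ZSTD19_OUTPUT_BYTES = 1575843696   # compressed size at level 19, from the proposal's table
ZSTD19_SECONDS = 24 * 60 + 2       # 24m2s, single-threaded

# Bytes of compressed output produced per second of compression time.
output_rate = ZSTD19_OUTPUT_BYTES / ZSTD19_SECONDS

def estimated_seconds(compressed_size):
    """Estimate level-19 compression time for a package of the given compressed size."""
    return compressed_size / output_rate

print(f"output rate: {output_rate / 1e6:.2f} MB/s")
for label, size in [("median", 71064),
                    ("geometric mean", 104613),
                    ("arithmetic mean", 1347495)]:
    print(f"{label}: about {estimated_seconds(size):.2f} s")
```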

Possibly more interesting are the 21 biggest packages (an almost
arbitrary number, the 22nd biggest is the first one that's not noarch):

-rw-r--r--. 1 ajax ajax 1690320420 Feb 1 08:13 FlightGear-data-2018.3.2-1.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 1378818072 Feb 16 12:29 speed-dreams-robots-base-2.2.2-2.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 918112496 Mar 20 11:06 xonotic-data-0.8.2-6.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 913953504 Feb 7 11:46 astrometry-data-4204-0.76-2.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 876513824 Feb 16 12:29 redeclipse-data-1.5.6-9.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 795939928 Feb 6 15:24 alienarena-data-7.71.0-2.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 763842068 Feb 4 15:33 0ad-data-0.0.23b-2.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 520122860 Aug 23 2018 supertuxkart-data-0.9.3-2.fc30.5.noarch.rpm
-rw-r--r--. 1 ajax ajax 518557008 Mar 13 15:44 kicad-packages3d-5.1.0-1.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 496263868 Feb 3 22:27 vdrift-data-20141020-16.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 464651048 Feb 7 11:46 astrometry-data-4205-0.76-2.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 447486852 Feb 3 17:35 warsow-data-2.1.2-3.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 426017596 Feb 26 22:33 wesnoth-data-1.14.6-1.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 413617108 Feb 3 17:15 vegastrike-data-0.5.1-18.r1.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 400129316 Feb 2 01:53 openarena-0.8.8-14.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 398661608 Feb 6 21:49 berusky2-data-0.9-10.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 398113064 Jan 31 08:12 btbuilder-data-0.5.16-4.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 382267140 Mar 9 14:26 pioneer-data-20190203-2.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 367174128 Feb 1 21:20 julius-japanese-models-4.4.2.1-5.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 357514216 Feb 9 16:48 texlive-kerkis-svn15878.0-25.fc30.noarch.rpm
-rw-r--r--. 1 ajax ajax 353033380 Aug 17 2018 torcs-data-1.3.7-4.fc28.noarch.rpm

Or, the biggest desktop apps, since they're likely to see frequent
rebuilds, which are basically: eclipse, libreoffice, and firefox.

- ajax

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Vitaly Zaitsev ... at 05/30/2019 - 09:59

Hello, Ben Cotton.

Good change, but it will significantly slow down the Delta RPM rebuild
process, especially on HDDs, which is why drpm should be disabled by
default to achieve maximum speed.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Igor Gnatenko at 05/30/2019 - 02:39

The last time, when I was about to propose this for F29, I did a mass
rebuild myself, and while decompression was faster in most cases, the
size was definitely worse. So "Lower bandwidth on mirrors if we choose
the highest compression level" is definitely questionable.

I think before approving such changes, owners need to do mass rebuilds on
their own and provide a graph of changes in size between original
compression format and new one(s).

Just saying it works better on Firefox doesn't sound to me like the way to
go.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Daniel Mach at 05/30/2019 - 10:54

On 30. 05. 19 at 8:39, Igor Gnatenko wrote:
BTW, which compression level did you use?
Could you share some of your observations and stats if you still have them?

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Samuel Sieb at 05/30/2019 - 14:44

On 5/30/19 7:54 AM, Daniel Mach wrote:
Is it possible to just recompress the rpms instead of doing a full build?

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Zbigniew Jędrzejewski-Szmek at 05/31/2019 - 05:33

On Thu, May 30, 2019 at 11:44:23AM -0700, Samuel Sieb wrote:
I did a somewhat unscientific test with 1GB of packages from a mock cache:

$ rm -rf zstd; mkdir zstd; for i in /var/cache/mock/fedora-rawhide-x86_64/dnf_cache/fedora-2d95c80a1fa0a67d/packages/*rpm; do rpm2cpio $i | zstd >zstd/$(basename $i .rpm).cpio.zstd -19 -;done
$ rm -rf xz; mkdir xz; for i in /var/cache/mock/fedora-rawhide-x86_64/dnf_cache/fedora-2d95c80a1fa0a67d/packages/*rpm; do rpm2cpio $i | xz >xz/$(basename $i .rpm).cpio.xz -2 -;done
$ rm -rf zstd20; mkdir zstd20; time for i in zstd/*; do zstdcat $i | zstd >zstd20/$(basename $i) --ultra -20 -;done

$ du -sh /var/cache/mock/fedora-rawhide-x86_64/dnf_cache/fedora-2d95c80a1fa0a67d/packages/ /tmp/{xz,zstd,zstd20}
1019M /var/cache/mock/fedora-rawhide-x86_64/dnf_cache/fedora-2d95c80a1fa0a67d/packages/
985M /tmp/xz
946M /tmp/zstd
930M /tmp/zstd20

$ time xzcat /tmp/xz/* >/dev/null
xzcat /tmp/xz/* > /dev/null 78.45s user 0.78s system 99% cpu 1:19.72 total
$ time zstdcat /tmp/zstd/* >/dev/null
zstdcat /tmp/zstd/* > /dev/null 9.19s user 0.44s system 98% cpu 9.751 total
$ time zstdcat /tmp/zstd20/* >/dev/null
zstdcat /tmp/zstd20/* > /dev/null 8.82s user 0.50s system 99% cpu 9.394 total
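
For a quick read of the output above, the decompression speedup and the size difference can be worked out directly. The following Python sketch uses the wall-clock totals and `du` sizes copied from the transcript; the helper names are only for this example.

```python
# Wall-clock totals (seconds) and directory sizes (MiB) copied from the output above.
XZ_SECONDS = 79.72       # "1:19.72 total"
ZSTD19_SECONDS = 9.751
ZSTD20_SECONDS = 9.394
XZ_MIB, ZSTD19_MIB, ZSTD20_MIB = 985, 946, 930

def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds

def size_saving(baseline_size, candidate_size):
    """Fraction by which the candidate is smaller than the baseline."""
    return 1 - candidate_size / baseline_size

print(f"zstd -19 decompression speedup vs xz -2: {speedup(XZ_SECONDS, ZSTD19_SECONDS):.1f}x")
print(f"zstd -20 decompression speedup vs xz -2: {speedup(XZ_SECONDS, ZSTD20_SECONDS):.1f}x")
print(f"zstd -19 size saving vs xz -2: {size_saving(XZ_MIB, ZSTD19_MIB):.1%}")
print(f"zstd -20 size saving vs xz -2: {size_saving(XZ_MIB, ZSTD20_MIB):.1%}")
```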

Notes:
- this is all single-threaded, since rpm doesn't do multithreading.
I saw some discussion that multithreading is easier with zstd because it's
reproducible with zstd, but not entirely with xz. Enabling multithreading
would be very beneficial for compression.
- $ rpm -q zstd xz
zstd-1.4.0-1.fc29.x86_64
xz-5.2.4-3.fc29.x86_64
- machine:
$ lscpu
Architecture: x86_64
CPU(s): 12
...
NUMA node(s): 2
Model name: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz

/var/ is on spinning rust, /tmp is RAM.
- I'm not providing timings of compression, because I only did one run and
this might not be reproducible. Unscientific impression was that zstd
was quite a bit slower when compressing.

Zbyszek

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Igor Gnatenko at 05/30/2019 - 11:54

From what I remember, I've tried at least 4-5 different ones.

No, they were all on my RH laptop. But in short, quite a few packages were
actually bigger and build time was much longer in many cases. I think
you'd want to check the 0ad-data package for something big.

I never tested unpacking time because package size was bigger overall so I
decided not to spend much time on it.

Just use the resources you have inside the company :) If you don't have
any, I can donate some.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Peter Robinson at 05/30/2019 - 11:12

The gcc team use a different process, they don't use koji at all when
they test new gcc releases against the Fedora package set, you should
probably reach out to them to find out their process and what
infrastructure they use.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Chris Murphy at 05/29/2019 - 21:46

On Wed, May 29, 2019 at 2:20 PM Ben Cotton < ... at redhat dot com> wrote:
Arch has been discussing this change also, with more elaborate test
results. This is the most recent table including --ultra flag to
unlock level 20+
https://lists.archlinux.org/pipermail/arch-dev-public/2019-March/029542.html

The first post
https://lists.archlinux.org/pipermail/arch-dev-public/2019-March/029520.html

Even without dictionary, -T0 will outperform xz. By how much depends
on the resources available at the time.

Phase 2 (it's out of scope for this feature): If you can figure out a
way to leverage the zstd (training) dictionary feature, that would
increase compression ratios and reduce both compression and
decompression time. The gotcha is that the same dictionary must be
specified for both compression and decompression. So you'd need a way for
the RPM metadata to reference a dictionary version, and package the
dictionary with RPM to make sure it's available. You only need one
version, but if future training demonstrates a significant
improvement, you'd want a way to deploy multiple dictionaries, and
differentiate which was used to compress an RPM since packages could
be made with either or none.
https://github.com/facebook/zstd See "the case for small data compression"

Likely also faster openqa installations and testing.

Someone building an RPM locally for local use (or within their
organization) shouldn't get hit with level 19 compression time and
memory requirements. They're probably alright with just the default,
level 3. That's way faster than xz, and compresses better than any of
the zips.

Is there a way to configure different defaults, either on the command
line or with a configuration file? If you don't want to expose all of
the zstd options, even coming up with your own mapping/grouping is
useful: faster=3, better=20 And at some future date, both of them can
use the latest version dictionary automatically.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By James Cassell at 05/29/2019 - 18:32

On Wed, May 29, 2019, at 4:20 PM, Ben Cotton wrote:
Would this help with drpms similar to how it helps with faster yum repo metadata downloads? My biggest problem with drpms is the slow rebuild speed which is usually slower than my download bandwidth. It would be a big win if zstd helps here.

V/r,
James Cassell

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Jonathan Dieter at 05/30/2019 - 02:30

On Wed, 2019-05-29 at 18:32 -0400, James Cassell wrote:
Unfortunately not. The drpm rebuild process involves recompressing the
rpm, so we'd be affected by the compression speed, not the
decompression speed. With zstd compression level > 15, the drpm
rebuild speed would actually slow down (possibly quite significantly).

Jonathan

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Josh Boyer at 05/29/2019 - 17:52

On Wed, May 29, 2019 at 4:20 PM Ben Cotton < ... at redhat dot com> wrote:
The gcc team often does mass rebuilds on the side prior to updating
gcc in Fedora. Would it be possible to do the same or leverage their
rebuild work with the default changed in RPM to see what the true
overall savings are? That would get us a lot more data to see if it's
truly going to benefit the distro in terms of size and installation
speed.

If we did this, wouldn't it make it very difficult to use tools like
mock on RHEL / CentOS 7 to build for Fedora 3x? Or does RHEL 7 RPM
support zstd?

Does MBS's concept of platform modules help us build a module across
the RPM zstd-support boundary? I think it does, but I honestly can't
remember for sure and I'm not aware of the details that go into MBS
performing the build.

This seems wrong. If we get through a mass rebuild (or partial mass
rebuild) and find some ugly unknown issue with zstd compression, we're
going to have to do another mass rebuild to revert everything back,
correct? That should be listed as the Contingency, even if it's
unlikely.

Are we not advocating for a fully successful mass rebuild? Would we
ship the distribution with only a portion (significant or otherwise)
switched to zstd?

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Daniel Mach at 05/30/2019 - 10:31

On 29. 05. 19 at 23:52, Josh Boyer wrote:
I rebuilt the packages that are available in fedora:30 docker image:
<a href="https://copr.fedorainfracloud.org/coprs/dmach/fedora-zstd/" title="https://copr.fedorainfracloud.org/coprs/dmach/fedora-zstd/">https://copr.fedorainfracloud.org/coprs/dmach/fedora-zstd/</a>

The overall size is roughly equal to xz compressed RPMs.

It's not comparable with the whole Fedora repo, but it's a good start.
I can build more packages if a larger sample is needed.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Miroslav Suchý at 05/30/2019 - 04:38

On 29. 05. 19 at 23:52, Josh Boyer wrote:
Speaking of Mock:
Either the RPM on the host needs to understand the new format/compression **or** the packages in the @buildsys group (including
transitive deps) have to be in the old format - then you can build for Fedora 3x using the bootstrap feature.

Both of them would be painful. But I guess the former is more feasible.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Pavel Raiskup at 05/30/2019 - 20:15

On Thursday, May 30, 2019 10:38:25 AM CEST Miroslav Suchý wrote:
I need to underline this, it would be really really really bad if we were
not able to --installroot fedora chroots at least on RHEL 8. How likely
is a backport of zstd support into RPM in EL7+?

Regarding @buildsys group and compat compression; if we were OK to use
`mock --bootstrap-chroot`, we could limit the package compatibility set
(not really subset) to dnf + dnf-utils + deps (see dnf_install_command in
site-defaults.cfg). But TBH I don't view this idea as feasible/maintainable
solution, "Requires:" do change all the time...

Another slightly more realistic way around would be to not --installroot
the bootstrap chroot in mock, but rely on some distributed "bootstrap"
root cache tarball (a standard, safe way) instead.

Typo: "having none of them would be painful", and very likely to happen.

Pavel

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Daniel Mach at 06/04/2019 - 02:56

On 31. 05. 19 at 2:15, Pavel Raiskup wrote:
RHEL 7 is a different story. The patch doesn't apply directly and a
backport would be needed.

Panu,
how difficult would it be to backport the zstd support to RHEL 7 RPM?

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Panu Matilainen at 06/04/2019 - 03:43

On 6/4/19 9:56 AM, Daniel Mach wrote:
Technically, backporting rpmio backends is almost trivial, even to much
older rpm versions.

Getting new features into older RHEL is a much bigger problem.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Vitaly Zaitsev ... at 05/31/2019 - 03:25

On Friday, 31 May 2019 at 02:15 +0200, Pavel Raiskup wrote:
We should not be talking about rpm backports; so much of the
fedora/epel/el flow depends on rpm enhancements, new rpm versions
should be pushed by default to old streams after a year/six months of
proofing Fedora-side.

I'm quite sure all the efforts wasted working around old rpm
limitations in EL cost a lot more (including @RH) than the people that
would be needed to correct problems in case something slipped through
Fedora QA. It's done for Firefox and the amount of changes pushed to
Firefox is crazy compared to what happens rpm side.

The non-rpm distributors are running circles around Fedora and EL, and
it's not because their binaries are better, their QA process more
solid, their core design easier to use, it's just that they make their
software deployment enhancements available promptly and not after 5
years of procrastination.

Right now any attempt to contribute modern rpm packaging starts with a
long list of “you could do X, but it’s not available yet, use Y
instead”. Who actually expects to attract new contributors this way?

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Igor Gnatenko at 05/31/2019 - 03:45

On Fri, May 31, 2019 at 9:36 AM Nicolas Mailhot via devel wrote:

You forget that Firefox, while an important piece for desktop users,
is not a dependency of anything else. But everything depends on RPM. If you
want a new version of RPM, you also need to rebuild packages to ensure that
they are generated with the new RPM... And so on.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By King InuYasha at 05/29/2019 - 18:05

On Wed, May 29, 2019 at 5:53 PM Josh Boyer < ... at fedoraproject dot org> wrote:
This is news to me, as I've never heard of any "side mass rebuilds".
They're prohibitively expensive to do, which is why we do only one per
release anyway.

I'm pretty sure this would break DeltaRPMs, since none of the drpm
software has been updated to handle zstd compression. Neither drpm nor
deltarpm handle it today.

We're pretty much screwed here. Also, since RHEL 8's rpm package does
not have zstd support compiled in, it too cannot handle the RPMs.

Cf. <a href="https://git.centos.org/rpms/rpm/blob/c8/f/SPECS/rpm.spec#_17-18" title="https://git.centos.org/rpms/rpm/blob/c8/f/SPECS/rpm.spec#_17-18">https://git.centos.org/rpms/rpm/blob/c8/f/SPECS/rpm.spec#_17-18</a>

Why would this help? MBS does nothing useful in this regard. It just
calls Koji to make builds. When built for a specific platform, it'll
use the definitions of that platform. And since the platform maps to
the distro release, it's effectively the same as normal packages.

Yeah, if this turns out bad, we'd need a second mass build to
eliminate packages with zstd compression.

I'd hope not...

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Daniel Mach at 05/30/2019 - 10:39

On 30. 05. 19 at 0:05, Neal Gompa wrote:

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Chris Murphy at 05/30/2019 - 16:56

On Thu, May 30, 2019 at 8:40 AM Daniel Mach < ... at redhat dot com> wrote:
I think the net resources consumed by all parties need to be
considered. Whether xz:2 or zstd:19, multiplied by thousands of users,
that's energy and heat.

I have no idea how deltarpm works, but if it works on bit-level
differences in uncompressed data, I don't see why the local rebuild needs
to use the same compression level as the Fedora build system. If it's
working on compressed data, well, I'm not sure how that works, in
particular if pixz is used, which gives non-reproducible results.

Another idea for the training dictionary: the training could be done
per RPM at create time based on the files for that RPM, and stuff the
dictionary in the RPM. No versioning needed. The speed and compression
improvements are significant enough it's plausible whatever hit there
is for training is overcome by the gain, even at lower compression
levels. But it probably needs testing to know.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Samuel Sieb at 05/30/2019 - 17:29

On 5/30/19 1:56 PM, Chris Murphy wrote:
I was going to suggest earlier that deltarpm could use a faster
compression when repacking. But then I realized that the result has to
be bit-exact with the original so the package signing is still intact.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Vitaly Zaitsev ... at 05/31/2019 - 03:33

On Thursday, 30 May 2019 at 14:29 -0700, Samuel Sieb wrote:
That's because someone in the old old past took the shortcut of signing
compressed payload hashes instead of signing the uncompressed payload.
That was an easy mistake to make at the time everything was a gzip
file.

That’s something which is also killing us hosting side, now that many
“source” archives are generated on-the-fly, and the on-the-fly
compression method is not stable over time.

Someday the technical debt will reach such levels that the whole package
creation and distribution toolchain will have to be audited to hunt
down all the steps where we assume the security invariant is the
compressed payload instead of the payload itself.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Chris Murphy at 05/30/2019 - 17:38

On Thu, May 30, 2019 at 3:31 PM Samuel Sieb < ... at sieb dot net> wrote:
Package signing happens after compression? Compression is an
optimization, in no way does it affect the validity of the payload.

Re: Fedora 31 System-Wide Change proposal: Switch RPMs to zstd c

By Samuel Sieb at 05/30/2019 - 17:52

On 5/30/19 2:38 PM, Chris Murphy wrote:
My understanding is that the signature is calculated over the compressed
payload. (I couldn't find any clear documentation on it with a quick
search.) I see that would make it simpler and somewhat quicker to
verify, but it does cause problems with things like deltarpm and
recompressing packages.
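
A small illustration of why that matters for repacking, assuming the digest really is taken over the compressed bytes. It uses zlib and SHA-256 from the Python standard library purely as stand-ins for the actual RPM payload compressor and signature scheme, and the payload is made up.

```python
import hashlib
import zlib

# Made-up stand-in for an uncompressed cpio payload.
payload = b"usr/bin/example\n" * 4096

fast = zlib.compress(payload, 1)  # what a quick local repack might produce
slow = zlib.compress(payload, 9)  # what the build system originally shipped


def digest(data):
    """Hex SHA-256, standing in for the signed payload hash."""
    return hashlib.sha256(data).hexdigest()


# Same content, different compression level: the compressed bytes differ,
# so a digest over them differs too, while the content digest is stable.
compressed_match = digest(fast) == digest(slow)
uncompressed_match = digest(zlib.decompress(fast)) == digest(zlib.decompress(slow))

print("compressed digests match:", compressed_match)
print("uncompressed digests match:", uncompressed_match)
```

A digest over the uncompressed payload would survive recompression at a different level; a digest over the compressed bytes does not, which is the constraint deltarpm runs into.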