zstd compression for packages

Hey folks,

We had a coding day in Foundations last week and Balint and Julian added support for zstd compression to dpkg [1] and apt [2].

[1] <a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=892664" title="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=892664">https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=892664</a>
[2] <a href="https://salsa.debian.org/apt-team/apt/merge_requests/8" title="https://salsa.debian.org/apt-team/apt/merge_requests/8">https://salsa.debian.org/apt-team/apt/merge_requests/8</a>

Zstd is a compression algorithm developed by Facebook that offers far
higher decompression speeds than xz or even gzip (at roughly constant
speed and memory usage across all levels). It provides 19 compression
levels, ranging from level 1, which is roughly comparable to gzip in
size (but much faster), up to level 19, which is roughly comparable to
xz -6.

In our configuration, we run zstd at level 19. For bionic main amd64,
this causes a size increase of about 6%, from roughly 5.6 to 5.9 GB.
Installs speed up by about 10%, or by up to 40% when eatmydata is
involved; user time generally drops by about 50%.

Our implementations for apt and dpkg support multiple frames as used by
pzstd, so packages can eventually be compressed and decompressed in
parallel.

We are considering requesting an FFe for this - the changes are not
invasive, and landing them now allows us to turn zstd on by default in
18.10.

Thanks,
Balint and Julian

Raw Measurements
================
All measurements were performed on a cloud instance of bionic, in a basic bionic schroot with overlay, on an SSD.

Kernel install (eatmydata, perf report, time spent in compression)
Kernel install (eatmydata)
12.49user 3.04system 0:12.57elapsed 123%CPU (0avgtext+0avgdata 68720maxresident)k
0inputs+1056712outputs (0major+159306minor)pagefaults 0swaps

5.60user 2.33system 0:07.07elapsed 112%CPU (0avgtext+0avgdata 81388maxresident)k
0inputs+1108720outputs (0major+171171minor)pagefaults 0swaps

firefox
4.52user 3.30system 0:33.14elapsed 23%CPU (0avgtext+0avgdata 25152maxresident)k
0inputs+544560outputs (0major+386394minor)pagefaults 0swaps

firefox eatmydata
libreoffice
11.34user 6.66system 1:18.04elapsed 23%CPU (0avgtext+0avgdata 64676maxresident)k
16inputs+1370112outputs (0major+1024989minor)pagefaults 0swaps

libreoffice eatmydata
10.86user 5.78system 0:17.70elapsed 94%CPU (0avgtext+0avgdata 64800maxresident)k
0inputs+1370112outputs (0major+1043637minor)pagefaults 0swaps
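The blocks above are the raw summary lines printed by GNU time's default format. For anyone re-running these benchmarks, here is a small sketch for extracting the user/system/elapsed/CPU figures from such lines (the parsing helper is our own illustration, not part of any tool discussed here):

```python
import re

# Matches GNU time's default summary line, e.g.
# "12.49user 3.04system 0:12.57elapsed 123%CPU (0avgtext+0avgdata ...)"
TIME_RE = re.compile(
    r"(?P<user>[\d.]+)user\s+"
    r"(?P<sys>[\d.]+)system\s+"
    r"(?P<min>\d+):(?P<sec>[\d.]+)elapsed\s+"
    r"(?P<cpu>\d+)%CPU"
)

def parse_time_line(line):
    """Return (user, system, elapsed, cpu_percent) with times in seconds."""
    m = TIME_RE.search(line)
    if m is None:
        raise ValueError("not a GNU time summary line: %r" % line)
    elapsed = int(m.group("min")) * 60 + float(m.group("sec"))
    return (float(m.group("user")), float(m.group("sys")),
            elapsed, int(m.group("cpu")))

# The libreoffice eatmydata line from above:
print(parse_time_line(
    "10.86user 5.78system 0:17.70elapsed 94%CPU (0avgtext+0avgdata 64800maxresident)k"))
# → (10.86, 5.78, 17.7, 94)
```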

Comments

Re: zstd compression for packages

By Steve Langasek at 03/16/2018 - 18:13

Hi Julian,

Thanks for posting about this. I agree that if this is landing in dpkg+apt
upstream, it's reasonable to try to get it into the 18.04 release so that it
can be used in later releases without needing a dpkg versioned pre-depends.

If we are to evaluate using zstd as the default compression in 18.10 (or
later), I think we need to consider the total install experience, and not
just look at the dpkg unpack time.

For example:

On Mon, Mar 12, 2018 at 11:06:11AM +0100, Julian Andres Klode wrote:
Since you don't list binary package names for kernel or libreoffice, I'll
look at firefox, which is the obvious one. The archive version of this
package is 42MiB in size in bionic. If the zstd version is 6% larger, but
takes 4 seconds less time to unpack, this means the total install time
(download+unpack) is only improved for the end user if the download speed
from the apt source is faster than (44108204 bytes * .06 * 8bits/byte / 4.03s
~=) 5.25Mbps.
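The break-even figure above can be reproduced directly; the package size and unpack-time delta are the ones quoted in the message:

```python
# Break-even download speed: a 6% larger package pays off only if the
# bandwidth is high enough that the extra download time stays below
# the unpack-time saving.
size_bytes = 44_108_204   # firefox package size in bionic
size_increase = 0.06      # zstd vs xz, ~6% larger
unpack_saving_s = 4.03    # seconds saved unpacking with zstd

extra_bits = size_bytes * size_increase * 8
break_even_bps = extra_bits / unpack_saving_s
print(round(break_even_bps / 1e6, 2), "Mbps")
# → 5.25 Mbps
```

Below that bandwidth, the larger download costs more time than the faster unpack saves.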

Have you established that this is a typical effective download speed for
Ubuntu users? It's certainly faster than my home connection, though I also
use a local mirror to speed up installs. It may be reasonable to expect
cloud instances to have this much throughput from their mirrors, and so it
might be the sensible choice solely on that basis; I'm just checking that
it's been measured.

In other words: if we want to make this the default, we should quantify
Daniel's remark that he would prefer a 6% faster download over a 10% faster
unpack.

I think we also need to look at the spread of package size increases. If 6%
is typical, are there some packages on the high end (of both absolute package
size and relative size increase) that we should exclude from switching to
zstd? We should be transparent about our analysis here.

Thanks,

Re: zstd compression for packages

By Dimitri John Ledkov at 03/17/2018 - 11:09

On 16 March 2018 at 22:13, Steve Langasek <steve. ... at ubuntu dot com> wrote:
Well, I think it does not make sense to think about this in absolute
terms. Thinking about user stories is better.

A stable series user will be mostly upgrading packages from -security
and -updates. The download speed and/or size of debs does not matter
much in this case, as these downloads are scheduled to happen in the
background over the course of the day, via the unattended-upgrades
download timer. Installation speed matters, as that is the window of
time when the system is actually somewhat in a maintenance mode /
degraded performance (apt is locked, and there are CPU and disk-io
loads).

New instance initialization - e.g. spinning up a cloud instance, with
cloud-init, and installing a bunch of things; deploying juju charm /
conjure-up spell; configuring things with puppet / ansible / etc =>
these are download & install heavy. However, users who do this heavily
will be in a corporate / business / datacentre environment, and thus it
is reasonable to expect them to have either a fat internet pipe and/or
a local mirror. Meaning download speed & size are not critical.

Then there are devel series users, developers who do sbuild builds,
etc. These users are most likely to be on slower home-user connections
and to watch things much more closely and interactively, and they do
indeed care about the total download+install time. They are most likely
very vocal / visible, but they are not ultimately the target audience
for which we develop Ubuntu in the first place. Thus I would be willing
to trade the personal developer/devel-series user experience in favor
of the stable series user. I'm not sure how much sense it makes to
proxy/cache/local-mirror the devel series if it is only a single
machine in use.

Re: zstd compression for packages

By Steve Langasek at 03/20/2018 - 20:25

On Sat, Mar 17, 2018 at 03:09:55PM +0000, Dimitri John Ledkov wrote:
Sure.

Does unattended upgrades download both -security and -updates, or does it
only download -security? From what I can see in
/usr/bin/unattended-upgrade, the allowed-origins check applies to both the
downloads and the installation.

So by default, increases in the download time of non-security SRUs would be
perceivable by the user (though perhaps not of interest).

Generally agreed (but the assertion should still be tested, not assumed).

I disagree that we don't develop Ubuntu for developers. The developer
desktop continues to be an important use case, and while it shouldn't
necessarily dominate every time there is tension between the desktop and
server use cases, it also shouldn't be ignored.

But furthermore, I think there's a separate use case you've not included
here, which is "client user selects a piece of software for installation and
wants to use it immediately". In that case, the total clock time from
expression of intent, to when the package can be used, does matter. And
it's not limited to developers of Ubuntu or people tracking the devel
series; this is relevant to the usability of the desktop in stable releases.
It is also, I would argue, the use case that is most important in terms of
its impact on user satisfaction, because it's precisely in the critical path
of a task that has the user's attention; whereas improvements to the other
use cases may improve overall efficiency, but have little or no proximate
benefit to the human user.

Re: zstd compression for packages

By Dmitrijs Ledkovs at 03/28/2018 - 09:49

On 21 March 2018 at 00:25, Steve Langasek <steve. ... at ubuntu dot com> wrote:
That's not the use case I brought up.

I said users of the devel series, aka ubuntu+1.

The compression vs download trade-off is irrelevant on the ubuntu+1
series: the churn is so high anyway that the only way to win is to not
update on every transition / archive push, and only dist-upgrade
weekly. And optimizing for users of ubuntu+1 is very niche in
comparison to the stable series users.

I make no distinction among the stable series users - be they
"developers" or not, they are all simply stable series users.

Re: zstd compression for packages

By Julian Andres Klode at 03/19/2018 - 10:03

On Sat, Mar 17, 2018 at 03:09:55PM +0000, Dimitri John Ledkov wrote:
I'd like us to have <a href="https://wiki.debian.org/Teams/Dpkg/Spec/DeltaDebs" title="https://wiki.debian.org/Teams/Dpkg/Spec/DeltaDebs">https://wiki.debian.org/Teams/Dpkg/Spec/DeltaDebs</a> -
this would mostly solve that problem too.

Re: zstd compression for packages

By Balint Reczey at 03/19/2018 - 09:57

Hi All,

On Sat, Mar 17, 2018 at 3:09 PM, Dimitri John Ledkov
< ... at surgut dot co.uk> wrote:
I agree with Dimitri's analysis and I would also like to add one more
thing to consider. During unpacking of packages, the system is in a
transient state where programs may not work correctly. Minimizing the
time spent in that transient state is an important additional benefit
of speeding up decompression.

The speedup varies a lot across use cases and IMO the 10% speed
increase is an understatement for many very important use cases.

Cheers,
Balint

Re: zstd compression for packages

By Julian Andres Klode at 03/17/2018 - 09:47

On Fri, Mar 16, 2018 at 03:13:55PM -0700, Steve Langasek wrote:
We're really only considering cloud cases, as a 10% gain on non-eatmydata
cases on slower connections does not really seem worth it, right?

It's not just the firefox package, but the entire apt install firefox in a
fresh debootstrap, so including most dependencies.

Kernel was apt install linux-image-generic initramfs-tools- grub<something>-
libreoffice was apt install libreoffice-$foo for all $foo (calc, draw, ...)

Therefore the calculations are off, and the improvement at low connection
speeds is likely not worth it.

I attached the complete analysis of size differences for main, ordered
by relative increase. There are a few huge relative increases, but only
really for tiny packages.

Re: zstd compression for packages

By Jeremy Bicha at 03/12/2018 - 10:02

On Mon, Mar 12, 2018 at 6:06 AM, Julian Andres Klode
<julian. ... at canonical dot com> wrote:
What does Debian's dpkg maintainer think?

Thanks,
Jeremy Bicha

Re: zstd compression for packages

By Colin Watson at 03/12/2018 - 10:19

On Mon, Mar 12, 2018 at 10:02:49AM -0400, Jeremy Bicha wrote:
FWIW, I'd be quite reluctant to add support for this to Launchpad until
it's landed in Debian dpkg/apt; a future incompatibility would be very
painful to deal with.

Re: zstd compression for packages

By Julian Andres Klode at 03/12/2018 - 10:36

On Mon, Mar 12, 2018 at 02:19:18PM +0000, Colin Watson wrote:
Acknowledged. I don't think we want to go ahead without dpkg upstream
blessing anyway. On the APT side, we don't maintain Ubuntu-only branches,
so if we get a go-ahead it would land in Debian immediately too.

I had a quick look at Launchpad and I think it only needs a backport of
the APT commits to an older branch (or an upgrade to bionic, but that
sounds like more work :D) but I might be wrong.

I think the format is versioned and there might be new versions
eventually, so we might have to take care to keep generating files only
in an old format, but xz has the same problem.

Re: zstd compression for packages

By Colin Watson at 03/12/2018 - 11:03

On Mon, Mar 12, 2018 at 03:36:11PM +0100, Julian Andres Klode wrote:
Good.

We'll probably also need a dpkg backport (preferably in xenial-updates)
and some small changes to lib/lp/archiveuploader/. It's not hugely
difficult but will need a bit of work.

Re: zstd compression for packages

By Julian Andres Klode at 03/12/2018 - 10:11

On Mon, Mar 12, 2018 at 10:02:49AM -0400, Jeremy Bicha wrote:
We are waiting to hear from him in <a href="https://bugs.debian.org/892664" title="https://bugs.debian.org/892664">https://bugs.debian.org/892664</a> - last
time we chatted on IRC, he was open to investigating zstd.

Re: zstd compression for packages

By Robie Basak at 03/12/2018 - 09:49

On Mon, Mar 12, 2018 at 11:06:11AM +0100, Julian Andres Klode wrote:
libzstd has only been stable in the archive since Artful. We had to SRU
fixes to Xenial because it was added to Debian (and outside
experimental) before the format was stable upstream.

Of all the general uses of a new compression algorithm, I'd expect our
distribution archival case to be near the end of a develop/test/rollout
cycle. Are you sure we want to rely on it so completely by switching to
it by default in 18.10?

Robie

Re: zstd compression for packages

By Julian Andres Klode at 03/12/2018 - 10:05

On Mon, Mar 12, 2018 at 01:49:42PM +0000, Robie Basak wrote:
So the goal is to have it in 20.04, which means we should ship it now, so
we can do upgrades from 18.04 to it. Whether we change the default in
18.10 or not, I don't know, but:

IMO, better 18.10 than later. We should gain experience with it,
and if it turns out to be problematic, we can switch the default back
and do no-change rebuilds for 20.04 :)

That said, if we have problems, I expect people using zstd in
filesystems (btrfs) or backup tools (borg) to be worse off.

Re: zstd compression for packages

By Robie Basak at 03/12/2018 - 10:15

On Mon, Mar 12, 2018 at 03:05:13PM +0100, Julian Andres Klode wrote:
Sure. I don't have any objection to making it available now for future
use (apart from the usual post-FF required care etc. which the release
team will decide upon).

I can understand why it may be a goal for 20.04, but I assume that's
subject to it having proven itself by then. So while it makes sense to
start this by default in 18.10 to flush out any issues, that also
pre-supposes that it will have proven itself in the future. A tough call
I think, and not one I have enough information to have an opinion upon.
I mention it to point out that the other side of the trade-off exists.

I think there are certain classes of possible problems for which we will
be worse off than the users in the use cases you point out. The
publication of our archives is somewhat more permanent and we can't, for
example, restore from backup using a different compression to repair our
filesystem. It's providing an *automatic* and seamless upgrade path for
affected Ubuntu users that could prove difficult. In some other cases
where users have individually opted in, a seam isn't necessarily a
problem; but it can be for us.

Robie

Re: zstd compression for packages

By Julian Andres Klode at 03/12/2018 - 06:58

On Mon, Mar 12, 2018 at 11:06:11AM +0100, Julian Andres Klode wrote:
More links:

PPA: <a href="https://launchpad.net/~canonical-foundations/+archive/ubuntu/zstd-archive" title="https://launchpad.net/~canonical-foundations/+archive/ubuntu/zstd-archive">https://launchpad.net/~canonical-foundations/+archive/ubuntu/zstd-archive</a>
APT merge request: <a href="https://salsa.debian.org/apt-team/apt/merge_requests/8" title="https://salsa.debian.org/apt-team/apt/merge_requests/8">https://salsa.debian.org/apt-team/apt/merge_requests/8</a>
dpkg patches: <a href="https://bugs.debian.org/892664" title="https://bugs.debian.org/892664">https://bugs.debian.org/892664</a>

I'd also like to talk a bit more about libzstd itself: The package is
currently in universe, but btrfs recently gained support for zstd,
so we already have a copy in the kernel and we need to MIR it anyway
for btrfs-progs.

Re: zstd compression for packages

By Daniel Axtens at 03/12/2018 - 09:11

Hi,

I looked into compression algorithms a bit in a previous role, and to be
honest I'm quite surprised to see zstd proposed for package storage. zstd,
according to its own github repo, is "targeting real-time compression
scenarios". It's not really designed to be run at its maximum compression
level, it's designed to really quickly compress data coming off the wire -
things like compressing log files being streamed to a central server, or I
guess writing random data to btrfs where speed is absolutely an issue.

Is speed of decompression a big user concern relative to file size? I admit
that I am biased - as an Australian and with the crummy internet that my
location entails, I'd save much more time if the file was 6% smaller and
took 10% longer to decompress than the other way around.

Did you consider Google's Brotli?

Regards,
Daniel

On Mon, Mar 12, 2018 at 9:58 PM, Julian Andres Klode <

Re: zstd compression for packages

By King InuYasha at 03/12/2018 - 09:30

On Mon, Mar 12, 2018 at 9:11 AM, Daniel Axtens
<daniel. ... at canonical dot com> wrote:
I can't speak for Julian's decision for zstd, but I can say that in
the RPM world, we picked zstd because we wanted a better gzip.
Compression and decompression times are rather long with xz, and the
ultra-high-efficiency from xz is not as necessary as it used to be,
with storage becoming much cheaper than it was nearly a decade ago
when most distributions switched to LZMA/XZ payloads.

zstd also provides the necessary properties to make it chunkable and
rsyncable, which is useful for metadata. For package payloads, there
are things we can do to make compression go much faster than it does
now (and it's still quite a bit faster than xz as-is and somewhat
faster than gzip now).

I don't know for sure if Debian packaging allows this, but for RPM, we
switch to xz payloads when the package is sufficiently large that the
compression/decompression speed isn't really going to matter (e.g. game
data). So while most packages may not necessarily be using xz payloads,
quite a few would. That said, we've been using xz for all packages for
a few years now, and the main drag is the time it takes to wrap
everything up to make a package.

As for Google's Brotli: its average compression ratio isn't as high as
zstd's, and it is markedly slower. With these factors in mind, the
obvious choice was zstd.

(As an aside, rpm in sid/buster and bionic doesn't have zstd support
enabled... Is there something that can be done to make that happen?)

Re: zstd compression for packages

By Julian Andres Klode at 03/12/2018 - 10:09

On Mon, Mar 12, 2018 at 09:30:16AM -0400, Neal Gompa wrote:
I want zstd -19 as an xz replacement due to higher decompression speed,
and it also requires about 1/3 less memory when compressing which should
be nice for _huge_ packages.

We could. But I don't think it matters much.

I'd open a wishlist bug in the Debian bug tracker if I were you. If
we were to introduce a delta, we'd have to maintain it...

Re: zstd compression for packages

By King InuYasha at 03/12/2018 - 10:15

On Mon, Mar 12, 2018 at 10:09 AM, Julian Andres Klode
<julian. ... at canonical dot com> wrote:
On a pure space efficiency basis, zstd -19 is still not as good as xz
-9, but it's pretty darned good.

Maybe not. It was useful a long time ago; now we don't really care
either, as we use xz across the board (for the moment).

Hence asking about sid/buster and bionic. :)

My previous experience with debbugs is that it's a black hole. We'll
see if it's better this time.

Re: zstd compression for packages

By Balint Reczey at 03/12/2018 - 10:43

Hi Daniel,

On Mon, Mar 12, 2018 at 2:11 PM, Daniel Axtens
<daniel. ... at canonical dot com> wrote:
Yes, decompression speed is a big issue in some cases. Please consider
the case of provisioning cloud/container instances, where after booting
the image plenty of packages need to be installed and saving seconds
matters a lot.

Zstd format also allows parallel decompression which can make package
installation even quicker in wall-clock time.

Internet connection speed increases on average by ~50% per year
(according to this [3] study, which matches my experience), which is
more than 6% every two months.
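A 50% yearly increase does indeed imply more than 6% per two-month period, since growth compounds; a quick check of the arithmetic:

```python
# A 50% yearly increase compounds over six two-month periods:
# (1 + r)^6 = 1.5, so the per-period growth r is 1.5^(1/6) - 1.
yearly_growth = 1.5
per_two_months = yearly_growth ** (1 / 6) - 1
print(f"{per_two_months:.1%}")
# → 7.0%
```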

We did consider it but it was less promising.

Cheers,
Balint

[3] <a href="http://xahlee.info/comp/bandwidth.html" title="http://xahlee.info/comp/bandwidth.html">http://xahlee.info/comp/bandwidth.html</a>

Re: zstd compression for packages

By Daniel Axtens at 03/12/2018 - 21:07

On Tue, Mar 13, 2018 at 1:43 AM, Balint Reczey <balint. ... at canonical dot com>
wrote:

AFAICT, [3] is anecdotal rather than a 'study' - it's based on data
from one person living in California, which is not really
representative. If we look at the connection speed visualisation from
the Akamai State of the Internet report [4], it shows that lots and
lots of countries - most of the world! - have significantly slower
internet than that person.

(FWIW, anecdotally, I've never had a residential connection get faster
(except when I moved), which is mostly because the speed of ADSL is pretty
much fixed. Anecdotal reports from users in developing countries, and rural
areas of developed countries are not encouraging either: [5].)

Having said that, I'm not unsympathetic to the usecase you outline. I just
am saddened to see the trade-offs fall against the interests of people with
worse access to the internet. If I can find you ways of saving at least as
much time without making the files bigger, would you be open to that?

Regards,
Daniel

[4]
<a href="https://www.akamai.com/uk/en/about/our-thinking/state-of-the-internet-report/state-of-the-internet-connectivity-visualization.jsp" title="https://www.akamai.com/uk/en/about/our-thinking/state-of-the-internet-report/state-of-the-internet-connectivity-visualization.jsp">https://www.akamai.com/uk/en/about/our-thinking/state-of-the-internet-report/state-of-the-internet-connectivity-visualization.jsp</a>
[5] <a href="https://danluu.com/web-bloat/" title="https://danluu.com/web-bloat/">https://danluu.com/web-bloat/</a>

Re: zstd compression for packages

By Benjamin Tegge at 03/14/2018 - 11:09

Am Dienstag, den 13.03.2018, 12:07 +1100 schrieb Daniel Axtens:
I want to mention that you can enable ultra compression levels 20 to 22
in zstd which usually achieve results comparable to the highest
compression levels of xz. There should be a level that matches the
results of xz -6 while still being faster than it.

Best regards,
Benjamin

Re: zstd compression for packages

By Julian Andres Klode at 03/14/2018 - 13:14

On Wed, Mar 14, 2018 at 04:09:27PM +0100, Benjamin Tegge wrote:
Ultra compression is unusable; it requires something like ten times the
memory to decompress.