DevHeads.net

yum-presto not on by default

Comments

Re: yum-presto not on by default

By Peter Lemenkov at 09/24/2009 - 23:53

2009/9/23 Jonathan Dieter < ... at gmail dot com>:

Agree with you. For me - traffic and space for rpms are cheap, while
rebuilding rpms is slow.

Re: yum-presto not on by default

By Bill Nottingham at 09/24/2009 - 00:05

Jonathan Dieter (... at gmail dot com) said:

Stats of the day...

I took an existing xz-compressed RPM, and built a newer release of that
package with varying xz levels. The numbers are:

- the new package size at that xz compression level
- the time to rebuild the new rpm from a delta to that new version (applydeltarpm)

level 1
1416k

real 0m0.935s
user 0m0.893s
sys 0m0.036s

level 2
952k

real 0m0.839s
user 0m0.790s
sys 0m0.048s

level 3
848k

real 0m2.762s
user 0m2.702s
sys 0m0.055s

level 4
832k

real 0m2.902s
user 0m2.817s
sys 0m0.084s

level 5
824k

real 0m3.269s
user 0m3.131s
sys 0m0.128s

level 6
812k

real 0m4.364s
user 0m4.026s
sys 0m0.164s

level 7 (what we do now)
812k

real 0m4.698s
user 0m4.516s
sys 0m0.176s

So... just set the xz compression level to 2, let it be that way for future
builds, and go about our business?

Bill
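
For anyone who wants to reproduce the shape of these numbers, here is a minimal sketch that times xz compression at presets 1 through 7 using Python's standard lzma module. It is not the measurement above: that one timed the whole delta rebuild, while this only times the compression step, and the payload here is synthetic. The function name compare_presets and the sample data are made up for illustration.

```python
import lzma
import time


def compare_presets(payload: bytes, presets=range(1, 8)):
    """Return {preset: (compressed_size_bytes, seconds)} for each xz preset."""
    results = {}
    for preset in presets:
        start = time.perf_counter()
        compressed = lzma.compress(payload, format=lzma.FORMAT_XZ, preset=preset)
        elapsed = time.perf_counter() - start
        results[preset] = (len(compressed), elapsed)
    return results


if __name__ == "__main__":
    # Synthetic stand-in payload (an assumption); a real test would feed in
    # the uncompressed payload of an actual rpm.
    sample = b"".join(
        b"file-%d: some repetitive package content\n" % i for i in range(20000)
    )
    for preset, (size, seconds) in compare_presets(sample).items():
        print(f"level {preset}: {size / 1024:.0f}k in {seconds:.3f}s")
```

Sizes and times will differ a lot with the input, so treat the output as a way to see the trend on your own packages rather than as a replacement for the figures in this message.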

Re: yum-presto not on by default

By Adam Williamson at 10/04/2009 - 14:05

Quick follow-up on this issue: I heard that this part at least has been
done, and it certainly seems to have done the trick. Delta rebuild
speeds have increased approx ten-fold here, and are now faster than my
(pretty quick) download speeds.

Re: yum-presto not on by default

By Eric Springer at 10/04/2009 - 15:42

Is it possible to do the rebuilds in parallel? I noticed that only one
of my four cores was used, and according to smolt 63% of people have a
dual-core or better, so it could lead to massive speed-ups as well.

Although, personally I would be happy even if the rebuild was a tenth
the current speed -- as for me the priority is reducing the load on
the network link. Which brings me to the next point:

Would it be possible to have a diff on the filelist db? It seems like
a very large download for something that would change very little.

Re: yum-presto not on by default

By shmuel siegel at 10/05/2009 - 12:32

Also the rawhide db itself. A guaranteed 8-12 megabyte download usually
swamps out any saving from presto (barring changes to eclipse, kernel
and openoffice).

Re: yum-presto not on by default

By Ahmed Kamal at 10/05/2009 - 13:32

Why aren't we rsync'ing that 12MB db instead of re-downloading it? Wasn't
there some web-friendly rsync fork?

Re: yum-presto not on by default

By James Antill at 10/05/2009 - 14:35

zsync, see the recent thread for why it isn't in Fedora.

And I doubt there'd be much savings, even if it worked (and you'd have
to be clever due to the names changing). Adding delta MD support would
be "easier", and by easier I mean significant amounts of code and
significant problems on the f-i side.

Re: yum-presto not on by default

By Bill Nottingham at 10/05/2009 - 13:37

Ahmed Kamal (email. ... at googlemail dot com) said:

rsync over http/ftp? We don't really have that.

Bill

Re: yum-presto not on by default

By John Reiser at 10/04/2009 - 15:51

The speedup would be noticeably less than the number of cores.
xz uses a history+search table that is significantly larger than dcache.
There is competition for memory bandwidth, not just CPU+cache cycles.

Agreed, this would be a 99% savings.

Re: yum-presto not on by default

By Ben Boeckel at 10/04/2009 - 19:08

Relatedly, I am amused every time the presto metadata is larger
than the resultant package itself (usually 700k or less). Maybe
there could/should be some heuristic for this?

--Ben

Re: yum-presto not on by default

By drago01 at 10/04/2009 - 15:45

xz is not threaded yet. I did some benchmarks on my Core i7 systems:

http://193.200.113.196/apache2-default/res

pbzip2 and pigz are threaded (compare them with bzip2 and gzip in the
output to see what kind of speedup is possible).

Re: yum-presto not on by default

By Warren Togami at 10/04/2009 - 16:21

Although that isn't a problem if multiple packages are reconstructed
simultaneously.

Warren
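
A rough sketch of that idea, assuming the expensive part of each rebuild is the xz recompression: instead of threading one compression, run several independent ones at once. The names recompress and rebuild_all are hypothetical, and lzma.compress stands in for the real rebuild step. It uses threads on the assumption that CPython's lzma module releases the GIL while compressing; if that does not hold on your interpreter, a process pool would be the fallback.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def recompress(payload: bytes, preset: int = 2) -> bytes:
    """Stand-in for the expensive step of rebuilding one package."""
    return lzma.compress(payload, format=lzma.FORMAT_XZ, preset=preset)


def rebuild_all(payloads, workers: int = 4):
    """Recompress several payloads concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(recompress, payloads))
```

The caveat raised earlier in the thread still applies: the workers compete for memory bandwidth, so the speedup is likely to be less than the number of cores.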

Re: yum-presto not on by default

By Kevin Kofler at 09/26/2009 - 02:49

We still need to fix the "noarch bug" (i.e. xz compression being
endianness-specific and breaking deltas of noarch packages).

Kevin Kofler

Re: yum-presto not on by default

By Adam Williamson at 09/24/2009 - 11:18

sounds good to me.

Re: yum-presto not on by default

By Toshio Kuratomi at 09/24/2009 - 09:05

One further question -- where does libz compression fit into these stats?

-Toshio

Re: yum-presto not on by default

By Bill Nottingham at 09/24/2009 - 18:56

Toshio Kuratomi (a. ... at gmail dot com) said:

zlib compressed rpm was 1756k; applydeltarpm time ~1.3 seconds. (Yes, that
implies xz -2 is faster.)

Bill

Re: yum-presto not on by default

By John Reiser at 09/24/2009 - 09:07

That's the best one-size-fits-all policy. We can do better because the
current xz compression algorithm is at least as bad as O(n*n). Restrict the
compression level to 2 for large .rpm, but use a higher level for smaller .rpm.
This will tend to avoid the largest time penalties yet still produce smaller
files for most .rpm.

Out of 4042 .rpm in my local cache (both i686 and x86_64):

number  size
    60  >= 10MB
   138  >= 5MB
   235  >= 3MB
   344  >= 2MB
   635  >= 1MB
   720  >= 800KB
   921  >= 500KB
  1229  >= 300KB
  1503  >= 200KB
  2214  >= 100KB

Use something such as:

size             level
< 200KB          7 (current)
200KB - 500KB    4

On top of that, there could be a sliding scale based on release date.
From general availability release to next alpha, limit the level to 2
for all .rpm. From alpha to beta, limit the level to 4. From beta
to next general availability, use the size table. This tends to avoid
large time penalties for the cases most likely to be seen by end users,
yet still tends to give better compression for a full release.

[Of course, fix the *STUPID* endianness property.]
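
The size table and the release-phase scale above could be combined along these lines. This is only a sketch: the table as posted stops at 500KB, so the level 2 used here for anything larger is an assumption taken from "restrict the compression level to 2 for large .rpm", and the names Phase, level_for_size and xz_level are made up for illustration. It also assumes the package size is known at the point where the level is chosen.

```python
from enum import Enum

KB = 1024


class Phase(Enum):
    GA_TO_ALPHA = "ga_to_alpha"      # general availability until next alpha
    ALPHA_TO_BETA = "alpha_to_beta"  # alpha until beta
    BETA_TO_GA = "beta_to_ga"        # beta until next general availability


def level_for_size(size_bytes: int) -> int:
    """Size table from the post; the > 500KB row is an assumption."""
    if size_bytes < 200 * KB:
        return 7
    if size_bytes <= 500 * KB:
        return 4
    return 2  # assumed: "level 2 for large .rpm"


def xz_level(size_bytes: int, phase: Phase) -> int:
    """Apply the release-phase cap on top of the size-based level."""
    by_size = level_for_size(size_bytes)
    if phase is Phase.GA_TO_ALPHA:
        return min(by_size, 2)
    if phase is Phase.ALPHA_TO_BETA:
        return min(by_size, 4)
    return by_size
```

For example, xz_level(100 * KB, Phase.BETA_TO_GA) gives 7, while the same package built between GA and the next alpha gets 2.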

Re: yum-presto not on by default

By Bill Nottingham at 09/24/2009 - 19:02

Given that it's a macro set at build time, where it has no knowledge yet
of the output side, I'd consider this if you can find a logical way to
input it into the system. I can't think of a good one off the top of my
head.

Bill

Re: yum-presto not on by default

By Michal Schmidt at 09/23/2009 - 03:51

On Wed, 23 Sep 2009 07:04:23 +0300, Jonathan Dieter wrote:

Do I understand it right that yum-presto compresses the data and then
passes them to rpm which decompresses them back again?
Why? Is it because it's currently the only way to verify
checksums/signatures?

Michal

Re: yum-presto not on by default

By Michael Schroeder at 09/23/2009 - 03:53

Yes, exactly. If you want to support uncompressed rpms, you'll
have to put the checksum of the uncompressed rpm in the metadata.
The deltarpm changes are quite easy, it just has to remove all
signatures from the rpm header.

Cheers,
Michael.

Re: yum-presto not on by default

By drago01 at 09/23/2009 - 03:49

We had an IRC discussion about this yesterday ... it is not yum-presto
but deltarpm, and it does not make sense at all.
It should just create uncompressed rpms (assuming rpm can handle them
which it should) ...according to Seth yum does not care whether the
rpms are compressed or not.

So yes the compression is a useless step here.

Re: yum-presto not on by default

By James Antill at 09/23/2009 - 09:20

No, we have at least 3 problems I think:

1. Nobody wants to download uncompressed rpms, if they don't have
presto.

2. gpg signature is over the rpm data (and thus is over compressed
data).

3. createrepo sha256 data is over the entire rpm (and thus is over
compressed data).

...but to me this is all a _problem_in_xz_, not presto/deltarpms. If
nobody can fix xz before F12 GA then IMNSO we should revert the
compression to something that works ... the minor savings in xz
compression aren't worth as much as deltas.

Re: yum-presto not on by default

By drago01 at 09/23/2009 - 10:08

It does not matter which compression algorithm we use: creating a
compressed rpm just to uncompress it again shortly afterwards is a
waste of cycles/power/time.
As for the GPG signature ... can't the drpm itself be signed?
Then we would only need to check that, rather than the rebuilt rpm. If we
don't trust the files on the disk, we have already lost anyway (the box is
compromised).

Re: yum-presto not on by default

By Kevin Kofler at 09/26/2009 - 02:53

If the metadata is getting signed, it basically is already. The metadata
contains a checksum of the DRPM, so if the metadata passes the signature
check and the DRPM matches the checksum, the DRPM's integrity and
uncompromisedness is verified. So I think it's safe to disable the checksum
check for the rebuilt RPMs entirely.

Kevin Kofler
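
The check described above amounts to hashing the downloaded DRPM and comparing it with the digest carried in the (signed) metadata. A minimal sketch, assuming the metadata stores a hex SHA-256 digest; the function name drpm_matches_metadata is hypothetical, and verifying the metadata signature itself is out of scope here.

```python
import hashlib
import hmac


def drpm_matches_metadata(drpm_bytes: bytes, expected_hex: str) -> bool:
    """Return True if the DRPM's SHA-256 equals the digest from the metadata.

    Assumes expected_hex was read from metadata whose signature has
    already been verified by some other means.
    """
    actual_hex = hashlib.sha256(drpm_bytes).hexdigest()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(actual_hex, expected_hex.strip().lower())
```

If this returns True and the metadata signature was good, the DRPM is what the repository published; whether that is enough to skip checking the rebuilt rpm is the policy question being argued in this thread.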

Re: yum-presto not on by default

By drago01 at 09/26/2009 - 03:27

Well if this is the case then we can simply not compress the generated
rpms, problem solved.

Re: yum-presto not on by default

By Seth Vidal at 09/23/2009 - 10:11

We'd need to do that signing which would take, umm, forever.

-sv

Re: yum-presto not on by default

By drago01 at 09/23/2009 - 10:12

What? You mean at compose time? (Signing on the client side would not
make much sense)

Re: yum-presto not on by default

By Seth Vidal at 09/23/2009 - 10:14

I mean on the server/repo side. The steps we'd need to do for a full
release would be:

1. compose tree
2. sign pkgs in tree
3. make deltarpms of pkgs vs older tree
4. sign deltarpms
5. generate repository metadata

that would take a long time.

-sv

Re: yum-presto not on by default

By drago01 at 09/23/2009 - 10:26

Yeah but if you take into account the time saved on x clients it would
be worth it (assume x is very high).
How long would the extra signing process take?

Re: yum-presto not on by default

By Seth Vidal at 09/23/2009 - 10:33

figure at least as long as it takes to gpg sign the distro now - probably
longer b/c there are potentially MORE drpms.

Finally, this would mean we're generating rpms from the
deltarpms that do not match the original rpms.

That may or may not be fatal but it feels weird.

-sv

Re: yum-presto not on by default

By Till Maas at 09/23/2009 - 09:54

I guess there won't be any deltarpms to update from F11 to F12, so
afaics it would be enough to only switch back to gzip payload for
everything that is going into F12 updates(-testing). Is this true?

Then we would still save space on the iso images and for the Everything
Repository, but deltarpms would still be possible.

Regards
Till

Re: yum-presto not on by default

By Seth Vidal at 09/23/2009 - 09:58

We'd need a mass rebuild to all pkgs in rawhide/f12-candidate to shake Xz
out of the payload compression.

That's non-trivial.

-sv

Re: yum-presto not on by default

By Till Maas at 09/23/2009 - 10:22

Why do we need a mass rebuild? Afaics it is only necessary to change the
compression back to gzip at the time of the final freeze, so that all
newly built packages (which are the ones going to F12 updates(-testing))
are built with gzip compression. Then delta rpms can be created from F12
Everything to F12 updates-testing. For packages that are built to break
the freeze, there could be a separate build target tag that still uses
the xz compression.

This is all under the assumption, that delta rpm creation from a xz
compressed rpm to a gzip compressed rpm works.

Regards
Till

Re: yum-presto not on by default

By James Antill at 09/23/2009 - 23:46

Yeh, I don't know the answer to that. I'd _guess_ that it would work,
but someone needs to try it.
This would mean that drpms on rawhide will still suck up to F12, but I
could live with that :).
I assume we don't do F11 => F12 drpms?

On the other side of it, does anyone have any stats. on how much was
saved by using Xz instead of bzip2? -- Ie. what we'd lose if we did a
mass rebuild.

Re: yum-presto not on by default

By shmuel siegel at 09/24/2009 - 00:47

The article also hints at our problem. We ARE doing the compression on
the end user side. So the compression is costing us 3 minutes to save 24
megabytes of transmission. This actually slows things down for most
broadband users.
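
To put a number on "most broadband users", the break-even link speed can be worked out from the two figures above: if rebuilding takes longer than simply downloading the bytes it saved, the delta is a net loss in wall-clock time. A small sketch, assuming decimal units (1 MB = 8 Mbit) and ignoring everything except download time versus rebuild time; the function names are made up for illustration.

```python
def break_even_mbit_per_s(saved_megabytes: float, rebuild_seconds: float) -> float:
    """Link speed at which downloading the saved bytes takes as long as rebuilding."""
    return saved_megabytes * 8 / rebuild_seconds


def delta_is_faster(saved_megabytes: float, rebuild_seconds: float,
                    link_mbit_per_s: float) -> bool:
    """True if rebuilding beats downloading the extra bytes on this link."""
    download_seconds = saved_megabytes * 8 / link_mbit_per_s
    return rebuild_seconds < download_seconds
```

With 24 MB saved and 3 minutes (180 s) of rebuilding, break_even_mbit_per_s(24, 180) comes to about 1.07 Mbit/s: on anything faster than that, the full download wins on time, though not on bandwidth used.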

Re: yum-presto not on by default

By Ben Boeckel at 09/24/2009 - 08:26

Since when was yum-presto about time? I thought it was about
bandwidth usage. Here, the dorm connections are capped at
600kb/s (well, not a hard cap, but it can be annoying anyways).
At one university I know (with over 20k students on the main
campus), there is (or was last year at least) a cap at 2GB /
week. Go over and you're capped at 56k for the rest of the
semester. I can't imagine Fedora on such a restriction (and I
have 4 machines to update, 2 with largely non-overlapping
package sets, the other 2 are similar and a caching server would
help) and that's a lot of students that would be hard pressed to
use Fedora at college. CPU time is cheaper than bandwidth these
days. Maybe I'm mistaken about what yum-presto was aiming to
solve?

--Ben

Re: yum-presto not on by default

By Seth Vidal at 09/24/2009 - 15:00

It's not about local CPU. But if we make presto on by default and the
local performance is so bad for people with fast connections that it is
almost unusable, then we have a problem.

So the idea is:

1. make performance not suck
2. maybe not make presto the default anyway.

Now #1 is obvious, I think :)

#2 is about the way someone would use the system. If I'm at a place where I
know the bandwidth is questionable then I figure immediately after install
I can run: yum install yum-presto and be ready to go.

Or, we install yum-presto by default but disable it. So the first thing
someone with bandwidth issues does is enable the plugin.

I think that's what this is all about.

-sv

Re: yum-presto not on by default

By Matthias Clasen at 09/24/2009 - 15:15

Neither of these will happen because they require esoteric knowledge of
yum plugins that users don't have. So if we turn it off by default, it
will not be used by a significant percentage of the people for whom it
is beneficial. And all the infrastructure cost we put into maintaining
delta rpms is effectively wasted...

Re: yum-presto not on by default

By Chris Adams at 09/24/2009 - 15:32

Once upon a time, Matthias Clasen < ... at redhat dot com> said:

If it impacts update performance by default, then it should be off, or
we'll have another thing added to the oft-repeated "Fedora fixes" list
(like "yum remove pulseaudio", "turn off SELinux", etc.).

Lots of users are connected to high-speed Internet, so their update
performance should not be impacted to help those that are not as well
connected.

Re: yum-presto not on by default

By Yuan Yijun at 09/25/2009 - 00:35

2009/9/25 Chris Adams < ... at hiwaay dot net>:

This one does not hurt. If installed by default, the high bandwidth
guys can do this to remove it and never need to bring it back, unlike
for pulseaudio and SELinux.

For people on slow connections like me, having a package reconstructed on
my disk is better than a download that goes on forever.

This is an example of better hardware condition is actively seeking to
prevent innovation ("not my problem").

Re: yum-presto not on by default

By Jason L Tibbitts III at 09/25/2009 - 00:56

YY> This is an example of better hardware condition is actively seeking
YY> to prevent innovation ("not my problem").

Regardless of how this comes out, I fail to understand how you can say
that when the entire system for generating these deltarpms exists and
nobody is talking about removing it. How is that preventing anything at
all?

- J<

Re: yum-presto not on by default

By Yuan Yijun at 09/25/2009 - 02:37

2009/9/25 Jason L Tibbitts III < ... at math dot uh.edu>:

I'm sorry for my words. From my limited understanding, gzip is well
known for its speed, while xz compression makes some broadband clients
unhappy. The point is to fix xz compression. Well, I really cannot find
a good technical solution, but there is a chance to improve if more
people used it.

Re: yum-presto not on by default

By drago01 at 09/25/2009 - 02:41

Well there are three options here:

1) Not compress at all (does not work due to signing issues)
2) Threaded compression (not implemented yet)
3) Use a lower compression level (probably the best option we have right now).

Re: yum-presto not on by default

By Adam Williamson at 09/24/2009 - 15:39

if we fix xz to use a sensible compression level, it won't be. Well,
unless you mean 'positively impacted'. :)

Re: yum-presto not on by default

By Andre Robatino at 09/24/2009 - 15:50

Does yum-presto make use of multiple cores when rebuilding the RPMs?
(My machines only have one, so I can't tell.)

Re: yum-presto not on by default

By Seth Vidal at 09/24/2009 - 15:53

other than the xz compression part I believe most of the time is
disk bound, not cpu bound.

-sv

Re: yum-presto not on by default

By Ben Boeckel at 09/24/2009 - 18:23

The rpmrebuild operation has saturated one of my 3.0GHz Intel Core
2 Duo cores for at least a minute on large updates, so my guess is
that that part is not multithreaded and CPU bound.

--Ben