DevHeads.net

Replacing glibc langpacks

I'm investigating whether it makes sense to switch to a scheme where the
glibc locale data is built from source, during package installation,
based on the langpack configuration system. This is similar to what
Debian does.
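As an illustration, here is a minimal Python sketch of what the install-time step could look like. It assumes a hypothetical list of requested locales (for example derived from the langpack configuration) written as `en_US.UTF-8`, without `@` modifiers. It only builds standard `localedef -i SOURCE -f CHARMAP NAME` command lines; how the real packaging would invoke `localedef` is not specified in this thread.

```python
import subprocess

def localedef_commands(requested):
    """Build localedef command lines for names like "en_US.UTF-8".

    Hypothetical input format: SOURCE[.CHARMAP]; a missing charmap
    defaults to UTF-8. @modifiers (e.g. sr_RS@latin) are not handled.
    """
    commands = []
    for entry in requested:
        source, _, charmap = entry.partition(".")
        charmap = charmap or "UTF-8"
        # -i names the locale source file, -f the character map.
        commands.append(
            ["localedef", "-i", source, "-f", charmap, f"{source}.{charmap}"]
        )
    return commands

def generate_locales(requested, dry_run=True):
    """Print (dry run) or actually execute the localedef commands."""
    for cmd in localedef_commands(requested):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

With the default `dry_run=True`, `generate_locales(["en_US.UTF-8"])` only prints `localedef -i en_US -f UTF-8 en_US.UTF-8`, which makes it easy to inspect what would be generated before running anything as root.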

The reason is that the compressed locale source code (without the
charmaps, which are not strictly needed once we patch localedef) is
smaller than the subset of locales of a langpack package which people
do not actually use. For example, glibc-langpack-en on Fedora 29 is 6.7 MiB when
installed, but en_US.utf8 is 2.9 MiB, and the locale sources are
3.4 MiB, so even the common case realizes a small saving.
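The size comparison can be spelled out as a small Python calculation that uses only the figures quoted in the message (sizes in MiB):

```python
# Sizes in MiB, as quoted in the message (Fedora 29).
LANGPACK_EN_INSTALLED = 6.7  # glibc-langpack-en, installed
EN_US_UTF8 = 2.9             # compiled en_US.utf8 locale only
LOCALE_SOURCES = 3.4         # compressed locale sources, without charmaps

unused = LANGPACK_EN_INSTALLED - EN_US_UTF8   # rest of the langpack beyond en_US.utf8
proposed = EN_US_UTF8 + LOCALE_SOURCES        # generated locale plus sources
saving = LANGPACK_EN_INSTALLED - proposed

print(f"unused: {unused:.1f} MiB, proposed: {proposed:.1f} MiB, "
      f"saving: {saving:.1f} MiB")
```

It prints `unused: 3.8 MiB, proposed: 6.3 MiB, saving: 0.4 MiB`. The saving exists only because the sources (3.4 MiB) are smaller than the compiled locales that go unused in the common case (3.8 MiB).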

For the installer, the savings might be much larger. If we can teach
anaconda to generate the appropriate locale only after the user has
selected the language, then we no longer need the full locale archive in
the installation image (and in RAM).

Thanks,
Florian

Comments

Re: Replacing glibc langpacks

By ... at 06/04/2019 - 03:42

On 27. 05. 19 at 11:34, Florian Weimer wrote:
I cannot comment on this one, but the Debian world also has this:

https://manpages.ubuntu.com/manpages/precise/man8/localepurge.8.html

I used to use this a lot, and it saved a lot of disk space on target instances. It will not help you with installer
size, though.

Re: Replacing glibc langpacks

By King InuYasha at 06/03/2019 - 09:05

On Mon, May 27, 2019 at 5:36 AM Florian Weimer < ... at redhat dot com> wrote:
I'm generally opposed to this because it introduces a scriptlet
requirement fairly early on in the system, and I don't consider the
savings significant enough. If we wanted savings here, we should
look at encoding finer-grained locale attributes into the package
file list so that rpm's locale filters can strip them.

Even setting that aside, I don't think the savings justify the approach you propose.

Re: Replacing glibc langpacks

By Nico Kadel-Garcia at 05/27/2019 - 17:14

On Mon, May 27, 2019 at 5:36 AM Florian Weimer < ... at redhat dot com> wrote:
May I strongly discourage this? It's too sensitive to local libraries
and binary updates, and reduces stability for what should be a very
stable package.

Re: Replacing glibc langpacks

By Tomas... at 05/27/2019 - 14:46

On Mon, 27 May 2019 at 10:41, Florian Weimer < ... at redhat dot com> wrote:
In other words, your proposal is *not* about any kind of size
reduction but about increasing the size of installed resources, because
the binary files which need to be present will be supplemented by the
sources of those binary files. Another thing is that generating those
files at install time lengthens installation.

Remember that dpkg does not have any equivalent of rpm's %lang()
tagging in package descriptions.
In exactly that context, Fedora still does not properly set up the
%_install_langs macro in /etc/rpm/macros; instead of setting that macro
at install time, it provides langpack packages (which IMO is at least an
engineering/design mistake or misunderstanding).
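A rough Python model of the `%_install_langs` filtering being referred to follows. The file list and paths are hypothetical. The matching rule is deliberately simplified: untagged files are always kept, `all` keeps everything, and otherwise a tagged file is kept on an exact match with any listed language. rpm's own matching logic may differ, so treat this only as an illustration of the idea.

```python
def parse_install_langs(value):
    """Split a colon-separated %_install_langs value such as "en_US:de_DE"."""
    return [lang for lang in value.split(":") if lang]

def keep_file(file_langs, install_langs):
    """Simplified model, not rpm's exact algorithm."""
    if not file_langs or "all" in install_langs:
        return True  # untagged file, or every language requested
    return any(lang in install_langs for lang in file_langs)

# Hypothetical package file list: path -> %lang() tags (empty = untagged).
FILES = {
    "/usr/lib/locale/en_US.utf8/LC_CTYPE": ("en_US",),
    "/usr/lib/locale/en_AU.utf8/LC_CTYPE": ("en_AU",),
    "/usr/lib/locale/C.utf8/LC_CTYPE": (),
}

def installed_files(install_langs_value):
    """Return the files that would be installed for a given macro value."""
    langs = parse_install_langs(install_langs_value)
    return sorted(p for p, tags in FILES.items() if keep_file(tags, langs))
```

For example, `installed_files("en_US")` keeps the untagged file and the en_US file but drops en_AU, which is the kind of per-locale pruning the message argues rpm tagging could provide without install-time generation.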

kloczek

Re: Replacing glibc langpacks

By Florian Weimer at 05/27/2019 - 15:13

* Tomasz Kłoczko:

2.9 MiB (compiled en_US.utf8 locale) plus 3.4 MiB (compressed locale
sources without charmaps) is 6.3 MiB, which is less than 6.7 MiB
(current installed glibc-langpack-en size).

Thanks,
Florian

Re: Replacing glibc langpacks

By Zbigniew ... at 06/03/2019 - 03:43

On Mon, May 27, 2019 at 09:13:50PM +0200, Florian Weimer wrote:
Hmm, the tradeoff is not very convincing: doing install-time shenanigans
to save 400k doesn't seem like a great deal.

Do I understand correctly that the saving essentially comes from the fact
that current glibc-langpack-en contains 14 localized variants (AU, BW, ZA,
US, ...), and only a subset of those could be generated in your proposal?
If so, would simply splitting glibc-langpack-en further into subpackages
be an alternative? E.g. glibc-langpack-en-US, glibc-langpack-en-AU, ... ?

Zbyszek

Re: Replacing glibc langpacks

By Florian Weimer at 06/03/2019 - 08:59

* Zbigniew Jędrzejewski-Szmek:

localedef currently reads character conversion tables from charmap files
under /usr/share/i18n/charmaps. The same information is contained in
the gconv modules unconditionally installed under /usr/lib*/gconv.
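To illustrate the point that a character converter already carries the byte-to-character table a charmap file provides, here is a Python sketch that derives a charmap-like table by decoding each single byte. Python's codecs stand in for glibc's gconv modules purely as an assumption for this example; this is not how localedef or gconv is implemented.

```python
def charmap_table(codec):
    """Return {byte: character} for every single byte the codec can decode.

    Python's codecs stand in for glibc's gconv modules in this sketch.
    """
    table = {}
    for byte in range(256):
        try:
            table[byte] = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            pass  # byte undefined in this charset
    return table

latin1 = charmap_table("iso8859_1")
latin9 = charmap_table("iso8859_15")
# Byte values whose meaning differs between the two single-byte charsets.
differing = sorted(b for b in latin1 if latin1[b] != latin9.get(b))
print([f"0x{b:02X}: {latin1[b]} -> {latin9[b]}" for b in differing])
```

The comparison also shows why ISO-8859-1 and ISO-8859-15 variants are separate locales: byte 0xA4, for instance, is the currency sign in ISO-8859-1 but the euro sign in ISO-8859-15.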

In theory, yes, but that would result in a few dozen more langpack
packages.

The other source of variation is the supported charset (UTF-8, or the
single-byte ISO-8859-1 and ISO-8859-15).

Thanks,
Florian

Re: Replacing glibc langpacks

By Zbigniew ... at 06/03/2019 - 09:50

On Mon, Jun 03, 2019 at 02:59:13PM +0200, Florian Weimer wrote:
Hmm, so maybe that's the way to go: split each langpack into
glibc-langpack-XX and glibc-langpack-XX-legacy. Not installing -legacy
will halve the disk usage, no?

Zbyszek

Re: Replacing glibc langpacks

By Florian Weimer at 06/03/2019 - 10:06

* Zbigniew Jędrzejewski-Szmek:

This will nearly double the number of langpack packages needed by glibc.
We also use hard links to share identical files across locales—compare
the output of “du -hcs /usr/lib/locale/en_*”, “du -hcsl
/usr/lib/locale/en_*”, “du -hcs /usr/lib/locale/en_US.utf8/” and finally
“du -hcs /usr/lib/locale/en_US{,.utf8}/”.
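The hard-link effect described here can be reproduced with a small Python sketch that sums file sizes the way du does (each inode counted once per invocation, across all arguments) or, with `count_links=True`, the way `du -l` does. It is simplified: it uses apparent sizes (`st_size`) rather than allocated blocks and ignores directories and non-regular files.

```python
import os
import stat

def disk_usage(paths, count_links=False):
    """Sum apparent sizes of regular files under paths.

    By default each inode is counted once across all paths, as du does;
    count_links=True counts every hard link separately, like du -l.
    """
    seen = set()
    total = 0
    for top in paths:
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                st = os.lstat(os.path.join(dirpath, name))
                if not stat.S_ISREG(st.st_mode):
                    continue  # skip symlinks, sockets, etc.
                key = (st.st_dev, st.st_ino)
                if key in seen and not count_links:
                    continue  # another hard link to an inode already counted
                seen.add(key)
                total += st.st_size
    return total
```

Comparing `disk_usage(paths)` with `disk_usage(paths, count_links=True)` over the `/usr/lib/locale/en_*` directories mirrors the `du -hcs` versus `du -hcsl` comparison, minus block rounding.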

In short, there's 6.7 MiB today, 2.9 MiB for UTF-8 only, and 3.2 MiB for
UTF-8 and ISO-8859-1. (I don't think skipping en_US is realistic.)

Thanks,
Florian

Re: Replacing glibc langpacks

By Tomas... at 05/27/2019 - 15:38

On Mon, 27 May 2019 at 20:13, Florian Weimer < ... at redhat dot com> wrote:
[.,]
It should be possible to minimise this size by using proper %lang(en_US) tagging.
Only this and nothing more.
Fedora, however, is not using rpm as it was designed... a shame, but
that is the only cause of what is seen as the issue in this context.

kloczek

Re: Replacing glibc langpacks

By ... at 05/27/2019 - 07:26

Because Fedora is a binary distribution, I think we should have everything
prebuilt and packaged. If we followed the path you propose, we would end
up with Gentoo.

Vít

On 27. 05. 19 at 11:34, Florian Weimer wrote:

Re: Replacing glibc langpacks

By Florian Weimer at 05/27/2019 - 08:51

* Vít Ondruch:

We do not pre-package the contents of /etc/ld.so.cache, either. And
/usr/lib/locale/locale-archive in glibc-all-langpacks is generated at
installation time, too.

Thanks,
Florian

Re: Replacing glibc langpacks

By Hans de Goede at 05/27/2019 - 06:40

Hi,

On 27-05-19 11:34, Florian Weimer wrote:
Interesting idea. My first thought on this is that doing this
at installation time feels wrong. How are you going to figure
out which languages to generate the locale data for? The language
can differ per user, e.g. on my system the system language is nl_NL
for testing purposes, but I greatly prefer to have my apps in English,
so for the hans user it is en_US.

Even if you check the language setting of all users at install time,
it may change later on a per-user level, and new users may be added
after install time.

Thinking out loud here: if we go this route, I think the data should be
under, say, /var/cache/locale and be generated on demand. E.g.
/var/cache/locale could be owned by a locale user/group and the binary
to generate these files could be suid or sgid locale; then glibc could
start this helper on demand if necessary. This would also remove the
need to add some support / hack to anaconda for this.
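A minimal Python sketch of the cache-or-generate logic behind this suggestion. The setuid/sgid helper and D-Bus activation are left out; `generate` is a hypothetical callback (it might wrap `localedef`), and the cache location is the one proposed in the message. New data is written to a temporary directory and renamed into place, so concurrent readers never see a half-generated locale.

```python
import os
import shutil
import tempfile
from pathlib import Path

# Location suggested in the message; an assumption, not an existing convention.
CACHE_DIR = Path("/var/cache/locale")

def ensure_locale(name, generate, cache_dir=CACHE_DIR):
    """Return cache_dir/name, calling generate(name, directory) only on a miss.

    generate is a hypothetical callback (it might wrap localedef).
    """
    target = cache_dir / name
    if target.is_dir():
        return target  # cache hit: nothing to do
    cache_dir.mkdir(parents=True, exist_ok=True)
    tmp = Path(tempfile.mkdtemp(prefix=f".{name}.", dir=str(cache_dir)))
    try:
        # mkdtemp creates mode 0700; cached data must be readable by all users.
        os.chmod(tmp, 0o755)
        generate(name, tmp)
    except BaseException:
        shutil.rmtree(tmp, ignore_errors=True)
        raise
    try:
        os.rename(tmp, target)  # atomic within one filesystem
    except OSError:
        # Another process renamed its copy into place first; keep that one.
        shutil.rmtree(tmp, ignore_errors=True)
        if not target.is_dir():
            raise
    return target
```

If two processes race, the loser's rename fails because the target already exists and is non-empty, and its temporary copy is discarded instead of overwriting the winner's data.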

Regards,

Hans

p.s.

An alternative to a suid/sgid helper would be a dbus activated service,
with an idle timeout to make it stop after it has been unused for a while.

Re: Replacing glibc langpacks

By Florian Weimer at 05/27/2019 - 08:49

* Hans de Goede:

Today, you need to install multiple langpacks to cover this case. If we
can detect the requested langpacks at %post time (or in a trigger), then
we could mirror the current behavior.

That looks a bit overengineered to me, and it would drive up
installation size again. We would also end up with diverging approaches
for container images (where the D-Bus daemon might not even run) and
other use cases.

Thanks,
Florian

Re: Replacing glibc langpacks

By Hans de Goede at 05/27/2019 - 09:31

Hi,

On 27-05-19 14:49, Florian Weimer wrote:
True (determine languages from installed langpacks), but that does not
cover the anaconda and livecd cases. If we do this on demand, we could also
gain some space on the livecd.
How would this drive up installation size? We need a tool for
generating the data anyway, and whether the files live under
/usr/lib/locale or under /var/cache/locale does not change their
size.

I admit that the on demand approach has issues for containers and
atomic.

Anyway, if you go this route, I do have two requests:

1) Please put the files under /var/cache/locale; we really need to stop
putting generated files under /usr.

2) Please use a trigger for generating the files rather than %post
scriptlets; AFAIK we are working towards eliminating scriptlets as much as
possible.

Regards,

Hans