mass-removal of LANG=anything-not-C.UTF-8 in packages

Dear maintainers,

I'm working again on implementing
https://fedoraproject.org/wiki/Changes/Remove_glibc-langpacks-all_from_buildroot.
The first step is to replace LC_ALL=en_US.UTF-8 with LC_ALL=C.UTF-8
(and similarly for LANG=, LC_CTYPE=, etc.) in all spec files. This
will be backwards and forwards compatible, in the sense that packages
that use C.UTF-8 should build OK on older and newer Fedoras.
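
A minimal sketch of the kind of substitution described above, assuming the assignments look like `LANG=en_US.UTF-8` or `LC_ALL="de_DE.utf8"`. The function name `use_c_utf8` and the regular expression are illustrative assumptions, not the script actually used for the mass update:

```python
import re

# Only the variables named in the announcement are handled here;
# other LC_* categories are left out of this sketch.
_LOCALE_VARS = r"(?:LC_ALL|LANG|LC_CTYPE)"

# Matches a language_TERRITORY name with a UTF-8 codeset, optionally quoted.
# Non-UTF-8 locales such as cs_CZ.ISO8859-2 are intentionally not matched.
_PATTERN = re.compile(
    r"\b(" + _LOCALE_VARS + r")=([\"']?)"
    r"[a-z]{2,3}_[A-Z]{2}\.[Uu][Tt][Ff]-?8\2(?![\w.@-])"
)


def use_c_utf8(spec_text: str) -> str:
    """Return spec_text with UTF-8 national locales replaced by C.UTF-8."""
    return _PATTERN.sub(r"\g<1>=\g<2>C.UTF-8\g<2>", spec_text)


if __name__ == "__main__":
    print(use_c_utf8("LANG=en_US.UTF-8 make check"))
```

Tests that really need a national or non-UTF-8 locale are left untouched by this pattern and would need an explicit BuildRequires instead.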

Once that's done, I'll file the PRs to actually replace glibc-all-langpacks
with glibc-minimal-langpack in mock and koji.

I'll do a mass update to use C.UTF-8 for the packages in the list that
follows, next week. I'll do test builds locally, and I'll only push to
dist-git if the local builds succeed. Let me know if you want your
package to be excluded.

Zbyszek

Maintainers by package:
OpenTK orphan
apache-poi gil lef mbooth
ardour5 nphilipp tartina
bash-completion mrunge ooprala sheltren svashisht
borgbackup bpereto fschwarz
clover2 mtasaka
elixir codeblock fnux martinlanghoff puiterwijk s4504kr
fail2ban athimm atkac jgu orion
fantasdic mtasaka
felix-osgi-core jcapik mizdebsk
fmf psss
gfan jjames pcpa tremble
ghc petersen
git amahdal besser82 chrisw pcahyna pstodulk skisela tmz
hibernate3 gil lef
hive coolsvap moceap orphan pmackinn
hunspell-az caolanm
hunspell-fa caolanm
hunspell-ga caolanm
hunspell-gv caolanm
hunspell-ky caolanm
ibus-typing-booster anishpatil mfabian
ipython churchyard cstratak dcantrel ignatenkobrain lbalhar mrunge salimma tomspur
jblas zbyszek
kgb-bot averi
langtable mfabian
lazygal rathann
libmp4v2 amigadave moezroy rathann sergiomb thias
libraqm moceap
migemo mtasaka
# migemo is special, I'll just add BR:glibc-langpack-ja
mongodb hhorak jpacner maxamillion mskalick panovotn strobert tdawson
openqa adamwill
paraview deji orion sagitter
passenger jkaluza kanarip tdawson
php-horde-Horde-Imap-Client remi
php-horde-Horde-JavascriptMinify remi
php-horde-Horde-Util remi
php-kdyby-events orphan
php-kdyby-strict-objects orphan
php-latte orphan
php-nette-application orphan
php-nette-bootstrap orphan
php-nette-caching orphan
php-nette-component-model orphan
php-nette-database orphan
php-nette-deprecated orphan
php-nette-di orphan
php-nette-finder orphan
php-nette-forms orphan
php-nette-http orphan
php-nette-mail orphan
php-nette-neon orphan
php-nette-php-generator orphan
php-nette-reflection orphan
php-nette-robot-loader orphan
php-nette-safe-stream orphan
php-nette-security orphan
php-nette-tester orphan
php-nette-tokenizer orphan
php-nette-utils orphan
php-phpspec remi siwinski
php-tracy orphan
pyp2rpm bkabrda cstratak ishcherb kevin mcyprian rkuska
python-CommonMark jujens
python-acoustid terjeros
python-blessed abompard aviso
python-click cstratak fab mstuchli rkuska
python-deprecation jpena
python-django bkabrda churchyard jdornak mrunge salimma sgallagh
python-djangoql vkrizan
python-evic besser82
python-execnet ktdreyer thm
python-ipython_genutils orion
python-mapnik tomh
python-mtg tc01
python-musicbrainzngs amluto
python-pankoclient pkilambi
python-path laxathom
python-pexpect amcnabb fabiand ignatenkobrain radez tomspur
python-pypandoc orion zbyszek
python-pytest-pep8 cstratak orion
python-pythonz-bd mcyprian orphan
python-seesaw tc01
python-setproctitle hguemar stevetraylen
python-setuptools_git apevec
python-sphinx-autodoc-typehints tdecacqu
python-sphinx-intl jujens
python-spur orion
python-tables tnorth zbyszek
python-vcstools cottsay rmattes
python-webassets dcallagh kumarpraveen pjp sundaram
python-whitenoise piotrp
python2-django1.11 pviktori
python2-ipython lbalhar
rubygem-gettext mtasaka sseago
rubygem-http_parser.rb ilgrad spredzy
rubygem-nokogiri kanarip mtasaka tdawson tremble
rubygem-org-ruby vondruch
rubygem-ruby-openid orphan
udiskie jstanek
varnish ingvar luhliarik
xorg-x11-drv-intel ajax glisse

Packages by maintainer:
abompard python-blessed
adamwill openqa
ajax xorg-x11-drv-intel
amahdal git
amcnabb python-pexpect
amigadave libmp4v2
amluto python-musicbrainzngs
anishpatil ibus-typing-booster
apevec python-setuptools_git
athimm fail2ban
atkac fail2ban
averi kgb-bot
aviso python-blessed
besser82 git python-evic
bkabrda pyp2rpm python-django
bpereto borgbackup
caolanm hunspell-az hunspell-fa hunspell-ga hunspell-gv hunspell-ky
chrisw git
churchyard ipython python-django
codeblock elixir
coolsvap hive
cottsay python-vcstools
cstratak ipython pyp2rpm python-click python-pytest-pep8
dcallagh python-webassets
dcantrel ipython
deji paraview
fab python-click
fabiand python-pexpect
fnux elixir
fschwarz borgbackup
gil apache-poi hibernate3
glisse xorg-x11-drv-intel
hguemar python-setproctitle
hhorak mongodb
ignatenkobrain ipython python-pexpect
ilgrad rubygem-http_parser.rb
ingvar varnish
ishcherb pyp2rpm
jcapik felix-osgi-core
jdornak python-django
jgu fail2ban
jjames gfan
jkaluza passenger
jpacner mongodb
jpena python-deprecation
jstanek udiskie
jujens python-CommonMark python-sphinx-intl
kanarip passenger rubygem-nokogiri
kevin pyp2rpm
ktdreyer python-execnet
kumarpraveen python-webassets
laxathom python-path
lbalhar ipython python2-ipython
lef apache-poi hibernate3
luhliarik varnish
martinlanghoff elixir
maxamillion mongodb
mbooth apache-poi
mcyprian pyp2rpm python-pythonz-bd
mfabian ibus-typing-booster langtable
mizdebsk felix-osgi-core
moceap hive libraqm
moezroy libmp4v2
mrunge bash-completion ipython python-django
mskalick mongodb
mstuchli python-click
mtasaka clover2 fantasdic migemo rubygem-gettext rubygem-nokogiri
nphilipp ardour5
ooprala bash-completion
orion fail2ban paraview python-ipython_genutils python-pypandoc python-pytest-pep8 python-spur
orphan OpenTK hive php-kdyby-events php-kdyby-strict-objects php-latte php-nette-application php-nette-bootstrap php-nette-caching php-nette-component-model php-nette-database php-nette-deprecated php-nette-di php-nette-finder php-nette-forms php-nette-http php-nette-mail php-nette-neon php-nette-php-generator php-nette-reflection php-nette-robot-loader php-nette-safe-stream php-nette-security php-nette-tester php-nette-tokenizer php-nette-utils php-tracy python-pythonz-bd rubygem-ruby-openid
panovotn mongodb
pcahyna git
pcpa gfan
petersen ghc
piotrp python-whitenoise
pjp python-webassets
pkilambi python-pankoclient
pmackinn hive
psss fmf
pstodulk git
puiterwijk elixir
pviktori python2-django1.11
radez python-pexpect
rathann lazygal libmp4v2
remi php-horde-Horde-Imap-Client php-horde-Horde-JavascriptMinify php-horde-Horde-Util php-phpspec
rkuska pyp2rpm python-click
rmattes python-vcstools
s4504kr elixir
sagitter paraview
salimma ipython python-django
sergiomb libmp4v2
sgallagh python-django
sheltren bash-completion
siwinski php-phpspec
skisela git
spredzy rubygem-http_parser.rb
sseago rubygem-gettext
stevetraylen python-setproctitle
strobert mongodb
sundaram python-webassets
svashisht bash-completion
tartina ardour5
tc01 python-mtg python-seesaw
tdawson mongodb passenger rubygem-nokogiri
tdecacqu python-sphinx-autodoc-typehints
terjeros python-acoustid
thias libmp4v2
thm python-execnet
tmz git
tnorth python-tables
tomh python-mapnik
tomspur ipython python-pexpect
tremble gfan rubygem-nokogiri
vkrizan python-djangoql
vondruch rubygem-org-ruby
zbyszek jblas python-pypandoc python-tables

Comments

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Todd Zullinger at 11/11/2018 - 13:35

Hi Zbigniew,

Zbigniew Jędrzejewski-Szmek wrote:
I'll take care of git soon.

Thanks,

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Joël Krähemann at 11/11/2018 - 16:33

Hi all,

I don't care as long as you leave the C locale on the system.

http://git.savannah.nongnu.org/cgit/gsequencer.git/tree/ags/lib/ags_regex.c?h=2.1.x#n43

FYI: the regexp engine behaves differently when it is given multi-byte
input, as it is with C.UTF-8. In contrast, the C locale lets you match
character ranges using a single byte per character.
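
The difference can be illustrated with Python's `re` module, used here only as an analogy for byte-oriented versus character-oriented matching. It is not the glibc regex engine or the gsequencer code linked above:

```python
import re

text = "\u00e9"              # one character, e with acute accent
raw = text.encode("utf-8")   # two bytes: 0xC3 0xA9

# Character-oriented matching (roughly what a UTF-8 locale gives you):
# "." consumes the whole character.
char_match = re.fullmatch(r".", text)

# Byte-oriented matching (roughly what the C locale gives you):
# "." consumes one byte, so one "." cannot cover the two-byte sequence.
byte_match_one = re.fullmatch(rb".", raw)
byte_match_two = re.fullmatch(rb"..", raw)

# A byte range still matches each byte individually.
high_bytes = re.fullmatch(rb"[\x80-\xff]+", raw)

if __name__ == "__main__":
    print(bool(char_match), bool(byte_match_one),
          bool(byte_match_two), bool(high_bytes))
```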

Bests,
Joël

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Rafal Luzynski at 11/12/2018 - 12:41

11.11.2018 21:33 Joël Krähemann < ... at gmail dot com> wrote:
IIUC, the C locale is built in and impossible to remove. When Zbyszek
said "remove anything-not-C.UTF-8" he probably meant removal of
actual national langpacks (including English) rather than removal
of other generic locales like C or POSIX.

I think that people were using en_US.UTF-8 as a way to force use of
UTF-8 which is possible with C.UTF-8 as well.

Regards,

Rafal

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Rafal Luzynski at 11/06/2018 - 17:14

6.11.2018 00:24 Zbigniew Jędrzejewski-Szmek < ... at in dot waw.pl> wrote:
Sorry if this has already been discussed, but one thing makes me wonder.
If glibc requires glibc-langpack and then we create glibc-minimal-langpack
which is empty and its only purpose is to provide glibc-langpack and thus
satisfy the dependency, then maybe we should just drop glibc-langpack
dependency from glibc and it would solve the problem? glibc-all-langpacks
could be removed rather than replaced with glibc-minimal-langpack.
The existence of glibc-minimal-langpack proves that glibc is able to work
without any external locale data.

Otherwise your change looks correct to me (although I am aware of the
objections expressed in this thread).

Regards,

Rafal

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Zbigniew Jędrzejewski-Szmek at 11/06/2018 - 18:15

On Tue, Nov 06, 2018 at 10:14:58PM +0100, Rafal Luzynski wrote:
Things are the way they are so that without the additional step of
specifying glibc-minimal-langpack, one gets all the locales by
default. This design was chosen for maximum backwards compatibility when
the langpack split was being made.

Installing no locales would probably be the default if we were starting
from scratch today, but when the split was made, a different choice was
made. I don't see enough benefit to revisit this.

Zbyszek

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Rafal Luzynski at 11/07/2018 - 18:05

6.11.2018 23:15 Zbigniew Jędrzejewski-Szmek < ... at in dot waw.pl> wrote:
This case was discussed at this year's Flock. Indeed,
glibc-all-langpacks was introduced for backward compatibility when upgrading
Fedora systems that predate langpacks. But it is considered a bug that
glibc-all-langpacks is installed by default. The intention of splitting
langpacks was to have only selected locales installed rather than all.

Sure, this may need a separate discussion.

Regards,

Rafal

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Dennis Gilmore at 11/06/2018 - 11:21

On Mon, 05-11-2018 at 23:24 +0000, Zbigniew Jędrzejewski-Szmek
wrote:
What is the change you are planning to put into mock and koji?

The build group in koji is defined as:

<group>
  <id>build</id>
  <name>build</name>
  <description>None</description>
  <default>false</default>
  <uservisible>true</uservisible>
  <packagelist>
    <packagereq type="mandatory">bash</packagereq>
    <packagereq type="mandatory">bzip2</packagereq>
    <packagereq type="mandatory">coreutils</packagereq>
    <packagereq type="mandatory">cpio</packagereq>
    <packagereq type="mandatory">diffutils</packagereq>
    <packagereq type="mandatory">fedora-release</packagereq>
    <packagereq type="mandatory">findutils</packagereq>
    <packagereq type="mandatory">gawk</packagereq>
    <packagereq type="mandatory">grep</packagereq>
    <packagereq type="mandatory">gzip</packagereq>
    <packagereq type="mandatory">info</packagereq>
    <packagereq type="mandatory">make</packagereq>
    <packagereq type="mandatory">patch</packagereq>
    <packagereq type="mandatory">redhat-rpm-config</packagereq>
    <packagereq type="mandatory">rpm-build</packagereq>
    <packagereq type="mandatory">sed</packagereq>
    <packagereq type="mandatory">shadow-utils</packagereq>
    <packagereq type="mandatory">tar</packagereq>
    <packagereq type="mandatory">unzip</packagereq>
    <packagereq type="mandatory">util-linux</packagereq>
    <packagereq type="mandatory">which</packagereq>
    <packagereq type="mandatory">xz</packagereq>
  </packagelist>
</group>
and in f30 comps the buildsys-build group is
<group>
  <id>buildsys-build</id>
  <_name>Buildsystem building group</_name>
  <_description/>
  <default>false</default>
  <uservisible>false</uservisible>
  <packagelist>
    <packagereq type="mandatory">bash</packagereq>
    <packagereq type="mandatory">bzip2</packagereq>
    <packagereq type="mandatory">coreutils</packagereq>
    <packagereq type="mandatory">cpio</packagereq>
    <packagereq type="mandatory">diffutils</packagereq>
    <packagereq type="mandatory">fedora-release</packagereq>
    <packagereq type="mandatory">findutils</packagereq>
    <packagereq type="mandatory">gawk</packagereq>
    <packagereq type="mandatory">grep</packagereq>
    <packagereq type="mandatory">gzip</packagereq>
    <packagereq type="mandatory">info</packagereq>
    <packagereq type="mandatory">make</packagereq>
    <packagereq type="mandatory">patch</packagereq>
    <packagereq type="mandatory">redhat-rpm-config</packagereq>
    <packagereq type="mandatory">rpm-build</packagereq>
    <packagereq type="mandatory">sed</packagereq>
    <packagereq type="mandatory">shadow-utils</packagereq>
    <packagereq type="mandatory">tar</packagereq>
    <packagereq type="mandatory">unzip</packagereq>
    <packagereq type="mandatory">util-linux</packagereq>
    <packagereq type="mandatory">which</packagereq>
    <packagereq type="mandatory">xz</packagereq>
  </packagelist>
</group>

These are what mock uses to create the minimal buildroot in both cases.
Neither includes anything with glibc, and grepping through the mock code
for glibc turns up nothing. I mention all of this because
glibc-all-langpacks is pulled into the buildroot entirely by dependencies;
the only change needed is for whatever package pulls in
glibc-all-langpacks to no longer pull it in.

Dennis

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Zbigniew Jędrzejewski-Szmek at 11/21/2018 - 08:07

On Tue, Nov 06, 2018 at 09:21:09AM -0600, Dennis Gilmore wrote:
Hmm, where is this defined?

https://pagure.io/fedora-comps/pull-request/346

Zbyszek

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Peter Robinson at 11/21/2018 - 08:13

On Wed, Nov 21, 2018 at 12:09 PM Zbigniew Jędrzejewski-Szmek
< ... at in dot waw.pl> wrote:
In koji:
koji list-groups f30

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Zbigniew Jędrzejewski-Szmek at 11/21/2018 - 08:25

On Wed, Nov 21, 2018 at 12:13:41PM +0000, Peter Robinson wrote:
Thanks. https://pagure.io/releng/issue/7926.

Zbyszek

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Zbigniew Jędrzejewski-Szmek at 11/06/2018 - 12:04

On Tue, Nov 06, 2018 at 09:21:09AM -0600, Dennis Gilmore wrote:
In both cases, we know that bash and other utilities will pull in
glibc, which Requires glibc-langpack, and pulls in glibc-all-langpacks
by default. My plan is to add glibc-minimal-langpack to those lists.
It also provides glibc-langpack, and will prevent glibc-all-langpacks
from being pulled in automatically.

Zbyszek

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By David Woodhouse at 11/06/2018 - 10:53

On Mon, 2018-11-05 at 23:24 +0000, Zbigniew Jędrzejewski-Szmek wrote:
The self-tests for OpenConnect explicitly use cs_CZ.ISO8859-2 for
pathological password handling — using a password of "ĂŻ" (U+0102
U+017B) in the local charset and making sure it works correctly.

Do I just need to BuildRequire glibc-all-langpacks manually to make
that work again?

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Zbigniew Jędrzejewski-Szmek at 11/06/2018 - 11:05

On Tue, Nov 06, 2018 at 03:53:56PM +0100, David Woodhouse wrote:
That, or glibc-langpack-cs. (cs is 0.5 MB, glibc-all-langpacks is 25 MB).

Zbyszek

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Florian Weimer at 11/06/2018 - 11:02

* David Woodhouse:

Yes, or depend on the cs langpack. Similar for any test that needs a
non-UTF-8 locale.

Thanks,
Florian

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Dominik 'Rathann' Mierzejewski at 11/06/2018 - 09:07

On Tuesday, 06 November 2018 at 00:24, Zbigniew Jędrzejewski-Szmek wrote:
Please add BR: glibc-langpack-en instead of changing locale for these.

Regards,
Dominik

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Nico Kadel-Garcia at 11/06/2018 - 10:01

On Tue, Nov 6, 2018 at 8:58 AM Dominik 'Rathann' Mierzejewski
< ... at greysector dot net> wrote:
From pain with other packages that require language settings, such as
Chef, I'd really encourage enabling the existing defaults rather than
trying to modify all the packages. That way lies a lot of work for our
friends doing EPEL backports.

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Zbigniew Jędrzejewski-Szmek at 11/06/2018 - 09:13

On Tue, Nov 06, 2018 at 02:07:06PM +0100, Dominik 'Rathann' Mierzejewski wrote:
Zbyszek

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Florian Weimer at 11/06/2018 - 08:13

* Zbigniew Jędrzejewski-Szmek:

I think this is a very bad idea. The C.UTF-8 locale is Fedora-specific.
It is not upstream, and it is known to be broken in many ways. It may
or may not match what other distributions use.

I have argued for some time that the locale must be upstreamed, but
unfortunately, I'm not getting anywhere.

Thanks,
Florian

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Dominik 'Rathann' Mierzejewski at 11/06/2018 - 09:05

On Tuesday, 06 November 2018 at 13:13, Florian Weimer wrote:
Can we use glibc-langpack-en instead of glibc-minimal-langpack and keep
the en_US.UTF-8 locale as the default?

Regards,
Dominik

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Zbigniew Jędrzejewski-Szmek at 11/06/2018 - 09:26

On Tue, Nov 06, 2018 at 02:05:06PM +0100, Dominik 'Rathann' Mierzejewski wrote:
glibc-langpack-en is 6 MB, glibc-minimal-langpack is ~0. Almost no
packages need an actual language.

Zbyszek

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Miro... at 11/06/2018 - 06:00

On 06. 11. 18 0:24, Zbigniew Jędrzejewski-Szmek wrote:

Note for Python package owners:

Since Python 3.7 upstream (and 3.6 in Fedora 26+ [0])
the locale is automatically coerced from C to C.utf-8 [1].

If you set LANG=en_US.utf-8 (or similar) for your Python 3
tests/docs/..., consider trying to remove the statement entirely before
converting it to LANG=C.utf-8. The same applies if you already use
LANG=C.utf-8.
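
One way to check what the interpreter ends up with is to start a child Python under a given LANG and ask for its stdout encoding. This is a minimal sketch; the helper name `stdout_encoding_under` is an assumption made for illustration, and the result depends on the Python version and on which locales are installed:

```python
import os
import subprocess
import sys

# Variables that would override or disable the behaviour being probed.
_OVERRIDES = ("LANG", "PYTHONIOENCODING", "PYTHONUTF8", "PYTHONCOERCECLOCALE")


def stdout_encoding_under(lang: str) -> str:
    """Start a child interpreter with LANG=lang and return its stdout encoding."""
    env = {
        key: value
        for key, value in os.environ.items()
        if not key.startswith("LC_") and key not in _OVERRIDES
    }
    env["LANG"] = lang
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    for lang in ("C", "C.UTF-8"):
        print(lang, "->", stdout_encoding_under(lang))
```

If both values of LANG report the same encoding on the target Fedora release, the explicit LANG= statement in the spec file is probably redundant for Python 3 tests.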

Zbyszek: I'm going to do this in ipython and python-django.

[0] https://fedoraproject.org/wiki/Changes/python3_c.utf-8_locale
[1] https://www.python.org/dev/peps/pep-0538/

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Kevin Kofler at 11/05/2018 - 21:05

Zbigniew Jędrzejewski-Szmek wrote:
But there are probably many more packages where the setting is hidden in
upstream build scripts.

Older Fedoras only since F22 updates / F24 GA, see:
https://bugzilla.redhat.com/show_bug.cgi?id=902094

And what about EL?

Kevin Kofler

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Panu Matilainen at 11/06/2018 - 06:10

On 11/06/2018 03:05 AM, Kevin Kofler wrote:
Build- and various other scripts.

Is C.UTF-8 glibc upstream now, or is it still Fedora-specific?

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Zbigniew Jędrzejewski-Szmek at 11/06/2018 - 06:15

On Tue, Nov 06, 2018 at 12:10:04PM +0200, Panu Matilainen wrote:
It was never Fedora-specific. The original justification in 2013 or so
was "other distros already do it". It's just glibc upstream that doesn't
have it.

We still carry
https://src.fedoraproject.org/rpms/glibc/blob/master/f/glibc-c-utf8-locale.patch,
so it seems this hasn't been upstreamed.

Zbyszek

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Panu Matilainen at 11/06/2018 - 06:49

On 11/06/2018 12:15 PM, Zbigniew Jędrzejewski-Szmek wrote:
Ugh, this is a rather cumbersome situation for other projects:
supporting and using C.UTF-8 isn't going to happen on a large scale until
it's upstreamed. And it does make one wonder what exactly is preventing
it from being upstreamed in glibc.

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Mike FABIAN at 11/06/2018 - 08:13

Panu Matilainen < ... at redhat dot com> wrote:

The current C.UTF-8 locale doesn’t sort correctly. It should sort
according to code point order, but it does that only partly. It is sort
of a quick hack. The glibc developers are working on a better solution
but this takes more time.
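
To make "code point order" concrete, here is a small illustration in Python, whose string comparison works code point by code point. It shows the intended ordering only and says nothing about what the current glibc C.UTF-8 tables actually produce; the word list is made up for the example:

```python
# Escapes are used so the exact code points are unambiguous:
# U+00E9 (e acute), U+00C4 (A diaeresis), U+65E5 U+672C (CJK).
words = ["zebra", "Zulu", "\u00e9clair", "apple", "\u00c4rger", "\u65e5\u672c"]

# Python compares str values code point by code point.
by_code_point = sorted(words)

# UTF-8 is designed so that byte-wise comparison of the encoded strings
# yields the same order as comparing the code points.
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

if __name__ == "__main__":
    print(by_code_point)
    print(by_code_point == by_utf8_bytes)
```

Uppercase ASCII sorts before lowercase ASCII, and every non-ASCII character sorts after both, which is what a strict code point collation implies.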

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Panu Matilainen at 11/06/2018 - 10:34

On 11/06/2018 02:13 PM, Mike FABIAN wrote:
Hmm. Not sorting correctly doesn't sound so good when LANG=C (and now
C.UTF-8) is quite commonly used exactly for that purpose.

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Florian Weimer at 11/06/2018 - 10:59

* Panu Matilainen:

Not everything looks fixable to me in the current setting. We expose the table
layout via nl_langinfo, so that's part of the ABI, and the tables just
cannot express the sorting order with less than three to four bytes per
codepoint. That's a lot of data even if we restrict ourselves to the
modern UTF-8 range (those codepoints addressable using UTF-16 surrogate
pairs).
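
A rough back-of-the-envelope version of that estimate, assuming a dense table with one entry for every code point from U+0000 to U+10FFFF. The dense layout is an assumption made for the arithmetic, not a description of the actual glibc table format:

```python
# Assumption: one table entry per code point in U+0000..U+10FFFF.
CODE_POINTS = 0x10FFFF + 1  # 1,114,112 code points


def table_size_mib(bytes_per_code_point: int) -> float:
    """Size in MiB of a dense table with the given entry width."""
    return CODE_POINTS * bytes_per_code_point / (1024 * 1024)


low = table_size_mib(3)   # three bytes per code point
high = table_size_mib(4)  # four bytes per code point

if __name__ == "__main__":
    print(f"{low} MiB to {high} MiB")
```

Under that assumption the table alone would take roughly 3 to 4 MiB, which is large next to the near-zero size of glibc-minimal-langpack.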

I think we could generate the tables on the fly if they are ever
requested using nl_langinfo. Not many applications seem to do that.
Internally within glibc, we could use a different interface to avoid the
table generation.

The table layout also has significant problems with expressing proper
collation tables. We need to investigate this more deeply, but my
impression is that the collation and collation sequence tables
constitute a significant fraction of the locale data on disk. Changing
the table layout again has ABI implications there, similar to those for
C.UTF-8, except that the on-the-fly conversion code will be more
difficult to write.

Thanks,
Florian

Re: mass-removal of LANG=anything-not-C.UTF-8 in packages

By Zbigniew Jędrzejewski-Szmek at 11/06/2018 - 04:04

On Tue, Nov 06, 2018 at 02:05:27AM +0100, Kevin Kofler wrote:
That is possible, but I don't think it'll be that widespread. Gnarly
upstream build scripts tend to be old, and not all systems always had
en_US.UTF-8, so those scripts should do some autodetection of the available
encodings. Anyway, we'll see.

Neither version of EPEL seems to support C.UTF-8. So if somebody wants
to support F30+ and EPEL (or F21-) from the same branch, they should
probably add additional BR and use one of the more heavyweight locales.
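
The autodetection mentioned above could look something like the following sketch. The function `pick_utf8_locale` and its preference order are hypothetical. The input is assumed to be a list of names such as the output of `locale -a`, which typically spells the codeset as `utf8` rather than `UTF-8`:

```python
from typing import Iterable, Optional, Sequence


def _normalize(name: str) -> str:
    """Fold spelling differences such as C.UTF-8 versus C.utf8."""
    return name.strip().lower().replace("-", "")


def pick_utf8_locale(
    available: Iterable[str],
    preferred: Sequence[str] = ("C.UTF-8", "en_US.UTF-8"),
) -> Optional[str]:
    """Return the first preferred locale that is available, or None."""
    have = {_normalize(name) for name in available}
    for candidate in preferred:
        if _normalize(candidate) in have:
            return candidate
    return None


if __name__ == "__main__":
    print(pick_utf8_locale(["C", "C.utf8", "POSIX"]))
    print(pick_utf8_locale(["C", "POSIX", "en_US.utf8"]))
```

A build script using this approach would fall back to en_US.UTF-8 on systems without C.UTF-8, provided the matching langpack is in the buildroot.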

Zbyszek