DevHeads.net

RFC: i18n: strict translation call-to-catalog mapping

(This is the second and the last thing I want to change in i18n for KF5,
honestly.)

At present (and since forever) it is practically undetermined from which
catalog exactly an i18n() call will fetch the translation. All catalogs
loaded in the process are tried in a mostly arbitrary order, depending on
when and which library loaded them, and the first one that contains the
translation is used. This leads to situations where a random piece of code
on KDE-Look.org gets, for "Sun" the star, the translation of "Sun" the short
for Sunday. I think this could pass under the radar so far because KDE was
rather monolithic on the organization level ("go and add context to that
conflicting message"), but it becomes intolerable in the scope of
Frameworks. So the question is how to fix this.
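
To make the conflict concrete, here is a minimal toy model of the two lookup
strategies. It is only a sketch: the Catalog struct, the function names and
the catalog names "calendarlib" and "astroapp" are made up for illustration
and are not part of KDE or Gettext. The German strings stand in for whatever
two translations of "Sun" happen to collide.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for a loaded translation catalog (hypothetical type).
struct Catalog {
    std::string name;
    std::map<std::string, std::string> messages; // msgid -> translation
};

// Models the current behaviour: walk all loaded catalogs in load order
// and return the first translation found, whichever catalog it is in.
std::string lookupFirstMatch(const std::vector<Catalog>& loaded,
                             const std::string& msgid) {
    for (const Catalog& c : loaded) {
        auto it = c.messages.find(msgid);
        if (it != c.messages.end()) {
            return it->second;
        }
    }
    return msgid; // untranslated fallback
}

// Models strict mapping: consult only the catalog named by the caller.
std::string lookupInCatalog(const std::vector<Catalog>& loaded,
                            const std::string& catalog,
                            const std::string& msgid) {
    assert(!catalog.empty());
    for (const Catalog& c : loaded) {
        if (c.name != catalog) {
            continue;
        }
        auto it = c.messages.find(msgid);
        if (it != c.messages.end()) {
            return it->second;
        }
    }
    return msgid; // untranslated fallback
}

// Two hypothetical catalogs that both translate "Sun", loaded in this order.
std::vector<Catalog> exampleCatalogs() {
    Catalog calendar{"calendarlib", {}};
    calendar.messages["Sun"] = "So";    // short for Sunday
    Catalog astro{"astroapp", {}};
    astro.messages["Sun"] = "Sonne";    // the star
    return {calendar, astro};
}
```

With these catalogs, the first-match lookup hands the astronomy code the
weekday abbreviation simply because "calendarlib" was loaded first, while the
catalog-specific lookup cannot be affected by load order.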

For C++ code, I couldn't think of a better solution than that advised for
plain Gettext, and e.g. as formalized in Glib. It would amount to having:

#define TRANSLATION_CATALOG "foolib"
#include "KLocalizedString"

in a "top" include file of the library. This specializes all i18n() calls in
the including sources to look only in foolib catalog and nowhere else. No
other i18n() call in the process will look in foolib catalog (unless
explicitly instructed to). Under the hood there would be no particular
magic, as i18n() calls are just wrappers for
ki18n().subs()...subs().toString(), and toString() already has an overload
that takes a catalog name. Does anyone have a better idea? Maybe something
more C++ish.
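
A rough single-file sketch of how such a define could steer every i18n() call
in the including sources is given below. The names catalogStore() and
translateFrom() are hypothetical stand-ins for the real catalog machinery
(the proposal routes through toString() with a catalog name), and the i18n
macro here takes only a message, without argument substitution.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory store: catalog name -> (msgid -> translation).
inline std::map<std::string, std::map<std::string, std::string>>& catalogStore() {
    static std::map<std::string, std::map<std::string, std::string>> store;
    return store;
}

// Hypothetical catalog-specific lookup: looks in exactly one catalog.
inline std::string translateFrom(const char* catalog, const char* msgid) {
    assert(catalog != nullptr && msgid != nullptr);
    auto cat = catalogStore().find(catalog);
    if (cat == catalogStore().end()) {
        return msgid; // catalog not loaded: fall back to the original text
    }
    auto msg = cat->second.find(msgid);
    return msg == cat->second.end() ? std::string(msgid) : msg->second;
}

// What the library's "top" include file would amount to.
#define TRANSLATION_CATALOG "foolib"
#define i18n(msgid) translateFrom(TRANSLATION_CATALOG, msgid)

// Library code: the call site does not mention the catalog at all.
std::string fooLibGreeting() {
    return i18n("Hello");
}
```

Even if another loaded catalog also contains "Hello", fooLibGreeting() can
only ever see the entry from "foolib".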

The other part is .ui files. Since tr() calls generated by uic are actually
calls directly to KDE's tr2i18n(), set via the -tr option to uic, I had this
in mind. uic would be updated to recognize the -tr func,catalog form as well,
generating calls func("catalog", ...). The KDE4_ADD_UI_FILES CMake macro
would get an optional catalog name parameter, and pass it on to uic. So, like
for C++ sources, the catalog for all .ui files in the library would be stated
in only one place, in its CMakeLists.txt.

The above was all for library code. For application code, the Gettext way is
that non-catalog-specific i18n() calls, i.e. those with no
#define TRANSLATION_CATALOG before the KLocalizedString inclusion, look into
the "main" catalog only. In the KDE context, this is the catalog set with
KAboutData or KLocale::setMainCatalog(). So, for normal applications this
change would be fully transparent.

Comments

Re: RFC: i18n: strict translation call-to-catalog mapping

By Thomas Zander at 04/04/2012 - 14:00

On Tuesday 03 April 2012 20.05.14 Chusslove Illich wrote:
Hmm, this looks bleak, I'm kind of surprised by this, though. Let me explain;

First, any Qt tr call has a context (typically the class name), am I correct
that i18n() still does the same thing?
In that case the context AND the translation string together are searched
through the collection of catalogs.
I'm wondering if you have any numbers on how often conflicting class names (and
thus context names) appear in our frameworks.

Next to that, I'm wondering how many catalogs we expect any normal app to
actually load at runtime. I would think that most processes load 3 at most
currently, maybe double that in frameworks.

Can you explain a bit more how things are intolerable? In my experience the
current system scales rather well, but I could definitely be missing details
here.

Re: RFC: i18n: strict translation call-to-catalog mapping

By Chusslove Illich at 04/04/2012 - 15:11

It doesn't: context is added manually, when judged or reported to be needed.
(And this is how it always was.)

Conflicts happen very rarely in practice. But when they do happen, it goes
like this. After some time, some user will report "wrong translation of X"
in app Y. This will be forwarded to the translator of that language of app
Y, who will point out that in Y PO file everything is fine, translation is
proper. This will puzzle the programmer. Eventually, the problem will reach
someone who knows what's going on. That someone will tell someone to tell
the programmer to add the context to the message in Y PO file. The programmer
will do it, and case closed. Technically, it does scale rather well. But, I
don't know if you share this feeling with me, it's a ridiculous thing to
begin with. Especially since plain Gettext-based code has never had such
conflicts.

There is one more annoying problem with the current system. One has to make
sure to call KLocale::insertCatalog() at the proper place in the library
code, so that no i18n call within the library can be reached before
KLocale::insertCatalog(). This often results in missing translations for
certain strings in app Y, and then we have to look for which X put
KLocale::insertCatalog() in the wrong place (or did not put it at all) and
fix it. This is also something that was never a problem in plain
Gettext-based code.

In principle as many as there are separate pieces of code (libraries,
plugins) within the process, as each will (should!) draw in its own catalog.
For a "plain" KDE app, currently that is at least 7: kdecore loads 6, the app
1. On the other end of the range is KDE PIM: KMail itself loads 16 catalogs,
atop what comes from kdepimlibs and kdelibs.

Re: RFC: i18n: strict translation call-to-catalog mapping

By Thomas Zander at 04/04/2012 - 15:25

On Wednesday 04 April 2012 21.11.22 Chusslove Illich wrote:
Just to be clear; I am talking about the 'context' as used in the first
argument to QCoreApplication::translate() method.

Having the frameworks-libs always specify that (you can probably write a unit
test that fails if this is not done) sounds like the best solution to me.

Then any app string will never resolve to a string in any framework.

Re: RFC: i18n: strict translation call-to-catalog mapping

By Oswald Buddenhagen at 04/04/2012 - 16:40

On Wed, Apr 04, 2012 at 09:25:10PM +0200, Thomas Zander wrote:
now, i'm not sure i would solve it exactly this way - the extra argument
seems wasteful (just like in all the inlined tr() calls). i'd probably
let the build system generate non-inline per module i18n functions and
alias the generic one via #define to it. that would also leave some
room for a heavier implementation of the translation function, e.g.,
automatically instantiating a QTranslator() (if gettext was not used).

fwiw, irrespective of the missing hierarchy, the real problem is of
course that all these short strings are not properly annotated. scripty
should make reports and bug the respective maintainers when strings do
not comply with some minimum disambiguation criteria (these could be
statistically determined).

Re: RFC: i18n: strict translation call-to-catalog mapping

By Thomas Zander at 04/06/2012 - 04:27

On Wednesday 04 April 2012 22.40.22 Oswald Buddenhagen wrote:
Good.

Ok, that sounds sane, indeed.

I like that idea in general, not sure how to implement it properly, though.
Simplest idea is to make cmake generate a "klocale-{module}.h" file and make
everyone include that. But that sounds like a lot of work.

Most solutions have the problem that the string extraction will become very
difficult since the context (the {module} here) would depend on gcc
preprocessor or include path setups.

Would be nice to have a solid suggestion for Chusslove :)

Re: RFC: i18n: strict translation call-to-catalog mapping

By Chusslove Illich at 04/06/2012 - 04:56

Actually I have nothing against providing any kind of special support on
CMake side, for whatever reason. For example, code size aside, it would be
really nice to be able to write catalog name in exactly one place and have
it applied wherever necessary (i.e. like in Autotools).

But this must be technically optional. The library interface must provide a
build-system agnostic way to map i18n calls to specific catalogs.

Re: RFC: i18n: strict translation call-to-catalog mapping

By Thomas Zander at 04/06/2012 - 05:46

Quoting Chusslove Illich <caslav. ... at gmx dot net>:
Good, I think both ossi and I feel that's the way to go.

Agreed.

In my exact example-solution the user either includes klocale.h for
traditional or klocale-plasma.h for his library (plasma in my
example). Can't get more optional than that ;)
My worry with it is that it's confusing to have to include a different
header in different modules.

It does have the advantage of being really easy to auto test and stay
consistent.

Re: RFC: i18n: strict translation call-to-catalog mapping

By Chusslove Illich at 04/05/2012 - 06:14

In practice, this would be sufficient if people were very careful (see next
point). But, there is the more basic issue of one piece of code polluting
the "translation namespace" of another. Qualitatively it can be compared to
not having a namespace mechanism in a programming language, or even not
having lexical scoping -- people can quite manage without them, but it's
annoying when you know there exists better. And that better in this case is
just plain Gettext, which does not have the issue. So I personally have a
hard time telling someone "oh, that's the library X translation popping up in
your application Y, add context yourself or tell maintainer of X to add
context" with a straight face.

Krazy has been doing this for several years now. It reports missing contexts
for a number of manually indicated problematic strings, as well as when the
message is a single adjective (for several hundred most frequent
adjectives), which should absolutely never be without context. But to no
great avail. So by now programmer behavior has been sufficiently established
in this respect, a fact to reckon with.

Argh, one extra argument wasteful, compared to everything else going on
under the hood :) Better to compare it with the gain of no longer searching
for translation through average n/2 loaded catalogs...

Re: RFC: i18n: strict translation call-to-catalog mapping

By Oswald Buddenhagen at 04/05/2012 - 16:34

On Thu, Apr 05, 2012 at 12:14:35PM +0200, Chusslove Illich wrote:

Re: RFC: i18n: strict translation call-to-catalog mapping

By Chusslove Illich at 04/06/2012 - 04:45

You have got to be kidding me. Gettext designers apparently haven't thought
about doing this back in 1995, when 16 MiB of RAM was enormous, 1 GiB disks
were huge, and Pentium FPU was the favorite laughingstock (last bit for
dramatization only).

Note also that while basic Gettext workflow is tightly integrated into
Autotools (single line in Makefile.am generating make targets down to
extraction and merging of catalogs), nowhere was this kind of optimization
suggested.

Re: RFC: i18n: strict translation call-to-catalog mapping

By Oswald Buddenhagen at 04/06/2012 - 05:11

On Fri, Apr 06, 2012 at 10:45:30AM +0200, Chusslove Illich wrote:

Re: RFC: i18n: strict translation call-to-catalog mapping

By Albert Astals Cid at 04/03/2012 - 19:19

On Tuesday, 3 April 2012, at 20:05:14, Chusslove Illich wrote:
This means you can't include KLocalizedString in public headers, no? Isn't it
a bit too strict?

Albert

Re: RFC: i18n: strict translation call-to-catalog mapping

By Chusslove Illich at 04/04/2012 - 14:38

I thought forward declarations would suffice, assuming KLocalizedString to
be used as function parameters at places. But now that I have examined a
number of public headers, I see it is also used as an empty default value and
even as an i18n'd default value for such parameters. While these could be
thrown out, if a bit uglish (passing pointer default to null instead of
reference), I've also found one use of an i18n call inside a template. I
completely forgot about templates.

So, how about this. With strict mapping, there would also be an i18nd*()
series of calls which take the catalog name as first argument (like
d*gettext() calls); I didn't mention them earlier because they weren't
important for the proposal. But they would now be implicitly used as follows.
klocalizedstring.h would look like this:

#ifndef KLOCALIZEDSTRING_H
#define KLOCALIZEDSTRING_H

// All normal stuff.

#endif
// end of include guard

#ifdef TRANSLATION_CATALOG
#define i18n(...) i18nd(TRANSLATION_CATALOG, __VA_ARGS__)
#define i18nc(...) i18ndc(TRANSLATION_CATALOG, __VA_ARGS__)
// etc.
#else
#undef i18n
#undef i18nc
// etc.
#endif
// end of file

To point i18n calls to a specific catalog, the inclusion would be like
before:

#define TRANSLATION_CATALOG "foolib"
#include "KLocalizedString"

but if this was a public header, then to unpollute the client code the
inclusion would be repeated at the end of the file like this:

#undef TRANSLATION_CATALOG
#include "KLocalizedString"
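
The effect of the two inclusions can be imitated in a single file by writing
out by hand what the trailing block of the header would do each time. This is
only a sketch under assumptions: catalogs(), the "mainapp" catalog name and
the simplified i18nd()/i18n() signatures below are hypothetical stand-ins,
not the real KLocalizedString interface.

```cpp
#include <cassert>
#include <map>
#include <string>

using Messages = std::map<std::string, std::string>;

// Hypothetical store: catalog name -> messages.
inline std::map<std::string, Messages>& catalogs() {
    static std::map<std::string, Messages> store;
    return store;
}

// "All normal stuff": a catalog-specific call and a generic call that
// falls back to the application's main catalog (here assumed "mainapp").
inline std::string i18nd(const char* catalog, const char* msgid) {
    assert(catalog != nullptr && msgid != nullptr);
    auto cat = catalogs().find(catalog);
    if (cat == catalogs().end()) {
        return msgid;
    }
    auto msg = cat->second.find(msgid);
    return msg == cat->second.end() ? std::string(msgid) : msg->second;
}
inline std::string i18n(const char* msgid) {
    return i18nd("mainapp", msgid);
}

// First inclusion, with TRANSLATION_CATALOG defined: i18n becomes a macro.
#define TRANSLATION_CATALOG "foolib"
#define i18n(...) i18nd(TRANSLATION_CATALOG, __VA_ARGS__)

// Inline code in the library's public header resolves against "foolib".
inline std::string fooLibOpenLabel() {
    return i18n("Open");
}

// Second inclusion at the end of the public header: the macro is removed,
// so client code that follows sees the plain i18n() function again.
#undef TRANSLATION_CATALOG
#undef i18n

inline std::string appOpenLabel() {
    return i18n("Open");
}
```

The same message id then resolves differently depending on which side of the
#undef the call was written, which is the point of repeating the inclusion.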