DevHeads.net

RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

Starting with KDE 4.0, i18n() functions act as XML processors under the
hood, expecting the strings to be well-formed XML and resolving some tags
(KUIT tags) to a target format (HTML or pure text). These KUIT tags include
<filename>, <para>, <emphasis>, etc.
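
To make the mechanism concrete, here is a minimal sketch (in Python, not the actual kdelibs code) of resolving semantic tags into either rich or plain text. The tag-to-format tables and the function name `resolve_kuit` are assumptions made up for illustration; the real KUIT output formats may differ.

```python
import re
from html import unescape

# Hypothetical tag tables: (opening, closing) replacement per target format.
RICH = {"filename": ("<tt>", "</tt>"), "emphasis": ("<i>", "</i>"), "para": ("<p>", "</p>")}
PLAIN = {"filename": ("'", "'"), "emphasis": ("*", "*"), "para": ("", "\n")}

TAG_RE = re.compile(r"<(/?)(\w+)>")

def resolve_kuit(text, rich):
    """Replace known semantic tags with the markup of the chosen target format."""
    table = RICH if rich else PLAIN

    def repl(match):
        closing, name = match.group(1) == "/", match.group(2)
        if name not in table:
            return match.group(0)  # unknown tags are left untouched
        return table[name][1 if closing else 0]

    out = TAG_RE.sub(repl, text)
    # Plain text has no entities, so &lt; and friends are decoded there.
    return out if rich else unescape(out)

print(resolve_kuit("Cannot open <filename>a.txt</filename>.", rich=True))
print(resolve_kuit("Cannot open <filename>a.txt</filename>.", rich=False))
```

The point of the design is that the source string names what the text is (a filename), while the table decides once, for everyone, how it looks.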

I would like to drop this support in KDE Frameworks 5.0. There would be a
fully automatic conversion script for sources to resolve KUIT tags in
existing i18n() calls into appropriate target formats. The reasoning is as
follows.

Firstly, in the past 4 years, KUIT tags didn't get to be used very much.
Only 0.56% of all messages (1144 out of 200,000) contain any. Only 5 out of
24 KUIT tags were used more than 100 times (<filename> being the most used
with 333 appearances). This means that both original strategic goals were
not accomplished: text elements still have different formatting across most
of KDE applications (such as whether filenames are singly or doubly quoted,
bold, etc.), and translators still have little additional semantic
indication of what text placeholders are substituted with.

Secondly, XML processing in strings was made somewhat lax, as a compromise
between ease of use, mixing with existing markup (Qt rich text), and not
changing programming habits. Most conspicuously, string arguments
substituted for placeholders are not automatically escaped (e.g. < into &lt;),
which silently produces markup that is not well-formed behind the scenes. In
the other direction, people also complained about &lt; unexpectedly becoming
<, etc. (i.e. the programmer didn't know about the XML nature of i18n() and
doesn't want it at all).
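
The escaping problem can be reproduced outside of KDE. The following sketch assumes a simple `%1` text replacement standing in for i18n() substitution and uses Python's standard XML parser as the well-formedness check; the helper name `is_well_formed` and the sample message are invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def is_well_formed(fragment):
    """Return True if the fragment parses as XML once wrapped in a root element."""
    try:
        ET.fromstring(f"<msg>{fragment}</msg>")
        return True
    except ET.ParseError:
        return False

template = "Value of <emphasis>%1</emphasis> is too low."
arg = "a < b"  # argument text that happens to contain an XML special character

raw = template.replace("%1", arg)              # no escaping on substitution
escaped = template.replace("%1", escape(arg))  # escaping on substitution

print(is_well_formed(raw))      # the bare '<' breaks the markup
print(is_well_formed(escaped))  # '&lt;' keeps it well-formed
```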

Based on these two observations, I myself would drop KUIT and that's it. But
there are a few heavy users, and I'd like to know if they would "strongly
object" to this. Among them: KAlarm, Partition Manager, DrKonqi, libkcdraw...

One obvious question could be: can we have KUIT as an option, default off? In
KDE 4 this was not even technically possible, due to one ugly design problem
of i18n(), but I plan to deal with this problem in KDE 5; so it should be
technically possible. But, given the usage statistics above, I'm not sure if
it makes sense spending time on this. (There would also have to be some
redesign, making everything stricter, e.g. automatic escaping on
substitution and no mixing with Qt rich text. This means that current KUIT
users who would like to continue to use it, would have to do some manual
checking and modification in existing code.)

Comments

Re: RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

By Stefan Majewsky at 03/27/2012 - 15:57

On Thu, Mar 22, 2012 at 11:25 AM, Chusslove Illich <caslav. ... at gmx dot net> wrote:
I'm missing one point in this statistic: How big would the percentage
be if KUIT was used in every relevant string?

I suspect that most translated strings are static captions on widgets
and in actions. KUIT is irrelevant here because of its very nature.

Greetings
Stefan

Re: RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

By Chusslove Illich at 03/30/2012 - 04:55

That is the main point of uncertainty. In here:

http://lists.kde.org/?l=kde-core-devel&m=133258732919686&w=2

I made the best estimate I could think of so far, putting it at ~14%. But
this was based on the small sample where KUIT seems to be used thoroughly as
of now.

Re: RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

By Allen Winter at 03/23/2012 - 08:49

On Thursday 22 March 2012 6:25:36 AM Chusslove Illich wrote:
Personally, I have put a lot of time and effort into adding KUIT into my projects
over the years and think it is a great help, even if just for the developers to understand
how the strings are being used.

True, the semantic tags are harder for me to use and understand in the more complex cases.
Sometimes I'm afraid to touch them since I'm not sure of the implications of my change.

I'm really surprised at this proposal.
I'm not getting what's broken nor what's causing problems.

Re: RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

By Chusslove Illich at 03/23/2012 - 09:17

I hope we had a small misunderstanding here. David's earlier message was
precisely to clear that up.

What I want to remove are only in-text tags (like <filename>, <emphasis>,
etc). In-context markers (like @action:button, @option:check, etc) would
certainly remain. There is no technical reason to remove them, and they are
used much more than tags. E.g. in kdepim and kdepimlibs, 16.7% of all
messages have context markers, whereas 1.7% have text tags (6.4%/0.6% for
whole of "trunk"). In fact, context markers can be used as-is in any i18n
system with Gettext-like lookup key semantics.
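
As an illustration of why context markers carry over to any Gettext-like system: there the lookup key is the pair of context and source text, so the marker only has to be part of the context string. The catalog contents and the `lookup` helper below are hypothetical.

```python
# Hypothetical catalog keyed the way a PO file keys entries: (msgctxt, msgid).
CATALOG = {
    ("@action:button", "Open"): "Öffnen",
    ("@info:status", "Open"): "Offen",
}

def lookup(context, text):
    """Return the translation for (context, text), or the source text if missing."""
    return CATALOG.get((context, text), text)

print(lookup("@action:button", "Open"))
print(lookup("@info:status", "Open"))
print(lookup("@title:window", "Open"))  # not in the catalog, falls back to source
```

The same source text gets different translations purely because the context differs; no processing of the message text itself is involved.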

Is it sufficiently less bad now, or should I address your other points? :)

Re: RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

By Allen Winter at 03/24/2012 - 08:05

On Friday 23 March 2012 9:17:43 AM Chusslove Illich wrote:
No need to address my other points especially since they are already being discussed.

Re: RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

By David Jarvie at 03/22/2012 - 13:22

On Thu, March 22, 2012 10:25 am, Chusslove Illich wrote:
I understand from your email that you are only proposing to remove KUIT
semantic tags, not KUIT context markers. Can you confirm this?

Re: RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

By Chusslove Illich at 03/22/2012 - 13:47

I confirm. Context markers are used much more than tags, and have no problems
of their own; they are simply useful whenever present. The only change is that
they would no longer have any functional effect (this means dropping the
/format modifier too).

Re: RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

By David Jarvie at 03/23/2012 - 09:59

On Thu, March 22, 2012 5:47 pm, Chusslove Illich wrote:
The original intention of enabling consistent formatting of displayed text
via semantic tags seems a very desirable one. Removing the tags seems to
imply that KDE would abandon the aim of presenting a consistent interface
for such items. If an inconsistent interface is generally considered
acceptable, then I can live with it. But if we really want to try to make
these interface elements consistent, we shouldn't drop the existing scheme
without first considering what might replace it.

Removing the functional effects which context markers have, including the
/format modifiers, might have a significant effect if this makes
everything plain text rather than rich text, so at first sight I'm not too
keen on this idea.

Re: RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

By Albert Astals Cid at 03/23/2012 - 14:39

On Friday, 23 March 2012, at 13:59:04, David Jarvie wrote:
I agree with David here: the fact that people don't use them does not mean we
shouldn't aim at using them. And people don't use them because most people
probably don't know about them. This can be attributed to a lot of things, for
example us not having a proper "style guide" where you would write "Each time
a filename appears in a user-visible message, write <filename>%1</filename>".

Another reason is developers not caring much about consistency. We could easily
gather some non-hardcore developers to go over the various i18n messages of a
given app and "fix" them.

Cheers,
Albert

Re: RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

By Thomas Zander at 03/23/2012 - 15:12

On Friday 23 March 2012 19.39.26 Albert Astals Cid wrote:
Looking at the numbers, I'm not sure your optimism is warranted; this feature
has been around for many years and it's documented on techbase, yet it's being
used in very, very low numbers (333 times in all of KDE for the filename tag).
Sure, it may be ignorance. Frankly, I didn't know about this feature.
That developers didn't know about this feature is as much a matter of
education as of them never having needed it and asked how to do it.

I think it's nice to be optimistic and think that we can get people to fix their
UIs and suddenly get people to care.

But can we be certain enough of succeeding now where we clearly failed before
that this is actually worth stopping the innovations that Chusslove is working
on?

Read those numbers again; it's kinda depressing, really.

Re: Re: RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

By Alex Fiestas at 03/23/2012 - 19:14

On Friday, March 23, 2012 08:12:53 PM Thomas Zander wrote:
I think that this feature, as Albert said, is something that we should promote
and try to get people to use.

What we can do, though, is break compatibility and implement it in some other
way, since its usage is so low.

Re: RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

By Thomas Zander at 03/24/2012 - 03:50

On Saturday 24 March 2012 00.14.03 Alex Fiestas wrote:
The difference here is that there is a way to get consistent look and feel
without using this feature. Whereas with a11y there is not.
Specifically: for how many of the 24 tags can you get people to care about the
look and feel sufficiently to make a difference? If history is any guide, just
some, and just a little bit.

This defense of "Don't take my feature away, I promise to use it from now on"
just sounds hollow to me.
In reality it will be really hard to actually show significant improvements in
message display to a user over plain HTML usage, and it certainly is infinitely
harder to learn.

For reference; how many of these are really showing something different on
screen that app-developers care about?
http://techbase.kde.org/Development/Tutorials/Localization/i18n_Semantics#Semantic_Tags

In short; the deck is stacked against you, and short of proposing to do the
work, I hope you can take the last 4 years as a guide to how big an uptake
things got.
I personally think we should not tell Chusslove to back out of his plan just
because we *hope* some people other than us will start using this feature.

Good point.

Re: RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

By Albert Astals Cid at 03/24/2012 - 11:10

On Saturday, 24 March 2012, at 08:50:39, Thomas Zander wrote:
How?

None, because the bunch of geek developers [mostly] don't care about look and
feel; that's why we need to expand our community to people that care about
polish.

That's nonsense. I'm not defending "my feature", since I as a geek don't care
about look and feel and I've never used this feature, but I recognise the fact
that we *should* be caring about it and finding someone in the greater
community to make sure our messaging to the user is consistent.

Albert

Re: RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

By Albert Astals Cid at 03/23/2012 - 15:26

On Friday, 23 March 2012, at 20:12:53, Thomas Zander wrote:
That's only because we are geeks and don't care if half the time a filename
appears as '/home/tsdgeos/foo.txt' or "/home/tsdgeos/foo.txt" or
BOLD/home/tsdgeos/foo.txtBOLD or whatever.

In a polished environment this is important.

IMHO this is something similar to i18n, needs someone that goes after people
and nags them to fix it.

I did not understand that it was stopping any innovation. Chusslove, can you
clarify if you want to remove them for the sake of simpler code (which I'm not
saying is unimportant) or because they create problems with other features you
are developing?

Yes, they are, but to be honest no one pushed for them; what did you expect?

Cheers,
Albert

Re: RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

By Chusslove Illich at 03/24/2012 - 06:03

It's not stopping any innovation as such, since I just want to drop it and
add nothing new. But the system cannot remain as it is, because of too many
quirks. To remain, it would have to be fixed, and to be made optional. Both
these aspects are problematic.

"Fixed" would make it require more discipline. For example, one could no
longer do:

// "problem" is an already-resolved QString, so its markup is plain text by now.
QString problem = i18n("Blah blah <emphasis>foom</emphasis>.");
...
QString report = i18n("Blah blah: <note>%1</note>", problem);

because substitution would cause autoescaping of any target format tags
(e.g. if <emphasis> was turned into <i>), and show them verbatim. Instead,
one would have to do:

// "problem" stays a KLocalizedString, so it can be validated instead of escaped.
KLocalizedString problem = ki18n("Blah blah <emphasis>foom</emphasis>.");
...
QString report = i18n("Blah blah: <note>%1</note>", problem);

as only a KLocalizedString argument would be exempt from autoescaping (its
markup validity would be enforced).
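
The stricter substitution rule can be sketched in a framework-neutral way. In the Python below, the `Markup` class is a made-up stand-in for a KLocalizedString argument (text already known to be valid markup), and `substitute` is a hypothetical helper; neither is KDE API.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape

@dataclass(frozen=True)
class Markup:
    """Stand-in for a KLocalizedString argument: text trusted to be valid markup."""
    text: str

def substitute(template, arg):
    """Insert arg for %1, escaping it unless it is trusted markup."""
    value = arg.text if isinstance(arg, Markup) else escape(str(arg))
    return template.replace("%1", value)

resolved = "Blah blah <i>foom</i>."  # what a plain string would hold after resolution

# Plain string argument: its tags are escaped and would be shown verbatim.
print(substitute("Blah blah: <note>%1</note>", resolved))
# Trusted markup argument: its tags survive substitution.
print(substitute("Blah blah: <note>%1</note>", Markup(resolved)))
```

Carrying the "this is markup" fact in the type rather than in the string content is what makes automatic escaping safe, at the price of the extra discipline described above.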

"Optional" would cause uncertainty. One could not count on KUIT being
available in a particular section of code, but would have to check 1) which
catalog the messages are looked up in and 2) whether that catalog has KUIT
enabled (optionality would be per catalog). The one in doubt need not be a
human; it could also be a source code or translation validation tool.

These two implications, combined with the low usage as it is, make me conclude
it is not worth investing the work in fixing the system. Higher discipline
and more uncertainty would mean even fewer people would use it than do
now.

(The stakes are somewhat different for the more radically new system that I
describe in that proposal for extending Gettext. The higher discipline
requirement would remain, but is (supposed to be) offset by the fact that
you could use the exact same i18n in any programming language and toolkit,
providing availability of bindings, and use arbitrary target visual formats
transparently for translators; i.e. translators would no longer see the
underlying programming framework. The uncertainty aspect would be mostly
removed, because a new option to xgettext would be used on extraction, and all
messages in the PO file would get the appropriate *-format flag, whether they
have any placeholder or not.)

Re: RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

By Albert Astals Cid at 03/24/2012 - 11:16

On Saturday, 24 March 2012, at 11:03:32, Chusslove Illich wrote:
Discipline is not a problem, we are used to the compiler complaining when we
use . instead of -> even if it is obviously what we meant. In fact one of the
problems with the current system is that if you do i18n("Foo %1").arg("LALA")
it still works (depending on the type of kdelibs build you have). It should
totally break and then the developer will realize he's doing something wrong.

I agree optional is a bad idea.

That's fine; you're the one doing the work and I'm not going to do it nor try
to force you to do it.

OTOH it's another hurdle for porting current code from KDE 4 to KF5, which
was originally said to be "transparent" for developers and with each day
looks more like a big change.

Cheers,
Albert

Re: RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

By Chusslove Illich at 03/30/2012 - 05:38

From where I stand, these requirements create a "between a rock and a hard
place" situation. If the markup system is fixed, with the side effect of more
discipline being required, but optionality is not introduced, that would
suddenly break a lot of code (at runtime), and in a way that could neither be
converted automatically nor even reasonably flagged with warnings. So
the least painful option is to just drop it and provide the conversion
script (which should be fully automatic).

Re: RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

By Chusslove Illich at 03/23/2012 - 11:28

Based on the (lack of) usage so far, I would say that inconsistent UI text
markup is considered acceptable. Or at least too small an issue to be worth
bothering with.

It occurred to me that I could examine usage-over-time statistics since KDE
4.0. Here is the percentage of strings in core (SC) modules containing KUIT
markup, in 6-month steps:

2008-01-01 0.28%
2008-06-01 0.32%
2009-01-01 0.36%
2009-06-01 0.41%
2010-01-01 0.42%
2010-06-01 0.41%
2011-01-01 0.49%
2011-06-01 0.49%
2012-01-01 0.60%

While there is some rise in usage, I would consider a rise of 0.32 percentage
points in 4 years to support the "tolerable inconsistency" conclusion above.
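
For reference, the rise can be recomputed from the listed figures. This is only a small arithmetic sketch; the variable names are made up and the values are copied from the list above.

```python
# Share of strings with KUIT markup (percent), copied from the list above.
usage = {
    "2008-01-01": 0.28, "2008-06-01": 0.32, "2009-01-01": 0.36,
    "2009-06-01": 0.41, "2010-01-01": 0.42, "2010-06-01": 0.41,
    "2011-01-01": 0.49, "2011-06-01": 0.49, "2012-01-01": 0.60,
}

values = list(usage.values())
total_rise = round(values[-1] - values[0], 2)     # percentage points over 4 years
steps = len(values) - 1                           # number of 6-month steps
avg_rise_per_step = round(total_rise / steps, 2)  # percentage points per step

print(total_rise, steps, avg_rise_per_step)
```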

When KUIT tags are removed on conversion, target formats would be heeded,
since they are statically resolvable. So one would end up with some strings
converting to plain text, and others to Qt rich text. In other words, it would
become as if these visual formats had been used carefully and consistently from
the start.

Even if the majority of programmers would rather not bother, I agree that it
would be nice to provide for those who would. So, actually, I have
considered a lot what the replacement might be, one which would avoid the
technical issues I have observed so far and provide the extra flexibility that
I've seen to be needed. I wrote it up in a proposal for Gettext itself, but
there was little enthusiasm. The proposal is here:
http://nedohodnik.net/gettextbis/. Chapter 4 and section 5.1 deal with
markup, and it is easy to extrapolate back to KDE i18n (revert to %1, %2...
placeholders, and consider ggettext() = ki18n() and igettext() = i18n()).

However, I don't propose implementing this now, for two reasons. The first is
that it would be some work in the absence of a significant number of interested
people (which, admittedly, usually does not stop me...), and the second is
that I have a small hope that in the future we could actually push the full
system as proposed :)

Re: RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

By Oswald Buddenhagen at 03/23/2012 - 19:15

On Fri, Mar 23, 2012 at 04:28:52PM +0100, Chusslove Illich wrote:
p.s.: i still have your epic mails in my inbox, and they perfectly serve
the purpose of giving me a bad conscience about still not having
answered them properly. let alone your paper. :}

Re: RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

By Chusslove Illich at 03/24/2012 - 07:06

I recognize that simply taking into account all messages is somewhat
lacking, but it is also not that obvious one should rather look at messages
with placeholders. Here is the comparison between the two:

                   total    KUIT-tagged   ratio
all messages       202995   1144          0.56%
placeholder only   15771    557           3.53%

While the ratio is much higher on placeholders, half of all used tags are not
in messages with placeholders.

Maybe the best reference of what should be considered "thorough use" would
be to look at one application that uses it thoroughly. I've seen at least
KAlarm to be such, and for it the statistics are:

                   total    KUIT-tagged   ratio
all messages       1037     153           14.75%
placeholder only   125      100           80.00%

From this it would appear that in all of KDE the current use of KUIT is 3.8%
(0.56/14.75) of thorough use over all messages, and 4.4% (3.53/80.00) over
placeholder-only messages. These, being roughly equal, indicate that simply
taking all messages is representative enough... But the "thorough use" sample
here is small, granted.
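
The ratios quoted above can be checked from the raw counts. The sketch below uses only the counts given in the two tables; the helper name `pct` is invented for the example.

```python
def pct(part, whole):
    """Percentage of part in whole."""
    return 100.0 * part / whole

# Counts from the tables above: (total messages, KUIT-tagged messages).
kde_all = pct(1144, 202995)
kde_placeholder = pct(557, 15771)
kalarm_all = pct(153, 1037)
kalarm_placeholder = pct(100, 125)

# Current KDE-wide use as a share of "thorough use", with KAlarm as reference.
share_all = pct(kde_all, kalarm_all)
share_placeholder = pct(kde_placeholder, kalarm_placeholder)

print(f"{kde_all:.2f}% {kde_placeholder:.2f}% {kalarm_all:.2f}% {kalarm_placeholder:.2f}%")
print(f"{share_all:.1f}% {share_placeholder:.1f}%")
```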

That's exactly what it seems to me too. So, that small hope is simply this:
make a standalone library available, buildable with different "backends"
(QtCore/QtScript, GLib/SpiderMonkey...), several language/framework
bindings, and see what happens. But I don't say I'll do it, still pondering
between opposed ends of "lot of work" and "low probability of acceptance".
(It also needs some support from Gettext, and in this respect it is
troublesome that the Gettext maintainer remained silent on the proposal.)

Well... speaking of relative improvement, doing something outside of PO/
Gettext base would be to me a regression that dwarfs any improvement I
wanted to have :)

Re: RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

By Aurélien ... at 03/26/2012 - 05:54

On Sat, 24 Mar 2012 12:06:25 +0100, Chusslove Illich wrote:
I would find it even more interesting (but probably more difficult/fuzzy to
compute) to have the ratio of messages with KUIT markup over messages
with Qt markup or using quotes.

I like the idea of KUIT markup and would be sad to see it go away.

Aurélien

Re: RFC: i18n: drop KUIT tags in KDE Frameworks 5.0?

By Chusslove Illich at 03/26/2012 - 06:32

That excludes the cases which have no delimitation at all but could have
some, which are not at all infrequent. But, sure-why-not: 0.56% with KUIT
vs. 6.47% with Qt or quotes. (For the latter I first checked for presence of
any Qt tag, and if there was none, for quotes; so messages containing both
weren't counted twice.)
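
The counting order described here (Qt tag first, quotes only if no Qt tag was found) can be sketched as follows. The tag list, the regular expressions and the sample messages are assumptions for illustration, not the script actually used for the statistics.

```python
import re
from collections import Counter

# Assumed subset of Qt rich text tags; \b keeps KUIT tags such as <para> out.
QT_TAG = re.compile(r"</?(b|i|u|br|p|a|qt|html|tt|font)\b[^>]*>")
# Text delimited by a pair of single or double quotes.
QUOTES = re.compile(r"'[^']+'|\"[^\"]+\"")

def classify(msg):
    """Check for a Qt tag first, then for quotes, so no message counts twice."""
    if QT_TAG.search(msg):
        return "qt"
    if QUOTES.search(msg):
        return "quotes"
    return "none"

messages = [
    "Cannot open <b>%1</b>.",
    "Cannot open '%1'.",
    "<b>\"%1\"</b> is missing.",  # has both, counted once as "qt"
    "Cannot open <filename>%1</filename>.",
    "Open",
]
counts = Counter(classify(m) for m in messages)
print(counts)
```

A quote-based heuristic like this is necessarily fuzzy; a message with two unrelated apostrophes, for instance, would be counted as quoted text.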