First draft of a KHumanDateTimeParser class

Hi,

Two or three days ago, I asked on this mailing list whether there was any
implementation of a human-entered date-time parser in the KDE libraries or
elsewhere.

I received very interesting responses: one confirmed that there isn't any
parser like that in kdelibs, and another pointed me to the

During the weekend, I played with Date::Manip and thought about a C++
implementation that could be extensible yet simple to implement and to
localize.

After two days of coding, here is my first draft:

https://github.com/steckdenis/khumandatetime

It is a simple parser that reads rules from an XML file. The README.md file
in the repository explains how to write rules. I hope it is not too
difficult, and that my English doesn't hurt you too much.

In its current state, this parser is able to understand things like
"two days ago", "in 3 months", "next Monday at 3.00 pm" or even
"16 May 2009". (A bug in the English rules, which I have only just noticed,
makes parsing "16th of May 2009" impossible: I forgot the "of".)
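
To make the kind of input concrete, here is a minimal sketch of recognizing
two of the shapes quoted above. It is not the XML rule engine from the
repository: the names RelativeOffset, singular and parseRelative are
hypothetical, and it only accepts numbers written as digits (so "in 3 months"
works, "two days ago" does not).

```cpp
#include <exception>
#include <sstream>
#include <string>

// Hypothetical result type: a signed amount of some unit.
struct RelativeOffset {
    int amount;        // negative means "in the past"
    std::string unit;  // singular unit name, e.g. "day"
};

// Strip a trailing plural "s" so that "days" and "day" compare equal.
static std::string singular(std::string word) {
    if (word.size() > 1 && word[word.size() - 1] == 's') {
        word.erase(word.size() - 1);
    }
    return word;
}

// Recognizes exactly two shapes: "<N> <unit> ago" and "in <N> <unit>".
// Returns false (and leaves `out` untouched) for anything else.
bool parseRelative(const std::string& text, RelativeOffset& out) {
    std::istringstream in(text);
    std::string first, second, third, extra;
    if (!(in >> first >> second >> third) || (in >> extra)) {
        return false;  // not exactly three words
    }
    try {
        if (first == "in") {
            out = RelativeOffset{std::stoi(second), singular(third)};
            return true;
        }
        if (third == "ago") {
            out = RelativeOffset{-std::stoi(first), singular(second)};
            return true;
        }
    } catch (const std::exception&) {
        // std::stoi throws when the token does not start with a number.
    }
    return false;
}
```

A rule-driven design would move the word lists ("in", "ago", unit names) out
of the code and into data, which is what the XML rules are meant to do.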

One thing it can't properly parse is complex relations like "next Tuesday".
Currently, the parser adds one week to the current date, and then sets its
day of week to Tuesday. This works if the current day of week is already
Tuesday or a later day, but if today is a Monday, next Tuesday is tomorrow,
not in one week.
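
The difference can be sketched with plain day offsets. This is an
illustration only, not code from the repository: the function names are
hypothetical, and weekdays are assumed to be numbered 1 (Monday) to 7
(Sunday), the convention QDate::dayOfWeek() uses.

```cpp
// Weekdays are numbered 1 (Monday) to 7 (Sunday).

// Behaviour described above: add one week, then move to the target weekday
// inside that week. Returns the number of days to add to today.
int naiveNextOffset(int today, int target) {
    return 7 + (target - today);
}

// Alternative reading: the first occurrence of the target weekday strictly
// after today. Always returns a value between 1 and 7.
int soonestNextOffset(int today, int target) {
    return (target - today + 6) % 7 + 1;
}
```

For a Monday and a target of Tuesday, the first function gives 8 days and the
second gives 1. When today is Tuesday or later in the week, both give the
same result.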

The same problem is present for dates like "last Monday". If the Monday of
the current week has already passed, the parser will erroneously return the
Monday of last week, not the Monday of this week.
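
The backward case mirrors the forward one. Again a sketch under the same
assumptions (hypothetical names, weekdays numbered 1 for Monday to 7 for
Sunday), not the parser's actual code:

```cpp
// Weekdays are numbered 1 (Monday) to 7 (Sunday).

// Behaviour described above: subtract one week, then move to the target
// weekday inside that week. Returns a negative number of days.
int naiveLastOffset(int today, int target) {
    return -7 + (target - today);
}

// Alternative reading: the most recent occurrence of the target weekday
// strictly before today. Always returns a value between -7 and -1.
int mostRecentOffset(int today, int target) {
    return -((today - target + 6) % 7 + 1);
}
```

On a Wednesday with a target of Monday, the first function goes back 9 days
while the second goes back 2.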

Fixing this problem may really complicate the parser, as it would require
the parsing rules to have "if" conditions. Another solution may be to
hard-code such logic in C++.

While implementing this parser, I realized that every western language will
have nearly the same rules, and that duplicating them for every language
would be a waste of time. What about considering this parser an experiment
and hard-coding the most useful rules in KCalendarSystem, using i18n() calls
to translate everything? With a bit of code, it would be possible to
implement any rule, even the "last Monday" ones. I thought of implementing
them in KCalendarSystem because the rules seem to be more
calendar-system-specific than language-specific.

Happy testing,
Denis Steckelmacher.

Comments

Re: First draft of a KHumanDateTimeParser class

By David Faure at 04/19/2013 - 01:21

On Tuesday 16 April 2013 15:05:51 Denis Steckelmacher wrote:
That's actually an area of disagreement and confusion.
For some people, next Tuesday is indeed in one week; for others, next Tuesday
is tomorrow.

Duckduckgo'ing (hehe that doesn't flow as well as googling) .... found
something:

I quote:
http://linguistlist.org/issues/4/4-983.html

Note that KMail has a bit of the opposite functionality: in the message list
it shows "Yesterday", "Monday"... Very simple. No "last" / "next" business :)
But well that's easy because it's always only about the past.

IMHO KHumanDateTimeParser should avoid "next Tuesday" stuff.

Re: First draft of a KHumanDateTimeParser class

By Albert Astals Cid at 04/22/2013 - 15:27

On Friday, 19 April 2013, at 08:21:05, David Faure wrote:
Agreed with David: start small, language parsing is a hard topic. And don't
make oversimplifications like "every western language will have nearly the
same rules"; as you can see, not even English has a consistent set of rules :D

Cheers,
Albert

P.S.: If at some point you want the opinion of lots of speakers of different
languages, you can try kde-i18n- ... at kde dot org, which is where our
translators live.