DevHeads.net

KIO::ListJob::entries emits twice for folders with 100+ entries. Why?

Hi,

I'm playing with KIO::listDir() and the signals to get the directory
content and i've hit a strange "issue". Apparently when a folder has
more than 100 entries the entries signal [1] will be emitted twice.
1 with 100 entries
1 with all entries

Also, this only seems to happen if i add:
job->addMetaData("details", "0");

See the full code example [2] that also contains the above line.

On IRC kde__pepo suggested that the first emit might be so that the
app using it has at least some results while waiting for the rest.

Whatever the case is, the api documentation [1] is completely silent about this.

Is there an option to disable this behavior? In my case i don't want
nor need the extra signal and i "was" depending on that signal to
properly fill my model. However, since that might be emitted twice it
screwed up my logic and my personal workaround is to rather just
depend on the results signal (which i can't find in the API docs).
Results only emits once and when expected.

Cheers,
Mark

[1] <a href="http://api.kde.org/4.x-api/kdelibs-apidocs/kio/html/classKIO_1_1ListJob.html#a810353a77053b141c30691265f956dff" title="http://api.kde.org/4.x-api/kdelibs-apidocs/kio/html/classKIO_1_1ListJob.html#a810353a77053b141c30691265f956dff">http://api.kde.org/4.x-api/kdelibs-apidocs/kio/html/classKIO_1_1ListJob....</a>
[2] <a href="http://paste.kde.org/606608/" title="http://paste.kde.org/606608/">http://paste.kde.org/606608/</a>

Comments

Re: KIO::ListJob::entries emits twice for folders with 100+ entr

By Albert Astals Cid at 11/14/2012 - 17:18

El Dimecres, 14 de novembre de 2012, a les 17:32:35, Mark va escriure:
You mean "result" instead of "results", right? I see it documented in the KJob
class, problaby you want the finished signal, (if it is job being done that you
are waiting for)

Cheers,
Albert

Re: KIO::ListJob::entries emits twice for folders with 100+ entr

By Mark at 11/14/2012 - 19:06

On Wed, Nov 14, 2012 at 10:18 PM, Albert Astals Cid < ... at kde dot org> wrote:
Yes, i meant result.

Re: KIO::ListJob::entries emits twice for folders with 100+ entr

By David Faure at 11/15/2012 - 06:02

On Wednesday 14 November 2012 22:18:08 Albert Astals Cid wrote:
All? That would lead to duplication, I strongly doubt that this is the case,
it would confuse KDirLister very much.
Which ioslave is this about? kio_file, apparently?

`kioclient ls /dir` can also be used to test KIO::listDir from the command-
line, although adding details=0 has to be patched in.
Tried that, no bug.

kioclient ls /k/testbug174144 | sort | wc -l
kioclient ls /k/testbug174144 | sort | uniq | wc -l
both show the same result, 502.

Now trying your testcase...

(C++) dumpEntries()
99
(C++) dumpEntries()
401

Total 500. No duplicates.

Yes entries can be emitted multiple times, but not with duplicates.

No no no. In 99.99% of the cases, result() is the signal you want.
I thought I made that clear in the last revision of the api documentation :(

result() -> to trigger application logic that should run when the job is done
finished() -> to hide progress dialogs, or to delete the job by hand

The difference: result() is not emitted by kill(Silently).

Re: KIO::ListJob::entries emits twice for folders with 100+ entr

By Mark at 11/15/2012 - 08:12

On Thu, Nov 15, 2012 at 11:02 AM, David Faure < ... at kde dot org> wrote:
Hi David,

My bad, the second signal is only emitting the remainder of whatever
comes after the first signal.
However, i still have the question as to why it happens at all?

The two signal stuff for entries only seems to occur when i set:
job->addMetaData("details", "0");

If i just comment that line out i have just 1 signal from entries. Why
is it working like that? I would expect it to behave the same with or
without addMetaData("details", "0") set. I personally would just like
to get rid of the first signal since in my case all entries are feeded
to a model anyway.

Cheers,
Mark

Re: KIO::ListJob::entries emits twice for folders with 100+ entr

By David Faure at 11/15/2012 - 13:26

On Thursday 15 November 2012 13:12:27 Mark wrote:
So that dolphin doesn't keep the user waiting in front of a blank view for a
long time when opening /usr/bin. Better emit the entries in multiple batches,
for incremental rendering.

The buffering of entries happens in SlaveBase::listEntry and is based on the
time measurements and estimates and complex logic ;)
Anyway, this shouldn't matter to your program, whether entries is called it
one go or in multiple batches. If it does, your program is broken.

I see several issues with that sentence.
1) Models must be able to react to dynamic changes (e.g. the user creates a
file), so once they do that, they can also react similarly to "here are 10
entries" followed by "here are 10 more entries" (= inserting rows).
2) Why not just use KDirModel? It handles all this for you.

Re: KIO::ListJob::entries emits twice for folders with 100+ entr

By Mark at 11/15/2012 - 14:37

On Thu, Nov 15, 2012 at 6:26 PM, David Faure < ... at kde dot org> wrote:
The actual listing (listDir) on /usr/bin is done very rapidly!
Actually, all listings are done very rapidly even on remote locations.
Listing isn't the issue. The thing that makes this "seemingly" slow is
all the stuff that happens "after" the listing is in. Like determining
the mime type, and sorting the massive list. That is what makes it
slow.

Also, funny that you mention dolphin. Since dolphin can't possibly
make use of this. Dolphin is sorting the entries thus needs all of
them to be in before it can even sort. And even "if" dolphin receives
two entries signals (which is the itemsAdded signal from KDirModel)
then it certainly won't speed things up, but rather slow things down
because it would have to do sorting twice. Once on a small set of data
(relatively fast) and once on the full set of data (dog slow).

As a last remark regarding your dolphin example. If you open up a
large folder in dolphin you will see that dolphin waits till it has
all the contents of it. Then it starts displaying those. So in other
terms, it doesn't even benefit!
You can try this out yourself if you want. Go to a console and make a
new folder. Then type:
touch {1..100000}.txt

Now open up your dolphin in that folder and be prepared to wait :)

My program is working just fine.

This is the important part. Read again what i said and what your telling me now.
If i ADD addMetaData("details", "0") i get:
1 signal with the first 100 entries
1 signal with the remainder

If i DON'T add that "addMetaData" line and test the code out on a
folder that certainly takes about 10 seconds to get all the entries
then i'm welcomed by:
1 signal with all entries

You see where this is going? Listing the same folder WITH the
"addMetaData" line is going a lot more rapid then WITHOUT the
"addMetaData" line. Something is very wrong in the time calculation.
The results i'm getting should not be possible if the timing was
correct. Specially considering that the call with "addMetaData" takes
noticeable longer to run yet i only observe one entries signal.

I am actually using both. I made a breadcrumb bar which uses listDir
for every "crumb". I don't need a KDirModel there. For the contents i
am using a KDirModel.

Again, please!! Keep this about the entries signal. My app is just
fine and only serves to illustrate a case. It's the entries signal i'm
interested about since it seems to have an issue as i already clearly
indicated in my first post.

Re: KIO::ListJob::entries emits twice for folders with 100+ entr

By David Faure at 11/22/2012 - 06:52

On Thursday 15 November 2012 19:37:49 Mark wrote:
I'm pretty sure you can find a remote enough location where listing will come
in slowly. And there you'll be glad that SlaveBase doesn't buffer all the
entries until it receives the last one.

No, the fact that this is done after the initial listing makes it seemingly
fast, on the contrary. You can start seeing stuff appear fast, and then
mimetype determination happens incrementally. Before we had that, things were
pretty slow.

This is wrong, as Kevin Krammer replied.

Not really slower than doing it in one go. Anyway, sorting is just a bit of
CPU, the chunked entries is for the case of a slow remote server, where some
milli-seconds of CPU aren't the issue, but many-seconds latency.

Don't blame KIO if the new views in dolphin misbehave :-)

This used to be working fine with the Model/View based widgets. If something
regressed with the new hand-written views in dolphin, then dolphin should be
fixed. To test what KIO was written for initially, use the file dialog instead.

I'll trust you on this one, but as I said, wrong test case. Find a slow remote
server on the other side of the planet, and -there- create a few files.

I pointed you to the corresponding code already: SlaveBase::listEntry.
Feel free to debug the logic in that code. I'm happy to be told that it's
wrong, it's not even my code :-)

IMHO it shouldn't matter much how listEntry decides to buffer the stuff, but if
you feel that you must fix it (I agree that your findings look a bit odd), then
please debug the logic in the above method.

OK. But even if you change the logic in listEntry, your code *must* be
prepared for entries() to be emitted multiple times, on slow remote servers.

Re: KIO::ListJob::entries emits twice for folders with 100+ entr

By Mark at 11/22/2012 - 15:11

On Thu, Nov 22, 2012 at 11:52 AM, David Faure < ... at kde dot org> wrote:
Well, just testing out a linux ftp that hosts a bunch of
archlinux/fedora/debian/ubuntu packages will probably do ;)
I stand corrected :)
Well, i don't know if that specific code path is wrong, but something
certainly is wrong.
-_- I wanted to get rid of the extra signal, not debug the method and
have the additional signals working as intended.. Though i do now
understand why the extra signal is emited in the first place which
does make me come back on that point i made earlier about getting rid
of it. If it works properly then i do see the benefit of it now.
That's no issue. I made sure it can handle as much entries signals as
you can throw at it.:)

To clarify: I am debugging SlaveBase::listEntry now in an attempt to
find out where (or even if) the time based logic is wrong. I will
report back in this thread with the results. Yes, trying it out with a
slow remote server as well just to be sure.

On Thu, Nov 22, 2012 at 12:43 PM, Frank Reininghaus
< ... at googlemail dot com> wrote:
In the dolphin case i would first try to find out how many times you
even get the newItems signal. I'm not quite sure how KDirListener
works internally, but i doubt it sets the metadata details to 0 which
is the only case where i am getting two entries results. If it only
gets one entries signal then i guess you also get only one newItems
signal even where you "should" have received two.
No, that's not the case. It's batching. Or rather, it "should" under
some odd conditions that i'm debugging now
1 signal with "some" entries
1 signal with the remainder

It's not sending a signal with every new item in a directory listing.

Re: KIO::ListJob::entries emits twice for folders with 100+ entr

By Mark at 11/22/2012 - 19:49

I'm hitting a bit of an issue here in debugging SlaveBase::listEntry..
KIO is not running in the same thread as i'm running which makes
debugging tricky.
However, when i use GDB and set any breakpoints in slavebase.cpp in
the SlaveBase::listEntry funtion it does break but in my main
function..

Adding kdebug/qdebug lines also aren't visible. And i added quite a
few just so i won't miss it :p

So, how can i debug this?

What i can do is grab the logic out ant test it in a separate C++ project.

Re: KIO::ListJob::entries emits twice for folders with 100+ entr

By David Faure at 11/23/2012 - 05:42

On Friday 23 November 2012 00:49:11 Mark wrote:
Err, KIO doesn't really support threads. You have to run KIO in the main
thread. Well if you connect the job to slots in another thread and never call
any KJob API from that thread, I guess it can work out ok, but only then.

SlaveBase is in the kioslave, not in your application. So you need to kill
kio_file and you need to type "kdeinit4" in a terminal.

Otherwise kio_file gets started via an old kdeinit4 which has the old libkio
loaded (the one without your debug output).

I recommend kDebug over gdb for this, especially for a method that is called
500 times.

Well yeah you already posted a test program iirc.
Otherwise here's mine: <a href="http://www.davidfaure.fr/2012/kio-listdir.zip" title="http://www.davidfaure.fr/2012/kio-listdir.zip">http://www.davidfaure.fr/2012/kio-listdir.zip</a>

Re: KIO::ListJob::entries emits twice for folders with 100+ entr

By Mark at 11/23/2012 - 11:26

On Fri, Nov 23, 2012 at 10:42 AM, David Faure < ... at kde dot org> wrote:
That really helped! Thanks a lot.

So my findings thus far.
I think there are two wrong areas here.
1. SlaveBase::listEntry is really using wrong names and types to get
something done. For instance, d->totalSize is a KIO::filesize_t type
and is being used to measure a list count. It's in reality no
"filesize" at all! That just adds to the confusion. I'm not sure - yet
- if the function itself is flawed in terms of time diffs.

2. this is where things get really interesting! The actual
file_unix.cpp listdir function seems wrong to me. [1]
The function FileProtocol::listDir is basically having two distinct
directory listings.
1 for the case where details are 0 (if)
1 for the case where the details are not 0 (else)

Both cases use a different "tactic". If the details are 0 then a
folder is read, file by file, and a listEntry(entry, false); is done
for each file. So here you do have possible IO time.
Now if the details are _not_ 0 then the tactic is different. The
folder is read in it's entirety inside a list (line 397 of [1]). Till
that moment it hasn't done any calls to listEntry(...) yet! Rather
iterates through the list again and builds up the UDSEntry stuff on
line 430 (also [1]).

While the step for details = 0 is faster (visually) it does include IO
time and is therefore probably slower in time compared to details > 0.
The details > 0 is only using CPU time and is probably faster thus
that's probably why i'm not seeing a second signal in the "details >
0" case.

But what to do about it?
My preference is restructuring FileProtocol::listDir to one of the two
cases that it's doing now and adjust FileProtocol::createUDSEntry to
handle details = 0 properly.

What's your opinion here?
Note: These issues that i find never stay on the surface do they..
This one is diving very deep in KIO, the shortcuts one was diving very
deep in KDE internals and even Qt..

[1] <a href="http://quickgit.kde.org/?p=kdelibs.git&amp;a=blob&amp;h=2b47b7cb1d9a31871210e31bc916f0ae41893b63&amp;hb=e51062f18c94dda2b96987df6d73eb46ecd007f7&amp;f=kioslave%2Ffile%2Ffile_unix.cpp" title="http://quickgit.kde.org/?p=kdelibs.git&amp;a=blob&amp;h=2b47b7cb1d9a31871210e31bc916f0ae41893b63&amp;hb=e51062f18c94dda2b96987df6d73eb46ecd007f7&amp;f=kioslave%2Ffile%2Ffile_unix.cpp">http://quickgit.kde.org/?p=kdelibs.git&amp;a=blob&amp;h=2b47b7cb1d9a31871210e31b...</a>
from line 320

Re: KIO::ListJob::entries emits twice for folders with 100+ entr

By David Faure at 11/23/2012 - 12:33

On Friday 23 November 2012 16:26:59 Mark wrote:
I understand your confusion. This actually documented, though:

/**
* Call this in get and copy, to give the total size
* of the file
* Call in listDir too, when you know the total number of items.
*/
void totalSize( KIO::filesize_t _bytes );

Well, it was 2000 or so, someone thought reusing a method was a good idea :)

Right. And I'm to blame for the details==0 code.

Hmm, now I see the problem already: I forgot to call totalSize() in that code
path. For speed reasons in that code, I didn't want to iterate multiple times,
so I emit the stuff as it comes out of readdir...

Yes (because it wants to stat() each file, to provide full details).

No, that is definitely wrong. They both start with the same I/O (readdir), and
then for details > 0 you get a lot more I/O (stat for each file).

Isn't it rather that the buffering in SlaveBase gets confused because it
doesn't know the total number of items to come (the famous "totalSize" int),
so it emits the items more often than in the details>0 case, where the number
of items is known?

Moving code to createUDSEntry wouldn't give us much, since the whole point is
"no stat if details==0", so basically it would be an if at the beginning of
the method -- no behavior change overall.

If you want to let SlaveBase know about the total number of items to come,
then you need to create a QList of UDSEntry in the details==0 code path, add
the entries to the list, and then emit them. I don't like this, it uses more
CPU and memory, all this just to give a total to SlaveBase.

I would prefer a fix that makes SlaveBase handle better the case where the
total number of items isn't known.

Or I would just leave it as it is now, because honestly I don't see a big
problem with entries() being emitted in two chunks, it's perfectly valid.

Well, this is 2012. The easy bugs were fixed long ago, only the interesting
ones remain :-)

Re: KIO::ListJob::entries emits twice for folders with 100+ entr

By Mark at 11/23/2012 - 12:59

On Fri, Nov 23, 2012 at 5:33 PM, David Faure < ... at kde dot org> wrote:
Code re-using: oke. But this? Not ok! :p
Ahh, didn't notice the stat call yet.
I don't think listEntry should should have any knowledge about how
many items are going to pass it's function...
Yes indeed. That would be cleaner then it's currently done.
Yeah, i know. By now i fully agree with you there :) More signals for
the win! The thing i can't figure out is why batching isn't used as
that, batching. Right now it waits for a fixed number (100) and then
decides to batch or not. I would just do that based on time alone and
simply send out a batch when the time difference gets too large. In
that case you would get multiple signals. As in more then 2 if needed
:p
interesting as in the ones that takes days to fix, are not trival to
debug and have magic code paths, Yeah, i certainly saw a few of those
;)

Re: KIO::ListJob::entries emits twice for folders with 100+ entr

By Mark at 11/23/2012 - 17:23

On Fri, Nov 23, 2012 at 5:59 PM, Mark < ... at gmail dot com> wrote:
Fixed :)
Or well, i have a patch for it now.

Here it is: <a href="http://paste.kde.org/613670/" title="http://paste.kde.org/613670/">http://paste.kde.org/613670/</a>

The changes:
- It's actually doing batching now regardless of the details level
- Batching is time based only (300ms). So if the listing takes longer
then 300ms then all entries thus far will be send. That continues till
all entries are send.
- and (while at it) simplified file_unix.cpp's FileProtocol::listDir
function so that it only follows one path for listing folders. It
still has a check for details in it, but this "looks" cleaner to me.
And there is no more putting it in a list and iterating over that one
for the details != 0 case.

I tested this out on a folder with 100.000 files in it and in my case
i do get 2 entries signals now with no detail level provided (thus
quite a few details).
I get only one signal with all the entries in it if i pass details as
0. Now it behaves as i was expecting it in the first place.

Do you guys want me to put this on reviewboard or can i just push it
to 4.10? My patches are never perfect the first time so i guess you
guys will have some comments first :p

Re: KIO::ListJob::entries emits twice for folders with 100+ entr

By David Faure at 11/29/2012 - 09:23

On Friday 23 November 2012 22:23:24 Mark wrote:
This calls gettimeofday for every entry, even when this value isn't needed.
Can you move this to the locations where it's actually needed, for performance
reasons?

Even better would be to port this to QDateTime::currentDateTime or
QTime::currentTime, to make this more portable (to Windows, without kdewin32).
Or to use QTime::elapsed, which might be even simpler to use for this purpose.

It continues ... i.e. it waits for another 300ms before sending the second
batch, right? We don't want to send them one by one, that's for sure.

OK so overall this change means there is no need to call totalSize() at the
beginning of a directory listing. This should be reflected in the SlaveBase
documentation.

1) The comment "fast path, mostly for recursive deletion", should stay under
if (details==0).

2) Don't add spaces inside parenthesis (...) where there weren't any before.
Even though old kdelibs code is a bit inconsistent, the goal is to move
towards the kdelibs coding style - like Qt: no spaces in (...).

3) What happened to the code that was doing a chdir back to the orig dir?

Sounds good.
Please also test:
* the unittests in kdelibs/kio/tests (in particular jobtest and
kdirlistertest, but just run them all with "make test")
* a slow remote server

Yep, please put on reviewboard after fixing the above, it will be easier to
review the new code than to have it intermixed with the old lines in that diff.
I might have more comments then.

Re: KIO::ListJob::entries emits twice for folders with 100+ entr

By Mark at 11/29/2012 - 11:04

On Thu, Nov 29, 2012 at 2:23 PM, David Faure < ... at kde dot org> wrote:
Good one! I will use QTime::elapsed. The reason for moving
gettimeofday to the top is because it was being called "nearly"
always. Unless you had less then 100 entries, then it would only be
called once in the main else statement. However, this change requires
me to know the time difference every time an entry is inserted so i
moved it to the top of the function.
In the most negative situation that "might" happen. So if you are on a
very slow network drive that for whatever reason only lists 1 entry
every 300 ms then you would indeed get 1 signal per entry. Please do
note, this is a corner case and "usually" doesn't happen. But i think
broadcasting the entries signal every 300 ms with whatever data is
there should be how it's done. (the current patch)
Yes.
Will do.
.. i was trying to follow the current style in there. I will just do
it as Qt does it :)
Ehh, i dropped it :p I saw no need for that to change back. The folder
is being changed to the one your listing prior to actually listing the
folder so i didn't see a need to put that back. Do you?
Will do. I will also add the results in the reviewboard entry.
Right, will do. I hope to add the entry to reviewboard today otherwise
it will be there tomorrow. If you could still reply on the questions i
added in this reply?

Re: KIO::ListJob::entries emits twice for folders with 100+ entr

By David Faure at 11/29/2012 - 11:34

On Thursday 29 November 2012 16:04:21 Mark wrote:
Ah, I see.
OK (still, with QTime::elapsed this will be a lot more readable and
maintainable anyway).

Ah, yes, sure that's fine.
I meant in the case of a fast listing, your paragraph above made me wonder if
it was going to send everything one by one after the first 300 ms. Now I see
this isn't the case.

You should really read the comments in the current code :-)
Hint, start at "We restore the path later...".

Re: KIO::ListJob::entries emits twice for folders with 100+ entr

By Mark at 11/29/2012 - 11:45

On Thu, Nov 29, 2012 at 4:34 PM, David Faure < ... at kde dot org> wrote:
Which it is using at this moment on my computer :)
Looks a lot cleaner indeed!

hehehe oke, i removed the comment ;)
Without joking, i added the path restore part back.

But why is it doing:
#if !defined(PATH_MAX) && defined(__GLIBC__)
char *path_buffer = ::get_current_dir_name();
const KDEPrivate::CharArrayDeleter path_buffer_deleter(path_buffer);
#else
char path_buffer[PATH_MAX];
path_buffer[0] = '\0';
(void) getcwd(path_buffer, PATH_MAX - 1);
#endif

for getting the current path? Can't i just do QDir::currentPath() for
the getting part, store it in a variable and put it back at the end?

Re: KIO::ListJob::entries emits twice for folders with 100+ entr

By Mark at 11/29/2012 - 15:39

On Thu, Nov 29, 2012 at 4:45 PM, Mark < ... at gmail dot com> wrote:
patch: <a href="https://git.reviewboard.kde.org/r/107520/" title="https://git.reviewboard.kde.org/r/107520/">https://git.reviewboard.kde.org/r/107520/</a>

Re: KIO::ListJob::entries emits twice for folders with 100+ entr

By Frank Reininghaus at 11/22/2012 - 07:43

2012/11/22 David Faure:
[...]
I just looked at the relevant code (not written by myself) to find out
something about this 'misbehaviour'.

The slot in KFileItemModel which is connected to KDirLister's
newItems(KFileItemList) signal collects all new items in a sort of
cache (m_pendingItemsToInsert), rather than actually inserting them
into the model, until either

a) the completed() signal is received, or
b) two seconds have passed.

To be honest, I cannot say at the moment what the reason for this
design decision is. I could imagine that it is to prevent unnecessary
small, but expensive, updates of model+view if newItems() is emitted
many times with just one KFileItem each before the completed() signal
is received. But it seems that this is actually not even possible, if
I understand correctly?

Best regards,
Frank

Re: KIO::ListJob::entries emits twice for folders with 100+ entr

By Kevin Krammer at 11/15/2012 - 14:55

A bit off topic but:
It doesn't need the full set for sorting the first patch and it doesn't have to
sort the full set when the new entries come in.
It could either insert the new items into the sorted first patch or sort the
second patch and then perform the merge step of a merge sort algorithm.

Cheers,
Kevin