Review Request 117789: Optimize word count in PlainTextExtractor.

Review request for kdelibs and Vishesh Handa.

Repository: kfilemetadata

Description
Optimize word count in PlainTextExtractor.

Regular expressions are notoriously slow. Implementing a simple
word-count directly in C++ is much faster, as shown by the benchmark:

Before:
702.0 msecs per iteration (total: 7,020, iterations: 10)
After:
125.5 msecs per iteration (total: 1,256, iterations: 10)
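
For illustration only, a hand-written word count of the kind described above could look like the following. This is a minimal sketch in standard C++, not the code from the actual diff: the names `isWordChar` and `countWords` are hypothetical, and it assumes a "word" is a maximal run of ASCII letters, digits or underscores (roughly what `\w+` matches on ASCII input).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true for ASCII letters, digits and '_'.
static bool isWordChar(unsigned char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z')
        || (c >= 'A' && c <= 'Z') || c == '_';
}

// Hypothetical word counter: a single pass over the text that counts
// every transition from a non-word character into a word character.
std::size_t countWords(const std::string &text)
{
    std::size_t words = 0;
    bool inWord = false;
    for (unsigned char c : text) {
        const bool wordChar = isWordChar(c);
        if (wordChar && !inWord) {
            ++words; // a new word starts here
        }
        inWord = wordChar;
    }
    return words;
}
```

The loop touches each character once and keeps only a counter and a flag, which is the kind of work a general-purpose regex engine cannot easily reduce to.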

Make the plaintext extractor benchmark more meaningful.

It now operates on a larger file and uses QBENCHMARK to actually get some data.
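
Outside of Qt Test, a rough "msecs per iteration" figure can be obtained with a small timing helper. The sketch below is an assumption-laden stand-in, not the benchmark from the diff: it uses `std::chrono` instead of QBENCHMARK and `std::regex` instead of QRegExp, and the names `makeSampleText`, `countWordsRegex` and `msecsPerIteration` are hypothetical.

```cpp
#include <chrono>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical input generator: repeats a four-word line to build a larger text.
std::string makeSampleText(std::size_t repetitions)
{
    std::string text;
    for (std::size_t i = 0; i < repetitions; ++i) {
        text += "the quick brown fox\n";
    }
    return text;
}

// Regex-based word count: the number of \w+ matches in the text.
std::size_t countWordsRegex(const std::string &text)
{
    static const std::regex wordRe("\\w+");
    const auto begin = std::sregex_iterator(text.begin(), text.end(), wordRe);
    return static_cast<std::size_t>(std::distance(begin, std::sregex_iterator()));
}

// Average wall-clock milliseconds per call of fn over `iterations` runs.
// Assumes iterations > 0 and that fn returns a std::size_t-compatible value.
template <typename Fn>
double msecsPerIteration(Fn fn, int iterations)
{
    volatile std::size_t sink = 0; // keeps the calls from being optimized away
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        sink = fn();
    }
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / iterations;
}
```

A call such as `msecsPerIteration([&] { return countWordsRegex(text); }, 10)` yields a number comparable in spirit to the figures quoted in the description; any hand-written counter can be timed by passing it in the same way. Absolute values depend on the machine, the compiler and the regex engine.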

Diffs
autotests/indexerextractortests.cpp 1cb8e65da7d764eab1923054659ae5841104de2d
src/extractors/plaintextextractor.cpp 536e02d843f24dbbc19035029896b9e696e8b302

Diff: https://git.reviewboard.kde.org/r/117789/diff/

Testing

Thanks,

Milian Wolff

Comments

Re: Review Request 117789: Optimize word count in PlainTextExtra

By Milian Wolff at 05/02/2014 - 05:54

(Updated May 2, 2014, 9:54 a.m.)

Status
This change has been marked as submitted.

Re: Review Request 117789: Optimize word count in PlainTextExtra

By Commit Hook at 05/02/2014 - 05:54

This review has been submitted with commit a5b76bbd287d504477a9f27d64747f9bcfe50dbc by Milian Wolff to branch KDE/4.13.

- Commit Hook

On April 26, 2014, 1:15 p.m., Milian Wolff wrote:

Re: Review Request 117789: Optimize word count in PlainTextExtra

By Milian Wolff at 04/30/2014 - 12:00

autotests/indexerextractortests.cpp
<https://git.reviewboard.kde.org/r/117789/#comment39726>

note to self: here and below the indentation is wrong (should be four spaces)

- Milian Wolff

Re: Review Request 117789: Optimize word count in PlainTextExtra

By Mark at 04/29/2014 - 20:08

src/extractors/plaintextextractor.cpp
<https://git.reviewboard.kde.org/r/117789/#comment39696>

Please update this link, since the target no longer exists. Perhaps http://qt-project.org/doc/qt-5/qregexp.html?

- Mark Gaiser

Re: Review Request 117789: Optimize word count in PlainTextExtra

By Vishesh Handa at 04/29/2014 - 09:33

Ship it!

Thanks! :)

- Vishesh Handa
