DevHeads.net

Application usage statistics and targeted user surveys

Hi,

we have talked about the above topics a couple of times in the past and, from
what I remember, usually agreed that it would be nice to have more statistical
information about our users, so that we know what our applications are used
for and can measure the impact of changes. Similarly, it would be nice to be
able to ask our users questions directly, without the statistical bias that
comes from using out-of-band channels like blogs or social media. This can
obviously be addressed by adding it to the application code, but that raises
some valid privacy concerns.

Wanting this for GammaRay, I attempted to implement a generic framework for
it, with the goal of making this fully transparent and giving the user full
control over what data is shared and how often they want to participate in
surveys, i.e. making this solid enough on the privacy side that even I would
enable it myself. You'll find the code in Git (kde:kuserfeedback).

Feature-wise it so far contains:
- a set of built-in data sources (app version, Qt version, platform,
application usage time, screen setup, etc) that applications can choose to
enable
- generic data sources for tracking the time ratio a Q_PROPERTY has a specific
value, allowing to track e.g. which application view is used how much
- the ability to add custom/application-specific data sources
- reference widgets for customizing what data you want to share, and showing
exactly what that means, in human-readable translated text and, if you insist,
also all the way down to the raw JSON sent to the server.
- survey targeting using simple C++/JS-like expressions that can access all
the data sources (so you can target, e.g., only users with high-DPI
multi-screen setups)
- configurable encouragement of users to contribute (i.e. after X starts and/or
Y hours of usage, repeated after Z months, suggest that the user participate if
they aren't already doing so).
- a management and analytic tool that allows you to manage products and survey
campaigns, and view recorded data using configurable aggregations
- the entire thing works without unique user ids. Fingerprinting can still be
an issue on too small user sets and/or when using too much detail in the data.
- by default all of this is opt-in of course, although technically the API
doesn't prevent applications from changing this
- it can deal with multiple products, each product can have different data
sources and survey campaigns
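
To illustrate the "time ratio a property has a specific value" idea from the list above, here is a minimal Python sketch. It is not the KUserFeedback implementation (which is C++/Qt and monitors Q_PROPERTY changes); the class name `ValueRatioTracker`, its methods and the view names are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import defaultdict


class ValueRatioTracker:
    """Hypothetical helper: tracks what fraction of time each value was active."""

    def __init__(self, initial_value, start_time):
        self._current = initial_value
        self._since = start_time
        self._seconds = defaultdict(float)

    def set_value(self, value, now):
        # Credit the elapsed time to the value that was active until now.
        self._seconds[self._current] += now - self._since
        self._current = value
        self._since = now

    def ratios(self, now):
        # Include the still-running interval without mutating any state.
        totals = dict(self._seconds)
        totals[self._current] = totals.get(self._current, 0.0) + (now - self._since)
        total = sum(totals.values())
        if total == 0:
            return {}
        return {value: seconds / total for value, seconds in totals.items()}


# Made-up view names: 60 s in "editor", 20 s in "preview", 20 s in "editor".
tracker = ValueRatioTracker("editor", start_time=0.0)
tracker.set_value("preview", now=60.0)
tracker.set_value("editor", now=80.0)
ratios = tracker.ratios(now=100.0)
```

Only the ratios would be reported in such a scheme, not the absolute durations, which keeps the submitted data coarser.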

Technically, this consists of the following parts:
- a library that goes into the target application, backward compatible all the
way to Qt4.8/MSVC2010 (needed for my GammaRay use-case), depending only on
QtGui
- a library with the reference widgets, also with extended backward
compatibility
- the server, written in PHP5 and supporting sqlite/mysql/postgresql. Not the
most fun technology, but that stuff is available almost anywhere, and easy to
deploy and maintain
- the management tool, recent Qt5/recent C++, using QtCharts for the data
analysis
- a command line tool for data import/export, useful for e.g. automated backups

All of this is LGPLv2+ licensed.

Feedback obviously very welcome, in particular around privacy concerns, or
reasons that would make you enable/disable such a feature.

Regards,
Volker

Comments

Re: Application usage statistics and targeted user surveys

By Aleix Pol at 05/23/2017 - 12:31

On Sun, Apr 23, 2017 at 12:52 PM, Volker Krause < ... at kde dot org> wrote:
Hi Volker,
I've been looking into how it works, I wanted to test the tests/orwell
application but I keep getting this error:
./bin/orwell: symbol lookup error: ./bin/orwell: undefined symbol:
_ZN12UserFeedback18CompilerInfoSourceC1Ev

I'm seeing a similar error when running autotests:
********* Start testing of DataSourceTest *********
Config: Using QtTest library 5.9.0, Qt 5.9.0
(x86_64-little_endian-lp64 shared (dynamic) release build; by GCC
6.3.1 20170306)
PASS : DataSourceTest::initTestCase()
PASS : DataSourceTest::testPlatformInfoSource()
PASS : DataSourceTest::testScreenInfoSource()
PASS : DataSourceTest::testPropertyRatioSource()
PASS : DataSourceTest::testMultiPropertyRatioSource()
PASS : DataSourceTest::testApplicationVersionSource()
PASS : DataSourceTest::testQtVersionSource()
PASS : DataSourceTest::testStartCountSource()
PASS : DataSourceTest::testUsageTimeSource()
PASS : DataSourceTest::testCpuInfoSource()
PASS : DataSourceTest::testLocaleInfoSource()
./bin/datasourcetest: symbol lookup error: ./bin/datasourcetest:
undefined symbol: _ZN12UserFeedback16OpenGLInfoSourceC1Ev

I would have looked into fixing it, but I'm not sure I understand why
there's all the RPATH logic in place, so I'd prefer to hear from you
first.

A good next step would also be to get it running on build.kde.org, so
we can identify these issues.

Aleix

Re: Application usage statistics and targeted user surveys

By Volker Krause at 05/25/2017 - 06:33

On Tuesday, 23 May 2017 18:31:35 CEST Aleix Pol wrote:
I have removed the remains of the pre-ECM rpath handling. This also changed
binary output directories on Linux to follow KDE standards, so you might want
to do a clean build to avoid issues with outdated binaries in the build dir.

Indeed, I've requested CI coverage now.

Regards,
Volker

Re: Application usage statistics and targeted user surveys

By Volker Krause at 06/03/2017 - 05:49

On Thursday, 25 May 2017 12:33:49 CEST Volker Krause wrote:
Looks like in order to get CI coverage we need to move to kdereview (which is
fine, I think it's ready for that), but that requires knowing where this
should end up afterwards.

I guess the candidates are extragear/libs or frameworks? frameworks seems
conceptually like the right place, but putting something there that is still
fairly new and has seen little field testing seems sub-optimal. Opinions?

Regards,
Volker

Re: Application usage statistics and targeted user surveys

By Albert Astals Cid at 06/07/2017 - 18:27

El dissabte, 3 de juny de 2017, a les 11:49:00 CEST, Volker Krause va
escriure:
To me it seems a few releases from extragear would make sense before moving to
frameworks and promising full API/ABI compatibility.

OTOH when moving from extragear to frameworks we may have to rename the library
(to have the KF5 suffix), which would also break API (at least at the cmake
level).

How do people feel about having libs in extragear with the KF5 "cmake naming",
on the understanding that we're stabilizing them to be part of frameworks
eventually?

Cheers,
Albert

Re: Application usage statistics and targeted user surveys

By Aleix Pol at 06/07/2017 - 19:36

On Thu, Jun 8, 2017 at 12:27 AM, Albert Astals Cid < ... at kde dot org> wrote:
IMO it's a bit weird and unsettling. But then, we're already doing it
for many pim libraries (not in extragear but in applications, still
not part of KF5).

Aleix

Re: Application usage statistics and targeted user surveys

By Volker Krause at 06/10/2017 - 04:11

On Thursday, 8 June 2017 01:36:44 CEST Aleix Pol wrote:
Sounds sensible to me, let's do that.

But isn't the rename the least of our problems if we start in extragear/libs
exactly to be able to still do ABI, API and behavior changes until we are
happy with things for frameworks?

I'll try to get all currently pending API changes in ASAP, and then get it
moved to kdereview within this month.

Regards,
Volker

Re: Application usage statistics and targeted user surveys

By Albert Astals Cid at 06/12/2017 - 18:32

El dissabte, 10 de juny de 2017, a les 10:11:44 CEST, Volker Krause va
escriure:
Yes/No. At some point we'll reach some code that we like and we'll say "ok,
let's move it to frameworks"; apps that had used that last code will still get
an extra ABI break because of the name change.

But oh well, I guess it's ok.

Cheers,
Albert

Re: Application usage statistics and targeted user surveys

By David Faure at 06/12/2017 - 03:13

On samedi 10 juin 2017 10:11:44 CEST Volker Krause wrote:
And that's a bug, and certainly not an example to follow.

When people look at an installed linux distro, all they see is installed libs,
they don't see whether we call it "extragear" or "frameworks". So anything
called somethingKF5something looks like it's part of KF5, with API/ABI
promises.

Yes, exactly. Do it in extragear but *without* KF5 suffix.

Re: Application usage statistics and targeted user surveys

By Aleix Pol at 06/05/2017 - 19:18

On Sat, Jun 3, 2017 at 11:49 AM, Volker Krause < ... at kde dot org> wrote:
+1
Since we're still introducing it to projects, maybe it would make sense
to have a couple of releases with it in extragear so people will be
less angry if we need to break ABI (it's what we did for kirigami,
and I'd say it's worked reasonably well).
At Akademy we can discuss getting it into frameworks, if that works for everyone?

Aleix

Re: Application usage statistics and targeted user surveys

By Aleix Pol at 05/24/2017 - 11:38

On Tue, May 23, 2017 at 6:31 PM, Aleix Pol < ... at kde dot org> wrote:
Hey Volker, I figured out this one. Never mind.

I've done a proof of concept integrating it in Discover, here are two patches:
https://phabricator.kde.org/D5960
https://phabricator.kde.org/D5961

Now, to proceed, I'd like to try the whole system including the
server. Have you documented how to set it up anywhere? That would make
it easier.

Thanks!
Aleix

Re: Application usage statistics and targeted user surveys

By Volker Krause at 05/25/2017 - 10:42

On Wednesday, 24 May 2017 17:38:22 CEST Aleix Pol wrote:
There are still two aspects missing in the integration:
- configure Provider to actually submit (see productIdentifier, feedbackServer
and submissionInterval properties)
- probably add some data sources (in the current form you only get an
indication on how many users you have, and untargeted surveys, nothing more)

The second point will need some more QML wrapper API. I'll look into adding a
QML plugin to KUserFeedback directly for this.

INSTALL contains the deployment documentation, both for the full setup with
authentication on an Apache server, and locally for unsecured testing using
the built-in PHP server.

I've also got a playground server on my own infrastructure now that I can
provide accounts for. And Jan has published his ongoing work on creating a
Docker image for the server here: https://github.com/KDAB/kuserfeedbackdocker

Regards,
Volker

Re: Application usage statistics and targeted user surveys

By Aleix Pol at 06/06/2017 - 09:01

On Thu, May 25, 2017 at 4:42 PM, Volker Krause < ... at kde dot org> wrote:
Hi Volker,
More noob feedback:
I set up a local system I could tinker with using your colleague's
Docker image. It worked quite well. But I was getting an issue, possibly fixed
by this patch:
https://phabricator.kde.org/D6117

Now I get to see things being sent in the UserFeedbackConsole
application, but I only see timestamps. I added debug information to
see what is being sent (after updating the Discover patch above)
and I see all sorts of data being delivered. Is it being lost in the
internets? Or am I not looking at it correctly? If I export the
product using UserFeedbackConsole I also only get timestamps :(.

HTH,
Aleix

Re: Application usage statistics and targeted user surveys

By Volker Krause at 06/10/2017 - 04:22

On Tuesday, 6 June 2017 15:01:57 CEST Aleix Pol wrote:
This looks good, I'll try to get that path unit-tested to make sure this works
with sqlite too. However, you should not actually hit this path in the first
place, which is probably also why you are not seeing any data.

Do you see the empty columns for the other data in UserFeedbackConsole, or do
you only see the Timestamp column? In the former case the data is either not
transmitted, or rejected by the server for some reason, we'd need to look at
the JSON payload sent to the server in that case. In the other case, you
probably need to set up a product schema first with UserFeedbackConsole
(easiest via Schema -> Source Templates in the Schema view).

I'll also try your Discover patches here to see if I can reproduce this. The
QML bindings haven't gotten any real use yet, so it's quite possible some stuff
doesn't work there correctly yet.

Regards,
Volker

Re: Application usage statistics and targeted user surveys

By Aleix Pol at 06/11/2017 - 19:56

On Sat, Jun 10, 2017 at 10:22 AM, Volker Krause < ... at kde dot org> wrote:
Here's what I'm seeing: https://imgur.com/a/BmH2B
I've seen the schema view, I haven't pushed it much. I see I can add
stuff but I'm not sure what it's for. I expected the system to
integrate all data offered, but maybe I need to set the expectations
on the server side?
Either way, this is the information being sent at the moment (I copied
your orwell.qml example sources so far).
{
  "applicationVersion": { "value": "5.10.90" },
  "compiler": { "type": "Clang", "version": "4.0" },
  "platform": { "os": "linux", "version": "arch-unknown" },
  "qtVersion": { "value": "5.9.0" },
  "startCount": { "value": 76 },
  "usageTime": { "value": 34132 }
}

I'll update the patch to adapt to changes in kuserfeedback.

Aleix

Re: Application usage statistics and targeted user surveys

By Volker Krause at 06/13/2017 - 12:56

On Monday, 12 June 2017 01:56:21 CEST Aleix Pol wrote:
Yes, exactly, that's what the schema does. Easiest way to get started is
probably to just import the orwell example schema there, that contains all
existing data sources, or you just create sources from their corresponding
templates.

That is, in the schema view choose schema -> import schema... or schema ->
source template -> .... When you are done, select schema -> save schema to
write the changes to the server. Afterwards you should see a lot more columns
and more charts in the analytics view.
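
One way to picture why only timestamps showed up: assuming the console can only show payload entries that the product schema declares, an empty schema leaves nothing but the timestamp. The Python sketch below illustrates that assumption only; it is not the server's PHP code, and `visible_columns` as well as the schema layout (source name mapped to element names) are made up for this example. The payload is the one quoted earlier in this thread.

```python
import json

payload = json.loads("""
{
  "applicationVersion": { "value": "5.10.90" },
  "compiler": { "type": "Clang", "version": "4.0" },
  "platform": { "os": "linux", "version": "arch-unknown" },
  "qtVersion": { "value": "5.9.0" },
  "startCount": { "value": 76 },
  "usageTime": { "value": 34132 }
}
""")


def visible_columns(payload, schema):
    """Return the columns a console could show for one sample.

    `schema` maps a source name to the element names declared for it
    (a hypothetical layout, chosen for this sketch).
    """
    columns = ["timestamp"]
    for source, elements in schema.items():
        entry = payload.get(source, {})
        for element in elements:
            if element in entry:
                columns.append(f"{source}.{element}")
    return columns


empty_schema = {}
partial_schema = {"platform": ["os", "version"], "startCount": ["value"]}

no_schema_columns = visible_columns(payload, empty_schema)
with_schema_columns = visible_columns(payload, partial_schema)
```

With the empty schema only the timestamp column remains; declaring sources in the schema makes the matching payload entries appear.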

User documentation is still fairly limited on this, but there is a start of a
user manual describing the data model, that should help to explain most of the
options you have in the schema view.

That looks sane and shouldn't be the problem indeed.

Regards,
Volker

Re: Application usage statistics and targeted user surveys

By Aleix Pol at 06/15/2017 - 09:42

On Tue, Jun 13, 2017 at 6:56 PM, Volker Krause < ... at kde dot org> wrote:
Interesting, yes, adding the schemas in
kuserfeedback/src/console/schematemplates starts to gather more stuff.
I'm thinking that maybe it would make sense to add some API to export
the application's schema?

Now it also would make sense to have some feedback on how this would
be implemented in KDE, on the server side. Sysadmins, have you looked
into it?
Being able to allow maintainers to manage these schemas would be ideal.

Aleix

Re: Application usage statistics and targeted user surveys

By Ben Cooksley at 06/15/2017 - 16:22

On Fri, Jun 16, 2017 at 1:42 AM, Aleix Pol < ... at kde dot org> wrote:
Hi Aleix,

I haven't had the time to look into it yet, I'm afraid.
Getting the new CI system launched has been occupying most of my time
as of late.

Cheers,
Ben

Re: Application usage statistics and targeted user surveys

By Aleix Pol at 06/15/2017 - 19:23

On Thu, Jun 15, 2017 at 10:22 PM, Ben Cooksley < ... at kde dot org> wrote:
No worries. In fact, delegate! ;)

Aleix

Re: Application usage statistics and targeted user surveys

By Albert Astals Cid at 05/01/2017 - 18:07

El diumenge, 23 d’abril de 2017, a les 12:52:57 CEST, Volker Krause va
escriure:
Why the weird values in StatisticsCollectionMode ?

Should submissionInterval and encouragementInterval also be properties in
Provider?

Also would be nice to specify the default values for submissionInterval,
encouragementInterval, surveyInterval

Do I gather correctly that as an app developer the only things I'm actually
interested in are Provider and FeedbackConfigWidget/Dialog? It would be nice to
have some documentation saying so.

Haven't read much of the code yet, so I'll ask some stuff.

Is there a way for the user to see (locally) the data he has sent to the
servers?

Is there a way for the user to remove the data he has sent to the servers?
Guess not since otherwise we would be able to do a 1:1 mapping

Do we have some way in the server to protect us from people trying to inject
"fake/wrong" data?

I see you protected the data on the server with a user/password.

If the data is really anonymous do we really need user/password ?

And if we actually do need user/password, is there a way to restrict which
data a user can see (i.e. configure that I can see Okular's data but not
Krita's)?

Thanks for working on this :)

Cheers,
Albert

Re: Application usage statistics and targeted user surveys

By Volker Krause at 05/02/2017 - 13:58

Thanks for the review!

On Tuesday, 2 May 2017 00:07:43 CEST Albert Astals Cid wrote:
Extensibility, so we can add more modes later if needed, while still keeping
the order based on how much data is submitted.

I only added properties needed for a QML configuration user interface so far,
but if someone wants to do the entire setup in QML it probably makes sense to
expose the entire API indeed.

(What data you want to share (statisticsCollectionMode) and how often you want
to be bothered by surveys (surveyInterval) are the only two values meant for
user configuration, the rest is supposed to be configured by the application
developer.)

done

Those are the main integration points, yes. You'll also need to add data
sources for Provider to actually report telemetry though, either a built-in
one, or implementing a custom one based on AbstractDataSource.

Added a high-level integration overview to Mainpage.dox.

The default configuration dialog shows you a list of what would be sent at the
time of looking at it, but there is no local logging of the submitted data at
this point.

No. But it's not impossible to achieve I think, without giving up the "no
unique user identification" requirement. The server could generate a unique
random key for each submitted record and send that back to the client. The
client would store these and if desired can request deletion for the
corresponding records.
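
The deletion scheme outlined above could be sketched roughly like this in Python. This is only an illustration of the idea, not existing KUserFeedback functionality; `RecordStore` and its methods are hypothetical, and a real server would persist records in its database rather than in memory.

```python
import secrets


class RecordStore:
    """Hypothetical in-memory stand-in for the server's record table."""

    def __init__(self):
        self._records = {}

    def submit(self, record):
        # A fresh random key per record: the keys of two submissions from the
        # same client are unrelated, so they do not act as a user id.
        key = secrets.token_urlsafe(32)
        self._records[key] = record
        return key

    def delete(self, key):
        # Returns True if a record with this key existed and was removed.
        return self._records.pop(key, None) is not None

    def count(self):
        return len(self._records)


server = RecordStore()
client_keys = []  # the client stores the keys it received, locally

client_keys.append(server.submit({"startCount": {"value": 76}}))
client_keys.append(server.submit({"startCount": {"value": 77}}))

# Later, the user asks for their records to be removed.
deleted = [server.delete(key) for key in client_keys]
```

One caveat with this sketch: a client that requests deletion of several keys in one go reveals that those records belong together, so a real design would have to consider how deletion requests are sent.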

Both good points, how important do you think they are for acceptance of this?

No. And that could indeed be a problem. We can do some sanity checking, but if
someone insists on vandalizing this, they can easily make it entirely useless
by submitting tons of plausible/"valid" data. You can block IP addresses or
ranges at the web server level, but that is rather crude and manual.
Unfortunately, that's as far as my ideas on dealing with this go.
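
The kind of sanity checking mentioned above might look like the following Python sketch. The rules are assumptions chosen for illustration (the field names are taken from the payload quoted elsewhere in this thread, and `looks_sane` is a hypothetical name); it is not what the server does today, and it does nothing against plausible-looking fake data.

```python
def looks_sane(payload):
    """Reject payloads that are structurally wrong or obviously implausible."""
    if not isinstance(payload, dict):
        return False

    for name in ("startCount", "usageTime"):
        entry = payload.get(name)
        if entry is None:
            continue  # sources are optional; users choose what to share
        if not isinstance(entry, dict):
            return False
        value = entry.get("value")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            return False

    return True


good = looks_sane({"startCount": {"value": 76}, "usageTime": {"value": 34132}})
negative = looks_sane({"startCount": {"value": -1}})
wrong_type = looks_sane({"usageTime": {"value": "a lot"}})
```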

It's protecting both read access on the data and write access on product
configuration and survey campaigns, yes. It would probably make sense to
separate those two interfaces, and thus also enable different access control
for data analysis and product/campaign management.

Good point, I would also argue that for building trust in such a system the
data must be public. However, there are two reasons that still made me protect
it:
(1) if it's world-readable the fact that it is essentially world-writable (see
above problem with submitting wrong data) makes this easily exploitable for
spreading links to illegal content, same as e.g. our pastebin was abused.
(2) we have no operational experience with this and no existing data sets, and
there is the residual risk of fingerprinting if we track too much due to that.

What might work is to make parts of the data that are certainly not
problematic (e.g. just numbers, no free strings) publicly available live, and
have everything else go through human review first.

Assuming this would be connected to identity.kde.org, I think it would be
fine to give all people with commit access read access to the data too, or do
you think we really need to control this per product?

I do see why we might want more control on the product/campaign management
side, so I don't accidentally destroy Okular's data due to not knowing how to
use the tool. It would be much easier if we don't need to restrict this per
product though, but rather just to a group of people who know what they are
doing.

Regards,
Volker

Re: Application usage statistics and targeted user surveys

By Albert Astals Cid at 05/11/2017 - 18:05

El dimarts, 2 de maig de 2017, a les 19:58:05 CEST, Volker Krause va escriure:
+1, I think we should start thinking more in terms of "which are the
qproperties that make sense to expose" instead of "which are the ones that I
actually need".

Though I guess adding new qproperties is ABI and API compatible, it's always
nice if someone who has other needs doesn't need to add the qproperty at a
later stage.

looks good :)

Ok, I guess this would be enough. I mean, the user has to trust us anyway,
since even if we showed a log it might not contain all the data we sent.

Right, sounds doable.

Don't know, as I said, in both cases the user has to trust that what we're
showing is true, since e.g. we could tell them "yes we've deleted the data"
and not really do it.

So maybe these are nice-to-haves but not really mandatory for a first version?

I have a *very vague memory* of finding how Firefox did this, but can't find
it right now :/

I've just asked on their IRC and will lurk there for a while to see if I get
lucky.

+1, I'd like at least a "read" and an "admin" privilege separation, if I
understand correctly that we plan to run this as a "KDE-wide service".

Apply the same solution we used for pastebin? I.e. I think you need an
identity account now?

true, starting "small" may be a better idea.

Probably not?

Makes sense.

Cheers,
Albert

Re: Application usage statistics and targeted user surveys

By Volker Krause at 05/25/2017 - 06:08

On Friday, 12 May 2017 00:05:59 CEST Albert Astals Cid wrote:
Done, and Aleix added properties for SurveyInfo, so we are getting closer to
full QML usability, with Discover being the guinea pig for that.

That's implemented now.

Right, which should now be doable with the option of having read-only access
to the data.

Regards,
Volker

Re: Application usage statistics and targeted user surveys

By Albert Astals Cid at 05/01/2017 - 17:49

El diumenge, 23 d’abril de 2017, a les 12:52:57 CEST, Volker Krause va
escriure:
I needed the attached patch for it to build properly on my system.

Cheers,
Albert

Re: Application usage statistics and targeted user surveys

By Aleix Pol at 04/25/2017 - 06:54

On Sun, Apr 23, 2017 at 12:52 PM, Volker Krause < ... at kde dot org> wrote:
Hi Volker,
This is really cool and necessary! I'd really like to see it adopted
in our applications, hoping Krita can be the first of many. ;)

WRT the code, it needs Qt4, we can live with that, but isn't it
a bit weird that we have several ECM files forked in it? :/ Is it
really that tough of a dependency? (If so, maybe we should rethink
the whole KF5 approach x'D)

Aleix

Re: Application usage statistics and targeted user surveys

By Volker Krause at 04/25/2017 - 07:57

On Tuesday 25 April 2017 12:54:42 Aleix Pol wrote:
I'll fix that, that was just me being lazy as my Qt4 and Windows development
environments didn't have ECM :)

For GammaRay this isn't an issue anyway, as we ship all required dependencies
(parts of ECM and KF5, and some other stuff), for easier deployment on
embedded targets, and due to the Qt4 backward compatibility requirement. We'd
probably do the same with the application side part of the feedback library.

Regards,
Volker

Re: Application usage statistics and targeted user surveys

By Boudewijn Rempt at 04/23/2017 - 08:41

Did I just miss it -- or didn't you tell us where we can find the library?
Curiously enough, there's a gsoc proposal this year for krita that has
pretty much this as its goal...

Re: Application usage statistics and targeted user surveys

By Volker Krause at 04/23/2017 - 12:20

On Sunday, 23 April 2017 14:41:05 CEST Boudewijn Rempt wrote:
Interesting :) I wasn't aware of that, I only knew about the statistics Kexi
collects. Sounds like very similar data to what I'm looking for in GammaRay
too, i.e. which features/tools/views are used how often.

The already built-in way of tracking that kind of data is as normalized ratios
(i.e. usage time percentage). Absolute counts would also be possible, but are
somewhat more prone to fingerprinting. At the PIM sprint two weeks ago, adding
application-side downsampling was proposed as a countermeasure (but has yet to
be implemented).
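
As a rough illustration of the difference between absolute counts and normalized ratios, and of one possible reading of "application-side downsampling", consider this Python sketch. The function names, the view names and the step size are all made up; the downsampling here simply snaps each ratio to a coarse grid before submission and is not taken from the KUserFeedback code.

```python
def to_ratios(counts):
    """Turn absolute counts into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: count / total for name, count in counts.items()}


def downsample(ratios, step=0.25):
    """Snap each ratio to the nearest multiple of `step`.

    Coarser values carry less identifying detail; note that the snapped
    values are not guaranteed to sum to exactly 1.
    """
    return {name: round(value / step) * step for name, value in ratios.items()}


# Hypothetical per-view usage counts for one user.
view_counts = {"objects": 37, "models": 9, "signals": 4}

ratios = to_ratios(view_counts)          # 0.74, 0.18, 0.08
coarse = downsample(ratios, step=0.25)   # 0.75, 0.25, 0.0
```

The exact counts (37, 9, 4) never leave the application in this scheme; only the coarse ratios would be reported.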

There are some ideas for making collecting this kind of data easier by adding
code for e.g. counting QAction triggers, or monitoring QItemSelectionModels,
next to the existing property monitoring code. But it probably makes sense to
look at a few potential users first to see how the data is represented there.

Regarding system information (which you basically can get for free once you
have the infrastructure in place anyway), I'd guess details on the OpenGL
stack and the available input devices might be most relevant for Krita, adding
the former is at least already on my todo list.

Regards,
Volker

Re: Application usage statistics and targeted user surveys

By Boudewijn Rempt at 04/23/2017 - 08:44

Never mind... Found it. My eyes were scanning for a full URL :-)