DevHeads.net

Module layout proposal: Split kdegames-data from kdegames

[CC kde-scm-interest for notification only]
[CC kde-buildsystem for feedback on the proposed build system changes]
[CC kde-packagers for feedback on the implied changes to package layouts]
[@CC: please keep discussion on k-c-d and k-g-d only]

Hello everyone,

==== EXECUTIVE SUMMARY ====

I propose to move the data files from the kdegames module into a new
kdegames-data module to
1. facilitate the move of the remaining source code to Git (while a
method of storing binary data files in Git efficiently is still being
worked on) and
2. enable developers to fetch and compile the kdegames source without
having to download the data files again.

==== DETAILED PROPOSAL ====

kdegames is among the few modules that have not yet switched to Git.
The main concern is that the kdegames source tree contains tons [1] of
binary data files, which Git is known not to handle well. All
discussions about moving to Git (on scm-interest and games-devel) have
just led to bikeshedding about how to handle the binary files with
Git. I propose to postpone this specific problem and move on with the
Git transition *now*, especially since the solution I want to propose
has an added benefit.

I propose to split a new module kdegames-data from kdegames, meaning that:

1. kdegames-data should be built and installed before kdegames.
2. Any kdegames application will refuse to start when the
corresponding data files have not been installed.

Number 2 is a protective measure because, currently, almost all games
fail to run correctly or look utterly broken when their data files are
missing, without any proper indication of the problem.

If this proposal is implemented, the kdegames module (that is, the
source code) can move to Git without worrying about the data files.
These can move at any later point in time, once a sound solution has
been worked out by our Git specialists. To be absolutely clear: I'm
not proposing to let the data stay in SVN forever. What I want is to
disentangle the fates of the source code and the data files, for the
benefit of both.

The added benefit of this solution is that distributions will be
encouraged to package data files separately. Because this data (and
especially its format) changes only very rarely, developers will not
need to check out this giant mass of data from our SCM servers, but
can instead use the packages provided by their distribution. The same
holds true for drive-by contributors: they only have to clone a tiny
repo containing the source code of the game they want to hack on,
without fetching megabytes of data files which they already have
installed.

If this proposal is accepted no later than the end of next week, I
will be able to implement the following changes before the soft
feature freeze (Thu, October 27):

1. Create the new module in SVN, move data files from the current
kdegames module. The toplevel directory is still divided by
applications, of course.
2. Set up the build system for kdegames-data using
macro_optional_add_subdirectory() as in kdegames, so that data can be
selected for installation on a per-application basis. Each set of
application data will additionally install an empty marker file which
indicates that data is available for this application (something like
${DATA_INSTALL_DIR}/${APP}/hasdata; please suggest a better scheme if
you have one).
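To make steps 1 and 2 concrete, here is a minimal shell sketch of the install logic only. It assumes a hypothetical checkout layout with one top-level directory per application, and it merely stands in for what would really be CMake install rules; the function name `install_all_data` and its arguments are illustrative, not part of any KDE build system. Application directories missing from a partial checkout are simply skipped, which is the behaviour the proposal relies on from macro_optional_add_subdirectory().

```shell
#!/bin/sh
# Hypothetical sketch of the kdegames-data install step; not real KDE
# build-system code. $1 = checkout root with one directory per application,
# $2 = stand-in for ${DATA_INSTALL_DIR}.
install_all_data() {
    src_root=$1
    data_dir=$2
    for app_src in "$src_root"/*/; do
        # If there are no subdirectories the glob stays literal; skip it.
        [ -d "$app_src" ] || continue
        app=$(basename "$app_src")
        mkdir -p "$data_dir/$app" || return 1
        # Copy this application's data files into its own directory.
        cp -R "$app_src." "$data_dir/$app/" || return 1
        # Empty marker file meaning "data for $app is installed".
        : > "$data_dir/$app/hasdata"
    done
}
```

A partial checkout containing only one application's directory would thus install, and mark, only that application's data.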

My roadmap also includes the following changes to be implemented
before the hard feature freeze (Thu, November 10):

3. Adjust the build system of kdegames to exclude applications from
compilation when required data files are missing. The exclusion can be
overridden by a documented CMake switch. This is consistent with how
other runtime dependencies of kdegames are handled (see
kdegames/kajongg/CMakeLists.txt).
4. Make all applications abort with a visible warning when data files
are missing.

In both cases, "data files are missing" is detected by checking for
the existence of the marker files installed by kdegames-data (see
point 2). If a developer for some reason wants to use the application
without the data files, he can do so by touching the marker file
manually.
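The detection rule in steps 3 and 4 boils down to a single existence test. The shell sketch below models it outside CMake and C++: `has_data` checks the marker, `build_decision` mimics the build-time exclusion with an override (the `force` argument stands in for the documented CMake switch, whose actual name is not specified here), and `launch_check` mimics the runtime abort. All three names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of the marker-file checks. In every function,
# $1 stands in for ${DATA_INSTALL_DIR} and $2 is the application name.

# Succeeds if the marker installed by kdegames-data exists.
has_data() {
    [ -e "$1/$2/hasdata" ]
}

# Build time: print "build" or "skip" for one application.
# $3 = "force" emulates the override switch (real name not specified).
build_decision() {
    if has_data "$1" "$2" || [ "${3:-}" = force ]; then
        echo build
    else
        echo skip
    fi
}

# Run time: refuse to start, with a visible warning, if data is missing.
launch_check() {
    if ! has_data "$1" "$2"; then
        echo "$2: data files not installed (missing $1/$2/hasdata)" >&2
        return 1
    fi
}
```

Creating the marker by hand (touching ${DATA_INSTALL_DIR}/${APP}/hasdata) flips both checks, which is the developer escape hatch described in the paragraph above.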

As stated above, if there are no objections, I would like to
implement this proposal starting at the end of next week (Sat,
October 22), to make sure that the required changes get into 4.8.

Greetings
Stefan

[1]
$ find -type f -regextype posix-egrep -regex \
    '.*\.(wav|ogg|svg|svgz|jpg|png)' | xargs du -hsc | grep total
86M total
$ du -hsc (*~(.svn|.git)) | grep total
113M total
The latter uses zsh extended globbing (setopt extendedglob):
(*~(.svn|.git)) matches everything except .svn or .git.
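For readers without zsh, a rough POSIX-shell equivalent is sketched below. It is an assumption-laden stand-in, not the command used for the figures above: it sums apparent file sizes in bytes (via `wc -c`) rather than disk usage, so its totals will differ somewhat from `du`, and it skips .svn and .git directories at any depth. The function names `media_bytes` and `tree_bytes` are hypothetical.

```shell
#!/bin/sh
# Sum apparent sizes (bytes) of media files under $1, pruning .svn/.git.
media_bytes() {
    find "$1" \( -name .svn -o -name .git \) -prune -o -type f \
        \( -name '*.wav' -o -name '*.ogg' -o -name '*.svg' \
           -o -name '*.svgz' -o -name '*.jpg' -o -name '*.png' \) \
        -exec cat {} + | wc -c | tr -d ' '
}

# Sum apparent sizes (bytes) of all files under $1, pruning .svn/.git.
tree_bytes() {
    find "$1" \( -name .svn -o -name .git \) -prune -o -type f \
        -exec cat {} + | wc -c | tr -d ' '
}
```

Dividing the first result by the second gives the share of the tree taken up by media files, which is the ratio the two figures above are meant to convey.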

Comments

Re: Module layout proposal: Split kdegames-data from kdegames

By Ian Wadham at 10/15/2011 - 18:31

Reading your proposal carefully, I am not sure exactly what it is you are
proposing.

Are you canvassing an immediate move of KDE Games to git in the
next few weeks (i.e. before the hard feature freeze for 4.8)? If so,
I am strongly opposed to that occurring at such an inopportune time.
I am responsible, as author or maintainer, for about one eighth of the
file size in KDE Games and I am currently trying to make a large change
to one of the games, KSudoku, to fix a series of bug reports that are
nearly two years old and render the game virtually useless to serious
Sudoku enthusiasts. I do not wish to change horses in mid-stream.

Or are you proposing an adjustment to the SVN repositories in
preparation for a move to git at some more opportune time, such as
early in the next release cycle? If there is one ... :-) If so, can you do
this without disrupting my work over the next few weeks? I have spent
a couple of months on it already and have no desire to see it go
down the gurgler.

In general, I am opposed to the change to git altogether and I do not
see what the advantages would be in concrete terms at the coding
desk level. I say this as a veteran source code controller, going
back more years than I care to count. The truth is that KDE Games
is more or less static apart from the few of us who are still left:
many games are unmaintained and others have large backlogs of bugs
to be fixed. So why spend our time on a new source code control
system when we do not *have* much time and *it* would not have
much work to do?

Finally, have you fully thought through the implications of the
changeover for all concerned? I witnessed at first hand on
the build system mailing list the chaos that attended the 4.7
release and the howls from distros. I do not want KDE Games
to get any egg on its face over this, either with distros or
with our end-users. Nor do I wish to see any games getting
lost because people do not know where to find them.

All the best, Ian W.

Re: Module layout proposal: Split kdegames-data from kdegames

By Albert Astals Cid at 10/15/2011 - 10:58

On Friday, 14 October 2011, Stefan Majewsky wrote:
As I said before, I disagree with this: it imposes pain on regular
contributors (as now I have to check out two repos and install two instead of
one, and remember where the things I want to commit are) in favour of the
mythical drive-by contributor. And then again, I do not see the benefit for the
drive-by contributor, since obviously he'll still need to check out the data
repository if he wants to make sure the patch he is making works.

Albert

Re: Module layout proposal: Split kdegames-data from kdegames

By Parker Coates at 10/14/2011 - 17:51

On Fri, Oct 14, 2011 at 14:29, Stefan Majewsky wrote:
Before going any further, I just want to confirm that this particular
proposal is stand-alone and independent of the
per-application-vs-monolithic-repository debate. That seems to be the
case, but I want to make sure I'm not missing anything. This
data-files-with-git issue is a particularly unpleasant one that'll
need to be solved either way.

Is there any reason we can't do all this with a kdegames/data
directory instead of a new top level module? Such a directory could
easily be left behind in the git migration, but in the meantime
kdegames/CMakeLists.txt could be modified to build the data directory
first and the current process of building KDEGames would be pretty
much unchanged. The existing per-application build flags could also be
used as is.

Of course there could be reasons for preferring a top level module that
I'm just not seeing.

This seems like a reasonable thing to do regardless of the module
structure or version control system.

Personally I don't feel these advantages are significant enough to
justify doubling the number of packages shipped. Ideally, I would hope
that distributions will be able to work some magic to merge the
application and the data files back together again, to create a single
package per game as they do now. Of course I have no idea if this is
actually doable.

Also, will the move preserve history for the data files? While history
is obviously less essential for data files than it is for source code,
it'd still be nice to have, especially to guide one to where the data
used to live.

No objections from me. Honestly, I don't like this solution; it seems
complicated and awkward. I had really been hoping that a better
solution would be brought forward, but one hasn't, so I see no reason
to delay any further.

Parker

Re: Module layout proposal: Split kdegames-data from kdegames

By Nicol... at 10/15/2011 - 17:58

2011/10/14, Parker Coates <parker. ... at gmail dot com>:
Many distros already split data and code into two binary packages to
save space, as then they need one code package for every architecture
but only one package for data.
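The space saving mentioned here is simple arithmetic: with A architectures and D megabytes of architecture-independent data, bundling ships A copies of the data, whereas a separate data package ships one, so the split saves (A - 1) * D. The hypothetical shell function below only illustrates that formula; real savings depend on compressed package sizes, not on the uncompressed figures quoted in this thread.

```shell
#!/bin/sh
# Illustrative only: archive space saved by shipping one arch-independent
# data package instead of bundling the data into every per-arch package.
# $1 = number of architectures, $2 = data size in MB (integers).
data_savings_mb() {
    echo $(( ($1 - 1) * $2 ))
}
```

For instance, `data_savings_mb 2 86` prints 86: with two (hypothetical) architectures and the 86M media-file figure from the proposal's footnote, the split avoids storing one extra copy of the data.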

As long as the move is done with "svn mv" instead of committing all
the files from scratch, history will be preserved.

Re: Module layout proposal: Split kdegames-data from kdegames

By Michael Pyne at 10/15/2011 - 10:18

On Friday, October 14, 2011 18:51:09 Parker Coates wrote:
This would not really play nicely with kdesrc-build users. They'd have to
remove the old svn source directory anyway, but making an SVN
non-proj.kde.org directory nest under a git proj.kde.org directory would be an
exercise in frustration.

Regards,
- Michael Pyne

Re: Module layout proposal: Split kdegames-data from kdegames

By Alexander Neundorf at 10/14/2011 - 14:30

On Friday 14 October 2011, Stefan Majewsky wrote:
IMO, now that most people have broadband internet access, this shouldn't be
much of a concern (especially if these files change only rarely).

Personally I'd say "who cares". At least I don't check anymore how big a
checkout will be before doing the checkout.

I'm not sure I really like this.
The general trend is to split per-application, so there would be e.g. one
repository for kolf and one for ksquares.
If data/ goes into one separate repository, this will make it harder once the
sources are split into multiple repositories.
Either each small game would depend on all data files, or each small game
would consist of two packages, one for the data and one for the actual
program.
Or will packagers take care of this and create nice packages?

Would it be possible to simply check whether the directory exists?
${DATA_INSTALL_DIR}/${APP}/Data/ or something like this ?

Hmm, not sure.
I'd prefer it to also build when the data files are not present (since
they are not required for building).

This sounds good.

Overall I think it looks like a reasonable plan, if it is really necessary to
keep the binary files out of git (I'd much prefer if they could simply be put
into git).

Alex

Re: Module layout proposal: Split kdegames-data from kdegames

By Stefan Majewsky at 10/14/2011 - 15:29

On Fri, Oct 14, 2011 at 9:30 PM, Alexander Neundorf < ... at kde dot org> wrote:
For all games, the data contributes 400 MB of incompressible history.
(I think that's the number, although it would be nice if someone who
already ran the kdegames svn2git rules could confirm this.)

And no, broadband internet access is not yet ubiquitous. Even here in
Germany, a non-negligible percentage of people are on less than 1
Mbit/s (sometimes much less). Also, in a non-negligible number of cases,
broadband internet access goes over UMTS and is thus limited by volume
plans, so the raw download volume also matters very much in a lot of
cases.

I vote for splitting the data not because I want to save some
bandwidth. It's because we got numerous complaints about the initial
plan to include the data in the Git repos. Among others, I
specifically remember Aaron caring about drive-by contributors.

Yes, the plan is that each game should be split into two packages.
I've consciously included kde-packagers@ for their feedback on this
plan. My humble impression is that package dependency solvers are fast
enough nowadays to handle multiple packages even for small apps.

Each game depends on its data files only. That's why each set of data
(per application) shall create its own marker file.
macro_optional_add_subdirectory ensures that partial SVN checkouts
continue to work for cases where the developer needs a more current
version of his specific data files.

When I checked last time, directories were not cleaned up correctly by
the uninstall target.
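The concern about directory checks can be illustrated with a small simulation. The sketch assumes, as reported here, that uninstalling removes the installed files but leaves their directories behind. `fake_uninstall` is a hypothetical stand-in that deletes the files listed in a manifest (one path per line), loosely modelled on CMake's install_manifest.txt, which records installed files rather than directories. After it runs, a directory-existence check still reports the data as present, while the marker-file check correctly reports it as missing.

```shell
#!/bin/sh
# Hypothetical simulation of an uninstall step that deletes listed files
# only. $1 = manifest file, one installed file path per line.
fake_uninstall() {
    while IFS= read -r path; do
        rm -f "$path"
    done < "$1"
}

# Directory-based rule: an existing data directory counts as "data present".
dir_check() { [ -d "$1/$2" ]; }

# Marker-based rule from the proposal: only the marker file counts.
marker_check() { [ -e "$1/$2/hasdata" ]; }
```

Because the marker is itself one of the installed files, it disappears on uninstall together with the data, whereas the directory may linger.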

It would be possible to build, but it would require a flag. This is
what packagers want to do (since they probably build kdegames-data
separately), but for the random user, I want to make it obvious that
something's wrong by skipping almost all games at compile time.

Greetings
Stefan