Module metadata proposal

Good news, everyone,

the first draft of the module metadata format is now available
for you to comment on. We've decided to go with YAML so it
should be fairly readable. You can view the latest version here:

<a href="" title=""></a>

What it is:
The file defines basic properties of the module such as its
name, version, description, licenses, references to upstream
documentation or its content. Currently only RPM content
is supported but this can be easily extended in the future.
The metadata file is meant as both input and output of the
module build process (don't confuse it with package build
process), with various tools adding various new data to it,
such as vendor and buildsystem identifiers, timestamp of the
build, autogenerated lists of licenses or whatever you can
think of (well, maybe not whatever but close). The output is
then placed in the generated repository, container image or
any other module deliverable and can be processed by tools and
services consuming and delivering modules.
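
To make the "both input and output" idea concrete, here is a minimal Python sketch. It is not the draft format: every field name (including the "build" block), the function annotate_build and the sample values are assumptions made up for illustration. It only shows a build step copying the packager's metadata and adding build-time data to the copy.

```python
import copy
from datetime import datetime, timezone

# Hypothetical input metadata, as a packager might write it.
# Field names are illustrative and NOT taken from the actual draft.
module_input = {
    "name": "example-module",
    "version": "1.0",
    "description": "An example module",
    "licenses": ["MIT"],
    "components": {"rpms": ["foo", "bar"]},
}


def annotate_build(metadata, vendor, buildsystem, built_at):
    """Return a copy of the metadata with build-time data added.

    The input is left untouched, so the same source file can be
    fed to several build tools.
    """
    output = copy.deepcopy(metadata)
    output["build"] = {  # hypothetical block name
        "vendor": vendor,
        "buildsystem": buildsystem,
        "timestamp": built_at.isoformat(),
    }
    return output


module_output = annotate_build(
    module_input,
    vendor="example-vendor",
    buildsystem="example-buildsystem",
    built_at=datetime(2016, 4, 14, 12, 0, tzinfo=timezone.utc),
)
```

The output dictionary is what would end up next to the generated repository or image; the input stays as the packager wrote it.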

What it isn't:
It's not a SPEC file. It doesn't say how to build individual
packages. And it's not a simple comps group either. It can
and does provide a lot more data.

It's not perfect and it's constantly evolving. Please do
comment, ask questions and suggest improvements.



Re: Module metadata proposal

By Colin Walters at 04/15/2016 - 16:17

On Thu, Apr 14, 2016, at 12:35 PM, Petr Šabata wrote:
This all seems really abstract.

How about the first step is - take all of the transitive build dependencies
starting from the kernel, put those in a separate rpm-md repository from
everything else?

That'll drive the answers to lots of other questions, such as how long
we keep synthesizing the "gigantic Everything repo" in addition, versus
changing over the fedora-repos package, and how to manage migrating
things between the two repos when a new build dependency appears.

There are of course a lot more radical steps we could take; to repeat,
I really like the OpenEmbedded code organization, model, toolchain etc.,
but if we can't even manage to split into two repos with any kind of speed,
at least it'll be an informative exercise.
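
The split described above amounts to a transitive closure over build dependencies. A minimal Python sketch, assuming a toy build_requires mapping (the package names and their dependencies are made up, and the kernel itself is counted as part of the first repo):

```python
from collections import deque


def transitive_build_deps(root, build_requires):
    """Breadth-first closure of build dependencies, including the root."""
    seen = {root}
    queue = deque([root])
    while queue:
        pkg = queue.popleft()
        for dep in build_requires.get(pkg, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def split_repos(all_packages, root, build_requires):
    """Split packages into (closure of root, everything else)."""
    everything = set(all_packages)
    core = transitive_build_deps(root, build_requires) & everything
    return sorted(core), sorted(everything - core)


# Toy data, purely illustrative.
build_requires = {
    "kernel": ["gcc", "make"],
    "gcc": ["glibc", "make"],
    "make": ["glibc"],
    "glibc": ["gcc"],  # cycles are handled by the "seen" set
    "firefox": ["gcc", "gtk3"],
    "gtk3": ["glibc"],
}

core_repo, everything_else = split_repos(build_requires, "kernel", build_requires)
```

Rerunning the split after a new build dependency appears shows which packages would have to migrate between the two repositories.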

Re: Module metadata proposal

By Petr Sabata at 04/21/2016 - 15:44

On Fri, Apr 15, 2016 at 05:17:09PM -0400, Colin Walters wrote:
This is a draft of the module definition format. You could
consider it "comps with extra metadata" at this point. In the
future we expect to build modules from various other sources,
not just RPMs, and deliver them in various formats, not just
RPM repositories.

The format is meant to be abstract and hide build/implementation
details from the module packager. One source should be usable
for all these situations/purposes.

I think this is out of scope of this thread.


Re: Module metadata proposal

By Stephen C. Tweedie at 04/15/2016 - 11:19


How easy is it to modify and change field definitions around?  I see
you've got a version identified for the format already --- good, that's
definitely something we want here.

One thing I think we need is a bit more detail in the module
dependencies.  We don't need them all for the initial task of building
a module and testing its repoclosure; but I think we may well have a
need for (for example)

* Package build deps: what other modules you need to compile packages
in this module.  (This basically defines the build root for the
module, and we want to be able to make sure we're using a consistent
build root with consistent compiler versions etc. for all the
packages in a module.)

* Runtime deps: what other modules need to be enabled by the user at
runtime to use this module.  Eg. library dependencies, CLI tool

* -Devel deps: what other modules need to be enabled by the user to
build applications against this module.

We could also expand on the module ID a bit.  Many packaging systems
use a hierarchical naming scheme --- eg. instead of name: foo, name
might be "".  A maintainer name and reference download
URL/homepage could also be useful here.

But the main place I'd like to see expanded is the package list itself.
In addition to the list of packages included, I think we need:

What is the function of the package?  It may be:

* A runtime component which is part of the official API of the module.
We can do things like verify ABI compatibility on these components
on updates if we want.

* A runtime component which is an internal implementation-detail only
(similar to the distinction between unstable, internal and stable,
external symbols in a library.)  A user should know not to rely on
these components remaining the same on module updates.

We should also record which externally-usable package needs this
internal dependency in this case.

* A -devel package: never needed at runtime; only used if a developer
is building an application against the module.

* Debuginfo.  We could choose to keep debuginfo in the module itself,
marked this way; or we could keep separate debuginfo lookasides or
separate debuginfo modules.  Not sure which way we'll eventually go,
but it would be useful to at least be able to mark packages which
are included only for runtime debugging.

For long-term distro maintenance, it will be *hugely* helpful to be
able to look at content and say "why do we have this package?  Does
anyone actually need it, or is it only there to satisfy some dependency
for an application that was added years ago and might not even need it
any longer?"

Making the distinction between external and internal functionality, and
recording explicitly what needs the internal pieces, will really help
that sort of long-term maintenance, making it much easier to see when
dependencies are no longer needed.
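
A minimal Python sketch of the bookkeeping suggested above, under assumptions: the Package class, the role names and the needed_by field are hypothetical and not part of the draft. Each package carries its function plus, for internal packages, the names of the packages that need it, so stale internals can be listed mechanically.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    name: str
    role: str  # hypothetical values: "api", "internal", "devel", "debuginfo"
    needed_by: list = field(default_factory=list)


def unneeded_internal(packages):
    """Names of internal packages none of whose recorded users remain."""
    present = {p.name for p in packages}
    return sorted(
        p.name
        for p in packages
        if p.role == "internal"
        and not any(user in present for user in p.needed_by)
    )


# Toy module content, purely illustrative.
module_packages = [
    Package("httpd", "api"),
    Package("apr", "internal", needed_by=["httpd"]),
    Package("libold", "internal", needed_by=["legacy-app"]),
    Package("httpd-devel", "devel"),
]

stale = unneeded_internal(module_packages)
```

Here libold is flagged because legacy-app, the only package recorded as needing it, is no longer in the module.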

Re: Module metadata proposal

By Petr Sabata at 04/21/2016 - 15:32

On Fri, Apr 15, 2016 at 05:19:04PM +0100, Stephen C. Tweedie wrote:
Should be simple. We just bump the version number if it's a
breaking change. I also maintain a small library that should
provide an abstract API for handling this.
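
As a rough illustration of versioned parsing (this is not the library mentioned above, and the field name format_version, both layouts and the loader functions are assumptions), a reader could dispatch on the format version and hand callers one uniform structure:

```python
def _load_v1(doc):
    # Hypothetical v1 layout: a single "license" string.
    return {"name": doc["name"], "licenses": [doc["license"]]}


def _load_v2(doc):
    # Hypothetical v2 layout: breaking change, "licenses" is now a list.
    return {"name": doc["name"], "licenses": list(doc["licenses"])}


LOADERS = {1: _load_v1, 2: _load_v2}


def load_metadata(doc):
    """Normalize a parsed metadata document, whatever its format version."""
    version = doc.get("format_version")
    loader = LOADERS.get(version)
    if loader is None:
        raise ValueError(f"unsupported metadata format version: {version!r}")
    return loader(doc)


old = load_metadata({"format_version": 1, "name": "foo", "license": "MIT"})
new = load_metadata({"format_version": 2, "name": "foo", "licenses": ["MIT"]})
```

Both calls yield the same normalized dictionary, so tools built on top would not need to care which version of the file they were given.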

A buildrequires field was present in an earlier draft but I
removed it as it wasn't (and still isn't) entirely clear what
it actually means to build a module. I expect to put it back
once this is more clear.

You can define versioned module runtime dependencies in the
requires field.

Currently -devel (and other) subpackages are included if the
fulltree option is set to true. This is the default.

This is an interesting idea. Noted.

At the moment we list the "main" components of the module.
Other packages, such as the related subpackages, source RPMs
or debuginfos are automagically included if the fulltree option
is enabled (again, defaults to true).

Dependencies of the listed components that aren't provided by
any of the required modules are also included in the module if
the dependencies option is enabled (also defaults to true).

I would say the listed components could be considered the
"official API" and the bundled (an ugly word, I know)
dependencies would be the internal implementation detail.
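
A minimal Python sketch of how the two options could interact, under assumptions: the function, its parameters and the toy data are hypothetical, only the option names fulltree and dependencies come from the description above, and whether fulltree also applies to bundled dependencies is left out.

```python
def resolve_content(components, subpackages, runtime_deps,
                    provided_by_required, fulltree=True, dependencies=True):
    """Return (listed, bundled) package sets for a module build."""
    listed = set(components)
    if fulltree:
        # Pull in related subpackages (-devel, debuginfo, ...).
        for comp in components:
            listed.update(subpackages.get(comp, ()))

    bundled = set()
    if dependencies:
        queue = list(listed)
        while queue:
            pkg = queue.pop()
            for dep in runtime_deps.get(pkg, ()):
                # Skip anything already included or provided by a
                # required module.
                if dep in listed or dep in bundled or dep in provided_by_required:
                    continue
                bundled.add(dep)
                queue.append(dep)
    return listed, bundled


# Toy data, purely illustrative.
subpackages = {"httpd": ["httpd-devel", "httpd-debuginfo"]}
runtime_deps = {"httpd": ["apr", "glibc"], "apr": ["expat", "glibc"]}
provided = {"glibc"}

listed, bundled = resolve_content(["httpd"], subpackages, runtime_deps, provided)
```

Because the bundled set is recomputed on every build, a bundled package drops out as soon as no listed component pulls it in.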

You could also place comments in the (source control tracked)
YAML file for extra information. Of course these wouldn't
be normally visible to any processing tools but I don't think
that's important. Correct me if I'm wrong.

These are available via the fulltree feature, without the need
to list them explicitly.

The same as above. We might split fulltree into two or more
options later, if required.

If it's a bundled dependency, an implementation detail, it would
be removed automatically the next time you build the module --
in case none of your main components needs it anymore.

Again, extra info could be added as comments.

Or do you think it'd be useful to have these as separate fields?


Re: Module metadata proposal

By Steve Grubb at 04/21/2016 - 15:57


How does this scheme compare with SWID? All common criteria protection profiles
are calling out for SWID tags. Rather than having to pay for the ISO standard,
NIST has a copy of nearly the same thing here:

<a href="" title=""></a>

The creation of SWID tags is expected to be done as part of the build
process. But there has to be some metadata that gets fed into the build
process to cover things like product name, web site, license, etc.

It would be really good if we can align all of this to support SWID tag


On Thursday, April 21, 2016 04:32:02 PM Petr Šabata wrote:

Re: Module metadata proposal

By Petr Sabata at 04/21/2016 - 16:16

On Thu, Apr 21, 2016 at 04:57:48PM -0400, Steve Grubb wrote:
Thank you, this is great. I'll go through it.


Re: Module metadata proposal

By King InuYasha at 04/21/2016 - 15:37

On Thu, Apr 21, 2016 at 4:32 PM, Petr Šabata < ... at redhat dot com> wrote:
What makes this different from how comps metadata works? I look at
this, and I wonder why we don't just evolve the comps format and
perhaps make it easier to construct comps data. I honestly don't see a
reason to add yet another metadata format when it doesn't seem to make
sense. I understand why you use YAML for input, as that makes it
easier for people to structure the information, but perhaps
dynamically translating that into comps information would allow us to
reuse the infrastructure we already have.

Metadata proliferation is evil, please don't contribute to it unnecessarily.

Re: Module metadata proposal

By Petr Sabata at 04/21/2016 - 16:12

On Thu, Apr 21, 2016 at 04:37:35PM -0400, Neal Gompa wrote:
Everyone involved in the initial discussion was against XML and
YAML won over JSON for its readability and features (such as,
well, comments).

This is a working definition that will definitely change over
time. If it proves useful, we might as well extend the comps
format and the tooling could translate into it on the fly.
This isn't definitive.