DevHeads.net

Fedora 31 System-Wide Change proposal: Modify Fedora 31 to use CgroupsV2 by default

https://fedoraproject.org/wiki/Changes/CGroupsV2

== Summary ==
The kernel has supported CgroupsV2 for some time, and yet it has seen
little use because it is not on by default. It has many new features
and fixes over CgroupsV1, and it is time to expose them to the
user community.

== Owner ==
* Name: [[User:dwalsh| Daniel J Walsh]]
* Email: < ... at redhat dot com>

== Detailed Description ==
Enabling CgroupsV2 by default will allow tools like systemd,
container tools and libvirt to take advantage of its new features and
many fixes over CgroupsV1. A lot of the functionality in CgroupsV1
has been rewritten to fix fundamental flaws in its design.

The reason CgroupsV2 by default has been blocked is that the container
tools and some of the virtualization tools did not have support. We
believe that the time is right to try to move these tools along to
take advantage of this kernel feature. In order to begin testing these
features more widely we believe we need to have a platform like
Rawhide to test on and get others to test as well.

The main feature of CgroupsV2 we would like to take advantage of in
the container world is delegation of cgroup hierarchies. Allowing
tools like podman to use cgroups in rootless mode would be
a large advance.

== Benefit to Fedora ==
Fedora is known for being a leading platform for the enablement of new
kernel functions, and this would continue its legacy. The world will
eventually move to CGroupsV2 and Fedora should lead the way.

== Scope ==
* Proposal owners:
The largest change required is to get container
runtimes like runc to work with CgroupsV2. After runc has support
for CgroupsV2, we need to move container engines like Podman, CRI-O,
Buildah and Moby to support CgroupsV2.

* Other developers:
We need to find other tools that have built the CGroupsV1 API into
themselves and get them to support CGroupsV2.

Known packages:

- libvirt: The team is already working on this.

- JVM: Uses the cgroup file system to check the memory allocated to
the JVM; it will have to use and understand the CgroupsV2 mechanism to
discover these settings.

- Snap package does not run with CGroupV2:
https://bugzilla.redhat.com/show_bug.cgi?id=1438079

- Systemd will need to be modified to set the new default to cgroupv2

* Release engineering: [https://pagure.io/releng/issue/8509 #8509]
* Policies and guidelines:
* Trademark approval: N/A (not needed for this Change)

== Upgrade/compatibility impact ==
Any tools or scripts that an administrator uses to manually configure
CgroupsV1 will have to be modified for CgroupsV2. Luckily, if these
tools use systemd interfaces, they should not require
changes.

== How To Test ==
Make sure different tools that use cgroups continue to work when
booted into the new system. Make sure containers, virtual machines
and the Java Virtual Machine still work properly. Convert the VMs of
the container tools like CRI-O, Buildah and Podman to run on Rawhide and
make sure their test suites pass completely. We will request that the
libvirt and JVM teams similarly change their test platforms.

== User Experience ==
We believe that at this point there will be little or no change in user
experience, unless the user is an administrator looking to modify the
system cgroups using the cgroupfs.

One potential problem will be container images that expect to be
running in a CgroupV1 environment. Some container engines leak the
cgroup hierarchy into containers so that tools within the container
can, for example, look at how much memory the cgroup gives them. These
tools might break with the change, but they should be adjusted
over time, and I don't really see a way to avoid this.
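As a rough illustration of the kind of in-container probing described above, here is a minimal Python sketch of a tool that reads its memory limit from either hierarchy. The function name `memory_limit_bytes` and the idea of passing in the cgroup directory are assumptions made for this example. The file names `memory.max` (v2) and `memory.limit_in_bytes` (v1) are the kernel's interface files. A tool that only knows the v1 file name is the kind that would break.

```python
from pathlib import Path


def memory_limit_bytes(cgroup_dir):
    """Return the memory limit in bytes for a cgroup directory.

    Returns None when no limit is set (v2 reports "max") or when
    neither interface file is present. Hypothetical helper, not taken
    from any existing tool.
    """
    d = Path(cgroup_dir)
    # Try the cgroup v2 file first, then fall back to the v1 file.
    for name in ("memory.max", "memory.limit_in_bytes"):
        f = d / name
        if f.is_file():
            text = f.read_text().strip()
            if text == "max":  # v2 spelling of "no limit"
                return None
            # Note: v1 reports "no limit" as a very large number,
            # which is returned unchanged here.
            return int(text)
    return None
```

A caller would pass the directory of its own cgroup, for example `memory_limit_bytes("/sys/fs/cgroup")` inside a container whose engine exposes the hierarchy there.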

== Dependencies ==
Currently there are no known changes to the package requirements for
this change.

== Contingency Plan ==
* Contingency mechanism: If the container tools and virtualization
tools do not work and do not look like they will be ready by
beta freeze, then we revert to CgroupsV1 and try again in Fedora 32.
* Contingency deadline: Beta Freeze
* Blocks release? Yes

Comments

Re: Fedora 31 System-Wide Change proposal: Modify Fedora 31 to u

By Zbigniew Jędrzejewski-Szmek at 07/04/2019 - 05:21

On Wed, Jul 03, 2019 at 04:23:24PM -0400, Ben Cotton wrote:
Actually it's enough to set 'systemd.unified-cgroup-hierarchy' on the
kernel command line to test. I think this should be mentioned, so
people can test already in F29 or F30 or rawhide before the default is
changed.
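
For anyone testing this way, a quick check that the machine really booted into the unified hierarchy may be useful. The following is a minimal Python sketch, not an official tool: the function name `cgroup_mode` and its return strings are made up for this example. It relies on the fact that the root of a cgroup v2 mount contains a `cgroup.controllers` file, which a v1 or hybrid setup does not have at `/sys/fs/cgroup`.

```python
from pathlib import Path


def cgroup_mode(root="/sys/fs/cgroup"):
    """Classify the cgroup setup mounted at `root`.

    Returns "unified" for a cgroup v2 mount, "legacy-or-hybrid" when
    the directory exists but is not a v2 root, and "unknown" when the
    directory is missing. Names and return values are illustrative.
    """
    root = Path(root)
    if not root.is_dir():
        return "unknown"
    # cgroup.controllers is present at the root of a v2 hierarchy.
    if (root / "cgroup.controllers").is_file():
        return "unified"
    return "legacy-or-hybrid"
```

Calling `cgroup_mode()` with no argument inspects the usual mount point.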

We know that cgroupv2 already (and for a long time...) works better
than v1, so I'd rather make the switch unconditional, using the usual
phrasing of "In the unlikely case catastrophic problems are discovered
with v2, the default will be reverted to v1.".

Reverting the change is not the only possibility. We could simply
say that people who run software incompatible with v2 should modify
their kernel command line to override the default (systemd.unified_cgroup_hierarchy=0).
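
To see whether such an override is in effect, the kernel command line can be inspected. This is a small Python sketch under stated assumptions: the helper name `unified_hierarchy_override` is hypothetical, and the parsing rules (dash and underscore treated alike, a bare switch counting as true, the last occurrence winning) reflect my understanding of how systemd reads this option rather than a copy of its code.

```python
TRUE_WORDS = {"1", "yes", "y", "true", "t", "on"}
FALSE_WORDS = {"0", "no", "n", "false", "f", "off"}


def unified_hierarchy_override(cmdline):
    """Return True or False if the command line sets the switch, else None.

    `cmdline` is the text of the kernel command line, e.g. the
    contents of /proc/cmdline. Hypothetical helper for illustration.
    """
    key = "systemd.unified_cgroup_hierarchy"
    result = None
    for word in cmdline.split():
        name, sep, value = word.partition("=")
        # Assumption: '-' and '_' are interchangeable in the key.
        if name.replace("-", "_") != key:
            continue
        if not sep:
            result = True  # assumption: bare switch means "on"
        elif value.lower() in TRUE_WORDS:
            result = True
        elif value.lower() in FALSE_WORDS:
            result = False
    return result
```

On a running system this could be called as `unified_hierarchy_override(open("/proc/cmdline").read())`.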

I'm sure a majority of machines do not run any containers, and they
will benefit from the change, and only the minority who are using
lagging software will need to adjust.

E.g. the complete lack of activity on https://bugs.launchpad.net/snapd/+bug/1678342
and https://bugs.launchpad.net/snapd/+bug/1801664 suggests that
snapd will be caught dead in the water when F31 is released.
But IMHO it's likely that they will not be ready when F32 is released
either. I think we simply need to accept that not everything will be
ready at any given point, and select users will always have to undo the
switch locally.

Zbyszek

Re: Fedora 31 System-Wide Change proposal: Modify Fedora 31 to u

By Daniel J Walsh at 07/08/2019 - 09:50

On 7/4/19 5:21 AM, Zbigniew Jędrzejewski-Szmek wrote:
There has not been much progress on runc development for this, which
might be a blocker.

In the Podman/Buildah world, we have support for crun, an alternate OCI
Runtime replacement for runc.  crun supports cgroupsv2.

There has been little movement in Kubernetes and OpenShift for adding
this support, but there has also been little incentive, since no OS has
moved to it.

Can systemd turn on cgroupsv2 by default in Rawhide, to see what
complaints come in?

Re: Fedora 31 System-Wide Change proposal: Modify Fedora 31 to u

By King InuYasha at 07/08/2019 - 11:00

On Mon, Jul 8, 2019 at 10:39 AM Daniel Walsh < ... at redhat dot com> wrote:
I would be shocked if Fedora having it switched on would matter. We
don't have a recent version of OpenShift or Kubernetes in our
repositories...

Re: Fedora 31 System-Wide Change proposal: Modify Fedora 31 to u

By Daniel J Walsh at 07/08/2019 - 15:00

On 7/8/19 11:00 AM, Neal Gompa wrote:

Re: Fedora 31 System-Wide Change proposal: Modify Fedora 31 to u

By Florian Weimer at 07/04/2019 - 08:42

* Zbigniew Jędrzejewski-Szmek:

Do you know if cgroupsv2 makes process termination reporting less or more
exact? It's this bug:

<https://bugzilla.kernel.org/show_bug.cgi?id=154011>

I think in the past, cgroups changes have occasionally made this issue
significantly worse, to the extent that it was no longer possible to run
tests reliably that spawn and join many threads (but never running more
than a reasonable count in parallel).

Thanks,
Florian

Re: Fedora 31 System-Wide Change proposal: Modify Fedora 31 to u

By Zbigniew Jędrzejewski-Szmek at 07/04/2019 - 09:20

On Thu, Jul 04, 2019 at 02:42:15PM +0200, Florian Weimer wrote:
No idea if v2 affects this specific issue with RLIMIT_NPROC and
threads in any way. But on a related matter, v2 improves the
reporting of empty cgroups, which makes process termination detection
much easier. This fixes some corner cases and bugs with service termination
in systemd.
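
To make the empty-cgroup reporting concrete: in v2 each non-root cgroup has a `cgroup.events` file with a `populated` key that is 1 while the cgroup or any descendant contains live processes and 0 otherwise. Below is a minimal Python sketch of reading that flag. The function name `is_populated` is invented for this example, and this is not how systemd itself is implemented.

```python
def is_populated(events_text):
    """Parse the text of a cgroup.events file and report the populated flag.

    Returns True if the cgroup (or a descendant) still has processes.
    Raises ValueError if the text has no "populated" line.
    """
    for line in events_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "populated":
            return value.strip() == "1"
    raise ValueError("no 'populated' key found")
```

A caller would read the file of the cgroup it manages and pass the contents in; a real service manager would wait for change notifications on the file rather than polling it.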

Zbyszek

Re: Fedora 31 System-Wide Change proposal: Modify Fedora 31 to u

By Tomasz Torcz at 07/03/2019 - 16:34

On Wed, Jul 03, 2019 at 04:23:24PM -0400, Ben Cotton wrote:
What about Kubernetes? Will it work?
(Also upgrade would be nice, we ship 1.13 and latest is 1.15)