Xen / EC2 release criteria proposal

Hey folks! I'm starting a new thread for this to trim the recipient
list a bit and include devel@ and coreos@.

The Story So Far: there is a Fedora release criterion which requires
Fedora to boot on Xen:

"The release must boot successfully as Xen DomU with releases providing
a functional, supported Xen Dom0 and widely used cloud providers
utilizing Xen."

We (QA group) had a discussion about removing this criterion entirely.
That has now morphed into the idea that we should tweak it to be
focused exclusively on the "widely used cloud providers utilizing
Xen"...by which we mean EC2. At the time this criterion was invented,
all EC2 instance types used Xen; now, some still use Xen, and some use
KVM.

So it seems like this would also be a good opportunity to revisit and
nail down more specifically exactly what our cloud requirements are.
bcotton suggested that we require two sample instance types to be
tested, c5.large (KVM) and t3.large (Xen). (I've also mailed Thomas
Cameron, ex-of Red Hat, now of Amazon, for his opinion, as it seemed
like it might be worthwhile - he's promised to get back to me).

So, for now, let me propose this as a trial balloon: we rewrite the
above criterion to say:

"Release-blocking cloud disk images must be published to Amazon EC2 as
AMIs, and these must boot successfully and meet other relevant release
criteria on c5.large and t3.large instance types."

Notes:

1. The test matrix template has an OpenStack column, but we never
actually covered OpenStack in the release criteria. I think we should
continue to leave it out of the criteria and also remove the column
from the matrix. In the past we tested OpenStack boot in the infra
OpenStack, but that has gone away or is going away...that means a) we
can't test on OpenStack so easily any more and b) a lot of the reason
to bother testing on OpenStack is gone. This is up for debate, but if
we believe it's important to test on OpenStack and block on working in
that environment we need to have a reliable way to *do* that.

2. The test matrix template also has a 'Local' column which is for
testing locally with testcloud, but I don't think we need to
specifically require that to work in the criteria. It's more of a
testing convenience thing, so even if no-one tests on EC2 we at least
test that the image boots in testcloud; testcloud isn't an environment
you'd actually use to do anything practical in.

3. I believe this wording is generic enough to cover us if we, e.g.,
want to start blocking on CoreOS images. All we have to do is declare
that some CoreOS image is 'release-blocking', and it's instantly
covered by this requirement.
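
To make the proposed wording concrete, here is a minimal sketch of the
check it implies: every release-blocking image has to pass on every
required instance type, and an untested combination counts as not
passing. The function names, the result layout and the image names are
illustrative assumptions, not existing Fedora QA tooling; only the two
instance types come from the proposal above.

```python
# Hypothetical sketch of the proposed criterion as a pass/fail check.
# Only the two instance types are taken from the proposal; everything
# else (names, data layout) is made up for illustration.
REQUIRED_INSTANCE_TYPES = ("c5.large", "t3.large")


def unmet_requirements(blocking_images, boot_results):
    """Return the (image, instance_type) pairs that have not passed.

    blocking_images: iterable of image names declared release-blocking.
    boot_results: dict mapping (image, instance_type) -> True/False.
    A missing entry is treated as a failure: untested is not passing.
    """
    return [
        (image, itype)
        for image in blocking_images
        for itype in REQUIRED_INSTANCE_TYPES
        if not boot_results.get((image, itype), False)
    ]


def criterion_met(blocking_images, boot_results):
    """True only if every blocking image passed on every required type."""
    return not unmet_requirements(blocking_images, boot_results)


# Example data (image names are placeholders).
results = {
    ("Fedora-Cloud-Base", "c5.large"): True,
    ("Fedora-Cloud-Base", "t3.large"): True,
}
print(criterion_met(["Fedora-Cloud-Base"], results))
print(unmet_requirements(["Fedora-Cloud-Base", "Fedora-CoreOS"], results))
```

Declaring a second image release-blocking only means adding it to the
list; the same instance types then apply to it with no change to the
check itself, which is the point of note 3.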

Comments

Re: Xen / EC2 release criteria proposal

By Dusty Mabe at 08/11/2019 - 12:14

On 8/9/19 8:56 PM, Adam Williamson wrote:
Hey Adam!

Sounds good to me if we trim it down to a few instance types that we think
will cover Xen and KVM based booting in EC2. For Fedora CoreOS we'll be doing
some automated testing in EC2. I don't know if we have a certain set of instance
types we'll be using for that, but the information that Matt provided should
help us decide.

Dusty

Re: Xen / EC2 release criteria proposal

By W. Michael Petullo at 08/10/2019 - 09:24

I am a long time Xen/Fedora user. In fact, I rely on Fedora as my Dom0. I
acknowledge that there are not too many of us, and I further acknowledge
that mandatory testing often goes unperformed. The Fedora Xen mailing
list is exceedingly low-volume.

Michael Young has in the past done a lot of the heavy lifting surrounding
Xen support, and I am very grateful for his work.

Xen Dom0 is particularly tenuous. Dropping (for a good reason) GRUB's
multiboot2 module left Xen unable to boot under EFI [e.g., 1].

All of that said, there are good reasons to choose Xen over KVM. Xen's
architecture and full support for libvmi come to mind. (Of course,
there are good reasons to choose KVM too.)

Perhaps we could go one more release with the status quo to give the
Xen/Fedora community a last chance to rally and demonstrate a willingness
to perform the necessary testing and maintenance. I suspect we are
all quite busy, so we might find ourselves admitting that broader Xen
support will be relegated to the standard means of maintenance rather
than rising to the status of blockers.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1703872

Re: Xen / EC2 release criteria proposal

By Laura Abbott at 08/10/2019 - 01:31

On 8/10/19 2:56 AM, Adam Williamson wrote:
Thanks,
Laura

P.S. For those who might be interested in keeping this working in
the kernel, testing is good but bisecting and identifying fixes
to bring in is much more valuable simply because it's what's missing
at the moment.

Re: Xen / EC2 release criteria proposal

By Nico Kadel-Garcia at 08/09/2019 - 23:53

On Fri, Aug 9, 2019 at 8:57 PM Adam Williamson
< ... at fedoraproject dot org> wrote:
How difficult is this to accommodate? Amazon Dom0... well, they've got
their own developers tweaking their own kernel, both for their
hypervisors and for Amazon Linux, and they do seem to absorb leading
edge kernel technologies. It's the rest of us, using the other
technologies such as Xen, from the CentOS community, KVM from the Red
Hat commercial community, VirtualBox and VMware guests, that I think
are more likely to run into difficulties.

Commercially, and for developers, they're all still in use. As a
DevOps person, I can appreciate that testing resources are limited.

The *tiny* instances are still often used for test environments.

As a Fedora user, and a cloud user, this makes complete sense. It gets
very expensive in money and manpower to test *everything*, especially
if you're at the bleeding edge of software development. Well defined
test criteria are our friend.

Re: Xen / EC2 release criteria proposal

By Adam Williamson at 08/10/2019 - 01:00

On Sat, 2019-08-10 at 00:53 -0400, Nico Kadel-Garcia wrote:
So there are two factors behind the idea of dropping straight-up Xen
domU support:

1. It just doesn't get tested. It's been in the criteria for years and
we've had various promises from various folks to test it, but...it just
doesn't happen. Each cycle we end up scrambling to have someone test it
in a hurry a week before release. Once again after we sent out this
proposal people have promised to test it, but...honestly, after the
last two go-rounds I'm finding it harder to believe in that.

2. What's the justification for it? Xen isn't our supported virt stack,
that's KVM. It is also just not that popularly used by Fedora users in
my experience. People ask about running Fedora on VMware, VirtualBox
and Parallels a lot, and we don't block on those. Xen doesn't often
come up, yet we block on it.

KVM we already block on, as it's Fedora's supported virt stack. And
yeah, we've never blocked on VirtualBox or VMware even though they're
widely used. So just blocking on Xen seems a little arbitrary.

Sure, but we can't practically commit to testing every instance type
(there's a ton). The aim is, pick a reasonable sample that will give us
pretty good confidence that the others will work too. If 'large' works,
is 'tiny' likely to not work? And vice versa? This is definitely
something we still need to nail down, the types suggested so far are
just bcotton's proposal. Perhaps it would make sense to go for smaller
types rather than larger ones, as you can envisage a scenario where the
larger types are fine but smaller ones have issues due to resource
problems or something...of course, we shouldn't pick a type with fewer
hardware resources than we actually intend to support...

Thanks for the feedback!