
Re: [CentOS] Btrfs going forward, was: Errors on an SSD drive

Changing the subject since this is rather Btrfs specific now.

On Fri, Aug 11, 2017 at 5:41 AM, hw <hw@gc-24.de> wrote:
Yes.

The block layer has no faulty-device handling, i.e. it just reports whatever problems the device or the controller report, whereas md/mdadm and LVM have implemented policies for ejecting (setting to faulty) a block device. Btrfs does not do that; it just keeps trying to use a faulty device.

So you have to set up something that monitors for physical device errors, Btrfs errors, or both, depending on what you want.
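
For example, a cron-driven check of the per-device error counters that "btrfs device stats" reports could be as simple as the sketch below; the mount point is a placeholder, and the "alerting" is just whatever cron does with output on a non-zero exit:

    #!/usr/bin/env python3
    # Minimal sketch: flag any non-zero Btrfs per-device error counter.
    import subprocess
    import sys

    MOUNTPOINT = "/data"   # placeholder: wherever the Btrfs filesystem is mounted

    out = subprocess.run(
        ["btrfs", "device", "stats", MOUNTPOINT],
        capture_output=True, text=True, check=True,
    ).stdout

    # Each line looks like "[/dev/sdb1].read_io_errs   0"; any counter
    # that is not zero means the device has reported problems.
    errors = [line for line in out.splitlines()
              if line.strip() and line.split()[-1] != "0"]

    if errors:
        print("Btrfs errors on %s:" % MOUNTPOINT, file=sys.stderr)
        print("\n".join(errors), file=sys.stderr)
        sys.exit(1)   # non-zero exit so cron mails the output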

There are 1,500 to 3,000 lines of changes to the Btrfs code per kernel release, which is too much to backport most of it. Serious fixes do get backported by upstream to the longterm kernels, but to know to what degree, you have to check the upstream changelogs.

And right now most backports go only to 4.4 and 4.9. I can't tell you what kernel-3.10.0-514.10.2.el7.x86_64.rpm translates into; as near as I can tell that requires a secret decoder ring, since it's a kernel built from multiple branches plus a bunch of separate patches.

Red Hat are working on a new user space wrapper and volume format
based on md, device mapper, LVM, and XFS.
<a href="http://stratis-storage.github.io/" title="http://stratis-storage.github.io/">http://stratis-storage.github.io/</a>
<a href="https://stratis-storage.github.io/StratisSoftwareDesign.pdf" title="https://stratis-storage.github.io/StratisSoftwareDesign.pdf">https://stratis-storage.github.io/StratisSoftwareDesign.pdf</a>

It's an aggressive development schedule, and since so much of it is journaling and CoW based, I have no way to assess whether it will end up with its own set of problems, not dissimilar to Btrfs. We'll just have to see. But if there are underlying guts in device-mapper that do things better/faster/easier than Btrfs, the Btrfs devs have said they can hook into device-mapper for those things to consolidate the code base, in particular for the multiple-device handling. By its own vague timetable it will be years before it has "rough ZFS features", and, also estimating bootloader support and the degree to which other distros pick up on it, it could very well end up being widely adopted, or it could be a Red Hat only thing in practice.

Canonical appears to be charging ahead with OpenZFS included by default, out of the box (although not for rootfs yet, I guess), and that has an open-ended and possibly long window before the legal issues get tested. But this is by far the most cross-platform solution: FreeBSD, Illumos, Linux, macOS. And ZoL has RHEL/CentOS-specific packages.

But I can't tell you for sure what ZoL's faulty-device behavior is either: whether it ejects faulty or flaky devices, and when, or whether, like Btrfs, it just tolerates them.

The elrepo.org folks can still sanely set CONFIG_BTRFS_FS=m, but I suspect that if RHEL unsets it in the RHEL 8 kernels, CentOS will do the same.

Comments

Re: Btrfs going forward, was: Errors on an SSD drive

By Warren Young at 08/11/2017 - 14:12

On Aug 11, 2017, at 11:00 AM, Chris Murphy < ... at colorremedies dot com> wrote:
That is one of the open questions about Stratis: should its stratisd act in the place of smartd?

Vote and comment on its GitHub issue here:

<a href="https://github.com/stratis-storage/stratisd/issues/72" title="https://github.com/stratis-storage/stratisd/issues/72">https://github.com/stratis-storage/stratisd/issues/72</a>

I’m in favor of it. The daemon has to be there anyway; it makes sense to push SMART failure indicators up through the block layer into the volume-manager layer so it can react intelligently to the failure. And FreeBSD’s ZFS is getting such a daemon soon, so we want one, too:

<a href="https://www.phoronix.com/scan.php?page=news_item&amp;px=ZFSD-For-FreeBSD" title="https://www.phoronix.com/scan.php?page=news_item&amp;px=ZFSD-For-FreeBSD">https://www.phoronix.com/scan.php?page=news_item&amp;px=ZFSD-For-FreeBSD</a>

I rather doubt btrfs will be compiled out of the kernel in EL8, and even if it is, it’ll probably be in the CentOSPlus kernels.

What you *won’t* get from Red Hat is the ability to install EL8 onto a btrfs volume from within Anaconda; the btrfs tools won’t be installed by default; and if you have a Red Hat subscription, they won’t be all that willing to help you with btrfs-related problems.

But will you be able to install EL8 onto an existing XFS-formatted boot volume and mount your old btrfs data volume? I guess “yes.”

I suspect you’ll even be able to manually create new btrfs data volumes in EL8.

openSUSE defaults to btrfs on root, though XFS on /home for some reason:

<a href="https://goo.gl/Hiuzbu" title="https://goo.gl/Hiuzbu">https://goo.gl/Hiuzbu</a>

Stratis: https://stratis-storage.github.io/StratisSoftwareDesign.pdf

The main downside to Stratis I see is that it looks like 1.0 is scheduled to coincide with RHEL 8, based on the release dates of RHELs past. That means it won’t have any kind of redundant storage option to begin with, not even RAID-1, the only meaningful RAID level when it comes to comparing against btrfs.

The claim is that “enterprise” users don’t want software RAID anyway, so they don’t need to provide it in whatever version of Stratis ships with EL 8. I think my reply to that holds true for many of us CentOS users:

<a href="https://github.com/stratis-storage/stratis-docs/issues/54" title="https://github.com/stratis-storage/stratis-docs/issues/54">https://github.com/stratis-storage/stratis-docs/issues/54</a>

Ah well, my company has historically been skipping even-numbered RHEL releases anyway due to lack of compelling reasons to migrate from the prior odd-numbered release still being supported. Maybe Stratis will be ready for prime time by the time EL9 ships.

The Red Hat/Fedora developers are well aware that they started out ~7 years behind when they pushed btrfs forward as a “technology preview” with RHEL 6, and are now more like 12 years behind the ZFS world after waiting in vain for btrfs to catch up.

Basically, Stratis is their plan to catch up on the cheap, building atop existing, tested infrastructure already in Linux.

My biggest worry is that because it’s not integrated top-to-bottom like ZFS is, they’ll miss out on some of the key advantages you have with ZFS.

I’m all for making the current near-manual LVM2 + MD + DM + XFS lash-up more integrated and automated, even if it’s just a pretty face in front of those same components. The question is how well that interface mimics the end-user experience of ZFS, which in my mind still provides the best CLI experience, even if you compare only the features they share. btrfs’ tools are close, but I guess the correct command much more often with ZFS’ tools.

The latter is an explicit goal of the Stratis project. They know that filesystem maintenance is not a daily task for most of us, so we tend to forget commands we haven’t used in months. It is a major feature of a filesystem to have commands you can guess correctly based on fuzzy memories of having used them once, months ago.

Correct. ZFS-on-root-on-Ubuntu is still an unholy mess:

<a href="https://github.com/zfsonlinux/zfs/wiki/Ubuntu" title="https://github.com/zfsonlinux/zfs/wiki/Ubuntu">https://github.com/zfsonlinux/zfs/wiki/Ubuntu</a>

Lacking something like zfsd, I’d guess it just tolerates it, and that you need to pair it with smartd to have notification of failing devices. You could script that to have automatic spare replacement.

Or, port FreeBSD’s zfsd over.
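
Roughly, the scripted version I have in mind is sketched below; the pool name, the spare device, and the failure detection are all placeholders, and this is nowhere near what zfsd actually does:

    #!/usr/bin/env python3
    # Crude stand-in for zfsd: if the pool is not healthy, swap in a spare.
    # Pool name and spare device are placeholders; real logic would have to
    # be much more careful about which vdev failed and why.
    import subprocess

    POOL = "tank"        # placeholder pool name
    SPARE = "/dev/sdx"   # placeholder spare device

    # "zpool status -x" prints "pool 'tank' is healthy" when nothing is wrong.
    status = subprocess.run(["zpool", "status", "-x", POOL],
                            capture_output=True, text=True).stdout

    if "is healthy" not in status:
        # Find the first device zpool reports as FAULTED or UNAVAIL.
        failed = None
        for line in status.splitlines():
            cols = line.split()
            if len(cols) >= 2 and cols[1] in ("FAULTED", "UNAVAIL"):
                failed = cols[0]
                break
        if failed:
            subprocess.run(["zpool", "replace", POOL, failed, SPARE], check=True)
            print("replaced %s with %s in pool %s" % (failed, SPARE, POOL))

Pair that with smartd’s alerting and you get something vaguely zfsd-shaped, minus all the cases zfsd actually handles.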

Re: Btrfs going forward, was: Errors on an SSD drive

By hw at 08/11/2017 - 14:39

Can I use that now?

Redundancy is required.

How do you install on an XFS that is adjusted to the stripe size and the number of
units when using hardware RAID? I tried that, without success.
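
Roughly, I mean the kind of mkfs.xfs invocation sketched below, with su/sw matched to the array geometry; the numbers and the device are only an example:

    #!/usr/bin/env python3
    # Sketch: build an mkfs.xfs command aligned to a hardware RAID stripe.
    # The geometry (64 KiB stripe unit, 6 data disks) and the target device
    # are made-up example values.
    import subprocess

    device = "/dev/sdb1"   # placeholder target device
    stripe_unit_kib = 64   # per-disk chunk size configured on the controller
    data_disks = 6         # e.g. an 8-disk RAID-6 set has 6 data disks

    cmd = ["mkfs.xfs",
           "-d", "su=%dk,sw=%d" % (stripe_unit_kib, data_disks),
           device]
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)   # destructive; uncomment to actually format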

What if you want to use SSDs to install the system on? That usually puts hardware RAID out of the question.

That leaves them unable to overcome the disadvantages of hardware RAID. I don't want the performance penalty MD brings, even as a home user. The same goes for ZFS. I can't tell yet how the penalty looks with btrfs, only that I haven't noticed any yet.

And that brings back the question of why nobody makes a hardware ZFS controller. Enterprise users would probably love that, provided that the performance issues could be resolved.

I'm more for getting rid of it. Just try to copy an LV into another VG, especially when the VG resides on different devices. Or try to make a snapshot in another VG because the devices the source of the snapshot resides on don't have enough free space.

LVM lacks so much flexibility that it is more a cumbersome burden than anything else. I lost a whole VM when I tried to copy it, thanks to LVM. It was so complicated that the LV somehow vanished, and I still don't know what happened. No more LVM.

Re: Btrfs going forward, was: Errors on an SSD drive

By hw at 08/11/2017 - 13:37

Chris Murphy wrote:
I want to know when a drive has failed. How can I monitor that? I've begun to use btrfs only recently.

So these kernels are a mess. What's the point of backports when they aren't done correctly?

This puts a big "stay away" stamp on RHEL/CentOS.

So in another 15 or 20 years, some kind of RH file system might become
usable.

I'd say they need to wake up, because the need for the features provided by ZFS and btrfs has existed for years. Even their current XFS implementation is flawed, because there is no way to install onto an XFS that is adjusted to the hardware RAID volume it is created on, the way it is supposed to be.

That can be an advantage.

What is the state of ZFS for CentOS? I'm going to need it because I have data on some disks that were used for ZFS and now need to be read by a machine running CentOS.

Does it require a particular kernel version?

You can monitor the disks and see when one has failed.

Sanely? With the kernel being such a mess?

Re: Btrfs going forward, was: Errors on an SSD drive

By Mark Haney at 08/11/2017 - 13:17

On Fri, Aug 11, 2017 at 1:00 PM, Chris Murphy < ... at colorremedies dot com>
wrote:

As for a hardware problem, the drives were ones purchased in Lenovo professional workstation laptops, and while you do get lemons occasionally, I tried 4 different ones of the exact same model and had the exact same issues. It's highly unlikely I'd get 4 of the same brand with hardware issues. Once I went back to ext4 on those systems, I could run the devil out of them and not see any freezes, even under heavy load, nor any other hardware-related problems. In fact, the one I used at my last job was given to me on my way out and is now being used by my daughter. It's been upgraded from Fedora 23 to 26 without a hitch. On ext4. Say what you want, BTRFS is a very bad filesystem in my experience.

Re: Btrfs going forward, was: Errors on an SSD drive

By hw at 08/11/2017 - 13:52

Mark Haney wrote:
What's the alternative? Hardware RAID with SSDs not particularly designed for this application is a bad idea. Software RAID with mdadm is a bad idea because it comes with quite a performance loss. ZFS is troublesome because it's not as well integrated as we could wish for, booting from a ZFS volume gives you even more trouble, and it is rather noticeable that ZFS wasn't designed with performance in mind.

That doesn't even mention features like checksumming, deduplication, compression, and the creation of subvolumes (or their equivalent). It also doesn't mention that LVM is a catastrophe.

I could use hardware RAID, but neither XFS nor ext4 offers the required features.

So what should I use instead of btrfs or ZFS? I went with btrfs because it's less troublesome than ZFS and provides features for which I don't know any good alternative. So far it's working fine, but I'd rather switch now than experience disaster.

Re: Btrfs going forward, was: Errors on an SSD drive

By Warren Young at 08/11/2017 - 14:29

On Aug 11, 2017, at 11:52 AM, hw <hw@gc-24.de> wrote:
That sounds like outdated information, from the time before CPUs were fast enough to do parity RAID calculations with insignificant overhead.

Those are both solvable problems, involving fewer resources than Fedora/Red Hat are throwing at Stratis. Therefore, we can infer that they don’t *want* to solve those problems.

You don’t get the vastly superior filesystem durability ZFS offers without a performance hit. Any competing filesystem that comes along and offers the same features will have the same hit.

If you want burnin’ speed at all costs, use ext4.

That is intended to be in Stratis, but not until 3.0, which is not yet even scheduled.

This is part of what I meant by my speculation in a prior post that Stratis won’t be ready for prime time until EL9. Plan accordingly.

Also Stratis 3.0.

That should be possible with the earliest testable versions of Stratis, as LVM2 provides this today:

<a href="https://goo.gl/2U4Uio" title="https://goo.gl/2U4Uio">https://goo.gl/2U4Uio</a>

I will grant that it’s an utter mess to manage by hand with the current tools. Fixing that is one of the primary goals of the Stratis project.

Complaining about it on the CentOS list is not the right way to fix it. If you want Stratis to not suck, get involved now: the first releases of it are targeted for Fedora 28.

There’s also the GitHub issue tracker. Let them know what you want Stratis to be; the Stratis developers are not likely to schedule features “properly” if they misunderstand which ones matter most to you.

My sense is that the Linux-first hardware RAID card market is shrinking and maybe even dying.

With 10-dumb-SATA motherboards readily available, I think it’s time for hardware RAID to die in Linux, too, if only in the workstation and low-end server segment, where 10-ish drives is enough.

One of your options is to take advantage of the fact that CentOS major releases overlap in support: EL7 will still be supported when EL9 comes out. All of this should be greatly clarified by then.

Re: Btrfs going forward, was: Errors on an SSD drive

By Gordon Messmer at 08/11/2017 - 14:21

On 08/11/2017 10:52 AM, hw wrote:

That's not usually the case in my experience. Battery-backed write
caches make benchmarks like bonnie++ look amazing, but in real workloads
I typically see better performance from md RAID.