F20 Self Contained Change: Snapshot and Rollback Tool

= Proposed Self Contained Change: Snapshot and Rollback Tool =

Change owner(s): Stephen Gallagher < ... at redhat dot com>, Colin Walters
< ... at redhat dot com>

With the advent of thinly-provisioned LVM pools, it has become possible for us
to implement full-system LVM snapshotting for recording rollback points. We
are planning to support this for yum updates and eventually fedup upgrades
going forward. This change request notes the addition of new tools provided
by the roller-derby project to present an interface and a CLI for managing and
initiating rollbacks.

== Detailed description ==
The roller-derby project will be providing a library and a CLI for creating,
labeling and managing LVM snapshots (plus non-LVM backups of /boot), oriented
primarily towards rpm-managed data, but useful beyond that. The yum plugin
"yum-plugin-fs-snapshot" will be updated to consume this library and save the
system state in a compatible format. The roller-derby CLI tool will provide an
interactive and scriptable interface for manipulating these snapshots and
determining when to remove older ones. It will also allow the tagging of
snapshots as "known-good", to be skipped when automatically trimming for
space. The roller-derby project will likely provide a small daemon to keep
track of the available space in the LVM pool to proactively clean up snapshots
before the system runs out of space.
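
One way to picture the automatic trimming described above is the following minimal Python sketch. It is not roller-derby's actual code: the `Snapshot` record, its fields, and `select_for_trim` are hypothetical names chosen for illustration. It assumes the oldest untagged snapshots go first and that removing a snapshot frees roughly its reported size, which is only an approximation on a thin pool where blocks are shared.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Snapshot:
    name: str                 # hypothetical label, e.g. "pre-update"
    created: float            # creation time, seconds since the epoch
    size_bytes: int           # pool space attributed to this snapshot (assumed)
    known_good: bool = False  # tagged snapshots are never auto-trimmed


def select_for_trim(snapshots: List[Snapshot], bytes_needed: int) -> List[Snapshot]:
    """Pick the oldest untagged snapshots until enough space would be freed."""
    victims: List[Snapshot] = []
    freed = 0
    for snap in sorted(snapshots, key=lambda s: s.created):
        if freed >= bytes_needed:
            break
        if snap.known_good:
            continue  # "known-good" snapshots are skipped, as in the proposal
        victims.append(snap)
        freed += snap.size_bytes
    return victims
```

A monitoring daemon could call such a function whenever free space in the pool drops below a threshold and then remove the returned snapshots.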

In order to prevent "loss" of data when rebooting into a snapshot, the
roller-derby CLI will allow saving a snapshot of the current state before
rolling back and will provide tools to allow mounting of that current state to
recover changes that have occurred since the rollback point.

== Scope ==
The scope of this project is the completion of the initial release of the
roller-derby project and the inclusion of thinly-provisioned LVM as an option
in the Anaconda installer [1].

Proposal owners: We need to complete the roller-derby project. Other than the
Anaconda change referenced above, all dependencies are available in Fedora.

Other developers: OS Installer Support for LVM Thin Provisioning
Release engineering: N/A (not a System Wide Change)
Policies and guidelines: N/A (not a System Wide Change)

[1] <a href="" title=""></a>


Re: F20 Self Contained Change: Snapshot and Rollback Tool

By orion at 03/06/2014 - 19:27

On 07/17/2013 04:39 AM, Jaroslav Reznik wrote:

Is this project dead? I'm casting about for tools to manage LVM snapshots and
roller-derby sounded promising. Any other tools out there?

Re: F20 Self Contained Change: Snapshot and Rollback Tool

By Chris Murphy at 03/06/2014 - 19:52

There's a recent snapshot/rollback thread on the desktop list that relates to this. LVM thin provisioning support is in the Fedora 20 installer (it fails to produce a bootable system, the post-install fix for which is listed in Fedora 20 common bugs).

However, if OSTree is used, then there isn't a hard requirement on either LVM Thin Provisioning or Btrfs.

Chris Murphy

Re: F20 Self Contained Change: Snapshot and Rollback Tool

By Josh Boyer at 03/07/2014 - 09:31

On Thu, Mar 6, 2014 at 6:52 PM, Chris Murphy < ... at colorremedies dot com> wrote:
I don't think that's accurate. OSTree doesn't touch /home from what I
remember. It is only concerned with /usr and to as minimal a degree
as possible /etc. People likely still want snapshot and rollback for
their actual _data_ as well.


Re: F20 Self Contained Change: Snapshot and Rollback Tool

By Chris Murphy at 03/07/2014 - 12:10

Which part? OSTree doesn't require either LVM thinp or Btrfs. It works on plain ext4 or XFS.

Orion didn't mention /home, and Roller Derby doesn't directly address it either. Both yum-plugin-fs-snapshot and snapper can, but snapshots coincide with system updates. More useful is a regularly timed snapshot of the user's home, e.g. hourly with age-based clean-up.

Chris Murphy

Re: F20 Self Contained Change: Snapshot and Rollback Tool

By orion at 03/07/2014 - 13:04

On 03/07/2014 09:10 AM, Chris Murphy wrote:
I'm actually not that interested in tying in with yum updates etc. I'm just
looking for a tool that might help with managing LVM snapshots in general -
and specifically for managing snapshots of VMs. Something I could perhaps tell
to take a snapshot every X hours and keep the latest Y snapshots.
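
The "every X hours, keep the latest Y" policy is small enough to sketch in Python. This is only an illustration of the scheduling and retention logic, not an existing tool: `snapshot_due` and `split_by_retention` are hypothetical helper names, and actually creating or removing the snapshots is left to whatever backend is in use.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def snapshot_due(last: datetime, now: datetime, every_hours: float) -> bool:
    """True once at least `every_hours` have passed since the last snapshot."""
    return now - last >= timedelta(hours=every_hours)


def split_by_retention(
    timestamps: List[datetime], keep: int
) -> Tuple[List[datetime], List[datetime]]:
    """Return (kept, expired): the `keep` newest timestamps and all the rest."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    ordered = sorted(timestamps, reverse=True)  # newest first
    return ordered[:keep], ordered[keep:]
```

A cron job or timer could call `snapshot_due` on each run and, after taking a snapshot, remove everything in the `expired` list.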

Re: F20 Self Contained Change: Snapshot and Rollback Tool

By Chris Murphy at 03/07/2014 - 13:41

I don't think Roller Derby applies here. Virt-manager and virsh support VM snapshots.

You could schedule snapshots with a script using virsh. But I don't know that it will create LVM snapshots, and even if it did you wouldn't want it to because they're slow. There soon will be LVM thinp support in libvirt but I don't think it's there yet. Instead, use qcow2 files for this. In my "Fedora 20 installation as a benchmarking tool" tests, the fastest installs I got were Btrfs in the guest, writing into a qcow2 file with xattr +C on a Btrfs host, with the unsafe cache setting. Even plain ext4 on an LV wasn't faster.

Chris Murphy

Re: F20 Self Contained Change: Snapshot and Rollback Tool

By Colin Walters at 03/07/2014 - 10:30

On Fri, Mar 7, 2014 at 8:31 AM, Josh Boyer < ... at fedoraproject dot org>
Many competently-maintained systems already have backup solutions for
data though. In fact, Anaconda defaults to having it on a separate
partition in some configurations precisely so that one can just blow
away the root partition and preserve /home.

Also, remember that on an OSTree system, /home is just a symlink to
/var/home - *all* local mutable state lives in /var.

Re: F20 Self Contained Change: Snapshot and Rollback Tool

By Miloslav ... at 03/07/2014 - 09:51

2014-03-07 14:31 GMT+01:00 Josh Boyer < ... at fedoraproject dot org>:

(Choosing a random point in the conversation...)

I'm starting to think that snapshots are *never* the right tool, at best a
local optimization:

- For the OS and application code and static data: What we really want
is the ability to reinstall/redeploy this data if it became lost or
corrupted. We don't really want point-in-time snapshots; snapshots are
only a local optimization allowing us to "redeploy the version that has
been installed yesterday". An ideal technology would allow "instant"
deployment of both old and new versions (redeploying an old version and
deploying a new version have structurally the same effect on a filesystem),
then snapshots wouldn't be needed.
- For users' data: What we really want is backups--definitely on a
different disk, ideally off-site. An ideal technology would allow
continuous replication of the data elsewhere. Snapshots are at best a way
to quickly access a backup from the past hour, but are not at all a
replacement for a backup.
- For configuration: What we really want is a VCS, dealing with
changesets, documenting who has changed what, when and why. Snapshots are
a really poor VCS.

Obviously we don't have all that technology that we "really want", or at
least not in a way that is ready to deploy, but we kind of have snapshots.
Let's just not think that snapshots are "right".

Re: F20 Self Contained Change: Snapshot and Rollback Tool

By Chris Murphy at 03/07/2014 - 13:32

If Fedora is to be more stable/production oriented than previous Fedoras, then the problem Roller Derby is attempting to solve also changes. I think OSTree may eventually address the OS/application coherent updates problem better than the far less granular snapshot strategies to date, but it remains to be seen if we're going to have the same problem or concerns that initiate the desire for rollbacks in the first place.

If all we're looking to do in the near term is make yum/dnf and Gnome offline updates safer, that could happen relatively quickly with existing tools. But it would require a hard dependency on either LVM thinp or Btrfs snapshots, and changes to perform the update in a chroot on the snapshots rather than the active tree. But that's still significantly easier than maintaining dozens or hundreds of snapshots which both yum-plugin-fs-snapshot and snapper do.

Windows and OS X don't do atomic updates either. Windows essentially becomes unusable as updates are applied. OS X application updates require the application to be quit first, which it'll offer to do and then relaunch after the update; while system updates are applied only after user logout, and then the system reboots. But both their "OS trees" (system binaries minus apps) are static. They're essentially identical on every deployment. So they have a known initial quantity and quality, being updated. So they don't have nearly as much failsafe testing of the actual update process because of this. The updates themselves just don't fail. Therefore a rollback is a reinstallation. They don't even keep the old kernel around when it's updated, while our GRUB menu entry for falling back to the prior kernel is a kind of rollback.

So it sounds to me like OSTree could enable maybe a dozen common trees (rather than almost infinite today). Since they're common, they're also relatively stable, aided by the fact their start and end states during the course of an update are known. But multiple trees are also more flexible than the Windows/OS X paradigm where they have basically one tree, the only variation of which is its version as provided by those companies.

To do VCS correctly requires application opt-in, and an API to manage it. How do I get revision control with file formats that don't support it like RTF, txt, PNG, TIFF, etc? Well it's non-trivial because when I share the document via email or copy it to a thumb drive, it necessarily contains only one version: current. Yet when it's backed up and restored, versions must be present? So where and how are the versions stored, and how does the user interact with the application in an intuitive manner?

One possible idea:
<a href="" title=""></a>

This isn't using fs snapshots at all.

Chris Murphy

Re: F20 Self Contained Change: Snapshot and Rollback Tool

By Miloslav ... at 03/07/2014 - 13:43

2014-03-07 18:32 GMT+01:00 Chris Murphy < ... at colorremedies dot com>:

Yes, ideally; but actually it would still be useful for every
fedora-server-role-manage subcommand to:
1) check if /etc has changed at all since the last committed state; if so,
(git commit -m 'Unknown unmanaged changes between $timestamp and
$timestamp'), and optionally alert an admin
2) perform the primary role of the subcommand as intended
3) (git commit -m 'performed $command for $user'), unless the admin has
explicitly disabled autocommit. (Asking for a rationale in the change log
should be an option but probably not default.)
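
The three steps above could be wrapped around any command roughly as in the following Python sketch. It is an illustration only, not part of fedora-server-role-manage: `run_checked`, `commit_if_changed`, and `run_tracked` are hypothetical names, the commit messages are simplified (no timestamps), and it assumes the directory (for example /etc) is already a git repository with a committer identity configured. The admin alert in step 1 is left out.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[List[str]], str]


def run_checked(argv: List[str]) -> str:
    """Default runner: execute argv, raise on failure, return its stdout."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def commit_if_changed(repo: str, message: str, run: Runner = run_checked) -> bool:
    """Commit everything in `repo` if `git status` reports any change."""
    if not run(["git", "-C", repo, "status", "--porcelain"]).strip():
        return False  # working tree is clean, nothing to record
    run(["git", "-C", repo, "add", "-A"])
    run(["git", "-C", repo, "commit", "-m", message])
    return True


def run_tracked(
    repo: str,
    command: Sequence[str],
    user: str,
    autocommit: bool = True,
    run: Runner = run_checked,
) -> None:
    # 1) record changes made outside the tool since the last committed state
    commit_if_changed(repo, "Unknown unmanaged changes since last commit", run)
    # 2) perform the command itself
    run(list(command))
    # 3) record what the command changed, unless autocommit is disabled
    if autocommit:
        commit_if_changed(repo, f"performed {' '.join(command)} for {user}", run)
```

The `run` parameter exists only so the sequence of git calls can be exercised without touching a real repository.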

We could then expand this from fedora-server-role-manage to other tools.

The way we already do this with git: git gives you full revision control,
and we lack the tools to show the differences in a nice UI or to do a
reasonable three-way merge. But note that revision control is still quite
useful in this scenario: it gives you the ability to go back in time, to
track responsibility and rationale for changes, and if you really need to,
you can compare the content manually.

Re: F20 Self Contained Change: Snapshot and Rollback Tool

By Reindl Harald at 03/07/2014 - 09:41

On 07.03.2014 14:31, Josh Boyer wrote:
I don't think people would *really* want to restore a snapshot of /usr
without /var/lib/rpm if they knew what that means in the end.

Re: F20 Self Contained Change: Snapshot and Rollback Tool

By Colin Walters at 03/07/2014 - 10:26

On Fri, Mar 7, 2014 at 8:41 AM, Reindl Harald <h. ... at thelounge dot net>
On an rpm-ostree system, /var/lib/rpm is a symlink to /usr/share/rpm.

And that's only because I wanted to avoid depending on a small patch
to rpm to have it look in /usr/share/rpm automatically if /var/lib/rpm
doesn't exist.

If you follow this, you'll realize this also means it's immutable - rpm
is not involved in the upgrade process. When you download a new tree,
you also download an entire new copy of the rpmdb. And yes, it's an
efficiency hit. On the other hand, every upgrade is atomic.