LVM snapshots considered evil

Running out of space in an LVM snapshot is disastrous.

Scenario

  • Xen server running 3 virtual machines, each with some variant of Ubuntu
  • Those 3 virtual machines need upgrading

Plan

repeat for each virtual machine

  • snapshot root-volume of virtual machine (ignore that it is mounted; important data resides on separate partitions anyway)
  • chroot into snapshot
  • execute dist-upgrade in chroot
  • reboot virtual machine with snapshot as root
  • deal with fallout from upgrade
  • if satisfied, merge snapshot back
  • reboot again, with the original volume as root

or so I thought.
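
For reference, the plan above can be sketched roughly as the following shell functions. This is only a sketch under assumptions: the volume group "vg0", the logical volume "vm1-root", the snapshot size of 5G and the mount point are hypothetical names that do not come from the setup described here. By default the script only prints the commands (DRY_RUN=1). Switching the Xen configuration to boot from the snapshot, and bind-mounting /proc, /sys and /dev into the chroot, are left out.

```shell
#!/bin/sh
# Sketch of the snapshot-upgrade plan. All names below are hypothetical
# placeholders; adjust them to the real volume group and volumes.
VG=${VG:-vg0}
LV=${LV:-vm1-root}
SNAP=${SNAP:-${LV}-upgrade}
SNAP_SIZE=${SNAP_SIZE:-5G}     # copy-on-write space reserved for the snapshot
MNT=${MNT:-/mnt/$SNAP}
DRY_RUN=${DRY_RUN:-1}          # 1 = only print the commands, do not run them

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Snapshot the root volume, mount it and run the upgrade in a chroot.
upgrade_in_snapshot() {
    run lvcreate --snapshot --size "$SNAP_SIZE" --name "$SNAP" "/dev/$VG/$LV"
    run mkdir -p "$MNT"
    run mount "/dev/$VG/$SNAP" "$MNT"
    run chroot "$MNT" apt-get update
    run chroot "$MNT" apt-get dist-upgrade
    run umount "$MNT"
}

# Once satisfied, fold the snapshot back into the original volume.
merge_snapshot() {
    run lvconvert --merge "/dev/$VG/$SNAP"
}
```

Calling upgrade_in_snapshot and later merge_snapshot with DRY_RUN=0 (as root) would run the commands for real. Note that lvconvert --merge only starts merging right away if neither the origin nor the snapshot is open; otherwise the merge is deferred until the next activation, which is why the final reboot is part of the plan.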

Problems with this approach

  • dist-upgrade sees running services in the Xen DomU and tries to restart them --> better to run the upgrade in its own instance
  • I eventually ran out of snapshot space, which proved fatal. The filesystem in the snapshot was mounted, and when the underlying block device (= the snapshot) suddenly became invalid, the upgrade bombed out with I/O errors. Once full, a snapshot is invalid, and there is no way to go from an invalid snapshot back to a valid one!
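
One way to avoid hitting the limit is to watch the fill level of the snapshot and grow it before it is full. The following is a minimal sketch, not something that was used here: the function names, the 70% threshold and the 1G growth step are assumptions, and the snapshot name in the usage line is hypothetical. It relies on the data_percent field of lvs.

```shell
#!/bin/sh
# needs_extend USED_PERCENT THRESHOLD
# Succeeds (exit status 0) if USED_PERCENT >= THRESHOLD, compared numerically.
needs_extend() {
    awk -v used="$1" -v limit="$2" 'BEGIN { exit !(used + 0 >= limit + 0) }'
}

# guard_snapshot VG/SNAPSHOT
# Grow the snapshot if its copy-on-write space is filling up.
# THRESHOLD and GROW_BY are assumed defaults, not recommendations.
guard_snapshot() {
    snap=$1
    # lvs prints the fill level as a padded number such as "  12.34"
    used=$(LC_ALL=C lvs --noheadings -o data_percent "$snap" | tr -d ' ')
    if needs_extend "$used" "${THRESHOLD:-70}"; then
        lvextend --size "+${GROW_BY:-1G}" "$snap"
    fi
}
```

For example, guard_snapshot vg0/vm1-root-upgrade could be run from cron (as root) while the upgrade is in progress. LVM can also do this by itself via the snapshot_autoextend_threshold and snapshot_autoextend_percent settings in the activation section of lvm.conf, provided the snapshot is being monitored by dmeventd. Either way this only helps as long as the volume group still has free extents.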

In the end, the upgrade did work in the snapshot, given enough time and space.
Stuff like this should be much easier with btrfs (if and when it works, that is).