
VDO killed my server

Folks

I was impressed with the description of VDO (Virtual Data Optimizer) in
the Red Hat documentation, so much so that I tried to use it. The
tutorials led me to a few commands. I built a VDO device on top of two
USB disks which I had combined into a single logical volume, and I was
ready to go.
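
For the record, the setup was roughly the following (the device,
volume group, and volume names here are illustrative, not my actual
ones):

    # Pool the two USB disks into one logical volume
    pvcreate /dev/sdb /dev/sdc
    vgcreate vg_usb /dev/sdb /dev/sdc
    lvcreate -l 100%FREE -n lv_data vg_usb

    # Create the VDO volume on the LV, then put a filesystem on it
    # (-K skips the discard pass, as the Red Hat docs suggest for VDO)
    vdo create --name=vdo1 --device=/dev/vg_usb/lv_data --vdoLogicalSize=10T
    mkfs.xfs -K /dev/mapper/vdo1
    mount /dev/mapper/vdo1 /mnt/vdo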

In my test case, I had a file set of about 600 GB, and the logical
volume spanning the two disks offered 5 TB of space. So, I thought,
let's see if I can activate deduplication and compression, and see
whether VDO can take two, three, or four identical copies of that file
set at different points in the file system tree.
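
For what it's worth, the vdo manager can toggle both features on an
existing volume; the commands are along these lines (assuming the
volume is named vdo1, as in the sketch above):

    vdo enableDeduplication --name=vdo1
    vdo enableCompression --name=vdo1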

Needless to say, all worked well with the first set. It took 24 hours
to copy. The second set took another 24 hours, and all seemed well. As
I was copying the third set, I started to observe problems. The
computer was serving other functions (an internal dhcpd, DNS, and an
internal httpd), and these started to fail. There were no obvious
alerts or warnings from VDO, but the other functions of the system
started to die. The diagnostics from journalctl were vague (failure to
create a file...), but when I went looking with 'df', all the file
systems seemed to have enough room for everything. Even 'top' showed
free space in the memory pools it reports.
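
In hindsight, the place I should probably have looked is 'vdostats',
which reports the physical usage of the VDO volume rather than the
thin-provisioned logical view that 'df' sees (the volume name here is
again illustrative):

    # Physical usage of the VDO volume; 'df' only sees the logical view
    vdostats --human-readable /dev/mapper/vdo1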

After hours of complaints from my internal clients, I finally removed
the /etc/fstab entry that mounted the VDO volume, killed the file
copies, and rebooted. The system then resumed normal, healthy
function, but without the VDO files.

To my mind, there are a few points:

- If VDO is competing for a finite resource (memory?), it should
start posting warnings, and eventually reject new files, when the
pool is nearly full. Or it could draw on a pool other than the one
the other services use, so as to minimize the impact on them. (A
crude check along these lines is sketched after this list.)
- The documentation talks about 'tuning', but if this resource is a
real concern, please don't bury it in the footnotes to the appendix.
- Using VDO on top of LVM seems to be the logical way to apply
deduplication to a large file set, yet the documented use cases don't
seem to cover this (unless I misread them).
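
Until VDO posts such warnings itself, something like this in cron
would at least have told me the pool was filling (a rough sketch; the
device name and the 90% threshold are arbitrary choices of mine):

    # Complain when the VDO volume passes 90% physical use
    vdostats /dev/mapper/vdo1 | awk 'NR > 1 { gsub(/%/, "", $5);
        if ($5 + 0 > 90) print "VDO volume at " $5 "% physical use" }'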

I have reluctantly reverted to using ZFS for this function.
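
For comparison, the ZFS equivalent is only a few commands (the pool
and dataset names are examples):

    # Dedup and compression are per-dataset properties in ZFS
    zpool create tank /dev/sdb /dev/sdc
    zfs create tank/filesets
    zfs set dedup=on tank/filesets
    zfs set compression=lz4 tank/filesets

ZFS deduplication has its own appetite for memory, but at least
'zpool list' reports the pool's real physical usage.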

Have others had issues with VDO?

David Kurn
Linux amateur

Comments

Re: VDO killed my server

By Yan Li at 09/03/2018 - 14:25

Interesting observation! I'm thinking about trying VDO too.

USB connections are notoriously flimsy. They are prone to randomly
dropping operations and to silent, intermittent connection breaks, and
they are known to cause a lot of hard-to-debug problems when used with
more complex filesystems such as ZFS and btrfs. Of course we shouldn't
blame USB for every problem, but I wouldn't be surprised if USB is
playing naughty here.

Did these failures to create a file occur only on the file system on
VDO, or also on other file systems?

How about free memory? What did `free -m` say?

Definitely.

I agree. Tuning should only affect performance, never normal functionality.