Log-dedicated loop device throughput and time overhead on btrfs 4.x


This is again about real-world numbers (which I like so much for being authentic ;-). The context is the throughput and time overhead of a loop device, used as a poor or late man’s replacement for a real disk partition, yes I know, on the copy-on-write filesystem btrfs, dedicated exclusively to logging, that is, appending over and over. Why? Because the loop device may overflow with data without affecting the underlying filesystem; see Btrfs subvolume quota still in its infancy with btrfs version 4.2.2 for more whys and for what I tried in order to get a btrfs subvolume with quota to work. By the way, on that subject, see debian.org Btrfs for a down-to-earth assessment of btrfs to date, which even recommends a minimum version (4.4) to start from for anything near production. Anyway, what follows adapts a test setup from Performance of loopback filesystems, prime credits go there, and expands the layout somewhat for the btrfs C (nodatacow) flag. Here we go.

Have this baseline test, if you like, on the raw iron. Well, it’s not raw iron really, it’s VMware with tons of storage below, the shown performance is terrible, I know, and I only take one test set of bs/count, repeating the dd call five times, but that won’t matter. It’s something to start off from.

mkdir /tmp/loop0
dd if=/dev/zero bs=1M of=/tmp/loop0/file oflag=sync count=1000
1048576000 bytes (1.0 GB) copied, 8.9605 s, 117 MB/s
1048576000 bytes (1.0 GB) copied, 6.52867 s, 161 MB/s
1048576000 bytes (1.0 GB) copied, 5.35716 s, 196 MB/s
1048576000 bytes (1.0 GB) copied, 5.48745 s, 191 MB/s
1048576000 bytes (1.0 GB) copied, 5.14736 s, 204 MB/s

Next, set up the loop device and perform the same test set. Notice that I use truncate to create the backing file for the loop device, which saves time and storage space because the file is sparse, as seen with df (in contrast to ls); a quick check follows the setup below. The resulting numbers hover around half the throughput and double the time, but this is expected behaviour: a loop device is another layer below the files and its driver needs to do its work.

truncate /tmp/loop1 -s 10G
mkfs -t ext4 /tmp/loop1
mkdir /tmp/loop2
mount -t ext4 -o loop /tmp/loop1 /tmp/loop2
chown $USER /tmp/loop2/
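
To illustrate the sparse allocation just mentioned, compare the apparent size of the backing file with the blocks actually in use; something along these lines, where the concrete numbers will of course differ on your system:

ls -lh /tmp/loop1    # apparent size: the full 10G
du -h /tmp/loop1     # blocks actually allocated: only what mkfs wrote so far
df -h /tmp           # free space on /tmp barely changes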

dd if=/dev/zero bs=1M of=/tmp/loop2/file oflag=sync count=1000
1048576000 bytes (1.0 GB) copied, 10.2384 s, 102 MB/s
1048576000 bytes (1.0 GB) copied, 9.4476 s, 111 MB/s
1048576000 bytes (1.0 GB) copied, 12.9802 s, 80.8 MB/s
1048576000 bytes (1.0 GB) copied, 9.85787 s, 106 MB/s
1048576000 bytes (1.0 GB) copied, 9.92259 s, 106 MB/s

With btrfs as the native filesystem, copy-on-write comes into play. In order to raise performance, at the expense of integrity of course, the docs recommend setting the C (nodatacow) flag on directories or files. Note that for directories, newly created files within the directory will inherit the flag, while for existing files the flag only takes effect on empty, 0-byte files. I simply set the stage for both preconditions to be sure (a quick lsattr check follows the setup below). Now the numbers come out slightly better, but they do not really approach the baseline values.

mkdir /tmp/loop3
chattr +C /tmp/loop3
touch /tmp/loop3/loop4
chattr +C /tmp/loop3/loop4
truncate /tmp/loop3/loop4 -s 10G
mkfs -t ext4 /tmp/loop3/loop4
mkdir /tmp/loop5
mount -t ext4 -o loop /tmp/loop3/loop4 /tmp/loop5
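
Just to be sure the C flag actually stuck on both the directory and the (then still empty) backing file, lsattr can verify it; the attribute listing should contain C for both:

lsattr -d /tmp/loop3
lsattr /tmp/loop3/loop4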

dd if=/dev/zero bs=1M of=/tmp/loop5/file oflag=sync count=1000
1048576000 bytes (1.0 GB) copied, 9.83415 s, 107 MB/s
1048576000 bytes (1.0 GB) copied, 8.80162 s, 119 MB/s
1048576000 bytes (1.0 GB) copied, 8.6191 s, 122 MB/s
1048576000 bytes (1.0 GB) copied, 9.54655 s, 110 MB/s
1048576000 bytes (1.0 GB) copied, 9.09187 s, 115 MB/s

What next? Living with that? Two thirds of the original performance? But wait: this loop device will be exclusively dedicated to logging, that is, appending over and over, nothing else. Is that another load characteristic that might mitigate the performance penalty? I also suppose that writing will not sync each and every append, as simulated with oflag=sync above. So why not set up another test case like this: first let dd do its job without syncing explicitly, then synchronize to the filesystem and additionally measure the real, user and sys time (see What do real user and sys mean in the output of time) of the whole operation. The resulting real time is what to compare. Obviously, dd performs really fast on the native filesystem (again: /tmp/loop0/file), but the final sync is expensive in terms of real time (I/O waits, I suppose). On the other hand, dd on the loop device (again: /tmp/loop5/file) appears to be much slower, yet the real time is just about the same, with a lot of the I/O work carried out along the way in sys time.

# native then loop
time (dd if=/dev/zero bs=1M of=/tmp/loop0/file count=1000; sync)
1048576000 bytes (1.0 GB) copied, 0.827664 s, 1.3 GB/s
real    0m4.363s
user    0m0.002s
sys     0m1.318s
time (dd if=/dev/zero bs=1M of=/tmp/loop5/file count=1000; sync)
1048576000 bytes (1.0 GB) copied, 4.30142 s, 244 MB/s
real    0m4.724s
user    0m0.001s
sys     0m2.119s

# native then loop
time (dd if=/dev/zero bs=1M of=/tmp/loop0/file count=1000; sync)
1048576000 bytes (1.0 GB) copied, 0.940642 s, 1.1 GB/s
real    0m4.466s
user    0m0.007s
sys     0m1.418s
time (dd if=/dev/zero bs=1M of=/tmp/loop5/file count=1000; sync)
1048576000 bytes (1.0 GB) copied, 4.05028 s, 259 MB/s
real    0m4.616s
user    0m0.004s
sys     0m2.140s
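
As a cross-check, dd can fold the final flush into its own timing via conv=fsync, which should report an effective rate comparable to the time (dd; sync) construct above. I did not record numbers for this variant, so take it merely as a sketch of the same idea:

# same test, but dd itself waits for the data to hit the disk before reporting (not measured here)
dd if=/dev/zero bs=1M of=/tmp/loop0/file conv=fsync count=1000
dd if=/dev/zero bs=1M of=/tmp/loop5/file conv=fsync count=1000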

On this note, let’s have a test case where a log file append is simulated and the real, user and sys time is measured for different numbers of append calls and different (real-world) log line lengths. (This test has even been executed in parallel on the files, taking write concurrency onto the file and the storage into account, see Asynchronous block loop I/O, without much deviation from the following results.) Interestingly, the loop device in the end even seemed to perform better than the native one; I can’t quite believe that for real, but I don’t have the time to run a sufficient number of tests, so who knows.

# native then loop
time (for i in `seq 1 10000`; do echo "this is my incredible log message from outer space" >> /tmp/loop0/file; done)
real    0m0.307s
user    0m0.211s
sys     0m0.096s
time (for i in `seq 1 10000`; do echo "this is my incredible log message from outer space" >> /tmp/loop5/file; done)
real    0m0.394s
user    0m0.269s
sys     0m0.125s

# native then loop
time (for i in `seq 1 100000`; do echo "this is my incredible log message from outer space, i post it here and there and everywhere" >> /tmp/loop0/file; done)
real    0m4.303s
user    0m3.242s
sys     0m1.062s
time (for i in `seq 1 100000`; do echo "this is my incredible log message from outer space, i post it here and there and everywhere" >> /tmp/loop5/file; done)
real    0m4.338s
user    0m3.287s
sys     0m1.052s

# native then loop
time (for i in `seq 1 100000`; do echo "this is my incredible log message from outer space, i post it here and there and everywhere - to repeat - this is my incredible log message from outer space, i post it here and there and everywhere - to repeat - this is my incredible log message from outer space, i post it here and there and everywhere - to repeat - this is my incredible log message from outer space, i post it here and there and everywhere - to repeat - this is my incredible log message from outer space, i post it here and there and everywhere - to repeat - this is my incredible log message from outer space, i post it here and there and everywhere" >> /tmp/loop0/file; done)
real    0m13.086s
user    0m11.851s
sys     0m1.227s
time (for i in `seq 1 100000`; do echo "this is my incredible log message from outer space, i post it here and there and everywhere - to repeat - this is my incredible log message from outer space, i post it here and there and everywhere - to repeat - this is my incredible log message from outer space, i post it here and there and everywhere - to repeat - this is my incredible log message from outer space, i post it here and there and everywhere - to repeat - this is my incredible log message from outer space, i post it here and there and everywhere - to repeat - this is my incredible log message from outer space, i post it here and there and everywhere" >> /tmp/loop5/file; done)
real    0m12.723s
user    0m11.597s
sys     0m1.123s
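
One caveat about the append loops above: they write through the page cache only, so the measured times do not include the eventual writeback. To account for that, the trailing sync from the dd test could be reused; a variant I did not benchmark separately, just to sketch the idea:

# append loop with a final flush, analogous to the dd/sync test above (not measured here)
time (for i in `seq 1 10000`; do echo "this is my incredible log message from outer space" >> /tmp/loop0/file; done; sync)
time (for i in `seq 1 10000`; do echo "this is my incredible log message from outer space" >> /tmp/loop5/file; done; sync)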

I know about the current discussions concerning loop device performance, see Friends Don’t Let Friends Run Docker on Loopback in Production, but I think that a loop device is nevertheless a reasonable option if you just want to put a size cage around a couple of logging directories or files without having the option to add a new disk or the wish to employ user-based quotas.
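
If such a cage is to survive reboots, the backing file would of course live somewhere persistent rather than under /tmp, and the loop mount would go into /etc/fstab. A hypothetical line, with image path and mount point as mere placeholders, could look like this:

# /etc/fstab - hypothetical example, adjust image path and mount point to your layout
/var/local/applog.img  /var/log/myapp  ext4  loop,defaults  0  0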

Have fun, Peter
