Monday, February 17, 2014

ZFS on read-only devices

In my last post I described my experiments to find the best way to archive ROM sets with the biggest space savings.
In a nutshell, ZFS came out as the best value for the money, except for its RAM usage.
My idea after those tests was to check whether it is possible to mount a ZFS pool recorded on a read-only device, like a CD/DVD/BD.
Why these?
They're cheap, small and usually more durable (it depends a lot on disc quality, but most failures affect individual bits rather than the whole disc, and those can be handled with error-recovery algorithms).
But ZFS was not designed with these discs in mind, so some magic was expected.
All of this was done on Linux; it should be doable on BSD, Solaris and Mac OS X with few differences.
One note up front: I did not test ZFS over packet writing; even if it worked at all, optical drive seek times and speeds would make it hell.
So, the first step was to create a dummy file to act as the writable zpool before recording it to disc, with the following command:
dd if=/dev/zero of=zpool.bin bs=2048 conv=sparse count=<sectors>
<sectors> being the size, as follows:
  • 333,000 for 74-min CD-Rs and CD-RWs
  • 360,000 for 80-min CD-Rs and CD-RWs
  • 2,298,496 for single-layer DVD-Rs
  • 4,171,712 for double-layer DVD-Rs
  • 2,295,104 for single-layer DVD+Rs
  • 4,173,824 for double-layer DVD+Rs
  • 12,219,392 for single-layer Blu-rays
  • 24,438,784 for double-layer Blu-rays
  • 48,878,592 for triple-layer Blu-rays
  • 62,500,864 for quad-layer Blu-rays
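For example, for a single-layer DVD-R that would be the following, producing a 2,298,496 × 2,048 = 4,707,319,808-byte sparse file:
dd if=/dev/zero of=zpool.bin bs=2048 conv=sparse count=2298496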
Once the dummy file is created you need to create the zpool. While you could create a RAIDZ pool for error recovery, it would require at least 3 optical drives with 3 discs inserted at the same time, so I suggest looking for another solution (I'm still searching for one; feel free to comment with your ideas).
For the zpool creation:
zpool create -o comment="Put some disc identification here" -o ashift=11 -o failmode=continue -O mountpoint=/mnt/myzfsdvd -O checksum=sha256 -O compression=gzip-9 -O dedup=on -O atime=off -O devices=off -O exec=off -O setuid=off myzfsdvd zpool.bin
Explaining each option:
  • comment=“Put some disc identification here”, as it's for a removable disc you'll want some description of what you're going to store on it. You can also set it later
  • ashift=11, this means that the underlying block device (that is, the optical disc) uses 2048 bytes/sector
  • failmode=continue, not a good idea for the system to panic or stop when something happens to the optical disc!
  • mountpoint=/mnt/myzfsdvd, it's good to set an explicit mountpoint. You can override it when importing, but setting a default prevents security risks
  • checksum=sha256, this is just personal taste
  • compression=gzip-9, optical discs are only written once, so make that one write count by compressing as hard as possible
  • dedup=on, same as above, deduplicate blocks
  • atime=off, no sense to store access time on read-only discs
  • devices=off, to disable support of device nodes
  • exec=off, to prevent executing code from the disc; you may want to allow it, but I didn't
  • setuid=off, to prevent setuid
Once this is done, ZFS will automatically mount the pool on the specified mountpoint, so you can start copying things.
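For example, after copying a set in (the source path here is just a placeholder of mine), you can check how well compression and deduplication are doing:
cp -a ~/roms/gameboy/. /mnt/myzfsdvd/
zfs get compressratio,used myzfsdvd
zpool list myzfsdvd
The DEDUP column of zpool list shows the deduplication ratio.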
You may be tempted to change the recordsize (block size) to match the hardware sector size of optical discs (2048 bytes). DO NOT DO IT: it makes ZFS noticeably slower (8 to 10 times slower in my earlier tests), and compression and deduplication work worse.
And as for the RAM used by deduplication, don't worry: once the disc is out (the pool exported) that RAM gets freed, and even with quad-layer Blu-rays the deduplication table will never be bigger than 1Gb.
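A quick sanity check of that claim (my own back-of-the-envelope arithmetic, assuming the default 128KiB recordsize): a full quad-layer Blu-ray holds 62,500,864 × 2,048 bytes, which at 131,072 bytes per block is about 976,576 blocks, and at ZFS's 320 bytes of deduplication table per block that is roughly 300Mb:
echo $(( 62500864 * 2048 / 131072 * 320 ))
which prints 312504320 (bytes).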
Time to export the pool so we can record the file to disc:
zpool export myzfsdvd
Now let's record it. If you're going to record on CD (really? sure? ok, as you wish), make sure it is recorded single-session, mode 1, data, finalized. The following line works for CD, DVD and BD:
cdrecord -v driveropts=burnfree -dao -data zpool.bin
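If cdrecord gives you trouble with DVD or BD media, growisofs (from dvd+rw-tools) is a commonly used alternative for writing a premastered image; something along these lines should work (the /dev/sr0 device path is an assumption, adjust it to your drive):
growisofs -dvd-compat -Z /dev/sr0=zpool.bin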
Once it is recorded you'll be tempted to mount it, and if you do, you'll see it fail:
~ # zpool import -a
~ # zpool list
NAME         SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
~ # 
So, where’s the pool? Let’s try it specifying our pool name:
~ # zpool import myzfsdvd
cannot import 'myzfsdvd': no such pool available
~ #

It took me a couple of minutes to find the problem behind that unspecific error, because ZFS clearly was finding the pool and trying to mount it. The real issue: it was trying to write to the disc!
~ # zpool import -o readonly=on -a -d /dev/
~ # zpool list
NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
myzfsdvd  4.34G  1.90G  2.45G    43%  2.09x  ONLINE  -
~ #
I used -d /dev because pointing it directly at the DVD-ROM drive (/dev/sr0) sometimes failed to find the pool.
Now the pool works perfectly, except for a couple of side effects:
  • zpool scrub refuses to run because the pool is read-only, instead of verifying the data without correcting it (a possible workaround is sketched after this list).
  • zdb is unable to find the pool at all.
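Since scrub is unavailable, one workaround I can suggest (my own idea, not something from the original experiment) is to simply read everything back: ZFS verifies checksums on every read, so any damaged blocks will show up as checksum errors in the pool status:
find /mnt/myzfsdvd -type f -exec cat {} + > /dev/null
zpool status -v myzfsdvd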

Cold-storage and live-storage of ROM and ISO sets, or "compression vs deduplication"

This weekend I've been thinking about cold storage for my software archive, which includes some ROM and ISO sets. The idea was to find the best way to store them on Blu-ray recordable discs.

Considering how two sets can often contain almost the same data (for example, “Tomb Raider” for Saturn and PlayStation can contain the same audio, movies and maps, differing only in the small executables), or how a single set can repeat itself (all of the “Super Mario World” clones differ only in a small part of the ROM), both solid-mode compression and deduplication look very promising.

So, to check how those promises hold up, I took two systems, Nintendo Game Boy and Nintendo Game Boy Color, along with all the sets I have for them: Cowering (aka GoodTools), TOSEC, no-intro and NonGood.

For compression I used torrent7z, which, given the same files (and filenames), always produces exactly the same compressed file using 7zip (and therefore the LZMA algorithm), and GoodMerge, a tool that takes all the clones from the Cowering set and compresses (merges) them together with 7zip at maximum settings, to get the biggest space savings.

For deduplication, I first got theoretical estimates using an application I developed specifically for that purpose (DedupStat, you can get it here, GPL, open source) and then created ZFS pools (ZFS being a highly complex and feature-rich filesystem available on Solaris, FreeBSD, Linux and Mac OS X, but not Windows at all) with both deduplication and compression enabled.
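To give an idea of what such an estimate involves (a minimal sketch of my own using GNU coreutils, not how DedupStat is actually implemented): concatenate the set, hash every fixed-size block, and compare the number of unique hashes against the total number of blocks.
find gameboy/ -type f -print0 | xargs -0 cat > all.bin
echo "total blocks:  $(( ($(stat -c%s all.bin) + 4095) / 4096 ))"
echo "unique blocks: $(split -b 4096 --filter=sha256sum all.bin | sort -u | wc -l)"
The gameboy/ directory is just a placeholder, and concatenating the files means block boundaries cross file boundaries, so this only approximates a real per-file split; it is also slow, but enough to get a feel for the ratio.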

Cowering set contains 7,930 ROMs, TOSEC contains 135 ROMs, no-intro.org contains 2,980 ROMs and NonGood contains 138 ROMs.

They make 11,183 files for a total of 7,598,579,214 bytes (~7,246Mb).

The test computer is an Intel Core 2 E6400 @ 2.13GHz running Linux 3.9.0-server (Sabayon), with 8Gb of DDR2-800 RAM, torrent7z 0.9.1beta (7zip 4.65) and ZFSonLinux 0.6.2-r3, on a Maxtor/Seagate STM3320820A hard disk.

First of all, let’s test compression.

Cowering (GoodMerge7z + torrent7z) + TOSEC (torrent7z) + no-intro (torrent7z)
5,394 files for a total of 1,407,450,696 bytes (~1,342Mb, 18.52%), took 4,813.013 seconds, approx. 0.27887927590956 Mb/sec

Cowering (GoodMerge7z + torrent7z) + TOSEC (torrent7z) + no-intro (torrent7z), deduplicated on 4096 bytes/block (typical filesystem)
5,394 files for a total of 946,782,208 bytes (~902Mb, 12.46%), took 4,972.689598 seconds, approx. 0.269924264109135 Mb/sec

This of course gives a lot of space savings: compression alone shrinks the sets to only about 18% of the original size. But producing this takes a lot of time, almost an hour and a half (only compression speed was tested, but decompression speed is not blazing fast either). Deduplication on top of the compressed sets can still give extra savings, because there is repeated data between sets (compression is per-ROM) and within the same set (only Cowering merges clones and hacks).

The next test I did was deduplication estimations only, using my tool.

Cowering + TOSEC + no-intro, deduplicated on 2048 bytes/block (CD/DVD/BD blocksize)
11,186 files for a total of 1,746,548,736 bytes (~1,665Mb, 22.99%), took 480.059208 seconds, approx. 15.0939714919498 Mb/sec

Cowering + TOSEC + no-intro, deduplicated on 4096 bytes/block (typical filesystem blocksize)
11,186 files for a total of 1,871,421,440 bytes (~1,784Mb, 24.63%), took 513.864285 seconds, approx. 14.1009994496893 Mb/sec

Cowering + TOSEC + no-intro, deduplicated on 131072 bytes/block (typical ZFS blocksize)
11,186 files for a total of 3,480,354,816 bytes (~3,319Mb, 45.80%), took 425.701966 seconds, approx. 17.0212979472122 Mb/sec

With the smallest block size more duplicate blocks are found, giving the biggest savings, though still quite far from what compression achieves. In any case, it's about 10 times faster.

As Cowering contains lots of hacks, clones and bad dumps that may not be interesting to some people, I also tested both compression and deduplication with the other sets alone.

TOSEC + no-intro, uncompressed
3,256 files for a total of 2,685,937,612 bytes (~2561Mb)

TOSEC (torrent7z) + no-intro (torrent7z)
3,256 files for a total of 795,382,502 bytes (~758Mb, 29.61%)

TOSEC + no-intro, deduplicated on 2048 bytes/block (CD/DVD/BD)
3,256 files for a total of 1,412,812,800 bytes (~1,347Mb, 52.60%), took 181.490064 seconds, approx. 14.1109653253525 Mb/sec

TOSEC + no-intro, deduplicated on 4096 bytes/block (typical filesystem)
3,256 files for a total of 1,491,755,008 bytes (~1,422Mb, 55.54%), took 168.831435 seconds, approx. 15.1689760855258 Mb/sec

TOSEC + no-intro, deduplicated on 131072 bytes/block (typical ZFS)
3,256 files for a total of 2,127,953,920 bytes (~2,029Mb, 79.23%), took 248.102553 seconds, approx. 10.3223444057023 Mb/sec

Things look worse here for deduplication, with results almost twice as big as the compressed sets.

But all of this is highly theoretical, so it's better to test a real-life scenario. I created three ZFS pools, all of them with deduplication and compression enabled, one for each of the three block sizes (2048 as in CD/DVD/BD; 4096 as is typical for other filesystems and the real sector size of Advanced Format hard disks; and 131072, the default ZFS block size). Note: ZFS calls its block size “recordsize”, and dynamically uses smaller blocks than the configured one depending on the stored data.
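The three pools were created along these lines (a reconstruction on my part, with placeholder pool and device names; only the recordsize changed between them):
zpool create -O dedup=on -O compression=gzip-9 -O recordsize=2048 romtest2k /dev/sdb1
zpool create -O dedup=on -O compression=gzip-9 -O recordsize=4096 romtest4k /dev/sdb2
zpool create -O dedup=on -O compression=gzip-9 -O recordsize=131072 romtest128k /dev/sdb3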

Cowering + TOSEC + no-intro, ZFS device, dedup=on, compression=gzip-9, recordsize=2048
11,188 files for a total of 1,904,936,848 bytes (~1,817Mb, 25.07%), took 10,986.604 seconds, approx. 0.659580853138233 Mb/sec

Cowering + TOSEC + no-intro, ZFS device, dedup=on, compression=gzip-9, recordsize=4096
11,188 files for a total of 1,515,458,176 bytes (~1,445Mb, 19.94%), took 8,865.023 seconds, approx. 0.817432017876539 Mb/sec

Cowering + TOSEC + no-intro, ZFS device, dedup=on, compression=gzip-9, recordsize=131072
11,188 files for a total of 1,493,678,664 bytes (~1,424Mb, 19.66%), took 1,185.156 seconds, approx. 6.114430201097515 Mb/sec

While the compression is called gzip, it should really be called deflate, as that's the name of the underlying algorithm (the one used in ZIP files), which in theory is faster but less powerful than LZMA (the 7zip algorithm). “-9” selects maximum compression.

Clearly ZFS does not behave well when the recordsize is made smaller than the default, giving worse deduplication, worse compression, and much slower speeds (8 to 10 times slower).

Before reaching a conclusion, something about RAM usage must be noted. 7-zip RAM usage depends on the dictionary size (and therefore on compression strength) and only happens once, at compression time; on decompression, RAM usage is lower. Deduplication RAM usage, on the contrary, depends on the total number of blocks (unique plus duplicate, with each group of duplicates counting as a single block), and that usage is more or less permanent for as long as the deduplicated volume is attached.

In the case of ZFS, each block takes 320 bytes of RAM. For this set, that means:

  • 3,710,244 blocks of 2,048 bytes each, taking 1Gb of RAM.
  • 1,855,122 blocks of 4,096 bytes each, taking 566Mb of RAM.
  • 57,972 blocks of 131,072 bytes each, taking 18Mb of RAM.

Fortunately you can use an SSD to store the deduplication table, so instead of 1Gb/566Mb/18Mb of RAM it would be 1Gb/566Mb/18Mb of SSD.
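In ZFS terms this means adding the SSD (or a partition of it) as a cache device, so the deduplication table entries can spill over to the L2ARC; something like this, where the pool name and device path are placeholders of mine:
zpool add mypool cache /dev/disk/by-id/ata-SOME_SSD-part1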

Rather than a single conclusion, a list of advantages and disadvantages should be noted:

Advantages of 7zip

  • Biggest space savings
  • RAM usage is not permanent, only when working with the archives

Disadvantages of 7zip

  • Slow as floppy drives
  • Merged archives need to be traversed and decompressed fully before accessing any of the ROMs they contain
  • Not supported by any emulator AFAIK, and not completely supported by any ROM manager (Romcenter does not support them, Romulus says it does but it does not work, and CLRMAME Pro does only if they are not solid-compressed, which defeats the benefits of merging)


Advantages of deduplication

  • Faster
  • Depending on the deduplication software used (if it provides a live volume), it can allow ROMs to be accessed directly by emulators and managers

Disadvantages of deduplication

  • RAM usage MAY be permanent, depending on deduplication software

 

Advantages of ZFS with deduplication and compression enabled

  • Best of both worlds, space savings almost as high as 7zip and speeds almost as fast as deduplication alone
  • Supported in practically every operating system (except Windows)
  • Allows live access, so supported by emulators and ROM managers
  • RAM usage can be offloaded to an SSD, which is several times cheaper and bigger

Disadvantages of ZFS with deduplication and compression enabled

  • Contrary to what deduplication alone suggests, smaller blocks give less space savings, lower speed, and higher RAM and CPU usage
  • Not supported at all by Windows
  • RAM usage is permanently 320 bytes per block

Comparing advantages against disadvantages, ZFS seems the best solution for archival right now. While other filesystems may gain the same features in the future (btrfs, for example), currently they either don't support them, are experimental, or are vendor-locked (btrfs is Linux-only).