ZFS Guide
If you care about your data, you should use ZFS. Personally, I think this is the best choice for most use cases where data integrity is a factor. If you are building a NAS then you should really use ZFS. A close contender would be BTRFS.
What is ZFS?
ZFS is an advanced, next-generation file system. It solves many of the problems with existing filesystems, and it combines the features of a filesystem and a volume manager in one piece of software. It is a 128-bit file system, so it provides ridiculous, astronomically large limits on filesystem size (256 quadrillion zettabytes) and individual file size (16 exbibytes).
It provides:
- RAID
- snapshots
- integrity checking
- automatic repair
- much more …
Without integrity checking, your data is at risk of being corrupted. ZFS protects data from otherwise undetectable corruption such as bit rot.
Super Tiny ZFS Cheat Sheet
This will bring you from zero to semi-competent ZFS admin in about 30 seconds.
| Command | What it does |
| --- | --- |
| sudo apt update | get latest info from repo |
| sudo apt install zfsutils-linux | install |
| sudo zpool create pool1 mirror sdc sdd | RAID 1 mirror |
| sudo zpool create pool1 mirror sdb sdc mirror sdd sde | RAID 10 |
| sudo zpool status | show status |
| sudo zpool replace pool1 sdd sde | replace failed disk |
| sudo zpool destroy pool1 | destroy pool |
| sudo zpool scrub pool1 | scrub pool |
ZFS Quick Start
WARNING - Read the section on pool scrubbing. It is important.
NOTE - On older versions of Ubuntu you had to use a different package and add an additional repository.
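For reference, on those older releases the procedure looked roughly like this (package and PPA names from memory, so treat this as a sketch rather than exact instructions):
sudo add-apt-repository ppa:zfs-native/stable # add the old ZFS on Linux PPA
sudo apt update
sudo apt install ubuntu-zfs # the pre-16.04 package name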
Pools can be created using entire disks, slices / partitions, or files.
TIP - When creating a zpool, you can use “-f” to force creation if you get an error saying a disk has no EFI label but may contain partition information in the MBR.
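For example, forcing creation on disks that previously held another filesystem might look like this (pool and device names are just placeholders):
sudo zpool create -f pool1 mirror /dev/sdb /dev/sdc # force past the leftover-MBR warning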
Install on Ubuntu 16.04/18.04
sudo apt update
sudo apt install zfsutils-linux
whereis zfs
Info Commands
Before setting up a pool, you can check what devices are available on your system:
sudo fdisk -l
You can list out existing zpools with the list command and you can check the status of your zpools with the status command.
sudo zpool list
sudo zpool status
Striping - RAID 0
You can create a basic striped pool like this. It will have no redundancy but should provide improved performance and increased space. Note that even though you don’t get redundancy with RAID 0 striping, ZFS still gives you integrity checking. I still wouldn’t do this unless you don’t care about your data.
sudo zpool create new-pool /dev/sdb /dev/sdc # striped
Mirroring - RAID 1
If you want redundancy, you can create a basic 2 disk mirror like this.
sudo zpool create new-pool mirror /dev/sdb /dev/sdc # Mirrored, mounted at /new-pool
If you want a three-way mirror for additional redundancy, you can do that too. You can specify as many disks as you like.
sudo zpool create tank mirror ada1 ada2 ada3 # 3 disks all mirrored together
RAID 10 - Striped Mirror
You can create a RAID 10 device also. This basically stripes across two mirrors.
You can create a RAID 10 pool like this:
sudo zpool create example mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde
In the output of zpool status, the vdevs will show up as “mirror-0” and “mirror-1”.
Note that you can also expand a RAID 1 pool by adding another mirror. This will also result in a RAID 10 pool.
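A minimal sketch of that expansion, assuming the two-disk mirror pool “new-pool” from above and two fresh disks:
sudo zpool add new-pool mirror /dev/sdd /dev/sde # pool is now a stripe over two mirrors (RAID 10)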
RAID Z
You can also create RAID Z file systems if you like. Here are some examples showing how you would do that for RAID Z1/2/3.
zpool create tank raidz1 ada1 ada2 ada3 # create pool with a RAID Z1 vdev
zpool create tank raidz ada1 ada2 ada3 # raidz is an alias for raidz1
zpool create tank raidz2 ada1 ada2 ada3 ada4 ada5 # RAID Z2
zpool create tank raidz3 ada1 ada2 ada3 ada4 ada5 ada6 # RAID Z3
You can read more about RAID Z in another section further down in this document.
Alternate Mount Point
The default mount point matches the name of the pool. For example, a pool called test1 would be mounted at /test1. You can specify an alternate mount point with the -m switch.
sudo zpool create -m /alt-location new-pool mirror /dev/sdb /dev/sdc
Destroy a Pool
WARNING - You will lose all data on a pool when you destroy it.
You can destroy a pool like this:
sudo zpool destroy new-pool
Create a File System
NOTE - each zpool will have a filesystem created by default.
You can create and destroy additional file systems like this:
sudo zfs create test-pool1/dataset1 # create FS
sudo zfs destroy mypool/tmp # destroy an FS
If you create a new FS named “test-pool1/dataset1”, the default mount point will be /test-pool1/dataset1.
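If you want to double-check where a dataset ended up, the mountpoint property can be read back with zfs get:
sudo zfs get mountpoint test-pool1/dataset1 # show the mount point for the dataset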
Using a File as a Device
You can also build a pool out of ordinary files, which is handy for testing:
dd if=/dev/zero of=/home/user/test1.img bs=1M count=2048
dd if=/dev/zero of=/home/user/test2.img bs=1M count=2048
sudo zpool create pool-test /home/user/test1.img /home/user/test2.img
sudo zpool status
Expanding
- zpool add - This adds a vdev to a pool. This generally results in more storage space.
- zpool attach - This attaches a device to a vdev in the pool. This generally results in more redundancy.
Attaching disks:
You can attach a disk like this. It will add ada4 to whichever vdev has ada1, giving that vdev an extra mirror. This results in no capacity increase, just more redundancy.
zpool attach tank ada1 ada4
You can detach a disk like this.
zpool detach tank ada4
Adding vdevs:
Adding a RAID Z1 vdev to a pool is done like this. This will increase capacity.
zpool add tank raidz1 ada4 ada5 ada6
Adding a mirrored vdev to a pool is done like this:
zpool add tank mirror ada4 ada5 ada6 # add a 3-disk mirror vdev
zpool add OurFirstZpool ada4 # add a single disk as a new (non-redundant) vdev
Replacing a Failed Drive
We have an entire separate guide for this here:
Things You Can Do
Spares:
Hot spares can be used with any redundant vdev, not just RAID Z.
zpool add geekpool spare c1t3d0 # adding a spare to a pool
zpool status # shows up under status
zpool set autoreplace=on geekpool # set auto replace to on
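If you want to take the spare back out, zpool remove handles spare (as well as cache and log) devices:
zpool remove geekpool c1t3d0 # remove the spare from the pool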
Dry run:
use -n for a dry run when creating a pool
zpool create -n geekpool raidz2 c1t1d0 c1t2d0 c1t3d0
Export / Import Pool
- writes all unwritten data
- removes the pool from the system
zpool export geekpool # export it
zpool list # won't show the pool anymore
zpool export -f geekpool # force if something is mounted
zpool import # show pools that can be imported
zpool import -d / # show pools that use files as devices
zpool import tank1 # import by name
zpool import 940735588853575716 # import by ID
zpool import -d / geekfilepool # import pool with files as devices
zpool import -f geekpool # force
Quotas, reservations
- Quota - the FS can’t use more than this amount.
- Reservation - this much space is reserved for the FS and won’t be available to other filesystems. If a quota is defined first, a reservation can’t be set higher than the quota.
zfs set quota=500m geekpool/fs1
zfs set reservation=200m geekpool/fs1
zfs list # the USED and AVAIL columns reflect the quota and reservation
Set mount point:
zfs set mountpoint=/test geekpool/fs1
df -h | grep /test
Intent Log
ZIL (ZFS Intent Log)
Add a disk to be used for the intent log.
This can speed up synchronous writes. Typically you would use a fast disk like an SSD.
sudo zpool add -f mypool log /dev/sdg
ZFS Cache Drives
Cache drives (the L2ARC) add a layer of caching between RAM (the ARC) and the main storage drives. Typically, you would use a faster SSD for caching and larger mechanical drives for main storage.
You can add a cache drive to a zpool like this.
sudo zpool add -f mypool cache /dev/sdh
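To see whether the cache device is actually absorbing I/O, you can watch per-device statistics:
sudo zpool iostat -v mypool # per-vdev I/O stats, including the cache device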
Compression
lz4 is considered a good, fast, and safe option. Note that changing the compression property only affects data written after the change; existing data is not rewritten.
sudo zfs set compression=on mypool/projects
sudo zfs set compression=gzip-9 mypool
sudo zfs set compression=lz4 mypool
sudo zfs get compressratio # check compression ratio
ZFS Snapshots
- read only copy of FS
- saves the state of the FS at that point in time
- can be used to roll back
- can extract files from the snapshot
sudo zfs snapshot -r mypool/projects@snap1 # create snapshot
sudo zfs list -t snapshot # list snapshots
rm -rf /mypool/projects/* # destroy all the files
sudo zfs rollback mypool/projects@snap1 # rollback to snapshot
sudo zfs destroy mypool/projects@snap1
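You can also pull individual files out of a snapshot without rolling back: every dataset exposes its snapshots read-only under a hidden .zfs directory. A quick sketch using snap1 (file.txt is just a hypothetical file name):
ls /mypool/projects/.zfs/snapshot/snap1/ # browse the snapshot
cp /mypool/projects/.zfs/snapshot/snap1/file.txt /mypool/projects/ # restore a single file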
ZFS Clones
- writable copy of FS
- can only be created from a snapshot
- snapshot can’t be destroyed until all clones are destroyed
sudo zfs snapshot -r mypool/projects@snap1
sudo zfs clone mypool/projects@snap1 mypool/projects-clone
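If a clone ends up becoming the “real” filesystem, zfs promote reverses the parent/child relationship so the original dataset and snapshot can be destroyed:
sudo zfs promote mypool/projects-clone # the clone becomes the parent; the snapshot now belongs to it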
ZFS Send and Receive
- send - A snapshot can be streamed to a file or other location.
- receive - A stream can be received to create a new filesystem.
- These are great for backups.
Backup a snapshot to a file. Then restore it to a new FS.
sudo zfs snapshot -r mypool/projects@snap2
sudo zfs send mypool/projects@snap2 > ~/projects-snap.zfs
sudo zfs receive -F mypool/projects-copy < ~/projects-snap.zfs
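For ongoing backups you usually don’t want to resend everything. With -i, you can send only the changes between two snapshots; this sketch assumes the destination dataset already received a full stream containing @snap1:
sudo zfs send -i mypool/projects@snap1 mypool/projects@snap2 > ~/projects-incr.zfs # incremental stream
sudo zfs receive mypool/projects-copy < ~/projects-incr.zfs # destination must already have @snap1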
Compress and encrypt:
zfs send mybook/testzone@20100719-1600 | gzip | openssl enc -aes-256-cbc -a -salt > /storage/temp/testzone.gz.ssl
openssl enc -d -aes-256-cbc -a -in /storage/temp/testzone.gz.ssl | gunzip | zfs receive mybook/testzone_new
Backup with SSH:
zfs send mybook/testzone@20100719-1600 | ssh testbox zfs receive sandbox/testzone@20100719-1600
ZFS Ditto Blocks
- more copies of data for additional redundancy
- copies are spread at least 1/8 of the disk apart on single-device pools, or placed on another device in multi-device pools
sudo zfs set copies=3 mypool/projects
ZFS Deduplication
If two blocks are duplicates, only one copy is stored and both references point to the same block.
- Trade-off - saves space, but uses more memory for the in-memory deduplication tables
- Approximately 320 bytes of memory are needed per deduplicated block.
- Write performance will decrease as this table grows.
Setting up deduplication is usually not worth it.
sudo zfs set dedup=on mypool/projects
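If you do enable it, you can check how much space deduplication is actually saving via the read-only dedupratio pool property:
sudo zpool get dedupratio mypool # 1.00x means no savings
sudo zpool list # the DEDUP column shows the same ratio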
Pool Scrubbing
Scrubbing a pool will run an integrity check on everything in the pool. ZFS checks integrity when data is read. If you aren’t reading your data you won’t be able to detect bit rot. This is why you need to run a scrub on a regular basis.
How often should a ZFS pool be scrubbed? The general rule of thumb is that you should run a scrub once a month at the very minimum. Scrubbing your pools weekly is much better.
sudo zpool scrub mypool
sudo zpool status -v mypool
TIP - Run a scrub as a cronjob.
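A minimal sketch of such a cron job, assuming root’s crontab and that zpool lives at /sbin/zpool on your system:
# run a scrub on mypool every Sunday at 2am
0 2 * * 0 /sbin/zpool scrub mypool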
Testing
dd if=/dev/urandom of=/mypool/random.dat bs=1M count=4096 # populate test data
md5sum /mypool/random.dat
sudo dd if=/dev/zero of=/dev/sde bs=1M count=8192 # simulate a failure
sudo zpool scrub mypool # check for issues
sudo zpool status
sudo zpool detach mypool /dev/sde # remove disk
sudo zpool attach -f mypool /dev/sdf /dev/sde # add it back ( to the same vdev that has sdf )
sudo zpool scrub mypool # scrub again
Reusing Disks
Use zpool labelclear after a zpool destroy if you plan to reuse the disks.
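A quick sketch, assuming sdb belonged to the destroyed pool (add -f if ZFS refuses because it still sees an active label):
sudo zpool labelclear /dev/sdb # wipe the old ZFS label from the disk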
ZFS RAID Levels
ZFS supports several different options for RAID levels. Each of these has advantages and disadvantages.
RAID Z
This is similar to RAID 5. It is different in that it uses both dynamic stripe width and copy-on-write functionality to avoid the write hole. There are three different types of RAID Z: RAID Z1, Z2, and Z3. RAID Z1 is similar to RAID 5, Z2 is similar to RAID 6, and Z3 is similar to triple-parity RAID (sometimes called RAID 7).
RAID Levels
- RAID 0 - striped
- can’t remove vdevs without destroying the pool
- RAID 1 - mirror
- RAID-Z1
- 3 disk min, only 1 disk can die
- can’t attach to vdev but can add more vdevs to the pool
- RAID-Z2
- 4 disk min, only 2 disks can die
- RAID-Z3
- 5 disk min, only 3 disks can die
- RAID 10
- 4 disk min
- achieved by striping across 2 mirrors
Things You Should Know
Considerations
- ZFS performs poorly with less than 2 GB of RAM.
- WARNING - You still need backups. ZFS and other RAID systems are not a substitute for regular backups. The possibility of an entire Zpool being wiped out is a very real threat. If garbage is written to the filesystem, it can be written to all mirrored disks. A faulty disk controller, corrupted metadata, user error, or malware can all destroy your filesystems.
- Error-correcting (ECC) RAM is recommended.
- The data deduplication feature consumes a lot of memory; use compression instead.
- COW (copy on write) - ZFS never overwrites data in place; modified data is written to a new block and the pointers are updated. Snapshots just keep references to the old blocks, which makes them very cheap and practical.
Terms
- Dataset - filesystem. These are created inside a Zpool.
- vdevs (virtual devices) - groupings of storage providers into various RAID configurations. These are usually disks, partitions, or files.
- Storage providers - spinning disks or SSDs.
- Zpools - aggregation of vdevs into a single storage pool.
History
Really, really brief summary:
ZFS was originally created at Sun Microsystems, which made it an open source project. Oracle bought Sun Microsystems and stopped releasing its changes to the code as open source. There is now a project called OpenZFS, and this is what most projects are using today.