ZFS Replace Disk
Eventually, disks fail. ZFS was designed with this in mind, and the steps to replace a disk are fairly simple. Just be careful and make sure you are operating on the correct device.
NOTE - Most of these commands need to be run as root. We are going to assume you are logged in as root. If you aren’t, prepend “sudo ” to each command. Depending on your environment, that might be the right way to do things anyway.
NOTE - All device names and pool names used here are examples (ada5, c1t1d0, sde3, and so on). You will probably need to change them to match what is in your system. Don’t blindly follow my instructions. Pay attention to what you are doing.
Transient Errors
If an error is not likely to affect the health of the disk in the future, it is considered transient. If it is likely to affect disk health in the future, it is considered persistent. Learn more about these in This Oracle ZFS Doc. Transient errors can just be cleared using the following:
zpool clear tank c1t1d0
This can help to avoid replacing a disk when it isn’t necessary.
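One way to judge whether an error was transient is to look at the READ, WRITE, and CKSUM counters before clearing them, then check again later to see if they climb back up. The sketch below is a hypothetical helper (error_counts is my own name, not a ZFS command) that reads zpool status output on stdin and prints only the rows with a non-zero counter. It assumes the usual NAME STATE READ WRITE CKSUM table layout, which may differ on your platform.

```shell
# Hypothetical helper: print rows of the config table that have a
# non-zero READ, WRITE, or CKSUM counter.
# Reads `zpool status` output on stdin.
error_counts() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_table = 1; next }  # table header
        /^errors:/                    { in_table = 0 }        # end of table
        # Counters are compared as text because large values may be
        # abbreviated (for example 1.2K).
        in_table && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, "read=" $3, "write=" $4, "cksum=" $5
        }
    '
}
```

Usage would look something like `zpool status tank | error_counts`. If a device shows up again shortly after a zpool clear, the error is probably not transient.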
ZFS Replace Disk Steps
This is based on my testing using a VM and a physical host with Ubuntu.
Step 0 - Check The Pool
Start by verifying that a disk is bad and that it needs to be replaced. You can do this with the zpool status command as shown here.
zpool status
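On a pool with many disks it can be easy to miss the one bad row. As a rough sketch, the hypothetical helper below (not_online is my own name) filters zpool status output down to the rows whose state is something other than ONLINE. It assumes the standard config table layout and skips hot spares listed as AVAIL.

```shell
# Hypothetical helper: print "NAME STATE" for every row of the config
# table that is not ONLINE. Reads `zpool status` output on stdin.
not_online() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_table = 1; next }  # table header
        /^errors:/                    { in_table = 0 }        # end of table
        # Skip healthy devices and idle hot spares (AVAIL).
        in_table && NF >= 2 && $2 != "ONLINE" && $2 != "AVAIL" { print $1, $2 }
    '
}
```

Run it as `zpool status | not_online`. The pool and any parent vdev (a mirror, for example) are listed too when they are DEGRADED, so the disk to replace is normally the most deeply nested name in the output.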
Step 1 - Add a New Disk
- add a new disk
- optionally remove the old disk
The first thing to do is connect the new replacement disk. If your system has a spare bay and connector, you can add it alongside the old disk. If you don’t want both drives in the system at the same time, you can remove the old drive at this point instead, but only if the pool is redundant. You might do this because you only have one place to connect a disk and need to swap the new one in directly, or because you just want to reduce the number of steps.
more sub steps here …
Step 2 - Replace It
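Here you specify the old device followed by the new device. If you have a redundant configuration, data will be copied from the other good disks to the new disk. If it isn’t redundant, data will be copied from the old device to the new device, so the old disk has to stay connected and readable. When the replacement finishes, ZFS detaches the old drive from the pool for you. Once that is done you will be able to remove it physically.
zpool replace tank c1t1d0 c2t0d0
If the old disk has to come out before the new one goes in, take it offline first so the pool stops using it:
zpool offline tank c1t1d0
You should not need zpool remove here. It is meant for hot spares, cache devices, log devices, and whole top-level vdevs, not for a disk that is being replaced.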
If the old disk has already been removed from the system and a new device has taken its place under the same device name, you should be able to run zpool replace with just that one device name.
In my testing, this didn’t work:
zpool replace tank c1t1d0
This worked:
zpool offline pool1 sdd
zpool remove pool1 sdd
zpool attach -f pool1 sdc sdd
Step 3 - Wait For Resilvering to Complete
Before your pool is back to normal, ZFS needs to sync data over to the new disk. This process is called resilvering, and the pool remains in a degraded state until it finishes. It may take a very long time depending on the size of your disks and on how much data you have. You can watch its progress with the following command.
zpool status mypool1
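If you would rather not re-run that command by hand, a small polling loop can do the waiting for you. This is only a sketch: the function names are mine, the 60 second default interval is arbitrary, and it assumes that zpool status reports a running resilver with the text "resilver in progress", which is what I have seen but may vary between versions.

```shell
# Succeeds (exit 0) while the status text on stdin reports a running resilver.
# Assumes the scan line contains the words "resilver in progress".
resilver_running() {
    grep -q 'resilver in progress'
}

# Poll a pool until no resilver is reported, then print the final status.
# $1 = pool name (example), $2 = seconds between checks (default 60).
wait_for_resilver() {
    pool=$1
    interval=${2:-60}
    while zpool status "$pool" | resilver_running; do
        sleep "$interval"
    done
    zpool status "$pool"
}
```

For example, `wait_for_resilver mypool1 300` checks every five minutes. Still read the final status output, since the loop only tells you the resilver stopped, not that the pool is healthy.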
Step 4 - Physically Remove the Old Drive
At this point you can physically remove the old drive. If it is hot swappable you should be able to just pull it out. Otherwise, you will want to schedule a time to shut down your system. If you already removed it in step 1, just skip this step.
Potential Issues
Some people have had issues trying to specify devices by ID. If the bad disk has already been removed from the system you might not be able to specify it by ID. If this is the case you can try specifying it by device name or by GUID. You can find these with the following commands:
zdb # find GUID
zpool status -g # find GUID
zpool status -L # find device name, resolving links
If zdb doesn’t output anything you can try specifying the device:
zdb -l /dev/sda1
Thoughts
Depending on your level of redundancy it can be more than a little concerning to have a failed disk. Once you get the disk replaced and your pool is resilvered you can feel good about being in the clear. It always feels good to have a working zpool. Knowing that your drives are fully redundant is a nice feeling. Also, knowing exactly how to replace a disk gives you a greater sense of security as well.