Change from ZFS to mdadm and increase the RAID5 size

I had created a RAIDZ with ZFS, which is roughly the equivalent of a traditional RAID5. It contained three disks, and as time went by, the resulting array (with a usable capacity of roughly twice a single disk) became nearly full.

I got a fourth disk of the same vendor and size as the first three and added it to the case. But the RAIDZ expansion feature was, at the time of this writing, still not finished and far from being included in Ubuntu releases.

So I came up with a plan to get a bigger array without losing any data and without backing up all the data to an external system (I simply didn’t have *that* many spare disks).

mdadm has a "--grow" feature, so I had to copy everything over to an mdadm RAID, but without any additional disks.
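For reference, growing an existing mdadm RAID5 by one disk usually boils down to something like this (just a sketch; /dev/md0, /dev/sde1 and the backup-file path are placeholders, not my actual devices):

mdadm --add /dev/md0 /dev/sde1        # the new disk joins the array as a spare
mdadm --grow /dev/md0 --raid-devices=4 --backup-file=/root/grow.backup
resize2fs /dev/md0                    # enlarge the filesystem once the reshape is done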

The plan looked like this:

1. Take the initial three disks in the RAIDZ/ZFS pool and set one of them offline.
2. Use the offline disk and the newly added fourth disk to create a new mdadm RAID5, degraded from the start.
3. Copy everything from the RAIDZ to the mdadm array.
4. Destroy the remaining ZFS pool and add its two disks to the mdadm array, growing, reshaping and resyncing all in one big step.

The only real risk would be a drive failure in the ZFS pool after degrading it, or a drive failure in one of the first two disks of the new mdadm array; either one would have meant total data loss. So, fingers crossed, everything worked fine. It took a few days (copying and resyncing are slow), but in the end it worked.

To make sure I wouldn’t mess up, I created a demo script to test the individual steps and whether my idea worked at all.

It comes in two parts: the first one runs up to the start of the mdadm resync. The second one should be started AFTER the resync has finished. With the small demo files I used this happened very quickly; on real disks it can take days.

#!/bin/bash
mkdir -p /test
cd /test # separate working directory
rm -f 1.disk
rm -f 2.disk
rm -f 3.disk
rm -f 4.disk
losetup -D
umount /test/mnt
mdadm --stop /dev/md0
mdadm --remove /dev/md0
rm -f /test/backupfile.mdadm

echo "##### Creating images"
dd if=/dev/zero of=1.disk bs=1M count=256
dd if=/dev/zero of=2.disk bs=1M count=256
dd if=/dev/zero of=3.disk bs=1M count=256
dd if=/dev/zero of=4.disk bs=1M count=256
DISK1=$(losetup --find --show ./1.disk)
DISK2=$(losetup --find --show ./2.disk)
DISK3=$(losetup --find --show ./3.disk)
DISK4=$(losetup --find --show ./4.disk)
parted ./1.disk mklabel gpt
parted ./2.disk mklabel gpt
parted ./3.disk mklabel gpt
parted ./4.disk mklabel gpt
parted -a optimal -- ./1.disk mkpart primary 0% 100%
parted -a optimal -- ./2.disk mkpart primary 0% 100%
parted -a optimal -- ./3.disk mkpart primary 0% 100%
parted -a optimal -- ./4.disk mkpart primary 0% 100%

echo "##### Starting zfs pool on disk 1, 2, 3"
zpool create origtank raidz ${DISK1} ${DISK2} ${DISK3}

echo "##### zpool status"
zpool status -v origtank

echo "##### Creating test file on /origtank"
dd if=/dev/zero of=/origtank/data bs=1M count=300

echo "##### Setting third disk as faulty"
zpool offline origtank ${DISK3}

echo "##### zpool status"
zpool status -v origtank

echo "##### ls -lA /origtank; df -h /origtank"
ls -lA /origtank; df -h /origtank

echo "##### Creating new md0 from disk3 and disk4"
#parted -s ./3.disk mklabel gpt
#parted -s ./4.disk mklabel gpt
#parted -s -a optimal -- ./3.disk mkpart primary 0% 100%
#parted -s -a optimal -- ./4.disk mkpart primary 0% 100%
wipefs -a ${DISK3}
wipefs -a ${DISK4}
parted -s ${DISK3} set 1 raid on 
parted -s ${DISK4} set 1 raid on 
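# "missing" creates the array degraded from the start: only DISK3 and DISK4 are
# active, the third slot stays empty until the old ZFS disks are added later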
mdadm --create /dev/md0 -f --auto md --level=5 --raid-devices=3 ${DISK3} ${DISK4} missing

echo "##### mdstat"
cat /proc/mdstat
mdadm --detail /dev/md0

echo "## # ## Formatting /dev/md0"
sleep 2
mkfs.ext4 /dev/md0

echo "##### Mount md0"
mkdir /test/mnt
mount /dev/md0 /test/mnt

echo "##### ls -lA /test/mnt; df -h /test/mnt"
ls -lA /test/mnt; df -h /test/mnt

echo "## # ## Copy data"
sleep 2
# rsync --delete -avPH /origtank/ /test/mnt
rsync -avPH /origtank/ /test/mnt

echo "##### ls -lA /test/mnt; df -h /test/mnt"
ls -lA /test/mnt; df -h /test/mnt

echo "##### Creating NEW test file on /origtank"
dd if=/dev/zero of=/origtank/dataNEW bs=1M count=30

echo "## # ## Copy NEW data"
sleep 2
# rsync --delete -avPH /origtank/ /test/mnt
rsync -avPH /origtank/ /test/mnt

echo "##### ls -lA /test/mnt; df -h /test/mnt"
ls -lA /test/mnt; df -h /test/mnt

echo "## # ## destroying pool"
sleep 2
zpool destroy origtank

echo "## # ## Adding disks to md0"
sleep 2
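# Adding both disks: one of them rebuilds the empty third slot right away, the
# other stays a spare; --grow then reshapes the array from 3 to 4 active devices,
# and the backup file protects the critical section of the reshape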
mdadm --add /dev/md0 ${DISK1} ${DISK2}
mdadm --grow --raid-devices=4 /dev/md0 --backup-file=/test/backupfile.mdadm
cat /proc/mdstat
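Between the two parts you have to wait for the resync/reshape to finish. A simple way to do that on the real array (assuming it is /dev/md0) is:

mdadm --wait /dev/md0          # blocks until resync/recovery/reshape is done
watch -n 10 cat /proc/mdstat   # or just keep an eye on the progress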

After the mdadm resync has finished successfully, the second part of the script resizes the filesystem:

resize2fs /dev/md0
cat /proc/mdstat

This only takes a few minutes (even on very large disks), but be patient! You can follow the progress by looking at mdadm --detail.

echo "##### mdstat"
cat /proc/mdstat
mdadm --detail /dev/md0

echo "##### ls -lA /test/mnt; df -h /test/mnt"
ls -lA /test/mnt; df -h /test/mnt