Btrfs for root and ZFS for data with ssd cache

We have the following hardware:

  • 1x 512GB nvme
  • 4x 3TB sata
  • 2x 5TB sata

And we want the following:

  • Root on 200GB of the nvme, mirrored with one of the sata disks
  • The BIOS doesn't boot from nvme, so /boot must be copied to all the sata disks
  • UEFI partition also on all the sata disks
  • A fast, nvme-only partition of 200GB
  • Be able to do backups of important directories (/home, /etc, and others) on an external HD, maybe using snapshots
  • Use several of the sata disks together as one big long-term “data” storage space, with the rest of the nvme as cache

How can ZFS help here:

  • ZFS can't be used for the /boot directory. Use ext4 instead
  • Put the ZIL on a SLOG partition on the nvme to speed up synchronous writes to the data volume
  • Use an L2ARC partition on the nvme as read cache for the data volume

We will do this:

  • A btrfs filesystem on the nvme alone for root, swap and “fast” (no redundancy, but /etc will be kept in git)
  • /boot and UEFI partitions on several disks for booting
  • A pool for the rest of the disks using raidz1; it will have a small (32GB) SLOG and a bigger (128GB) L2ARC on the nvme disk
  • A pool on the backup disks for backups using zfs snapshot/replication/clone
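
As a rough sanity check of the plan above, the sizes can be worked out with shell arithmetic. This is only a minimal sketch: the function names are made up for illustration, and the raidz1 figure ignores ZFS metadata and padding overhead, so the real usable space will be somewhat lower.

```shell
# Hypothetical helpers, not part of any tool.

# Usable space of a raidz1 vdev: one disk's worth goes to parity.
raidz1_usable_tb() {
    # $1 = number of disks, $2 = size of each disk in TB
    echo $(( ($1 - 1) * $2 ))
}

# Space left on the nvme for btrfs after the SLOG and L2ARC partitions.
nvme_left_gb() {
    # $1 = nvme size, $2 = SLOG size, $3 = L2ARC size (all in GB)
    echo $(( $1 - $2 - $3 ))
}

raidz1_usable_tb 3 3       # three 3TB disks in raidz1
nvme_left_gb 512 32 128    # 512GB nvme, 32GB SLOG, 128GB L2ARC
```

With the numbers from this page it prints 6 (TB usable in the pool) and 352 (GB left on the nvme for btrfs, before leaving some space unallocated).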

Steps followed

  • Wipe the nvme disk (use gdisk /dev/nvme0n1, then the o command)
  • Create a 32GB SLOG partition with type bf00
  • Create a 128GB L2ARC partition with type bf00
  • Create the Btrfs partition on the rest of the 512GB disk (leave a bit unallocated for ssd discarding), type 8300
  • Create Btrfs filesystem on Btrfs partition:
sudo mkfs.btrfs -L tardis /dev/nvme0n1p3
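
The interactive gdisk steps above could probably also be scripted with sgdisk (from the same gdisk package). The following is a sketch under assumptions, not what was actually run here: the 340G size for the btrfs partition is an assumed value that leaves some space unallocated, and the run helper and DRY_RUN variable are made up so that the commands can be reviewed before anything is written to disk.

```shell
# With DRY_RUN set, print each command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Partition the nvme: SLOG, L2ARC, then btrfs.
# $1 = disk device, e.g. /dev/nvme0n1
partition_nvme() {
    dev=$1
    run sgdisk --clear "$dev"                  # new empty GPT, like "o" in gdisk
    run sgdisk -n1:0:+32G  -t1:bf00 "$dev"     # SLOG
    run sgdisk -n2:0:+128G -t2:bf00 "$dev"     # L2ARC
    run sgdisk -n3:0:+340G -t3:8300 "$dev"     # btrfs (340G is an assumption)
}

# Review first:  DRY_RUN=1; partition_nvme /dev/nvme0n1
# Then for real (as root, this destroys the disk contents):
#   unset DRY_RUN; partition_nvme /dev/nvme0n1
```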
  • Create /, home, var, fast and “swap” subvolumes (swap is really just a special file)
sudo mkdir /mnt/btrfsbase
sudo mount -t btrfs -o defaults,noatime /dev/nvme0n1p3 /mnt/btrfsbase
sudo btrfs subvolume create /mnt/btrfsbase/activeroot
sudo btrfs subvolume create /mnt/btrfsbase/home
sudo btrfs subvolume create /mnt/btrfsbase/var
sudo btrfs subvolume create /mnt/btrfsbase/fast
sudo btrfs subvolume create /mnt/btrfsbase/swap
sudo mkdir /mnt/newroot/
sudo mount -t btrfs -o defaults,noatime,subvol=activeroot /dev/nvme0n1p3 /mnt/newroot/
  • Install the minimal system:
sudo debootstrap bullseye /mnt/newroot/
  • Configure etc/hostname and etc/hosts correctly
  • Configure etc/apt/sources.list and etc/network/interfaces correctly
  • Bind mount virtual filesystems:
sudo mount --rbind /dev/ /mnt/newroot/dev/
sudo mount --rbind /proc/ /mnt/newroot/proc/
sudo mount --rbind /sys /mnt/newroot/sys/
  • Chroot to the new /
sudo chroot /mnt/newroot
  • Mount the rest of the btrfs filesystem volumes
mount -t btrfs -o defaults,noatime,subvol=home /dev/nvme0n1p3 /home/
mount -t btrfs -o defaults,noatime,subvol=var /dev/nvme0n1p3 /mnt/
cd /var
mv * /mnt/
umount /mnt
mount -t btrfs -o defaults,noatime,subvol=var /dev/nvme0n1p3 /var
mkdir /swap
mount -t btrfs -o defaults,noatime,subvol=swap /dev/nvme0n1p3 /swap/
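
The swap subvolume mounted above still needs the actual swap file. A possible way to create it is sketched below. The file name /swap/swapfile and the 8G size are assumptions, and so are the run helper and DRY_RUN variable. btrfs only accepts swap files that are not copy-on-write and not compressed (and needs a reasonably recent kernel, 5.0 or later as far as I know), which is what the chattr step is for.

```shell
# With DRY_RUN set, print each command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Create and enable a swap file on btrfs.
# $1 = path of the swap file, $2 = size (e.g. 8G, an assumed value)
make_btrfs_swapfile() {
    file=$1
    size=$2
    run truncate -s 0 "$file"          # start from an empty file
    run chattr +C "$file"              # disable copy-on-write for this file
    run fallocate -l "$size" "$file"   # allocate the full size
    run chmod 600 "$file"
    run mkswap "$file"
    run swapon "$file"
}

# Review first:  DRY_RUN=1; make_btrfs_swapfile /swap/swapfile 8G
```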
  • Some extra basic configurations:
ln -s /proc/self/mounts /etc/mtab
apt update
apt upgrade
apt install --yes locales console-setup
dpkg-reconfigure locales
dpkg-reconfigure tzdata
apt install --yes dpkg-dev linux-headers-amd64 linux-image-amd64
apt install --yes zfs-initramfs btrfs-progs dosfstools
echo REMAKE_INITRD=yes > /etc/dkms/zfs.conf
  • Prepare the /boot and efi/biosgpt partitions on a normal sata HD:
mkfs.vfat /dev/sdd2 # efi
mkfs.ext4 /dev/sdd3 # /boot
mount /dev/sdd3 /mnt
cd /boot
mv * /mnt
umount /mnt
mount /dev/sdd3 /boot
mkdir -p /boot/efi # efi (create it after mounting /boot, so it lives on the new /boot)
mount /dev/sdd2 /boot/efi # efi
apt install --yes grub-pc # non-efi
apt install --yes grub-efi-amd64 shim-signed # efi
dpkg --purge os-prober
apt install emacs
cp /usr/share/systemd/tmp.mount /etc/systemd/system/
systemctl enable tmp.mount
grub-probe /boot
update-initramfs -c -k all
grub-install /dev/sdd # non-efi
grub-install --target=x86_64-efi --efi-directory=/boot/efi \
  --bootloader-id=debian --recheck --no-floppy # efi
apt install --yes openssh-server
  • Edit /etc/fstab:
LABEL=tardis                                 /mnt/btrfsbase  btrfs  defaults,noatime                    0 0
LABEL=tardis                                 /               btrfs  defaults,noatime,subvol=activeroot  0 0
LABEL=tardis                                 /home           btrfs  defaults,noatime,subvol=home        0 0
LABEL=tardis                                 /var            btrfs  defaults,noatime,subvol=var         0 0
LABEL=tardis                                 /swap           btrfs  defaults,noatime,subvol=swap        0 0
UUID="11d65735-75d0-44ce-a158-abd01d98b318"  /boot           ext4   defaults,noatime                    0 0
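
Since the btrfs lines in the fstab above all follow the same pattern, they could also be generated instead of typed. This is a small sketch; the function name fstab_btrfs_line is made up, and the output is tab-separated rather than aligned.

```shell
# Print one fstab line for a btrfs subvolume.
# $1 = filesystem label, $2 = mount point,
# $3 = subvolume name (empty for the top-level volume)
fstab_btrfs_line() {
    opts="defaults,noatime"
    if [ -n "$3" ]; then
        opts="$opts,subvol=$3"
    fi
    printf 'LABEL=%s\t%s\tbtrfs\t%s\t0 0\n' "$1" "$2" "$opts"
}

fstab_btrfs_line tardis /mnt/btrfsbase ""
fstab_btrfs_line tardis / activeroot
for sv in home var swap; do
    fstab_btrfs_line tardis "/$sv" "$sv"
done
```

After editing /etc/fstab, findmnt --verify can be used to check the file for obvious mistakes before rebooting.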

New system

  • Reboot to new system

Create a ZFS pool for main big data!

  • option 1: Striping (no redundancy)
sudo zpool create tardis /dev/sdb4 /dev/sde4 /dev/sdf4
  • option 2: raidz1 (3 new disks in raidz1, giving the usable space of 2 disks)
sudo zpool create tardis raidz1 /dev/sdb4 /dev/sde4 /dev/sdf4

  • option 3: raidz1 but degraded (one disk missing at the beginning). Solution: use a sparse file as one of the starting disks [10].
truncate -s 8T /home/memeruiz/tardisfake.img # use exactly the same size as the real disk partition
sudo zpool create tardis raidz1 /dev/sde4 /dev/sdf4 /home/memeruiz/tardisfake.img
sudo zpool offline tardis /home/memeruiz/tardisfake.img
  • Add log and cache devices:
sudo zpool add tardis log /dev/nvme0n1p1
sudo zpool add tardis cache /dev/nvme0n1p2
  • Create an encrypted dataset
sudo zfs create -o encryption=on -o keyformat=passphrase tardis/data
  • Create a compressed child dataset
sudo zfs create tardis/data/compressed
sudo zfs set compression=on tardis/data/compressed
  • Prevent tardis/data from being mounted automatically by zfs:
sudo zfs set mountpoint=legacy tardis/data

How to mount encrypted volume

  • If the pool was already imported (active), export it first!
sudo zpool export tardis
  • Import the pool again with the -l option so that it asks for the decryption passphrase:
sudo zpool import -l tardis
  • Mount fs:
sudo mount -t zfs tardis/data /srv
  • Extending a pool:
zpool add tardis /dev/<diskfile>
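
The unlock and mount steps above can be put together in a small helper. This is a sketch under assumptions: the function name, the run helper and DRY_RUN are made up, and instead of exporting and re-importing an already imported pool it uses zfs load-key to ask for the passphrase, which should have the same effect as long as the key is not loaded yet.

```shell
# With DRY_RUN set, print each command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Unlock and mount an encrypted dataset that uses mountpoint=legacy.
# $1 = pool, $2 = dataset, $3 = mount point
mount_encrypted() {
    pool=$1
    dataset=$2
    target=$3
    if zpool list "$pool" >/dev/null 2>&1; then
        # Pool is already imported: only the key is missing.
        run zfs load-key "$dataset"
    else
        # Import the pool and ask for the passphrase in one go.
        run zpool import -l "$pool"
    fi
    run mount -t zfs "$dataset" "$target"
}

# Usage (as root):  mount_encrypted tardis tardis/data /srv
```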

Some tricks

offline l2arc (cache) or zil (log) devices

zpool offline poolname device
zpool offline -t poolname device # temporary, only until next reboot

online l2arc (cache) or zil (log) devices

zpool online poolname device

add l2arc (cache) or zil (log) devices

zpool add poolname cache device #cache
zpool add poolname log device  #log

remove l2arc (cache) or zil (log) devices

zpool remove poolname device

If a pool becomes damaged and reports permanent errors in some files

errors: Permanent errors have been detected in the following files: 

You can clear that by running a short scrub and then stopping it:

sudo zpool scrub tardis
# wait a bit
sudo zpool scrub -s tardis

Then the error messages are gone.
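
Before clearing the list this way, it may be worth saving which files were affected, so they can be restored from a backup later. The filter below is a sketch that assumes the usual layout of zpool status -v, where the paths are listed, indented, after the "Permanent errors" line at the end of the output. The function name is made up.

```shell
# Read `zpool status -v` output on stdin and print the affected paths,
# one per line, with the leading indentation removed.
list_damaged_files() {
    awk '
        found && NF { sub(/^[ \t]+/, ""); print }
        /Permanent errors have been detected/ { found = 1 }
    '
}

# Usage:
#   sudo zpool status -v tardis | list_damaged_files > damaged_files.txt
```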

Replacing a pool disk (no redundancy)

For example, if a disk is starting to fail, you may want to replace it with a new one. Some files may already be broken, but you can use:

sudo zpool replace tardis old_dev new_dev

This copies everything that zfs has on that disk to the new one, attaches the new one and detaches the old one. This way, it is not necessary to copy the whole disk with dd.

Create a zfs pool using files instead of real disks

truncate -s 100M /file1 # on Solaris: mkfile 100m /file1
zpool create geekpool /file1 # file vdevs need an absolute path

Create a ZFS raidz1 in degraded mode

  • Create a sparse file with the same size as your hard drives and use it as one of the disks when creating the pool. Then take it offline (and delete it) before putting files on the pool.
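
A sketch of how the sparse file could be sized exactly, taking the size from one of the real partitions with blockdev --getsize64. The helper names are made up, and the device and path are the ones used in option 3 earlier on this page. A sparse file takes almost no real disk space until something is written to it, which the last helper lets you check.

```shell
# Create a sparse file of exactly $2 bytes at path $1.
make_sparse() {
    truncate -s "$2" "$1"
}

# Apparent size in bytes (what zpool will see as the disk size).
apparent_bytes() {
    stat -c %s "$1"
}

# Space really allocated on disk, in KB (should stay close to 0).
allocated_kb() {
    du -k "$1" | cut -f1
}

# Usage on the real system:
#   size=$(sudo blockdev --getsize64 /dev/sde4)
#   make_sparse /home/memeruiz/tardisfake.img "$size"
#   apparent_bytes /home/memeruiz/tardisfake.img
```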


zfs_root_cache.txt · Last modified: 2021/01/31 23:55 (external edit)