Btrfs for root and ZFS for data with ssd cache
We have the following hardware:
- 1x 512GB nvme
- 4x 3TB sata
- 2x 5TB sata
And we want the following:
- Root on 200GB of nvme in mirror with another sata
- The BIOS doesn't boot from nvme, so /boot must be copied to all satas
- UEFI partition also on all satas
- A fast only nvme partition of 200GB
- Be able to do backups of important directories (/home, /etc, and others) on external HD, maybe using snapshots
- Use several of the satas together as one big long-term “data” storage space, with the rest of the nvme as cache
How can ZFS help here:
- ZFS can't be used for the /boot directory. Use ext4
- Use a SLOG (a separate ZIL device) on the nvme as write log for the data volume
- Use L2ARC on the nvme as read cache for the data volume
We will do this:
- A btrfs for root, swap and “fast” associated with the nvme alone (no redundancy, but /etc will be gitted)
- /boot and UEFI partitions in several disks for booting
- A pool for the rest of the disks using some level of RAIDz1; it will have a small (32GB) SLOG and a bigger (128GB) L2ARC on the nvme disk
- A pool on the backup disks for backups using zfs snapshot/replication/clone
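The nvme part of this plan can be sanity-checked with a little arithmetic before touching the disk. The sketch below only prints sgdisk commands, it does not run them. The 16 GiB of unpartitioned spare space and the partition names are assumptions, and the nominal 512 GB is converted to GiB because sgdisk sizes such as +32G are binary units; on a real system the byte count would rather come from blockdev --getsize64.

```shell
#!/bin/sh
# Dry-run sketch: prints sgdisk commands for the nvme layout, runs nothing.
DISK=/dev/nvme0n1            # device name used in the steps of this page
DISK_BYTES=512000000000      # assumption: nominal 512 GB (decimal) disk
SLOG_GIB=32                  # from the plan
L2ARC_GIB=128                # from the plan
SPARE_GIB=16                 # assumption: left unpartitioned for ssd discarding

# Convert to GiB (integer division rounds down, which is the safe direction).
DISK_GIB=$((DISK_BYTES / 1073741824))
BTRFS_GIB=$((DISK_GIB - SLOG_GIB - L2ARC_GIB - SPARE_GIB))

plan_nvme() {
    echo "sgdisk --zap-all $DISK"
    echo "sgdisk -n 1:0:+${SLOG_GIB}G  -t 1:bf00 -c 1:slog  $DISK"
    echo "sgdisk -n 2:0:+${L2ARC_GIB}G -t 2:bf00 -c 2:l2arc $DISK"
    echo "sgdisk -n 3:0:+${BTRFS_GIB}G -t 3:8300 -c 3:btrfs $DISK"
}

plan_nvme
```

With these numbers the Btrfs partition comes out at 300 GiB. Review the printed commands before running any of them as root, since --zap-all destroys the existing partition table.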
Steps followed
- Wipe nvme disk (use gdisk /dev/nvme0n1, o command)
- Create SLOG 32GB partition with bf00 type
- Create L2ARC 128GB partition with bf00 type
- Create Btrfs partition on the rest of the 512GB disk (leave a bit unallocated for ssd discarding), type 8300
- Create Btrfs filesystem on Btrfs partition:
sudo mkfs.btrfs -L tardis /dev/nvme0n1p3
- Create /, home, var, fast and “swap” subvolumes (swap is really just a special file)
sudo mkdir /mnt/btrfsbase
sudo mount -t btrfs -o defaults,noatime /dev/nvme0n1p3 /mnt/btrfsbase
sudo btrfs subvolume create /mnt/btrfsbase/activeroot
sudo btrfs subvolume create /mnt/btrfsbase/home
sudo btrfs subvolume create /mnt/btrfsbase/var
sudo btrfs subvolume create /mnt/btrfsbase/fast
sudo btrfs subvolume create /mnt/btrfsbase/swap
sudo mkdir /mnt/newroot/
sudo mount -t btrfs -o defaults,noatime,subvol=activeroot /dev/nvme0n1p3 /mnt/newroot/
- Install the minimal system:
sudo debootstrap bullseye /mnt/newroot/
- Configure etc/hostname and etc/hosts correctly
- Configure etc/apt/sources.list and etc/network/interfaces correctly
- Bind mount virtual filesystems:
sudo mount --rbind /dev/ /mnt/newroot/dev/
sudo mount --rbind /proc/ /mnt/newroot/proc/
sudo mount --rbind /sys /mnt/newroot/sys/
- Chroot to the new /
sudo chroot /mnt/newroot
- Mount the rest of the btrfs filesystem volumes
mount -t btrfs -o defaults,noatime,subvol=home /dev/nvme0n1p3 /home/
mount -t btrfs -o defaults,noatime,subvol=var /dev/nvme0n1p3 /mnt/
cd /var
mv * /mnt/
umount /mnt
mount -t btrfs -o defaults,noatime,subvol=var /dev/nvme0n1p3 /var
mkdir /swap
mount -t btrfs -o defaults,noatime,subvol=swap /dev/nvme0n1p3 /swap/
- Some extra basic configurations:
ln -s /proc/self/mounts /etc/mtab
apt update
apt upgrade
apt install --yes locales console-setup
dpkg-reconfigure locales
dpkg-reconfigure tzdata
apt install --yes dpkg-dev linux-headers-amd64 linux-image-amd64
apt install --yes zfs-initramfs btrfs-progs dosfstools
echo REMAKE_INITRD=yes > /etc/dkms/zfs.conf
- Prepare /boot and efi/biosgpt partitions from normal sata HD:
mkfs.vfat /dev/sdd2 # efi
mkfs.ext4 /dev/sdd3 # /boot
mount /dev/sdd3 /mnt
cd /boot
mv * /mnt
umount /mnt
mount /dev/sdd3 /boot
mkdir -p /boot/efi # efi (created after mounting /boot so it lives on the new partition)
mount /dev/sdd2 /boot/efi # efi
apt install --yes grub-pc # non-efi
apt install --yes grub-efi-amd64 shim-signed # efi
dpkg --purge os-prober
apt install emacs
cp /usr/share/systemd/tmp.mount /etc/systemd/system/
systemctl enable tmp.mount
grub-probe /boot
update-initramfs -c -k all
update-grub
grub-install /dev/sdd # non-efi
grub-install --target=x86_64-efi --efi-directory=/boot/efi \
    --bootloader-id=debian --recheck --no-floppy # efi
apt install --yes openssh-server
- Edit /etc/fstab:
LABEL=tardis /mnt/btrfsbase btrfs defaults,noatime 0 0
LABEL=tardis / btrfs defaults,noatime,subvol=activeroot 0 0
LABEL=tardis /home btrfs defaults,noatime,subvol=home 0 0
LABEL=tardis /var btrfs defaults,noatime,subvol=var 0 0
LABEL=tardis /swap btrfs defaults,noatime,subvol=swap 0 0
UUID="11d65735-75d0-44ce-a158-abd01d98b318" /boot ext4 defaults,noatime 0 0
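The steps above create and mount the swap subvolume but never create the swap file itself. A minimal sketch of how that could look, following the usual btrfs recipe of disabling copy-on-write before allocating the file. The file name /swap/swapfile and the 8G size are assumptions, and by default the script only prints the commands; set DRY_RUN=0 and run it as root to actually execute them.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set.
SWAPFILE=/swap/swapfile   # hypothetical file name inside the swap subvolume
SIZE=8G                   # assumption: choose a size that fits your RAM

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

make_btrfs_swapfile() {
    run truncate -s 0 "$SWAPFILE"         # start from an empty file
    run chattr +C "$SWAPFILE"             # no copy-on-write, needed for swap on btrfs
    run fallocate -l "$SIZE" "$SWAPFILE"  # allocate the space
    run chmod 600 "$SWAPFILE"
    run mkswap "$SWAPFILE"
    run swapon "$SWAPFILE"
}

make_btrfs_swapfile
```

To activate the file at boot, a line such as "/swap/swapfile none swap defaults 0 0" would presumably be added to /etc/fstab after the /swap entry.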
New system
- Reboot to new system
Create a ZFS pool for main big data!
- option 1: Striping
sudo zpool create tardis /dev/sdb4 /dev/sde4 /dev/sdf4
- option 2: raidz1 (with 3 new disks in raidz1 configuration of 2xdisk space)
sudo zpool create tardis raidz1 /dev/sdb4 /dev/sde4 /dev/sdf4
- option 3: raidz1 but degraded (without one disk at the beginning). Solution: use a sparse file as one of the starting disks [10].
truncate -s 8T /home/memeruiz/tardisfake.img # use exactly the size of the real disk partition
sudo zpool create tardis raidz1 /dev/sde4 /dev/sdf4 /home/memeruiz/tardisfake.img
sudo zpool offline tardis /home/memeruiz/tardisfake.img
- Add log and cache devices:
sudo zpool add tardis log /dev/nvme0n1p1
sudo zpool add tardis cache /dev/nvme0n1p2
- Create encrypted volume
sudo zfs create -o encryption=on -o keyformat=passphrase tardis/data
- Create compressed subvolume
sudo zfs create tardis/data/compressed
sudo zfs set compression=on tardis/data/compressed
- Prevent zfs from mounting data automatically:
sudo zfs set mountpoint=legacy tardis/data
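The difference in usable space between option 1 and option 2 can be worked out with a small helper. This is only a rough sketch: it ignores metadata overhead, and the disk count and the 3 TB disk size in the calls are illustrative assumptions rather than the sizes of the partitions used above.

```shell
#!/bin/sh
# usable_tb LAYOUT NDISKS DISK_TB
# Rough usable capacity in TB: a stripe uses every disk,
# raidz1 gives up the equivalent of one disk for parity.
usable_tb() {
    layout=$1
    ndisks=$2
    disk_tb=$3
    case $layout in
        stripe) echo $(( ndisks * disk_tb )) ;;
        raidz1) echo $(( (ndisks - 1) * disk_tb )) ;;
        *) echo "unknown layout: $layout" >&2; return 1 ;;
    esac
}

usable_tb stripe 3 3   # prints 9
usable_tb raidz1 3 3   # prints 6
```

This matches the "2x disk space" figure given for three disks in raidz1, at the price of surviving only a single disk failure.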
How to mount encrypted volume
- If pool was already activated, deactivate it!
sudo zpool export tardis
- Reactivate pool with -l option to ask for decryption passphrase:
sudo zpool import tardis -l
- Mount fs:
sudo mount -t zfs tardis/data /srv
- Extending a pool:
zpool add tardis /dev/<diskfile>
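As an alternative to exporting and re-importing the pool, the key of an encrypted dataset can also be loaded on an already imported pool with zfs load-key. The helper below is a sketch of that idea, not the procedure used above: the function name mount_encrypted is made up for illustration, and it is meant to be run as root, so it omits sudo.

```shell
#!/bin/sh
# mount_encrypted DATASET MOUNTPOINT
# Load the encryption key only if it is not loaded yet,
# then mount the dataset (which uses mountpoint=legacy).
mount_encrypted() {
    dataset=$1
    mountpoint=$2
    # keystatus is "unavailable" while the key has not been loaded
    status=$(zfs get -H -o value keystatus "$dataset") || return 1
    if [ "$status" = "unavailable" ]; then
        zfs load-key "$dataset" || return 1   # asks for the passphrase
    fi
    mount -t zfs "$dataset" "$mountpoint"
}

# Example call (as root):
#   mount_encrypted tardis/data /srv
```

Checking keystatus first avoids the error that zfs load-key reports when the key is already loaded.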
Some tricks
offline l2arc (cache) or zil (log) devices
zpool offline poolname device
zpool offline -t poolname device # until next reboot
online l2arc (cache) or zil (log) devices
zpool online poolname device
add l2arc (cache) or zil (log) devices
zpool add poolname cache device # cache
zpool add poolname log device # log
remove l2arc (cache) or zil (log) devices
zpool remove poolname device
If a pool becomes damaged and zpool status reports permanent errors in some files:
errors: Permanent errors have been detected in the following files:
You can clear that by starting a short scrub and then stopping it:
sudo zpool scrub tardis
# wait a bit
sudo zpool scrub -s tardis
After that the error messages are gone.
In case you want to replace a disk in a pool (no redundancy).
For example, if a disk is starting to fail, you may want to replace it with a new one. Some files may already be broken, but you can use:
sudo zpool replace tardis old_dev new_dev
This copies everything zfs stores on the old disk to the new one, attaches the new disk and detaches the old one. This way it is not necessary to copy the whole disk with dd.
Create a zfs pool using files instead of real disks
truncate -s 100M /file1 # on Solaris: mkfile 100m /file1
zpool create geekpool /file1
Create a ZFS raidz1 in degraded mode
- Create a sparse file with the same size as your hard drives and use it as one of the disks. Then take it offline before writing files to the pool.
https://www.reddit.com/r/zfs/comments/8enkt8/can_i_create_a_new_raidz_in_degraded_mode/
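Since the sparse file should have exactly the size of the real partition, it may be easier to read that size from an existing member partition than to type it by hand. A small sketch, assuming util-linux blockdev and coreutils truncate are available; the function name make_fake_disk is hypothetical.

```shell
#!/bin/sh
# make_fake_disk REAL_PARTITION IMAGE
# Create a sparse file with exactly the byte size of REAL_PARTITION.
make_fake_disk() {
    real_part=$1
    image=$2
    # blockdev --getsize64 prints the device size in bytes (needs root)
    size=$(blockdev --getsize64 "$real_part") || return 1
    truncate -s "$size" "$image"
}

# Example call (as root), with the names used earlier on this page:
#   make_fake_disk /dev/sde4 /home/memeruiz/tardisfake.img
```

The sparse file takes almost no real space as long as nothing is written to it, which is why it has to be taken offline right after the pool is created.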
References
https://linuxhint.com/configuring-zfs-cache/
https://www.howtoforge.com/tutorial/how-to-use-snapshots-clones-and-replication-in-zfs-on-linux/
https://wiki.archlinux.org/index.php/btrfs#Subvolumes
https://btrfs.wiki.kernel.org/index.php/SysadminGuide#Subvolumes
https://wiki.gentoo.org/wiki/Btrfs/Native_System_Root_Guide
https://mohankumar-k.blogspot.com/2018/05/how-to-resolve-zpool-permanent-errors.html
https://docs.oracle.com/cd/E36784_01/html/E36835/gbbvf.html
https://www.thegeekdiary.com/zfs-tutorials-creating-zfs-pools-and-file-systems/
https://www.delphix.com/blog/delphix-engineering/openzfs-code-walk-metaslabs-and-space-maps
[10] https://www.mail-archive.com/zfs-discuss@opensolaris.org/msg22993.html
https://unix.stackexchange.com/questions/322352/create-raid-z2-in-degraded-state-possible