OpenZFS on Arch Linux: A Practical How-To

Introduction

OpenZFS (often simply called "ZFS") has long been valued for its robust data integrity, flexible storage configurations, and high-performance caching. While Arch Linux doesn't provide ZFS packages in its official repositories due to licensing conflicts (the CDDL is generally considered incompatible with the GPL), the community has made it possible to install and maintain ZFS through the ArchZFS project.

In this blog post, we'll walk through:

  • Installing OpenZFS on Arch Linux (the example commands also work on any ZFS-enabled workstation or server)
  • Creating pools with mirrors, stripes, and RAID-Z variations
  • Adding SLOG (Separate Intent Log) and cache (L2ARC)
  • Managing devices (offline, online, resilvering)
  • Migrating datasets with zfs send and zfs receive
  • Best practices for day-to-day administration and maintenance

1. Installing OpenZFS on Arch Linux

1.1 Enable the ArchZFS Repository

Because Arch Linux does not distribute ZFS packages directly, you’ll need to add the archzfs repository to your pacman.conf.

# First, edit pacman.conf
sudo vim /etc/pacman.conf

# Press i to enter insert mode, then add the following lines at the bottom:

[archzfs]
Server = https://archzfs.com/$repo/x86_64

# Press Escape, then type :x to write the file and exit.

1.2 Synchronize and Install

After adding the repo, refresh the package databases and perform a full system upgrade before installing (partial upgrades are unsupported on Arch). If pacman reports an unknown key, you may first need to import and locally sign the archzfs signing key with pacman-key:

sudo pacman -Syu
sudo pacman -S zfs-linux zfs-utils

Depending on your kernel variant (e.g., linux-lts), install the matching kernel-module package instead (zfs-utils is shared across kernel variants):

sudo pacman -S zfs-linux-lts zfs-utils

1.3 Enable ZFS Services

The ZFS packages include systemd units to automatically import and mount your pools at startup:

sudo systemctl enable zfs-import-cache.service
sudo systemctl enable zfs-mount.service
sudo systemctl enable zfs-import.target
sudo systemctl enable zfs.target
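
Before rebooting, it's worth confirming that the kernel module loads and that the kernel and userland versions match (zfs version reports both):

smalley@demoa:~$ sudo modprobe zfs
smalley@demoa:~$ zfs version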

Optionally, reboot once the installation is complete:

sudo reboot

2. Pool Creation Basics

ZFS organizes storage into pools (zpools). Each pool is composed of one or more vdevs (virtual devices), which can be:

  • Single disk (a vdev using one disk)
  • Mirror (two or more disks holding identical copies of the data)
  • RAID-Z variants (RAID-Z1, RAID-Z2, RAID-Z3)
  • Striped (multiple disks combined for capacity and performance, but no redundancy)

2.1 Basic Terminology

  • Stripe (RAID-0): No redundancy; data is spread across multiple devices for speed, but if any disk fails, the whole pool is lost.
  • Mirror: Each block is duplicated across two or more disks, protecting against disk failure and improving read performance at the cost of usable capacity.
  • RAID-Z1/2/3: Similar to RAID-5/6 concepts but with ZFS improvements, allowing the pool to tolerate 1, 2, or 3 disk failures.

2.2 Creating a Striped Pool

Useful for testing or for data you don’t mind losing if a disk fails:

Always reference disks by a persistent identifier under /dev/disk/ (by-id, by-label, by-path, or by-uuid) so device names don't shift after firmware (BIOS/EFI) or cabling changes. We'll use by-id throughout:

smalley@demoa:~$ ls /dev/disk/by-id/
scsi-3600224806e10a63be9260b6b6048cab1  scsi-360022480f2e2ed849175d59f4ae8a49a  wwn-0x600224808c00ff5aa6788569077f7e16
scsi-36002248079f9f66f426ea82fb0957801  wwn-0x600224806e10a63be9260b6b6048cab1  wwn-0x60022480f2e2ed849175d59f4ae8a49a
scsi-36002248085f7a4ffce559da2bfab1561  wwn-0x6002248079f9f66f426ea82fb0957801
scsi-3600224808c00ff5aa6788569077f7e16  wwn-0x6002248085f7a4ffce559da2bfab1561
# Example: stripe using two disks
smalley@demoa:~$ sudo zpool create mypool /dev/disk/by-id/scsi-3600224806e10a63be9260b6b6048cab1 /dev/disk/by-id/scsi-36002248085f7a4ffce559da2bfab1561
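
After creating any pool, it's a good habit to confirm the layout ZFS actually built:

smalley@demoa:~$ sudo zpool status mypool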

2.3 Creating a Mirrored Pool

A two-disk mirror:

smalley@demoa:~$ sudo zpool create mymirror mirror /dev/disk/by-id/scsi-3600224806e10a63be9260b6b6048cab1 /dev/disk/by-id/scsi-36002248085f7a4ffce559da2bfab1561

You can also create three-disk or four-disk mirrors by listing more devices.
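
For example, a three-way mirror (the third disk id is taken from the by-id listing above):

smalley@demoa:~$ sudo zpool create mymirror3 mirror /dev/disk/by-id/scsi-3600224806e10a63be9260b6b6048cab1 /dev/disk/by-id/scsi-36002248085f7a4ffce559da2bfab1561 /dev/disk/by-id/scsi-3600224808c00ff5aa6788569077f7e16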

2.4 Creating a RAID-Z Pool

For RAID-Z1 with three disks:

smalley@demoa:~$ sudo zpool create myraidz raidz /dev/disk/by-id/scsi-3600224806e10a63be9260b6b6048cab1 /dev/disk/by-id/scsi-36002248085f7a4ffce559da2bfab1561 /dev/disk/by-id/scsi-3600224808c00ff5aa6788569077f7e16

For RAID-Z2 or Z3, simply specify raidz2 or raidz3 and include enough disks:

smalley@demoa:~$ sudo zpool create myraidz2 raidz2 /dev/disk/by-id/scsi-3600224806e10a63be9260b6b6048cab1 /dev/disk/by-id/scsi-36002248085f7a4ffce559da2bfab1561 /dev/disk/by-id/scsi-3600224808c00ff5aa6788569077f7e16 /dev/disk/by-id/scsi-36002248079f9f66f426ea82fb0957801 

3. Advanced Configurations: Combining Mirrors and Stripes

ZFS also supports mixing mirrored vdevs in a larger stripe.

3.1 Multiple Mirrored Vdevs in One Pool

Suppose you have 4 disks and want two mirrored pairs:

smalley@demoa:~$ sudo zpool create bigpool \
  mirror /dev/disk/by-id/scsi-3600224806e10a63be9260b6b6048cab1 /dev/disk/by-id/scsi-36002248085f7a4ffce559da2bfab1561 \
  mirror /dev/disk/by-id/scsi-3600224808c00ff5aa6788569077f7e16 /dev/disk/by-id/scsi-36002248079f9f66f426ea82fb0957801 

This effectively stripes the two mirrored vdevs. You can add more mirror pairs later to expand the pool.
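
A sketch of such an expansion, assuming two additional disks (the second id here is hypothetical):

smalley@demoa:~$ sudo zpool add bigpool \
  mirror /dev/disk/by-id/scsi-360022480f2e2ed849175d59f4ae8a49a /dev/disk/by-id/scsi-36002248079f9f66f426ea89fb0957206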


4. Adding SLOG and Cache (L2ARC)

4.1 Separate Intent Log (SLOG)

The ZFS Intent Log (ZIL) handles synchronous writes. By default, the ZIL resides on your main pool. A dedicated SLOG device (e.g., an SSD) can speed up these writes. Note that a SLOG device is not a write cache; it only lets synchronous writes be acknowledged quickly, while the data is still committed to the main pool with the next transaction group.

# Add an SSD as a SLOG device (use its /dev/disk/by-id path rather than e.g. /dev/nvme0n1)
smalley@demoa:~$ sudo zpool add mypool log /dev/disk/by-id/scsi-360022480f2e2ed849175d59f4ae8a49a

If the SLOG device fails, the pool remains intact; only synchronous writes that were in flight during a simultaneous crash could be lost. For critical data, consider mirroring the SLOG:

smalley@demoa:~$ sudo zpool add mypool log mirror /dev/disk/by-id/scsi-36002248079f9f66f426ea82fb0957801 /dev/disk/by-id/scsi-36002248079f9f66f426ea89fb0957206

4.2 Level 2 ARC (L2ARC)

ZFS’s primary read cache (ARC) lives in RAM. L2ARC is a secondary cache placed on faster storage (SSD/NVMe) to hold data evicted from the ARC.

# Add an SSD as L2ARC
smalley@demoa:~$ sudo zpool add mypool cache /dev/disk/by-id/scsi-36002348079f9f66f426ea89fb0957209

Data is still on the main pool; if the L2ARC device fails, no data is lost. L2ARC is best used if you have enough RAM to track the metadata for the cached blocks.
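
Because L2ARC holds no unique data, a cache device can also be removed at any time without risk (device id from the example above):

smalley@demoa:~$ sudo zpool remove mypool /dev/disk/by-id/scsi-36002348079f9f66f426ea89fb0957209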


5. Disk Management: Offline, Online, and Replacement

5.1 Taking a Disk Offline

If a disk starts failing or you need to remove it:

smalley@demoa:~$ sudo zpool offline mypool /dev/disk/by-id/scsi-3600224806e10a63be9260b6b6048cab1

The pool will show a degraded state. Data remains accessible if sufficient redundancy exists.

5.2 Bringing a Disk Online

After maintenance or replacement, you can bring it back:

smalley@demoa:~$ sudo zpool online mypool /dev/disk/by-id/scsi-3600224806e10a63be9260b6b6048cab1

5.3 Replacing a Failed Disk

If a disk fails, replace it with a new one (old device id first, new device id second):

smalley@demoa:~$ sudo zpool replace mypool /dev/disk/by-id/scsi-3600224806e10a63be9260b6b6048cab1 /dev/disk/by-id/scsi-3600224808c00ff5aa6788569077f7e16

ZFS then resilvers the new disk.
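
You can also keep a hot spare in the pool so a faulted disk can be taken over without manual intervention; a minimal sketch, reusing an unused disk id from the earlier listing:

smalley@demoa:~$ sudo zpool add mypool spare /dev/disk/by-id/scsi-360022480f2e2ed849175d59f4ae8a49a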


6. Resilvering and RAID-Z Benefits

6.1 Resilvering Explained

Resilvering is ZFS’s process of rebuilding a disk’s data (in mirrors or RAID-Z) after a replacement or reintroduction. It checks blocks on all other disks, recomputes missing data or parity, and writes to the new disk.

smalley@demoa:~$ sudo zpool status mypool

This command shows resilver progress and estimates.

6.2 RAID-Z Level Benefits

  • RAID-Z1: Tolerates 1 disk failure
  • RAID-Z2: Tolerates 2 disk failures
  • RAID-Z3: Tolerates 3 disk failures

Larger arrays often benefit from higher redundancy (Z2 or Z3). You must balance capacity, performance, and fault tolerance.
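
As a rough capacity rule of thumb, a RAID-Z vdev with N disks and parity P yields about (N - P) x disk size of usable space: six 4 TB disks in RAID-Z2 give roughly (6 - 2) x 4 TB = 16 TB, before metadata and reserved-space overhead.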


7. Migrating Data with zfs send and zfs receive

7.1 Creating Snapshots

smalley@demoa:~$ sudo zfs snapshot mypool/mydataset@snap1
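
To review the snapshots you have taken:

smalley@demoa:~$ sudo zfs list -t snapshot -r mypool/mydataset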

7.2 Sending the Snapshot

smalley@demoa:~$ sudo zfs send mypool/mydataset@snap1 > mydataset_snap1.zfs
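
To restore from that file later, feed it back into zfs receive (the target dataset name is just an example):

smalley@demoa:~$ sudo zfs receive mypool/restored < mydataset_snap1.zfs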

You can pipe directly to another system:

smalley@demoa:~$ sudo zfs send mypool/mydataset@snap1 | ssh user@otherserver "zfs receive backup/mydataset"

7.3 Incremental Sends

Send only differences between snapshots:

smalley@demoa:~$ sudo zfs send -i mypool/mydataset@snap1 mypool/mydataset@snap2 | ssh user@otherserver "zfs receive backup/mydataset"
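
If several snapshots exist between the two, -I (capital i) sends the entire chain in one stream; the snapshot names here are hypothetical:

smalley@demoa:~$ sudo zfs send -I mypool/mydataset@snap1 mypool/mydataset@snap5 | ssh user@otherserver "zfs receive backup/mydataset"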

8. Additional Administration and Management Tips

  1. Scrubs and Health Checks
    • Use zpool scrub mypool regularly to detect and correct data issues (see the timer sketch after this list).
    • Check status: zpool status mypool
  2. Compression and De-duplication
    • Compression (e.g., lz4) is cheap and usually a net win for both space and throughput (see the command below).
    • De-duplication is resource-heavy; only enable it if you have ample RAM.
  3. Encryption
    • OpenZFS 0.8 and later supports native, per-dataset encryption (see the sketch after this list).
  4. Snapshots and Rollbacks
    • Create snapshots frequently for backups.
  5. Backing Up Configuration
    • Keep a copy of /etc/pacman.conf and any ArchZFS config changes.
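
For item 1, recent OpenZFS releases ship per-pool scrub timers; if your packages include them, enabling a monthly scrub is one line (pool name from our examples):

smalley@demoa:~$ sudo systemctl enable --now zfs-scrub-monthly@mypool.timer

For item 3, a minimal sketch of creating an encrypted dataset protected by a passphrase (dataset name hypothetical):

smalley@demoa:~$ sudo zfs create -o encryption=on -o keyformat=passphrase mypool/secure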

Make sure each pool is recorded in the cache file that zfs-import-cache.service reads at boot:

smalley@demoa:~$ sudo zpool set cachefile=/etc/zfs/zpool.cache mypool

Roll back if necessary:

smalley@demoa:~$ sudo zfs rollback mypool/mydataset@snap1

Enable compression on datasets:

smalley@demoa:~$ sudo zfs set compression=lz4 mypool/mydataset
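
You can check how effective compression is with:

smalley@demoa:~$ sudo zfs get compression,compressratio mypool/mydataset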

Conclusion

Whether you’re setting up a basic home lab with mirrored HDDs or a mission-critical server environment leveraging RAID-Z2 with dedicated SLOG and L2ARC, ZFS is built for data integrity, flexibility, and performance. Arch Linux users can harness these strengths via the ArchZFS repository, gaining access to advanced storage features that rival any enterprise solution.

Key Takeaways:

  • Use mirrors or RAID-Z for redundancy and data protection.
  • SLOG devices accelerate synchronous writes, while L2ARC extends read caching beyond system RAM.
  • Offline and replace disks carefully, and let ZFS resilver automatically.
  • zfs send and zfs receive enable powerful, incremental backups and migrations.

With this guide, you can confidently administer OpenZFS on Arch Linux, from initial installation to advanced day-to-day management. Happy ZFS-ing!