Why are my ZFS disks so noisy?

Use these troubleshooting techniques to optimize the pool topology and block size of a ZFS pool.

Earlier this year, a user at Practical ZFS asked a deceptively simple question:

I have a Proxmox pool of three four-wide RAIDz1 vdevs. Whenever my VM is running, all twelve drives chatter audibly at least once per second, all day long. How come, and can I fix that?

There is, of course, a simple answer to this simple question: the VM in question is probably just trickling a constant stream of writes to disk, and those writes are getting sync’d once per second to the underlying metal.

Unfortunately, this simple answer doesn’t get to the heart of the question, and doesn’t offer any potential mitigations or solutions either. In order to completely answer the question–and offer some potential fixes–we have to dive down into some pretty deep fundamentals of how ZFS works, and why.

Answering this “simple” question required so many digressions, it felt a bit like asking the user to build Johnny Cash’s Cadillac just so they could learn how to wash it!

Awkward vdev width

One of the first things that leaps to mind when looking at this user’s problem is their topology: They have twelve total drives, arranged in three RAIDz1 vdevs of four drives apiece. This isn’t an ideal width.

In order to understand why, we need to talk about how RAIDz works. In a nutshell, each block–the fundamental unit of ZFS storage data, whose size is defined by the recordsize or volblocksize properties as appropriate–is split into n-p pieces, where n is the total number of drives in the RAIDz vdev and p is the parity level.

Since block size must be an even power of 2, this means that n-p must also be an even power of 2, if you want the pieces of each block to divide up evenly amongst the drives in the vdev. Let’s see how that works out in practice:

  • A 128KiB block is stored on a three-wide RAIDz1 vdev. 128KiB / (3-1) == 64KiB, so each drive gets 16 4KiB sectors written to it. The 128KiB block therefore occupies precisely 128KiB on disk (before parity), making this an ideal width.
  • A 128KiB block is stored on a four-wide RAIDz1 vdev. We can’t divide 128KiB evenly among three data drives, so what we do instead is write the data in full stripes, followed by a partial-width stripe. This works out to ten stripes of three data sectors and one parity sector, followed by a narrow stripe with the remaining two data sectors and one parity sector for them; RAIDz then pads that 43-sector allocation up to 44 sectors, because allocations are rounded up to a multiple of parity+1. This adds up to an aggregate storage efficiency of 32 data sectors / 44 total sectors == 72.7%, not the 75% naively expected…but this is at the default filesystem blocksize of 128KiB, not Proxmox’s default volblocksize of 8KiB or 16KiB. That’s going to make things much worse.
  • A 128KiB block is stored on an eight-wide RAIDz2 vdev. That means we’ll need to write thirty-two data sectors–and since 32/6==5.3, we’ll have to write our block in five full-width stripes of six data sectors + two parity sectors, followed by a narrow stripe of two data sectors + two parity sectors, plus one padding sector to round the allocation up to a multiple of parity+1. That comes out to 32 data sectors / 45 total sectors–about 71%, in the same ballpark as the four-wide Z1, and again short of the 75% we naively expected. (The short Python sketch after this list reproduces this arithmetic.)
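
If you’d like to check this arithmetic yourself, here is a rough Python sketch of the same math. It is only an illustration (the function name raidz_layout is made up for this article, and this is not OpenZFS’s actual allocator), and it assumes 4KiB sectors (ashift=12) plus RAIDz’s rule of rounding each allocation up to a multiple of parity+1:

    # Rough sketch of how one block's sectors divide across a RAIDz vdev.
    # Assumes 4KiB sectors (ashift=12); not OpenZFS's actual allocator.

    def raidz_layout(block_kib, width, parity, sector_kib=4):
        data = block_kib // sector_kib              # data sectors in the block
        per_stripe = width - parity                 # data sectors per full stripe
        full, leftover = divmod(data, per_stripe)
        parity_sectors = (full + (1 if leftover else 0)) * parity
        total = data + parity_sectors
        # RAIDz rounds each allocation up to a multiple of (parity + 1)
        padded = -(-total // (parity + 1)) * (parity + 1)
        return data, padded, data / padded

    for block, width, parity in [(128, 3, 1), (128, 4, 1), (128, 8, 2)]:
        data, total, se = raidz_layout(block, width, parity)
        print(f"{block}KiB on {width}-wide RAIDz{parity}: "
              f"{data} data / {total} total sectors = {se:.1%}")

Run as written, this reproduces the 72.7% figure for the four-wide Z1 (44 sectors, including one padding sector) and about 71% for the eight-wide Z2.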

Now, none of this is the end of the world–OpenZFS founding developer Matt Ahrens famously stated that he learned to stop worrying and love RAIDz, and you should too. In particular, pools which mostly store compressible data are minimally affected by other-than-ideal vdev widths, because OpenZFS compression results in off-sized stripes anyway.

To be clear, I don’t disagree with Matt–the impact of off-size vdev widths is usually pretty minimal, and I rarely advise anyone to tear down a working pool just because they used an odd width.

With that said, if you’re building a new pool from scratch, the impact of off-size widths on both performance and efficiency is noticeable, so there’s no reason not to go ahead and build your pools optimally…and in this particular case, we aren’t just worried about performance and storage efficiency, we’re worried about noise.

We’re also working with very small blocksizes, which–as we’re about to see in the next section–amplifies all of these issues significantly.

Awkward block size

The next issue is a bit more subtle, and requires some knowledge of a ZFS-powered distribution to spot–the user is running Proxmox, a Debian-based “appliance” distribution which makes it easy to operate virtual machines (VMs) stored on top of OpenZFS.

Proxmox is a very opinionated distribution–it wants you to use zvols (block storage datasets) rather than filesystems (file storage datasets) for your virtual machines, and it defaults to a very small volblocksize–8KiB for all but the most recent release, which bumps that up to 16KiB.

volblocksize=8K is a bad idea

Remember the math we did earlier, figuring out whether a vdev width was ideal or not? In Proxmox, using the default settings, there generally isn’t any ideal RAIDz vdev width–the blocks are too small to divvy up between them.

For anyone using Proxmox version 7 or earlier, the 8KiB volblocksize parameter means each block only has two sectors of data–and therefore, cannot be split evenly among more than two data drives.

Our noise-beleaguered questioner is running four-wide Z1, which means that each 8KiB block–two sectors of data–can only occupy three total sectors (two data and one parity). That’s only three drives wide–so OpenZFS stores each 8KiB block on only three drives, each of which writes a single sector only.

Since we are only writing to three disks per stripe, we’re only getting the 67% storage efficiency of a three-wide Z1, not the 75% storage efficiency we might naively expect from a four-wide Z1–or even the 72.7% we’d expect from that four-wide Z1 once we take extra stripes into account. Worse, because RAIDz pads each allocation up to a multiple of parity+1, that three-sector allocation actually occupies four sectors on disk, dragging real-world storage efficiency down to 50%.

We’re also writing a single sector to each drive, with ruinously bad effect on both performance and drive noise–we’re maximizing the potential for fragmentation, which means maximal drive head seeks, minimal performance, and maximal drive noise.
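
To put rough numbers on that, a quick back-of-the-envelope sketch in Python (again assuming 4KiB sectors):

    # 8KiB block on a four-wide RAIDz1, assuming 4KiB sectors (ashift=12)
    data_sectors = 8192 // 4096       # 2 data sectors per block
    parity_sectors = 1                # one parity sector for that single narrow stripe
    written = data_sectors + parity_sectors
    padded = written + (written % 2)  # RAIDz1 pads allocations to a multiple of 2
    print(written, padded)            # 3 sectors written (one per drive), 4 allocated
    print(f"{data_sectors / padded:.0%} real-world storage efficiency")  # 50%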

volblocksize=16K isn’t much better

Beginning with Proxmox 8, the default volblocksize for new VMs is 16KiB, not 8KiB. This is still awfully narrow for many common use cases, but it’s at least not quite as punishing as the old 8KiB value.

As we saw with the 128KiB example earlier, a block whose sector count isn’t a multiple of three can’t be split evenly across three data drives–so we’ll need one narrow stripe at the end of every block. Unfortunately, we’re only working with 16KiB (four 4KiB sectors) at a time, not the 128KiB we looked at earlier–so the impact is proportionally more punishing.

Each 16KiB block must be split into four 4KiB sectors. Three of those sectors plus a parity sector go on one full-width stripe, then the remaining sector gets its own parity sector in a second, narrower stripe–leaving us with an aggregate storage efficiency of 4 data sectors / 6 total sectors == 67%.

The good news is, this does mean that on every block read or written, half of our drives get to do 8K I/O instead of 4K I/O. That will improve performance slightly, and might decrease noise a little…but probably not enough to notice, in either case.
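
The same back-of-the-envelope math, this time for a 16KiB block on the same four-wide Z1:

    # 16KiB block on a four-wide RAIDz1, assuming 4KiB sectors (ashift=12)
    data_sectors = 16384 // 4096            # 4 data sectors per block
    parity_sectors = 2                      # one per stripe: a full stripe plus a narrow one
    total = data_sectors + parity_sectors   # 6 sectors; already a multiple of 2, no padding
    print(f"{data_sectors / total:.0%}")    # 67% storage efficiency
    # Per-drive I/O for one block: two drives write two contiguous sectors (8KiB)
    # and two drives write a single sector (4KiB).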

Minimizing noise and maximizing performance

Now that we’ve built Johnny Cash’s Cadillac by hand, we can finally talk about washing it.

To recap, our user is unhappy with the amount of drive noise their Proxmox server is making. We know that the server has twelve drives, organized into three four-wide RAIDz1 vdevs. This is the user’s first Proxmox server, which means they’re almost certainly using default settings–meaning either volblocksize=8K, or volblocksize=16K.

As we’ve already learned, the combination of zvols, small volblocksize, and off-size RAIDz vdev widths means we’re writing more sectors than we really need to–potentially, a lot more sectors than naive napkin math might suggest.

Unfortunately, none of these poor configuration choices are easy to remedy–RAIDz vdevs cannot currently be reshaped, and volblocksize is immutable once set. So we’re looking at tearing down pretty much everything to hopefully resolve the problem.

Pool topology improvements

For pool topology, we can assume that our user was looking for that naive 75% storage efficiency that four-wide Z1 implies (but does not typically deliver). What we don’t know, yet, is how important that promised additional storage actually is.

If our user is really focused on that 75% storage efficiency, they’re pretty much out of luck–you don’t hit that number until you get to ten-wide RAIDz2, which is an ideal width offering a naive SE of 80%.

Our user does have twelve drives, so a ten-wide Z2 is at least possible–but our user would also be dropping from three vdevs to one, and would need a minimum volblocksize of 32KiB (eight 4KiB sectors) just to get full width writes at one sector per drive.

If the user is less focused on total capacity, it probably makes more sense to consider one of these topologies (compared side by side in the sketch after this list):

  • Six 2-wide mirrors: six vdevs + full volblocksize writes to each disk + double IOPS for reads == maximum performance with minimum noise, at 50% SE and single redundancy
  • Four 3-wide Z1: four vdevs + half volblocksize written to each disk == very high performance with significantly reduced noise, at 67% SE and single redundancy
  • Three 4-wide Z2: three vdevs + half volblocksize written to each disk == high performance with significantly reduced noise, at 50% SE and dual redundancy
  • Two 6-wide Z2: two vdevs + quarter volblocksize written to each disk == moderate performance with possibly reduced noise, at 67% SE (maximum, see next section) and dual redundancy
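
Here is a minimal sketch comparing those four candidate layouts, assuming each block lands entirely on a single vdev and ignoring allocation padding for simplicity:

    # Candidate layouts for twelve drives: (label, vdevs, width, data drives per vdev)
    layouts = [
        ("6 x 2-wide mirror", 6, 2, 1),   # the second drive in each mirror writes a full copy
        ("4 x 3-wide RAIDz1", 4, 3, 2),
        ("3 x 4-wide RAIDz2", 3, 4, 2),
        ("2 x 6-wide RAIDz2", 2, 6, 4),
    ]

    volblocksize_kib = 64
    for label, vdevs, width, data_drives in layouts:
        efficiency = data_drives / width            # naive storage efficiency
        per_drive = volblocksize_kib / data_drives  # KiB of data written per data drive
        print(f"{label}: {vdevs} vdevs, {efficiency:.0%} SE, "
              f"{per_drive:g}KiB per data drive per block")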

Volblocksize improvements

Generally speaking, the larger the volblocksize, the higher the maximum throughput you can achieve–at the expense of increased latency for each individual I/O operation. The best overall performance is achieved when volblocksize matches or slightly exceeds the most common random I/O operation size in the underlying workload.

This means that for a PostgreSQL DB using 8KiB pages, you want to use a volblocksize of either 8KiB or 16KiB. Although 8KiB is a direct match, 16KiB might be preferable–larger volblocksize offers higher compression potential, and larger I/O per individual disk (meaning higher per-disk performance). For our specific original complaint–noise–we definitely want to lean toward the higher volblocksize, because larger individual operations means fewer seeks, which in turn means less “platter chatter!”

However, most VMs aren’t dedicated PostgreSQL VMs. Even in the world of database engines–the most latency-sensitive applications–MySQL InnoDB defaults to 16KiB pages, and MSSQL does much of its I/O in 64KiB extents. Meanwhile, VMs used for bulk storage of “Linux ISOs” will mostly be moving entire multi-GiB files!

For a general-purpose VM, I recommend 64KiB volblocksize–this aims at a sweet spot between minimal latency and maximal throughput, with decent potential for compression.
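
As a quick illustration of that rule of thumb, here is a hypothetical helper (suggest_volblocksize is invented for this article, not part of any ZFS or Proxmox tooling) that picks the smallest power-of-two volblocksize at least as large as the workload’s dominant random I/O size. It deliberately ignores the compression and latency trade-offs discussed above, which might push you one step larger, as with 16KiB for PostgreSQL:

    # Hypothetical helper: smallest power-of-two volblocksize >= dominant I/O size.
    def suggest_volblocksize(io_kib, floor_kib=8, ceiling_kib=128):
        size = floor_kib
        while size < min(io_kib, ceiling_kib):
            size *= 2
        return size

    for workload, io_kib in [("PostgreSQL, 8KiB pages", 8),
                             ("MySQL InnoDB, 16KiB pages", 16),
                             ("MSSQL, 64KiB extents", 64),
                             ("general-purpose VM", 64)]:
        print(f"{workload}: volblocksize={suggest_volblocksize(io_kib)}K")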

Putting it all together

With volblocksize=16KiB on a pool consisting of three 4-wide Z1 vdevs, our user is experiencing a lot of unnecessary inefficiency as well as unwanted noise. Each block written is scattered across a vdev in writes of no more than two sectors per drive–half of them single-sector–absolutely maximizing fragmentation, which in turn minimizes performance and maximizes noise, all while screwing up the extra storage efficiency that our user was likely expecting to get.

Although we don’t know precisely what workload our user expects to see in their VM, we can assume a mixture of fileserving and desktop user interface stuff–in other words, “general-purpose use.” For this use case, we typically want each block to be around 64KiB.

If we rebuild the pool with four three-wide Z1 vdevs instead of three four-wide Z1 vdevs, and change volblocksize from 16KiB to 64KiB, we will significantly increase the user’s storage efficiency and performance, as well as decrease the noise they experience.

Let’s walk through the process of writing 64KiB to our original pool, and the same 64KiB to a redesigned pool using the same twelve drives (there’s a short tally in Python after this list):

  • Our original pool uses 16KiB blocks and four-wide Z1 vdevs. Each block written is split across four drives as four individual writes (three data and one parity), a mix of two-sector and single-sector operations. It therefore requires four total blocks, and sixteen individual write operations, to commit 64KiB of data.
  • Our redesigned pool uses 64KiB blocks and three-wide Z1 vdevs. Each block written is split into three eight-sector wide writes (two data and one parity). It therefore requires only one total block, and three individual write operations, to commit the same 64KiB of data.
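
Here is a quick tally of those write operations, under the simplifying assumption that all of a block’s sectors on a given drive coalesce into a single write (write_ops is a made-up helper, not a ZFS API):

    # Per-drive write operations needed to commit 64KiB of guest data.
    def write_ops(total_kib, volblock_kib, drives_touched_per_block):
        blocks = total_kib // volblock_kib
        return blocks * drives_touched_per_block

    original = write_ops(64, 16, 4)  # 16KiB blocks, four-wide Z1: every drive in the vdev
    redesign = write_ops(64, 64, 3)  # 64KiB blocks, three-wide Z1: one block, three drives
    print(original, redesign)        # 16 vs 3 individual write operations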

When we compare these two setups, we can see that our user’s original pool requires a whopping five times more individual write operations for every 64KiB committed to disk than our revised setup does.

Obviously, this means our revised setup will enjoy much higher performance than the original–and thanks to the lessons learned in earlier sections, we also know that we’ll get improved on-disk storage efficiency to go along with it.

But more importantly–since the issue that caught our user’s attention in the first place was noise, not performance or capacity–20% of the total write operations issued means 20% of the opportunities to cause a drive’s head to seek, so we can expect our twelve-drive pool to be a lot less “chatty” while it works!

If that’s not good enough, a pool of six two-wide mirrors requires only two sixteen-sector-wide writes to commit the same 64KiB of data. That’s only 12.5% of the individual write operations the original setup needed!

Visualization

It may help to see a visual representation of how 64KiB of data will be written to several of the topologies and volblocksizes we’ve discussed, from least performant (and noisiest) to most performant (and quietest).

We’ll show all twelve of our drives as they’re ordered in the pool topology, then we’ll show each sequential write operation necessary to commit 64KiB to that pool. Greyed-out areas of the graph don’t represent wasted space–they simply note that that drive was not used for that write operation.

[Figure: 64KiB written to four-wide RAIDz1 vdevs with 8KiB volblocksize. Image credit: Jim Salter, CC-BY-SA 4.0]

If we’re working with Proxmox 7 or below, the default volblocksize is 8KiB. You simply can’t stretch an 8KiB block all the way across a four-wide RAIDz1 vdev, so we consistently write 3-wide stripes across groups of four drives.

This gives us terrible results for both performance and noise–each of our drives is forced to perform individual 4KiB (single-sector) operations on every vdev write.

We needed to issue 24 individual disk write operations, each only a single sector wide… which leaves us hoping that OpenZFS at least manages to order most of those single-sector operations contiguously both now and in the future.
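
As a sanity check, the arithmetic behind that count:

    # 64KiB of guest data at volblocksize=8K on four-wide RAIDz1 vdevs
    blocks = 64 // 8                  # 8 blocks
    writes_per_block = 2 + 1          # two data sectors plus one parity sector, one drive each
    print(blocks * writes_per_block)  # 24 single-sector write operations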

[Figure: 64KiB written to four-wide RAIDz1 vdevs with 16KiB volblocksize. Image credit: Jim Salter, CC-BY-SA 4.0]

Beginning with Proxmox 8, the default volblocksize is 16KiB. This is generally an improvement, but leaves us with a bit of a mixed bag due to our off-sized vdevs.

In order to write 16KiB blocks, OpenZFS issues each vdev one full stripe write (one sector to each drive) followed by a partial stripe write. The first stripe contains three of the four data sectors we need to write, along with another sector of parity. The second stripe consists of the remaining one data sector we need to write, along with a parity sector for it.

This mish-mash allows your drives to perform 8KiB operations roughly half the time, and 4KiB operations the other half. It is an improvement, but we’re still looking at 18 individual disk write operations to commit our 64KiB of data to disk.

[Figure: 64KiB written to three-wide RAIDz1 vdevs with 64KiB volblocksize. Image credit: Jim Salter, CC-BY-SA 4.0]

This time, we’re looking at ideal-width Z1 vdevs plus 64KiB volblocksize, with dramatic differences. We only need to write a single block, and that single block only lights up three total drives.

We committed the same 64KiB to disk, but gave the drives six times fewer opportunities to seek (remember: seeks both drop performance and irritate us with noise) when we did!

We also used the same total of 24 4KiB sectors on-disk that the four-wide Z1 with 16KiB blocks did. Take this lesson to heart–just because a vdev is slightly wider doesn’t mean it’s necessarily going to offer greater storage efficiency.

[Figure: 64KiB written to two-wide mirror vdevs with 64KiB volblocksize. Image credit: Jim Salter, CC-BY-SA 4.0]

Finally, we have my favorite topology for relatively small pools–mirror vdevs. We’re still using volblocksize=64K, so we’re still only issuing a single block write to our pool–but now we’re only issuing it to two disks, not three, saving us ⅓ the potential for platter chatter that our 3-wide RAIDz1 offered above.

Our mirror vdevs offer us significantly better performance than even the 3-wide RAIDz1 vdevs: we get half again the vdevs and double the read IOPS per vdev. Fault tolerance also increases slightly–each vdev still only survives a single disk failure, but there are fewer disks to fail per vdev.

If you want greater failure resistance, consider 4-wide or 6-wide RAIDz2 instead.

An alternate approach

Although we’ve successfully answered the question “why do my drives make noise?” we haven’t entirely answered the more important question, “and how do I stop hearing them?”

By properly configuring our pool topology and block size, we can drastically minimize the number of seeks issued to our drive heads, which in turn makes them far less “chattery.” But we should also ask the question, “what if we just can’t hear them?”

The amount of noise your mechanical drives make is largely influenced by the chassis you put them in. Ideally, you want a fairly heavy aluminum or steel case with rubber grommets for all the drive screws, and rubber bushings anywhere the drive might be expected to rest.

Dampening the vibrations transmitted to the chassis–and having a chassis with heavy panels, ideally not gamerriffic glass–will do wonders to make any noise your drives make much less obnoxious.

About the Author

Jim Salter (@jrssnet) is an author, public speaker, mercenary sysadmin, and father of three—not necessarily in that order.
