zfs

When you have to back up millions of files, use zfs if you can

I’ve been increasingly worried about one of the media servers I back up; the backup time has been crawling its way from 12 to about 14 hours as of today. It’s not that it’s so many TB, only 1.4 TB actually, the problem is that there are more than 2 million small files, and rsync has to stat every one of them. So even when there hasn’t been much change, the backup takes longer and longer.
So yesterday I figured I should try out zfs for the job. The media server is already on zfs, I use zfs on the backup server, and I already have a zfs sync script which I modified some months ago to use mbuffer for maximum bandwidth utilization.
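
The core of such a sync is basically a snapshot, then an incremental zfs send piped through mbuffer over the network into a zfs receive on the other end. Roughly like this (a sketch, not my actual script; the hostnames, dataset/snapshot names and the mbuffer port here are made up):

root@backupserver:~# mbuffer -s 128k -m 1G -I 9090 | zfs receive -F tank/backup/media
root@mediaserver:~# zfs send -i tank/media@yesterday tank/media@today | mbuffer -s 128k -m 1G -O backupserver:9090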

And how did this turn out? Well, the backup time went from about 14 hours to 7 seconds… Which makes sense: an incremental zfs send only has to walk the blocks that changed since the last snapshot, instead of stat()ing 2 million files. I realised I could never move away from zfs for this kind of setup (backing up millions of files).

zfs to the rescue

So I’ve had some zfs raidz/mirror problems before, and once again I noticed a troublesome disk in my zfs setup.
This time I caught a minor sector error on a disk pretty early, and didn’t want to take any chances, so I decided to replace it at once. The disk is one of those super crappy WD green disks anyway, which I’ve found REALLY shouldn’t be used in any raid or server setup (anything other than a desktop you don’t care about).
A bit wiser after last time, this time my nagios nrpe script picked up:

(da134:ciss1:2:11:0): READ(6). CDB: 8 a 9c d3 1 0 
(da134:ciss1:2:11:0): CAM status: CCB request completed with an error
(da134:ciss1:2:11:0): Retrying command

Not 100% sure that message really is a sector error, but I’m not taking any chances; I had a lot of trouble with this server the last time a disk died.
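
For the record, the nrpe check is not much more than a grep of the kernel log, roughly like this (a sketch of the idea rather than my actual plugin; the log path and the matched pattern are assumptions):

#!/bin/sh
# crude nagios nrpe check: complain if the kernel has logged CAM errors
ERRORS=$(grep -c 'CAM status: CCB request completed with an error' /var/log/messages)
if [ "$ERRORS" -gt 0 ]; then
        echo "WARNING - $ERRORS CAM errors in /var/log/messages"
        exit 1
fi
echo "OK - no CAM errors"
exit 0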
So I did:

root:~# zpool offline tank da134
root:~# zpool status
[...]
	  mirror-5                DEGRADED     0     0     0
	    da122                 ONLINE       0     0     0
	    11771992511548113470  OFFLINE      0     0     0  was /dev/da134
[...]
root:~# zpool detach tank da134
root:~# zpool status
[...]
	  mirror-4  ONLINE       0     0     0
	    da98    ONLINE       0     0     0
	    da110   ONLINE       0     0     0
	  da122     ONLINE       0     0     0
[...]
root:~# halt -p
(replaced the drive and turned the server back on)
root:~# zpool attach tank da122 da134
root:~# zpool status
	  mirror-5  ONLINE       0     0     0
	    da122   ONLINE       0     0     0
	    da134   ONLINE       0     0     0  (resilvering)

I never blogged about what really happened with my previous raidz rebuild, which went south (to put it mildly). The problem then was that I was running raidz, which only tolerates one disk failure, but it turned out I had several block read errors, so after 4 attempts to resilver/rebuild, zfs still wasn’t able to rebuild the fresh drive, simply because there were at least 2 other pretty rotten disks in the raid that kept throwing new sector errors…

So I had to scrap the whole setup and set up a fresh zfs pool. I got my boss to buy some new disks, but it turned out that 4 of those disks (some Samsung disks) weren’t recognized by the hardware raid controller (?), sooooo I put in some WD green disks there…

BUT since the pool is less than 25% full, I changed the whole setup from raidz to a mirrored setup, that is 6 mirror pairs with a total of 12 disks, and on top of that I set copies=2 for the backup dataset. copies=2 doubles the space usage because every block is written to 2 blocks on the disk, but as long as I have plenty of space I should be better protected against corrupted sectors/blocks, bit rot and what not. 🙂
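
For reference, recreating the pool as mirror pairs and turning on copies boils down to something like this (a sketch reconstructed from the status output below, not the exact commands I ran; note that copies only applies to data written after the property is set, so set it before filling the dataset):

root:~# zpool create tank mirror da2 da14 mirror da26 da38 mirror da50 da62 mirror da74 da86 mirror da98 da110 mirror da122 da134 log da1
root:~# zfs create tank/backup
root:~# zfs set copies=2 tank/backup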

root:~# zfs get copies tank/backup
NAME         PROPERTY  VALUE   SOURCE
tank/backup  copies    2       local
root:~# 

And the mirrored zfs setup looks like this:

root:~# zpool status
  pool: tank
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Apr 25 14:20:24 2013
        68.2G scanned out of 4.61T at 61.2M/s, 21h37m to go
        13.8G resilvered, 1.44% done
config:

	NAME        STATE     READ WRITE CKSUM
	tank        ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    da2     ONLINE       0     0     0
	    da14    ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    da26    ONLINE       0     0     0
	    da38    ONLINE       0     0     0
	  mirror-2  ONLINE       0     0     0
	    da50    ONLINE       0     0     0
	    da62    ONLINE       0     0     0
	  mirror-3  ONLINE       0     0     0
	    da74    ONLINE       0     0     0
	    da86    ONLINE       0     0     0
	  mirror-4  ONLINE       0     0     0
	    da98    ONLINE       0     0     0
	    da110   ONLINE       0     0     0
	  mirror-5  ONLINE       0     0     0
	    da122   ONLINE       0     0     0
	    da134   ONLINE       0     0     0  (resilvering)
	logs
	  da1       ONLINE       0     0     0

errors: No known data errors

zfs raidz – replacing a drive

So I had a faulty drive in a zfs raidz configuration. I physically replaced the drive without offlining the faulty drive first, or detaching it from the current raidz configuration… I probably did that part wrong. After a reboot I had something like this:

root@backupmh:~# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
	the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scan: none requested
config:

	NAME                     STATE     READ WRITE CKSUM
	tank                     DEGRADED     0     0     0
	  raidz1-0               ONLINE       0     0     0
	    da1                  ONLINE       0     0     0
	    da13                 ONLINE       0     0     0
	    da25                 ONLINE       0     0     0
	    da37                 ONLINE       0     0     0
	    da49                 ONLINE       0     0     0
	  raidz1-1               DEGRADED     0     0     0
	    da73                 ONLINE       0     0     0
	    da85                 ONLINE       0     0     0
	    da97                 ONLINE       0     0     0
	    da109                ONLINE       0     0     0
	    9120273794345838000  UNAVAIL      0     0     0  was /dev/da121
	spares
	  da61                   AVAIL   
	  da133                  AVAIL   

errors: No known data errors
root@backupmh:~# ls -l /dev/da121
crw-r-----  1 root  operator    1, 100 14 mar 11:59 /dev/da121

It’s /dev/da121 that I replaced. At first I simply wanted to hot-swap the drive, but since this is a really old HP MSA-20 or something, and I’m using some super crappy cheap sata drives, I had to reboot and do some other stuff before the raid controller would recognize the drive…
Anyway, to get zfs to rebuild (or resilver, as zfs calls it) I ended up with:

root@backupmh:~# zpool offline tank /dev/da121
root@backupmh:~# zpool online tank /dev/da121
warning: device '/dev/da121' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
root@backupmh:~# zpool replace tank /dev/da121

After that I got:

root@backupmh:~# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Thu Mar 14 12:14:51 2013
    6.46G scanned out of 1.36T at 4.60M/s, 85h38m to go
    685M resilvered, 0.46% done
config:

	NAME                       STATE     READ WRITE CKSUM
	tank                       DEGRADED     0     0     0
	  raidz1-0                 ONLINE       0     0     0
	    da1                    ONLINE       0     0     0
	    da13                   ONLINE       0     0     0
	    da25                   ONLINE       0     0     0
	    da37                   ONLINE       0     0     0
	    da49                   ONLINE       0     0     0
	  raidz1-1                 DEGRADED     0     0     0
	    da73                   ONLINE       0     0     0
	    da85                   ONLINE       0     0     0
	    da97                   ONLINE       0     0     0
	    da109                  ONLINE       0     0     0
	    replacing-4            UNAVAIL      0     0     0
	      9120273794345838000  UNAVAIL      0     0     0  was /dev/da121/old
	      da121                ONLINE       0     0     0  (resilvering)
	spares
	  da61                     AVAIL   
	  da133                    AVAIL   

errors: No known data errors

The resilver progress looks slow… but it’s getting faster and faster every time I check it (it started with an estimate of 600+ hours).
I initially set up the 2 raidz configurations with hot spares, so I wonder why it didn’t start resilvering to the corresponding hot spare after the reboot (?)
I should check that out, some day…
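
In the meantime, if another disk goes bad I should be able to kick a spare in by hand by replacing the failed device with one of the spares, something like this (an untested sketch; da109 here just stands in for whatever device has failed):

root@backupmh:~# zpool replace tank da109 da133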