zfs to the rescue

I’ve had my share of zfs raidz/mirror problems, and now I’ve noticed yet another troublesome disk in my zfs setup.
This time I spotted a minor sector error on a disk pretty early, didn’t want to take any chances, and decided to replace it at once. The disk is one of those super crappy WD Green disks anyway, which I’ve found REALLY shouldn’t be used in any raid or server setup (anything other than a desktop you don’t care about).
A bit wiser from last time, this time my nagios nrpe script picked up:

(da134:ciss1:2:11:0): READ(6). CDB: 8 a 9c d3 1 0 
(da134:ciss1:2:11:0): CAM status: CCB request completed with an error
(da134:ciss1:2:11:0): Retrying command

I’m not 100% sure that message really is a sector error, but I’m not taking any chances: I had a lot of trouble with this server the last time a disk died.
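A check like that could look roughly like the sketch below. It is not the actual nrpe script, only a minimal illustration: the function names count_cam_errors and check_disk are made up, and it assumes the kernel log text is piped in on stdin (for example from dmesg). Exit code 1 follows the usual nagios convention for WARNING.

```shell
#!/bin/sh
# Hypothetical NRPE-style check: look for CAM errors on one device
# in kernel log text read from stdin.

count_cam_errors() {
    # $1 = device name, e.g. da134. Prints the number of matching lines.
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    grep -c "^($1:.*CAM status: CCB request completed with an error" || true
}

check_disk() {
    # $1 = device name. Prints a nagios-style status line and
    # returns 0 (OK) or 1 (WARNING).
    dev=$1
    errors=$(count_cam_errors "$dev")
    if [ "$errors" -gt 0 ]; then
        echo "WARNING: $errors CAM error(s) on $dev"
        return 1
    fi
    echo "OK: no CAM errors on $dev"
    return 0
}

# Example (not run here): dmesg | check_disk da134
```

The device name is matched together with the trailing colon, so a check for da13 does not fire on errors from da134.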
So I did:

root:~# zpool offline tank da134
root:~# zpool status
	  mirror-5                DEGRADED     0     0     0
	    da122                 ONLINE       0     0     0
	    11771992511548113470  OFFLINE      0     0     0  was /dev/da134
root:~# zpool detach tank da134
root:~# zpool status
	  mirror-4  ONLINE       0     0     0
	    da98    ONLINE       0     0     0
	    da110   ONLINE       0     0     0
	  da122     ONLINE       0     0     0
root:~# halt -p
( replaced the drive and turned the server back on )
root:~# zpool attach tank da122 da134
root:~# zpool status
	  mirror-5  ONLINE       0     0     0
	    da122   ONLINE       0     0     0
	    da134   ONLINE       0     0     0  (resilvering)
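
The same offline / detach / attach sequence can be wrapped in a small script so it can be rehearsed before touching the pool. This is only a sketch: the helper names run, swap_out_mirror_disk and attach_new_mirror_disk are invented here, and DRY_RUN defaults to 1 so that commands are printed instead of executed.

```shell
#!/bin/sh
# Sketch of the disk swap above as reusable functions.
# With DRY_RUN=1 (the default) every command is only printed.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

swap_out_mirror_disk() {
    # $1 = pool, $2 = failing disk
    run zpool offline "$1" "$2"
    run zpool detach "$1" "$2"
}

attach_new_mirror_disk() {
    # $1 = pool, $2 = surviving disk in the mirror, $3 = new disk
    run zpool attach "$1" "$2" "$3"
}

# Example (dry run):
#   swap_out_mirror_disk tank da134
#   ... power off, replace the drive, boot ...
#   attach_new_mirror_disk tank da122 da134
```

When the new disk shows up under the same device name, `zpool replace tank da134` is an alternative that keeps the mirror vdev in place instead of detaching and re-attaching.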

I never blogged about what really happened during my previous raidz rebuild, which went south (to put it mildly). The problem then was that I was running raidz, which tolerates 1 disk failure, but it turned out I had several block read errors. After 4 attempts to resilver / rebuild, zfs still wasn’t able to rebuild onto the fresh drive, simply because there were at least 2 other pretty rotten disks in the raid that kept throwing new sector errors …

So I had to scrap the whole setup and set up a fresh zfs pool. I got my boss to buy some new disks, but it turned out that 4 of those disks (some Samsung disks) weren’t recognized by the hardware raid controller (?), sooooo I put in some WD Green disks instead …

BUT since the pool is less than 25% full, I figured I could afford to change the whole setup from raidz to mirrors: 6 mirror pairs with a total of 12 disks. On top of that I set copies=2 for the backup dataset (tank/backup). copies=2 doubles the space usage, because every block is written twice. As long as I have plenty of space, I should be better protected against corrupted sectors/blocks, bit rot and what not. 🙂

root:~# zfs get copies tank/backup
tank/backup  copies    2       local
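
To get a feel for what that costs in space, here is a rough back-of-the-envelope sketch. The function name effective_capacity_gb and the 2000 GB disk size are made-up example values, not the real disks in this server, and metadata overhead is ignored.

```shell
#!/bin/sh
# Rough capacity estimate for a pool of 2-way mirrors with copies=N.
# The property itself is set with:
#   zfs set copies=2 tank/backup
# and only applies to data written after the change.

effective_capacity_gb() {
    # $1 = number of disks (in 2-way mirrors)
    # $2 = size of each disk in GB (hypothetical value)
    # $3 = copies setting
    disks=$1 disk_gb=$2 copies=$3
    raw=$((disks * disk_gb))      # raw capacity of all disks
    mirrored=$((raw / 2))         # 2-way mirrors halve it
    echo $((mirrored / copies))   # each block is stored "copies" times
}

# Example: effective_capacity_gb 12 2000 2
```

With 12 disks in mirror pairs and copies=2, only about a quarter of the raw capacity ends up holding unique data, which is why this only makes sense on a pool with plenty of free space.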

And the mirrored zfs setup looks like this:

root:~# zpool status
  pool: tank
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Apr 25 14:20:24 2013
        68.2G scanned out of 4.61T at 61.2M/s, 21h37m to go
        13.8G resilvered, 1.44% done

	tank        ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    da2     ONLINE       0     0     0
	    da14    ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    da26    ONLINE       0     0     0
	    da38    ONLINE       0     0     0
	  mirror-2  ONLINE       0     0     0
	    da50    ONLINE       0     0     0
	    da62    ONLINE       0     0     0
	  mirror-3  ONLINE       0     0     0
	    da74    ONLINE       0     0     0
	    da86    ONLINE       0     0     0
	  mirror-4  ONLINE       0     0     0
	    da98    ONLINE       0     0     0
	    da110   ONLINE       0     0     0
	  mirror-5  ONLINE       0     0     0
	    da122   ONLINE       0     0     0
	    da134   ONLINE       0     0     0  (resilvering)
	  da1       ONLINE       0     0     0

errors: No known data errors
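
The "21h37m to go" estimate in the scan lines can be reproduced from the other numbers in that output. The sketch below assumes zpool reports binary units (1T = 1024G, 1G = 1024M) and that the rate stays constant. The function name resilver_eta is made up for illustration.

```shell
#!/bin/sh
# Estimate remaining resilver time from the figures in "zpool status".

resilver_eta() {
    # $1 = total to scan in T, $2 = already scanned in G, $3 = rate in M/s
    awk -v total_t="$1" -v scanned_g="$2" -v rate_m="$3" 'BEGIN {
        remaining_m = (total_t * 1024 - scanned_g) * 1024   # left to scan, in M
        secs = remaining_m / rate_m                          # at a constant rate
        printf "%dh%02dm\n", int(secs / 3600), int((secs % 3600) / 60)
    }'
}

# Example with the numbers from the output above:
#   resilver_eta 4.61 68.2 61.2    # prints 21h37m
```

In practice the rate changes as the resilver progresses, so the estimate in zpool status tends to drift over time.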

