So I had a faulty drive in a ZFS raidz configuration. I replaced the drive without first setting the faulty drive offline or detaching it from the raidz configuration … I probably did that part wrong. After a reboot I had something like this:
root@backupmh:~# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
  scan: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        tank                     DEGRADED     0     0     0
          raidz1-0               ONLINE       0     0     0
            da1                  ONLINE       0     0     0
            da13                 ONLINE       0     0     0
            da25                 ONLINE       0     0     0
            da37                 ONLINE       0     0     0
            da49                 ONLINE       0     0     0
          raidz1-1               DEGRADED     0     0     0
            da73                 ONLINE       0     0     0
            da85                 ONLINE       0     0     0
            da97                 ONLINE       0     0     0
            da109                ONLINE       0     0     0
            9120273794345838000  UNAVAIL      0     0     0  was /dev/da121
        spares
          da61                   AVAIL
          da133                  AVAIL

errors: No known data errors
root@backupmh:~# ls -l /dev/da121
crw-r-----  1 root  operator    1, 100 14 mar 11:59 /dev/da121
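For the record, the textbook sequence (straight from the zpool(8) man page, which I obviously didn't follow) is to offline the dying disk first, swap it, and then hand the new disk to ZFS:

zpool offline tank da121    # take the failing disk out of service before pulling it
# ... physically swap the drive ...
zpool replace tank da121    # tell ZFS a new disk now sits at the same device node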
It’s /dev/da121 that I replaced. At first I simply wanted to hot-swap the drive, but since this is a really old HP MSA-20 or something, and I’m using some super-crappy cheap SATA drives, I had to reboot and fiddle around until the RAID controller was willing to recognize the drive …
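For what it’s worth, on FreeBSD (which the daNN device names suggest this is) the CAM layer can usually be nudged into seeing a hot-swapped disk without a reboot — assuming the controller plays along, which mine clearly didn’t:

camcontrol rescan all    # rescan all buses for added/removed devices
camcontrol devlist       # list the devices the system currently sees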
Anyway, to get ZFS to rebuild (or resilver, as ZFS calls it) I ended up with:
root@backupmh:~# zpool offline tank /dev/da121
root@backupmh:~# zpool online tank /dev/da121
warning: device '/dev/da121' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
root@backupmh:~# zpool replace tank /dev/da121
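A note on that last command: with a single device argument, zpool replace means “a fresh disk now occupies the old device’s slot”; a second argument would rebuild onto a different disk instead:

zpool replace tank da121          # new disk at the same device node as the old one
zpool replace tank da121 da145    # or rebuild onto another disk (da145 is a made-up example)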
After that I got:
root@backupmh:~# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Mar 14 12:14:51 2013
        6.46G scanned out of 1.36T at 4.60M/s, 85h38m to go
        685M resilvered, 0.46% done
config:

        NAME                       STATE     READ WRITE CKSUM
        tank                       DEGRADED     0     0     0
          raidz1-0                 ONLINE       0     0     0
            da1                    ONLINE       0     0     0
            da13                   ONLINE       0     0     0
            da25                   ONLINE       0     0     0
            da37                   ONLINE       0     0     0
            da49                   ONLINE       0     0     0
          raidz1-1                 DEGRADED     0     0     0
            da73                   ONLINE       0     0     0
            da85                   ONLINE       0     0     0
            da97                   ONLINE       0     0     0
            da109                  ONLINE       0     0     0
            replacing-4            UNAVAIL      0     0     0
              9120273794345838000  UNAVAIL      0     0     0  was /dev/da121/old
              da121                ONLINE       0     0     0  (resilvering)
        spares
          da61                     AVAIL
          da133                    AVAIL

errors: No known data errors
The resilver progress looks slow … but it’s getting faster every time I check it (it started at an estimate of over 600 hours).
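The first ETA is at least consistent with the numbers above — remaining data divided by the scan rate (assuming the T/G/M in zpool’s output are binary units):

echo "(1.36*1024*1024 - 6.46*1024) / 4.60 / 3600" | bc -l
# remaining MiB / (MiB per second) / 3600 ≈ 85.7 hours,
# right around the 85h38m that zpool reported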
I initially set up the two raidz configurations with hot spares, so I wonder why ZFS didn’t start resilvering to the corresponding hot spare after the reboot?
I should check that out some day …
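When I do, these are probably the first things to look at (my guesses, not verified on this box): automatic spare activation normally needs a fault-management agent watching the pool (FMA on Solaris, zfsd on later FreeBSD releases), and a spare can always be pulled in by hand:

zpool get autoreplace tank                 # is the pool set to auto-replace devices in the same slot?
zpool replace tank 9120273794345838000 da61  # manually activate the spare for the missing device (by its guid)
zpool detach tank da61                     # return the spare once the real replacement has resilvered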