Hard Drive smartctl information
My NAS (CentOS 7, ZFS RAID 10) was spitting drive error messages on the console the other day. This started after I began using it under my ESXi 5.5 U2 server to run guest images. I ran a zpool scrub and it fixed one error. I then wanted to see what the drives themselves had logged for errors.
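For reference, the scrub and the follow-up check are just the stock ZFS commands. The pool name below is only a placeholder, substitute your own:
# zpool scrub tank
# zpool status -v tank
The -v flag lists any files with permanent errors, which is handy when deciding whether a scrub actually cleaned things up.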
I first initiated a short SMART self-test on all my drives. sda is a 120GB Samsung SSD I use for boot and OS. The rest are WD 3TB Red NAS drives of the same model, WD30EFRX-68EUZN0. When I first set this up, some 8 months ago, I updated the firmware and turned off head parking. BTW, smartctl shows the drives with only 53 parks so far, in 8 months. Sweet. The commands I used to start the tests are:
# yum -y install smartmontools
# smartctl -t short /dev/sda
# smartctl -t short /dev/sdb
# smartctl -t short /dev/sdc
# smartctl -t short /dev/sdd
# smartctl -t short /dev/sde
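If you don't want to guess at the wait, smartctl can report the drive's recommended polling time for the short test, and the self-test log shows whether it finished cleanly. Using sdb as the example here:
# smartctl -c /dev/sdb
# smartctl -l selftest /dev/sdb
The polling time shows up under the general SMART values, and completed tests are listed at the top of the self-test log.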
I waited five minutes for them to complete and then viewed the results:
# smartctl -a /dev/sda
# smartctl -a /dev/sdb
# smartctl -a /dev/sdc
# smartctl -a /dev/sdd
# smartctl -a /dev/sde
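The park count I mentioned above is the Load_Cycle_Count attribute in that output. If you only want that one line, a quick grep does it (sdb again as the example):
# smartctl -A /dev/sdb | grep Load_Cycle_Count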
I think I am all good now. I have moved the guest images off the NAS and onto another datastore, three 500GB WD drives in the ESXi server. I also have a 250GB Samsung SSD in that box for boot and OS. I will keep an eye on it.
2015/12/30 Update:
Started getting CRC errors again. I tried different SATA cables; that did not work. I don't have another SATA port on the board to try (the other 4 have drives on them), or a PCIe SATA card laying around I could use. So I figured I would just order another drive and swap it out; if the original was not bad, then I would have a spare. Turns out this fixed my problem. I have moved 100GB on/off the ZFS pool and not one error. I went to the WD site, where I had registered my original drive, selected the drive, clicked RMA, and was greeted with instructions and a UPS label to send it back. Zero hassle on that part. I took everything to work today, packaged it up, and left it on the UPS pallet in the shipping room. Looks like I got a spare drive anyway, come to think of it.
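For anyone checking for the same thing: cable problems usually show up as the UDMA_CRC_Error_Count attribute climbing, and the swap itself is a zpool replace followed by watching the resilver. The pool name and device below are just placeholders for my setup, adjust for yours:
# smartctl -A /dev/sdd | grep -i CRC
# zpool replace tank /dev/sdd
# zpool status tank
Leaving off the new device name in zpool replace assumes the replacement drive went into the same slot as the old one.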
Oh yeah, I made sure the firmware was the same version and turned off head parking on the new drive, as I did with the others when I first set this up.
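If you need to do the same on your own drive, the firmware version shows in the drive identity info, and the idle3-tools package is one way to check and clear the parking timer from Linux. A sketch, with sde standing in for the new drive:
# smartctl -i /dev/sde
# idle3ctl -g /dev/sde
# idle3ctl -d /dev/sde
The drive typically needs a full power off/on, not just a reboot, before an idle3 change is picked up.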
2015/12/31 Update:
Well, I started getting drive reset errors on /dev/sdb now. While I was shopping for a PCIe SATA controller, I read that a kernel that does not address the hardware correctly can cause this problem. I am on CentOS Linux release 7.2.1511 (Core), fully updated and patched, with kernel 3.10.0-327.3.1.el7.x86_64, so not much I can do there. Then it occurred to me that the BIOS interacts with the hardware as well. I checked the Gigabyte site and went from an F2 to an F5 BIOS. I have since transferred 100GB of data on/off the pool and run multiple scrubs. No errors, so far :)
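For anyone chasing the same kind of resets, these are the sort of checks involved: the kernel and OS release, the BIOS version you are actually running, and the reset messages themselves in the kernel log. The grep pattern is just a rough filter, the exact error text will vary:
# uname -r
# cat /etc/centos-release
# dmidecode -s bios-version
# dmesg | grep -iE 'ata|reset'
That at least tells you whether a kernel or BIOS update actually landed before you start blaming the hardware again.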