
Friday, December 25, 2015

Hard Drive smartctl information

My NAS (CentOS 7, ZFS RAID 10) was spitting drive error messages on the console the other day. This was after I started using it under my ESXi 5.5 U2 server to run guest images. I ran zpool scrub and it fixed one error. I then wanted to see what the drives themselves had logged for errors.
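
If you want to do the same, the scrub and the follow-up check look something like this (the pool name "tank" is just a placeholder, use whatever you named your pool):

# zpool scrub tank
# zpool status -v tank

zpool status -v shows the scrub progress and lists any files with errors once it finishes.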

I first initiated the test on all my drives. sda is a 120GB Samsung SSD I use for boot and OS. The rest are WD 3TB Red NAS drives of the same model, WD30EFRX-68EUZN0. When I first set this up, some 8 months ago, I updated the firmware and turned off head parking. BTW, smartctl shows the drives with only 53 parks so far, in 8 months. Sweet. The commands I used to start the tests are:

# yum -y install smartmontools

# smartctl -t short /dev/sda
# smartctl -t short /dev/sdb
# smartctl -t short /dev/sdc
# smartctl -t short /dev/sdd
# smartctl -t short /dev/sde


I waited five minutes for them to complete and then viewed the results:

# smartctl -a /dev/sda
# smartctl -a /dev/sdb
# smartctl -a /dev/sdc
# smartctl -a /dev/sdd
# smartctl -a /dev/sde
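
The -a output is long. If you just want the self-test log, the drive's own error log, and the head park counter, something like this narrows it down (attribute names can vary a bit from drive to drive; Load_Cycle_Count is where my WD Reds report the parks):

# smartctl -l selftest /dev/sdb
# smartctl -l error /dev/sdb
# smartctl -A /dev/sdb | grep Load_Cycle_Count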



I think I am all good now. I have moved the guest images off the NAS and onto another datastore, three 500GB WD drives in the ESXi server. I also have a 250GB Samsung SSD in that for boot and OS. I will keep an eye on it.

2015/12/30 Update:

Started getting CRC errors again. Tried different SATA cables; that did not work. I don't have another SATA port on the board to try (the other 4 have drives on them), or a PCIe SATA card laying around I could use. So I figured I would just order another drive and swap it out; if the original was not bad, then I would have a spare. Turns out this fixed my problem. I have moved 100GB on/off the ZFS pool and not one error. I went to the WD site, where I had registered my original drive, selected the drive, clicked RMA, and I was greeted with instructions and a UPS label to send it back. Zero hassle on that part. I took everything to work today, packaged it up, and left it on the UPS pallet in the shipping room. Looks like I got a spare drive anyway, come to think of it.
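
If you want to watch the same counter, the CRC errors show up in SMART as UDMA_CRC_Error_Count (attribute 199 on these drives), and zpool status will tell you if ZFS has seen anything. Roughly, with /dev/sdc standing in for whichever drive you are watching:

# smartctl -A /dev/sdc | grep UDMA_CRC_Error_Count
# zpool status -v

Keep in mind that counter never resets; a non-zero value that stops climbing is fine.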

Oh yeah, I made sure the firmware was the same version and turned off head parking on the new drive, as I did with the others when I first set this up.
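
For anyone checking the same things: the firmware version shows up in smartctl -i, and one way to turn off the head parking from Linux is idle3ctl from the idle3-tools package. This is just a sketch of how you could do it (not necessarily the exact tool I used), with /dev/sde standing in for the new drive, and the drive needs a power cycle before the idle3 change takes effect:

# smartctl -i /dev/sde | grep -i firmware
# idle3ctl -g /dev/sde
# idle3ctl -d /dev/sde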

2015/12/31 Update:

Well, I started getting drive reset errors on /dev/sdb now. While I was shopping for a PCIe SATA controller, I was reading that it is possible for a kernel that does not address the hardware correctly to cause this problem. Well, I am on CentOS Linux release 7.2.1511 (Core), fully updated and patched with kernel 3.10.0-327.3.1.el7.x86_64, so not much I can do there. Then I got the notion that the BIOS interacts with the hardware as well. I checked the Gigabyte site and went from an F2 to an F5 BIOS. I have transferred 100GB of data on/off the pool and run multiple scrubs after this. No errors, so far :)
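
For anyone following along, the kernel and release info are easy to confirm, and the reset errors show up in the kernel log. Roughly (the grep pattern is just whatever matches the messages on your box):

# uname -r
# cat /etc/centos-release
# dmesg | grep -iE 'ata|reset'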
