Sunday, March 12, 2017

Bad luck with WD Red NAS 3TB drives


I have had 4 of 8 drives go bad in the last year and a half; the oldest drive is 2 years old. The model number for all the drives is WD30EFRX-68EUZN0. I am waiting on an RMA to come back, and another drive went bad tonight. Luckily I have extra drives and am replacing it tonight with another new WD Red NAS 3TB drive I had on hand for just this purpose.

I have decided not to get any more of these drives, other than the RMA I am waiting on now and the RMA I will do after I replace the current bad drive. I have ordered a single HGST Deskstar NAS 3TB drive to look at. If it fits my needs I will replace all the drives in my NAS with these.

I want to look at heat and noise, and see if I can turn head parking off on the HGST drives. I will also measure performance, but I am sure the 7200 RPM HGST will be faster than the 5400 RPM WD Red.

Just an FYI to everyone: be careful with these WD Red NAS 3TB drives.

UPDATE 3/13/2017

So I got the drive out of the NAS. The box is home built, on CentOS 7.3, kernel 4.10, btrfs-progs 4.9. It took a while to get the drive removed from the array (waiting for data to be moved across the other 7 drives) and then to add the replacement back in (waiting for data to be balanced back across all 8 drives again).

I then took the "bad drive" and put it in my Windows workstation. I ran the WD Data Lifeguard extended test and then ran the extended test from Aomei Partition Assistant Pro. Both of these test show I have no bad sectors. The drive seems to be good as new. To make long story short I think i have a bad fan cable from my SAS/SATA controller to the drives. I have two ports on controller that controls 4 SATA drives each. I reseated HBA card and cables, and power connectors on the drives, and ran some test and scrubs. all seems good with new drive. I then put the side panels back on the case and place it back in its home. And within an hour i start getting more errors on the same device id, only with the brand new drive on that id. So i pull the server back out, remove side panels, and reboot. Over 24 hours later I don't have another error. It maybe that the side panels on the case are moving the cables enough for the bad cable to cause the errors. So i have two new fan cables coming and should have them installed tomorrow night. I still have the new HGST drive coming because I really want to check this out. If they are not to noisy and run to hot, I may just replace them all anyway. I better see some latency and IO improvements before that happens. At $150.00 per drive x8..... Well its not cheap to me.

Friday, March 3, 2017

BTRFS Drive errors from 2017-02-26


I was not getting failed scrubs, but I was seeing some errors.

This is 8x 3TB WD Red drives in raid10 on a Supermicro AOC-SAS2LP-MV8 add-on card (an 8-channel SAS/SATA adapter with 600MB/s per channel) in a PCIe x16 slot running at x8, on a Supermicro C7Z170-OCE-O ATX DDR4 LGA 1151 motherboard with 64GB DDR4 RAM (4x 16GB sticks).

FYI: everything was done online unless otherwise stated.

Here is what the filesystem looks like in btrfs:

[root@nas ~]# btrfs fi show
Label: 'myraid'  uuid: 1ec4f641-74a8-466e-89cc-e687672aaaea
        Total devices 8 FS bytes used 1.16TiB
        devid    1 size 2.73TiB used 301.53GiB path /dev/sdb
        devid    2 size 2.73TiB used 301.53GiB path /dev/sdc
        devid    3 size 2.73TiB used 301.53GiB path /dev/sdd
        devid    4 size 2.73TiB used 301.53GiB path /dev/sde
        devid    6 size 2.73TiB used 301.53GiB path /dev/sdg
        devid    7 size 2.73TiB used 301.53GiB path /dev/sdh
        devid    8 size 2.73TiB used 301.53GiB path /dev/sdi
        devid    9 size 2.73TiB used 301.53GiB path /dev/sdf


Take a look at the device stats:

[root@nas ~]# /usr/local/bin/btrfs device stats /myraid/
[/dev/sdb].write_io_errs   0
[/dev/sdb].read_io_errs    0
[/dev/sdb].flush_io_errs   0
[/dev/sdb].corruption_errs 0
[/dev/sdb].generation_errs 0
[/dev/sdc].write_io_errs   0
[/dev/sdc].read_io_errs    0
[/dev/sdc].flush_io_errs   0
[/dev/sdc].corruption_errs 0
[/dev/sdc].generation_errs 0
[/dev/sdd].write_io_errs   0
[/dev/sdd].read_io_errs    0
[/dev/sdd].flush_io_errs   0
[/dev/sdd].corruption_errs 0
[/dev/sdd].generation_errs 0
[/dev/sde].write_io_errs   0
[/dev/sde].read_io_errs    44
[/dev/sde].flush_io_errs   0
[/dev/sde].corruption_errs 0
[/dev/sde].generation_errs 0
[/dev/sdg].write_io_errs   0
[/dev/sdg].read_io_errs    0
[/dev/sdg].flush_io_errs   0
[/dev/sdg].corruption_errs 0
[/dev/sdg].generation_errs 0
[/dev/sdh].write_io_errs   0
[/dev/sdh].read_io_errs    0
[/dev/sdh].flush_io_errs   0
[/dev/sdh].corruption_errs 0
[/dev/sdh].generation_errs 0
[/dev/sdi].write_io_errs   0
[/dev/sdi].read_io_errs    0
[/dev/sdi].flush_io_errs   0
[/dev/sdi].corruption_errs 0
[/dev/sdi].generation_errs 0
[/dev/sdf].write_io_errs   0
[/dev/sdf].read_io_errs    0
[/dev/sdf].flush_io_errs   0
[/dev/sdf].corruption_errs 0
[/dev/sdf].generation_errs 0


Run an extended smartctl test:

[root@nas ~]# smartctl -t long /dev/sdb
[root@nas ~]# smartctl -t long /dev/sdc
[root@nas ~]# smartctl -t long /dev/sdd
[root@nas ~]# smartctl -t long /dev/sde
[root@nas ~]# smartctl -t long /dev/sdf
[root@nas ~]# smartctl -t long /dev/sdg
[root@nas ~]# smartctl -t long /dev/sdh
[root@nas ~]# smartctl -t long /dev/sdi
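
If you'd rather not type eight commands, here is the same thing as a quick shell loop (a minimal sketch):

for d in b c d e f g h i; do smartctl -t long /dev/sd$d; done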


I waited an hour then reviewed the results.

[root@nas ~]# smartctl -l selftest /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.7.0-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     12660         -
# 2  Extended offline    Completed without error       00%      8916         -
# 3  Short offline       Completed without error       00%      6097         -
# 4  Extended offline    Completed without error       00%      4288         -
# 5  Short offline       Completed without error       00%      4245         -
# 6  Short offline       Completed without error       00%      4242         -
# 7  Short offline       Interrupted (host reset)      50%      4241         -
# 8  Short offline       Completed without error       00%      4172         -
# 9  Short offline       Completed without error       00%      4109         -

[root@nas ~]# smartctl -l selftest /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.7.0-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     12660         -
# 2  Extended offline    Completed without error       00%      8916         -
# 3  Short offline       Completed without error       00%      6096         -
# 4  Extended offline    Completed without error       00%      4288         -
# 5  Short offline       Completed without error       00%      4109         -

[root@nas ~]# smartctl -l selftest /dev/sdd
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.7.0-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     13003         -
# 2  Extended offline    Completed without error       00%      9260         -
# 3  Short offline       Completed without error       00%      6440         -
# 4  Extended offline    Completed without error       00%      4632         -
# 5  Short offline       Completed without error       00%      4452         -
# 6  Short offline       Completed without error       00%         0         -

[root@nas ~]# smartctl -l selftest /dev/sde
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.7.0-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      8452         339784376
# 2  Extended offline    Completed without error       00%      4716         -
# 3  Short offline       Completed without error       00%      1896         -
# 4  Extended offline    Completed without error       00%        88         -
# 5  Short offline       Completed without error       00%        15         -
# 6  Short offline       Aborted by host               90%        12         -
# 7  Short offline       Aborted by host               90%        12         -
# 8  Short offline       Aborted by host               90%        12         -
# 9  Short offline       Aborted by host               90%         5         -
#10  Short offline       Aborted by host               90%         5         -
#11  Short offline       Aborted by host               90%         5         -
#12  Short offline       Aborted by host               90%         4         -
#13  Short offline       Aborted by host               90%         4         -
#14  Short offline       Aborted by host               90%         4         -
#15  Short offline       Aborted by host               90%         0         -
#16  Short offline       Aborted by host               90%         0         -

[root@nas ~]# smartctl -l selftest /dev/sdf
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.7.0-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      3118         -
# 2  Short offline       Completed without error       00%        12         -

[root@nas ~]# smartctl -l selftest /dev/sdg
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.7.0-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      6298         -
# 2  Extended offline    Completed without error       00%      2555         -

[root@nas ~]# smartctl -l selftest /dev/sdh
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.7.0-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      6147         -
# 2  Extended offline    Completed without error       00%      2404         -
# 3  Short offline       Completed without error       00%         0         -

[root@nas ~]# smartctl -l selftest /dev/sdi
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.7.0-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      6147         -
# 2  Extended offline    Completed without error       00%      2404         -
# 3  Short offline       Completed without error       00%         0         -



[root@nas ~]# smartctl -a /dev/sdb | grep "Raw_Read_Error_Rate"
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
[root@nas ~]# smartctl -a /dev/sdc | grep "Raw_Read_Error_Rate"
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
[root@nas ~]# smartctl -a /dev/sdd | grep "Raw_Read_Error_Rate"
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
[root@nas ~]# smartctl -a /dev/sde | grep "Raw_Read_Error_Rate"
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1258
[root@nas ~]# smartctl -a /dev/sdf | grep "Raw_Read_Error_Rate"
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
[root@nas ~]# smartctl -a /dev/sdg | grep "Raw_Read_Error_Rate"
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
[root@nas ~]# smartctl -a /dev/sdh | grep "Raw_Read_Error_Rate"
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
[root@nas ~]# smartctl -a /dev/sdi | grep "Raw_Read_Error_Rate"
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0

 

Yup. /dev/sde seems to have an issue.



Get the serial numbers of all the drives for a possible RMA on /dev/sde.

[root@nas ~]# smartctl -a /dev/sdb | grep "Serial Number:"
Serial Number:    WD-WMC4N0J0YT1V
[root@nas ~]# smartctl -a /dev/sdc | grep "Serial Number:"
Serial Number:    WD-WMC4N0J2L138
[root@nas ~]# smartctl -a /dev/sdd | grep "Serial Number:"
Serial Number:    WD-WCC4N2FJRTU9
[root@nas ~]# smartctl -a /dev/sde | grep "Serial Number:"
Serial Number:    WD-WCC4N4SSDRFN
[root@nas ~]# smartctl -a /dev/sdf | grep "Serial Number:"
Serial Number:    WD-WCC4N1VYZH52
[root@nas ~]# smartctl -a /dev/sdg | grep "Serial Number:"
Serial Number:    WD-WMC4N0M57KEY
[root@nas ~]# smartctl -a /dev/sdh | grep "Serial Number:"
Serial Number:    WD-WCC4N5YF2Z2Y
[root@nas ~]# smartctl -a /dev/sdi | grep "Serial Number:"
Serial Number:    WD-WCC4N5CJ6H8U




Get a list of bad blocks on all drives. Run in the background and save to a file:

badblocks -v /dev/sdb > /tmp/bad-blocks-b.txt &
badblocks -v /dev/sdc > /tmp/bad-blocks-c.txt &
badblocks -v /dev/sdd > /tmp/bad-blocks-d.txt &
badblocks -v /dev/sde > /tmp/bad-blocks-e.txt &
badblocks -v /dev/sdf > /tmp/bad-blocks-f.txt &
badblocks -v /dev/sdg > /tmp/bad-blocks-g.txt &
badblocks -v /dev/sdh > /tmp/bad-blocks-h.txt &
badblocks -v /dev/sdi > /tmp/bad-blocks-i.txt &

Monitor file size with:

[root@nas ~]# watch ls -lsa /tmp/bad-blocks-*.txt

If you have a really bad drive, it could create a file the size of the drive itself, so be sure to monitor it and make sure you do not fill up your /tmp directory.
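
You can also keep an eye on free space while the tests run (a minimal sketch, assuming /tmp lives on your root filesystem):

watch -n 60 df -h /tmp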

If you need to kill it, then get the PID with:

[root@nas tmp]# ps -ef | grep "badblocks"
UID        PID  PPID  C STIME TTY      TIME     CMD
root     27013 25404  3 10:43 pts/0    00:01:12 badblocks -v /dev/sdb
root     27014 25404  3 10:43 pts/0    00:01:12 badblocks -v /dev/sdc
root     27015 25404  3 10:43 pts/0    00:01:12 badblocks -v /dev/sdd
root     27016 25404  2 10:43 pts/0    00:01:11 badblocks -v /dev/sde
root     27017 25404  3 10:43 pts/0    00:01:13 badblocks -v /dev/sdf
root     27018 25404  3 10:43 pts/0    00:01:12 badblocks -v /dev/sdg
root     27019 25404  3 10:43 pts/0    00:01:12 badblocks -v /dev/sdh
root     27020 25404  3 10:43 pts/0    00:01:12 badblocks -v /dev/sdi
root     31044 26976  0 11:22 pts/1    00:00:00 grep --color=auto badblocks
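
Then kill by PID, or stop all of them at once (a sketch):

kill 27013             (stops just the /dev/sdb run from the listing above)
pkill -x badblocks     (stops all of them)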



While the badblocks test is running, I have already gotten an RMA number from WD and a shipping label on my printer. I ordered a new drive from Amazon that will be here on the 28th. I will swap it in then and ship the bad drive back on the 29th.

Running the smartctl long test on all drives:

smartctl -t long /dev/sdb
smartctl -t long /dev/sdc
smartctl -t long /dev/sdd
smartctl -t long /dev/sde
smartctl -t long /dev/sdf
smartctl -t long /dev/sdg
smartctl -t long /dev/sdh
smartctl -t long /dev/sdi

Check progress of the tests:

smartctl -a /dev/sdb | grep "Self-test execution status"
smartctl -a /dev/sdb | grep "of test remaining."

smartctl -a /dev/sdc | grep "Self-test execution status"
smartctl -a /dev/sdc | grep "of test remaining."

smartctl -a /dev/sdd | grep "Self-test execution status"
smartctl -a /dev/sdd | grep "of test remaining."

smartctl -a /dev/sde | grep "Self-test execution status"
smartctl -a /dev/sde | grep "of test remaining."

smartctl -a /dev/sdf | grep "Self-test execution status"
smartctl -a /dev/sdf | grep "of test remaining."

smartctl -a /dev/sdg | grep "Self-test execution status"
smartctl -a /dev/sdg | grep "of test remaining."

smartctl -a /dev/sdh | grep "Self-test execution status"
smartctl -a /dev/sdh | grep "of test remaining."

smartctl -a /dev/sdi | grep "Self-test execution status"
smartctl -a /dev/sdi | grep "of test remaining."



The new drive is in and I am backing up the NAS. My btrfs pool at /myraid is getting backed up to a PC with raid1. I am also duplicating the more important files to an SSD. Better safe than sorry.

The new drive has had a full surface test (9 hours) and passed with flying colors.

I am not turning this into a drive remove/replace, since I want to change the partitions on my boot SSD, so I am just going to nuke the entire system and rebuild from scratch.

My / (root) partition is getting a backup via tar so I can have access to my old crontab files and maintenance scripts. I can just reuse the old smb.conf as well, etc.



tar -zcvpf /myraid/nas.backup.tar.gz --exclude=/myraid --exclude=/usr --exclude=/proc --exclude=/lib --exclude=/lib64 --exclude=/dev /

Now copy the tar.gz file to a few drives off the server as well.
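
Something like this does the copy off the server (a sketch; the host and path are hypothetical, use your own):

scp /myraid/nas.backup.tar.gz user@backup-pc:/backups/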

Make a CentOS 7.3 Minimal USB installer and do the install :)

Install done. I see all 9 drives and 4 network connections. I set up all the NICs during the install and they all seem to be OK. Possibly some tweaking on these later.

I installed the OS on my SSD with 1GB for /boot and for /boot/efi (I am using EFI). The rest went to /.

My other 8 drives are on my Supermicro AOC-SAS2LP-MV8 JBOD HBA. I will not touch those until I get ready to set up btrfs on them.

So now some base stuff:


cp /etc/sysconfig/selinux /etc/sysconfig/selinux.bak
sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/sysconfig/selinux

cp /etc/selinux/config /etc/selinux/config.bak
sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
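
The config change above only applies at the next boot; to turn SELinux off for the running session as well:

setenforce 0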

systemctl disable firewalld
systemctl stop firewalld

service iptables stop
service ip6tables stop
chkconfig iptables off
chkconfig ip6tables off

yum -y install bind-utils traceroute net-tools ntp* gcc glibc glibc-common gd gd-devel make net-snmp openssl-devel xinetd unzip libtool* make patch perl bison flex-devel gcc-c++ ncurses-devel flex libtermcap-devel autoconf* automake* autoconf libxml2-devel cmake sqlite* wget ntp* lm_sensors ncurses-devel qt-devel hmaccalc zlib-devel binutils-devel elfutils-libelf-devel wget bc gzip uuid* libuuid-devel jansson* libxml2* sqlite* openssl* lsof NetworkManager-tui mlocate yum-utils kernel-devel nfs-utils tcpdump git vim gdisk parted

yum -y groupinstall "Development Tools"
yum -y update
yum -y upgrade

cd /root
echo ':color desert' > .vimrc

systemctl disable kdump.service

reboot

# cat /etc/default/grub
GRUB_TIMEOUT=60
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=cl_bcache/root ipv6.disable=1 zswap.enable=1 consoleblank=0"
GRUB_DISABLE_RECOVERY="true"

Make the changes reflected above to:
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=cl_bcache/root ipv6.disable=1 zswap.enable=1 consoleblank=0"

# grub2-mkconfig -o /boot/grub2/grub.cfg
or if using UEFI
# grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg

reboot


Now update the kernel:

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
yum install yum-plugin-fastestmirror

yum --enablerepo=elrepo-kernel install kernel-ml

reboot

Manually select the new kernel from the grub boot screen.

uname -r
4.10.1-1.el7.elrepo.x86_64

Do any testing you need, and when happy, set this as the default entry:

grub2-set-default 0

reboot

uname -r
4.10.1-1.el7.elrepo.x86_64


Now it's time to physically replace the drive, then set up btrfs.

poweroff

New drive in and server rebooted

parted -l

This shows all drives. The 7 remaining WD Red NAS drives still show btrfs partitions. The new drive has nothing.

idle3ctl shows the new drive has head parking on. Let's turn that off.

# ./idle3ctl /dev/sde
Idle3 timer set to 138 (0x8a)

# ./idle3ctl -d /dev/sde
Idle3 timer disabled
Please power cycle your drive off and on for the new setting to be taken into account. A reboot will not be enough!

So let's power off, let it sit for a minute, power back on, and recheck.

poweroff

Looks good on all 8 WD Red drives:

[root@nas idle3-tools-0.9.1]# ./idle3ctl /dev/sdb
Idle3 timer is disabled
[root@nas idle3-tools-0.9.1]# ./idle3ctl /dev/sdc
Idle3 timer is disabled
[root@nas idle3-tools-0.9.1]# ./idle3ctl /dev/sdd
Idle3 timer is disabled
[root@nas idle3-tools-0.9.1]# ./idle3ctl /dev/sde
Idle3 timer is disabled
[root@nas idle3-tools-0.9.1]# ./idle3ctl /dev/sdf
Idle3 timer is disabled
[root@nas idle3-tools-0.9.1]# ./idle3ctl /dev/sdg
Idle3 timer is disabled
[root@nas idle3-tools-0.9.1]# ./idle3ctl /dev/sdh
Idle3 timer is disabled
[root@nas idle3-tools-0.9.1]# ./idle3ctl /dev/sdi
Idle3 timer is disabled


Now let's clean the disks for a new array.

I used parted to rm the partitions, then a w and q.

Then

wipefs -a /dev/sdb
wipefs -a /dev/sdc
wipefs -a /dev/sdd
wipefs -a /dev/sde
wipefs -a /dev/sdf
wipefs -a /dev/sdg
wipefs -a /dev/sdh
wipefs -a /dev/sdi

Then

dd if=/dev/zero of=/dev/sdb bs=1024 count=1024
dd if=/dev/zero of=/dev/sdc bs=1024 count=1024
dd if=/dev/zero of=/dev/sdd bs=1024 count=1024
dd if=/dev/zero of=/dev/sde bs=1024 count=1024
dd if=/dev/zero of=/dev/sdf bs=1024 count=1024
dd if=/dev/zero of=/dev/sdg bs=1024 count=1024
dd if=/dev/zero of=/dev/sdh bs=1024 count=1024
dd if=/dev/zero of=/dev/sdi bs=1024 count=1024
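
The same wipe as a loop, if you prefer (a minimal sketch; this is destructive, so double-check the drive letters first):

for d in b c d e f g h i; do
  wipefs -a /dev/sd$d
  dd if=/dev/zero of=/dev/sd$d bs=1024 count=1024
done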

Then, to just look at the devs:

ls -lsa /dev/sd*
0 brw-rw---- 1 root disk 8,   0 Mar  1 15:02 /dev/sda
0 brw-rw---- 1 root disk 8,   1 Mar  1 15:02 /dev/sda1
0 brw-rw---- 1 root disk 8,   2 Mar  1 15:02 /dev/sda2
0 brw-rw---- 1 root disk 8,   3 Mar  1 15:02 /dev/sda3
0 brw-rw---- 1 root disk 8,  16 Mar  1 15:11 /dev/sdb
0 brw-rw---- 1 root disk 8,  32 Mar  1 15:11 /dev/sdc
0 brw-rw---- 1 root disk 8,  48 Mar  1 15:11 /dev/sdd
0 brw-rw---- 1 root disk 8,  64 Mar  1 15:11 /dev/sde
0 brw-rw---- 1 root disk 8,  80 Mar  1 15:11 /dev/sdf
0 brw-rw---- 1 root disk 8,  96 Mar  1 15:11 /dev/sdg
0 brw-rw---- 1 root disk 8, 112 Mar  1 15:11 /dev/sdh
0 brw-rw---- 1 root disk 8, 128 Mar  1 15:11 /dev/sdi



fdisk -l also shows they look ready


[root@nas idle3-tools-0.9.1]# fdisk -l
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

Disk /dev/sda: 128.0 GB, 128035676160 bytes, 250069680 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt


#         Start          End    Size  Type            Name
 1         2048      2099199      1G  EFI System      EFI System Partition
 2      2099200      4196351      1G  Microsoft basic
 3      4196352    250068991  117.2G  Linux LVM      

Disk /dev/sdb: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sdc: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sdd: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sde: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sdf: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sdg: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sdh: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sdi: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/mapper/cl_nas-root: 125.9 GB, 125883645952 bytes, 245866496 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes



Check the btrfs version first:

# btrfs --version
btrfs-progs v4.4.1

I think there is a newer one out. Let's go see.

git clone git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git

cd btrfs-progs

yum -y install libuuid-devel libattr-devel zlib-devel libacl-devel e2fsprogs-devel libblkid-devel lzo* asciidoc xmlto

./autogen.sh
./configure
make

Let's check the version from within the folder:

[root@nas btrfs-progs]# ./btrfs --version
btrfs-progs v4.9.1

Yup, it's newer.

Now check from /

[root@nas btrfs-progs]# cd /
[root@nas /]# btrfs --version
btrfs-progs v4.4.1
[root@nas /]#

So we've got two versions.

I copied all executable (+x) files from /root/btrfs-progs to /usr/sbin, overwriting files where they existed.
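
Something like this does that copy (a sketch, assuming the build landed in /root/btrfs-progs):

find /root/btrfs-progs -maxdepth 1 -type f -perm /111 -exec cp -f {} /usr/sbin/ \;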

Now from / of the drive I get:

[root@nas /]# btrfs --version
btrfs-progs v4.9.1

I hope that's good :)

So let's build an array!!!

First I will use raid0 for some quick testing.

mkfs.btrfs -f -m raid0 -d raid0 -L myraid /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi

And here is the same for raid10:

mkfs.btrfs -f -m raid10 -d raid10 -L myraid /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi

[root@nas ~]# mkfs.btrfs -f -m raid0 -d raid0 -L myraid /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi
btrfs-progs v4.9.1
See http://btrfs.wiki.kernel.org for more information.

Label:              myraid
UUID:               5a5610aa-2615-4ee2-bd4a-076ab2931b70
Node size:          16384
Sector size:        4096
Filesystem size:    21.83TiB
Block group profiles:
  Data:             RAID0             8.00GiB
  Metadata:         RAID0             4.00GiB
  System:           RAID0            16.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  8
Devices:
   ID        SIZE  PATH
    1     2.73TiB  /dev/sdb
    2     2.73TiB  /dev/sdc
    3     2.73TiB  /dev/sdd
    4     2.73TiB  /dev/sde
    5     2.73TiB  /dev/sdf
    6     2.73TiB  /dev/sdg
    7     2.73TiB  /dev/sdh
    8     2.73TiB  /dev/sdi
   
[root@nas ~]# btrfs fi show
Label: 'myraid'  uuid: 5a5610aa-2615-4ee2-bd4a-076ab2931b70
        Total devices 8 FS bytes used 112.00KiB
        devid    1 size 2.73TiB used 1.50GiB path /dev/sdb
        devid    2 size 2.73TiB used 1.50GiB path /dev/sdc
        devid    3 size 2.73TiB used 1.50GiB path /dev/sdd
        devid    4 size 2.73TiB used 1.50GiB path /dev/sde
        devid    5 size 2.73TiB used 1.50GiB path /dev/sdf
        devid    6 size 2.73TiB used 1.50GiB path /dev/sdg
        devid    7 size 2.73TiB used 1.50GiB path /dev/sdh
        devid    8 size 2.73TiB used 1.50GiB path /dev/sdi

Let's mount this thing.

mkdir /myraid

Mount with the UUID from above (uuid: 5a5610aa-2615-4ee2-bd4a-076ab2931b70):

mount -t btrfs -o defaults,nodatacow,noatime,x-systemd.device-timeout=30 -U 5a5610aa-2615-4ee2-bd4a-076ab2931b70 /myraid

[root@nas ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                  32G     0   32G   0% /dev
tmpfs                     32G     0   32G   0% /dev/shm
tmpfs                     32G  8.9M   32G   1% /run
tmpfs                     32G     0   32G   0% /sys/fs/cgroup
/dev/mapper/cl_nas-root  118G  2.2G  116G   2% /
/dev/sda2               1014M  191M  824M  19% /boot
/dev/sda1               1022M  9.5M 1013M   1% /boot/efi
tmpfs                    6.3G     0  6.3G   0% /run/user/0
/dev/sdb                  22T   20M   22T   1% /myraid

Oh yeah!!! 22TB of btrfs array.

The line I will put in fstab later is:

UUID=5a5610aa-2615-4ee2-bd4a-076ab2931b70   /myraid   btrfs  defaults,nodatacow,noatime,x-systemd.device-timeout=30  0 0
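
Once that line is in /etc/fstab, a quick sanity check that it parses and mounts (a sketch):

umount /myraid
mount -a
df -h /myraid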


This is how to clear the cache when testing transfer speeds, to make sure you are not measuring cache. Do this between each transfer:


sync; echo 2 > /proc/sys/vm/drop_caches
sync; echo 3 > /proc/sys/vm/drop_caches

---tune 10Gb CNA if needed

service irqbalance stop
service cpuspeed stop
chkconfig irqbalance off
chkconfig cpuspeed off
systemctl disable irqbalance
systemctl disable cpuspeed
systemctl stop irqbalance
systemctl stop cpuspeed

vi /etc/sysconfig/network-scripts/ifcfg-eth???
MTU="9000"

vi /etc/sysctl.conf
# -- tuning -- #
# Increase system file descriptor limit
fs.file-max = 65535

# Increase system IP port range to allow for more concurrent connections
net.ipv4.ip_local_port_range = 1024 65000

# -- 10gbe tuning from Intel ixgb driver README -- #

# turn off selective ACK and timestamps
net.ipv4.tcp_sack = 0
net.ipv4.tcp_timestamps = 0

# memory allocation min/pressure/max.
# read buffer, write buffer, and buffer space
net.ipv4.tcp_rmem = 10000000 10000000 10000000
net.ipv4.tcp_wmem = 10000000 10000000 10000000
net.ipv4.tcp_mem = 10000000 10000000 10000000

net.core.rmem_max = 524287
net.core.wmem_max = 524287
net.core.rmem_default = 524287
net.core.wmem_default = 524287
net.core.optmem_max = 524287
net.core.netdev_max_backlog = 300000
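
To apply the sysctl changes without waiting for the reboot:

sysctl -p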

reboot and test speed.

On the Linux client, pointing to the server with IP 192.168.90.100 (start the server side below first):

# iperf3 -c 192.168.90.100 -p 5201

On the Linux server with IP 192.168.90.100:

iperf3 -s -p 5201 -B 192.168.90.100

---end tune 10Gb CNA if needed


---setup NFS for ESXi server

vi /etc/exports
/myraid/     192.168.10.0/24(rw,async,no_root_squash,no_subtree_check)
/myraid/     192.168.90.0/24(rw,async,no_root_squash,no_subtree_check)

systemctl start rpcbind nfs-server
systemctl enable rpcbind nfs-server
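
If you edit /etc/exports later, you can re-export without restarting the service, then verify what is exported:

exportfs -ra
showmount -e localhost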

---end setup NFS for ESXi server




---install samba if needed

yum -y install samba

useradd samba -s /sbin/nologin

smbpasswd -a samba
            Supply a password
            Retype the password
   
mkdir /myraid

chown -R samba:root /myraid/

vi /etc/samba/smb.conf

[global]
; use the name of your workgroup here
workgroup = WORKGROUP
server string = Samba Server Version %v
netbios name = NAS

Add this to the bottom of the /etc/samba/smb.conf file:

[NAS]
comment = NAS
path = /myraid
writable = yes
valid users = samba


systemctl start smb
systemctl enable smb
systemctl start nmb
systemctl enable nmb

testparm
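
A quick way to verify the share from the server itself (a sketch; smbclient comes from the samba-client package):

smbclient -L localhost -U samba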
   
---end install samba if needed



---install plex if needed


Visit the Plex site and get the RPM for your version of the OS.
Copy it to /root.

yum -y localinstall name.rpm

systemctl enable plexmediaserver
systemctl start plexmediaserver

---end install plex if needed



---install LAMP

yum -y install httpd mariadb-server mariadb php php-mysql
systemctl enable httpd.service
systemctl start httpd.service
systemctl status httpd.service

Make sure it works with:
http://your_server_IP_address/

systemctl enable mariadb
systemctl start mariadb
systemctl status mariadb
mysql_secure_installation

vi /var/www/html/info.php
<?php phpinfo(); ?>

http://your_server_IP_address/info.php
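
When you are done testing, remove info.php, since it leaks configuration details to anyone who can reach the server:

rm -f /var/www/html/info.php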


---End install LAMP


---Extra goodies

yum -y install epel-release
yum -y install stress htop iftop iotop hddtemp smartmontools iperf3 sysstat mlocate
yum -y update


updatedb    (this is to update the mlocate db)


---End Extra goodies




---Use gmail as relay for sending mail

Replace glen@gmail.com with a real email address in the items below.
Replace mycentserver.mydomain.domain with your real hostname in the items below.
Replace gmail_password with your real password in the items below.

# yum remove postfix

Now install ssmtp.

# yum -y install ssmtp mailx

Now edit /etc/ssmtp/ssmtp.conf. I removed everything and just added the below to the file.

#  vi /etc/ssmtp/ssmtp.conf

root=glen@gmail.com
mailhub=smtp.gmail.com:587
rewriteDomain=gmail.com
hostname=mycentserver.mydomain.domain
UseTLS=Yes
UseSTARTTLS=Yes
AuthUser=glen@gmail.com
AuthPass=gmail_password
FromLineOverride=YES

# This solves "ssmtp: Cannot open smtp.gmail.com:587" when trying to send an email:
# if you enabled debugging by uncommenting the DEBUG=Yes line and /var/log/maillog shows
# "SSL not working: certificate verify failed (20)", uncomment the following line, but first
# VERIFY THE FILE EXISTS
TLS_CA_File=/etc/pki/tls/certs/ca-bundle.crt

# DEBUG=Yes

Now edit your /etc/ssmtp/revaliases file and add the following.

# vi /etc/ssmtp/revaliases

root:glen@gmail.com:smtp.gmail.com:587

Now run

# alternatives --config mta

And choose the number for sendmail.ssmtp, like below

There is 1 program that provides 'mta'.

  Selection    Command
-----------------------------------------------
*+ 1           /usr/sbin/sendmail.ssmtp

Enter to keep the current selection[+], or type selection number: 1
#

Now send an email to your gmail account from the CentOS CLI:

# mail -s "Test Subject" glen@gmail.com

Type your message text, and on a new line press Ctrl+D to send.
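
Or send it non-interactively (a minimal sketch):

echo "Test body from the NAS" | mail -s "Test Subject" glen@gmail.com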

---End Use gmail as relay for sending mail


Thursday, March 2, 2017

CentOS 7: backing up my NAS

I put the tar in /myraid (my btrfs 8-drive raid10), then copy it off to another machine that then replicates to the cloud.

So if my boot drive goes out I have the backup on the array, on another onsite machine, and in the cloud.

tar --exclude='/myraid' --exclude='/proc' --exclude='/sys' -czvf /myraid/nas_backup.tar.gz /

I don't have enough cloud storage to replicate /myraid, but I copy the entire /myraid mount to another onsite machine with a raid1. I guess it's better than nothing :)
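
For the copy to the onsite machine, something like rsync works well (a sketch; the host and path are hypothetical):

rsync -aH --delete /myraid/ user@backup-pc:/myraid-backup/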