BTRFS Drive errors from 2017-02-26
Scrubs were not failing, but I was seeing some errors.
This is 8x 3TB WD Red drives in RAID 10 on a Supermicro AOC-SAS2LP-MV8 add-on card (an 8-channel SAS/SATA adapter with 600MB/s per channel) in a PCIe x16 slot running at x8, on a Supermicro ATX DDR4 LGA 1151 C7Z170-OCE-O motherboard with 64GB of DDR4 RAM (4x 16GB sticks).
FYI: everything was done online unless otherwise stated.
Here is what the filesystem looks like in btrfs:
[root@nas ~]# btrfs fi show
Label: 'myraid' uuid: 1ec4f641-74a8-466e-89cc-e687672aaaea
Total devices 8 FS bytes used 1.16TiB
devid 1 size 2.73TiB used 301.53GiB path /dev/sdb
devid 2 size 2.73TiB used 301.53GiB path /dev/sdc
devid 3 size 2.73TiB used 301.53GiB path /dev/sdd
devid 4 size 2.73TiB used 301.53GiB path /dev/sde
devid 6 size 2.73TiB used 301.53GiB path /dev/sdg
devid 7 size 2.73TiB used 301.53GiB path /dev/sdh
devid 8 size 2.73TiB used 301.53GiB path /dev/sdi
devid 9 size 2.73TiB used 301.53GiB path /dev/sdf
Take a look at device stats.
[root@nas ~]# /usr/local/bin/btrfs device stats /myraid/
[/dev/sdb].write_io_errs 0
[/dev/sdb].read_io_errs 0
[/dev/sdb].flush_io_errs 0
[/dev/sdb].corruption_errs 0
[/dev/sdb].generation_errs 0
[/dev/sdc].write_io_errs 0
[/dev/sdc].read_io_errs 0
[/dev/sdc].flush_io_errs 0
[/dev/sdc].corruption_errs 0
[/dev/sdc].generation_errs 0
[/dev/sdd].write_io_errs 0
[/dev/sdd].read_io_errs 0
[/dev/sdd].flush_io_errs 0
[/dev/sdd].corruption_errs 0
[/dev/sdd].generation_errs 0
[/dev/sde].write_io_errs 0
[/dev/sde].read_io_errs 44
[/dev/sde].flush_io_errs 0
[/dev/sde].corruption_errs 0
[/dev/sde].generation_errs 0
[/dev/sdg].write_io_errs 0
[/dev/sdg].read_io_errs 0
[/dev/sdg].flush_io_errs 0
[/dev/sdg].corruption_errs 0
[/dev/sdg].generation_errs 0
[/dev/sdh].write_io_errs 0
[/dev/sdh].read_io_errs 0
[/dev/sdh].flush_io_errs 0
[/dev/sdh].corruption_errs 0
[/dev/sdh].generation_errs 0
[/dev/sdi].write_io_errs 0
[/dev/sdi].read_io_errs 0
[/dev/sdi].flush_io_errs 0
[/dev/sdi].corruption_errs 0
[/dev/sdi].generation_errs 0
[/dev/sdf].write_io_errs 0
[/dev/sdf].read_io_errs 0
[/dev/sdf].flush_io_errs 0
[/dev/sdf].corruption_errs 0
[/dev/sdf].generation_errs 0
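Scanning forty counter lines by eye is easy to get wrong; a tiny filter (my own sketch, not part of btrfs-progs) prints only the non-zero counters:

```shell
# flag_errors: print only the non-zero counters from `btrfs device stats` output.
# Hypothetical helper -- pipe the stats output through it.
flag_errors() {
  awk '$2 != 0 { print }'
}

# Against the live filesystem (assumes /myraid is mounted):
#   btrfs device stats /myraid/ | flag_errors
printf '[/dev/sdb].read_io_errs 0\n[/dev/sde].read_io_errs 44\n' | flag_errors
# → [/dev/sde].read_io_errs 44
```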
Run an extended smartctl test on each drive:
[root@nas ~]# smartctl -t long /dev/sdb
[root@nas ~]# smartctl -t long /dev/sdc
[root@nas ~]# smartctl -t long /dev/sdd
[root@nas ~]# smartctl -t long /dev/sde
[root@nas ~]# smartctl -t long /dev/sdf
[root@nas ~]# smartctl -t long /dev/sdg
[root@nas ~]# smartctl -t long /dev/sdh
[root@nas ~]# smartctl -t long /dev/sdi
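The eight near-identical commands can also be generated by a loop. This is my own convenience wrapper (not a smartctl feature): by default it only prints the commands as a dry run, and piping to sh actually runs them:

```shell
# long_test_cmds: emit one `smartctl -t long` command per drive letter.
long_test_cmds() {
  for d in "$@"; do
    echo "smartctl -t long /dev/sd$d"
  done
}

long_test_cmds b c d e f g h i          # dry run: prints the 8 commands
# long_test_cmds b c d e f g h i | sh   # actually start the tests (needs root + smartmontools)
```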
I waited an hour then reviewed the results.
[root@nas ~]# smartctl -l selftest /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.7.0-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 12660 -
# 2 Extended offline Completed without error 00% 8916 -
# 3 Short offline Completed without error 00% 6097 -
# 4 Extended offline Completed without error 00% 4288 -
# 5 Short offline Completed without error 00% 4245 -
# 6 Short offline Completed without error 00% 4242 -
# 7 Short offline Interrupted (host reset) 50% 4241 -
# 8 Short offline Completed without error 00% 4172 -
# 9 Short offline Completed without error 00% 4109 -
[root@nas ~]# smartctl -l selftest /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.7.0-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 12660 -
# 2 Extended offline Completed without error 00% 8916 -
# 3 Short offline Completed without error 00% 6096 -
# 4 Extended offline Completed without error 00% 4288 -
# 5 Short offline Completed without error 00% 4109 -
[root@nas ~]# smartctl -l selftest /dev/sdd
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.7.0-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 13003 -
# 2 Extended offline Completed without error 00% 9260 -
# 3 Short offline Completed without error 00% 6440 -
# 4 Extended offline Completed without error 00% 4632 -
# 5 Short offline Completed without error 00% 4452 -
# 6 Short offline Completed without error 00% 0 -
[root@nas ~]# smartctl -l selftest /dev/sde
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.7.0-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 8452 339784376
# 2 Extended offline Completed without error 00% 4716 -
# 3 Short offline Completed without error 00% 1896 -
# 4 Extended offline Completed without error 00% 88 -
# 5 Short offline Completed without error 00% 15 -
# 6 Short offline Aborted by host 90% 12 -
# 7 Short offline Aborted by host 90% 12 -
# 8 Short offline Aborted by host 90% 12 -
# 9 Short offline Aborted by host 90% 5 -
#10 Short offline Aborted by host 90% 5 -
#11 Short offline Aborted by host 90% 5 -
#12 Short offline Aborted by host 90% 4 -
#13 Short offline Aborted by host 90% 4 -
#14 Short offline Aborted by host 90% 4 -
#15 Short offline Aborted by host 90% 0 -
#16 Short offline Aborted by host 90% 0 -
[root@nas ~]# smartctl -l selftest /dev/sdf
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.7.0-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 3118 -
# 2 Short offline Completed without error 00% 12 -
[root@nas ~]# smartctl -l selftest /dev/sdg
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.7.0-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 6298 -
# 2 Extended offline Completed without error 00% 2555 -
[root@nas ~]# smartctl -l selftest /dev/sdh
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.7.0-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 6147 -
# 2 Extended offline Completed without error 00% 2404 -
# 3 Short offline Completed without error 00% 0 -
[root@nas ~]# smartctl -l selftest /dev/sdi
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.7.0-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 6147 -
# 2 Extended offline Completed without error 00% 2404 -
# 3 Short offline Completed without error 00% 0 -
[root@nas ~]# smartctl -a /dev/sdb | grep "Raw_Read_Error_Rate"
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
[root@nas ~]# smartctl -a /dev/sdc | grep "Raw_Read_Error_Rate"
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
[root@nas ~]# smartctl -a /dev/sdd | grep "Raw_Read_Error_Rate"
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
[root@nas ~]# smartctl -a /dev/sde | grep "Raw_Read_Error_Rate"
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 1258
[root@nas ~]# smartctl -a /dev/sdf | grep "Raw_Read_Error_Rate"
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
[root@nas ~]# smartctl -a /dev/sdg | grep "Raw_Read_Error_Rate"
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
[root@nas ~]# smartctl -a /dev/sdh | grep "Raw_Read_Error_Rate"
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
[root@nas ~]# smartctl -a /dev/sdi | grep "Raw_Read_Error_Rate"
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
Yup. /dev/sde seems to have an issue.
Get the serial numbers of all drives for a possible RMA on /dev/sde:
[root@nas ~]# smartctl -a /dev/sdb | grep "Serial Number:"
Serial Number: WD-WMC4N0J0YT1V
[root@nas ~]# smartctl -a /dev/sdc | grep "Serial Number:"
Serial Number: WD-WMC4N0J2L138
[root@nas ~]# smartctl -a /dev/sdd | grep "Serial Number:"
Serial Number: WD-WCC4N2FJRTU9
[root@nas ~]# smartctl -a /dev/sde | grep "Serial Number:"
Serial Number: WD-WCC4N4SSDRFN
[root@nas ~]# smartctl -a /dev/sdf | grep "Serial Number:"
Serial Number: WD-WCC4N1VYZH52
[root@nas ~]# smartctl -a /dev/sdg | grep "Serial Number:"
Serial Number: WD-WMC4N0M57KEY
[root@nas ~]# smartctl -a /dev/sdh | grep "Serial Number:"
Serial Number: WD-WCC4N5YF2Z2Y
[root@nas ~]# smartctl -a /dev/sdi | grep "Serial Number:"
Serial Number: WD-WCC4N5CJ6H8U
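The serial lives on the "Serial Number:" line of `smartctl -i` output, so a small parser (a hypothetical helper of mine, not a smartctl option) makes the per-drive loop a one-liner:

```shell
# serial_of: extract the serial from `smartctl -i` (or -a) output.
serial_of() {
  awk -F': *' '/^Serial Number:/ { print $2 }'
}

# Live usage (assumes smartmontools installed):
#   for d in /dev/sd[b-i]; do printf '%s %s\n' "$d" "$(smartctl -i "$d" | serial_of)"; done
printf 'Serial Number:    WD-WCC4N4SSDRFN\n' | serial_of   # → WD-WCC4N4SSDRFN
```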
Get a list of bad blocks on all drives. Run in the background and save the output to files:
badblocks -v /dev/sdb > /tmp/bad-blocks-b.txt &
badblocks -v /dev/sdc > /tmp/bad-blocks-c.txt &
badblocks -v /dev/sdd > /tmp/bad-blocks-d.txt &
badblocks -v /dev/sde > /tmp/bad-blocks-e.txt &
badblocks -v /dev/sdf > /tmp/bad-blocks-f.txt &
badblocks -v /dev/sdg > /tmp/bad-blocks-g.txt &
badblocks -v /dev/sdh > /tmp/bad-blocks-h.txt &
badblocks -v /dev/sdi > /tmp/bad-blocks-i.txt &
Monitor file size with:
[root@nas ~]# watch ls -lsa /tmp/bad-blocks-*.txt
If you have a really bad drive it could create a file nearly the size of the drive itself, so be sure to monitor it and make sure you do not fill up your /tmp directory.
If you need to kill a scan, get the PID with:
[root@nas tmp]# ps -ef | grep "badblocks"
UID PID PPID C STIME TTY TIME CMD
root 27013 25404 3 10:43 pts/0 00:01:12 badblocks -v /dev/sdb
root 27014 25404 3 10:43 pts/0 00:01:12 badblocks -v /dev/sdc
root 27015 25404 3 10:43 pts/0 00:01:12 badblocks -v /dev/sdd
root 27016 25404 2 10:43 pts/0 00:01:11 badblocks -v /dev/sde
root 27017 25404 3 10:43 pts/0 00:01:13 badblocks -v /dev/sdf
root 27018 25404 3 10:43 pts/0 00:01:12 badblocks -v /dev/sdg
root 27019 25404 3 10:43 pts/0 00:01:12 badblocks -v /dev/sdh
root 27020 25404 3 10:43 pts/0 00:01:12 badblocks -v /dev/sdi
root 31044 26976 0 11:22 pts/1 00:00:00 grep --color=auto badblocks
While the badblocks test is running I have already gotten an RMA number from WD and a shipping label on my printer. I ordered a new drive from Amazon that will be here on the 28th. I will swap it in then and ship the bad drive back on the 29th.
Running a long smartctl test on all drives:
smartctl -t long /dev/sdb
smartctl -t long /dev/sdc
smartctl -t long /dev/sdd
smartctl -t long /dev/sde
smartctl -t long /dev/sdf
smartctl -t long /dev/sdg
smartctl -t long /dev/sdh
smartctl -t long /dev/sdi
Check the progress of the tests:
smartctl -a /dev/sdb | grep "Self-test execution status"
smartctl -a /dev/sdb | grep "of test remaining."
smartctl -a /dev/sdc | grep "Self-test execution status"
smartctl -a /dev/sdc | grep "of test remaining."
smartctl -a /dev/sdd | grep "Self-test execution status"
smartctl -a /dev/sdd | grep "of test remaining."
smartctl -a /dev/sde | grep "Self-test execution status"
smartctl -a /dev/sde | grep "of test remaining."
smartctl -a /dev/sdf | grep "Self-test execution status"
smartctl -a /dev/sdf | grep "of test remaining."
smartctl -a /dev/sdg | grep "Self-test execution status"
smartctl -a /dev/sdg | grep "of test remaining."
smartctl -a /dev/sdh | grep "Self-test execution status"
smartctl -a /dev/sdh | grep "of test remaining."
smartctl -a /dev/sdi | grep "Self-test execution status"
smartctl -a /dev/sdi | grep "of test remaining."
The new drive is in and I am backing up the NAS. My btrfs pool at /myraid is getting backed up to a PC with RAID 1.
I am also duplicating the more important files to an SSD. Better safe than sorry.
The new drive has had a full surface test (9 hours) and passed with flying colors.
I am not turning this into a drive remove/replace, since I want to change the partitions on my boot SSD, so I am just going to nuke the entire system and rebuild from scratch.
My / (root) partition is getting backed up via tar so I keep access to my old crontab files and maintenance scripts.
I can reuse the old smb.conf as well, etc.
tar -zcvpf /myraid/nas.backup.tar.gz --exclude=/myraid --exclude=/usr --exclude=/proc --exclude=/lib --exclude=/lib64 --exclude=/dev /
Now copy the tar.gz file to a few drives off the server as well.
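Before nuking anything it is worth confirming the tarball is actually readable: `tar -tzf` walks the whole archive, so a clean exit verifies both the gzip stream and the tar structure. Demonstrated here on a throwaway archive; point it at /myraid/nas.backup.tar.gz for the real check:

```shell
# Build a tiny demo archive, then verify it the same way you would verify the real backup.
tmpdir=$(mktemp -d)
echo hello > "$tmpdir/file.txt"
tar -zcf "$tmpdir/demo.tar.gz" -C "$tmpdir" file.txt

# The actual check: list every member; success means the archive is intact.
tar -tzf "$tmpdir/demo.tar.gz" > /dev/null && echo "archive OK"   # → archive OK
rm -rf "$tmpdir"
```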
Make a CentOS 7.3 Minimal install USB and do the install :)
Install done. I see all 9 drives and 4 network connections. I set up all the NICs during the install and they all seem to be OK.
Possibly some tweaking on these later.
I installed the OS on my SSD with a 1GB /boot and a /boot/efi (I am using EFI); the rest went to /.
My other 8 drives are on my Supermicro AOC-SAS2LP-MV8 JBOD HBA. I will not touch those until I get ready to set up btrfs on them.
So now some base setup:
cp /etc/sysconfig/selinux /etc/sysconfig/selinux.bak
sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/sysconfig/selinux
cp /etc/selinux/config /etc/selinux/config.bak
sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
systemctl disable firewalld
systemctl stop firewalld
service iptables stop
service ip6tables stop
chkconfig iptables off
chkconfig ip6tables off
yum -y install bind-utils traceroute net-tools ntp* gcc glibc glibc-common gd gd-devel make net-snmp openssl-devel xinetd unzip libtool* make patch perl bison flex-devel gcc-c++ ncurses-devel flex libtermcap-devel autoconf* automake* autoconf libxml2-devel cmake sqlite* wget ntp* lm_sensors ncurses-devel qt-devel hmaccalc zlib-devel binutils-devel elfutils-libelf-devel wget bc gzip uuid* libuuid-devel jansson* libxml2* sqlite* openssl* lsof NetworkManager-tui mlocate yum-utils kernel-devel nfs-utils tcpdump git vim gdisk parted
yum -y groupinstall "Development Tools"
yum -y update
yum -y upgrade
cd /root
echo ':color desert' > .vimrc
systemctl disable kdump.service
reboot
# cat /etc/default/grub
GRUB_TIMEOUT=60
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=cl_bcache/root ipv6.disable=1 zswap.enable=1 consoleblank=0"
GRUB_DISABLE_RECOVERY="true"
Edit /etc/default/grub so the line reads:
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=cl_bcache/root ipv6.disable=1 zswap.enable=1 consoleblank=0"
# grub2-mkconfig -o /boot/grub2/grub.cfg
or if using UEFI
# grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
reboot
Now update the kernel:
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
yum install yum-plugin-fastestmirror
yum --enablerepo=elrepo-kernel install kernel-ml
reboot
Manually select the new kernel from the GRUB boot screen.
uname -r
4.10.1-1.el7.elrepo.x86_64
Do any testing you need and, when happy, set it as the default entry:
grub2-set-default 0
reboot
uname -r
4.10.1-1.el7.elrepo.x86_64
Now time to physically replace the drive, then set up btrfs.
poweroff
New drive is in and the server rebooted.
parted -l
shows all drives. The 7 original WD Red NAS drives still show btrfs partitions. The new drive has nothing.
idle3ctl shows the new drive has head parking on. Let's turn that off.
# ./idle3ctl /dev/sde
Idle3 timer set to 138 (0x8a)
# ./idle3ctl -d /dev/sde
Idle3 timer disabled
Please power cycle your drive off and on for the new setting to be taken into account. A reboot will not be enough!
So let's power off, let it sit for a minute, then power back on and recheck.
poweroff
Looks good on all 8 WD Red drives:
[root@nas idle3-tools-0.9.1]# ./idle3ctl /dev/sdb
Idle3 timer is disabled
[root@nas idle3-tools-0.9.1]# ./idle3ctl /dev/sdc
Idle3 timer is disabled
[root@nas idle3-tools-0.9.1]# ./idle3ctl /dev/sdd
Idle3 timer is disabled
[root@nas idle3-tools-0.9.1]# ./idle3ctl /dev/sde
Idle3 timer is disabled
[root@nas idle3-tools-0.9.1]# ./idle3ctl /dev/sdf
Idle3 timer is disabled
[root@nas idle3-tools-0.9.1]# ./idle3ctl /dev/sdg
Idle3 timer is disabled
[root@nas idle3-tools-0.9.1]# ./idle3ctl /dev/sdh
Idle3 timer is disabled
[root@nas idle3-tools-0.9.1]# ./idle3ctl /dev/sdi
Idle3 timer is disabled
Now let's clean the disks for a new array.
I used parted to rm the partitions, then quit.
Then:
wipefs -a /dev/sdb
wipefs -a /dev/sdc
wipefs -a /dev/sdd
wipefs -a /dev/sde
wipefs -a /dev/sdf
wipefs -a /dev/sdg
wipefs -a /dev/sdh
wipefs -a /dev/sdi
Then
dd if=/dev/zero of=/dev/sdb bs=1024 count=1024
dd if=/dev/zero of=/dev/sdc bs=1024 count=1024
dd if=/dev/zero of=/dev/sdd bs=1024 count=1024
dd if=/dev/zero of=/dev/sde bs=1024 count=1024
dd if=/dev/zero of=/dev/sdf bs=1024 count=1024
dd if=/dev/zero of=/dev/sdg bs=1024 count=1024
dd if=/dev/zero of=/dev/sdh bs=1024 count=1024
dd if=/dev/zero of=/dev/sdi bs=1024 count=1024
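The eight wipefs + dd pairs above can be collapsed into one loop. This is my own dry-run-style sketch (not a standard tool): it only prints the commands, and piping it to sh would actually run them and destroy the signatures on every listed device, so triple-check the drive list first:

```shell
# wipe_cmds: emit the wipefs + dd commands for each drive letter.
# DANGEROUS when executed -- destroys filesystem signatures on the listed devices.
wipe_cmds() {
  for d in "$@"; do
    echo "wipefs -a /dev/sd$d"
    echo "dd if=/dev/zero of=/dev/sd$d bs=1024 count=1024"
  done
}

wipe_cmds b c d e f g h i          # dry run: just prints the commands
# wipe_cmds b c d e f g h i | sh   # uncomment to actually run them (as root)
```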
Then, just to look at the devices:
ls -lsa /dev/sd*
0 brw-rw---- 1 root disk 8, 0 Mar 1 15:02 /dev/sda
0 brw-rw---- 1 root disk 8, 1 Mar 1 15:02 /dev/sda1
0 brw-rw---- 1 root disk 8, 2 Mar 1 15:02 /dev/sda2
0 brw-rw---- 1 root disk 8, 3 Mar 1 15:02 /dev/sda3
0 brw-rw---- 1 root disk 8, 16 Mar 1 15:11 /dev/sdb
0 brw-rw---- 1 root disk 8, 32 Mar 1 15:11 /dev/sdc
0 brw-rw---- 1 root disk 8, 48 Mar 1 15:11 /dev/sdd
0 brw-rw---- 1 root disk 8, 64 Mar 1 15:11 /dev/sde
0 brw-rw---- 1 root disk 8, 80 Mar 1 15:11 /dev/sdf
0 brw-rw---- 1 root disk 8, 96 Mar 1 15:11 /dev/sdg
0 brw-rw---- 1 root disk 8, 112 Mar 1 15:11 /dev/sdh
0 brw-rw---- 1 root disk 8, 128 Mar 1 15:11 /dev/sdi
fdisk -l also shows they look ready
[root@nas idle3-tools-0.9.1]# fdisk -l
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.
Disk /dev/sda: 128.0 GB, 128035676160 bytes, 250069680 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt
# Start End Size Type Name
1 2048 2099199 1G EFI System EFI System Partition
2 2099200 4196351 1G Microsoft basic
3 4196352 250068991 117.2G Linux LVM
Disk /dev/sdb: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdc: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdd: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sde: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdf: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdg: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdh: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdi: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/mapper/cl_nas-root: 125.9 GB, 125883645952 bytes, 245866496 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Check the btrfs version first:
# btrfs --version
btrfs-progs v4.4.1
I think there is a newer one out. Let's go see.
git clone git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git
cd btrfs-progs
yum -y install libuuid-devel libattr-devel zlib-devel libacl-devel e2fsprogs-devel libblkid-devel lzo* asciidoc xmlto
./autogen.sh
./configure
make
Let's check the version from within the folder:
[root@nas btrfs-progs]# ./btrfs --version
btrfs-progs v4.9.1
Yup, it's newer.
Now check from /:
[root@nas btrfs-progs]# cd /
[root@nas /]# btrfs --version
btrfs-progs v4.4.1
[root@nas /]#
So we have two versions.
I copied all executable (+x) files from /root/btrfs-progs to /usr/sbin, overwriting existing files.
Now from / I get:
[root@nas /]# btrfs --version
btrfs-progs v4.9.1
I hope that's good :)
So lets build an array!!!
First I will use raid0 for some quick testing.
mkfs.btrfs -f -m raid0 -d raid0 -L myraid /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi
and here is the equivalent for raid10:
mkfs.btrfs -f -m raid10 -d raid10 -L myraid /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi
[root@nas ~]# mkfs.btrfs -f -m raid0 -d raid0 -L myraid /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi
btrfs-progs v4.9.1
See http://btrfs.wiki.kernel.org for more information.
Label: myraid
UUID: 5a5610aa-2615-4ee2-bd4a-076ab2931b70
Node size: 16384
Sector size: 4096
Filesystem size: 21.83TiB
Block group profiles:
Data: RAID0 8.00GiB
Metadata: RAID0 4.00GiB
System: RAID0 16.00MiB
SSD detected: no
Incompat features: extref, skinny-metadata
Number of devices: 8
Devices:
ID SIZE PATH
1 2.73TiB /dev/sdb
2 2.73TiB /dev/sdc
3 2.73TiB /dev/sdd
4 2.73TiB /dev/sde
5 2.73TiB /dev/sdf
6 2.73TiB /dev/sdg
7 2.73TiB /dev/sdh
8 2.73TiB /dev/sdi
[root@nas ~]# btrfs fi show
Label: 'myraid' uuid: 5a5610aa-2615-4ee2-bd4a-076ab2931b70
Total devices 8 FS bytes used 112.00KiB
devid 1 size 2.73TiB used 1.50GiB path /dev/sdb
devid 2 size 2.73TiB used 1.50GiB path /dev/sdc
devid 3 size 2.73TiB used 1.50GiB path /dev/sdd
devid 4 size 2.73TiB used 1.50GiB path /dev/sde
devid 5 size 2.73TiB used 1.50GiB path /dev/sdf
devid 6 size 2.73TiB used 1.50GiB path /dev/sdg
devid 7 size 2.73TiB used 1.50GiB path /dev/sdh
devid 8 size 2.73TiB used 1.50GiB path /dev/sdi
Let's mount this thing:
mkdir /myraid
Mount using the UUID from above (5a5610aa-2615-4ee2-bd4a-076ab2931b70):
mount -t btrfs -o defaults,nodatacow,noatime,x-systemd.device-timeout=30 -U 5a5610aa-2615-4ee2-bd4a-076ab2931b70 /myraid
[root@nas ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 0 32G 0% /dev/shm
tmpfs 32G 8.9M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/cl_nas-root 118G 2.2G 116G 2% /
/dev/sda2 1014M 191M 824M 19% /boot
/dev/sda1 1022M 9.5M 1013M 1% /boot/efi
tmpfs 6.3G 0 6.3G 0% /run/user/0
/dev/sdb 22T 20M 22T 1% /myraid
Oh yeah!!! A 22TB btrfs array.
The line I will put in fstab later is:
UUID=5a5610aa-2615-4ee2-bd4a-076ab2931b70 /myraid btrfs defaults,nodatacow,noatime,x-systemd.device-timeout=30 0 0
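To avoid fat-fingering that long UUID when I add the line later, a throwaway helper (hypothetical, not a standard utility) can build the fstab line from the UUID:

```shell
# fstab_line: build the /etc/fstab entry for /myraid from a filesystem UUID.
fstab_line() {
  echo "UUID=$1 /myraid btrfs defaults,nodatacow,noatime,x-systemd.device-timeout=30 0 0"
}

fstab_line 5a5610aa-2615-4ee2-bd4a-076ab2931b70
# Append with:  fstab_line <uuid> >> /etc/fstab
# then run `mount -a` to confirm the entry parses before the next reboot.
```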
This is how to clear the caches when testing transfer speeds, so you are not just measuring cache. Run one of these between transfers (2 frees dentries and inodes; 3 also frees the page cache):
sync; echo 2 > /proc/sys/vm/drop_caches
sync; echo 3 > /proc/sys/vm/drop_caches
---tune 10Gb CNA if needed
service irqbalance stop
service cpuspeed stop
chkconfig irqbalance off
chkconfig cpuspeed off
systemctl disable irqbalance
systemctl disable cpuspeed
systemctl stop irqbalance
systemctl stop cpuspeed
vi /etc/sysconfig/network-scripts/ifcfg-eth???
MTU="9000"
vi /etc/sysctl.conf
# -- tuning -- #
# Increase system file descriptor limit
fs.file-max = 65535
# Increase system IP port range to allow for more concurrent connections
net.ipv4.ip_local_port_range = 1024 65000
# -- 10gbe tuning from Intel ixgb driver README -- #
# turn off selective ACK and timestamps
net.ipv4.tcp_sack = 0
net.ipv4.tcp_timestamps = 0
# memory allocation min/pressure/max.
# read buffer, write buffer, and buffer space
net.ipv4.tcp_rmem = 10000000 10000000 10000000
net.ipv4.tcp_wmem = 10000000 10000000 10000000
net.ipv4.tcp_mem = 10000000 10000000 10000000
net.core.rmem_max = 524287
net.core.wmem_max = 524287
net.core.rmem_default = 524287
net.core.wmem_default = 524287
net.core.optmem_max = 524287
net.core.netdev_max_backlog = 300000
reboot and test speed.
On a Linux client, pointing at the server with IP 192.168.90.100:
# iperf3 -c 192.168.90.100 -p 5201
On the Linux server with IP 192.168.90.100:
iperf3 -s -p 5201 -B 192.168.90.100
---end tune 10Gb CNA if needed
---setup NFS for ESXi server
vi /etc/exports
/myraid/ 192.168.10.0/24(rw,async,no_root_squash,no_subtree_check)
/myraid/ 192.168.90.0/24(rw,async,no_root_squash,no_subtree_check)
systemctl start rpcbind nfs-server
systemctl enable rpcbind nfs-server
---end setup NFS for ESXi server
---install samba if needed
yum -y install samba
useradd samba -s /sbin/nologin
smbpasswd -a samba
Supply a password
Retype the password
mkdir /myraid
chown -R samba:root /myraid/
vi /etc/samba/smb.conf
[global]
workgroup = WORKGROUP ;use name of your workgroup here
server string = Samba Server Version %v
netbios name = NAS
Add this to the bottom of the /etc/samba/smb.conf file:
[NAS]
comment = NAS
path = /myraid
writable = yes
valid users = samba
systemctl start smb
systemctl enable smb
systemctl start nmb
systemctl enable nmb
testparm
---end install samba if needed
---install plex if needed
Visit the Plex site and get the RPM for your version of the OS.
Copy it to /root, then:
yum -y localinstall name.rpm
systemctl enable plexmediaserver
systemctl start plexmediaserver
---end install plex if needed
---install LAMP
yum -y install httpd mariadb-server mariadb php php-mysql
systemctl enable httpd.service
systemctl start httpd.service
systemctl status httpd.service
Make sure it works with:
http://your_server_IP_address/
systemctl enable mariadb
systemctl start mariadb
systemctl status mariadb
mysql_secure_installation
vi /var/www/html/info.php
<?php phpinfo(); ?>
http://your_server_IP_address/info.php
---End install LAMP
---Extra goodies
yum -y install epel-release
yum -y install stress htop iftop iotop hddtemp smartmontools iperf3 sysstat mlocate
yum -y update
updatedb (this updates the mlocate db)
---End Extra goodies
---Use gmail as relay for sending mail
Replace glen@gmail.com with a real email address in items below
Replace mycentserver.mydomain.domain with real hostname in items below
Replace gmail_password with real password in items below
# yum remove postfix
Now install ssmtp.
# yum -y install ssmtp mailx
Now edit your /etc/ssmtp/ssmtp.conf. I removed everything and just added the below in the file.
# vi /etc/ssmtp/ssmtp.conf
root=glen@gmail.com
mailhub=smtp.gmail.com:587
rewriteDomain=gmail.com
hostname=mycentserver.mydomain.domain
UseTLS=Yes
UseSTARTTLS=Yes
AuthUser=glen@gmail.com
AuthPass=gmail_password
FromLineOverride=YES
# This solves "ssmtp: Cannot open smtp.gmail.com:587" errors when trying to send an email.
# If you enable debugging by uncommenting the DEBUG=Yes line and /var/log/maillog shows
# "SSL not working: certificate verify failed (20)", uncomment the following line,
# but first VERIFY THE FILE EXISTS.
TLS_CA_File=/etc/pki/tls/certs/ca-bundle.crt
# DEBUG=Yes
Now edit your /etc/ssmtp/revaliases file and add the following.
# vi /etc/ssmtp/revaliases
root:glen@gmail.com:smtp.gmail.com:587
Now run
# alternatives --config mta
And choose the number for sendmail.ssmtp, like below
There is 1 program that provides 'mta'.
Selection Command
-----------------------------------------------
*+ 1 /usr/sbin/sendmail.ssmtp
Enter to keep the current selection[+], or type selection number: 1
#
Now send an email to your Gmail account from the CentOS CLI.
# mail -s "Test Subject" glen@gmail.com
Type your message text, then on a new line press Ctrl-D to send. You can also send non-interactively: echo "test body" | mail -s "Test Subject" glen@gmail.com
---End Use gmail as relay for sending mail