Replace bad drive on btrfs 8 drive raid10 array in NAS
My first failed drive in a btrfs file system. Here it goes :)
NAS build overview first.
• Supermicro C7Z170-OCE Motherboard
• Intel i5-6600k with Hyper 212 EVO cooler
• 64GB DDR4 Corsair LPX RAM (4x16GB)
• Samsung 850 pro 120GB SSD for boot and OS
• Supermicro 8-Port SAS/SATA Card - (AOC-SAS2LP-MV8) 8-channel SAS/SATA adapter with 600MB/s per channel
• 8x WD Red NAS 3TB drives in btrfs raid10 on the Supermicro 8-Port SAS/SATA Card
• 10G-Tek 10Gb dual port CNA card SFP+
• Fractal Design R5 case with Noctua SSO2 NF-A14 case fans
• Running CentOS 7 with 4.7 kernel and btrfs-progs v4.7
• Using Plex Media Server, SAMBA, and NFS (10Gb Ethernet to my ESXi server so I can run images off of the btrfs array over NFS).
Drive 5 (/dev/sdf) went bad and was throwing cluster errors. A badblocks check spat out tons of errors on it; all the other drives checked out fine. The command to check for bad blocks is:
badblocks -v /dev/sdX
With X being the letter of your device. I ran them all at the same time in order to save time. The commands I used for this are:
badblocks -v /dev/sdb > /tmp/bad-blocks-b.txt &
badblocks -v /dev/sdc > /tmp/bad-blocks-c.txt &
badblocks -v /dev/sdd > /tmp/bad-blocks-d.txt &
badblocks -v /dev/sde > /tmp/bad-blocks-e.txt &
badblocks -v /dev/sdf > /tmp/bad-blocks-f.txt &
badblocks -v /dev/sdg > /tmp/bad-blocks-g.txt &
badblocks -v /dev/sdh > /tmp/bad-blocks-h.txt &
badblocks -v /dev/sdi > /tmp/bad-blocks-i.txt &
Now with all my checks running in the background I can monitor things via the txt files. You can see the processes running using "htop", "ps -ef | grep badblocks", or whatever you like.
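With this many drives the per-drive commands get tedious to type, so they can be wrapped in a small helper. This is just a sketch (check_drives is my name, not a standard tool), and the device letters are whatever your system actually uses:

```shell
# check_drives: kick off a background badblocks scan per device letter,
# logging each one to /tmp/bad-blocks-<letter>.txt.
check_drives() {
  for d in "$@"; do
    badblocks -v "/dev/sd${d}" > "/tmp/bad-blocks-${d}.txt" 2>&1 &
  done
  wait   # return only once every scan has finished
}

# check_drives b c d e f g h i
```

The wait at the end means the function blocks until all scans are done, which is handy if you want to chain it with a notification.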
I noticed /tmp/bad-blocks-f.txt growing to over a gig in size, so I killed that process, as I already knew the drive was bad. I let the others run to completion. Not sure how long this took; it ran overnight and I did not try to time it.
FYI, the command I used to create the array originally was:
[root@nas /]# mkfs.btrfs -f -m raid10 -d raid10 -L myraid /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi
As you can see, I used raid10 for the metadata (-m) and the data (-d), labeled it myraid (-L), and used all 8 drives. The -f forces the creation.
I noticed the bad drive on 9/13/2016 and ordered a new drive from Amazon that day. The drive was delivered on 9/15/2016. I put it in my PC and ran WD diagnostics on it with the extended test. It took about 4 hours to complete: 100% pass.
I don’t have any spare SATA ports in my NAS, so I’m going to have to remove the bad drive and replace it with the new one. I ran "smartctl -i" on all my drives so I could get serial numbers and make sure I replaced the correct drive before shutting down. My drives are /dev/sd[b-i] (/dev/sda is an SSD I use for boot and OS).
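A quick way to dump the whole device-to-serial mapping in one shot, so the physical drive can be matched before pulling anything. A sketch; print_serials is my name for it, and it assumes smartctl is installed and the drives really are sdb through sdi:

```shell
# print_serials: show "device: serial" for each array member.
print_serials() {
  for d in /dev/sd{b,c,d,e,f,g,h,i}; do
    echo "$d: $(smartctl -i "$d" | awk -F': *' '/^Serial Number/ {print $2}')"
  done
}

# print_serials
```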
[root@nas ~]# smartctl -i /dev/sdf
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.7.0-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red (AF)
Device Model: WDC WD30EFRX-68EUZN0
Serial Number: WD-WMC4N0M1P20N
LU WWN Device Id: 5 0014ee 6b0f85183
Firmware Version: 82.00A82
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Tue Sep 13 14:20:53 2016 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
So the serial number for this drive, /dev/sdf, is WMC4N0M1P20N. I shut down the system, then found and replaced the drive. On boot it got stuck mounting the btrfs filesystem that I auto mount in /etc/fstab, since the array now has a missing drive (the new drive I put in is not part of the array yet, so btrfs reports a device missing). I rebooted into safe mode and commented out the line in /etc/fstab for this mount. That line was:
UUID=1ec4f641-74a8-466e-89cc-e687672aaaea /myraid btrfs defaults,nodatacow,noatime,x-systemd.device-timeout=0 0 0
So I just put a # in front of it, saved, and exited. Rebooted, and the system came up. Now, to work with the btrfs filesystem I have to get it mounted, so I mount it in degraded mode, since it is degraded with the missing disk. The command I used for that is:
mount -t btrfs -o degraded UUID=1ec4f641-74a8-466e-89cc-e687672aaaea /myraid
BTW, I use the UUID for my mounts. You can get it from running "btrfs fi show".
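If you ever want to script the degraded mount, the UUID can also be pulled by label with blkid (part of util-linux) instead of eyeballing "btrfs fi show". A sketch; mount_degraded is my name, and it assumes the "myraid" label from mkfs:

```shell
# mount_degraded: look up the filesystem UUID by btrfs label, then mount
# the array in degraded mode at the given mount point.
mount_degraded() {
  local uuid
  uuid=$(blkid -t LABEL="$1" -o value -s UUID | head -n1)
  mount -t btrfs -o degraded "UUID=${uuid}" "$2"
}

# mount_degraded myraid /myraid
```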
I now inspect the filesystem with:
[root@nas ~]# btrfs fi show
Label: 'myraid' uuid: 1ec4f641-74a8-466e-89cc-e687672aaaea
Total devices 8 FS bytes used 788.93GiB
devid 1 size 2.73TiB used 200.27GiB path /dev/sdb
devid 2 size 2.73TiB used 200.27GiB path /dev/sdc
devid 3 size 2.73TiB used 200.27GiB path /dev/sdd
devid 4 size 2.73TiB used 200.27GiB path /dev/sde
devid 6 size 2.73TiB used 200.27GiB path /dev/sdg
devid 7 size 2.73TiB used 200.27GiB path /dev/sdh
devid 8 size 2.73TiB used 200.27GiB path /dev/sdi
*** Some devices missing
As you can see there is no devid 5, /dev/sdf. Now I can tell btrfs to delete the missing device with:
[root@nas ~]# btrfs device delete missing /myraid/
It goes through a lengthy process of relocating blocks from /dev/sdg. Raid10 is striped mirrors, so it pairs the drives: /dev/sdb with /dev/sdc, /dev/sdd with /dev/sde, /dev/sdf with /dev/sdg, etc.
Sep 16 17:04:44 nas kernel: BTRFS info (device sdg): relocating block group 2407559856128 flags 65
Sep 16 17:04:52 nas kernel: BTRFS info (device sdg): found 43 extents
Sep 16 17:05:02 nas kernel: BTRFS info (device sdg): found 43 extents
Sep 16 17:05:03 nas kernel: BTRFS info (device sdg): relocating block group 2394674954240 flags 65
Sep 16 17:05:10 nas kernel: BTRFS info (device sdg): found 38 extents
Sep 16 17:05:20 nas kernel: BTRFS info (device sdg): found 38 extents
Sep 16 17:05:21 nas kernel: BTRFS info (device sdg): relocating block group 2738742099968 flags 65
Sep 16 17:05:28 nas kernel: BTRFS info (device sdg): found 32 extents
Sep 16 17:05:38 nas kernel: BTRFS info (device sdg): found 32 extents
.....
The lines above get displayed on the console. If you are ssh'd into the system you can also see them in /var/log/messages.
Now I wait for the process to complete. Start time is 5:04:44PM AZ time on 9/16/2016. (Insert jeopardy theme song here).
It’s now 5:52:18PM AZ time. More jeopardy theme music.
6:05:24PM and it finished. Let’s inspect.
[root@nas log]# btrfs fi show
Label: 'myraid' uuid: 1ec4f641-74a8-466e-89cc-e687672aaaea
Total devices 7 FS bytes used 788.83GiB
devid 1 size 2.73TiB used 226.34GiB path /dev/sdb
devid 2 size 2.73TiB used 225.38GiB path /dev/sdc
devid 3 size 2.73TiB used 226.03GiB path /dev/sdd
devid 4 size 2.73TiB used 225.38GiB path /dev/sde
devid 6 size 2.73TiB used 226.38GiB path /dev/sdg
devid 7 size 2.73TiB used 225.38GiB path /dev/sdh
devid 8 size 2.73TiB used 225.38GiB path /dev/sdi
SWEET! No more missing drive. Notice how the data on each of the drives is more than when I ran the command with the missing drive.
Now let’s remove the comment from the /etc/fstab line and reboot the system. Fingers crossed :)
The system booted fine, but the space on the array seems a little off. Before this started I had 11TB of disk space show up when I ran "df -h". Now it’s:
[root@nas ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4.0K 32G 1% /dev/shm
tmpfs 32G 8.9M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/centos_nas-root 50G 4.4G 46G 9% /
/dev/mapper/centos_nas-home 57G 33M 57G 1% /home
/dev/sda2 497M 116M 381M 24% /boot
/dev/sda1 200M 9.5M 191M 5% /boot/efi
/dev/sdb 9.6T 789G 5.1T 14% /myraid
tmpfs 6.3G 0 6.3G 0% /run/user/0
Yes, that’s right: 9.6TB now. So I am truly not in a raid10 anymore, as my data seems to be striped across all my drives and not in mirrored pairs. Makes sense, as I don’t have an even number of drives now. Hmmmm.... Not sure how I can get this back into striped mirrors.
It still shows in raid10 though.
[root@nas ~]# btrfs fi df -h /myraid/
Data, RAID10: total=789.00GiB, used=788.83GiB
System, RAID10: total=96.00MiB, used=96.00KiB
Metadata, RAID10: total=1.03GiB, used=6.50MiB
GlobalReserve, single: total=16.00MiB, used=0.00B
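To double-check mechanically that every chunk really is still RAID10, the "btrfs fi df" output can be scanned with awk. This is only a sketch (check_profile is my name, not a btrfs subcommand), keyed to the output format shown above:

```shell
# check_profile: scan "btrfs filesystem df" output and flag any chunk whose
# profile is not RAID10 (GlobalReserve is always "single", so it is skipped).
check_profile() {
  btrfs filesystem df "$1" | awk -F'[ ,]+' \
    '!/GlobalReserve/ && $2 != "RAID10:" { print "non-raid10 chunk: " $0; bad = 1 }
     END { exit bad }'
}

# check_profile /myraid && echo "all chunks RAID10"
```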
First, inspect the filesystem.
[root@nas ~]# btrfs fi show
Label: 'myraid' uuid: 1ec4f641-74a8-466e-89cc-e687672aaaea
Total devices 7 FS bytes used 788.83GiB
devid 1 size 2.73TiB used 226.34GiB path /dev/sdb
devid 2 size 2.73TiB used 225.38GiB path /dev/sdc
devid 3 size 2.73TiB used 226.03GiB path /dev/sdd
devid 4 size 2.73TiB used 225.38GiB path /dev/sde
devid 6 size 2.73TiB used 226.38GiB path /dev/sdg
devid 7 size 2.73TiB used 225.38GiB path /dev/sdh
devid 8 size 2.73TiB used 225.38GiB path /dev/sdi
Now let’s add the new drive back in.
[root@nas ~]# btrfs dev add -f /dev/sdf /myraid/
That only took a second.
Inspect filesystem.
[root@nas ~]# btrfs fi show
Label: 'myraid' uuid: 1ec4f641-74a8-466e-89cc-e687672aaaea
Total devices 8 FS bytes used 788.83GiB
devid 1 size 2.73TiB used 226.34GiB path /dev/sdb
devid 2 size 2.73TiB used 225.38GiB path /dev/sdc
devid 3 size 2.73TiB used 226.03GiB path /dev/sdd
devid 4 size 2.73TiB used 225.38GiB path /dev/sde
devid 6 size 2.73TiB used 226.38GiB path /dev/sdg
devid 7 size 2.73TiB used 225.38GiB path /dev/sdh
devid 8 size 2.73TiB used 225.38GiB path /dev/sdi
devid 9 size 2.73TiB used 0.00B path /dev/sdf
Well, I have 8 drives again. That’s a start :) But no data on it, of course.
I also see the devid for the new drive is 9; there is no longer a 5. Hmmmm....
Let’s balance out the filesystem now and see what it looks like when done.
[root@nas ~]# btrfs balance start -v /myraid/
You can monitor the progress with:
[root@nas ~]# btrfs balance status -v /myraid/
Balance on '/myraid/' is running
18 out of about 265 chunks balanced (19 considered), 93% left
Dumping filters: flags 0x7, state 0x1, force is off
DATA (flags 0x0): balancing
METADATA (flags 0x0): balancing
SYSTEM (flags 0x0): balancing
Start time 6:42:36PM AZ time. Is that Jeopardy music I hear? Time to binge on The Walking Dead season six for a bit.
Oh yeah, I also remembered to turn head parking off on the new drive using the idle3ctl tool. The commands are:
Get the current setting on /dev/sdf:
[root@nas idle3-tools-0.9.1]# ./idle3ctl -g /dev/sdf
Idle3 timer set to 138 (0x8a)
Turn parking off
[root@nas idle3-tools-0.9.1]# ./idle3ctl -d /dev/sdf
Idle3 timer disabled
Please power cycle your drive off and on for the new setting to be taken into account. A reboot will not be enough!
Recheck setting
[root@nas idle3-tools-0.9.1]# ./idle3ctl -g /dev/sdf
Idle3 timer is disabled
I will power cycle later, when the balance is done running. I will then recheck the setting to verify.
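Rather than remembering to do this drive by drive, all the WD Reds can be swept in one loop. A sketch; disable_parking is my name for it, and it assumes idle3ctl (from idle3-tools) is on the PATH:

```shell
# disable_parking: turn the Idle3 head-parking timer off on each given drive.
disable_parking() {
  for d in "$@"; do
    idle3ctl -d "$d"
  done
}

# disable_parking /dev/sd{b,c,d,e,f,g,h,i}
```

Remember the power-cycle requirement still applies to every drive you touch.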
Also check all the firmware versions:
[root@nas ~]# ./wd5741x64
WD5741 Version 1
Update Drive
Copyright (C) 2013 Western Digital Corporation
-Dn Model String Serial Number Firmware
-D0 Samsung SSD 850 PRO 128GB S1SMNSAG301480T EXM02B6Q
-D1 WDC WD30EFRX-68EUZN0 WD-WMC4N0J0YT1V 82.00A82
-D2 WDC WD30EFRX-68EUZN0 WD-WMC4N0J2L138 82.00A82
-D3 WDC WD30EFRX-68EUZN0 WD-WCC4N2FJRTU9 82.00A82
-D4 WDC WD30EFRX-68EUZN0 WD-WCC4N4SSDRFN 82.00A82
-D5 WDC WD30EFRX-68EUZN0 WD-WCC4N1VYZH52 82.00A82
-D6 WDC WD30EFRX-68EUZN0 WD-WMC4N0M57KEY 82.00A82
-D7 WDC WD30EFRX-68EUZN0 WD-WCC4N5YF2Z2Y 82.00A82
-D8 WDC WD30EFRX-68EUZN0 WD-WCC4N5CJ6H8U 82.00A82
[root@nas ~]#
The firmware is also displayed by the "smartctl -i" command we ran earlier to get all the drive serial numbers.
So at 7:46:18PM it finished. Let’s inspect.
[root@nas ~]# btrfs fi show
Label: 'myraid' uuid: 1ec4f641-74a8-466e-89cc-e687672aaaea
Total devices 8 FS bytes used 788.91GiB
devid 1 size 2.73TiB used 199.27GiB path /dev/sdb
devid 2 size 2.73TiB used 199.27GiB path /dev/sdc
devid 3 size 2.73TiB used 199.27GiB path /dev/sdd
devid 4 size 2.73TiB used 199.27GiB path /dev/sde
devid 6 size 2.73TiB used 199.27GiB path /dev/sdg
devid 7 size 2.73TiB used 199.27GiB path /dev/sdh
devid 8 size 2.73TiB used 199.27GiB path /dev/sdi
devid 9 size 2.73TiB used 199.27GiB path /dev/sdf
So my data is even across all eight drives. Each drive also shows about 1GiB less used than before the drive replacement.
[root@nas ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 4.0K 32G 1% /dev/shm
tmpfs 32G 8.9M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/centos_nas-root 50G 4.4G 46G 9% /
/dev/mapper/centos_nas-home 57G 33M 57G 1% /home
/dev/sda2 497M 116M 381M 24% /boot
/dev/sda1 200M 9.5M 191M 5% /boot/efi
/dev/sdb 11T 789G 11T 8% /myraid
tmpfs 6.3G 0 6.3G 0% /run/user/0
I also have my 11TB showing up on df -h.
[root@nas ~]# btrfs fi df /myraid/
Data, RAID10: total=796.00GiB, used=788.90GiB
System, RAID10: total=64.00MiB, used=112.00KiB
Metadata, RAID10: total=1.00GiB, used=6.34MiB
GlobalReserve, single: total=16.00MiB, used=0.00B
Looks like it’s still in raid10 mode. One thing I did notice: the balance was logged against /dev/sdc, not /dev/sdg. Remember, it logged /dev/sdg for relocating blocks when I removed /dev/sdf, but when it balanced after the new drive went in as /dev/sdf, it logged /dev/sdc. See below:
Sep 16 19:41:58 nas kernel: BTRFS info (device sdc): relocating block group 2897790107648 flags 65
Sep 16 19:42:07 nas kernel: BTRFS info (device sdc): found 24 extents
Sep 16 19:42:13 nas kernel: BTRFS info (device sdc): found 24 extents
Sep 16 19:42:13 nas kernel: BTRFS info (device sdc): relocating block group 2894568882176 flags 65
Sep 16 19:42:20 nas kernel: BTRFS info (device sdc): found 24 extents
Sep 16 19:42:27 nas kernel: BTRFS info (device sdc): found 24 extents
Sep 16 19:42:27 nas kernel: BTRFS info (device sdc): relocating block group 2891347656704 flags 65
Sep 16 19:42:35 nas kernel: BTRFS info (device sdc): found 24 extents
Sep 16 19:42:41 nas kernel: BTRFS info (device sdc): found 24 extents
Sep 16 19:42:41 nas kernel: BTRFS info (device sdc): relocating block group 2888126431232 flags 65
Sep 16 19:42:48 nas kernel: BTRFS info (device sdc): found 24 extents
Sep 16 19:42:55 nas kernel: BTRFS info (device sdc): found 24 extents
Sep 16 19:42:55 nas kernel: BTRFS info (device sdc): relocating block group 2884905205760 flags 65
Sep 16 19:43:02 nas kernel: BTRFS info (device sdc): found 24 extents
Sep 16 19:43:08 nas kernel: BTRFS info (device sdc): found 24 extents
Sep 16 19:43:09 nas kernel: BTRFS info (device sdc): relocating block group 2881683980288 flags 65
.....
Let’s run some benchmarks on this baby now. Woop woop! Oh, let’s power cycle and look at head parking again first.
Powered back up and looks good on head parking.
[root@nas idle3-tools-0.9.1]# ./idle3ctl -g /dev/sdb
Idle3 timer is disabled
[root@nas idle3-tools-0.9.1]# ./idle3ctl -g /dev/sdc
Idle3 timer is disabled
[root@nas idle3-tools-0.9.1]# ./idle3ctl -g /dev/sdd
Idle3 timer is disabled
[root@nas idle3-tools-0.9.1]# ./idle3ctl -g /dev/sde
Idle3 timer is disabled
[root@nas idle3-tools-0.9.1]# ./idle3ctl -g /dev/sdf
Idle3 timer is disabled
[root@nas idle3-tools-0.9.1]# ./idle3ctl -g /dev/sdg
Idle3 timer is disabled
[root@nas idle3-tools-0.9.1]# ./idle3ctl -g /dev/sdh
Idle3 timer is disabled
[root@nas idle3-tools-0.9.1]# ./idle3ctl -g /dev/sdi
Idle3 timer is disabled
Let’s do a scrub before the benchmarks just to be sure.
[root@nas idle3-tools-0.9.1]# btrfs scrub start /myraid/
scrub started on /myraid/, fsid 1ec4f641-74a8-466e-89cc-e687672aaaea (pid=2442)
You can check status of the scrub with:
[root@nas idle3-tools-0.9.1]# btrfs scrub status -d /myraid/
scrub status for 1ec4f641-74a8-466e-89cc-e687672aaaea
scrub device /dev/sdb (id 1) status
scrub started at Fri Sep 16 19:59:37 2016, running for 00:01:05
total bytes scrubbed: 9.55GiB with 0 errors
scrub device /dev/sdc (id 2) status
scrub started at Fri Sep 16 19:59:37 2016, running for 00:01:05
total bytes scrubbed: 8.45GiB with 0 errors
scrub device /dev/sdd (id 3) status
scrub started at Fri Sep 16 19:59:37 2016, running for 00:01:05
total bytes scrubbed: 8.36GiB with 0 errors
scrub device /dev/sde (id 4) status
scrub started at Fri Sep 16 19:59:37 2016, running for 00:01:05
total bytes scrubbed: 9.59GiB with 0 errors
scrub device /dev/sdg (id 6) status
scrub started at Fri Sep 16 19:59:37 2016, running for 00:01:05
total bytes scrubbed: 9.71GiB with 0 errors
scrub device /dev/sdh (id 7) status
scrub started at Fri Sep 16 19:59:37 2016, running for 00:01:05
total bytes scrubbed: 9.64GiB with 0 errors
scrub device /dev/sdi (id 8) status
scrub started at Fri Sep 16 19:59:37 2016, running for 00:01:05
total bytes scrubbed: 9.56GiB with 0 errors
scrub device /dev/sdf (id 9) status
scrub started at Fri Sep 16 19:59:37 2016, running for 00:01:05
total bytes scrubbed: 9.55GiB with 0 errors
Now we wait again. I HATE WAITING!!
I watched iotop while scrubbing. Each drive is showing about 150MB/s read, with a total of about 1,200MB/s. 150 x 8 = 1,200. These WD drives only hit a max of about 150MB/s, so it looks good so far. Since I have about 790GB of data, I will need to scrub double that, as it is a mirrored setup (striped mirrors, to be exact). So I’m looking at about 1.58TB of data to scrub, since both copies get scrubbed.
Finished the scrub.
[root@nas idle3-tools-0.9.1]# btrfs scrub status /myraid/
scrub status for 1ec4f641-74a8-466e-89cc-e687672aaaea
scrub started at Fri Sep 16 19:59:37 2016 and finished after 00:25:29
total bytes scrubbed: 1.54TiB with 0 errors
Zero errors. Hell yeah!
OK, now for benchmarks. The NAS is back in its home with dual 1Gb Ethernet and dual 10Gb Ethernet connections. One of the 10Gb connections is to my PC, which has a Samsung 950 Pro M.2 SSD with reads over 2,500MB/s and writes over 1,500MB/s (I have benchmarked this, and it is true). So over a highly tuned 10Gb Ethernet connection (and mine is) I can get 1,200MB/s transfers across the wire. But let’s be real: even with 8 WD Red NAS drives in striped mirrors I will never get that. Each drive does 150MB/s, times 4 = 600MB/s. I say times 4 because the other four drives are the mirrors. So for a non-cached read from the btrfs 8-drive raid10 array, across the 10Gb Ethernet connection, to my 950 Pro SSD, I should get a max of about 600MB/s. Drum roll please...... 570MB/s with a 10GB file across the wire. Sweet!!! This was over a Samba share using Windows Explorer, so there is a bit of overhead there; 570MB/s is pretty darn good. Who else has a home NAS and network that allows them that kind of speed?
Now that the file I just transferred is cached in RAM, let’s try the transfer again. Yikes, Scoob! 1,150MB/s transfer speed. Basically maxing out a 10Gb Ethernet connection, considering a little SMB and Windows overhead.
I also have the Samsung 850 Pro 120GB SSD in the NAS that I use for boot and OS, so it has some free space, to say the least. Let’s try a transfer from the btrfs raid10 array to the SSD in the NAS. This SSD has a max write of just over 500MB/s (520MB/s, I think). So let’s reboot the server so the file is not cached and try that test.
My 10GB test file is on the array in the /myraid mount. The file name is 10gtest.file.
I wrote a script that gives the date and time, then copies the file from the array to the SSD drive, then gives the date and time again.
#!/bin/bash
date
cp /myraid/10gtest.file /root/10gtest.file
date
Now let’s run the script.
[root@nas ~]# ./test.sh
Fri Sep 16 21:00:05 MST 2016
Fri Sep 16 21:00:30 MST 2016
25 seconds nets us about 400MB/s. I was looking for more than that. Hmmmm.....
Now again with the file cached.
[root@nas ~]# ./test.sh
Fri Sep 16 21:07:34 MST 2016
Fri Sep 16 21:07:38 MST 2016
4 seconds (per the timestamps) nets us about 2,500MB/s. Yeah, memory and the system bus are really fast. Makes me smile.
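One note on method: instead of rebooting just to get an uncached number, Linux can drop the page cache on demand via /proc/sys/vm/drop_caches (needs root). A sketch variant of the test script above; timed_copy is my name for it:

```shell
#!/bin/bash
# timed_copy: drop the page cache (so the read is not served from RAM),
# then time a cp and report the elapsed seconds. The cache drop needs
# root and is silently skipped otherwise.
timed_copy() {
  sync
  { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null || true
  local start end
  start=$(date +%s)
  cp "$1" "$2"
  end=$(date +%s)
  echo "copy took $((end - start)) seconds"
}

# timed_copy /myraid/10gtest.file /root/10gtest.file
```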
Now for some write tests. Reboot the NAS so files aren’t cached.
From PC to NAS was 1,200MB/s. OK, looks like writes are going to RAM before making it to the array. I can live with that :) Maxing out the 10Gb Ethernet is always good.
Well, guys and gals, my time here is done. I’m calling this a success. I will be researching the slow SSD in my NAS; it is about three years old now and has been in a few of my systems for different purposes. Could be time for a new one. I will also be looking into the devid numbering. Kind of bummed it did not go back in as 5; instead there is no 5 and it created a new devid of 9. Still 8 drives, it just skipped a devid. I also need to figure out if I’m still a true raid10: the kernel logged /dev/sdg when I removed /dev/sdf, which makes sense, but logged /dev/sdc after I added the new drive and balanced.
Anyone have any ideas on my 3 questions? Did I do something wrong?
Until next time, Peace!
Friday, September 16, 2016
Thursday, September 8, 2016
CENTOS 6.8 and Asterisk 11.x Install (with dahdi card)
---After the install, if DAHDI is not present in the Asterisk CLI, then do the following:
service asterisk stop
service dahdi stop
--Go to Dahdi directory
make install
make config
--Go to Dahdi tools directory
make install
reboot
--Go to Asterisk Addons directory
./configure
or ./configure --libdir=/usr/lib64 (for 64bit system)
make install
--Go to Asterisk 1.6.x.x directory
./configure
or ./configure --libdir=/usr/lib64 (for 64bit system)
make install
make config
service asterisk stop
service dahdi stop
service dahdi start
service asterisk start
---end After the install if dahdi is not present in the asterisk cli then do the following:
CENTOS 6.8 and Asterisk 11.x Install Part 1
1. Download CENTOS 6.8 Minimal Install ISO
2. Burn to CD/DVD
3. Boot PC on CD/DVD
4. Install OS
5. After Install reboot
6. Verify hostname
# hostname
7. Setup Network
# cd /etc/sysconfig/network-scripts
- Change the ifcfg-eth0 file to look like the example below. Your UUID and HWADDR should not be changed; the rest is just an example.
DEVICE=eth0
HWADDR=00:12:3F:B8:CD:FF
TYPE=Ethernet
UUID=d960e22d-214f-450c-b017-a3ad590bb225
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none
IPADDR=x.x.128.8
NETMASK=255.255.255.0
GATEWAY=x.x.128.1
BROADCAST=x.x.128.255
DNS1=x.x.1.10
DNS2=x.x.1.12
8. Restart network services
# service network restart
9. Run ifconfig, ping from a remote machine to the server, and ping google.com from the server to verify. Now you can ssh in from your workstation.
10. Update the new install
# yum update -y
11. Disable SELinux
# vi /etc/selinux/config
and change it to:
SELINUX=disabled
12. Stop iptables
# service iptables stop
13. Prevent iptables from auto starting
# chkconfig iptables off
14. Reboot
# reboot
Setting up NTP in Centos 6.8
First we need to make sure we have the correct date, time, and timezone. You can verify with:
# cat /etc/localtime
Set your time zone with (for Phoenix, AZ, USA):
# cp /usr/share/zoneinfo/America/Phoenix /etc/localtime
Now check date
# date
Sun Mar 29 01:10:44 MST 2015
You can verify this at the hardware level with:
# hwclock -r
Sun 29 Mar 2015 01:13:51 AM MST -0.156631 seconds
If these are incorrect, we can set the OS clock first and then update the hardware clock. The syntax is: date -s "day month year hh:mm:ss". So to change this to March 2, 2015 at 1:00pm we would use:
# date -s "2 MAR 2015 13:00:00"
When we have the date correct in the OS we can update the hardware with:
# hwclock -w
Now that we have the correct date and time, let's install and set up NTP. First, install ntp if it is not already installed.
# yum -y install ntp*
Next, edit the ntp.conf file so we are using the correct time servers.
# vi /etc/ntp.conf
I usually use the public servers from the pool.ntp.org project of:
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
server 3.centos.pool.ntp.org iburst
After you have those in, let's start the service and enable it at startup.
# service ntpd start
Starting ntpd: [ OK ]
# chkconfig ntpd on
Now let's verify the setup by looking at our peers.
# ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
*deekayen.net 209.51.161.238 2 u 43 64 3 75.257 -22.456 3.017
origin.towfowi. 204.9.54.119 2 u 43 64 3 31.405 -14.629 2.683
NTP2.playallian 129.6.15.30 2 u 43 64 3 50.576 -14.090 3.465
sola-dal-09.ser 10.0.77.54 4 u 42 64 3 59.879 -1.507 2.629
CENTOS 6.8 and Asterisk 11.x Install Part 2
1 Installation of Basic Dependencies
Asterisk 11.15.0 requires some prerequisite dependencies. Here is the command line to install them:
# yum -y install gcc ncurses-devel libtermcap-devel kernel-devel gcc-c++ newt-devel zlib-devel unixODBC-devel libtool make wget libuuid-devel libxml2-devel sqlite sqlite-devel
# yum -y install kernel-devel-$(uname -r)
# yum -y groupinstall "Development Tools"
# reboot
2 Downloading Your Asterisk Source Code
Move to directory /usr/src by given command:
# cd /usr/src/
Now download the source code tarballs using these commands (one by one or all at once):
# wget http://downloads.asterisk.org/pub/telephony/dahdi-linux-complete/dahdi-linux-complete-2.6.3-rc1+2.6.3-rc1.tar.gz
# wget http://downloads.asterisk.org/pub/telephony/libpri/libpri-1.5.0.tar.gz
# wget http://downloads.asterisk.org/pub/telephony/asterisk/asterisk-11-current.tar.gz
3 Extraction of Downloaded Files
Extract the downloaded tarballs to their corresponding directories using:
# tar -zxvf dahdi-linux-complete*
# tar -zxvf libpri*
# tar -zxvf asterisk*
4 DAHDI Installation
DAHDI (Digium Asterisk Hardware Device Interface) can be installed using the command line:
# cd /usr/src/dahdi-linux-complete*
# make && make install && make config
5 LibPRI Installation
In order to enable your BRI, PRI and QSIG based hardware, you will be needing PRI Library or LibPRI. You can install these libraries using:
# cd /usr/src/libpri*
# make && make install
6 Changing Asterisk Directory
Now you have to move back to the Asterisk Installation Directory:
# cd /usr/src/asterisk*
7 Running Configure Script for Asterisk
At this point, you need to know your CentOS 6 architecture (32 or 64 bit). In many cases you are aware of it. In case you are not, try this command:
# uname -a
For 32 bit, you will get a response like:
2.6.18-238.12.1.el5 #1 SMP Tue May 31 13:23:01 EDT 2011 i686 i686 i386 GNU/Linux
For 64 bit, the system will respond with something like:
2.6.18-238.19.1.el5 #1 SMP Fri Jul 15 07:31:24 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
Based on your OS architecture, go ahead with these commands for the Asterisk configure script. During the process the Asterisk build menu may be displayed; simply press the Esc key once to exit and continue the install.
For 32 Bit:
# ./configure && make menuselect && make && make install
For 64 Bit:
# ./configure --libdir=/usr/lib64 && make menuselect && make && make install
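If you script these installs, the 32/64-bit fork can be picked automatically from uname. A sketch; configure_flags is my name, not part of the Asterisk build system:

```shell
# configure_flags: emit --libdir=/usr/lib64 on a 64-bit system and nothing
# on 32-bit, so one command line works for both architectures.
configure_flags() {
  case "$(uname -m)" in
    x86_64) echo "--libdir=/usr/lib64" ;;
    *)      echo "" ;;
  esac
}

# ./configure $(configure_flags) && make menuselect && make && make install
```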
8 Installing Sample Files
Install Sample Files using:
# make samples
Once done, add the Asterisk Install Script in directory /etc/init.d/ using:
# make config
9 Starting DAHDI
To start DAHDI Device Drivers, use:
# service dahdi start
Loading DAHDI hardware modules:
wct4xxp: [ OK ]
wcte43x: [ OK ]
wcte12xp: [ OK ]
wcte13xp: [ OK ]
wct1xxp: [ OK ]
wcte11xp: [ OK ]
wctdm24xxp: [ OK ]
wcaxx: [ OK ]
wcfxo: [ OK ]
wctdm: [ OK ]
wcb4xxp: [ OK ]
wctc4xxp: [ OK ]
xpp_usb: [ OK ]
Running dahdi_cfg: [ OK ]
10 Start Asterisk
Finally, start Asterisk:
# service asterisk start
11. Reboot
# reboot
CENTOS 6.8 and Asterisk 11.x Install Part 3 setting up DAHDI
I am using the Digium Wildcard TE110P T1/E1 card into an Adtran 908E. The Adtran has a SIP trunk to a CLEC. The T1/PRI card in the Asterisk server is used to deliver 23 phone channels (1 channel out of the 24 is used for clocking) to/from the Adtran/Asterisk.
1 Generate system.conf file by running:
# /usr/sbin/dahdi_genconf
My /etc/dahdi/system.conf has the following:
# Autogenerated by /usr/sbin/dahdi_genconf on Sun Dec 28 23:00:55 2014
# If you edit this file and execute /usr/sbin/dahdi_genconf again,
# your manual changes will be LOST.
# Dahdi Configuration File
#
# This file is parsed by the Dahdi Configurator, dahdi_cfg
#
# Span 1: WCT1/0 "Digium Wildcard TE110P T1/E1 Card 0" (MASTER) ESF/B8ZS
span=1,1,0,esf,b8zs
# termtype: te
bchan=1-23
dchan=24
echocanceller=mg2,1-23
# Global data
loadzone = us
defaultzone = us
As the file itself says, it was “Autogenerated by /usr/sbin/dahdi_genconf”, and “If you edit this file and execute /usr/sbin/dahdi_genconf again, your manual changes will be LOST.”
Now make sure your /etc/asterisk/chan_dahdi.conf has the following; add it to the end of the file (this will create the channels for you):
context=from-test
switchtype=national
signalling=pri_cpe
group=1
channel => 1-23
2 Now stop and start DAHDI and Asterisk
# service asterisk stop
Stopping safe_asterisk: [ OK ]
Shutting down asterisk: [ OK ]
# service dahdi stop
Unloading DAHDI hardware modules: done
# service dahdi start
Loading DAHDI hardware modules:
wct4xxp: [ OK ]
wcte43x: [ OK ]
wcte12xp: [ OK ]
wcte13xp: [ OK ]
wct1xxp: [ OK ]
wcte11xp: [ OK ]
wctdm24xxp: [ OK ]
wcaxx: [ OK ]
wcfxo: [ OK ]
wctdm: [ OK ]
wcb4xxp: [ OK ]
wctc4xxp: [ OK ]
xpp_usb: [ OK ]
D: auto '/sys/bus/dahdi_devices/devices/pci:0000:04:00.0'
auto-assign /sys/bus/dahdi_devices/devices/pci:0000:04:00.0
Running dahdi_cfg: [ OK ]
# service asterisk start
Starting asterisk:
3 Now let's verify
# lsdahdi
### Span 1: WCT1/0 "Digium Wildcard TE110P T1/E1 Card 0" (MASTER) ESF/B8ZS
1 PRI Clear (EC: MG2 - INACTIVE)
2 PRI Clear (EC: MG2 - INACTIVE)
3 PRI Clear (EC: MG2 - INACTIVE)
4 PRI Clear (EC: MG2 - INACTIVE)
5 PRI Clear (EC: MG2 - INACTIVE)
6 PRI Clear (EC: MG2 - INACTIVE)
7 PRI Clear (EC: MG2 - INACTIVE)
8 PRI Clear (EC: MG2 - INACTIVE)
9 PRI Clear (EC: MG2 - INACTIVE)
10 PRI Clear (EC: MG2 - INACTIVE)
11 PRI Clear (EC: MG2 - INACTIVE)
12 PRI Clear (EC: MG2 - INACTIVE)
13 PRI Clear (EC: MG2 - INACTIVE)
14 PRI Clear (EC: MG2 - INACTIVE)
15 PRI Clear (EC: MG2 - INACTIVE)
16 PRI Clear (EC: MG2 - INACTIVE)
17 PRI Clear (EC: MG2 - INACTIVE)
18 PRI Clear (EC: MG2 - INACTIVE)
19 PRI Clear (EC: MG2 - INACTIVE)
20 PRI Clear (EC: MG2 - INACTIVE)
21 PRI Clear (EC: MG2 - INACTIVE)
22 PRI Clear (EC: MG2 - INACTIVE)
23 PRI Clear (EC: MG2 - INACTIVE)
24 PRI HDLCFCS
# dahdi_scan
[1]
active=yes
alarms=OK
description=Digium Wildcard TE110P T1/E1 Card 0
name=WCT1/0
manufacturer=Digium
devicetype=Digium Wildcard TE110P T1/E1
location=PCI Bus 04 Slot 01
basechan=1
totchans=24
irq=0
type=digital-T1
syncsrc=0
lbo=0 db (CSU)/0-133 feet (DSX-1)
coding_opts=B8ZS,AMI
framing_opts=ESF,D4
coding=B8ZS
framing=ESF
Now from the Asterisk console do:
# asterisk -r
localhost*CLI> dahdi show status
Description Alarms IRQ bpviol CRC Fra Codi Options LBO
Digium Wildcard TE110P T1/E1 Card 0 OK 8 0 0 ESF B8ZS 0 db (CSU)/0-133 feet (DSX-1)
localhost*CLI>
localhost*CLI> dahdi show channels
Chan Extension Context Language MOH Interpret Blocked State Description
pseudo default default In Service
1 from-test default In Service
2 from-test default In Service
3 from-test default In Service
4 from-test default In Service
5 from-test default In Service
6 from-test default In Service
7 from-test default In Service
8 from-test default In Service
9 from-test default In Service
10 from-test default In Service
11 from-test default In Service
12 from-test default In Service
13 from-test default In Service
14 from-test default In Service
15 from-test default In Service
16 from-test default In Service
17 from-test default In Service
18 from-test default In Service
19 from-test default In Service
20 from-test default In Service
21 from-test default In Service
22 from-test default In Service
23 from-test default In Service
Edit your /etc/asterisk/sip.conf and /etc/asterisk/extensions.conf files and get your phones registered.
CENTOS 6.8 and Asterisk 11.x Install Part 4 Setting up Phones
Edit /etc/asterisk/sip.conf and add the example below to the bottom of the file. Substitute the phone number you will be using.
[6026356915]
type=friend
callerid="Asterisk 100" 6026356915
secret=password
context=internal
host=dynamic
allow=all
dtmfmode=rfc2833
Edit /etc/asterisk/extensions.conf and add the following to end of file
[internal]
;used to pass special characters to DAHDI (T1/PRI)
exten => _[*#0-9]!,1,Dial(DAHDI/g1/${EXTEN})
exten => _[*#0-9]!,n,Hangup
;used to pass numbers dialed to DAHDI (T1/PRI)
exten => _X.,1,Dial(DAHDI/g1/${EXTEN})
exten => _X.,n,Hangup
;used to pass extension dialed, 100, to registered phone of 6026356915
exten => 100,1,Dial(SIP/6026356915,20)
exten => 100,n,Playback(vm-goodbye)
exten => 100,n,Hangup
;this is what we used in chan_dahdi.conf
[from-test]
exten => 6026356915,1,Dial(SIP/6026356915,20)
exten => 6026356915,n,Playback(vm-goodbye)
exten => 6026356915,n,Hangup
Go to the asterisk console:
# asterisk -r
Issue command:
localhost*CLI> core reload
This will reload asterisk modules and reflect your changes.
Setup phone and register. Verify by:
localhost*CLI> sip show peers
Watch messages with
localhost*CLI> core set verbose 99
END