ZFS Cheatsheet
This is a quick and dirty cheatsheet on Sun's ZFS.

Directories and Files

error messages - logged to /var/adm/messages and the system console
States

DEGRADED - One or more top-level devices is in the degraded state because some of its components have gone offline. Sufficient replicas exist to keep functioning.

FAULTED - One or more top-level devices is in the faulted state because some of its components have gone offline. Insufficient replicas exist to keep functioning.

OFFLINE - The device was explicitly taken offline by the "zpool offline" command.

ONLINE - The device is online and functioning.

REMOVED - The device was physically removed while the system was running.

UNAVAIL - The device could not be opened.
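To see which state a pool and its devices are in, use zpool status (covered in the Storage Pools section below); as a quick check, -x reports only pools that are not healthy:

zpool status
zpool status -x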
Scrubbing and Resilvering

Scrubbing - Examines all data to discover hardware faults or disk failures. Only one scrub may be running at a time, and you can start a scrub manually.

Resilvering - The same concept as rebuilding or resyncing data onto a new disk in an array. The smart thing resilvering does is that it does not rebuild the whole disk, only the data that is required (the data blocks, not the free blocks), thus reducing the time to resync a disk. Resilvering is automatic when you replace disks, etc. If a scrub is already running it is suspended until the resilvering has finished, and then the scrubbing will continue.
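As a minimal illustration (using the example pool name data01 and placeholder disk names used throughout this page), a disk replacement triggers a resilver automatically, and zpool status shows resilver or scrub progress:

zpool replace data01 c1t0d0 c2t0d0
zpool status data01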
ZFS Devices

Disk - A physical disk drive.

File - The absolute path of a pre-allocated file/image.

Mirror - Standard RAID-1 mirror.
Raidz1/2/3 - Non-standard distributed parity-based software RAID levels. One common problem called the "write hole" is eliminated because in raidz the data and the stripe are written simultaneously; basically, if a power failure occurs in the middle of a write you either have the data plus the parity or you don't. ZFS also supports self-healing: if it cannot read a bad block it will reconstruct it using the parity, and repair or indicate that the block should not be used.
You should keep the raidz array at a low power of two plus parity:
raidz1 - 3, 5 or 9 disks
raidz2 - 4, 6, 10 or 18 disks
raidz3 - 5, 7, 11 or 19 disks
The more parity bits, the longer it takes to resilver an array; standard mirroring does not have the problem of creating the parity, so it is quicker at resilvering.
raidz is more like RAID-3 than RAID-5 but does use parity to protect against disk failures.
raidz/raidz1 - minimum of 3 devices (one parity disk), you can suffer a one-disk loss
raidz2 - minimum of 4 devices (two parity disks), you can suffer a two-disk loss
raidz3 - minimum of 5 devices (three parity disks), you can suffer a three-disk loss
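As a concrete instance of the sizing advice above (disk names are placeholders), a raidz2 set built from six disks gives four data disks (a power of two) plus two parity disks:

zpool create data01 raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0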
spare - Hard drives marked as "hot spare" for a ZFS raid; by default hot spares are not used on a disk failure, you must turn on the "autoreplace" feature.
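For example (the device name is illustrative), a hot spare can be added to an existing pool and autoreplace enabled so the spare is pulled in automatically when a disk faults:

zpool add data01 spare c3t0d0
zpool set autoreplace=on data01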
cache - Traditional caching mechanisms (Linux, for example) use what is known as a least recently used (LRU) algorithm; basically first in first out (FIFO), blocks are moved in and out of the cache. ZFS caching is different: it caches both recently used and frequently used block requests, and the cache device provides the level 2 adaptive replacement cache (L2ARC).
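A cache device can be added to an existing pool as well as at creation time; for example (the SSD device name is just an illustration):

zpool add data01 cache c3t0d0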
log - There are two terminologies here: the ZFS intent log (ZIL), which records synchronous writes so they can be replayed after a crash, and the separate log device (often called a slog), a dedicated device that holds the intent log to speed up synchronous writes.
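A log device can likewise be added to an existing pool; mirroring it is a common choice since it holds writes that have not yet been flushed to the main disks (device names are illustrative):

zpool add data01 log mirror c3t0d0 c4t0d0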
Storage Pools

displaying
zpool list
zpool list -o name,size,altroot
# zdb can view the inner workings of ZFS (zdb has a number of options)
zdb <option> <pool>
Note: there are a number of properties that you can select; the default is: name, size, used, available, capacity, health, altroot
status
zpool status
## show only errored pools with more verbosity
zpool status -xv
statistics
zpool iostat -v 5 5
Note: use this command like you would iostat
history
zpool history -il
Note: once a pool has been removed the history is gone
creating
## you cannot shrink a pool, only grow it
## perform a dry run but don't actually perform the creation (notice the -n)
zpool create -n data01 c1t0d0s0
## you can presume that I created two files called /zfs1/disk01 and /zfs1/disk02 using mkfile
zpool create data01 /zfs1/disk01 /zfs1/disk02
## using a standard disk slice
zpool create data01 c1t0d0s0
## using a different mountpoint than the default /<pool name>
zpool create -m /zfspool data01 c1t0d0s0
## mirror and hot spare disk examples; hot spares are not used by default, turn on the "autoreplace" feature for each pool
zpool create data01 mirror c1t0d0 c2t0d0 mirror c1t0d1 c2t0d1
zpool create data01 mirror c1t0d0 c2t0d0 spare c3t0d0
## setting up a log device and mirroring it
zpool create data01 mirror c1t0d0 c2t0d0 log mirror c3t0d0 c4t0d0
## setting up a cache device
zpool create data01 mirror c1t0d0 c2t0d0 cache c3t0d0 c3t1d0
## you can also create raid pools (raidz/raidz1 - single parity, raidz2 - double parity, raidz3 - triple parity)
zpool create data01 raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0
destroying
zpool destroy data01
## in the event of a disaster you can re-import a destroyed pool
zpool import -f -D -d /zfs1 data03
adding
zpool add data01 c2t0d0
Note: make sure that you get this right, as zpool only supports the removal of hot spares and cache disks; for mirrors see attach and detach below
Resizing
## when replacing a disk with a larger one you must enable the "autoexpand" feature to allow you to use the extended space; you must do this before replacing the first disk
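A minimal sketch of the sequence (assuming the example pool data01 and placeholder disk names, where c2t0d0 is the larger replacement disk):

zpool set autoexpand=on data01
zpool replace data01 c1t0d0 c2t0d0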
removing
zpool remove data01 c2t0d0
Note: zpool only supports the removal of hot spares and cache disks; for mirrors see attach and detach below
clearing faults
zpool clear data01
## clearing a specific disk fault
zpool clear data01 c2t0d0
attaching (mirror)
## c2t0d0 is an existing disk that is not mirrored; by attaching c3t0d0 both disks will become a mirror pair
zpool attach data01 c2t0d0 c3t0d0
detaching (mirror)
zpool detach data01 c2t0d0
Note: see the notes on attaching above
onlining
zpool online data01 c2t0d0
offlining
zpool offline data01 c2t0d0
## temporary offlining (will revert back after a reboot)
zpool offline -t data01 c2t0d0
Replacing
## replacing like for like
zpool replace data03 c2t0d0
## replacing with another disk
zpool replace data03 c2t0d0 c3t0d0
scrubbing
zpool scrub data01
## stop a scrub in progress; check the scrub line using "zpool status data01" to see any errors
zpool scrub -s data01
Note: see the top of the table for more information about resilvering and scrubbing
exporting
zpool export data01
## you can list exported pools using the import command
zpool import
importing
## when using standard disk devices i.e. c2t0d0
zpool import data01
## if using files in say the /zfs filesystem
zpool import -d /zfs
## importing a destroyed pool
zpool import -f -D -d /zfs1 data03
getting parameters
zpool get all data01
Note: the source column denotes if the value has been changed from its default value; a dash in this column means it is a read-only value
setting parameters
zpool set autoreplace=on data01
Note: use the command "zpool get all <pool>" to obtain a list of current settings
upgrade
|
## List upgrade paths
zpool upgrade -v ## upgrade all pools zpool upgrade -a ## upgrade specific pool, use "zpool get all <pool>" to obtain version number of a pool zpool upgrade data01 ## upgrade to a specific version zpool upgrade -V 10 data01 |
Filesystem

displaying
zfs list
## list different types
zfs list -t filesystem
zfs list -t snapshot
zfs list -t volume
zfs list -t all -r <zpool>
## recursive display
zfs list -r data01/oracle
## complex listing
zfs list -o name,mounted,sharenfs,mountpoint
Note: there are a number of attributes that you can use in a complex listing, so use the man page to see them all
creating
## presuming I have a pool called data01, create a data01/apache filesystem
zfs create data01/apache
## using a different mountpoint
zfs create -o mountpoint=/oracle data01/oracle
## create a volume - the device can be accessed via /dev/zvol/[rdsk|dsk]/data01/swap
zfs create -V 50mb data01/swap
swap -a /dev/zvol/dsk/data01/swap
Note: don't use a zfs volume as a dump device, it is not supported
destroying
zfs destroy data01/oracle
## using the recursive options: -r = all children, -R = all dependants
zfs destroy -r data01/oracle
zfs destroy -R data01/oracle
mounting
zfs mount data01
## you can create a temporary mount that expires after unmounting
zfs mount -o mountpoint=/tmpmnt data01/oracle
Note: all the normal mount options can be applied, i.e. ro/rw, setuid
unmounting
zfs umount data01
share
zfs share data01
## persist over reboots
zfs set sharenfs=on data01
## restrict to specific hosts
zfs set sharenfs="rw=@10.85.87.0/24" data01/apache
unshare
zfs unshare data01
## persist over reboots
zfs set sharenfs=off data01
snapshotting
## snapshotting is like taking a picture: delta changes are recorded to the snapshot when the original filesystem changes
## to remove a dataset all previous snapshots have to be removed; you can also rename snapshots
## you cannot destroy a snapshot if it has a clone
## creating a snapshot
zfs snapshot data01@10022010
## renaming a snapshot
zfs rename data01@10022010 data01@keep_this
## destroying a snapshot
zfs destroy data01@10022010
rollback
## by default you can only roll back to the latest snapshot; to roll back to an older one you must delete all newer snapshots
zfs rollback data01@10022010
cloning/promoting
## clones are writable filesystems created from a snapshot; a dependency remains on the snapshot as long as the clone exists
## a clone uses the data from the snapshot to exist; as you change the clone it uses space separate from the snapshot
## clones cannot be created across zpools, you need to use send/receive - see the topics below
## cloning
zfs clone data01@10022010 data01/clone
zfs clone -o mountpoint=/clone data01@10022010 data01/clone
## promoting a clone; this allows you to destroy the original filesystem that the clone is attached to
zfs promote data01/clone
Note: the clone must reside in the same pool
renaming
## the dataset must be kept within the same pool
zfs rename data03/ora_disk01 data03/ora_d01
Note: you have two options:
-p creates all the non-existing parent datasets
-r recursively renames the snapshots of all descendant datasets (used with snapshots only)
Compression
## you enable compression by setting a property; valid values are on, off, lzjb, gzip, gzip-[1-9] and zle
## note that compression only starts once you turn it on; existing data will not be compressed
zfs set compression=lzjb data03/apache
## you can get the compression ratio
zfs get compressratio data03/apache
Deduplication
## you can save disk space using deduplication, which can work at the file, block or byte level
## with file-level dedup each file is hashed with a cryptographic hashing algorithm such as SHA-256; if a file matches we just point to the existing file rather than storing a new one
## this is ideal for small files, but for large files a single character change means all the data has to be copied
## block deduplication allows you to share all the identical blocks of a file minus the blocks that are different; the unique blocks are shared on disk and the references to shared blocks are kept in RAM
## it may need a lot of RAM to keep track of which blocks are shared and which are not, but it is still the preferred option over file or byte deduplication
## shared blocks are stored in what is called a "deduplication table"; the more deduplicated blocks the larger the table, and the table is read every time a block is changed, so it should be held in fast RAM - if you run out of RAM the table spills over onto disk
## so how much RAM do you need? use the zdb command to check: take the "bp count" - each deduplicated block in the pool takes about 320 bytes of RAM
## in my case 288674 blocks means I would need about 92MB; a 200GB pool, for example, would need about 670MB for the table - a good rule is to allow 5GB of RAM for every 1TB of disk
## to see the blocks the dataset consumes
zdb -b data01
## to turn on deduplication
zfs set dedup=on data01/text_files
## to see the deduplication ratio
zfs get dedupratio data01/text_files
## to see a histogram of how many blocks are referenced how many times
zdb -DD <pool>
getting parameters
## list all the properties
zfs get all data03/oracle
## get a specific property
zfs get setuid data03/oracle
## get a specific property for all datasets
zfs get compression
Note: the source column denotes if the value has been changed from its default value; a dash in this column means it is a read-only value
setting parameters
## set and unset a quota
zfs set quota=50M data03/oracle
zfs set quota=none data03/oracle
Note: use the command "zfs get all <dataset>" to obtain a list of current settings
inherit
## set a property back to its default value
zfs inherit compression data03/oracle
upgrade
## list the upgrade paths
zfs upgrade -v
## list all the datasets that are not at the current level
zfs upgrade
## upgrade a specific dataset
zfs upgrade -V <version> data03/oracle
send/receive
## here is a complete example of a send and receive with an incremental update
## create some test files
mkfile -v 100m /zfs/master
mkfile -v 100m /zfs/slave
## create mountpoints
mkdir /master
mkdir /slave
## create the pools
zpool create master /zfs/master
zpool create slave /zfs/slave
## create the data filesystem
zfs create master/data
## create a test file
echo "created: 09:58" > /master/data/test.txt
## create a snapshot and send it to the slave; you could use SSH or tape to transfer to another server (see below)
zfs snapshot master/data@1
zfs send master/data@1 | zfs receive slave/data
## set the slave to read-only because otherwise you can cause data corruption; make sure you do this before accessing anything in the slave/data directory
zfs set readonly=on slave/data
## update the original test.txt file
echo "`date`" >> /master/data/test.txt
## create a second snapshot and send the differences; you may get an error message saying that the destination has been modified - this is because you did not set slave/data to read-only (see above)
zfs snapshot master/data@2
zfs send -i master/data@1 master/data@2 | zfs receive slave/data

## using SSH
zfs send master/data@1 | ssh backup_server zfs receive backups/data@1
## using a tape drive; you can also use cpio
zfs send master/data@1 > /dev/rmt/0
zfs receive slave/data2@1 < /dev/rmt/0
zfs rename slave/data slave/data.old
zfs rename slave/data2 slave/data
## you can also save incremental data
zfs send master/data@12022010 > /dev/rmt/0
zfs send -i master/data@12022010 master/data@13022010 > /dev/rmt/0
## using gzip to compress the snapshot
zfs send master/fs@snap | gzip > /dev/rmt/0
allow/unallow
## display the permission sets and any user permissions
zfs allow master
## create a permission set
zfs allow -s @permset1 create,mount,snapshot,clone,promote master
## delete a permission set
zfs unallow -s @permset1 master
## grant a user permissions
zfs allow vallep @permset1 master
## revoke a user's permissions
zfs unallow vallep @permset1 master
Note: there are many permissions that you can set, so see the man page or just use the "zfs allow" command
Quota/Reservation
## not strictly a command, but worth discussing here: you can apply a quota to a dataset, and you can reduce this quota only if it has not already been exceeded
## if you exceed the quota you will get an error message; you also have reservations, which guarantee that a specified amount of disk space is available to the filesystem
## both are applied to datasets and their descendants (snapshots, clones)
## newer versions of Solaris allow you to set group and user quotas
## you can also use refquota and refreservation to manage space without accounting for the disk space consumed by descendants such as snapshots and clones; generally you would set quota and reservation higher than refquota and refreservation (see the sketch below)
## set a quota
zfs set quota=100M data01/apache
## get a quota
zfs get quota data01/apache
## set up a user quota (use groupquota for groups)
zfs set userquota@vallep=100M data01/apache
## remove a user quota (use groupquota for groups)
zfs set userquota@vallep=none data01/apache
## list user quotas (use groupspace for groups); you can also list users with quotas, for example the root user
zfs userspace data01/apache
zfs get userused@vallep data01/apache
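A short sketch of the refquota and refreservation properties mentioned above (dataset names and sizes follow the examples on this page):

## limit the space used by the dataset itself, excluding snapshots and clones
zfs set refquota=100M data01/apache
## guarantee space for the dataset itself, excluding descendants
zfs set refreservation=100M data01/apache
## a plain reservation, for comparison, also accounts for descendants
zfs set reservation=200M data01/apache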
ZFS tasks

Replace failed disk
# list the zpools and identify the failed disk
zpool list
# replace the disk (can use the same disk or a new disk)
zpool replace data01 c1t0d0
zpool replace data01 c1t0d0 c1t1d0
# clear any existing errors
zpool clear data01
# scrub the pool to check for any more errors (this depends on the size of the zpool as it can take a long time to complete)
zpool scrub data01
# you can now remove the failed disk in the normal way depending on your hardware
Expand a pool's capacity
# you cannot remove a disk from a pool, but you can replace it with a larger disk
zpool replace data01 c1t0d0 c2t0d0
zpool set autoexpand=on data01
Install the boot block
# the command depends on whether you are using a SPARC or an x86 system
sparc - installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t1d0
x86 - installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0
Lost root password
# you have two options to recover the root password
## option one
ok> boot -F failsafe
# when requested, follow the instructions to mount the rpool on /a
cd /a/etc
vi passwd|shadow
init 6
## option two
ok> boot cdrom|net -s   (you can boot from the network or a cdrom)
zpool import -R /a rpool
zfs mount rpool/ROOT/zfsBE
cd /a/etc
vi passwd|shadow
init 6
Primary mirror disk in the root pool is unavailable or fails
# boot from the secondary mirror
ok> boot disk1
## offline and unconfigure the failed disk; there may be different options for unconfiguring a disk depending on the hardware
zpool offline rpool c0t0d0s0
cfgadm -c unconfigure c1::dsk/c0t0d0
# now you can physically replace the disk, reconfigure it and bring it online
cfgadm -c configure c1::dsk/c0t0d0
zpool online rpool c0t0d0
# let the pool know you have replaced the disk
zpool replace rpool c0t0d0s0
# if the replace above fails then detach and reattach the primary mirror
zpool detach rpool c0t0d0s0
zpool attach rpool c0t1d0s0 c0t0d0s0
# make checks
zpool status rpool
# don't forget to add the boot block (see above)
Resize swap area (and dump areas)
# you can resize the swap if it is not being used; first record the size and whether it is being used
swap -l
# resize the swap area, first by removing it
swap -d /dev/zvol/dsk/rpool/swap
zfs set volsize=2G rpool/swap
# now activate the swap and check the size; if the -a option does not work then use the "swapadd" command
swap -a /dev/zvol/dsk/rpool/swap
swap -l
Note: if you cannot delete the original swap area because it is too busy then simply add another swap area; the same procedure is used for dump areas but using the "dumpadm" command