#37 - ZFS on Linux (Part 2 of 2)

Links, Code, and Transcript


In this episode, let's continue our look at ZFS on Linux. In part one, we covered the basic ins and outs of ZFS, so in this episode, let's focus on some of the more advanced filesystem features offered by ZFS.

If you have not already watched ZFS on Linux part #1, episode #35, I highly suggest you go do that now, as it plays well into this episode. Before we dive into compression, snapshots, quotas, and data deduplication using ZFS, I thought it would be a good idea to cover a couple of best practices, just in case these episodes inspire you to try ZFS for yourself.

The first suggestion is that if you are going to be using ZFS in production, it is highly recommended that you use ECC memory. ZFS is great at repairing failures and data corruption via checksums, but if a bad memory module is silently introducing corruption via RAM, and the bad data gets saved to disk, you could be in a very bad situation. If you want to read more about this, you can check out the FreeNAS forums for a nice discussion; there is also a good Google Groups thread, and finally the ZFS Administration Guide on why you should use ECC memory. This is a large topic, and it is not limited to ZFS, so if you are interested in reading more about it, these links can be found in the episode notes below.

Let me just refresh your memory about our test setup. You might remember that, back in episode #35, I created a virtual environment with 10 disks, those being /dev/sdb through sdk, each roughly 100 MB in size. We then used these virtual devices to play around with ZFS pools. You should also remember that we can take a look at existing ZFS pools by running zpool status.

# ls -l /dev/sd[bcdefghijk]
brw-rw---- 1 root disk 8,  16 Sep  9 03:50 /dev/sdb
brw-rw---- 1 root disk 8,  32 Sep  9 03:50 /dev/sdc
brw-rw---- 1 root disk 8,  48 Sep  9 03:50 /dev/sdd
brw-rw---- 1 root disk 8,  64 Sep  9 03:50 /dev/sde
brw-rw---- 1 root disk 8,  80 Sep  9 03:50 /dev/sdf
brw-rw---- 1 root disk 8,  96 Sep  9 03:50 /dev/sdg
brw-rw---- 1 root disk 8, 112 Sep  9 04:27 /dev/sdh
brw-rw---- 1 root disk 8, 128 Sep  9 04:27 /dev/sdi
brw-rw---- 1 root disk 8, 144 Sep  9 04:28 /dev/sdj
brw-rw---- 1 root disk 8, 160 Sep  9 04:28 /dev/sdk
# zpool status
  pool: e37pool
 state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    e37pool     ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        sdb     ONLINE       0     0     0
        sdc     ONLINE       0     0     0
      mirror-1  ONLINE       0     0     0
        sdd     ONLINE       0     0     0
        sde     ONLINE       0     0     0
      mirror-2  ONLINE       0     0     0
        sdf     ONLINE       0     0     0
        sdg     ONLINE       0     0     0

errors: No known data errors
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                      8.4G  1.2G  6.9G  15% /
tmpfs                 230M     0  230M   0% /dev/shm
/dev/sda1             485M   54M  406M  12% /boot
e37pool               225M     0  225M   0% /e37pool

I just wanted to quickly show you how to grow ZFS pools, because I think it plays well into a second recommended best practice. So, here we have our e37pool, and it is made up of several disk mirrors. The e37pool is also mounted, and it is about 225 MB in size. Now, let's say that you have reached the capacity of this pool and you want to add some additional space; how would you go about that? Well, since ZFS is a combined volume manager and filesystem, it is actually extremely easy to add additional capacity to a ZFS filesystem. It just so happens that we have four spare virtual disks on our system, so let's work through adding those to this pool. Let's type, zpool add e37pool mirror sdh sdi mirror sdj sdk. This will create two mirrors, one of sdh and sdi, and a second mirror of sdj and sdk, then it will add these to the e37pool.

# zpool add e37pool mirror sdh sdi mirror sdj sdk

We can verify this worked by running zpool status again, and as you can see we have our two new mirrors down here, but we can also verify the filesystem was resized by running df again. So, before we had 225 MB of space, and now we have 396 MB. Personally, I think this is pretty cool and a major advancement over existing filesystems, in that if you wanted to do something like this with LVM, you would have to run a whole series of commands to add devices to the RAID array, grow the volume group, and finally grow the filesystem (there is a rough sketch of this after the output below). You might even have to take the volume off-line for some of these steps. With ZFS, it is one command, and it was all done on-line. I should also mention that you can only expand a zpool, you cannot shrink it.

# zpool status
  pool: e37pool
 state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    e37pool     ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        sdb     ONLINE       0     0     0
        sdc     ONLINE       0     0     0
      mirror-1  ONLINE       0     0     0
        sdd     ONLINE       0     0     0
        sde     ONLINE       0     0     0
      mirror-2  ONLINE       0     0     0
        sdf     ONLINE       0     0     0
        sdg     ONLINE       0     0     0
      mirror-3  ONLINE       0     0     0
        sdh     ONLINE       0     0     0
        sdi     ONLINE       0     0     0
      mirror-4  ONLINE       0     0     0
        sdj     ONLINE       0     0     0
        sdk     ONLINE       0     0     0

errors: No known data errors
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                      8.4G  1.2G  6.9G  15% /
tmpfs                 230M     0  230M   0% /dev/shm
/dev/sda1             485M   54M  406M  12% /boot
e37pool               396M     0  396M   0% /e37pool
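
For comparison, here is roughly what the same grow operation could look like with LVM sitting on top of mdadm RAID. This is just a sketch; the device names, the vg_data and lv_data volume names, and the ext4 filesystem are all assumptions, and the exact steps vary by setup.

# mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdh /dev/sdi   # build a new mirror
# pvcreate /dev/md1                                                      # initialize it as an LVM physical volume
# vgextend vg_data /dev/md1                                              # add it to the volume group
# lvextend -l +100%FREE /dev/vg_data/lv_data                             # grow the logical volume
# resize2fs /dev/vg_data/lv_data                                         # finally, grow the ext4 filesystem

That is five commands versus one zpool add, which is really the point I am trying to make here.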

Okay, now that we have refreshed our memories about ZFS pools and what they look like, let's chat about the second recommended best practice. This has to do with how I created the pool in episode #35 using Linux device names, devices like /dev/sdb, sdc, sdd, etc. Let's just start fresh by running zpool status again. I am talking about these devices down here. If this were a real production system, you would likely not want to use device names, because it creates a management nightmare if you have lots of devices. There is a great little blurb about this on the ZFS on Linux FAQ page. You actually have many choices for how you reference devices in a pool. For example, in episode #35, we used the Linux device names, but you can also use drive identifiers, things like the serial number, etc. Or you can choose to use physical layout information, things like PCI slot and port numbers. Finally, you can also create your own label types, which might describe the physical locations.

So, why would you want to do something like this? Well, it basically boils down to ease of management when things go wrong. Say for example that you think ZFS is great, and you purchase some hardware that supports lots of disks, let's say a 48-bay chassis. You have it all up and running, using ECC memory, and things are going great, until you have a failed disk. If you used the device names provided by Linux, like I did in these examples, it is likely going to be a nightmare trying to find the failed disk, as it is not clear how these device names map to physical locations. Once you have figured the mapping out, you will likely want to swap out the failed disk and rebuild the ZFS storage pool on-line. The issue is that there is an extremely high chance that, when you reboot the machine down the road, your drive letters will have shifted, since when the disk failed and you replaced it with a new one, the replacement was likely given a new device name! So, if you created the pool using device names, it is now out of whack, because device names likely moved around when the dead disk was removed from the system. This is fixable, just by asking ZFS to import the pool by rereading the headers off the disks, but it requires manual intervention. So, I would suggest creating the pool using something like drive identifiers; then no matter what the device letter is, your pool should always come back on-line without manual intervention. A handy trick would be to create the pool using disk by-id names, then add a physical label to each disk carrier which indicates unique information about that disk, like the serial number. So, when a failure happens, it is easy to match up the zpool status failed disk information with the physical label on each disk.
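
For what it is worth, here is a sketch of what that could look like: creating a pool against /dev/disk/by-id paths from the start, and then swapping a failed disk on-line with zpool replace. The pool name and the disk IDs here are made up for illustration.

# zpool create tank mirror /dev/disk/by-id/ata-DISK_SERIAL_A /dev/disk/by-id/ata-DISK_SERIAL_B     # hypothetical IDs
# zpool replace tank /dev/disk/by-id/ata-DISK_SERIAL_A /dev/disk/by-id/ata-DISK_SERIAL_NEW         # resilver onto the replacement, on-line
# zpool status tank                                                                                # watch the resilver progress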

If you happened to create a ZFS pool using explicit /dev/sdb style drive letters, do not worry, you can convert to using disk by-id names after the fact. Let's export our ZFS pool, which currently uses /dev/sd device names, then import it using the new disk by-id names. After this is done, even if the underlying device names get swapped around, our pool will always come back.

# zpool status
  pool: e37pool
 state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    e37pool     ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        sdb     ONLINE       0     0     0
        sdc     ONLINE       0     0     0
      mirror-1  ONLINE       0     0     0
        sdd     ONLINE       0     0     0
        sde     ONLINE       0     0     0
      mirror-2  ONLINE       0     0     0
        sdf     ONLINE       0     0     0
        sdg     ONLINE       0     0     0
      mirror-3  ONLINE       0     0     0
        sdh     ONLINE       0     0     0
        sdi     ONLINE       0     0     0
      mirror-4  ONLINE       0     0     0
        sdj     ONLINE       0     0     0
        sdk     ONLINE       0     0     0

errors: No known data errors

You can export the pool, which unmounts it and effectively removes it from the system, by running zpool export, then the pool name, so in our case e37pool. Let's verify that it is actually gone by running zpool status, and as you can see there are no pools available; let's also verify it is unmounted by running df. One cool thing about ZFS, and this is possible with other RAID setups too, is that you could actually move these disks to a different system and it would work, because all of the metadata about the ZFS pool is stored on the disks themselves.

# zpool export e37pool
# zpool status
no pools available
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                      8.4G  1.2G  6.9G  15% /
tmpfs                 230M     0  230M   0% /dev/shm
/dev/sda1             485M   54M  406M  12% /boot

Now let's import our pool again, but this time using the new disk by-id names, so that we can survive devices, like /dev/sdb, being renamed out from under us. We are going to import the ZFS pool by reading the metadata off each disk. So, let's type zpool import, then -d /dev/disk/by-id, which specifies the device directory we want to scan, then the pool name, in our case e37pool, and finally the -f option to force it.

# zpool import -d /dev/disk/by-id e37pool -f

Then let's verify it worked by running zpool status. As you can see, we are now using unique drive identifiers to reference each device, rather than the block device names, /dev/sdb for example. We can also verify it was mounted correctly by running df. I should mention that when we ran zpool import, it scanned the devices looking for ZFS metadata and reconstructed our pool; this could take a while if you have lots of disks.

# zpool status
  pool: e37pool
 state: ONLINE
  scan: none requested
config:

    NAME                                       STATE     READ WRITE CKSUM
    e37pool                                    ONLINE       0     0     0
      mirror-0                                 ONLINE       0     0     0
        ata-VBOX_HARDDISK_VBc142d444-59aeef39  ONLINE       0     0     0
        ata-VBOX_HARDDISK_VBabc3cdd9-ba6145ab  ONLINE       0     0     0
      mirror-1                                 ONLINE       0     0     0
        ata-VBOX_HARDDISK_VBdd3e501a-2b4e57c7  ONLINE       0     0     0
        ata-VBOX_HARDDISK_VBdb63c965-9604aaeb  ONLINE       0     0     0
      mirror-2                                 ONLINE       0     0     0
        ata-VBOX_HARDDISK_VB7f5992e7-bfa19485  ONLINE       0     0     0
        ata-VBOX_HARDDISK_VB52eabb4c-f0840b75  ONLINE       0     0     0
      mirror-3                                 ONLINE       0     0     0
        ata-VBOX_HARDDISK_VB55b43204-09b13de0  ONLINE       0     0     0
        ata-VBOX_HARDDISK_VB9983ccb7-24ace9eb  ONLINE       0     0     0
      mirror-4                                 ONLINE       0     0     0
        ata-VBOX_HARDDISK_VB16ce4ace-5fa97d8b  ONLINE       0     0     0
        ata-VBOX_HARDDISK_VB7a9e31e7-963cae3f  ONLINE       0     0     0

errors: No known data errors
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                      8.4G  1.2G  6.9G  15% /
tmpfs                 230M     0  230M   0% /dev/shm
/dev/sda1             485M   54M  406M  12% /boot
e37pool               396M     0  396M   0% /e37pool

Okay, so that covers several recommended best practices and how you grow a ZFS pool. Let's move on to the advanced filesystem features offered by ZFS on Linux, things like compression, deduplication, snapshots, and quotas.

Up until this point, we have mainly talked about the zpool command for working with ZFS on Linux, but there is actually a second command, simply called zfs. The zfs command allows you to turn various features on and off, along with getting and setting properties. The following is probably best described through diagrams. Let's say, for example, that you are working at a research lab, and your ZFS pool has 100 TB of storage. This is a lot of storage to play around with, and you will likely have many projects and people working in this storage pool, so you will likely want to shape the way they use it.
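
Just to give you a feel for the get and set pattern before we dive in, the commands generally look like the following; the dataset name here anticipates the project datasets we create further down.

# zfs get all e37pool                          # list every property and its current value
# zfs get compression e37pool/project-a        # query a single property on one dataset
# zfs set compression=lz4 e37pool/project-a    # set a property on one dataset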

You see, with ZFS you create a storage volume, and it is presented as just one large chunk of storage, as shown by this box diagram. You will sometimes hear people call it a pool of storage, or a tank of storage, these being the most common terms, but they all refer to the same thing.

Let's say we have four projects: A, B, C, and D. All of these projects can use the same pool, or tank, of storage, but these projects all have different requirements. Let's say project A is a group of scientists working with gene sequencing data. Typically, these are large text files coming off sequencers, so we want to add a policy to their area for data deduplication. Also, since these are mainly text files, we should add compression to their area, as we can likely save lots of space. Finally, since we know they have large storage requirements, we want to enforce some type of quota, just to make sure they do not use space assigned to projects B, C, or D.

It just so happens that ZFS allows you to create areas called datasets; these look just like directories on the end system, but you can assign all types of advanced features to each dataset. We will use these datasets to carve up the large pool, or tank, of storage.

Next, let's say project B is mainly user data and home directories, and you have had many requests to restore files from backup, since things occasionally get deleted. You think it would be a good idea to add hourly snapshots during office hours, so that files can be quickly restored from the snapshot folder rather than from tape backup.

Actually, I should remove this hard line and add a dotted one, and the same goes for projects C and D. The reason being that project A is the only one with a quota, meaning project A cannot consume more than a set limit of storage, whereas projects B, C, and D are basically sharing the rest of the storage pool. I created datasets for projects C and D even though we do not have any special settings for them yet; this just adds flexibility down the road, say for example if you need to add snapshots or something. Anyway, now that we have a high-level overview of what we want, let's look at setting this up. I think you will be blown away by how easy it is.

So, let's just get our bearings by running df -h and zpool list. First off, let's create our four project datasets, by running zfs create e37pool/project-a, then b, c, and finally d. You will notice a convention happening here: the e37pool is our large chunk of storage, and project-a, b, c, and d are our datasets used to divide the storage. Let's run df -h again, and you will notice that we have four new mounts, one for each of our project datasets.

# zfs create e37pool/project-a
# zfs create e37pool/project-b
# zfs create e37pool/project-c
# zfs create e37pool/project-d
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                      8.4G  1.3G  6.7G  17% /
tmpfs                 230M     0  230M   0% /dev/shm
/dev/sda1             485M   54M  406M  12% /boot
e37pool               395M     0  395M   0% /e37pool
e37pool/project-a     395M     0  395M   0% /e37pool/project-a
e37pool/project-b     395M     0  395M   0% /e37pool/project-b
e37pool/project-c     395M     0  395M   0% /e37pool/project-c
e37pool/project-d     395M     0  395M   0% /e37pool/project-d

First, let's configure compression. I thought it would be cool to show you examples of all these features, so I downloaded a large log file and we will use it as a test. Right now, I am sitting in root's home directory. Let me just list the files here; as you can see there is a NASA web server log, which I have used in previous episodes, #28 would be an example, and it is about 161 MB in size. Let me just show you the ZFS mounts again; as you can see, there is zero used space. So, let's turn on compression for project A by running zfs set compression=lz4, where lz4 is the compression algorithm, followed by the pool and dataset we want to apply it to, in our case e37pool/project-a. That is it; files going into and coming out of the project-a dataset will now be compressed and decompressed on the fly, and this is totally transparent to the end user. Let's test this out by copying our 161 MB web server log into project A's area. Let's run df again, and you will notice that only 32 MB are used. You can also verify this by running zfs list. So, it looks like compression is working, but you can get the exact value by running zfs get compressratio e37pool/project-a. As you can see, we are getting a compression ratio of 5.12x. Pretty cool! If you want to learn more about this, check out the ZFS Administration Guide pages on Compression and Deduplication.

# pwd
/root
# ls -l
total 163908
-rw-------. 1 root root      1386 Jun  6  2013 anaconda-ks.cfg
-rw-r--r--. 1 root root      8526 Jun  6  2013 install.log
-rw-r--r--. 1 root root      3314 Jun  6  2013 install.log.syslog
-rw-r--r--  1 root root 167813770 Sep 10 03:53 NASA_access_log_Aug95
# head -n 1 NASA_access_log_Aug95 
in24.inetnebr.com - - [01/Aug/1995:00:00:01 -0400] "GET /shuttle/missions/sts-68/news/sts-68-mcc-05.txt HTTP/1.0" 200 1839
# du -h NASA_access_log_Aug95
161M    NASA_access_log_Aug95
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                      8.4G  1.3G  6.7G  17% /
tmpfs                 230M     0  230M   0% /dev/shm
/dev/sda1             485M   54M  406M  12% /boot
e37pool               395M     0  395M   0% /e37pool
e37pool/project-a     395M     0  395M   0% /e37pool/project-a
e37pool/project-b     395M     0  395M   0% /e37pool/project-b
e37pool/project-c     395M     0  395M   0% /e37pool/project-c
e37pool/project-d     395M     0  395M   0% /e37pool/project-d
# zfs set compression=lz4 e37pool/project-a
# cp NASA_access_log_Aug95 /e37pool/project-a
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                      8.4G  1.3G  6.7G  17% /
tmpfs                 230M     0  230M   0% /dev/shm
/dev/sda1             485M   54M  406M  12% /boot
e37pool               364M     0  364M   0% /e37pool
e37pool/project-a     395M   32M  364M   8% /e37pool/project-a
e37pool/project-b     364M     0  364M   0% /e37pool/project-b
e37pool/project-c     364M     0  364M   0% /e37pool/project-c
e37pool/project-d     364M     0  364M   0% /e37pool/project-d
# zfs list
NAME                USED  AVAIL  REFER  MOUNTPOINT
e37pool            31.7M   364M    35K  /e37pool
e37pool/project-a  31.3M   364M  31.3M  /e37pool/project-a
e37pool/project-b    30K   364M    30K  /e37pool/project-b
e37pool/project-c    30K   364M    30K  /e37pool/project-c
e37pool/project-d    30K   364M    30K  /e37pool/project-d
# zfs get compressratio e37pool/project-a
NAME               PROPERTY       VALUE  SOURCE
e37pool/project-a  compressratio  5.12x  -

Actually, speaking of deduplication, let's chat about that. I was not actually able to get deduplication working on CentOS 6.5 using ZFS 0.6.3, and I will be honest, I have never used deduplication in production before, so it was totally new for me. I did play around with it for a couple of hours and nothing seemed to work. I will likely try a dev version and see if that does anything. Anyway, it probably does not matter, as deduplication is not recommended unless you really know what you are doing, as talked about on the Things Nobody Told You About ZFS page. There are also a couple of articles that mention that, for every TB of data you want to dedupe, it takes roughly 3.5 to 5 GB of RAM to maintain the dedupe table. So, in our example with 100 TB of storage, that would be roughly 350 to 500 GB of RAM just for the deduplication table. You will want to heavily research this before enabling it. I have included links to these three sites below.
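
For reference, deduplication is just another dataset property, so if you do decide to experiment with it, the commands would look something like the following. Treat this as a sketch only, since as mentioned I could not get it doing anything useful on this particular setup.

# zfs set dedup=on e37pool/project-a     # enable deduplication on a dataset
# zfs get dedup e37pool/project-a        # confirm the property is set
# zpool list e37pool                     # the DEDUP column reports the pool-wide dedup ratio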

To finish off our requirements for the project A dataset, let's set the quota by running zfs set quota=150M e37pool/project-a, and also configure a storage reservation by running zfs set reservation=150M e37pool/project-a. By running df again, you can see that we have limited the amount of space available to the project-a dataset, but we have also reserved that space, so that someone else in our pool cannot use it. If we jump back to the diagram for a minute, you can think of the quota and reservation as hard walls that carve the project-a dataset off from the rest of the pool.

# zfs set quota=150M e37pool/project-a
# zfs set reservation=150M e37pool/project-a
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                      8.4G  1.3G  6.7G  17% /
tmpfs                 230M     0  230M   0% /dev/shm
/dev/sda1             485M   54M  406M  12% /boot
e37pool               245M     0  245M   0% /e37pool
e37pool/project-a     150M   32M  119M  21% /e37pool/project-a
e37pool/project-b     245M     0  245M   0% /e37pool/project-b
e37pool/project-c     245M     0  245M   0% /e37pool/project-c
e37pool/project-d     245M     0  245M   0% /e37pool/project-d

Actually, now that we have this diagram up, let's play around with snapshots for project B's dataset. Snapshots allow you to capture the filesystem state, similar to how other filesystems do this, but ZFS allows you to explore them on-line.

Let's change into the /e37pool/project-b directory and create some files. Let's say we are working on an important paper, called paper-zfs-101, so we will add some example text in here, and also create some example directories.

# cd /e37pool/project-b
# ls -l
total 0
# cat > paper-zfs-101.txt
The quick brown fox jumped over the lazy dog.
# ls -la
total 4
drwxr-xr-x 2 root root  3 Sep 10 05:03 .
drwxr-xr-x 6 root root  6 Sep 10 04:43 ..
-rw-r--r-- 1 root root 46 Sep 10 05:03 paper-zfs-101.txt
# mkdir testing1 testing2 testing3
# ls -la
total 6
drwxr-xr-x 5 root root  6 Sep 10 05:04 .
drwxr-xr-x 6 root root  6 Sep 10 04:43 ..
-rw-r--r-- 1 root root 46 Sep 10 05:03 paper-zfs-101.txt
drwxr-xr-x 2 root root  2 Sep 10 05:04 testing1
drwxr-xr-x 2 root root  2 Sep 10 05:04 testing2
drwxr-xr-x 2 root root  2 Sep 10 05:04 testing3

Finally, let's take a snapshot by running zfs snapshot e37pool/project-b@, followed by a name. Everything to the left of the at sign is the pool and dataset we want to capture, and everything to the right of the at sign is the snapshot name, so let's pass in a dynamic date, so that on each run it will be different. You can list the snapshots by running zfs list -t snapshot, and as you can see we have our snapshot here. What is cool about these snapshots is that they only contain the changes between when the snapshot was taken and the current state. So, they can be extremely small if no changes are being made.

# zfs snapshot e37pool/project-b@`date +%F`
# zfs list -t snapshot
NAME                           USED  AVAIL  REFER  MOUNTPOINT
e37pool/project-b@2014-09-10      0      -  33.5K  -

Let's just clear the page so we can start fresh. Okay, at this point we have our snapshot of the project-b dataset, so let's delete our important paper-zfs-101 file. Okay, so the file is gone. ZFS snapshots are really cool, in that you can actually browse them live, by going into a hidden directory called .zfs; you will notice that it is not listed here. In this .zfs directory we can see a listing of all snapshots for this dataset. Let's just list the snapshots again by running zfs list -t snapshot, and you will notice that these two values match. You could have thousands of snapshots and they would all show up in this hidden .zfs directory.

# ls -l
total 6
-rw-r--r-- 1 root root 46 Sep 10 05:03 paper-zfs-101.txt
drwxr-xr-x 2 root root  2 Sep 10 05:04 testing1
drwxr-xr-x 2 root root  2 Sep 10 05:04 testing2
drwxr-xr-x 2 root root  2 Sep 10 05:04 testing3
# rm paper-zfs-101.txt 
rm: remove regular file `paper-zfs-101.txt'? y
# ls -la
total 8
drwxr-xr-x 5 root root 5 Sep 10 05:19 .
drwxr-xr-x 6 root root 6 Sep 10 04:43 ..
drwxr-xr-x 2 root root 2 Sep 10 05:04 testing1
drwxr-xr-x 2 root root 2 Sep 10 05:04 testing2
drwxr-xr-x 2 root root 2 Sep 10 05:04 testing3
# cd .zfs
# pwd
/e37pool/project-b/.zfs
# ls -la
total 2
dr-xr-xr-x 1 root root 0 Sep 10 04:43 .
drwxr-xr-x 5 root root 5 Sep 10 05:19 ..
dr-xr-xr-x 2 root root 2 Sep 10 05:16 shares
dr-xr-xr-x 3 root root 3 Sep 10 05:09 snapshot
# cd snapshot/
# ls -la
total 2
dr-xr-xr-x 3 root root 3 Sep 10 05:09 .
dr-xr-xr-x 1 root root 0 Sep 10 04:43 ..
drwxr-xr-x 5 root root 6 Sep 10 05:04 2014-09-10
# zfs list -t snapshot
NAME                           USED  AVAIL  REFER  MOUNTPOINT
e37pool/project-b@2014-09-10  20.5K      -  33.5K  -
# cd 2014-09-10/

Let's just clear the screen again, because this is getting a little busy with all the text. So, as you can see here, we are in our snapshot folder, and our deleted file is here. We can use the snapshot to restore files by simply copying them from the live snapshot over to our project-b directory. Let's just go back to the project-b directory and verify it worked. Pretty cool.

# pwd
/e37pool/project-b/.zfs/snapshot/2014-09-10
# ls -l
total 6
-rw-r--r-- 1 root root 46 Sep 10 05:03 paper-zfs-101.txt
drwxr-xr-x 2 root root  2 Sep 10 05:04 testing1
drwxr-xr-x 2 root root  2 Sep 10 05:04 testing2
drwxr-xr-x 2 root root  2 Sep 10 05:04 testing3
# cat paper-zfs-101.txt 
The quick brown fox jumped over the lazy dog.
# cp -a paper-zfs-101.txt /e37pool/project-b
# cd /e37pool/project-b
# ls -la
total 9
drwxr-xr-x 5 root root  6 Sep 10 05:21 .
drwxr-xr-x 6 root root  6 Sep 10 04:43 ..
-rw-r--r-- 1 root root 46 Sep 10 05:03 paper-zfs-101.txt
drwxr-xr-x 2 root root  2 Sep 10 05:04 testing1
drwxr-xr-x 2 root root  2 Sep 10 05:04 testing2
drwxr-xr-x 2 root root  2 Sep 10 05:04 testing3
# cat paper-zfs-101.txt 
The quick brown fox jumped over the lazy dog.

Snapshots are so easy to work with, and they can take up very little space if you are not changing lots of large files. Snapshots are especially great for things like home directories. Say, for example, that you configure a cron job to take a ZFS snapshot every hour, Monday to Friday, 9am to 5pm. This will allow you to go back throughout the week and retrieve files if someone mistakenly deletes something. This is not a replacement for backups, but it does give you some extra flexibility. You can actually do all sorts of things with snapshots, like mounting a clone somewhere else, or restoring to a point in time, say for example after a botched upgrade (see the sketch below).
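
Here is a rough sketch of what that could look like. The crontab entry fires hourly on weekdays from 9am to 5pm; the snapshot naming scheme, the /sbin/zfs path, and the clone name are my own assumptions, and note that percent signs need to be escaped inside a crontab.

# crontab -l
0 9-17 * * 1-5 /sbin/zfs snapshot e37pool/project-b@auto-$(date +\%F-\%H)
# zfs list -t snapshot                                              # the hourly snapshots pile up here
# zfs rollback e37pool/project-b@2014-09-10                         # revert the dataset to a snapshot (only the most recent one, unless -r)
# zfs clone e37pool/project-b@2014-09-10 e37pool/project-b-review   # mount a writable clone of the snapshot somewhere else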

Alright, I have definitely talked for way too long. So, hopefully at this point you have a pretty good idea of what ZFS on Linux is, and of some of the advanced features it offers. You will also notice that we did everything on-line, in that we did not have to unmount datasets or do any funny business; it just works.

Metadata
  • Published
    2014-09-10
  • Duration
    18 minutes