In this episode, I wanted to talk about the differences between hard and symbolic links. We will look at why links are useful, the pros and cons of hard vs symbolic links, then have a look under the hood at inodes and filesystem metadata to see how links actually work.
Links in general serve the very useful function of acting as an alias or shortcut to a particular file or directory. We are all familiar with creating desktop shortcuts, and symbolic links are very similar. Basically, we create a symbolic link that acts as a pointer to the real file, which in turn points to the data on the disk.
As you can see in the diagram here, we have created a firefox symbolic link on our desktop, which points to the /usr/bin/firefox file. Personally, I find symbolic links intuitive and easy to understand.
So, what about hard links? Well, hard links are a little different, in that instead of creating a file which acts as a pointer to the real file, we are essentially giving the real file a second name. As you can see in the diagram here, we created a firefox hard link on our desktop, but instead of pointing to the /usr/bin/firefox file the way a symbolic link would, we essentially just gave this firefox file a second name, and that second name points to the data on disk. A long time ago, when I was first learning about Linux, this concept went against my mental model of how files worked. I assumed files could only have one name, but with hard links this idea goes out the window, because files can have multiple names.
Let's look at some real-life examples. So, I have created a directory called project, and in that directory we have three files: foo, bar, and baz. To create links we use the ln command. Let's have a look at the man page for this command. So, as you can see, the ln command makes links between files. Then down here, it says that the ln command makes hard links by default, and to create symbolic links you need to use a specific option.
Okay, so let's create a symbolic link first. These are often referred to simply as symlinks. So, we are going to create a symlink to this foo file. Let's run ln -s, then the syntax is the target, then the new link name, so let's type foo as the target, and foo-symlink as the new name.
ln -s foo foo-symlink
Running ls again, you can see that our new link was created, and it gives this nice output visually indicating that this is a link to the foo file. You can also see this letter l over here, which denotes that the file type is a symbolic link. Let's run the stat command on the foo file. You can see that it is a regular file. Then let's do the same for foo-symlink, and here it says that it is a symbolic link. This goes back to the diagram from earlier about symlinks just being pointers to the real files.
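If you want to follow along, here is a minimal sketch of the same steps. It assumes GNU coreutils (the `-c` format option of stat) and works in a throwaway directory from mktemp rather than the project directory shown in the episode; the file contents are made up for illustration.

```shell
# Work in a scratch directory so nothing real is touched (assumption: mktemp is available).
dir=$(mktemp -d)
cd "$dir"
echo "hello" > foo
touch bar baz

# Syntax: ln -s TARGET LINK_NAME
ln -s foo foo-symlink

ls -l foo-symlink          # mode column starts with "l", name shown as "foo-symlink -> foo"
readlink foo-symlink       # prints the path stored in the link: foo
stat -c '%F' foo           # regular file
stat -c '%F' foo-symlink   # symbolic link
cat foo-symlink            # reading through the link returns foo's contents
```

Note that stat reports on the link itself by default, which is why the last two stat calls give different file types.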
Okay, so what about hard links? Let's look at the same project directory again, with the same files. Let's create a hard link to the foo file. Since the ln command creates hard links by default, just run ln, the target, in this case foo, and the new name, let's call it foo-hardlink. Then let's list the directory again.
ln foo foo-hardlink
One thing you will notice right away is that it is not easy to tell that the foo and foo-hardlink files are linked in some way, and that they are in fact pointing to the same blocks on disk. With symlinks at least you get this l indicator and you can visually see that it is a symbolic link. There is one telltale sign though: the number 2 here in the link count column, where before it was just a 1. So, let's run the stat command again, against foo and the new foo-hardlink. You will notice that the link count is now 2, and that the inode numbers match. This indicates that these two files are in fact pointing at the same blocks on disk. Essentially, this means this file has two names.
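The check described above can be sketched as follows. This again assumes GNU stat and a scratch directory; the appended "world" line is only there to show that both names reach the same data.

```shell
dir=$(mktemp -d)
cd "$dir"
echo "hello" > foo

stat -c '%h' foo                      # link count before linking: 1
ln foo foo-hardlink                   # ln TARGET LINK_NAME, hard link by default
stat -c '%i %h %n' foo foo-hardlink   # same inode number and a link count of 2 on both lines

# -ef is true when both names refer to the same inode on the same device
if [ foo -ef foo-hardlink ]; then
    echo "foo and foo-hardlink are the same file"
fi

# A write through one name is visible through the other
echo "world" >> foo-hardlink
cat foo
```

Neither name is the "original" at this point. The directory simply holds two entries for one inode.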
To really understand what is happening here, we need to know a little about inodes and how directories actually work. You can read all about this on the inode, or index node, wiki page. I have provided all these wiki page links in the episode notes below. To give you a simplified idea of how directories and inodes actually work, let's look at some diagrams.
You can think of a directory listing as a really simple database with two columns: one column listing the file name, and the other column listing something called an inode, or index node, number. The index node number points to a second database which holds lots of metadata about each file. Things like the file size, device, ownership and permissions, timestamps, and the link count, which we noticed earlier. This inode also points to the blocks on the disk where our file's contents actually sit. I should note that the blocks diagram has been simplified. It is actually more complex, but if you want to read about it, check out the inode pointer structure wiki page listed in the episode notes below.
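You can peek at both "databases" from the shell. The sketch below assumes GNU stat for the format string and uses made-up file contents. ls -i shows the name and inode number pairs from the directory, and stat prints some of the metadata held in the inode.

```shell
dir=$(mktemp -d)
cd "$dir"
echo "some data" > foo
touch bar baz

# The directory's two columns: inode number, then file name, one entry per line
ls -i1

# A selection of the metadata stored in foo's inode
stat -c 'inode=%i size=%s device=%d owner=%U mode=%A links=%h blocks=%b mtime=%y' foo
```

The actual inode numbers, device number, and block count will differ from system to system.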
Okay, so now that we know about inodes, that each file name maps to an inode, and that each inode maps to the blocks on disk, let's look at the differences between symbolic links and hard links again.
Symbolic links actually work like this. We create a symbolic link file, which actually creates a new inode entry. This inode, 456, holds the path /usr/bin/firefox, and /usr/bin/firefox points to its own inode, 123, and that inode points to the blocks on disk.
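To see that a symlink really is its own small file holding a path, here is a rough sketch. It assumes GNU stat, and firefox does not need to be installed, since a symlink can be created even when its target does not exist.

```shell
dir=$(mktemp -d)
cd "$dir"

# Only the path string is stored; the target is not checked at creation time
ln -s /usr/bin/firefox firefox

readlink firefox              # prints the stored path: /usr/bin/firefox
stat -c '%i %F %s' firefox    # the link's own inode number, its type, and its size
```

On most filesystems the size reported for a symlink is the length of the stored path, which is a nice hint that the path is all the link contains.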
Hard links are much different in that there is no additional inode. All that happened was that the directory listing database was updated with a new filename, using the same inode entry as the file we are linking to. For this reason you cannot create hard links across filesystems: we are linking directly to an inode and its file contents, inode numbers only mean something within their own filesystem, and that inode will not exist on another filesystem. This is actually where symbolic links come in really handy.
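One way to reason about this constraint before running ln is to compare the device numbers that stat reports for the two locations. The sketch below is only an illustration: same_filesystem is a hypothetical helper name, it assumes GNU stat, and a matching device number is necessary but not always sufficient (bind mounts, for example, can still refuse a hard link).

```shell
# Hypothetical helper: succeeds when both paths report the same device number
same_filesystem() {
    [ "$(stat -c '%d' "$1")" = "$(stat -c '%d' "$2")" ]
}

dir=$(mktemp -d)
echo "hello" > "$dir/foo"

# Here source and destination are in the same directory, so the hard link branch runs
if same_filesystem "$dir/foo" "$dir"; then
    ln "$dir/foo" "$dir/foo-hardlink"
else
    # Across filesystems only a symbolic link can work
    ln -s "$dir/foo" "$dir/foo-symlink"
fi
```

When the device numbers differ, ln without -s fails with a cross-device link error, and a symlink is the only option.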
Let's jump back to our foo, bar, and baz files. In this example, the directory listing will look like this. We have the foo and foo-hardlink files, which have the exact same inode number, whereas the foo-symlink file actually has a different inode number. We can verify this by running ls -li, where the i option will show the inode numbers. You can see the inode numbers listed in the left-hand column. You will notice that foo and foo-hardlink have the same inode number, which actually means that it is the same file with two names. Then down here, we have the symlink, which has its own inode number, and that inode acts as a pointer to the foo file.
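Here is a small sketch that rebuilds this directory and checks the inode numbers. The scratch directory and file contents are assumptions for illustration, and the find -samefile test is an extra that the episode does not use but which is available in GNU and BSD find.

```shell
dir=$(mktemp -d)
cd "$dir"
echo "hello" > foo
touch bar baz
ln foo foo-hardlink
ln -s foo foo-symlink

# Inode numbers appear in the left-hand column
ls -li

# List every name under this directory that shares foo's inode.
# The symlink is not listed, because it has an inode of its own.
find . -samefile foo
```

This is handy when the link count in ls shows more than 1 and you want to track down the other names.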
Hopefully this has not been too confusing. I just wanted to show you the differences, and what was actually happening under the hood, so that you can really understand what is going on. I do not usually create hard links, because I do not generally expect a file to have multiple names, and if it does, then I expect to see a symlink.
When would you use a symbolic link over a hard link? Well, 99% of the time I use symlinks. In fact, I can probably count the number of times I have used hard links on one hand. I just find symlinks more intuitive, and in a shared environment with multiple people using the same system, I want to err on the side of clarity. So my suggestion would be to use symlinks unless you have a good reason not to.
Just to recap, let's look at a summary of the differences between the two types of links.
Symbolic links
- Easy to understand
- Works for files and directories
- Works across filesystems
Hard links
- Hard to recognize
- Only works for files
- Not across filesystems
- Better performance
So, if you find yourself needing to create a link, hopefully this episode will clear up some of the differences between hard and symbolic links, and help you choose the correct type.