Episode #26 - CLI Monday: head, tail, and wc


About Episode - Duration: 4 minutes, Published: 2014-07-07

In this episode, I wanted to introduce a new episode type called CLI Monday. The idea being, we will review useful command line utilities every Monday. Today, we are going to review the head, tail, and wc commands.



Links, Code, and Transcript



These three commands are actually really basic, but come in very handy, because in my day to day work I often find myself working with or viewing text files. To highlight why these commands can be useful, I searched online for sample log files, and NASA is nice enough to publish some. So, I went ahead and downloaded a web server log file from 1995, which we will use in our examples today.

wget ftp://ita.ee.lbl.gov/traces/NASA_access_log_Jul95.gz
gunzip NASA_access_log_Jul95.gz

As you can see we have our log file downloaded and sitting in a directory, but how do you get a feel for what the file looks like, along with how many entries it has, without actually loading the entire file into an editor? Let's start off by using the head command, which will by default grab the first 10 lines from the file and output them. So, without loading the entire file, we are able to get a sense of what the format is like.

head NASA_access_log_Jul95 
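If ten lines is too many or too few, head also takes a -n switch to pick a specific count. Here is a quick sketch using a small throwaway file (the file name is just for illustration, not from the episode):

```shell
# Create a small sample file to play with
printf '%s\n' line1 line2 line3 line4 line5 > sample.txt

# Grab just the first 2 lines instead of the default 10
head -n 2 sample.txt
# prints:
# line1
# line2
```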

Next, the wc command will give us a count of the lines in the file. I should mention that I almost always use the wc command with the dash l switch, so that the count is based on the number of newlines in the file.

wc -l NASA_access_log_Jul95 
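Without any switches, wc prints line, word, and byte counts together, and other switches pick out just one. A small sketch on a throwaway file (again, the file name is made up for illustration):

```shell
# Two lines, five words, 24 bytes in total
printf 'hello world\nfoo bar baz\n' > sample.txt

wc -l sample.txt   # count of newlines: 2
wc -w sample.txt   # count of words: 5
wc -c sample.txt   # count of bytes: 24
```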

Finally, the tail command functions in a similar manner to the head command, except that by default it looks at the last 10 lines of the file.

tail NASA_access_log_Jul95 
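Like head, tail accepts -n to change the line count, and the +K form starts output at line K rather than counting back from the end. A sketch on a throwaway file (file name is just for illustration):

```shell
printf '%s\n' one two three four five > sample.txt

tail -n 2 sample.txt    # last 2 lines: four, five
tail -n +4 sample.txt   # everything from line 4 onward: four, five
```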

So, based on these three commands, we now know what the file looks like, that it spans from July 1st, 1995 through July 28, 1995, and wc tells us that there are roughly 1.8 million rows. I highly recommend checking out the manual pages for these commands as there are other switches you can use to alter their default behaviour. But this basically sums up what these commands do.

So, what would a real world use case be for these commands? Well, I often find myself reviewing log data while troubleshooting issues. For example, say that you are debugging an issue and have narrowed it down to a specific service, and this service logs events to a file. Rather than continually opening the log file to look at new events, the tail command allows you to follow the log file in real-time, printing out new events as they happen.

As a demonstration, I will replay the NASA web server log, and then we can look at how this works in practice. By running the wc -l command multiple times, you can see that the file is continually getting new entries. Now, let's use the tail command with the dash f or follow option, so that we get to see new events in real-time. I honestly cannot tell you how useful this is for debugging and reproducing problems, mainly by doing something and then watching the log for new data in real-time.

wc -l NASA_access_log_Jul95-replay
wc -l NASA_access_log_Jul95-replay
wc -l NASA_access_log_Jul95-replay
tail -f NASA_access_log_Jul95-replay 
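If you want to try the follow option without the NASA log, you can fake a growing log file with a background loop. The file names and timings below are made up for illustration, and timeout (from GNU coreutils) is used to stop following after a second:

```shell
# Append a few events to a temp file from a background job
log=$(mktemp)
( for i in 1 2 3; do echo "event $i" >> "$log"; sleep 0.2; done ) &

# Follow the file for one second; tail prints each new line as it arrives
timeout 1 tail -f "$log" > captured.txt || true
wait

cat captured.txt   # event 1 through event 3
```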

By the way, as a bonus, here are the commands used for replaying the NASA web server log. I will leave it as an exercise for you to figure out how they work.

# Replay the log: append one line every 0.3 seconds to the -replay file
while read -r line; do
    sleep 0.3
    echo "$line" >> NASA_access_log_Jul95-replay
done < NASA_access_log_Jul95