- Wikipedia: head (Unix)
- UNIX man pages : head
- Wikipedia: tail (Unix)
- UNIX man pages : tail
- Wikipedia: wc (Unix)
- UNIX man pages : wc
- NASA-HTTP Log File
In this episode, I want to introduce a new episode type called CLI Monday. The idea is that we will review useful command line utilities every Monday. Today, we are going to review the head, tail, and wc commands.
These three commands are really basic, but they come in very handy, because in my day-to-day work I often find myself working with or viewing text files. To highlight why these commands can be useful, I searched online for sample log files, and NASA is nice enough to publish some. So, I went ahead and downloaded a web server log file from 1995, which we will use in our examples today.
wget ftp://ita.ee.lbl.gov/traces/NASA_access_log_Jul95.gz
gunzip NASA_access_log_Jul95.gz
As you can see, we have our log file downloaded and sitting in a directory, but how do you get a feeling for what the file looks like, along with how many entries it has, without actually loading the entire file into an editor? Let's start off by using the head command, which will by default grab the first 10 lines from the file and output them. So, without loading the entire file, we are able to get a sense of what the format is like.
Next, the wc command will give us a count of the lines in the file. I should mention that I almost always use the wc command with the dash l switch, so that the count is based on the number of newlines in the file.
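To make the head behaviour concrete, here is a minimal sketch. It assumes a small, made-up 25-line sample file standing in for the NASA log (the file and its "entry N" contents are hypothetical), so it can be run anywhere without the download.

```shell
# Build a throwaway 25-line sample file (hypothetical stand-in for the real log).
sample=$(mktemp)
i=1
while [ "$i" -le 25 ]; do
  echo "entry $i"
  i=$((i + 1))
done > "$sample"

# Default behaviour: print the first 10 lines.
head "$sample"

# The -n switch picks a different count, here the first 3 lines.
first_three=$(head -n 3 "$sample")
echo "$first_three"
```

Against the real log, the equivalent would simply be head NASA_access_log_Jul95.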
wc -l NASA_access_log_Jul95
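As a rough illustration of what the dash l switch changes, the sketch below runs wc on a tiny made-up file (the three request lines are hypothetical, not taken from the NASA log). It also shows that the count really is a count of newline characters, so a last line without a trailing newline is not counted.

```shell
# A three-line throwaway file (hypothetical contents).
sample=$(mktemp)
printf 'GET /index.html\nGET /images/logo.gif\nGET /history.html\n' > "$sample"

# With no switches, wc prints lines, words, and bytes, followed by the file name.
wc "$sample"

# With -l, only the newline count is printed (plus the file name).
wc -l "$sample"

# Reading from standard input drops the file name, which is handy in scripts.
# tr strips the padding some wc implementations add around the number.
line_count=$(wc -l < "$sample" | tr -d ' ')
echo "$line_count"

# Two lines of text but only one newline character, so wc -l reports 1.
no_trailing=$(printf 'one\ntwo' | wc -l | tr -d ' ')
echo "$no_trailing"
```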
Finally, the tail command functions in a similar manner to the head command, except that by default it looks at the last 10 lines of the file.
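Here is a matching sketch for tail, again using a made-up 25-line sample file rather than the real log (the file and its contents are assumptions for illustration only).

```shell
# Build a throwaway 25-line sample file (hypothetical stand-in for the real log).
sample=$(mktemp)
i=1
while [ "$i" -le 25 ]; do
  echo "entry $i"
  i=$((i + 1))
done > "$sample"

# Default behaviour: print the last 10 lines.
tail "$sample"

# The -n switch picks a different count, here the last 3 lines.
last_three=$(tail -n 3 "$sample")
echo "$last_three"
```

Against the real log, the equivalent would simply be tail NASA_access_log_Jul95.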
So, based on these three commands, we now know what the file looks like, that it spans from July 1st, 1995 till July 28th, 1995, and wc tells us that there are roughly 1.8 million rows. I highly recommend checking out the manual pages for these commands, as there are other switches you can use to alter their default behaviour. But this basically sums up what these commands do.
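As one example of what those other switches make possible, the sketch below combines head and tail to pull a range of lines out of the middle of a file, and uses the -n +N form of tail to start printing at a given line. It runs against a made-up 25-line sample file (hypothetical contents), not the NASA log.

```shell
# Build a throwaway 25-line sample file (hypothetical stand-in for the real log).
sample=$(mktemp)
i=1
while [ "$i" -le 25 ]; do
  echo "entry $i"
  i=$((i + 1))
done > "$sample"

# Lines 11 through 15: take the first 15 lines, then keep the last 5 of those.
middle=$(head -n 15 "$sample" | tail -n 5)
echo "$middle"

# With a leading plus sign, tail starts at that line number instead of
# counting back from the end, so this prints line 21 through the end.
from_21=$(tail -n +21 "$sample")
echo "$from_21"
```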
So, what would a real world use case be for these commands? Well, I often find myself reviewing log data while troubleshooting issues. For example, say that you are debugging an issue and have narrowed it down to a specific service, and this service logs events to a file. Rather than continually opening the log file to look at new events, the tail command allows you to follow the log file in real-time, printing out new events as they happen.
As a demonstration, I will replay the NASA web server log, and then we can look at how this works in practice. As you can see by running the wc -l command multiple times, the file is continually getting new entries. Now, let's use the tail command with the dash f or follow option, so that we get to see new events in real-time. I honestly cannot tell you how useful this is for debugging and reproducing problems, mainly by doing something and then watching the log for new data in real-time.
wc -l NASA_access_log_Jul95-replay
wc -l NASA_access_log_Jul95-replay
wc -l NASA_access_log_Jul95-replay
tail -f NASA_access_log_Jul95-replay
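If you want to try the follow option without the NASA log, here is a small self-contained sketch. It assumes a made-up log file and a fake "service" that appends three events (all names and contents are hypothetical). The tail command runs in the background and its output is captured to a second file, only so that the script can stop it again. Interactively, you would just run tail -f in one terminal and press Ctrl-C when done.

```shell
# Temporary files: the log being written to, and a capture of what tail prints.
log=$(mktemp)
out=$(mktemp)

# Follow the log in the background.
tail -f "$log" > "$out" &
tail_pid=$!

# Simulate a service appending events while tail is watching.
for n in 1 2 3; do
  echo "event $n" >> "$log"
  sleep 0.2
done

# Give tail a moment to catch up, then stop it.
sleep 2
kill "$tail_pid"
wait "$tail_pid" 2>/dev/null || true

# Everything tail saw while following the file.
cat "$out"
```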
By the way, as a bonus, here are the commands used for replaying the NASA web server log. I will leave this as an exercise for you to figure out how it works.
while IFS= read -r line; do sleep 0.3; printf '%s\n' "$line" >> NASA_access_log_Jul95-replay; done < NASA_access_log_Jul95