Episode #6 - Locate files by name

About Episode - Duration: 3 minutes, Published: 2013-06-09

In this episode we are going to review the locate and updatedb commands provided by the mlocate package. Locate is a useful command for finding files quickly. We will also look at updatedb.conf and its PRUNEPATHS option, which you can use to limit the load that indexing puts on your servers.

Links, Code, and Transcript


In this episode, we are going to review the locate command, which is provided by the mlocate package. Let's take a look at the mlocate package information by running rpm --query --info mlocate.

# review mlocate package info
$ rpm --query --info mlocate
Name        : mlocate                           Relocations: (not relocatable)
Version     : 0.22.2                            Vendor: CentOS
Release     : 4.el6                             Build Date: Wed 10 Oct 2012 09:06:25 AM UTC
Install Date: Sun 09 Jun 2013 11:47:12 PM UTC   Build Host: c6b10.bsys.dev.centos.org
Group       : Applications/System               Source RPM: mlocate-0.22.2-4.el6.src.rpm
Size        : 285873                            License: GPLv2
Signature   : RSA/SHA1, Wed 10 Oct 2012 11:56:46 AM UTC, Key ID 0946fca2c105b9de
Packager    : CentOS BuildSystem http://bugs.centos.org
URL         : https://fedorahosted.org/mlocate/
Summary     : An utility for finding files by name
Description :
mlocate is a locate/updatedb implementation.  It keeps a database of
all existing files and allows you to lookup files by name.

The 'm' stands for "merging": updatedb reuses the existing database to avoid
rereading most of the file system, which makes updatedb faster and does not
trash the system caches as much as traditional locate implementations.

mlocate is a helpful package for finding files by name. At its core are two executable utilities, locate and updatedb. Locate is used to find files, and updatedb usually runs in the background, indexing the files on your system.

Now that we know a little about the locate command, let's try it out.

Let's say, for example, that you are looking for all the files in /etc with yum in their name. Running locate yum yields quite a bit of output, so let's pipe the output to grep and filter for etc. That is a little better, but you can always add an additional filter, like so.

# locate yum files in /etc
$ locate yum |grep etc |grep repo
/etc/yum.repos.d
/etc/yum.repos.d/CentOS-Base.repo
/etc/yum.repos.d/CentOS-Debuginfo.repo
/etc/yum.repos.d/CentOS-Media.repo
/etc/yum.repos.d/CentOS-Vault.repo
/etc/yum.repos.d/puppetdeps.repo

It should be noted that you can do the filtering via the locate command itself, but I find piping the output to grep much easier.
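For reference, here is a minimal sketch of doing the same filtering inside locate. It assumes the mlocate version of locate, whose --regex option treats the pattern as an extended regular expression matched against the full path; other locate implementations may differ, so check man locate first. The pattern itself is just my illustration. It is also anchored to /etc/, which is a little stricter than grep etc, since that matches etc anywhere in the path.

```shell
# Sketch: let locate filter by itself instead of piping to grep.
# Assumes mlocate's --regex option (extended regex, matched against the whole path).
pattern='^/etc/.*yum.*repo'

if command -v locate >/dev/null 2>&1; then
    # locate exits non-zero when nothing matches or the database is unreadable
    locate --regex "$pattern" || echo "locate found no matches" >&2
else
    echo "locate is not installed" >&2
fi
```

Because the pattern is an ordinary extended regex, you can try it out with grep -E against any list of paths before running it through locate.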

Now that we know about the mlocate package and the locate command, let's dig a little deeper. We are going to run rpm --query --list mlocate and pipe the output to grep again, this time using an inverted match (-v) to filter out any files under /usr/share. This hides mostly man pages from the listing, making the output much cleaner.

# review files installed via the mlocate package
$ rpm --query --list mlocate |grep -v usr/share
/etc/cron.daily/mlocate.cron
/etc/updatedb.conf
/usr/bin/locate
/usr/bin/updatedb
/var/lib/mlocate
/var/lib/mlocate/mlocate.db

Included in the mlocate package are a cron script that executes updatedb daily, a config file that can be used to tell the indexer, updatedb, about your system, the locate and updatedb commands, and finally the mlocate database. On my system this database is roughly 600k.

Let's take a look at the /etc/updatedb.conf file. You can easily tune the updatedb indexer via these options. I typically use the PRUNEPATHS option to list any directories that should be excluded.
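As a rough sketch of what that looks like: PRUNEPATHS is a quoted, space-separated list of directories, and you append your own to whatever the file already contains. The /backup and /srv/storage paths below are made up for illustration, and /tmp and /var/tmp simply stand in for the entries your distribution already ships.

```
# /etc/updatedb.conf (excerpt, illustrative values only)
# Keep the existing entries and append the directories updatedb should skip.
# /backup and /srv/storage are hypothetical examples.
PRUNEPATHS = "/tmp /var/tmp /backup /srv/storage"
```

The change takes effect the next time updatedb runs, either from the daily cron job or when you run it by hand as root.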

For example, let's say you have a large backup server, mail server, or storage server containing millions of files. With these types of servers, it is not uncommon for the mlocate database to grow very large; I am talking about GBs in size. Indexing all of those files not only puts additional load on your server, but also thrashes the caches. You will definitely want to tweak this configuration file to exclude your storage. You can also disable the cron job or remove the mlocate package entirely.
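If you want to check whether this applies to one of your servers, a small sketch like the following reports the database size. The report_db_size helper is a name I made up for illustration, and the default path is the one from the package listing earlier. On many systems the /var/lib/mlocate directory is only readable by root, so you may need to run this with sudo.

```shell
# Sketch: report how large the locate database has grown.
# report_db_size is a hypothetical helper, not part of mlocate.
report_db_size() {
    db=${1:-/var/lib/mlocate/mlocate.db}   # path taken from the rpm listing
    if [ -e "$db" ]; then
        du -h "$db"
    else
        echo "not found or not readable: $db"
    fi
}

report_db_size

# If you decide you do not want the daily index at all (run as root):
#   chmod -x /etc/cron.daily/mlocate.cron   # run-parts skips non-executable scripts
#   yum remove mlocate                      # or remove the package entirely
```

The last two commands are left as comments on purpose, since they change the system and should be a deliberate choice.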

Hopefully now you know a little more about the mlocate package.