CLI Monday: How to Keep Your SSH Sessions Alive

In this episode, we are going to review a common cause of broken ssh sessions when connecting from home, coffee shops, or hotels. It can be very annoying to be working on something, go to grab a coffee, and come back to a dropped or frozen ssh connection.

Let me give you a little demo of what a broken ssh session actually looks like. Say, for example, that we ssh into a remote server from home. We are happily working away on something, we get distracted with something else, and let the ssh connection go idle for 10 minutes. Then, when we return, there is a “Write failed: Broken pipe” error message, and our ssh session has been disconnected.

$ ssh remote.example.com
vagrant@localhost's password: 
Last login: Mon Nov 17 04:31:42 2014 from 10.0.2.2
                          _           _                       _       
     Welcome to          | |         (_)                     | |      
  ___ _   _ ___  __ _  __| |_ __ ___  _ _ __     ___ __ _ ___| |_ ___ 
 / __| | | / __|/ _` |/ _` | '_ ` _ \| |  _ \   / __/ _` / __| __/ __|
 \__ \ |_| \__ \ (_| | (_| | | | | | | | | | | | (_| (_| \__ \ |_\__ \
 |___/\__, |___/\__,_|\__,_|_| |_| |_|_|_| |_|  \___\__,_|___/\__|___/
       __/ |                                                          
      |___/   #39 - CLI Monday: How to Keep Your SSH Sessions Alive
                  
[vagrant@remote ~]$ systemctl status sshd
sshd.service - OpenSSH server daemon
   Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled)
   Active: active (running) since Tue 2014-11-11 06:15:57 UTC; 5 days ago
 Main PID: 862 (sshd)
   CGroup: /system.slice/sshd.service
           └─862 /usr/sbin/sshd -D

[vagrant@remote ~]$ 
Write failed: Broken pipe

It is also common to have the session just hang, or freeze, after letting it sit idle for a while, and the connection will eventually time out and terminate. This is a very annoying problem, and it has personally bitten me, along with several coworkers. Using something like screen can make this less painful, but you still have to reconnect many times. So, I thought it might be useful to share my thoughts on the topic, and look at ways to solve the problem.

[vagrant@remote ~]$ 
Connection to localhost closed.

There is an OpenSSH FAQ item that sheds some light on this issue. You can find this page linked to in the episode notes below. I wanted to briefly chat about this FAQ, and then look at it in detail via some diagrams. So the question is: my ssh connection freezes or drops out after N minutes of inactivity. The answer says that this is usually the result of a packet filter or NAT device timing out TCP connections due to inactivity. It then says that there are a couple of ssh client and server configuration settings we can use to tune this, namely ServerAliveInterval and ClientAliveInterval. The answer goes on to say that using these settings will ensure that the connection is kept fresh in a packet filtering firewall's, or a NAT device's, connection table.

So, what does this actually mean? Well, I thought I would throw together some diagrams to illustrate the problem, and to show you what these ssh configuration settings actually do. So, let's say in our example setup, which actually mimics my use case very closely, we have an ssh client, and several ssh servers. I typically connect to AWS, work, and school, from home. I wanted to focus on the connection path between the ssh client and server, as this is typically the cause of dropped or frozen ssh connections. So, what does a typical connection path look like when coming from home, a coffee shop, or a hotel? Well, we almost always connect through some type of NAT device, highlighted here in blue. This NAT device proxies our traffic and acts as an intermediary for us on the internet, as our connection eventually finds the destination ssh server. I do not want to spend too much time talking about the internal and external networks of the NAT device, but I have linked to the Network Address Translation Wikipedia page in the episode notes below.

Okay, so now that we have a basic idea of how the packets flow, what does an example ssh connection look like? Well, let's say we connect out to AWS. As successful connections are established, the NAT device adds them to some type of NAT connection or session table, like so. These NAT devices are your modern cable modems, wifi routers, and coffee shop access points, which do Network Address Translation. The same goes for connections out to work, or school. The main causes of dropped or frozen ssh sessions are likely these NAT devices that almost all of us connect through when working remotely. For example, my home cable modem, which acts as a router, will drop idle connections after 5 minutes.

[Diagram: SSH Packet Filtering Firewall NAT Connection Table]

In this example, let's say I am actively working on the AWS and work ssh sessions, but have left the school connection sitting idle for more than 5 minutes. Well, my NAT router is going to come through after 5 minutes of inactivity and close the connection by deleting it from its connection table. Why does this happen? I have done plenty of reading on the topic, and it basically boils down to a couple of things. One is that these home routers are cheap, and maintaining the connection table uses up limited system resources. The second is that it is just a software timeout to clean up what the router considers to be stale or hung connections. Think about what the connection table looks like on one of those routers at a popular coffee shop; there could be thousands, or tens of thousands, of active connections. So, cleaning up stale connections is a really good idea.

So, what can we do about our idle ssh connections being killed? Well, let's quickly hop back to the OpenSSH FAQ page, where it provides configuration suggestions for both ssh clients and servers, along with links to their respective man pages.

The ssh client man page provides this ServerAliveInterval setting, which takes an integer value and allows you to send ssh noop, or null, commands through the encrypted ssh channel on a set interval. This fools routers into thinking the session is active, even when idle.

ServerAliveInterval

Sets a timeout interval in seconds after which if no data has been received from 
the server, ssh(1) will send a message through the encrypted channel to request a 
response from the server. The default is 0, indicating that these messages will 
not be sent to the server. This option applies to protocol version 2 only.

Let's hop over to the ssh server man page, as it provides the ClientAliveInterval setting along with an integer value. This works in the exact same way as the client setting, but these null packets come from the server, rather than the client. I thought it might make more sense to actually show you what these configuration settings do, through the use of some animated diagrams, so let's have a look.

ClientAliveInterval

Sets a timeout interval in seconds after which if no data has been received from 
the client, sshd(8) will send a message through the encrypted channel to request 
a response from the client. The default is 0, indicating that these messages will 
not be sent to the client. This option applies to protocol version 2 only.

In this example, we have established an ssh connection from home to work. Let's add this client side configuration setting called ServerAliveInterval with a value of 60. You can do this in the global ssh client configuration file located in /etc/ssh, or on a user by user basis, through their personal ssh configuration file, located in each user's home directory. Once this setting is active, regardless of activity on your part, the ssh client will send noop commands through the ssh tunnel, keeping activity flowing through the connection. The entire goal is to keep our ssh session fresh in the router's connection table.
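
If you just want to try the setting on a single connection before touching any configuration file, ssh also accepts configuration options on the command line through its -o flag. Here is a minimal sketch of a wrapper around that. The function name ssh_keepalive and the KEEPALIVE_SECONDS variable are made up for illustration and are not part of OpenSSH.

```shell
#!/bin/sh
# ssh_keepalive is a hypothetical wrapper name, not an OpenSSH command.
# It passes ServerAliveInterval on the command line via ssh's -o flag,
# so no configuration file needs to be edited.
ssh_keepalive() {
    # KEEPALIVE_SECONDS is an illustrative variable; it defaults to the
    # 60 seconds used in this episode.
    ssh -o "ServerAliveInterval=${KEEPALIVE_SECONDS:-60}" "$@"
}
```

With this loaded into your shell, something like `ssh_keepalive remote.example.com` should behave like a normal ssh login, just with the keepalive interval applied to that one session.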

On the flip side, we can add a ClientAliveInterval setting to the ssh server's configuration file. This does the exact same thing as the client configuration tweaks we just covered. Basically, regardless of activity on our part, the ssh server sends noop commands through the encrypted ssh tunnel, keeping the ssh session fresh in our router's connection table.

You might be wondering, what are some of the pros and cons of having this on the client side vs the server side? Well, there is nothing stopping you from doing both actually. Personally, I like to do this on the client side, as I typically initiate connections from the same places most of the time. That means, no matter which ssh server I connect to, I know that the session should not be dropped due to a router thinking it is a stale session. That raises the question, in what instances would you want to do this on the server side? Well, most corporate networks, if they are running ssh on the internet, have some type of bastion host, or jump host, which funnels all ssh traffic through it. If you are running this type of server, then it would make sense to enable the ClientAliveInterval setting, as you will improve the user experience of the many people who use your jump host, mainly from home or hotels while out on the road.

By now, you should know what the problem looks like, what causes it, and how to fix it. So, let's see what those changes actually look like in the real world. This is going to be a bit anticlimactic after the build up, as this is really simple. To add the client side configuration tweaks, all you need to have is a file in your home directory, under the .ssh directory, called config. As you can see, for all ssh hosts, we set a ServerAliveInterval of 60 seconds. You can really set this to any value that works for you, but 60 seconds seems to work well for me. If you had to create this file, then you should make sure it has the correct permissions, and change the ownership too, if needed. You can find these commands in the episode notes below.

$ cat ~/.ssh/config
Host *
  ServerAliveInterval 60

$ chmod 600 ~/.ssh/config
$ chown user:group ~/.ssh/config
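
If you set up new machines often, the steps above can be wrapped in a small script. This is only a sketch under a few assumptions: the function name ensure_keepalive is made up, it only appends the Host * block when no ServerAliveInterval line exists yet, and it assumes an existing file ends with a newline.

```shell
#!/bin/sh
# ensure_keepalive is a hypothetical helper name, not part of OpenSSH.
# Usage: ensure_keepalive [config-file] [interval-seconds]
ensure_keepalive() {
    config="${1:-$HOME/.ssh/config}"
    interval="${2:-60}"

    # Create the directory and the file if they do not exist yet.
    mkdir -p "$(dirname "$config")"
    touch "$config"

    # Append the block only when no uncommented ServerAliveInterval line is
    # present. The match is case-insensitive, like ssh_config keywords.
    if ! grep -qi '^[[:space:]]*ServerAliveInterval' "$config"; then
        printf 'Host *\n  ServerAliveInterval %s\n' "$interval" >> "$config"
    fi

    # Restrict the file to its owner, matching the chmod 600 shown above.
    chmod 600 "$config"
}
```

Calling `ensure_keepalive` with no arguments targets ~/.ssh/config with a 60 second interval, and running it a second time should leave the file unchanged.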

Okay, so that is the way to configure client settings through home directories, but you can also do this via the files located in /etc/ssh. You can modify the global ssh client configuration through the ssh_config file, or the server configuration through the sshd_config file. Let's modify the global client configuration first, by opening it up in vi. My global client configuration file does not already have the value set, so I am just going to add it, along with a comment, down at the bottom here. Let's fix the formatting too. Once we save it, all users on this machine should see the benefit of this change, although if they have existing connections, they will not see the update until they reconnect.

$ sudo vi /etc/ssh/ssh_config

# keep ssh sessions fresh
ServerAliveInterval 60
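
To double check that the client actually picks up the value, whether it came from /etc/ssh/ssh_config or from a personal config file, you can ask ssh to print the configuration it would use for a host. The sketch below assumes your OpenSSH client is new enough to have the -G flag (older releases do not have it). The function name show_keepalive is made up for illustration.

```shell
#!/bin/sh
# show_keepalive is a hypothetical helper name, not an OpenSSH command.
# It prints the keepalive-related settings ssh would use for a given host.
show_keepalive() {
    host="${1:?usage: show_keepalive HOST}"
    # -G makes ssh print its resolved client configuration and exit
    # without opening a connection.
    ssh -G "$host" | grep -i '^serveralive'
}
```

Running `show_keepalive remote.example.com` after the change should list a serveraliveinterval line with the value you configured.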

What about if this machine is an ssh server? Well, let's add the values to the sshd_config file. Again, my current configuration does not have this setting right now, so let's just add it down here, along with a comment. Finally, let's restart the ssh server, or you could reload it instead, personal preference. So, now any new ssh connection into this server should have a ClientAliveInterval noop command sent out every 60 seconds, keeping our ssh connections fresh.

$ sudo vi /etc/ssh/sshd_config

# keep ssh sessions fresh
ClientAliveInterval 60

$ sudo service ssh restart
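
Since a typo in sshd_config can stop the ssh server from coming back up, it can be worth validating the file before restarting. Here is a hedged sketch that uses sshd's -t test mode. The function name restart_sshd_if_valid is made up, the /usr/sbin/sshd path is taken from the systemctl output earlier in the episode, and the restart command is the one used above, so adjust both for your distribution.

```shell
#!/bin/sh
# restart_sshd_if_valid is a hypothetical helper name.
# Assumptions: sshd lives at /usr/sbin/sshd and the service is restarted
# with "service ssh restart", as in this episode.
restart_sshd_if_valid() {
    # -t is sshd's test mode: it checks the configuration file and keys
    # for validity, then exits without starting the daemon.
    if sudo /usr/sbin/sshd -t; then
        sudo service ssh restart
    else
        echo "sshd_config failed validation, not restarting" >&2
        return 1
    fi
}
```

The idea is simply that the restart only happens when the test passes, so a broken edit leaves the running server alone.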

But wait, there is more. What about the guys and gals out there using the extremely popular Putty and WinSCP utilities on Windows? Well, I thought I would quickly show you how to change those settings too. I fired up a Windows virtual machine on AWS and installed Putty and WinSCP, so we can see what those configuration settings look like.

Let's start with Putty. You can do this on a connection by connection basis, or add it to the default settings. To add the keepalive noop command, go over to the connection category, on the left hand side here. From there, you will see this section talking about sending null keepalive packets to the ssh server. Let's enter a value of 60 here. You can also enable TCP keepalive packets. These work at the TCP level, whereas the ssh noop commands work inside the encrypted tunnel, so it is harder for something doing packet inspection to see what is going on. Having both does not hurt anything. Then let's go back to the sessions category and save our changes.

The configuration within WinSCP is also easy to update. Let's open it up and have a look. If you have used this before, you might have saved sessions over here. To change the setting, go into the Advanced menu here. Head into the connection category over on the left hand side. Then on the right hand side, you will see something about keepalives. Click the send null ssh packets radio button. Again, I like to send these packets every 60 seconds, but it really is up to you. My home router kills connections after 300 seconds, so I need to have something smaller than that. I figure 60 seconds should cover me at home, a coffee shop, or a hotel. Finally, let's click okay, and you would save the setting here.

So, if you ever have an ssh session drop out, or freeze, with a “Write failed: Broken pipe” error message, this episode should help you solve that problem! It helped me fix the issue I was having.
