Using rsync to synchronize a local and remote directory
Recently I moved my blog from WordPress to a custom Python script that generates static HTML pages. After generating the files I need to copy them to my web servers. While it is easy enough to FTP or SCP the files from my local machine to the remote web servers, I am looking for a more elegant and automated solution. For that reason I have chosen to use rsync.
What is rsync
In the simplest terms, rsync is a tool that copies files from one place to another.
In more detail, rsync does more than just copy files: it reads the source and destination directories and copies only the files that are new or updated. This makes rsync perfect for copying my HTML files, as not every file will have changed.
The fact that rsync only copies new or updated files also makes it a great option for large and active directories. Recently I had to copy a multi-terabyte file system from one server to another. The only problem was that the file system was extremely active. Even though the initial copy took a long time to complete, I was able to run rsync again afterwards to single out and copy only the files that had changed in the meantime.
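As a rough illustration of the "only new or updated files" behavior, the sketch below syncs a throwaway local directory twice and shows what the second run transfers. The function name and file names are made up for the demonstration; it assumes rsync is installed and touches only a temporary directory.

```shell
#!/bin/sh
# Hypothetical demo: show that a second rsync run only transfers changed files.
demo_incremental() {
    work=$(mktemp -d) || return 1
    mkdir -p "$work/src" "$work/dst"
    echo "first page"  > "$work/src/index.html"
    echo "second page" > "$work/src/about.html"

    # First run: everything is new, so both files are copied (no output without -v).
    rsync -a "$work/src/" "$work/dst/"

    # Change one file. The new size differs, so rsync will notice the change.
    echo "first page, now edited" > "$work/src/index.html"

    # Second run: -v lists each file that is actually transferred.
    rsync -av "$work/src/" "$work/dst/"

    rm -rf "$work"
}
```

Calling demo_incremental should list index.html in the second run but not about.html, since the latter is unchanged on both sides.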
The following will outline setting up a cron job that uses rsync to keep two directories synchronized.
On most systems rsync is installed by default. If rsync is not installed, it can be installed with the system's package manager, for example on Debian or Ubuntu:
# apt-get install rsync
Create an SSH Key
While you can run rsync without setting up SSH keys, we want the copy to be unattended, so we will need to set up SSH keys. I previously covered how to set up SSH keys, so I will keep the instructions below basic. In my case I will be executing rsync as an unprivileged user and will need to create the SSH keys as that specific user.
$ su - testuser
$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/data/web/testuser/.ssh/id_rsa):
Created directory '/data/web/testuser/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /data/web/testuser/.ssh/id_rsa.
Your public key has been saved in /data/web/testuser/.ssh/id_rsa.pub.
Before we can SSH to our server without a password we will need to copy the public key to the remote server. In my case I know that there is no .ssh directory on the remote server already, so I will be creating everything from scratch. If your remote system already has a .ssh directory, you will need to add the public key to the existing authorized_keys file manually rather than overwriting it.
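For the case where the remote user already has an authorized_keys file, one possible approach is to append the new key instead of replacing the file. The helper below is only a sketch: the function name is hypothetical, and it assumes the public key file has already been copied to the remote server and that the function is run there as the target user.

```shell
#!/bin/sh
# Hypothetical helper: append a public key to an authorized_keys file
# without overwriting keys that are already there.
# Usage: append_key /path/to/id_rsa.pub /path/to/.ssh
append_key() {
    pubkey_file=$1
    ssh_dir=$2
    auth_file="$ssh_dir/authorized_keys"

    # Create the directory and file if needed, with the permissions sshd expects.
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    touch "$auth_file" && chmod 600 "$auth_file" || return 1

    # -F: match literally, -x: whole line only, -f: read the pattern from the key file.
    if grep -qxFf "$pubkey_file" "$auth_file"; then
        echo "key already present"
    else
        cat "$pubkey_file" >> "$auth_file"
        echo "key added"
    fi
}
```

Running it twice with the same key adds the key once and reports "key already present" the second time. Where it is available, OpenSSH's ssh-copy-id does a similar job directly over the network.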
Make sure the Remote user has a shell
In general I always create my application users with no login shell. Since we will be copying files over SSH, the user will require a login shell. The command below will change testuser's login shell to /bin/bash.
# usermod -s /bin/bash testuser
Create the .ssh directory on the Remote Server
The commands below will create a basic .ssh directory and copy the local system's public key to the remote system's authorized_keys file.
On Remote Server:
$ mkdir .ssh
$ chmod 700 .ssh/
On Local Server:
$ scp .ssh/id_rsa.pub testuser@remote.example.com:~testuser/.ssh/authorized_keys
On Remote Server:
# chown testuser:webusr ~testuser/.ssh/authorized_keys
Test the SSH keys
Before moving on it is a good idea to test that the SSH keys work. You can do this by opening a simple SSH connection to the remote system from the local system.
On Local Server:
$ ssh remote.example.com
Welcome to Ubuntu 12.04 LTS (GNU/Linux 3.2.0-23-virtual x86_64)
Now that we can SSH to the remote host without entering a password, we can use rsync to copy the directories.
Testing with the dry run flag
Before going willy-nilly and copying a directory and all of its contents, it is a good idea to test the rsync command with the --dry-run flag.
$ rsync -avzr --dry-run public_html remote.example.com:~/
< output truncated >
public_html/js/
public_html/js/bootstrap.js
public_html/js/jquery-1.10.2.js
sent 27035 bytes received 2185 bytes 11688.00 bytes/sec
total size is 4059365 speedup is 138.92 (DRY RUN)
While it may look like everything has been copied, rsync hasn't actually copied anything. This is great for testing the syntax of the command and making sure that the correct directories are being targeted.
Running the rsync manually
Now that we have tested rsync we are ready to copy the directory for real. If you are using rsync to copy a large directory, I highly suggest using the nohup command to leave the process running after your terminal exits.
$ rsync -avzr public_html remote.example.com:~/
public_html/2013/12/23/yum-plugins-verifying-packages-and-configurations-with-yum-verify/index.html
public_html/feed/index.xml
sent 85615 bytes received 32456 bytes 33734.57 bytes/sec
total size is 4059365 speedup is 34.38
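For a large copy, the nohup suggestion could look something like the sketch below. The run_detached function and the log file location are my own illustrative choices, not part of rsync or nohup.

```shell
#!/bin/sh
# Hypothetical helper: start a long-running command so that it survives
# the terminal closing, and send its output to a log file.
# Usage: run_detached LOGFILE COMMAND [ARGS...]
run_detached() {
    logfile=$1
    shift
    # nohup makes the command ignore the hangup signal sent when the
    # terminal exits; the trailing & puts it in the background.
    nohup "$@" > "$logfile" 2>&1 < /dev/null &
    echo "started PID $!"
}

# Example (directory and host taken from this article):
# run_detached "$HOME/rsync.log" rsync -avzr public_html remote.example.com:~/
```

Because the output goes to a log file, you can log back in later and check on the copy with tail.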
Before we move on I want to break down the -avzr flags that were given.
-a puts rsync into archive mode, which makes it retain file attributes such as permissions and ownership.
-v puts rsync into verbose mode, which makes rsync output the status of the copy.
-z tells rsync to compress files during the copy, which saves time on slow network connections.
-r tells rsync to recursively copy files and directories. Archive mode already implies this, so -r is redundant here but harmless.
Using the delete flag to remove old files
rsync has the ability to remove files that exist in the destination directory but not in the source directory. This is very helpful in my case, as I may remove an HTML file from my source directories from time to time and do not want to be bothered with removing it on each web server. It can also be helpful if you are setting up rsync to copy a large and active directory that will take several days to copy, because you can run rsync a second time and remove any files that were deleted from the source in the meantime. To enable this, simply add the --delete flag to the rsync command.
$ rsync -avzr --delete public_html remote.example.com:~/
sending incremental file list
public_html/
deleting public_html/html
deleting public_html/1.txt
sent 25230 bytes received 376 bytes 10242.40 bytes/sec
total size is 4059365 speedup is 158.53
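Since --delete removes files, it can be worth combining it with --dry-run first to preview what would go. The sketch below demonstrates this on a throwaway local directory. The function name and file names are invented for the example, and it assumes rsync is installed.

```shell
#!/bin/sh
# Hypothetical demo: preview what --delete would remove before doing it.
demo_delete_preview() {
    work=$(mktemp -d) || return 1
    mkdir -p "$work/src" "$work/dst"
    echo "keep"  > "$work/src/index.html"
    echo "keep"  > "$work/dst/index.html"
    echo "stale" > "$work/dst/1.txt"      # exists only on the destination

    # Dry run: reports the deletion but leaves the file in place.
    rsync -av --delete --dry-run "$work/src/" "$work/dst/"
    [ -f "$work/dst/1.txt" ] && echo "after dry run: 1.txt still exists"

    # Real run: the stale file is removed from the destination.
    rsync -a --delete "$work/src/" "$work/dst/"
    [ -f "$work/dst/1.txt" ] || echo "after real run: 1.txt is gone"

    rm -rf "$work"
}
```

The dry run prints a "deleting 1.txt" line just like the real output above, which makes it easy to spot a mistyped destination before anything is lost.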
Set up an rsync cron job
Now that the rsync command is tested and we know that we can log in to the remote server without user input, we can place our command into a cron job. This will allow us to run an rsync command at a scheduled interval throughout the day.
$ crontab -e
0 * * * * /usr/bin/rsync -avzr --delete /data/web/testuser/public_html remote.example.com:/data/web/testuser/ > /dev/null 2>&1
The above example will execute the rsync command at the top of every hour. While this is fine for a directory with a little bit of data, it would not work very well for a directory with tons of data, since one run may still be in progress when the next one starts. If you are running rsync to copy large amounts of data, it may be better to create a wrapper script that ensures other rsync jobs aren't already running.
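A wrapper of that kind could be sketched roughly as follows. This is only one way to do it: the run_locked name and the lock directory path are assumptions of mine, and it relies on mkdir failing when the directory already exists.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command only if no earlier run still holds the lock.
# Usage: run_locked LOCKDIR COMMAND [ARGS...]
run_locked() {
    lockdir=$1
    shift
    # mkdir is atomic: it fails if the directory already exists, which
    # means an earlier run is still in progress (or died without cleanup).
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "another run holds $lockdir, skipping" >&2
        return 1
    fi
    "$@"
    status=$?
    rmdir "$lockdir"
    return $status
}

# Example for the cron job in this article (the lock path is an assumption):
# run_locked /tmp/rsync-public_html.lock \
#     /usr/bin/rsync -avzr --delete /data/web/testuser/public_html \
#     remote.example.com:/data/web/testuser/
```

One caveat with this design is that a run killed before it reaches rmdir leaves a stale lock behind that has to be removed by hand. On Linux systems the flock command from util-linux is a common alternative that avoids this.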
rsync is a great tool to use when you want to mirror any directory to another system. There are a ton of options that give you great flexibility to change the way rsync copies files and what it does with the files once they are copied.