Back up Docker volumes (and restore them) done right

This article explains how to use tar correctly to back up Docker volumes and restore them. I explain why two top-ranked tutorials are not doing a good job, by taking them apart. Finally, I give hints for creating backups of Docker volumes in production.

Table of Contents

- Introduction
- How to do it wrong: the official Docker tutorial
- How to do it wrong: the HowToGeek tutorial
- How to correctly back up Docker volumes and restore them
- Making backups in production
- Conclusion

Introduction

Making a local backup of a Docker volume and restoring it is a common task if you operate container-based software with the Docker engine directly (not using Kubernetes). Unfortunately, most search results you find on the internet for a query such as "Docker volume backup" yield incorrect and dangerous tutorials, whose authors have either not understood how the underlying tools work, or have not bothered explaining some obscure command line parameters which you need to adapt. In this piece, I will explain why these tutorials are wrong, and how to actually back up and restore Docker volumes correctly.

How to do it wrong: the official Docker tutorial

Case study #1 is the official Docker tutorial. Following its commands to the letter does work, but adapting it to a real-world use case, e.g. a MySQL container, will fail. At the time of writing this article, the tutorial does this:

1. Create a new container named dbstore that creates and mounts an anonymous volume (if you do not know what that means, look up "anonymous volumes" in the Docker documentation) to /dbdata in the container. The ultimate goal is to back up and restore the content of that anonymous volume. The full command in the tutorial is:

   docker run -v /dbdata --name dbstore ubuntu /bin/bash

2. Create a temporary container from the ubuntu:latest image, used to back up the volume. That temporary container is given two volume mounts: the one from the dbstore container (using --volumes-from dbstore) and a bind mount that maps the working directory of the host to /backup in the container. In the container, they run tar cvf /backup/backup.tar /dbdata. The full command is:

   docker run --rm --volumes-from dbstore -v $(pwd):/backup ubuntu tar cvf /backup/backup.tar /dbdata

3. To restore the backup, they run another temporary ubuntu container with the same volume mounts as the previous one. Inside the container, they need to run multiple commands, so they change the start command (CMD) to bash -c "..." to make this work. The full command is:

   docker run --rm --volumes-from dbstore2 -v $(pwd):/backup ubuntu bash -c "cd /dbdata && tar xvf /backup/backup.tar --strip 1"

Let's analyze the reasons why following that tutorial is not a good idea:

1. The tutorial assumes that your volume is mounted to a path at the root level of the container, e.g. /dbdata in the tutorial. In practice, most volumes are mounted into a deeply nested path, e.g. /var/lib/mysql for a MySQL container. So if you were to follow the Docker tutorial and simply replaced /dbdata with /var/lib/mysql, the directory structure in the created backup archive would start with the relative path var. The reason is that tar strips the leading / from member names, so the archive stores paths relative to the root, in this case var/lib/mysql/... When restoring the backup, the tutorial uses --strip 1, which tells tar to remove only the very first segment of the paths in the tar file. For the scenario of the tutorial this works fine: dbdata is stripped, and cd /dbdata made sure the working directory (the extraction destination) is set correctly. But for a MySQL backup, tar will try to extract a lib folder into /var/lib/mysql, so you end up with your backup data being restored to /var/lib/mysql/lib/…, which won't work. And since tar does not throw any errors in this situation, it looks as if the restore process had worked, even though it did not. (The sketch after this list reproduces the problem.)

2. The tutorial glosses over the fact that you should first delete the existing data prior to restoring a backup. Otherwise, the restore process only overwrites existing data, but leaves other (obsolete) data in place. This is not an issue if the target volume has just been created (and is therefore empty), but the situation is different if the target volume already contains data and the goal of the restore operation is to revert the volume to a previous state. Here, not deleting the old data first means that, after the restore has finished, your volume contains a mixture of data from before and after the restore point, which can confuse the software reading the restored data.
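You can reproduce the path problem from point 1 without setting up a real database. The following is a minimal sketch: the volume name demo_data and the file name testfile are made up for illustration, and I mount the volume directly via -v instead of using --volumes-from, which does not change how tar behaves:

```bash
# Create a throwaway volume and put one file into it at a nested path
docker volume create demo_data
docker run --rm -v demo_data:/var/lib/mysql ubuntu bash -c "echo hello > /var/lib/mysql/testfile"

# Back it up the way the tutorial suggests, adapted to /var/lib/mysql
docker run --rm -v demo_data:/var/lib/mysql -v "$(pwd):/backup" ubuntu \
  tar cvf /backup/demo-backup.tar /var/lib/mysql

# List the archive contents: the member names start with var/lib/mysql/...
# because tar removed the leading slash, so --strip 1 only removes "var"
tar tf demo-backup.tar
```

A restore with cd /var/lib/mysql && tar xvf ... --strip 1 therefore recreates lib/mysql/... inside /var/lib/mysql instead of restoring the files themselves.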
How to do it wrong: the HowToGeek tutorial

This tutorial is another one that pops up at the very top of the search results page of a big search engine. It basically adapts the above Docker tutorial to a realistic scenario, in which you want to back up the volume of a MySQL container. By default, the MySQL engine running in the official mysql image reads and writes its data under /var/lib/mysql. At the time of writing this article, the tutorial does this:

1. Create a new container named mysql with a volume named mysql_data bound to /var/lib/mysql. The full command of the tutorial is:

   docker run -d --name mysql -v mysql_data:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=mysql mysql:8

2. To create the backup: see step 2 of the Docker tutorial above; this tutorial does exactly the same, just using different mount paths. Full command:

   docker run --rm --volumes-from mysql -v $PWD:/backup-dir ubuntu tar cvf /backup-dir/mysql-backup.tar /var/lib/mysql

3. To restore the backup: see step 3 above. Full command:

   docker run --rm --volumes-from mysql -v $PWD:/backup-dir bash -c "cd /var/lib/mysql && tar xvf /backup-dir/mysql-backup.tar"

The HowToGeek tutorial has two problems:

1. Following the tutorial to the letter does not work. The underlying reason is similar to reason #1 from above: the directory structure of the produced mysql-backup.tar file starts with var. In the restore step, the author does not use the --strip argument for tar at all, which means that the restored directory structure of your backup ends up in /var/lib/mysql/var/lib/mysql, where it won't do any good (a sketch for checking where a restore actually landed follows below).

2. Same as reason #2 from above: the existing data in the volume is not deleted before the restore.

Also, both tutorials decided that wasting disk space is somehow a good idea, by using tar without any compression. No further questions, your honor.
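Because neither tar nor the database prints an obvious error when the files land in the wrong directory, it pays off to check where the restored data actually ended up before trusting a restore. A small sketch, assuming your volume is named mysql_data as in the tutorial: list the top directory levels of the volume from a throwaway container:

```bash
# Show the first two directory levels of the volume after a restore.
# A correctly restored MySQL data directory contains folders such as mysql and performance_schema;
# if you see /data/var/lib/mysql/... instead, the restore went to the wrong place.
docker run --rm -v mysql_data:/data ubuntu find /data -mindepth 1 -maxdepth 2 -type d
```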
How to correctly back up Docker volumes and restore them

In short, I recommend the following commands:

```bash
# Define the name of your volume, on macOS/Linux
VOLUME="replace with the name of your volume"
# Define the name of your volume, on Windows (PowerShell)
$VOLUME="replace with the name of your volume"

# Backup:
docker run --rm -v "${VOLUME}:/data" -v "${PWD}:/backup-dir" ubuntu tar cvzf /backup-dir/backup.tar.gz /data

# Restore:
docker run --rm -v "${VOLUME}:/data" -v "${PWD}:/backup-dir" ubuntu bash -c "rm -rf /data/{*,.*}; cd /data && tar xvzf /backup-dir/backup.tar.gz --strip 1"
```

You can of course exchange /data for any other destination in the container, e.g. /var/lib/mysql. If you do, adjust the --strip value so that it matches the number of path components of that destination (e.g. --strip 3 for /var/lib/mysql).

Note: do not be alarmed by the first two lines of the output of the restore command:

```
rm: refusing to remove '.' or '..' directory: skipping '/data/.'
rm: refusing to remove '.' or '..' directory: skipping '/data/..'
```

This output is generated by rm -rf /data/{*,.*}, which deletes all files in the volume, including files starting with a dot. It is basically a shorthand for running two commands: rm -rf /data/* (which deletes all files and folders except those starting with a dot) and rm -rf /data/.* (which deletes only files starting with a dot). Unfortunately, this means that rm also tries to delete the pseudo entries named . and .. (the first two lines you see whenever you run ls -la), which fails, because they are not real files and thus cannot be deleted.

Here are a few pointers regarding my solution, and why it works better than the discussed tutorials:

- We explicitly encode our knowledge about the volume to be backed up or restored into the backup and restore commands (in particular, the volume name and the path it is mounted to in the temporary container). This is safer than relying on --volumes-from, which hides the actual volumes (and their target paths) from you.
- We add the z flag to tar, to compress the archive with gzip, which avoids wasting space.
- We first delete the existing data in the volume, prior to restoring the tar.gz archive. Since the rm command exits with code 1 (because it cannot delete the pseudo entries), we need to use a semicolon instead of && between the rm and the tar xvzf command, because && stops executing after the first failing command.

I leave it as an exercise to the reader whether the following restore command would also have worked:

```bash
docker run --rm -v "${VOLUME}:/data" -v "${PWD}:/backup-dir" ubuntu bash -c "rm -rf /data/{*,.*}; tar xvzf /backup-dir/backup.tar.gz"
```

Making backups in production

The above scenario is really just a toy example for a one-off backup. Running the above command regularly (e.g. in a cron job) would produce a full backup each time, which would consume too much space over time, even with compression. There are better ways to do regular backups, using incremental and differential storage techniques. Also, you should store the backup in a remote location, not on the same file system where the data itself is stored. Fortunately, there are dedicated tools which support these kinds of compressed, incremental remote backups. Battle-tested tools include:

- Duplicity, which is also dockerized as wernight/duplicity (non-official, but well-established)
- Restic, dockerized as restic/restic (official)
- Borg, dockerized as horaceworblehat/borg-server (non-official, rather new)

A minimal Restic example follows after this list.
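To give you an idea of what this looks like in practice, here is a minimal sketch using the restic/restic image, assuming its entrypoint is the restic binary (so the container arguments are passed straight to restic). The volume name mysql_data, the local repository path, and the password are placeholders; in production you would point RESTIC_REPOSITORY at a remote location (e.g. SFTP or S3) and handle the password more carefully:

```bash
# Initialize the repository once (creates ./restic-repo on the host)
docker run --rm \
  -v "$PWD/restic-repo:/repo" \
  -e RESTIC_REPOSITORY=/repo \
  -e RESTIC_PASSWORD=example-password \
  restic/restic init

# Back up the volume; repeated runs are incremental thanks to restic's deduplication
docker run --rm \
  -v mysql_data:/data:ro \
  -v "$PWD/restic-repo:/repo" \
  -e RESTIC_REPOSITORY=/repo \
  -e RESTIC_PASSWORD=example-password \
  restic/restic backup /data
```

Restoring works along the same lines with the restic restore subcommand; consult the Restic documentation for the details.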
There are also higher-level tools that manage these tools, e.g. borgmatic or autorestic, but I recommend that you keep it simple and use your chosen backup tool directly, learning about its intricacies. High-level tools add even more complexity, of which there is already plenty!

Conclusion

Although the task of backing up a Docker volume (or restoring it) with nothing but tar seems simple, it is a great example of hidden complexity, where the devil is in the details. It is a signal to all DevOps folks out there: blindly following tutorials does not always work; you still need to understand the details. Whenever possible, I live by the motto "if I don't know why something works, I won't know how to fix it once it fails". Therefore, investing time into studying the tools further is time well spent. This scenario also demonstrates that just because you do not see an error message, it does not mean that the executed command worked correctly.

Did you run into similar issues? Let me know in the comments.
