Backing up a nextcloud file share via webdav

The first question after having set up a NAS: what could we use it for? Mine is already running a timemachine server, but that’s barely making a dent in the disk space utilization of the system. Well, the one entity that stores almost all of our pictures, music, documents, and everything else is a nextcloud instance that is synced across all relevant devices. It has its own existing backup strategy, and I excluded it from the Mac’s timemachine scope for that reason - but now that we have our own storage option, why not add a backup path there? One more replica, one more physical location, one more fallback option.

Accessing the nextcloud share

The two obvious options for accessing the nextcloud file share that we are using are the nextcloud-cmd client, and the webdav protocol used by nextcloud to enable filesystem-style access. Normally the official nextcloud client would be the preferred option, but there are two major snags in this case:

  • the nextcloud client comes with UI support and pulls in hundreds of megabytes of related packages that we neither need nor want on the NAS machine
  • as far as I could see, the client offers no option to define a one-way sync (mirroring) setup - but in the context of a backup solution, I definitely do not want any propagation from backup to server

So webdav to the rescue…? The good part is that webdav shares can be mounted directly into the file system via davfs2 - and once that is in place, we can mount the nextcloud share read-only and use all kinds of filesystem-based tools for synchronization.
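
To give an idea of what that looks like, here is a minimal sketch of a read-only davfs2 mount - the URL, user name and mount point are placeholders, not my actual setup:

# /etc/fstab - nextcloud's webdav endpoint, mounted read-only
https://cloud.example.com/remote.php/dav/files/USER/  /mnt/nextcloud  davfs  ro,noauto  0  0

# /etc/davfs2/secrets - credentials for the share (ideally an app password)
https://cloud.example.com/remote.php/dav/files/USER/  USER  APP-PASSWORD

mount /mnt/nextcloud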

Choosing a file synchronizer tool

The bad part about webdav is that it’s not fast. At all. The nextcloud share in question contains >160 GB of data, distributed across many tens of thousands of files. The first synchronization attempt with rsync turned out to be a non-starter, as even comparing existing files took upwards of half a second per file in my setup, and the usual performance optimization options didn’t help at all. There are other options for file synchronization, however - for example the venerable unison file synchronizer. I’ve been using unison in various contexts over the years, so this is what I tried next. And it did not disappoint! After a looong initial synchronization run, subsequent daily syncs take just a couple of minutes. With that, we have a way to access the nextcloud file share and a tool that works reasonably well for performing the actual synchronization.
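
For a flavor of how the one-way mirroring looks on the command line, something along these lines does the job (the paths stand in for the webdav mount and the local backup copy):

# mirror the read-only webdav mount into the local backup copy;
# -force makes the first root win every difference, -batch skips all prompts
unison /mnt/nextcloud /mnt/raid1/nextcloud -force /mnt/nextcloud -batch -times

In the actual setup these preferences live in a unison profile instead - more on that below.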

Backup strategy…

With the basics in place, the next step is deciding on a backup strategy. Ours is very uninspired:

  • I want a daily synchronization from the nextcloud file server to a local copy
  • this daily copy should be mirrored to a weekly copy once every week (duh)
  • this weekly copy gets archived to “long-term storage” tarballs once a month

For the first two activities I will use unison; the third will be handled by good old tar + compression.
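
The monthly archiving step boils down to something like this - the directory names and layout are purely illustrative:

# roll the weekly copy into a dated, compressed tarball for long-term storage
tar -czf /mnt/raid1/archive/nextcloud-$(date +%Y-%m).tar.gz -C /mnt/raid1 nextcloud-weekly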

… and implementation

The most direct way to implement the desired backup actions would be crond on the NAS host Linux system. But that is too simple, right? Also, I am trying to keep the configuration of the host system to a minimum, to reduce the effort of re-creating these systems in the future. So the plan is to use crond inside a minimal docker container, which mounts the source and backup file system folders of the host system. This way, reinstating the backup setup is as simple as cloning the docker build files from github and running docker-compose up.
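
In other words, disaster recovery for the backup machinery itself should never be more than this (the repository location is a placeholder):

# clone the build files and bring the backup container up
git clone https://github.com/<user>/<backup-repo>.git && cd <backup-repo>
docker-compose build
docker-compose up -d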

The Swearing

So far so simple - minimal docker base images usually mean Alpine Linux, and setting this up with unison, unison profiles and a simple crond configuration is a matter of minutes. Or so I thought. It turned out that I could do whatever I wanted - the Alpine (busybox) crond implementation didn’t care. It happily accepts my crontab configuration. It starts and runs fine. But it doesn’t lift a finger to actually execute what’s defined in the crontab. I tried removing the newfangled /etc/periodic stuff, tried all kinds of simplistic example configurations, even looked through the busybox crond code a little bit - no joy. But looking through that code prompted me to try the /etc/periodic approach, which, as it turned out, works! What the heck… So I decided to forgo the option to precisely define the starting time of my backup scripts and use /etc/periodic instead - at that point I had already spent three evenings trying to get busybox crond to work and wanted to move on.
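
For reference, the stock Alpine crontab already wires those directories up via run-parts - it looks roughly like this (copied from a default Alpine image; the exact timings may differ between versions):

# min	hour	day	month	weekday	command
*/15	*	*	*	*	run-parts /etc/periodic/15min
0	*	*	*	*	run-parts /etc/periodic/hourly
0	2	*	*	*	run-parts /etc/periodic/daily
0	3	*	*	6	run-parts /etc/periodic/weekly
0	5	1	*	*	run-parts /etc/periodic/monthly

So the backup scripts just need to land in the matching subdirectory (e.g. /etc/periodic/daily) and be executable; run-parts typically skips file names with an extension, so the scripts carry none.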

The Result

The entire backup container project can be found on github. The upshot is that this docker-compose file allows building the container image with a single docker-compose build command:

version: '3.7'

services:
  nxtcldbackup:
    container_name: nxtcld-backup
    build:
      context: .
      dockerfile: alpine.dockerfile
    init: true
    environment:
      - SOURCE=/mnt/source
      - BACKUP=/mnt/backup
    volumes:
      - unison-conf:/root/.unison
      - /mnt/nextcloud:/mnt/source
      - /mnt/raid1/nextcloud:/mnt/backup

volumes:
  unison-conf:

There is a reason why I chose to handle the unison configuration directory (/root/.unison) via a docker volume: this is where unison keeps its synchronization data, so it is desirable to keep that directory persistent across container restarts. At the same time, I didn’t want these synchronization status files to pollute my host file system, which is why I decided to go with a docker volume and add the unison profiles at container build time:

FROM alpine:latest

RUN apk add --update rsync unison && rm -rf /var/cache/apk/*

COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

COPY periodic /etc/periodic
RUN chmod -R +x /etc/periodic

# This directory holds unison config profiles, but also the unison sync
# reports/datasets. To have this work properly (profiles should be there,
# but also the sync reports should be persistent across container restarts),
# this directory should be done as a volume-mount when running the container.
# ATTENTION: to make a rebuild pick up changes here, remove this volume manually!
#   docker volume rm nxtcldbackup_unison-conf
RUN mkdir /root/.unison
COPY unison/* /root/.unison/

ENTRYPOINT ["/entrypoint.sh"]

# -f | Foreground
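# -d 8 | log level 8 (to stderr), -c | directory holding the crontab files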
CMD ["crond", "-f", "-d", "8", "-c", "/etc/crontabs"]

As stated in the comment above, some care needs to be taken when trying to update/change the unison profile configuration on a system where a unison-conf docker volume already exists: rebuilding the container will not change the files in that volume, so in that case either remove the existing volume with docker volume rm nxtcldbackup_unison-conf, or copy the profile files into the running container (docker cp foo.txt container_id:/foo.txt).

The unison profiles and crond scripts are very straightforward, and can be found in the github repository.
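
For a rough idea of what such a profile looks like, here is a minimal one-way mirror sketch using the container-internal paths - the preference names are real unison options, but the actual profiles in the repository are the authoritative version:

# /root/.unison/daily.prf - mirror the (read-only) source into the backup copy
root = /mnt/source
root = /mnt/backup

# one-way sync: the source side wins every difference
force = /mnt/source

# run non-interactively from cron, preserve modification times
batch = true
auto = true
times = true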


Waymarks

  • I am 66% certain that busybox crond is currently broken, as it steadfastly ignores crontab configuration other than the /etc/periodic hooks
  • unison really deserves more attention - it is robust, reliable and fast; the only thing I’m missing is a configuration option to determine the storage location of the synchronization database files