Protect Research Data with Real-Time, Automated, Bidirectional Synchronization
date
Jun 29, 2024
slug
realtime-automatic-twoway-backups-of-research-data
status
Published
summary
We used GitHub private repositories for code backup, collaboration, and data synchronization. These repositories share all code crucial for replication within our teams. Moreover, Git enables us to trace previous changes. This incident underscores the importance of real-time synchronization and code backups.
tags
Academic
Engineering
Data Analysis
type
Post
Recently, our lab suffered a significant data loss caused by the default RAID 0 configuration of a series of SSD disks. Many professors and students lost invaluable research data. We, however, managed to preserve our data across multiple projects. Here's our approach, using Syncthing, Tailscale, and Git.
Sync via Syncthing
We require a lightweight, fast, and versatile synchronization tool to fulfill our needs. Syncthing, a free and open-source solution, is our top choice.
Syncthing is an open-source file synchronization tool that allows you to securely sync files between multiple devices over a local network or the internet without relying on a central server. It uses peer-to-peer communication and robust encryption to ensure data privacy and integrity. Syncthing is highly customizable, cross-platform, and easy to set up, making it a versatile solution for personal and professional use.
The installation process is straightforward and smooth. Please refer to the following: https://docs.syncthing.net/intro/getting-started.html.
Run
setsid syncthing
to start it in the background.
Network Topology
Basically, Syncthing forms a fully connected network in which any node can act as a host, a client, or a relay.
However, network security policies sometimes forbid traffic between internal and external networks. In this scenario, we use Tailscale to build a secure internal network to sync over.
On some machines that run inside Docker containers, such as NVIDIA H100 and A100 workstations, users cannot run
tailscaled
as a regular service because the service is not accessible. In this case, we can start Tailscale in two steps:
- Start tailscaled with user-space networking instead of a kernel TUN device:
sudo nohup tailscaled --tun=userspace-networking --socks5-server=localhost:1055 --outbound-http-proxy-listen=localhost:1055 >/dev/null 2>&1 &
- Bring Tailscale up:
tailscale up
Now the server has a brand-new internal IP address that is accessible across all devices, for example:
100.95.x.x
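As a rough check of that last step, the sketch below tests whether an address falls inside 100.64.0.0/10, the range Tailscale assigns addresses from, and formats it as a Syncthing device address. The helper names is_tailscale_ipv4 and syncthing_addr are hypothetical, and port 22000 assumes Syncthing's default sync port has not been changed.

```shell
# Return success if $1 is an IPv4 address inside 100.64.0.0/10.
# (is_tailscale_ipv4 is a hypothetical helper, not part of Tailscale.)
is_tailscale_ipv4() {
  # Reject empty input and anything that is not digits separated by single dots.
  case "$1" in
    "" | *[!0-9.]* | *..* | .* | *.) return 1 ;;
  esac
  # Split the address on dots into the positional parameters.
  old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$old_ifs
  [ "$#" -eq 4 ] || return 1
  # 100.64.0.0/10 covers 100.64.0.0 through 100.127.255.255.
  [ "$1" -eq 100 ] && [ "$2" -ge 64 ] && [ "$2" -le 127 ] &&
    [ "$3" -le 255 ] && [ "$4" -le 255 ]
}

# Print the address to enter for a peer in Syncthing's device settings.
# Assumption: the peer listens on Syncthing's default sync port, 22000.
syncthing_addr() {
  is_tailscale_ipv4 "$1" || return 1
  printf 'tcp://%s:22000\n' "$1"
}

# Example (requires a running tailscaled):
#   syncthing_addr "$(tailscale ip -4)"
```

With user-space networking, outbound connections from this machine to other Tailscale nodes go through the SOCKS5 proxy on localhost:1055 configured above. Syncthing's documentation describes an all_proxy environment variable for SOCKS5 proxies, so starting it as all_proxy=socks5://localhost:1055 setsid syncthing may be needed, depending on which side initiates the connection.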
Git Collaboration
We used GitHub private repositories for code backup, collaboration, and data synchronization. These repositories share all code crucial for replication within our teams. Moreover, Git enables us to trace previous changes.
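A minimal sketch of how such a backup could be scripted, assuming a repository that already has its GitHub remote configured under the name origin. The function name snapshot_repo and the commit message format are hypothetical choices for illustration, not our exact workflow.

```shell
# Commit every pending change in a repository, then push if a remote
# named "origin" exists. (snapshot_repo is a hypothetical helper name.)
snapshot_repo() {
  repo="$1"
  # Nothing to do when the working tree is clean.
  if [ -z "$(git -C "$repo" status --porcelain)" ]; then
    return 0
  fi
  # Stage additions, modifications, and deletions, then commit with a UTC timestamp.
  git -C "$repo" add -A &&
    git -C "$repo" commit -q -m "snapshot: $(date -u +%Y-%m-%dT%H:%M:%SZ)" ||
    return 1
  # Push only when a remote is configured, so the function also works offline.
  if git -C "$repo" remote get-url origin >/dev/null 2>&1; then
    git -C "$repo" push -q origin HEAD
  fi
}

# Example:
#   snapshot_repo "$HOME/projects/my-paper"
```

To trace earlier changes to a single file afterwards, git log --follow -p -- path/to/file shows its history together with the diffs.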
In conclusion, this incident underscores the importance of real-time synchronization and code backups.