Protect Research Data with Real-Time, Automated, Bidirectional Synchronization

date
Jun 29, 2024
slug
realtime-automatic-twoway-backups-of-research-data
status
Published
summary
We used GitHub private repositories for code backup, collaboration, and data synchronization. These repositories share all code crucial for replication within our teams. Moreover, Git enables us to trace previous changes. This incident underscores the importance of real-time synchronization and code backups.
tags
Academic
Engineering
Data Analysis
type
Post
Recently, our lab suffered a significant data loss caused by the default RAID 0 configuration of an SSD array. RAID 0 stripes data across disks without redundancy, so a single failed disk can take the whole array with it. Many professors and students lost invaluable research data, but we preserved ours across multiple projects. Here is our approach, using Syncthing, Tailscale, and Git.

Sync via Syncthing

We needed a lightweight, fast, and versatile synchronization tool, and Syncthing is our top choice.
Syncthing is a free, open-source file synchronization tool that securely syncs files between multiple devices over a local network or the internet without relying on a central server. It uses peer-to-peer communication and encrypts traffic between devices to protect data privacy and integrity. It is cross-platform, highly customizable, and easy to set up, which suits both personal and professional use.
Installation is straightforward; see the getting-started guide: https://docs.syncthing.net/intro/getting-started.html.
Run setsid syncthing to start it in the background.
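The background start can be wrapped in a small script. The following is a minimal sketch, not our exact setup: the helper names start_syncthing and syncthing_gui_up and the log path are hypothetical, and the GUI address assumes Syncthing's default of 127.0.0.1:8384.

```shell
#!/usr/bin/env bash
# Start Syncthing detached from the terminal and check that its web GUI answers.
# start_syncthing and syncthing_gui_up are hypothetical helper names.

# Default GUI address of a stock Syncthing install; adjust if yours differs.
SYNCTHING_GUI="${SYNCTHING_GUI:-http://127.0.0.1:8384}"

start_syncthing() {
    # Log path is an assumption; pass another path as the first argument.
    local log_file="${1:-$HOME/syncthing.log}"
    if ! command -v syncthing >/dev/null 2>&1; then
        echo "syncthing not found in PATH" >&2
        return 1
    fi
    # setsid runs the process in its own session, so closing the terminal
    # does not stop it; --no-browser avoids opening a browser on a server.
    setsid syncthing --no-browser >"$log_file" 2>&1 </dev/null &
}

syncthing_gui_up() {
    # Exit status 0 means something answered HTTP on the GUI address.
    curl --silent --output /dev/null --max-time 5 "$SYNCTHING_GUI"
}
```

After sourcing the script, start_syncthing launches the daemon and, a few seconds later, syncthing_gui_up && echo "running" confirms that the GUI is reachable.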

Network Topology

Basically, our setup is a fully connected network in which any node can act as a host, client, or relay.
However, network security policies sometimes forbid traffic between internal and external networks. In that case, we use Tailscale to build a secure private network to sync over.
On some machines that run inside Docker containers, such as NVIDIA H100 and A100 workstations, users cannot run tailscaled as a system service, because the service is not accessible. In this case, we can start Tailscale in two steps:
  1. Start tailscaled with a user-space network (tun) device: sudo nohup tailscaled --tun=userspace-networking --socks5-server=localhost:1055 --outbound-http-proxy-listen=localhost:1055 >/dev/null 2>&1 &
  2. Bring the node up: tailscale up
The server now has a brand-new internal IP address such as 100.95.x.x, which is reachable from all devices on the Tailscale network.
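To confirm that the node really joined the Tailscale network, one option is to check the address it reports. The sketch below assumes the tailscale CLI is on the PATH and can reach the running tailscaled; is_tailscale_ip and check_tailscale are hypothetical helper names. Tailscale assigns IPv4 addresses from the 100.64.0.0/10 range, which is what the first helper tests.

```shell
#!/usr/bin/env bash
# is_tailscale_ip (hypothetical helper): succeeds if an IPv4 address lies in
# 100.64.0.0/10, i.e. first octet 100 and second octet between 64 and 127.
is_tailscale_ip() {
    local ip="$1" a b c d
    IFS=. read -r a b c d <<<"$ip"
    [[ "$a" == 100 && "$b" =~ ^[0-9]+$ ]] || return 1
    # 10# forces base 10 so a leading zero is not read as octal.
    (( 10#$b >= 64 && 10#$b <= 127 ))
}

# check_tailscale (hypothetical helper): ask the CLI for this node's IPv4
# address and verify that it falls in the expected range.
check_tailscale() {
    local ip
    ip="$(tailscale ip -4 2>/dev/null)" || return 1
    if is_tailscale_ip "$ip"; then
        echo "tailnet address: $ip"
    else
        echo "unexpected address: $ip" >&2
        return 1
    fi
}
```

Note that with user-space networking, outgoing connections from the container to other Tailscale nodes are expected to go through the SOCKS5 or HTTP proxy on localhost:1055 configured in step 1, so client programs may need their proxy settings pointed there.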

Git Collaboration

We use GitHub private repositories for code backup, collaboration, and data synchronization. These repositories hold all the code needed for replication and are shared within our teams. Git also lets us trace previous changes.
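A routine snapshot to the private repository could look like the sketch below. It is an illustration under assumptions, not a description of our exact workflow: snapshot_repo is a hypothetical helper name, the default commit message is made up, and the push step assumes a remote called origin already points at the private GitHub repository.

```shell
#!/usr/bin/env bash
# snapshot_repo (hypothetical helper): commit all pending changes in a
# repository and push them if a remote named "origin" is configured.
snapshot_repo() {
    local repo_dir="${1:-.}"
    local message="${2:-snapshot $(date -u +%Y-%m-%dT%H:%M:%SZ)}"

    git -C "$repo_dir" add --all || return 1

    # `git diff --cached --quiet` exits 0 when nothing is staged.
    if git -C "$repo_dir" diff --cached --quiet; then
        echo "nothing to commit"
        return 0
    fi

    git -C "$repo_dir" commit --quiet -m "$message" || return 1

    # Push only when an "origin" remote exists (assumed to be the private repo).
    if git -C "$repo_dir" remote get-url origin >/dev/null 2>&1; then
        git -C "$repo_dir" push --quiet origin HEAD
    fi
}
```

Calling snapshot_repo ~/projects/my-study "update analysis scripts" (a hypothetical path and message) records the current state; every such commit stays available in the history through git log.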
In conclusion, this incident underscores the importance of real-time synchronization and code backups.

© Rongxin 2021 - 2024