Gentoo Forums
RAID vs rsync. Your preferences, experiences?
pjp
Administrator
Administrator


Joined: 16 Apr 2002
Posts: 18556

PostPosted: Fri Aug 21, 2020 4:19 pm    Post subject: RAID vs rsync. Your preferences, experiences? Reply with quote

For home use, RAID seems like a layer of complexity that I'm not sure is worth the benefit. Although I do use LVM sometimes for the convenience of more flexible management of volumes. RAID and LVM is even more complexity, and I'm unsure about LVM's built-in RAID abilities. Ultimately it is still two layers of complexity.

For a bulk data repository using HDD (not OS, other than backups), I was planning to do a two disk mirror, possibly extending it to a third disk to have two mirrored copies.

But now I'm leaning toward rsync. The main disadvantage I see with rsync would be the third copy and more reads causing extra wear on the source disk. I haven't used batch mode, so it isn't immediately clear that would address that concern.

Any thoughts or other solutions?
_________________
Your lips move, but I can't hear what you're saying.
alamahant
Guru
Guru


Joined: 23 Mar 2019
Posts: 550

PostPosted: Fri Aug 21, 2020 4:32 pm    Post subject: Reply with quote

Rsync is perfect.
I have an rsync invocation in my daily update script, before the emerge part.
So in case something goes wrong with the state of my machine post-update, I just revert.
I use this formula
Code:

rsync -aAXv --delete --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found","/home/<user>/shared/*","/home/<user>/ssd/*","/boot/efi/*"} / /mnt/

I mount an LVM partition on /mnt.
I love it.
:D
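
One caveat with a `--delete` sync into /mnt is that it assumes the backup volume really is mounted there. A minimal sketch of a guard, assuming the `mountpoint` utility from util-linux is available; the function name `backup_target_ready` is made up for illustration and is not part of the script quoted above:

```shell
#!/bin/sh
# Hypothetical guard: succeed only if TARGET exists and is a mount point,
# so a forgotten "mount" cannot turn the sync into a copy onto the root disk.
backup_target_ready() {
    target=$1
    if [ ! -d "$target" ]; then
        echo "no such directory: $target" >&2
        return 1
    fi
    if ! mountpoint -q "$target"; then
        echo "$target is not a mount point, refusing to sync" >&2
        return 1
    fi
    return 0
}

# Usage sketch (not executed here):
#   backup_target_ready /mnt && rsync -aAXv --delete --exclude=... / /mnt/
```

The check costs nothing on a normal run and only gets in the way when the target is missing, which is exactly when the sync should not happen.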
Anon-E-moose
Advocate
Advocate


Joined: 23 May 2008
Posts: 4866
Location: Dallas area

PostPosted: Fri Aug 21, 2020 4:34 pm    Post subject: Reply with quote

My NAS/workstation (small) has two USB3 RAID boxes (4 TB, dual drive, mirrored): one for backups of all Linux machines, portage, distfiles, pkgs, etc.; the other for Windows backups and media (music and movies). I use rsync to transfer the backups. I just retired two older RAID boxes (drive max of 3 TB) and merged them into one of the new 4 TB ones. All 4 of the retired drives (WD Red 2 TB) have no problems (after several years of daily backups), and I'll repurpose them for some type of storage. Given the lifetime of drives now (HDD and SSD), I wouldn't worry about any problems related to rsync and longevity.
_________________
PRIME x570-pro, 3700x, RX 550 - 5.8 zen kernel
Acer E5-575 (laptop), i3-7100u - i965 - 5.5 zen kernel
---both---
gcc 9.3.0, profile 17.1 (no-pie & modified) amd64-no-multilib, eudev, openrc, openbox, palemoon
Banana
l33t
l33t


Joined: 21 May 2004
Posts: 649
Location: Germany

PostPosted: Fri Aug 21, 2020 5:47 pm    Post subject: Reply with quote

have a look at http://moo.nac.uci.edu/~hjm/parsync/
_________________
My personal space
steve_v
Apprentice
Apprentice


Joined: 20 Jun 2004
Posts: 177
Location: New Zealand

PostPosted: Fri Aug 21, 2020 7:11 pm    Post subject: Re: RAID vs rsync. Your preferences, experiences? Reply with quote

pjp wrote:
The main disadvantage I see with rsync would be the third copy and more reads causing extra wear on the source disk.
The main disadvantage I would see with rsync is that when a disk fails, anything accessing it falls on its face until redirected to one of the rsync mirrors somehow. With RAID that would be entirely transparent to applications.

pjp wrote:
Any thoughts
Immediate thought: Rsync is for backups and replication, RAID is for keeping things running until you can swap out failed hardware. Maybe I'm misconstruing your intent though.
If what you want is [network]replication and/or snapshots, rather than uptime and redundancy, rsync is certainly the more flexible solution.
If you want all of the above in one solution there's always ZFS, which I hereby shamelessly plug yet again because it's awesome.
_________________
Once is happenstance. Twice is coincidence. Three times is enemy action. Four times is Official GNOME Policy.
pjp
Administrator
Administrator


Joined: 16 Apr 2002
Posts: 18556

PostPosted: Fri Aug 21, 2020 10:52 pm    Post subject: Reply with quote

alamahant wrote:
Rsync is perfect.
I have an rsync invocation in my daily update script, before the emerge part.
So in case something goes wrong with the state of my machine post-update, I just revert.
I use this formula
Code:

rsync -aAXv --delete --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found","/home/<user>/shared/*","/home/<user>/ssd/*","/boot/efi/*"} / /mnt/

I mount an LVM partition on /mnt.
I love it.
:D
Have you tried using LVM snapshots and restoring from those? I tested it briefly in a VM once. It seemed to work, but I needed to refine the process for repeatability. More specifically, I think there was a minor issue with not reverting to the previous kernel.

For the issue specifically relating to this thread, rsync would be used for "whole disk" syncing within a single host. Although eventually I'll be updating from at least one other host to parts of the primary target disk, which would then do the whole disk sync.


Anon-E-moose wrote:
I wouldn't worry about any problems related to rsync and longevity.
Other than human error, I'm not. But /path/dir/ vs /path/dir is really annoying. I'll have to come up with some means to test any updates to prevent blowing away the entire disk. That seems more fragile than RAID.


@Banana:

Interesting, thanks. They seem to slightly discourage copying within the same host, but they do suggest fpsync, which is part of fpart, so that may be useful.


steve_v wrote:
The main disadvantage I would see with rsync is that when a disk fails, anything accessing it falls on its face until redirected to one of the rsync mirrors somehow. With RAID that would be entirely transparent to applications.
At least initially, I don't think that will be a big issue. Other than the rsync mirroring within the host, and possibly some automated syncs from clients, I don't think that is going to be a big problem. In some ways, that may be better. If monitoring isn't working well enough, the disk failure could be missed for a longer period of time. Or at least I'd like to think I'd notice the lack of response. But that would only be the case if the primary disk failed. Hmm. I'll have to think about that some more as it relates to monitoring. Good reminder.

steve_v wrote:
Immediate thought: Rsync is for backups and replication, RAID is for keeping things running until you can swap out failed hardware. Maybe I'm misconstruing your intent though.
If what you want is [network]replication and/or snapshots, rather than uptime and redundancy, rsync is certainly the more flexible solution.
If you want all of the above in one solution there's always ZFS, which I hereby shamelessly plug yet again because it's awesome.
Others disagree, but I consider RAID to be the first backup. A type of "hot" backup. If you have a single disk and it fails, well, enterprise backup solutions often have holes in them. In that situation, RAID doesn't protect against accidental deletions or any other live activity such as a virus. That's where other backups come into play. As in any situation, "what is the risk you are protecting against"?

In my case, service availability isn't the top priority. If a disk fails, then I need to fix that, or at least have the disk on the way (and why I'm thinking of having the 2 mirror disks).

At least with my initial expectations, I have no plans of snapshots using rsync. I don't think rsync alone is the correct tool for the job, and I don't know that I care to try customizing a tool above it.

Eventually my plan is to test a system with ZFS (I prefer it), but I'm hesitant to deal with the kernel patching. I had originally wanted to do that using the disks I'm talking about in this thread, but various roadblocks keep getting in the way, so I'm just "getting it done" and will figure out what the next iteration looks like. I'm even considering one of the "minis" from iXsystems.


Thanks for the feedback!
_________________
Your lips move, but I can't hear what you're saying.
Anon-E-moose
Advocate
Advocate


Joined: 23 May 2008
Posts: 4866
Location: Dallas area

PostPosted: Fri Aug 21, 2020 11:18 pm    Post subject: Reply with quote

pjp wrote:
Anon-E-moose wrote:
I wouldn't worry about any problems related to rsync and longevity.
Other than human error, I'm not. But /path/dir/ vs /path/dir is really annoying. I'll have to come up with some means to test any updates to prevent blowing away the entire disk. That seems more fragile than RAID.


Yeah, I agree about the whole [dir|dir/] thing, although I suppose it's the way it is because you can make it do one of two things depending on the trailing /.


So what I do is whenever I'm not sure about what will happen, I use the "-n" dry-run flag, it will show you what rsync would do.
_________________
PRIME x570-pro, 3700x, RX 550 - 5.8 zen kernel
Acer E5-575 (laptop), i3-7100u - i965 - 5.5 zen kernel
---both---
gcc 9.3.0, profile 17.1 (no-pie & modified) amd64-no-multilib, eudev, openrc, openbox, palemoon
pjp
Administrator
Administrator


Joined: 16 Apr 2002
Posts: 18556

PostPosted: Sat Aug 22, 2020 1:26 am    Post subject: Reply with quote

That it can is fine, but it reminds me of being able to rm -rf / without intentionally meaning to do so. I believe that's been updated to not allow that. I use -n as well, but its output isn't as blatantly obvious as I'd like. I just need to memorize the difference in some explicit manner, then pause to consider before each run. Also, I've rarely used rsync, so using it as a common tool is still pretty new to me and why I don't consider it a suitable replacement for scp. Unfortunately sftp isn't either.
_________________
Your lips move, but I can't hear what you're saying.
Anon-E-moose
Advocate
Advocate


Joined: 23 May 2008
Posts: 4866
Location: Dallas area

PostPosted: Sat Aug 22, 2020 9:46 am    Post subject: Reply with quote

I find rsync less confusing (on what it's doing) if I use the -i flag

Code:
$ head root-link.log
.d..t...... ./
*deleting   etc/portage/patches/media-video/makemkv/notify_linux.patch.14.2
*deleting   etc/portage/patches/media-video/makemkv/configure.patch.14.6
*deleting   etc/portage/patches/media-video/mkvtoolnix/qt5-m4.patch.old
*deleting   etc/portage/patches/media-video/mkvtoolnix/qt5-configure.patch.old
*deleting   etc/portage/patches/media-video/mkvtoolnix/qt-disable-dbus.patch
*deleting   etc/portage/patches/media-video/mkvtoolnix/configure.patch.old
.d..t...... bin/
.d..t...... dev/



Quote:
--itemize-changes, -i output a change-summary for all updates
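
Building on the itemized output above: one way to catch a trailing-slash mistake before it deletes anything is to count the `*deleting` lines of a dry run and refuse to continue past a threshold. A rough sketch; the helper names and the threshold are assumptions for illustration. rsync also has a built-in `--max-delete=NUM` option that serves a similar purpose.

```shell
#!/bin/sh
# Count "*deleting" entries in rsync --itemize-changes output read from stdin.
count_deletions() {
    awk '/^\*deleting/ { n++ } END { print n + 0 }'
}

# Succeed only if the itemized log on stdin has at most LIMIT deletions.
deletions_within_limit() {
    limit=$1
    n=$(count_deletions)
    [ "$n" -le "$limit" ]
}

# Usage sketch (not executed here):
#   rsync -naix --delete DirA/ nas::tmp/test | deletions_within_limit 10 \
#       && rsync -aix --delete DirA/ nas::tmp/test
```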



Edit to add: Trailing slash vs none
Trailing slash says copy all files under what 1st arg points to
No trailing slash says copy what 1st arg points to (including 1st arg name)

DirA/
    File1
    File2

rsync -aix DirA/ nas::tmp/test

this would copy File1 and File2 to test but not DirA

rsync -aix DirA nas::tmp/test

this would copy DirA w/Files to test

and using -naix it would clearly show the above.
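
The rule above can be written down as a tiny helper that predicts where a directory's files will land. This only models the trailing-slash rule for a directory source and is not something rsync itself provides; `rsync_landing_dir` is a made-up name.

```shell
#!/bin/sh
# Predict the directory that will receive SRC's files for "rsync -a SRC DEST".
# Models only the trailing-slash rule on SRC (a directory); DEST's slash is ignored.
rsync_landing_dir() {
    src=$1
    dest=${2%/}
    case $src in
        */) printf '%s\n' "$dest" ;;                          # contents only
        *)  printf '%s/%s\n' "$dest" "$(basename "$src")" ;;  # DirA itself is created
    esac
}

rsync_landing_dir DirA/ /tmp/test   # prints /tmp/test
rsync_landing_dir DirA  /tmp/test   # prints /tmp/test/DirA
```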
_________________
PRIME x570-pro, 3700x, RX 550 - 5.8 zen kernel
Acer E5-575 (laptop), i3-7100u - i965 - 5.5 zen kernel
---both---
gcc 9.3.0, profile 17.1 (no-pie & modified) amd64-no-multilib, eudev, openrc, openbox, palemoon


Last edited by Anon-E-moose on Sat Aug 22, 2020 10:51 am; edited 1 time in total
krinn
Watchman
Watchman


Joined: 02 May 2003
Posts: 7447

PostPosted: Sat Aug 22, 2020 9:47 am    Post subject: Reply with quote

I think you are mistaking RAID for backup.
Mirroring is not meant as a backup, but as protection against failure.
The basic idea is to have disk1, disk2 or disk3 available to repair the array so it matches the other disks' content. This "might" be seen as a backup solution, but it's not a real one.
If you overwrite file1, file1 will be overwritten on all disks, and there's no way to recover it; that's a backup task, not a RAID task.
However, unlike a backup, if you were working on file1 and someone shoots your computer, a backup will only help you recover the latest saved version of file1, while RAID will help restore it to the freshest version possible.

Or are you saying to use RAID on the backup source as a safeguard against backup errors?
Naib
Watchman
Watchman


Joined: 21 May 2004
Posts: 5847
Location: Removed by Neddy

PostPosted: Sat Aug 22, 2020 10:56 am    Post subject: Reply with quote

krinn wrote:
I think you are mistaking RAID for backup.
Mirroring is not meant as a backup, but as protection against failure.
The basic idea is to have disk1, disk2 or disk3 available to repair the array so it matches the other disks' content. This "might" be seen as a backup solution, but it's not a real one.
If you overwrite file1, file1 will be overwritten on all disks, and there's no way to recover it; that's a backup task, not a RAID task.
However, unlike a backup, if you were working on file1 and someone shoots your computer, a backup will only help you recover the latest saved version of file1, while RAID will help restore it to the freshest version possible.

Or are you saying to use RAID on the backup source as a safeguard against backup errors?

Bingo! I was about to post this. RAID is not a backup. RAID is a layer of data protection between backup cycles (or for speed).

So the question is ... what do you want? Redundancy or backup? Rsync is great for backup, especially over ssh to back up a headless box, for instance. For redundancy... sure, RAID or one of the fancier filesystems (ZFS, btrfs).
_________________
The best argument against democracy is a five-minute conversation with the average voter
Great Britain is a republic, with a hereditary president, while the United States is a monarchy with an elective king
sitquietly
Tux's lil' helper
Tux's lil' helper


Joined: 23 Oct 2010
Posts: 104
Location: On the Wolf River, Tennessee

PostPosted: Sat Aug 22, 2020 5:50 pm    Post subject: Re: RAID vs rsync. Your preferences, experiences? Reply with quote

steve_v wrote:
The main disadvantage I would see with rsync is that when a disk fails, anything accessing it falls on its face until redirected to one of the rsync mirrors somehow. With RAID that would be entirely transparent to applications ..... Rsync is for backups and replication, RAID is for keeping things running until you can swap out failed hardware ..... there's always ZFS, ... it's awesome.


I had once set up a system as the OP suggested, with the redundant disks being updated via rsync rather than being kept in sync via RAID. It was very flexible. But I consider an on-machine backup to be no backup at all and always back everything up to other hosts (i.e. backup servers). So, given that a network backup must always be kept (everything on my work computers must be backed up elsewhere), the rsync'ed drive didn't take advantage of its potential.

ZFS was the perfect use for multiple data disks! Now I keep all of my working stuff on a zfs mirror and the backup on another computer via rsync. I've been through several OS changes (FreeBSD -> Debian -> Gentoo/Calculate) and my files have been always available and always safe. I also always keep the OS on its own ssd in a hot swap bay. Operating Systems change all the time but my data lives forever! A workstation may not need raid for the OS -- if that disk dies a complete re-install and restore from backup would take very little time. I've never actually had an ssd die. I've got an Intel X25-E 32 gb SLC ssd from 2009 still in service. It has seen a lot of throughput in the past decade. And a SanDisk Extreme 120 gb ssd from 2012. And Samsung and Crucial ssd's from 2014 to 2019. I try to kill them compiling software for Gentoo, FreeBSD, and OpenBSD but they have kept working.
pjp
Administrator
Administrator


Joined: 16 Apr 2002
Posts: 18556

PostPosted: Sat Aug 22, 2020 10:49 pm    Post subject: Reply with quote

@Anon-E-moose:

Thanks for the mention of itemize. I think I'd seen that before but had forgotten about it.

I think the major risk is with tab completion, which adds the trailing /.

So /data/adir to /newplace/bdir/ would work as expected (bdir/adir), but an accidental adir/ could create a mess, especially if using --delete.

I just need to be careful and specific about what I intend to do. The time I mentioned wiping out a bunch of data was before I realized that behavior. Fortunately nothing significant was lost.


I completed temporary copies to two different drives within the same system. The downside is that it was very slow. The second copy summary is as follows:
Code:
sent 624.67G bytes  received 30.10M bytes  19.64M bytes/sec
total size is 707.03G  speedup is 1.13
Unfortunately I don't think I have any crossover cables, so I'm not looking forward to seeing how slow it goes over the network.
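
For a sense of scale, the summary above can be turned into elapsed time. rsync reports the rate as (sent + received) / elapsed and the speedup as total size / (sent + received). A quick check with awk, assuming the G and M suffixes are decimal (10^9 and 10^6), which may not match how this rsync build printed them:

```shell
#!/bin/sh
# Elapsed seconds implied by rsync's summary: (sent + received) / rate.
transfer_seconds() {
    awk -v s="$1" -v r="$2" -v rate="$3" 'BEGIN { printf "%d\n", (s + r) / rate }'
}

# Speedup as rsync defines it: total size / (sent + received).
transfer_speedup() {
    awk -v total="$1" -v s="$2" -v r="$3" 'BEGIN { printf "%.2f\n", total / (s + r) }'
}

secs=$(transfer_seconds 624670000000 30100000 19640000)
echo "elapsed: ${secs}s"                                            # elapsed: 31807s
awk -v t="$secs" 'BEGIN { printf "about %.1f hours\n", t / 3600 }'  # about 8.8 hours
transfer_speedup 707030000000 624670000000 30100000                 # 1.13
```

At roughly 20 MB/s the local copy is well under the 125 MB/s theoretical ceiling of gigabit Ethernet, so on a gigabit link the network would not necessarily be the slower part.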
_________________
Your lips move, but I can't hear what you're saying.
pjp
Administrator
Administrator


Joined: 16 Apr 2002
Posts: 18556

PostPosted: Sun Aug 23, 2020 12:00 am    Post subject: Reply with quote

krinn wrote:
I think you are mistaking RAID for backup.
Mirroring is not meant as a backup, but as protection against failure.
The basic idea is to have disk1, disk2 or disk3 available to repair the array so it matches the other disks' content. This "might" be seen as a backup solution, but it's not a real one.
If you overwrite file1, file1 will be overwritten on all disks, and there's no way to recover it; that's a backup task, not a RAID task.
However, unlike a backup, if you were working on file1 and someone shoots your computer, a backup will only help you recover the latest saved version of file1, while RAID will help restore it to the freshest version possible.

Or are you saying to use RAID on the backup source as a safeguard against backup errors?
Naib wrote:
Bingo! I was about to post this. RAID is not a backup. RAID is a layer of data protection between backup cycles (or for speed).

So the question is ... what do you want? Redundancy or backup? Rsync is great for backup, especially over ssh to back up a headless box, for instance. For redundancy... sure, RAID or one of the fancier filesystems (ZFS, btrfs).


As it relates to the thread title, the distinction was between redundant copies across two drives. One using RAID, the other using rsync. From an earlier reply:
pjp wrote:
Others disagree, but I consider RAID to be the first backup. A type of "hot" backup. If you have a single disk and it fails, well, enterprise backup solutions often have holes in them. In that situation, RAID doesn't protect against accidental deletions or any other live activity such as a virus. That's where other backups come into play. As in any situation, "what is the risk you are protecting against"?
To expand on that, some people don't consider a backup to be sufficient if it is: a) in the same machine, b) at the same site or c) within the same region.

For points b) and c), the issue is of risk and cost. That's a value choice. However, while important, offsite redundant copies of (some) data are stale, likely by at least hours if not a day or more.

A backup in the same machine absolutely is a valid backup. The most obvious example would be snapshots. These are very much useful in the case of people accidentally intentionally deleting a file. It is typically recoverable in the fastest possible time. Some people may not consider that a backup, but it demonstrably is. In fact, given the possibility of open files, offsite backups are often incomplete.

I think the RAID is not a backup perspective is one of human timescale (nanosecond demonstration). Also related, but not specifically about backups: "What about the cost of that information? The cost of collecting data and information at the time of an event is very low. But the further you get away from it in time, the more it's costing you to store it and maintain it."

If you lose your only physical copy of those snapshots, well, your offsite copy of data on tape from last night isn't much help. So maybe instead of a backup, that should be called an incomplete offsite copy of data that might not contain what you wanted.
_________________
Your lips move, but I can't hear what you're saying.
Goverp
l33t
l33t


Joined: 07 Mar 2007
Posts: 905

PostPosted: Sun Aug 23, 2020 8:51 am    Post subject: Reply with quote

pjp wrote:
...
Unfortunately I don't think I have any crossover cables, so I'm not looking forward to seeing how slow it goes over the network.

AFAIR you probably don't need crossover cables; if you have modern ethernet cards at either end, they'll sort the cable out themselves.
_________________
Greybeard
Naib
Watchman
Watchman


Joined: 21 May 2004
Posts: 5847
Location: Removed by Neddy

PostPosted: Sun Aug 23, 2020 9:45 am    Post subject: Reply with quote

pjp wrote:
krinn wrote:
I think you are mistaking RAID for backup.
Mirroring is not meant as a backup, but as protection against failure.
The basic idea is to have disk1, disk2 or disk3 available to repair the array so it matches the other disks' content. This "might" be seen as a backup solution, but it's not a real one.
If you overwrite file1, file1 will be overwritten on all disks, and there's no way to recover it; that's a backup task, not a RAID task.
However, unlike a backup, if you were working on file1 and someone shoots your computer, a backup will only help you recover the latest saved version of file1, while RAID will help restore it to the freshest version possible.

Or are you saying to use RAID on the backup source as a safeguard against backup errors?
Naib wrote:
Bingo! I was about to post this. RAID is not a backup. RAID is a layer of data protection between backup cycles (or for speed).

So the question is ... what do you want? Redundancy or backup? Rsync is great for backup, especially over ssh to back up a headless box, for instance. For redundancy... sure, RAID or one of the fancier filesystems (ZFS, btrfs).


As it relates to the thread title, the distinction was between redundant copies across two drives. One using RAID, the other using rsync. From an earlier reply:
pjp wrote:
Others disagree, but I consider RAID to be the first backup. A type of "hot" backup. If you have a single disk and it fails, well, enterprise backup solutions often have holes in them. In that situation, RAID doesn't protect against accidental deletions or any other live activity such as a virus. That's where other backups come into play. As in any situation, "what is the risk you are protecting against"?
To expand on that, some people don't consider a backup to be sufficient if it is: a) in the same machine, b) at the same site or c) within the same region.

For points b) and c), the issue is of risk and cost. That's a value choice. However, while important, offsite redundant copies of (some) data are stale, likely by at least hours if not a day or more.


I totally agree (and I expected you to get it ;) just the usual google-fu and someone comes across something that implied RAID is synonymous with backup :)
Offsite backup is typically for insurance reasons (we have a separate "brick building" requirement for weekly tapes)

I also use a 2nd drive as a local copy/archive drive and yes it is valid. So the real question is do you want the expense (space, "complexity") of RAID or do you want the delay of rsync

1) RAID-1 (mirroring). Simple to set up, but it is a 1:1 copy of the entire drive. Do you want that, considering an OS can be rebuilt but data can't? It is almost instantaneous and you can gain up to double the read speed
2) RSYNC. Selectively targets the data, but must be executed and could take some time, thus delaying shutdown
3) lsyncd. A daemon that uses inotify to sync a target/s directory from one place to another
_________________
The best argument against democracy is a five-minute conversation with the average voter
Great Britain is a republic, with a hereditary president, while the United States is a monarchy with an elective king
krinn
Watchman
Watchman


Joined: 02 May 2003
Posts: 7447

PostPosted: Sun Aug 23, 2020 11:41 am    Post subject: Reply with quote

Naib wrote:
3) lsyncd. A daemon that uses inotify to sync a target/s directory from one place to another

Which appears to be a good idea, but in the end it doesn't look good.
inotify reports any modification, so the deleted/broken/borked file then gets synced to the backup (ouch), like mirroring, but you don't get mirroring speed.

To me, the real solution is both: mirror the data with RAID to protect it from local damage, and rsync that data to another place.
The RAID part needs nothing; it will just work.
The rsync will be slow only the first time; after that it only copies the changes, which is fast (well, it's relative of course).
And the shutdown delay could easily be managed through a script, i.e. touch a file that the script looks for to know whether it should shut down after the rsync or not.
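
The flag-file idea can be sketched like this. The flag path, the helper name and the paths in the usage comment are placeholders, not anything from an actual setup in this thread:

```shell
#!/bin/sh
# Decide what to do once the sync has finished: if the flag file exists,
# consume it and report "shutdown"; otherwise report "stay-up".
after_sync_action() {
    flag=$1
    if [ -e "$flag" ]; then
        rm -f "$flag"
        echo shutdown
    else
        echo stay-up
    fi
}

# Usage sketch (not executed here; shutdown needs root):
#   touch /var/tmp/shutdown-after-sync      # request a shutdown after the next sync
#   rsync -a /data/ /mnt/backup/data/ &&
#       [ "$(after_sync_action /var/tmp/shutdown-after-sync)" = shutdown ] &&
#       shutdown -h now
```

Removing the flag as it is read keeps a stale request from shutting the machine down after some later, unrelated sync.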
szatox
Veteran
Veteran


Joined: 27 Aug 2013
Posts: 1941

PostPosted: Sun Aug 23, 2020 11:52 am    Post subject: Reply with quote

Quote:
AFAIR you probably don't need crossover cables; if you have modern ethernet cards at either end, they'll sort the cable out themselves.
AFAIR modern ethernet devices (like in 1Gbps and faster) use all 4 pairs, in both directions, at the same time.
Crossover cables belong in the past.
(Some 100Mbps devices, notably from Intel, would randomly switch port's mode between straight/crossed until they could make sense of the noise on the wire)


My main box, which I set up almost a decade ago, uses LVM on top of RAID1 as the main storage, and copies the bits of data I'm particularly interested in to another disk with rsync.
Rsync has a really cool feature: it can reference another directory (the --link-dest option). In this mode you can copy modified files and hard-link unmodified files, effectively providing an incremental backup (it's just file-level, but it's still an amazing space-saver).
Scripted with a weekly rotation, I can always have 2 independent sets (sometimes called "cylinders") of backups (weekly full + daily incremental), so I wouldn't lose every copy in case of a bad block in a rarely-modified file. (By doing a weekly full I always have at least 2 copies of those.)
And I'm quite happy with this setup.

Also, I can remove any of the backups at any time without breaking all the other backups - because the hardlinked files remain accessible via the other paths, so the whole thing is very easy to manage. Just delete what you don't need anymore, and the space will be reclaimed once nothing references it anymore.
And since rsync can transparently run over network, it's very easy to deploy something like that across multiple machines... Including a dedicated backup server.
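
The hard-link feature described above is rsync's `--link-dest` option. A minimal sketch of one snapshot run, assuming a `latest` symlink convention and a helper name (`snapshot_backup`) that are inventions for this example rather than the actual script; the `RSYNC` variable only exists so the command can be swapped out for testing:

```shell
#!/bin/sh
# snapshot_backup SRC ROOT [STAMP]
# Copies SRC into ROOT/STAMP, hard-linking files unchanged since ROOT/latest.
snapshot_backup() {
    src=${1%/}
    mkdir -p "$2" || return 1
    root=$(cd "$2" && pwd) || return 1      # absolute path, so --link-dest is unambiguous
    stamp=${3:-$(date +%Y-%m-%d_%H%M%S)}
    if [ -d "$root/latest" ]; then
        ${RSYNC:-rsync} -a --link-dest="$root/latest" "$src/" "$root/$stamp/" || return 1
    else
        ${RSYNC:-rsync} -a "$src/" "$root/$stamp/" || return 1   # first run: full copy
    fi
    ln -sfn "$stamp" "$root/latest"         # point "latest" at the new snapshot
}

# Usage sketch (not executed here):
#   snapshot_backup /home/me/data /mnt/backup/data
```

Each snapshot directory looks like a full copy, and deleting an old one with `rm -rf` only frees the blocks no other snapshot still links to.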
Naib
Watchman
Watchman


Joined: 21 May 2004
Posts: 5847
Location: Removed by Neddy

PostPosted: Sun Aug 23, 2020 1:01 pm    Post subject: Reply with quote

krinn wrote:
Naib wrote:
3) lsyncd. A daemon that uses inotify to sync a target/s directory from one place to another

Which appears to be a good idea, but in the end it doesn't look good.
inotify reports any modification, so the deleted/broken/borked file then gets synced to the backup (ouch), like mirroring, but you don't get mirroring speed.

To me, the real solution is both: mirror the data with RAID to protect it from local damage, and rsync that data to another place.
The RAID part needs nothing; it will just work.
The rsync will be slow only the first time; after that it only copies the changes, which is fast (well, it's relative of course).
And the shutdown delay could easily be managed through a script, i.e. touch a file that the script looks for to know whether it should shut down after the rsync or not.


true,
The rsync could also be a 6h cronjob so it is little and often with a shutdown local service for a final sync
_________________
The best argument against democracy is a five-minute conversation with the average voter
Great Britain is a republic, with a hereditary president, while the United States is a monarchy with an elective king
steve_v
Apprentice
Apprentice


Joined: 20 Jun 2004
Posts: 177
Location: New Zealand

PostPosted: Sun Aug 23, 2020 2:35 pm    Post subject: Re: RAID vs rsync. Your preferences, experiences? Reply with quote

sitquietly wrote:
ZFS was the perfect use for multiple data disks! Now I keep all of my working stuff on a zfs mirror and the backup on another computer via rsync.

Not ZFS send/receive? :P

TBF, I too use rsync extensively, though in the other direction - rsync backups from several machines to a ZFS (RAIDZ6) fileserver, which snapshots and sends the filesystem (off-site) nightly.
Important live data on the fileserver gets snapshotted every 15 minutes locally for those "whoops" moments, and snapshots are presented via "previous versions" to windoze boxen via samba.

Yes, those frequent snaps waste considerable space. But they also save considerable bacon. :D
If the building burns down, a day-old backup is just fine.
_________________
Once is happenstance. Twice is coincidence. Three times is enemy action. Four times is Official GNOME Policy.
sitquietly
Tux's lil' helper
Tux's lil' helper


Joined: 23 Oct 2010
Posts: 104
Location: On the Wolf River, Tennessee

PostPosted: Sun Aug 23, 2020 8:49 pm    Post subject: Re: RAID vs rsync. Your preferences, experiences? Reply with quote

steve_v wrote:
sitquietly wrote:
ZFS was the perfect use for multiple data disks! Now I keep all of my working stuff on a zfs mirror and the backup on another computer via rsync.

Not ZFS send/receive? ..... I too use rsync extensively, though in the other direction - rsync backups from several machines to a ZFS (RAIDZ6) fileserver...


You do it the right way. The backup server here needs to be upgraded to a zfs mirror ... soon. :)
pjp
Administrator
Administrator


Joined: 16 Apr 2002
Posts: 18556

PostPosted: Sun Aug 23, 2020 10:38 pm    Post subject: Reply with quote

Goverp wrote:
pjp wrote:
...
Unfortunately I don't think I have any crossover cables, so I'm not looking forward to seeing how slow it goes over the network.

AFAIR you probably don't need crossover cables; if you have modern ethernet cards at either end, they'll sort the cable out themselves.
I had thought that, but couldn't recall the specifics or whether it included consumer devices. I also couldn't recall if there were any power / damage concerns. I don't think there are, but the most recent occasion when I was wrong, I lost the original of the 3 4TB drives related to this thread. But, other than PoE, I can't recall any power issues between devices.


Naib wrote:
just the usual google-fu and someone comes across something that implied RAID is synonymous with backup.
No more or no less than someone who doesn't know what they are doing being mistaken about other "backup" solutions that aren't what they expected. To be more explicit, yes, I do consider RAID an actual form of backup. And as with every other solution I have encountered, they all have areas they do not protect against. Don't forget the duct tape and baling wire to assemble a solution for your needs.


Naib wrote:
So the real question is do you want the expense (space, "complexity") of RAID or do you want the delay of rsync
Not having considered the potential read performance you mentioned from RAID mirroring, I had mostly chosen the rsync option short of someone mentioning an "oh, yeah, I should go with RAID mirroring as opposed to rsync mirroring." The $ cost is not the factor as I had already decided to dedicate 1 or 2 disks to mirroring.

Cost of complexity was the deciding factor (barring the "oh, yeah" moment). Dealing with RAID failures in an enterprise environment has its own concerns, but I don't have that hardware environment at home, so I'm much less comfortable relying on the "consumer" equivalent.

So at least until I get a ZFS server going, I'll be relying on the rsync equivalent of RAID1. I may use the 3rd drive for rsync snapshots, though I'm not sure if you can package only the difference between points A and B onto C. Worst case ought to be obtaining the difference from an rsync dry run.
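
For what it's worth, rsync's --compare-dest option looks like a fit for "only the difference between A and B onto C". What follows is a minimal sketch, not a tested backup procedure: the function names and all three paths are hypothetical, and it assumes B was itself produced by rsync -a from A so that sizes and timestamps line up.

```shell
#!/bin/sh
# Hypothetical helper: copy into DIFF only those files under SRC that are
# missing from, or differ from, MIRROR. Files identical to the copy in
# MIRROR are skipped. rsync may still create (empty) directories in DIFF.
diff_to_third_disk() {
    src=$1 mirror=$2 diff=$3
    # A relative --compare-dest is interpreted relative to the destination
    # directory, so pass an absolute path.
    mirror_abs=$(cd "$mirror" && pwd) || return 1
    mkdir -p "$diff" || return 1
    rsync -a --compare-dest="$mirror_abs/" "$src/" "$diff/"
}

# Hypothetical helper: the dry-run fallback. Lists what rsync would
# transfer from SRC to MIRROR without writing anything.
list_diff() {
    rsync -a --dry-run --itemize-changes "$1/" "$2/"
}

# Example call (placeholder mount points):
#   diff_to_third_disk /mnt/diskA /mnt/diskB /mnt/diskC/diff-2020-08-23
```

Note that this captures new and changed files only; files deleted from A since B was made are not recorded in C.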

Now I need to go NIC shopping.


Naib wrote:
The rsync could also be a 6h cronjob so it is little and often with a shutdown local service for a final sync
This gets into the finer points of implementation. Overnight makes the most sense for the daily sync. But I think I'm going to aim for some degree of hourly and maybe 10 - 15 minute interval, somewhat like an application's autosave feature. I'll probably start with the hourly/n-minute snapshots from a workstation onto a dedicated disk within the workstation. Then have the nightly transfer to the system hosting the disks which are the subject of this thread. That's the starting point goal anyway. I suspect it will need to be tweaked.
pjp
Administrator


Joined: 16 Apr 2002
Posts: 18556

Posted: Sun Aug 23, 2020 10:45 pm    Post subject: Re: RAID vs rsync. Your preferences, experiences?

sitquietly wrote:
steve_v wrote:
sitquietly wrote:
ZFS was the perfect use for multiple data disks! Now I keep all of my working stuff on a zfs mirror and the backup on another computer via rsync.

Not ZFS send/receive? ..... I too use rsync extensively, though in the other direction - rsync backups from several machines to a ZFS (RAIDZ6) fileserver...


You do it the right way. The backup server here needs to be upgraded to a zfs mirror ... soon. :)
Well, yeah. Maybe.

Once I get it stabilized, then I need to get back to making generic binaries for everything (laptop, workstation, backup "server" ). I'm only doing that for my laptop now, and it is only at an 80% solution with the remaining 20% resulting in me putting off upgrades.
krinn
Watchman


Joined: 02 May 2003
Posts: 7447

Posted: Mon Aug 24, 2020 4:38 am

pjp wrote:
aim for some degree of hourly and maybe 10 - 15 minute interval, somewhat like an application's autosave feature.

Keep in mind, the most important thing won't be how often you can save the current state of a file, but how far back you can recover it!
If your backup interval is too short and a file is damaged, the time you have to SEE that the file is damaged and recover it is only the gap between backups!
For incremental backups (which cost a lot of space) that would work; for mirroring, it means you only have those 10-15 minutes to notice the file has been damaged... past that delay, the damaged file will be synced and you are dead.
pjp
Administrator


Joined: 16 Apr 2002
Posts: 18556

Posted: Mon Aug 24, 2020 5:39 am

That's true of every backup solution, and I've yet to see any solution verify that a file isn't damaged. If it is damaged at the source and backed up damaged, well it isn't going to get fixed. I've seen that happen when users request a file, and when we finally find an undamaged version, it is older than the version they had hoped to retrieve.

My goal is to reasonably minimize the gap between daily backups. Some days I might not change much, other days I'd prefer to not lose some of the work. Anything will be an improvement, since it isn't currently being done.

The only way I'll do 15 minute snapshots is if I can skip performing them if there are no changes. And I don't know if that is possible with rsync. If I change a file early in the day and it is caught by an "rsync snapshot", I'm thinking that same change will be caught every time a snapshot is attempted after that.
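
Both concerns can probably be handled by comparing against the most recent snapshot rather than against a fixed baseline. The sketch below is one possible approach, not an established recipe from this thread: the function name, the "latest" symlink convention, and the paths are all hypothetical. It uses a dry run with --itemize-changes to decide whether anything changed, and --link-dest so that unchanged files in a new snapshot are hard links into the previous one instead of fresh copies.

```shell
#!/bin/sh
# Hypothetical helper: snapshot SRC into SNAPROOT/NAME, but only if SRC
# differs from the newest snapshot (tracked by a SNAPROOT/latest symlink).
snapshot_if_changed() {
    src=$1 snaproot=$2 name=$3
    mkdir -p "$snaproot" || return 1
    snaproot=$(cd "$snaproot" && pwd -P) || return 1
    if [ -d "$snaproot/latest" ]; then
        prev=$(cd "$snaproot/latest" && pwd -P) || return 1
        # Dry run against the newest snapshot. -i prints one line per item
        # that would change; -O ignores directory timestamps; --delete
        # makes removed files count as a change.
        changes=$(rsync -aO --delete --dry-run --itemize-changes \
            "$src/" "$prev/") || return 1
        if [ -z "$changes" ]; then
            echo "unchanged"
            return 0
        fi
        # Unchanged files become hard links to the previous snapshot.
        rsync -a --link-dest="$prev/" "$src/" "$snaproot/$name/" || return 1
    else
        # First snapshot: plain full copy.
        rsync -a "$src/" "$snaproot/$name/" || return 1
    fi
    rm -f "$snaproot/latest"
    ln -s "$name" "$snaproot/latest"
}

# Example call from cron or a timer (placeholder paths):
#   snapshot_if_changed /home/user/work /mnt/snapdisk/work "$(date +%Y%m%d-%H%M)"
```

Because each run compares against the previous snapshot, a file changed early in the day is stored once, in the snapshot that first sees it, and later runs skip entirely until something else changes. Hard links require the snapshots to live on the same filesystem.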

As for space, those would all be temporary, and I'd probably skip certain files or directories. When the daily snapshot is completed, then the 15min/hourly or whatever increment snapshots would be cleaned up, depending on available space.

As far as progress goes, I have the two local copies of the data as previously mentioned, and have tested ~10% to the backup host. I'm about to kick off the full sync to that host and hopefully it will be done in the morning.