Online Backups


mofusjtf

IS-IT--Management
Apr 20, 2004
471
US
Does anyone have good recommendations for online backup solutions for businesses? I've always been of the opinion that online backups are not cost-effective for businesses due to bandwidth and storage costs. Also, how do you back up Exchange, SQL, or other databases to an online backup service when these types of files cannot be incrementally backed up? Backing up a 30 GB information store through an online backup would take days.
 
Hello. I wrote the following and dumped it on a page somewhere. Saw your question and thought it might help you out. I don't know what you want to back up TO, but the rsync idea would probably work for you as well (although in the opposite direction from what I describe below). The 30 GB would have to go through once, but after that you would only be sending updates. Keep in mind, though, that a setup like this is doable on any platform, even if you are backing up a Windows server.

**

I was recently faced with a situation where several iMac-using architects were not getting their Desktop and Documents folders backed up, unlike their Windows XP counterparts in the domain I administer, which uses roaming profiles and folders redirected onto the server - which is, of course, backed up.

It looked like a bit of a catch-22 at first - you can't back up a computer once it's off, and the Macs turned off at night, or at least went into a deep sleep where you have to click the mouse or hit the keyboard to wake them up. And you can't back up an architect's computer during the day - they are ALWAYS there... In my opinion it was also way uncool to tweak the power settings so that these things would stay bright awake all night - energy costs and LCD lifetimes are just two of the problems with that practice.

So if you can't back it up, just tell the architects to save everything on the server and voilà, right? Wrong. With their multi-GB 3D rendering files, they would rather shoot themselves in the foot than work over a network link, which will always be slower than a local hard disk.

The ideal solution was an automated backup regime that is open source (for many reasons besides cost, the main one being standards compatibility), runs when no one is in the office, cycles out old backups so it never fills up and crashes the server (or stops backing up new stuff, shiver), writes informative log files that can be reviewed, is secure and, while we are writing the wish list, executes fairly quickly and impacts network traffic and server time as little as possible.

Solution? I discovered that the integrated network cards in iMacs can be set to answer a network "magic packet" - a broadcast packet sent to the entire physical network containing a wake-up code and that network card's MAC address repeated several times - and wake the Mac up on command from the network. So I got a PHP script working that would wake up a Mac when it was time to back it up.
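
The PHP part is incidental, by the way - anything that can broadcast the packet will do. As a rough sketch, the wakeonlan utility shipped by many Linux distributions does the same job from the shell (the broadcast address and MAC address below are made up):

    # Wake a sleeping iMac by broadcasting a "magic packet" carrying its MAC address
    # (192.168.1.255 and AA:BB:CC:DD:EE:FF are placeholders - use your own values)
    wakeonlan -i 192.168.1.255 AA:BB:CC:DD:EE:FF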

Are you still reading this? Trust me, it's boring and not worth it... I mostly posted it as a reminder for later...

Now it was time to get into the Mac remotely. An old-fashioned Linuxarian such as I never strays too far from SSH, and it came in handy here too. By dumping SSH keys in root's ~/.ssh folder on each of the Macs, root on the server could now log in to any of the Macs without any human intervention - meaning an admin did not have to sit there typing in a password, and no password would be stored on a hard disk somewhere (aghhh!) - ssh doesn't even let you do that, anyway.
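
The key setup itself is nothing exotic - roughly something along these lines (host name and paths are placeholders; note that on OS X, root's home directory is /var/root):

    # On the Linux server: generate a passphrase-less key pair for the backup job
    ssh-keygen -t rsa -N "" -f /root/.ssh/backup_key

    # Append the public key to root's authorized_keys on each Mac.
    # This first copy still asks for a password once; after that the key takes over.
    cat /root/.ssh/backup_key.pub | ssh root@imac01.example.local \
        'mkdir -p /var/root/.ssh && cat >> /var/root/.ssh/authorized_keys'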

Now I could have used SCP to copy the files over, or mounted each of the Macs using FUSE's sshfs, true, but I like rsync a lot and it has several advantages in this situation. Rsync will only transfer the files that have changed, and it can also be configured to delete files on the destination that are no longer present at the source (the destination in this case is my server, where I'm backing up to). It will also keep partially transferred files in case it gets disconnected and complete the transfer later, and better yet, with big files it will only transfer the CHANGED PORTIONS of the files. What more is there to love? Just the fact that it works seamlessly over SSH :)
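
A single pull, stripped down to the essentials, looks roughly like this (the host, user and paths are placeholders, and the per-user destination directory is assumed to exist already):

    # Mirror one user's folders from a Mac down to the server.
    # -a preserves ownership/permissions/timestamps, -z compresses on the wire,
    # --delete removes files the user deleted, --partial keeps interrupted transfers.
    for dir in Desktop Documents; do
        rsync -az --delete --partial -e "ssh -i /root/.ssh/backup_key" \
            "root@imac01.example.local:/Users/jsmith/$dir/" \
            "/backup/macs/jsmith/$dir/"
    done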

As far as storing the backups on the server goes, I made a backup directory for the Mac backups and, inside it, a directory for each of the users. Now for the fun part.

My server (Fedora Core Linux) uses the ext3 filesystem, which has this cool feature of supporting multiple hard links. A hard link is essentially a directory entry that points to where a file's data actually lives on disk, rather than being a copy of the data itself; most filesystems work this way. When you ask any computer to "open a folder" and it shows you a list of what's inside, you are really looking at a bunch of these entries, each of which has stored the location of a file on the physical disk. Imagine trying to list a folder by having the computer crawl all over the hard disk to find each file before listing it - it would be like opening every file just to get a list of what's there. Thus, hard links.

The cool thing is that ext3 (a Linux filesystem, or one of them at least) supports having MULTIPLE hard links point to the same physical file stored on disk. So let's say you have an accounting directory on the accountant's computer and you back it up to the server, calling it accounting. It's a very large folder, with many GB of data - you probably could not fit even two full copies of it side by side. But you still want a daily backup of the whole thing.
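
You can see the mechanism for yourself in a couple of commands (any scratch directory on an ext3 volume will do):

    # Create a file and a second hard link to the same data on disk
    echo "ledger data" > original.txt
    ln original.txt second-link.txt

    # Both names show the same inode number and a link count of 2 -
    # the data itself is stored only once
    ls -li original.txt second-link.txt

    # Deleting one name leaves the data reachable through the other
    rm original.txt
    cat second-link.txt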

So the next day, before backing up the accounting directory to the server again, you HARD-LINK accounting to accounting.old - that is, you re-create its directory tree and hard-link every file inside it. That just cost you a couple of megs of disk space, at most. Now you synchronize the accountant's version of accounting with the accounting on your server. The accountant deleted several files, so they are deleted from the server's version of accounting as well. What happens to accounting.old? Nothing. But several files were deleted, you say... True. But in accounting.old you still have hard links pointing to those files, which means they were not wiped off the disk, as happens only when the last hard link referencing a file is deleted.
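
In practice the hard-link-then-sync step can be done with GNU cp and rsync, something like this (the paths and host name are made up for illustration):

    cd /backup
    # Re-create the directory tree, hard-linking every file instead of copying it -
    # this costs only the space for the directory entries themselves
    cp -al accounting accounting.old

    # Refresh the live copy: deletions and changes land in "accounting",
    # while "accounting.old" still points at yesterday's file data
    rsync -az --delete -e ssh \
        root@accounting-pc.example.local:/Users/books/accounting/ accounting/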

In other words, as you go forward in time, leaving snapshots of your data behind you, very little additional disk space is taken up. Of course, if you change a file, all of the hard links now point to the changed file, so you only really have 24 hours of protection against "bad changes" users may make before those changes are replicated to the server and no previous version exists anywhere. When it comes to DELETED files, however, this works a charm: a deleted file remains available until the last hard link pointing to it is removed, which could be a month later (it is, on my server). What's more, using an intelligent sync program like GNU rsync, very little bandwidth is used each day to keep this system up - only the changes are replicated. It's a hot system for backing up your files, really.
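
If you want to convince yourself how cheap the snapshots are, du tells the story - GNU du counts a hard-linked file only once per run, so the combined total comes out barely bigger than a single copy (directory names as in the sketch above):

    # Summarize the live backup and yesterday's snapshot together;
    # shared (hard-linked) files are only counted once
    du -shc accounting accounting.old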

So, to combine all of these hot technologies into a cohesive whole, I wrote a BASH shell script to do all the work. It first runs all the diagnostic programs I want and dumps their output into a log file, in case something goes wrong or I need to track something down (don't you HATE skimpy log files??). It then loads all the host names into an array and all the corresponding user names into another, and runs a loop that wakes each computer up in turn, hard-links the current backup for that user on the server, updates it with rsync over SSH (logging in automatically using the public/private key pairs on the machines), and then moves on to the next one. After a typical day of work, it runs in about 10 minutes.
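
A stripped-down sketch of that loop might look roughly like the following - every host name, user name, MAC address and path below is a placeholder, not taken from the real script:

    #!/bin/bash
    # Nightly Mac backup sketch: wake each iMac, snapshot yesterday's backup
    # with hard links, then refresh it with rsync over SSH.

    LOG="/var/log/macbackup-$(date +%F).log"
    BACKUPROOT=/backup/macs
    KEY=/root/.ssh/backup_key

    HOSTS=(imac01.example.local imac02.example.local)
    USERS=(jsmith mjones)
    MACS=(AA:BB:CC:DD:EE:01 AA:BB:CC:DD:EE:02)

    {
        date; df -h; uptime        # a few diagnostics for the log

        for i in "${!HOSTS[@]}"; do
            host=${HOSTS[$i]}
            user=${USERS[$i]}

            echo "== waking $host =="
            wakeonlan -i 192.168.1.255 "${MACS[$i]}"
            sleep 60               # give the Mac a minute to come out of deep sleep

            echo "== rotating snapshot for $user =="
            mkdir -p "$BACKUPROOT/$user"
            cp -al "$BACKUPROOT/$user" "$BACKUPROOT/$user.$(date +%F)"
            # month-old snapshots get pruned separately (rm -rf on the oldest dated copy)

            echo "== syncing $user from $host =="
            for dir in Desktop Documents; do
                rsync -az --delete --partial -e "ssh -i $KEY" \
                    "root@$host:/Users/$user/$dir/" \
                    "$BACKUPROOT/$user/$dir/"
            done
        done
    } >> "$LOG" 2>&1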

Daily snapshots of 400+ GB of architectural and graphical data going back a month, available as a special read-only share on the server at the click of a button, automatically backed up in very little time each night, all in less than 500 GB of server space, you say? No problem. Oh, and did I mention it was all open source?
 
There are quite a number of online backup solutions out there, but always ask the following:

1. Do you know where your data will be stored? If not, ask them.
2. Will your data be stored on resilient storage?
3. Is a second copy kept (just in case their servers fail and you lose your backups)?
4. Do you need a separate account to back up each machine, or can all machines be backed up to the same account?

This is in addition to standard questions about encryption, compression, etc.




Lee Mason
Optimal Projects Ltd
 