This is a public service article encouraging all of us to back up our data (which more and more is our lives). I sketch some methods and resources for doing this.
As more of our life becomes digital (work, finances, passwords, pictures, contacts, diaries, videos and email) we must be more diligent in backing up our data. If your hard drive fails at work you might lose some spreadsheets (and you might not lose anything if your IT department is on its toes); if your computer fails at home you lose your wedding album. Your hard disk will fail and try to take all of your data (life) with it: it is a matter of when, not a matter of if. You want this to be an inconvenience, not a disaster. Become expert at backing up and take the time to help others.
First some definitions. Everything stored on your computer is called “data” and it is most commonly stored on a single “hard drive.” The act of making an extra copy of your data is traditionally called “backing up.” The act of trying to get access to your extra copy of your data is traditionally called “restoring.” The whole point of backing up is to be able to restore. If you can’t get your data back it really doesn’t matter what steps you took. Backing up with no ability to restore is just cargo cult behavior.
If you have a professional service available they will likely do a better job than you can (this is one reason that larger businesses have professional IT staffs). However, at home you are likely on your own.
This is an opinion piece and I am advocating backing up everything (whole drives) locally. If you do not back up everything you will need to choose what to back up and what to skip, and you will make mistakes and lose things. If you do not have a local back up, you might not be able to restore (the back up service goes out of business, or your internet connection is too slow to make a full restore practical). At the very least you should have a local back up; a remote back up is a good second step. Remote back up services are a good idea for important data and there are some high quality ones, but a few have gone out of business (Xdrive), so you do not want one to be your only chance of salvation.
Let us first address a technical issue: what sort of set-up are you backing up? The three most common situations are: Windows, OSX and other Unix (Linux/BSD; yes, I know OSX is a Unix). Each of these has different appropriate tools:
- Windows:
For Windows Home type operating systems you are unlikely to have access to Microsoft’s back up tools (which is a real shame, the tools are more useful at home than at a business). So you need to install something.
I have not researched the Windows world extensively, so I cannot give general advice. I can, however, relate my experiences and current policies.
I now avoid EMC Retrospect (often comes free with USB drives) at all costs. My experience has been that EMC Retrospect is hard to use to restore your data (the whole point of backing up). For me it often refused to run (due to licensing issues) and it was very sensitive to the exact version of the Microsoft .NET framework that was installed on my Windows system. Two separate times an update to the Microsoft .NET framework rendered EMC Retrospect unusable (and broke nothing else).
I have happily purchased Acronis True Image three times now (twice for myself and once for a friend). Their website is a bit confusing (you must be careful to get the retail product, not the enterprise product that costs many thousands of dollars). The software seems to be very good. It can back up, restore and can even read data from an “image” (which means you can get to your data without even restoring).
- OSX:
An embarrassment of riches:
The free options include following Jamie Zawinski’s wonderful advice (which I am shamelessly stealing from here), using the free copy of SuperDuper! (which is very good and a complete back up solution even in the free version) or Time Machine (the back up utility included in the current Mac operating system: Leopard).
One huge advantage of modern Macs is that, if you have formatted your drive correctly, you can boot off a USB drive. So if you use the above instructions you can plug your back up drive in and use it to run (delaying your need to open up your machine or attempt a restore until later). This is also important in rehearsing your restore procedures.
Finally, if you have the cash there is the somewhat over-priced (but wonderful) Time Capsule. You can live without Time Capsule, but it is part of my “dream set up” (described below).
- Unix:
Follow any sort of advice on how to script back ups (such as Jamie Zawinski’s) and you should be protected. Rsync is a great tool.
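As one illustration of what a scripted back up might look like, here is a minimal Python sketch that builds an rsync command line. The function names and the paths in the usage note are hypothetical, and this is not Jamie Zawinski’s script; it only uses the standard rsync options `-a` (archive mode), `--delete` (remove files from the copy that no longer exist in the source) and `--dry-run` (report what would change without writing anything).

```python
import subprocess


def rsync_command(source, destination, dry_run=True):
    """Build an rsync command that mirrors `source` into `destination`.

    Both paths are placeholders chosen by the caller (assumptions, not
    taken from the article).
    """
    cmd = ["rsync", "-a", "--delete"]
    if dry_run:
        # Show what would be copied or deleted without touching anything.
        cmd.append("--dry-run")
    # A trailing slash on the source tells rsync to copy the directory's
    # contents rather than the directory itself.
    cmd += [source.rstrip("/") + "/", destination]
    return cmd


def run_backup(source, destination, dry_run=True):
    """Run the mirror; raises CalledProcessError if rsync reports failure."""
    return subprocess.run(rsync_command(source, destination, dry_run), check=True)
```

Something like `run_backup("/home/me", "/mnt/backup/home")` would start as a harmless dry run; passing `dry_run=False` does the real copy. Because `--delete` removes files from the copy, looking at the dry-run output first is a cheap safety check.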
More important than the back up tools is having a precise back up goal and a matching back up plan. I use my own goal and plan as an example and you can use it as a basis for safer or more risky plans (depending on your resources and needs).
My goal is to (with very high probability) not lose more than a week of my life. The plan to achieve this is a full local back up every week and the willingness to buy some new equipment if I have to do a restore. A failure could delay my work for a day or so, but not put me out of business. For my business it does not make sense to ensure “no down time.” This is an unreasonably expensive thing to try to achieve (and the inappropriateness of this goal is one reason many people have no back ups at all). My worst case “restore” plan is to drive to a store and buy the cheapest temporary computer. A more likely case is I just need to use one of my extra drives to do my restore (very cheap). I would then restore the back up onto a fresh drive (or the temporary computer) and work from there until I could repair or replace my major system.
My back up plan has several “eyes open” weaknesses. I only back up every week, so I could lose a week of data if my disk dies right before a back up. Also, to restore my data could take a day and $500 (a trip to the store to buy a temporary computer and hours to restore the drive contents). Knowing these weaknesses is the point of the back up plan: I am trading hoping that my drive doesn’t blow up and take all of my data away for hoping my drive doesn’t blow up and cost me a day of work and a few hundred dollars. That is, I am trading the Sword of Damocles for worrying about something like stubbing my toe. Drive failures, while inevitable, are not frequent. If I put a quarter in a jar every day I don’t have a drive failure I would more than likely have the $500 needed to perform an emergency restore saved up long before I have a drive failure. By not purchasing excess extra equipment (computers) before the failure I save money by maybe not having to purchase it at all, or at least by purchasing cheaper and better equipment at the time of failure (instead of now).
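The quarter-in-a-jar arithmetic is easy to check. The short Python sketch below (the function name is mine, for illustration only) works in whole cents to avoid rounding surprises and counts the failure-free days needed to reach a target amount.

```python
def days_to_save(target_dollars, dollars_per_day):
    """Number of whole days of saving needed to reach the target amount."""
    target_cents = round(target_dollars * 100)
    daily_cents = round(dollars_per_day * 100)
    # Ceiling division: a partial day still costs a full day of saving.
    return -(-target_cents // daily_cents)


days = days_to_save(500, 0.25)
print(days, "days, or about", round(days / 365, 1), "years")
```

At a quarter a day the $500 takes 2000 days, roughly five and a half years, so the bet is that a given drive usually goes at least that long between failures.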
Now to describe my implementation of my plan. First I purchased the following things:
- Time Capsule (optional)
- Thermaltake External Hard Drive SATA Dock ($40, Newegg)
- Two 1TB drives ($90 each, Newegg). These are the cheaper “internal” drives that go into desktop computers or into the Thermaltake dock. If you don’t like the ugly look you could skip the Thermaltake dock and buy USB drives instead.
So for about $220 (not counting the optional Time Capsule) I am in business. Every week I could take one of the drives out of its envelope, stick it in the Thermaltake dock and use one of the tools described above to create a complete back up. What I actually do is even better. Any time I want I ask my computer to use Time Machine to back up to the Time Capsule (typically takes about 20 minutes) and then once a week I stick a drive in the Thermaltake dock and let the Time Capsule copy itself onto the drive (so both my computer and I are completely uninvolved in the 8 hours this step can take). For offsite back ups (to defend against things like fire) I can take one of the drives to a safe place off site (locker, safe deposit box). I recommend physical protection (locks, fire safes) to protect your drives (not encryption; there is a good chance you will get something wrong with encryption and not be able to restore).
Using Time Machine gives me the benefit of having multiple back ups so I can look at earlier versions of files and the speed of only needing to perform incremental back ups (only what has changed needs to be copied). Another way to get the advantage of having extra versions of all of your files is to put most of your files under management of a “source control system” like Bazaar. Systems like this (free, runs on Windows, OSX and Unix) let you keep all versions of all of your files (answers things like “what did I have in the file before I deleted it last week?”) and are incredibly useful (you will wonder how you lived without them).
Finally I end with some “defensive thinking” required to succeed with back ups. I have not said why I purchased two extra drives. This is so I can rotate which extra drive I back up onto. Drives most often fail when being used, so it is very plausible that my main machine could die while backing up. If the main machine dies while backing up then not only is its data lost but the back up being written is also useless (as the main machine was interrupted while trying to write it out). This is not quite ironic because, while it is contrary to what you would want, it is not unexpected. To be safe from a failure during the back up procedure you must have a second drive that is not being used. Only after the first back up is known to have succeeded can you then back up onto the other drive.
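The rotation rule can be stated as a tiny Python sketch. The drive labels and function names are made up for illustration; the only rule encoded is the one above: never overwrite the drive holding the most recent back up that is known to be good.

```python
def next_target(drives, last_good):
    """Pick the drive to write to, skipping the last known-good back up."""
    candidates = [d for d in drives if d != last_good]
    if not candidates:
        raise ValueError("need a second drive to rotate safely")
    return candidates[0]


def simulate(drives, outcomes):
    """Replay a series of back up attempts (True = succeeded and verified)."""
    last_good = None
    history = []
    for succeeded in outcomes:
        target = next_target(drives, last_good)
        history.append(target)
        if succeeded:
            # Only a verified back up changes which drive is protected.
            last_good = target
    return history, last_good


print(simulate(["A", "B"], [True, True, False, True]))
```

In this run the third attempt fails while writing drive A, so drive B (the last good copy) stays untouched and the fourth attempt retries onto A.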
You must rehearse and think through all of your back up steps. If you are lucky you will find flaws in your plan during rehearsals instead of when you go to restore. For example, tape back up procedures are notorious for writing out years of incremental back ups that don’t work during a restore attempt. Use a system that allows safe rehearsals (such as trying to boot from a bootable back up or inspecting a file from an Acronis image or Time Machine archive). Plans that only allow full restores are not safely rehearsable (if the rehearsal fails you damage something on your primary machine). Also: if you are really trying to restore you are not likely to be in a good mood, so iron out potential kinks with rehearsals, not during a panic.
No plan is perfect: we cannot cheaply eliminate all risk. In this case what we can do is eliminate exposure to likely scenarios. Data loss can still happen, but it does not have to be inevitable.
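One safe rehearsal for a plain file copy (for example a mounted back up drive) is to compare checksums, since this only reads from both sides. The Python sketch below is one way to do that; the function names are mine, and it assumes the back up is an ordinary directory tree rather than a proprietary image or archive format.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(source_dir, backup_dir):
    """List (relative path, problem) pairs for files missing or different in the back up."""
    problems = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(backup_dir, rel)
            if not os.path.isfile(dst):
                problems.append((rel, "missing"))
            elif file_digest(src) != file_digest(dst):
                problems.append((rel, "differs"))
    return sorted(problems)
```

An empty list means every source file has an identical copy in the back up. Files changed since the back up was made will show up as "differs", so this check is most informative right after a back up finishes.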
Update 7-8-2011: Survived a double failure. MacBook Pro logic board died. Dock burned up trying to duplicate Time Capsule contents. In the end the MacBook Pro was repaired with no cost or data loss (yay AppleCare and boo defective Nvidia chips) and the Time Capsule was able to back up to another USB drive. So I came out of a double failure with two usable copies of the data.
For Linux machines, consider adding a second drive devoted to Rsnapshot right in the case (like a Time Capsule inside the box; you still need external back up rotation, of course).
jmount
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
This article was timely, John. I ended up buying that Thermaltake SATA dock to pull the data off two of my old HDDs. If I hadn’t seen this post, I’m not sure I’d have known it existed.
Now to save my pennies for a Time Capsule…
@Kevin
Or just define a work-flow using TimeMachine without TimeCapsule (though for multiple machines I like something like TimeCapsule).
You acknowledge that when your drive fails, you will lose a day of productivity as you restore your drive. Have you considered using a RAID configuration to square the probability of losing the day’s productivity? It’s not a backup solution because you are still susceptible to viruses or fire/flood/theft, but it is much easier to hot-swap a failed drive and have it rebuild while you keep working than it is to have to take a day off to fix your computers.