The Joys of Data Loss

Friday, 2008-02-01; 03:16:18

the motivation for a new app

It's been almost two months since my last weblog entry, and I have to say that I do miss writing pretty regularly. There've been a lot of things going on in the Mac universe (*cough*MacBook Air*cough*), and I probably would've written about some of them if it hadn't been for a pesky issue that I've been dealing with for the past month.

Back in mid-December, I left on a cross-country road trip with my friend from Boston. As you might have guessed, I live in the San Francisco Bay Area, he lives in Boston, so you can probably figure out our departure and arrival cities with pretty good accuracy, I imagine. It's been a long, long time since I've had as much fun as I had on that trip; being on the road for seven days, and then spending another seven in Boston for Christmas and getting to know my friend's friends and relatives was a great experience. Just being able to let go of worrying about anything was incredibly cathartic.

I'm also kind of a sucker for driving because I usually don't do it on a day-to-day basis, owing to the fact that I ride my bike everywhere. (It sucks during the rainy season, but you get used to it.)

Of course, coming back from the trip comes as a big smack in the face, and this time it was an especially big one: someone had broken into the house where I was living, and had stolen my iMac G5 (my main computer), and in the process had dropped my external hard drive on the floor.

It's kind of aggravating because this thief was not particularly bright. He took my computer (2 years old), my housemate's iBook G4 (3 years old), and my other housemate's monolithic PC laptop (5 years old). Oh, and my fourth generation iPod (3 years old) and my wireless Mighty Mouse (about 1 month old). Not exactly the shiniest booty.

I was a little relieved to know that my external hard drive had been left at the scene of the crime, because not two months earlier I had purchased that hard drive and started doing regular Time Machine backups onto it. So none of my data (except for some ridiculously huge video files that I really had no need for) was gone, and that's really all I cared about. I initially connected it to my MacBook and saw that it mounted and that I could see all my files in a cursory examination, so all was (relatively, anyway) well.

Of course, that relief was not very long-lived. When I actually tried to go and get something off of the drive, I encountered problems. Not only did it take an abnormally long time to actually mount the drive, it took an abnormally long time to just browse folders. Like, minutes long. I would click on a folder, and the Finder would spin its little progress indicator for a few minutes before it would ever come back and give me a listing of files. I could see all my Time Machine backups and everything, I just couldn't access them.

And then things started to get worse: just connecting the drive to a computer would start locking it up for a long time, and so I resorted to just connecting it to my old iMac G4 and leaving it for a while to see what would happen. Again, the drive finally mounted, and I was able to browse the drive, but at extremely slow speeds. I would literally click on a folder, and then go to sleep or do something else for an hour or so, and go back to the iMac and see that it would still be spinning the progress indicator trying to get the list of files. Not good.

So began my long quest to try and get data off the drive. I tried DiskWarrior, which I let run overnight, and which just came back with a bunch of "Scan failed due to disk malfunction" counts up into the thousands (which basically means it attempted to read data off the drive and failed). And I tried TechTool Pro which couldn't even see the drive, and I also tried Data Rescue II which just stalled when scanning for drives.

anwnn from Twitter suggested that I just try scraping the data off the drive using dd, which doesn't care if the drive is mounted in the Finder or not, as long as it appears in /dev. (Hard drives and their partitions appear in /dev -- for example, if you use System Profiler to look at your drives, the drives themselves usually get put at /dev/disk1, whereas the partitions on that same drive get put at /dev/disk1s1, /dev/disk1s2, and so on.) anwnn suggested dd_rescue, which continues to try and copy data off the drive even if it encounters errors. (Turns out that ddrhelp is the successor to dd_rescue, and ddrescue is the successor to ddrhelp, so if you ever do need to scrape data off a damaged hard drive, ddrescue is the one to use.) ddrescue can be installed via MacPorts, so I did so on my iMac G4, connected my drive, and started a transfer. Since my iMac G4 wasn't doing anything else, I just let it run through the night.
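For the curious, the basic idea behind this kind of tool can be sketched in a few lines of Python. This is only a simplified illustration of "keep copying past read errors", not how dd_rescue or ddrescue are actually implemented (the real tools retry, vary block sizes, and keep a log so they can resume). The function name rescue_copy and the 512-byte block size are assumptions made for the example, and it has only been thought through for regular files; on a raw device, finding the size by seeking to the end may not work.

```python
import os

def rescue_copy(src_path, dst_path, block_size=512):
    """Copy src to dst block by block, zero-filling any block that fails to read.

    Returns (blocks_copied, blocks_failed).
    """
    copied = failed = 0
    src = os.open(src_path, os.O_RDONLY)
    try:
        # Assumption: the source reports its size when we seek to the end.
        size = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                want = min(block_size, size - offset)
                try:
                    data = os.pread(src, want, offset)
                except OSError:
                    data = b""  # read error: treat the whole block as lost
                if len(data) == want:
                    copied += 1
                else:
                    # Unreadable or short block: pad with zeros and move on,
                    # so everything after it keeps its original offset.
                    data = data.ljust(want, b"\0")
                    failed += 1
                dst.write(data)
                offset += want
    finally:
        os.close(src)
    return copied, failed
```

Calling something like rescue_copy("/dev/disk1", "rescued.img") would, in principle, leave an image file with holes where the drive refused to answer, which is the same trade-off the real tools make.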

After about a day of ddrescue, I had managed to retrieve about 20 MiB of data off the drive. Now, for those with a calculator, I was getting data off the drive at about 300 bytes/sec. Remember, the damaged external drive had a capacity of 750 GB, so if you do the calculation, that means that all the data would be retrieved from the drive in, oh, only 80 years or so.
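For those without a calculator, here is the back-of-the-envelope arithmetic as a small Python sketch. The function names are made up for illustration, and the inputs are the rough figures from above: "about 20 MiB" in "about a day", and a 750 GB drive taken as 750 x 10^9 bytes.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

def observed_rate(bytes_copied, seconds):
    """Average transfer rate in bytes per second."""
    return bytes_copied / seconds

def years_to_copy(total_bytes, bytes_per_second):
    """How long the whole drive would take at a constant rate, in years."""
    return total_bytes / bytes_per_second / SECONDS_PER_YEAR

rate = observed_rate(20 * 1024**2, SECONDS_PER_DAY)  # exactly 20 MiB in exactly 24 hours
print(round(rate))                        # 243 bytes/sec
print(round(years_to_copy(750e9, 300)))   # 79 years at the rounded-up 300 bytes/sec
print(round(years_to_copy(750e9, rate)))  # 98 years at the unrounded rate
```

Either way you slice it, the answer is somewhere between "80 years or so" and a full century.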

When I realized that was futile, I managed to get in touch with the engineer behind Data Rescue II, Michael Heins, who was very accommodating to my plight and wanted to help out. Through a bit of back-and-forth e-mailing and IM-ing, he managed to send me a build that didn't stall when scanning drives, and did manage to see my drive. I did a quick analysis on the drive and this is the result:

If you look at the y-axis on that graph that was produced from Data Rescue II, you'll note that average read times from the drive were between 5000 msec and 20000 msec. That's an average of 5-20 seconds to read one single kilobyte from the disk. Apparently, my drive had suffered some kind of weird plight where the data seemed intact (because Data Rescue only reported 2 errors out of 750 data points in the log) but only communicated the data to the computer at a really, excruciatingly slow speed.

Clearly, no software was going to be able to get the data off that drive in any reasonable amount of time.

At this point I thought, maybe it's the connection between the hard drive and the case that's the problem. Since the graph indicated horrible read times across the whole disk, it would be a virtual impossibility for the whole disk to be damaged to that extent with just one drop of the disk. (I suppose it's possible that the thief dropped the drive, and then picked it up and shook it a bunch and then left it on the floor.)

The way that LaCie knows if you've opened up the case or not is by attaching a little seal over one of the screws that needs to be unscrewed in order to get the hard drive from the case. When you attempt to take it off, a silver backing irreparably breaks apart at just the slightest tug.

Since MacWorld was in town, I went up a second day (I had previously gone up to see the MacBook Air in person, which is pretty cool) to look for an extra case. I talked to a guy at OWC who was all ready to sell me a $90 case, but then after hearing exactly what I wanted to do, pointed me to a $20 Newer Technology hard drive adaptor cable. That's exactly what I needed: I opened up my hard drive (thereby voiding the LaCie warranty, which was probably void anyway after the drop on the ground), and then directly connected the hard drive to a computer via a USB cable. *sigh* Still no luck, I still got the slow transfer times.

Time to send off to a data recovery service. I knew about DriveSavers through word of mouth. I knew about Ontrack through an Apple technote, since I subscribe to the RSS feed that indicates any changes in the support database. And I knew about TechRestore since I just happened to pass by their booth at MacWorld while trying to find the DriveSavers booth.

DriveSavers is supposedly a best-of-breed recovery service, but the problem is that their prices are off the charts! You give them your drive details, and then they come back with a range of prices that the recovery can potentially cost. For me, it was from $770-3339 (and this was the discounted educational service). Note that the single most important variable in where the actual price falls is the "success" of the recovery. You tell DriveSavers what you want off the drive, and if it's a high recovery rate, you'll get charged towards the high end of that range. If you get only a little bit recovered, they charge you towards the bottom end of the scale. The difficulty of the recovery actually has little to do with how much you're charged. It does impact it a little bit, but not as much as "success". The rep at the MacWorld booth was pretty straightforward that you should ask DriveSavers to save as much data as possible, because the more you ask for, the lower the measured "success" of the recovery is likely to be, and the lower the cost to you. If nothing is recoverable, you don't pay anything.

The problem with DriveSavers is that after they quote you that wide range of prices, you only have the option of sending the drive in and accepting the price that they give you after recovering the data. You never get a more precise estimate. My data is worth a lot, but not $3000. So I didn't want to send in my drive and pay that much.

TechRestore is different. They're not a top-tier data recovery service (has anybody else heard of them?), and their prices are correspondingly much more reasonable. They top out at $1000, and just like DriveSavers, they don't charge you anything if they can't recover anything. You do have to pay $20 (shipping not included) or $50 to do an initial evaluation, but that seems reasonable. I can probably live with $1000 if it means getting my data back. But I was a little nervous about the unknown reputation; the only thing I knew about them was that they were at MacWorld and that their website says that they're really, really good!!!!!oneoneone111eleven

I'd never heard of Ontrack before either, until I saw that Apple technote, but the good thing about Ontrack is that they do a free evaluation of your drive. You tell them the details of your drive via the phone or e-mail, and they give you a similar wide range of prices as DriveSavers does. It's a little cheaper, but still quite steep: my quote was $500-2700. And they, too, don't charge you anything if they can't recover anything.

You then send in your drive (and they pay for shipping if you can find a UPS Store near you) and they analyze your drive and give you a firm estimate. Then you can decide whether to proceed with the recovery at that price or not: if so, they go ahead with it, and if not, they send your damaged drive back to you.

At least, in theory. I never got to that part. I sent them my drive, and I got a confirmation that they received it and was able to see that they almost immediately took it into a clean room. (You can check the status of your drive on the web through Ontrack.) And then they didn't say anything for about a week. I e-mailed the Ontrack representative back, and I was surprised to hear that they were having similar troubles with the drive: they were seeing extremely slow transfer speeds. She said that they had managed to successfully retrieve about 1% of the data off the drive, but that getting the rest of it was going to take weeks to months. I could live with that -- it was an improvement on decades, at least. Surprisingly, my rep said that the cost would probably fall in the middle of the original quoted range, despite the difficulty of the recovery. Also surprisingly, she said that they would give me a firm price later on, because the evaluation wasn't done -- apparently they get all your data off during the evaluation, and then they give you the price. That's kind of cool on one hand; on the other hand, since my recovery was going to take weeks, I didn't want to waste my time by leaving my drive with them only to not "proceed" with the recovery later on because it was too pricey.

Unfortunately, the Ontrack representative got back to me yesterday afternoon and said that they were declaring the drive unrecoverable. My rep said that initially it was just a matter of slow transfer speed, but then they encountered areas of scratches (at least I think that's what she said; Skype kind of barfed when she said that) and areas with no "structure". (I'm still not exactly sure what that means.) They're sending my drive back to me.

Given my own attempts at recovery, this came as kind of a shock yesterday, because I was running under the assumption that most of my data was still intact on the platters, and that it was just a matter of getting it off the drive.

I also find it a little bit fishy that they couldn't get any data off my drive when I clearly was able to see the files myself on my own computer. I also wasn't impressed at all with the ~5 days of complete silence after they received my drive; the free shipping at the UPS Store was nice, and the ability to check the status via the web was nice, but it's a very coarse status. That is, after it got into the clean room, there were no status updates until they got all the data off and started evaluating the data structures. So the website didn't help at all during those five days either.

Although it seems like there's little hope for my data, I'm going to send it off to TechRestore to see what they say about my drive. This'll be an interesting test to see if one data recovery service is better than another.

One big caveat I should note: none of these data recovery services seem to have a clue about Time Machine. I was particularly concerned about this because I wanted my backups intact when I received them so that I could just use Time Machine to restore me back to my pre-Christmas "present". But since Leopard introduces hard links to directories, which no other file system supports, I wasn't sure how these data recovery services would handle this. Talking to them didn't assuage my fears one bit: they seem to use their own software on your drive once they get all the data off, and I'm not convinced that they leave your data in the same folder structure as it was originally on your drive.
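To show why this matters, here is a minimal Python sketch of the difference between a hard link and a plain copy. It only covers hard links to files (directory hard links can't generally be created this way), the file names are made up, and it assumes a file system that supports hard links. It is not a description of how any of these recovery services actually handle the data.

```python
import os
import shutil
import tempfile

def same_file(a, b):
    """True if both paths refer to the same on-disk file (same device and inode)."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)

def link_vs_copy(root):
    """Create a file, a hard link to it, and a copy of it; compare them."""
    original = os.path.join(root, "snapshot1.txt")
    linked = os.path.join(root, "snapshot2.txt")
    copied = os.path.join(root, "recovered.txt")
    with open(original, "w") as f:
        f.write("unchanged between backups")
    os.link(original, linked)          # second name for the same data on disk
    shutil.copyfile(original, copied)  # independent duplicate of the data
    return (same_file(original, linked),
            same_file(original, copied),
            os.stat(original).st_nlink)

with tempfile.TemporaryDirectory() as root:
    print(link_vs_copy(root))  # (True, False, 2)
```

A tool that pulls files off a drive one by one and writes them out as ordinary copies would turn every link into a separate duplicate, which is exactly the kind of change to the folder structure I was worried about.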

I asked both DriveSavers and Ontrack if they simply do cloning: I can use DiskWarrior and Data Rescue II myself once I get back a clone of the drive that transfers data at acceptable speeds. This would ensure that Time Machine backups are unchanged, assuming that none of the data in those sectors was damaged. DriveSavers doesn't do cloning; Ontrack does, but my rep said that it wouldn't affect the final price of the recovery job. I haven't specifically asked TechRestore about this, but I will when I send my drive off to them.

So what did I lose that was so important? Well, nothing extremely important, but a lot of files that I'd like to have, like saved documents from some of my classes throughout my school career, a pretty extensive collection of photos from various sources (some of which were from my very recent Yosemite trip which I was supposed to send to my friend), e-mail and chat archives, etc. Stuff like that. My music is all saved on my iPod; I lost some archived TV shows from iTunes, but those are disposable. All in all, I've been living without my data pretty well for the past month.

The one thing that I really was worried about was my source code to all my apps. Originally, I thought I had saved a backup of it all to the MacBook that I routinely use, but then I couldn't find any trace of it, so I started to panic a little bit. But I eventually found a four-month-old backup of the TuneTagger source on my iMac G4 (which was there because I was testing Panther compatibility), and then I found a three-month-old backup of all source code on my MacBook once I realized that I was looking in the wrong home folder (I have both Leopard and Tiger installed on my MacBook). I think the only changes that I've really lost are the changes in TuneTagger that migrated it over to the MusicBrainz database from the HTML-scraping that I was doing of the Gracenote website. That'll have to be redone, but I'm not too concerned about that, especially now that I created a "Pretty Up XML" service which will really help in getting the XML responses from the MusicBrainz website cleaned up.

I would've had a more recent backup of my source code and possibly my photos if I hadn't had to restore my iPod twice due to it causing problems with iTunes for some reason shortly before I left for Boston. Maybe I should just do a cursory check for deleted files that haven't been overwritten on my iPod, just in case.

I guess here I should stress the importance of backups. Offsite backups. Time Machine has solved the problem of backups pretty well, I think, but as you can tell, that's obviously not enough. It's aggravating enough to lose your data due to failure to do simple backups, but it's even more aggravating when you were doing backups and it doesn't help. At the very least, back up your important stuff to your iDisk or iPod or something.

I guess the only other thing that's been a casualty of my stolen iMac and damaged external hard drive has been my weblogging. It's (obviously) ground to a halt, not because I don't have things to say, but because all the data that allowed me to produce the pages were on my iMac and my external drive and I was hoping to get the data back so I could pick up where I left off. Not that I don't have backups of all my writing (as you can still browse the archives), but I don't have backups of the raw data files that iBlog uses to generate the HTML files for my weblog.

So, in that vein, I've decided to ditch iBlog, because it just seems to be the right time. But not for an online weblog service or for iWeb. I'm taking Steven Frank's lead, and rolling my own weblogging solution to fit my custom needs.

You're seeing the first entry with my new weblogging application. I created it in a single day (yay Cocoa!), today.

It's obviously still very rough, as you can see that many of the support pages that iBlog used to create are not yet created by my own weblog application, currently called TidyWeblogger. It doesn't even update the main weblog landing page or the RSS feed automatically. But it will. I really like the flexibility of iBlog, and I really like that it produces static webpages so that you don't need to rely on any server-side software. And I want to preserve that aspect of my weblog while improving it in other ways.

Why ditch iBlog? It's been languishing, and some key "features" of a weblog sorely need to be updated. First of all, those crufty, unreadable URLs need to go. And this entry strikes the first blow, as you can see. iBlog 2 does away with them as well, but crucially, not in a manner that is backwards compatible with previous pages, which means the commenting system on my weblog will break if I moved to iBlog 2. Unacceptable. Second, iBlog is so JavaScript-heavy, unnecessarily so, and iBlog 2 is using even more JavaScript to the point where it causes problems with certain versions of Safari. I'm not against JavaScript, I just want to pare down the places that it's not needed, and despite iBlog being flexible, it's not flexible enough. I only managed to get the self-updating list of entries in Linkable Supernova through a massive JavaScript hack. Lastly, I can only update using iBlog from one location, unless I do some manual carting of data files between computers. This is stupid, especially since I have so much space on my iDisk and because the files that generate this website are ridiculously small.

Rolling my own app will allow me to overcome all these obstacles. I'm going to yank out as much JavaScript as possible. I'm going to integrate my various weblogs more. I'm going to migrate my entries over to new, readable URLs, preserving the old URLs as well as preserving the old comments. I'm going to create my app so that it stores the weblog data on my iDisk (with a local backup, of course) so that I can create entries from any Mac to which I have access. And this will, in the process, make my weblog more usable, more friendly, more easily updated, and faster (especially on iPhones and iPods touch).

So there'll be a bit of brokenness around here in the next few days (possibly weeks), but all the old entries will continue to work as they have, and I'll finally be publishing some new content in the meantime. Hopefully I can get back to some commentary on the Mac community.

I'd like to end simply by profusely thanking a few people who've been a great help in this ordeal:

  • James G., or anwnn, very helpfully recommended ddrescue and its brethren, and even though it didn't help in the end, I was grateful for the recommendation, anyway, for the future. I like to think the G stands for Gosling, father of the Java programming language, and that he thinks that I'm someone important to follow on Twitter. Well, I think he's as important as Gosling, if they're not one and the same.
  • Drew Thaler got me in touch with the engineer behind Data Rescue II. And that was very helpful, too, even if Data Rescue didn't help out in the end, either. On the extremely plus side, I got to meet him: I offered him a beer for his help, and since he was in the area the week after MacWorld, we met up and we conversed over fish and chips and Stella Artois. Who knew that he was a MathCounts geek when he was younger, too? And our beer cascaded into a dinner and then into a trivia night at the local pub. Fun times were had.
  • Michael Heins, the main engineer behind Prosoft's Data Rescue II, was also a big help. He created custom builds of Data Rescue just for me to try out, and he got back to me very quickly and was very apologetic that Data Rescue didn't help. He got me to hope for data recovery through Data Rescue's graph of my drive. He even gave me a temporary serial number to use for Data Rescue even though he had very little idea of who I was and little reason to trust me. That was very generous, and I thank him for that. To top it off, he was well-versed in geology despite being a software engineer, something which was surprising, but in a good way. If you're reading this, Michael, my offer of free beer still stands.
  • The guy at Prosoft's booth at MacWorld who also said that Michael was an awesome guy and also made a helpful suggestion to check the power supply of my drive (didn't help, of course), and the guy at the OWC booth who helpfully pointed me to the Newer Technology cables instead of the more expensive hard drive case that wouldn't have helped both deserve a little bit of thanks, too, even though they probably won't ever see this.
