Thursday, July 17, 2008

Thank you Fedora.

About three weeks ago, I started having some serious issues with my system. It would start acting very strangely and eventually reboot itself spontaneously. By "acting strangely" I mean that the system would work fine until, at some point, it began alternating between short bursts of activity and short periods of complete inactivity.

I have Corsair RAM in my machine, the kind with activity LEDs on it, and these LEDs were also acting strangely, showing the same pattern of bursts of activity followed by pauses. I immediately thought the problem might be memory related, so I left Memtest86+ (a free download) running overnight, only to discover that my memory was most likely fine.

After a long week of arduous troubleshooting, I narrowed it down to the SATA drive I was running the OS from. Looking through the drive's S.M.A.R.T. info revealed that the drive was failing (SpeedFan gives you a nice breakdown of your drive info). I also have two IDE drives in my machine, which up until then I had been using primarily for storage. I backed up one of them to an external drive and installed XP on it, hoping my defective drive would last long enough for me to back up my most important data.

The good news was that once XP was installed, all of the problems immediately went away (confirming the role of the SATA drive in the problems I had been having), and I also managed to grab some of the most important data off the defective drive for backup. I figured I could breathe easy for now and get the rest of the data off later on. Well, in retrospect that was a mistake: when I woke up the next morning, the drive was gone.

Now you would think this would be the end of the story, except that a week later, upon waking up, I discovered that my system had hung and required a reboot. After restarting the system I found that my dead drive had come back to life. Needless to say I was exhilarated, in a way that I rarely am upon waking up. However, my excitement at the thought of having my data back did not last long, as I quickly discovered that while the drive was detected, it was not accessible, and neither was the data on it.

Not intent on giving up, I decided to try my best to get that data off even if it took me all day. (The data I'm referring to was mostly pictures: 3.5GB worth of family pictures that, while not crucial, certainly had a lot of sentimental value, especially since many of them were not backed up anywhere else.) So on I went running CHKDSK and the Western Digital diagnostic utilities, wasting many hours with absolutely no luck. I then decided to give Linux a shot. I downloaded Fedora 9 Live, burned the ISO and booted into the OS. Thanks to some instructions I found in a blog post by Rodney Fletcher, I was able to work out the commands to mount an NTFS partition in Linux and decided to see what would happen. Unfortunately, while I was able to mount my external drive without a hitch, trying to mount the defective drive repeatedly hung the live OS.

About to admit defeat, I booted back into XP, only to discover that whatever Fedora did while trying to mount the partition on the defective drive had given me access to the partition's contents. Overjoyed, I started frantically grabbing the data and copying it to my external drive. Everything seemed to be working well, except that a minute or so into the copying process my computer resumed the strange burst behavior it had exhibited with the old XP installation, which also stopped the copying in its tracks.

After rebooting and trying again a number of times without any real success, I noticed that just as the strange behavior started, Windows displayed an unusual message notifying me that a delayed write had failed on the defective drive. I thought about this for a while and could not figure out why anything would need to be written to the drive during the copying process. I never actually figured that out, but I did come up with an idea and a plan to solve the problem. Reading from the drive seemed to work just fine until something attempted to write to it, at which point the strange behavior commenced. I realized that if I could make the drive read-only, perhaps the problem could be avoided and I could get my data off the drive. The only problem was that Windows does not support making a partition read-only. So off I went looking for freeware that might do the job.

Unfortunately I could not find such software (an idea for an open source project perhaps?), but I was able to locate a shareware program by the name of HDGuard which, among other things, allows you to designate a partition as read-only. I jumped through some hoops and managed to download their 30-day trial version. After a simple install, a bit of configuration and a couple of reboots, I was finally able to get my files off that dreaded drive.
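For anyone in a similar spot who would rather script the copy step, here is a minimal Python sketch of a copy that keeps going when an individual file fails instead of stopping in its tracks. This is not what I used (I used TeraCopy), and the function name copy_tree_tolerant is made up for illustration. Note that it only opens the source files for reading; it does not stop the operating system itself from writing to the volume, so it is no substitute for a tool that makes the partition read-only.

```python
import shutil
from pathlib import Path


def copy_tree_tolerant(src_root, dst_root):
    """Copy every file under src_root into dst_root, skipping files that fail.

    Returns (copied, failed): a list of source paths that were copied and a
    list of (source path, exception) pairs for the ones that were not.
    Hypothetical helper for illustration only.
    """
    src_root = Path(src_root)
    dst_root = Path(dst_root)
    copied, failed = [], []

    for src in sorted(src_root.rglob("*")):
        try:
            if not src.is_file():
                continue  # directories are created on demand below
            dst = dst_root / src.relative_to(src_root)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(src, dst)  # opens the source for reading only
            copied.append(src)
        except OSError as exc:
            # Record the failure and move on to the next file.
            failed.append((src, exc))

    return copied, failed
```

You would call it with the source and destination folders (for example, a pictures folder on the failing drive and a rescue folder on the external drive), then look through the returned failed list to see which files need another attempt.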

So to conclude:

Thank you Fedora, thank you Rodney Fletcher, thank you HDGuard, thank you TeraCopy (a great free program that replaces the dreaded Windows copy utility) and thank you Western Digital for having relatively long warranties and a relatively hassle-free RMA service (and sorry for the unusually long rant!).
