Recovering a Linux/Unix system from a failed hard drive typically involves:

  1. Booting from a live CD on the machine with the new hardware

  2. Partitioning and mkfs'ing the new hard drive, and mounting the partitions

  3. Establishing a network connection to the backup server

  4. Copying files from the backup onto the freshly mounted partitions

  5. Making the drive bootable

  6. Rebooting, and being happy
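
As a rough sketch of steps 2 to 4 (not a definitive procedure), the commands might look something like the following. Everything specific here is an assumption for illustration: the device name /dev/sda, an already partitioned drive with /boot first and / second, the labels boot and rootfs, the /target mount point, and the rsync source backuphost:/backups/myhost/. The run and restore_commands helpers are made up for this example, and by default they only print the commands rather than running them.

```shell
#!/bin/sh
# Sketch of steps 2-4; prints the commands by default (DRY_RUN=1) so it is
# safe to read through first. All device names, labels, the backup host and
# the backup path are assumptions for illustration.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Usage: restore_commands [DISK] [RSYNC_SOURCE]
restore_commands() {
    disk="${1:-/dev/sda}"                        # new drive, already partitioned
    backup="${2:-backuphost:/backups/myhost/}"   # hypothetical rsync source
    run mkfs.ext2 -L boot "${disk}1"             # step 2: make the file systems
    run mkfs.ext4 -L rootfs "${disk}2"
    run mount "${disk}2" /target                 # /target must already exist
    run mkdir -p /target/boot
    run mount "${disk}1" /target/boot
    run rsync -aHAX --numeric-ids "$backup" /target/   # step 4: copy files back
}

restore_commands
```

Setting DRY_RUN=0 would actually run the commands, which destroys whatever is on the named partitions, so check the printed commands against the real hardware first.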

There are various ways to do each of these steps (eg, booting from the OS install CD, a System Rescue CD, or a traditional LiveCD like Knoppix; copying with rsync, ssh, or tar and netcat; etc) but the overall process is pretty similar for every recovery. The step with the most variability is usually making the new drive bootable, as often something about the drive setup has changed between the old system and the new system -- at minimum the file system UUIDs have probably changed, and a lot of modern Linux distributions mount by file system UUID, supposedly to improve resiliency in the face of changes. (Alas, like identifying network interfaces by MAC address, in many situations these approaches seem to cause more problems than they solve, since they're fragile right at the point where you need all the help you can get -- recovering the system onto new hardware. Sigh. Mounting by file system label is a bit easier to recreate if the user knows to expect it, so that's my preference when doing things by hand.)
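
To illustrate the mount-by-label approach, here is a minimal sketch that prints /etc/fstab entries using LABEL= rather than UUID=. The fstab_line helper, the labels rootfs and boot, and the file system types are all assumptions for the example; the labels themselves would be set when the file systems are created (or afterwards with e2label on ext2/3/4).

```shell
#!/bin/sh
# Hypothetical helper: print one /etc/fstab entry that mounts by label.
# Usage: fstab_line LABEL MOUNTPOINT FSTYPE PASS
fstab_line() {
    printf 'LABEL=%s\t%s\t%s\tdefaults\t0\t%s\n' "$1" "$2" "$3" "$4"
}

# Labels would be set when creating the file systems on the new drive, eg:
#   mkfs.ext4 -L rootfs /dev/sda2      (device names are assumptions)
#   e2label /dev/sda1 boot             (relabel an existing ext2/3/4 fs)
fstab_line rootfs / ext4 1
fstab_line boot /boot ext2 2
```

Since the labels are chosen by hand, the same fstab works unchanged after the file systems are recreated on a new drive, provided the same labels are used again.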

Traditionally Linux on Intel/AMD x86 ("PC") hardware was booted using LILO, the LInux LOader. Over the years the Linux kernel (and hardware requirements) have grown to a point where LILO cannot easily keep up, and so modern Linux systems are usually booted using GRUB, the GRand Unified Bootloader. Typically GRUB needs less maintenance than LILO: running update-grub to update the GRUB configuration file when a new Linux kernel is installed is usually all that is required. (By contrast LILO needed its installer re-run any time you changed anything to do with LILO or the things it was supposed to load, because it hard coded details of what was to be loaded into the boot system.)

Often the way that the drive is made bootable is to chroot into the freshly copied over install and rerun the boot install program, eg, grub-install. One challenge in doing so is that the Live CD which was booted must be able to run the OS that was copied over. In particular if the OS copied over is a 64-bit install, then it is not possible to run its binaries when booted from a 32-bit LiveCD; instead you get a message about an unsupported executable format. (You can check the processor flags in /proc/cpuinfo for the "lm" flag to see if the processor is capable of 64-bit mode -- if it isn't you need new hardware; if it is, you just need a 64-bit version of the Live CD.) For other errors make sure you've mounted /proc, /sys and /dev into the chroot; mount --bind /DIR /target/DIR is one easy way to do this. (/dev is particularly required in these days of udev, and dynamically populated device nodes.)
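
The two checks above can be sketched in shell as follows. The function names cpu_is_64bit and chroot_prep_commands are made up for this example, /target is an assumed mount point for the restored root file system, and the second function only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Return success if the CPU flags in the given cpuinfo file include "lm"
# (long mode, ie 64-bit capable on x86). Defaults to /proc/cpuinfo.
cpu_is_64bit() {
    grep -Eq '^flags[[:space:]]*:.*[[:space:]]lm([[:space:]]|$)' \
        "${1:-/proc/cpuinfo}" 2>/dev/null
}

# Print (rather than run) the bind mounts needed before chrooting.
# /target is an assumed mount point for the restored root file system.
chroot_prep_commands() {
    target="${1:-/target}"
    for dir in proc sys dev; do
        echo "mount --bind /$dir $target/$dir"
    done
    echo "chroot $target /bin/bash"
}

if cpu_is_64bit; then
    echo "CPU has the lm flag: a 64-bit Live CD will work"
else
    echo "no lm flag found: 32-bit only (or not an x86 /proc/cpuinfo)"
fi
chroot_prep_commands /target
```

The pattern requires whitespace on both sides of lm, so that related flags such as lahf_lm do not count as a match.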

However one of the advantages of GRUB is that its initial boot loader is much smarter than older boot loaders like LILO, so if you can get the initial portion of the boot loader to work it is possible to manually step through the GRUB boot to get the installed OS running and from there run the GRUB installer from the real OS. So even if you can't run binaries from the installed OS, it's worth trying grub-install /dev/sda from the Live CD and rebooting. If you get to a grub> prompt, you can probably get the system to boot by hand.

The minimal set of commands to make GRUB boot a single OS varies depending on whether it is GRUB 1 (aka GRUB Legacy) or GRUB 2. Confusingly, GRUB Legacy versions usually have a 0.xx version number, while GRUB 2 versions (at the time of writing) have a 1.xx version number!

GRUB 1 (aka GRUB Legacy) minimal commands

Assuming that the first partition on the disk is a /boot partition (a common setup, due to historical limitations on which portion of the disk the BIOS could access), then the minimal set of commands is:

root (hd0,0)
kernel /vmlinuz-VERSION root=/dev/ROOTDEVICE
initrd /initrd-VERSION
boot

where hd0,0 is the GRUB 1 way of referring to the first partition on the first disk, VERSION is replaced by the version of the kernel in use and ROOTDEVICE is replaced by the Linux name for the partition holding the root file system. Conveniently it is possible to use tab completion on the kernel and initrd lines, once the root line has been entered, which helps quickly narrow down the correct version number needed in the filenames.
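
If tab completion isn't cooperating, the VERSION strings can also be found from the Live CD before rebooting. The sketch below assumes the restored /boot is mounted at /target/boot and that kernels are named vmlinuz-VERSION as above; list_kernel_versions is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: list the VERSION strings of kernels found in a
# boot directory, by stripping the "vmlinuz-" prefix from each filename.
# /target/boot is an assumed mount point for the restored /boot partition.
list_kernel_versions() {
    bootdir="${1:-/target/boot}"
    for kernel in "$bootdir"/vmlinuz-*; do
        [ -e "$kernel" ] || continue    # glob matched nothing
        basename "$kernel" | sed 's/^vmlinuz-//'
    done
}

list_kernel_versions /target/boot
```

It is worth writing the result down along with the root partition name, since neither is easy to look up once you are sitting at the grub> prompt.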

GRUB 2 minimal commands

With the same assumptions (ie, /boot is the first partition on the first disk), the GRUB 2 commands are:

set root=(hd0,1)
linux /vmlinuz-VERSION root=/dev/ROOTDEVICE
initrd /initrd-VERSION
boot

with the same substitutions as above. Note three changes:

  1. root (hd0,0) becomes set root=(hd0,1): in GRUB 2 root is a variable assigned with set, rather than a command.

  2. hd0,0 becomes hd0,1, still for the first partition of the first disk. (Apparently someone saw fit to change the way the partitions were numbered in GRUB, which seems to be designed only to cause confusion, given it's still hd0 rather than hd1 -- ie, hard drives still start counting at 0, but partitions now start counting at 1. WTF?)

  3. kernel becomes linux, presumably due to the ability to handle more types of kernels. (This makes more sense.)

Tab completion works as with GRUB 1, and it's also possible to use ls to do simple directory exploration.
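
To make the differences between the two command sets concrete, here is a small sketch that prints the minimal commands for either version from the same inputs. The grub_boot_commands helper is made up for this example, and it takes the disk and partition numbers counted from 0 (the GRUB 1 way). For GRUB 2 it prints the set root=(...) form documented in the GRUB 2 manual.

```shell
#!/bin/sh
# Hypothetical helper: print the minimal boot commands for GRUB 1 or GRUB 2.
# Usage: grub_boot_commands GRUBVER DISK PART VERSION ROOTDEVICE
#   DISK and PART are both counted from 0, as GRUB 1 does.
grub_boot_commands() {
    grubver=$1 disk=$2 part=$3 version=$4 rootdev=$5
    case "$grubver" in
        1)  echo "root (hd$disk,$part)"
            echo "kernel /vmlinuz-$version root=/dev/$rootdev" ;;
        2)  # GRUB 2 counts partitions from 1; disks still start at 0
            echo "set root=(hd$disk,$((part + 1)))"
            echo "linux /vmlinuz-$version root=/dev/$rootdev" ;;
        *)  echo "unknown GRUB version: $grubver" >&2
            return 1 ;;
    esac
    echo "initrd /initrd-$version"
    echo "boot"
}

# Example with an assumed kernel version and root partition:
grub_boot_commands 1 0 0 2.6.32-5-amd64 sda2
grub_boot_commands 2 0 0 2.6.32-5-amd64 sda2
```

The kernel version and sda2 in the example calls are placeholders; substitute whatever is actually on the restored disk.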

See also the Ubuntu GRUB 2 notes (whose authors apparently consider the partition naming changes an improvement), and the GRUB 2 Manual online.

Tidying up

Assuming you can get the installed OS running, you then want to update the GRUB configuration with the appropriate drive settings that you used to make it boot by hand: /boot/grub/menu.lst for GRUB 1, or for GRUB 2 the settings in /etc/default/grub (update-grub regenerates /boot/grub/grub.cfg from them, so that file should not be edited directly). Then run update-grub followed by grub-install /dev/sda to reinstall the boot loader from the installed OS. One more reboot should show the system booting automatically; if there are any errors, providing you get to a grub> prompt you can simply boot by hand again and fix the configuration issue. (If you get a GRUB menu up, but it won't boot, you can use e to edit the configuration for this single boot as a quicker way to get going.)
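
Since stale file system UUIDs are the most likely thing to need fixing, one way to find them is to compare the UUIDs mentioned in a configuration file against the ones that actually exist on the new drive. This is only a sketch: stale_uuids is a made-up helper, the file paths in the usage comment are assumptions, and it only recognises unquoted UUID=... references.

```shell
#!/bin/sh
# Hypothetical helper: print any UUIDs referenced in a config file (eg
# /etc/fstab or /boot/grub/menu.lst) that are not in a list of known UUIDs.
# Usage: stale_uuids CONFIGFILE KNOWN_UUIDS_FILE
stale_uuids() {
    grep -Eo 'UUID=[0-9A-Fa-f-]+' "$1" | sed 's/^UUID=//' | sort -u |
    while read -r uuid; do
        # -x: whole line must match, -F: treat the UUID as a fixed string
        grep -qxF "$uuid" "$2" || echo "$uuid"
    done
}

# On the running system, the known list could come from blkid, eg:
#   blkid -s UUID -o value > /tmp/known-uuids
#   stale_uuids /etc/fstab /tmp/known-uuids
```

Any UUID it prints refers to a file system that no longer exists, and wants replacing with the new UUID (or, per my preference above, a label).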

And now for something different: Homebrew 1/10th scale Cray-1, via jwz; the Cray 1 was one of the first supercomputers. 30+ years later, the average phone has more CPU power. (ETA, 2010-09-01: Also discussed on Slashdot.)