Recovering a Linux/Unix system from a failed hard drive typically involves:
Booting from a live CD on the machine with the new hardware
Partitioning and mkfs'ing the new hard drive, and mounting the partitions
Establishing network connection to the backup server
Copying files from the backup onto freshly mounted partitions
Making the drive bootable
Rebooting, and being happy
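As a rough sketch, the middle steps might look like the following from the live CD's shell. The device names, the /target mount point, the ext3 file systems and the backuphost:/backups/myhost/ rsync source are all assumptions for illustration, and by default the function only prints the commands rather than running them.

```shell
#!/bin/sh
# Dry-run sketch of the restore steps, meant for a root shell on the live CD
# once the new disk has been partitioned (fdisk/parted, not shown) and the
# network is up. All values below are hypothetical; override them as needed.
DISK=${DISK-/dev/sda}                          # the new hard drive
TARGET=${TARGET-/target}                       # where it gets mounted
BACKUP=${BACKUP-backuphost:/backups/myhost/}   # rsync-over-ssh source
RUN=${RUN-echo}   # prints each command by default; set RUN= (empty) to execute

restore_steps() {
    # Make file systems, labelled so they can be mounted by label later
    $RUN mkfs.ext3 -L boot "${DISK}1"
    $RUN mkfs.ext3 -L root "${DISK}2"
    # Mount root first, then /boot inside it
    $RUN mkdir -p "$TARGET"
    $RUN mount "${DISK}2" "$TARGET"
    $RUN mkdir -p "$TARGET/boot"
    $RUN mount "${DISK}1" "$TARGET/boot"
    # Copy the backup over, keeping hard links and numeric owner IDs
    $RUN rsync -aH --numeric-ids -e ssh "$BACKUP" "$TARGET/"
}
```

Calling restore_steps prints the plan for review; running it with RUN set to the empty string executes the commands for real.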
There are various ways to do each of these steps (eg, booting from the OS install CD, a System Rescue CD, or a traditional LiveCD like Knoppix; copying with rsync, ssh, or tar and netcat; etc) but the overall process is pretty similar for every recovery. The step with the most variability is usually making the new drive bootable, as often something about the drive setup has changed between the old system and the new system -- at minimum the file system UUIDs have probably changed, and a lot of modern Linux distributions mount by file system UUID, supposedly to improve resiliency in the face of changes. (Alas, like identifying network interfaces by MAC address, in many situations these approaches seem to cause more problems than they solve, since they're fragile right at the point where you need all the help you can get -- recovering the system onto new hardware. Sigh. Mounting by file system label is a bit easier to recreate if the user knows to expect it, so that's my preference when doing things by hand.)
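To illustrate the mount-by-label approach, here is a small sketch that prints /etc/fstab lines using LABEL= rather than UUID=. The helper name fstab_by_label and the label names are made up for the example; the labels themselves would be set at mkfs time or afterwards with e2label.

```shell
#!/bin/sh
# Print an /etc/fstab line that mounts a file system by label.
# Usage: fstab_by_label LABEL MOUNTPOINT FSTYPE
# (hypothetical helper; labels can be set with "e2label /dev/sda1 boot"
# on ext2/3/4, and "blkid" lists the labels and UUIDs currently in use)
fstab_by_label() {
    pass=2
    if [ "$2" = / ]; then
        pass=1      # the root file system gets fsck'd first
    fi
    printf 'LABEL=%s\t%s\t%s\tdefaults\t0\t%s\n' "$1" "$2" "$3" "$pass"
}
```

For example, fstab_by_label root / ext3 followed by fstab_by_label boot /boot ext3 gives two lines that keep working on a replacement drive, as long as the new file systems are given the same labels.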
Traditionally Linux on Intel/AMD x86 ("PC") hardware was booted
using LILO, the LInux LOader. Over the years the Linux kernel (and
hardware requirements) have grown to a point where LILO cannot easily
keep up, and so modern Linux systems are usually booted using GRUB,
the GRand Unified Bootloader. GRUB generally needs less maintenance
than LILO: running update-grub to update the GRUB configuration file
when a new Linux kernel is installed is usually all that is required
(by contrast LILO needed its installer re-run any time you changed
anything to do with LILO or the things it was supposed to load,
because it hard coded details of what was to be loaded into the boot
system).
Often the way that the drive is made bootable is to chroot into the
freshly copied over install and rerun the boot install program, eg,
grub-install. One challenge in doing so is that the Live CD which was
booted must have support for the OS that was copied over. In
particular if the OS copied over is a 64-bit install, then it is not
possible to run it when booted from a 32-bit LiveCD. Instead you get a
message about an unsupported executable format. (You can check the
processor flags in /proc/cpuinfo for the "lm" flag to see if the
processor is capable of 64-bit mode -- if it isn't you need new
hardware; if it is, you just need a new Live CD version.) (For other
errors make sure you've mounted /proc, /sys and /dev into the chroot;
mount --bind /DIR /target/DIR is one easy way to do this. /dev is
particularly required in these days of udev, and dynamically populated
device nodes.)
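The check and the chroot steps can be sketched roughly as below. This assumes the restored system is mounted at /target and the boot disk is /dev/sda (both hypothetical), and again only prints the mount and chroot commands unless RUN is set to empty.

```shell
#!/bin/sh
# Hypothetical layout: restored system mounted at /target, boot disk /dev/sda.
TARGET=${TARGET-/target}
DISK=${DISK-/dev/sda}
RUN=${RUN-echo}   # prints each command by default; set RUN= (empty) to execute

# Succeeds if the CPU advertises the "lm" (long mode, ie 64-bit) flag.
# The optional argument exists only so a saved copy of cpuinfo can be checked.
cpu_is_64bit() {
    grep '^flags' "${1-/proc/cpuinfo}" | grep -qw lm
}

# Bind the live CD's /proc, /sys and /dev into the restored tree, then run
# the restored system's own grub-install from inside a chroot.
chroot_grub_install() {
    for dir in proc sys dev; do
        $RUN mount --bind "/$dir" "$TARGET/$dir"
    done
    $RUN chroot "$TARGET" grub-install "$DISK"
}
```

Something like "cpu_is_64bit || echo 'no 64-bit support'" before chroot_grub_install saves a confusing failure later; note the -w on grep, so that the unrelated lahf_lm flag is not mistaken for lm.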
However one of the advantages of GRUB is that its initial boot loader
is much smarter than older boot loaders like LILO, so if you can get
the initial portion of the boot loader to work it is possible to
manually step through the GRUB boot to get the installed OS running
and from there run the GRUB installer from the real OS. So even if you
can't run binaries from the installed OS, it's worth trying
grub-install /dev/sda from the Live CD and rebooting. If you get to a
grub> prompt, you can probably get the system to boot by hand.
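One variation worth knowing about, sketched here on the assumption that the restored system is mounted at /target (a made-up mount point): grub-install takes a --root-directory option, which asks it to put GRUB's files under that tree's /boot instead of the live CD's own. As above, the function only prints the command unless RUN is empty.

```shell
#!/bin/sh
# Hypothetical layout: restored system mounted at /target, boot disk /dev/sda.
TARGET=${TARGET-/target}
DISK=${DISK-/dev/sda}
RUN=${RUN-echo}   # prints the command by default; set RUN= (empty) to execute

# Run the live CD's grub-install, but point it at the restored /boot.
livecd_grub_install() {
    $RUN grub-install --root-directory="$TARGET" "$DISK"
}
```

This uses the live CD's GRUB version, not the installed system's, so it is still worth re-running grub-install from the real OS once it boots.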
The minimal set of commands to make GRUB boot a single OS varies depending on whether it is GRUB 1 (aka GRUB Legacy) or GRUB 2 -- and GRUB Legacy versions usually have a 0.xx version number, while GRUB 2 versions have a 1.xx (or, in later releases, 2.xx) version number!
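If it isn't obvious which one is installed, the version number rule can be turned into a small check. This is only a sketch: grub_generation is a made-up helper, and the sample strings are examples of what grub-install --version prints, which may be formatted differently on a given distribution.

```shell
#!/bin/sh
# Classify a GRUB version string, such as the output of
# "grub-install --version" (eg "grub-install (GNU GRUB 0.97)").
grub_generation() {
    # Pull out the first thing that looks like MAJOR.MINOR
    ver=$(printf '%s\n' "$1" |
        sed -n 's/^[^0-9]*\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p')
    case $ver in
        "")  echo "unknown"; return 1 ;;
        0.*) echo "GRUB 1 (Legacy)" ;;
        *)   echo "GRUB 2" ;;
    esac
}
```

Typical use would be grub_generation "$(grub-install --version)" from inside the chroot or the running system.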
GRUB 1 (aka GRUB Legacy) minimal commands
Assuming that the first partition on the disk is a /boot partition (a
common setup, due to historical limitations on which portion of the
disk the BIOS could access), then the minimal set of commands is:
root (hd0,0)
kernel /vmlinuz-VERSION root=/dev/ROOTDEVICE
initrd /initrd-VERSION
boot
where hd0,0 is the GRUB 1 way of referring to the first partition on
the first disk, VERSION is replaced by the version of the kernel in
use and ROOTDEVICE is replaced by the Linux name for the partition
holding the root file system. Conveniently it is possible to use tab
completion on the kernel and initrd lines, once the root line has been
entered, which helps quickly narrow down the correct version number
needed in the filenames.
GRUB 2 minimal commands
With the same assumptions (ie, /boot is the first partition on the
first disk), the GRUB 2 commands are:
set root=(hd0,1)
linux /vmlinuz-VERSION root=/dev/ROOTDEVICE
initrd /initrd-VERSION
boot
with the same substitutions as above. Note two changes:
hd0,0 becomes hd0,1, still for the first partition of the first disk. (Apparently someone saw fit to change the way the partitions were numbered in GRUB, which seems to be designed only to cause confusion, given it's still hd0 rather than hd1 -- ie, hard drives still start counting at 0, but partitions now start counting at 1. WTF?)
kernel becomes linux, presumably due to the ability to handle more types of kernels. (This makes more sense.)
(In current GRUB 2 releases root is a variable rather than a command, so the first line is entered as set root=(hd0,1).)
Tab completion works as with GRUB 1, and it's also possible to use ls
to do simple directory exploration.
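To save some tab completion, the lines to type can be worked out ahead of time from the live CD while /boot is still mounted. The sketch below is a hypothetical helper (grub_boot_commands is not part of GRUB) that assumes /boot is the first partition of the first disk, as above, and guesses the initrd name loosely because it differs between distributions. For GRUB 2 it prints the set root= form that current releases accept.

```shell
#!/bin/sh
# Print the lines to type at the grub> prompt, given a mounted /boot.
# Usage: grub_boot_commands GENERATION BOOTDIR ROOTDEVICE
#   GENERATION  1 (GRUB Legacy) or 2 (GRUB 2)
grub_boot_commands() {
    gen=$1; bootdir=$2; rootdev=$3
    kernel=; initrd=
    # Glob order is lexical, so with several kernels the last name wins
    for f in "$bootdir"/vmlinuz-*; do
        if [ -e "$f" ]; then kernel=${f##*/}; fi
    done
    if [ -z "$kernel" ]; then
        echo "no vmlinuz-* found in $bootdir" >&2
        return 1
    fi
    version=${kernel#vmlinuz-}
    # initrd naming varies (initrd-VERSION, initrd.img-VERSION, ...)
    for f in "$bootdir"/initrd*"$version"*; do
        if [ -e "$f" ]; then initrd=${f##*/}; fi
    done
    if [ "$gen" = 1 ]; then
        echo "root (hd0,0)"
        echo "kernel /$kernel root=$rootdev"
    else
        echo "set root=(hd0,1)"     # root is a variable in GRUB 2
        echo "linux /$kernel root=$rootdev"
    fi
    if [ -n "$initrd" ]; then echo "initrd /$initrd"; fi
    echo "boot"
}
```

For example, grub_boot_commands 2 /target/boot /dev/sda2 prints four lines that can be copied down before rebooting.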
See also the Ubuntu GRUB 2 notes (whose authors apparently consider the partition naming changes an improvement), and the GRUB 2 Manual online.
Tidying up
Assuming you can get the installed OS running, you then want to update
/boot/grub/menu.lst (GRUB 1) or /boot/grub/grub.cfg (GRUB 2; this file
is regenerated by update-grub, so lasting changes belong in
/etc/default/grub) with the appropriate drive settings that you used
to make it boot by hand, and run update-grub followed by
grub-install /dev/sda to reinstall the boot loader from the installed
OS. One more reboot should show the system booting automatically; if
there are any errors, providing you get to a grub> prompt you can
simply boot by hand again and fix the configuration issue. (If you get
a GRUB menu up, but it won't boot, you can use e to edit the
configuration for this single boot as a quicker way to get going.)
And now for something different: Homebrew 1/10th scale Cray-1, via jwz; the Cray 1 was one of the first supercomputers. 30+ years later, the average phone has more CPU power. (ETA, 2010-09-01: Also discussed on Slashdot.)