Debian Squeeze RAID1 and GRUB2

Introduction

This article analyses a problem I had setting up a RAID1 root filesystem with mdadm on Debian squeeze (6.0-6.0.1) booted by grub2, and presents my conclusions.

Background information

Debian squeeze includes the following packages:

package       version            common aliases
grub-pc       1.98+20100804-14   grub2
grub-legacy   0.97-64            grub

For compatibility with what most other people do, this article will use the names grub2 and grub-legacy. In addition, we will refer to the code written to a system’s MBR as the grub loader, which may be supplemented by grub modules.
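
To check which of the two packages a given system has, something like the following sketch may help. The function names grub_flavour and report_grub_packages are my own, and the rule that versions starting with "0." are grub-legacy while 1.9x and later are grub2 is an assumption based only on the table above.

```shell
#!/bin/sh
# Classify a Debian grub package version string.
# Assumption: "0.*" means grub-legacy, "1.9*" or "2.*" means grub2.
grub_flavour() {
    case "$1" in
        0.*)      echo "grub-legacy" ;;
        1.9*|2.*) echo "grub2" ;;
        *)        echo "unknown" ;;
    esac
}

# Print package name, version and flavour for whichever package is installed.
# Prints nothing if dpkg-query is unavailable or neither package is installed.
report_grub_packages() {
    for pkg in grub-pc grub-legacy; do
        ver=$(dpkg-query -W -f='${Version}' "$pkg" 2>/dev/null) || continue
        [ -n "$ver" ] && echo "$pkg $ver $(grub_flavour "$ver")"
    done
    return 0
}
```

Running report_grub_packages on the installed system (not from the grub rescue prompt) should print one line per installed package.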

My system has two disks: /dev/sda and /dev/sdb. During the installation, partman partitioned /dev/sda into /dev/sda1 (12GB, for root) and /dev/sda2 (500MB, for swap); its remaining space was left unallocated. /dev/sdb had no partitions, so all of its space was unallocated. What I wanted was a RAID1 root filesystem and RAID1 swap.

The plan

My plan was based on what I had previously done on Debian lenny systems:

  1. copy the partition table from /dev/sda to /dev/sdb using parted
  2. create a one-legged (i.e. immediately degraded) RAID1 device /dev/md0 from /dev/sdb1 and a one-legged RAID1 device /dev/md1 from /dev/sdb2 using mdadm
  3. copy root filesystem from /dev/sda1 into /dev/md0 by running:
    mount /dev/md0 /mnt
    rsync -ax / /mnt/

    and update the fstab on /dev/md0 to mount /dev/md0 as root filesystem.

  4. ensure /dev/sda’s MBR is intelligent enough to find the Linux kernel inside /dev/md0 by running:
    grub-install --modules="raid mdraid part_msdos ext2 linux normal help" \
        --root-directory=/mnt /dev/sda
  5. reboot, using some manual intervention to boot from /dev/md0 (so that /dev/sda1 is not in use and mdadm can add it to /dev/md0 without complaining)
  6. add /dev/sda1 to /dev/md0 and /dev/sda2 to /dev/md1 using mdadm
  7. ensure both disks’ MBRs are intelligent enough to find the Linux kernel inside /dev/md0 using grub-install
  8. test by physically detaching each disk in turn
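
Steps 2 and 6 of the plan can be sketched roughly as follows. This is a dry-run illustration rather than the exact commands I ran: the run, create_degraded and complete_mirror helpers are names of my own, and nothing touches the disks unless RUN=1 is set in the environment.

```shell
#!/bin/sh
# Dry-run wrapper: print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Step 2: create a one-legged RAID1 array; the keyword "missing"
# reserves the slot for the partition that is added later.
create_degraded() {   # usage: create_degraded /dev/md0 /dev/sdb1
    run mdadm --create "$1" --level=1 --raid-devices=2 missing "$2"
}

# Step 6: add the remaining partition so the mirror can resync.
complete_mirror() {   # usage: complete_mirror /dev/md0 /dev/sda1
    run mdadm --add "$1" "$2"
}

create_degraded /dev/md0 /dev/sdb1
create_degraded /dev/md1 /dev/sdb2
complete_mirror /dev/md0 /dev/sda1
complete_mirror /dev/md1 /dev/sda2
```

Keeping the dry-run default makes it easy to review the device names before anything destructive happens.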

The problem

Step 5 of the plan outlined in the previous section failed; I could not get the grub loader to load the Linux kernel from /dev/md0. In case this page is googled, here are several of the error messages I encountered:

Welcome to GRUB!

error: no such disk.
Entering rescue mode...
grub rescue> insmod linux
error: no such disk.
grub rescue> set prefix=(md0)/boot/grub
grub rescue> insmod linux
error: no such disk.
grub rescue> set prefix=(hd1,msdos1)/boot/grub
grub rescue> ls $prefix

error: file not found.
grub rescue> ls /
error: no such disk.
grub rescue> set root=(hd1,msdos1)
grub rescue> ls /

grub rescue> set prefix=(hd0,msdos1)/boot/grub
grub rescue> insmod linux
grub rescue> insmod normal
grub rescue> help
Unknown command `help'
grub rescue> insmod help
grub rescue> normal
#  escape from grub menus to grub shell
c                                                           
grub> set root=(hd0,msdos1)
#  tab-completion works here
grub> linux /boot/vmlinuz-2.6.32-5-amd64 root=/dev/md0 ro
#  and here too
grub> initrd /boot/initrd.img-2.6.32-5-amd64                
grub> boot
#  kernel boots
...                                                        
Booting processor 1 APIC 0x3 in 0x6000
Not responding
Booting processor 2 APIC 0x2 in 0x6000
Not responding
Booting processor 3 APIC 0x1 in 0x6000
Not responding
#  system reboots!
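
The device names used in the session above follow grub2's (hdN,msdosM) convention. The helper below is only a sketch of that naming: grub_device is a hypothetical function of mine, and it assumes sd[a-z] disk names, an MS-DOS partition table, and a BIOS disk order that matches the kernel's (sda is hd0, sdb is hd1), which is not guaranteed.

```shell
#!/bin/sh
# Map a Linux partition name such as /dev/sda1 to grub2's (hdN,msdosM) form.
# Assumptions: sd[a-z] naming, MS-DOS partition table, and BIOS disk order
# equal to the kernel's order. grub2 counts disks from 0, partitions from 1.
grub_device() {
    name=${1#/dev/sd}                 # e.g. "a1"
    letter=${name%%[0-9]*}            # e.g. "a"
    part=${name#"$letter"}            # e.g. "1"
    index=$(( $(printf '%d' "'$letter") - 97 ))   # 'a' is ASCII 97
    echo "(hd$index,msdos$part)"
}

grub_device /dev/sda1
grub_device /dev/sdb1
```

When in doubt, the loader's own ls command shows which devices it can actually see.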

Analysis

When I was unable to boot from /dev/md0, I decided to try all combinations of:

  1. telling mdadm to use either metadata version 0.90 or metadata version 1.2,
  2. either running or not running grub-install before the reboot,
  3. either interrupting the grub loader at the boot menu and editing the first boot menu selection or escaping to the grub loader’s shell and trying to boot manually or not interrupting the grub loader,
  4. interactively telling the grub loader to get the Linux kernel either from (md0) or (md/0) or (hd1,msdos1) or (hd0,msdos1),
  5. telling the Linux kernel that the root filesystem is either /dev/md0 or /dev/sda1,
  6. either adding or not adding nolapic to the kernel command line.

Some combinations could not be tested because things had already gone wrong (e.g. if it is not possible to tell the grub loader to get the kernel from (md0) then it is not possible to tell the kernel to use /dev/md0 as the root filesystem). Some combinations could not be tested because the combination was not valid (e.g. if the grub loader is not interrupted then there is no opportunity to tell the kernel that the root filesystem is either /dev/md0 or /dev/sda1).
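
To give an idea of the size of the test matrix, here is a sketch that enumerates it. The labels (edit-menu, shell and so on) are my own shorthand, and the metadata values are written the way mdadm's --metadata option accepts them. Only the "loader not interrupted" case is pruned; combinations that fail part-way through are still listed.

```shell
#!/bin/sh
# Enumerate the combinations of the six factors listed above.
list_combinations() {
    for metadata in 0.90 1.2; do
        for grub_install in yes no; do
            # Not interrupting the loader leaves no later choices to make.
            echo "metadata=$metadata grub-install=$grub_install interrupt=none"
            for interrupt in edit-menu shell; do
                for grub_root in '(md0)' '(md/0)' '(hd1,msdos1)' '(hd0,msdos1)'; do
                    for linux_root in /dev/md0 /dev/sda1; do
                        for nolapic in yes no; do
                            echo "metadata=$metadata grub-install=$grub_install interrupt=$interrupt grub-root=$grub_root linux-root=$linux_root nolapic=$nolapic"
                        done
                    done
                done
            done
        done
    done
}

# 2 x 2 x (1 + 2 x 4 x 2 x 2) = 132 lines
list_combinations | wc -l
```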

The results of this analysis are best presented as a flowchart:

diagram1

Notes on various boxes in the diagram:

  1. mdadm metadata version: either write mdadm’s metadata to disk in version 0.90 format or in 1.2 format. 0.90 format was written by default by mdadm version 2.6.7.2, which was included with Debian lenny. 1.2 format is written by default by mdadm version 3.1.4, which is included with Debian squeeze. For more info, see the mdadm(8) man page.
  2. grub finds grub and load modules: when the grub loader is written onto the MBR with the grub-install command, some grub modules may be written with it. These are typically used to give the grub loader enough intelligence to find and load other grub modules, which in turn are typically used to find and load the Linux kernel. The Debian installer calls grub-install and gets it to write the following grub modules to the MBR: raid, mdraid, part_msdos, ext2. After that you can run something like:
    set prefix=(hd0,msdos1)/boot/grub
    insmod <module-name>

    to add other modules. The grub-install command shown in the ‘The plan’ section above writes all the necessary modules to the MBR, so there should be no need to set prefix or run insmod to load anything else (at least for this RAID1 + non-LVM setup).

  3. menu interrupt mode: at the grub loader’s boot menu, one of three relevant steps may be taken: do nothing, allowing the grub loader to execute the commands behind the first menu option; edit the first menu option; or escape to the grub loader’s shell.
  4. grub root: specifies the value of the grub loader’s root environment variable; e.g.:
    set root=(hd0,msdos1)

    It is kind of implied that the grub loader’s prefix environment variable will have been set to something similar first; e.g.:

    set prefix=(hd0,msdos1)/boot/grub
  5. linux root: specifies the value of the Linux kernel’s root command line option; e.g.:
    root=/dev/md0
  6. nolapic: specifies whether the Linux kernel’s nolapic command line option was used
  7. OK: system boots up successfully from /dev/md0
  8. OK: USELESS: system boots up successfully from /dev/sda1, but this is not what we wanted, as it will mean /dev/sda1 is marked as in use, and therefore cannot be added to /dev/md0
  9. OK: OUT OF SCOPE: this article is concerned with grub2, not grub-legacy (a procedure for installing grub-legacy and RAID1 can be found in the ‘Raiding root and swap (lenny)’ section of Configuring storage services)
  10. ERROR: NO SUCH DISK: the grub loader reports error: no such disk.
  11. ERROR: LS EMPTY: the grub loader’s ‘ls /’ command reports nothing
  12. ERROR: APIC: consistent APIC errors while booting the kernel
  13. ERROR: ~APIC: occasional APIC errors while booting the kernel
  14. ERROR: ~SERVICES: occasional service startup errors (e.g. syslogd, ntpd)
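
For note 1, the metadata format of an existing array member can be read back from the output of mdadm --examine. The sketch below assumes that output contains a line of the form "Version : <value>"; metadata_version is a helper name of my own.

```shell
#!/bin/sh
# Print the superblock version found in `mdadm --examine` output on stdin.
# Assumption: the output has a line such as "        Version : 1.2".
metadata_version() {
    awk -F' : ' '$1 ~ /^[ \t]*Version$/ { print $2; exit }'
}

# Typical use (needs root and a real array member):
#   mdadm --examine /dev/sdb1 | metadata_version
#
# To force the older format when creating an array, mdadm accepts
# --metadata, for example:
#   mdadm --create /dev/md0 --metadata=0.90 --level=1 --raid-devices=2 missing /dev/sdb1
```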

Having worked out all the above ways to try to boot and the results, it remains to select the route across the flowchart from “START” to an “OK” that involves the minimum deviation from a standard installation. This is not so difficult as there are only a couple of options.

Conclusions

  1. The route I selected was the following:
    diagram1-route1
    (But note that this route is only required for a short time during the procedure.)
  2. The complete procedure can be found in the ‘Raiding root and swap (squeeze)’ section of Configuring storage services generation two.
  3. grub2 treats (md0) and (md/0) as the same device.
  4. grub-install’s --root-directory option determines where supplementary grub modules are written, and the initial value of the grub loader’s prefix environment variable written onto the MBR is derived from it.
  5. The fact that reinstalling grub2 results in an automatically bootable system, whereas manually running grub-install does not, suggests that some extra steps are being run by grub2’s post-install script (but note that that only writes an MBR to /dev/sda, not to /dev/sdb).
  6. There are obvious reasons why grub2 has not yet reached version 2.0; one might reasonably consider installing grub-legacy instead of grub2 and waiting to see how much grub2 has evolved by the time Wheezy is released, especially as Xen Dom0s with lenny installed require all PV DomUs to use grub-legacy.
  7. Once both disks are part of RAID volumes, do not run grub-install; it will not be possible to boot if you do!
  8. It’s more than likely that I have misunderstood something about how to use grub-install, but it is not something I could find by googling or referring to official grub documentation or man pages. If somebody knows the full grub-install command I should be running to avoid all these problems then please let me know!

One thought on “Debian Squeeze RAID1 and GRUB2”

  1. Hi no-ip,

    Great post with a lot of detail that I will be referring back to should I decide to set up RAID for my home server.

    I first encountered the

    error: file not found.
    Entering rescue mode . . .
    grub rescue>

    first when I filled up my laptop’s 1TB drive (was copying directories from a 2TB drive that I am going to reformat and use in a home server)

    and

    second when I was moving (mv) directories from/to a SATA drive, in a IO Magic disk enclosure via USB, pulled from another Linux box (that also had a boot and root partition on it). I did a bad thing. I unplugged the USB without ‘umount’ing the device as I was going to reboot anyway (did not think it would matter in this specific scenario…wrong). Funny how you will do things at home with your own equipment that you will never do in your professional capacity as a Systems Administrator / Systems Engineer.

    In both of the above scenarios the ‘insmod’ command will not work, instead one gets this error: error: file not found

    Whether this is a problem specific to either grub or grub2 for debian and not ubuntu, I do not know as I write this but will discover later today as I continue searching. Thought I would simply share this experience for the benefit of others and bring it to everyone’s attention, that there may be potential differences between debian and ubuntu with respects to grub or grub2. Which leads me to the next comment and suggestion to you for the benefit of others.

    I honestly do not know if I am running grub2 (it’s a new laptop, so probably) or grub. Thought it might be good for you to include in your blog post above the command that will tell you whether you have ‘1.98+20100804-14’ or ‘0.97-64’, as ‘grub-install -v’ did not work either. It would probably be a good idea to include the file where this information is stored so that others can find it if they too cannot run grub-install.

    I was able to determine the right device to use in the ‘set root’ and ‘set prefix’ commands thanks to the ‘ls’ and ‘ls (hd0,msdos2)/’


    grub rescue> ls
    (hd0) (hd0,msdos2) (hd0,msdos1)

    grub rescue> ls (hd0,msdos2)/
    ./ ../ var/ proc/ dev/ sys/ run/ .pulse-cookie .pulse/ data/ bkup2/

    thus for set root and set prefix I could use:

    set root=(hd0,msdos2)
    set prefix=(hd0,msdos2)/boot/grub

    Unfortunately no joy with insmod or insmod normal

    insmod normal
    error: file not found

    At this point I have only been searching for the answer online for a few hours, spending more time reading to learn, than just searching for the solution and will probably discover that eventually. Hopefully my comments help others.

    For completeness, here is my fdisk -l (shows root as /dev/sda2)
    Device Boot Start End Blocks Id System
    /dev/sda1 2048 33222655 16610304 82 Linux swap / Solaris
    /dev/sda2 * 33222656 1953523711 960150528 83 Linux

    So my /dev/sda2 is equivalent to (hd0,msdos2), hopefully this will help others who are experiencing this for the first time.

    While I do not have a solution yet for the second scenario, for the first scenario, I was able to use my debian 7.3 boot / rescue disk to get to a terminal window, mount the drive, also mount another USB disk enclosure, copy files from the full drive to the drive on the USB device. It was interesting that a ‘df -k’ or ‘df -P’ did not show that any space was free until I moved (mv) off approx 25% of the 100% full drive or more than 250MB off the 1 TB drive that was full.

    Another weirdness that I have not figured out yet was that the 2TB drive in the IOMagic enclosure, that had been a boot drive in another Linux system, never showed any space free, kept showing 100% full even though I knew that was not true. Hopefully I do not have to free up 25% of that 2TB drive before it shows the amount of space FREE, like I did with the 1TB drive in my laptop that had filled up.

    Hopefully all this will help another understand their configuration and thank you for providing such a detailed diagram and explanation in your blog post…wow, loved it.
