Linux RAID disk wipeout

A common problem with Linux software RAID (aka md) happens when you try to use a disk that was previously part of another disk array. Symptoms include a wrong volume size, failure to add the device to a RAID array, and volume UUID mismatches. To fix the problem, run the mdadm utility on the disk to clean up its stale metadata:

# mdadm --zero-superblock devicepath
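
For example, if the recycled disk is /dev/sdb and the old array used its first partition (hypothetical names, substitute your own), the cleanup would be:

# mdadm --zero-superblock /dev/sdb1   # wipe the stale RAID superblock from the partition

If the whole disk, rather than a partition, was the array member, run the same command on /dev/sdb instead.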

If you need to apply this fix on a system that doesn't boot up (for instance, when your root volume is on RAID), remember that mdadm and other disk administration utilities are available on the Gentoo minimal installation disk.
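
From the minimal CD's shell, a recovery session might look roughly like this (a sketch only; the module and device names depend on your setup):

# modprobe raid1                      # load the RAID personality your array uses
# mdadm --assemble --scan             # assemble whatever arrays the superblocks describe
# mdadm --zero-superblock /dev/sdb1   # then clean the offending disk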

UPDATE: Rav asked for the gory details, so here they are: when you initially create a Linux RAID array, mkraid (or mdadm --create) writes a signature called the superblock to each member disk; it contains a unique UUID for the array and a description of the array's geometry (size, RAID level, and so on). When the Linux kernel boots, the md kernel module reads this superblock and assigns a minor device number to the array. Even if you erase your partition table or MBR, this superblock won't be erased.
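
You can see this in practice by comparing the UUID stored in a disk's superblock with the UUID of the running array (device names here are hypothetical); if they differ, the disk belongs, or belonged, to another array:

# mdadm -E /dev/sdb1 | grep -i uuid        # UUID recorded in the disk's superblock
# mdadm --detail /dev/md0 | grep -i uuid   # UUID of the array it should join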
The problem arises when you add a disk with an existing superblock to a computer that already has another array in place (for instance, when replacing a faulty RAID1 or RAID5 disk): if the md driver recognises an existing superblock, it won't allow the added drive to join the array and will report a generic "Invalid argument" error. Worse, if a minor number recorded in a superblock conflicts with another array's, booting a system where parts of two arrays try to grab the same minor can leave neither of them assembled, so no md devices are available at all.
So, instead of zeroing the whole disk with dd if=/dev/zero of=/dev/path, which takes a long time and is largely pointless (if you're rebuilding RAID1 or RAID5, the disk contents will be overwritten by the RAID reconstruction anyway), you can use the command explained at the beginning to erase just the stale superblock and fix the problem, as the sketch below shows.
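
Putting it together, a typical disk replacement with a recycled drive might look like this (again, /dev/md0 and /dev/sdb1 are hypothetical):

# mdadm /dev/md0 --add /dev/sdb1      # fails with "Invalid argument" while the stale superblock is there
# mdadm --zero-superblock /dev/sdb1   # erase the old superblock
# mdadm /dev/md0 --add /dev/sdb1      # now the disk joins the array and reconstruction starts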

One final note: another problem with replacing disks in RAID1 and RAID5 arrays occurs when people try to use a disk which is slightly smaller than the others in the array (even if the advertised capacity is the same as the old drives', there can be slight differences in the actual number of blocks). In this case, the error reported by md upon loading is the same as above: "Invalid argument". So if your disk has never been used in an array, the size mismatch is probably the first thing to check; otherwise, run the following command on the disk device to check for existing superblocks:

# mdadm -E devicepath
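
To rule out the size problem, you can compare the exact byte counts of the old and new members with blockdev (device names are hypothetical); the replacement must be at least as large as the surviving members:

# blockdev --getsize64 /dev/sda1   # size in bytes of a surviving member
# blockdev --getsize64 /dev/sdb1   # size in bytes of the replacement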

2 Comments

  1. Rav
    Posted February 3, 2010 at 5:44 AM | Permalink

    You should explain what the technical reasoning for the error is. You should also explain exactly what the “fix” does.

    Thanks.

  2. Posted February 3, 2010 at 8:22 PM | Permalink

    Hi Rav, I just posted an update to the post, with more technical details.

