Showing posts with label fail. Show all posts

Friday, January 4, 2013

Replacing a bad drive in a RAID 1 array

About a month ago, I started getting daily mail from mdadm (the tool that creates and monitors Linux software RAID arrays) reporting a degraded-array problem:

Subject: DegradedArray event on /dev/md0:myserver

This is an automatically generated mail message from mdadm
running on myserver

A DegradedArray event had been detected on md device /dev/md0.


Uh-oh. This is one reason why I don't use RAID for booting the system. I use a totally separate hard drive for the OS. My RAID is for data.
Let's log in as root and take a look:

# mdadm --detail /dev/md0

/dev/md0:
        Version : 1.2
  Creation Time : Thu Mar 17 10:42:16 2011
     Raid Level : raid1
     Array Size : 1171873823 (1117.59 GiB 1200.00 GB)
  Used Dev Size : 1171873823 (1117.59 GiB 1200.00 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Fri Jan  4 07:25:43 2013
          State : active, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : myserver:0
           UUID : b8d85003:03eb3cd0:fbb98516:361fb411
         Events : 6596

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sdb1
       1       0        0        1      faulty removed

So one of my RAID drives has apparently desynced, gone bad, or otherwise fallen off the array beyond the ability of mdadm to fix. sdb1 is the remaining good drive.

# blkid | grep linux_raid_member
/dev/sda1: UUID="b8d85003-03eb-3cd0-fbb9-8516361fb411" LABEL="myserver:0" TYPE="linux_raid_member" 
/dev/sdb1: UUID="b8d85003-03eb-3cd0-fbb9-8516361fb411" LABEL="myserver:0" TYPE="linux_raid_member" 

My two RAID drives are /dev/sda1 and /dev/sdb1. sdb1 is good. Therefore the faulty drive is sda1. Let's get the serial number of the bad drive, since that is printed on the outside of the drive.

# hdparm -I /dev/sda1 | grep "Serial Number"
 Serial Number:      WD-WMAZA3691591
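
If there are several drives in the case, it can help to print every serial number in one go. Here is a rough sketch of how that might look. The function names serial_from_hdparm and list_disk_serials are my own invention for illustration, and the loop assumes the drives show up as /dev/sda through /dev/sdz.

```shell
# Extract the serial number from `hdparm -I` output read on stdin.
serial_from_hdparm() {
    awk -F: '/Serial Number/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim padding around the value
        print $2
        exit
    }'
}

# Hypothetical helper: print each whole disk with its serial number.
# Needs root and real hardware, so it is only defined here, not called.
list_disk_serials() {
    for disk in /dev/sd[a-z]; do
        [ -b "$disk" ] || continue        # skip names that are not block devices
        printf '%s\t%s\n' "$disk" "$(hdparm -I "$disk" 2>/dev/null | serial_from_hdparm)"
    done
}
```

Run list_disk_serials as root and compare the output against the labels on the drives before pulling anything.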

Alternatively, I could look for the serial number of the good drive(s).
Let's try re-adding the bad drive. Perhaps it was a one-time anomaly.

# mdadm --add /dev/md0 /dev/sda1

Watch the progress of the sync using cat /proc/mdstat. For a 1200GB partition, my system takes about three hours for the mirroring to complete.
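
Rather than re-running cat by hand, the progress figure can be pulled out of /proc/mdstat with a small filter. This is only a sketch: sync_progress is a name I made up, and it assumes the usual "recovery = NN.N%" or "resync = NN.N%" wording that the kernel prints while a mirror is rebuilding.

```shell
# Read /proc/mdstat-style text on stdin and print the rebuild progress,
# e.g. "recovery = 12.6%". Prints a short notice if nothing is syncing.
sync_progress() {
    grep -E -o '(recovery|resync) *= *[0-9.]+%' || echo "no sync in progress"
}
```

Typical use would be sync_progress < /proc/mdstat, or simply watch -n 60 cat /proc/mdstat to refresh the whole file once a minute.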

Often, re-adding the drive to the array is all you need to do, and the nonfaulty drive will happily rejoin its friends in RAID.



However, in this case, an hour later, the mirroring failed. Another degraded-array email showed up in my inbox. And a basic information command like hdparm -I /dev/sda1 failed with an input/output error. The drive is no longer trustworthy and must be replaced.

I ordered two slightly larger drives, to grow the array to three disks. Replacement drives must be the same size or larger. I happen to know I have space in the server case for an additional drive, and that I have additional power connections and motherboard data connections available (and cables) for the new drive.

With the new drives in hand, it's time to power off the server (er, don't forget to tell network users that you're doing this!). Since we're adding and removing drives, there is a good chance that we will need access to BIOS...so that means moving the server to the desk and hooking up a keyboard and monitor.

First, remove the bad drive. Good thing we know the serial number!
Second, add the new drives. Test them by booting into BIOS and ensuring they are detected. While in BIOS, make sure the boot drive is correct. Mine had changed and I had to fix it.

Finally, boot the machine and gain root.

# ls /dev | grep sd
sda
sda1
sdb
sdb1
sdb2
sdb5
sdc
sdd

It looks like sda1 is our RAID partition, sdb* is our non-RAID system disk, and sdc and sdd are the new unpartitioned drives. We can verify that using parted:

# parted -l
Model: ATA WDC WD20EARS-00M (scsi)
Disk /dev/sda: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name       Flags
 1      17.4kB  1200GB  1200GB  ext3         Server

Model: ATA WDC WD3200AVJB-6 (scsi)
Disk /dev/sdb: 320GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End    Size    Type      File system     Flags
 1      1049kB  319GB  319GB   primary   ext3            boot
 2      319GB   320GB  1546MB  extended
 5      319GB   320GB  1546MB  logical   linux-swap(v1)

In order to add the new drives to the RAID array, we must first create a partition table (I chose GPT because these are large drives). Next, we partition them to match the existing drive. Then we can add them to the array.

# parted /dev/sdc mklabel gpt
# parted -a optimal /dev/sdc mkpart 1 ext3 17.4kb 1200GB
# mdadm --add /dev/md0 /dev/sdc1

# parted /dev/sdd mklabel gpt
# parted -a optimal /dev/sdd mkpart 1 ext3 17.4kb 1200GB
# mdadm --add /dev/md0 /dev/sdd1

See how each drive gets the same three commands? See how we took the partition start and end directly from the existing partition above?

And now we are back in familiar territory - we have added drives to an array before. We can monitor the process using cat /proc/mdstat, and expect it to take a few hours.
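
One thing worth checking once the sync finishes: if the array was created with two RAID devices, mdadm fills the one empty slot and holds the other new drive as a spare. Below is a hedged sketch for reading those counts out of mdadm --detail. The function name array_counts is hypothetical, and I am assuming the "Raid Devices" and "Spare Devices" labels shown in the output earlier.

```shell
# Read `mdadm --detail` output on stdin; print "raid=<n> spare=<n>".
array_counts() {
    awk -F: '
        /Raid Devices/  { raid  = $2 + 0 }   # number of mirror slots
        /Spare Devices/ { spare = $2 + 0 }   # drives waiting in reserve
        END { printf "raid=%d spare=%d\n", raid, spare }
    '
}

# Usage (as root):
#   mdadm --detail /dev/md0 | array_counts
# If this reports raid=2 spare=1 and a three-way mirror is wanted,
# the spare can be promoted to an active mirror with:
#   mdadm --grow /dev/md0 --raid-devices=3
```

Leaving the third drive as a hot spare is also a reasonable choice, since mdadm will pull it in automatically if a mirror fails.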

And the array is humming along smoothly again.

Saturday, June 27, 2009

Reinstalling Xubuntu 9.04

About a month ago, audio suddenly stopped working. Rather than troubleshoot, I decided to reinstall...it might be faster. Unlike last time, this time was a complete reinstall to get rid of certain dependency problems that had also cropped up.

  • Wireless networking worked immediately.
  • Oops. I didn't back up any of my *hidden* files - suddenly all of my e-mail and web caches are...gone. Lesson - keep older backups. Lesson - copy hidden files, too.
  • I tried using Jablicator to save my package list, but hit a failure - it couldn't resolve libdvdcss2 because it didn't come from the Ubuntu repositories.
  • Audio works in the Listen player - except wma files. Totem plays the wma, so it's not a codec or dependency issue. Streaming .pls works.
  • Video works. Added Flash, and YouTube and Hulu work. DVD read works after just a bit of tweaking.

Saturday, June 7, 2008

Custom Browser URL bar icons

Long ago, I made a favicon.ico custom icon for the Milwaukee Without A Car website, but I don't remember how.... I think I created a .bmp image and simply renamed it...

So I found a new SVG of the same graphic, but I'm not using it - instead I'll stick with the old favicon until Inkscape can export to .ico files.

Another example showing how SVGs and their tools have great promise, but aren't quite there yet.

Friday, June 6, 2008

More SVG Shortcomings

  • SVGs, since they are embedded objects, open links inside themselves instead of in the parent page. This is the same annoyance I had with frames long ago. For example, a link inside an <object> will work, but only the tiny object box will show the new page. Workaround: Use the target="_parent" or target="_top" attribute on the link tag.
  • SVGs within a link (like a linked image) are not, in fact, linked. Move the cursor over them, and you see that you can't click on the SVG. This particularly sucks with a graphic menu header I tried. Workaround: Use an imagemap (ugh).
  • SVGs can't be called by CSS on a mouseover event (graphic header again). No workaround.

So I'm back to .png and .gif and .jpg for most HTML images. As before, SVG original for easy changes, but export raster images. Well, that's what experiments are for.

Hey, this is my first 'fail' entry in over a month!

Friday, April 4, 2008

DVD copying

Needed to back up a DVD. On Linux, Brasero failed to finish reading the disc, and never said why. The MacBook Disk Utility errored out with -39. Back to Linux dvdcopy - error reading Title VOB at block 159. No luck - giving up.

Thursday, March 20, 2008

NTFS external drive

For fun, I checked my external drive while booted in Windows. Oh no! It needed to be defragmented! Defrag recommended CHKDSK as well.

Big mistake: One or both of them scattered a lot of my backup files (including the backups of this blog - quite a few entries gone until they turn up again). I'm afraid to look at my music...or my collection of Futurama episodes. Only fix is to hunt through the 'found' folder and put stuff away manually. Big mess.