Anyway, the spare drive was partitioned to be used as a place to store system backups. But the system was so new that it hadn't been used much. So I just moved the data from it and then used the spare to replace the bad sdb drive. The system was configured with LVM on top of software raid for sda and sdb, and just LVM for sdc (the spare).
I used fdisk to verify the drive partitions. I did a lot of testing and checking to make sure that I wasn't going to blow away the only good data I had.
fdisk -l /dev/sda
fdisk -l /dev/sdb
Once I was absolutely sure that sda had the good data, sdb was bad, and sdc was the spare, I first removed the LVM settings from the spare drive. Note that the system was reporting sdc as sdb because it didn't recognize the bad drive anymore. Unfortunately, I didn't keep great notes, but I think I did this:
lvremove /dev/vg1/lvbackup
vgremove /dev/vg0
pvremove /dev/sdb1
With those settings out of LVM, I repartitioned the spare drive (now reporting as sdb) to match the exact partition of sda. This command copies the partition of sda and overwrites the partition of sdb with the sampartitioning.
sfdisk -d /dev/sda | sfdisk /dev/sdb
I tried to remove the old sdb drive from the raid array, but the --failed and --removed commands reported errors:
mdadm --manage /dev/md1 --fail /dev/sdb1
mdadm --manage /dev/md1 --remove /dev/sdb1
So I just added the new drive partitions to the raid array:
mdadm --manage /dev/md0 --add /dev/sdb1
mdadm --manage /dev/md1 --add /dev/sdb2
Checking /proc/mdstat showed the partitions syncing. I rebooted just to make sure it all came back up properly, which it did.
Then, about an hour later, the load on the server skyrocketed up to the 30-50 range. This server is a dual 4-core cpu (total of 8 cores) with 16GB ram and three 750GB Seagate Barracuda ES.2 sata drives. I'm not sure what caused it, but I was worried it might be the raid sync. I found out that my customer was running myisamchk on their database at the same time. So maybe it was just that. The load came back down to around 1.5 and the syncing is still running. I guess it is all good now...
I found this resource a helpful guide while solving this problem:
http://www.howtoforge.com/replacing_hard_disks_in_a_raid1_array
No comments:
Post a Comment