
Thread: RAID Failures

  1. #1
    Join Date
    Oct 2008
    Location
    Kamloops, BC, Canada
    Posts
    5,787
    Blog Entries
    44

    RAID Failures

    This always sucks when you are running a server. I tried to open some PDFs last night and discovered that the entire directory tree on my server's shared volume was missing everything but the folders. It looks like the RAID controller (IBM ServeRAID 4LX) dropped its dedicated SCSI chip late Monday night. The snap-in failed to alert me to the cascading failure (flagged failing drives, drives disappearing, and then the entire RAID failing) and was not able to light the fault lamps, and Windows Server didn't drop the volume. At 3 the next morning Remote Storage scanned the "zombie" volume and hosed the database because it could only guess that everything had been deleted, and it kept doing so for two more days until I discovered the problem and shut the server down. The last backup was technically done on Saturday morning, so in reality I've only lost a day and a half of actual data. I'm still trying to figure out how to get the RAID5 back, though, because I can only think of five files I've lost, restoring from tape sucks, and I don't want to baby this thing over my Thanksgiving long weekend.

    Right now the RAID controller still technically works. It signs on at POST and lets you enter the config utility, and everything seems happy, but it cannot see the SCSI chip and thus anything on the bus. I did have a spare card for this exact reason. When I swapped it in, it spun the drives up, saw an existing RAID5 config and tried to sync it back to the controller, but failed because the card BIOS, the controller firmware and the Windows driver were at different versions. Apparently I never verified that the spare was flashed to the same revision level...
    Reflashing with IBM's firmware diskette revealed that I overshot and went to xx.xx.14 instead of xx.xx.11, because apparently when I updated everything way the heck back in 2015 I somehow missed the very last version. IBM is selective about what the public can download, so I seemingly cannot flash backwards.
    So anyways I did the next logical step and moved the EPROMs over from the bad controller to the good one. That bricked the spare controller (the card BIOS signs on, waits for the controller firmware to load and then fails), and swapping them back still left me with a crippled original controller.

    So the drives are good, the cabling is good, and the integrity of the RAID should be good, but the controller the RAID was built on is a dud. Is it even possible at this point to rebuild the logical volume, without me moping about how I can't afford a proper data recovery service? It isn't encrypted or using weird striping. It's just a regular RAID5 on three matched 300 GB SCSI disks.
    Last edited by NeXT; October 11th, 2019 at 12:37 PM.
    = Excellent space heater

  2. #2
    Join Date
    Feb 2011
    Location
    NorthWest England (East Pondia)
    Posts
    2,246
    Blog Entries
    10

    Try some of the used ones on eBay? They seem cheap enough, assuming it's an 06P5741...
    Dave
    G4UGM

    Looking for Analog Computers, Drum Plotters, and Graphics Terminals

  3. #3

    Oh do I feel your pain. I've been through just this kind of disaster several times over the years. I've never succeeded in rebuilding an array on a controller other than the one it was built on. I'm not convinced that it can't be done, but I've never succeeded.

    I don't do the data recovery service thing either; I've resorted to mirroring on USB hard drives, which for the sizes I need are pretty affordable. I'm still using Hi-8, too. But that's kind of scary.

  4. #4
    Join Date
    Dec 2008
    Location
    libtard capital, California
    Posts
    1,065

    repeat after me: RAID is not a substitute for backup...

  5. #5
    Join Date
    Oct 2008
    Location
    Kamloops, BC, Canada
    Posts
    5,787
    Blog Entries
    44

    Originally posted by dorkbert:
        repeat after me: RAID is not a substitute for backup...
    Which is why the entire system is backed up. I only ran a three-drive RAID5 because I don't fully trust SCSI drives this old with long-term data unless I can recover from a single drive failure.
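
To make the single-drive-failure point concrete: RAID5 parity is a byte-wise XOR across the data blocks in a stripe, so any one missing block can be recomputed from the others. A minimal Python sketch of that idea follows. The block contents and the `xor_blocks` helper are made up for illustration, and a real array works on whole stripe units rather than four-byte strings.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, column by column."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# One stripe on a three-drive RAID5: two data blocks plus their parity.
# (Illustrative contents only.)
data_a = b"PDFS"
data_b = b"TAPE"
parity = xor_blocks(data_a, data_b)

# Lose any one of the three blocks and the other two rebuild it.
rebuilt_a = xor_blocks(data_b, parity)
rebuilt_b = xor_blocks(data_a, parity)
```

This is also why a second failure in the same stripe is fatal: with two blocks missing, the XOR has nothing left to solve against.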

    I can still buy another controller, but this version mismatch thing is confusing the hell out of me. Was it the original RAID config that was causing it, or was the replacement card simply not backwards compatible at all?
    = Excellent space heater

  6. #6

    I cannot imagine that the newer firmware isn't compatible with the data on the drives. I have seen cases where the older driver didn't like the newer firmware, so when I did servers regularly we always upgraded the driver first, and then the firmware, just in case. Nothing sucks more than a Windows STOP 0x7B.

  7. #7
    Join Date
    Feb 2009
    Location
    Chattanooga, TN - USA
    Posts
    857
    Blog Entries
    1

    I've had great luck recovering RAID 5 sets with this software:


    https://www.runtime.org/raid.htm
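
For context, software reassembly of a RAID5 set boils down to reading the member disks (or images of them) and interleaving the data stripes in the right order while skipping the rotating parity block. The sketch below is a toy illustration of that idea only, not how the linked tool or the ServeRAID firmware actually works. The stripe size, the disk order, the left-symmetric parity rotation and the absence of any controller metadata on the disks are all assumptions; on a real array each of those would have to be discovered first, and you would work from read-only images rather than the live drives.

```python
def parity_disk(row: int, n_disks: int) -> int:
    """Disk index holding parity for a stripe row (ASSUMED left-symmetric rotation)."""
    return (n_disks - 1) - (row % n_disks)


def destripe(disks, stripe_size):
    """Rebuild the logical volume from equal-length member images.

    `disks` is a list of bytes objects in the (assumed) original member order.
    """
    n_disks = len(disks)
    rows = len(disks[0]) // stripe_size
    out = bytearray()
    for row in range(rows):
        p = parity_disk(row, n_disks)
        start = row * stripe_size
        # Data blocks begin on the disk after parity and wrap around.
        for k in range(n_disks - 1):
            member = (p + 1 + k) % n_disks
            out += disks[member][start:start + stripe_size]
    return bytes(out)


def stripe(data, n_disks, stripe_size):
    """Inverse of destripe; used here only to build a toy array to check against."""
    row_bytes = stripe_size * (n_disks - 1)
    if len(data) % row_bytes:
        raise ValueError("data must fill whole stripe rows")
    disks = [bytearray() for _ in range(n_disks)]
    for row in range(len(data) // row_bytes):
        p = parity_disk(row, n_disks)
        parity = bytes(stripe_size)
        for k in range(n_disks - 1):
            begin = row * row_bytes + k * stripe_size
            chunk = data[begin:begin + stripe_size]
            disks[(p + 1 + k) % n_disks] += chunk
            parity = bytes(a ^ b for a, b in zip(parity, chunk))
        disks[p] += parity
    return [bytes(d) for d in disks]


# Toy three-disk array with a 4-byte stripe unit (both values are illustrative).
payload = bytes(range(48))
toy_disks = stripe(payload, n_disks=3, stripe_size=4)
recovered = destripe(toy_disks, stripe_size=4)
```

If output from a real set only makes sense for the first stripe unit or so, the disk order, stripe size and rotation would be the first assumptions to revisit.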


    I had my own "oh crap" moment this week. I logged into my XPEnology box, which runs a 12-drive SHR2 array (mix-and-match drive sizes, two-drive redundancy), and found 2 failed drives. I was never notified because of the email changes Gmail made recently.
    Luckily I was able to replace them and rebuild.


    Later,
    dabone

  8. #8
    Join Date
    Oct 2008
    Location
    Kamloops, BC, Canada
    Posts
    5,787
    Blog Entries
    44

    Actually, here is a good question to ask in the downtime for anyone who is knowledgeable in Windows Server 2000 and 2003:

    With Remote Storage, I know it relies on the monitored volume to retrieve files that RSM has put away. While keeping a backup or shadow copy of the monitored volume makes total sense, how do you handle situations where, say, an RSM media copy is imported into the server but there is no monitored volume to reference? Is RSM able to recreate the contents of a monitored volume using the media copy? I'm seeing a giant one-way flaw here: if the monitored volume does not exist for any reason, or is older than the last update of the media copy (say, because it came from an older backup), you end up with files that you KNOW are on tape but have no way to retrieve. I'm reading some of the older Server 2000 administration books and I'm not seeing anything that answers this.
    = Excellent space heater
