damn nice-thing to hell

I don’t know if it’s something that we are doing, or if it’s just Dell’s POS server, but my gosh… Nice-thing is broken again, and this time it isn’t a software issue, it’s hardware.

Last week, we determined that the cause of our LVM metadata corruption was that the second local hard disk in nice-thing decided to die. Now, because our LVM setup makes both hard drives act as one volume (like a RAID set with no redundancy), when one goes… so does everything else.
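For anyone wondering why one dead disk takes the whole volume with it, here is a toy sketch in Python. It is not real LVM and not how nice-thing is actually configured; the `SpannedVolume` class, the block counts, and the simple end-to-end layout are all made up for illustration. It just models two disks glued together into one logical address space.

```python
from bisect import bisect_right


class SpannedVolume:
    """Toy model (hypothetical, not LVM itself): disks concatenated end to end."""

    def __init__(self, disk_sizes):
        # starts[i] is the first logical block that lives on disk i.
        self.starts = []
        total = 0
        for size in disk_sizes:
            self.starts.append(total)
            total += size
        self.size = total
        self.failed = set()

    def disk_for(self, block):
        """Return the index of the disk that holds a logical block."""
        if not 0 <= block < self.size:
            raise IndexError(block)
        return bisect_right(self.starts, block) - 1

    def fail(self, disk):
        """Mark a disk as dead."""
        self.failed.add(disk)

    def readable(self, block):
        """A block is readable only if its backing disk is still alive."""
        return self.disk_for(block) not in self.failed

    def readable_fraction(self):
        ok = sum(1 for b in range(self.size) if self.readable(b))
        return ok / self.size


# Two made-up 100-block disks; the second one dies.
vol = SpannedVolume([100, 100])
vol.fail(1)
print(vol.readable(50), vol.readable(150), vol.readable_fraction())
```

With no redundancy, everything that landed on the dead disk is simply gone, and a filesystem with half of its blocks missing is generally not usable, which is roughly the situation we found ourselves in.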

Well, Jeremy called up Dell and they shipped us a new hard disk, but when we installed it, it wasn’t detected. After a few hours of troubleshooting, we determined that the problem was not the original hard disk, but the second hard disk slot on the SCSI daughter card.

Jeremy rang up Dell again and, no questions asked, they sent a repair technician over from Syracuse the next day. Now, I wasn’t in the lab at the time he arrived, but I guess he replaced the motherboard and the SCSI cable to the new hard disk and hightailed it out of there. About an hour after he left, we noticed that the server’s fans would not rev down. They were on full blast constantly. Also, the server has a light on the front of it that goes from blue (all systems are go) to amber (something is critically wrong). This light was now amber.

After some extensive diagnostic testing, the software told us that the CPU’s temperature had “exceeded the upper non-critical bound.” We called Dell up again, and they gave us this sketchy piece of software from some weird FTP site and told us to run it. Supposedly, it runs tests on the server, zips up the results, and FTPs them back to the mothership that is Dell HQ.

As of now, we are waiting to receive a response from Dell about these tests. I hope we never get another Dell server…
