October 21, 2010

Windows Server 911

What could be worse than never performing backups? A server crashing in the middle of the backup process. That is exactly what happened to me a few days ago while I was backing up SQL data off our ERP server. Then the workday started and users started calling in to complain.

After the server rebooted, the Intel ICH RAID started rebuilding itself but got stuck at 33% while the ECC error count kept climbing, which indicated a bad hard drive. The Intel console didn't say which drive was going bad, though. Fortunately, the BIOS settings page did say that drive 0 was bad, so I replaced it with an identical one.


Unfortunately, after the drive was replaced, the RAID rebuild restarted, but the Intel Matrix Storage Console also complained that there was a missing drive. I kept wondering if I had done something wrong, but luckily, after about 30 hours of rebuilding, the process completed, the missing drive error disappeared, and everything was fine. (I didn't capture an image of the missing drive error, since I thought it would still be there after the rebuild was complete.)


So everything turned out to be fine. But during the 30 hours that the RAID was rebuilding, I took another full backup of the SQL data and restored it onto my server's identical twin, which runs the newer OS. So now I have two identical servers with different OSes and different versions of SQL Server.

The older server is Windows Server 2008 with SQL Server 2008, and the new, live server is Windows Server 2008 R2 with SQL Server 2008 R2. After some testing, I decided to just retire the older server and replace it with the new one, since I'd been wanting to upgrade to R2 for a while. So the drive crash and the stressful server rescue (and getting abused by the users) turned out to be a good thing after all.

Before:




After:



Oh, and what could be more stressful than bringing a new server with restored databases online? A database going missing. But this was the result of bad information our ERP consultants gave me (i.e., that I should always restore database backups using the ERP client and not SQL Server), which further proves their uselessness.
