October 15, 2010

The accidental DBA

A lot of DBA seem to be accidents. A search for accidental DBA on Google turns out millions of matches, and now I'm one of them.

A few months ago, when my Microsoft Dynamics NAV ERP project was about to go live, one of the last things we got stuck at was a process called adjust cost item entries ("ACIE"), which basically takes all the transactions you've made, and figures out the average or FIFO cost of a particular item in your inventory.

So I was looking at the adjust cost process, and it was taking a long time. A loooong time. A looooooooong time. A looooooooooooooooooong time. After a few hours, I killed the process and decided to ask the consultant about it.

The mor^H^H^Hconsultant said something like... your server is probably way below our recommended specs, adjust cost only takes two hours on her notebook, you need to buy a new server, or install a modern OS like Windows Server 2003 and SQL Server 2005. I told her that our server already runs Windows Server 2008 SP1 and SQL Server 2008. Both are 64-bit. The server has a Core 2 Quad Q6600, 8 GB RAM, and loads of disk space in a RAID5 configuration. She then mumbled something about Windows Server 2008 is too new and probably incompatible with our hardware, but then she suddenly jumped and said that RAID5 is baaaaad for SQL Server.

Hmmf. I do know that RAID5 is not a very good choice to run SQL Server on, and our server really wasn't all that powerful in 2010 (it was bought in late 2007), but then the server was specifically bought to run the ERP software, our database wasn't that big, and we only have 20 users. Even if it's sub-optimal, it shouldn't take forever to do something that needed to be done weekly or even daily.

After trying a whole bunch of settings on both the hardware and on Windows, and being nagged to no end by the consultants that my RAID5 setup is no good for running SQL Server and that I need to buy the Enterprise Edition of SQL Server instead of the Standard Edition, I finally brought my second server with identical specs online, but I installed the database files on separate drives. It's not RAID10 like they asked, but it has separate drives for the OS, for SQL Server itself, for the database files, for the log, and tempdb. I also installed Windows Server 2008 R2 and SQL Server 2008 Enterprise.

So I started the adjust cost batch process on this server, and exactly 24 hours later, it completed! I had never been able to complete the adjust cost process on the live server, since the database server can't accept posting entries while ACIE is running. But running for 24 hours wasn't exactly acceptable. (I could even accept 12 hours (!) since our working hours is 12 hours.)

I couldn't understand why the process only takes two hours on her notebook, so I decided to steal a look at the notebook when she wasn't looking to see how things were configured there. But then I discovered that she was running the native client version of the Dynamics NAV software. GAH!

Dynamics NAV comes in two different versions. The native version uses a file based database, and the SQL Server client version which, of course, uses SQL Server. They're very similar, but code has to be optimized differently depending on which version is used. I realized then that the idi^H^H^Hconsultants had absolutely no clue whatsoever about SQL Server, and decided to take matters into my own hands. (Okay, I had an idea earlier that they were clueless when they installed the entire database into my boot drive when the data drive had 1 TB of empty space, but they argued that I set up the drives wrong.)

Oh, I did get the ACIE issue resolved. While I was trying out different hardware settings, one of their other consultants (PURE GENIUS, by God!) mumbled something about bugs in the ACIE routine, and sent me an object, which immediately resolved the issue, and my ACIE time dropped from 24 hours to minutes. Proving that all the problems had nothing to do with my hardware whatsoever.

Since I was taking matters into my own hands (becoming the accidental DBA), I had to start learning first. I first went to Amazon and bought The NAV/SQL Performance Field Guide, which is targated at Dynamics NAV and Professional SQL Server 2008 Internals and Troubleshooting, which is targated at SQL Server in general. Unfortunately, since I was such a newbie at SQL, and I like Wrox books in general, so I also got Beginning Microsoft SQL Server 2008 Administration. (Since then, I've also bought Manning's SQL Server 2008 Administration in Action, which is by far my favorite right now.)

After burying myself in the books for a few weeks, I started working on the SQL Server. First, I discovered that the consultants had allocated 40 GB to the NAV database's PRIMARY file, and it was only using 60 MB. The data filegroup, however, was set to auto-grow and it was at 12 GB with 99% full. So I resized it into a more logical 128 MB PRIMARY and 16 GB data. They told me that I had allocated too much space when they were the ones that did the idiotic allocation.

Second, I discovered that the NAV log file was 200 GB. We've never done a backup using SQL Server, since the consultant insisted that we should backup only using the Dynamics NAV client and not through SQL Server. I backed up the log which cleared it out, and resized it to a more logical 8 GB. Since we only do daily backups and don't care about data loss during the day, I also decided to change the recovery mode to simple.

After the above, I made maintenance plans to make daily backups using SQL Server's backup function. Backing up using the NAV client by then took nearly an hour, and restoring the database took nearly three hours. Backing up and restoring using SQL Server only took about 5 minutes.

Next, The Dynamics NAV/SQL Performance Field Guide provided an index defragmentation script, which took more than an hour to complete on the first run. I guess it was an indication how fragmented my database was. Nowadays it takes about 2-3 minutes to run.

The consultants also said that my server has way too little RAM (it has 8 GB) since task manager indicates that SQL is using all the memory. 8 GB is not that much for SQL Server, but we have a very small database! Besides, SQL Server is designed to cache data in memory and it's normal for SQL Server to use large chunks of memory. But I still made changes to SQL Server so it doesn't automatically take up all the memory.

I wrote above that the ACIE issue was resolved. However, the ERP software was still suffering performance problems, and the assw^H^H^H^Hconsultants were still insisting that my RAID5 (and lack of RAM) was causing problems. By this time I've given up explaining to them patiently that my RAID5 had nothing to do with the sluggishness, since the ACIE problem was fixed by a simple object. (Anyway, just by pure chance, or maybe it was the Will of the Gods, one of the drives in the RAID5 array died and I ended up in a RAID1 configuration, which I describe in another post.)

After doing all that, the ERP software is still uselessly slow most of the time. What do I do in times like these? Why, throw more hardware at the problem, of course.

(Image source: Imagine-Net Technology)

No comments: