Two important events happened to me last week. The first is that, due to recent acquisitions, I have been doing a large number of PST imports. The second is that we set up a new VMware cluster. We had been running low on system resources and the PST imports had been going slowly, so my VMware admin advised migrating the mailbox server to the new cluster. After moving the VM to the new cluster we increased the RAM and processor cores. Shortly thereafter I started the PST imports.
I started getting calls the next business day that users were seeing database disconnection messages in OWA. Sure enough, on the server the 32 GB of RAM and the 4 CPUs were completely maxed out.
Using Process Explorer from Sysinternals we could clearly see a private process memory leak in the Exchange database engine. This is what happens when a process tries to consume all available system memory, and we could watch it happening with store.exe: whenever even the most minor process stopped, store.exe would immediately grow by the few additional KB that had been freed.
At first blush this looked like mailbox database corruption, which would have been a manageable problem; however, the system was completely unusable. We tried booting the server into safe mode, and even though that brought the CPU and memory usage down, you could not RDP to the server, and interacting with it at the console was painstaking, requiring several seconds between mouse clicks. This made us think the problem might be with VMware, but more importantly we needed the email server back online.
One other complication was that even though we were getting successful backup messages from our backup system, the logs were not being truncated. There were 175,000 log files! Yikes.
Our first thought was to copy the database to a new server, perform an Exchange recovery, and then repair the database. To determine which files we needed to copy off, we used ESEUTIL:
c:\windows\system32> eseutil /mh "D:\Exchange Server\V14\Mailbox\[DatabaseFolder]\[DatabaseFile].edb"
This command provides some really useful output, most importantly the list of log files needed to recover the database:
Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Copyright (C) Microsoft Corporation. All Rights Reserved.
Initiating FILE DUMP mode...
Database: D:\Exchange Server\V14\Mailbox\MSX06MBXDB01\MSX06MBXDB01.edb
Expected Checksum: 0x30bc20b7
Actual Checksum: 0x30bc20b7
File Type: Database
Format ulMagic: 0x89abcdef
Engine ulMagic: 0x89abcdef
Format ulVersion: 0x620,17
Engine ulVersion: 0x620,17
Created ulVersion: 0x620,17
DB Signature: Create time:08/15/2012 14:39:14 Rand:14004066 Computer:
dbtime: 799434390 (0x2fa66696)
State: Dirty Shutdown
Log Required: 587149-587170 (0x8f58d-0x8f5a2)
Log Committed: 0-587170 (0x0-0x8f5a2)
Log Recovering: 0 (0x0)
GenMax Creation: 09/19/2012 09:08:06
Last Objid: 730388
Scrub Dbtime: 0 (0x0)
Scrub Date: 00/00/1900 00:00:00
Repair Count: 0
Repair Date: 00/00/1900 00:00:00
Old Repair Count: 0
Last Consistent: (0x8EE72,A,54) 09/18/2012 16:08:25
Last Attach: (0x8EE73,9,86) 09/18/2012 16:08:25
Last Detach: (0x0,0,0) 00/00/1900 00:00:00
Log Signature: Create time:08/15/2012 14:39:13 Rand:14018268 Computer:
OS Version: (6.1.7601 SP 1 NLS ffffffff.ffffffff)
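The two fields that matter here are State (Dirty Shutdown means logs must be replayed) and Log Required (the generation range of logs you cannot lose). As an illustration of how those generations map to file names, here is a small Python sketch; the E01 prefix and the standard Exchange naming convention (prefix plus 8 hex digits) are taken from the output above.

```python
import re

# Abridged copy of the eseutil /mh header fields shown above.
DUMP = """\
State: Dirty Shutdown
Log Required: 587149-587170 (0x8f58d-0x8f5a2)
Log Committed: 0-587170 (0x0-0x8f5a2)
"""

def required_logs(dump, prefix="E01"):
    """Return the log file names implied by the Log Required field.

    Exchange 2010 names its transaction logs <prefix> + 8 hex digits
    + '.log', e.g. generation 0x8F58D becomes E010008F58D.log.
    """
    m = re.search(r"Log Required:\s*(\d+)-(\d+)", dump)
    if not m:
        return []  # a clean-shutdown database requires no logs
    first, last = int(m.group(1)), int(m.group(2))
    return ["%s%08X.log" % (prefix, gen) for gen in range(first, last + 1)]

logs = required_logs(DUMP)
print(len(logs), logs[0], logs[-1])
# → 22 E010008F58D.log E010008F5A2.log
```

So the hard minimum for recovery was only 22 generations; anything beyond the Log Required range is needed only if you want to replay more recent activity.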
So based on the Log Required field I was able to shorten the list of log files I needed to 1,300 files. Sadly I still couldn't copy them off because of the performance issues, so we thought to detach the storage volumes from the existing VM and attach them to the new server. Again we had a problem: in the VMware console, even with the VM turned off, the Remove button was greyed out when the VM's drives were highlighted. At this point we had something to take to VMware support and say that there was a problem with VMware.
To quote the VMware support engineer, "That's really weird." The solution was to remove the VM from inventory, then remove the host from inventory, and rejoin both.
Now we had the storage volumes attached to the new server, which put us on the road to getting operational again. The steps we followed were:
- Install Windows Server with the same server name and the same OS configuration. (Disconnect the "source" machine from the network.)
- Install all the prerequisites for Exchange 2010 SP2: http://technet.microsoft.com/en-us/library/bb691354.aspx
- Install Exchange Server from the Exchange 2010 SP2 binaries with the /m:RecoverServer switch: http://technet.microsoft.com/en-us/library/dd876880.aspx
- Copy the old database to a temporary location, and likewise to the drive that will contain the database going forward. (In our case we attached the existing drives instead.)
- Verify the log files, just to be sure we had them all, before running the soft recovery:
C:\Windows\system32>eseutil /ml F:\Recovery\E01
- Run soft recovery to bring the database into a clean shutdown state:
C:\Windows\system32>eseutil /r E01 /D "[DriveLetter]:\[PathToMailboxDBFolder]\[DatabaseFile].edb" /L "[DriveLetter]:\[LogFolder]"
- *** It's also important to note that when recovering an Exchange database you need every log in the required range, including the current log file (commonly named E01.log), and there should be no leftover .jrs files in the directory.
- Mount the database and monitor performance with the ExMon tool: http://technet.microsoft.com/en-us/library/bb508855(v=exchg.65).aspx
- As we expected, some mailboxes were trying to use 100% of the processor, so we ran an online mailbox repair on the database using PowerShell:
[PS] C:\Windows\system32>New-MailboxRepairRequest -Database MSX06MBXDB01 -CorruptionType SearchFolder,AggregateCounts,ProvisionedFolder,FolderView
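The eseutil /ml check in the steps above is the authoritative way to validate the log stream, but conceptually it is verifying two things we cared about here: that the log generations form a contiguous sequence with no gaps, and that the current E01.log exists. A minimal Python sketch of that sanity check (a hypothetical helper for illustration, not an Exchange tool; the file-naming assumptions are the standard prefix-plus-8-hex-digits convention):

```python
import os
import re

def check_log_sequence(log_dir, prefix="E01"):
    """Sanity-check a log folder before attempting eseutil /r.

    Returns a list of problems found: any gap in the hex generation
    sequence of <prefix>XXXXXXXX.log files, or a missing current log
    (<prefix>.log). An empty list means the folder looks replayable.
    """
    problems = []
    pattern = re.compile(re.escape(prefix) + r"([0-9A-Fa-f]{8})\.log$")
    gens = sorted(
        int(m.group(1), 16)
        for name in os.listdir(log_dir)
        if (m := pattern.match(name))
    )
    # Soft recovery replays generations in order; any gap is fatal.
    for prev, cur in zip(gens, gens[1:]):
        if cur != prev + 1:
            problems.append("gap after generation 0x%X" % prev)
    if not os.path.isfile(os.path.join(log_dir, prefix + ".log")):
        problems.append("current log %s.log is missing" % prefix)
    return problems
```

Running it against the recovery folder before the restore would have flagged exactly the two failure modes that force you back to a backup: a missing generation in the middle of the range, or a missing current log.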
After running mailbox repair (which took several hours) the server was back to normal and everybody had access to their mailboxes again.
So there you have 3 days of work in a bite-sized package.