I will warn the reader: this is going to be a fairly technical post. However, if you enjoy reading about me causing myself great pain and having to dig out of a hole, you might want to skim it. I am writing this mostly to help others who might get stuck in the same situation I was in, and also to document it for myself.
On Tuesday, November 10, 2009, I was doing some prep work for deploying Exchange 2010 at my work. As part of this I ran some reports and exports looking for mailboxes (and AD accounts) that I could delete rather than dragging them along as dead weight to the new server. I found roughly 120 users who had left and whose mailboxes we no longer really needed. Those mailboxes accounted for over 40 GB, which is 15-20% of all the mail data we had. I talked to my boss around noon to double-check which accounts could be safely removed.
At about 2:30 PM I was working in the Exchange Management Console to remove these mailboxes. I (thought) I had sorted the list by Organizational Unit. I selected what I believed to be only the mailboxes in the Possible Delete Emp OU and then chose to remove the mailboxes (and AD accounts). The number removed seemed higher than I expected, but I didn’t really think anything of it. Then I started getting prompts asking for my user name over and over. This was rapidly followed by several faculty members saying they couldn’t log in. It was at this point that I checked the OU for the faculty.
IT WAS EMPTY! YES, TOTALLY EMPTY! (This is bad, in case you are wondering.) Oh, and my account and my boss’s account were also gone. To really top things off, so were all the accounts for our admin council (the bosses).
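The root mistake was trusting a sorted view in the console. One way to guard against it, sketched here in Python under loose assumptions, is to select accounts by the parent container in their distinguished name rather than by eye, and to refuse to go any further if the count differs from what you expected (roughly 120 here). The Account records and DNs are hypothetical (you would fill them from an export like the reports mentioned above), and the simple parser assumes DNs with no spaces after the commas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    name: str
    dn: str  # distinguished name, e.g. "CN=Jane Doe,OU=Faculty,DC=domain,DC=com"


def parent_dn(dn: str) -> str:
    """Return the DN of the container holding this object.

    Splits at the first comma not escaped with a backslash, so a
    name such as "CN=Smith\\, Pat" stays in one piece.
    """
    escaped = False
    for i, ch in enumerate(dn):
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == ",":
            return dn[i + 1:]
    return ""


def select_for_removal(accounts, target_ou, expected_count):
    """Return only direct children of target_ou; abort if the count is unexpected."""
    chosen = [a for a in accounts if parent_dn(a.dn).lower() == target_ou.lower()]
    if len(chosen) != expected_count:
        raise ValueError(
            f"expected {expected_count} accounts in {target_ou}, found {len(chosen)}"
        )
    return chosen


# Hypothetical accounts: one in the cleanup OU, one faculty member.
accounts = [
    Account("Former Teacher", "CN=Former Teacher,OU=Possible Delete Emp,DC=domain,DC=com"),
    Account("Smith, Pat", "CN=Smith\\, Pat,OU=Faculty,DC=domain,DC=com"),
]
picked = select_for_removal(accounts, "OU=Possible Delete Emp,DC=domain,DC=com", 1)
print([a.name for a in picked])  # ['Former Teacher']
```

Matching on the exact parent container (instead of "DN ends with the OU") also keeps accounts in child OUs out of the selection.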
At this point I will stop and get really technical for a bit. I want to describe our infrastructure so that the next steps make more sense to those reading this for pointers.
Our Infrastructure:
- One Forest / One Domain (as simple as it gets)
- Windows Server 2008 R2 Active Directory servers running at a 2008 R2 Forest / Domain functional level – 2 virtual and 1 physical DC
- Exchange Server 2007 SP2 running on Windows Server 2003 SP2 – physical
Things that were in place that are good:
- Daily full backups of the Exchange databases
- Deleted mailbox retention of 30 days on all mailbox databases
Things that would have been great to have in place:
- Active Directory Recycle Bin – This feature is new in 2008 R2 but is not enabled by default. (Enabling it cannot be undone, which I suspect is part of the reason.)
- Active Directory backup – although doing an authoritative restore from one is not fun at all
Once I realized what I had done, I first briefed my boss so she could brief our head of school (especially since, remember, I had deleted that mailbox too). Then I started trying to figure out how to fix what I had done. Luckily, my department helped fend off the entire school while I tried to fix the issue.
I started by looking at the Disconnected Mailbox container in the Exchange Management Console. Many, but not all, of the deleted mailboxes were listed there. The fact that they weren’t all listed really confused me; however, knowing that I needed the Active Directory accounts back first, I decided to come back to this later.
I haven’t had to restore a deleted AD object since the Windows 2000 days, so I had to do some quick research to determine the best course of action. First I wanted to check whether I had turned on the AD Recycle Bin, since I wasn’t sure if I had or not. After looking into it, it appeared that it wasn’t active. I could, however, verify that the objects still existed in tombstoned form by running Get-ADObject -SearchBase "CN=Deleted Objects,DC=domain,DC=com" -LDAPFilter "(objectClass=user)" -IncludeDeletedObjects | Select-Object Name,ObjectClass,ObjectGUID | Export-Csv Deleted.csv. I found the ADRestore utility (http://technet.microsoft.com/en-us/sysinternals/bb963906.aspx) and the associated KB article. I wasn’t sure if that was the best route to go, so I decided to place a call to Microsoft and open a PSS case. I did this mostly because time was critical: I had to get this fixed by the next morning at the latest.
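The Deleted.csv export is easier to review after a little cleanup. Here is a minimal Python sketch, assuming the file came from the Get-ADObject ... | Select-Object ... | Export-Csv pipeline above. It relies on three behaviors: Export-Csv in PowerShell 2.0 writes a "#TYPE ..." line first unless -NoTypeInformation is given; AD renames a tombstoned object to its old name plus a line feed and "DEL:<objectGUID>"; and the (objectClass=user) filter also matches computer accounts. The sample rows and GUIDs are made up.

```python
import csv
import io

# AD renames a deleted object to "<original name><LF>DEL:<objectGUID>".
TOMBSTONE_MARKER = "\nDEL:"


def original_name(tombstone_name: str) -> str:
    """Strip the '<LF>DEL:<guid>' suffix from a tombstoned object's name."""
    return tombstone_name.split(TOMBSTONE_MARKER, 1)[0]


def read_deleted(stream, users_only=True):
    """Yield (original name, objectGUID) pairs from the Export-Csv output."""
    first = stream.readline()
    if not first.startswith("#TYPE"):
        stream.seek(0)  # no type header (e.g. -NoTypeInformation was used)
    for row in csv.DictReader(stream):
        # (objectClass=user) also matches computers; the exported ObjectClass
        # holds the most specific class, so computer rows can be dropped here.
        if users_only and row["ObjectClass"] != "user":
            continue
        yield original_name(row["Name"]), row["ObjectGUID"]


sample = io.StringIO(
    "#TYPE Selected.Microsoft.ActiveDirectory.Management.ADObject\r\n"
    '"Name","ObjectClass","ObjectGUID"\r\n'
    '"Jane Doe\nDEL:8a6c0d2e-0000-0000-0000-000000000001","user",'
    '"8a6c0d2e-0000-0000-0000-000000000001"\r\n'
    '"LAB-PC07\nDEL:8a6c0d2e-0000-0000-0000-000000000002","computer",'
    '"8a6c0d2e-0000-0000-0000-000000000002"\r\n'
)
deleted = list(read_deleted(sample))
print(deleted)  # [('Jane Doe', '8a6c0d2e-0000-0000-0000-000000000001')]
```

With a real file, open it with open("Deleted.csv", newline="") so the line feeds embedded in the quoted names survive parsing.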
After digging around for a while on http://support.microsoft.com to find a phone number (which was not easy), I made the first call at 3:40 PM. I went through the process of paying for an incident (online I could only buy a 5-pack). During case creation I set the severity to A, since I was in an outage situation and was willing to work 24 hours a day to get it fixed. At the end of the call I was told the Active Directory recovery team had a 2-hour callback policy. At this point the first waiting game began.
At about 5:40 PM I got the call from MS PSS. The support representative, Aman, said he had tried to contact me earlier, but I am not sure how, since I didn’t have a missed call on my cell phone or office phone, or an email at my home address. (I found out later that he had somehow called my home phone number; I have no idea where that number came from.) He may have tried to email my work address, but as mentioned above that mailbox had been deleted, so I wouldn’t have gotten it, and I had told the call routing agent not to use that address anyway. I started by explaining the situation to Aman. We first checked whether the AD Recycle Bin was active; unfortunately, it wasn’t. We then used ldp to verify that the accounts were still available, which I had already done, but he wanted to check. Next we manually recovered my account using ldp, which is a pretty tedious process.
A quick side note here to explain why I was trying to restore these accounts rather than just recreate them. An Active Directory account is, at its core, a set of unique identifiers rather than a name: an objectGUID and a Security Identifier (SID). The SID is what Windows primarily uses to reference the account in permissions and in many other aspects of Windows networks (user profiles, for example). If I were to recreate the accounts, each would get a new SID, and essentially every file on the network and every file on each user’s computer would need its permissions modified to reflect the new account. A restored account, however, keeps its original SID and GUID, so this is not necessary; that is why it was so important to restore the accounts. Back to the primary story.
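A toy model of the point above, in Python: permission entries are keyed by SID, so a restored account (same SID) still matches, while a recreated account with the same user name gets a fresh relative identifier (RID) and matches nothing. Everything here is hypothetical: the domain SID, the RID counter standing in for the domain's RID pool, and the dictionary standing in for a file's ACL are illustrations, not calls into Windows.

```python
from itertools import count

DOMAIN_SID = "S-1-5-21-1111111111-2222222222-3333333333"  # made-up domain SID
_rids = count(1500)  # stand-in for the domain's RID pool; RIDs are not reused


def new_account_sid() -> str:
    """A newly created account gets a brand-new SID: domain SID plus a fresh RID."""
    return f"{DOMAIN_SID}-{next(_rids)}"


def can_read(acl: dict, sid: str) -> bool:
    """Toy ACL check: entries are keyed by SID, never by user name."""
    return "read" in acl.get(sid, set())


original_sid = new_account_sid()          # the teacher's account before deletion
acl = {original_sid: {"read", "write"}}   # permissions on one of their files

restored_sid = original_sid               # reanimating the tombstone keeps objectSid
recreated_sid = new_account_sid()         # same user name, but a new RID

print(can_read(acl, restored_sid))   # True
print(can_read(acl, recreated_sid))  # False
```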
The support rep and I had discussed ADRestore during the call, and it was at this point that we determined ADRestore was the best way to go. (One of my heroes, Mark Russinovich, to the rescue again.) I then used ADRestore to go through the 300+ deleted objects and recover the 120 or so that I needed to get back. I got off the phone with this support rep at about 7:05 PM.
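Picking the right 120 or so out of 300+ tombstones is where a second mistake would hurt most, so it can help to build the list before walking through ADRestore’s restore prompts. A hypothetical Python sketch of that filter: keep only objects deleted at or after the moment things went wrong, and skip anything whose last known parent is the OU you actually meant to clean out. It assumes your export also includes each tombstone’s lastKnownParent and whenChanged attributes (whenChanged on a tombstone roughly reflects the deletion time); apart from the incident time, the names, GUIDs, and dates are invented.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Tombstone:
    name: str                # original name, "DEL:" suffix already stripped
    guid: str
    last_known_parent: str   # DN of the OU the object lived in before deletion
    when_changed: datetime   # on a tombstone, roughly the deletion time


def pick_restores(tombstones, meant_to_delete_ou, incident_start):
    """Objects deleted during the incident from any OU other than the intended one."""
    target = meant_to_delete_ou.lower()
    return sorted(
        (t for t in tombstones
         if t.when_changed >= incident_start
         and t.last_known_parent.lower() != target),
        key=lambda t: t.name,
    )


incident = datetime(2009, 11, 10, 14, 30)
stones = [
    Tombstone("Old Intern", "g1", "OU=Staff,DC=domain,DC=com",
              datetime(2009, 9, 1, 9, 0)),
    Tombstone("Former Teacher", "g2", "OU=Possible Delete Emp,DC=domain,DC=com",
              datetime(2009, 11, 10, 14, 31)),
    Tombstone("Current Teacher", "g3", "OU=Faculty,DC=domain,DC=com",
              datetime(2009, 11, 10, 14, 31)),
]
restore = pick_restores(stones, "OU=Possible Delete Emp,DC=domain,DC=com", incident)
print([t.name for t in restore])  # ['Current Teacher']
```

Older tombstones from routine deletions and the accounts you really did intend to remove both fall out of the list, leaving only the accidental casualties.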
Once the accounts were restored, the world was not perfect yet. When an account is recovered from a tombstone, it is disabled, has no password, has no details, and has lost all of its group memberships. The first two I could fix fairly easily, since we assign passwords to users via a script and I could just rerun it. The third was not a big deal, since we don’t store much in AD beyond the essentials. The last one, group membership, was more problematic: my script assigns people to their primary group but not to all the extra groups they are part of. My boss went to work on fixing the groups once I had re-enabled all the restored accounts.
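For anyone scripting these cleanup steps, here is a minimal Python sketch of the two pieces of logic involved. Re-enabling an account means clearing the ACCOUNTDISABLE bit (0x0002) in userAccountControl, so a disabled normal account (514) becomes 512; under a typical password policy you need to set a password before the account can be enabled. Rebuilding groups is a set difference between what each user had before and what they have now; the "before" snapshot here is hypothetical, since it only helps if you kept some earlier record of membership.

```python
# userAccountControl flag values as documented for Active Directory.
ACCOUNTDISABLE = 0x0002   # ADS_UF_ACCOUNTDISABLE
NORMAL_ACCOUNT = 0x0200   # ADS_UF_NORMAL_ACCOUNT


def enabled(uac: int) -> int:
    """Return the userAccountControl value with the 'disabled' bit cleared."""
    return uac & ~ACCOUNTDISABLE


def missing_memberships(before: dict, now: dict) -> dict:
    """Map each user to the groups they had before but lack now (sorted)."""
    gaps = {}
    for user, groups in before.items():
        lost = groups - now.get(user, set())
        if lost:
            gaps[user] = sorted(lost)
    return gaps


# Hypothetical data: a disabled normal user and a pre-incident membership snapshot.
print(enabled(NORMAL_ACCOUNT | ACCOUNTDISABLE))  # 512 (was 514)
before = {"jdoe": {"Faculty", "Science Dept", "Staff-All"}, "psmith": {"Faculty"}}
now = {"jdoe": {"Faculty"}, "psmith": {"Faculty"}}  # after rerunning the script
print(missing_memberships(before, now))  # {'jdoe': ['Science Dept', 'Staff-All']}
```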
At this point it was about 7:30 PM. We had all the Active Directory accounts mostly back in order, and now it was time to tackle recovering the deleted mailboxes. As I mentioned above, this should have been fairly straightforward thanks to deleted mailbox retention. The problem, however, was that not all the mailboxes were showing up, and I wasn’t sure whether I would have to do full database restores to recover them. I decided to open a second Microsoft support incident with the Exchange group to figure out how to proceed; I hadn’t had to recover accidentally deleted mailboxes in quite a while either.
At about 8:05 PM I made a second call to the support phone number. This call routing agent was not nearly as helpful. Somehow the agent got Exchange Server 2007 and Outlook 2007 confused; they are related but not nearly the same thing (one is the server and the other is the client). After waiting on hold, getting connected to Outlook support, getting bounced back to a second call routing agent, and then being bounced to a third, I finally got a case created with the Exchange group at about 8:25 PM. I again set the severity to A, since I was in an outage situation and was willing to work 24 hours a day.
At about 8:35 PM the Exchange support rep called me. I explained the situation to him as well. We looked at the list to try to determine why some mailboxes were listed and some weren’t, and we verified that the retention policy was correctly in place. There didn’t seem to be an obvious reason why some were missing. The support rep then had me run Clean-MailboxDatabase on each of the databases with deleted mailboxes, after which all the deleted mailboxes showed up. Once they were listed, it was a simple (if time-consuming) matter of reconnecting each mailbox one by one. I ended the call with this rep at about 8:45 PM.
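Another side note to elaborate on Clean-MailboxDatabase (http://technet.microsoft.com/en-us/library/bb124076.aspx). According to its documentation, it scans Active Directory for disconnected mailboxes that are not yet marked as disconnected in the Exchange store and updates their status in the store. That would explain why only some of the deleted mailboxes were showing up: the rest simply hadn’t been flagged as disconnected yet. If I hadn’t been so freaked out and pressed for time, I probably could have found this myself, but oh well.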
I left school at about 9:30 PM to head home and finish the last steps. From home I went through the process of manually reconnecting all the deleted mailboxes. I finally finished at about 11:30 PM and went to bed close to midnight. I headed in early the next morning to be prepared for any further fallout.
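Reconnecting the mailboxes one at a time took about two hours. A hedged alternative is to generate one Connect-Mailbox command per mailbox and review the lines before pasting them into the Exchange Management Shell. This Python sketch assumes you already have each mailbox’s GUID, its database, and the restored account to attach it to (for example, gathered from Get-MailboxStatistics output); the GUIDs, server, database, and user names are placeholders. It also skips anything older than the 30-day deleted mailbox retention window, since those may already have been purged.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # the 30-day deleted mailbox retention


@dataclass(frozen=True)
class DisconnectedMailbox:
    mailbox_guid: str
    database: str          # server\storage group\database (placeholder names below)
    user: str              # restored AD account to attach the mailbox to
    disconnect_date: datetime


def ps_quote(value: str) -> str:
    """Wrap a value in PowerShell single quotes, doubling any embedded quote."""
    return "'" + value.replace("'", "''") + "'"


def connect_commands(mailboxes, now):
    """One Connect-Mailbox line per mailbox still inside the retention window."""
    lines = []
    for mb in mailboxes:
        if now - mb.disconnect_date > RETENTION:
            continue  # past retention; it may already be purged
        lines.append(
            f"Connect-Mailbox -Identity {ps_quote(mb.mailbox_guid)} "
            f"-Database {ps_quote(mb.database)} -User {ps_quote(mb.user)}"
        )
    return lines


boxes = [
    DisconnectedMailbox("1c9f2e3a-0000-0000-0000-000000000003",
                        r"EXCH01\First Storage Group\Mailbox Database",
                        "O'Brien, Pat", datetime(2009, 11, 10, 14, 30)),
    DisconnectedMailbox("1c9f2e3a-0000-0000-0000-000000000004",
                        r"EXCH01\First Storage Group\Mailbox Database",
                        "Old Intern", datetime(2009, 9, 1, 9, 0)),
]
commands = connect_commands(boxes, now=datetime(2009, 11, 10, 21, 30))
for line in commands:
    print(line)
```

Generating text rather than running anything directly keeps a human review step in the loop, which seems wise after the day described here.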
The biggest problems to come out of the whole mess were:
- Loss of user access from 2:30 PM until the next morning for all faculty and admin council.
- Loss of inbound email from 2:30 PM until 11:30 PM at the latest.
- Loss of email due to incorrect group membership. Membership was 90-95% correct within 24 hours, but for the last 5-10% it took as much as a month for people to notice that they weren’t on the correct lists.
The one good thing to come out of this was cleaning up some of the above-mentioned lists. Having (somewhat) verified our recovery procedures is nice too.
Hopefully those of you that have read this far have learned something and maybe won’t make the same mistakes I did.