After much anticipation, Microsoft finally released Service Pack 1 (SP1) for SharePoint 2010 early last week. Like so many other SharePoint professionals, I headed out to Microsoft’s site and pulled down everything I needed to patch up my servers and virtual machines (VMs).
I’ve patched about four different server and VM environments with SP1 thus far, and although my experience with SP1 has been relatively positive, I did run into one particular snag that I thought I’d blog about. It was a weird one. I don’t claim to have good answers, but (at a minimum) I wanted to share my experience and observations.
What Happened?
My primary development environment is a virtual machine running Windows Server 2008 R2, SQL Server 2008 R2 Enterprise, SharePoint Server 2010, and the Office Web Apps. I don’t have any cumulative updates (CUs) or hotfixes installed, and everything is patched-up via Windows Update. Generally speaking, life is pretty happy in my little VM bubble. I sometimes have a hiccup or two due to the VM’s role as a domain controller (DC), but I can usually resolve those problems without too much trouble.
The relative peace and tranquility I normally experience was interrupted by none other than SP1. Like a bull in a china shop, SP1 announced that it had arrived on the scene with the dialog seen on the right. What struck me as weird is that all I had done up until the point when the dialog started appearing was the following:
- Install SharePoint Foundation 2010 SP1
- Reboot (as prompted by the SPF 2010 SP1 installer)
- Install SharePoint Server 2010 SP1
I hadn’t even run PSCONFIG or gone through the SharePoint 2010 Products Configuration Wizard, and suddenly I started getting unhandled exceptions. I still had the Office Web Apps SP1 package to install and wasn’t planning to run PSCONFIG until all the binaries had been laid down. My first reaction was “huh, that’s weird. I’ll just reboot.” That didn’t make the problem go away. Neither did running PSCONFIG and upgrading the farm. Shortly after rebooting, the dialog shown above would start appearing. If I cleared it out by clicking “No,” I’d have another one waiting for me in about a minute.
My “Ostrich Maneuver” wasn’t cutting the mustard, so I had to dig in and figure out what I needed to do in order to get my environment back in order.
Reluctant Analysis
The Visual Studio Just-In-Time Debugger dialog was popping up about every minute, so I figured that it was as good a place to start as any. The only thing the dialog really told me was the following:
- the problem was with OWSTIMER.EXE (i.e., the SharePoint Timer Service)
- a System.ServiceModel.EndpointNotFoundException was being thrown
Since this was my development environment and Visual Studio 2010 was installed, I decided to attach the VS debugger to the SharePoint Timer Service process and see if I could learn anything more. The dialog on the left was the result.
Although the dialog didn’t contain a substantial amount of new information, it did include a few tidbits I needed to continue my hunt for the culprit behind my constant timer service crashes:
- The exception was being generated by the metadata exchange (MEX) endpoint for a service that resided on the ResourceManagementService path. Since the connection was being actively refused, there was a decent chance that the service simply wasn’t running
- The service in question was apparently running (or expected to be running) on TCP port 5725
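If you want to confirm the "connection actively refused" part without attaching a debugger, a quick TCP probe will do it. The following is only a minimal sketch in Python, not something I ran as part of the original troubleshooting; the function name is_port_open is made up for illustration, and "localhost" assumes you run it on the SharePoint server itself. Port 5725 is the one reported in the exception.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # timeout) when nothing is accepting connections on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumption: the probe runs on the server that reported the exception.
    if is_port_open("localhost", 5725):
        print("TCP 5725 is accepting connections")
    else:
        print("TCP 5725 refused the connection or is unreachable")
```

A refused connection here only tells you that nothing is listening on the port; it doesn't tell you why the service behind it isn't running.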
A web search for “SharePoint” and “5725” quickly brought me to the Plan security hardening (SharePoint Server 2010) page on TechNet. On that page, I found the following under the User Profile service hardening requirements section:
TCP port 5725 must be open on the server that runs the Forefront Identity Management agent and is set up to crawl a directory store.
The revelation that I was dealing with Forefront Identity Manager (FIM) and the SharePoint 2010 User Profile Service (UPS) immediately kicked off a round of cursing and forehead slapping that took a couple of minutes to rein in.
User Profile Service Investigation
I’m sure that the ResourceManagementService path and port 5725 are obvious to many of you, but I’ll come clean and tell you that I try to stay as far away from the User Profile Service as I possibly can. Unless I know that I’m going to be doing something with the UPS and social networking, I don’t even bother with it in most of my SharePoint environments. Why? Because the UPS is just a colossal pain in the butt to configure and keep running. It is the bane of SharePoint administrators everywhere. Since I’m averse to pain and rarely do anything with it, I stay away from the UPS.
I had trouble believing that I would have gone through the trouble of configuring the UPS in my development environment, so I decided to take a look in Central Administration. Much to my surprise, I had actually taken the time to jump through all of the hoops (as shown on the right) to get the service running, AD synchronization going, etc.
If the UPS was running and the service was actually started (which I confirmed through the Services on Server section of Central Administration), why was I seeing constant timer service crashes?
I decided to look a little further; specifically, I went in to look at the actual Forefront Identity Manager Service which backstops the whole User Profile Service. I opened Start –> Administrative Tools –> Services and navigated to the FIM service. The result (which is shown on the left) explained a lot.
Even though the service was set to auto-start, it wasn’t running. Without the service running, it would make sense that attempts to reach it or interact with it would fail.
I decided to manually start the service. Once I did so, the OWSTIMER.EXE crashes ceased. I thought I was in the clear … until I rebooted. Once I rebooted, the FIM service failed to start again and the OWSTIMER.EXE exceptions started happening again. Ugh.
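For anyone who would rather check the service state after each reboot from a script than from the Services console, here is a rough Python sketch that wraps the Windows sc query command. Treat the details as assumptions: the service names FIMService and FIMSynchronizationService are what I believe the two FIM services are called, so verify them in the Services console first, and the helper names parse_sc_state and query_service_state are hypothetical.

```python
import re
import subprocess


def parse_sc_state(sc_output):
    """Pull the state name (e.g. RUNNING, STOPPED) out of `sc query` output."""
    # The relevant line looks like: "STATE              : 1  STOPPED"
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


def query_service_state(service_name):
    """Run `sc query <service_name>` and return its state, or None if unknown."""
    try:
        result = subprocess.run(
            ["sc", "query", service_name],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        # sc.exe is not available (for example, not running on Windows).
        return None
    return parse_sc_state(result.stdout)


if __name__ == "__main__":
    # Assumption: these are the service names of the two FIM services.
    for name in ("FIMService", "FIMSynchronizationService"):
        print(name, "->", query_service_state(name))
```

A result of STOPPED for a service whose startup type is Automatic matches what I saw in the Services console.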
Rewind
I spent about an hour or two poking and prodding, but I didn’t make much headway. Since I was more interested in getting my development environment usable than solving my UPS problems, I decided to cut bait and rewind.
Remember: this is a VM that we’re talking about. Since I try to practice what I preach (particularly from a backup and restore perspective), I had taken a VM snapshot prior to the start of SP1 work. I rolled back to my pre-SP1 state and took a different approach to the service pack application.
Fans of the movie Aliens will appreciate my application of the “Ellen Ripley Strategy” as shown on the right:
I say we take off and nuke the entire site from orbit. It’s the only way to be sure.
As mentioned earlier, I seldom if ever use the User Profile Service. If it was causing me problems with the application of SP1, I decided that it had to go. At a minimum, I wanted to see if SP1 would go on properly without an instance of the User Profile Service Application running in my environment.
I deleted my UPS service application instance and decided to have a look at the FIM service again. What I saw (on the left) looked a bit different. Rather than being set to auto-start (“Automatic”), both the FIM Service and its associated synchronization service were set to “Disabled.”
I then went through the process of laying down the various service packs, and even before running PSCONFIG I knew that things were going to be different. Applying the SharePoint 2010 Server Service Pack 1 binaries did not start a constant stream of OWSTIMER.EXE exceptions. I was able to lay down all of the service packs and run PSCONFIG without issue. Problem solved.
That’s Your Solution?
So, is deleting your UPS service application instance a solution to this problem? In most production environments, I’d wager that the answer is “probably not.” If I think about the situation from a production perspective, the answer may be to preserve the appropriate UPS databases, tear down the UPS service application instance for SP1 application, and then rebuild the UPS service application afterwards.
My primary objective with this post, though, was to simply provide some initial root-cause analysis and troubleshooting. I may be the only person seeing this; if so, I hope that what I’ve written was at least entertaining. If you happen to be seeing the same symptoms I saw, though, then I hope that what I’ve written saves you a little time while troubleshooting.