After much anticipation, Microsoft finally released service pack 1 (SP1) for SharePoint 2010 early last week. Like so many other SharePoint professionals, I headed out to Microsoft’s site and pulled down everything I needed to patch-up my servers and virtual machines (VMs).
I’ve patched about four different server and VM environments with SP1 thus far, and although my experience with SP1 has been relatively positive, I did run into one particular snag that I thought I’d blog about. It was a weird one. I don’t claim to have good answers, but (at a minimum) I wanted to share my experience and observations.
What Happened?
My primary development environment is a virtual machine running Windows Server 2008 R2, SQL Server 2008 R2 Enterprise, SharePoint Server 2010, and the Office Web Apps. I don’t have any cumulative updates (CUs) or hotfixes installed, and everything is patched-up via Windows Update. Generally speaking, life is pretty happy in my little VM bubble. I sometimes have a hiccup or two due to the VM’s role as a domain controller (DC), but I can usually resolve those problems without too much trouble.
The relative peace and tranquility I normally experience was interrupted by none other than SP1. Like a bull in a china shop, SP1 announced that it had arrived on the scene with the dialog seen on the right. What struck me as weird is that all I had done up until the point when the dialog started appearing was the following:
- Install SharePoint Foundation 2010 SP1
- Reboot (as prompted by the SPF 2010 SP1 installer)
- Install SharePoint Server 2010 SP1
I hadn’t even run PSCONFIG or gone through the SharePoint 2010 Products Configuration Wizard, and suddenly I started getting unhandled exceptions. I still had the Office Web Apps SP1 package to install and wasn’t planning to run PSCONFIG until all the binaries had been laid down. My first reaction was “huh, that’s weird. I’ll just reboot.” That didn’t make the problem go away. Neither did running PSCONFIG and upgrading the farm. Shortly after rebooting, the dialog shown above would start appearing. If I cleared it out by clicking “No,” I’d have another one waiting for me in about a minute.
My “Ostrich Maneuver” wasn’t cutting the mustard, so I had to dig in and figure out what I needed to do in order to get my environment back in order.
Reluctant Analysis
The Visual Studio Just-In-Time Debugger dialog was popping up about every minute, so I figured that it was as good a place to start as any. The only thing the dialog really told me was the following:
- the problem was with OWSTIMER.EXE (i.e., the SharePoint Timer Service)
- a System.ServiceModel.EndpointNotFoundException was being thrown
Although the dialog didn’t contain a substantial amount of new information, it did include a few tidbits I needed to continue my hunt for the culprit behind my constant timer service crashes:
- The exception was being generated by the metadata exchange (MEX) endpoint for a service that resided on the ResourceManagementService path. Since the connection was being actively refused, there was a decent chance that the service simply wasn’t running.
- The service in question was apparently running (or expected to be running) on TCP port 5725
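An actively refused connection can be confirmed independently of the timer service by attempting a plain TCP connection to the port. The following is a minimal sketch in Python rather than anything from my actual troubleshooting; the function name port_is_open and the two-second timeout are illustrative assumptions.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError (including ConnectionRefusedError
        # and timeouts) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running port_is_open("localhost", 5725) on the affected server would be expected to return False while nothing is listening on that port. A False result only says the connection failed; it doesn't distinguish a stopped service from a firewall rule.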
A web search for “SharePoint” and “5725” quickly brought me to the Plan security hardening (SharePoint Server 2010) page on TechNet. On that page, I found the following under the User Profile service hardening requirements section:
TCP port 5725 must be open on the server that runs the Forefront Identity Management agent and is set up to crawl a directory store.
The revelation that I was dealing with Forefront Identity Manager (FIM) and the SharePoint 2010 User Profile Service (UPS) immediately kicked-off a round of cursing and forehead slapping that took a couple of minutes to rein in.
User Profile Service Investigation
I’m sure that the ResourceManagementService path and port 5725 are obvious to many of you, but I’ll come clean and tell you that I try to stay as far away from the User Profile Service as I possibly can. Unless I know that I’m going to be doing something with the UPS and social networking, I don’t even bother with it in most of my SharePoint environments. Why? Because the UPS is just a colossal pain in the butt to configure and keep running. It is the bane of SharePoint administrators everywhere. Since I’m averse to pain and rarely do anything with it, I stay away from the UPS.
If the UPS was running and the service was actually started (which I confirmed through the Services on Server section of Central Administration), why was I seeing constant timer service crashes?
I decided to look a little further; specifically, I went in to look at the actual Forefront Identity Manager Service which backstops the whole User Profile Service. I popped-open Start –> Administrative Tools –> Services and navigated to the FIM service. The result (which is shown on the left) explained a lot.
Even though the service was set to auto-start, it wasn’t running. Without the service running, it would make sense that attempts to reach it or interact with it would fail.
I decided to manually start the service. Once I did so, the OWSTIMER.EXE crashes ceased. I thought I was in the clear … until I rebooted. Once I rebooted, the FIM service failed to start again and the OWSTIMER.EXE exceptions started happening again. Ugh.
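For anyone who would rather check the service state from a script than from the Services console, here is a rough Python sketch that wraps the Windows sc query command. The service name passed in (for example "FIMService") and the English-language output layout with a "STATE : n NAME" line are assumptions on my part; verify both on your own server.

```python
import re
import subprocess
from typing import Optional

# Matches a line such as "        STATE              : 1  STOPPED"
# (assumed layout of English-language "sc query" output).
_STATE_RE = re.compile(r"^\s*STATE\s*:\s*\d+\s+(\w+)", re.MULTILINE)


def parse_sc_state(sc_output: str) -> Optional[str]:
    """Extract the state name (e.g. RUNNING, STOPPED) from sc query output."""
    match = _STATE_RE.search(sc_output)
    return match.group(1) if match else None


def query_service_state(service_name: str) -> Optional[str]:
    """Run "sc query <service_name>" (Windows only) and return its state."""
    result = subprocess.run(
        ["sc", "query", service_name],
        capture_output=True,
        text=True,
    )
    return parse_sc_state(result.stdout)
```

Calling query_service_state with the FIM service name after a reboot would show whether the service came back up on its own. Note that this reports the current state only; the startup type (automatic, disabled, and so on) is a separate setting.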
Rewind
I spent about an hour or two poking and prodding, but I didn’t make much headway. Since I was more interested in getting my development environment usable than solving my UPS problems, I decided to cut bait and rewind.
Fans of the movie Aliens will appreciate my application of the “Ellen Ripley Strategy” as shown on the right:
I say we take off and nuke the entire site from orbit. It’s the only way to be sure.
As mentioned earlier, I seldom if ever use the User Profile Service. If it was causing me problems with the application of SP1, I decided that it had to go. At a minimum, I wanted to see if SP1 would go on properly without an instance of the User Profile Service Application running in my environment.
After rewinding to my pre-SP1 snapshot and removing the User Profile Service Application instance, I went through the process of laying down the various service packs, and even before running PSCONFIG I knew that things were going to be different. Applying the SharePoint Server 2010 Service Pack 1 binaries did not start a constant stream of OWSTIMER.EXE exceptions. I was able to lay down all of the service packs and run PSCONFIG without issue. Problem solved.
That’s Your Solution?
So, is deleting your UPS service application instance a solution to this problem? In most production environments, I’d wager that the answer is “probably not.” If I think about the situation from a production perspective, the answer may be to preserve the appropriate UPS databases, tear down the UPS service application instance for SP1 application, and then rebuild the UPS service application afterwards.
My primary objective with this post, though, was to simply provide some initial root-cause analysis and troubleshooting. I may be the only person seeing this; if so, I hope that what I’ve written was at least entertaining. If you happen to be seeing the same symptoms I saw, though, then I hope that what I’ve written saves you a little time while troubleshooting.
Spence has some guidance on things to change if you’re running FIM on a DC: http://www.harbar.net/articles/sp2010ups2.aspx#ups8
tk
Thanks for the link, Todd! I hadn’t played around with any of the service startup parameters (such as delaying the start of the FIM services), but they sound like a great next step. If I find some free time, I may go back and play with the (now) old snapshot to see if those changes do the trick.
Good article Sean. And thank you for the reference Todd. We had the issues of FIM not working at all. And we also had a few errors with regard to Office Web Apps. Did you guys experience anything with regard to office web apps?
Thanks Tony. I haven’t really run into any problems with SP1 following its installation/configuration, but I haven’t had a whole lot of time to play around. I’ll keep my eyes open, though — particularly with the OWAs. If I see anything there, I’ll let you know!
Thanks Sean!
I also had this error on the initial install of SP1 on my dev env. After I installed the CU the error has gone away. I don’t know if the CU fixed it.
I appreciate the feedback, Bill. I haven’t played with the June CU, but I have a natural aversion to the 2010 CUs primarily due to the lack of extensive regression testing. If I can find the time, I might try playing with the CU on the snapshot I took prior to SP1 install. I’ve gone back once or twice to try some additional tricks since writing this post, but I have yet to find an approach that doesn’t (a) leave me with the timer job issues, or (b) result in an unusable user profile service.
Thanks again!
Hi I’m having exactly the same problem.
Prior to installing SP1 – everything was pretty much good. But soon after, I noticed that I was getting debug error messages from Visual Studio complaining about the Timer Services. Also whenever I deployed a Custom Solution Webpart, I had to manually go in to each server in the farm and run the “STSADM -o execadmsvcjobs” command to force the job to run and successfully install the solution.
Unfortunately I didn’t do a VM snapshot and therefore am rather stuck with SP1. Can anyone out there help?
Thanks
Davinder.
thanks
EROL
MVP
Davinder: I’m sorry, but I don’t have any particular suggestions for you. I’m hoping that perhaps someone else will chime in. Best of luck!
You’re welcome, Erol; thanks for reading!
This post was very helpful. We were having the same issues after SP1 installation on the DEV farm. The VS 2010 debugger errors were popping up and the farm upgrade was not completing. Noticed the Domain connection was not showing under UPS and UPS was stopped. Started the UPS under Services on Server and the VS 2010 debugger error stopped. After I started the UPS service, the FIM also went from Disabled to Automatic and Started.
Ran PSConfig.exe -cmd upgrade -inplace b2b -force -cmd applicationcontent -install -cmd installfeatures >> C:\temp\sp1_status3.txt
The upgrade completed successfully and All Issues were resolved.
I HATE the User Profile Service application – it is very high maintenance!!!!!
Thanks RC; glad that the post was helpful.
Thanks Sean,
Actually, I was wondering whether the UPS was the main cause of the issue, because the Timer Service was crashing each time the synchronization job started!!
But after I read your post and followed the steps, the issue was resolved :)
Thanks again for the good troubleshooting and explaining,
Thanks for the feedback; I’m glad things are working now :-)