Upcoming Activities (March 2010)

In this post, I discuss some of my activities for the next couple of months. These include the INTERalliance TechOlympics, SharePoint Saturday Michigan, and continuing efforts to get the SharePoint 2010 Disaster Recovery Guide ready for product launch.

2010 is in full-swing, and there seems to be no shortage of activities for me to jump into!  If anything, I need more free time to take on some of the stuff I really want to sink my teeth into (such as a SharePoint 2010 CodePlex project I want to have ready for RTM).  Until I have something more tangible in hand, though, I’ll avoid talking about that topic any further.

Here are some of the things occupying my free time in the short-to-mid term:

TechOlympics Expo 2010

The TechOlympics Expo is the type of event every adult geek wishes they had when they were in high school – a weekend lock-in featuring technical competitions, cool toys, games of every imaginable sort, and pretty much everything else that would get a teenage gearhead jazzed-up.  The underlying goal of the event is to get high school kids interested in technology, careers in technology, and technical opportunities in the Cincinnati area.

The event (on March 5-7) is being put on by the INTERalliance of Greater Cincinnati, and my involvement in the event is kind of a curious thing.  My primary client of the past 2+ years is a big backer of (and heavily invested in) the INTERalliance, so naturally they pitch in to help whenever events come up.  I helped the INTERalliance through a last-minute (and somewhat ugly) technical hurdle involving SMS voting for their PharaohFest event last October, and I suspect that played a part in my being asked to help out with the TechOlympics.

With the TechOlympics, I’m part of a team that’s working to make all the “technical stuff” (behind-the-scenes and otherwise) happen.  My responsibilities seem to shift a bit each day, but the bulk of what I’ve been working on is coordinating network logistics and services, translating “the vision” into technical infrastructure, providing some guidance on applications being written to support the event, and generally doing my best at “collision avoidance” to ensure that we don’t miss anything important for the event.

I’m confident that the event is going to be incredible, and it’s been a lot of fun doing the planning thus far.  Seeing everything come together is going to be neat – both for me and for everyone else who has been laboring to make the magic happen!

SharePoint Saturday Michigan

What would an “Upcoming Activities” post be without a SharePoint Saturday announcement?  The next one I’ll be attending is SharePoint Saturday Michigan in Ann Arbor on March 13th.  I’ll be presenting “Saving SharePoint,” the disaster recovery talk that John Ferringer and I have been delivering at various SharePoint Saturday events around the region.  I’ll be flying solo this time around, though, as John has some other things going on that weekend.

As always, SharePoint Saturday events are free and open to the public.  If you have any interest in learning more about SharePoint, getting some free training, or simply networking and meeting other professionals in the SharePoint space, please sign up!

SharePoint 2010 Disaster Recovery Guide

This announcement is last, but it’s definitely not least.  Some of you are aware, but for those who aren’t: John and I have been working on the SharePoint 2010 Disaster Recovery Guide for a while now.  I’m not going to lie – it’s slow going.  Personally, I’m a very slow writer, and the process itself is exceptionally labor-intensive.  Nevertheless, we’re making progress – one page at a time.

Our goal (and Cengage’s goal for us) is to have the book ready for SharePoint 2010 RTM.  I haven’t seen or heard anything official from Microsoft, but rumor has it that SharePoint 2010 will probably be out sometime in June.  If that’s the case, then John and I are on-track.

If you have suggestions for us, particularly if you read the first book, we would love to hear them.  We’re incorporating a few that we already received (for example, a chapter that covers some real world use-cases), but our ears are open and listening.  We know that DR isn’t a topic that gets everyone overly hot and bothered (unless they’ve lost everything at some point, of course), but our goal is to make the book as useful as possible.  We’d love your help!

Additional Reading and References

  1. Site: CodePlex
  2. Event: TechOlympics Expo 2010
  3. Organization: The INTERalliance of Greater Cincinnati
  4. Event: PharaohFest
  5. Event: SharePoint Saturday Michigan
  6. Partner In Crime: John Ferringer on Twitter
  7. Book: SharePoint 2010 Disaster Recovery Guide

DPM, RPC, and DCOM with Forefront TMG 2010

In this post, I discuss a couple of DCOM/RPC snags I ran into while configuring Microsoft’s Data Protection Manager (DPM) 2007 client protection agent to run on my new Forefront Threat Management Gateway (TMG) 2010 server. I also walk through the troubleshooting approach I took to resolve the issues that appeared.

In a recent post, I was discussing my impending move to Microsoft’s Forefront Threat Management Gateway (TMG) 2010 on my home network.  As part of the move, I was going to decommission two Microsoft Internet Security and Acceleration (ISA) 2006 servers and an old Windows Server 2008 remote access services (RAS) box and replace them with a single TMG 2010 server – a big savings in terms of server maintenance and power consumption.

I completed the move about a week ago, and I’ve been very happy with TMG thus far.  TMG’s ISP redundancy and load balancing features have been fantastic, and they’ve allowed me to use my Internet connections much more effectively.

As a user of ISA since its original 2000 release, I also had no problem jumping in and working with the TMG management GUI.  It was all very familiar from the get-go.  Call me “very satisfied” thus far.

This afternoon, I took a few moments to un-join the old ISA servers from the domain, power them down, and clean things up.  I had also planned to take a little time integrating the new TMG box into my Data Protection Manager (DPM) 2007 backup rotation.  Unfortunately, though, the DPM integration took a bit longer than expected …

Backup Brigade

For those unfamiliar with the operation of DPM, I’ll take a couple of moments to explain a bit about how it works.  In order for DPM to do its thing, any computer that is going to be protected must have the DPM 2007 Protection Agent installed on it.  Once the DPM Protection Agent is installed and configured, the DPM server communicates through the agent (which operates as a Windows service) to leverage the protected system’s Volume Shadow Copy Service (VSS) for backups.

Installing the DPM agent typically isn’t a big challenge for common client computers, and it can be accomplished directly from within the DPM management GUI itself.  When the agent is installed through the GUI, DPM connects to the computer to be protected, installs the agent, and configures it to point back to the DPM server.  No manual intervention is required.

On some systems, though, it’s simply easier to install and configure the agent directly from the to-be-protected system itself.  A locked-down server (like a TMG box) falls into this category, so I manually copied the agent installation package to the TMG server, ran it, and then ran the follow-up Attach-ProductionServer cmdlet from the DPM Management Shell on the DPM server.  Both the install and the associated attach went off without a hitch, so I thought I was good to go.

I fired up the management GUI on the DPM Server, went into the Agents tab under Management, and discovered that I couldn’t connect to the TMG server.

DPM Agent Issue

The Checklist

The fact that I couldn’t connect to the TMG server (SS-TMG1) from my DPM box was a bit of an eyebrow-raiser, but it wasn’t entirely unexpected.  Communication between a DPM server and the DPM agent leverages DCOM, and I’d had to jump through a few hoops previously to ensure that the DPM server could communicate with the ISA boxes.

I suspected that an RPC/DCOM issue was in play, but I was having trouble seeing where the problem might be.  So, I reviewed where things stood.

  • Without a firewall exception in place, Windows Firewall will block communication between a DPM server and its agents.  I confirmed that Windows Firewall wasn’t in play and that TMG itself was handling all of the firewall action.
  • Examining TMG, I confirmed that I had a rule in place that permitted all traffic between my DPM server (SS-TOOLS1) and the TMG box itself.
    DPM-TMG Access Rule 
  • Strict RPC compliance is another potential problem with DPM on both ISA and TMG, as requiring strict compliance blocks any DCOM traffic.  DCOM (and any other traffic that doesn’t explicitly begin RPC exchanges by communicating with the RPC endpoint mapper on the target server) gets dropped by the RPC Filter unless the checkbox for Enforce strict RPC compliance is unchecked.  I confirmed that my rule wasn’t requiring strict compliance (as shown on the right).
    DPM-TMG Access Rule (with RPC)
  • I made sure that my DPM server wasn’t listed as a member of either the Enterprise Remote Management Computers or Remote Management Computers Computer Sets in TMG.  These two Computer Sets are specifically targeted by a couple of TMG System Policy rules that can affect members’ ability to call into TMG via RPC and DCOM.
  • I reviewed all System Policy rules that might impact inbound RPC calls to the TMG server, and I couldn’t find any that would (or should) be influencing DPM’s ability to connect to its agent.
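The strict RPC compliance behavior on that checklist can be sketched as a toy model.  This is purely my own illustration of the documented behavior – the function and field names below are mine, not TMG internals:

```python
# Toy model of the "Enforce strict RPC compliance" option's effect.
# Names and structure are illustrative, not TMG internals.
RPC_ENDPOINT_MAPPER_PORT = 135

def rpc_filter_allows(connection, strict_compliance):
    """Return True if the RPC filter would pass this connection.

    With strict compliance enforced, only exchanges that begin at the
    RPC endpoint mapper (TCP 135) get through; DCOM traffic that dials
    a dynamically assigned port directly is dropped.
    """
    if connection["dst_port"] == RPC_ENDPOINT_MAPPER_PORT:
        return True  # classic RPC: the client asks the mapper first
    # Direct-to-port traffic only passes when the checkbox is unchecked
    return not strict_compliance

# A DCOM call straight to a dynamic port:
dcom_call = {"dst_port": 49155}
assert rpc_filter_allows(dcom_call, strict_compliance=False)
assert not rpc_filter_allows(dcom_call, strict_compliance=True)
```

In other words, unchecking the box doesn’t open the rule wide – it just stops requiring that every exchange start at the endpoint mapper.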

I also went out to the Forefront TMG Product Team’s Blog to see what advice they might have to offer, and I found this excellent article on RPC and TMG – well worth a read if you’re trying to troubleshoot RPC problems.  Unfortunately, it didn’t offer me any tips that would help in my situation.

Watching The Traffic

I may have simply had tunnel vision, but I was still obsessed with the notion that strict RPC checking was causing my problems.  To see if it was, I decided to fire up TMG’s live logging and watch what happened when DPM tried to connect to its agent.  I set the logging to display only traffic originating from the DPM box, and this is what I saw.

Watching Traffic from DPM

There was nothing wrong that I could see.  My access rule was clearly being utilized (the one that doesn’t enforce strict RPC checking), and I wasn’t seeing any errors – just connection initiations and closes.  Traffic from DPM to TMG looked clean.

Taking A Step Back

I was frustrated, and I clearly needed to consider the possibility that I didn’t have a good read on the problem.  So, I went to the Windows Application event log to see if it might provide some insight.  I probably should have started with the event logs instead of TMG itself and firewall rules … but better late than never, I figured.

DPM Agent Can't Communicate

Popping open Event Viewer, I was greeted with the image you see on the left.  What I saw was enlightening, for I did have a problem with communication between the DPM agent and the DPM server.  The part that intrigued me, though, was the fact that the problem was with outbound communication (that is, from the TMG server to the DPM server) – not the other way around as I had originally suspected.  All of my focus had been on troubleshooting traffic coming into TMG because I’d been interpreting the errors I’d seen to mean that the DPM server couldn’t reach the agent – not that the agent couldn’t “phone home,” so to speak.

I knew for a fact that the DPM Server, SS-TOOLS1, didn’t have the Windows Firewall service running.  Since the service wasn’t running, there was no way that the agent’s attempts to communicate with DPM could (or rather, should) be getting blocked at the destination.  That left the finger of blame pointing at TMG.

On The Way Out

I decided to repeat the traffic watching exercise I’d conducted earlier, but instead of watching traffic coming into the TMG box from my DPM server, I elected to watch traffic going the other direction – from TMG to DPM.  Here’s what I saw:

Watching Traffic to DPM

The “a-ha” moment for me came when I saw the firewall rule that was actually governing RPC traffic to the DPM box from TMG.  It wasn’t the DPM All <=> SS-TMG1 rule I’d established — it was a system policy rule called [System] Allow RPC from Forefront TMG to trusted servers.

System policy rules are normally hidden on the Firewall Policy tab, so I had to explicitly show them in order to review them.  Once I did, there it was – rule 22.

System Policy Rule 22

Note that this rule applies to all traffic from the TMG server to the Internal network; I’ll be talking about that more in a bit.

System Policy Editor

I couldn’t edit the rule in-place; I needed to use the System Policy Editor.  So, I fired it up and traced the rule back to its associated configuration group.  As it turned out, the rule was tied to the Active Directory configuration group under Authentication Services.

As the picture on the left clearly shows, the Enforce strict RPC compliance checkbox was checked.  Once I unchecked it and applied the configuration change, the DPM agent began communicating with the DPM server without issue.  Problem solved.

What Happened?

I was fairly sure that I hadn’t experienced this sort of trouble installing the DPM Protection Agent under ISA Server 2006, so I tried to figure out what might have happened.

I didn’t recall having to adjust the target system policy under ISA when installing the DPM agent originally, but a quick boot and check of my old ISA server revealed that the checkbox was indeed unchecked (meaning that strict RPC compliance wasn’t being enforced).  I’d apparently made the change at some point and forgotten about it.  I suspect I’d tweaked it in the distant past while working to pass AD information through ISA, getting VPN functionality up and running, or perhaps something else entirely.

Implications

Bottom line: TMG enforces strict compliance for RPC traffic that originates on the TMG server (Local Host) and is destined for the Internal network.  Since System Policy Rules are applied before administrator-defined Firewall Policy Rules, RPC traffic from the TMG server to the Internal network will always be governed by the system policy unless that policy is disabled.
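That evaluation order is the crux of the issue, and it can be sketched with a toy rule evaluator.  The rule names below come from this post; the evaluator and the rule structures are my own simplified stand-ins, not how TMG actually represents policy:

```python
# Toy model of TMG policy evaluation order: System Policy Rules are
# consulted before administrator-defined Firewall Policy Rules.
def first_matching_rule(traffic, system_rules, admin_rules):
    """Return the name of the first rule that matches the traffic."""
    for rule in system_rules + admin_rules:  # system policy goes first
        if rule["match"](traffic):
            return rule["name"]
    return None

rpc_from_tmg = {"src": "Local Host", "dst": "Internal", "proto": "rpc"}

system_rules = [{
    "name": "[System] Allow RPC from Forefront TMG to trusted servers",
    "match": lambda t: t["src"] == "Local Host" and t["proto"] == "rpc",
}]
admin_rules = [{
    "name": "DPM All <=> SS-TMG1",
    "match": lambda t: t["proto"] == "rpc",
}]

# My admin rule matches this traffic too, but the system policy rule
# is evaluated first, so its strict RPC setting governs the traffic.
assert first_matching_rule(rpc_from_tmg, system_rules, admin_rules) \
    == "[System] Allow RPC from Forefront TMG to trusted servers"
```

This is why creating an ever-more-permissive admin rule gets you nowhere: the matching system policy rule wins before your rule is ever consulted.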

In this particular scenario, the DPM 2007 Protection Agent’s operation was impacted.  Even though I’d created a rule that I thought would govern interactions between DPM and TMG, the reality is that it only governed RPC traffic coming into TMG – not traffic going out.

In reality, any service or application that sends DCOM traffic originating on the TMG server to the Internal network is going to be affected by the Allow RPC from Forefront TMG to trusted servers rule unless the associated system policy is adjusted.

Conclusion

The core findings of this post have been documented by others (in a variety of forms and scenarios) for ISA, but this go-round with TMG and DPM caught me off guard, so I thought it was worth sharing my experience with other firewall administrators.  If anyone else moving to TMG takes the “build it from the ground up” approach that I did, the system policy I’ve been discussing may get missed.  Hopefully this post will serve as a good lesson for newcomers (or a reminder for veteran firewall administrators).

UPDATE (10/26/2010)

Thomas K. H. Bittner (former MVP from Germany who runs the Windows Server System Reference Architecture blog) contacted me and shared a blog post he wrote on configuring DPM 2010 and TMG communication; the post can be found here: http://msmvps.com/blogs/wssra/archive/2010/10/20/configure-the-forefront-tmg-2010-to-allow-dpm-2010-communication.aspx.  Thomas’ post is fantastic in that it is extremely granular in its configuration of communication channels between TMG and DPM.  If you would prefer to lock things down more securely than I demonstrate, then I highly recommend checking out the post that Thomas wrote.

Additional Reading and References

  1. Recent Post: Portrait of a Basement Datacenter
  2. Microsoft: Forefront Threat Management Gateway 2010
  3. Microsoft: Internet Security and Acceleration Server 2006
  4. Microsoft: System Center Data Protection Manager
  5. Forefront TMG Product Team Blog: RPC Filter and “Enable strict RPC compliance”

Portrait of a Basement Datacenter

In this post, I take a small detour from SharePoint to talk about my home network, how it has helped me to grow my skill set, and where I see it going.

Whenever I’m speaking to other technology professionals about what I do for a living, there’s always a decent chance that the topic of my home network will come up.  This seems to be particularly true when talking with up-and-coming technologists, as I’m commonly asked by them how I managed to get from “Point A” (having transitioned into IT from my previous life as a polymer chemist) to “Point B” (consulting as a SharePoint architect).

I thought it would be fun (and perhaps informative) to share some information, pictures, and other geek tidbits on the thing that seems to consume so much of my “free time.”  This post also allows me to make good on the promise I made to a few people to finally put something online for them to see.

Wait … “Basement Datacenter?”

For those on Twitter who may have seen my occasional use of the hashtag #BasementDatacenter: I can’t claim to have originated the term, though I fully embrace it these days.  The first time I heard the term was when I was having one of the aforementioned “home network” conversations with a friend of mine, Jason Ditzel.  Jason is a Principal Consultant with Microsoft, and we were working together on a SharePoint project for a client a couple of years back.  He was describing his love for his recently acquired Windows Home Server (WHS) and how I should have a look at the product.  I described why WHS probably wouldn’t fit into my network, and that led Jason to comment that Microsoft would have to start selling “Basement Datacenter Editions” of its products.  The term stuck.

So, What Does It Look Like?

Basement Datacenter - Legend
Basement Datacenter - Front Shot

Two pictures appear on the right.  The left-most shot is a picture of my server shelves from the front.  Each of the computing-related items in the picture is labeled in the right-most shot.  There are obviously other things in the pictures, but I tried to call out the items that might be of some interest or importance to my fellow geeks.

Behind The Servers

Generally speaking, things look relatively tidy from the front.  Of course, I can’t claim to have the same degree of organization in the back.  The shot on the left displays how things look behind and to the right of the shots that were taken above.  All of the power, network, and KVM cabling runs are in the back … and it’s messy.  I originally had things nicely organized with cables of the proper length, zip ties, and other aids.  Unfortunately, servers and equipment shift around enough that the organization system wasn’t sustainable.

While doing the network planning and subsequent setup, I’m happy that I at least had the foresight to leave myself ample room to move around behind the shelves.  If I hadn’t, my life would be considerably more difficult.

On the topic of shelves: if you ever find yourself in need of extremely heavy duty, durable industrial shelves, I highly recommend this set of shelves from Gorilla Rack.  They’re pretty darn heavy, but they’ll accept just about any amount of weight you want to put on them.

I had to include the shot below to give you a sense of the “ambiance.”

Under The Cover Of Colorful Lighting

Anyone who’s been to my basement (which I lovingly refer to as “the bunker”) knows that I have a thing for dim but colorful lighting.  I normally illuminate my basement area with Christmas lights, colored light bulbs, etc.  Frankly, things in the basement are entirely too ugly (and dusty) to be viewed under normal lighting.  It may be tough to see from this shot, but the servers themselves contribute some light of their own.

Why On Earth Do You Have So Many Servers?

After seeing my arrangement, the most common question I get is “why?”  It’s actually an easy one to answer, but to do so requires rewinding a bit.

Many years ago, when I was a “young and hungry” developer, I was trying to build a skill set that would allow me to work in the enterprise – or at least on something bigger than a single desktop.  Networking was relatively new to me, as was the notion of servers and server-side computing.  The web had only recently become graphical (anyone remember text-based surfing?  Quite a different experience …), HTML 3 was all the rage, Microsoft was trying to get traction with ASP, ActiveX was the cool thing to talk about (or so we thought), etc.

It was around that time that I set up my first Windows NT4 server.  I did so on the only hardware I had left over from my first Pentium purchase – a humble 486 desktop.  I eventually got the server running, and I remember it being quite a challenge.  Remember: Google and “answers at your fingertips” weren’t available a decade or more ago.  Servers and networking also weren’t as forgiving and self-correcting as they are nowadays.  I learned an awful lot while troubleshooting and working on that server.

Before long, though, I wanted to learn more than was possible on a single box.  I wanted to learn about Windows domains, I wanted to figure out how proxies and firewalls worked (anyone remember Proxy Server 2.0?), and I wanted to start hosting online Unreal Tournament and Half Life games for my friends.  With everything new I learned, I seemed to pick up some additional hardware.

When I moved out of my old apartment and into the house that my wife and I now have, I was given the bulk of the basement for my “stuff.”  My network came with me during the move, and shortly after moving in I re-architected it.  The arrangement changed, and of course I ended up adding more equipment.

Fast-forward to now.  At this point in time, I actually have more equipment than I want.  When I was younger and single, maintaining my network was a lot of fun.  Now that I have a wife, kids, and a great deal more responsibility both in and out of work, I’ve been trying to re-engineer things to improve reliability, reduce size, and keep maintenance costs (both time and money) down.

I can’t complain too loudly, though.  Without all of this equipment, I wouldn’t be where I’m at professionally.  Reading about Windows Server, networking, SharePoint, SQL Server, firewalls, etc., has been important for me, but what I’ve gained from reading pales in comparison to what I’ve learned by *doing*.

How Is It All Setup?

I actually have documentation for most of what you see (ask my Cardinal SharePoint team), but I’m not going to share that here.  I will, however, mention a handful of bullets that give you an idea of what’s running and how it’s configured.

  • I’m running a Windows 2008 domain (recently upgraded from Windows 2003)
  • With only a couple of exceptions, all the computers in the house are domain members
  • I have redundant ISP connections (DSL and BPL) with static IP addresses so I can do things like my own DNS resolution
  • My primary internal network is gigabit Ethernet; I also have two 802.11g access points
  • All my equipment is UPS protected because I used to lose a lot of equipment to power irregularities and brown-outs.
  • I believe in redundancy.  Everything is backed-up with Microsoft Data Protection Manager, and in some cases I even have redundant backups (e.g., with SharePoint data).

There’s certainly a lot more I could cover, but I don’t want to turn this post into more of a document than I’ve already made it.

Fun And Random Facts

Some of these are configuration related, some are just tidbits I feel like sharing.  All are probably fleeting, as my configuration and setup are constantly in flux:

Beefiest Server: My SQL Server, a Dell T410 with quad-core Xeon and about 4TB worth of drives (in a couple of RAID configurations)

Wimpiest Server: I’ve got some straggling Pentium 3, 1.13GHz, 512MB RAM systems.  I’m working hard to phase them out as they’re of little use beyond basic functions these days.

Preferred Vendor: Dell.  I’ve heard plenty of stories from folks who don’t like Dell, but quite honestly, I’ve had very good luck with them over the years.  About half of my boxes are Dell, and that’s probably where I’ll continue to shop.

Uptime During Power Failure: With my oversized UPS units, I’m actually good for about an hour of uptime across my whole network during a power failure.  Of course, I have to start shutting down well before that (to ensure a graceful power-off).
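For the curious, that hour figure is just usable battery energy divided by load.  The sketch below uses made-up placeholder numbers (I’m not quoting my actual equipment’s specs), but the arithmetic is the same:

```python
# Back-of-the-envelope UPS runtime estimate.  All figures are
# hypothetical placeholders, not actual equipment specs.
def ups_runtime_minutes(battery_wh, load_watts, inverter_efficiency=0.9):
    """Usable battery energy (Wh) divided by draw (W), in minutes."""
    return battery_wh * inverter_efficiency / load_watts * 60

# e.g., ~900 Wh of combined battery capacity feeding an ~800 W load:
print(round(ups_runtime_minutes(900, 800)))  # roughly an hour
```

The estimate is optimistic at best (batteries age, and runtime isn’t linear at high loads), which is another reason to start graceful shutdowns early.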

Most Common Hardware Failure: Without a doubt, I lose power supplies far more often than any other component.  I think that’s due in part to the age of my machines, the fact that I haven’t always bought the best equipment, and a couple of other factors.  When a machine goes down these days, the first thing I test and/or swap out is a power supply.  I keep at least a couple spares on-hand at all times.

Backup Storage: I have a ridiculous amount of drive space allocated to backups.  My DPM box alone has 5TB worth of dedicated backup storage, and many of my other boxes have additional internal drives that are used as local backup targets.

Server Paraphernalia: Okay, so you may have noticed all the “junk” on top of the servers.  Trinkets tend to accumulate there.  I’ve got a set of Matrix characters (Mr. Smith and Neo), a PIP boy (of Fallout fame), Cheshire Cat and Alice (from American McGee’s Alice game), a Warhammer mech (one of the Battletech originals), a “cat in the bag” (don’t ask), a multimeter, and other assorted stuff.

Cost Of Operation: I couldn’t begin to tell you, though my electric bill is ridiculous (last month’s was about $400).  Honestly, I don’t want to try to calculate it for fear of the result inducing some severe depression.

Where Is It All Going?

As I mentioned, I’m actively looking for ways to get my time and financial costs down.  I simply don’t have the same sort of time I used to have.

Given rising storage capacities and processor capabilities, it probably comes as no surprise to hear me say that I’ve started turning towards virtualization.  I have two servers that act as dedicated Hyper-V hosts, and I fully expect the trend to continue.

Here are a few additional plans I have for the not-so-distant future:

  • I just purchased a Dell T110 that I’ll be configuring as a Microsoft Forefront Threat Management Gateway 2010 (TMG) server.  I currently have two Internet Security and Acceleration Server 2006 servers (one for each of my ISP connections) and a third Windows Server 2008 for SSL VPN connectivity.  I can get rid of all three boxes with the feature set supplied by one TMG server.  I can also dump some static routing rules and confusing firewall configuration in the process.  That’s hard to beat.
  • I’m going to see about virtualizing my two domain controllers (DCs) over the course of the year.  Even though the machines are backed-up, the hardware is near the end of its usable life.  Something is eventually going to fail that I can’t replace.  By virtualizing the DCs, I gain a lot of flexibility (I can move them around on physical hardware) and can get rid of two more physical boxes.  Box reduction is the name of the game these days!  I’ll probably build a new (virtual) DC on Windows Server 2008 R2; migrate FSMO roles, DNS, and DHCP responsibilities to it; and then phase out the physical DCs – rather than try a P2V move.
  • With SharePoint Server 2010 coming, I’m going to need to get some even beefier server hardware.  I’m learning and working just fine with the aid of desktop virtualization right now (my desktop is a Core i7-920 with 12GB RAM), but that won’t cut it for “production use” and testing scenarios when SharePoint Server 2010 goes RTM.

Conclusion

If the past has taught me anything, it’s that additional needs and situations will arise that I haven’t anticipated.  I’m relatively confident that the infrastructure I have in place will be a solid base for any “coming attractions,” though.

If you have any questions or wonder how I did something, feel free to ask!  I can’t guarantee an answer (good or otherwise), but I do enjoy discussing what I’ve worked to build.

Additional Reading and References

  1. LinkedIn: Jason Ditzel
  2. Product: Gorilla Rack Shelves
  3. Networking: Cincinnati Bell DSL
  4. Networking: Current BPL
  5. Microsoft: System Center Data Protection Manager
  6. Dell: PowerEdge Servers
  7. Microsoft: Hyper-V Getting Started Guide
  8. Movie: The Matrix
  9. Gaming: Fallout Site
  10. Gaming: American McGee’s Alice
  11. Gaming: Warhammer BattleMech
  12. Microsoft: Forefront Threat Management Gateway 2010
  13. Microsoft: Internet Security & Acceleration Server 2006

Upcoming SharePoint Activities (January 2010)

In this post, I cover the upcoming SharePoint Saturday Indianapolis event and a couple of its sessions (including one of my own).

You can’t turn a corner these days without running into a SharePoint Saturday event!  At the end of this month, Indianapolis will be holding its SharePoint Saturday on January 30th.

SharePoint Saturday Indianapolis

My disaster recovery (DR) cohort-in-crime, John Ferringer, and I will be presenting “Saving SharePoint” within the event’s IT Pro track.  We’ve given the talk together a handful of times, and the session tries to communicate some of the more important concepts from our DR book, such as the importance of understanding RPO/RTO, tools that are available for DR out-of-the-box, and more.  We’ll also be covering how the landscape will be changing a bit for DR in the upcoming SharePoint 2010 release.

One of my team members, Steve Pietrek, will also be presenting his new SharePoint and Silverlight presentation – one that I am very anxious to see.  Steve’s been doing an exceptional amount of work in “constrained” SharePoint environments recently, and he’s found all sorts of ways to bend Silverlight to his will.  I’m sure developers will walk away with some novel ideas.

As always, SharePoint Saturday events are free to the public; all they’ll cost you is some time.  Sign up today!

Additional Reading and References

  1. Event: SharePoint Saturday Indianapolis
  2. Twitter: John Ferringer
  3. Book: SharePoint 2007 Disaster Recovery Guide
  4. Blog: Steve Pietrek – Everything SharePoint/Silverlight
  5. Site: SharePoint Saturday Indianapolis sign-up on Eventbrite

SharePoint, WebDAV, and a Case of the 405 Status Codes

Recent failures with Microsoft Office Picture Manager and SharePoint Explorer View led me to dive under-the-hood to better understand how SharePoint 2007’s WebDAV and IIS7’s WebDAV Publishing role service interact. This post summarizes my findings, as well as how I eliminated my 405 errors.

Several months ago, I decided that a rebuild of my primary MOSS environment here at home was in order.  My farm consisted of a couple of Windows Server 2003 R2 VMs (one WFE, one app server) that were backed by a non-virtualized SQL Server.  I wanted to free up some cycles on my Hyper-V boxes, and I had an “open physical box” … so, I elected to rebuild my farm on a single, non-virtualized box running (the then newly released) Windows Server 2008 R2.

The rebuild went relatively smoothly, and bringing my content databases over from the old farm posed no particular problems.  Everything was good.

The Situation

Fast forward to just a few weeks ago.

One of the site collections in my farm is used to store and share pictures that we take of our kids.  The site collection is, in effect, a huge multimedia repository …

… and allow me a moment to address the concerns of the savvy architects and administrators out there.  I do understand SharePoint BLOB (binary large object) storage and the implications (and potential effects) that large multimedia libraries can have on scalability.  I wouldn’t recommend what I’m doing to most clients – at least not until remote BLOB storage (RBS) gets here with SharePoint 2010.  Remember, though, that my wife and I are just two people – not a company of hundreds or thousands.  The benefits of centralized, tagged, searchable, nicely presented content outweigh scalability and performance concerns for us.

Back to the pictures site.  I was getting set to upload a batch of pictures, so I did what I always do: I went into the Upload menu of the target pictures library in the site collection and selected Upload Multiple Pictures as shown on the right.  For those who happen to have Microsoft Office 2007 installed (as I do), this action normally results in the Microsoft Office Picture Manager getting launched as shown below.

From within the Microsoft Office Picture Manager, uploading pictures is simply a matter of navigating to the folder containing the pictures, selecting the ones that are to be pushed into SharePoint, and pressing the Upload and Close button.  From there, the application itself takes care of rounding up the pictures that have been selected and getting them into the picture library within SharePoint.  SharePoint pops up a page that provides a handy “Go back to …” link that can then be used to navigate back to the library for viewing and working with the newly uploaded pictures.

Upon selecting the Upload Multiple Pictures menu item, SharePoint navigated to the infopage.aspx page shown above.  I waited, and waited … but the Microsoft Office Picture Manager never launched.  I hit my browser’s back button, and tried the operation again.  Same result: no Picture Manager.

Trouble In River City

Picture Manager’s failure to launch was obviously a concern, and I wanted to know why I was encountering problems … but more than anything, I simply wanted to get my pictures uploaded and tagged.  My wife had been snapping pictures of our kids left and right, and I had 131 JPEG files waiting for me to do something.

I figured that there was more than one way to skin a cat, so I initiated my backup plan: Explorer View.  If you aren’t familiar with SharePoint’s Explorer View, then you need not look any further than the name to understand what it is and how it operates.  By opening the Actions menu of a library (such as a Document Library or Picture Library) and selecting the Open with Windows Explorer menu item as shown on the right, a Windows Explorer window is opened to the library.  The contents of the library can then be examined and manipulated using a file system paradigm – even though SharePoint libraries are not based in (or housed in) any physical file system.

A Picture Library Open in Explorer View

The mechanisms through which the Explorer View is prepared, delivered, and rendered are really quite impressive from a technical perspective.  I’m not going to go into the details, but if you want to learn more about them, I highly recommend a whitepaper authored by Steve Sheppard.  Steve is an escalation engineer with Microsoft whom I’ve worked with in the past, and his knowledge and attention to detail are second to none – and those qualities really come through in the whitepaper.

Unfortunately for me, though, my attempts to open the picture library in Explorer View also led nowhere.  Simply put, nothing happened.  I tried the Open with Windows Explorer option several times, and I was greeted with no action, error, or visible sign that anything was going on.

SharePoint and WebDAV

I was 0 for 2 on my attempts to get at the picture library for uploading.  I wasn’t sure what was going on, but I was pretty sure that WebDAV (Web Distributed Authoring and Versioning) was mixed-up in the behavior I was seeing.  WebDAV is implemented by SharePoint and typically employed to provide the Explorer View operations it supports.  I was under the impression that the Microsoft Office Picture Manager leveraged WebDAV to provide some or all of its upload capabilities, too.

After a few moments of consideration, the notion that WebDAV might be involved wasn’t a tremendous mental leap.  In rebuilding my farm on Windows Server 2008 R2, I had moved from Internet Information Services (IIS) version 6 (in Windows Server 2003 R2) to IIS7.  WebDAV is different in IIS7 versus previous versions … I just hadn’t heard about SharePoint WebDAV-based functions operating any differently.

Playing a Client-Side Tune

My gut instincts regarding WebDAV hardly qualified as “objective troubleshooting information,” so I fired-up Fiddler2 to get a look at what was happening between my web browser and the rebuilt SharePoint farm.  When I attempted to execute an Open with Windows Explorer against the picture library, I was greeted with a bunch of HTTP 405 errors.

405 ?!?!

To be completely honest, I’d never actually seen an HTTP 405 status code before.  It was obviously an error (since it was in the 400-series), but beyond that, I wasn’t sure.  A couple of minutes of digging through the W3C’s status code definitions, though, revealed that a 405 status code is returned whenever a requested method or verb isn’t supported.

I dug a little deeper and compared the request headers my browser had sent with the response headers I’d received from SharePoint.  Doing that spelled-out the problem pretty clearly.

Here’s an example of one of the HTTP requests that was sent:

PROPFIND /pictures/twins3/2009-12-07_no.3 HTTP/1.1

… and here’s the relevant portion of the response that the SharePoint server sent back:

HTTP/1.1 405 Method Not Allowed
Allow: GET, HEAD, OPTIONS, TRACE

PROPFIND was the method that my browser was passing to SharePoint, and the request was failing because the server didn’t include the PROPFIND verb in its list of supported methods as stated in the Allow: portion of the response.  PROPFIND was further evidence that WebDAV was in the mix, too, given its limited usage scenarios (and since the bulk of browser web requests employ either the GET or POST verb).
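The verb-versus-Allow comparison above can be sketched in a few lines of Python.  The two header values are copied from the captures in this post; the helper function itself is just an illustration of the check a client (or troubleshooter) performs.

```python
def verb_allowed(verb, allow_header):
    """Return True if an HTTP verb appears in a response's Allow: header."""
    allowed = {v.strip().upper() for v in allow_header.split(",")}
    return verb.upper() in allowed

# Allow: header from the failing server (IIS7 WebDAV Publishing enabled)
broken = "GET, HEAD, OPTIONS, TRACE"

# Allow: header from a healthy SharePoint WFE (shown later in this post)
healthy = ("GET, POST, OPTIONS, HEAD, MKCOL, PUT, PROPFIND, PROPPATCH, "
           "DELETE, MOVE, COPY, GETLIB, LOCK, UNLOCK")

print(verb_allowed("PROPFIND", broken))   # False -> the 405 is expected
print(verb_allowed("PROPFIND", healthy))  # True
```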

So what was going on?  The operations I was attempting had worked without issue under IIS6 and Windows Server 2003 R2, and I was pretty confident that I hadn’t seen any issues with other Windows Server 2008 (R2 and non-R2) farms running IIS7.  I’d either botched something on my farm rebuild or run into an esoteric problem of some sort; experience (and common sense) pointed to the former.

Doing Some Legwork

I turned to the Internet to see if anyone else had encountered HTTP 405 errors with SharePoint and WebDAV.  Though I quickly found a number of posts, questions, and other information that seemed related to my situation, none of it really described my particular scenario or what I was seeing.

After some additional searching, I eventually came across a discussion on the MSDN forums that revolved around whether or not WebDAV should be enabled within IIS for servers that serve-up SharePoint content.  The back and forth was a bit disjointed, but my relevant take-away was that enabling WebDAV within IIS7 seemed to cause problems for SharePoint.

I decided to have a look at the server housing my rebuilt farm to see if I had enabled the WebDAV Publishing role service.  I didn’t think I had, but I needed to check.  I opened up the Server Manager applet and had a look at the Role Services that were enabled for the Web Server (IIS).  The results are shown in the image on the right; apparently, I had enabled WebDAV Publishing.  My guess is that I did it because I thought it would be a good idea, but it was starting to look like a pretty bad idea all around.

The Test

I was tempted to simply remove the WebDAV Publishing role service and cross my fingers, but instead of messing with my live “production” farm, I decided to play it safe and study the effects of enabling and disabling WebDAV Publishing in a controlled environment.  I fired up a VM that more-or-less matched my production box (Windows Server 2008 R2, 64-bit, same Windows and SharePoint patch levels) to play around.

When I fired-up the VM, a quick check of the enabled role services for IIS showed that WebDAV Publishing was not enabled – further proof that I got a bit overzealous in enabling role services on my rebuilt farm.  I quickly went into the VM’s SharePoint Central Administration site and created a new web application (http://spdev:18480).  Within the web application, I created a team site called Sample Team Site.  Within that team site, I then created a picture library called Sample Picture Library for testing.

When It Works (without the WebDAV Publishing Role Service)

I fired up Fiddler2 in the VM, opened Internet Explorer 8, navigated to the Sample Picture Library, and attempted to execute an Open with Windows Explorer operation.  Windows Explorer opened right up, so I knew that things were working as they should within the VM.  The pertinent capture for the exchange between Internet Explorer and SharePoint (from Fiddler2) appears below.

Explorer View Exchange without IIS7 WebDAV

Reviewing the dialog between client and server, there appeared to be two distinct “stages” in this sequence.  The first stage was an HTTP request that was made to the root of the site collection using the OPTIONS method, and the entire HTTP request looked like this:

OPTIONS / HTTP/1.1
Cookie: MSOWebPartPage_AnonymousAccessCookie=18480
User-Agent: Microsoft-WebDAV-MiniRedir/6.1.7600
translate: f
Connection: Keep-Alive
Host: spdev:18480

In response to the request, the SharePoint server passed back an HTTP 200 status that looked similar to the block that appears below.  Note the permitted methods/verbs (as Allow:) that the server said it would accept, and that the PROPFIND verb appeared within the list:

HTTP/1.1 200 OK
Cache-Control: private,max-age=0
Allow: GET, POST, OPTIONS, HEAD, MKCOL, PUT, PROPFIND, PROPPATCH, DELETE, MOVE, COPY, GETLIB, LOCK, UNLOCK
Content-Length: 0
Accept-Ranges: none
Server: Microsoft-IIS/7.5
MS-Author-Via: MS-FP/4.0,DAV
X-MSDAVEXT: 1
DocumentManagementServer: Properties Schema;Source Control;Version History;
DAV: 1,2
Expires: Sun, 06 Dec 2009 21:13:27 GMT
Public-Extension: http://schemas.microsoft.com/repl-2
Set-Cookie: WSS_KeepSessionAuthenticated=18480; path=/
Persistent-Auth: true
X-Powered-By: ASP.NET
MicrosoftSharePointTeamServices: 12.0.0.6510
Date: Mon, 21 Dec 2009 21:13:27 GMT

After this initial request and associated response, all subsequent requests (“stage 2”) were made using the PROPFIND verb and had a structure similar to the following:

PROPFIND /Sample%20Picture%20Library HTTP/1.1
User-Agent: Microsoft-WebDAV-MiniRedir/6.1.7600
Depth: 0
translate: f
Connection: Keep-Alive
Content-Length: 0
Host: spdev:18480
Cookie: MSOWebPartPage_AnonymousAccessCookie=18480; WSS_KeepSessionAuthenticated=18480

Each of the requests returned a 207 HTTP status (WebDAV multi-status response) and some WebDAV data within an XML document (slightly modified for readability).

HTTP/1.1 207 MULTI-STATUS 
Cache-Control: no-cache 
Content-Length: 1132 
Content-Type: text/xml 
Server: Microsoft-IIS/7.5 
Public-Extension: http://schemas.microsoft.com/repl-2 
Set-Cookie: WSS_KeepSessionAuthenticated=18480; path=/ 
Persistent-Auth: true 
X-Powered-By: ASP.NET 
MicrosoftSharePointTeamServices: 12.0.0.6510 
Date: Mon, 21 Dec 2009 21:13:27 GMT 

<?xml version="1.0" encoding="utf-8" ?><D:multistatus xmlns:D="DAV:" xmlns:Office="urn:schemas-microsoft-com:office:office" xmlns:Repl="http://schemas.microsoft.com/repl/" xmlns:Z="urn:schemas-microsoft-com:"> 
<D:response><D:href>http://spdev:18480/Sample Picture Library</D:href><D:propstat>
…
</D:propstat> 
</D:response> 
</D:multistatus>

It was these PROPFIND requests (or rather, the 207 responses to the PROPFIND requests) that gave the client-side WebClient (directed by Internet Explorer) the information it needed to determine what was in the picture library, operations that were supported by the library, etc.
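The 207 body is ordinary namespaced XML, so extracting the information the WebClient cares about is straightforward.  A minimal sketch with Python’s standard library, using the capture above reduced to a single response element (the full response carries many more elements per resource):

```python
import xml.etree.ElementTree as ET

# The 207 Multi-Status body from the capture, trimmed to one <D:response>
# element so it parses standalone.
multistatus = (
    b'<?xml version="1.0" encoding="utf-8" ?>'
    b'<D:multistatus xmlns:D="DAV:">'
    b'<D:response>'
    b'<D:href>http://spdev:18480/Sample Picture Library</D:href>'
    b'</D:response>'
    b'</D:multistatus>'
)

root = ET.fromstring(multistatus)
ns = {"D": "DAV:"}  # the WebDAV namespace declared on the root element

# Pull the href (resource URL) out of each response element
hrefs = [r.findtext("D:href", namespaces=ns)
         for r in root.findall("D:response", ns)]
print(hrefs)  # ['http://spdev:18480/Sample Picture Library']
```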

When It Doesn’t Work (i.e., WebDAV Publishing Enabled)

When the WebDAV Publishing role service was enabled within IIS7, the very same request (to open the picture library in Explorer View) yielded a very different series of exchanges (again, captured within Fiddler2):

Explorer View with IIS7 WebDAV Enabled

The initial OPTIONS request returned an HTTP 200 status that was identical to the one previously shown, and it even included the PROPFIND verb amongst its list of accepted methods:

HTTP/1.1 200 OK
Cache-Control: private,max-age=0
Allow: GET, POST, OPTIONS, HEAD, MKCOL, PUT, PROPFIND, PROPPATCH, DELETE, MOVE, COPY, GETLIB, LOCK, UNLOCK
Content-Length: 0
Accept-Ranges: none
Server: Microsoft-IIS/7.5
MS-Author-Via: MS-FP/4.0,DAV
X-MSDAVEXT: 1
DocumentManagementServer: Properties Schema;Source Control;Version History;
DAV: 1,2
Expires: Sun, 06 Dec 2009 22:04:31 GMT
Public-Extension: http://schemas.microsoft.com/repl-2
Set-Cookie: WSS_KeepSessionAuthenticated=18480; path=/
Persistent-Auth: true
X-Powered-By: ASP.NET
MicrosoftSharePointTeamServices: 12.0.0.6510
Date: Mon, 21 Dec 2009 22:04:31 GMT

Even though the PROPFIND verb was supposedly permitted, though, subsequent requests resulted in an HTTP 405 status and failure:

HTTP/1.1 405 Method Not Allowed
Allow: GET, HEAD, OPTIONS, TRACE
Server: Microsoft-IIS/7.5
Persistent-Auth: true
X-Powered-By: ASP.NET
MicrosoftSharePointTeamServices: 12.0.0.6510
Date: Mon, 21 Dec 2009 22:04:31 GMT
Content-Length: 0

Unfortunately, these behind-the-scenes failures didn’t seem to generate any noticeable error or message in client browsers.  While testing (locally) in the VM environment, I was at least prompted to authenticate and eventually shown a form of “unsupported” error message.  While connecting (remotely) to my production environment, though, the failure was silent.  Only Fiddler2 told me what was really occurring.

The Solution

The solution to this issue, it seems, is to ensure that the WebDAV Publishing role service is not installed on WFEs serving up SharePoint content in Windows Server 2008 / IIS7 environments.  The mechanism by which SharePoint 2007 handles WebDAV requests is still something of a mystery to me, but it doesn’t appear to involve the IIS7-based WebDAV Publishing role service at all.

Steve Sheppard’s troubleshooting whitepaper (introduced earlier) mentions that enabling or disabling the WebDAV functionality supplied by IIS6 (under Windows Server 2003) has no appreciable effect on SharePoint operation.  Steve even mentions that SharePoint’s internal WebDAV implementation is provided by an ISAPI filter that is housed in Stsfilt.dll.  Though this was true in WSSv2 and SharePoint Portal Server 2003 (the platforms addressed by Steve’s whitepaper), it’s no longer the case with SharePoint 2007 (WSSv3 and MOSS 2007).  The OPTIONS and PROPFIND verbs are mapped to the Microsoft.SharePoint.ApplicationRuntime.SPHttpHandler type in SharePoint web.config files (see below) – the Stsfilt.dll library doesn’t even appear anywhere within the file system of MOSS servers (or at least in mine).

Web.config HttpHandler Section

Regardless of how it is implemented, the fact that the two verbs of interest (OPTIONS and PROPFIND) are mapped to a SharePoint type indicates that WebDAV functionality is still handled privately within SharePoint for its own purposes.  When the WebDAV Publishing role is enabled in IIS7, IIS7 takes over (or at least takes precedence for) PROPFIND requests … and that’s where things appear to break.

To Sum Up

After toggling the WebDAV Publishing role service on and off a couple of times in my VM, I became convinced that my production environment would start behaving the way I wanted it to if I simply disabled IIS7’s WebDAV Publishing functionality.  I uninstalled the WebDAV Publishing role service, and both Microsoft Office Picture Manager and Explorer View started behaving again.

I also made a note to myself to avoid installing role services I thought I might need before I actually needed them. :-)

Additional Reading and References

  1. Blog Post: Todd Klindt, “Installing Remote Blob Store (RBS) on SharePoint 2010”
  2. Whitepaper: Understanding and Troubleshooting the SharePoint Explorer View
  3. Blog: Steve Sheppard
  4. Microsoft TechNet: About WebDAV
  5. IIS.NET: WebDAV Extension
  6. Tool: Fiddler2
  7. W3C: HTTP/1.1 Status Code Definitions
  8. MSDN: Client Request Using PROPFIND
  9. TechNet Forums: IIS WebDAV service required for SharePoint webdav?
  10. Site: HTTP Extensions for Distributed Authoring

Upcoming SharePoint Activities (November 2009)

In this post, I discuss events that I’ll be participating in during the month of November. Events include a SharePoint Conference recap presentation for Microsoft, SharePoint Saturday Cleveland, and a webcast for Idera on SharePoint disaster recovery.

November was looking like a pretty busy month for me before this year’s SharePoint Conference (SPC) in Las Vegas, but the excitement about SharePoint 2010 both in and around the conference seems to have ratcheted things up a notch.  Here’s where I’ll be and what I’ll be doing (in “order of appearance”) in the month of November:

Microsoft “Best of SPC 2009” Event

Many of the folks who wanted to attend the Microsoft SharePoint Conference in Las Vegas this year weren’t able to do so for a variety of reasons.  To “share the love” a bit, Microsoft is holding a series of one-day events that brings select sessions from the SPC to cities around the country … or at least around the state of Ohio.  Yes, I’m extrapolating a bit with “around the country,” but it’s an educated guess :-)

In any case, I’ll be delivering a session titled What’s New for SharePoint 2010 Administration and Governance to the crowd that will be attending the event at the Microsoft office in Columbus, Ohio, on November 10th.  The abstract for the session reads as follows:

SharePoint 2010 includes many new and improved tools for providing a flexible and controlled environment, and this session will provide an overview of those innovations.

I caught this session while I was at the SPC, and I found it to be good, solid information for IT professionals.  I’m very much looking forward to delivering the content myself!

SharePoint Saturday Cleveland

SharePoint Saturday finally makes its way to Ohio!  SharePoint Saturday Cleveland will be held on Saturday, November 14th, at the Embassy Suites on Rockside Woods Blvd. in Independence, Ohio.

SharePoint Saturday Cleveland

John Ferringer and I will be delivering our SharePoint disaster recovery (DR) talk titled “Saving SharePoint.”  It will differ a bit from previous presentations on the topic in that we can now include SharePoint 2010 content.  After the talk, I’ll be sure to post our slide deck here on my blog.

SPS Cleveland is less than two weeks away, but there are still seats open.  As with all SPS events, there’s no charge for those in attendance – all you need to do is show up and take it all in!

“SharePoint Disaster Recovery Essential Guidelines” Webcast

The “week of whirlwind activity” (roughly speaking) will conclude with a webcast for Idera.  John and I will be presenting SharePoint Disaster Recovery Essential Guidelines on Wednesday, November 18th, and it will be similar to the SharePoint Saturday presentation we’ve given in the past (and will have given a few days earlier at SPS Cleveland).

Todd Klindt recently presented a DR webcast with Idera; if you saw it, you might be asking “do I really need another DR webcast?”  Probably the biggest differences between Todd’s webcast and ours are scope and target audience.  I caught Todd’s presentation, and his webcast was aimed squarely at the SharePoint admin/IT pro crowd.  John and I include some of the same content and focus, but our webcast leans more towards classic DR concepts (RTO, RPO, BCPs, etc.).  I would also say that our webcast targets IT decision makers and DR planners as much as it does IT pros, though I feel that both groups will find something of interest in what we have to say.

If our webcast sounds like it would be of interest to you, hop over to Idera’s site and sign up!

Additional Reading and References

  1. Events: Microsoft SharePoint Conference 2009
  2. Events: SharePoint Saturday Cleveland
  3. Company: Idera
  4. Blog: Todd Klindt’s SharePoint Admin Blog
  5. Events: SharePoint Disaster Recovery Essential Guidelines webcast

Manually Clearing the MOSS 2007 BLOB Cache

This post investigates manual flushing of the MOSS BLOB cache via file system deletion, why such flushes might be needed, and how they should be carried out. Some common troubleshooting questions (and answers to them) are also covered.

It’s a fact of life when dealing with many caching systems: for all the benefits they provide, they occasionally become corrupt or require some form of intervention to ensure healthy ongoing operation.  The MOSS Binary Large Object (BLOB) cache, or disk-based cache, is no different.

Is BLOB Cache Corruption a Common Problem?

In my experience, the answer is “no.”  The MOSS BLOB cache generally requires little maintenance and attention beyond ensuring that it has enough disk space to properly store the objects it fetches from the lists within the content databases housing your publishing site collections.

How Should a Flush Be Carried Out?

When corruption does occur or a cache flush is desired for any reason, the built-in “Disk Based Cache Reset” option is typically adequate for flushing the BLOB cache on a single server and single web application zone.  This option (circled in red on the page shown to the right) is exposed through the Site collection object cache menu item on a publishing site’s Site Collection Administration menu.  Executing a flush is as simple as checking the supplied checkbox and clicking the OK button at the bottom of the page.  When a flush is executed in this fashion, it affects only the server to which the postback occurs and only the web application through which the request is directed.  If a site collection is extended to multiple web applications, only one web application’s BLOB cache is affected by this operation.

Alternatively, my MOSS 2007 Farm-Wide BLOB Cache Flushing Solution (screenshot shown on the right) can be used to clear the BLOB cache folders associated with a target site collection across all servers in a farm and across all web applications (zones) serving up the site collection.  This solution utilizes a different mechanism for flushing, but the net effect produced is the same as for the out-of-the-box (OOTB) mechanism: all BLOB-cached files for the associated site collection are deleted from the file system, and the three BLOB cache tracking files for each affected web application (IIS site) are reset.

For more information on the internals of the BLOB Cache, the flush process, and the files I just mentioned, see my previous post entitled We Drift Deeper Into the Sound … as the (BLOB Cache) Flush Comes.

Okay, I Tried a Flush and it Failed.  Now What?

If the aforementioned flush mechanisms simply aren’t working for you, you’re probably staring down the barrel of a manual BLOB cache flush.  Just delete all of the files in the target BLOB cache folder (as specified in the web.config) and you should be good to go, right?

Wrong.

Jumping in and simply deleting files without stopping requests to the affected site collection (or rather, the web application/applications servicing the site collection) risks sending you down the road to (further) cache corruption.  This risk may be small for sites that see little traffic or are relatively small, but the risk grows with increasing request volume and site collection size.  Allow me to illustrate with an example.

The Context

Let’s say that you decided to manually clear the BLOB cache for a sizable publishing site collection that is heavily trafficked.  You go into the file system, find your BLOB cache folder (by default, C:\blobCache), open it up, select all files and sub-folders contained within, and press the <Delete> key on your keyboard.  Deletion of the BLOB cache files and sub-folders commences.

Deleting the sub-folders and files isn’t an instantaneous operation, though.  It takes some time.  While the deletion is taking place, let’s say that your MOSS publishing site collections are still up and servicing requests.  The web applications for which BLOB caching is enabled are still attempting to use the very folders and files currently being deleted.

The Race Condition

For the duration of the deletion, a race condition is in effect that can yield some fairly unpredictable results.  Consider the following possible execution sequence.  Note: this example is hypothetical, but I’ve seen results on multiple occasions that imply this execution sequence (or something similar to it).

  1. The deletion operation deletes one or more of the .bin files at the root of a web application’s BLOB cache folder.  These files are used by MOSS to track the contents of the BLOB cache, the number of times it was flushed, etc.
  2. A request for a resource that would normally be present in the BLOB cache arrives at the web server.  An attempted lookup for the resource in the BLOB cache folder fails because the .bin files are gone as a result of the actions taken in the last step.
  3. The absence of the .bin files kicks off some housekeeping.  Ultimately, a “fresh” set of .bin files is written out.
  4. The requested resource is fetched into the BLOB cache (sub-)folder structure and the .bin files are updated so that subsequent requests for the resource are served from the file system instead of the content database.
  5. The deletion operation, which has been running the whole time, deletes the file and/or folder containing the resource that was just fetched.

Once the deletion operation has concluded, a resource that was fetched in step #4 is tracked in the BLOB cache’s dump.bin file, but as a result of step #5, the resource no longer actually exists in the BLOB cache file system.  Net effect: requests for these resources return HTTP 404 errors.

Since image files are the most common BLOB-cached resources, broken link images (for example, that nasty red “X” in place of an image in Internet Explorer) are shown for these tracked-but-missing resources.  No amount of browser refreshing brings the image back from the server; only an update to the image in the content database (which triggers a re-fetch of the affected resource into the BLOB cache) or another flush operation fixes the issue as long as BLOB caching remains enabled.

Proper Manual Clearing

The key to avoiding the type of corruption scenario I just described is to ensure that requests aren’t serviced by the web application or applications that are tied to the BLOB cache.  Luckily, this is accomplished in a relatively straightforward fashion.

Before attempting either of the approaches I’m about to share, though, you need to know where (in the server file system) your BLOB cache root folder is located.  By default, the BLOB cache root folder is located at C:\blobCache; however, most conscientious administrators change this path to point to a data drive or non-system partition.

If you are unsure of the location of the BLOB cache root folder containing resources for your site collection, it’s easy enough to determine by inspecting the web.config file for the web application housing the site collection.  As shown in the sample web.config file on the right, the location attribute of the <BlobCache> element identifies the BLOB cache root folder in which each web application’s specific subfolder will be created.

Be aware that saving any changes to the web.config file will result in an application pool recycle, so it’s generally a good idea to review a copy of the web.config file when inspecting it rather than directly opening the web.config file itself.
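Since you should be reading a copy of web.config anyway, the inspection can even be done programmatically.  This sketch parses an illustrative fragment; the <BlobCache> element and its location attribute are real, but the fragment below is trimmed (the actual path attribute lists many more extensions, and the real file contains far more configuration).

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative web.config fragment -- not a complete SharePoint
# web.config.  The path value here is shortened for readability.
fragment = """
<configuration>
  <SharePoint>
    <BlobCache location="E:\\MOSS\\BLOB Cache" path="\\.(gif|jpg|png|css|js)$"
               maxSize="10" enabled="true" />
  </SharePoint>
</configuration>
"""

root = ET.fromstring(fragment)
blob_cache = root.find(".//BlobCache")          # locate the element anywhere
print(blob_cache.get("location"))               # the BLOB cache root folder
print(blob_cache.get("enabled"))                # whether caching is on at all
```

In practice you would point `ET.parse()` at your copied web.config file instead of an inline string.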

The Quick and Dirty Approach

When you just want to “get it done” as quickly as possible using the least number of steps, this is the process:

  1. Stop the World Wide Web Publishing Service on the target server.  This can be accomplished from the command line (via net stop w3svc) or the Services MMC snap-in (via Start –> Administrative Tools –> Services) as shown on the right.
  2. Once the World Wide Web Publishing Service stops, simply delete the BLOB cache root folder.  Ensure that the deletion operation completes before moving on to the next step.
  3. Restart the World Wide Web Publishing service (via Services or net start w3svc).

Though this approach is quick with regard to time and effort invested, it’s certainly “dirty,” coarse, and not without disadvantages.  First, this approach prevents the web server from servicing *any* web requests for the duration of the operation.  This includes not only SharePoint requests, but requests for any other web site that may be served from the server.

Second, the “quick and dirty” approach wipes out the entire BLOB cache – not just the cached content associated with the web application housing your site collection (unless, of course, you have a single web application that hasn’t been extended).  This is the functional equivalent of trying to drive a nail with a sledgehammer, and it’s typically overkill in most production scenarios.

The Controlled (Granular) Approach

There is a less invasive alternative to the “Quick and Dirty” technique I just described, and it is the procedure I recommend for production environments and other scenarios where actions must be targeted and impact minimized.  The screenshots that follow are specific to IIS7 (Windows Server 2008), but the fundamental activities covered in each step are the same for IIS6 even if execution is somewhat different.

  1. Determine the IIS ID of the web application servicing the site collection for which the flush is being performed.  This is easily accomplished using the Internet Information Services (IIS) Manager (accessible through the Administrative Tools menu) as shown to the right.  If I’m interested in clearing the BLOB cache of a site collection that is hosted within the InternalHomeWeb (Default) web application, for example, the IIS site ID of interest is 1043653284.
  2. Determine the name of the application pool that is servicing the web application.  In IIS7, this is accomplished by selecting the web application (InternalHomeWeb (Default)) in the list of sites and clicking the Basic Settings… link under Edit Site in the Site Actions menu on the right-hand side of the window.  The dialog box that pops up clearly indicates the name of the associated application pool (as shown on the right, circled in red).  Note the name of the application pool for the next step.
  3. Stop the application pool that was located in the previous step.  This will shut down the web application and prevent MOSS from serving up requests for the site collections housed within the web application, thus avoiding the sort of race condition described earlier.  If multiple application pools are used to partition web applications within different worker processes, then shutting down the application pool is “less invasive” than stopping the entire World Wide Web Publishing Service as described in “The Quick and Dirty Approach.”  If all (or most) web applications are serviced by a single application pool, though, then there may be little functional benefit to stopping the application pool.  In such a case, it may simply be easier to stop the World Wide Web Publishing Service as described in “The Quick and Dirty Approach.”
  4. Open Windows Explorer and navigate to the BLOB cache root folder.  For the purposes of this example, we’ll assume that the BLOB cache root folder is located at E:\MOSS\BLOB Cache.  Within the root folder should be a sub-folder with a name that matches the IIS site ID determined in step #1 (1043653284).  Either delete the entire sub-folder (E:\MOSS\BLOB Cache\1043653284), or select the files within the sub-folder and delete them.
  5. Once the deletion has completed, restart the application pool that was shut down in step #3.  If the World Wide Web Publishing Service was stopped instead, restart it.
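On IIS7 servers, steps 3 through 5 can also be scripted for repeatability.  The sketch below uses Python purely for illustration; the appcmd.exe path is standard on Windows Server 2008, but the application pool name and cache folder shown in the example call are assumptions drawn from this walkthrough and must be adjusted for your farm.

```python
import shutil
import subprocess
from pathlib import Path

# IIS7's appcmd utility; not present on IIS6 / Windows Server 2003.
APPCMD = r"C:\Windows\System32\inetsrv\appcmd.exe"


def flush_site_cache(cache_root: Path, site_id: str) -> None:
    """Step 4: delete the BLOB cache sub-folder for one IIS site ID."""
    target = cache_root / site_id
    if target.exists():
        shutil.rmtree(target)


def flush_with_pool_restart(cache_root: Path, site_id: str, app_pool: str) -> None:
    """Steps 3-5: stop the pool, flush the cache folder, restart the pool."""
    # Stopping the pool first prevents MOSS from repopulating the cache
    # while files are still being deleted (the race condition noted above).
    subprocess.check_call([APPCMD, "stop", "apppool", f"/apppool.name:{app_pool}"])
    try:
        flush_site_cache(cache_root, site_id)
    finally:
        # Restart the pool even if the deletion raised an error.
        subprocess.check_call([APPCMD, "start", "apppool", f"/apppool.name:{app_pool}"])


# Example call (values from this walkthrough; the pool name is hypothetical):
# flush_with_pool_restart(Path(r"E:\MOSS\BLOB Cache"), "1043653284",
#                         "InternalHomeWebAppPool")
```

Because the pool restart sits in a finally block, the web application comes back up even if the folder deletion fails partway through.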

Taking the approach just described affects the smallest number of cached resources necessary to ensure that the site collection in question (or rather, its associated web application) starts with a “clean slate.”  If web applications are partitioned across multiple application pools, this approach also restricts the resulting service outage to only those site collections served by the application pool being stopped and restarted.

Some Common Questions and Concerns

Q: I have multiple servers or web front-ends.  Do I need to take them all down and manually flush them as a group?

The BLOB cache on each MOSS server operates independently of other servers in the farm, so the answer is “no.”  Servers can be addressed one at a time and in any order desired.

Q: I’ve successfully performed a manual flush and brought everything back up, but I’m *still* seeing an old image/script/etc.  What am I doing wrong?

Interestingly enough, this type of scenario oftentimes has little to do with the actual server-side BLOB cache itself.

One of the attributes that can (and should) be configured when enabling the BLOB cache is the max-age attribute.  The max-age attribute specifies the duration of time, in seconds, that client-side browsers should cache resources that are retrieved from the MOSS BLOB cache.  Subsequent requests for these resources are then served directly out of the client-side cache and not made to the MOSS server until a duration of time (specified by the max-age attribute) is exceeded.
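For reference, max-age is one of the attributes on the BlobCache element in the web application’s web.config file.  The fragment below is a sketch; the location, path, and maxSize values are illustrative examples from this walkthrough, not recommendations.

```xml
<!-- In the web application's web.config; attribute values here are
     illustrative, not prescriptive. max-age is expressed in seconds. -->
<BlobCache
    location="E:\MOSS\BLOB Cache"
    path="\.(gif|jpg|jpeg|png|css|js)$"
    maxSize="10"
    max-age="86400"
    enabled="true" />
```

With max-age="86400", browsers may hold a resource for up to 24 hours without re-requesting it, so a server-side flush may not become visible on a given client until that client’s local cache expires or is cleared.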

If a BLOB cache is flushed and it appears that old or incorrect resources (commonly images) are being returned when requested, it might be that the resources are simply cached on the local system and being returned from the cache instead of being fetched from the server.  Flushing locally-cached items (or deleting “Temporary Internet files” in Internet Explorer’s terminology) is a quick way to ensure that requests are being passed to the SharePoint server.

Q: I’m running into problems with a manual deletion.  Sometimes all files within the cache folder can’t be deleted, or sometimes I run into strange files that have a size of zero bytes.  What’s going on?

I haven’t seen this happen too often, but when I have seen it, it’s been due to problems with (or corruption in) the underlying file system.  If regular CHKDSK operations aren’t scheduled for the drive housing the BLOB cache, it’s probably time to set them up.

Additional Reading and References

  1. MSDN: Caching In Office SharePoint 2007
  2. CodePlex: MOSS 2007 Farm-Wide BLOB Cache Flushing Solution
  3. Blog Post: We Drift Deeper Into the Sound … as the (BLOB Cache) Flush Comes