It’s a fact of life when dealing with many caching systems: for all the benefits they provide, they occasionally become corrupt or require some form of intervention to ensure healthy ongoing operation. The MOSS Binary Large Object (BLOB) cache, or disk-based cache, is no different.
Is BLOB Cache Corruption a Common Problem?
In my experience, the answer is “no.” The MOSS BLOB cache generally requires little maintenance and attention beyond ensuring that it has enough disk space to properly store the objects it fetches from the lists within the content databases housing your publishing site collections.
How Should a Flush Be Carried Out?
When corruption does occur or a cache flush is desired for any reason, the built-in “Disk Based Cache Reset” option is typically adequate for flushing the BLOB cache on a single server and single web application zone. This option (circled in red on the page shown to the right) is exposed through the Site collection object cache menu item on a publishing site’s Site Collection Administration menu. Executing a flush is as simple as checking the supplied checkbox and clicking the OK button at the bottom of the page. When a flush is executed in this fashion, it affects only the server to which the postback occurs and only the web application through which the request is directed. If a site collection is extended to multiple web applications, only one web application’s BLOB cache is affected by this operation.
Alternatively, my MOSS 2007 Farm-Wide BLOB Cache Flushing Solution (screenshot shown on the right) can be used to clear the BLOB cache folders associated with a target site collection across all servers in a farm and across all web applications (zones) serving up the site collection. This solution utilizes a different mechanism for flushing, but the net effect produced is the same as for the out-of-the-box (OOTB) mechanism: all BLOB-cached files for the associated site collection are deleted from the file system, and the three BLOB cache tracking files for each affected web application (IIS site) are reset.
For more information on the internals of the BLOB Cache, the flush process, and the files I just mentioned, see my previous post entitled We Drift Deeper Into the Sound … as the (BLOB Cache) Flush Comes.
Okay, I Tried a Flush and it Failed. Now What?
If the aforementioned flush mechanisms simply aren’t working for you, you’re probably staring down the barrel of a manual BLOB cache flush. Just delete all of the files in the target BLOB cache folder (as specified in the web.config) and you should be good to go, right?
Not quite. Jumping in and simply deleting files without stopping requests to the affected site collection (or rather, the web application/applications servicing the site collection) risks sending you down the road to (further) cache corruption. This risk may be small for sites that see little traffic or are relatively small, but the risk grows with increasing request volume and site collection size. Allow me to illustrate with an example.
Let’s say that you decided to manually clear the BLOB cache for a sizable publishing site collection that is heavily trafficked. You go into the file system, find your BLOB cache folder (by default, C:\blobCache), open it up, select all files and sub-folders contained within, and press the <Delete> key on your keyboard. Deletion of the BLOB cache files and sub-folders commences.
Deleting the sub-folders and files isn’t an instantaneous operation, though. It takes some time. While the deletion is taking place, let’s say that your MOSS publishing site collections are still up and servicing requests. The web applications for which BLOB caching is enabled are still attempting to use the very folders and files currently being deleted.
The Race Condition
For the duration of the deletion, a race condition is in effect that can yield some fairly unpredictable results. Consider the following possible execution sequence. Note: this example is hypothetical, but I’ve seen results on multiple occasions that imply this execution sequence (or something similar to it).
- The deletion operation deletes one or more of the .bin files at the root of a web application’s BLOB cache folder. These files are used by MOSS to track the contents of the BLOB cache, the number of times it was flushed, etc.
- A request for a resource that would normally be present in the BLOB cache arrives at the web server. An attempted lookup for the resource in the BLOB cache folder fails because the .bin files are gone as a result of the actions taken in the last step.
- The absence of the .bin files kicks off some housekeeping. Ultimately, a “fresh” set of .bin files is written out.
- The requested resource is fetched into the BLOB cache (sub-)folder structure and the .bin files are updated so that subsequent requests for the resource are served from the file system instead of the content database.
- The deletion operation, which has been running the whole time, deletes the file and/or folder containing the resource that was just fetched.
Once the deletion operation has concluded, the resource that was fetched in step #4 is still tracked in the BLOB cache’s dump.bin file, but as a result of step #5, it no longer actually exists in the BLOB cache file system. Net effect: requests for such tracked-but-missing resources return HTTP 404 errors.
Since image files are the most common BLOB-cached resources, broken link images (for example, that nasty red “X” in place of an image in Internet Explorer) are shown for these tracked-but-missing resources. No amount of browser refreshing brings the image back from the server; only an update to the image in the content database (which triggers a re-fetch of the affected resource into the BLOB cache) or another flush operation fixes the issue as long as BLOB caching remains enabled.
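To make the sequence concrete, here is a toy replay of the five steps in Python. It is only a sketch: a plain set stands in for the .bin tracking files (their real format isn't modeled here), and the logo.gif resource name is hypothetical.

```python
import os
import tempfile


def replay_race(cache_dir):
    """Replay the five-step sequence with a toy index standing in for the .bin files."""
    resource = os.path.join(cache_dir, "logo.gif")  # hypothetical cached image
    index = None          # step 1: the deletion has removed the tracking files
    if index is None:     # step 2: the lookup fails because there is no index
        index = set()     # step 3: housekeeping writes a fresh, empty index
    with open(resource, "wb") as handle:  # step 4: the resource is fetched...
        handle.write(b"GIF89a")
    index.add(resource)   # ...and tracked in the new index
    os.remove(resource)   # step 5: the still-running deletion removes the file
    # Anything tracked in the index but absent on disk is "tracked but missing."
    return sorted(path for path in index if not os.path.exists(path))


with tempfile.TemporaryDirectory() as cache_dir:
    orphans = replay_race(cache_dir)
    print([os.path.basename(path) for path in orphans])
```

The replay ends with logo.gif listed in the index but gone from disk, which is the state that produces the broken images described above.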
Proper Manual Clearing
The key to avoiding the type of corruption scenario I just described is to ensure that requests aren’t serviced by the web application or applications that are tied to the BLOB cache. Luckily, this is accomplished in a relatively straightforward fashion.
Before attempting either of the approaches I’m about to share, though, you need to know where (in the server file system) your BLOB cache root folder is located. By default, the BLOB cache root folder is located at C:\blobCache; however, most conscientious administrators change this path to point to a data drive or non-system partition.
If you are unsure of the location of the BLOB cache root folder containing resources for your site collection, it’s easy enough to determine it by inspecting the web.config file for the web application housing the site collection. As shown in the sample web.config file on the right, the location attribute of the <BlobCache> element identifies the BLOB cache root folder in which each web application’s specific subfolder will be created.
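If you'd rather not eyeball the file, the lookup can be scripted. The following is a minimal Python sketch; the sample web.config fragment and its attribute values are illustrative assumptions, not a copy of any real configuration.

```python
import xml.etree.ElementTree as ET

# Illustrative web.config fragment; the attribute values are assumptions.
SAMPLE_WEB_CONFIG = r"""<configuration>
  <SharePoint>
    <BlobCache location="E:\MOSS\BLOB Cache" path="\.(gif|jpg|png|css|js)$" maxSize="10" enabled="true" />
  </SharePoint>
</configuration>"""


def blob_cache_location(web_config_xml):
    """Return the location attribute of the first <BlobCache> element, or None."""
    root = ET.fromstring(web_config_xml)
    for element in root.iter("BlobCache"):
        return element.get("location")
    return None


print(blob_cache_location(SAMPLE_WEB_CONFIG))
```

Against a real server, the same function could be fed the text of a copy of the web.config file. It only reads the XML and never writes anything back.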
Be aware that saving any changes to the web.config file will cause the web application to restart (ASP.NET reloads the application domain when the file changes), so it’s generally a good idea to review a copy of the web.config file when inspecting it rather than directly opening the web.config file itself.
The Quick and Dirty Approach
When you just want to “get it done” as quickly as possible using the fewest steps, this is the process:
- Stop the World Wide Web Publishing Service on the target server. This can be accomplished from the command line (via net stop w3svc) or the Services MMC snap-in (via Start –> Administrative Tools –> Services) as shown on the right.
- Once the World Wide Web Publishing Service stops, simply delete the BLOB cache root folder. Ensure that the deletion operation completes before moving on to the next step.
- Restart the World Wide Web Publishing service (via Services or net start w3svc).
Though this approach is quick with regard to time and effort invested, it’s certainly “dirty,” coarse, and not without disadvantages. First, using this approach prevents the web server from servicing *any* web requests for the duration of the operation. This includes not only SharePoint requests, but requests for any other web site that may be served from the server.
Second, the “quick and dirty” approach wipes out the entire BLOB cache – not just the cached content associated with the web application housing your site collection (unless, of course, you have a single web application that hasn’t been extended). This is the functional equivalent of trying to drive a nail with a sledgehammer, and it’s overkill in most production scenarios.
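For those who prefer to script the three "quick and dirty" steps, a rough Python sketch follows. It assumes an elevated prompt on the target server. The run parameter is my own addition so the service commands can be swapped out for a dry run, and the function name is hypothetical.

```python
import shutil
import subprocess


def quick_and_dirty_flush(blob_cache_root, run=subprocess.check_call):
    """Stop W3SVC, delete the whole BLOB cache root folder, then restart W3SVC."""
    run(["net", "stop", "w3svc"])
    try:
        # rmtree returns only after the deletion has finished (or raises on failure).
        shutil.rmtree(blob_cache_root)
    finally:
        # Restart the service even if the deletion failed part-way through.
        run(["net", "start", "w3svc"])
```

A call such as quick_and_dirty_flush(r"C:\blobCache") would carry out the procedure against the default location. The try/finally is there so that a failed deletion doesn't leave the web server stopped.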
The Controlled (Granular) Approach
There is a less invasive alternative to the “Quick and Dirty” technique I just described, and it is the procedure I recommend for production environments and other scenarios where actions must be targeted and impact minimized. The screenshots that follow are specific to IIS7 (Windows Server 2008), but the fundamental activities covered in each step are the same for IIS6 even if execution is somewhat different.
- Determine the IIS ID of the web application servicing the site collection for which the flush is being performed. This is easily accomplished using the Internet Information Services (IIS) Manager (accessible through the Administrative Tools menu) as shown to the right. If I’m interested in clearing the BLOB cache of a site collection that is hosted within the InternalHomeWeb (Default) web application, for example, the IIS site ID of interest is 1043653284.
- Determine the name of the application pool that is servicing the web application. In IIS7, this is accomplished by selecting the web application (InternalHomeWeb (Default)) in the list of sites and clicking the Basic Settings… link under Edit Site in the Site Actions menu on the right-hand side of the window. The dialog box that pops up clearly indicates the name of the associated application pool (as shown on the right, circled in red). Note the name of the application pool for the next step.
- Stop the application pool that was located in the previous step. This will shut down the web application and prevent MOSS from serving up requests for the site collections housed within the web application, thus avoiding the sort of race condition described earlier. If multiple application pools are used to partition web applications within different worker processes, then shutting down the application pool is “less invasive” than stopping the entire World Wide Web Publishing Service as described in “The Quick and Dirty Approach.” If all (or most) web applications are serviced by a single application pool, though, then there may be little functional benefit to stopping the application pool. In such a case, it may simply be easier to stop the World Wide Web Publishing Service as described in “The Quick and Dirty Approach.”
- Open Windows Explorer and navigate to the BLOB cache root folder. For the purposes of this example, we’ll assume that the BLOB cache root folder is located at E:\MOSS\BLOB Cache. Within the root folder should be a sub-folder with a name that matches the IIS site ID determined in step #1 (1043653284). Either delete the entire sub-folder (E:\MOSS\BLOB Cache\1043653284), or select the files within the sub-folder and delete them (as shown above).
- Once the deletion has completed, restart the application pool that was shut down in step #3. If the World Wide Web Publishing Service was shut down instead, restart it.
Taking the approach just described affects the smallest number of cached resources necessary to ensure that the site collection in question (or rather, its associated web application/applications) starts with a “clean slate.” If web applications are partitioned across multiple application pools, then this approach also restricts the resultant service outage to only those site collections ultimately being served by the application pool being shut down and restarted.
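The controlled procedure can also be sketched in Python for IIS7, where appcmd.exe can stop and start an application pool by name. Treat the following as an illustration rather than a finished tool: the appcmd path is the usual IIS7 install location but is an assumption here, the function name is hypothetical, and the run parameter is my own addition so the commands can be replaced for a dry run.

```python
import os
import shutil
import subprocess


def controlled_flush(blob_cache_root, site_id, app_pool,
                     run=subprocess.check_call,
                     appcmd=r"C:\Windows\System32\inetsrv\appcmd.exe"):
    """Stop one application pool, delete one site's BLOB cache sub-folder, restart the pool."""
    # The sub-folder name matches the IIS site ID of the web application.
    target = os.path.join(blob_cache_root, str(site_id))
    run([appcmd, "stop", "apppool", "/apppool.name:" + app_pool])
    try:
        if os.path.isdir(target):
            shutil.rmtree(target)  # returns only once the deletion has finished
    finally:
        # Bring the pool back up even if the deletion failed.
        run([appcmd, "start", "apppool", "/apppool.name:" + app_pool])
    return target
```

For the example used in the steps above, the call would look something like controlled_flush(r"E:\MOSS\BLOB Cache", 1043653284, "YourAppPoolName"), with the application pool name replaced by the one noted in step #2. Sub-folders belonging to other IIS sites are left untouched.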
Some Common Questions and Concerns
Q: I have multiple servers or web front-ends. Do I need to take them all down and manually flush them as a group?
The BLOB cache on each MOSS server operates independently of other servers in the farm, so the answer is “no.” Servers can be addressed one at a time and in any order desired.
Q: I’ve successfully performed a manual flush and brought everything back up, but I’m *still* seeing an old image/script/etc. What am I doing wrong?
Interestingly enough, this type of scenario oftentimes has little to do with the actual server-side BLOB cache itself.
One of the attributes that can (and should) be configured when enabling the BLOB cache is the max-age attribute. The max-age attribute specifies the duration of time, in seconds, that client-side browsers should cache resources that are retrieved from the MOSS BLOB cache. Subsequent requests for these resources are then served directly out of the client-side cache and not made to the MOSS server until that duration has elapsed.
If a BLOB cache is flushed and it appears that old or incorrect resources (commonly images) are being returned when requested, it might be that the resources are simply cached on the local system and being returned from the cache instead of being fetched from the server. Flushing locally-cached items (or deleting “Temporary Internet files” in Internet Explorer’s terminology) is a quick way to ensure that requests are being passed to the SharePoint server.
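The client-side behavior boils down to a simple age comparison. The Python sketch below is a deliberately simplified model (real browsers apply additional rules), and the 86400-second value is a hypothetical setting chosen for illustration.

```python
def served_from_browser_cache(seconds_since_fetch, max_age):
    """True while a cached copy is younger than max-age, so no request reaches the server."""
    return seconds_since_fetch < max_age


MAX_AGE = 86400  # hypothetical setting: 24 hours, expressed in seconds

print(served_from_browser_cache(3600, MAX_AGE))   # one hour after the fetch
print(served_from_browser_cache(90000, MAX_AGE))  # twenty-five hours after the fetch
```

The first call prints True and the second prints False: for the first 24 hours the browser never asks the server, so a server-side flush has no visible effect on that client until the local copy ages out or is cleared.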
Q: I’m running into problems with a manual deletion. Sometimes all files within the cache folder can’t be deleted, or sometimes I run into strange files that have a size of zero bytes. What’s going on?
I haven’t seen this happen too often, but when I have seen it, it’s been due to problems with (or corruption in) the underlying file system. If regular CHKDSK operations aren’t scheduled for the drive housing the BLOB cache, it’s probably time to set them up.
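If you want to check for zero-byte files before reaching for CHKDSK, a short Python sketch like the one below can list them. It is purely a diagnostic aid under the assumption that you have read access to the cache folder; it deletes nothing, and the function name is hypothetical.

```python
import os


def zero_byte_files(folder):
    """Return the sorted paths of all zero-byte files under folder."""
    empty = []
    for current, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(current, name)
            if os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)
```

Pointing it at a web application's sub-folder, for example zero_byte_files(r"E:\MOSS\BLOB Cache\1043653284"), gives a quick picture of how widespread the problem is.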