Most publishing site administrators have at least some degree of familiarity with the binary large object (BLOB) cache that is supplied by the MOSS platform, but trying to find information describing how it actually works its magic can be tough. This post is an attempt to shed a bit of light on the structure, implementation, and operations of the BLOB cache.
Before going too far, though, I should apologize to the group Motorcycle for twisting the title and lyrics of one of their more popular trance songs (“As The Rush Comes”) for the purpose of this post. I guess I simply couldn’t resist the opportunity to have a little (slightly juvenile) fun.
What is the MOSS BLOB Cache?
Also known as disk-based caching, BLOB caching is one of the three forms of caching supplied/supported by MOSS (not WSS) out-of-the-box (OOTB). Simply put, the BLOB cache is a mechanism that allows MOSS to locally store “larger” list items (images, CSS, and more) within the file system of web front-ends (WFEs) so that these resources can be served to callers more efficiently than round-tripping to the content database each time a request for such a resource is received.
The rest of this post assumes that you’re familiar with the basics of the MOSS BLOB cache. If you aren’t, I’d recommend checking out MSDN (“Caching In Office SharePoint 2007”) for a primer.
Some BLOB Cache Internals
Before discussing how flushes are carried out, it’s worth spending a few minutes talking about the internals of the BLOB cache. Having an understanding of what’s going on “under the hood” helps when explaining some of the peculiarities I’ll be describing a little later in this post.
The MOSS BLOB caching mechanism is implemented primarily with the help of two types (classes) that live within the Microsoft.SharePoint.Publishing namespace: the BlobCache type and its associated BlobCacheEntry type. Each BlobCache object possesses a dictionary that houses BlobCacheEntry instances, and each BlobCacheEntry object represents an SPListItem (SharePoint list item) object that is being stored (cached) in the local file system of the server.
The scope of any BlobCache instance is a single IIS web site, and this is no surprise given that the BlobCache is enabled and disabled through the following (default) entry in the SharePoint web site’s web.config file:
<BlobCache location="C:\blobCache" path="\.(gif|jpg|png|css|js)$" maxSize="10" enabled="false" />
As shown, BLOB caching is disabled by default. Since BLOB caching is enabled and disabled via the web.config file, configuration and “awareness of operation” is largely a manual affair. From within the SharePoint browser UI, it cannot be easily determined if BLOB caching is enabled or disabled in the same way that this information can be determined for page output caching and object caching.
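To make the path attribute a bit more concrete, here is a minimal Python sketch of how an extension pattern like the default one selects URLs. Python's re module stands in for the .NET regular expression engine that MOSS uses, is_blob_candidate is a hypothetical helper name, and case-insensitive matching is an assumption on my part rather than documented behavior.

```python
import re

# The default pattern from the <BlobCache /> entry shown above.
BLOB_PATH_PATTERN = r"\.(gif|jpg|png|css|js)$"


def is_blob_candidate(url_path: str, pattern: str = BLOB_PATH_PATTERN) -> bool:
    """Return True if the URL path matches the configured extension pattern.

    Assumption: matching ignores case. The post does not state this.
    """
    return re.search(pattern, url_path, re.IGNORECASE) is not None


if __name__ == "__main__":
    for path in (
        "/PublishingImages/logo.png",
        "/Style Library/site.css",
        "/Pages/default.aspx",
        "/PublishingImages/test.jpeg",
    ):
        print(path, is_blob_candidate(path))
```

One thing this makes visible: the default pattern does not match a .jpeg extension. A pattern such as \.(gif|jpe?g|png|css|js)$ would be needed to cover both .jpg and .jpeg.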
This leads to another point worth mentioning: though an Internet Information Services (IIS) web site and a SharePoint web application are fairly synonymous in the case of a single-zone web application, the one-to-one equivalence breaks down when a web application is extended to multiple zones from within Central Administration. In such an extended scenario, each zone (Default, Internet, Intranet, Extranet, and Custom) has its own IIS web site with its own web.config, so BLOB caching can be enabled for one zone and disabled for another even though both zones expose the same site collections. The URL used to access a site collection therefore determines whether BLOB caching applies to a given request.
Setting the Wheels in Motion
The <BlobCache /> section that resides within the web.config for an IIS web site is recognized and processed by the MOSS PublishingHttpModule type. As its name implies, this type (which also resides in the Microsoft.SharePoint.Publishing namespace) is an HttpModule. Being an HttpModule, the PublishingHttpModule must be present as a child of the <httpModules /> element within the web.config for an IIS web site in order to carry out its duties. Under normal circumstances, MOSS takes care of this:
The PublishingHttpModule itself is responsible for coordinating a number of caching-related operations for MOSS (more than just BLOB caching), and these operations all begin when an instance of the PublishingHttpModule is initialized at the same time that IIS is setting up the SharePoint/ASP.NET application pipeline. When IIS sets up this pipeline and the PublishingHttpModule.Init method is called, the following actions take place with regard to the BLOB cache:
- The site’s web.config configuration settings for the BLOB cache get read and processed.
- Assuming settings are found, the PublishingHttpModule creates a new BlobCache object instance to service the (IIS) web site. This happens whether or not BLOB caching is actually enabled. Put another way: all sites for which the PublishingHttpModule is active have a BlobCache object “assigned” to them whether that object is in use (enabled) or not.
- The BlobCache instance takes care of a number of startup housekeeping items like computing file paths, setting up internal dictionaries, and ensuring that a consistent and ready state is established to facilitate requests.
- Assuming all settings are consistent and valid, the BlobCache object instance registers itself with the hosting environment; it then spins up a separate (independent) thread to rehydrate saved settings (for cached objects), create indexes, and perform some additional startup activities. This “maintenance thread” then stays alive to regularly perform background checks for things like flush requests, site changes, etc. – but only if BLOB caching is enabled within the web.config. If BLOB caching isn’t enabled, no additional work is performed on the thread.
- Finally, the BlobCache instance’s RewriteUrl method is registered as a handler for the AuthorizeRequest method of the SharePoint application (HttpApplication) for which the pipeline was established. Since the AuthorizeRequest method fires for each SharePoint web request prior to actual page processing, it gives the BlobCache instance a chance to inspect a requested URL and possibly do something with it – such as serve an object back from the disk-based BLOB cache instead of allowing the request to proceed through “normal channels” (which may involve database object lookup).
At the end of this process, a BlobCache object exists for all publishing sites (that is, sites where the PublishingHttpModule is active). Again, this happens whether or not BLOB caching is actually enabled for the IIS site … though the BlobCache instance will only process requests (that is, perform useful actions in the RewriteUrl method) if it has been enabled to do so via the appropriate web.config setting.
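The startup sequence above can be summarized with a small Python model. This is only a sketch of the behavior described in this post, not the real Microsoft.SharePoint.Publishing code: BlobCacheModel, PipelineModel, and init_publishing_module are hypothetical names, and the "blob-cache" and "pass-through" return values are placeholders for whatever the real RewriteUrl method does.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class BlobCacheSettings:
    """Mirrors the attributes of the <BlobCache /> web.config entry."""
    location: str
    path: str
    max_size: int
    enabled: bool


class BlobCacheModel:
    """Hypothetical stand-in for the BlobCache type (not the real API)."""

    def __init__(self, settings: BlobCacheSettings) -> None:
        self.settings = settings
        # The maintenance thread only keeps working when caching is enabled.
        self.maintenance_active = settings.enabled

    def rewrite_url(self, url_path: str) -> str:
        """Decide whether a request is a candidate for the disk cache."""
        if not self.settings.enabled:
            return "pass-through"  # object exists, but does nothing useful
        if re.search(self.settings.path, url_path, re.IGNORECASE):
            return "blob-cache"
        return "pass-through"


class PipelineModel:
    """Stand-in for the application pipeline and its AuthorizeRequest event."""

    def __init__(self) -> None:
        self.authorize_request_handlers: List[Callable[[str], str]] = []


def init_publishing_module(
    pipeline: PipelineModel, settings: Optional[BlobCacheSettings]
) -> Optional[BlobCacheModel]:
    """Model of module initialization: a cache is created whenever settings exist."""
    if settings is None:
        return None
    cache = BlobCacheModel(settings)  # created whether or not caching is enabled
    pipeline.authorize_request_handlers.append(cache.rewrite_url)
    return cache
```

The point of the model is the asymmetry: the cache object and its handler registration always exist once settings are found, while the enabled flag only controls whether the handler and the maintenance work do anything.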
BLOB Cache File System Structure
The following image illustrates the file system of a typical server that is implementing BLOB caching. In the case of this server, the BLOB cache location has been set to E:\MOSS\BLOBCache within the web.config file of each IIS web site utilizing the cache:
Within the E:\MOSS\BLOBCache folder are two subfolders named 748546212 and 1553899298. Each of these folders houses BLOB cache content for a different IIS site; each web site for which BLOB caching is enabled ends up with its own folder. The folder names (for example, 748546212) are nothing more than each web site’s ID value as assigned by IIS. These ID values are readily visible within the Internet Information Services (IIS) Manager snap-in, making it easy to correlate folders with their associated IIS web sites.
Within each BLOB cache subfolder (web site folder) are three files that are maintained by MOSS; more specifically, they’re maintained by the BlobCache object instance servicing the web site. These files are critical to the operation of the BLOB cache, and they (primarily) serve to persist critical BlobCache variables and state during application pool shutdowns (when the BlobCache object is destroyed):
- change.bin: This file contains serialized change tokens (SPChangeToken) for objects being cached in the local file system. These tokens allow the BlobCache maintenance thread to query the content source(s) and subsequently update the contents of the BLOB cache with any items that are identified as having changed since the last maintenance sweep.
- dump.bin: This file contains a serialized copy of the BlobCache’s cache dictionary. The dictionary maintains information for all objects being tracked and maintained by the BlobCache object; each key/value pair in the dictionary consists of a local file path (key) and its associated BlobCacheEntry (value).
- flushcount.bin: This file contains nothing more than the serialized value of the cacheFlushCount for the BlobCache object. Practically speaking, this value allows a BlobCache to determine if a flush has been requested while it was shut down.
In a properly functioning BLOB cache, these three .bin files will always be present. If any of these files should become corrupt or be deleted, the BlobCache will execute a flush to remedy its inconsistent state.
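As a rough illustration of that consistency check, the Python sketch below reports which of the three state files are missing from a site's cache folder. The function names are hypothetical, and the sketch only tests for presence; how the real BlobCache detects a corrupt file is not covered in this post, so that part is left out.

```python
import tempfile
from pathlib import Path
from typing import List

STATE_FILES = ("change.bin", "dump.bin", "flushcount.bin")


def missing_state_files(site_folder: Path) -> List[str]:
    """Return the names of state files that are absent from the site folder."""
    return [name for name in STATE_FILES if not (site_folder / name).is_file()]


def needs_flush(site_folder: Path) -> bool:
    """A missing state file is treated as an inconsistent cache."""
    return bool(missing_state_files(site_folder))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "change.bin").write_bytes(b"")
        (folder / "dump.bin").write_bytes(b"")
        print(missing_state_files(folder), needs_flush(folder))
```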
In a site where web requests have been processed and files have been cached, additional folders and files will be present in addition to the change.bin, dump.bin, and flushcount.bin files. Additional folders (and subfolders) reflect the URL path hierarchy of the site being serviced by the BlobCache object. The files within these (path) folders correspond one-to-one with list items (that is, BLOB assets) that have been requested, and the cached files themselves have the same name as their corresponding list items with the addition of a .cache extension.
As an example, consider a site collection that is located at http://www.myurl.com and for which BLOB caching is enabled. If the BLOB cache is configured to cache JPEG images and a user requests http://www.myurl.com/PublishingImages/test.jpeg, we can expect two things once the request has completed:
- The BLOB cache folder servicing the www.myurl.com site within the server’s file system will have a subfolder within it named PUBLISHINGIMAGES.
- The PUBLISHINGIMAGES subfolder will have a file named TEST.JPEG.cache.
Small side note which may be evident: the BlobCache object creates all cache-resident paths and filenames (save for the .cache extension) in uppercase.
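The mapping from a requested URL to a cached file can be sketched in a few lines of Python. The cache_file_path helper is hypothetical, URL-decoding of path segments is an assumption (suggested by folder names that contain spaces), and pairing the 748546212 site ID with www.myurl.com is purely for illustration.

```python
from pathlib import PureWindowsPath
from urllib.parse import unquote, urlsplit


def cache_file_path(cache_location: str, iis_site_id: int, url: str) -> PureWindowsPath:
    """Build the on-disk path for a cached item, per the layout described above.

    <cache location>\\<IIS site ID>\\<UPPERCASED URL PATH>\\<UPPERCASED NAME>.cache
    """
    # Assumption: segments are URL-decoded before being uppercased.
    segments = [unquote(s).upper() for s in urlsplit(url).path.split("/") if s]
    if not segments:
        raise ValueError("URL has no path to cache")
    *folders, filename = segments
    return PureWindowsPath(cache_location, str(iis_site_id), *folders, filename + ".cache")


if __name__ == "__main__":
    print(cache_file_path(
        r"E:\MOSS\BLOBCache",
        748546212,  # illustrative site ID only
        "http://www.myurl.com/PublishingImages/test.jpeg",
    ))
```

PureWindowsPath is used so that the sketch produces Windows-style paths regardless of the operating system it runs on.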
What Are the Mechanics of a Flush?
The BlobCache can flush itself if it detects any internal problems (for example, one or more of its .bin files is missing or corrupt), but the process can also be requested by an external source or event. The actual BLOB cache flush process is relatively straightforward and follows this progression (assuming the BLOB cache has a working folder; that is, it hasn’t somehow been deleted):
- The BlobCache acquires a writer lock for its working folder to prevent other operations during the flush that’s about to be conducted.
- The BlobCache attempts to move its working folder to a temporary location – a new folder identified by a freshly generated globally unique identifier (GUID) string – in preparation for the flush.
- If the previous folder move (to the temporary “GUID folder”) succeeded, the BlobCache attempts to delete the temporary folder. If the previous move attempt failed, the BlobCache attempts an in-place deletion of the working folder.
- If the folder deletion attempt fails, the BlobCache waits two seconds before attempting the folder deletion operation once again. If the deletion fails a second time, the BlobCache leaves the temporary folder (or the original folder if the folder move in the second step failed) alone and proceeds.
- The BlobCache performs internal housekeeping to clean up dictionaries, reset tracking variables, create a new BLOB cache subfolder (again, folder name is derived from the IIS site ID), and write out a new set of state files (change.bin, dump.bin, and flushcount.bin) to the folder.
- With everything cleaned up and ready to go, the BlobCache releases its Mutex writer lock and normal operations resume.
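The progression above can be sketched in Python as follows. This is a model of the steps as I've described them, not the actual BlobCache code: the flush function is hypothetical, a threading.Lock stands in for the writer lock, the temporary GUID folder is assumed to be a sibling of the working folder, and the state files are written as empty or plain-text placeholders rather than in the real serialized format.

```python
import shutil
import tempfile
import threading
import time
import uuid
from pathlib import Path

STATE_FILES = ("change.bin", "dump.bin", "flushcount.bin")
_flush_lock = threading.Lock()  # stand-in for the BlobCache writer lock


def _try_delete(folder: Path, retry_delay: float) -> bool:
    """Attempt the deletion, wait, and try exactly once more on failure."""
    for attempt in range(2):
        try:
            shutil.rmtree(folder)
            return True
        except OSError:
            if attempt == 0:
                time.sleep(retry_delay)
    return False  # folder is left alone after the second failure


def flush(working_folder: Path, flush_count: int, retry_delay: float = 2.0) -> None:
    with _flush_lock:
        # Move the working folder to a freshly named GUID folder (assumed sibling).
        temp_folder = working_folder.with_name(uuid.uuid4().hex)
        try:
            working_folder.rename(temp_folder)
            target = temp_folder
        except OSError:
            target = working_folder  # move failed: fall back to in-place deletion
        _try_delete(target, retry_delay)
        # Housekeeping: recreate the site folder and a fresh set of state files.
        working_folder.mkdir(parents=True, exist_ok=True)
        for name in STATE_FILES:
            (working_folder / name).write_bytes(b"")  # placeholder contents
        (working_folder / "flushcount.bin").write_text(str(flush_count))


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    site = root / "1553899298"
    (site / "BRIAN HEATHERS WEDDING").mkdir(parents=True)
    flush(site, flush_count=1)
    print(sorted(p.name for p in site.iterdir()))
    shutil.rmtree(root)
```

Moving the folder aside before deleting it means a fresh, empty working folder can be created even when the delete itself fails, which is presumably why the move is attempted first.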
Single-Server Flush Versus Farm-Wide Flush
I mentioned that an external source or event can request a flush. A flush is typically requested in one of two ways:
- A single-server flush can be requested from within the SharePoint browser UI via the Site Collection Administration column’s “Site collection object cache” link.
- A farm-wide flush can be requested via STSADM.exe (note the qualifiers supplied by Maxime Bombardier at the bottom of the page) or with the help of a third-party tool like my MOSS 2007 Farm-Wide BLOB Cache Flushing Solution.
A single-server flush request is executed through the SharePoint browser UI on the ObjectCacheSettings.aspx application page. The relevant portion of that page appears below:
A request that is made through the ObjectCacheSettings.aspx page results in a direct call to the BlobCache object servicing the associated IIS site (and working folder) on the server receiving the postback (flush) request. Once the FlushCache call is made, the BlobCache carries out the flush as previously described.
A farm-wide flush request, on the other hand, is carried out in a very different fashion. The following is a section of the BlobCacheFarmFlush.aspx page from the BlobCacheFarmFlush solution:
A farm-wide flush is executed by incrementing a custom property value (named blobcacheflushcount) on the target site collection’s parent SPWebApplication. A change in this property value propagates to all servers since the affected SPWebApplication.Properties collection is updated and maintained in the SharePoint farm configuration database. Each BlobCache object servicing a site collection under the affected SPWebApplication picks up the property change and carries out a flush on the working folder it is responsible for managing.
Request Mechanism Impact on Flush Process
As you might expect, the choice of flush request mechanism (single-server versus farm-wide) has a profound effect on what actually happens during the flush process.
Consider a MOSS farm that has two WFEs (MOSSWFE1 and MOSSWFE2) serving up page requests for a single site collection. The site collection is exposed through an IIS web site on each server with a URL of http://internal.samplesite.com, and this URL is associated with the default web application. The site collection is also exposed through a web application that has been extended to the Internet zone, and its IIS site has a URL of http://www.samplesite.com. BLOB caching is enabled on both servers for each of the two IIS web sites, so a total of four working folders (2 servers * 2 sites) are in-play for BLOB caching purposes. A (simplified) visual representation looks something like this:
Each of the aforementioned IIS web sites is represented by circled numbers 1 through 4 in the diagram above, while the configuration database is represented by a circled number 5; I’ll be referring to these (numbers) in the descriptions that follow. Pay attention, too, to the IDs for each of the two IIS sites on each server (748546212 for the Internet zone and 1553899298 for the default zone).
Requesting a single-server flush via the SharePoint browser UI results in a request to (or rather, through) one site on one server. Prior to such a request, let’s look at how the BLOB cache might appear on MOSSWFE1:
As you can see, the BLOB cache folders for both IIS sites on MOSSWFE1 (that is, #1 and #2 in the previous farm diagram) have cached items in them. The www.samplesite.com (#1) site has a “MISCELLANEOUS SHOTS” subfolder (which will have one or more cached resources in it), and the internal.samplesite.com site (#2) has a “BRIAN HEATHERS WEDDING” subfolder (also with cached resources).
For the sake of discussion, let’s say that a single-server BLOB cache flush request is made against MOSSWFE1 through the site collection via #2 (the internal.samplesite.com site). Once the flush has been executed, the BLOB cache folder structure would appear as follows:
Notice that the “BRIAN HEATHERS WEDDING” subfolder is gone from the site with ID 1553899298 (internal.samplesite.com, or #2). Further examination of the folder would also confirm that all .bin files had been reset – a clear sign that a flush had taken place. The cache folder for the other site at 748546212 (www.samplesite.com, or #1), on the other hand, remains unchanged. Each of the BLOB cache folders (#3 and #4) on MOSSWFE2 also remains unaffected.
A single-server flush, therefore, is not only restricted to a single server (MOSSWFE1 in this example), but it also impacts only the specific IIS site (or SharePoint zone) through which the flush request is made. In the case of the example above, a site administrator requesting a BLOB cache flush through http://internal.samplesite.com has no impact whatsoever on any of the cached files for http://www.samplesite.com.
This can have significant implications in many Internet publishing scenarios where publicly facing sites (zones) only permit anonymous access for security reasons. In such situations, no OOTB mechanism exists to actually permit a flush request for the public zone/site given that such a flush is a privileged operation available only to site collection administrators.
Thankfully, there is a way to address this problem …
In a farm-wide flush, the point of origin for the change that initiates a flush is #5 – the farm configuration database. As described earlier in this post, the blobcacheflushcount property on the SPWebApplication (web application) that houses the target site collection (in the case of the BlobCacheFarmFlush solution) is incremented. When the property is incremented, the BlobCache instances servicing the IIS sites under the SPWebApplication detect the property value change and carry out a flush.
Examining the file system for sites #3 and #4 on MOSSWFE2 prior to a farm-wide flush, we might see the following folder structure:
Once a farm-wide flush has been executed via STSADM or through a tool like the BlobCacheFarmFlush solution, the BLOB cache area of the file system (for sites #3 and #4) on MOSSWFE2 would appear like this:
A review of MOSSWFE1 would reveal the same file system changes; BLOB cache folders for #1 and #2 would also be reset.
Unlike the single-server BLOB cache flush via the SharePoint browser UI, a farm-wide flush impacts all WFEs in the farm serving up the site collection. Arguably the more important (and non-obvious) difference, though, is that the farm-wide flush impacts all zones/IIS sites for the web application serving the site collection. In the case of the example above, a farm-wide flush request through any of the available URLs on either server results in BLOB caches for #1, #2, #3, and #4 being flushed. This tends to make a farm-wide flush the preferred flush mechanism for the publishing site example I cited earlier (where public access occurs through an anonymous-only zone/site).
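The difference in scope between the two mechanisms can be modeled with a short Python sketch of the example farm. All class and function names here are hypothetical, the assignment of #3 and #4 to specific URLs on MOSSWFE2 is an assumption, and the maintenance sweep is reduced to a single comparison against the shared blobcacheflushcount value.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WebApplicationModel:
    """Stands in for the SPWebApplication and its farm-wide Properties bag."""
    properties: Dict[str, int] = field(
        default_factory=lambda: {"blobcacheflushcount": 0})


@dataclass
class SiteCacheModel:
    """One BLOB cache: a single IIS site on a single server."""
    number: int
    server: str
    url: str
    web_app: WebApplicationModel
    cached_items: List[str] = field(default_factory=list)
    seen_flush_count: int = 0

    def flush(self) -> None:
        self.cached_items.clear()

    def maintenance_sweep(self) -> None:
        """Flush if the shared property differs from the last value seen."""
        current = self.web_app.properties["blobcacheflushcount"]
        if current != self.seen_flush_count:
            self.flush()
            self.seen_flush_count = current


def build_farm() -> List[SiteCacheModel]:
    web_app = WebApplicationModel()  # shared by both zones on both servers
    layout = [
        (1, "MOSSWFE1", "http://www.samplesite.com"),
        (2, "MOSSWFE1", "http://internal.samplesite.com"),
        (3, "MOSSWFE2", "http://www.samplesite.com"),       # assumed numbering
        (4, "MOSSWFE2", "http://internal.samplesite.com"),  # assumed numbering
    ]
    return [SiteCacheModel(n, s, u, web_app, cached_items=["ITEM.JPG.cache"])
            for n, s, u in layout]


def single_server_flush(caches: List[SiteCacheModel], number: int) -> None:
    """Browser UI flush: a direct call to one cache on one server."""
    next(c for c in caches if c.number == number).flush()


def farm_wide_flush(caches: List[SiteCacheModel]) -> None:
    """Increment the shared property, then let every cache's sweep notice it."""
    caches[0].web_app.properties["blobcacheflushcount"] += 1
    for cache in caches:
        cache.maintenance_sweep()
```

Calling single_server_flush(caches, 2) empties only #2, while farm_wide_flush(caches) empties all four, because every cache reads the same property regardless of zone or server.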
A Watch-Out with Farm-Wide Flush Requests
There is one additional point that should be made with regard to farm-wide flushes. In order for a flush to take place on a WFE, the IIS application pool servicing the targeted web application must be running. If the application pool isn’t running (hasn’t yet been started or perhaps has shut down due to a lack of requests), it will appear that the flush had “no effect” on the server.
The reason for this is relatively straightforward. As described towards the beginning of this post, BlobCache object instances and their associated maintenance threads are created when IIS establishes a SharePoint pipeline (and SPHttpApplication) for request processing. If this pipeline isn’t yet ready to service requests for a targeted web application (perhaps because the IIS worker process hasn’t started up or the application pool was recycled but not “primed”), then the SPWebApplication’s blobcacheflushcount property change won’t be detected at the time it is altered. No maintenance thread = no property change detection = no flush.
Since the cacheFlushCount for each BLOB cache is serialized and tracked via the flushcount.bin file, though, detection of the web application’s flush property value change occurs as soon as the BlobCache object is instantiated at the time of pipeline setup. The result is that a BLOB cache flush occurs as soon as the worker process or new application domain (and by extension, the BlobCache instance and its maintenance thread) spins up to begin servicing requests.
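That deferred detection boils down to comparing a persisted value with the current one at startup. The Python sketch below models it under some loud assumptions: the function names are hypothetical, flushcount.bin is read as plain text rather than in its real serialized format, and a missing or unreadable file is treated as a reason to flush (in keeping with the earlier point about missing or corrupt .bin files).

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_persisted_flush_count(site_folder: Path) -> Optional[int]:
    """Read the count saved before shutdown, or None if it cannot be read."""
    try:
        # Placeholder format: the real file holds a serialized value.
        return int((site_folder / "flushcount.bin").read_text())
    except (OSError, ValueError):
        return None


def flush_needed_at_startup(site_folder: Path, web_app_flush_count: int) -> bool:
    """True if the web application's count changed while the cache was down."""
    persisted = read_persisted_flush_count(site_folder)
    return persisted is None or persisted != web_app_flush_count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        site_folder = Path(tmp)
        (site_folder / "flushcount.bin").write_text("3")
        # Same count: nothing happened while the application pool was stopped.
        print(flush_needed_at_startup(site_folder, 3))
        # Count was incremented during the downtime: flush on startup.
        print(flush_needed_at_startup(site_folder, 4))
```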
It is my hope that this overview provides you with some insight into the internals of the MOSS BLOB cache, as well as a basis for understanding how flush mechanisms differ. As always, I welcome any feedback or questions you might have.