Quick Tips for Managing the SharePoint 2010 Office Web Applications Cache

I presented remotely to the Boston Area SharePoint User Group (BASPUG) tonight (7/13/2016), and I referenced an article that I had written that is no longer available online. This post originally appeared as a “SharePoint Smarts” article from Idera. Idera is out of the SharePoint business nowadays, but the information I shared in that article is still relevant to those who use SharePoint 2010. So if you have a SharePoint 2010 environment and use the Office Web Apps, this post (and more specifically, the scripts contained within) is for you.

One of the hotly anticipated items in SharePoint 2010’s feature set is the introduction of the Microsoft Office Web Applications, or “Office Web Apps” for short. The release of the Office Web Apps opens up new possibilities for those who work with documents and files that are tied to Microsoft Word and other applications in the Microsoft Office Family.

What Are the Office Web Apps?

In prior versions of SharePoint, viewing and editing Office documents that existed in SharePoint document libraries normally required a client computer possessing the Microsoft Office suite of applications. If you wanted to view or edit a Word document that existed in SharePoint, for example, you needed Microsoft Word (or an equivalent application) installed on your computer.

That situation changes with the arrival of the Office Web Apps. When a SharePoint 2010 farm is properly set up and configured with the Office Web Apps, it becomes possible to view and edit several different Office document types directly from within a browser as shown in Figure 1 below.

Open Document

Figure 1: Browser-based editing of a Microsoft Word document

The Office Web Apps provide browser-based viewing and editing support for Microsoft Excel, OneNote, PowerPoint, and Word document types, and this support extends to more than just Internet Explorer. Firefox 3.x, Safari 4.x, and Google Chrome browser types are also supported for viewing and editing – making the Office Web Apps an enabler of cross-platform collaboration that centers on Office documents.

A Word about the Plumbing

As you might imagine, browser-based rendering and editing of Office documents involves a number of complex processes that engage a variety of front-end, middle-tier, and back-end components. The front-end and middle-tier tasks that are tied to document viewing and editing are handled primarily by a new set of service applications that appear when the Office Web Apps are installed. These service applications (and their associated pages, handlers, and worker processes) take care of the business of document conversion, load-balancing, and rendering for browser consumption.

Document conversion and rendering typically generate a combination of images, HTML, JavaScript, and XAML (or eXtensible Application Markup Language) that are sent to consuming browsers. The creation of these document resources is an expensive process, both in terms of CPU cycles and storage. To improve performance levels, it makes sense to generate these document resources only as needed and reuse them whenever possible. That’s where the Office Web Apps cache comes in.

The Office Web Apps cache is the back-end store that is responsible for housing images, HTML, JavaScript, and XAML resources once they have been created for a document. Each time a document is converted into a set of these resources, the resources are stored in the Office Web Apps cache. When a request for a document comes into SharePoint, the cache is checked to see if the document had been previously requested and rendered. If it had, and the cached document resources are up-to-date for the document, then the document request is served from the cache instead of engaging the Office Web Apps to convert and re-render it. Serving document resources from the Office Web Apps cache can yield significant performance improvements over scenarios where no cache is employed.

Quick side note before going too far: the Office Web Apps cache is only employed for Word and PowerPoint document types. It is not used for OneNote or Excel documents.

Inside the Office Web Apps Cache

The Office Web Apps cache takes the form of a single site collection for each Web application within a SharePoint farm. When the Office Web Apps are installed and configured in a SharePoint environment, a couple of new timer jobs are installed and run regularly within the farm. One of those timer jobs, the Office Web Apps Cache Creation timer job, ensures that each Web application where the Office Web Apps are running has a site collection like the one shown below in Figure 2.

Site Collection

Figure 2: The Office_Viewing_Service_Cache site collection

The Office_Viewing_Service_Cache site collection is a standard Team Site, and it is the location where resources are stored following the conversion and rendering of either a Word or PowerPoint document by the Office Web Apps.

The Team Site can be accessed just like any other SharePoint Team Site, and a glimpse inside the All Documents library (showing a number of document resources) appears below in Figure 3.

Cache Library

Figure 3: All Documents library in an Office Web Apps cache site collection

Managing the Cache

For such a complex system, the Office Web Apps components do a pretty good job of maintaining themselves without external intervention. This extends to the site collections that are used by Office Web Apps for caching purposes, as well. For example, the Office Web Apps Expiration timer job that is installed with the Office Web Apps removes old document resources from cache site collections once they’ve hit a certain age. The timer job also ensures that each of the site collections responsible for caching has adequate space to serve its purpose.

This doesn’t mean that there aren’t opportunities for tuning and maintenance, though. In fact, there are a couple of things that every administrator should do and review when it comes to the Office Web Apps cache.

Tip #1: Relocate the Cache to a New Database

By default, the Office Web Apps Cache Creation timer job creates an Office_Viewing_Service_Cache site collection in a content database that is collocated with one or more of the “real” site collections within each of your content Web applications. Since the cache site collection is allowed to grow to a beefy 100GB by default, it makes sense to relocate the cache site collection to its own (new) content database. By relocating the cache site collection to its own content database, it becomes easy to exclude it from other maintenance such as backups.

Relocating the cache site collection is pretty straightforward, and it can be accomplished pretty easily with following RelocateOwaCache.ps1 PowerShell script. Simply save the script, execute it, and supply the URL of a Web application within your farm where the Office Web Apps are running. The script will take care of creating a new content database within the Web application, and it will then move the Web application’s Office Web Apps cache site collection to the newly created content database.

<#
.SYNOPSIS
   RelocateOwaCache.ps1
.DESCRIPTION
   Relocates the Office Web Apps cache for a specified Web application to a new content database that is created by the script
.NOTES
   Author: Sean McDonough
   Last Revision: 07-June-2011
.PARAMETER targetUrl
   A Web application where Office Web Apps are in use
.EXAMPLE
   RelocateOwaCache.ps1 http://www.TargetWebApplication.com
#>
param 
(
	[string]$targetUrl = "$(Read-Host 'Target Web application URL [e.g. http://hostname]')"
)

function RelocateCache($targetUrl)
{
	# Ensure that the SharePoint cmdlets are loaded before continuing
	$spCmdlets = Get-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction silentlycontinue
	if ($spCmdlets -eq $Null)
	{ Add-PSSnapin Microsoft.SharePoint.PowerShell }
	
	# Get the name of the current database where the cache is located; it
	# will serve as the basis for a new content database name.
	$cacheSite = Get-SPOfficeWebAppsCache -WebApplication $targetUrl -ErrorAction stop
	$newDbName = $cacheSite.ContentDatabase.Name + "_OWACache"
	
	# Create a new content database and relocate the cache to it.  Make sure the
	# user knows what's happening each step of the way.
	Write-Host "- creating a new content database ..."
	$cacheDb = New-SPContentDatabase -Name $newDbName -WebApplication $targetUrl -ErrorAction stop
	Write-Host "- moving the Office Web Apps cache ..."
	Move-SPSite $cacheSite -DestinationDatabase $cacheDb -Confirm:$false -ErrorAction stop
	Write-Host "- performing required IISRESET ..."
	iisreset | Out-Null
	
	# Let the user know where the cache is now located
	Write-Host "Cache successfully relocated to the '$newDbName' database."

	# Abort script processing in the event an exception occurs.
	trap
	{
		Write-Warning "`n*** Script execution aborting. See below for problem encountered during execution. ***"
		$_.Message
		break
	}
}

# Launch script
RelocateCache $targetUrl

Tip #2: Review Size and Expiration Settings

When an Office_Viewing_Service_Cache site collection is provisioned within a Web application by the Office Web Apps Cache Creation timer job, it is initially configured to hold cached document resources for 30 days. As mentioned in Tip #1, a cache site collection can also grow to a maximum of 100GB by default.

Whether or not these default settings are appropriate for a Web application depends primarily upon the nature of the site collections housed within the Web application. When site collections contain primarily static documents or content that changes infrequently, it makes sense to allow the cache to grow larger and expire content less often than normal. This maximizes the benefit obtained from caching since document content turns over less frequently.

On the other hand, site collections that experience frequent document turnover and heavy collaboration traffic tend to benefit very little from large cache sizes and long expiration periods. In site collections of this nature, cached content tends to become stale quickly. Little benefit is derived from holding onto document resources that may only be good for days or even hours, so maximum cache size is reduced and expiration periods are shortened.

Tip #3: Give Yourself Some Warning

Since each Office Web App cache is a Team Site and like any other site collection, you can leverage standard SharePoint site collection features and capabilities to help you out. One such mechanism that can be of assistance is the ability to have an e-mail warning sent to site collection owners once a site collection’s size hits a predefined threshold. In the case of the Office Web Apps cache, such a warning could be a cue to increase the maximum size of the cache site collection or perhaps lower the expiration period for document resources housed within the site collection.

Like the maximum cache size setting described in Tip #2, the ability to send e-mail warnings once the cache reaches a threshold is actually tied to SharePoint’s site collection quota capabilities. The maximum size of the cache site collection is handled as a storage quota, and the warning threshold maps directly to the quota’s warning threshold as shown below in Figure 4. In the case of Figure 4, a maximum cache size of 50GB is in effect for the cache site collection, and the e-mail warning threshold is set for 25GB.

Quota

Figure 4: Quota settings for an Office Web Apps cache site collection

Knobs and Dials

Tips #2 and #3 discussed some of the more straightforward Office Web Apps cache settings that are available to you, but you might be wondering how you actually go about changing them.

The AdjustOwaCache.ps1 PowerShell script that appears below provides you with an easy way to review and change the settings discussed. Simply save the script, execute it, and supply the URL of the Web application containing the Office Web Apps cache you’d like to adjust. The script will show you the cache’s current settings and give you the opportunity to modify them.

<#
.SYNOPSIS
   AdjustOwaCache.ps1
.DESCRIPTION
   Dumps several common OWA cache settings to the console for a selected Web application and provides a mechanism for altering the those values
.NOTES
   Author: Sean McDonough
   Last Revision: 08-June-2011
.PARAMETER targetUrl
   A Web application where Office Web Apps are in use
.EXAMPLE
   AdjustOwaCache.ps1 http://www.TargetWebApplication.com
#>
param 
(
	[string]$targetUrl = "$(Read-Host 'Target Web application URL [e.g. http://hostname]')"
)

function AdjustCache($targetUrl)
{
	# Ensure that the SharePoint cmdlets are loaded before continuing
	$spCmdlets = Get-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction silentlycontinue
	if ($spCmdlets -eq $Null)
	{ Add-PSSnapin Microsoft.SharePoint.PowerShell }
	
	# Create an easy converter for GB to bytes
	$GBtoBytes = 1024 * 1024 * 1024
	
	# Get a reference to the cache site collection and extract the values we'll be
	# working with and (potentially) altering.
	$cacheSite = Get-SPOfficeWebAppsCache -WebApplication $targetUrl -ErrorAction stop
	$wacSize = $cacheSite.Quota.StorageMaximumLevel / $GBtoBytes
	$wacWarn = $cacheSite.Quota.StorageWarningLevel / $GBtoBytes
	$wacExpire = 30
	if ($cacheSite.RootWeb.Properties.ContainsKey("waccacheexpirationperiod"))
	{ $wacExpire = $cacheSite.RootWeb.Properties["waccacheexpirationperiod"] }
	Write-Host "Current OWA cache values for '$targetUrl'"
	Write-Host "-  Maximum Cache Size (GB): $wacSize"
	Write-Host "-   Warning Threshold (GB): $wacWarn"
	Write-Host "- Expiration Period (Days): $wacExpire"
	
	# Give the user the option to make changes.
	$yesOrNo = Read-Host "Would you like to change one or more values? [y/n]"
	if ($yesOrNo -eq "y")
	{
		[Int64]$newWacSize = Read-Host "-  Maximum Cache Size (GB)"
		Write-Host  "-   Warning Threshold (GB)"
		[Int64]$newWacWarn = Read-Host " (supply 0 for no warning)"
		[int]$newWacExpire = Read-Host "- Expiration Period (Days)"
		
		# Convert GB values to bytes and set the cache
		$newWacSize = ($newWacSize * $GBtoBytes)
		$newWacWarn = ($newWacWarn * $GBtoBytes)
		Set-SPOfficeWebAppsCache -WebApplication $targetUrl -ExpirationPeriodInDays $newWacExpire -MaxSizeInBytes $newWacSize -WarningSizeInBytes $newWacWarn -ErrorAction stop
	}

	# Abort script processing in the event an exception occurs.
	trap
	{
		Write-Warning "`n*** Script execution aborting. See below for problem encountered during execution. ***"
		$_.Message
		break
	}
}

# Launch script
AdjustCache $targetUrl

Conclusion

The Office Web Apps are a powerful addition to SharePoint 2010 and pave the way for greater collaboration on Office documents without the need for the Microsoft Office suite of client applications. The Office Web Apps cache is an important part of the larger Office Web Apps equation, and the cache is generally pretty good about taking care of itself. As shown in this article, though, it is still a good idea to relocate the cache from its default location. At the same time, a little bit of tuning and e-mail alerting can go a long way towards ensuring that the cache operates optimally for you in your environment.

Is a Higher SharePoint Backup Thread Count Better?

Many administrators have noted that SharePoint 2010 allows them to tune the number of threads that can be used for farm backup and restore operations, but very few have played with the settings. In this post, I share some results I compiled while testing the settings in my own environments. I also share the PowerShell script I assembled for my testing so you can tune the backup and restore thread settings in your own SharePoint farm.

Balls of purple, orange and grey yarn or woolScalability in the hardware and software space is all about parallel computing nowadays. Consider our modern hardware: it used to be that all we really cared about was how fast our CPU could run (“how many GHz?”) Now, we care more about how many cores our CPU has, whether or not those cores support Hyper-threading, how many memory channels our CPU has available to it, etc. Scale-out beats scale-up.

The same is largely true in the software space. Most IT folks learned some time ago that “multithreading” and “higher performance” tended to go hand-in-hand or were at least associated in some way. Multiple threads of execution meant better scheduling of limited processor resources and fewer chances that one long-running operation would bottleneck an entire application.

Configuring SharePoint 2010 Farm Backup and Restore

When I first saw the following section in the “Configure Backup Settings” section of SharePoint 2010’s Central Administration site, it brought a big grin to my face:

Thread Configuration

In SharePoint 2007 and earlier, administrators had no real levers to pull to try and tune the performance of farm backup and restore operations. This obviously changed with SharePoint 2010. We were basically being handed a way to adjust those processes as we saw fit – for better or worse.

Strangely enough, though, I never really took the time to explore the impact of those settings in my SharePoint environments. I always left the number of assigned threads for backup and restore operations at three. I would have liked to mess around with the values, but something else was always more important in the grand scheme of things.

Why Now?

I’ve been working on a new “backup tips and tricks” whitepaper, and I found myself looking for backup and restore concerns within the SharePoint platform that I may not have given much attention to in the past. It didn’t take much wading through Central Administration before I once again found myself looking at thread counts for backup and restore operations.

Doing a little bit of Internet (background) research confirmed what I had suspected: no one else had really spent any time on the topic either. In fact, the only “fresh” and non-copyright-infringing material I found came from a Microsoft TechNet post titled Backup and recovery best practices (SharePoint Server 2010) … and to tell you the truth, the following paragraph from the section titled “Configure SharePoint settings for better backup or restore performance” really bugged me:

If you are using the Backup-SPFarm cmdlet, you can use the BackupThreads parameter to specify how many threads SharePoint Server 2010 will use during the backup process. The more threads you specify, the more resources that backup operation will take, but the faster that it will finish, if sufficient resources are available. However, each thread is reported individually in the log files, so using fewer threads makes interpreting the log files easier. By default, three threads are used. The maximum number of threads available is 10.

Without an understanding of how multithreading (in general) and SharePoint backup (specifically) work, this could easily be interpreted as follows:

The greater the number of threads you assign, the faster your backups will complete.

I realize that my summary is an oversimplification, but I believe that many administrators see the TechNet paragraph as I summarized it. And that concerns me.

I’ve always told people that increasing the backup thread count could yield better performance, but any adjustments would need to be tested in the target farm where they are to be implemented. Realistically speaking, there are several participants and a lot of moving parts in any SharePoint farm backup. Besides the SharePoint server where the backup operation is being coordinated, there is the performance of one or more SQL Servers to consider. The capabilities and restrictions of the backup destination location (typically a UNC file share) also need to be factored-in since that destination is being written to by both the SharePoint Server and one or more SQL Servers.

Setting the number of backup threads to 10 on a SharePoint Server of infinite capability and resources doesn’t guarantee a fast backup, because the farm might have a slow SQL Server, a less-capable backup destination location, a slow or congested network, or a host of other complicating factors.

Oh Yeah? Prove It.

Of course, all of this is just a bunch of hand-waving without proof. So, the scientist in me (yeah, I actually used to be a chemist) decided to take over and devise a series of simple tests to see if there is any real weight to the arguments I’ve been making.

I began with the hypothesis that the easiest and most visible way to gauge the performance of a farm backup operation is to measure how long a backup takes to run; e.g., a farm backup that takes 10 minutes to run is faster than a backup that takes 20 minutes to run if farm content, hardware, configuration, and other factors remain constant. Since SharePoint 2010 provides the ability to specify anywhere from one to 10 backup threads, running a series of backups where the only variable is backup thread count should determine if greater or fewer backup threads yield better performance.

You might recall that I also mentioned that farm topology is a factor in the overall backup equation. As part of my experiment, I decided to run the tests on two different farms I have available to me. General descriptions for each farm:

  • Single-Server Farm: my single server farm environment is a VM running on my laptop. The VM houses SharePoint, SQL Server, and the backup location being targeted. The laptop hardware is a Core-i7 quad-core processor, and the underlying storage for the VM is a solid-state drive (SSD). Hardware bottlenecks should be minimized, and network latency isn’t a factor since backup operations are conducted against a local drive within the VM.
  • Multi-Server Farm: my multi-server environment is the “production” environment on my home network. It consists of a SharePoint Server VM running on a Hyper-V host that also hosts other VMs. The SQL Server instance backing the farm is a non-virtualized SQL Server housing all of the SharePoint databases as well as a few databases for other applications. The backup destination location is a virtualized file server with a pass-through drive array (eSATA with RAID-5). Overall hardware, in this case, is “okay” but obviously not dedicated purely to SharePoint. In addition, network latency and bandwidth (GbE) are also in-play as potential sources of impact.

These two environments have pretty different overall topologies, and it was my hope that I’d see some effect on the performance numbers as a result.

The Script

To run the tests reproducibly, I needed a PowerShell script. So, I put the following script together while I had a bit of free time one night. Feel free to pluck this out to use for testing in your SharePoint environment, as well.

<#
.SYNOPSIS
   TestBackupThreads.ps1
.DESCRIPTION
   This script is used to conduct and time a series of backups using different thread counts.
   The output can then be used to make an educated decision on the number of backup threads to
   assign for use in farm-level backups.
.NOTES
   Author: Sean McDonough
   Last Revision: 25-July-2012
.PARAMETER TestLocation
   A UNC path to a location that can be used to create test backup sets
.EXAMPLE
   TestBackupThreads \\FileShare\TestLocation
#>
param 
(
	[string]$TestLocation = "$(Read-Host 'UNC path to test backup location [e.g. \\FileShare\TestLocation]')"
)

function TestThreads($backupLocation)
{
	# Ensure that the SharePoint cmdlets are loaded before continuing
	$spCmdlets = Get-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction silentlycontinue
	if ($spCmdlets -eq $Null)
	{ Add-PSSnapin Microsoft.SharePoint.PowerShell }
	
	# Setup some variables we'll need for execution.
	$threadTimes = @{}									# Hash table to hold timing results
	$backupItems = Join-Path $backupLocation "spbr*"	# Used to delete temp backup files
	
	# We need to execute a full farm backup for each thread count 1 through 10
	Clear-Host
	Write-Host "`nBackup thread count testing process beginning."
	for ($threads = 1; $threads -lt 11; $threads++)
	{
		# Clean out any backup contents from the test location
		Remove-Item $backupItems -recurse

		# Grab the starting date/time (for later comparison), kick-off a farm backup, and then
		# grab the stop date/time.
		Write-Host "`nInitiating a backup with $threads thread(s) ..."
		$startPoint = Get-Date
		Backup-SPFarm -BackupMethod Full -Directory $backupLocation -BackupThreads $threads
		$stopPoint = Get-Date

		# Store and report results
		$keyName = "Backup with {0} thread(s)" -f $threads
		$elapsedSeconds = "{0:N0}" -f ($stopPoint - $startPoint).TotalSeconds
		$threadTimes[$keyName] = $elapsedSeconds
		Write-Host "Backup with $threads thread(s) complete"
		Write-Host ("- time to complete (in seconds): {0}" -f $elapsedSeconds)
	}
	
	# Do a final sweep of the test backup location to clean out backup items
	Remove-Item $backupItems -recurse

	# Dump the results sorted in order of quickest to longest
	Write-Host "`nBackup thread count testing process complete."
	$threadTimes.GetEnumerator() | Sort-Object Value

	# Abort script processing in the event an exception occurs.
	trap
	{
		Write-Warning "`n*** Script execution aborting. See below for problem encountered during execution. ***"
		$_.Message
		break
	}
}

# Launch script
TestThreads $TestLocation

The script is fairly straightforward in what it does. You supply a TestLocation parameter to specify where farm backup test data should be written to, and the script will run a series of full farm backups using the supplied location as the backup destination. The script starts with a full backup using one backup thread; at the end of each full farm backup, the script notes how long the backup took (in seconds) and cleans-up the contents of the TestLocation folder. The number of backup threads is then incremented, and the next test is run. When the script has completed running all backup tests, it sorts the results from “quickest backup” (i.e., the backup thread count requiring the least amount of time) to the slowest backup.

Test Results

I ran a series of three tests for each of the aforementioned environments for a total of six total test runs. Although there’s still quite a bit of variability between individual results within a backup thread series, some trends did appear to emerge.

Single-Server Farm

Backup Times for the Single-Server Environment

With the single-server environment, increasing the number of backup threads did appear to have a directional impact on performance. A single backup thread proved to be the slowest option for the farm backup, and “greater than one” thread resulted in better performance.

If you look at the average values, though, there wasn’t a tremendous difference between the slowest thread count (410 seconds for one thread) and the fastest (388 seconds for 10 threads). We’re only talking about a 5% to 6% difference overall. To truly find the optimum number of backup threads in an environment like this would require more than three test runs to account for standard deviation and establish significance.

Oh, and for those that might be wondering: I’m sure I introduced some of my own variability into the results. Although I didn’t do anything processor or disk intensive during the test runs, I didn’t go out of my way to minimize the impact of services, background operations, etc. To repeat: more testing (with better controls) would be needed for truly conclusive results. The only thing I started to show with this particular set of tests is that multithreading seemed to improve backup performance.

Multi-Server Farm

Things got quite a bit more interesting (to me) when I switched over to multi-server farm testing.

Backup Times for the Multi-Server Environment

In the multi-server environment, the average for using just one backup thread (1413 seconds) appeared to be significantly faster than the next best option (1747 seconds for seven backup threads) – in the neighborhood of 20% or so faster. Just like the single-server results, additional trials would be needed to completely validate the observations, but the results are less ambiguous (given the relatively greater precision of the samples) than with the single-server runs.

Do you find this surprising? Given my multi-server environment and what I know about it, I can’t really say that I was caught flat-footed by the results. Going into the tests, my hypothesis was that my backup destination location would likely be the “weak link” in my overall farm and backup topology. The SharePoint Server was doing well, the SQL Server was relatively robust … but all of that backup activity was hard on my (virtualized) file server. Multiple servers trying to write to the backup location were swamping it and the network, and adding additional backup threads to the mix didn’t end up helping or improving the overall backup process.

The Take-Away

At the end of the day, I recognize that these tests of mine didn’t prove anything conclusively. Frankly, conclusive proof wasn’t my goal. The intent of these experiments wasn’t to say “more threads are better” or “more threads are worse.”

The only point I’m making (I hope) by sharing these results is this: until you run some real tests of your own in your SharePoint environment, you really don’t know where your backup thread sweet spot is. You can try to guess it, but it’s just a guess. And guessing is really no better than simply leaving the backup thread count set to its default value of three.

References and Resources

  1. Wikipedia: Parallel Computing
  2. Wikipedia: Hyper-threading
  3. Wikipedia: Thread (computing) and Multithreading
  4. TechNet: Backup and recovery best practices (SharePoint Server 2010)

Finding a GUID in a SharePoint Haystack

In this post, I share a PowerShell script that I recently wrote to help out a friend. The script allows you to search and identify content items in SharePoint site collections by object ID (GUID). The script isn’t something you’d probably use every day, but it might be handy to keep in the script library “just in case.”

HaystackHere’s another blog post to file in the “I’ll probably never need it, but you never know” bucket of things you’ve seen from me.

Admittedly, I don’t get to spend as much time as I’d like playing with PowerShell and assembling scripts. So when the opportunity to whip-up a “quick hit” script comes along, I usually jump at it.

The Situation

I feel very fortunate to have made so many friends in the SharePoint community over the last several years. One friend who has been with me since the beginning (i.e., since my first presentation at the original SharePoint Saturday Ozarks) is Kirk Talbot. Kirk has become something of a “regular” on the SharePoint Saturday circuit, and many of you may have seen him at a SharePoint Saturday event someplace in the continental United States. To tell you the truth, I’ve seen Kirk as far north as Michigan and as far south as New Orleans. Yes, he really gets around.

Kirk and I keep up fairly regular correspondence, and he recently found himself in a situation where he needed to determine which objects (in a SharePoint site collection) were associated with a handful of GUIDs. Put a different way: Kirk had a GUID (for example, 89b66b71-afc8-463f-b5ed-9770168996a6) and wanted to know – was it a web? A list? A list item? And what was the identity of the item?

PowerShell to the Rescue

I pointed Kirk to a script I had previously written (in my Finding Duplicate GUIDs in Your SharePoint Site Collection post) and indicated that it could probably be adapted for his purpose. Kirk was up to the challenge, but like so many other SharePoint administrators was short on time.

I happened to find myself with a bit of free time in the last week and was due to run into Kirk at SharePoint Saturday Louisville last weekend, so I figured “what the heck?” I took a crack at modifying the script I had written earlier so that it might address Kirk’s need. By the time I was done, I had basically thrown out my original script and started over. So much for following my own advice.

The Script

The PowerShell script that follows is relatively straightforward in its operation. You supply it with a site collection URL and a target object GUID. The script then searches through the webs, lists/libraries, and list items of the site collection for an object with an ID that matches the GUID specified. If it finds a match, it reports some information about the matching object.

A sample run of the script appears below. In the case of this example, a list item match was found in the target site collection for the supplied GUID.

Sample Script Execution

 

This script leverages the SharePoint object model directly, so it can be used with either SharePoint 2007 or SharePoint 2010. Its search algorithm is relatively efficient, as well, so match results should be obtained in seconds to maybe minutes – not hours.

<#
.SYNOPSIS
   FindObjectByGuid.ps1
.DESCRIPTION
   This script attempts to locate a SharePoint object by its unique ID (GUID) within
   a site collection. The script first attempts to locate a match by examining webs;
   following webs, lists/libraries are examined. Finally, individual items within
   lists and libraries are examined. If an object with the ID is found, information 
   about the object is reported back.
.NOTES
   Author: Sean McDonough
   Last Revision: 27-July-2012
.PARAMETER SiteUrl
   The URL of the site collection that will be searched
.PARAMETER ObjectGuid
   The GUID that identifies the object to be located
.EXAMPLE
   FindObjectByGuid -SiteUrl http://mysitecollection.com -ObjectGuid 91ce5bbf-eebb-4988-9964-79905576969c
#>
param 
(
	[string]$SiteUrl = "$(Read-Host 'The URL of the site collection to search [e.g. http://mysitecollection.com]')",
	[Guid]$ObjectGuid = "$(Read-Host 'The GUID of the object you are trying to find [e.g. 91ce5bbf-eebb-4988-9964-79905576969c]')"
)

function FindObject($startingUrl, $targetGuid)
{
	# To work with SP2007, we need to go directly against the object model
	Add-Type -AssemblyName "Microsoft.SharePoint, Version=12.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c"

	# Grab the site collection and all webs associated with it to start
	$targetSite = New-Object Microsoft.SharePoint.SPSite($startingUrl)
	$matchObject = $false
	$itemsTotal = 0
	$listsTotal = 0
	$searchStart = Get-Date

	Clear-Host
	Write-Host ("INITIATING SEARCH FOR GUID: {0}" -f $targetGuid)

	# Step 1: see if we can find a matching web.
	$allWebs = $targetSite.AllWebs
	Write-Host ("`nPhase 1: Examining all webs ({0} total)" -f $allWebs.Count)
	foreach ($spWeb in $allWebs)
	{
		$listsTotal += $spWeb.Lists.Count
		if ($spWeb.ID -eq $targetGuid)
		{
			Write-Host "`nMATCH FOUND: Web"
			Write-Host ("- Web Title: {0}" -f $spWeb.Title)
			Write-Host ("-   Web URL: {0}" -f $spWeb.Url)
			$matchObject = $true
			break
		}
		$spWeb.Dispose()
	}
	
	# If we don't yet have match, we'll continue with list iteration
	if ($matchObject -eq $false)
	{
		Write-Host ("Phase 2: Examining all lists and libraries ({0} total)" -f $listsTotal)
		$allWebs = $targetSite.AllWebs
		foreach ($spWeb in $allWebs)
		{
			$allLists = $spWeb.Lists
			foreach ($spList in $allLists)
			{
				$itemsTotal += $spList.Items.Count
				if ($spList.ID -eq $targetGuid)
				{
					Write-Host "`nMATCH FOUND: List/Library"
					Write-Host ("-            List Title: {0}" -f $spList.Title)
					Write-Host ("- List Default View URL: {0}" -f $spList.DefaultViewUrl)
					Write-Host ("-      Parent Web Title: {0}" -f $spWeb.Title)
					Write-Host ("-        Parent Web URL: {0}" -f $spWeb.Url)
					$matchObject = $true
					break
				}
			}
			if ($matchObject -eq $true)
			{
				break
			}

		}
		$spWeb.Dispose()
	}
	
	# No match yet? Look at list items (which includes folders)
	if ($matchObject -eq $false)
	{
		Write-Host ("Phase 3: Examining all list and library items ({0} total)" -f $itemsTotal)
		$allWebs = $targetSite.AllWebs
		foreach ($spWeb in $allWebs)
		{
			$allLists = $spWeb.Lists
			foreach ($spList in $allLists)
			{
				try
				{ 
					$listItem = $spList.GetItemByUniqueId($targetGuid)
				}
				catch
				{
					$listItem = $null
				}
				if ($listItem -ne $null)
				{
					Write-Host "`nMATCH FOUND: List/Library Item"
					Write-Host ("-                    Item Name: {0}" -f $listItem.Name)
					Write-Host ("-                    Item Type: {0}" -f $listItem.FileSystemObjectType)
					Write-Host ("-       Site-Relative Item URL: {0}" -f $listItem.Url)
					Write-Host ("-            Parent List Title: {0}" -f $spList.Title)
					Write-Host ("- Parent List Default View URL: {0}" -f $spList.DefaultViewUrl)
					Write-Host ("-             Parent Web Title: {0}" -f $spWeb.Title)
					Write-Host ("-               Parent Web URL: {0}" -f $spWeb.Url)
					$matchObject = $true
					break
				}
			}
			if ($matchObject -eq $true)
			{
				break
			}

		}
		$spWeb.Dispose()
	}
	
	# No match yet? Too bad; we're done.
	if ($matchObject -eq $false)
	{
		Write-Host ("`nNO MATCH FOUND FOR GUID: {0}" -f $targetGuid)
	}
	
	# Dispose of the site collection
	$targetSite.Dispose()
	Write-Host ("`nTotal seconds to execute search: {0}`n" -f ((Get-Date) - $searchStart).TotalSeconds)
	
	# Abort script processing in the event an exception occurs.
	trap
	{
		Write-Warning "`n*** Script execution aborting. See below for problem encountered during execution. ***"
		$_.Message
		break
	}
}

# Launch script
FindObject $SiteUrl $ObjectGuid

Conclusion

Again, I don’t envision this being something that everyone needs. I want to share it anyway. One thing I learned with the “duplicate GUID” script I referenced earlier is that I generally underestimate the number of people who might find something like this useful.

Have fun with it, and please feel free to share your feedback!

Additional Reading and Resources

  1. Event: SharePoint Saturday Ozarks
  2. Twitter: Kirk Talbot (@kctElgin)
  3. Post: Finding Duplicate GUIDs in Your SharePoint Site Collection
  4. Event: SharePoint Saturday Louisville

Do You Know What’s Going to Happen When You Enable the SharePoint BLOB Cache?

The SharePoint BLOB Cache can be a very powerful tool for use in improving farm performance and scalability, but some planning should take place before the BLOB Cache is enabled. In this post, I explain how end users can suffer if BLOB Cache planning isn’t performed. I also make some recommendations on how to configure the BLOB Cache to provide administrators with performance benefits that don’t come at the cost of a negative end user experience.

The topic of the SharePoint BLOB Cache and how it operates jumped back into the front of my brain recently given some conversations I’ve had and things I’ve seen (e.g., a promising CodePlex project called the SharePoint 2010 BlobCache Manager).

SharePoint PSA

"Just Do It" Post-It NoteThis post is my way of doing something akin to a SharePoint public service announcement. I’ve recently seen some caching-related functionality and topics – especially the BLOB Cache – getting some real traction in different circles, and I think that the attention and love is generally a good thing. I am somewhat concerned, though, by the fact that the discussions and projects that have been surfacing don’t seem to say much beyond the Post-It on the right.

What do I mean by “Just do-it?” Well, here’s the high-level summary of what I’ve been seeing people say, post, and practice with the SharePoint BLOB Cache:

  • The SharePoint BLOB Cache can lighten the load on your SQL Servers by caching BLOB (binary large object) data such as images, video, audio, CSS, etc., on your web front-ends (WFEs)
  • BLOB assets are then served directly from the WFEs. This prevents regular round trips from the WFEs to SQL Servers for every BLOB item needed, and this conserves network bandwidth and reduces SQL Server load.
  • To realize the benefits of the BLOB Cache, simply turn it on and you’re good to go. Nothing to it!

To be fair, I think that I’ve done a disservice by contributing to the perception that all you need to do to kick-start BLOB caching is change this web.config line …

<BlobCache location="C:\BlobCache\14" path="\.(gif|jpg|jpeg|jpe|jfif|bmp|dib|tif|tiff|ico|png|wdp|hdp|css|js|asf|avi|flv|m4v|mov|mp3|mp4|mpeg|mpg|rm|rmvb|wma|wmv)$" maxSize="10" enabled="false" />

… to this:

<BlobCache location="C:\BlobCache\14" path="\.(gif|jpg|jpeg|jpe|jfif|bmp|dib|tif|tiff|ico|png|wdp|hdp|css|js|asf|avi|flv|m4v|mov|mp3|mp4|mpeg|mpg|rm|rmvb|wma|wmv)$" maxSize="10" enabled="true" />

If you look closely, you’ll see that the only difference between the two XML elements is that the enabled attribute is changed from false to true in the second example.

As you might have guessed, I wouldn’t be writing this blog post if simply changing the BlobCache element’s enabled attribute to true didn’t cause potential problems.

The Small Print

Disclaimer text that includes some BLOB cache usage warningsAt the recent SPTechCon in San Francisco, I gave a five-minute lightning talk called Pushing SharePoint’s ‘Go Faster’ Button. It was a lighthearted look at SharePoint performance, and it focused on a couple of caching changes that could be easily implemented to improve SharePoint performance. One of the recommended changes was (surprise surprise) to simply “turn on” SharePoint’s BLOB Cache.

I only had five minutes to deliver the lightning talk, so I had to cram all of the disclaimers for what I was recommending into the legal style slide that appears on the left. Although the slide got a chuckle from the crowd (the print did look pretty small on-screen), I actually did invest some time in its warnings and watch-outs for anyone who wanted to go and dig them up later.

Of the two tips I delivered in the lightning talk, Tip #2 dealt with the SharePoint BLOB cache. I included a very specific warning in the “Disclaimer of Liability” aimed at those who sought to simply “set it and forget it.” The text of that warning read:

Failure to specify a max-age attribute in the BlobCache element of the web.config will result in the default value of 86,400 seconds (24 hours) being used. Use of a non-zero max-age attribute will result in the attachment of client-side cacheability headers to assets that are being BLOB cached, and such headers can result in BLOB assets being cached on the client beyond the duration of the current user session; such caching can easily result in "stale" BLOB resources being used from the client rather than newer ones being fetched from the WFE, so adjust max-age values carefully.

Put another way: if you simply enable the BLOB cache and do nothing else, your users may be getting a SharePoint behavior change that you hadn’t intended for them to have.

Why Did You Have To Bring Age Into This?

The sticking point with SharePoint’s default BlobCache element and attribute settings is that a max-age of 24 hours is assumed and used when the max-age attribute isn’t explicitly specified or set. What does that mean? I wrote a separate post a while back titled Client-Server Interactions and the max-age Attribute with SharePoint BLOB Caching, and that post addressed the effect that explicit and implicit max-age attribute value specifications have on BLOB Caching. I recommend checking out the post for the full background; for anyone who needs a quick summary, though, I can distill it down to two bullet points:

  • Enabling the BLOB Cache without specifying a max-age attribute means that BLOBs will be cached on both the WFEs in your farm and within users’ browser caches (through the use of Cache-Control HTTP headers).
  • In collaboration environments and anyplace else where BLOB assets may be edited or turn over frequently (within the course of a day), the default client-side caching behavior can mess with the UI/UX of your SharePoint site in all sorts of interesting ways.

What does this mean for the average user of SharePoint? Well, let me walk through a fictitious scenario with supporting detail – as told from the perspective of a SharePoint end user. If you already understand the problem, you’re short on time, and you want to get right to what I recommend, jump down to the “Recommendations Before You Enable the BLOB Cache” section.

Acme Online Goes Live!

Welcome to the Acme Corporation! The Acme Corporation recently completed a “webification” of its entire product catalog, and the end result is a publishing site collection that is implemented in SharePoint 2010. The site collection houses all of Acme’s products, and those products are available for the public to browse and order. Acme’s web content management team is responsible for maintaining the product catalog as it appears on the site, and that team is led by a crafty old fellow named Wile E. Coyote (who we’ll simply refer to as “Wiley” from here on out).

Wiley has many years of experience with Acme’s products and has tried nearly all of them personally; he’s something of a legend. He and his team worked diligently to get Acme’s products into SharePoint before the launch. Not all of the products made it into SharePoint before the launch, though, so a phased approach was taken to rolling out the entire catalog.

The Launch

A SharePoint article page featuring a bundle of dynamiteThe first products that Wiley and his team worked to get into SharePoint were Acme’s line of explosives. To prepare for the launch of the new online catalog, Wiley wrote up an article on Acme’s top-selling “Bundle o’ Dynamite” product. The article featured a picture of the Bundle o’ Dynamite, along with some descriptive text about the product, how it operates, a few safety warnings, and a couple of other informational points. When Wiley finished, a mockup of the article page looked like the screenshot seen on the left.

A Fiddler trace of the first request for the dynamite article pageUnbeknownst to Wiley, the Acme product catalog site collection is served-up by one Web application through one zone (the Default zone) on one WFE. This means that all product catalog requests, whether they come from customers or Wiley’s team, go to one IIS site on one server. The first time that someone (or more specifically, someone’s browser) requests the article page that Wiley put together, a series of web requests are kicked-off to pull down the page content, images, scripts, CSS, and everything else needed to render the page in a browser. This series of interactions (captured using Fiddler) is shown on the top right.

A Fiddler trace of the second request for the dynamite article pageSubsequent requests for the same article page (within the context of a single browser session) will follow the series of interactions seen directly to the right. One thing that you may notice upon inspecting the Fiddler trace is that subsequent page requests result in fewer calls back to the server. This is because SharePoint applies per session caching to many of the items it passes back to the browser, and this caching (which is not the same as BLOB caching) removes the need for constant re-fetching of items that haven’t changed.

In both of the Fiddler traces above, the focus is on the newsarticleimage.jpg file  – the file which houses a picture of the Bundle o’ Dynamite. The first time the browser requests the image within a session, a successful HTTP 200 response is returned to the browser along with the image. Also important to note is the Cache-Control header that comes back with the image:

Cache-Control: private,max-age=0

The private part of the Cache-Control header tells the client browser to cache the image locally for the duration of the browser session. The max-age=0 portion says, in effect, that subsequent uses of the image by the browser (from its cache) should be validated with a call back to the WFE to ensure that the image hasn’t changed.

And that’s what is shown happening in the second Fiddler trace. When subsequent page requests attempt to use the image, a GET request from the browser is answered by the WFE with

HTTP/1.1 304 NOT MODIFIED

This response code tells the browser that the image hasn’t changed and that it’s safe to use the locally cached copy. If the image were to change, then an HTTP 200 would be returned instead and the new/updated version of the image would be sent to the browser.

When the browser is closed, the locally cached copy of the image is flushed and the process begins anew the next time the browser opens.

Meep Meep

Not long after the launch of Acme’s online product catalog, customers began complaining that browsing the catalog was simply too slow. After some discussion, Management decided to bring in Roadrunner Consulting to assess the site and make suggestions that would improve performance.

Roadrunner’s team raced around (as they are wont to do), ran some tests, made some observations, and provided a list of suggestions. At the top of the list was “Implement SharePoint BLOB Caching.”

So, Acme’s SharePoint administrators jumped right in and turned on BLOB caching. Since the site is served up through a single IIS site (SharePoint zone), the admins set enabled=“true” in the BlobCache element of the site’s web.config file. No other changes were made to the BlobCache element.

So, what happened? Well, things got snappier! The administrators watching their back-end performance noticed that the file system on the WFE started to cache BLOBs that were being requested by users. Each request to the WFE for one of those BLOBs resulted in the BLOB being served back directly from the WFE without a round-trip to the SQL Server. Internal network bandwidth utilization dropped significantly, and the SQL Servers started breathing a bit easier. The administrators were most definitely happy with the change they’d made … and it was as easy as setting enabled=”true” in the BlobCache element of the web.config file. Talk about the greatest thing since sliced bread! Everyone exchanged a round of high-fives after the change was made, and talks of how the geeks would rise up to dominate the world resumed.

Dynamite Article Page - First Request with BLOB Caching enabledSo, how do things look on the client side after enabling the BLOB Cache? Well, when someone goes to retrieve Wiley’s article for the first time, the first browser request series for the page looks much like it did without the BLOB Cache enabled. See the Fiddler trace on the right.

There is one very important difference when retrieving items with the BLOB Cache enabled, though, and you have to look closely to see it. Do you see the Cache-Control HTTP header that is returned with the request for the newsarticleimage.jpg image? It’s different than it was before the BLOB Cache was enabled. Now it says

Cache-Control: public, max-age=86400

Whoa … what does this mean? Well, it means two important things. First, the public designation means that when the image is cached by the browser, it will no longer be private to the current session. It can be re-used across sessions, so it won’t necessarily “go away” when the browser is closed.

Second, the max-age=86400 means that the image will continue to “live” in the browser’s cache for 86400 seconds, or 24 hours. For that period of time, the browser won’t even attempt to contact the WFE to see if the image has changed; it will just use the copy that it holds onto. Nothing short of a browser cache flush (which is manual intervention by the user) will change this behavior.

Dynamite Article Page - Subsequent page requests with BLOB Caching enabledAnd that’s what we see with the Fiddler trace on the right. This trace represents what subsequent page requests look like for the next 24 hours. Notice that the newsarticleimage.jpg image doesn’t get re-requested or checked. There are no HTTP 304 response codes coming back, because the browser simply isn’t requesting the image; it’s using its cached copy.

Admittedly, the Fiddler trace will look a little different when the browser is closed and re-opened … but a re-fetch of the newsarticleimage.jpg file will not take place for a full 24 hours unless a user clears the browser cache.

What does this change in behavior mean for actual users of the site? Read on to find out …

Running Off the Edge of the Cliff

The corrected article page showing the TNT barrelShortly after the BLOB Cache changes were made, Wiley got an (unrelated) call from the Fulfillment Department. They were furious because they’d been getting all sorts of returns for the Bundle o’ Dynamite. The reason for the returns? It’s because Wiley put the wrong image in his article page!

Even though Acme sells a product called the “Bundle o’ Dynamite,” the actual product that ships is a barrel of TNT. Since the product image was wrong, customers were incorrectly concluding that they’d get several sticks of dynamite instead of a barrel, and this was rubbing many of them the wrong way. Who knew?

Wiley went out to SharePoint, checked the article that he wrote, and saw that he did indeed use a series of dynamite sticks for an image. The page should have actually appeared as it does in the screenshot that is above and to the left. After a quick facepalm, Wiley realized that he needed to make a change – and fast.

Wiley went out to the Publishing Images library for the site collection and uploaded a new version of the newsarticleimage.jpg image file – one that contained a barrel of TNT instead of a bundle of dynamite. He then browsed to the article page and did a refresh.

Nothing changed.

Wiley hit F5 in his browser. Still nothing changed.

Over the course of the hour that followed, Wiley grew increasingly more bewildered and panicked as he tried in vain to get the new TNT barrel to show up on the article page. He uploaded the image several more times, closed and re-opened his browser, deleted and then reloaded the image, re-published and re-approved the actual article page, and even got the administrators to flush the SharePoint BLOB Cache. None of the actions made a difference.

The Coyote Never Wins

Why didn’t any of Wiley’s efforts make a difference? Because what Wiley didn’t understand was that there was nothing he could do short of flushing his cache that would prompt the browser to re-request the updated image. The browser started using the cached copy of the image after the first request Wiley made in the morning; i.e., the request to verify that the image on the page was incorrect as Fulfillment indicated. For another 24 hours (86400 seconds), the browser would continue to use the cached image.

Wiley’s image problem was just one of the potential issues that might surface as a result of the BLOB Cache change. It was also one of the more visible problems. In looking at the path attribute of the BlobCache element, you might have noticed some of the other file types that got cached by default – file types with js (JavaScript) and css (Cascading Style Sheets) extensions, for example. Any of those file types which were served from site collection lists and libraries would also be impacted by the “fetch once and use for 24 hours” behavior.

Recommendations Before You Enable the BLOB Cache

A frustrated end userI hope the example featuring Wiley did an adequate job of explaining why I think that blindly turning on the BLOB Cache can be a bad thing for end users. Having seen first-hand what an improperly configured BLOB Cache can do to the user experience, I’d like to offer up a handful of suggestions based on my own experience.

1. Don’t just “enable” the BLOB Cache with its out-of-the-box (OOTB) default settings. There are a couple of OOTB settings that you should really think hard about changing. I mentioned the default max-age value you get if you don’t actually specify the attribute value. I’m going to talk more about that one in a bit. Also: do you really want the BLOB Cache using your system drive (C:) as its target location for cached files? Most admins I know aren’t particularly friendly with that idea, so relocate the BLOB Cache to another drive.

2. If your Web application has only one zone (i.e., the Default zone), strongly consider specifying a max-age attribute value of zero (max-age=”0”). Why do I say this? Because it avoids the situation I described with Wiley above, and it’s a compromise that gives administrators some of the performance boosts they seek without completely shafting users in the process.

Dynamite Article Page - max-age = 0 in effectWhen the BLOB Cache is enabled and a max-age attribute value of 0 is explicitly specified, things change a bit. BLOB caching and offloading still happens on the WFEs, so administrators get the internal performance boosts they were probably seeking in the first place. On the other side of the equation (i.e., the “user side”), persistent client side caching ceases as shown on the left. Although the Cache-Control header still specifies public cacheability, the max-age=0 ensures that the browser will round-trip to the server each time it intends to use a locally cached resource to ensure that the most up-to-date copy of the resource is in the cache. This will keep users like Wiley from going off the deep end due to the wonky and inconsistent user experience that afflicts users who need to edit and proof a site that employs persistent client-side caching.

3. If you have a Web application that is extended to two or more zones, apply BLOB Cache settings that are appropriate for each zone. This is relatively common in public-facing SharePoint site collections and Web applications where anonymous access is in-use. In these particular scenarios, there are usually at least two SharePoint zones per Web application: an internal zone (typically the Default zone) through which editors and other users may authenticate to carry out content work, and an external zone (e.g., the Internet zone) which is set up for anonymous access and “external consumption.”

In this dual-zone scenario, it makes sense to configure each zone (IIS site) differently since usage patterns differ between zones. The BlobCache element in the web.config for the internal (Default) zone, for example, should probably be configured according to #2 (above – the one zone scenario with a max-age attribute value of zero). For the web.config that is used in the external zone, though, it may make sense to apply a non-zero max-age value for use with the BLOB Cache – especially since anonymous users aren’t (normally) content editors. A non-zero max-age means fewer trips (overall) to your WFEs from outside the LAN environment, and this helps to keep bandwidth utilization on your Internet connection. There is still a risk that external users may see “stale” content, but the impact is generally more acceptable for straight viewers since they aren’t actively working on content.

4. Consider changing the path expression to restrict what goes into the BLOB Cache. The default path expression for SharePoint 2010’s BlobCache element looks like this:

\.(gif|jpg|jpeg|jpe|jfif|bmp|dib|tif|tiff|ico|png|wdp|hdp|css|js|asf|avi|flv|m4v|mov|mp3|mp4|mpeg|mpg|rm|rmvb|wma|wmv)$

Most administrators are savvy enough to add and remove file extensions from this expression as needed; for example, taking |wmv out of the path expression means that the BLOB Cache will no longer store and serve files with a .wmv extension. Adding and removing extensions really only scratches the surface of what can be done, though. The path attribute value is actually a regular expression, so the full power of regular expressions can be applied to select and exclude files for use with the BLOB Cache.

Suppose you want to explicitly control which images, videos, and other files (that match the list of extensions) end up in the BLOB Cache? Maybe you want to specially name files you intend to cache with an additional .cache extension before the actual file type extension (e.g., .gif). To accomplish this, you could change the path expression to this:

\.cache\.(gif|jpg|jpeg|jpe|jfif|bmp|dib|tif|tiff|ico|png|wdp|hdp|css|js|asf|avi|flv|m4v|mov|mp3|mp4|mpeg|mpg|rm|rmvb|wma|wmv)$

With this path expression, filenames like these would be included in the BLOB Cache:

  • SampleImage.cache.jpg
  • MyVideo.cache.wmv

… but anything without the additional .cache qualifier would get omitted, such as:

  • AnotherImage.jpg
  • ExcludeThisVideo.wmv

This is just a simple example, but hopefully it gives you an idea of what you could do with the path regular expression to control the contents of the BLOB Cache.

Summing It Up

The SharePoint BLOB Cache is a powerful mechanism to improve farm performance and scalability, but it shouldn’t be turned on without some forethought and a couple of changes to the default BlobCache element attribute values.

If you are an administrator and have enabled the BLOB Cache with its default values, check with your users. They might have some feedback for you …

Additional Reading and Resources

  1. CodePlex: SharePoint 2010 BlobCache Manager
  2. Event: SPTechCon San Francisco 2012
  3. Prezi: Pushing SharePoint’s ‘Go Faster’ Button
  4. Blog Post: Client-Server Interactions and the max-age Attribute with SharePoint BLOB Caching
  5. Tool: Fiddler Web Debugging Proxy

Mirror, Mirror, In the Farm …

SQL Server mirroring support is a welcome addition to SharePoint 2010. Although SharePoint 2010 makes use of the Failover Partner keyword in its connection strings, SharePoint itself doesn’t appear to know whether or not SQL Server has failed-over for any given database. This post explores this topic in more depth and provides a PowerShell script to dump a farm’s mirroring configuration.

This is a post I’ve been meaning to write for some time, but I’m only now getting around to it. It’s a quick one, and it’s intended to share a couple of observations and a script that may be of use to those of you who are SharePoint 2010 administrators.

Mirroring and SharePoint

The use of SQL Server mirroring isn’t something that’s unique to SharePoint, and it was possible to leverage mirroring with SharePoint 2007 … though I tended to steer people away from trying it unless they had a very specific reason for doing so and no other approach would work. There were simply too many hoops you needed to jump through in order to get mirroring to work with SharePoint 2007, primarily because SharePoint 2007 wasn’t mirroring-aware. Even if you got it working, it was … finicky.

SharePoint 2010, on the other hand, is fully mirroring-aware through the use of the Failover Partner keyword in connection strings used by SharePoint to connect to its databases.

(Side note: if you aren’t familiar with the Failover Partner keyword, here’s an excellent breakdown by Michael Aspengren on how the SQL Server Native Provider leverages it in mirroring configurations.)

There are plenty of blog posts, articles (like this one from TechNet), and books (like the SharePoint 2010 Disaster Recovery Guide that John Ferringer and I wrote) that talk about how to configure mirroring. It’s not particularly tough to do, and it can really help you in situations where you need a SQL Server-based high availability and/or remote redundancy solution for SharePoint databases.

This isn’t a blog post about setting up mirroring; rather, it’s a post to share some of what I’ve learned (or think I’ve learned) and related “ah-ha” moments when it comes to mirroring.

What Are You Pointing At?

This all started when Jay Strickland (one of the Quality Assurance (QA) folks on my team at Idera) ran into some problems with one of our SharePoint 2010 farms that was used for QA purposes. The farm contained two SQL Server instances, and the database instances were setup such that the databases on the second instance mirrored the databases on the first (principal) instance. Jay had configured SharePoint’s service applications and Web applications for mirroring, so all was good.

But not really. The farm had been running properly for quite some time, but something had gone wrong with the farm’s mirroring configuration – or so it seemed. That’s when Jay pinged me on Skype one day with a question (which I’m paraphrasing here):

Is there any way to tell (from within SharePoint) which SQL Server instance is in-use by SharePoint at any given time for a database that is being mirrored?

It seemed like a simple question that should have a simple answer, but I was at a loss to give Jay anything usable off the top of my head. I told Jay that I’d get back to him and started doing some digging.

The SPDatabase Type

Putting on my developer hat for a second, I recalled that all SharePoint databases are represented by an instance of the SPDatabase type (Microsoft.SharePoint.Administration.Database specifically) or one of the other classes that derive from it, such as SPContentDatabase. Running down the available members for the SPDatabase type, I came up with the following properties and methods that were tied to mirroring in some way:

  • FailoverServer
  • FailoverServiceInstance
  • AddFailoverServiceInstance()

What I thought I would find (but didn’t) was one or more properties and/or methods that would allow me to determine which SQL Server instance was serving as the active connection point for SharePoint requests.

In fact, the more digging that I did, the more that it appeared that SharePoint had no real knowledge of where it was actually connecting to for data in mirrored setups. It was easy enough to specify which database instances should be used for mirroring configurations, but there didn’t appear to be any way to determine (from within SharePoint) if the principal was in-use or if failover to the mirrored instance had taken place.

The Key Takeaway

If you’re familiar with SQL Server mirroring and how it’s implemented, then the following diagram (which I put together for discussion) probably looks familiar:

SharePoint connecting to mirrored database

This diagram illustrates a couple of key points:

  1. SharePoint connects to SQL Server databases using the SQL Server Native Client
  2. SharePoint supplies a connection string that tells the native client which SQL Server instances (as Data Source and Failover Partner) should be used as part of a mirroring configuration.
  3. It’s the SQL Server Native Client that actually determines where connections are made, and the results of the Client’s decisions don’t directly surface through SharePoint.
    Number 3 was the point that I kept getting stuck on. I knew that it was possible to go into SQL Server Management Studio or use SQL Server’s Management Objects (SMO) directly to gain more insight around a mirroring configuration and what was happening in real-time, but I thought that SharePoint must surely surface that information in some form.

Apparently not.

Checking with the Experts

I hate when I can’t nail down a definitive answer. Despite all my reading, I wanted to bounce the conclusions I was drawing off of a few people to make sure I wasn’t missing something obvious (or hidden) with my interpretation.

  • I shot Bill Baer (Senior Technical Product Manager for SharePoint and an MCM) a note with my question about information surfacing through SharePoint. If anyone could have given me a definitive answer, it would have been him. Unfortunately, I didn’t hear back from him. In his defense, he’s pretty doggone busy.
  • I put a shout out on Twitter, and I did hear back from my good friend Todd Klindt. While he couldn’t claim with absolute certainty that my understanding was on the mark, he did indicate that my understanding was in-line with everything he’d read and conclusions he had drawn.
  • I turned to Enrique Lima, another good friend and SQL Server MCM, with my question. Enrique confirmed that SQL SMO would provide some answers, but he didn’t have additional thoughts on how that information might surface through SharePoint.

Long and short: I didn’t receive rock-solid confirmation on my conclusions, but my understanding appeared to be on-the-mark. If anyone knows otherwise, though, I’d love to hear about it (and share the information here – with proper recognition for the source, of course!)

Back to the Farm

In the end, I wasn’t really able to give Jay much help with the QA farm that he was trying to diagnose. Since I couldn’t determine where SharePoint was pointing from within SharePoint itself, I did the next best thing: I threw together a PowerShell script that would dump the (mirroring) configuration for each database in the SharePoint farm.

<#
.SYNOPSIS
   SPDBMirrorInfo.ps1
.DESCRIPTION
   Examines each of the databases in the SharePoint environment to identify which have failover partners and which don't.
.NOTES
   Author: Sean McDonough
   Last Revision: 19-August-2011
#>
function DumpMirroringInfo ()
{
	# Make sure we have the required SharePoint snap-in loaded.
	$spCmdlets = Get-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction silentlycontinue
	if ($spCmdlets -eq $Null)
	{ Add-PSSnapin Microsoft.SharePoint.PowerShell }

	# Grab databases and determine which have failover support (and which don't)
	$allDatabases = Get-SPDatabase
	$dbsWithoutFailover = $allDatabases | Where-Object {$_.FailoverServer -eq $null} | Sort-Object -Property Name
	$dbsWithFailover = $allDatabases | Where-Object {$_.FailoverServer -ne $null} | Sort-Object -Property Name
	
	# Write out unmirrored databases
	if ($dbsWithoutFailover -eq $null)
	{ Write-Host "`n`nNo databases are configured without a mirroring partner." }
	else
	{ 
		Write-Host ("`n`nDatabases without a mirroring partner: {0}" -f $dbsWithoutFailover.Count) 
		$dbsWithoutFailover | Format-Table -Property Name, Server -AutoSize 
	}

	# Dump results for mirrored databases
	if ($dbsWithFailover -eq $null)
	{ Write-Host "`nNo databases are configured with a mirroring partner." }
	else
	{ 
		Write-Host ("`nDatabases with a mirroring partner: {0}" -f $dbsWithFailover.Count) 
		$dbsWithFailover | Format-Table -Property Name, Server, FailoverServer -AutoSize
	}
	
	# For ease of reading
	Write-Host ("`n`n")
}
DumpMirroringInfo

The script itself isn’t rocket science, but it did actually prove helpful in identifying some databases that had apparently “lost” their failover partners.

Additional Reading and Resources

  1. MSDN: Using Database Mirroring
  2. Whitepaper: Using database mirroring (Office SharePoint Server)
  3. Blog Post: Clarification on the Failover Partner in the connectionstring in Database Mirror setup
  4. TechNet: Configure availability by using SQL Server database mirroring (SharePoint Server 2010)
  5. Book: The SharePoint 2010 Disaster Recovery Guide
  6. Blog: John Ferringer’s “My Central Admin”
  7. Blog: Jay Strickland’s “Slinger’s Thoughts
  8. Company: Idera
  9. MSDN: SPDatabase members
  10. MSDN: SQL Server Management Objects (SMO)
  11. Blog: Bill Baer
  12. Blog: Todd Klindt’s SharePoint Admin Blog
  13. Blog: Enrique Lima’s Intentional Thinking

Finding Duplicate GUIDs in Your SharePoint Site Collection

In this self-described “blog post you should never need,” I talk about finding objects with duplicate GUIDs in a client’s SharePoint site collection. I supply the PowerShell script used to find the duplicate GUIDs and offer some suggestions for how you might remedy such a situation.

This is a bit of an oldie, but I figured it might help one or two random readers.

Let me start by saying something right off the bat: you should never need what I’m about to share.  Of course, how many times have you heard “you shouldn’t ever really need this” when it comes to SharePoint?  I’ve been at it a while, and I can tell you that things that never should happen seem to find a way into reality – and into my crosshairs for troubleshooting.

Disclaimer

The story and situation I’m about to share is true.  I’m going to speak in generalities when it comes to the identities of the parties and software involved, though, to “protect the innocent” and avoid upsetting anyone.

The Predicament

I was part of a team that was working with a client to troubleshoot problems that the client was encountering when they attempted to run some software that targeted SharePoint site collections.  The errors that were returned by the software were somewhat cryptic, but they pointed to a problem handling certain objects in a SharePoint site collection.  The software ran fine when targeting all other site collections, so we naturally suspected that something was wrong with only one specific site collection.

After further examination of logs that were tied to the software, it became clear that we had a real predicament.  Apparently, the site collection in question contained two or more objects with the same identity; that is, the objects had ID properties possessing the same GUID.  This isn’t anything that should ever happen, but it had.  SharePoint continued to run without issue (interestingly enough), but the duplication of object GUIDs made it downright difficult for any software that depended on unique object identities being … well, unique.

Although the software logs told us which GUID was being duplicated, we didn’t know which SharePoint object or objects the GUID was tied to.  We needed a relatively quick and easy way to figure out the name(s) of the object or objects which were being impacted by the duplicate GUIDs.

Tackling the Problem

It is precisely in times like those described that PowerShell comes to mind.

My solution was to whip-up a PowerShell script (FindDuplicateGuids.ps1) that processed each of the lists (SPList) and webs (SPWeb) in a target site collection.  The script simply collected the identities of each list and web and reported back any GUIDs that appeared more than once.

The script created works with both SharePoint 2007 and SharePoint 2010, and it has no specific dependencies beyond SharePoint being installed and available on the server where the script is run.

########################
# FindDuplicateGuids.ps1
# Author: Sean P. McDonough (sean@sharepointinterface.com)
# Blog: http://SharePointInterface.com
# Last Update: August 29, 2013
#
# Usage from prompt: ".\FindDuplicateGuids.ps1 <siteUrl>"
#   where <siteUrl> is site collection root.
########################


#########
# IMPORTS
# Import/load common SharePoint assemblies that house the types we'll need for operations.
#########
Add-Type -AssemblyName "Microsoft.SharePoint, Version=12.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c"


###########
# FUNCTIONS
# Leveraged throughout the script for one or more calls.
###########
function SpmBuild-WebAndListIdMappings {param ($siteUrl)
	$targetSite = New-Object Microsoft.SharePoint.SPSite($siteUrl)
	$allWebs = $targetSite.AllWebs
	$mappings = New-Object System.Collections.Specialized.NameValueCollection
	foreach ($spWeb in $allWebs)
	{
		$webTitle = "WEB '{0}'" -f $spWeb.Title
		$mappings.Add($spWeb.ID, $webTitle)
		$allListsForWeb = $spWeb.Lists
		foreach ($currentList in $allListsForWeb)
		{
			$listEntry = "LIST '{0}' in Web '{1}'" -f $currentList.Title, $spWeb.Title
			$mappings.Add($currentList.ID, $listEntry)
		}
		$spWeb.Dispose()
	}
	$targetSite.Dispose()
	return ,$mappings
}

function SpmFind-DuplicateMembers {param ([System.Collections.Specialized.NameValueCollection]$nvMappings)
	$duplicateMembers = New-Object System.Collections.ArrayList
	$allkeys = $nvMappings.AllKeys
	foreach ($keyName in $allKeys)
	{
		$valuesForKey = $nvMappings.GetValues($keyName)
		if ($valuesForKey.Length -gt 1)
		{
			[void]$duplicateMembers.Add($keyName)
		}
	}
	return ,$duplicateMembers
}


########
# SCRIPT
# Execution of actual script logic begins here
########
$siteUrl = $Args[0]
if ($siteUrl -eq $null)
{
	$siteUrl = Read-Host "`nYou must supply a site collection URL to execute the script"
}
if ($siteUrl.EndsWith("/") -eq $false)
{
	$siteUrl += "/"
}
Clear-Host
Write-Output ("Examining " + $siteUrl + " ...`n")
$combinedMappings = SpmBuild-WebAndListIdMappings $siteUrl
Write-Output ($combinedMappings.Count.ToString() + " GUIDs processed.")
Write-Output ("Looking for duplicate GUIDs ...`n")
$duplicateGuids = SpmFind-DuplicateMembers $combinedMappings
if ($duplicateGuids.Count -eq 0)
{
	Write-Output ("No duplicate GUIDs found.")
}
else
{
	Write-Output ($duplicateGuids.Count.ToString() + " duplicate GUID(s) found.")
	Write-Output ("Non-unique GUIDs and associated objects appear below.`n")
	foreach ($keyName in $duplicateGuids)
	{
		$siteNames = $combinedMappings[$keyName]
		Write-Output($keyName + ": " + $siteNames)
	}
}
$dumpData = Read-Host "`nDo you want to send the collected data to a file? (Y/N)"
if ($dumpData -match "y")
{
	$fileName = Read-Host "  Output file path and name"
	Write-Output ("Results for " + $siteUrl) | Out-File -FilePath $fileName
	$allKeys = $combinedMappings.AllKeys
	foreach ($currentKey in $allKeys)
	{
		Write-Output ($currentKey + ": " + $combinedMappings[$currentKey]) | Out-File -FilePath $fileName -Append
	}
}
Write-Output ("`n")

Running this script in the client’s environment quickly identified the two lists that contained the same ID GUIDs.  How did they get that way?  I honestly don’t know, nor am I going to hazard a guess …

What Next?

If you’re in the unfortunate position of owning a site collection that contains objects possessing duplicate ID GUIDs, let me start by saying “I feel for you.”

Having said that: the quickest fix seemed to be deleting the objects that possessed the same GUIDs.  Those objects were then rebuilt.  I believe we handled the delete and rebuild manually, but there’s nothing to say that an export and subsequent import (via the Content Deployment API) couldn’t be used to get content out and then back in with new object IDs. 

A word of caution: if you do leverage the Content Deployment API and do so programmatically, simply make sure that object identities aren’t retained on import; that is, make sure that SPImportSettings.RetainObjectIdentity = false – not true.

Additional Reading and References

  1. TechNet: Import and export: STSADM operations
  2. MSDN: SPImportSettings.RetainObjectIdentity