Finding Duplicate GUIDs in Your SharePoint Site Collection

In this self-described “blog post you should never need,” I talk about finding objects with duplicate GUIDs in a client’s SharePoint site collection. I supply the PowerShell script used to find the duplicate GUIDs and offer some suggestions for how you might remedy such a situation.

This is a bit of an oldie, but I figured it might help one or two random readers.

Let me start by saying something right off the bat: you should never need what I’m about to share.  Of course, how many times have you heard “you shouldn’t ever really need this” when it comes to SharePoint?  I’ve been at it a while, and I can tell you that things that never should happen seem to find a way into reality – and into my crosshairs for troubleshooting.

Disclaimer

The story and situation I’m about to share is true.  I’m going to speak in generalities when it comes to the identities of the parties and software involved, though, to “protect the innocent” and avoid upsetting anyone.

The Predicament

I was part of a team that was working with a client to troubleshoot problems that the client was encountering when they attempted to run some software that targeted SharePoint site collections.  The errors that were returned by the software were somewhat cryptic, but they pointed to a problem handling certain objects in a SharePoint site collection.  The software ran fine when targeting all other site collections, so we naturally suspected that something was wrong with only one specific site collection.

After further examination of logs that were tied to the software, it became clear that we had a real predicament.  Apparently, the site collection in question contained two or more objects with the same identity; that is, the objects had ID properties possessing the same GUID.  This isn’t anything that should ever happen, but it had.  SharePoint continued to run without issue (interestingly enough), but the duplication of object GUIDs made it downright difficult for any software that depended on unique object identities being … well, unique.

Although the software logs told us which GUID was being duplicated, we didn’t know which SharePoint object or objects the GUID was tied to.  We needed a relatively quick and easy way to figure out the name(s) of the object or objects which were being impacted by the duplicate GUIDs.

Tackling the Problem

It is precisely in times like those described that PowerShell comes to mind.

My solution was to whip up a PowerShell script (FindDuplicateGuids.ps1) that processed each of the lists (SPList) and webs (SPWeb) in a target site collection.  The script simply collected the identities of each list and web and reported back any GUIDs that appeared more than once.

The script works with both SharePoint 2007 and SharePoint 2010, and it has no specific dependencies beyond SharePoint being installed and available on the server where the script is run.

########################
# FindDuplicateGuids.ps1
# Author: Sean P. McDonough (sean@sharepointinterface.com)
# Blog: http://SharePointInterface.com
# Last Update: August 29, 2013
#
# Usage from prompt: ".\FindDuplicateGuids.ps1 <siteUrl>"
#   where <siteUrl> is site collection root.
########################


#########
# IMPORTS
# Import/load common SharePoint assemblies that house the types we'll need for operations.
#########
Add-Type -AssemblyName "Microsoft.SharePoint, Version=12.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c"


###########
# FUNCTIONS
# Leveraged throughout the script for one or more calls.
###########
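function SpmBuild-WebAndListIdMappings {param ($siteUrl)
	# Walk every web and list in the site collection and record ID -> description.
	# A NameValueCollection keeps every value added under the same key, so a
	# duplicated GUID ends up with more than one description attached to it.
	$targetSite = New-Object Microsoft.SharePoint.SPSite($siteUrl)
	$allWebs = $targetSite.AllWebs
	$mappings = New-Object System.Collections.Specialized.NameValueCollection
	foreach ($spWeb in $allWebs)
	{
		$webTitle = "WEB '{0}'" -f $spWeb.Title
		$mappings.Add($spWeb.ID, $webTitle)
		$allListsForWeb = $spWeb.Lists
		foreach ($currentList in $allListsForWeb)
		{
			$listEntry = "LIST '{0}' in Web '{1}'" -f $currentList.Title, $spWeb.Title
			$mappings.Add($currentList.ID, $listEntry)
		}
		# Webs obtained through AllWebs must be disposed explicitly.
		$spWeb.Dispose()
	}
	$targetSite.Dispose()
	# The leading comma keeps PowerShell from unrolling the collection on return.
	return ,$mappings
}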

function SpmFind-DuplicateMembers {param ([System.Collections.Specialized.NameValueCollection]$nvMappings)
	# Return every key (GUID) that has more than one value (object) mapped to it.
	$duplicateMembers = New-Object System.Collections.ArrayList
	$allKeys = $nvMappings.AllKeys
	foreach ($keyName in $allKeys)
	{
		$valuesForKey = $nvMappings.GetValues($keyName)
		if ($valuesForKey.Length -gt 1)
		{
			[void]$duplicateMembers.Add($keyName)
		}
	}
	# The leading comma keeps PowerShell from unrolling the list on return.
	return ,$duplicateMembers
}


########
# SCRIPT
# Execution of actual script logic begins here
########
$siteUrl = $Args[0]
if ($siteUrl -eq $null)
{
	$siteUrl = Read-Host "`nYou must supply a site collection URL to execute the script"
}
if ($siteUrl.EndsWith("/") -eq $false)
{
	$siteUrl += "/"
}
Clear-Host
Write-Output ("Examining " + $siteUrl + " ...`n")
$combinedMappings = SpmBuild-WebAndListIdMappings $siteUrl
# Count reflects the number of distinct keys, so duplicated GUIDs are counted once.
Write-Output ($combinedMappings.Count.ToString() + " unique GUIDs processed.")
Write-Output ("Looking for duplicate GUIDs ...`n")
$duplicateGuids = SpmFind-DuplicateMembers $combinedMappings
if ($duplicateGuids.Count -eq 0)
{
	Write-Output ("No duplicate GUIDs found.")
}
else
{
	Write-Output ($duplicateGuids.Count.ToString() + " duplicate GUID(s) found.")
	Write-Output ("Non-unique GUIDs and associated objects appear below.`n")
	foreach ($keyName in $duplicateGuids)
	{
		$siteNames = $combinedMappings[$keyName]
		Write-Output($keyName + ": " + $siteNames)
	}
}
$dumpData = Read-Host "`nDo you want to send the collected data to a file? (Y/N)"
if ($dumpData -match "^y")
{
	$fileName = Read-Host "  Output file path and name"
	Write-Output ("Results for " + $siteUrl) | Out-File -FilePath $fileName
	$allKeys = $combinedMappings.AllKeys
	foreach ($currentKey in $allKeys)
	{
		Write-Output ($currentKey + ": " + $combinedMappings[$currentKey]) | Out-File -FilePath $fileName -Append
	}
}
Write-Output ("`n")

Running this script in the client’s environment quickly identified the two lists that shared the same ID GUID.  How did they get that way?  I honestly don’t know, nor am I going to hazard a guess …
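For readers who want to see the duplicate check in isolation, the following is a minimal sketch that uses Group-Object instead of the NameValueCollection in the script above. It is not the method the script uses. The sample IDs and names are made up so that the snippet runs without SharePoint.

```powershell
# Hypothetical sample data standing in for the web and list IDs the script collects.
$entries = @(
    (New-Object PSObject -Property @{ Id = '11111111-1111-1111-1111-111111111111'; Name = "LIST 'Tasks' in Web 'Home'" }),
    (New-Object PSObject -Property @{ Id = '22222222-2222-2222-2222-222222222222'; Name = "WEB 'Home'" }),
    (New-Object PSObject -Property @{ Id = '11111111-1111-1111-1111-111111111111'; Name = "LIST 'Tasks Copy' in Web 'Archive'" })
)

# Group the entries by ID and keep only the groups holding more than one object.
$duplicates = @($entries | Group-Object -Property Id | Where-Object { $_.Count -gt 1 })

# Report each duplicated ID along with the objects that share it.
foreach ($group in $duplicates)
{
    $names = ($group.Group | ForEach-Object { $_.Name }) -join ', '
    Write-Output ($group.Name + ": " + $names)
}
```

Either approach answers the same question: which IDs map to more than one object?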

What Next?

If you’re in the unfortunate position of owning a site collection that contains objects possessing duplicate ID GUIDs, let me start by saying “I feel for you.”

Having said that: the quickest fix seemed to be deleting the objects that possessed the same GUIDs.  Those objects were then rebuilt.  I believe we handled the delete and rebuild manually, but there’s nothing to say that an export and subsequent import (via the Content Deployment API) couldn’t be used to get content out and then back in with new object IDs. 

A word of caution: if you do leverage the Content Deployment API and do so programmatically, simply make sure that object identities aren’t retained on import; that is, make sure that SPImportSettings.RetainObjectIdentity = false – not true.
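As a rough illustration of that caution, here is a minimal sketch of the import half of such an operation. It assumes the Microsoft.SharePoint assembly is already loaded (as in FindDuplicateGuids.ps1) and that an uncompressed export already exists in a folder. The function name and parameters are hypothetical, and this is not what we ran for the client.

```powershell
# Sketch only: the function name, parameters, and folder layout are assumptions.
function Import-ContentWithNewIdentities {param ($siteUrl, $exportFolder)
	$importSettings = New-Object Microsoft.SharePoint.Deployment.SPImportSettings
	$importSettings.SiteUrl = $siteUrl
	# Folder that holds the files produced by an earlier (uncompressed) export.
	$importSettings.FileLocation = $exportFolder
	$importSettings.FileCompression = $false
	# The important part: let SharePoint assign new IDs instead of reusing the old ones.
	$importSettings.RetainObjectIdentity = $false

	$import = New-Object Microsoft.SharePoint.Deployment.SPImport($importSettings)
	$import.Run()
}

# Example call (hypothetical URL and path):
# Import-ContentWithNewIdentities "http://server/sites/target" "C:\Exports\TasksList"
```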

Additional Reading and References

  1. TechNet: Import and export: STSADM operations
  2. MSDN: SPImportSettings.RetainObjectIdentity

Review of “SharePoint 2010 Six-In-One”

In this post I review “SharePoint 2010 Six-In-One” by Chris Geier, Becky Bertram, Andrew Clark, Cathy Dew, Ray Mitchell, Wes Preston, and Ken Schaefer.

I read a lot.  Honestly, I assume that most people who work with technology spend a fair bit of their time reading.  Maybe it’s books, maybe it’s blogs – whatever.  There’s simply too much knowledge out there, and the human brain is only so big, so brushing up on the ol’ technical skill set on a fairly regular basis is a must.

When free books are dangled in front of me, naturally I jump.  I jump even higher when they’re books that I probably would have ended up buying had they not been given to me gratis.

The Opportunity

Several months ago, I received an e-mail from Becky Bertram.  Becky is an exceptionally knowledgeable SharePoint MVP and all-around wonderful woman.  Becky and I first met (albeit briefly) at the MS SharePoint Conference in Las Vegas (2009), and since that time we’ve spoken at a couple of the same events.

In my conversations with Becky and through Twitter, I knew that she was part of a team that was working to assemble a book on SharePoint 2010.  In her e-mail to me, she asked if I’d be interested in a copy of it.  Given what I’ve said about reading, it should come as no surprise to see me say that I jumped at her offer.

Fast forward a bit.  I’ve had SharePoint 2010 Six-In-One for a couple of months now, and I’ve managed to read a solid 80% of its 500+ pages thus far.  Unfortunately, I’m a very slow reader.  I always have been, and I probably always will be.  I probably should have told Becky that before she agreed to send me a copy of the book …

Top-Level Assessment

Let me start by saying that, simply put, I think this book is an excellent SharePoint resource.  The reasons that one would find the book useful will likely vary based on their existing knowledge of SharePoint, but I believe that everyone across the spectrum, from newcomer to SharePoint journeyman, will find the book helpful in some way.

The rest of this post/review explains the book, its intended audience, what it conveys, and some of my additional thoughts.

The Authors

First, let me start by giving credit where it is due.  SharePoint 2010 Six-In-One is the collaborative effort of seven different and active members of the larger SharePoint community.

I know several of these folks personally, and that’s one of the reasons why I was so excited to review the book.  Most of the authors are active in user groups.  Nearly all contribute socially through Twitter and other channels.  Many speak at SharePoint Saturdays and other events.  Some are designated Most Valuable Professionals (MVPs) by Microsoft.  All are darn good at what they do.

Target Audience

This book was written primarily for relative newcomers to SharePoint 2010, and this demographic is the one that will undoubtedly get the most value out of the book.  As the title of the book indicates, the authors covered six of the core SharePoint areas that anyone wrangling with SharePoint 2010 would need information on:

  • Branding
  • Business Connectivity Services
  • Development
  • Search
  • Social Networking
  • Workflow

The book devotes a few chapters to each topic, and each topic is covered solidly from an introductory perspective.  Many of the common questions and concerns associated with each topic are also addressed in some way, and particulars for some of the topics (like development) are actually covered at a significantly deeper level.

Although it might get glossed over by some, I want to call attention to a particularly valuable inclusion; specifically, the first three chapters.  These chapters do a fantastic job of explaining the essence of SharePoint, what it is, how to plan for it, concerns that implementers should have, and more.  Given SharePoint’s complexity and “tough to define” nature, I have to applaud the authors on managing to sum up SharePoint so well in only 60 pages.  Anyone getting started with SharePoint will find these chapters to be an excellent on-ramp and starting point.

Contents

The following is the per-chapter breakdown for the book’s content:

  • Chapter 1: SharePoint Overview
  • Chapter 2: Planning for SharePoint
  • Chapter 3: Getting Started with SharePoint
  • Chapter 4: Master Pages
  • Chapter 5: SharePoint Themes
  • Chapter 6: Cascading Style Sheets and SharePoint
  • Chapter 7: Features and Solutions
  • Chapter 8: Introducing SharePoint Development
  • Chapter 9: Publishing in SharePoint Server 2010
  • Chapter 10: Introducing Business Connectivity Services
  • Chapter 11: Building Solutions Using Business Connectivity Services
  • Chapter 12: Why Social Networking Is Important in SharePoint 2010
  • Chapter 13: Tagging and Ratings
  • Chapter 14: My Site
  • Chapter 15: Workflow Introduction and Background
  • Chapter 16: Building and Using Workflow in SharePoint 2010
  • Chapter 17: Visual Studio: When SharePoint Designer Is Not Enough
  • Chapter 18: Introduction to Enterprise Search
  • Chapter 19: Administering and Customizing
  • Chapter 20: FAST Search
  • Chapter 21: Wrapping It All Up

The Experienced SharePoint Reader

So, what if you happen to know a bit about SharePoint and/or have been working with SharePoint 2010 for some time?  I’m in this particular boat, and I have good news: this book strikes just the right balance of breadth and depth so as to be useful as a reference source.  Although the book doesn’t provide really deep dives into its topic areas (not its intent), I found myself reaching for it on a handful of occasions to get myself going on some SharePoint tasks I had to accomplish.  A quick review of Cathy’s chapters on branding, for instance, gave me just the right amount of information needed to get started on a small side project of my own.

Summary

Bottom line: SharePoint 2010 Six-In-One contains just the right mix of breadth and depth so as to be immediately informative to newcomers but also useful as a reference source in the longer term.   I’d recommend this book for anyone working with SharePoint, and I’d especially recommend it to those who are new to SharePoint 2010 and/or seeking to get a grasp on its core aspects. 

Additional Reading and References

  1. People: Becky Bertram
  2. Book: SharePoint 2010 Six-In-One
  3. Author (Twitter): Chris Geier
  4. Author (blog): Cathy Dew
  5. Author (blog): Wes Preston
  6. Author (blog): Raymond Mitchell
  7. Author (blog): Becky Bertram
  8. Author (blog): Ken Schaefer
  9. Author (Twitter): Andrew Clark
  10. Events: SharePoint Saturday
  11. Designation: Most Valuable Professional (MVP)

Release of the SharePoint 2010 Disaster Recovery Guide

The SharePoint 2010 Disaster Recovery Guide is now available! In this post, I provide a small peek into the contents of the book and the people who helped make it a reality.

Since my first copy of our new book actually arrived in the mail yesterday (from Amazon.com), I think I can officially announce that the SharePoint 2010 Disaster Recovery Guide is available!  Here’s a picture of it – straight out of the box:

[Image: SharePoint 2010 Disaster Recovery Guide]

John Ferringer and I apparently didn’t learn our lesson the first time around.  When Cengage approached us about writing another version of the book, we said “yes.”  We were either in denial or had repressed the memories associated with writing the first book.  There were definitely some difficulties and challenges (like trying to learn the relevant pieces of the SharePoint 2010 platform while also writing about them), but we managed to pull it off again.

Of course, we couldn’t have done this without the technical prowess and patience of JD Wade.  JD was our technical editor, and he had a knack for questioning any assumption or statement that wasn’t clearly backed by fact.  He did a fantastic job – I couldn’t have been happier.  The book’s accuracy and quality are a direct result of his contributions.

What’s Inside?

Interested in what we included?  Here’s the table of contents by chapter:

  1. SharePoint Disaster Recovery Planning and Key Concepts
  2. SharePoint Disaster Recovery Design and Implementation
  3. SharePoint Disaster Recovery Testing and Maintenance
  4. SharePoint Disaster Recovery Best Practices
  5. Windows Server 2008 Backup and Restore
  6. Windows Server 2008 High Availability
  7. SQL Server 2008 Backup and Restore
  8. SQL Server 2008 High Availability
  9. SharePoint 2010 Central Administration Backup and Restore
  10. SharePoint 2010 Command Line Backup and Restore: PowerShell
  11. SharePoint 2010 Disaster Recovery Development
  12. SharePoint 2010 Disaster Recovery for End Users
  13. Conclusion

As you can see, we’ve included a little something for just about everyone who might work with SharePoint or interface with it for disaster recovery purposes.  SharePoint administrators will probably benefit the most from the book, but there are definitely sections that are of use to SharePoint developers, DR planners, and others who are interested in SharePoint from a business continuity perspective.

If you happen to pick up a copy of the book, please share your feedback with us – good, bad, ugly, or anything else you feel like sending our way!  We poured a lot of time and effort into this book in an attempt to “do our part” for the community, and your thoughts and feedback mean everything to us.

Thanks, and enjoy!

Additional Resources and References

  1. Book: SharePoint 2010 Disaster Recovery Guide
  2. Blog: John Ferringer’s MyCentralAdmin
  3. Blog: JD Wade’s Wading Through

Site Collection Backups and Workflow Portability in SharePoint 2010

In this post, I discuss my quest to determine whether or not site collection backups properly capture workflow information in SharePoint 2010. TechNet made a point of saying they didn’t, but Joel Oleson said they did. Who was right?

Do you trust TechNet?  I generally do, as I figure the good folks at Microsoft are doing their best to disseminate reliable information to those of us working with their products.  As I recently learned, though, even the information that appears on TechNet needs some cross-checking once in a while.

Bear with me, as this post is equal parts narrative and data discussion.  If you don’t like stories and want to cut straight to the chase, though, simply scroll down to the section titled “The Conclusion” for the key takeaway.

Site Collection Backup Primer

For those who aren’t overly familiar with site collection backups, it’s probably worth spending a moment discussing them a bit before going any further.  Site collection backups are, after all, at the heart of this blog post.

What is a site collection backup?  It is basically what you would expect from its name: a backup of a specific SharePoint site collection.  These backups can be used to restore or overwrite a site collection if it becomes lost or corrupt, and they can also be used to copy site collections from one web application (or farm) to another.

Anytime you execute one of the following operations, you’re performing a site collection backup:

  • from the command line: STSADM.exe -o backup -url <url> -filename <filename>
  • through PowerShell in SharePoint 2010: Backup-SPSite <url> -Path <filepath>
  • using the “Perform a site collection backup” link in SharePoint 2010 Central Administration

When a site collection backup is executed, a single file with a .bak extension is generated that contains the entire contents of the site collection targeted.  This file can be freely copied and moved around as needed.  Aside from some recommendations regarding the maximum size of the site collection captured using this approach (15GB and under in SharePoint 2007, 85GB and under in SharePoint 2010), the backups themselves are really quite handy for both protection and site collection migration operations.
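To make the copy scenario concrete, here is a minimal sketch that pairs Backup-SPSite with Restore-SPSite. It assumes a SharePoint 2010 Management Shell session (or one where the Microsoft.SharePoint.PowerShell snap-in has been added). The function name, URLs, and file path are hypothetical.

```powershell
# Sketch only: assumes the SharePoint 2010 cmdlets are available in the session, e.g.
#   Add-PSSnapin Microsoft.SharePoint.PowerShell
function Copy-SiteCollectionViaBackup {param ($sourceUrl, $targetUrl, $backupFile)
	# -Force overwrites an existing backup file of the same name.
	Backup-SPSite -Identity $sourceUrl -Path $backupFile -Force
	# Restore the .bak file to the target URL without prompting for confirmation.
	Restore-SPSite -Identity $targetUrl -Path $backupFile -Confirm:$false
}

# Example call (hypothetical URLs and path):
# Copy-SiteCollectionViaBackup "http://source/sites/portal" "http://target/sites/portal" "C:\Backups\portal.bak"
```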

A Little Background

John Ferringer and I have been plugging away at the SharePoint 2010 Disaster Recovery Guide for quite some time.  As you might imagine, the writing process involves a lot of research, hands-on experimentation, and fact-checking.  This is especially true for a book that’s being written about a platform (SharePoint 2010) that is basically brand new in the marketplace.

While researching backup-related changes for the book, I made a special mental note of the following change regarding site collection backups in SharePoint 2010:

[Image: Site Collection Backups and Workflow]

The text that is circled in the image above (taken straight from a TechNet page titled Backup and recovery overview (SharePoint Server 2010)) says this:

Workflows are not included in site collection backups

This stuck with me when I read it, because I didn’t recall any such statement being made with regard to site collection backups in SharePoint 2007.  Since Microsoft made a special note of pointing out this limitation for SharePoint 2010, though, I figured it was important to keep in mind.  Knowing that workflows had changed from 2007 to 2010, I reasoned that the new limitation was probably due to some internal workflow plumbing alterations that adversely affected the backup process.

The Setup

A couple of weeks back, I was presenting at SharePoint Saturday Ozarks alongside an awesome array of other folks (including Joel Oleson) from the SharePoint community.  Due to a speaker no-show in an early afternoon slot, Mark Rackley (the event’s one-man force-of-nature organizer) decided to hold an “ask the experts” panel where attendees could pitch questions at those of us who were willing to share what we knew.

A number of good questions came our way, and we all did our best to supply our experiences and usable advice.  Though I don’t recall the specific question that was asked in one particular case, I do remember advising someone to perform a site collection backup before attempting to do whatever it was they wanted to do.  After sharing that advice, though, things got a little sketchy.  The following captures the essence of the exchange that took place between Joel and me:

Me: <to the attendee> Site collection backups don’t capture everything in SharePoint 2010, though, so be careful.

Joel: No, site collection backups are full-fidelity.

Me: TechNet specifically indicates that workflows aren’t covered in site collection backups with SharePoint 2010.

Joel: No, the backups are still full fidelity.

Me: <blank stare>

The discussion topic and associated questions for the panel quickly changed, but my brain was still stripping a few gears trying to reconcile what I’d read on TechNet with what Joel was saying.

After the session, I forwarded the TechNet link I had quoted to Joel and asked if he happened to have an “inside track” or perhaps some information I didn’t have access to.  We talked about the issue for a while at the hotel a little later on, but the only thing that we could really conclude was that more research was needed to see if site collection backups had in fact changed with SharePoint 2010.  Before taking off that weekend, we decided to stay in contact and work together to get some answers.

Under The Hood

To understand why this issue bothered me so much, remember that I’m basically in the middle of co-authoring a book on the topic of disaster recovery – a topic that is intimately linked to backup and restore operations.  The last thing I would ever want to do is write a book that contains ambiguous or (worse) flat-out wrong information about the book’s central topic.

To get to the heart of the matter, I decided to start where most developers would: with the SharePoint object model.  In both SharePoint 2007 and SharePoint 2010, the object model types that are used to back up and export content typically fall into one of two general categories:

  • Catastrophic Backup and Restore API.  These types are located in the Microsoft.SharePoint.Administration.Backup namespace, and they provide SharePoint’s full-fidelity backup and restore functions.  Backup and restore operations take place on content components such as content databases, service applications, and the entire SharePoint farm.  Catastrophic backup and restore operations are full-fidelity, meaning that no data is lost or selectively ignored during a backup and subsequent restore.  By default, catastrophic backup and restore operations don’t get any more granular than a content database.  If you want to protect something within a content database, such as a site collection, sub-site, or list, you have to back up the entire content database containing the target object(s).
  • Content Deployment API.  The member types of this API (also known internally at Microsoft as the PRIME API) reside within the Microsoft.SharePoint.Deployment namespace and are used for granular content export and import operations.  The exports that are created by the types in this namespace target objects from the site collection level all the way down to the field level – typically webs, lists, list items, etc.  Content Deployment exports are not full-fidelity and are commonly used for moving content around more than they are for actual backup and restore operations.

So, where does this leave site collection backups?  In truth, site collection backups don’t fit into either of these categories.  They are a somewhat unusual case, both in SharePoint 2007 and SharePoint 2010.

Whether a site collection backup is initiated through STSADM, PowerShell, or Central Administration, a single method is called on the SPSiteCollection type which resides in the Microsoft.SharePoint.Administration namespace.  This is basically the signature of the method:

SPSiteCollection.Backup(string strSiteUrl, string strFilename, bool bOverwrite)

To carry out a site collection backup, all that is needed is the URL of the site collection, the filename that will be used for the resultant backup file, and a TRUE or FALSE to indicate whether an overwrite should occur if the selected file already exists.
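For illustration, the method can be reached from PowerShell through the Sites collection of the owning web application. This is a minimal sketch that assumes the Microsoft.SharePoint assembly is loaded. The function name and example values are hypothetical.

```powershell
# Sketch only: function name and parameters are assumptions for illustration.
function Backup-SiteCollectionViaObjectModel {param ($siteUrl, $backupFile)
	# Find the web application that hosts the site collection.
	$uri = New-Object System.Uri($siteUrl)
	$webApp = [Microsoft.SharePoint.Administration.SPWebApplication]::Lookup($uri)
	# $webApp.Sites is the SPSiteCollection; $true overwrites an existing file.
	$webApp.Sites.Backup($siteUrl, $backupFile, $true)
}

# Example call (hypothetical URL and path):
# Backup-SiteCollectionViaObjectModel "http://server/sites/portal" "C:\Backups\portal.bak"
```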

If you were to pop open Reflector and drill into the Backup method on the SPSiteCollection type, you wouldn’t get very far before running into a wall at the SPRequest type.  SPRequest is a managed wrapper around the take-off point for a whole host of external calls, and the execution of the Backup method is actually handled in unmanaged legacy code.  Examining the internals of what actually takes place during a site collection backup (or restore, for that matter) simply isn’t possible with Reflector.

Since the internals of the Backup method weren’t available for reflective analysis, I was forced to drop back and punt in order to determine how site collection backups and workflow interacted within SharePoint 2010.

Testing Factors

I knew that I was going to have to execute backup and restore tests at some point; I was just hoping that I would be a bit more informed (through object model inspection) about where I needed to focus my efforts.  Without any visibility into the internals of the site collection backup process, though, I didn’t really have much to start with.

Going into the testing process, I knew that I wasn’t going to have enough time to perform exhaustive testing for every scenario, execution path, variable, and edge-case that could be relevant to the backup and restore processes.  I had to develop a testing strategy that would hit the likely problem areas as quickly (and with as few runs) as possible.

After some thought, I decided that these points were important facets to consider and account for while testing:

  • Workflow Types.  Testing the most common workflow types was important.  I knew that I would need to test at least one out of the box (OOTB) workflow type.  I also decided that I needed to test at least one instance of each type of workflow that could be created through SharePoint Designer (SPD) 2010; that meant testing a list-bound workflow, a site collection workflow, and a reusable workflow.  I decided that custom code workflows, such as those that might be created through Visual Studio, were outside the scope of my testing.
  • Workflow Data.  In order to test the impact of backup and restore operations on a workflow, I obviously had to ensure that one or more workflows were in-place within the site collection targeted for backup.  Having a workflow attached to a list would obviously test the static data portions of the workflow, but there was other workflow-related data that had to be considered.  In particular, I decided that the testing of both workflow history information and in-process workflow state were important.  More on the workflow state in a bit …
  • Backup and Restore Isolation.  While testing, it would be important to ensure that backup operations and restore operations impacted one another (or rather, had the potential to impact one another) as little as possible.  Though backups and restores occurred within the same virtual farm, I isolated them to the extent that I could.  Backups were performed in one web application, and restores were performed in a separate web application.  I even placed each web application in its own (IIS) application pool – just to be sure.  I also established a single VM snapshot starting point; after each backup and restore test, I rolled back to the snapshot point to ensure that nothing remained in the farm (or VM, for that matter) that was tied to the previous round of testing.

Testing Procedure

I created a single Publishing Portal, bolted a couple of sub-sites and Document Libraries into it, and used it as the target for my site collection backup operations.  The Document Library that I used for workflow testing varied between tests; it was not held constant and did change according to the needs of each specific test.

I ran four different workflow test scenarios.  My OOTB workflow scenario involved testing the page approval workflow for publishing pages.  My other three SPD workflow tests (list-bound, site collection, and reusable workflow) all involved the same basic set of workflow steps:

  1. Wait five minutes
  2. Create a To Do item (which had to be completed to move on)
  3. Wait five more minutes
  4. Add a comment to the workflow target

In both the OOTB workflow and SPD workflow scenarios, I wanted to perform backups while workflows were basically “in flight” to see how workflow state would or wouldn’t be impacted by the backup and restore processes.  For the publishing approval workflow, this meant taking a site collection backup while at least one page was pending approval.  For the SPD workflows, it meant capturing a backup while at least one workflow instance was in a five-minute wait period and another was waiting on the completion of the To Do item.

Prior to executing a backup in each test case, I ran a couple of workflow instances from start to finish.  This ensured that I had some workflow history information to capture and restore.

Once site collection backups were captured in each test case, I restored them into the empty web application.  I then opened the restored site collection to determine what did and didn’t get transcribed through the backup and restore process.

Results Of Testing

In each workflow case (OOTB and all three SPD workflows), all workflow information that I could poke and prod appeared to survive the backup and restore process without issue.  Workflow definition data was preserved, and workflow history came over intact.  Even more impressive, though, was the fact that in-process workflow state was preserved.  SPD workflow steps that were in the middle of a wait period when a backup was taken completed their wait period after restore and moved on.  To Do items that were waiting for user intervention continued to wait and then proceeded to the next step when they were marked as completed in the restored site collection.

In addition, new instances of each workflow type could be created and started in both site collections following the backup and restore operations.  The backup and subsequent restore didn’t appear to have any effect on either the source or destination.
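A comparison like this can also be spot-checked from PowerShell. The following is a minimal sketch, not the procedure used in the testing above. It lists the list-bound workflow associations in a site collection along with their running instance counts, so the output for the source and restored site collections can be compared side by side. The function name is hypothetical, and the sketch assumes the Microsoft.SharePoint assembly is loaded.

```powershell
# Sketch only: covers list-bound workflow associations; function name is an assumption.
function Get-ListWorkflowSummary {param ($siteUrl)
	$site = New-Object Microsoft.SharePoint.SPSite($siteUrl)
	try
	{
		foreach ($web in $site.AllWebs)
		{
			foreach ($list in $web.Lists)
			{
				foreach ($association in $list.WorkflowAssociations)
				{
					# One line per association: web, list, workflow name, running instances.
					Write-Output ("{0} | {1} | {2} | running: {3}" -f $web.Url, $list.Title, $association.Name, $association.RunningInstances)
				}
			}
			$web.Dispose()
		}
	}
	finally
	{
		$site.Dispose()
	}
}

# Example (hypothetical URLs): run against both site collections and compare the output.
# Get-ListWorkflowSummary "http://source/sites/portal"
# Get-ListWorkflowSummary "http://target/sites/portal"
```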

Though my testing wasn’t exhaustive, it did cast doubt on the absolute nature of the statement made on TechNet regarding site collection backups failing to include workflows.

Joel’s Legwork

While I was conducting my research and testing, Joel was leveraging his network of contacts and asking folks at Microsoft for the real story behind site collection backups and workflow.  He made a little progress with each person he spoke to, and in the end, he managed to get someone to go on the record.

The Conclusion

The official word from Microsoft is that the TechNet note indicating that site collection backups don’t include workflows is a misprint.  In reality, the point that should have been conveyed through TechNet was that content exports (via the Content Deployment API) don’t include workflows – a point that is perfectly understandable considering that the Content Deployment API doesn’t export or import with full-fidelity.  Microsoft indicated that they’ll be correcting the error, and TechNet may have been corrected by the time you read this.

My takeaway on this: if something on TechNet (or anywhere else on the web) doesn’t quite add up, it never hurts to test and seek additional information from others in the community who are knowledgeable on the subject matter.  In this case, it made a huge difference.

Additional Resources and References

  1. Blog: John Ferringer
  2. Book: SharePoint 2010 Disaster Recovery Guide
  3. TechNet: Backup and recovery overview (SharePoint Server 2010)
  4. Event: SharePoint Saturday Ozarks
  5. Blog: Joel Oleson
  6. Blog: Mark Rackley
  7. Tools: Reflector