Site Collection Backups and Workflow Portability in SharePoint 2010

In this post, I discuss my quest to determine whether or not site collection backups properly capture workflow information in SharePoint 2010. TechNet made a point of saying they didn’t, but Joel Oleson said they did. Who was right?

Do you trust TechNet?  I generally do, as I figure the good folks at Microsoft are doing their best to disseminate reliable information to those of us working with their products.  As I recently learned, though, even the information that appears on TechNet needs some cross-checking once in a while.

Bear with me, as this post is equal parts narrative and data discussion.  If you don’t like stories and want to cut straight to the chase, though, simply scroll down to the section titled “The Conclusion” for the key takeaway.

Site Collection Backup Primer

For those who aren’t overly familiar with site collection backups, it’s worth spending a moment on them before going any further.  Site collection backups are, after all, at the heart of this blog post.

What is a site collection backup?  It is basically what you would expect from its name: a backup of a specific SharePoint site collection.  These backups can be used to restore or overwrite a site collection if it becomes lost or corrupt, and they can also be used to copy site collections from one web application (or farm) to another.

Anytime you execute one of the following operations, you’re performing a site collection backup:

  • from the command line: STSADM.exe -o backup -url <url> -filename <filename>
  • through PowerShell in SharePoint 2010: Backup-SPSite <url> -Path <filepath>
  • through the “Perform a site collection backup” link in SharePoint 2010 Central Administration
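If you script your backups, the two command-line forms above differ only in syntax. The small Python sketch below just assembles both command strings from a URL and a target file; it does not execute anything. The function name `site_backup_commands` is hypothetical, and the optional overwrite switches (`-overwrite` for STSADM, `-Force` for Backup-SPSite) go beyond the commands listed above.

```python
def site_backup_commands(url: str, path: str, overwrite: bool = False) -> dict:
    """Return equivalent STSADM and PowerShell site collection backup commands.

    Only builds strings; nothing is executed. Quotes guard against spaces in paths.
    """
    stsadm = f'STSADM.exe -o backup -url "{url}" -filename "{path}"'
    powershell = f'Backup-SPSite -Identity "{url}" -Path "{path}"'
    if overwrite:
        # Replace an existing backup file: STSADM spells it -overwrite, the cmdlet -Force.
        stsadm += " -overwrite"
        powershell += " -Force"
    return {"stsadm": stsadm, "powershell": powershell}


cmds = site_backup_commands("http://intranet", r"C:\Backups\intranet.bak")
print(cmds["stsadm"])
print(cmds["powershell"])
```

`-Identity` is simply the named form of the positional URL parameter shown in the list above.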

When a site collection backup is executed, a single file with a .bak extension is generated that contains the entire contents of the site collection targeted.  This file can be freely copied and moved around as needed.  Aside from some recommendations regarding the maximum size of the site collection captured using this approach (15GB and under in SharePoint 2007, 85GB and under in SharePoint 2010), the backups themselves are really quite handy for both protection and site collection migration operations.
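A scripted backup job could compare a site collection’s size against these recommendations before settling on an approach. The sketch below hard-codes the two figures quoted above; how the size is measured is left open, and the names are hypothetical. Since these are recommendations rather than hard limits, a `False` result only flags the site collection for a second look.

```python
# Recommended ceilings for site collection backups quoted above, in GB.
RECOMMENDED_MAX_GB = {"2007": 15, "2010": 85}


def within_backup_guidance(size_gb: float, version: str) -> bool:
    """True if a site collection of size_gb is at or under the recommended ceiling."""
    if version not in RECOMMENDED_MAX_GB:
        raise ValueError(f"no size guidance recorded for SharePoint {version}")
    return size_gb <= RECOMMENDED_MAX_GB[version]


print(within_backup_guidance(60, "2010"))   # True
print(within_backup_guidance(20, "2007"))   # False
```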

A Little Background

John Ferringer and I have been plugging away at the SharePoint 2010 Disaster Recovery Guide for quite some time.  As you might imagine, the writing process involves a lot of research, hands-on experimentation, and fact-checking.  This is especially true for a book that’s being written about a platform (SharePoint 2010) that is basically brand new in the marketplace.

While researching backup-related changes for the book, I made a special mental note of the following change regarding site collection backups in SharePoint 2010:

[Image: TechNet excerpt, “Site Collection Backups and Workflow,” with the relevant text circled]

The text that is circled in the image above (taken straight from a TechNet page titled Backup and recovery overview (SharePoint Server 2010)) says this:

Workflows are not included in site collection backups

This stuck with me when I read it, because I didn’t recall any such statement being made about site collection backups in SharePoint 2007.  Since Microsoft had gone out of its way to point out this limitation for SharePoint 2010, though, I figured it was important to keep in mind.  Knowing that workflows had changed from 2007 to 2010, I reasoned that the new limitation was probably due to some internal workflow plumbing changes that adversely affected the backup process.

The Setup

A couple of weeks back, I was presenting at SharePoint Saturday Ozarks alongside an awesome array of other folks (including Joel Oleson) from the SharePoint community.  Due to a speaker no-show in an early afternoon slot, Mark Rackley (the event’s one-man force-of-nature organizer) decided to hold an “ask the experts” panel where attendees could pitch questions at those of us who were willing to share what we knew.

A number of good questions came our way, and we all did our best to supply our experiences and usable advice.  Though I don’t recall the specific question that was asked in one particular case, I do remember advising someone to perform a site collection backup before attempting to do whatever it was they wanted to do.  After sharing that advice, though, things got a little sketchy.  The following captures the essence of the exchange that took place between Joel and me:

Me: <to the attendee> Site collection backups don’t capture everything in SharePoint 2010, though, so be careful.

Joel: No, site collection backups are full-fidelity.

Me: TechNet specifically indicates that workflows aren’t covered in site collection backups with SharePoint 2010.

Joel: No, the backups are still full fidelity.

Me: <blank stare>

The discussion topic and associated questions for the panel quickly changed, but my brain was still stripping a few gears trying to reconcile what I’d read on TechNet with what Joel was saying.

After the session, I forwarded the TechNet link I had quoted to Joel and asked if he happened to have an “inside track” or perhaps some information I didn’t have access to.  We talked about the issue for a while at the hotel a little later on, but the only thing that we could really conclude was that more research was needed to see if site collection backups had in fact changed with SharePoint 2010.  Before taking off that weekend, we decided to stay in contact and work together to get some answers.

Under The Hood

To understand why this issue bothered me so much, remember that I’m basically in the middle of co-authoring a book on the topic of disaster recovery – a topic that is intimately linked to backup and restore operations.  The last thing I would ever want to do is write a book that contains ambiguous or (worse) flat-out wrong information about the book’s central topic.

To get to the heart of the matter, I decided to start where most developers would: with the SharePoint object model.  In both SharePoint 2007 and SharePoint 2010, the object model types that are used to back up and export content typically fall into one of two general categories:

  • Catastrophic Backup and Restore API.  These types are located in the Microsoft.SharePoint.Administration.Backup namespace, and they provide SharePoint’s full-fidelity backup and restore functions.  Backup and restore operations take place on content components such as content databases, service applications, and the entire SharePoint farm.  Catastrophic backup and restore operations are full-fidelity, meaning that no data is lost or selectively ignored during a backup and subsequent restore.  By default, catastrophic backup and restore operations don’t get any more granular than a content database.  If you want to protect something within a content database, such as a site collection, sub-site, or list, you have to back up the entire content database containing the target object(s).
  • Content Deployment API.  The member types of this API (also known internally at Microsoft as the PRIME API) reside within the Microsoft.SharePoint.Deployment namespace and are used for granular content export and import operations.  The exports that are created by the types in this namespace target objects from the site collection level all the way down to the field level – typically webs, lists, list items, etc.  Content Deployment exports are not full-fidelity and are commonly used for moving content around more than they are for actual backup and restore operations.

So, where does this leave site collection backups?  In truth, site collection backups don’t fit into either of these categories.  They are a somewhat unusual case, both in SharePoint 2007 and SharePoint 2010.
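One way to see why site collection backups occupy their own niche is to lay the three mechanisms out as data and ask which of them can target a given scope directly. The Python sketch below is a simplification of the descriptions above: the scope ordering is an assumption, service applications are left out, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Coarsest to finest scope, using the terms from the descriptions above.
SCOPES = ["farm", "content database", "site collection", "web", "list", "list item", "field"]


@dataclass(frozen=True)
class Mechanism:
    name: str
    entry_point: str
    coarsest: str      # largest scope the mechanism can target directly
    finest: str        # smallest scope the mechanism can target directly
    full_fidelity: bool


MECHANISMS = [
    Mechanism("Catastrophic backup/restore",
              "Microsoft.SharePoint.Administration.Backup", "farm", "content database", True),
    Mechanism("Content Deployment export/import",
              "Microsoft.SharePoint.Deployment", "site collection", "field", False),
    Mechanism("Site collection backup",
              "Microsoft.SharePoint.Administration.SPSiteCollection",
              "site collection", "site collection", True),
]


def options_for(scope: str) -> list:
    """Mechanisms that can target `scope` directly (not via a containing object)."""
    i = SCOPES.index(scope)
    return [m for m in MECHANISMS
            if SCOPES.index(m.coarsest) <= i <= SCOPES.index(m.finest)]


for m in options_for("site collection"):
    print(m.name, "full-fidelity" if m.full_fidelity else "not full-fidelity")
```

Under this model, a site collection backup is the only full-fidelity option that targets a site collection directly; anything finer-grained (a web or a list) is left with non-full-fidelity exports.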

Whether a site collection backup is initiated through STSADM, PowerShell, or Central Administration, a single method is called on the SPSiteCollection type which resides in the Microsoft.SharePoint.Administration namespace.  This is basically the signature of the method:

SPSiteCollection.Backup(string strSiteUrl, string strFilename, bool bOverwrite)

To carry out a site collection backup, all that is needed is the URL of the site collection, the filename that will be used for the resultant backup file, and a TRUE or FALSE to indicate whether an overwrite should occur if the selected file already exists.

If you were to pop open Reflector and drill into the Backup method on the SPSiteCollection type, you wouldn’t get very far before running into a wall at the SPRequest type.  SPRequest is a managed wrapper around the take-off point for a whole host of external calls, and the execution of the Backup method is actually handled in unmanaged legacy code.  Examining the internals of what actually takes place during a site collection backup (or restore, for that matter) simply isn’t possible with Reflector.

Since the internals of the Backup method weren’t available for reflective analysis, I was forced to drop back and punt in order to determine how site collection backups and workflow interacted within SharePoint 2010.

Testing Factors

I knew that I was going to have to execute backup and restore tests at some point; I was just hoping that I would be a bit more informed (through object model inspection) about where I needed to focus my efforts.  Without any visibility into the internals of the site collection backup process, though, I didn’t really have much to start with.

Going into the testing process, I knew that I wasn’t going to have enough time to perform exhaustive testing for every scenario, execution path, variable, and edge-case that could be relevant to the backup and restore processes.  I had to develop a testing strategy that would hit the likely problem areas as quickly (and with as few runs) as possible.

After some thought, I decided that these points were important facets to consider and account for while testing:

  • Workflow Types.  Testing the most common workflow types was important.  I knew that I would need to test at least one out of the box (OOTB) workflow type.  I also decided that I needed to test at least one instance of each type of workflow that could be created through SharePoint Designer (SPD) 2010; that meant testing a list-bound workflow, a site collection workflow, and a reusable workflow.  I decided that custom code workflows, such as those that might be created through Visual Studio, were outside the scope of my testing.
  • Workflow Data.  In order to test the impact of backup and restore operations on a workflow, I obviously had to ensure that one or more workflows were in-place within the site collection targeted for backup.  Having a workflow attached to a list would obviously test the static data portions of the workflow, but there was other workflow-related data that had to be considered.  In particular, I decided that the testing of both workflow history information and in-process workflow state were important.  More on the workflow state in a bit …
  • Backup and Restore Isolation.  While testing, it would be important to ensure that backup operations and restore operations impacted one another (or rather, had the potential to impact one another) as little as possible.  Though backups and restores occurred within the same virtual farm, I isolated them to the extent that I could.  Backups were performed in one web application, and restores were performed in a separate web application.  I even placed each web application in its own (IIS) application pool – just to be sure.  I also established a single VM snapshot starting point; after each backup and restore test, I rolled back to the snapshot point to ensure that nothing remained in the farm (or VM, for that matter) that was tied to the previous round of testing.
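The three factors above amount to a small test plan. The Python sketch below writes that plan down as data: one run per workflow type, each verifying the same workflow data and each checked for backup/restore isolation. The web application and application pool names are hypothetical placeholders, not values from the original tests.

```python
from dataclasses import dataclass

# Workflow types covered; custom Visual Studio workflows were out of scope.
WORKFLOW_TYPES = (
    "OOTB: publishing page approval",
    "SPD: list-bound",
    "SPD: site collection",
    "SPD: reusable",
)
# Workflow data every run verifies after the restore.
CHECKS = ("definition", "history", "in-flight state")


@dataclass(frozen=True)
class PlannedRun:
    workflow_type: str
    checks: tuple = CHECKS
    # Hypothetical names: backup and restore each get their own web app and app pool.
    source_web_app: str = "backup-webapp"
    target_web_app: str = "restore-webapp"
    source_app_pool: str = "backup-pool"
    target_app_pool: str = "restore-pool"
    revert_snapshot_after: bool = True

    def is_isolated(self) -> bool:
        """Backup and restore never share a web app or app pool, and the VM is reset."""
        return (self.source_web_app != self.target_web_app
                and self.source_app_pool != self.target_app_pool
                and self.revert_snapshot_after)


def build_plan() -> list:
    return [PlannedRun(wf) for wf in WORKFLOW_TYPES]


for run in build_plan():
    print(run.workflow_type, run.checks, run.is_isolated())
```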

Testing Procedure

I created a single Publishing Portal, bolted a couple of sub-sites and Document Libraries into it, and used it as the target for my site collection backup operations.  The Document Library that I used for workflow testing was not held constant; it varied according to the needs of each specific test.

I ran four different workflow test scenarios.  My OOTB workflow scenario involved testing the page approval workflow for publishing pages.  My other three SPD workflow tests (list-bound, site collection, and reusable workflow) all involved the same basic set of workflow steps:

  1. Wait five minutes
  2. Create a To Do item (which had to be completed to move on)
  3. Wait five more minutes
  4. Add a comment to the workflow target

In both the OOTB workflow and SPD workflow scenarios, I wanted to perform backups while workflows were basically “in flight” to see how workflow state would or wouldn’t be impacted by the backup and restore processes.  For the publishing approval workflow, this meant taking a site collection backup while at least one page was pending approval.  For the SPD workflows, it meant capturing a backup while at least one workflow instance was in a five minute wait period and another was waiting on the completion of the To Do item.
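To make “in flight” concrete, here is a toy timeline model of the four SPD steps. It is an illustration under simplifying assumptions (exact five-minute waits, a single To Do item), not a description of how SharePoint’s workflow engine tracks state, and the function names are hypothetical.

```python
from typing import Optional


def current_step(elapsed_min: float, todo_done_at: Optional[float] = None) -> int:
    """Toy timeline for the four-step SPD test workflow (not SharePoint's engine).

    1 = first five-minute wait, 2 = waiting on the To Do item,
    3 = second five-minute wait, 4 = comment added (workflow complete).
    todo_done_at is the minute the To Do item was marked complete, or None.
    """
    if todo_done_at is not None and todo_done_at < 5:
        raise ValueError("the To Do item is only created after the first 5-minute wait")
    if elapsed_min < 5:
        return 1
    if todo_done_at is None or elapsed_min < todo_done_at:
        return 2
    if elapsed_min < todo_done_at + 5:
        return 3
    return 4


def in_flight(step: int) -> bool:
    """Steps 1-3 are the windows in which a backup captures an unfinished instance."""
    return step < 4
```

In these terms, a backup taken at minute 3 (step 1) or at minute 7 with the To Do item still open (step 2) captures an instance in flight, which is exactly the situation the tests set up.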

Prior to executing a backup in each test case, I ran a couple of workflow instances from start to finish.  This ensured that I had some workflow history information to capture and restore.

Once site collection backups were captured in each test case, I restored them into the empty web application.  I then opened the restored site collection to determine what did and didn’t get transcribed through the backup and restore process.
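The comparison described above was done by opening the restored site collection and inspecting it. To repeat the exercise more systematically, one option is to record a few observable counts before the backup and after the restore, then diff them. The sketch below assumes you gather those counts yourself into plain dictionaries; the keys are hypothetical labels, not SharePoint property names.

```python
def compare_snapshots(before: dict, after: dict) -> dict:
    """Map each key whose value differs to (before_value, after_value); None means missing."""
    keys = sorted(before.keys() | after.keys())
    return {k: (before.get(k), after.get(k)) for k in keys if before.get(k) != after.get(k)}


# Hypothetical counts gathered from the source site collection and the restored copy.
source = {"workflow associations": 4, "history entries": 12, "in-flight instances": 2}
restored = {"workflow associations": 4, "history entries": 12, "in-flight instances": 2}
print(compare_snapshots(source, restored) or "no differences")   # no differences
```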

Results Of Testing

In each workflow case (OOTB and all three SPD workflows), all workflow information that I could poke and prod appeared to survive the backup and restore process without issue.  Workflow definition data was preserved, and workflow history came over intact.  Even more impressive, though, was the fact that in-process workflow state was preserved.  SPD workflow steps that were in the middle of a wait period when a backup was taken completed their wait period after restore and moved on.  To Do items that were waiting for user intervention continued to wait and then proceeded to the next step when they were marked as completed in the restored site collection.

In addition, new instances of each workflow type could be created and started in both site collections following the backup and restore operations.  The backup and subsequent restore didn’t appear to have any effect on either the source or destination.

Though my testing wasn’t exhaustive, it did cast doubt on the absolute nature of the statement made on TechNet regarding site collection backups failing to include workflows.

Joel’s Legwork

While I was conducting my research and testing, Joel was leveraging his network of contacts and asking folks at Microsoft for the real story behind site collection backups and workflow.  He made a little progress with each person he spoke to, and in the end, he managed to get someone to go on the record.

The Conclusion

The official word from Microsoft is that the TechNet note indicating that site collection backups don’t include workflows is a misprint.  In reality, the point that should have been conveyed through TechNet was that content exports (via the Content Deployment API) don’t include workflows – a point that is perfectly understandable considering that the Content Deployment API doesn’t export or import with full-fidelity.  Microsoft indicated that they’ll be correcting the error, and TechNet may have been corrected by the time you read this.
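Since what matters is which mechanism produced the file, a script can check the command it is about to run. The sketch below is a rough, whitespace-tokenized classifier for the commands mentioned in this post; it is not a real command-line parser (quoted paths containing spaces would confuse it), and the function names are hypothetical. STSADM’s `-o backup` counts as a site collection backup only when `-url` is present; with `-directory` it performs a farm-level (catastrophic) backup instead. The expectations reflect SharePoint 2010 as described above. As the comment thread notes, SharePoint 2013-style workflows run in Workflow Manager outside the farm, so they don’t carry over to those workflows.

```python
import ntpath
from typing import Optional


def classify(command: str) -> str:
    """Rough classification of a backup/export command line (whitespace-split only)."""
    tokens = command.lower().split()
    if not tokens:
        return "unknown"
    tool = ntpath.basename(tokens[0])          # tolerate a full path to STSADM.exe
    if tool.endswith(".exe"):
        tool = tool[:-4]
    if tool == "backup-spsite":
        return "site collection backup"
    if tool == "export-spweb":
        return "content deployment export"
    if tool == "stsadm":
        op = None
        for flag in ("-o", "-operation"):
            if flag in tokens:
                i = tokens.index(flag)
                op = tokens[i + 1] if i + 1 < len(tokens) else None
                break
        if op == "backup" and "-url" in tokens:
            return "site collection backup"
        if op == "backup" and "-directory" in tokens:
            return "catastrophic backup"
        if op == "export":
            return "content deployment export"
    return "unknown"


# Full-fidelity mechanisms keep workflow data, state, and history; exports may not.
WORKFLOWS_SURVIVE = {
    "site collection backup": True,
    "catastrophic backup": True,
    "content deployment export": False,
}


def workflows_expected_to_survive(command: str) -> Optional[bool]:
    """True/False for known mechanisms, None when the command isn't recognized."""
    return WORKFLOWS_SURVIVE.get(classify(command))
```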

My takeaway on this: if something on TechNet (or anywhere else on the web) doesn’t quite add up, it never hurts to test and seek additional information from others in the community who are knowledgeable on the subject matter.  In this case, it made a huge difference.

Additional Resources and References

  1. Blog: John Ferringer
  2. Book: SharePoint 2010 Disaster Recovery Guide
  3. TechNet: Backup and recovery overview (SharePoint Server 2010)
  4. Event: SharePoint Saturday Ozarks
  5. Blog: Joel Oleson
  6. Blog: Mark Rackley
  7. Tools: Reflector

Author: Sean McDonough

I am the Chief Technology Officer for Bitstream Foundry LLC, a SharePoint solutions, services, and consulting company headquartered in Cincinnati, Ohio. My professional development background goes back to the COM and pre-COM days - as well as SharePoint (since 2004) - and I've spent a tremendous amount of time both in the plumbing (as an IT Pro) and APIs (as a developer) associated with SharePoint and SharePoint Online. In addition, Microsoft awarded me an MVP (most valuable professional) in 2016 for the Office Development and the Office Servers and Services categories.

30 thoughts on “Site Collection Backups and Workflow Portability in SharePoint 2010”

  1. Thanks, Jeremy! This was definitely a case of “had to have the facts.” I love a little experimentation every now and then, as well.

  2. Dear Sean,

    I keep running into a related issue when importing/exporting a site or list that contains workflows; for instance, a document library containing some documents in the middle of a document approval process. Say the original site contains a few workflow items at the time of the “import” (using the Import function from the CA site). After the site is restored to a different location, though, the workflow items disappear. Could you please shed some light on this or offer some solutions? Thanks.

  3. Huijie,

    It’s important to clearly differentiate between “import/export” and “site collection backup/restore.” Import/export functionality is handled by the Content Deployment API (also known as the PRIME API), while site collection backup/restore is handled through a managed code entry point on the SPSiteCollection type. The two mechanisms operate very differently.

    In the case of import/export, Microsoft is pretty clear in saying that workflow preservation may not (and probably will not) occur. If you export a site or list from either the command line (“STSADM -o export” or the Export-SPWeb PowerShell cmdlet) or SharePoint 2010’s new Central Administration “Export a site or list” capability, you’ll likely lose workflow data and state.

    For this blog post, my efforts were focused instead on site collection backup/restore. When you use the Backup-SPSite PowerShell cmdlet, the “STSADM -o backup -url <url>” command line, or SharePoint 2010 Central Administration’s “Perform a site collection backup” function, you’re doing a true site collection backup. Workflow data, state, and history are maintained.

    I hope that clears things up. It sounds like you’re using functionality that’s based in the Content Deployment API, so the results you’re seeing make sense to me. If I’m misunderstanding, please let me know and I’ll try to help.

    Also, I’m not familiar with the “Import function” you talked about on the Central Administration site. If you are still running into problems and elaborate on them here, please provide a little more detail about this “Import function.”

    Thanks, and good luck!

  4. “Someone”: good question. My question back to you is this: when taking a trip, is the destination the only thing you’re interested in, or is the journey half the fun?

    I hadn’t checked back to see if Microsoft had updated TechNet yet, but I’m glad to hear that they did. Since TechNet has been updated, my post obviously doesn’t add much in terms of a warning or final answer. If you’re an “I only care about the destination” type (reference my earlier question), then I can see where this post would seem pretty pointless.

    I’m going to leave the post up, though, because I’d like to think that there’s some value in the experimental procedure and process I used to test things out. One’s approach to problem-solving is sometimes more valuable than the actual conclusions that are drawn after all is said and done. Quite a few of the classes I took in high school and college required that I “show my work” for full credit :-)

  5. Do you know if this holds true for SharePoint 2007? You said “…. because I hadn’t recalled any such statement (workflows not being included in site collection backups) being made with regard to site collection backups in SharePoint 2007.” The reason I ask is because I went to move a site collection from one content database to another by using stsadm backup and restore and all of my in process workflows broke.

  6. Very good post, please keep it up especially because the Technet Pages for Import/Export SPWeb do not list the Workflow limitation, so people googling for that issue will end up here and get this important info.

  7. I’ve been traveling for the better part of the last week, so I’m only now getting around to answering some questions. Sorry it’s taken me so long to respond to you, Priscilla.

    To address your question: using site collection backup and restore, workflows in SharePoint 2007 should survive the process of being moved from one content database to another. Site collection backups are full-fidelity, so everything in the site collection being backed-up gets put back during the restore.

    Now, having said that: I can think of all sorts of different reasons why workflows might “blow up” during this process. Failing to lock the site collection (prior to SP2, or overriding it afterwards) during backup can leave content – including workflows – in an inconsistent state. Third party workflow products that maintain dependencies outside of the site collection can break since the workflow is utilizing more than just the contents of the site collection. Restoring a site collection to another Web application or farm where dependent Features haven’t been deployed and/or activated can also lead to problems.

    I’m not questioning your experiences or the results you’ve seen. My only point in mentioning these different items is to point out that the backup and restore process can be affected by more than just the fidelity of the data capture.

    For what it’s worth!

    P.S. I deleted the comments you indicated that you wanted to pull down :-)

  8. Thanks for the feedback, Michael. I do intend to leave this post up even though Microsoft corrected the specific TechNet page I referenced early on. I’ve seen inconsistencies across some TechNet pages given the sheer quantity of content housed there, so I hope that I help things a bit by leaving this in-place.

  9. Hello Sean,
    It’s unfortunate that the workflows do not transfer over on an export/import. If one would like to duplicate a site (and all its content/workflows) in the same web application (or site collection) at a different URL, how is this possible? Is it feasible?

    If saving the site as a template is out of the question, is there another way to accomplish this? There are quite a few workflows on the site I would like to duplicate. Although the workflow definitions are there, the associated lists are blank for each workflow.

    Thanks!

    1. Hi Joe,

      I should start by telling you that backup/restore is my strong suit … and that I’m the furthest thing from a “workflow guy.” I do know that declarative workflows tend to stick to everything they touch, and although things are better with SharePoint 2010 (versus 2007), workflows are not yet perfect from a portability standpoint.

      I couldn’t tell from your comment if you’re on 2010 or 2007. SharePoint 2010 does introduce reusable workflows, and those are certainly more portable than the “classic” list-bound workflows we had in 2007. Perhaps reusable workflows are an option for you?

      You mentioned that “associated lists are blank for each workflow.” If you’ve built out a number of lists to support workflows (perhaps for things like lookups) and are trying to make all of it portable, I’m not sure how much luck you’re going to have outside of taking the custom solution route. Workflow definitions are one thing, but site content is another. You could package all of it together with a Feature, but you’ll need to pull out Visual Studio and really dig in.

      Sorry I don’t have more to offer :-(

  10. What if you are backing up a SPWeb site? How would you do this and then promote the SPWeb to an SPSite and keep all workflow data?

  11. Michael,

    If you’re attempting to protect an SPWeb (versus a site collection, or SPSite), it’s important to realize that there isn’t a true out-of-the-box (OOTB) *backup* mechanism you can use short of backing up the parent site collection. You can *export* an individual SPWeb, but that isn’t a full-fidelity process. Many SPWebs also maintain dependencies on objects that live at the site collection level, as well; e.g., content types, users, permission groups/levels, etc., and these undergo varying degrees of transformation (or wholesale omission) during the export process.

    Unfortunately, the process you’re describing (that you’d like to undertake) is going to take you beyond the OOTB tools and into the land of 3rd party products – both for some form of SPWeb backup as well as for the promotion of an SPWeb to SPSite. Tools do exist; they’re just not free :-(

    1. Yeah, I am learning this the hard way. I am using Gary’s tool http://blog.falchionconsulting.com/index.php/2007/09/convert-a-sub-site-to-a-site-collection/ , and everything worked fine except that the OOTB approval workflow and its workflow history did not get carried over. Big oversight on my part.

      Any other suggestions? I am going to do a database restore and see if I can grab the list and, at the very least, copy the workflow history list over to the newly promoted site collection.

      Michael

  12. Great posting, and I appreciate the ammunition to describe the issue to clients, as I tell them the same thing (that site collection backups are full-fidelity) and they reference TechNet. I am now using your blog post as the key reference point.

  13. Michael,

    Aside from third party tools (again: not free), I really don’t have much else in the way of suggestions. I don’t think your situation is unique, and I know that many have tried to tackle the problem with varying degrees of success. Besides the issues of structural components and dependencies, it’s my understanding that workflow *state* isn’t really exposed in any fashion that makes it easy to transcribe, either. Lots of loose ends.

    Keep looking, though, and I hope you find something. As I mentioned, I focus on backup/restore – my knowledge of workflow is limited. I’ve got to believe that there are others out there who can offer more targeted suggestions.

    Good luck!

  14. Thanks Kelly! I’m glad that you’re able to use the post to make your point with clients, and I appreciate the feedback :-)

  15. Very good post!
    I’d appreciate it if you could clarify the following:

    If SPSiteCollection is part of neither the Microsoft.SharePoint.Administration.Backup namespace nor the Microsoft.SharePoint.Deployment namespace, does SPSiteCollection.Backup (and restore) provide a full-fidelity backup or not?

    When I perform a site collection backup using “Export a site or list,” which generates a .cmp file, does that mean it uses the PRIME API?

    Is there any difference in the content exported when performing a site collection backup with “Perform a site collection backup” versus “Export a site or list” in CA?

    Thanks

    1. Kazimo,

      Thanks for the reply! To answer the questions you posed:

      1. A backup that is created using the Backup() method of the SPSiteCollection object (http://msdn.microsoft.com/en-us/library/microsoft.sharepoint.administration.spsitecollection.backup.aspx) is full-fidelity; i.e., it’s a 1-to-1 copy of data that exists within the site collection. The actual backup mechanism lives in unmanaged code. In SharePoint 2010, this is also equivalent to executing a Backup-SPSite command.

      2. Yes, the “Site Or List Export” page (SiteAndListExport.aspx) in Central Administration leverages the Content Deployment API (PRIME API) to create exports of content – not true backups. As a rule of thumb, any content operation within SharePoint that generates one or more .CMP files is leveraging the Content Deployment API. If you use this page to create a recursive site collection export, the results will be different from those of a true site collection backup (as described in #1 above). In SharePoint 2010, exports can be generated from PowerShell using Export-SPWeb.

      3. Yes, there is a difference between using the “Site collection backup” page (SiteCollectionBackup.aspx) and the “Site Or List Export” page (SiteAndListExport.aspx). The SiteCollectionBackup.aspx page generates a true site collection backup using the same mechanism described in point #1 (above). Because of this, backups that are generated are true backups and full-fidelity. The SiteAndListExport.aspx page, on the other hand, is doing an export. Content migration packages (.cmp) that are generated are not full-fidelity.

      I hope that helps!

  16. I’ve just completed a site collection backup for a SharePoint 2013 site and the workflows did not come across. Maybe it’s because it’s a host-named site collection.

    1. Thanks for the feedback, Craig. Your experience confirms a suspicion I had in light of SharePoint 2013’s complete workflow redesign. Even though a SharePoint 2010 (compatibility/legacy) workflow model exists, newer 2013 workflows leverage the Workflow Manager Client which runs externally with regard to the SharePoint environment. In essence, workflow activities aren’t in the farm any longer: http://msdn.microsoft.com/en-us/library/office/jj163181.aspx

      Hearing that a site collection backup doesn’t capture 2013 workflows is a good thing to know and have confirmed. With SharePoint 2013’s general decoupling of services to provide greater scale-out capability versus previous versions, I suspect site collection backups are going to be less capable from a one-to-one migration perspective than they once were.
