Site Collection Backups and Workflow Portability in SharePoint 2010

In this post, I discuss my quest to determine whether or not site collection backups properly capture workflow information in SharePoint 2010. TechNet made a point of saying they didn’t, but Joel Oleson said they did. Who was right?

Do you trust TechNet?  I generally do, as I figure the good folks at Microsoft are doing their best to disseminate reliable information to those of us working with their products.  As I recently learned, though, even the information that appears on TechNet needs some cross-checking once in a while.

Bear with me, as this post is equal parts narrative and data discussion.  If you don’t like stories and want to cut straight to the chase, though, simply scroll down to the section titled “The Conclusion” for the key takeaway.

Site Collection Backup Primer

For those who aren’t overly familiar with site collection backups, it’s probably worth spending a moment discussing them a bit before going any further.  Site collection backups are, after all, at the heart of this blog post.

What is a site collection backup?  It is basically what you would expect from its name: a backup of a specific SharePoint site collection.  These backups can be used to restore or overwrite a site collection if it becomes lost or corrupt, and they can also be used to copy site collections from one web application (or farm) to another.

Anytime you execute one of the following operations, you’re performing a site collection backup:

  • from the command line: STSADM.exe -o backup -url <url> -filename <filename>
  • through PowerShell in SharePoint 2010: Backup-SPSite <url> -Path <filepath>
  • using the “Perform a site collection backup” link in SharePoint 2010 Central Administration
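
For anyone scripting these operations, here is a minimal Python sketch that assembles the two command-line forms listed above. The function names, the sample URL, and the sample path are hypothetical; only the switches that appear in the list are used, and the sketch builds the commands without running them.

```python
def stsadm_backup_args(url, filename):
    """Argument list for the STSADM form of a site collection backup."""
    return ["STSADM.exe", "-o", "backup", "-url", url, "-filename", filename]


def powershell_backup_command(url, path):
    """Command text for the SharePoint 2010 PowerShell form."""
    return f'Backup-SPSite "{url}" -Path "{path}"'


if __name__ == "__main__":
    # Hypothetical site collection URL and backup file, for illustration only.
    print(" ".join(stsadm_backup_args("http://portal/sites/hr", r"D:\backups\hr.bak")))
    print(powershell_backup_command("http://portal/sites/hr", r"D:\backups\hr.bak"))
```

Keeping the STSADM arguments as a list rather than one string avoids quoting problems if the list is later handed to something like subprocess.run.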

When a site collection backup is executed, a single file with a .bak extension is generated that contains the entire contents of the site collection targeted.  This file can be freely copied and moved around as needed.  Aside from some recommendations regarding the maximum size of the site collection captured using this approach (15GB and under in SharePoint 2007, 85GB and under in SharePoint 2010), the backups themselves are really quite handy for both protection and site collection migration operations.
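
The size recommendations above are easy to encode as a quick check. This is only a sketch: the dictionary holds the two figures quoted in the previous paragraph, and the function name is made up for illustration.

```python
# Recommended upper bounds for site collection backups, as quoted above (in GB).
RECOMMENDED_MAX_GB = {"SharePoint 2007": 15, "SharePoint 2010": 85}


def within_recommendation(version, size_gb):
    """True when a site collection of size_gb falls inside the quoted guidance."""
    return size_gb <= RECOMMENDED_MAX_GB[version]
```

A 40GB site collection, for example, would fall outside the quoted guidance on SharePoint 2007 but inside it on SharePoint 2010.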

A Little Background

John Ferringer and I have been plugging away at the SharePoint 2010 Disaster Recovery Guide for quite some time.  As you might imagine, the writing process involves a lot of research, hands-on experimentation, and fact-checking.  This is especially true for a book that’s being written about a platform (SharePoint 2010) that is basically brand new in the marketplace.

While researching backup-related changes for the book, I made a special mental note of the following change regarding site collection backups in SharePoint 2010:

[Image: Site Collection Backups and Workflow]

The text that is circled in the image above (taken straight from a TechNet page titled Backup and recovery overview (SharePoint Server 2010)) says this:

Workflows are not included in site collection backups

This stuck with me when I read it, because I hadn’t recalled any such statement being made with regard to site collection backups in SharePoint 2007.  Since Microsoft made a special note of pointing out this limitation for SharePoint 2010, though, I figured it was important to keep in mind.  Knowing that workflows had changed from 2007 to 2010, I reasoned that the new limitation was probably due to some internal workflow plumbing alterations that adversely affected the backup process.

The Setup

A couple of weeks back, I was presenting at SharePoint Saturday Ozarks alongside an awesome array of other folks (including Joel Oleson) from the SharePoint community.  Due to a speaker no-show in an early afternoon slot, Mark Rackley (the event’s one-man force-of-nature organizer) decided to hold an “ask the experts” panel where attendees could pitch questions at those of us who were willing to share what we knew.

A number of good questions came our way, and we all did our best to supply our experiences and usable advice.  Though I don’t recall the specific question that was asked in one particular case, I do remember advising someone to perform a site collection backup before attempting to do whatever it was they wanted to do.  After sharing that advice, though, things got a little sketchy.  The following captures the essence of the exchange that took place between Joel and me:

Me: <to the attendee> Site collection backups don’t capture everything in SharePoint 2010, though, so be careful.

Joel: No, site collection backups are full-fidelity.

Me: TechNet specifically indicates that workflows aren’t covered in site collection backups with SharePoint 2010.

Joel: No, the backups are still full fidelity.

Me: <blank stare>

The discussion topic and associated questions for the panel quickly changed, but my brain was still stripping a few gears trying to reconcile what I’d read on TechNet with what Joel was saying.

After the session, I forwarded the TechNet link I had quoted to Joel and asked if he happened to have an “inside track” or perhaps some information I didn’t have access to.  We talked about the issue for a while at the hotel a little later on, but the only thing that we could really conclude was that more research was needed to see if site collection backups had in fact changed with SharePoint 2010.  Before taking off that weekend, we decided to stay in contact and work together to get some answers.

Under The Hood

To understand why this issue bothered me so much, remember that I’m basically in the middle of co-authoring a book on the topic of disaster recovery – a topic that is intimately linked to backup and restore operations.  The last thing I would ever want to do is write a book that contains ambiguous or (worse) flat-out wrong information about the book’s central topic.

To get to the heart of the matter, I decided to start where most developers would: with the SharePoint object model.  In both SharePoint 2007 and SharePoint 2010, the object model types that are used to back up and export content typically fall into one of two general categories:

  • Catastrophic Backup and Restore API.  These types are located in the Microsoft.SharePoint.Administration.Backup namespace, and they provide SharePoint’s full-fidelity backup and restore functions.  Backup and restore operations take place on content components such as content databases, service applications, and the entire SharePoint farm.  Catastrophic backup and restore operations are full-fidelity, meaning that no data is lost or selectively ignored during a backup and subsequent restore.  By default, catastrophic backup and restore operations don’t get any more granular than a content database.  If you want to protect something within a content database, such as a site collection, sub-site, or list, you have to back up the entire content database containing the target object(s).
  • Content Deployment API.  The member types of this API (also known internally at Microsoft as the PRIME API) reside within the Microsoft.SharePoint.Deployment namespace and are used for granular content export and import operations.  The exports that are created by the types in this namespace target objects from the site collection level all the way down to the field level – typically webs, lists, list items, etc.  Content Deployment exports are not full-fidelity and are commonly used for moving content around more than they are for actual backup and restore operations.
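
The two categories can be summarized as a small lookup structure. The sketch below is a simplified model written for this comparison, not part of the SharePoint object model: the class, the level names, and the helper function are all hypothetical, and service applications are left out because they don't sit on the containment hierarchy.

```python
from dataclasses import dataclass

# Containment hierarchy from coarsest to finest (a simplification).
LEVELS = ["farm", "content database", "site collection", "web", "list", "list item", "field"]


@dataclass(frozen=True)
class ProtectionApi:
    name: str
    namespace: str
    coarsest: str
    finest: str
    full_fidelity: bool


CATASTROPHIC = ProtectionApi(
    "Catastrophic Backup and Restore API",
    "Microsoft.SharePoint.Administration.Backup",
    coarsest="farm", finest="content database", full_fidelity=True,
)
CONTENT_DEPLOYMENT = ProtectionApi(
    "Content Deployment API",
    "Microsoft.SharePoint.Deployment",
    coarsest="site collection", finest="field", full_fidelity=False,
)


def can_target_directly(api, target):
    """True when the API can operate on `target` without widening the scope."""
    position = LEVELS.index(target)
    return LEVELS.index(api.coarsest) <= position <= LEVELS.index(api.finest)
```

Asking whether the catastrophic API can target a list directly returns False, which is the "back up the entire content database" situation described in the first bullet.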

So, where does this leave site collection backups?  In truth, site collection backups don’t fit into either of these categories.  They are a somewhat unusual case, both in SharePoint 2007 and SharePoint 2010.

Whether a site collection backup is initiated through STSADM, PowerShell, or Central Administration, a single method is called on the SPSiteCollection type, which resides in the Microsoft.SharePoint.Administration namespace.  This is basically the signature of the method:

SPSiteCollection.Backup(string strSiteUrl, string strFilename, bool bOverwrite)

To carry out a site collection backup, all that is needed is the URL of the site collection, the filename that will be used for the resultant backup file, and a TRUE or FALSE to indicate whether an overwrite should occur if the selected file already exists.
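
A script that wraps this call might want to look at the overwrite flag before doing anything else. The Python sketch below shows one hypothetical pre-flight check; it says nothing about what SharePoint itself does when the file already exists, and the function name is invented for illustration.

```python
import os


def may_write_backup(filename, overwrite):
    """True when writing `filename` would not clobber a file the caller wants kept.

    Mirrors the filename and overwrite arguments described above; the site
    collection URL plays no part in this particular check.
    """
    return overwrite or not os.path.exists(filename)
```

If the check returns False, the script can stop and ask for a different filename instead of relying on the backup call to fail.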

If you were to pop open Reflector and drill into the Backup method on the SPSiteCollection type, you wouldn’t get very far before running into a wall at the SPRequest type.  SPRequest is a managed wrapper around the take-off point for a whole host of external calls, and the execution of the Backup method is actually handled in unmanaged legacy code.  Examining the internals of what actually takes place during a site collection backup (or restore, for that matter) simply isn’t possible with Reflector.

Since the internals of the Backup method weren’t available for reflective analysis, I was forced to drop back and punt in order to determine how site collection backups and workflow interacted within SharePoint 2010.

Testing Factors

I knew that I was going to have to execute backup and restore tests at some point; I was just hoping that I would be a bit more informed (through object model inspection) about where I needed to focus my efforts.  Without any visibility into the internals of the site collection backup process, though, I didn’t really have much to start with.

Going into the testing process, I knew that I wasn’t going to have enough time to perform exhaustive testing for every scenario, execution path, variable, and edge-case that could be relevant to the backup and restore processes.  I had to develop a testing strategy that would hit the likely problem areas as quickly (and with as few runs) as possible.

After some thought, I decided that these points were important facets to consider and account for while testing:

  • Workflow Types.  Testing the most common workflow types was important.  I knew that I would need to test at least one out of the box (OOTB) workflow type.  I also decided that I needed to test at least one instance of each type of workflow that could be created through SharePoint Designer (SPD) 2010; that meant testing a list-bound workflow, a site collection workflow, and a reusable workflow.  I decided that custom code workflows, such as those that might be created through Visual Studio, were outside the scope of my testing.
  • Workflow Data.  In order to test the impact of backup and restore operations on a workflow, I obviously had to ensure that one or more workflows were in-place within the site collection targeted for backup.  Having a workflow attached to a list would obviously test the static data portions of the workflow, but there was other workflow-related data that had to be considered.  In particular, I decided that the testing of both workflow history information and in-process workflow state were important.  More on the workflow state in a bit …
  • Backup and Restore Isolation.  While testing, it would be important to ensure that backup operations and restore operations impacted one another (or rather, had the potential to impact one another) as little as possible.  Though backups and restores occurred within the same virtual farm, I isolated them to the extent that I could.  Backups were performed in one web application, and restores were performed in a separate web application.  I even placed each web application in its own (IIS) application pool – just to be sure.  I also established a single VM snapshot starting point; after each backup and restore test, I rolled back to the snapshot point to ensure that nothing remained in the farm (or VM, for that matter) that was tied to the previous round of testing.
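
One way to picture the resulting test plan is as a small matrix of workflow types against the kinds of workflow data to inspect. The Python sketch below is an illustration only: the labels paraphrase the bullets above, and the isolation settings are a hypothetical way of recording the choices described, not a SharePoint configuration format.

```python
from itertools import product

# Workflow types and data facets named in the bullets above.
WORKFLOW_TYPES = ["OOTB", "SPD list-bound", "SPD site collection", "SPD reusable"]
DATA_FACETS = ["definition", "history", "in-process state"]

# Isolation choices, recorded as plain flags (hypothetical structure).
ISOLATION = {
    "separate web applications for backup and restore": True,
    "separate IIS application pools": True,
    "roll back to VM snapshot after each test": True,
}


def build_test_matrix():
    """Every (workflow type, data facet) pair that a test run should inspect."""
    return list(product(WORKFLOW_TYPES, DATA_FACETS))


if __name__ == "__main__":
    for workflow_type, facet in build_test_matrix():
        print(f"{workflow_type}: check {facet}")
```

Four workflow types against three facets gives twelve checks, which is small enough to repeat from a clean snapshot each time.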

Testing Procedure

I created a single Publishing Portal, bolted a couple of sub-sites and Document Libraries into it, and used it as the target for my site collection backup operations.  The Document Library that I used for workflow testing varied from test to test according to the needs of each specific test.

I ran four different workflow test scenarios.  My OOTB workflow scenario involved testing the page approval workflow for publishing pages.  My other three SPD workflow tests (list-bound, site collection, and reusable workflow) all involved the same basic set of workflow steps:

  1. Wait five minutes
  2. Create a To Do item (which had to be completed to move on)
  3. Wait five more minutes
  4. Add a comment to the workflow target

In both the OOTB workflow and SPD workflow scenarios, I wanted to perform backups while workflows were basically “in flight” to see how workflow state would or wouldn’t be impacted by the backup and restore processes.  For the publishing approval workflow, this meant taking a site collection backup while at least one page was pending approval.  For the SPD workflows, it meant capturing a backup while at least one workflow instance was in a five-minute wait period and another was waiting on the completion of the To Do item.
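
To make the “in flight” idea concrete, here is a small Python model of the four-step SPD test workflow. It is a sketch under simplifying assumptions: time is measured in whole minutes since the instance started, the step names are made up, and nothing here talks to SharePoint.

```python
WAIT_MINUTES = 5  # both wait steps in the test workflow lasted five minutes


def current_step(minutes_since_start, todo_completed_at=None):
    """Name the step an instance is in, given hypothetical timing inputs.

    todo_completed_at is the minute mark at which someone completed the
    To Do item, or None if it is still open.
    """
    if minutes_since_start < WAIT_MINUTES:
        return "first wait"
    if todo_completed_at is None or minutes_since_start < todo_completed_at:
        return "waiting on To Do item"
    if minutes_since_start < todo_completed_at + WAIT_MINUTES:
        return "second wait"
    return "comment added"


def good_moment_for_backup(steps):
    """True when the instances cover both in-flight conditions described above."""
    in_wait = any(step in ("first wait", "second wait") for step in steps)
    on_todo = "waiting on To Do item" in steps
    return in_wait and on_todo
```

With one instance two minutes old and another thirty minutes old with its To Do item still open, good_moment_for_backup reports True, which matches the state I wanted the site collection to be in when the backup ran.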

Prior to executing a backup in each test case, I ran a couple of workflow instances from start to finish.  This ensured that I had some workflow history information to capture and restore.

Once site collection backups were captured in each test case, I restored them into the empty web application.  I then opened the restored site collection to determine what did and didn’t get transcribed through the backup and restore process.

Results Of Testing

In each workflow case (OOTB and all three SPD workflows), all workflow information that I could poke and prod appeared to survive the backup and restore process without issue.  Workflow definition data was preserved, and workflow history came over intact.  Even more impressive, though, was the fact that in-process workflow state was preserved.  SPD workflow steps that were in the middle of a wait period when a backup was taken completed their wait period after restore and moved on.  To Do items that were waiting for user intervention continued to wait and then proceeded to the next step when they were marked as completed in the restored site collection.

In addition, new instances of each workflow type could be created and started in both site collections following the backup and restore operations.  The backup and subsequent restore didn’t appear to have any effect on either the source or destination.
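
A comparison like this can be written down as a simple before-and-after check. The Python sketch below assumes a hypothetical inventory dictionary filled in by hand for the source and restored site collections; the keys and sample values are placeholders for illustration, not figures from my tests.

```python
def differences_after_restore(before, after):
    """Names of inventory entries whose values differ in the restored copy."""
    return sorted(key for key in before if after.get(key) != before[key])


if __name__ == "__main__":
    # Placeholder inventory values, for illustration only.
    source = {
        "workflow associations": 4,
        "history entries": 10,
        "instances in wait period": 1,
        "open To Do items": 1,
    }
    restored = dict(source)
    print(differences_after_restore(source, restored))
```

An empty list means every recorded item matched between the source and the restored site collection.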

Though my testing wasn’t exhaustive, it did cast doubt on the absolute nature of the statement made on TechNet regarding site collection backups failing to include workflows.

Joel’s Legwork

While I was conducting my research and testing, Joel was leveraging his network of contacts and asking folks at Microsoft for the real story behind site collection backups and workflow.  He made a little progress with each person he spoke to, and in the end, he managed to get someone to go on the record.

The Conclusion

The official word from Microsoft is that the TechNet note indicating that site collection backups don’t include workflows is a misprint.  In reality, the point that should have been conveyed through TechNet was that content exports (via the Content Deployment API) don’t include workflows – a point that is perfectly understandable considering that the Content Deployment API doesn’t export or import with full fidelity.  Microsoft indicated that they’ll be correcting the error, and TechNet may have been corrected by the time you read this.

My takeaway on this: if something on TechNet (or anywhere else on the web) doesn’t quite add up, it never hurts to test and seek additional information from others in the community who are knowledgeable on the subject matter.  In this case, it made a huge difference.

Additional Resources and References

  1. Blog: John Ferringer
  2. Book: SharePoint 2010 Disaster Recovery Guide
  3. TechNet: Backup and recovery overview (SharePoint Server 2010)
  4. Event: SharePoint Saturday Ozarks
  5. Blog: Joel Oleson
  6. Blog: Mark Rackley
  7. Tools: Reflector

Upcoming Events (June 2010)

This post introduces SharePoint Saturday Columbus, which will be taking place on August 14, 2010. Several of us are putting the event together, and we’re seeking both speakers and sponsors. I will also be speaking at SharePoint Saturday Ozarks this Saturday, June 12th, and delivering my new talk titled “‘Caching-In’ for SharePoint Performance.”

The last couple of months have been exceptionally busy, so this blog hasn’t been getting the attention it deserves.  All of my time has been spent writing chapters for the SharePoint 2010 Disaster Recovery Guide that John Ferringer and I are putting together.  The good news is that John and I have rounded the bend and are heading towards home on completion of the book, so I will be getting back to blogging about topics of greater substance towards the middle of the summer.

Announcing SharePoint Saturday Columbus!

Yesterday we (the planning committee) announced that SharePoint Saturday Columbus will be taking place at the Conference Center at OCLC in Dublin, Ohio, on August 14th, 2010.  For those of you not familiar with the central Ohio region, Dublin is just a northern part of the Columbus area.

Brian Jackett, Jennifer Mason, Nicola Young, and I have been pulling the pieces together over the last several months, and we finally have enough done that we can announce the event.  We’re very excited to be bringing a SharePoint Saturday event to this region of the Midwest!

We are actively seeking both speakers and sponsors for the event.  If you or someone you know falls into either or both of these categories, please head out to the SharePoint Saturday Columbus site for sponsorship information, session submission forms, and other resources.  You can also follow @SPSColumbus on Twitter for more information and announcements in the time leading up to the event.

Speaking of SharePoint Saturdays …

SharePoint Saturday Ozarks

It’s funny to think that the whole SharePoint Saturday experience started about a year ago for me.  I’ll be going back to the scene of the crime this weekend when I head to Harrison, Arkansas, for SharePoint Saturday Ozarks.

Mark Rackley is reminding the SharePoint community that he is a force of nature by putting all the pieces together to make this event happen.  Most SharePoint Saturday events have an organizing committee, but Mark plays all the instruments in this band.  It’s simply amazing.

This time around, I’ll actually be delivering a session on something other than SharePoint disaster recovery.  The session is titled “‘Caching-In’ for SharePoint Performance,” and it’s a new one for me.  I’m really looking forward to giving the talk, because caching within SharePoint is something I am both passionate about and have deep experience with.  Here’s the abstract for my session:

Caching is a critical variable in the SharePoint scalability and performance equation, but it’s one that’s oftentimes misunderstood or dismissed as being needed only in Internet-facing scenarios.  In this session, we’ll build an understanding of the caching options that exist within the SharePoint platform and how they can be leveraged to inject some pep into most SharePoint sites.  We’ll also cover some sample scenarios, caching pitfalls, and watch-outs that every administrator should know.

If you happen to be in the Harrison, AR region on Saturday, June 12th, swing by North Arkansas College.  There will be one heck of a SharePoint party going on!

Additional Resources and References

  1. Book: SharePoint 2010 Disaster Recovery Guide
  2. Blog: John Ferringer
  3. Event: SharePoint Saturday Columbus
  4. Location: The Conference Center at OCLC
  5. Blog: Brian Jackett
  6. Blog: Jennifer Mason
  7. Blog: Nicola Young
  8. Twitter: @SPSColumbus
  9. Event: SharePoint Saturday Ozarks
  10. Blog: Mark Rackley

SharePoint Saturday Houston

In this quick post, I talk about my presentation of “Saving SharePoint” at SharePoint Saturday Houston in a few days (Saturday, May 1st).

I’d normally have posted some information about this a bit earlier, but the last few weeks have been a bit of a whirlwind given the new job.

This Saturday, May 1st, I’ll be speaking at SharePoint Saturday Houston.  I’m already here (in Houston) on business this week, and SharePoint Saturday Houston represents a great way to wrap up the week before heading back to Cincinnati!

I’ll be presenting “Saving SharePoint,” the talk that I’ve given (both solo and with my cohort in crime, John Ferringer) at a number of SharePoint Saturday events.  In the talk, I discuss SharePoint disaster recovery, key terms and concepts for speaking the “DR lingo,” and the tools that SharePoint comes with to help you protect your data.  A substantial portion of the talk also focuses on DR procedures and business practices that anyone tasked with DR responsibilities needs to understand to effectively carry out their duties.

I hope to see you this Saturday!

Additional Reading and References

  1. Event: SharePoint Saturday Houston
  2. People: John Ferringer

A New Chapter

If you check this blog with any degree of regularity, then you know that I’ve been relatively quiet for the last couple of months.  I haven’t really posted anything new in some time, my tweets have been fewer in number (not that I’m a generator of high traffic on Twitter anyway), and I’ve generally been lying low.  This is due in part to writing for the upcoming SharePoint 2010 Disaster Recovery Guide, but writing isn’t really the largest reason I’ve been “sparse” as of late.

Idera Software

For a few months now, I’ve been in a state of transition with regard to both my career and my employer.  Now that all of the discussions are over, the details have been finalized, and I’m on my way to Houston for a week, I’m excited to announce that I’ve joined Idera as their Product Manager for SharePoint Products!  The press release with some additional details is listed in the references at the end of this post.

For those of you who may not be familiar with the name, Idera is a software company that is based out of Houston, Texas.  Idera makes tools for SharePoint, SQL Server, and PowerShell.  In my new role with them, I’ll be part of the team that is working to craft the next generation of Idera’s backup and restore tools.  This excites me on so many levels!

I’ve actually had a relationship with Idera for the better part of a year now, and it has been nothing but positive.  John Ferringer (my DR Guide co-author) and I wrote a SharePoint 2007 Disaster Recovery Overview whitepaper for Idera, and we also presented a webcast on SharePoint Disaster Recovery Essential Guidelines through Idera.  On top of that, I was part of an “Ask the Experts” session at the SharePoint Conference 2009 that Idera sponsored, and I am also a member of Idera’s Technical Advisory Board for SharePoint Products.  When I had determined that I’d be moving on from my previous employer and mentioned my situation to Idera, elements in the SharePoint universe actually seemed to align in my favor for once.

Given the degree to which many of my “extracurricular” activities (that is, writing and speaking) have focused on disaster recovery and the SharePoint platform, I think the new position is going to be a great fit.  The match-up is wonderful in a number of ways:

  • Though I worked with SharePoint as a consultant with my previous company, I was always one step removed from the platform.  With Idera, I’ll be working on products that specifically target SharePoint – a big win in my book.
  • About a year and a half ago, I made it a goal to get more involved in the SharePoint community.  I wanted to participate more, give back some of what I had gotten, and a host of other things.  I see this position as a great way to continue those efforts in a way that helps both me and the company I work for.
  • When it comes to SharePoint, I’ve always had one foot in the development world and one foot in the infrastructure/IT pro world.  Most of the development work I’ve done for SharePoint has focused on core plumbing, interop with other systems, performance improvement, and general tools.  I’d be hard-pressed to find a better fit in this regard than Idera!

Though Idera is headquartered in Houston, I’ll still be staying in Cincinnati.  I will be in Texas all week, though, to meet with my team, discuss strategies, and get myself “into the game,” so to speak.

If you see me around at a conference, SharePoint Saturday event, or anywhere else, please stop me and let me know what you think of Idera’s products.  Make sure you share your thoughts on what you think should be done to make them better, too.  From now on, I’ll be in a unique position to do something with the feedback!

Additional Reading and References

  1. Social Networking: Twitter
  2. Book: SharePoint 2010 Disaster Recovery Guide
  3. Announcement: Sean McDonough joins Idera
  4. Companies: Idera
  5. People: John Ferringer
  6. Whitepaper: Protect your SharePoint Content: An Overview of SharePoint 2007 Disaster Recovery
  7. Webcast: SharePoint 2007 Disaster Recovery Essential Guidelines
  8. Event: SPC 2009 “Ask the Experts” Session
  9. Announcement: Idera’s SharePoint Technical Advisory Board

Upcoming Activities (March 2010)

In this post, I discuss some of my activities for the next couple of months. These include the INTERalliance TechOlympics, SharePoint Saturday Michigan, and continuing efforts to get the SharePoint 2010 Disaster Recovery Guide ready for product launch.

2010 is in full swing, and there seems to be no shortage of activities for me to jump into!  If anything, I need more free time to take on some of the stuff I really want to sink my teeth into (such as a SharePoint 2010 CodePlex project I want to have ready for RTM).  Until I have something more tangible in hand, though, I’ll avoid talking about that topic any further.

Here are some of the things occupying my free time in the short-to-mid term:

TechOlympics Expo 2010

The TechOlympics Expo is the type of event every adult geek wishes they had when they were in high school – a weekend lock-in featuring technical competitions, cool toys, games of every imaginable sort, and pretty much everything else that would get a teenage gearhead jazzed-up.  The underlying goal of the event is to get high school kids interested in technology, careers in technology, and technical opportunities in the Cincinnati area.

The event (on March 5-7) is being put on by the INTERalliance of Greater Cincinnati, and my involvement in the event is kind of a curious thing.  My primary client of the past 2+ years is a big backer of (and heavily invested in) the INTERalliance, so naturally they kick in help whenever events come up.  I helped the INTERalliance through a last-minute (and somewhat ugly) technical hurdle involving SMS voting for their PharaohFest event last October, and I suspect that played a part in my being asked to help out with the TechOlympics.

With the TechOlympics, I’m part of a team that’s working to make all the “technical stuff” (behind-the-scenes and otherwise) happen.  My responsibilities seem to shift a bit each day, but the bulk of what I’ve been working on is coordinating network logistics and services, translating “the vision” into technical infrastructure, providing some guidance on applications being written to support the event, and generally doing my best at “collision avoidance” to ensure that we don’t miss anything important for the event.

I’m confident that the event is going to be incredible, and it’s been a lot of fun doing the planning thus far.  Seeing everything come together is going to be neat – both for me and for everyone else who has been laboring to make the magic happen!

SharePoint Saturday Michigan

What would an “Upcoming Activities” post be without a SharePoint Saturday announcement?  The next one I’ll be attending is SharePoint Saturday Michigan in Ann Arbor on March 13th.  I’ll be presenting “Saving SharePoint,” the disaster recovery talk that John Ferringer and I have been delivering at various SharePoint Saturday events around the region.  I’ll be flying solo this time around, though, as John has some other things going on that weekend.

As always, SharePoint Saturday events are free and open to the public.  If you have any interest in learning more about SharePoint, getting some free training, or simply networking and meeting other professionals in the SharePoint space, please sign up!

SharePoint 2010 Disaster Recovery Guide

This announcement is last, but it’s definitely not least.  Some of you are aware, but for those who aren’t: John and I have been working on the SharePoint 2010 Disaster Recovery Guide for a while now.  I’m not going to lie – it’s slow going.  Personally, I’m a very slow writer, and the process itself is exceptionally labor-intensive.  Nevertheless, we’re making progress – one page at a time.

Our goal (and Cengage’s goal for us) is to have the book ready for SharePoint 2010 RTM.  I haven’t seen or heard anything official from Microsoft, but rumor has it that SharePoint 2010 will probably be out sometime in June.  If that’s the case, then John and I are on-track.

If you have suggestions for us, particularly if you read the first book, we would love to hear them.  We’re incorporating a few that we already received (for example, a chapter that covers some real world use-cases), but our ears are open and listening.  We know that DR isn’t a topic that gets everyone overly hot and bothered (unless they’ve lost everything at some point, of course), but our goal is to make the book as useful as possible.  We’d love your help!

Additional Reading and References

  1. Site: CodePlex
  2. Event: TechOlympics Expo 2010
  3. Organization: The INTERalliance of Greater Cincinnati
  4. Event: PharaohFest
  5. Event: SharePoint Saturday Michigan
  6. Partner In Crime: John Ferringer on Twitter
  7. Book: SharePoint 2010 Disaster Recovery Guide