Launching Your SPO Site or Portal

In this short post I cover the SharePoint Online (SPO) Launch Scheduling Tool and why you should get familiar with it before you launch a new SPO site or portal.

Getting Set To Launch Your SPO Site?

I’ve noted that my style of writing tends to build the case for the point I’m going to try to make before actually getting to the point. This time around, I’m going to lead with one of my main arguments:

Don’t do “big bang” style launches of SPO portals and sites; i.e., making your new SPO site available to all potential users at once! If you do, you may inadvertently wreck the flawless launch experience you were hoping (planning?) for.

Why "Big Bang" Is A Big Mistake

SharePoint Online (SPO) is SharePoint in the cloud. One of the benefits inherent to the majority of cloud-resident applications and services is “elasticity.” In case you’re a little hazy on how elasticity is defined and what it affords:

“The degree to which a system is able to adapt to workload changes by provisioning and de-provisioning resources in an autonomic manner, such that at each point in time the available resources match the current demand as closely as possible”

This description of elasticity helps us understand why a “big bang”-style release comes with some potential negative consequences: it goes against (rather than working with) the automatic provisioning and deprovisioning of SPO resources that serve-up the site or portal going live.

SPO is capable of reacting to an increase in user load through automated provisioning of additional SharePoint servers. This reaction and provisioning process is not instantaneous, though, and is more effective when user load increases gradually rather than all-at-once.

The Better Approach

Microsoft has gotten much better in the last bunch of years with both issuing (prescriptive) guidance and getting the word out about that guidance. And in case you might be wondering: there is guidance that covers site and portal releases.

One thing I feel compelled to mention every time I give a presentation or teach a class related to the topic at hand is this extremely useful link:

https://aka.ms/PortalHealth

The PortalHealth link is the “entry point” for planning, building, and maintaining a healthy, high-performance SPO site/portal. The page at the end of that link looks like this:

I’ve taken the liberty of highlighting Microsoft’s guidance for launching portals in the screenshot above. The CliffsNotes version of that guidance is this: “Launch in waves.”

The diagram that appears below is pretty doggone old at this point (I originally saw it in a Microsoft training course for SPO troubleshooting), but I find that it still does an excellent job of graphically illustrating what a wave-based/staggered rollout looks like.

Each release wave ends up introducing new users to the site. By staggering the growing user load over time, SPO’s automated provisioning mechanisms can react and respond by adding web front-ends (WFEs) to the farm (since the provisioning process isn’t instantaneous). An ideal balance is achieved when WFE capacity can be added at a rate that keeps pace with the users being added to the portal/site.

Are There More Details?

As a matter of fact, there are.

In July of this year (2021), Microsoft completed the rollout of its launch scheduling tool to all SPO environments (with a small number of exceptions). The tool not only schedules user waves but also manages redirects, so users in future waves can’t “jump the gun” and access the new portal before their wave is officially granted access. This is an extremely useful mechanism when you’re trying to control potential user load on the SPO environment.

The nicest part of the scheduling tool (for me) is the convenience with which it is accessed. If you go to your site’s Settings dropdown (via the gear icon), you’ll see the launch scheduler link looking you in the face:

There is some “fine print” that must be mentioned. First, the launch scheduling tool is only available for use with (modern) Communication Sites. In case any of you were still hoping (for whatever reason) to avoid going to modern SharePoint, this is yet another reminder that modern is “the way” forward …

If you take a look at a screenshot of the scheduler landing page (below), you’ll note the other “fine print” item I was going to mention:

Looking at the lower left quadrant of the image, you’ll see a health assessment report. That’s right: much like a SharePoint root site swap, you’ll need a clean bill of health from the SharePoint Page Diagnostics Tool before you can schedule your portal launch using the launch scheduling tool.

Microsoft is trying to make it increasingly clear that poorly performing sites and portals need to be addressed; after all, non-performant portals/sites have the potential to impact every SPO tenant associated with the underlying farm where your tenant resides.

(2021-09-15) ADDENDUM: Scott Stewart (Senior Program Manager at Microsoft and all-around good guy) pinged me after reading this post and offered up a really useful bit of additional info. In Scott’s own words: “It may be good to also add in that waves allow the launch to be paused to fix any issues with custom code / web parts or extensions and is often what is needed when a page has customizations.” 

As someone who’s been a part of a number of portal launches in the past, I can attest that portal launches seldom go off without a hitch. The ability to pause a launch to remediate or troubleshoot a problem condition is worth the scheduler-controlled rollout all by itself!

Conclusion

The Portal Launch Scheduler is a welcome addition to the modern SharePoint Online environment, especially for larger companies and organizations with many potential SPO users. It affords control over the new site/portal launch process, metering user load and giving the SPO environment the time it needs to recognize growing demand and provision additional resources. This helps to ensure that your portal/site launch makes a good (first) impression rather than the (potentially) lousy one that a “big bang” launch could produce.

SPFest and SPO Performance

In this brief post, I talk about my first in-person event (SPFest Chicago) since COVID hit. I also talk about – and include – a recent interview I did for the M365 Developer Podcast.

It's Alive ... ALIVE!

I had the good fortune of presenting at SharePoint Fest Chicago 2021 at the end of July (about a month ago). I was initially a little hesitant on the drive up to Chicago since it was the first live event I was going to do since COVID-19 knocked the world on its collective butt.

Although the good folks at SPFest required proof of vaccination or a clear COVID test prior to attending the conference, I wasn’t quite sure how the attendees and other speakers would handle standard conference activities. 

Thankfully, the SPFest folks put some serious thought into the topic and had a number of systems in-place to make everyone feel as “at ease” as possible – including a clever wristband system that let folks know if you were up for close contact (like a handshake) or not. I genuinely appreciated these efforts, and they allowed me to enjoy my time at the conference without constant worries.

Good For The Soul

I’m sure I’m speaking for many (if not all) of you when I say that “COVID SUCKS!” I’ve worked from my home office for quite a few years now, so I understand the value of face-to-face human contact because it’s not something I get very often. With COVID, the little I had been getting dropped to none.

I knew that it would be wonderful to see so many of my fellow speakers/friends at the event, but I wasn’t exactly prepared for just how elated I’d be. I’m not one to normally say things like this, but it was truly “good for my soul” and something I’d been desperately missing – and I know I’m not alone in that perception.

Although these social interactions weren’t strictly part of the conference itself, I’d wager that they were just as important to others as they were to me.

There are still a lot of people I haven’t caught up with in person yet, but I’m looking forward to remedying that in the future – provided in-person events continue. I still owe a lot of people hugs.

Speaking Of ...

In addition to presenting three sessions at the conference, I also got to speak with Paul Schaeflein and talk about SharePoint Online Performance for a podcast that he co-hosts with Jeremy Thake called the M365 Developer Podcast. Paul interviewed me at the end of the conference as things were being torn down, and we talked about SharePoint Online performance, why it mattered to developers, and a number of other topics.

I’ve embedded the podcast below:

Paul wasn’t actually speaking at the conference, but he’s a Chicagoan who lives not too far from the conference venue … so he stopped by to see us and catch some interviews. It was good to catch up with him and so many others.

The interview with me begins about 13 minutes into the podcast, but I highly recommend listening to the entire podcast because Paul and Jeremy are two exceptionally knowledgeable guys with a long history with Microsoft 365 and good ol’ SharePoint.

CORRECTION (2021-09-14): in the interview, I stated that Microsoft was working to enable Public CDN for SharePoint Online (SPO) sites. Scott Stewart reached out to me recently to correct this misstatement. Microsoft isn’t working to automatically enable Public CDN for SPO sites but rather Private CDN (which makes a lot more sense in the grand scheme of things). Thanks for the catch, Scott!

References and Resources

  1. Conference: SharePoint Fest Chicago 2021
  2. Centers for Disease Control and Prevention: COVID-19
  3. Blog: Paul Schaeflein
  4. Blog: Jeremy Thake
  5. Podcast: M365 Developer Podcast

What CDN Usage Does for SharePoint Online (SPO) Performance

If you need the what’s what on CDNs (content delivery networks), this is a bit of quick reading that will get you up to speed with what a CDN is, how to configure your SPO tenant to use a CDN, and the benefits that CDNs can bring.

The (Not Entirely Obvious) TL;DR Answer

Since I’m taking the time to write about the topic, you can safely guess that yes, CDNs make a difference with SPO page operations. In many cases, proper CDN configuration will make a substantial difference in SPO page performance. So enable CDN use NOW!

The Basis For That Answer: Introduction

Knowing that some folks simply want the answer up-front, I hope that I’ve satisfied their curiosity. The rest of this post is dedicated to explaining content delivery networks (CDNs), how they operate, and how you can easily enable them for use within your SharePoint Online (SPO) sites.

Let me first address a misconception that I sometimes encounter among SPO administrators and developers (including some MVPs): that CDNs don’t really “do a whole lot” to help site and/or page performance. Sure, usage of a CDN is recommended … but a common misunderstanding is that a CDN is more of a “nice-to-have” than a “need-to-have” for SPO sites. That judgment often comes without any real research, knowledge, or testing. Skeptics typically haven’t read the documentation (the “non-RTFM crowd”) and haven’t actually spent any time profiling and troubleshooting the performance of SPO sites. Since I enjoy addressing performance problems and challenges, I’ve been fortunate to experience firsthand the benefits that CDNs can bring. By the end of this post, I hope I’ll have made converts of a CDN skeptic or two.

What Is A CDN?

A CDN is a Content Delivery Network. There are a lot of (good) web resources that describe and illustrate what CDNs are and how they generally operate (like this one and this one), so I’m not going to attempt to “add value” with my own spin. I will simply call attention to a couple of the key characteristics that we really care about in our use of CDNs with SPO.

  1. A CDN, at its core, can be thought of as a system of distributed (typically geographically so) servers for caching and offloading of SPO content. Rather than needing to go to the Microsoft network and data center where your tenant is located in order to fetch certain files from SPO, your browser can instead go to a (geographically) closer CDN server to get those same files.
  2. By virtue of going to a closer CDN server instead of the Microsoft network, the chance that you’ll have a “bigger pipe” with more bandwidth – and less latency/delay – is greater. This usually translates directly to an improvement in performance.
  3. In addition to giving us the opportunity to download certain SPO files faster and with less delay, CDNs can do other things to improve the experience for the SPO files they serve. For instance, CDN servers can pass files back to the browser with cache-control headers that allow browsers to re-serve downloaded files to other users (i.e., to users who haven’t actually downloaded the files), store downloaded files locally (to avoid having to download them again for a period of time), and more. A quick way to peek at these headers appears just below this list.
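
To make that last point a bit more concrete, here’s a minimal sketch that fetches a file and prints the cache-control header it was served with; the URL is a hypothetical placeholder for any publicly accessible CDN-served asset.

# Fetch a publicly accessible file and inspect its Cache-Control header.
$response = Invoke-WebRequest -Uri 'https://cdn.example.com/styling.css' -UseBasicParsing
$response.Headers['Cache-Control']   # e.g., "public, max-age=..."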

If you didn’t know about CDNs prior to this post, or didn’t understand how they could help you, I hope you’re beginning to see the possibilities!

The Arrival Of The Office 365 CDN

It wasn’t all that long ago that Microsoft was a bit more “modest” in its use of CDNs. Microsoft certainly made use of them, but prior to the implementation of its own content delivery networks, Microsoft frequently turned to a company called Akamai for CDN support.

When I first started presenting on SharePoint and its built-in caching mechanisms, I often spoke about Akamai and their edge network when talking about BLOB caching and how the max-age cache-control header could be configured and misconfigured. Back then, “Akamai” was basically synonymous with “CDN,” and that’s how many of us thought about the company. They were certainly leading the pack in the CDN service space.

Back then, if you were attempting to download a large file from Microsoft (think DVD images, ISO files, etc.), there was a good chance that the download link your browser received (from Microsoft’s servers) would actually point to an Akamai edge node near your geographic location instead of a Microsoft destination.

Fast forward to today. In addition to utilizing third-party CDNs like those deployed by Akamai, Microsoft has built (and is improving) their own first-party CDNs. There are a couple of benefits to this. First, concerns arising from data regulations that prevent third-party housing of your data (yes, even in temporary locations like a CDN) are largely avoided. In the case of CDNs that Microsoft is running, there is no hand-off to a third party and thus much less practical concern regarding who is housing your data.

Second, with their own CDNs, Microsoft has a lot more latitude and ability to extend the specifics of CDN configuration and operation to its customers. And that’s what they’ve done with the Office 365 CDN.

Set Up The O365 CDN For Your Tenant’s Use

Now we’re talking! This next part is particularly important, and it’s what drove the creation of this post. It’s also the one bit of information that I promised Scott Stewart at Microsoft that I would try to get “out in the wild” as quickly and as visibly as possible.

So, if you remember nothing else from this post, please remember this:

Set-SPOTenantCdnEnabled -CdnType Public -Enable $true

That is the line of PowerShell that needs to be executed (against your SPO tenant, so you need to have a connection to your tenant established first) to enable transparent CDN support for public files. Run that, and non-sensitive files of public origin from SPO will begin getting cached in a CDN and served from there.

The line of PowerShell I shared goes through the SharePoint Online Management Shell – something most organizations using SPO (and their admins in particular) have installed somewhere.
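
If you haven’t already established that connection, the whole sequence looks roughly like the following. This is a minimal sketch: the admin URL is a hypothetical placeholder for your own tenant, and Connect-SPOService will prompt you for credentials.

# Connect to the tenant admin endpoint first (hypothetical tenant URL shown),
# then enable the CDN for public origins and confirm the setting took effect.
Connect-SPOService -Url 'https://contoso-admin.sharepoint.com'
Set-SPOTenantCdnEnabled -CdnType Public -Enable $true
Get-SPOTenantCdnEnabled -CdnType Public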

It is also possible to enable CDN support via the PnP PowerShell module, if that’s your preference, by executing the following PowerShell:

Set-PnPTenantCdnEnabled -CdnType Public -Enable $true

No matter how you enable the CDN, it should be noted that the PowerShell I’ve elected to share (above) enables CDN usage for files of public origin only. It is easy enough to alter the parameters of the PowerShell command to cover all files, public and private: switch -CdnType to Both (with the SPO management shell), or execute another line of PowerShell after the first that swaps -CdnType Public for -CdnType Private (in the case of the PnP PowerShell module). A sketch of both variants appears below.
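
For illustration, here’s roughly what that broader enablement looks like in each module. Treat this as a sketch and double-check the parameters against the module version you have installed.

# SPO Management Shell: enable the CDN for both public and private origins.
Set-SPOTenantCdnEnabled -CdnType Both -Enable $true

# PnP PowerShell equivalent (current PnP cmdlets also use -CdnType).
Set-PnPTenantCdnEnabled -CdnType Both -Enable $true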

The reason I chose only public enablement is because your organization may be bound by restrictions or policies that prohibit or limit CDN use with private files. This is discussed a bit in the O365 CDN post originally cited, but it’s best to do your own research.

Enabling CDN support for public files, however, is considered to be safe in general.

What Sort Of Improvements Can I Potentially See?

I’ve got a series of images that I use to illustrate performance improvements when files are served via CDN instead of an SPO list/library, and those files are from Microsoft. Thankfully, MS makes the images I tend to use (and a discussion of them) freely available, and they are presented at this link for your reading and reference.

The example that is called out in the link I just shared involves offloading of the jQuery JavaScript library from SPO to CDN. The real world numbers that were captured reduced fetch-and-load time from just over 1.5 seconds to less than half a second (<500ms). That is no small change … and that’s for just one file!

The Other (Secret) Benefit Of CDNs

I guess “Secret” is technically the wrong choice of term here. A more accurate description would be to say that I seldom hear or see anyone talking about another CDN benefit I consider to be very important and significant. That benefit, quite simply, involves improving file fetching and retrieval parallelism when a web page and associated assets (CSS, JS, images, etc.) are requested for download by your browser. In plain English: CDNs typically improve file downloading by allowing the browser to issue a greater number of concurrent file requests.

To help with this concept and its explanation, I’ve created a couple of diagrams that I’ll share with you. The first one appears below, and it is meant to represent the series of steps a browser might execute when retrieving everything needed to show a (SharePoint/SPO) page. As we’ve talked about, what is commonly thought of as a single page in a SharePoint site is, more accurately, a page containing all sorts of dependent assets: image files, JavaScript files, cascading style sheets, and a whole bunch more.

A request for a SharePoint page housed at http://www.thesite.com might start out with one request, but your browser is going to need all of the files referenced within the context of that page (default.aspx, in our case) to render correctly. See below:

To get what’s needed to successfully render the example SharePoint page without CDN support, we follow the numbers:

  1. Your browser issues an HTTP request for the page you want to load – http://www.thesite.com/default.aspx in the case of example above.
  2. That page request goes to (and is served by) the web server/front-end that can return the page.
  3. Our page needs other files to render properly, like styling.css, logo.png, functions.js, and more. These get queued-up and returned according to some rules – more on this in a minute.
  4. In step four (4), files get returned to the browser. Notice I say “no more than six at a time” in the illustration. That’s important and will come into play once we start introducing CDN support to the page/site.

You might be wondering, “Only six files at a time? Really? Why the limitation?” Well, I should start by saying the limit is probably six … maybe a bit more, perhaps a bit less. The specific number depends on the browser you’re using. There was a good summary answer on StackOverflow to a related (but slightly different) question that provides some additional discussion.

Section eight (8) of the HTTP specification (RFC 2616) specifically addresses HTTP connections, how they should be handled, how proxies should be negotiated, etc. For our purposes, the practical implementation of the HTTP specification by modern browsers generally limits the number of concurrent/active connections a browser can have to any given host or URL to six (6).

Notice how I worded that last sentence. Since you folks are smart cookies, I’ll bet you’re already thinking “Wait a minute. CDNs typically have different URLs/hosts from the sites they cache” and imagining what happens (or can happen) when a new source (i.e., a different host/URL) is introduced.

This illustration roughly outlines the fetch process when a CDN is involved:

Steps one (1) through four (4) of the fetch process with a CDN are basically still the same as was illustrated without a CDN a bit earlier. When the page is served-up in step three (3) and returned in step four (4), though, there are some differences and additional activity taking place:

  1. Since at least one CDN is in-use for the SPO environment, some of the resource links within the page that is returned will have different URLs. For instance, whereas styling.css was previously served from the SPO environment in the non-CDN example, it might now be referenced through the CDN host shown as http://cdn.source.com/styling.css
  2. The requested file is retrieved, and …
  3. Files come back to the client browser from the CDN at the same time they’re being passed-back from the SPO environment.

Since we’re dealing with two different URLs/hosts in our CDN example (http://www.thesite.com and cdn.source.com), our original six (6) file concurrent download limitation transforms into a 12 file limitation (two hosts serving six files a time, 2 x 6 = 12).

Whether or not the CDN-based process is ultimately faster than without a CDN depends on a great many factors: your Internet bandwidth, the performance of your computer, the complexity/structure of the page being served-up, and more. In the majority of cases, though, at least some performance improvement is observed. In many cases, the improvement can be quite substantial (as referenced and discussed earlier).

Additional Note: 8/24/2020

In a bit of laziness on my part, I didn’t do a prior article search before writing this post. As fate would have it, Bob German (a friend and fellow MVP – well, he was an MVP prior to joining Microsoft a couple of years back) wrote a great post at the end of 2017 that I became aware of this morning through a series of tweets. Bob’s post is called “Choosing a CDN for SharePoint Client Solutions” and is a bit more developer-oriented. That being said, it’s a fantastic post with good information and a great additional read if you’re looking for more material and/or a slightly different perspective. Nice work, Bob!

Post Update: 8/26/2020

Anders Rask was kind enough to point out that the PnP PowerShell line I originally had listed wasn’t, in fact, PnP PowerShell. That specific line of PowerShell has since been updated to reflect the correct way of altering a tenant’s CDN with the PnP PowerShell cmdlets. Many thanks for the catch, Anders!

Conclusion

So, to sum-up: enable CDN use within your SPO tenant. The benefits are compelling!

References

  1. Microsoft Docs: Use The Office 365 Content Delivery Network (CDN) With SharePoint Online
  2. Imperva: What Is A CDN?
  3. Akamai: What Does CDN Stand For?
  4. MDN Web Docs: Cache-Control
  5. Company: Akamai
  6. Presentations: Caching-In For SharePoint Performance
  7. Akamai: Download Delivery
  8. Microsoft Docs: Configure Cache Settings For A Web Application In SharePoint Server
  9. Blog Post: Do You Know What’s Going To Happen When You Enable The SharePoint BLOB Cache?
  10. LinkedIn: Scott Stewart
  11. Microsoft Docs: Enabling O365 CDN support for public origin files.
  12. Microsoft Docs: Get Started With SharePoint Online Management Shell
  13. Microsoft Docs: PnP PowerShell Overview
  14. Microsoft Docs: Set Up And Configure The Office 365 CDN By Using PnP PowerShell
  15. Microsoft Docs: What Performance Gains Does A CDN Provide?
  16. Push Technologies: Browser Connection Limitations
  17. StackOverflow: How many maximum number of simultaneous Chrome connections/threads I can start through Selenium WebDriver?
  18. W3.org: RFC 2616, Section 8: Connection

One Tool to Rule Them All

Microsoft released the second iteration of its Page Diagnostics Tool for SharePoint. If you have an SPO site, you NEED this tool in your toolbox!

Last week, on Wednesday, September 18th, 2019, Microsoft released the second iteration of its Page Diagnostics Tool for SharePoint. An announcement was made, and the Microsoft Docs site was updated, but the day passed with very little fanfare in most circles.

“The One Ring” by Mateus Amaral is licensed under CC BY-NC-ND 4.0 

In my opinion, there should have been fireworks. Lots of fireworks.

What is it?

If you’re not familiar with the Page Diagnostics Tool for SharePoint, then I need to share a little history on how I came to “meet” this tool.

Back in 2018, the SharePoint Conference North America (SPCNA) was rebooted after having been shut down as part of Microsoft’s consolidation of product-specific conferences a number of years earlier. I had the good fortune of making the cut to deliver a couple of sessions at the conference: “Making the Most of OneDrive for Business and SharePoint Online” and “Understanding and Avoiding Performance Pitfalls with SharePoint Online.”

Sometime in the months leading up to the conference, I received an email out of the blue from a guy named Scott Stewart – who at the time was a Senior Program Manager for OneDrive and SharePoint Engineering. In the email, Scott introduced himself and what he did in his role, and he suggested that we collaborate on the performance session I was slated to deliver at SPCNA.

I came to understand that Scott and his team were responsible for addressing and remedying many of the production performance issues that arose in SharePoint Online (SPO). The more that Scott and I chatted, the more it sounded like we were preaching many of the same things when it came to performance.

One thing Scott revealed to me was that at the time, his team had been working on a tool to help diagnose SPO performance issues. The tool was projected to be ready around the time that SPCNA was happening, so I asked him if he’d like to co-present the performance session with me and announce the tool to an audience that would undoubtedly be eager to hear the news. Thankfully, he agreed!

The audience for our performance talk at SPCNA 2018

Scott demo’d version one (really it was more like a beta) during our talk, and the demo demons got the better of him … but shortly after the conference, v1.0 of the tool went live and was available to download as a Chrome browser extension.

So, what does it do?

Simply put, the Page Diagnostics Tool for SharePoint analyzes your browser’s interaction with SPO and points out conditions and configurations that might be adversely affecting your page’s performance.

The first version of the tool only worked for classic publishing pages. And as a tool, it was only available as a Google Chrome Extension:

The Page Diagnostics for SharePoint extension in the Google Chrome Store

The second iteration of the tool that was released last Wednesday addresses one of those limitations: it analyzes both modern and classic SharePoint pages. So, you’re covered no matter what’s on your SPO site.

What Can the Tool Tell Me?

For one thing, the tool can get you the metrics I’ve highlighted that are relevant to diagnosing basic page performance issues – most notably, SPRequestDuration and SPIisLatency. But it can do so much more than that!

Many of the adverse performance conditions and scenarios I’ve covered while speaking and in blog posts (such as this one here) are analyzed and called-out by the tool, as well as many other things/conditions, such as the navigational style used, whether or not content delivery networks (CDNs) are used by your pages, and quite a few more.

And finally, the tool provides a simple mechanism for retrieving round-trip times for pages and page resource requests. It eliminates the need to pull up Fiddler or your browser’s debug tools to try and track down the right numbers from a scrolling list of potentially hundreds of requests and responses.

How Do I Use It?

It’s easy, but I’ll summarize it for you here.

1. Open the Chrome Web Store. Currently, the extension is only available for Google Chrome. Open Chrome and navigate to https://chrome.google.com/webstore/search/sharepoint directly or search for “SharePoint” in the Chrome Web Store. However you choose to do it, you should see the Page Diagnostics Tool for SharePoint entry within the list of results as shown below.

2. Add the Extension to Chrome. Click the Add to Chrome button. You’ll be taken directly to the diagnostic tool’s specific extension page, and then Chrome will pop up a dialog like the one seen below. The dialog will describe what the tool will be able to do once you install it, and yes: you have to click Add Extension to accept what the dialog is telling you and to actually activate the extension in your browser.

3. Navigate to a SharePoint Online page to begin diagnosing it. Once you’ve got the extension installed, you should have the following icon in the tool area to the right of the URL/address bar in Chrome:

To illustrate how the tool works, I navigated to a modern Communication Site in my Bitstream Foundry tenant:

I then clicked on the SharePoint Page Diagnostics Tool icon in the upper right of the browser (as shown above). Doing so brings up the Page Diagnostics dialog and gives me some options:

Kicking off an analysis of the current page is as simple as clicking the Start button as shown above. Once you do so, the page will reload and the Tool dialog will change several times over the course of a handful of seconds based on what it’s loading, analyzing, and attempting to do.

When the tool has completed its analysis and is ready to share some recommendations, the dialog will change once again to show something similar to what appears below.

Right off the bat, you can see that the Page Diagnostics Tool supplies you with important metrics like the SPRequestDuration and SPIisLatency – two measures that are critical to determining where you might have some slowdown, as called out in a previous blog post. But the tool doesn’t stop there.

The tool does many other things – like look at the size of your images, whether or not you’re using structural navigation (because structural navigation is oh so bad for your SPO site performance), if you’re using content delivery networks (CDNs) for frequently used scripts and resources, and a whole lot more.

Let’s drill into one of the problem items it calls out on one of my pages:

The tool explains to me, in plain English, what is wrong: Large images detected. An image I’m using is too large (i.e., larger than 300KB). It supplies the URL of the image in question so that I’m not left wondering which image it’s calling out. And if I want to know why 300KB is special or simply learn about the best way to handle images in SharePoint Online, there’s a Learn More link. Clicking that link takes me to this page in Microsoft Docs:

Targeted and detailed guidance – exactly what you need in order to do some site fixup/cleanup in the name of improving performance.

Wrapping-Up

There’s more that the tool can do – like provide round trip times for pages and assets within those pages, as well as supply a couple of data export options if you want to look at the client/server page conversation in a tool that has more capabilities.

As a one-stop shop, though, I’m going to start recommending that everyone with an SPO site download the tool for use within their own tenants. There is simply no other tool that is easier or more powerful for SharePoint Online sites. And the price point is perfect: FREE!

The next time you see Scott Stewart, buy him a beer to thank him for giving us something usable in the fight against poorly performing SPO sites.

References and Resources

  1. Company: Microsoft
  2. Browser Extension: Page Diagnostics for SharePoint
  3. Microsoft Docs: Use the Page Diagnostics for SharePoint tool
  4. Conference: The SharePoint Conference North America
  5. Presentation Resource: Making the Most of OneDrive for Business and SharePoint Online
  6. Presentation Resource: Understanding and Avoiding Performance Pitfalls with SharePoint Online
  7. LinkedIn: Scott Stewart
  8. Blog Post: The Five-Minute Page Performance Troubleshooting Guide for SharePoint Online
  9. Blog Post: Caching, You Ain’t No Friend of Mine
  10. Tool: Telerik Fiddler
  11. Web Page: Chrome Web Store Extensions
  12. Microsoft Docs: Optimize images in SharePoint Online modern site pages

Obtaining Performance Metrics for SharePoint Online Modern Pages

In this post, I’ll show you how to obtain page performance core metrics from Modern SharePoint Online pages. It’s easier and more reliable than trying to obtain the same data from classic pages.

Background

It was quite some time ago that I wrote my Five-Minute Page Performance Troubleshooting Guide for SharePoint Online – a little over a year-and-a-half ago, actually. Since that time, SharePoint Online (SPO) has continued to evolve relentlessly. In fact, one slide I’ve gotten into the habit of showing during my SPO talks and presentations is the following:

The slide usually gets the desired response of laughter from attendees, but it’s something I feel I have to say … because like so many things that seem obvious, there’s some real life basis for the inclusion of the slide:

The exchange shown above was the result of someone commenting on a post I had shared about limitations I was running into with the SharePoint App Model. The issue didn’t have a solution or workaround at the time I’d written my post, but Microsoft had addressed it sometime later.

This brief exchange highlights one of the other points I try hard to make while speaking: PAY ATTENTION TO DATES! It’s not safe to assume (if it ever was) that something you read online will stay accurate and/or relevant indefinitely.

In any case, I realize that much of what I share has a “born on date,” for lack of a better label. I’ll continue to share information; just note when something was written.

End of (slight) rant. Back to the real topic of this post.

Modern Pages

Since I had written the previous performance article, Microsoft’s been working hard to complete the transition to Modern SharePoint in SPO. I feel it’s a solid move on their part for a variety of reasons. Modern pages (particularly pages in communication sites) are much more WYSIWYG in nature, and SharePoint Framework (SPFx) web parts on modern pages make a whole lot of sense from a scalability perspective; after all, why assume load on the server (with classic web parts) when you can push the load to the client and use all the extra desktop/laptop power?

As good as they are, though, modern pages don’t obey the standard response header approach to sharing performance metrics. But not to worry: they do things more consistently and reliably (in my opinion).

Performance on a Modern Page

SPRequestDuration (the amount of time the server spent processing the page request) and (SP)IISLatency (the amount of time the page request waited on the server before getting processed) are critical to know when trying to diagnose potential page performance issues. Both of these are reported in milliseconds and give us some insight into what’s happening on the server-side of the performance equation.

Instead of trying to convey these values with response headers (as classic pages do – most of the time), modern pages share the same data within the body of the page itself.

Consider the following modern page:

If this were a classic publishing page and we wanted to get the (SP)IISLatency and SPRequestDuration, we’d need to use our browser’s <F12> dev tools or something like Fiddler.

For modern pages, things are easier. We turn instead to the page source – not the response headers. Grab the page source (by right-clicking and selecting View page source) …

… and you’ll see something like the following:

Now, I’ll be the first to admit that you’ve got to have some sense of what you’re seeking within the page source – there’s a lot of stuff to parse through. Doing a simple <CTRL><F> search for iislatency or requestduration will land you on the content of interest. We’re interested in the metrics reported within the perf section:

The content of interest will be simple text, but the text is a JSON object that can be crunched to display values that are a bit easier to read:
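
If you want to skip the manual search, a few lines of PowerShell can dig the values out of a saved copy of the page source. This is just a sketch: the key names it looks for (iisLatency and requestDuration) are assumptions based on the search terms mentioned above, so adjust the pattern to match what you actually see in your own page source.

# Scan a saved page source file for the perf metrics discussed above.
$source = Get-Content -Raw -Path '.\PageSource.html'

foreach ($name in 'iisLatency', 'requestDuration') {
    if ($source -match "(?i)$name""?\s*[:=]\s*(\d+)") {
        '{0}: {1} ms' -f $name, $Matches[1]
    }
}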

The other thing you’ll notice is that a lot of additional metrics are reported along with the page processing metrics we’ve been looking at. In a future post, I’ll try to break some of these down for you.

Conclusion

“Modern” is the future of SharePoint Online. If you haven’t yet embraced modern lists and pages, consider dipping your toe in the waters. As we’ve seen in this post, Modern also makes it easier to obtain performance metrics for our pages – something that will make page performance troubleshooting significantly more predictable and consistent.

References and Resources

  1. Blog Post: Five Minute Page Performance Troubleshooting Guide for SharePoint Online
  2. Office.com: SharePoint Classic and Modern Experiences
  3. Office.com: What Is A SharePoint Communication Site?
  4. Microsoft.com: Overview of the SharePoint Framework
  5. MDN: Response Header
  6. Telerik: Fiddler
  7. JSON Viewer: Code Beautify

The Five-Minute Page Performance Troubleshooting Guide for SharePoint Online

I regularly hear from SharePoint Online customers that their pages are slow … but they don’t know where to start troubleshooting. Is it the SPO servers? The network? Their page(s)? In this post, I’ll show you how to determine the general source of your slow pages in five minutes or less. It won’t solve your slow page(s) problem, but it will give you enough direction to know where to focus further analysis.

UPDATE (3/20/2018): As most of you who have been following along in your own tenants know, this issue wasn’t actually resolved last September. For a while, in some cases, it looked like the SPIisLatency and SPRequestDuration headers came back. But the victory was fleeting, and since that time I’ve continued to get comments from people saying “but I don’t see them!” And while I had the headers for a while in my tenant, I haven’t seen them in any predictable fashion.

The good news is that after much hounding and making myself a royal pain-in-the-tuckus to Bill Baer and others at Microsoft, it looks like we FINALLY have the right engineering and dev teams engaged to look at this. We got traction on it this week, with multiple repro scenarios and Fiddler traces being passed around … so I’m truly hopeful we’ll see something before long. Stay tuned!

UPDATE (9/2/2017): As I was preparing slides for my IT/DevConnections talks, I decided to check on the issue of the missing Page Response Headers (SPIisLatency and SPRequestDuration). I went through three different tenants and several pages, and I’m happy to report that the headers now appear to be showing consistently. My thanks to Microsoft (I’ll credit Chris McNulty and Bill Baer – I had been pestering them) for rectifying the situation!

“Why is it so slow?” That’s how nearly every performance conversation I’ve ever had begins.

No one likes a slow intranet page, and everyone expects the intranet to just “come up” when they pop the URL into their favorite browser. From an end-user’s perspective, it doesn’t matter what’s happening on the back-end as long as the page appears quickly when someone tries to navigate to it.

SharePoint Online is a big black box to many of its users and consumers. They don’t understand what it takes to build an intranet, nor should they have to. The only thing that really matters to them is that they can bring up a browser, type in a URL, and quickly arrive at a landing page. The burden of ensuring that the site is optimized for fast loading falls to the folks in IT who are supposed to understand how everything works.

If you’re one of those folks in IT who is supposed to understand how everything works with SharePoint Online but doesn’t, then this blog post is for you. Don’t worry – I know there’s a lot to SharePoint Online, but performing some basic troubleshooting analysis for slow pages in SharePoint Online is pretty straightforward. I’ll share with you a handful of techniques to quickly ascertain if the reason for your slow pages is due to the content within the pages themselves, if the issue is network-related, or if there might be something else happening that is beyond your control.

Your Toolset

The first step in your performance troubleshooting adventure begins by opening up your browser from a client workstation. Everyone has a favorite browser, but I’m going to use and recommend Internet Explorer for this exercise because it has a solid set of development tools to assist you in finding and quantifying performance issues. In particular, it is able to chronologically list and detail the series of interactions that take place between your browser and the SharePoint Online web front-ends (WFEs) that are responding to your requests.

When recommending IE, some people ask “how come you don’t use Fiddler?” It’s a good question, and when I first started showing people how to do some quick troubleshooting, I’d do so with Fiddler. If you’re just starting out, though, Fiddler comes with one really big gotcha: operating inside an SSL tunnel. To get Fiddler (which is a transparent proxy) working with SSL, there is some non-trivial setup required involving certificate trusts. Since this is intended to be a quick and basic troubleshooting exercise, I figure it’s better to sidestep the issue altogether and use IE (which requires no special setup).

The Setup

To make this work, let us assume that I am attempting to profile the Bitstream Foundry (my company) intranet home page in order to understand how well it works – or doesn’t. My intranet home page is pretty plain by most intranet standards (remember: I’m a developer and IT Pro – not a designer), but it’s sufficient for purposes of discussion.

Step 1. Open Your Browser

I start by opening Internet Explorer and navigating to the Bitstream Foundry intranet home page at https://bitstreamfoundry.sharepoint.com. Once I move past the sign-in prompts, I’m shown my home page:

Bitstream Foundry Intranet Home Page

My home page has very little on it right now (I’m still trying to decide what would go best in the main region), but it is a SharePoint Online (SPO) page and it does work as a target for discussion purposes.

Step 2. Access the Developer Tools

Accessing the developer tools within Internet Explorer is extremely simple: either press F12, or go to the browser’s gear icon and select F12 Developer Tools from the drop-down that appears as seen below:

Accessing the IE Developer Tools

Doing either of these will pop-open the developer tools as either a stand-alone window or as a pane on the lower half of the browser as shown below:

Internet Explorer F12 Developer Tools

Step 3. Prepare to Capture

When the developer tools first open, they’re commonly set to viewing the page structure on the DOM Explorer tab. For purposes of this troubleshooting exercise, we need to be on the Network tab so we can profile each of the calls the browser makes to the SPO WFE.

Select the Network tab and then select the “Always refresh from server” button as highlighted below in red.

Prepare for Capture

The Network tab is going to allow us to capture the series of exchanges between the SharePoint WFE and our browser as the browser fetches the elements needed to render the page. The “Always refresh from server” button is going to remove client-side caching from the picture by forcing the browser to always re-fetch all referenced content – even if it has a valid copy of one or more assets in the browser cache. This helps to achieve a consistent set of timing values between calls, and it’s also going to simulate someone’s first-time visit to the page (which typically takes longer than subsequent visits) more accurately.

Step 4. Capture the Exchange

The next step is to capture the series of exchanges between IE and SPO. To do this, simply refresh the page by pressing the browser’s Refresh button, pressing <F5>, or going to the browser’s address bar and re-issuing the page request.

The contents of the window on the Network tab will clear, and as content begins to flow into the browser, entries will appear on the screen. For every request that IE makes of SharePoint Online, a new line/entry will appear. It will probably take a handful of seconds to retrieve all page assets, and it’s not uncommon for a SharePoint page to have upwards of 75 to 100 resources (or more) to load.

Capture the Exchange with SPO

Strictly speaking, you shouldn’t have to stop the capture once the page has loaded, but there are several reasons why you would want to. First, you will eventually retrieve all of the SharePoint assets necessary to render the page. If you continue to capture beyond this point, you’ll see the number of requests (represented in the bottom bar of the browser – the number is 83 requests in the screenshot above) continue to tick up. It will slowly climb over time, and it’s not due to the contents of the SharePoint page – it’s due to Office 365.

If you look at the last entry in the screenshot above, you’ll see that it’s a request to https://outlook.office365.com/owa. In short: this is due to a background process that allows Exchange to notify you when you receive new messages and calendar/event notifications. See how the Protocol and Result/Description columns indicate a (Pending) state?

If you get to this point and additional SharePoint elements are no longer loading, press the red “recording stop” button in the toolbar of Network tab. This will stop the capture. Not only does this help to keep the captured trace “cleaner,” but it also prevents excessive distortion of certain values – like overall time to load and the graphical representation of the page load (shown on the far right of the Network tab) as shown below.

Page Timeline Distortion

Step 5. Find the SharePoint Page Request

At this point, you should have a populated Network tab with the entire dialog of requests that were needed to render your page. Of these requests, the overwhelming majority of them will be for JavaScript files (.js), cascading stylesheets (.css), and images (.png, .gif, and .jpg). Only one of them will be for the actual SharePoint page itself (.aspx) … and, of course, this is the request that you need to find in the list.

My intranet home page is named Home.aspx (as can be seen in the browser address bar), so I need to find the request for Home.aspx on the Network tab. I got lucky with this dialog attempt, because Home.aspx is the first entry listed. Note that this isn’t always the case, and it’s not uncommon to find your page request 10 or 20 down in the list.

Select the ASPX Page

When you locate the entry in the list for your .aspx page, click on it to select it. You can confirm that you’ve selected the right entry by verifying Request URL on the Headers tab to the right of the various requests listed for the exchange with SPO (highlighted in the image above).

Step 6. Analyze the Headers

At this point, we need to shift our focus to the HTTP Response Headers that are passed back with the content of the page. Much like the request headers that the browser sends to the server to provide information about the request being made, the response headers that are sent from the server supply the browser with all sorts of additional information about the page. This can include the size of the page (Content-Length), the payload (Content-Type), whether or not the page can be cached (Cache-Control), and more.

Making sure that you have the Headers tab selected, locate and record the three response headers as shown below:

Response Headers of Interest

The three values we want to record are:

  • SPIisLatency. This is a measure of the amount of time (in milliseconds) that the request spent queued and waiting to be processed by IIS (Internet Information Services – the web server). Ideally, it should be zero or very close to zero. In my example, the SPIisLatency is 3ms.
  • SPRequestDuration. This is the amount of time (again, in milliseconds) that it took to process the request on the server. Basically, this is the end-to-end processing time for the page. Healthy pages range from a couple hundred milliseconds to around a second depending on the content of the page. In my example, the SPRequestDuration is 249ms.
  • X-SharePointHealthScore. This is a value, from zero to ten, that indicates how heavily loaded the SharePoint server was at the time the page was served. A score of zero means the server is not under load, while a score of ten means the server is overloaded. As the X-SharePointHealthScore goes up, the server begins to selectively suspend work designated as “low priority,” like some Timer Service jobs, Search requests, and various other low-priority tasks. Ideally, this value should be zero – or close to it. In my example, the value is zero.

We can infer a great deal about the page processing and network traversal of our page request with just these three values and a final number.
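
If you’d rather not click through the developer tools every time, the same three headers can be pulled with a few lines of PowerShell. Treat this as a minimal sketch: the URL is a hypothetical placeholder, and since SPO normally requires authentication, you may need to supply credentials or an authenticated web session for the request to succeed.

# Request the page (hypothetical URL) and echo the three headers of interest.
$response = Invoke-WebRequest -Uri 'https://contoso.sharepoint.com/SitePages/Home.aspx' -UseBasicParsing

foreach ($header in 'SPIisLatency', 'SPRequestDuration', 'X-SharePointHealthScore') {
    '{0}: {1}' -f $header, $response.Headers[$header]
}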

A quick note (2017-07-06): For some reason, a variety of SharePoint Online sites have been returning pages without the SPIisLatency and SPRequestDuration headers lately. I don’t know why this is happening, and I’ve reached out to Microsoft to see if it’s a bug or part of some larger strategy. I don’t think it’s deliberate, because the headers provide some of the only insight end-users can get into SharePoint Online page performance. When I hear something from the product team, I’ll post it here!

The Magical Trio: SPIisLatency, SPRequestDuration, and Total Trip Time

So, you’ve now got three numbers – two of which are helpful for page profiling (SPIisLatency and SPRequestDuration), and a third number (X-SharePointHealthScore) which will tell you how stressed the server was when it served your page. What can you do with them? As it turns out, quite a bit when you combine two of the three with a fourth number.

What is the fourth number? It’s the total trip time that is reported for the page being loaded, and it represents the elapsed time from the point at which the page was requested until the time when the last byte of the page was delivered. For example, I profiled my Bunker Tuneage site. It’s a SharePoint Online site (yes, I know – I have to get it moved to another location soon), so it makes a good target for analysis:

Bunker Tuneage Page Profile

In the above example, the three numbers we’re most interested in are:

  • Total Trip Time: 847.47ms
  • SPRequestDuration: 753ms
  • SPIisLatency: 0ms

If we think about what the individual values mean, we can now reason that the total amount of time spent getting the page (847.47ms), minus the total amount of time spent waiting on or being processed by the server (753ms), should be roughly equal to the amount of time spent “elsewhere” – in routing, traversing network boundaries, on proxies and firewalls, etc.

So, considering our numbers above, the equation looks like this:

847.47ms (total trip time) – 753ms (SPRequestDuration) – 0ms (SPIisLatency) ≈ 94.47ms (time spent “elsewhere”)

Based on our equation, this means that approximately (this isn’t exact) 94.47ms of time was spent getting from the SharePoint Online server to our browser – not too shabby when we consider it.
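
The same arithmetic is trivial to script if you want to crunch numbers from your own capture runs; the values below are simply the ones from this example.

# Numbers pulled from the Network tab and response headers in this example.
$totalTripMs         = 847.47   # total trip time reported by the browser
$spRequestDurationMs = 753      # SPRequestDuration response header
$spIisLatencyMs      = 0        # SPIisLatency response header

# Whatever remains was spent "elsewhere": routing, proxies, firewalls, etc.
$elsewhereMs = $totalTripMs - ($spRequestDurationMs + $spIisLatencyMs)
'Time spent elsewhere: {0:N2} ms' -f $elsewhereMs   # ~94.47 ms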

The Permutations

The numbers could come out a variety of different ways when doing this, so it’s best if we try to establish a general trend. Variability between any two runs can be significant, so it’s in your best interests to conduct a number of runs (maybe a dozen) and come up with some average values.

Regardless of the specific values themselves, there are some general conclusions we can draw about each value by itself – and when it is compared to the others.

  • High Total Time. The total end-to-end times can vary dramatically. The examples I’ve shown thus far demonstrate sub-second latency (i.e., hundreds of milliseconds), and any time you can get values like that, it’s nothing to complain about. When your total round trip time climbs to two or three seconds, you’re generally still doing pretty well. If you hit five, six, or seven+ seconds, it’s time to see what the SPRequestDuration, SPIisLatency, and time-spent-elsewhere values say.
  • High SPIisLatency. If you observe consistently high SPIisLatency values, they point to there being something wrong server-side, since a high SPIisLatency suggests that requests are backing up on the server. Although I’ve never seen it, I believe you could see high SPIisLatency for a brief period of time … but during that time, I’d also expect SharePoint Online to be spinning-up additional WFEs to deal with the effects of high user load. I’ve only ever seen SPIisLatency values in the single digits before, and they’ve never lasted beyond a request or two.
  • High “Time Lost ‘Elsewhere.'” If you crunch the numbers in the performance equation and come up with a significant amount of time being lost “elsewhere,” it suggests that the traffic between SharePoint Online and your computer is being slowed down for some reason. It doesn’t specifically indicate what is causing the slowdown, but the slowdown could be due to any number of network conditions: excessive routing, web proxies, egressing to the Internet out-of-region (a form of excessive routing), firewall issues, or a whole host of other conditions. What represents “excessive” time spent elsewhere? Again, I can only speak to trends here, but I tend not to get too upset about anything under 1s (1000ms) being lost to other factors. When time lost elsewhere grows to be high – especially compared to SPRequestDuration – that’s when I get concerned. For example, an SPRequestDuration of 800ms with a time-lost-elsewhere value of 2500ms makes me wonder what’s happening between SharePoint Online and my computer.
  • High SPRequestDuration. A high SPRequestDuration value can be caused by a variety of factors, and in truth the diagnosis tends to become a bit contentious. Since a high SPRequestDuration means that a page is taking a long time to process on the server, the most common response I frequently encounter (especially among those new to SPO) is that “there’s something wrong with SharePoint Online.” I hate to be the bearer of bad tidings, but repeat after me: “The problem isn’t with SharePoint Online, it’s with my site.” That 9000ms SPRequestDuration probably has very little to do with SPO and everything to do with how you customized SharePoint, your choice of navigation style, the fact that there are two dozen “expensive” web parts on the page, or something related to that. I’m not willing to rule out a problem with a SharePoint Online tenant, but in truth I have yet to encounter it.

What Can I Do About a High SPRequestDuration?

If you don’t believe me and instead feel that the problem is with the SharePoint Online environment, the good news is that there’s an easy way to tell one way or the other … and I highly recommend doing this before calling Microsoft Support (trust me, they’ll thank you for doing so).

Believe it or not, SharePoint Online is also where OneDrive for Business data is stored. A OneDrive for Business page, at its core, is a SharePoint page with nearly no customization. That makes someone’s OneDrive for Business page an excellent A/B test when the performance of a SharePoint Online page is sub-par. Simply load up their OneDrive for Business page and compare its performance numbers to those of the page in question.

OneDrive for Business Performance

Revisiting my Bunker Tuneage site example, you can see that the OneDrive for Business landing page is served from the same tenant as the earlier page. If I were to compare the SPRequestDuration value of the OneDrive for Business page (223ms) with the SPRequestDuration of the SharePoint page in-question (753ms), I’d note that the values differed … but are they different enough to think something is going awry in the SPO environment?

Roughly half a second (~500ms) is indeed a difference, but it’s not enough for me to think that the online environment has problems. When I see SPRequestDuration values like 9000ms for a SharePoint page but 500ms for OneDrive for Business page, that’s when I begin to suspect something is amiss. And again: with such an extreme disparity in values, SharePoint Online is healthy (500ms), but there’s clearly something wrong with my page (9000ms).

Practical Advice

When it comes to diagnosing the root cause or causes for high SPRequestDuration values, the good news is that there are plenty of fixes that range from the simple to the quite invasive. Microsoft has taken the time to compile some of the more common causes, and I highly encourage you to take a look if you’re interested.

At the end of the day, though, sometimes you just want to know where to begin troubleshooting so that you can focus remediation efforts. If you follow the steps outlined in this blog post, I think you’ll find that the five minutes they take to execute will help to focus you in the right area.

References and Resources

  1. MSDN: Discovering Windows Internet Explorer Developer Tools
  2. Company: Bitstream Foundry
  3. Telerik: Fiddler Web Debugging Proxy
  4. Fiddler: Decrypting HTTPS-protected traffic
  5. Mozilla Developer Network: HTTP Headers
  6. SPO Public Site: Bunker Tuneage Online
  7. Blog Post: Save Your SharePoint Online Public Site from the Chopping Block
  8. Office Support: Tune SharePoint Online Performance

Caching, You Ain’t No Friend Of Mine

I love caching and all that it can do to boost performance, but caching for SharePoint in the cloud isn’t the same as it is on-premises. In this post, I explore why that is for Object Caching – and what you can do about it.

I've got a caching-induced headache

I’m a big fan of leveraging caching to improve performance. If you look over my blog, you’ll find quite a few articles that cover things like implementing BLOB caching within SharePoint, working with the Object Cache, extending your own code with caching options, and more. And most of those posts were written in a time when the on-premises SharePoint farm was king.

The “caching picture” began shifting when we started moving to the cloud. SharePoint Online and hosted SharePoint services aren’t the same as SharePoint on-premises, and the things we rely upon for performance improvements on-premises don’t necessarily have our backs when we move out to the cloud.

Yeah, I’m talking about caching here. And as much as it breaks my heart to say it, caching – you ain’t no friend of mine out in SharePoint Online.

Why the heartbreak?

To understand why a couple of SharePoint’s traditional caching mechanisms aren’t doing you any favors in a multi-tenant service like SharePoint Online (with or without Office 365), it helps to first understand how memory-based caching features – like SharePoint’s Object Cache – work in an on-premises environment.

On-Premises

The typical on-premises environment has a small number of web front-ends (WFEs) serving content to users, and the number of site collections being served-up is relatively limited. For purposes of illustration, consider the following series of user requests to an environment possessing two WFEs behind a load balancer:

On-Premises Request Results

Assuming the WFEs have just been rebooted (or the application pools backing the web applications for target site collection have just been recycled) – a worst-case scenario – the user in Request #1 is going to hit a server (either #1 or #2) that does not have cached content in its Object Cache. For this example, we’ll say that the user is directed to WFE #1. Responses from WFE #1 will be slower as SharePoint works to generate the content for the user and populate its Object Cache. The WFE will then return the user’s response, but as a result of the request, its Object Cache will contain site collection-specific content such as navigational sitemaps, Content Query Web Part (CQWP) query results, common site property values, any publishing page layouts referenced by the request, and more.

The next time the farm receives a request for the same site collection (Request #2), there’s a 50/50 shot that the user will be directed to a WFE that has cached content (WFE #1, shown in green) or doesn’t yet have any cached content (WFE #2). If the user is directed to WFE #1, bingo – a better experience should result. Let’s say the user gets unlucky, though, and hits WFE #2. The same process as described earlier (for WFE #1) ensues, resulting in a slower response to the user but a populated Object Cache on WFE #2.

By the time we get to Request #3, both WFEs have at least some cached content for the site collection being visited and should thus return responses more quickly. Assuming memory pressure remains low, these WFEs will continue to serve cached content for subsequent requests – until content expires out of the cache (forcing a re-fetch and fill) or gets forced out for some reason (again, memory pressure or perhaps an application pool recycle).

Another thing worth noting with on-premises WFEs is that many SharePoint administrators use warm-up scripts and services in their environments to make the initial requests that are described (in this example) by Request #1 and Request #2. So, it’s possible in these environments that end-users never have to start with a completely “cold” WFE and make the requests that come back more slowly (but ultimately populate the Object Caches on each server).
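For anyone who hasn’t seen one, a warm-up script doesn’t need to be fancy. Here’s a bare-bones sketch under some stated assumptions: the URLs are hypothetical, and the WFEs are on-premises servers that accept Windows authentication. Scheduled to run after IIS resets or application pool recycles, something like this takes the slow “first hit” so end users don’t have to.

[sourcecode language="powershell"]
# Bare-bones warm-up sketch for on-premises WFEs. The URLs are hypothetical;
# include one entry per site collection (and per WFE, if you bypass the load
# balancer and target each server directly).
$warmUpUrls = @(
    "http://wfe1.contoso.local/sites/portal",
    "http://wfe2.contoso.local/sites/portal"
)

foreach ($url in $warmUpUrls)
{
    try
    {
        # -UseDefaultCredentials assumes Windows authentication; the page
        # content itself is discarded. We only care that the request happens.
        Invoke-WebRequest -Uri $url -UseDefaultCredentials -UseBasicParsing | Out-Null
        Write-Host "Warmed: $url"
    }
    catch
    {
        Write-Warning "Failed to warm ${url}: $_"
    }
}
[/sourcecode]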

SharePoint Online

Let’s look at the same initial series of interactions again. Instead of considering the typical on-premises environment, though, let’s look at SharePoint Online.

Cloud

The first thing you may have noticed in the diagrams above is that we’re no longer dealing with just two WFEs. In a SharePoint Online tenant, the actual number of WFEs is a variable number that depends on factors such as load. In this example, I set the number of WFEs to 50; in reality, it could be lower or (in all likelihood) higher.

Request #1 proceeds pretty much the same way as it did in the on-premises example. None of the WFEs have any cached content for the target site collection, so the WFE needs to do extra work to fetch everything needed for a response, return that information, and then place the results in its Object Cache.

In Request #2, one server has cached content – the one that’s highlighted in green. The remaining 49 servers don’t have cached content. So, in all likelihood (49 out of 50, or 98%), the next request for the same site collection is going to go to a different WFE.

By the time we get to Request #3, we see that another WFE has gone through the fetch-and-fill operation (again, highlighted in green). But, there’s something else worth noting that we didn’t see in the on-premises environment; specifically, the previous server which had been visited (in Request #1) is now red, not green. What does this mean? Well, in a multi-tenant environment like SharePoint Online, WFEs are serving-up hundreds and perhaps thousands of different site collections for each of the residents in the SharePoint environment. Object Caches do not have infinite memory, and so memory pressure is likely to be a much greater factor than it is on-premises – meaning that Object Caches are probably going to be ejecting content pretty frequently.

If the Object Cache on a WFE is forced to eject content relevant to the site collection a user is trying to access, then that WFE is going to have to do a re-fetch and re-fill just as if it had never cached content for the target site collection. The net effect, as you might expect, is longer response times and potentially sub-par performance.

The Take-Away

If there’s one point I’m trying to make in all of this, it’s this: you can’t assume that the way a SharePoint farm operates on-premises is going to translate to the way a SharePoint Online farm (or any other multi-tenant farm) is going to operate “out in the cloud.”

Is there anything you can do? Sure – there’s plenty. As I’ve tried to illustrate thus far, the first thing you can do is challenge any assumptions you might have about performance that are based on how on-premises environments operate. The example I’ve chosen here is the Object Cache and how it factors into the performance equation – again, in the typical on-premises environment. If you assume that the Object Cache might instead be working against you in a multi-tenant environment, then there are two particular areas where you should immediately turn your focus.

Navigation

By default, SharePoint site collections use structural navigation mechanisms. Structural navigation works like this: when SharePoint needs to render a navigational menu or link structure of some sort, it walks through the site collection noting the various sites and sub-sites that the site collection contains. That information gets built into a sitemap, and that sitemap is cached in the Object Cache for faster retrieval on subsequent requests that require it.

Without the Object Cache helping out, structural navigation becomes an increasingly less desirable choice as site hierarchies get larger and larger. Better options include alternatives like managed navigation or search-driven navigation; each option has its pros and cons, so be sure to read-up a bit before selecting an option.

Content Query Web Parts

When data needs to be rolled-up in SharePoint, particularly across lists or sites, savvy end-users turn to the CQWP. Since cross-list and cross-site queries are expensive operations, SharePoint will cache the results of such a query using – you guessed it – the Object Cache. Query results are then re-used from the Object Cache for a period of time to improve performance for subsequent requests. Eventually, the results expire and the query needs to be run again.

So, what are users to do when they can’t rely on the Object Cache? A common theme in SharePoint Online and other multi-tenant environments is to leverage Search whenever possible. This was called out in the previous section on Navigation, and it applies in this instance, as well.

An alternative to the CQWP is the Content Search Web Part (CSWP). The CSWP operates somewhat differently than the CQWP, so it’s not a one-to-one direct replacement … but it is very powerful and suitable in most cases. Since the CSWP pulls its query results directly from SharePoint’s search index, it’s exceptionally fast – making it just what the doctor ordered in a multi-tenant environment.
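To give you a feel for what “pulling from the search index” looks like under the hood, here’s a hedged sketch that runs a query against SharePoint’s search REST endpoint (available in SharePoint 2013 and later, and the same index the CSWP draws from). The site URL and query text are hypothetical, and the Windows-authentication shortcut shown fits on-premises environments; SharePoint Online requires its own authentication handling.

[sourcecode language="powershell"]
# Hedged sketch: query the search index directly via the search REST API.
# The site URL and query are hypothetical; -UseDefaultCredentials assumes
# on-premises Windows authentication (SPO needs different auth handling).
$siteUrl = "http://portal.contoso.local"
$uri = "$siteUrl/_api/search/query?querytext='ContentType:Article'&rowlimit=10"

# The response contains the same kind of result rows a CSWP would render.
$results = Invoke-RestMethod -Uri $uri -UseDefaultCredentials
$results
[/sourcecode]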

Quick note (2/1/2016): Thanks to Cory Williams for reminding me that the CSWP is currently only available to SharePoint Online Plan 2 and the plans that include it (e.g., E3, G3). Many enterprise customers fall into this bucket, but if you’re not one of them, then you won’t find the CSWP for use in your tenant :-(

There are plenty of good resources online for the CSWP, and I regularly speak on it myself; feel free to peruse resources I have compiled on the topic (and on other topics).

Wrapping-Up

In this article, I’ve tried to explain how on-premises and multi-tenant operations are different for just one area in particular; i.e., the Object Cache. In the future, I plan to cover some performance watch-outs and work-arounds for other areas … so stay tuned!

Additional Reading and References

  1. MSDN: Navigation options for SharePoint Online
  2. MSDN: Using Content Search Web Part instead of Content Query Web Part to improve performance in SharePoint Online
  3. SharePoint Interface: Presentations and Materials

Is a Higher SharePoint Backup Thread Count Better?

Many administrators have noted that SharePoint 2010 allows them to tune the number of threads that can be used for farm backup and restore operations, but very few have played with the settings. In this post, I share some results I compiled while testing the settings in my own environments. I also share the PowerShell script I assembled for my testing so you can tune the backup and restore thread settings in your own SharePoint farm.

Balls of purple, orange and grey yarn or wool

Scalability in the hardware and software space is all about parallel computing nowadays. Consider our modern hardware: it used to be that all we really cared about was how fast our CPU could run (“how many GHz?”). Now, we care more about how many cores our CPU has, whether or not those cores support Hyper-threading, how many memory channels our CPU has available to it, etc. Scale-out beats scale-up.

The same is largely true in the software space. Most IT folks learned some time ago that “multithreading” and “higher performance” tended to go hand-in-hand or were at least associated in some way. Multiple threads of execution meant better scheduling of limited processor resources and fewer chances that one long-running operation would bottleneck an entire application.

Configuring SharePoint 2010 Farm Backup and Restore

When I first saw the following section in the “Configure Backup Settings” section of SharePoint 2010’s Central Administration site, it brought a big grin to my face:

Thread Configuration

In SharePoint 2007 and earlier, administrators had no real levers to pull to try and tune the performance of farm backup and restore operations. This obviously changed with SharePoint 2010. We were basically being handed a way to adjust those processes as we saw fit – for better or worse.

Strangely enough, though, I never really took the time to explore the impact of those settings in my SharePoint environments. I always left the number of assigned threads for backup and restore operations at three. I would have liked to mess around with the values, but something else was always more important in the grand scheme of things.

Why Now?

I’ve been working on a new “backup tips and tricks” whitepaper, and I found myself looking for backup and restore concerns within the SharePoint platform that I may not have given much attention to in the past. It didn’t take much wading through Central Administration before I once again found myself looking at thread counts for backup and restore operations.

Doing a little bit of Internet (background) research confirmed what I had suspected: no one else had really spent any time on the topic either. In fact, the only “fresh” and non-copyright-infringing material I found came from a Microsoft TechNet post titled Backup and recovery best practices (SharePoint Server 2010) … and to tell you the truth, the following paragraph from the section titled “Configure SharePoint settings for better backup or restore performance” really bugged me:

If you are using the Backup-SPFarm cmdlet, you can use the BackupThreads parameter to specify how many threads SharePoint Server 2010 will use during the backup process. The more threads you specify, the more resources that backup operation will take, but the faster that it will finish, if sufficient resources are available. However, each thread is reported individually in the log files, so using fewer threads makes interpreting the log files easier. By default, three threads are used. The maximum number of threads available is 10.

Without an understanding of how multithreading (in general) and SharePoint backup (specifically) work, this could easily be interpreted as follows:

The greater the number of threads you assign, the faster your backups will complete.

I realize that my summary is an oversimplification, but I believe that many administrators see the TechNet paragraph as I summarized it. And that concerns me.

I’ve always told people that increasing the backup thread count could yield better performance, but any adjustments would need to be tested in the target farm where they are to be implemented. Realistically speaking, there are several participants and a lot of moving parts in any SharePoint farm backup. Besides the SharePoint server where the backup operation is being coordinated, there is the performance of one or more SQL Servers to consider. The capabilities and restrictions of the backup destination location (typically a UNC file share) also need to be factored-in since that destination is being written to by both the SharePoint Server and one or more SQL Servers.

Setting the number of backup threads to 10 on a SharePoint Server of infinite capability and resources doesn’t guarantee a fast backup, because the farm might have a slow SQL Server, a less-capable backup destination location, a slow or congested network, or a host of other complicating factors.

Oh Yeah? Prove It.

Of course, all of this is just a bunch of hand-waving without proof. So, the scientist in me (yeah, I actually used to be a chemist) decided to take over and devise a series of simple tests to see if there is any real weight to the arguments I’ve been making.

I began with the hypothesis that the easiest and most visible way to gauge the performance of a farm backup operation is to measure how long a backup takes to run; e.g., a farm backup that takes 10 minutes to run is faster than a backup that takes 20 minutes to run if farm content, hardware, configuration, and other factors remain constant. Since SharePoint 2010 provides the ability to specify anywhere from one to 10 backup threads, running a series of backups where the only variable is backup thread count should determine if greater or fewer backup threads yield better performance.

You might recall that I also mentioned that farm topology is a factor in the overall backup equation. As part of my experiment, I decided to run the tests on two different farms I have available to me. General descriptions for each farm:

  • Single-Server Farm: my single server farm environment is a VM running on my laptop. The VM houses SharePoint, SQL Server, and the backup location being targeted. The laptop hardware is a Core-i7 quad-core processor, and the underlying storage for the VM is a solid-state drive (SSD). Hardware bottlenecks should be minimized, and network latency isn’t a factor since backup operations are conducted against a local drive within the VM.
  • Multi-Server Farm: my multi-server environment is the “production” environment on my home network. It consists of a SharePoint Server VM running on a Hyper-V host that also hosts other VMs. The SQL Server instance backing the farm is a non-virtualized SQL Server housing all of the SharePoint databases as well as a few databases for other applications. The backup destination location is a virtualized file server with a pass-through drive array (eSATA with RAID-5). Overall hardware, in this case, is “okay” but obviously not dedicated purely to SharePoint. In addition, network latency and bandwidth (GbE) are also in-play as potential sources of impact.

These two environments have pretty different overall topologies, and it was my hope that I’d see some effect on the performance numbers as a result.

The Script

To run the tests reproducibly, I needed a PowerShell script. So, I put the following script together while I had a bit of free time one night. Feel free to pluck this out to use for testing in your SharePoint environment, as well.

[sourcecode language="powershell"]
<#
.SYNOPSIS
TestBackupThreads.ps1
.DESCRIPTION
This script is used to conduct and time a series of backups using different thread counts.
The output can then be used to make an educated decision on the number of backup threads to
assign for use in farm-level backups.
.NOTES
Author: Sean McDonough
Last Revision: 25-July-2012
.PARAMETER TestLocation
A UNC path to a location that can be used to create test backup sets
.EXAMPLE
TestBackupThreads \\FileShare\TestLocation
#>
param
(
[string]$TestLocation = "$(Read-Host 'UNC path to test backup location [e.g. \\FileShare\TestLocation]')"
)

function TestThreads($backupLocation)
{
# Ensure that the SharePoint cmdlets are loaded before continuing
$spCmdlets = Get-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction silentlycontinue
if ($spCmdlets -eq $Null)
{ Add-PSSnapin Microsoft.SharePoint.PowerShell }

# Setup some variables we’ll need for execution.
$threadTimes = @{} # Hash table to hold timing results
$backupItems = Join-Path $backupLocation "spbr*" # Used to delete temp backup files

# We need to execute a full farm backup for each thread count 1 through 10
Clear-Host
Write-Host "`nBackup thread count testing process beginning."
for ($threads = 1; $threads -lt 11; $threads++)
{
# Clean out any backup contents from the test location
Remove-Item $backupItems -recurse

# Grab the starting date/time (for later comparison), kick-off a farm backup, and then
# grab the stop date/time.
Write-Host "`nInitiating a backup with $threads thread(s) …"
$startPoint = Get-Date
Backup-SPFarm -BackupMethod Full -Directory $backupLocation -BackupThreads $threads
$stopPoint = Get-Date

# Store and report results
$keyName = "Backup with {0} thread(s)" -f $threads
$elapsedSeconds = ($stopPoint - $startPoint).TotalSeconds
$threadTimes[$keyName] = [Math]::Round($elapsedSeconds) # store a number so results sort correctly
Write-Host "Backup with $threads thread(s) complete"
Write-Host ("- time to complete (in seconds): {0:N0}" -f $elapsedSeconds)
}

# Do a final sweep of the test backup location to clean out backup items
Remove-Item $backupItems -recurse

# Dump the results sorted in order of quickest to longest
Write-Host "`nBackup thread count testing process complete."
$threadTimes.GetEnumerator() | Sort-Object Value

# Abort script processing in the event an exception occurs.
trap
{
Write-Warning "`n*** Script execution aborting. See below for problem encountered during execution. ***"
$_.Message
break
}
}

# Launch script
TestThreads $TestLocation
[/sourcecode]

The script is fairly straightforward in what it does. You supply a TestLocation parameter to specify where farm backup test data should be written to, and the script will run a series of full farm backups using the supplied location as the backup destination. The script starts with a full backup using one backup thread; at the end of each full farm backup, the script notes how long the backup took (in seconds) and cleans-up the contents of the TestLocation folder. The number of backup threads is then incremented, and the next test is run. When the script has completed running all backup tests, it sorts the results from “quickest backup” (i.e., the backup thread count requiring the least amount of time) to the slowest backup.
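If you’d like to try it yourself, an invocation looks like the following (the UNC path is just an example). Bear in mind that the script runs ten full farm backups back-to-back, so point it at a share you can afford to hammer for a while.

[sourcecode language="powershell"]
# Example invocation; the UNC path shown is illustrative.
.\TestBackupThreads.ps1 -TestLocation "\\FileShare\TestLocation"
[/sourcecode]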

Test Results

I ran a series of three tests for each of the aforementioned environments for a total of six test runs. Although there’s still quite a bit of variability between individual results within a backup thread series, some trends did appear to emerge.

Single-Server Farm

Backup Times for the Single-Server Environment

With the single-server environment, increasing the number of backup threads did appear to have a directional impact on performance. A single backup thread proved to be the slowest option for the farm backup, and “greater than one” thread resulted in better performance.

If you look at the average values, though, there wasn’t a tremendous difference between the slowest thread count (410 seconds for one thread) and the fastest (388 seconds for 10 threads). We’re only talking about a 5% to 6% difference overall. To truly find the optimum number of backup threads in an environment like this would require more than three test runs to account for standard deviation and establish significance.

Oh, and for those that might be wondering: I’m sure I introduced some of my own variability into the results. Although I didn’t do anything processor or disk intensive during the test runs, I didn’t go out of my way to minimize the impact of services, background operations, etc. To repeat: more testing (with better controls) would be needed for truly conclusive results. The only thing I started to show with this particular set of tests is that multithreading seemed to improve backup performance.

Multi-Server Farm

Things got quite a bit more interesting (to me) when I switched over to multi-server farm testing.

Backup Times for the Multi-Server Environment

In the multi-server environment, the average for using just one backup thread (1413 seconds) appeared to be significantly faster than the next best option (1747 seconds for seven backup threads) – in the neighborhood of 20% or so faster. Just like the single-server results, additional trials would be needed to completely validate the observations, but the results are less ambiguous (given the relatively greater precision of the samples) than with the single-server runs.

Do you find this surprising? Given my multi-server environment and what I know about it, I can’t really say that I was caught flat-footed by the results. Going into the tests, my hypothesis was that my backup destination location would likely be the “weak link” in my overall farm and backup topology. The SharePoint Server was doing well, the SQL Server was relatively robust … but all of that backup activity was hard on my (virtualized) file server. Multiple servers trying to write to the backup location were swamping it and the network, and adding additional backup threads to the mix didn’t end up helping or improving the overall backup process.

The Take-Away

At the end of the day, I recognize that these tests of mine didn’t prove anything conclusively. Frankly, conclusive proof wasn’t my goal. The intent of these experiments wasn’t to say “more threads are better” or “more threads are worse.”

The only point I’m making (I hope) by sharing these results is this: until you run some real tests of your own in your SharePoint environment, you really don’t know where your backup thread sweet spot is. You can try to guess it, but it’s just a guess. And guessing is really no better than simply leaving the backup thread count set to its default value of three.

References and Resources

  1. Wikipedia: Parallel Computing
  2. Wikipedia: Hyper-threading
  3. Wikipedia: Thread (computing) and Multithreading
  4. TechNet: Backup and recovery best practices (SharePoint Server 2010)

Do You Know What’s Going to Happen When You Enable the SharePoint BLOB Cache?

The SharePoint BLOB Cache can be a very powerful tool for use in improving farm performance and scalability, but some planning should take place before the BLOB Cache is enabled. In this post, I explain how end users can suffer if BLOB Cache planning isn’t performed. I also make some recommendations on how to configure the BLOB Cache to provide administrators with performance benefits that don’t come at the cost of a negative end user experience.

The topic of the SharePoint BLOB Cache and how it operates jumped back into the front of my brain recently given some conversations I’ve had and things I’ve seen (e.g., a promising CodePlex project called the SharePoint 2010 BlobCache Manager).

SharePoint PSA

"Just Do It" Post-It NoteThis post is my way of doing something akin to a SharePoint public service announcement. I’ve recently seen some caching-related functionality and topics – especially the BLOB Cache – getting some real traction in different circles, and I think that the attention and love is generally a good thing. I am somewhat concerned, though, by the fact that the discussions and projects that have been surfacing don’t seem to say much beyond the Post-It on the right.

What do I mean by “Just do it”? Well, here’s the high-level summary of what I’ve been seeing people say, post, and practice with the SharePoint BLOB Cache:

  • The SharePoint BLOB Cache can lighten the load on your SQL Servers by caching BLOB (binary large object) data such as images, video, audio, CSS, etc., on your web front-ends (WFEs)
  • BLOB assets are then served directly from the WFEs. This prevents regular round trips from the WFEs to SQL Servers for every BLOB item needed, and this conserves network bandwidth and reduces SQL Server load.
  • To realize the benefits of the BLOB Cache, simply turn it on and you’re good to go. Nothing to it!

To be fair, I think that I’ve done a disservice by contributing to the perception that all you need to do to kick-start BLOB caching is change this web.config line …

[sourcecode language="xml"]
<BlobCache location="C:\BlobCache\14" path="\.(gif|jpg|jpeg|jpe|jfif|bmp|dib|tif|tiff|ico|png|wdp|hdp|css|js|asf|avi|flv|m4v|mov|mp3|mp4|mpeg|mpg|rm|rmvb|wma|wmv)$" maxSize="10" enabled="false" />
[/sourcecode]

… to this:

[sourcecode language="xml"]
<BlobCache location="C:\BlobCache\14" path="\.(gif|jpg|jpeg|jpe|jfif|bmp|dib|tif|tiff|ico|png|wdp|hdp|css|js|asf|avi|flv|m4v|mov|mp3|mp4|mpeg|mpg|rm|rmvb|wma|wmv)$" maxSize="10" enabled="true" />
[/sourcecode]

If you look closely, you’ll see that the only difference between the two XML elements is that the enabled attribute is changed from false to true in the second example.

As you might have guessed, I wouldn’t be writing this blog post if simply changing the BlobCache element’s enabled attribute to true didn’t cause potential problems.

The Small Print

Disclaimer text that includes some BLOB cache usage warnings

At the recent SPTechCon in San Francisco, I gave a five-minute lightning talk called Pushing SharePoint’s ‘Go Faster’ Button. It was a lighthearted look at SharePoint performance, and it focused on a couple of caching changes that could be easily implemented to improve SharePoint performance. One of the recommended changes was (surprise surprise) to simply “turn on” SharePoint’s BLOB Cache.

I only had five minutes to deliver the lightning talk, so I had to cram all of the disclaimers for what I was recommending into the legal style slide that appears on the left. Although the slide got a chuckle from the crowd (the print did look pretty small on-screen), I actually did invest some time in its warnings and watch-outs for anyone who wanted to go and dig them up later.

Of the two tips I delivered in the lightning talk, Tip #2 dealt with the SharePoint BLOB cache. I included a very specific warning in the “Disclaimer of Liability” aimed at those who sought to simply “set it and forget it.” The text of that warning read:

Failure to specify a max-age attribute in the BlobCache element of the web.config will result in the default value of 86,400 seconds (24 hours) being used. Use of a non-zero max-age attribute will result in the attachment of client-side cacheability headers to assets that are being BLOB cached, and such headers can result in BLOB assets being cached on the client beyond the duration of the current user session; such caching can easily result in "stale" BLOB resources being used from the client rather than newer ones being fetched from the WFE, so adjust max-age values carefully.

Put another way: if you simply enable the BLOB cache and do nothing else, your users may be getting a SharePoint behavior change that you hadn’t intended for them to have.

Why Did You Have To Bring Age Into This?

The sticking point with SharePoint’s default BlobCache element and attribute settings is that a max-age of 24 hours is assumed and used when the max-age attribute isn’t explicitly specified or set. What does that mean? I wrote a separate post a while back titled Client-Server Interactions and the max-age Attribute with SharePoint BLOB Caching, and that post addressed the effect that explicit and implicit max-age attribute value specifications have on BLOB Caching. I recommend checking out the post for the full background; for anyone who needs a quick summary, though, I can distill it down to two bullet points:

  • Enabling the BLOB Cache without specifying a max-age attribute means that BLOBs will be cached on both the WFEs in your farm and within users’ browser caches (through the use of Cache-Control HTTP headers).
  • In collaboration environments and anyplace else where BLOB assets may be edited or turn over frequently (within the course of a day), the default client-side caching behavior can mess with the UI/UX of your SharePoint site in all sorts of interesting ways.

What does this mean for the average user of SharePoint? Well, let me walk through a fictitious scenario with supporting detail – as told from the perspective of a SharePoint end user. If you already understand the problem, you’re short on time, and you want to get right to what I recommend, jump down to the “Recommendations Before You Enable the BLOB Cache” section.

Acme Online Goes Live!

Welcome to the Acme Corporation! The Acme Corporation recently completed a “webification” of its entire product catalog, and the end result is a publishing site collection that is implemented in SharePoint 2010. The site collection houses all of Acme’s products, and those products are available for the public to browse and order. Acme’s web content management team is responsible for maintaining the product catalog as it appears on the site, and that team is led by a crafty old fellow named Wile E. Coyote (who we’ll simply refer to as “Wiley” from here on out).

Wiley has many years of experience with Acme’s products and has tried nearly all of them personally; he’s something of a legend. He and his team worked diligently to get Acme’s products into SharePoint before the launch. Not all of the products made it into SharePoint before the launch, though, so a phased approach was taken to rolling out the entire catalog.

The Launch

A SharePoint article page featuring a bundle of dynamite

The first products that Wiley and his team worked to get into SharePoint were Acme’s line of explosives. To prepare for the launch of the new online catalog, Wiley wrote up an article on Acme’s top-selling “Bundle o’ Dynamite” product. The article featured a picture of the Bundle o’ Dynamite, along with some descriptive text about the product, how it operates, a few safety warnings, and a couple of other informational points. When Wiley finished, a mockup of the article page looked like the screenshot seen on the left.

A Fiddler trace of the first request for the dynamite article page

Unbeknownst to Wiley, the Acme product catalog site collection is served-up by one Web application through one zone (the Default zone) on one WFE. This means that all product catalog requests, whether they come from customers or Wiley’s team, go to one IIS site on one server. The first time that someone (or more specifically, someone’s browser) requests the article page that Wiley put together, a series of web requests are kicked-off to pull down the page content, images, scripts, CSS, and everything else needed to render the page in a browser. This series of interactions (captured using Fiddler) is shown on the top right.

A Fiddler trace of the second request for the dynamite article page

Subsequent requests for the same article page (within the context of a single browser session) will follow the series of interactions seen directly to the right. One thing that you may notice upon inspecting the Fiddler trace is that subsequent page requests result in fewer calls back to the server. This is because SharePoint applies per-session caching to many of the items it passes back to the browser, and this caching (which is not the same as BLOB caching) removes the need for constant re-fetching of items that haven’t changed.

In both of the Fiddler traces above, the focus is on the newsarticleimage.jpg file – the file which houses a picture of the Bundle o’ Dynamite. The first time the browser requests the image within a session, a successful HTTP 200 response is returned to the browser along with the image. Also important to note is the Cache-Control header that comes back with the image:

[sourcecode language="text"]
Cache-Control: private,max-age=0
[/sourcecode]

The private part of the Cache-Control header tells the client browser to cache the image locally for the duration of the browser session. The max-age=0 portion says, in effect, that subsequent uses of the image by the browser (from its cache) should be validated with a call back to the WFE to ensure that the image hasn’t changed.

And that’s what is shown happening in the second Fiddler trace. When subsequent page requests attempt to use the image, a GET request from the browser is answered by the WFE with

[sourcecode language="text"]
HTTP/1.1 304 NOT MODIFIED
[/sourcecode]

This response code tells the browser that the image hasn’t changed and that it’s safe to use the locally cached copy. If the image were to change, then an HTTP 200 would be returned instead and the new/updated version of the image would be sent to the browser.
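If you don’t have Fiddler handy, you can spot-check these headers with a couple of lines of PowerShell. This is a minimal sketch: the image URL is hypothetical, and -UseDefaultCredentials assumes Windows authentication against the WFE.

[sourcecode language="powershell"]
# Minimal sketch: inspect the Cache-Control header returned for a BLOB.
# The URL is hypothetical; substitute an image from your own site.
$imageUrl = "http://portal.contoso.local/PublishingImages/newsarticleimage.jpg"
$response = Invoke-WebRequest -Uri $imageUrl -UseDefaultCredentials -UseBasicParsing
$response.Headers["Cache-Control"]
[/sourcecode]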

When the browser is closed, the locally cached copy of the image is flushed and the process begins anew the next time the browser opens.

Meep Meep

Not long after the launch of Acme’s online product catalog, customers began complaining that browsing the catalog was simply too slow. After some discussion, Management decided to bring in Roadrunner Consulting to assess the site and make suggestions that would improve performance.

Roadrunner’s team raced around (as they are wont to do), ran some tests, made some observations, and provided a list of suggestions. At the top of the list was “Implement SharePoint BLOB Caching.”

So, Acme’s SharePoint administrators jumped right in and turned on BLOB caching. Since the site is served up through a single IIS site (SharePoint zone), the admins set enabled="true" in the BlobCache element of the site’s web.config file. No other changes were made to the BlobCache element.

So, what happened? Well, things got snappier! The administrators watching their back-end performance noticed that the file system on the WFE started to cache BLOBs that were being requested by users. Each request to the WFE for one of those BLOBs resulted in the BLOB being served back directly from the WFE without a round-trip to the SQL Server. Internal network bandwidth utilization dropped significantly, and the SQL Servers started breathing a bit easier. The administrators were most definitely happy with the change they’d made … and it was as easy as setting enabled="true" in the BlobCache element of the web.config file. Talk about the greatest thing since sliced bread! Everyone exchanged a round of high-fives after the change was made, and talks of how the geeks would rise up to dominate the world resumed.

Dynamite Article Page - First Request with BLOB Caching enabled

So, how do things look on the client side after enabling the BLOB Cache? Well, when someone goes to retrieve Wiley’s article for the first time, the first browser request series for the page looks much like it did without the BLOB Cache enabled. See the Fiddler trace on the right.

There is one very important difference when retrieving items with the BLOB Cache enabled, though, and you have to look closely to see it. Do you see the Cache-Control HTTP header that is returned with the request for the newsarticleimage.jpg image? It’s different than it was before the BLOB Cache was enabled. Now it says

[sourcecode language="text"]
Cache-Control: public, max-age=86400
[/sourcecode]

Whoa … what does this mean? Well, it means two important things. First, the public designation means that when the image is cached by the browser, it will no longer be private to the current session. It can be re-used across sessions, so it won’t necessarily “go away” when the browser is closed.

Second, the max-age=86400 means that the image will continue to “live” in the browser’s cache for 86400 seconds, or 24 hours. For that period of time, the browser won’t even attempt to contact the WFE to see if the image has changed; it will just use the copy that it holds onto. Nothing short of a browser cache flush (which is manual intervention by the user) will change this behavior.

Dynamite Article Page - Subsequent page requests with BLOB Caching enabled

And that’s what we see with the Fiddler trace on the right. This trace represents what subsequent page requests look like for the next 24 hours. Notice that the newsarticleimage.jpg image doesn’t get re-requested or checked. There are no HTTP 304 response codes coming back, because the browser simply isn’t requesting the image; it’s using its cached copy.

Admittedly, the Fiddler trace will look a little different when the browser is closed and re-opened … but a re-fetch of the newsarticleimage.jpg file will not take place for a full 24 hours unless a user clears the browser cache.

What does this change in behavior mean for actual users of the site? Read on to find out …

Running Off the Edge of the Cliff

The corrected article page showing the TNT barrel

Shortly after the BLOB Cache changes were made, Wiley got an (unrelated) call from the Fulfillment Department. They were furious because they’d been getting all sorts of returns for the Bundle o’ Dynamite. The reason for the returns? It’s because Wiley put the wrong image in his article page!

Even though Acme sells a product called the “Bundle o’ Dynamite,” the actual product that ships is a barrel of TNT. Since the product image was wrong, customers were incorrectly concluding that they’d get several sticks of dynamite instead of a barrel, and this was rubbing many of them the wrong way. Who knew?

Wiley went out to SharePoint, checked the article that he wrote, and saw that he did indeed use a series of dynamite sticks for an image. The page should have actually appeared as it does in the screenshot that is above and to the left. After a quick facepalm, Wiley realized that he needed to make a change – and fast.

Wiley went out to the Publishing Images library for the site collection and uploaded a new version of the newsarticleimage.jpg image file – one that contained a barrel of TNT instead of a bundle of dynamite. He then browsed to the article page and did a refresh.

Nothing changed.

Wiley hit F5 in his browser. Still nothing changed.

Over the course of the hour that followed, Wiley grew increasingly bewildered and panicked as he tried in vain to get the new TNT barrel to show up on the article page. He uploaded the image several more times, closed and re-opened his browser, deleted and then reloaded the image, re-published and re-approved the actual article page, and even got the administrators to flush the SharePoint BLOB Cache. None of the actions made a difference.

The Coyote Never Wins

Why didn’t any of Wiley’s efforts make a difference? Because what Wiley didn’t understand was that there was nothing he could do short of flushing his cache that would prompt the browser to re-request the updated image. The browser started using the cached copy of the image after the first request Wiley made in the morning; i.e., the request to verify that the image on the page was incorrect as Fulfillment indicated. For another 24 hours (86400 seconds), the browser would continue to use the cached image.

Wiley’s image problem was just one of the potential issues that might surface as a result of the BLOB Cache change. It was also one of the more visible problems. In looking at the path attribute of the BlobCache element, you might have noticed some of the other file types that got cached by default – file types with js (JavaScript) and css (Cascading Style Sheets) extensions, for example. Any of those file types which were served from site collection lists and libraries would also be impacted by the “fetch once and use for 24 hours” behavior.

Recommendations Before You Enable the BLOB Cache

A frustrated end user

I hope the example featuring Wiley did an adequate job of explaining why I think that blindly turning on the BLOB Cache can be a bad thing for end users. Having seen first-hand what an improperly configured BLOB Cache can do to the user experience, I’d like to offer up a handful of suggestions based on my own experience.

1. Don’t just “enable” the BLOB Cache with its out-of-the-box (OOTB) default settings. There are a couple of OOTB settings that you should really think hard about changing. I mentioned the default max-age value you get if you don’t actually specify the attribute value. I’m going to talk more about that one in a bit. Also: do you really want the BLOB Cache using your system drive (C:) as its target location for cached files? Most admins I know aren’t particularly friendly with that idea, so relocate the BLOB Cache to another drive.

2. If your Web application has only one zone (i.e., the Default zone), strongly consider specifying a max-age attribute value of zero (max-age="0"). Why do I say this? Because it avoids the situation I described with Wiley above, and it’s a compromise that gives administrators some of the performance boosts they seek without completely shafting users in the process.

Dynamite Article Page - max-age = 0 in effect

When the BLOB Cache is enabled and a max-age attribute value of 0 is explicitly specified, things change a bit. BLOB caching and offloading still happens on the WFEs, so administrators get the internal performance boosts they were probably seeking in the first place. On the other side of the equation (i.e., the “user side”), persistent client-side caching ceases, as shown on the left. Although the Cache-Control header still specifies public cacheability, the max-age=0 ensures that the browser will round-trip to the server each time it intends to use a locally cached resource to ensure that the most up-to-date copy of the resource is in the cache. This will keep users like Wiley from going off the deep end due to the wonky and inconsistent user experience that afflicts users who need to edit and proof a site that employs persistent client-side caching.
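Putting recommendations #1 and #2 together, a BlobCache element for a single-zone Web application might look like the following. This is just a sketch: the D: drive location is an example (use whatever non-system drive makes sense for you), and the path expression shown is the OOTB default.

[sourcecode language="xml"]
<BlobCache location="D:\BlobCache\14" path="\.(gif|jpg|jpeg|jpe|jfif|bmp|dib|tif|tiff|ico|png|wdp|hdp|css|js|asf|avi|flv|m4v|mov|mp3|mp4|mpeg|mpg|rm|rmvb|wma|wmv)$" maxSize="10" max-age="0" enabled="true" />
[/sourcecode]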

3. If you have a Web application that is extended to two or more zones, apply BLOB Cache settings that are appropriate for each zone. This is relatively common in public-facing SharePoint site collections and Web applications where anonymous access is in-use. In these particular scenarios, there are usually at least two SharePoint zones per Web application: an internal zone (typically the Default zone) through which editors and other users may authenticate to carry out content work, and an external zone (e.g., the Internet zone) which is set up for anonymous access and “external consumption.”

In this dual-zone scenario, it makes sense to configure each zone (IIS site) differently since usage patterns differ between zones. The BlobCache element in the web.config for the internal (Default) zone, for example, should probably be configured according to #2 (above – the one-zone scenario with a max-age attribute value of zero). For the web.config that is used in the external zone, though, it may make sense to apply a non-zero max-age value for use with the BLOB Cache – especially since anonymous users aren’t (normally) content editors. A non-zero max-age means fewer trips (overall) to your WFEs from outside the LAN environment, and this helps to keep bandwidth utilization down on your Internet connection. There is still a risk that external users may see “stale” content, but the impact is generally more acceptable for straight viewers since they aren’t actively working on content.
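As an illustration, the BlobCache element in the external (anonymous) zone’s web.config might instead carry a non-zero max-age like the one below. The 3600-second (one hour) value is purely an example I picked for the sketch; choose a duration that reflects how often your public content actually changes.

[sourcecode language="xml"]
<BlobCache location="D:\BlobCache\14" path="\.(gif|jpg|jpeg|jpe|jfif|bmp|dib|tif|tiff|ico|png|wdp|hdp|css|js|asf|avi|flv|m4v|mov|mp3|mp4|mpeg|mpg|rm|rmvb|wma|wmv)$" maxSize="10" max-age="3600" enabled="true" />
[/sourcecode]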

4. Consider changing the path expression to restrict what goes into the BLOB Cache. The default path expression for SharePoint 2010’s BlobCache element looks like this:

[sourcecode language="text"]
\.(gif|jpg|jpeg|jpe|jfif|bmp|dib|tif|tiff|ico|png|wdp|hdp|css|js|asf|avi|flv|m4v|mov|mp3|mp4|mpeg|mpg|rm|rmvb|wma|wmv)$
[/sourcecode]

Most administrators are savvy enough to add and remove file extensions from this expression as needed; for example, taking |wmv out of the path expression means that the BLOB Cache will no longer store and serve files with a .wmv extension. Adding and removing extensions really only scratches the surface of what can be done, though. The path attribute value is actually a regular expression, so the full power of regular expressions can be applied to select and exclude files for use with the BLOB Cache.

Suppose you want to explicitly control which images, videos, and other files (that match the list of extensions) end up in the BLOB Cache? Maybe you want to specially name files you intend to cache with an additional .cache extension before the actual file type extension (e.g., .gif). To accomplish this, you could change the path expression to this:

[sourcecode language="text"]
\.cache\.(gif|jpg|jpeg|jpe|jfif|bmp|dib|tif|tiff|ico|png|wdp|hdp|css|js|asf|avi|flv|m4v|mov|mp3|mp4|mpeg|mpg|rm|rmvb|wma|wmv)$
[/sourcecode]

With this path expression, filenames like these would be included in the BLOB Cache:

  • SampleImage.cache.jpg
  • MyVideo.cache.wmv

… but anything without the additional .cache qualifier would get omitted, such as:

  • AnotherImage.jpg
  • ExcludeThisVideo.wmv

This is just a simple example, but hopefully it gives you an idea of what you could do with the path regular expression to control the contents of the BLOB Cache.

Summing It Up

The SharePoint BLOB Cache is a powerful mechanism to improve farm performance and scalability, but it shouldn’t be turned on without some forethought and a couple of changes to the default BlobCache element attribute values.

If you are an administrator and have enabled the BLOB Cache with its default values, check with your users. They might have some feedback for you …

Additional Reading and Resources

  1. CodePlex: SharePoint 2010 BlobCache Manager
  2. Event: SPTechCon San Francisco 2012
  3. Prezi: Pushing SharePoint’s ‘Go Faster’ Button
  4. Blog Post: Client-Server Interactions and the max-age Attribute with SharePoint BLOB Caching
  5. Tool: Fiddler Web Debugging Proxy

SharePoint Summer Fun

This post covers my summer SharePoint activities, including a number of appearances at SharePoint Saturday events and SPUGs. I also talk about a few other tidbits, including an appearance on Microsoft’s Talk TechNet broadcast.

My family recently relocated from the west side of Cincinnati to the east side, and it’s been a major undertaking – as anyone who’s familiar with Jim Borgman’s comic series on the east and west sides of Cincinnati can appreciate. Between the move and some other issues, I had planned on taking it easy with SharePoint activities for a while.

Despite that goal, it seems I still have a handful of SharePoint-related things planned this summer. Here’s what’s going on.

Office Web Apps’ Cache Article

Idera SharePoint Smarts

As a product manager for Idera, I occasionally author articles for the company’s SharePoint Smarts e-newsletter. A couple of weeks back, I wrote an article titled Quick Tips for Managing the SharePoint 2010 Office Web Apps’ Cache. The article basically provides an overview of the Office Web Apps’ cache and how it can be maintained for optimal performance.

The main reason I’m calling the article out here (in my blog) is because I put together a couple of PowerShell scripts that I included in the article. The first script relocates the Office Web Apps’ cache site collection to a different content database for any given Web application. The second script displays current values for some common cache settings and gives you the opportunity to change them directly.

The scripts (and article contents) are helpful for anyone trying to manage the Office Web Apps in SharePoint 2010. Check them out!

Talk TechNet Appearance

On Wednesday, July 6th (tomorrow!), I’ll be on Talk TechNet with Keith Combs and Matt Hester. I’m going to be talking with Keith and Matt about SharePoint, disaster recovery, and anything else that they want to shoot the breeze about. 60 minutes seems like a long time, but I know how quickly it can pass once my mouth starts going …

Here’s the fun part (for you): the episode is presented live, and anyone who registers for the event can “call in” with questions, comments, etc. Feel free to call in and throw me a softball question … or heckle me, if that’s your style! Although I don’t know Keith personally (yet), I do know Matt – and knowing Matt, things will be lighthearted and lively.

Evansville SPUG

On Thursday the 7th (yeah, this is a busy week), I’ll be heading down to Evansville, Indiana, to speak at the Evansville user group. This is something that Rob Wilson and I have been discussing for quite some time, and I’m glad that it’s finally coming to fruition!

I’ll be presenting my SharePoint 2010 and Your DR Plan: New Capabilities, New Possibilities! session. The abstract reads as follows:

Disaster recovery planning for a SharePoint 2010 environment is something that must be performed to ensure the safety of your data and the continuity of business operations. Microsoft made significant enhancements to the disaster recovery landscape with SharePoint 2010, and we’ll be taking a good look at how the platform has evolved in this session. We’ll dive into the improvements to the native backup and restore capabilities that were present in the SharePoint 2007 platform to see what has been changed and enhanced. We’ll also look at the array of exciting new capabilities that have been integrated into the SharePoint 2010 platform, such as unattended content database recovery, SQL Server snapshot integration, and configuration-only backup and restore. By the time we’re done, you will possess a solid understanding of how the disaster recovery landscape has changed with SharePoint 2010.

It’ll be a bit of a drive from here to Evansville and back, but I’m really looking forward to talking shop with Rob and his crew on Thursday!

SharePoint Saturday New York City (SPSNYC)

SPS New York City Logo

I’ll be heading up to New York City at the end of the month to present at SharePoint Saturday New York City on July 30th. I’ll be presenting my SharePoint 2010 and Your DR Plan: New Capabilities, New Possibilities! session, and it should be a lot of fun.

Amazingly enough, the primary registration (400 seats) for the event “sold out” in a little over three days. Holy smokes – that’s fast! The event is now wait listed, so if you haven’t yet signed up … you probably won’t get a spot  :-(

CincySPUG

On August 4th, I’ll be heading back up to Mason, Ohio, to present for my friends at the Cincinnati SharePoint User Group. My presentation topic this time around will be “Caching-In” for SharePoint Performance. Here’s the abstract:

Caching is a critical variable in the SharePoint scalability and performance equation, but it’s one that’s oftentimes misunderstood or dismissed as being needed only in Internet-facing scenarios. In this session, we’ll build an understanding of the caching options that exist within the SharePoint platform and how they can be leveraged to inject some pep into most SharePoint sites. We’ll also cover some sample scenarios, caching pitfalls, and watch-outs that every administrator should know.

Like most of my presentations, this one started as a PowerPoint. I converted it over to Prezi format some time ago, and I’ve been having a lot of fun with it since. I hope the CincySPUG folks enjoy it, as well!

SharePoint Saturday The Conference (SPSTC)

SPSTC Logo

If you haven’t heard of SharePoint Saturday The Conference yet, then the easiest way for me to describe it is this: it’s a SharePoint Saturday event on steroids. Instead of being just one Saturday, the event is three days long. Expected attendance is 2500 to 3000 people. It’s going to be huge.

I submitted a handful of abstracts for consideration, and I know that I’ll be speaking at the event. I just don’t know what I’ll be talking about at this point.  If you’re going to be in the Washington, DC area on August 11th through 13th, though, consider signing up for the conference!

SharePoint Saturday Columbus (SPSColumbus)

SPS Columbus Logo

The 2nd SharePoint Saturday Columbus event will be held on August 20th, 2011, at the OCLC Conference Center in Columbus, Ohio. Registration is now open, and session submissions are being accepted through the end of the day tomorrow (7/6).

Along with Brian Jackett, Jennifer Mason, and Nicola Young, I’m helping to plan and execute the event on the 20th. I’m handling speaker coordination again this year – a role that I do enjoy! We’ve had a number of great submissions thus far; in the next week or so, we (the organizing committee) will be putting our heads together to make selections for the event. Once those selections have been made, I’ll be communicating with everyone who submitted a session.

If you live in Ohio and don’t find Columbus to be an exceptionally long drive, I encourage you to head out to the SharePoint Saturday site and sign up for the event. It’s free, and the training you’ll get will be well worth the Saturday you spend!

Additional Reading and References

  1. Jim Borgman: East Side/West Side of Cincinnati comic series
  2. Company: Idera
  3. Article: Quick Tips for Managing the SharePoint 2010 Office Web Apps’ Cache
  4. Event: Talk TechNet Webcast, Episode 43
  5. Blog: Keith Combs
  6. Blog: Matt Hester
  7. User Group: Evansville SPUG site
  8. Blog: Rob Wilson
  9. Event: SharePoint Saturday New York City
  10. User Group: CincySPUG site
  11. Software/Service: Prezi
  12. Event: SharePoint Saturday The Conference
  13. Event: SharePoint Saturday Columbus
  14. Blog: Brian Jackett
  15. Blog: Jennifer Mason
  16. Twitter: Nicola Young