Launching Your SPO Site or Portal
In this short post I cover the SharePoint Online (SPO) Launch Scheduling Tool and why you should get familiar with it before you launch a new SPO site or portal.
In this short post I cover the SharePoint Online (SPO) Launch Scheduling Tool and why you should get familiar with it before you launch a new SPO site or portal.
In this brief post, I talk about my first in-person event (SPFest Chicago) since COVID hit. I also talk about and include a recent interview with the M365 Developer Podcast.
If you need the what’s what on CDNs (content delivery networks), this is a bit of quick reading that will get you up to speed with what a CDN is, how to configure your SPO tenant to use a CDN, and the benefits that CDNs can bring.
Since I’m taking the time to write about the topic, you can safely guess that yes, CDNs make a difference withSPO page operations. In many cases, proper CDN configuration will make a substantial difference in SPO page performance. So enable CDN use NOW!
Knowing that some folks simply want the answer up-front, I hope that I’ve satisfied their curiosity. The rest of this post is dedicated to explaining content delivery networks (CDNs), how they operate, and how you can easily enable them for use within your SharePoint Online (SPO) sites.
Let me first address a misconception that I sometimes encountered among SPO administrators and developers (including some MVPs) – that being that CDNs don’t really “do a whole lot” to help site and/or page performance. Sure, usage of a CDN is recommended … but a common misunderstanding is that a CDN is really more of a “nice-to-have” than “need-to-have” element for SPO sites. Of the people saying such things, oftentimes that judgment comes without any real research, knowledge, or testing. Skeptics typically haven’t read the documentation (the “non-RTFM crowd”) and haven’t actually spent any time profiling and troubleshooting the performance of SPO sites. Since I enjoy addressing perf. problems and challenges, I’ve been fortunate to experience firsthand the benefits that CDNs can bring. By the end of this post, I hope I’ll have made converts of a CDN skeptic or two.
A CDN is a Content Delivery Network. There are a lot of (good) web resources that describe and illustrate what CDNs are and how they generally operate (like this one and this one), so I’m not going to attempt to “add value” with my own spin. I will simply call attention to a couple of the key characteristics that we really care about in our use of CDNs with SPO.
If you didn’t know about CDNs prior to this post, or didn’t understand how they could help you, I hope you’re beginning to see the possibilities!
It wasn’t all that long ago that Microsoft was a bit more “modest” in its use of CDNs. Microsoft certainly made use of them, but prior to the implementation of its own content delivery networks, Microsoft frequently turned to a company called Akamai for CDN support.
When I first started presenting on SharePoint and its built-in caching mechanisms, I often spoke about Akamai and their edge network when talking about BLOB caching and how the max-age cache-control header could be configured and misconfigured. Back then, “Akamai” was basically synonymous with “CDN,” and that’s how many of us thought about the company. They were certainly leading the pack in the CDN service space.
Back then, if you were attempting to download a large file from Microsoft (think DVD images, ISO files, etc.), then there was a good change that the download link your browser would receive (from Microsoft’s servers) would actually point to an Akamai edge node near your location geographically instead of a Microsoft destination.
Fast forward to today. In addition to utilizing third-party CDNs like those deployed by Akamai, Microsoft has built (and is improving) their own first-party CDNs. There are a couple of benefits to this. First, many data regulations you may be subject to that prevent third-party housing of your data (yes, even in temporary locations like a CDN) can be largely avoided. In the case of CDNs that Microsoft is running, there is no hand-off to a third party and thus much less practical concern regarding who is housing your data.
Second, with their own CDNs, Microsoft has a lot more latitude and ability to extend the specifics of CDN configuration and operation its customers. And that’s what they’ve done with the Office 365 CDN.
Now we’re talking! This next part is particularly important, and it’s what drove the creation of this post. It’s also the one bit of information that I promised Scott Stewart at Microsoft that I would try to get “out in the wild” as quickly and as visibly as possible.
So, if you remember nothing else from this post,please remember this:
Set-SPOTenantCdnEnabled -CdnType Public -Enable $true
That is the line of PowerShell that needs to be executed (against your SPO tenant, so you need to have a connection to your tenant established first) to enable transparent CDN support for public files. Run that, and non-sensitive files of public origin from SPO will begin getting cached in a CDN and served from there.
The line of PowerShell I shared goes through the SharePoint Online Management Shell – something most organizations using SPO (and their admins in particular) have installed somewhere.
It is also possible to enable CDN support if you’re using the PNP PowerShell module, if that’s your preference, by executing the following PowerShell:
Set-PnPTenantCdnEnabled -CdnType Public -Enable $true
No matter how you enable the CDN, it should be noted that the PowerShell I’ve elected to share (above) enables CDN usage for files of public origin only. It is easy enough to alter the parameters being passed in our PowerShell command so as to cover all files, public and private, by switching -CdnType to Both (with the SPO management shell) or executing another line of PowerShell after the first that swaps –type Public with –type Private (in the case of the SharePointPnP PowerShell module).
The reason I chose only public enablement is because your organization may be bound by restrictions or policies that prohibit or limit CDN use with private files. This is discussed a bit in the O365 CDN post originally cited, but it’s best to do your own research.
Enabling CDN support for public files, however, is considered to be safe in general.
I’ve got a series of images that I use to illustrate performance improvements when files are served via CDN instead of SPO list/library, and those files are from Microsoft. Thankfully, MS makes the images I tend to use (and a discussion of them) free available, and they are presented at this link for your reading and reference.
The example that is called out in the link I just shared involves offloading of the jQuery JavaScript library from SPO to CDN. The real world numbers that were captured reduced fetch-and-load time from just over 1.5 seconds to less than half a second (<500ms). That is no small change … and that’s for just one file!
I guess “Secret” is technically the wrong choice of term here. A more accurate description would be to say that I seldom hear or see anyone talking about another CDN benefit I consider to be very important and significant. That benefit, quite simply, involves improving file fetching and retrieval parallelism when a web page and associated assets (CSS, JS, images, etc.) are requested for download by your browser. In plain English: CDNs typically improve file downloading by allowing the browser to issue a greater number of concurrent file requests.
To help with this concept and its explanation, I’ve created a couple of diagrams that I’ll share with you. The first one appears below, and it is meant to represent the series of steps a browser might execute when retrieving everything needed to show a (SharePoint/SPO) page. As we’ve talked about, what is commonly thought of as a single page in a SharePoint site is, more accurately, a page containing all sorts of dependent assets: image files, JavaScript files, cascading style sheets, and a whole bunch more.
A request for a SharePoint page housed at http://www.thesite.com might start out with one request, but your browser is going to need all of the files referenced within the context of that page (default.aspx, in our case) to render correctly. See below:
To get what’s needed to successfully render the example SharePoint page without CDN support, we follow the numbers:
You might be wondering, “Only six files at a time? Really? Why the limitation?” Well, I should start by saying the limit is probably six … maybe a bit more, perhaps a bit less. It depends on the browser you’re using what the specific number is. There was a good summary answer on StackOverflow to a related (but slightly different) question that provides some additional discussion.
Section eight (8) of the HTTP specification (RFC 2616) specifically addresses HTTP connections, how they should be handled, how proxies should be negotiated, etc. For our purposes, the practical implementation of the HTTP specification by modern browsers generally limits the number of concurrent/active connections a browser can have to any given host or URL to six (6).
Notice how I worded that last sentence. Since you folks are smart cookies, I’ll bet you’re already thinking “Wait a minute. CDNs typically have different URLs/hosts from the sites they cache” and you’re imaging what happens (or can happen) when a new source (i.e., different host/URL) is introduced.
This illustration roughly outlines the fetch process when a CDN is involved:
Steps one (1) through four (4) of the fetch process with a CDN are basically still the same as was illustrated without a CDN a bit earlier. When the page is served-up in step three (3) and returned in step four (4), though, there are some differences and additional activity taking place:
Since we’re dealing with two different URLs/hosts in our CDN example (http://www.thesite.com and cdn.source.com), our original six (6) file concurrent download limitation transforms into a 12 file limitation (two hosts serving six files a time, 2 x 6 = 12).
Whether or not the CDN-based process is ultimately faster than without a CDN depends on a great many factors: your Internet bandwidth, the performance of your computer, the complexity/structure of the page being served-up, and more. In the majority of cases, though, at least some performance improvement is observed. In many cases, the improvement can be quite substantial (as referenced and discussed earlier).
Additional Note: 8/24/2020
In a bit of laziness on my part, I didn’t do a prior article search before writing this post. As fate would have it, Bob German (a friend and fellow MVP – well, he was an MVP prior to joining Microsoft a couple of years back) wrote a great post at the end of 2017 that I became aware of this morning with a series of tweets. Bob’s post is called “Choosing a CDN for SharePoint Client Solutions” and is a bit more developer-oriented. That being said, it’s a fantastic post with good information that is a great additional read if you’re looking for more material and/or a slightly different perspective. Nice work, Bob!
Post Update: 8/26/2020
Anders Rask was kind enough to point out that the PnP PowerShell line I originally had listed wasn’t, in fact, PnP PowerShell. That specific line of PowerShell has since been updated to reflect the correct way of altering a tenant’s CDN with the PnP PowerShell cmdlets. Many thanks for the catch, Anders!
So, to sum-up: enable CDN use within your SPO tenant. The benefits are compelling!
Microsoft released the second iteration of its Page Diagnostics Tool for SharePoint. If you have an SPO site, you NEED this tool in your toolbox!
Last week, on Wednesday, September 18th, 2019, Microsoft released the second iteration of its Page Diagnostics Tool for SharePoint. An announcement was made, and the Microsoft Docs site was updated, but the day passed with very little fanfare in most circles.
“The One Ring” by Mateus Amaral is licensed under CC BY-NC-ND 4.0
In my opinion, there should have been fireworks. Lots of fireworks.
If you’re not familiar with the Page Diagnostics Tool for SharePoint, then I need to share a little history on how I came to be “meet” this tool.
Back in 2018, the SharePoint Conference North America (SPCNA) was rebooted after having been shutdown as part of Microsoft’s consolidation of product-specific conferences a number of years earlier. I had the good fortune of making the cut to deliver a couple of sessions at the conference: “Making the Most of OneDrive for Business and SharePoint Online” and “Understanding and Avoiding Performance Pitfalls with SharePoint Online.”
Sometime in the months leading up to the conference, I received an email from out-of-the-blue from a guy named Scott Stewart – who at the time was a Senior Program Manager for OneDrive and SharePoint Engineering. In the email, Scott introduced himself, what he did in his role, and suggested that we collaborate together for the performance session I was slated to deliver at SPCNA.
I came to understand that Scott and his team were responsible for addressing and remedying many of the production performance issues that arose in SharePoint Online (SPO). The more that Scott and I chatted, the more it sounded like we were preaching many of the same things when it came to performance.
One thing Scott revealed to me was that at the time, his team had been working on a tool to help diagnose SPO performance issues. The tool was projected to be ready around the time that SPCNA was happening, so I asked him if he’d like to co-present the performance session with me and announce the tool to an audience that would undoubtedly be eager to hear the news. Thankfully, he agreed!
Scott demo’d version one (really it was more like a beta) during our talk, and the demo demons got the better of him … but shortly after the conference, v1.0 of the tool went live and was available to download as a Chrome browser extension.
Simply put, the Page Diagnostics Tool for SharePoint analyzes your browser’s interaction with SPO and points out conditions and configurations that might be adversely affecting your page’s performance.
The first version of the tool only worked for classic publishing pages. And as a tool, it was only available as a Google Chrome Extension:
The second iteration of the tool that was released last Thursday addresses one of those limitations: it analyzes both modern and classic SharePoint pages. So, you’re covered no matter what’s on your SPO site.
For one thing, the tool can get you the metrics I’ve highlighted that are relevant to diagnosing basic page performance issues – most notably, SPRequestDuration and SPIisLatency. But it can do so much more than that!
Many of the adverse performance conditions and scenarios I’ve covered while speaking and in blog posts (such as this one here) are analyzed and called-out by the tool, as well as many other things/conditions, such as navigational style used, whether or not content deployment networks (CDNs) are used by your pages, and quite a few more.
And finally, the tool provides a simple mechanism for retrieving round-trip times for pages and page resource requests. It eliminates the need to pull up Fiddler or your browser’s debug tools to try and track down the right numbers from a scrolling list of potentially hundreds of requests and responses.
It’s easy, but I’ll summarize it for you here.
1. Open the Chrome Web Store. Currently, the extension is only available for Google Chrome. Open Chrome and navigate to https://chrome.google.com/webstore/search/sharepoint directly or search for “SharePoint” in the Chrome Web Store. However you choose to do it, you should see the Page Diagnostics Tool for SharePoint entry within the list of results as shown below.
2. Add the Extension to Chrome. Click the Add to Chrome button. You’ll be taken directly to the diagnostic tool’s specific extension page, and then Chrome will pop up a dialog like the one seen below. The dialog will describe what the tool will be able to do once you install it, and yes: you have to click Add Extension to accept what the dialog is telling you and to actually activate the extension in your browser.
3. Navigate to a SharePoint Online page to begin diagnosing it. Once you’ve got the extension installed, you should have the following icon in the tool area to the right of the URL/address bar in Chrome:
To illustrate how the tool works, I navigated to a modern Communication Site in my Bitstream Foundry tenant:
I then clicked on the SharePoint Page Diagnostics Tool icon in the upper right of the browser (as shown above). Doing so brings up the Page Diagnostics dialog and gives me some options:
Kicking off an analysis of the current page is as simple as clicking the Start button as shown above. Once you do so, the page will reload and the Tool dialog will change several times over the course of a handful of seconds based on what it’s loading, analyzing, and attempting to do.
When the tool has completed its analysis and is ready to share some recommendations, the dialog will change once again to show something similar to what appears below.
Right off the bat, you can see that the Page Diagnostics Tool supplies you with important metrics like the SPRequestDuration and SPIIsLatency – two measures that are critical to determining where you might have some slowdown as called out in a previous blog post. But the tool doesn’t stop there.
The tool does many other things – like look at the size of your images, whether or not you’re using structural navigation (because structural navigation is oh so bad for your SPO site performance), if you’re using content delivery networks (CDNs) for frequently used scripts and resources, and a whole lot more.
Let’s drill into one of the problem items it calls out on one of my pages:
The tool explains to me, in plain English, what is wrong: Large images detected. An image I’m using is too large (i.e., larger than 300KB). It supplies the URL of the image in question so that I’m not left wondering which image it’s calling out. And if I want to know why 300KB is special or simply learn about the best way to handle images in SharePoint Online, there’s a Learn More link. Clicking that link takes me to this page in Microsoft Docs:
Targeted and detailed guidance – exactly what you need in order to do some site fixup/cleanup in the name of improving performance.
There’s more that the tool can do – like provide round trip times for pages and assets within those pages, as well as supply a couple of data export options if you want to look at the client/server page conversation in a tool that has more capabilities.
As a one-stop shop tool, though, I’m going to basically start recommending that everyone with an SPO site start downloading the tool for use within their own tenants. There is simply no other tool that is easier and more powerful for SharePoint Online sites. And the price point is perfect: FREE!
The next time you see Scott Stewart, buy him a beer to thank him for giving us something usable in the fight against poorly performing SPO sites.
In this post, I’ll show you how to obtain page performance core metrics from Modern SharePoint Online pages. It’s easier and more reliable than trying to obtain the same data from classic pages.
It was quite some time ago that I wrote my Five-Minute Page Performance Troubleshooting Guide for SharePoint Online – a little over a year-and-a-half ago, actually. Since that time, SharePoint Online (SPO) has continued to evolve relentlessly. In fact, one slide I’ve gotten into the habit of showing during my SPO talks and presentations is the following:
The slide usually gets the desired response of laughter from attendees, but it’s something I feel I have to say … because like so many things that seem obvious, there’s some real life basis for the inclusion of the slide:
The exchange shown above was the result of someone commenting on a post I had shared about limitations I was running into with the SharePoint App Model. The issue didn’t have a solution or workaround at the time I’d written my post, but Microsoft had addressed it sometime later.
In any case, I realize that much of what I share has a “born on date,” for lack of a better label. I’ll continue to share information; just note when something was written.
End of (slight) rant. Back to the real topic of this post.
Since I had written the previous performance article, Microsoft’s been working hard to complete the transition to Modern SharePoint in SPO. I feel it’s a solid move on their part for a variety of reasons. Modern pages (particularly pages in communication sites) are much more WYSIWYG in nature, and SharePoint Framework (SPFx) web parts on modern pages make a whole lot of sense from a scalability perspective; after all, why assume load on the server (with classic web parts) when you can push the load to the client and use all the extra desktop/laptop power?
As good as they are, though, modern pages don’t obey the standard response header approach to sharing performance metrics. But not to worry: they do things more consistently and reliably (in my opinion).
SPRequestDuration (the amount of time the server spent processing the page request) and (SP)IISLatency (the amount of time the page request waited on the server before getting processed) are critical to know when trying to diagnose potential page performance issues. Both of these are reported in milliseconds and give us some insight into what’s happening on the server-side of the performance equation.
Instead of trying to convey these values with response headers (as classic pages do – most of the time), modern pages share the same data within the body of the page itself.
Consider the following page modern page:
If this were a classic publishing page and we wanted to get the (SP)IISLatency and SPRequestDuration, we’d need to use our browser’s <F12> dev tools or something like Fiddler.
For modern pages, things are easier. We turn instead to the page source – not the response headers. Grab the page source (by right-clicking and selecting View page source) …
… and you’ll see something like the following:
Now, I’ll be the first to admit that you’ve got to have some sense of what you’re seeking within the page source – there’s a lot of stuff to parse through. Doing a simple <CTRL><F> search for iislatency or requestduration will land you on the content of interest. We’re interested in the metrics reported within the perf section:
The content of interest will be simple text, but the text is a JSON object that can be crunched to display values that are a bit easier to read:
The other thing you’ll notice is that a lot of additional metrics are reported along with the page processing metrics we’ve been looking at. In a future post, I’ll try to break some of these down for you.
“Modern” is the future of SharePoint Online. If you haven’t yet embraced modern lists and pages, consider dipping your toe in the waters. As we’ve seen in this post, Modern also makes it easier to obtain performance metrics for our pages – something that will make page performance troubleshooting significantly more predictable and consistent.
I regularly hear from SharePoint Online customers that their pages are slow … but they don’t know where to start troubleshooting. Is it the SPO servers? The network? Their page(s)? In this post, I’ll show you how to determine the general source of your slow pages in five minutes or less. It won’t solve your slow page(s) problem, but it will give you enough direction to know where to focus further analysis.
UPDATE (3/20/2018): As most of you who have been following-along in your own tenants know, this issue wasn’t actually truly resolved last September. For a while, in some cases, it looked like the SPIisLatency and SPRequestDuration headers came back. But the victory was fleeting, and since that time I’ve continued to get comments from people saying “but I don’t see them!” And while I had the headers for a while in my tenant, I haven’t seen them in any predictable fashion.
The good news is that after much hounding and making myself a royal pain-in-the-tuckus to Bill Baer and others at Microsoft, it looks like we FINALLY have the right engineering and dev teams engaged to look at this. We got traction on it this week, with multiple repro scenarios and Fiddler traces being passed around … so I’m truly hopeful we’ll see something before long. Stay tuned!
UPDATE (9/2/2017): As I was preparing slides for my IT/DevConnections talks, I decided to check on the issue of the missing Page Response Headers (SPIisLatency and SPRequestDuration). I went through three different tenants and several pages, and I’m happy to report that the headers now appear to be showing consistently. My thanks to Microsoft (I’ll credit Chris McNulty and Bill Baer – I had been pestering them) for rectifying the situation!
“Why is it so slow?” That’s how nearly every performance conversation I’ve ever had begins.
No one likes a slow intranet page, and everyone expects the intranet to just “come up” when they pop the URL into their favorite browser. From an end-user’s perspective, it doesn’t matter what’s happening on the back-end as long as the page appears quickly when someone tries to navigate to it.
SharePoint Online is a big black box to many of its users and consumers. They don’t understand what it takes to build an intranet, nor should they have to. The only thing that really matters to them is that they can bring up a browser, type in a URL, and quickly arrive at a landing page. The burden of ensuring that the site is optimized for fast loading falls to the folks in IT who are supposed to understand how everything works.
If you’re one of those folks in IT who is supposed to understand how everything works with SharePoint Online but doesn’t, then this blog post is for you. Don’t worry – I know there’s a lot to SharePoint Online, but performing some basic troubleshooting analysis for slow pages in SharePoint Online is pretty straightforward. I’ll share with you a handful of techniques to quickly ascertain if the reason for your slow pages is due to the content within the pages themselves, if the issue is network-related, or if there might be something else happening that is beyond your control.
The first step in your performance troubleshooting adventure begins by opening up your browser from a client workstation. Everyone has a favorite browser, but I’m going to use and recommend Internet Explorer for this exercise because it has a solid set of development tools to assist you in finding and quantifying performance issues. In particular, it is able to chronologically list and detail the series of interactions that take place between your browser and the SharePoint Online web front-ends (WFEs) that are responding to your requests.
When recommending IE, some people ask “how come you don’t use Fiddler?” It’s a good question, and when I first started showing people how to do some quick troubleshooting, I’d do so with Fiddler. If you’re just starting out, though, Fiddler comes with one really big gotcha: operating inside an SSL tunnel. To get Fiddler (which is a transparent proxy) working with SSL, there is some non-trivial setup required involving certificate trusts. Since this is intended to be a quick and basic troubleshooting exercise, I figure it’s better to sidestep the issue altogether and use IE (which requires no special setup).
To make this work, let us assume that I am attempting to profile the Bitstream Foundry (my company) intranet home page in order to understand how well it works – or doesn’t. My intranet home page is pretty plain by most intranet standards (remember: I’m a developer and IT Pro – not a designer), but it’s sufficient for purposes of discussion.
I start by opening Internet Explorer and navigating to the Bitstream Foundry intranet home page at https://bitstreamfoundry.sharepoint.com. Once I move past the sign-in prompts, I’m shown my home page:
My home page has very little on it right now (I’m still trying to decide what would go best in the main region), but it is a SharePoint Online (SPO) page and it does work as a target for discussion purposes.
Accessing the developer tools within Internet Explorer is extremely simple: either press F12, or go to the browser’s gear icon and select F12 Developer Tools from the drop-down that appears as seen below:
Doing either of these will pop-open the developer tools as either a stand-alone window or as a pane on the lower half of the browser as shown below:
When the developer tools first open, they’re commonly set to viewing the page structure on the DOM Explorer tab. For purposes of this troubleshooting exercise, we need to be on the Network tab so we can profile each of the calls the browser makes to the SPO WFE.
Select the Network tab and then select the “Always refresh from server” button as highlighted below in red.
The Network tab is going to allow us to capture the series of exchanges between the SharePoint WFE and our browser as the browser fetches the elements needed to render the page. The “Always refresh from server” button is going to remove client-side caching from the picture by forcing the browser to always re-fetch all referenced content – even if it has a valid copy of one or more assets in the browser cache. This helps to achieve a consistent set of timing values between calls, and it’s also going simulate someone’s first-time visit to the page (which typically takes longer than subsequent visits) more accurately.
The next step is to capture the series of exchanges between IE and SPO. To do this, simply refresh the page by pressing the browsers Refresh button, pressing , or going to the browser’s address bar and re-issuing the page request.
The contents of the window on the Network tab will clear, and as content begins to flow into the browser, entries will appear on the screen. For every request that IE makes of SharePoint Online, a new line/entry will appear. It will probably take a handful of seconds to retrieve all page assets, and it’s not uncommon for a SharePoint page to have upwards of 75 to 100 resources (or more) to load.
Strictly speaking, you shouldn’t have to stop the capture once the page has loaded, but there are several reasons why you would want to. First, you will eventually retrieve all SharePoint assets necessary to render the page. If you continue to capture beyond this point, you’ll see the number of requests (represented in the bottom bar of the browser – the number is 83 requests in the screenshot above) continue to tick up. It will slowly go up over time and it’s not due to the contents of the SharePoint page – it’s due to Office 365.
If you look at the last entry in the screenshot above, you’ll see that it’s a request to https://outlook.office365.com/owa. In short: this is due to a background process that allows Exchange to notify you when you receive new messages and calendar/event notifications. See how the Protocol and Result/Description columns indicate a (Pending) state?
If you get to this point and additional SharePoint elements are no longer loading, press the red “recording stop” button in the toolbar of Network tab. This will stop the capture. Not only does this help to keep the captured trace “cleaner,” but it also prevents excessive distortion of certain values – like overall time to load and the graphical representation of the page load (shown on the far right of the Network tab) as shown below.
At this point, you should have a populated Network tab with the entire dialog of requests that were needed to render your page. Of these requests, the overwhelming majority of them will be for JavaScript files (.js), cascading stylesheets (.css), and images (.png, .gif, and .jpg). Only one of them will be for the actual SharePoint page itself (.aspx) … and, of course, this is the request that you need to find in the list.
My intranet home page is named Home.aspx (as can be seen in the browser address bar), so I need to find the request for Home.aspx on the Network tab. I got lucky with this dialog attempt, because Home.aspx is the first entry listed. Note that this isn’t always the case, and it’s not uncommon to find your page request 10 or 20 down in the list.
When you locate the entry in the list for your .aspx page, click on it to select it. You can confirm that you’ve selected the right entry by verifying Request URL on the Headers tab to the right of the various requests listed for the exchange with SPO (highlighted in the image above).
At this point, we need to shift our focus to the HTTP Response Headers that are passed back with the content of the page. Much like the request headers that the browser sends to the server to provide information about the request being made, the response headers that are sent from the server supply the browser with all sorts of additional information about the page. This can include the size of the page (Content-Length), the payload (Content-Type), whether or not the page can be cached (Cache-Control), and more.
Making sure that you have the Headers tab selected, locate and record the three response headers as shown below:
The three values we want to record are:
We can infer a great deal about the page processing and network traversal of our page request with just these three values and a final number.
A quick note (2017-07-06): For some reason, a variety of SharePoint Online sites have been returning pages without the SPIisLatency and SPRequestDuration headers lately. I don’t know why this is happening, and I’ve reached out to Microsoft to see if it’s a bug or part of some larger strategy. I don’t think it’s deliberate, because the headers provide some of the only insight end-users can get into SharePoint Online page performance. When I hear something from the product team, I’ll post it here!
So, you’ve now got three numbers – two of which are helpful for page profiling (SPIisLatency and SPRequestDuration), and a third number (X-SharePointHealthScore) which will tell you how stressed the server was when it served your page. What can you do with them? As it turns out, quite a bit when you combine two of the three with a fourth number.
What is the fourth number? It’s the total trip time that is reported for the page being loaded, and it represents the elapsed time from the point at which the page was requested until the time when the last byte of the page was delivered. For example, I profiled my Bunker Tuneage site. It’s a SharePoint Online site (yes, I know – I have to get it moved to another location soon), so it makes a good target for analysis:
In the above example, the three numbers we’re most interested in are:
If we think about what the individual values mean, we can now reason that the total amount of time spent to get the page (847.47ms), minus the total amount of time spent waiting or processing the server (753ms), should be roughly equal to the amount of time spent “elsewhere” – either in routing, traversing network boundaries, on proxies and firewalls, etc.
So, considering our numbers above, the equation looks like this:
Based on our equation, this means that approximately (this isn’t exact) 94.47ms of time was spent getting from from the SharePoint Online server to our browser – not too shabby when we consider it.
The numbers could come out a variety of different ways when doing this, so it’s best if we try to establish a general trend. Variability between any two runs can be significant, so it’s in your best interests to conduct a number of runs (maybe a dozen) and come up with some average values.
Regardless of the specific values themselves, there are some general conclusions we draw about each value by itself – and when it is compared to the others.
If you don’t believe me and instead feel that the problem is with the SharePoint Online environment, the good news is that there’s an easy way to tell one way or the other … and I highly recommend doing this before calling Microsoft Support (trust me, they’ll thank you for doing so).
Believe it or not, SharePoint Online is also where OneDrive for Business data is stored. A OneDrive for Business page, at its core, is a SharePoint page with nearly no customization. Using someone’s OneDrive for Business page becomes an excellent A/B test when the performance of SharePoint Online page is sub-par. Simply load up their OneDrive for Business page and compare performance numbers to the page in question.
Revisiting my Bunker Tuneage site example, you can see that the OneDrive for Business landing page is served from the same tenant as the earlier page. If I were to compare the SPRequestDuration value of the OneDrive for Business page (223ms) with the SPRequestDuration of the SharePoint page in-question (753ms), I’d note that the values differed … but are they different enough to think something is going awry in the SPO environment?
Roughly half a second (~500ms) is indeed a difference, but it’s not enough for me to think that the online environment has problems. When I see SPRequestDuration values like 9000ms for a SharePoint page but 500ms for OneDrive for Business page, that’s when I begin to suspect something is amiss. And again: with such an extreme disparity in values, SharePoint Online is healthy (500ms), but there’s clearly something wrong with my page (9000ms).
When it comes to diagnosing the root cause or causes for high SPRequestDuration values, the good news is that there are plenty of fixes that range from the simple to the quite invasive. Microsoft has taken the time to compile some of the more common causes, and I highly encourage you to take a look if you’re interested.
At the end of the day, though, sometimes you just want to know where to begin troubleshooting so that you can focus remediation efforts. If you follow the steps outlined in this blog post, I think you’ll find that the five minutes they take to execute will help to focus you in the right area.
I love caching and all that it can do to boost performance, but caching for SharePoint in the cloud isn’t the same as it is on-premises. In this post, I explore why that is for Object Caching – and what you can do about it.
The “caching picture” began shifting when we started moving to the cloud. SharePoint Online and hosted SharePoint services aren’t the same as SharePoint on-premises, and the things we rely upon for performance improvements on-premises don’t necessarily have our backs when we move out to the cloud.
Yeah, I’m talking about caching here. And as much as it breaks my heart to say it, caching – you ain’t no friend of mine out in SharePoint Online.
To understand why a couple of SharePoint’s traditional caching mechanisms aren’t doing you any favors in a multi-tenant service like SharePoint Online (with or without Office 365), it helps to first understand how memory-based caching features – like SharePoint’s Object Cache – work in an on-premises environment.
The typical on-premises environment has a small number of web front-ends (WFEs) serving content to users, and the number of site collections being served-up is relatively limited. For purposes of illustration, consider the following series of user requests to an environment possessing two WFEs behind a load balancer:
Assuming the WFEs have just been rebooted (or the application pools backing the web applications for target site collection have just been recycled) – a worst-case scenario – the user in Request #1 is going to hit a server (either #1 or #2) that does not have cached content in its Object Cache. For this example, we’ll say that the user is directed to WFE #1. Responses from WFE #1 will be slower as SharePoint works to generate the content for the user and populate its Object Cache. The WFE will then return the user’s response, but as a result of the request, its Object Cache will contain site collection-specific content such as navigational sitemaps, Content Query Web Part (CQWP) query results, common site property values, any publishing page layouts referenced by the request, and more.
The next time the farm receives a request for the same site collection (Request #2), there’s a 50/50 shot that the user will be directed to a WFE that has cached content (WFE #1, shown in green) or doesn’t yet have any cached content (WFE #2). If the user is directed to WFE #1, bingo – a better experience should result. Let’s say the user gets unlucky, though, and hits WFE #2. The same process as described earlier (for WFE #1) ensues, resulting in a slower response to the user but a populated Object Cache on WFE #2.
By the time we get to Request #3, both WFEs have at least some cached content for the site collection being visited and should thus return responses more quickly. Assuming memory pressure remains low, these WFEs will continue to serve cached content for subsequent requests – until content expires out of the cache (forcing a re-fetch and fill) or gets forced out for some reason (again, memory pressure or perhaps an application pool recycle).
Another thing worth noting with on-premises WFEs is that many SharePoint administrators use warm-up scripts and services in their environments to make the initial requests that are described (in this example) by Request #1 and Request #2. So, it’s possible in these environments that end-users never have to start with a completely “cold” WFE and make the requests that come back more slowly (but ultimately populate the Object Caches on each server).
Let’s look at the same initial series of interactions again. Instead of considering the typical on-premises environment, though, let’s look at SharePoint Online.
The first thing you may have noticed in the diagrams above is that we’re no longer dealing with just two WFEs. In a SharePoint Online tenant, the actual number of WFEs is a variable number that depends on factors such as load. In this example, I set the number of WFEs to 50; in reality, it could be lower or (in all likelihood) higher.
Request #1 proceeds pretty much the same way as it did in the on-premises example. None of the WFEs have any cached content for the target site collection, so the WFE needs to do extra work to fetch everything needed for a response, return that information, and then place the results in its Object Cache.
In Request #2, one server has cached content – the one that’s highlighted in green. The remaining 49 servers don’t have cached content. So, in all likelihood (49 out of 50, or 98%), the next request for the same site collection is going to go to a different WFE.
By the time we get to Request #3, we see that another WFE has gone through the fetch-and-fill operation (again, highlighted in green). But, there’s something else worth noting that we didn’t see in the on-premises environment; specifically, the previous server which had been visited (in Request #1) is now red, not green. What does this mean? Well, in a multi-tenant environment like SharePoint Online, WFEs are serving-up hundreds and perhaps thousands of different site collections for each of the residents in the SharePoint environment. Object Caches do not have infinite memory, and so memory pressure is likely to be a much greater factor than it is on-premises – meaning that Object Caches are probably going to be ejecting content pretty frequently.
If the Object Cache on a WFE is forced to eject content relevant to the site collection a user is trying to access, then that WFE is going to have to do a re-fetch and re-fill just as if it had never cached content for the target site collection. The net effect, as you might expect, is longer response times and potentially sub-par performance.
If there’s one point I’m trying to make in all of this, it’s this: you can’t assume that the way a SharePoint farm operates on-premises is going to translate to the way a SharePoint Online farm (or any other multi-tenant farm) is going to operate “out in the cloud.”
Is there anything you can do? Sure – there’s plenty. As I’ve tried to illustrate thus far, the first thing you can do is challenge any assumptions you might have about performance that are based on how on-premises environments operate. The example I’ve chosen here is the Object Cache and how it factors into the performance equation – again, in the typical on-premises environment. If you assume that the Object Cache might instead be working against you in a multi-tenant environment, then there are two particular areas where you should immediately turn your focus.
By default, SharePoint site collections use structural navigation mechanisms. Structural navigation works like this: when SharePoint needs to render a navigational menu or link structure of some sort, it walks through the site collection noting the various sites and sub-sites that the site collection contains. That information gets built into a sitemap, and that sitemap is cached in the Object Cache for faster retrieval on subsequent requests that require it.
Without the Object Cache helping out, structural navigation becomes an increasingly less desirable choice as site hierarchies get larger and larger. Better options include alternatives like managed navigation or search-driven navigation; each option has its pros and cons, so be sure to read-up a bit before selecting an option.
When data needs to be rolled-up in SharePoint, particularly across lists or sites, savvy end-users turn to the CQWP. Since cross-list and cross-site queries are expensive operations, SharePoint will cache the results of such a query using – you guessed it – the Object Cache. Query results are then re-used from the Object Cache for a period of time to improve performance for subsequent requests. Eventually, the results expire and the query needs to be run again.
So, what are users to do when they can’t rely on the Object Cache? A common theme in SharePoint Online and other multi-tenant environments is to leverage Search whenever possible. This was called out in the previous section on Navigation, and it applies in this instance, as well.
An alternative to the CQWP is the Content Search Web Part (CSWP). The CSWP operates somewhat differently than the CQWP, so it’s not a one-to-one direct replacement … but it is very powerful and suitable in most cases. Since the CSWP pulls its query results directly from SharePoint’s search index, it’s exceptionally fast – making it just what the doctor ordered in a multi-tenant environment.
Quick note (2/1/2016): Thanks to Cory Williams for reminding me that the CSWP is currently only available to SharePoint Online Plan 2 and other “Plan 3” (e.g., E3, G3) users. Many enterprise customers fall into this bucket, but if you’re not one of them, then you won’t find the CSWP for use in your tenant :-(
There are plenty of good resources online for the CSWP, and I regularly speak on it myself; feel free to peruse resources I have compiled on the topic (and on other topics).
In this article, I’ve tried to explain how on-premises and multi-tenant operations are different for just one area in particular; i.e., the Object Cache. In the future, I plan to cover some performance watch-outs and work-arounds for other areas … so stay tuned!
Many administrators have noted that SharePoint 2010 allows them to tune the number of threads that can be used for farm backup and restore operations, but very few have played with the settings. In this post, I share some results I compiled while testing the settings in my own environments. I also share the PowerShell script I assembled for my testing so you can tune the backup and restore thread settings in your own SharePoint farm.
The same is largely true in the software space. Most IT folks learned some time ago that “multithreading” and “higher performance” tended to go hand-in-hand or were at least associated in some way. Multiple threads of execution meant better scheduling of limited processor resources and fewer chances that one long-running operation would bottleneck an entire application.
When I first saw the following section in the “Configure Backup Settings” section of SharePoint 2010’s Central Administration site, it brought a big grin to my face:
In SharePoint 2007 and earlier, administrators had no real levers to pull to try and tune the performance of farm backup and restore operations. This obviously changed with SharePoint 2010. We were basically being handed a way to adjust those processes as we saw fit – for better or worse.
Strangely enough, though, I never really took the time to explore the impact of those settings in my SharePoint environments. I always left the number of assigned threads for backup and restore operations at three. I would have liked to mess around with the values, but something else was always more important in the grand scheme of things.
I’ve been working on a new “backup tips and tricks” whitepaper, and I found myself looking for backup and restore concerns within the SharePoint platform that I may not have given much attention to in the past. It didn’t take much wading through Central Administration before I once again found myself looking at thread counts for backup and restore operations.
Doing a little bit of Internet (background) research confirmed what I had suspected: no one else had really spent any time on the topic either. In fact, the only “fresh” and non-copyright-infringing material I found came from a Microsoft TechNet post titled Backup and recovery best practices (SharePoint Server 2010) … and to tell you the truth, the following paragraph from the section titled “Configure SharePoint settings for better backup or restore performance” really bugged me:
If you are using the Backup-SPFarm cmdlet, you can use the BackupThreads parameter to specify how many threads SharePoint Server 2010 will use during the backup process. The more threads you specify, the more resources that backup operation will take, but the faster that it will finish, if sufficient resources are available. However, each thread is reported individually in the log files, so using fewer threads makes interpreting the log files easier. By default, three threads are used. The maximum number of threads available is 10.
Without an understanding of how multithreading (in general) and SharePoint backup (specifically) work, this could easily be interpreted as follows:
The greater the number of threads you assign, the faster your backups will complete.
I realize that my summary is an oversimplification, but I believe that many administrators see the TechNet paragraph as I summarized it. And that concerns me.
I’ve always told people that increasing the backup thread count could yield better performance, but any adjustments would need to be tested in the target farm where they are to be implemented. Realistically speaking, there are several participants and a lot of moving parts in any SharePoint farm backup. Besides the SharePoint server where the backup operation is being coordinated, there is the performance of one or more SQL Servers to consider. The capabilities and restrictions of the backup destination location (typically a UNC file share) also need to be factored-in since that destination is being written to by both the SharePoint Server and one or more SQL Servers.
Setting the number of backup threads to 10 on a SharePoint Server of infinite capability and resources doesn’t guarantee a fast backup, because the farm might have a slow SQL Server, a less-capable backup destination location, a slow or congested network, or a host of other complicating factors.
Of course, all of this is just a bunch of hand-waving without proof. So, the scientist in me (yeah, I actually used to be a chemist) decided to take over and devise a series of simple tests to see if there is any real weight to the arguments I’ve been making.
I began with the hypothesis that the easiest and most visible way to gauge the performance of a farm backup operation is to measure how long a backup takes to run; e.g., a farm backup that takes 10 minutes to run is faster than a backup that takes 20 minutes to run if farm content, hardware, configuration, and other factors remain constant. Since SharePoint 2010 provides the ability to specify anywhere from one to 10 backup threads, running a series of backups where the only variable is backup thread count should determine if greater or fewer backup threads yield better performance.
You might recall that I also mentioned that farm topology is a factor in the overall backup equation. As part of my experiment, I decided to run the tests on two different farms I have available to me. General descriptions for each farm:
These two environments have pretty different overall topologies, and it was my hope that I’d see some effect on the performance numbers as a result.
To run the tests reproducibly, I needed a PowerShell script. So, I put the following script together while I had a bit of free time one night. Feel free to pluck this out to use for testing in your SharePoint environment, as well.
[sourcecode language=”powershell”]
<#
.SYNOPSIS
TestBackupThreads.ps1
.DESCRIPTION
This script is used to conduct and time a series of backups using different thread counts.
The output can then be used to make an educated decision on the number of backup threads to
assign for use in farm-level backups.
.NOTES
Author: Sean McDonough
Last Revision: 25-July-2012
.PARAMETER TestLocation
A UNC path to a location that can be used to create test backup sets
.EXAMPLE
TestBackupThreads \\FileShare\TestLocation
#>
param
(
[string]$TestLocation = "$(Read-Host ‘UNC path to test backup location [e.g. \\FileShare\TestLocation]’)"
)
function TestThreads($backupLocation)
{
# Ensure that the SharePoint cmdlets are loaded before continuing
$spCmdlets = Get-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction silentlycontinue
if ($spCmdlets -eq $Null)
{ Add-PSSnapin Microsoft.SharePoint.PowerShell }
# Setup some variables we’ll need for execution.
$threadTimes = @{} # Hash table to hold timing results
$backupItems = Join-Path $backupLocation "spbr*" # Used to delete temp backup files
# We need to execute a full farm backup for each thread count 1 through 10
Clear-Host
Write-Host "`nBackup thread count testing process beginning."
for ($threads = 1; $threads -lt 11; $threads++)
{
# Clean out any backup contents from the test location
Remove-Item $backupItems -recurse
# Grab the starting date/time (for later comparison), kick-off a farm backup, and then
# grab the stop date/time.
Write-Host "`nInitiating a backup with $threads thread(s) …"
$startPoint = Get-Date
Backup-SPFarm -BackupMethod Full -Directory $backupLocation -BackupThreads $threads
$stopPoint = Get-Date
# Store and report results
$keyName = "Backup with {0} thread(s)" -f $threads
$elapsedSeconds = "{0:N0}" -f ($stopPoint – $startPoint).TotalSeconds
$threadTimes[$keyName] = $elapsedSeconds
Write-Host "Backup with $threads thread(s) complete"
Write-Host ("- time to complete (in seconds): {0}" -f $elapsedSeconds)
}
# Do a final sweep of the test backup location to clean out backup items
Remove-Item $backupItems -recurse
# Dump the results sorted in order of quickest to longest
Write-Host "`nBackup thread count testing process complete."
$threadTimes.GetEnumerator() | Sort-Object Value
# Abort script processing in the event an exception occurs.
trap
{
Write-Warning "`n*** Script execution aborting. See below for problem encountered during execution. ***"
$_.Message
break
}
}
# Launch script
TestThreads $TestLocation
[/sourcecode]
The script is fairly straightforward in what it does. You supply a TestLocation parameter to specify where farm backup test data should be written to, and the script will run a series of full farm backups using the supplied location as the backup destination. The script starts with a full backup using one backup thread; at the end of each full farm backup, the script notes how long the backup took (in seconds) and cleans-up the contents of the TestLocation folder. The number of backup threads is then incremented, and the next test is run. When the script has completed running all backup tests, it sorts the results from “quickest backup” (i.e., the backup thread count requiring the least amount of time) to the slowest backup.
I ran a series of three tests for each of the aforementioned environments for a total of six total test runs. Although there’s still quite a bit of variability between individual results within a backup thread series, some trends did appear to emerge.
With the single-server environment, increasing the number of backup threads did appear to have a directional impact on performance. A single backup thread proved to be the slowest option for the farm backup, and “greater than one” thread resulted in better performance.
If you look at the average values, though, there wasn’t a tremendous difference between the slowest thread count (410 seconds for one thread) and the fastest (388 seconds for 10 threads). We’re only talking about a 5% to 6% difference overall. To truly find the optimum number of backup threads in an environment like this would require more than three test runs to account for standard deviation and establish significance.
Oh, and for those that might be wondering: I’m sure I introduced some of my own variability into the results. Although I didn’t do anything processor or disk intensive during the test runs, I didn’t go out of my way to minimize the impact of services, background operations, etc. To repeat: more testing (with better controls) would be needed for truly conclusive results. The only thing I started to show with this particular set of tests is that multithreading seemed to improve backup performance.
Things got quite a bit more interesting (to me) when I switched over to multi-server farm testing.
In the multi-server environment, the average for using just one backup thread (1413 seconds) appeared to be significantly faster than the next best option (1747 seconds for seven backup threads) – in the neighborhood of 20% or so faster. Just like the single-server results, additional trials would be needed to completely validate the observations, but the results are less ambiguous (given the relatively greater precision of the samples) than with the single-server runs.
Do you find this surprising? Given my multi-server environment and what I know about it, I can’t really say that I was caught flat-footed by the results. Going into the tests, my hypothesis was that my backup destination location would likely be the “weak link” in my overall farm and backup topology. The SharePoint Server was doing well, the SQL Server was relatively robust … but all of that backup activity was hard on my (virtualized) file server. Multiple servers trying to write to the backup location were swamping it and the network, and adding additional backup threads to the mix didn’t end up helping or improving the overall backup process.
At the end of the day, I recognize that these tests of mine didn’t prove anything conclusively. Frankly, conclusive proof wasn’t my goal. The intent of these experiments wasn’t to say “more threads are better” or “more threads are worse.”
The only point I’m making (I hope) by sharing these results is this: until you run some real tests of your own in your SharePoint environment, you really don’t know where your backup thread sweet spot is. You can try to guess it, but it’s just a guess. And guessing is really no better than simply leaving the backup thread count set to its default value of three.
The SharePoint BLOB Cache can be a very powerful tool for use in improving farm performance and scalability, but some planning should take place before the BLOB Cache is enabled. In this post, I explain how end users can suffer if BLOB Cache planning isn’t performed. I also make some recommendations on how to configure the BLOB Cache to provide administrators with performance benefits that don’t come at the cost of a negative end user experience.
The topic of the SharePoint BLOB Cache and how it operates jumped back into the front of my brain recently given some conversations I’ve had and things I’ve seen (e.g., a promising CodePlex project called the SharePoint 2010 BlobCache Manager).
What do I mean by “Just do-it?” Well, here’s the high-level summary of what I’ve been seeing people say, post, and practice with the SharePoint BLOB Cache:
To be fair, I think that I’ve done a disservice by contributing to the perception that all you need to do to kick-start BLOB caching is change this web.config line …
[sourcecode language=”xml”]
<BlobCache location="C:\BlobCache\14" path="\.(gif|jpg|jpeg|jpe|jfif|bmp|dib|tif|tiff|ico|png|wdp|hdp|css|js|asf|avi|flv|m4v|mov|mp3|mp4|mpeg|mpg|rm|rmvb|wma|wmv)$" maxSize="10" enabled="false" />
[/sourcecode]
… to this:
[sourcecode language=”xml”]
<BlobCache location="C:\BlobCache\14" path="\.(gif|jpg|jpeg|jpe|jfif|bmp|dib|tif|tiff|ico|png|wdp|hdp|css|js|asf|avi|flv|m4v|mov|mp3|mp4|mpeg|mpg|rm|rmvb|wma|wmv)$" maxSize="10" enabled="true" />
[/sourcecode]
If you look closely, you’ll see that the only difference between the two XML elements is that the enabled attribute is changed from false to true in the second example.
As you might have guessed, I wouldn’t be writing this blog post if simply changing the BlobCache element’s enabled attribute to true didn’t cause potential problems.
I only had five minutes to deliver the lightning talk, so I had to cram all of the disclaimers for what I was recommending into the legal style slide that appears on the left. Although the slide got a chuckle from the crowd (the print did look pretty small on-screen), I actually did invest some time in its warnings and watch-outs for anyone who wanted to go and dig them up later.
Of the two tips I delivered in the lightning talk, Tip #2 dealt with the SharePoint BLOB cache. I included a very specific warning in the “Disclaimer of Liability” aimed at those who sought to simply “set it and forget it.” The text of that warning read:
Failure to specify a max-age attribute in the BlobCache element of the web.config will result in the default value of 86,400 seconds (24 hours) being used. Use of a non-zero max-age attribute will result in the attachment of client-side cacheability headers to assets that are being BLOB cached, and such headers can result in BLOB assets being cached on the client beyond the duration of the current user session; such caching can easily result in "stale" BLOB resources being used from the client rather than newer ones being fetched from the WFE, so adjust max-age values carefully.
Put another way: if you simply enable the BLOB cache and do nothing else, your users may be getting a SharePoint behavior change that you hadn’t intended for them to have.
The sticking point with SharePoint’s default BlobCache element and attribute settings is that a max-age of 24 hours is assumed and used when the max-age attribute isn’t explicitly specified or set. What does that mean? I wrote a separate post a while back titled Client-Server Interactions and the max-age Attribute with SharePoint BLOB Caching, and that post addressed the effect that explicit and implicit max-age attribute value specifications have on BLOB Caching. I recommend checking out the post for the full background; for anyone who needs a quick summary, though, I can distill it down to two bullet points:
What does this mean for the average user of SharePoint? Well, let me walk through a fictitious scenario with supporting detail – as told from the perspective of a SharePoint end user. If you already understand the problem, you’re short on time, and you want to get right to what I recommend, jump down to the “Recommendations Before You Enable the BLOB Cache” section.
Welcome to the Acme Corporation! The Acme Corporation recently completed a “webification” of its entire product catalog, and the end result is a publishing site collection that is implemented in SharePoint 2010. The site collection houses all of Acme’s products, and those products are available for the public to browse and order. Acme’s web content management team is responsible for maintaining the product catalog as it appears on the site, and that team is led by a crafty old fellow named Wile E. Coyote (who we’ll simply refer to as “Wiley” from here on out).
Wiley has many years of experience with Acme’s products and has tried nearly all of them personally; he’s something of a legend. He and his team worked diligently to get Acme’s products into SharePoint before the launch. Not all of the products made it into SharePoint before the launch, though, so a phased approach was taken to rolling out the entire catalog.
In both of the Fiddler traces above, the focus is on the newsarticleimage.jpg file – the file which houses a picture of the Bundle o’ Dynamite. The first time the browser requests the image within a session, a successful HTTP 200 response is returned to the browser along with the image. Also important to note is the Cache-Control header that comes back with the image:
[sourcecode language=”text”]
Cache-Control: private,max-age=0
[/sourcecode]
The private part of the Cache-Control header tells the client browser to cache the image locally for the duration of the browser session. The max-age=0 portion says, in effect, that subsequent uses of the image by the browser (from its cache) should be validated with a call back to the WFE to ensure that the image hasn’t changed.
And that’s what is shown happening in the second Fiddler trace. When subsequent page requests attempt to use the image, a GET request from the browser is answered by the WFE with
[sourcecode language=”text”]
HTTP/1.1 304 NOT MODIFIED
[/sourcecode]
This response code tells the browser that the image hasn’t changed and that it’s safe to use the locally cached copy. If the image were to change, then an HTTP 200 would be returned instead and the new/updated version of the image would be sent to the browser.
When the browser is closed, the locally cached copy of the image is flushed and the process begins anew the next time the browser opens.
Not long after the launch of Acme’s online product catalog, customers began complaining that browsing the catalog was simply too slow. After some discussion, Management decided to bring in Roadrunner Consulting to assess the site and make suggestions that would improve performance.
Roadrunner’s team raced around (as they are wont to do), ran some tests, made some observations, and provided a list of suggestions. At the top of the list was “Implement SharePoint BLOB Caching.”
So, Acme’s SharePoint administrators jumped right in and turned on BLOB caching. Since the site is served up through a single IIS site (SharePoint zone), the admins set enabled=“true” in the BlobCache element of the site’s web.config file. No other changes were made to the BlobCache element.
So, what happened? Well, things got snappier! The administrators watching their back-end performance noticed that the file system on the WFE started to cache BLOBs that were being requested by users. Each request to the WFE for one of those BLOBs resulted in the BLOB being served back directly from the WFE without a round-trip to the SQL Server. Internal network bandwidth utilization dropped significantly, and the SQL Servers started breathing a bit easier. The administrators were most definitely happy with the change they’d made … and it was as easy as setting enabled=”true” in the BlobCache element of the web.config file. Talk about the greatest thing since sliced bread! Everyone exchanged a round of high-fives after the change was made, and talks of how the geeks would rise up to dominate the world resumed.
There is one very important difference when retrieving items with the BLOB Cache enabled, though, and you have to look closely to see it. Do you see the Cache-Control HTTP header that is returned with the request for the newsarticleimage.jpg image? It’s different than it was before the BLOB Cache was enabled. Now it says
[sourcecode language=”text”]
Cache-Control: public, max-age=86400
[/sourcecode]
Whoa … what does this mean? Well, it means two important things. First, the public designation means that when the image is cached by the browser, it will no longer be private to the current session. It can be re-used across sessions, so it won’t necessarily “go away” when the browser is closed.
Second, the max-age=86400 means that the image will continue to “live” in the browser’s cache for 86400 seconds, or 24 hours. For that period of time, the browser won’t even attempt to contact the WFE to see if the image has changed; it will just use the copy that it holds onto. Nothing short of a browser cache flush (which is manual intervention by the user) will change this behavior.
Admittedly, the Fiddler trace will look a little different when the browser is closed and re-opened … but a re-fetch of the newsarticleimage.jpg file will not take place for a full 24 hours unless a user clears the browser cache.
What does this change in behavior mean for actual users of the site? Read on to find out …
Even though Acme sells a product called the “Bundle o’ Dynamite,” the actual product that ships is a barrel of TNT. Since the product image was wrong, customers were incorrectly concluding that they’d get several sticks of dynamite instead of a barrel, and this was rubbing many of them the wrong way. Who knew?
Wiley went out to SharePoint, checked the article that he wrote, and saw that he did indeed use a series of dynamite sticks for an image. The page should have actually appeared as it does in the screenshot that is above and to the left. After a quick facepalm, Wiley realized that he needed to make a change – and fast.
Wiley went out to the Publishing Images library for the site collection and uploaded a new version of the newsarticleimage.jpg image file – one that contained a barrel of TNT instead of a bundle of dynamite. He then browsed to the article page and did a refresh.
Nothing changed.
Wiley hit F5 in his browser. Still nothing changed.
Over the course of the hour that followed, Wiley grew increasingly more bewildered and panicked as he tried in vain to get the new TNT barrel to show up on the article page. He uploaded the image several more times, closed and re-opened his browser, deleted and then reloaded the image, re-published and re-approved the actual article page, and even got the administrators to flush the SharePoint BLOB Cache. None of the actions made a difference.
Why didn’t any of Wiley’s efforts make a difference? Because what Wiley didn’t understand was that there was nothing he could do short of flushing his cache that would prompt the browser to re-request the updated image. The browser started using the cached copy of the image after the first request Wiley made in the morning; i.e., the request to verify that the image on the page was incorrect as Fulfillment indicated. For another 24 hours (86400 seconds), the browser would continue to use the cached image.
Wiley’s image problem was just one of the potential issues that might surface as a result of the BLOB Cache change. It was also one of the more visible problems. In looking at the path attribute of the BlobCache element, you might have noticed some of the other file types that got cached by default – file types with js (JavaScript) and css (Cascading Style Sheets) extensions, for example. Any of those file types which were served from site collection lists and libraries would also be impacted by the “fetch once and use for 24 hours” behavior.
1. Don’t just “enable” the BLOB Cache with its out-of-the-box (OOTB) default settings. There are a couple of OOTB settings that you should really think hard about changing. I mentioned the default max-age value you get if you don’t actually specify the attribute value. I’m going to talk more about that one in a bit. Also: do you really want the BLOB Cache using your system drive (C:) as its target location for cached files? Most admins I know aren’t particularly friendly with that idea, so relocate the BLOB Cache to another drive.
2. If your Web application has only one zone (i.e., the Default zone), strongly consider specifying a max-age attribute value of zero (max-age=”0”). Why do I say this? Because it avoids the situation I described with Wiley above, and it’s a compromise that gives administrators some of the performance boosts they seek without completely shafting users in the process.
3. If you have a Web application that is extended to two or more zones, apply BLOB Cache settings that are appropriate for each zone. This is relatively common in public-facing SharePoint site collections and Web applications where anonymous access is in-use. In these particular scenarios, there are usually at least two SharePoint zones per Web application: an internal zone (typically the Default zone) through which editors and other users may authenticate to carry out content work, and an external zone (e.g., the Internet zone) which is set up for anonymous access and “external consumption.”
In this dual-zone scenario, it makes sense to configure each zone (IIS site) differently since usage patterns differ between zones. The BlobCache element in the web.config for the internal (Default) zone, for example, should probably be configured according to #2 (above – the one zone scenario with a max-age attribute value of zero). For the web.config that is used in the external zone, though, it may make sense to apply a non-zero max-age value for use with the BLOB Cache – especially since anonymous users aren’t (normally) content editors. A non-zero max-age means fewer trips (overall) to your WFEs from outside the LAN environment, and this helps to keep bandwidth utilization on your Internet connection. There is still a risk that external users may see “stale” content, but the impact is generally more acceptable for straight viewers since they aren’t actively working on content.
4. Consider changing the path expression to restrict what goes into the BLOB Cache. The default path expression for SharePoint 2010’s BlobCache element looks like this:
[sourcecode language=”text”]
\.(gif|jpg|jpeg|jpe|jfif|bmp|dib|tif|tiff|ico|png|wdp|hdp|css|js|asf|avi|flv|m4v|mov|mp3|mp4|mpeg|mpg|rm|rmvb|wma|wmv)$
[/sourcecode]
Most administrators are savvy enough to add and remove file extensions from this expression as needed; for example, taking |wmv out of the path expression means that the BLOB Cache will no longer store and serve files with a .wmv extension. Adding and removing extensions really only scratches the surface of what can be done, though. The path attribute value is actually a regular expression, so the full power of regular expressions can be applied to select and exclude files for use with the BLOB Cache.
Suppose you want to explicitly control which images, videos, and other files (that match the list of extensions) end up in the BLOB Cache? Maybe you want to specially name files you intend to cache with an additional .cache extension before the actual file type extension (e.g., .gif). To accomplish this, you could change the path expression to this:
[sourcecode language=”text”]
\.cache\.(gif|jpg|jpeg|jpe|jfif|bmp|dib|tif|tiff|ico|png|wdp|hdp|css|js|asf|avi|flv|m4v|mov|mp3|mp4|mpeg|mpg|rm|rmvb|wma|wmv)$
[/sourcecode]
With this path expression, filenames like these would be included in the BLOB Cache:
… but anything without the additional .cache qualifier would get omitted, such as:
This is just a simple example, but hopefully it gives you an idea of what you could do with the path regular expression to control the contents of the BLOB Cache.
The SharePoint BLOB Cache is a powerful mechanism to improve farm performance and scalability, but it shouldn’t be turned on without some forethought and a couple of changes to the default BlobCache element attribute values.
If you are an administrator and have enabled the BLOB Cache with its default values, check with your users. They might have some feedback for you …
This post covers my summer SharePoint activities, including a number of appearances at SharePoint Saturday events and SPUGs. I also talk about a few other tidbits, including an appearance on Microsoft’s Talk TechNet broadcast.
My family recently relocated from the west side of Cincinnati to the east side, and it’s been a major undertaking – as anyone who’s familiar with Jim Borgman’s comic series on the east and west sides of Cincinnati can appreciate. Between the move and some other issues, I had planned on taking it easy with SharePoint activities for a while.
Despite that goal, it seems I still have a handful of SharePoint-related things planned this summer. Here’s what’s going on.
The main reason I’m calling the article out here (in my blog) is because I put together a couple of PowerShell scripts that I included in the article. The first script relocates the Office Web Apps’ cache site collection to a different content database for any given Web application. The second script displays current values for some common cache settings and gives you the opportunity to change them directly.
The scripts (and article contents) are helpful for anyone trying to manage the Office Web Apps in SharePoint 2010. Check them out!
On Wednesday, July 6th (tomorrow!), I’ll be on Talk TechNet with Keith Combs and Matt Hester. I’m going to be talking with Keith and Matt about SharePoint, disaster recovery, and anything else that they want to shoot the breeze about. 60 minutes seems like a long time, but I know how quickly it can pass once my mouth starts going …
Here’s the fun part (for you): the episode is presented live, and anyone who registers for the event can “call in” with questions, comments, etc. Feel free to call in and throw me a softball question … or heckle me, if that’s your style! Although I don’t know Keith personally (yet), I do know Matt – and knowing Matt, things will be lighthearted and lively.
On Thursday the 7th (yeah, this is a busy week), I’ll be heading down to Evansville, Indiana, to speak at the Evansville user group. This is something that Rob Wilson and I have been discussing for quite some time, and I’m glad that it’s finally coming to fruition!
I’ll be presenting my SharePoint 2010 and Your DR Plan: New Capabilities, New Possibilities! session. The abstract reads as follows:
Disaster recovery planning for a SharePoint 2010 environment is something that must be performed to insure your data and the continuity of business operations. Microsoft made significant enhancements to the disaster recovery landscape with SharePoint 2010, and we’ll be taking a good look at how the platform has evolved in this session. We’ll dive inside the improvements to the native backup and restore capabilities that are present in the SharePoint 2007 platform to see what has been changed and enhanced. We’ll also look at the array of exciting new capabilities that have been integrated into the SharePoint 2010 platform, such as unattended content database recovery, SQL Server snapshot integration, and configuration-only backup and restore. By the time we’re done, you will possess a solid understanding of how the disaster recovery landscape has changed with SharePoint 2010.
It’ll be a bit of a drive from here to Evansville and back, but I’m really looking forward to talking shop with Rob and his crew on Thursday!
Amazingly enough, the primary registration (400 seats) for the event “sold out” in a little over three days. Holy smokes – that’s fast! The event is now wait listed, so if you haven’t yet signed up … you probably won’t get a spot :-(
On August 4th, I’ll be heading back up to Mason, Ohio, to present for my friends at the Cincinnati SharePoint User Group. My presentation topic this time around will be “Caching-In” for SharePoint Performance. Here’s the abstract:
Caching is a critical variable in the SharePoint scalability and performance equation, but it’s one that’s oftentimes misunderstood or dismissed as being needed only in Internet-facing scenarios. In this session, we’ll build an understanding of the caching options that exist within the SharePoint platform and how they can be leveraged to inject some pep into most SharePoint sites. We’ll also cover some sample scenarios, caching pitfalls, and watch-outs that every administrator should know.
Like most of my presentations, this one started as a PowerPoint. I converted it over to Prezi format some time ago, and I’ve been having a lot of fun with it since. I hope the CincySPUG folks enjoy it, as well!
I submitted a handful of abstracts for consideration, and I know that I’ll be speaking at the event. I just don’t know what I’ll be talking about at this point. If you’re going to be in the Washington, DC area on August 11th through 13th, though, consider signing up for the conference!
Along with Brian Jackett, Jennifer Mason, and Nicola Young, I’m helping to plan and execute the event on the 20th. I’m handling speaker coordination again this year – a role that I do enjoy! We’ve had a number of great submissions thus far; in the next week or so, we (the organizing committee) will be putting our heads together to make selections for the event. Once those selections have been made, I’ll be communicating with everyone who submitted a session.
If you live in Ohio and don’t find Columbus to be an exceptionally long drive, I encourage you to head out to the SharePoint Saturday site and sign up for the event. It’s free, and the training you’ll get will be well-worth the Saturday you spend!