Caching, You Ain’t No Friend Of Mine

I love caching and all that it can do to boost performance, but caching for SharePoint in the cloud isn’t the same as it is on-premises. In this post, I explore why that is for Object Caching – and what you can do about it.

[Image: I’ve got a caching-induced headache]

I’m a big fan of leveraging caching to improve performance. If you look over my blog, you’ll find quite a few articles that cover things like implementing BLOB caching within SharePoint, working with the Object Cache, extending your own code with caching options, and more. And most of those posts were written in a time when the on-premises SharePoint farm was king.

The “caching picture” began shifting when we started moving to the cloud. SharePoint Online and hosted SharePoint services aren’t the same as SharePoint on-premises, and the things we rely upon for performance improvements on-premises don’t necessarily have our backs when we move out to the cloud.

Yeah, I’m talking about caching here. And as much as it breaks my heart to say it, caching – you ain’t no friend of mine out in SharePoint Online.

Why the heartbreak?

To understand why a couple of SharePoint’s traditional caching mechanisms aren’t doing you any favors in a multi-tenant service like SharePoint Online (with or without Office 365), it helps to first understand how memory-based caching features – like SharePoint’s Object Cache – work in an on-premises environment.

On-Premises

The typical on-premises environment has a small number of web front-ends (WFEs) serving content to users, and the number of site collections being served-up is relatively limited. For purposes of illustration, consider the following series of user requests to an environment possessing two WFEs behind a load balancer:

[Diagram: On-premises request results]

Assuming the WFEs have just been rebooted (or the application pools backing the web applications for the target site collection have just been recycled) – a worst-case scenario – the user in Request #1 is going to hit a server (either #1 or #2) that does not have cached content in its Object Cache. For this example, we’ll say that the user is directed to WFE #1. Responses from WFE #1 will be slower as SharePoint works to generate the content for the user and populate its Object Cache. The WFE will then return the user’s response, and as a result of the request, its Object Cache will contain site collection-specific content such as navigational sitemaps, Content Query Web Part (CQWP) query results, common site property values, any publishing page layouts referenced by the request, and more.
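To make the mechanics concrete, here’s a minimal sketch of the fetch-or-fill pattern just described. It’s Python purely for illustration – the real Object Cache is internal to SharePoint, and all of the names below are hypothetical:

    import time

    class ObjectCacheSketch:
        """Illustrative stand-in for a WFE's in-memory Object Cache."""

        def __init__(self):
            self._entries = {}  # key -> (value, expiry timestamp)

        def get_or_fill(self, key, fetch, ttl_seconds=60):
            entry = self._entries.get(key)
            now = time.time()
            if entry is not None and entry[1] > now:
                return entry[0]  # cache hit: the fast path
            value = fetch()      # cache miss: slow content generation
            self._entries[key] = (value, now + ttl_seconds)
            return value

    # e.g., sitemap = cache.get_or_fill("sitemap:/sites/hr", build_sitemap)

Once a WFE is “warm,” the expensive fetch is skipped entirely – and that’s the faster response users see on subsequent requests.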

The next time the farm receives a request for the same site collection (Request #2), there’s a 50/50 shot that the user will be directed to a WFE that has cached content (WFE #1, shown in green) or doesn’t yet have any cached content (WFE #2). If the user is directed to WFE #1, bingo – a better experience should result. Let’s say the user gets unlucky, though, and hits WFE #2. The same process as described earlier (for WFE #1) ensues, resulting in a slower response to the user but a populated Object Cache on WFE #2.

By the time we get to Request #3, both WFEs have at least some cached content for the site collection being visited and should thus return responses more quickly. Assuming memory pressure remains low, these WFEs will continue to serve cached content for subsequent requests – until content expires out of the cache (forcing a re-fetch and fill) or gets forced out for some reason (again, memory pressure or perhaps an application pool recycle).

Another thing worth noting with on-premises WFEs is that many SharePoint administrators use warm-up scripts and services in their environments to make the initial requests that are described (in this example) by Request #1 and Request #2. So, it’s possible in these environments that end-users never have to start with a completely “cold” WFE and make the requests that come back more slowly (but ultimately populate the Object Caches on each server).
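For the curious, a warm-up boils down to something like the sketch below. Again, it’s Python for illustration with made-up server names; real warm-up scripts (often PowerShell) typically target each WFE directly – bypassing the load balancer – and authenticate with appropriate farm credentials:

    import urllib.request

    # Hypothetical URLs that hit each WFE directly so both caches get primed.
    WARM_UP_URLS = [
        "http://wfe1.contoso.local/sites/hr/Pages/Default.aspx",
        "http://wfe2.contoso.local/sites/hr/Pages/Default.aspx",
    ]

    def warm_up():
        for url in WARM_UP_URLS:
            try:
                with urllib.request.urlopen(url, timeout=30) as response:
                    response.read()  # pulling the page populates the caches
                print(f"warmed: {url}")
            except OSError as error:  # URLError is an OSError subclass
                print(f"failed: {url} ({error})")

    if __name__ == "__main__":
        warm_up()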

SharePoint Online

Let’s look at the same initial series of interactions again. Instead of considering the typical on-premises environment, though, let’s look at SharePoint Online.

[Diagram: Cloud request results]

The first thing you may have noticed in the diagrams above is that we’re no longer dealing with just two WFEs. In a SharePoint Online tenant, the number of WFEs varies depending on factors such as load. In this example, I set the number of WFEs to 50; in reality, it could be lower or (in all likelihood) higher.

Request #1 proceeds pretty much the same way as it did in the on-premises example. None of the WFEs have any cached content for the target site collection, so the WFE needs to do extra work to fetch everything needed for a response, return that information, and then place the results in its Object Cache.

In Request #2, one server has cached content – the one that’s highlighted in green. The remaining 49 servers don’t have cached content. So, in all likelihood (49 out of 50, or 98%), the next request for the same site collection is going to go to a different WFE.
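A little back-of-the-envelope math shows how quickly those odds deteriorate as the WFE count grows. The sketch below assumes uniform load balancing and – generously – no cache evictions:

    def warm_hit_probability(n_wfes, prior_requests):
        """Chance the next request lands on a WFE that already has this
        site collection cached, given uniform routing and no evictions."""
        cold = ((n_wfes - 1) / n_wfes) ** prior_requests  # P(a given WFE never hit)
        return 1 - cold  # expected fraction of warm WFEs

    for n in (2, 50):
        print(n, {k: f"{warm_hit_probability(n, k):.1%}" for k in (1, 2, 10)})
    # 2  -> {1: '50.0%', 2: '75.0%', 10: '99.9%'}
    # 50 -> {1: '2.0%',  2: '4.0%',  10: '18.3%'}

With two WFEs, ten requests make a warm hit all but certain; with fifty, you’re still under a one-in-five chance.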

By the time we get to Request #3, we see that another WFE has gone through the fetch-and-fill operation (again, highlighted in green). But there’s something else worth noting that we didn’t see in the on-premises environment; specifically, the server that was visited previously (in Request #1) is now red, not green. What does this mean? Well, in a multi-tenant environment like SharePoint Online, WFEs are serving-up hundreds and perhaps thousands of different site collections for the many tenants that reside in the environment. Object Caches do not have infinite memory, so memory pressure is likely to be a much greater factor than it is on-premises – meaning that Object Caches are probably going to be ejecting content pretty frequently.

If the Object Cache on a WFE is forced to eject content relevant to the site collection a user is trying to access, then that WFE is going to have to do a re-fetch and re-fill just as if it had never cached content for the target site collection. The net effect, as you might expect, is longer response times and potentially sub-par performance.
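SharePoint’s actual ejection policy is internal to the product, but a tiny least-recently-used (LRU) cache is a reasonable mental model for what memory pressure does in a multi-tenant farm – entries for your site collection get pushed out by everyone else’s:

    from collections import OrderedDict

    class BoundedCache:
        """Sketch of a size-limited cache that ejects the least recently
        used entry -- illustrative only, not SharePoint's real policy."""

        def __init__(self, max_entries):
            self._max = max_entries
            self._data = OrderedDict()

        def put(self, key, value):
            if key in self._data:
                self._data.move_to_end(key)
            self._data[key] = value
            if len(self._data) > self._max:
                self._data.popitem(last=False)  # eject the coldest entry

        def get(self, key):
            if key not in self._data:
                return None  # miss: the WFE must re-fetch and re-fill
            self._data.move_to_end(key)  # mark as recently used
            return self._data[key]

Scale that mental model up – one WFE, thousands of site collections, a bounded cache – and the odds of your content surviving between visits drop quickly.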

The Take-Away

If there’s one point I’m trying to make in all of this, it’s this: you can’t assume that the way a SharePoint farm operates on-premises is going to translate to the way a SharePoint Online farm (or any other multi-tenant farm) is going to operate “out in the cloud.”

Is there anything you can do? Sure – there’s plenty. As I’ve tried to illustrate thus far, the first thing you can do is challenge any assumptions you might have about performance that are based on how on-premises environments operate. The example I’ve chosen here is the Object Cache and how it factors into the performance equation – again, in the typical on-premises environment. If you assume that the Object Cache might instead be working against you in a multi-tenant environment, then there are two particular areas where you should immediately turn your focus.

Navigation

By default, SharePoint site collections use structural navigation mechanisms. Structural navigation works like this: when SharePoint needs to render a navigational menu or link structure of some sort, it walks through the site collection noting the various sites and sub-sites that the site collection contains. That information gets built into a sitemap, and that sitemap is cached in the Object Cache for faster retrieval on subsequent requests that require it.
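Conceptually, building that sitemap looks something like the sketch below (Python, with hypothetical names – not the real SharePoint API). The thing to notice is the recursion: the cost grows with every site and sub-site in the hierarchy, which is exactly why the cached copy matters:

    from dataclasses import dataclass, field

    @dataclass
    class Web:
        """Hypothetical stand-in for a SharePoint site or sub-site."""
        title: str
        url: str
        subwebs: list = field(default_factory=list)

    def build_sitemap(web):
        # One traversal step per site; the whole hierarchy gets walked.
        return {
            "title": web.title,
            "url": web.url,
            "children": [build_sitemap(child) for child in web.subwebs],
        }

    root = Web("Home", "/", [Web("HR", "/hr"), Web("IT", "/it", [Web("Helpdesk", "/it/helpdesk")])])
    sitemap = build_sitemap(root)  # this finished product is what gets cached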

Without the Object Cache helping out, structural navigation becomes an increasingly less desirable choice as site hierarchies get larger and larger. Better options include alternatives like managed navigation or search-driven navigation; each option has its pros and cons, so be sure to read-up a bit before selecting an option.

Content Query Web Parts

When data needs to be rolled-up in SharePoint, particularly across lists or sites, savvy end-users turn to the CQWP. Since cross-list and cross-site queries are expensive operations, SharePoint will cache the results of such a query using – you guessed it – the Object Cache. Query results are then re-used from the Object Cache for a period of time to improve performance for subsequent requests. Eventually, the results expire and the query needs to be run again.

So, what are users to do when they can’t rely on the Object Cache? A common theme in SharePoint Online and other multi-tenant environments is to leverage Search whenever possible. This was called out in the previous section on Navigation, and it applies in this instance, as well.

An alternative to the CQWP is the Content Search Web Part (CSWP). The CSWP operates somewhat differently than the CQWP, so it’s not a one-to-one direct replacement … but it is very powerful and suitable in most cases. Since the CSWP pulls its query results directly from SharePoint’s search index, it’s exceptionally fast – making it just what the doctor ordered in a multi-tenant environment.

Quick note (2/1/2016): Thanks to Cory Williams for reminding me that the CSWP is currently only available to SharePoint Online Plan 2 and other “Plan 3” (e.g., E3, G3) users. Many enterprise customers fall into this bucket, but if you’re not one of them, then you won’t find the CSWP for use in your tenant :-(

There are plenty of good resources online for the CSWP, and I regularly speak on it myself; feel free to peruse resources I have compiled on the topic (and on other topics).

Wrapping-Up

In this article, I’ve tried to explain how on-premises and multi-tenant operations differ for one area in particular: the Object Cache. In the future, I plan to cover some performance watch-outs and work-arounds for other areas … so stay tuned!

Additional Reading and References

  1. MSDN: Navigation options for SharePoint Online
  2. MSDN: Using Content Search Web Part instead of Content Query Web Part to improve performance in SharePoint Online
  3. SharePoint Interface: Presentations and Materials

Revisiting the Basement Datacenter in 2016

Here we are in 2016. If you’ve been following my blog for a while, you might recall a post I threw together back in 2010 called Portrait of a Basement Datacenter. Back in 2010, I was living on the west side of Cincinnati with my wife (Tracy) and three-year-old twins (Brendan and Sabrina). We were kind of shoehorned into that house; there just wasn’t a lot of room. Todd Klindt visited once and had dinner with us. He didn’t say it, but I’m sure he thought it: “gosh, there’s a lot of stuff in this little house.”

[Image: Servers in 2010]

All of my computer equipment (or rather, nearly all of it) was in the basement. I had what I called a “basement datacenter,” and it was quite a collection of PCs and servers in varying form factors and with a variety of capabilities.

The image on the right is how things looked in 2010. Just looking at the picture brings back a bunch of memories for me, and it also reminds me a bit of what we (as server administrators) could and couldn’t easily do. For example, nowadays we virtualize nearly everything without a second thought. Six years ago, virtualization technology certainly existed … but it hadn’t hit the level of adoption that it’s cruising at today. I look at all the boxes on the right and think “holy smokes – that’s a lot of hardware. I’m glad I don’t have all of that anymore.” It seemed like I had drives and computers everywhere, and they were all sucking down juice. I had two APC 1600W UPS units that were acting as battery backups back then. With all the servers plugged-in, they were drawing quite a bit of power. And yeah – I had the electric bill to prove it.

So, What’s Changed?

For starters, we now live on the east side of Cincinnati and have a much bigger house than we had way back when. Whenever friends come over and get a tour of the house, they inevitably head downstairs and get to see what’s in the unfinished portion of the basement. That’s where the servers are nowadays, and this is what my basement datacenter looks like in 2016:

[Images: Servers in 2016; the purpose of each server]

In reality, quite a bit has changed. We have much more space in our new house, and although the “server area” is smaller overall, it’s basically a dedicated working area where all I really do is play with tech, fix machines, store parts, etc. If I need to sit at a computer, I go into the gaming area or upstairs to my office. But if I need to fix a computer? I do it here.

In terms of capabilities, the last six years have been good to me.

All Hail The Fiber

Back on the west side of town, I had a BPL (broadband-over-powerline) Internet hookup from Duke Energy and the Current Group. Nowadays, I don’t even know what’s happening with that technology. It looks like Duke Energy may be trying to move away from it? In any case, I know it gave me a symmetric pipe to the Internet, and I think I had about 10Mbps up and down. I also had a secondary DSL connection (from Cincinnati Bell) that was about 2.5Mbps down and 1Mbps up.

Once I moved back to the east side of Cincinnati and Anderson Township, the doors were blown off of the barn in terms of bandwidth. Initially, I signed with Time Warner Cable for a 50Mbps download / 5Mbps upload primary connection to my house. I made the mistake of putting in a business circuit (well, I was running a business), so while it gave me some static IP address options, it ended up costing a small fortune.

[Image: Internet speed test, 2016]

My costly agreement with Time Warner ended last year, and for that I’m thankful. Nowadays, I have Cincinnati Bell Fiber coming to my house (Fioptics), and it’s a full-throttle connection. I pay for gigabit download speeds and have roughly a 250Mbps upload pipe. Realistically, the bandwidth varies … but there’s a ton of it, even on a bad day. The image on the right shows the bandwidth to my desktop as I’m typing this post. No, it’s not gigabit (at this moment) … but really, should I complain about 330Mbps download speeds from the Internet? Realistically speaking, some of the slowdown is likely due to my equipment. Running full gigabit Ethernet takes good wiring, quality switches, fast firewalls, and more. You’re only as fast as your slowest piece of equipment.

I do keep a backup connection with Time Warner Cable in case the fiber goes down, and my TMG firewall does a great job of failing over to that backup connection if something goes wrong. And yes, I’ve had a problem with the fiber once or twice. But it’s been resolved quickly, and I was back up in no time. Frankly, I love Cincinnati Bell’s fiber.

What About Storage?

[Image: Mediasonic PRORAID enclosures]

In the last handful of years, storage limits have popped over and over again. You can buy 8TB drives on Amazon.com right now, and they’re not prohibitively expensive? We’ve come a long way in just a half dozen years, and the limits just keep expanding.

I have a bunch of storage downstairs, and frankly I’m pretty happy with it. I’ve graduated from the random drives and NAS appliances that used to occupy my basement. These days, I use Mediasonic RAID enclosures. You pop some drives in, connect an eSATA cable (or USB cable, if you have to), and away you go. They’ve been great self-contained pass-through drive arrays for specific virtual machines running on my Hyper-V hosts.  I’ve been running the Mediasonic arrays for quite a few years now, and although this isn’t a study in “how to build a basement datacenter,” I’d recommend them to anyone looking for reliable storage enclosures. I keep one as a backup unit (because eventually one will die), and as a group they seem to be in good shape at this point in time. The enclosures supply the RAID-5 that I want (and yeah, I’ve had *plenty* of drives die), so I’ve got highly-available, hot-swappable storage where I need it.

Oh, and don’t mind the minions on my enclosures. Those of you with children will understand. Those who don’t have children (or who don’t have children in the appropriate age range) should either just wait it out or go watch Despicable Me.

Hey? What About The Cloud?

[Image: Servers and their shelf]

The astute will ask “why are you putting all this hardware in your house instead of shifting to the cloud?” You know, that’s a good question. I work for Cardinal Solutions Group, and we’re a Microsoft managed partner with a lot of Office 365 and Azure experience. Heck, I’m Cardinal’s National Solution Manager for Office 365, so The Cloud is what I think about day-in and day-out.

First off, I love the cloud. For enterprise scale engagements, the cloud (and Microsoft’s Azure capabilities, in particular) are awesome. Microsoft has done a lot to make it easier (not “easy,” but “easier”) for us to build for the cloud, put our stuff (like pictures, videos, etc.) in the cloud, and get things off of our thumb drives and backup boxes and into a place where they are protected, replicated, and made highly available.

What I’m doing in my basement doesn’t mean I’m “avoiding” the cloud. Actually, I moved my family onto an Office 365 plan to give them email and capabilities they didn’t have before. My kids have their first email address now, and they’re learning how to use email through Office 365. I’m going to move the SharePoint site collection that I maintain for our family (yes, I’m that big of a geek) over to SharePoint Online because I don’t want to wrangle with it at home any longer. Keeping SharePoint running is a pain-in-the-butt, and I’m more than happy to hand that over to the Office 365 folks.

I’ll still be tinkering with SharePoint VMs for sure with the work I do, but I’m happy to turn over operational responsibility to Microsoft for my family’s site collection.

The Private Cloud

[Image: Server shelf, left side]

So even though I believe in The Cloud (i.e., “the big cloud that’s out there with all of our data”), I also believe in the “private cloud,” “personal cloud,” or whatever you want to call it. When I work from the Cardinal office, my first order of business is to VPN back to my house (again, through my TMG Firewall – they’ll have to pry it from my cold, dead hands) so that I have access to all of my files and systems at home.

Accessing stuff at home is only part of it, though. The other part is just knowing that I’m going through my network, interacting with my systems, and still feeling like I have some control in our increasingly disconnected world. My Plex server is there, and my file shares are available, and I can RDP into my desktop to leverage its power for something I’m working on. There’s a comfort in knowing my stuff is on my network and servers.

Critical data makes it to the cloud via OneDrive, Dropbox, etc., but I still can’t afford to pay for all of my stuff to be in the cloud. Prices are dropping all of the time, though. Will I ever give up my basement datacenter? Probably not, because maintaining it helps me keep my technical skills sharpened … but it’s also a labor of love.

Additional Reading and References

  1. Blog Post: Portrait of a Basement Datacenter
  2. Blog: Todd Klindt’s SharePoint Admin Blog
  3. Department of Justice: Current Group Broadband Overview
  4. Site: Cincinnati Bell Fioptics
  5. TechNet: Threat Management Gateway
  6. Amazon.com: Seagate Archive 8 TB Internal Hard Drive
  7. Amazon.com: Mediasonic PRORAID Drive Enclosure
  8. Amazon.com: Despicable Me
  9. Company: Cardinal Solutions Group

What Happened To My Office 365 Public Site?

Close Out (Early 2016)

It would appear that things are more or less back to normal. I never got an “everything is okay and live” email, but theming and branding are working properly both on public sites (tick, tock, tick, tock …) and internal sites. No conflicts at this point. Since I like to tie things up when complete, we’ll call this one “done” for now and move on.

Update (Evening 7/23/2015)

Microsoft has been looking at this issue, and progress is being made! My public site looks like it has returned to normal … but I know that we’re not quite out of the woods yet.

John from Microsoft followed-up with me yesterday and said the following:

“We have pulled the flight that was impacting everyone from production.  The plan is to address these issues before turning the flight back on.  Would you be up for piloting the upgraded flight before we turn it back on for everyone? Also, you mention you know others who are having problems, would they be willing to pilot the new flight as well?  If so, please provide their contact information so I can follow up with them.”

Clearly, there’s a strong “flying vibe” in Redmond ….

I told John that I was definitely a “go” for the flight, and that I knew some others who’d experienced problems. And that’s where I’m hoping that some of you can help.

If you have been encountering problems with your Office 365 public site that are similar to mine – and you want to be part of the fix – let me know and I’ll hook you up with John. Shoot me your name, email address, and public site URL; I think that will do the job.

Stay tuned!

Original Post (below)

First of all, let me state that I’m not talking about the fact that Office 365 public sites are going the way of the dinosaur. That’s old news at this point. Instead, I’m talking about a “disruption in the force” that some of you may have observed when recently browsing to your public sites and discovering that they had … changed.

And what do I mean by “changed?” In my case, it was the observation that my public site’s branding had been altered to something I hadn’t chosen. The background image was different, the fonts weren’t the same, a number of the CSS styles I had applied to address scrollbar positioning and what-not weren’t in effect, and more. In essence, my custom branding had been completely steamrolled.

The After And The (Sort-Of Before)

[Images: Office 365 public site with busted branding (left) and with some branding elements forced back (right)]

Don’t take my word for it, though. Have a look for yourself. On the left is my public site as I discovered it a couple of weeks back (i.e., near the beginning of July, 2015). On the right is the way it’s supposed to look … sort of. The fonts and aspects of the responsive design are still off in the “corrected” version on the right, but I managed to hack the proper background, scrollbars, and a few other elements back to where they were previously. Even though I didn’t get everything corrected, differences between the two are immediately obvious.

I was confident that it had been months since I had changed anything on my public site, but I verified the branding assets in the Office 365 site against what I was tracking in source control. As expected, they were identical. The changes I was observing were not due to anything I had done to the public site.

Grumble Grumble …

[Image: Facebook complaint about the branding issue]

Since it wasn’t the first time I’d had issues with unexpected changes and behavior on my public site, I wasted little time before going to Facebook to complain aloud. Many of my friends in the SharePoint community are also friends on Facebook, so I figured I’d get some support (moral, if nothing else) there.

Shortly after posting the update seen on the left, I tagged a couple of my Microsoft friends (specifically, Jeremy Thake and Chris Johnson) who work with the Office 365 team(s) and asked if I should have known about an update or change that might have affected my public site in this fashion. Even though I was confident that I hadn’t done anything to directly impact my public site’s look-and-feel, I was not about to rule out the possibility that I had missed something that had been communicated to me. In fact, I have been consulting in information technology long enough to know that my “mental glove” doesn’t catch everything thrown to me; anymore, I just kind of assume that I am in error and work from there.

Jeremy got back to me first and indicated that he didn’t have anything specific to share, but he indicated he would take the issue to folks (internally) who should know. Shortly after that, Chris tagged Steve Walker (another Microsoftee with superhuman powers) to bring the issue to his attention. Steve told me to file a service request (SR) and that he would escalate that SR right away. So, I filed the service request with supporting screenshots and documentation … and as he had indicated he would, Steve escalated the SR in less than an hour.

In the meantime, I did what I could to get some of my site’s branding back to the way it had been. How did I do that? With the liberal application of the dreaded !important CSS directive in the custom style sheet I had created to go with my master page. The !important directive is definitely not something I use (or even like to go near) on a daily basis, but in this case, I was trying to achieve results with a minimal investment of time and effort. I needed my styling to trump whatever was being laid-down after my style sheet was being processed.

So, What Happened Next?

Following Steve’s escalation, I started working with an extremely approachable escalation engineer named John. John and I exchanged some emails, did a late-night screen-sharing session, and generally looked things up-and-down. John concurred that what we were seeing shouldn’t have been happening, and he mentioned that he had another customer or two that seemed to be having a similar problem. Some tracing in those cases seemed to implicate a recent client-side theming change that had been made and rolled-out.

[Image: Analysis of styles in Internet Explorer Developer Tools]

An issue with client-side theming “felt” right to me. I had used Internet Explorer’s Developer Tools to do some backtracking into the source of the errant styles that were being applied to my site, and I noticed that the background image was being served from a temporary theme directory on the server. Until I started forcing my styles to override the ones that were being applied (an example of which is shown on the right), the theme styles were trumping my own styles.

John was out for a while, but he came back to me recently with the following explanation. And his explanation makes complete sense:

So the problem you were having was the result of a bad interaction with a third-party CSS minification technique called minisp.  By adding HTML comments around our CssLink controls in the master page, the <link> tags we normally render are part of a comment and therefore not part of the DOM.

When Client-Side Theming goes to replace the CSS on these pages, it wants to put the generated <style> blocks immediately before the corresponding <link> tags. Since it can’t find the <link> tags in the DOM, it defaults to adding the <style> blocks to the end of the document head. Since these come after the custom CSS, the rules in our themed CSS take priority over the rules in the custom CSS.

Resolution?

As of July 21st, 2015, there is no official resolution. Since this has been identified as a problem in how client-side theming handles CSS <style> block insertions, though, the ultimate fix needs to come from Microsoft. In the meantime, I’ll be sticking with my hacked CSS style sheet to get back most of the look-and-feel that I need. I could expend additional (development) effort to ensure that my styles are applied after the Office 365 theme styles without using !important, but there are plenty of other more important tasks vying for my attention right now.

So, if you found this post and it’s helping you to realize that you’re not going insane (at least not because of public site branding changes you didn’t make), I’ll feel that my job is done.

As I hear more and as changes take place, I’ll try to update this post accordingly. Check back every now and then if you want the play-by-play on this issue.

Parting Thoughts: A Tip Of My Hat To Microsoft

In the past, I’ve been pretty vocal about the way that Microsoft has sort of “rolled” changes onto its Office 365 customer base and failed to communicate problems in a timely and complete fashion – actions (or lack of action) that ultimately caused pain and problems. As vocal as I’ve been in those situations, I want to go on-record as saying (just as loudly) that Microsoft has definitely listened to the critical feedback it has been receiving and has acted to make changes that we have indicated we need.

Even though this public site branding issue is indeed a bug, Microsoft listened and responded quickly – without protest, without claiming that “nothing is wrong,” and without some of the problem behaviors I used to see.

Where there were previously few communications about Office 365 outages, problems, changes, and updates, we now have a boatload of information (with some of it actually being pushed to us) to keep us in-the-know on our tenants, where they stand at any given point in time, and where they are going. I can get both at-a-glance health information and deep explanations for issues using the administrative portal’s Service Health dashboard. I get push notifications whenever something happens in my tenant using the Office 365 Admin application that runs on my Windows Phone. I know when new service features and capabilities are rolling out, if changes have been cancelled, etc., by looking at the Office 365 Roadmap. And these are just some of the channels and information streams that are available.

Is Microsoft “all the way there” yet? No, but they are dramatically further along than when Office 365 first rolled-out. Outages still occur – as they do with any service – but I feel like I know what’s going on now. That’s a huge improvement in my book.

References and Resources

  1. Microsoft Support: Information about changes to the SharePoint Online Public Website feature in Office 365
  2. Office 365 Public Site: Bitstream Foundry LLC
  3. LinkedIn: Jeremy Thake
  4. LinkedIn: Chris Johnson
  5. LinkedIn: Steve Walker
  6. Stack Overflow: What are the implications of using “!important” in CSS?
  7. MSDN: Using the F12 developer tools
  8. Microsoft: Office 365 Service Health Status
  9. Windows Phone App Store: Office 365 Admin
  10. Office.com: Office 365 Roadmap