Adventures in Sitecore Partial Html Cache Clearing

Sitecore offers us the HtmlCacheClearer and the HtmlCacheClearAgent. The former clears the entire HTML cache for the sites defined within the handler’s sites property at the end of a publish. The latter is disabled by default in the scheduler but, if enabled, clears the entire HTML cache for every site hosted in the solution at the interval you specify.

Neither of these is particularly desirable. Clearing the entire HTML cache at the end of every publish can massively reduce the performance boost you gain from HTML caching if publishing is a regular activity. It also clears the cache for every configured site, regardless of whether that site’s content changed.

Clearing the entire HTML cache for every site on a schedule has some advantages in that we can hold onto the cache for longer (based on the interval you set, which can be loosely tied to acceptable delays in seeing content changes go live) but again it’s indiscriminate in that it clears the cache whether the content has changed or not. Scheduled clearance can also create a “hill effect” in CPU usage on your content delivery servers.

[Figure: CPU-Hills]

The spikes you can see above are a direct consequence of setting the HTML cache to clear every 30 minutes for a site that makes heavy use of HTML cache. This can be compounded if you have multiple content delivery servers all clearing at the same time. It’s extremely difficult (perhaps impossible) to synchronise the interval-based scheduler across delivery servers so that their clears are offset, so this solution does not scale well either.

And what about deployments? If we go down to a single delivery server while upgrading the other, the spikes after cache clearance can get pretty uncomfortable.

Well the good news is we can get rid of this problem by extending Sitecore so that it only clears the HTML cache for the sections of the site we specify, within the sites we specify and when we specify.

First I’ll point out how to access the HTML cache for a given site:

// skipping null checks for brevity
var cache = Factory.GetSiteInfo("siteName").HtmlCache;

HtmlCache inherits from Sitecore.Caching.CustomCache and, if you’ve ever created your own cache by inheriting from CustomCache, you will know that you need to generate a cache key to add an entry.

Sitecore generates this cache key for every rendering in the GenerateCacheKey processor of the mvc.renderRendering pipeline. The key is a concatenation of the following (depending on the caching settings on the rendering item):

  • Controller name
  • MVC area
  • Data source path
  • Language
  • Querystring
  • Rendering parameters
  • Device

And so on. If you want to see all of the keys that are currently in the HTML cache to understand the format you can call:


var keys = cache.InnerCache.GetCacheKeys();

If you want to customize the way cache keys are generated, or add additional context to the cache keys, you can replace Sitecore’s GenerateCacheKey processor with your own. For example, if you are working with dynamic pages (or virtual URLs) and want to be able to cache the renderings on these pages even though they don’t have a traditional data source to vary by, you could do something like:


public class GenerateDynamicCacheKey : GenerateCacheKey
{
    protected override string GenerateKey(Rendering rendering, RenderRenderingArgs args)
    {
        var key = base.GenerateKey(rendering, args);

        // AppendProductId is a helper (not shown) that resolves a dynamic
        // identifier, e.g. the product id, and appends it to the key
        return AppendProductId(key);
    }
}

Now we can set renderings to cacheable on dynamic pages which vary by the product being displayed.

This is where things get interesting. As well as being able to clear the entire HTML cache for a site by calling cache.Clear(), CustomCache, which HtmlCache inherits from, exposes two additional methods:

  1. public virtual void RemoveKeysContaining(string value);
  2. public virtual void RemovePrefix(string prefix);

The names of these methods are fairly self-explanatory. Thinking back to the example of appending the product id to the end of the cache key, we are now in a position to add a handler to the publish:end:remote event that extracts the item just published, checks whether it’s a product and, if so, removes the HTML cache entries with keys containing the product id:

protected void OnPublishEnd(object sender, EventArgs args)
{
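    // null checks skipped for brevity
    // note: this assumes the args carry the Publisher as parameter 0, as the local
    // publish:end event does; verify what your Sitecore version passes for publish:end:remote
    var item = (Event.ExtractParameter(args, 0) as Publisher).Options.RootItem;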

    if (item.IsProduct()) // extension method
    {
        Factory.GetSiteInfo("siteName").HtmlCache.RemoveKeysContaining(item["Product Id"]);
    }
}

So now we can hold on to the precious HTML cache for all of our product pages and only clear the cache for individual products when they’re updated.

Conversely, we may still want to clear all non-product related caches on a schedule. First let’s take a look at what RemoveKeysContaining actually does: it calls InnerCache.RemoveKeysContaining(value). InnerCache is a property of type ICache whose concrete implementation turns out to be Sitecore.Caching.Cache, which in turn inherits from the generic Cache<string>. The RemoveKeysContaining method in Sitecore.Caching.Cache looks like this:


public void RemoveKeysContaining(string keyPart)
{
    base.Remove((string key) => key.IndexOf(keyPart, StringComparison.InvariantCultureIgnoreCase) > -1);
}

So essentially RemoveKeysContaining is really a helper method on the Cache class, which internally creates a predicate that gets passed into the Remove method of the generic cache class it inherits from.
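
To make the predicate idea concrete outside of Sitecore, here is a minimal, self-contained sketch. KeyedCacheSketch is a hypothetical stand-in backed by a dictionary, not Sitecore’s implementation, and the case-insensitive comparison in RemovePrefix is an assumption made to mirror RemoveKeysContaining:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative stand-in only; this is not Sitecore's cache implementation.
public class KeyedCacheSketch
{
    private readonly Dictionary<string, string> entries = new Dictionary<string, string>();

    public int Count => entries.Count;

    public void Add(string key, string html) => entries[key] = html;

    public bool Contains(string key) => entries.ContainsKey(key);

    // Core operation: evict every entry whose key satisfies the predicate.
    public void Remove(Predicate<string> predicate)
    {
        // Snapshot the matching keys first so the dictionary is not
        // modified while it is being enumerated.
        foreach (var key in entries.Keys.Where(k => predicate(k)).ToList())
        {
            entries.Remove(key);
        }
    }

    // Same shape as the decompiled method above: a helper that builds a predicate.
    public void RemoveKeysContaining(string keyPart) =>
        Remove(key => key.IndexOf(keyPart, StringComparison.InvariantCultureIgnoreCase) > -1);

    // Assumption: prefix matching is case-insensitive, like the method above.
    public void RemovePrefix(string prefix) =>
        Remove(key => key.StartsWith(prefix, StringComparison.InvariantCultureIgnoreCase));
}
```

Both helpers are thin wrappers around Remove, so any rule you can express as a function of the key can be used for eviction in the same way.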

Ok, great, so this means we can create our own predicate. How about, when we generate our own cache keys as described earlier, we add an identifier to all product related cache keys, for example $"{productId}.product"?
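
One thing to watch with plain substring matching is that a short id such as 12 is also contained in 312, so a "contains" removal could evict more entries than intended. A minimal sketch of a helper that keeps the key convention in one place is shown here; ProductCacheKey, its members and the "_#" delimiter are all hypothetical choices for illustration, not part of Sitecore:

```csharp
using System;

// Hypothetical helper, not part of Sitecore: centralises the product key convention.
public static class ProductCacheKey
{
    // Suffix that tags a key as product related.
    public const string Suffix = ".product";

    // Assumed delimiter placed before the id so that "12" does not also match "312".
    private const string Delimiter = "_#";

    // Appends the delimited product id and suffix to an existing cache key.
    public static string Append(string baseKey, string productId)
    {
        if (string.IsNullOrEmpty(productId))
        {
            return baseKey; // not a product page, leave the key untouched
        }

        return baseKey + Delimiter + productId + Suffix;
    }

    // The exact token to look for when evicting a single product's entries.
    public static string TokenFor(string productId) => Delimiter + productId + Suffix;

    // True when the key carries the product suffix.
    public static bool IsProductKey(string key) =>
        key != null && key.EndsWith(Suffix, StringComparison.Ordinal);
}
```

With something like this, the custom GenerateKey override could call Append, a publish handler could pass TokenFor(productId) to RemoveKeysContaining, and a scheduled clearer could negate IsProductKey in its predicate.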

This would enable us to do something like the following in our scheduled, non-product related cache clearer:

// agent registered in the scheduling section of the Sitecore config
// siteName(s) to clear could be defined in the agent config
public void Run()
{
    var cache = Factory.GetSiteInfo("siteName").HtmlCache;

    // evict every entry whose key is not tagged as product related
    cache.InnerCache.Remove((string key) => !key.EndsWith(".product"));
}

That simple. So now we have individual product page caches clearing only when the product is modified and non-product related caches clearing based on an interval.

Hopefully it’s evident now that by using a combination of:

  1. Customised HTML cache keys
  2. Custom predicates for removing specific entries based on their key

you can really begin to fine-tune your HTML cache clearing strategy to clear entries only when it’s absolutely necessary.

Happy optimization!