Sitecore Publishing Service Failed Publish Handling

If you have opted in to using Sitecore’s Publishing Service module to scale up your publishing operations, you have probably come across situations where a particular publishing job fails behind the scenes. There are many reasons that a publish job can fail, from incorrect configuration to issues related to the manifest calculation that the publishing service executes for every job.

The problem with this is that the failure will only really be detected if you happen to be looking at the publishing dashboard, which is often not the case for editors publishing content using workflow or automated publications that are triggered as part of a deployment pipeline.

So how can we detect and alert ourselves to a failed publish job? Well, the publishing service comes with its own set of pipelines and event handlers defined in Sitecore.PublishingService.config (which is installed as part of the publishing service module package for Sitecore). The event to pay attention to is publishingservice:publishend. You will notice that the publishing service comes with a couple of out of the box handlers, one of which is responsible for raising “traditional” publish related events such as publish:end and publish:end:remote to ensure that code listening to these events continues to function. Go ahead and add a custom handler and patch it in after the out of the box job end handler:

<handler patch:after="*[@type='Sitecore.Publishing.Service.Events.PublishingJobEndHandler, Sitecore.Publishing.Service']" type="YourClass, YourAssembly" method="FailedPublishAlert" />
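For context, a complete patch file around that handler might look like the sketch below. The file path, namespace, class and assembly names are placeholders I have made up for illustration; the event name and the out of the box handler type are the ones mentioned above.

```xml
<!-- Hypothetical file: App_Config/Include/zzz/PublishingService.FailedPublishAlert.config -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <events>
      <event name="publishingservice:publishend">
        <!-- Placeholder type: replace with your own handler class and assembly -->
        <handler patch:after="*[@type='Sitecore.Publishing.Service.Events.PublishingJobEndHandler, Sitecore.Publishing.Service']"
                 type="YourNamespace.PublishingServiceEventHandler, YourAssembly"
                 method="FailedPublishAlert" />
      </event>
    </events>
  </sitecore>
</configuration>
```

Keeping the handler in its own patch file means it can be switched off by removing or disabling a single file.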

Within your handler you need to reference libraries that are unfortunately not currently distributed via NuGet, so you will have to add local references for these by fetching copies from your local bin folder:

  • Sitecore.Publishing.Service.dll
  • Sitecore.Framework.Publishing.Abstractions.dll

You can also write to the publishing log with a bit of setup in the constructor which gives you access to the publishing service logger:


public class PublishingServiceEventHandler
{
    protected readonly IPublishingLog _logger;

    public PublishingServiceEventHandler() : this(
        new PublishingLogWrapper())
    {

    }

    public PublishingServiceEventHandler(IPublishingLog logger)
    {
        _logger = logger;
    }
}

In the FailedPublishAlert method the first job is to get access to the PublishingJobEndEventArgs:


protected PublishingJobEndEventArgs ExtractArgs(EventArgs args)
{
    var scArgs = args as SitecoreEventArgs;

    if (scArgs == null)
        return null;

    if (scArgs.Parameters == null || !scArgs.Parameters.Any() || !(scArgs.Parameters[0] is PublishingJobEndEventArgs))
        return null;

    return (PublishingJobEndEventArgs)scArgs.Parameters[0];
}

With access to the PublishingJobEndEventArgs EventData property you can check the status of the publish job, which is an enumeration, and then handle failure cases however you want to:


public void FailedPublishAlert(object sender, EventArgs args)
{
    var publishArgs = ExtractArgs(args);

    if (publishArgs == null)
        return;

    if (publishArgs.EventData.Status == PublishJobStatus.Failed)
    {
        HandleFailure(publishArgs);
    }
}

protected virtual void HandleFailure(PublishingJobEndEventArgs publishArgs)
{
    _logger.Info($"handling failed publish: {publishArgs.EventData.JobId}");
}

There are plenty of properties on the PublishingJobEndEventArgs EventData property that should allow you to do something meaningful in the event of a failure:

  • CompareRevisions
  • IncludeDescendants
  • ItemId
  • JobId
  • LanguageNames
  • MetaData
  • PublishDate
  • PublishType
  • RepublishAll
  • SourceDatabaseName
  • Status
  • Targets
  • Username
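As a rough illustration of doing "something meaningful", the sketch below builds a one-line alert message from a handful of those values. FailedJobInfo and FailedPublishMessage are hypothetical names, and the property types are assumptions: the idea is that you would copy the values you care about out of EventData into a simple object like this, since the real types may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for values copied out of PublishingJobEndEventArgs.EventData.
// Property names mirror the list above; the types are assumptions.
public sealed class FailedJobInfo
{
    public string JobId { get; set; }
    public string Username { get; set; }
    public string SourceDatabaseName { get; set; }
    public string ItemId { get; set; }
    public IReadOnlyList<string> Targets { get; set; } = new List<string>();
    public IReadOnlyList<string> LanguageNames { get; set; } = new List<string>();
}

public static class FailedPublishMessage
{
    // Builds a one-line summary suitable for a log entry, email subject or chat alert.
    public static string Build(FailedJobInfo job)
    {
        if (job == null) throw new ArgumentNullException(nameof(job));

        return $"Publish job {job.JobId} failed " +
               $"(user: {job.Username}, source: {job.SourceDatabaseName}, item: {job.ItemId}, " +
               $"targets: {string.Join("|", job.Targets)}, languages: {string.Join("|", job.LanguageNames)})";
    }
}
```

The resulting string could be written to the publishing log from HandleFailure or passed to whatever alerting channel your team already uses.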

To wrap up, the publishing service certainly outperforms traditional publishing and is pretty much a must for solutions with large scale publishing requirements, however it is not without its problems (some would say that’s an understatement) and publish jobs can fail for many reasons. Fortunately the events and associated data are available so we can at least handle failure.

Happy publishing!

 


Sitecore 9.1 Initial Release & SIF 2.0 Stuck on Prerequisites

After recently installing a scaled Sitecore 9.1 Initial Release topology on a single Windows Server 2016 virtual machine using SIF 2.0 I ran into some issues with the installation of prerequisites.

Some prerequisites cannot be installed by SIF, namely:

  • Solr
  • Microsoft Machine Learning Server
  • SQL Server
  • MongoDB

But it’s great that SIF will help with things like installing the Web Platform Installer etc (the full list is in the documentation). To install the prerequisites we simply need to execute the PowerShell command:

Install-SitecoreConfiguration configs\prerequisites.json

You may need to change the path to your prerequisites.json file but it’s included in the packages that come with the Sitecore platform download (this post assumes you’ve already installed the SitecoreInstallFramework PowerShell module). Where things were getting stuck was on the task EnableWindowsOptionalFeature. After running a couple of times it consistently got stuck on this step.

After running the installation again with verbose output:

Install-SitecoreConfiguration configs\prerequisites.json -Verbose

The following line was being output right before the installation got stuck:

VERBOSE: Target Image Version 10.0.14393.2602

This refers to the version of IIS installed on the server and in my case the IIS version was behind this (10.0.0.14393.0). I decided to check for Windows updates and found that there was a cumulative update pending. Installing this update and restarting the server resolved the issue.

After this however I got another strange error message after the installation downloads the Web Platform Installer (the task InstallWebPlatformInstaller):

Install-SitecoreConfiguration : This command cannot be run completely because the system cannot find all the information required.
At S:\sitecore\install.ps1:5 char:1
+ Install-SitecoreConfiguration configs\prerequisites.json -Verbose
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Write-Error], WriteErrorException
+ FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,Install-SitecoreConfiguration

What was actually happening here is that the installation was launching the Web Platform Installer MSI in a separate window. After completing the Web Platform Installer installation in the separate window and running the prerequisites installation again via PowerShell, the prerequisites installation was able to progress all the way to the end successfully.

“Sitecore prerequisites are now installed, you may proceed with further installations.”

Happy SIFing!

 

Sitecore – After parsing a value an unexpected character was encountered t.Path.

In Sitecore 8.x (and possibly 9.x, though I can’t confirm) you may get a red bar at the top of the page when trying to save in Experience Editor, with an error message that looks something like this:

An error occurred. [Log message: After parsing a value an unexpected character was encountered: t. Path ‘scFieldValues.fld_504A9E8795B1458CA24E1117F3AED60A_FBAB79F728054AB9B86996CFC31F061D_en_2_f13e115eea254229a2d5439b325848e3_5220’, line 1, position 3850.]

This likely means that one of the fields on the page or a component data source contains encoded HTML that Sitecore doesn’t like. In my case it was %22 (or double quotes). To find the item that contains the encoded HTML, take the GUID immediately after “fld_” in the error message and convert it to a “full” GUID. In this case:

504A9E8795B1458CA24E1117F3AED60A

Becomes

{504A9E87-95B1-458C-A24E-1117F3AED60A}

Pop it into the search bar in the bottom right of the taskbar in Sitecore’s desktop interface and the item will appear. Then locate the offending encoded HTML and fix as necessary.
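If you would rather not regroup the digits by hand, the conversion can be sketched in a few lines of C#. FieldKeyParser is a hypothetical helper name; the only framework calls used are Regex and Guid.ParseExact, where "N" is the 32-digit format without hyphens and "B" is the hyphenated format in braces.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldKeyParser
{
    // Matches "fld_" followed by a 32-character hex GUID (the first ID in the key).
    private static readonly Regex FirstGuid =
        new Regex("fld_([0-9A-Fa-f]{32})", RegexOptions.Compiled);

    // Returns the first GUID after "fld_" in braces-and-hyphens form, or null if none is found.
    public static string ExtractItemId(string logMessage)
    {
        if (string.IsNullOrEmpty(logMessage)) return null;

        var match = FirstGuid.Match(logMessage);
        if (!match.Success) return null;

        // Parse the compact form, then format with hyphens and braces in upper case.
        return Guid.ParseExact(match.Groups[1].Value, "N").ToString("B").ToUpperInvariant();
    }
}
```

Passing the error message from this post to ExtractItemId returns {504A9E87-95B1-458C-A24E-1117F3AED60A}.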

Sitecore Content Tagging & Azure Text Analytics

To implicitly profile / understand site users, the more we know about our content the better. And by the more we know, what I really mean is taxonomy and meta data associated with our content that we can analyse to draw educated conclusions about the nature of the content our site users are interested in and the tasks they are trying to accomplish.

In the best case scenario, all of our site content would be nicely tagged with relevant and well designed meta data, controlled and uncontrolled vocabulary and organised within a carefully designed and effective information architecture. But as we know that is often not the case, it’s simply too much of a hands on and ongoing effort for most teams and in a lot of cases content is also produced by third party agencies or freelancers who may not be as dialled in to the nuances of the data model and entities that are important to the organisation.

As developers, this is an opportunity to put our thinking caps on to see where technology can help.

All three major cloud providers (Google, Amazon, Microsoft) provide machine learning based content tagging services and they all look pretty similar. The premise is this, give us your unstructured content, we’ll analyse it and send you back structured data in the form of sentiment analysis, recognised entities (people, places, organisations, objects), recognised topics, key phrases etc. The focus of this post is on Microsoft’s Text Analytics service, which is part of Cognitive Services.

The fastest way to get started and see what Text Analytics can do for you is to visit the home page and provide the demo with some of your content:

[Image: azure-text-analytics]

But I’m going to jump straight into how we can integrate this service with Sitecore. The good news is that Microsoft offers a free tier with Text Analytics, so assuming you have an Azure account (if not sign up for a free trial) go ahead and follow the documentation to create a Text Analytics resource in your Azure subscription, which will give you a Text Analytics service endpoint for your region and the API keys you need to work with the service.

I’m going to integrate Text Analytics by subscribing to the publish begin event, so as editorial teams make changes to and publish new content, it’s automatically tagged before it makes it to the content delivery roles. The handler extracts the item being published and then calls an Item extension method, which takes care of the rest:


public static void AnalyseTextAzure(this Item item)
{
    var api = new AzureApi();

    var keyPhraseAnalysis = api.SendKeyPhraseAnalysis(item);

    if (keyPhraseAnalysis != null)
    {
        var keyPhrases = new List<string>();

        foreach (var document in keyPhraseAnalysis.Documents)
        {
            keyPhrases.AddRange(document.KeyPhrases);
        }

        item.Editing.BeginEdit();
        item["Azure Key Phrases"] = string.Join(",", keyPhrases);
        item.Editing.EndEdit();
    }

    var linkedEntitiesAnalysis = api.SendLinkedEntitiesAnalysis(item);

    if (linkedEntitiesAnalysis != null)
    {
        var linkedEntities = new List<string>();

        foreach (var document in linkedEntitiesAnalysis.Documents)
        {
            linkedEntities.AddRange(document.Entities.Select(x => x.Name));
        }

        item.Editing.BeginEdit();
        item["Azure Linked Entities"] = string.Join(",", linkedEntities);
        item.Editing.EndEdit();
    }
}

To explain: Azure Text Analytics has separate APIs for getting linked entities and key phrases for a piece of text. So for the item being published we call both of these APIs and store the responses in separate fields on the item.

A couple of things to note:

  1. Before sending text to Text Analytics we must convert it to plain text, so rich text content needs converting first. Handily, Sitecore has a utility method to do just this: TextUtil.StripHtml
  2. We can send up to 1000 documents at a time for analysis. You may choose to bundle fields on the item and select fields on component data sources into a single document, or send fields on the item and fields on the data sources as separate documents; it depends on how you structure your content. In either case 1000 documents should be way more than enough
  3. Each document you send to Text Analytics counts as a transaction, and the free tier is based on up to 5000 transactions every 30 days. So think about this when choosing how to bundle your content
  4. The maximum length of an individual document is 5000 characters
  5. Text Analytics works better with larger volumes of text, so take this into consideration when thinking about how you want to package up your content into documents
  6. Text Analytics currently works better with English, so expect better results with English. You can send non-English content for analysis but you will need to map Sitecore’s culture codes to the codes expected by Azure
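For the last point, one possible starting place is to reduce a culture name such as en-GB to its two-letter ISO language code. The sketch below is an assumption about how you might do that with CultureInfo; AzureLanguageMapper is a hypothetical name, and some languages may need codes that are not plain two-letter codes, so check the result against the list of languages the service supports.

```csharp
using System;
using System.Globalization;

public static class AzureLanguageMapper
{
    // Maps a culture name such as "en-GB" to its two-letter ISO 639-1 code ("en").
    // Falls back to the supplied default when the culture name is empty or not recognised.
    public static string ToAzureLanguage(string cultureName, string fallback = "en")
    {
        if (string.IsNullOrWhiteSpace(cultureName)) return fallback;

        try
        {
            return CultureInfo.GetCultureInfo(cultureName).TwoLetterISOLanguageName;
        }
        catch (CultureNotFoundException)
        {
            // Whether unknown names throw depends on the runtime and platform.
            return fallback;
        }
    }
}
```

In the BuildDocument method shown later, the hard coded "en" could then be replaced with something like AzureLanguageMapper.ToAzureLanguage(item.Language.Name).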

To take a closer look at one of these APIs:


public KeyPhraseResponse SendKeyPhraseAnalysis(Item item)
{
    var request = BuildRequest("keyPhrases");

    var textAnalysis = new TextAnalysisRequest
    {
        Documents = BuildDocuments(item)
    };

    request.AddJsonBody(textAnalysis);

    return Execute<KeyPhraseResponse>(request);
}

The first thing it does is build a web request using the RestSharp library (available via NuGet), pulling things like the Text Analytics base URL, API keys and key phrase analysis endpoint from include file settings. It then builds an object (TextAnalysisRequest) that can be serialized to JSON in the format that Azure Text Analytics expects. This object contains the content we want to analyse, which is a combination of item level field values and selected field values on select renderings added to the final layout of the item (components). In this example, each component is sent as a separate document:


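protected virtual List<TextAnalysisDocument> BuildDocuments(Item item)
{
    // the page item itself is always the first document
    var documents = new List<TextAnalysisDocument>
    {
        BuildDocument(item)
    };

    // each data source of the selected renderings is sent as a separate document
    documents.AddRange(item.ExtractRenderings(new List<ID> { ID.Parse("{A2D485F5-F114-402E-BA36-25216401CD6A}"), ID.Parse("{41104F41-7136-4A18-B673-6D755566FA73}") }).Select(BuildDocument));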

    return documents;
}

protected virtual TextAnalysisDocument BuildDocument(Item item)
{
    return new TextAnalysisDocument
    {
        Id = item.ID.ToString(),
        Language = "en",
        // template ID -> IDs of the fields on that template to send for analysis
        Text = ItemToText(item, new Dictionary<ID, List<ID>>
        {
            { ID.Parse("{F9901CAC-A69D-47B2-85BA-C210C357C252}"), new List<ID> { ID.Parse("{AD3BB97D-5816-4A06-A0F5-EF899C6C6766}"), ID.Parse("{17B05F6B-2E3D-4F3D-B224-C8E7BFC7656F}") } },
            { ID.Parse("{3B02F9D9-8E2C-4136-BEE8-57F6295E956D}"), new List<ID> { ID.Parse("{FBAB79F7-2805-4AB9-B869-96CFC31F061D}") } },
            { ID.Parse("{E5B6503F-C4D5-4458-8066-17C11115C871}"), new List<ID> { ID.Parse("{93A5C271-9687-463E-95B2-FF5F2485FC5C}") } }
        })
    };
}   

protected virtual string ItemToText(Item item, IDictionary<ID, List<ID>> templateToFieldMappings)
{
    var sb = new StringBuilder();

    if (templateToFieldMappings.ContainsKey(item.TemplateID))
    {
        foreach (var fieldId in templateToFieldMappings[item.TemplateID])
        {
            sb.Append(item[fieldId] + " ");
        }
    }

    var rawText = sb.ToString();

    if (string.IsNullOrWhiteSpace(rawText))
        return string.Empty;

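    // azure text analysis currently only allows up to 5000 characters per analysed document,
    // so strip the HTML first and then truncate the remaining plain text
    var plainText = TextUtil.StripHtml(rawText);

    return plainText.Length <= 5000 ? plainText : plainText.Substring(0, 5000);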
}

There's a bunch of hard coded GUIDs in here (NEVER DO THIS!!!), which is partly why this code's not on GitHub yet, but I will sort that out soon and make the whole thing available to clone/download. It's Helix architecture based so it should be OK to include it in any solution. The concept is fairly straightforward: given the page item being published and the data source items associated with select renderings on the page item (using the ExtractRenderings extension method), get the fields that map to the item's template (the fields we want to send for analysis), concatenate them together, strip out the HTML and return the text up to 5000 characters.

So now every page that gets published is being sent off to Cognitive Services and its machine learning based algorithms to analyse the content and automatically tag items. This data can then be used to drive personalisation, automatic profiling (see my Profile Mapping module) or in any way you see fit for your needs.

I haven’t spent a whole bunch of time looking at it but Sitecore 9.1 comes with an out of the box integration with Open Calais, which includes content tagging abilities. As soon as I get some time I will look deeper into that and report back with findings.

Adventures in Sitecore Partial Html Cache Clearing

Sitecore offers us the HtmlCacheClearer and the HtmlCacheClearAgent. The former will clear the entire HTML cache for sites defined within the handler’s sites property at the end of a publish. The latter is disabled by default in the scheduler but, if enabled, will clear the entire HTML cache for every site hosted in the solution based on the interval you specify.

Neither of these is particularly desirable. Clearing the entire HTML cache at the end of every publish can massively reduce the performance boost you can gain from using HTML cache if publishing is a regular activity. It also clears caches for every site configured, regardless of whether there were changes to the site’s content or not.

Clearing the entire HTML cache for every site on a schedule has some advantages in that we can hold onto the cache for longer (based on the interval you set, which can be loosely tied to acceptable delays in seeing content changes go live) but again it’s indiscriminate in that it clears the cache whether the content has changed or not. Scheduled clearance can also create a “hill effect” in CPU usage on your content delivery servers.

[Image: CPU-Hills]

The spikes you can see above are a direct consequence of setting the HTML cache to clear every 30 minutes for a site that makes heavy use of HTML cache. This can be compounded if you have multiple content delivery servers all clearing at the same time and it’s extremely difficult (perhaps impossible) to synchronise the interval based scheduler across delivery servers to try and offset them so this solution also does not scale very well at all.

And what about deployments? If we go down to a single delivery server while upgrading the other, the spikes after cache clearance can get pretty uncomfortable.

Well the good news is we can get rid of this problem by extending Sitecore so that it only clears the HTML cache for the sections of the site we specify, within the sites we specify and when we specify.

First I’ll point out how to access the HTML cache for a given site:

// skipping null checks for brevity
var cache = Factory.GetSiteInfo("siteName").HtmlCache;

HtmlCache inherits from Sitecore.Caching.CustomCache and if you’ve ever created your own cache before by inheriting from CustomCache you will know that you need to generate a cache key to add an entry.

Sitecore generates this cache key for every rendering in the mvc.renderRendering pipeline and GenerateCacheKey processor and the cache key is a concatenation of (depending on the caching settings on the rendering item):

  • Controller name
  • MVC area
  • Data source path
  • Language
  • Querystring
  • Rendering parameters
  • Device

And so on. If you want to see all of the keys that are currently in the HTML cache to understand the format you can call:


var keys = cache.InnerCache.GetCacheKeys();

If you want to customize the way cache keys are generated or add additional context to the cache keys you can replace Sitecore’s GenerateCacheKey processor with your own. For example if you are working with dynamic pages (or virtual urls) and want to be able to cache the renderings on these pages even though they don’t have a traditional data source to vary by, you could do something like:


public class GenerateDynamicCacheKey : GenerateCacheKey
{
    protected override string GenerateKey(Rendering rendering, RenderRenderingArgs args)
    {
        var key = base.GenerateKey(rendering, args);

        // get a dynamic identifier i.e. product id

        return AppendProductId(key);
    }
}

Now we can set renderings to cacheable on dynamic pages which vary by the product being displayed.

This is where things get interesting. As well as being able to clear the entire HTML cache for a site by calling cache.Clear(), CustomCache (which HtmlCache inherits from) exposes two additional methods:

  1. public virtual void RemoveKeysContaining(string value);
  2. public virtual void RemovePrefix(string prefix);

The names of these methods make it fairly self-explanatory what they do. Thinking back to the example of appending the product id to the end of the cache key, we’re now in a position to add a handler to the publish:end:remote event that extracts the item just published, checks whether it’s a product and, if so, removes the HTML cache entries with keys containing the product id:

protected void OnPublishEnd(object sender, EventArgs args)
{
    // null checks skipped for brevity
    var item = (Event.ExtractParameter(args, 0) as Publisher).Options.RootItem;

    if (item.IsProduct()) // extension method
    {
        Factory.GetSiteInfo("siteName").HtmlCache.RemoveKeysContaining(item["Product Id"]);
    }
}

So now we can hold on to the precious HTML cache for all of our product pages and only clear the cache for individual products when they’re updated.

Conversely we may still want to clear all non-product related caches on a schedule. First let’s take a look at what RemoveKeysContaining actually does: it calls InnerCache.RemoveKeysContaining(value). InnerCache is a property typed as ICache, whose implementation turns out to be Sitecore.Caching.Cache, which in turn inherits from a generic Cache class. The RemoveKeysContaining method in Sitecore.Caching.Cache looks like this:


public void RemoveKeysContaining(string keyPart)
{
    base.Remove((string key) => key.IndexOf(keyPart, StringComparison.InvariantCultureIgnoreCase) > -1);
}

So essentially RemoveKeysContaining is really a helper method on the Cache class, which internally creates a predicate that gets passed into the Remove method of the generic cache class it inherits from.
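One consequence of a case-insensitive "contains" match is worth seeing in isolation. The sketch below reuses the same IndexOf comparison on a plain list of keys; CacheKeyMatching and the sample keys are made up for illustration and no Sitecore types are involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CacheKeyMatching
{
    // Same comparison as the predicate shown above: case-insensitive "contains".
    public static bool ContainsKeyPart(string key, string keyPart)
    {
        return key.IndexOf(keyPart, StringComparison.InvariantCultureIgnoreCase) > -1;
    }

    // Returns the keys that would be removed for a given key part.
    public static List<string> KeysThatWouldBeRemoved(IEnumerable<string> keys, string keyPart)
    {
        return keys.Where(k => ContainsKeyPart(k, keyPart)).ToList();
    }
}
```

With the made-up keys "productdetail_123", "productdetail_1234" and "header", asking for the key part "123" matches the first two, so a short product id can evict more than the one product you intended. Ids that cannot be substrings of each other, or a delimiter around the id, avoid that.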

Ok, great, so this means we can create our own predicate. How about this: when we generate our own cache keys as described earlier, we add an identifier to the end of all product related cache keys, $"{productId}.product".

This would enable us to do something like the following in our scheduled, non-product related cache clearer:

// agent registered in the scheduling section of the Sitecore config
// siteName(s) to clear could be defined in the agent config
public void Run()
{
    var cache = Factory.GetSiteInfo("siteName").HtmlCache;

    // remove every entry whose key does not carry the ".product" suffix
    cache.InnerCache.Remove((string key) => !key.EndsWith(".product"));
}

That simple. So now we have individual product page caches clearing only when the product is modified and non-product related caches clearing based on an interval.
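To make the split between the two clearing strategies concrete, here is a small self-contained simulation. FakeHtmlCache is a hypothetical in-memory stand-in for the real HTML cache, and the underscore separator in ProductCacheKeys.Append is my own assumption; only the ".product" suffix comes from the approach described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the HTML cache, used only to illustrate the predicates.
public sealed class FakeHtmlCache
{
    private readonly Dictionary<string, string> _entries = new Dictionary<string, string>();

    public void Add(string key, string html) { _entries[key] = html; }

    public IReadOnlyCollection<string> Keys => _entries.Keys.ToList();

    // Removes every entry whose key satisfies the predicate.
    public void Remove(Predicate<string> predicate)
    {
        // Copy the matching keys first so the dictionary is not modified while enumerating.
        foreach (var key in _entries.Keys.Where(k => predicate(k)).ToList())
        {
            _entries.Remove(key);
        }
    }
}

public static class ProductCacheKeys
{
    public const string Suffix = ".product";

    // Appends "{productId}.product" to a key generated elsewhere (separator is an assumption).
    public static string Append(string baseKey, string productId) => $"{baseKey}_{productId}{Suffix}";

    // Ordinal comparison avoids culture-specific surprises when matching the suffix.
    public static bool IsProductKey(string key) => key.EndsWith(Suffix, StringComparison.Ordinal);
}
```

Calling Remove(key => !ProductCacheKeys.IsProductKey(key)) on a cache holding one product key and one header key leaves only the product key behind, which mirrors what the scheduled agent does.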

Hopefully it’s evident now that using a combination of:

  1. Customised HTML cache keys
  2. Custom predicates for removing specific entries based on their key

You can really begin to fine tune your HTML cache clearing strategy to clear entries only when it’s absolutely necessary.

Happy optimization!

Sitecore Publishing Service Field Casing

There is a bug, confirmed by Sitecore support for version 2.2.1 (revision 180807) of the publishing service, where a change to a field is not promoted to publishing targets if the only difference is the casing of the field value. This is a bit of an edge case but it can manifest itself if, for example, you are storing class/assembly names and newing up objects using Sitecore’s ReflectionUtil:

var instance = ReflectionUtil.CreateObject(
                item["Assembly"],
                item["Class"],
                new object[] { }) as IMyInterface;

Now let’s say someone went against class naming conventions and created a class called JSONParser, and then down the line that class got renamed to JsonParser. The publishing service would not promote that change to your publishing targets, and object instantiation on the delivery servers would begin to fail.

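The issue is due to the collation of the database (Latin1_General_CI_AS). The CI part means comparisons are case-insensitive, so a value that differs only in casing is presumably treated as unchanged.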
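The effect of a case-insensitive comparison is easy to reproduce outside the database. The sketch below is only an analogy written in C#, not the publishing service's own code, and CasingOnlyChange is a hypothetical name.

```csharp
using System;

public static class CasingOnlyChange
{
    // Case-insensitive comparison, analogous to a CI collation: casing-only edits look unchanged.
    public static bool LooksUnchanged(string oldValue, string newValue)
    {
        return string.Equals(oldValue, newValue, StringComparison.OrdinalIgnoreCase);
    }

    // Case-sensitive comparison: casing-only edits are detected as changes.
    public static bool IsChanged(string oldValue, string newValue)
    {
        return !string.Equals(oldValue, newValue, StringComparison.Ordinal);
    }
}
```

For the values JSONParser and JsonParser, LooksUnchanged returns true while IsChanged also returns true, which is the mismatch that lets a casing-only edit slip through.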

Sitecore support did supply a working patch that is pretty straightforward to apply; you only need to modify the publishing service itself (1 dll, 1 xml config). Support ticket number 290996.

5 Ways to Learn Sitecore Development

Even though Sitecore Symposium was 3 weeks ago I’m still processing the experience and the takeaways I got from the event; every year is different in this regard. One of the things that struck me this year was the broad range of experience within the development community. Many of us have been doing this for over 10 years, some of us 15+, but as the platform continues to grow and thrive there’s a whole wave of developers relatively new to the platform coming through. That inspired this back to basics guide on what I find to be the best ways to grow your Sitecore development skills. If you’re getting started or looking to upgrade those skills, I hope this helps!

1. Learn from Sitecore

So basically Sitecore is built on Sitecore, which means chances are the customisation/extension that you are trying to build already has solid examples in terms of technical approach and best practices that Sitecore are using themselves. This is true of:

  • Pipelines
  • Event handlers
  • Scheduled tasks
  • DI patterns
  • Ribbon buttons
  • Workflow actions
  • Patch config files
  • Rules engine conditions and actions

This is certainly not an exhaustive list and of course this means that a decompiler is your friend. Planning on extending a pipeline with a custom processor? Take a look at an existing processor in the pipeline, what is it doing? What is it NOT doing? What kind of properties are available on the pipeline args?

2. Patch All The Things

Or to put it another way, know what you changed and have an easy way to turn it off if you need to. It can’t be stressed enough that any modifications you make to Sitecore should be applied to Sitecore using patch files. This has huge benefits in terms of troubleshooting, change control, environment specific transforms, transparency, upgrades and so on. There are plenty of resources out there that break down Sitecore’s patching syntax. Why is this about learning Sitecore development? Because starting off with best practices is a lot easier than “un-learning” bad practices, and modifying Sitecore’s configuration files directly is up there with the worst of bad practices.

3. Master Sitecore YouTube Channel

We all learn in different ways, some people are visual, some prefer to read and of course some prefer to digest video content. There are so many great ways to consume Sitecore training materials in video format including:

  • The Sitecore Virtual User Group Conference
  • Live streams of user groups from all over the globe
  • Sitecore’s official YouTube channel

I single out Master Sitecore because it seems increasingly that video content from other sources is being aggregated on this channel, for example live recordings from this year’s SUGCON in Berlin. I highly recommend you check out some of the material on here as it’s contributed from a wide range of sources in the community.

4. Sitecore Documentation

Ok, this might seem like an obvious thing to mention BUT Sitecore have come on leaps and bounds in the last few years in the quality of the documentation they provide, and we all acknowledge the awesome work being done by Martina Welander to make documentation a first class citizen at Sitecore. What’s interesting is that as well as the general documentation that’s available on doc.sitecore.net there are “documentation portals” for specialised subjects.

The information provided in those is definitely well worth a read. We have also recently had the promise from Sitecore that the move to two releases a year means that every release will be a “complete” release, which includes full documentation.

One other thing to pay some attention to from a documentation perspective is the release notes on dev.sitecore.net for the core CMS and optional modules. These, in combination with the known issues sections, can provide some much needed insight into what you can expect from the platform.

5. The Sitecore Community!

Last but by no means least, if you haven’t noticed already Sitecore has an incredible network of community contributors all over the globe on various channels:

  • Individual blogs
  • Sitecore Stack Exchange
  • Twitter (#Sitecore, #SitecoreUG, #SitecoreSym, #SitecoreMVP…)
  • Open source projects
  • Slack
  • User groups
  • Conferences

The insights being shared by the community are invaluable, get involved! One way I found to get involved in the community initially was to think of parts of Sitecore that I wanted to learn more about and build a small module that does something useful and uses those parts of the platform… and then.. well, share it. It’s a good confidence booster, which can also launch you into things like speaking at user groups. The community is always looking to hear from new speakers (as well as the familiar faces).

Happy Sitecore… ing.