Sitecore and Azure Durable Functions

In this post I will show how Azure Durable Functions can complement your Sitecore solution and help improve its performance.

Problem

We took over a Sitecore solution whose content management server was running very slowly, and intermittently the Sitecore client would become unresponsive and crash.

The problem was caused by a large number of CPU-, data- and bandwidth-intensive scheduled tasks that retrieved a wide range of data from a number of web services, aggregated the data and performed complicated calculations, of which only a small subset of the results was presented on the website.

Solution

As the solution was already hosted in Azure, the obvious solution was to offload the heavy lifting from the content management server to Azure Functions, which would do the data retrieval and calculations and provide the results for the website. First, a very brief overview of the pros and cons of Azure Functions.

Pros

  • Serverless execution model
  • Dynamic Scaling
  • Micro pricing
  • Security
  • Wide range of triggers
    • HTTP(S), timer (CRON), Azure Storage changes, Azure Queue, Service Bus messages, etc.

Cons

  • Stateless
  • Execution time limit (on the Consumption plan: default 5 minutes, maximum 10)
  • Concurrency

The main challenge with Azure Functions was that most of the scheduled tasks could take more than 10 minutes to complete and required state management. But not to worry, as Azure Durable Functions came to the rescue.

Azure Durable Functions

Durable Functions are an extension of Azure Functions and Azure WebJobs that lets you write stateful functions in a serverless environment. The extension manages state, checkpoints and restarts for you, so it is possible to implement code that runs for a long time.

In addition, if a function called by the orchestration fails, for example because a web request times out, you can define whether the durable function should wait and retry X times before failing. Behind the scenes, the Durable Functions extension is built on top of the Durable Task Framework, an open-source library on GitHub for building durable task orchestrations.

Advantages of Durable Functions

  • They define workflows in code. No JSON schemas or designers are needed.
  • They can call other functions either synchronously or asynchronously.
  • Output from called functions can be saved to local variables.
  • They automatically checkpoint their progress whenever the function awaits.
  • Local state is never lost, even if the process recycles or the VM reboots.
  • Easy to Unit Test
  • Can run for a very long time, in theory forever
  • Cost effective, as you do not pay for execution time whilst waiting for tasks to complete.

Here is a brief introduction to the most common Durable Functions patterns

Pattern 1 – Function chaining

Function chaining refers to the pattern of executing a sequence of functions in a particular order. Often the output of one function needs to be applied to the input of another function.

[Figure: function chaining]

The code below is an example of how you would achieve this

[Figure: function chaining code]
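A minimal sketch of the chaining pattern, assuming the in-process C# model of Durable Functions 2.x (the Microsoft.Azure.WebJobs.Extensions.DurableTask package). The function names FetchData, ParseData and Calculate and their trivial bodies are hypothetical placeholders for the real data retrieval and calculations; CallActivityWithRetryAsync shows the wait-and-retry behaviour mentioned earlier.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class ChainingExample
{
    [FunctionName("ChainingOrchestrator")]
    public static async Task<decimal> RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // Up to 3 attempts, waiting 30 seconds before the first retry.
        var retry = new RetryOptions(TimeSpan.FromSeconds(30), maxNumberOfAttempts: 3);

        // Each await is a checkpoint; the output of one step is the input of the next.
        string raw = await context.CallActivityWithRetryAsync<string>("FetchData", retry, null);
        decimal[] values = await context.CallActivityAsync<decimal[]>("ParseData", raw);
        return await context.CallActivityAsync<decimal>("Calculate", values);
    }

    // Hypothetical activity: the real solution would call a third-party web service here.
    [FunctionName("FetchData")]
    public static string FetchData([ActivityTrigger] IDurableActivityContext activityContext) => "1.5;2.5;4";

    [FunctionName("ParseData")]
    public static decimal[] ParseData([ActivityTrigger] string raw) =>
        raw.Split(';').Select(s => decimal.Parse(s, CultureInfo.InvariantCulture)).ToArray();

    [FunctionName("Calculate")]
    public static decimal Calculate([ActivityTrigger] decimal[] values) => values.Sum();
}
```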

Pattern 2 – Fan-out/fan-in

Fan-out/fan-in refers to the pattern of executing multiple functions in parallel, and then waiting for all to finish. Often some aggregation work is done on results returned from the functions. This is perfect when you want to do a lot of things in parallel, to reduce the time taken to complete the task and then aggregate/process all the results.

Below is an example of how the code could look
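A sketch of fan-out/fan-in under the same assumptions (Durable Functions 2.x, in-process C#). GetSources and FetchAndCalculate are hypothetical activities with placeholder bodies; awaiting Task.WhenAll over the activity tasks is what turns the parallel calls back into a single aggregated result.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class FanOutFanInExample
{
    [FunctionName("FanOutFanInOrchestrator")]
    public static async Task<decimal> RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        string[] sources = await context.CallActivityAsync<string[]>("GetSources", null);

        // Fan out: schedule one activity per source; they run in parallel.
        var tasks = new List<Task<decimal>>();
        foreach (string source in sources)
        {
            tasks.Add(context.CallActivityAsync<decimal>("FetchAndCalculate", source));
        }

        // Fan in: wait until every activity has finished, then aggregate the results.
        decimal[] results = await Task.WhenAll(tasks);
        return results.Sum();
    }

    // Hypothetical list of web services to query.
    [FunctionName("GetSources")]
    public static string[] GetSources([ActivityTrigger] IDurableActivityContext activityContext) =>
        new[] { "prices", "stock", "ratings" };

    // Hypothetical: fetch one source and return its partial result (placeholder calculation).
    [FunctionName("FetchAndCalculate")]
    public static decimal FetchAndCalculate([ActivityTrigger] string source) => source.Length;
}
```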

Pattern 3 – Monitoring

The monitor pattern refers to a flexible recurring process in a workflow – for example, polling until certain conditions are met. A regular timer-trigger can address a simple scenario, such as a periodic clean-up job, but its interval is static and managing instance lifetimes becomes complex. Durable Functions enables flexible recurrence intervals, task lifetime management, and the ability to create multiple monitor processes from a single orchestration.

For example, instead of exposing an endpoint for an external client to monitor a long-running operation, the long-running monitor consumes an external endpoint, waiting for some state change. See the example below.
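A sketch of a monitor that polls an external system every 5 minutes for up to 6 hours, again assuming Durable Functions 2.x. IsJobComplete and PublishResults are hypothetical activities and the intervals are arbitrary. It uses context.CurrentUtcDateTime and a durable timer rather than DateTime.UtcNow or Task.Delay, for the reasons listed under the orchestrator code constraints.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class MonitorExample
{
    [FunctionName("MonitorOrchestrator")]
    public static async Task<bool> RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        string jobId = context.GetInput<string>();

        // The orchestration's own clock, so every replay sees the same time.
        DateTime expiry = context.CurrentUtcDateTime.AddHours(6);

        while (context.CurrentUtcDateTime < expiry)
        {
            bool done = await context.CallActivityAsync<bool>("IsJobComplete", jobId);
            if (done)
            {
                await context.CallActivityAsync("PublishResults", jobId);
                return true;
            }

            // Durable timer: the orchestrator is unloaded until the timer fires.
            DateTime nextCheck = context.CurrentUtcDateTime.AddMinutes(5);
            await context.CreateTimer(nextCheck, CancellationToken.None);
        }

        return false; // gave up after 6 hours
    }

    // Hypothetical: call the external system's status endpoint here.
    [FunctionName("IsJobComplete")]
    public static bool IsJobComplete([ActivityTrigger] string jobId) => true;

    // Hypothetical: fetch the finished results and store them for the website.
    [FunctionName("PublishResults")]
    public static void PublishResults([ActivityTrigger] string jobId) { }
}
```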

Is Replaying

One thing that catches people out is that the orchestrator function is replayed from the start after each await completes. Therefore, for logging and other side effects, you need to check IsReplaying so they only happen once.
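Two ways to keep logging replay-safe, sketched with the Durable Functions 2.x API (the IsReplaying property and the CreateReplaySafeLogger extension method); the function names are hypothetical.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Extensions.Logging;

public static class ReplaySafeLoggingExample
{
    [FunctionName("ImportOrchestrator")]
    public static async Task RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context,
        ILogger log)
    {
        // Option 1: check IsReplaying explicitly. Without the check this line would be
        // logged again every time the function is replayed after an await completes.
        if (!context.IsReplaying)
        {
            log.LogInformation("Import started");
        }

        await context.CallActivityAsync("ImportData", null);

        // Option 2: wrap the logger; messages written during replay are dropped.
        ILogger safeLog = context.CreateReplaySafeLogger(log);
        safeLog.LogInformation("Import finished");
    }

    // Activities are not replayed, so no IsReplaying check is needed here.
    [FunctionName("ImportData")]
    public static void ImportData([ActivityTrigger] IDurableActivityContext activityContext, ILogger log) =>
        log.LogInformation("Importing data");
}
```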

Durable Functions – Orchestrator code constraints

There are a number of code constraints that must be adhered to when using Durable Functions orchestrations.

  • Code must be deterministic.
    • It will be replayed multiple times and must produce the same result each time.
    • For example, no direct calls to get the current date/time, get random numbers, generate random GUIDs, or call into remote endpoints.
  • Non-deterministic operations must be done in activity functions
    • This includes any interaction with other input or output bindings. This ensures that any non-deterministic values will be generated once on the first execution and saved into the execution history. Subsequent executions will then use the saved value automatically.
  • Orchestrator code should be non-blocking.
    • For example, that means no I/O and no calls to Thread.Sleep or equivalent APIs
    • Orchestrator code must never initiate any async operation, except by using the IDurableOrchestrationContext API.
    • For example, no Task.Run, Task.Delay or HttpClient.SendAsync.
    • The Durable Task Framework executes orchestrator code on a single thread and cannot interact with any other threads that could be scheduled by other async APIs.
  • Infinite loops should be avoided
    • Because the Durable Task Framework saves execution history as the orchestration function progresses, an infinite loop could cause an orchestrator instance to run out of memory.
    • For infinite loop scenarios, use APIs such as ContinueAsNew to restart the function execution and discard previous execution history.
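The constraints above translate into a few concrete substitutions. This sketch assumes Durable Functions 2.x and a hypothetical SyncBatch activity; it uses the context's deterministic clock and GUID generator, a durable timer instead of sleeping, and ContinueAsNew instead of an infinite loop.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class DeterministicOrchestratorExample
{
    [FunctionName("HourlySyncOrchestrator")]
    public static async Task RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // Deterministic replacements for DateTime.UtcNow and Guid.NewGuid():
        // the values are recorded in the history and reused on replay.
        DateTime now = context.CurrentUtcDateTime;
        Guid batchId = context.NewGuid();

        // I/O and other non-deterministic work happens in an activity.
        await context.CallActivityAsync("SyncBatch", batchId);

        // A durable timer instead of Thread.Sleep or Task.Delay.
        await context.CreateTimer(now.AddHours(1), CancellationToken.None);

        // Restart with an empty history instead of looping forever.
        context.ContinueAsNew(null);
    }

    // Hypothetical activity doing the actual data retrieval.
    [FunctionName("SyncBatch")]
    public static void SyncBatch([ActivityTrigger] Guid batchId) { }
}
```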

Result

By migrating all the long-running CPU-, data- and bandwidth-intensive tasks to Azure Durable Functions, the performance of the Sitecore solution went from painful to fantastic.

Unfortunately, it is very common for Sitecore solutions to take responsibility for tasks that are not the website's responsibility, but pairing with Azure Functions can help mitigate this issue.

An additional benefit was that the website was isolated from changes in third-party systems: when an external system changed, only the Azure Functions had to be modified and deployed, so there was no downtime for the Sitecore solution.

Anyway, I hope Sitecore developers will consider Azure Functions to enhance their Sitecore solutions.


Swagger – An item with the same key has already been added. Key: 400

We had a solution that used Azure Functions to provide a REST API, to help reduce the load on the Sitecore content delivery servers.

We used AzureExtensions.Swashbuckle, which makes it so easy to add Swagger documentation to Azure Functions.

Problem

Suddenly the Swagger UI no longer worked and we got the “An item with the same key has already been added. Key: 400” exception.

After going through all the commits one by one, I could not see how the code changes could have introduced this error.

Solution

Then I noticed that there were two ProducesResponseType attributes with the same status code (400).

It was so obvious once I noticed it, but when I look at C# code for changes, I guess I ignore comments and attributes.
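Why would a duplicate attribute produce this particular message? The exception text is the one .NET Core's Dictionary&lt;TKey, TValue&gt;.Add throws for a duplicate key, which suggests the generator collects responses in a dictionary keyed by status code. That is an inference from the message, not something confirmed in the AzureExtensions.Swashbuckle source. This stdlib-only sketch, with a hypothetical BuildResponses helper, reproduces the behaviour:

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateStatusCodeDemo
{
    // Hypothetical stand-in for the response metadata a Swagger generator
    // collects from the [ProducesResponseType] attributes on one function.
    public static Dictionary<int, string> BuildResponses(
        IEnumerable<(int StatusCode, string Description)> attributes)
    {
        var responses = new Dictionary<int, string>();
        foreach (var (statusCode, description) in attributes)
        {
            // Add throws ArgumentException on a duplicate key; on .NET Core the
            // message is "An item with the same key has already been added. Key: 400".
            responses.Add(statusCode, description);
        }
        return responses;
    }
}
```

The fix is correspondingly simple: keep exactly one [ProducesResponseType] per status code on each function.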

Therefore, I hope that by drawing attention to my stupid mistake I can help others who experience this issue, Alan


Hack Attribute to the Rescue

The challenge

At some point in a web project an underlying system, REST API, etc. will not be available, and you must make some hacks and/or fake code to enable development to continue.

But how do you ensure that the hacks and/or fake code never make it into production? Typically TODOs, comments, PBIs etc. are used, but to be honest I have never liked doing that.

This was especially important in this case, as the OAuth authorization flow was not available, so I had to fake the authentication.

Solution

Introducing the [Hack] custom attribute, which can be added to classes, properties and methods.

  • When the code is compiled for Debug, it generates a warning
    • So the code compiles and can be deployed
  • When the code is compiled for Release, it generates an error
    • So the code cannot compile and can’t be deployed to production.

In our setup, local development, continuous integration and our internal development & test server build for Debug. Pre-production and production build for Release.

Therefore I could relax knowing that the hacks could not make it into production.

The Code

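using System;

namespace Foundation.Diagnostics.Infrastructure
{
    // Applying [Hack] anywhere triggers the Obsolete diagnostic on this attribute class:
    // a warning (CS0618) in Debug builds and an error (CS0619) in Release builds.
    // Usage: [Hack("Fake authentication until the OAuth authorization flow is available")]
#if DEBUG
    [Obsolete("Hack code is still present")]
#else
    [Obsolete("Hack code is still present", true)]
#endif
    [AttributeUsage(AttributeTargets.All)]
    public class HackAttribute : Attribute
    {
        public HackAttribute(string message)
        {
            Message = message;
        }

        // Explains what was faked and why.
        public string Message { get; }
    }
}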

It was actually quite simple to achieve by using the Obsolete attribute and preprocessor directives to control whether the Obsolete attribute causes a warning or an error, depending on whether the build is Debug or Release.

Hope this helps, Alan


Name Value List – To the Rescue

Challenge

To provide a content API that, depending on the path and language, would return a different set of key/value pairs. The client wanted the ability to define new keys without changing the template and/or introducing new templates.

Typical Sitecore solution

Introduce a content item that could have any number of “Key Value” child items, each with a key and a value field.

Unfortunately, the customer wanted to have as flat a structure as possible and different languages would have different keys.

Solution – Name Value List Field

I was surprised that I had never noticed that Sitecore has a field type called Name Value List.

The Name Value List field provides a key/value pair interface where you can add pairs dynamically, see below.

[Figure: Name Value List field]

How to use with Synthesis

It is easy, as the field is mapped to an IDictionaryField interface, which provides basic functionality for working with keys/values out of the box.

If you need more advanced features, you can cast it to a DictionaryField, which is the underlying implementation.

How to use with vanilla Sitecore

The values are stored as a URL query string, see image below.

So you can use Sitecore.Web.WebUtil.ParseUrlParameters to convert the raw value to a NameValueCollection and access the key/value pairs.
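Inside Sitecore, WebUtil.ParseUrlParameters is the call to use. To illustrate what happens to the stored value, here is a roughly equivalent stdlib-only sketch that swaps in System.Web.HttpUtility.ParseQueryString, which also URL-decodes the pairs and returns a NameValueCollection. The class name and the sample value are hypothetical.

```csharp
using System.Collections.Specialized;
using System.Web;

public static class NameValueListParser
{
    // Parses a raw Name Value List value such as "Title=Hello%20World&Color=red"
    // into a NameValueCollection; a null or empty value gives an empty collection.
    public static NameValueCollection Parse(string rawValue) =>
        HttpUtility.ParseQueryString(rawValue ?? string.Empty);
}
```

For example, Parse("Title=Hello%20World&Color=red")["Title"] returns "Hello World".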

I was shocked that after working with Sitecore I had missed this field (well, maybe I never needed it), but at any rate I hope this blog post will help, Alan


Untangling the Sitecore Search LINQ to SolR queries

Problem

It can be very difficult to identify why you do not get the search results you expected from Sitecore Search, but there is a simple way to help untangle what is going on.

Solution

It is possible to see the query that Sitecore generates and sends to SolR and then use the query on the SolR instance to see what data is returned to Sitecore.

This is such a huge help when trying to understand why your queries do not work!

Step 1 – Find the Query that was sent to SolR from Sitecore

Sitecore logs all the queries it sends to SolR in the standard Sitecore log folder; look for files named Search.log.xxx.yyy.txt.

Step 2 – Execute the query in your SolR instance

Go to your Solr instance, and use the core selector drop down to select the index your Sitecore Search query is being executed against.

Select Query from the menu.

Then paste the query from the Sitecore log, and you can see the result that is returned to Sitecore.
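If you prefer to replay the query from code rather than the admin UI, the same request can be sent to the core's /select handler. This sketch assumes the logged line contains the query string starting at the first '?'; the log line, base URL and core name in the usage note are examples, not values from this solution.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class SolrQueryReplay
{
    // Builds a URL for the core's /select handler from a query copied out of Search.log.
    // Everything up to and including the first '?' (the log prefix) is dropped.
    public static Uri BuildSelectUri(string solrBaseUrl, string core, string loggedQuery)
    {
        string query = loggedQuery.Trim();
        int questionMark = query.IndexOf('?');
        if (questionMark >= 0)
        {
            query = query.Substring(questionMark + 1);
        }
        return new Uri($"{solrBaseUrl.TrimEnd('/')}/{core}/select?{query}");
    }

    // Sends the query and returns Solr's raw response for inspection.
    public static Task<string> RunAsync(HttpClient http, Uri selectUri) =>
        http.GetStringAsync(selectUri);
}
```

For example, BuildSelectUri("http://localhost:8983/solr", "sitecore_web_index", "INFO  Solr Query - ?q=_language:(en)&rows=5") targets /solr/sitecore_web_index/select with the logged parameters.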

This has helped me a lot, so I hope this helps others untangling their search results using Sitecore Search 🙂


How IQueryable and Take can kill your Sitecore Solution

We had a solution that had severe performance issues when it got a lot of visitors. Sitecore was throwing the following exception, and SolR had similar errors in its logs:

Unable to read data from the transport connection: The connection was closed

We identified that the problem was caused by hitting the network bandwidth in Azure!
Yes, there were a lot of visitors, but not enough to hit the bandwidth limit. The customer upgraded the plan to get more network bandwidth, but the issues continued.

But what could cause this issue?

I started to review the SolR implementation and found the issue quite quickly.

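return searchContext.GetQueryable<Result>()
            .Where(result => result.Date < DateTime.UtcNow)
            .OrderByDescending(result => result.Date)
            .GetResults()   // executes the query: every matching document is sent from SolR
            .Take(count)    // LINQ to Objects: trimmed in memory, after the transfer
            .ToList();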

The Take() was applied after GetResults() was called, so the entire data set was returned to Sitecore from SolR, and only then was Take applied to get the top 5 results.

This simple mistake was what caused all the network and performance issues.

Solution

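return searchContext.GetQueryable<Result>()
            .Where(result => result.Date < DateTime.UtcNow)
            .OrderByDescending(result => result.Date)
            .Take(count)    // part of the query sent to SolR
            .GetResults()   // executes the query: only 'count' documents are returned
            .ToList();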

It was a simple fix (in 150+ places) to move the Take before GetResults!
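The difference is easy to demonstrate without Sitecore. In this stdlib-only sketch a counting iterator stands in for SolR, and ToList() stands in for GetResults(), the point at which the query is executed; the class names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class CountingSource
{
    // Number of items actually pulled from the source (think: documents sent by SolR).
    public int ItemsRead { get; private set; }

    public IEnumerable<int> Items(int size)
    {
        for (int i = 0; i < size; i++)
        {
            ItemsRead++;
            yield return i;
        }
    }
}

public static class TakePlacementDemo
{
    // Materialise first, then Take: mirrors .GetResults().Take(count).
    public static int ReadWhenTakeIsLast(int size, int count)
    {
        var source = new CountingSource();
        _ = source.Items(size).ToList().Take(count).ToList();
        return source.ItemsRead;
    }

    // Take first, then materialise: mirrors .Take(count).GetResults().
    public static int ReadWhenTakeIsFirst(int size, int count)
    {
        var source = new CountingSource();
        _ = source.Items(size).Take(count).ToList();
        return source.ItemsRead;
    }
}
```

With 1,000 matching items and a count of 5, the first version reads all 1,000 items and the second reads only 5.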

This is why I believe that you should always introduce a (SolR) Sitecore Search abstraction instead of returning the IQueryable interface; please read my post on this very subject.
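A minimal sketch of such an abstraction, with hypothetical names (ILatestResultsSearch, LatestResultsSearch). Callers pass the count, so paging is applied inside the query in one place and they never get the chance to materialise everything. To keep the sketch runnable outside Sitecore, it takes a factory returning IQueryable&lt;Result&gt; and materialises with ToList(); in Sitecore the factory would wrap searchContext.GetQueryable&lt;Result&gt;(), the last step would be GetResults(), and the search context would also need disposing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Result
{
    public DateTime Date { get; set; }
    public string Title { get; set; }
}

// Callers ask for what they need; IQueryable never leaks out of the search layer.
public interface ILatestResultsSearch
{
    IReadOnlyList<Result> GetLatest(int count, DateTime before);
}

public class LatestResultsSearch : ILatestResultsSearch
{
    private readonly Func<IQueryable<Result>> _queryableFactory;

    public LatestResultsSearch(Func<IQueryable<Result>> queryableFactory) =>
        _queryableFactory = queryableFactory;

    public IReadOnlyList<Result> GetLatest(int count, DateTime before) =>
        _queryableFactory()
            .Where(r => r.Date < before)
            .OrderByDescending(r => r.Date)
            .Take(count)   // always part of the query, before it is executed
            .ToList();
}
```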

Hope this helps, Alan


Reduce Technical Debt Part 3 – Test driven code and PBI tasks

In this blog post I am going to outline how test-driven design/unit tests and having a code removal task for each Product Backlog Item (PBI) can help reduce technical debt.

If you have not already done so, please read part 1 and part 2 in this series on reducing technical debt, as they set the scene for what this blog post is trying to address.

Test-driven design/unit test

It is a fact that all developers (myself included) tend to ignore comments and not correct/change them when modifying code. Therefore, test-driven code will reduce technical debt!

Unit tests declaratively define what code should do and are especially useful in describing exceptions that would otherwise lead to misunderstandings.

I could go on all day about the virtues of test-driven design! But I am only going to focus on how it can help reduce technical debt by describing exceptions and/or confusing code.

Unit Test for exceptions and or confusing code

We have all come across code where we think WTF! and then spend hours refactoring and or trying to determine why it does what it does.

For example, in France certain types of furniture have different VAT rates: a chaise longue and a sofa have different VAT rates.
Therefore, having a test called EnsureChaiselongAndSofaHasdifferentVatRateInFrance will help explain/document why this complexity and/or strange functionality is in the code.
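A sketch of such a test in plain C#, with hypothetical names and placeholder rates (not the real French VAT rates). The assertion itself is trivial; the value is in the name, which records the business rule next to the code that implements it.

```csharp
using System;
using System.Collections.Generic;

public enum FurnitureType { Sofa, ChaiseLongue }

// Hypothetical VAT lookup. The rates are placeholders for illustration only.
public class VatRateProvider
{
    private readonly Dictionary<(string Country, FurnitureType Type), decimal> _rates =
        new Dictionary<(string Country, FurnitureType Type), decimal>
        {
            [("FR", FurnitureType.Sofa)] = 0.20m,
            [("FR", FurnitureType.ChaiseLongue)] = 0.10m,
        };

    public decimal GetRate(string country, FurnitureType type) => _rates[(country, type)];
}

public class VatRateProviderTests
{
    // The test name documents WHY the code treats these two products differently,
    // so nobody "fixes" the difference. With a test framework this method would
    // carry its test attribute, e.g. xUnit's [Fact].
    public void EnsureChaiselongAndSofaHasdifferentVatRateInFrance()
    {
        var provider = new VatRateProvider();
        decimal sofa = provider.GetRate("FR", FurnitureType.Sofa);
        decimal chaiseLongue = provider.GetRate("FR", FurnitureType.ChaiseLongue);
        if (sofa == chaiseLongue)
            throw new Exception("In France a chaise longue and a sofa must have different VAT rates.");
    }
}
```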

Now, whilst it does not directly reduce the technical debt and/or code size, it helps explain the code and therefore reduces the cost of maintaining it.

Having a test that confirms that the strange code is in fact correct and required, and why it is required, has value and will reduce maintenance costs and future bugs.

How do you ensure that each PBI raises the quality of the code?

Ensure that there is a code removal task for each Product Backlog Item.

This ensures that everyone involved in the project is aware that it takes time to identify and then remove redundant code and it is an essential part of all new development/modifications.

There should always be such a task defined for every PBI, even where it is 100% new functionality and there is definitely no redundant code to be removed.

Therefore, the premise is that the team must prove and establish for every PBI that there is no redundant code.


Reduce Technical Debt Part 2 – Empty Try Catch

Here is the second post in the series on how to reduce technical debt. Please read part one, as it gives an insight into the scale and challenges we faced and outlines what this blog post is trying to address.

As you are aware, the first part introduced a few code examples to help remove redundant code. This post continues to focus on how to remove redundant code by introducing the EmptyTryCatchService class and the IgnoreEmptyTryCatch custom attribute.

But before that, I just briefly want to mention integrations; in my experience this is where a lot of redundant and/or unnecessary code can hide.

Integrations

An important concept in reducing technical debt is to identify, separate and isolate dependencies on external systems, especially complex and/or legacy systems.

I have already written a blog series about this, so if you missed it, please read it.

Integrations Platform

I believe that in an ideal world, most integrations, and especially complex and/or legacy system-specific code, should be moved out of the website solution to an integration platform!

Most of the issues, difficulties, problems and costs relating to code maintenance and technical debt for websites are due to them being responsible for things they should not be.

For example, the website is responsible for aggregating data from several systems to provide a unified view of their data. NO, this is the job of an integration/aggregation platform!

Empty Try Catch

So, let me start by stating – ignoring exceptions is a bad idea, because you are silently swallowing an error condition and then continuing execution.

Occasionally this may be the right thing to do, but often it’s a sign that a developer saw an exception, didn’t know what to do about it, and so used an empty catch to silence the problem.

It’s the programming equivalent of putting black tape over an engine warning light.

It’s best to handle exceptions as close as possible to the source, because the closer you are, the more context you have to do something useful with the exception.

Ignore Empty Try Catch – Custom attribute

In some rare cases an empty try catch can be valid, in which case you can use the custom attribute to mark the function and explain why it is OK. But check one last time whether there is a TryParse version of the function and/or code you are calling.
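A minimal sketch of what such an attribute and its use could look like; the attribute's shape, its Reason property and the example methods are hypothetical. The second method shows the TryParse alternative that removes the need for a catch altogether.

```csharp
using System;
using System.IO;

// Hypothetical: documents why an empty catch is acceptable in the marked member.
[AttributeUsage(AttributeTargets.Method | AttributeTargets.Constructor, AllowMultiple = false)]
public sealed class IgnoreEmptyTryCatchAttribute : Attribute
{
    public IgnoreEmptyTryCatchAttribute(string reason) => Reason = reason;

    public string Reason { get; }
}

public static class CleanupExample
{
    [IgnoreEmptyTryCatch("Best-effort cleanup: a locked temp file is removed on the next run.")]
    public static void TryDeleteTempFile(string path)
    {
        try
        {
            File.Delete(path);
        }
        catch (IOException)
        {
            // Intentionally ignored, see the attribute above.
        }
    }

    // Before reaching for an empty catch, look for a Try... API instead.
    public static int ParseOrDefault(string value, int fallback) =>
        int.TryParse(value, out int parsed) ? parsed : fallback;
}
```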

Performance

Slightly off topic, but still a type of technical debt, do not use exceptions for program flow!

Throwing exceptions is very expensive (the runtime has to capture the stack trace and unwind the call stack looking for a matching handler), so it has a big impact on performance.

I have seen sites brought to their knees because of the number of exceptions being thrown.

Redundant Code

In the solution we took over there were over 300 empty try-catch statements ☹

But how can it hide redundant code?

When an exception is thrown, execution can jump over lots of code, which is therefore never called.

Therefore, if the exception is always thrown, all the code after it in the try block is redundant.

Below is the classic Hello World program. It works as expected: it prints out “Hello World”.

But there is a lot of technical debt. This might look like a funny example, but I have seen a lot of similar examples in the real world, usually with a lot more code in the try catch, and most often around big, complex integrations!

[Figure: try catch with redundant code]
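The original example is an image. The following is an illustrative reconstruction of the same idea, not the original code: Greeting() still returns “Hello World”, but only because the exception makes the two lines after it unreachable in practice.

```csharp
public static class HelloWorldWithHiddenDebt
{
    public static string Greeting()
    {
        string text = "Hello World";
        try
        {
            string missing = null;
            // Always throws NullReferenceException, so execution jumps to the catch...
            int length = missing.Length;
            // ...and these lines never run: they are redundant code.
            text = text.ToUpperInvariant() + length;
            text += "!";
        }
        catch
        {
            // Empty catch: the exception is silently swallowed.
        }
        return text;
    }
}
```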

Solution – EmptyTryCatchService

For empty try catches I would not recommend using Sitecore’s standard logging, as it can create enormous log files, which is enough to kill your Sitecore solution if the empty try catch is hit a lot.

For tracking down empty try catches, it is good to have a dedicated log file and a way to limit the amount of data written to the log file.

EmptyTryCatchService class provides the following features:

  • Report interval – every Nth occurrence of an exception with the same owner, name and exception message, a report line with the current count is written to the log file.
  • Max log limit – when the number of exceptions with the same owner, name and exception message exceeds the limit, no more data is written to the log file.
  • Dedicated log file for each day
  • Disable all logging via configuration.

EmptyTryCatchService is a simple class that relies on MaxUsageLog for most of its functionality. The code below shows MaxUsageLog together with EnsureIsObsoleteService, which wraps it in the same way.

In addition to finding redundant code, the EmptyTryCatchService will track down hidden errors and problems in your solution, which will result in a reduction of technical debt.

You must be careful when reviewing the logged exceptions and deciding how best to deal with them. See part 3 in the series on reducing technical debt.

using System;
using System.Collections.Generic;
using System.Configuration;
using System.IO;

public class EnsureIsObsoleteService
{
    // Log each distinct message at most 10,000 times; write a report line every 1,000 occurrences.
    private readonly MaxUsageLog _maxUsageLog =
        new MaxUsageLog(10000, "EnsureIsObsoleteService", 1000);

    public void EnsureIsObsolete(object owner, string message)
    {
        _maxUsageLog.Log(owner, message);
    }
}

public class MaxUsageLog
{
    // Counts how many times each key has been logged, shared across all instances and threads.
    // Dictionary is not thread-safe, so every access (and every file write) goes through SyncRoot.
    private static readonly Dictionary<string, int> Usage = new Dictionary<string, int>();
    private static readonly object SyncRoot = new object();

    private readonly int _maxLogLimit;
    private readonly string _fileNamePrefix;
    private readonly int _reportCountInterval;

    public MaxUsageLog(int maxLogLimit,
        string fileNamePrefix,
        int reportCountInterval = 1000000)
    {
        _maxLogLimit = maxLogLimit;
        _fileNamePrefix = !string.IsNullOrEmpty(fileNamePrefix) ? fileNamePrefix : "MaxUsageLog";
        _reportCountInterval = reportCountInterval;
    }

    public void Log(object owner, string message, Exception ex = null)
    {
        if (!IsEnabled())
            return;

        // The owner may be passed as a Type (from static code) or as an instance.
        string type = string.Empty;
        if (owner != null)
            type = owner is Type typeObj ? typeObj.FullName : owner.GetType().FullName;

        string key = GenerateKey(type, message, ex);
        lock (SyncRoot)
        {
            if (!ShouldLog(owner, key))
                return;
            WriteToFile(owner, type, message, ex, Count(key));
        }
    }

    private int Count(string key)
    {
        return Usage.TryGetValue(key, out int count) ? count : 0;
    }

    private void WriteToFile(object owner, string type, string message, Exception exceptionToLog, int count)
    {
        try
        {
            // AppendText creates the file if it does not exist; 'using' closes it exactly once.
            using (StreamWriter log = File.AppendText(FileName))
            {
                log.WriteLine($"{DateTime.UtcNow}: Type:'{type}' Message:'{message}' Count:{count}");
                if (exceptionToLog != null)
                    log.WriteLine($"Exception:{exceptionToLog}");
            }
        }
        catch (Exception ex)
        {
            // Fall back to the Sitecore log, but only when Sitecore configuration is available.
            if (!Sitecore.Configuration.ConfigReader.ConfigutationIsSet)
                return;
            Sitecore.Diagnostics.Log.Error(
                $"Failed writing log file {FileName}. The following text may be missing from the file: Type:{type} Message:{message}",
                ex, owner);
        }
    }

    // Increments the usage count for the key and decides whether this occurrence is written.
    private bool ShouldLog(object owner, string key)
    {
        if (!Usage.TryGetValue(key, out int count))
        {
            Usage.Add(key, 1);
            return true;
        }
        count = Usage[key] = count + 1;

        if (count % _reportCountInterval == 0)
            WriteToFile(owner, "******** Report Count Interval ******", $"Key:'{key}'", null, count);

        if (count > _maxLogLimit)
            return false;
        if (count == _maxLogLimit)
        {
            WriteToFile(owner, "******** Usage Max Exceeded ******", $"Key:'{key}' Max Limit:{_maxLogLimit}", null, count);
            return false;
        }
        return true;
    }

    private string GenerateKey(string type, string message, Exception ex)
    {
        return ex != null
            ? $"{_fileNamePrefix}_{type}_{message}_{ex.HResult}"
            : $"{_fileNamePrefix}_{type}_{message}";
    }

    // One log file per day, in the Sitecore log folder (or an app setting outside Sitecore).
    private string FileName
    {
        get
        {
            DateTime date = DateTime.Now;
            string fileName = $@"\{_fileNamePrefix}.{date.Year}.{date.Month}.{date.Day}.log";

            if (!Sitecore.Configuration.ConfigReader.ConfigutationIsSet)
                return ConfigurationManager.AppSettings[Constants.Configuration.Key.LogFolderForApplications] + fileName;

            return Sitecore.MainUtil.MapPath(Sitecore.Configuration.Settings.LogFolder) + fileName;
        }
    }

    private bool IsEnabled()
    {
        if (!Sitecore.Configuration.ConfigReader.ConfigutationIsSet)
            return StringToBool(ConfigurationManager.AppSettings[Constants.Configuration.Key.MaxUsageLogEnabled], false);

        return Sitecore.Configuration.Settings.GetBoolSetting(Constants.Configuration.Key.MaxUsageLogEnabled, true);
    }

    private bool StringToBool(string value, bool defaultValue)
    {
        // bool.TryParse returns false for null, so null also yields the default.
        return bool.TryParse(value, out bool result) ? result : defaultValue;
    }
}

Hope this was of help, Alan


Reduce Technical Debt and Redundant Code

A while ago, we at Pentia took over a massive Sitecore solution where, after 16 years of upgrades and development, the maintenance cost consumed the entire digital budget of the customer.

In other words, the client was at a crossroads – to build new or to renovate.

For this client the answer was relatively easy:

  • Firstly, the number of features and functionalities in the platform is vast, and just to scope and specify the entire platform was a massive, if not impossible, undertaking – and one which would claim a large number of resources internally and externally.
  • Secondly, while building a new platform (a massive task), the existing platform would have to be kept alive and slowly (painfully slowly) phased out over time. This means double resources for development, maintenance and operations.
  • Thirdly – and probably the most deterring factor – the change management involved in retraining the thousands of staff involved in and around the platform and across departments was substantial and disruptive to the entire organization.

Therefore, a renovation project was established, and the first task was to reduce technical debt for the solution.

Reducing maintenance cost

One of the best ways to reduce technical debt is to reduce the code base: less code == less maintenance cost. In this case we managed to delete 33% of the code base. Here are a few key figures for the solution when we took it over.

  • 900+ sites (over ½ million items)
  • 15 years old (multiple upgrades from Sitecore 4.x to 8.2 and single migration)
  • 15 integrations
  • 600+ Layouts/sub layouts
  • Many JavaScript applications (Angular/React/Backbone/knockout/native/JQuery)
  • Code
    • 294,030 lines of code
    • Cyclomatic Complexity – main project 9662, average 1200
    • Depth of Inheritance – main project 17, average 8
    • Class Coupling – main project 1400, average 500
  • Single solution multiple roles
    • Content management
    • Content delivery
    • Publishing
    • Utility/API
    • Bot
  • No Access to production (apart from Sitecore client)
  • Manual deploys to Production
  • 2 separate solutions (Intranet & Websites) merged into a single solution 4 years ago
  • Not Helix compliant (sort of n-tier where projects had numbers)

The Challenge

Due to the sheer size of the solution, no one in the client’s organization knew which features were used, or how much, and we had no access to production. There were many clear indications of code that was not used or referenced.

So, the initial task was to identify and remove unnecessary parts of the solution.

But how do you identify redundant code?

Visual Studio has tools for that; unfortunately, Sitecore/web applications introduce additional challenges, as unreferenced C# code can still be executed due to the following:

  • Configuration – pipelines, event handlers, custom configuration, etc.
  • Sitecore content – items that define that specific functions on a class should be executed, e.g. WFFM.
  • Sitecore rendering engine that renders the presentation using web controls, layouts, sub layouts, controllers, code, etc.

In addition, we must identify whether the code used by the following is ever called:

  • Layouts
  • Sub Layouts
  • Controllers
  • Web Controls
  • XSLTs
  • REST APIs
  • SOAP web services

Solution

As in most renovation projects, there is no silver bullet; it requires a long-sighted plan, a structured methodology, concepts, code, tools and continuous effort to reduce technical debt.

Ironically to reduce the code base you must introduce more code.

Custom Attributes

We introduced several custom attributes to help mark up the code and help identify issues to be addressed.

  • Obsolete
  • Used
  • Refactor
  • Ignore Empty Try Catch (see part 2)

Used

The point of this attribute is to clearly mark that a loosely referenced class, method or interface is indeed needed by the solution.

In other words, it indicates that a class, method or interface is used, even though it has no references. It’s possible to add text to explain how and where it is used.

Obsolete

Whilst .NET provides the Obsolete custom attribute, it lacks options to indicate that the code is obsolete and can be removed when a condition is met:

  • Specific date
  • Specific release is in production
  • 3rd party system is updated to a specific version

The point of this attribute was therefore to allow us to plan the renovation project in stages and remove code when the referring parts were cleaned up.

Refactor

During this project we ran into many pieces of code, classes and structures that were in dire need of refactoring. But because of time constraints, undeployed code, lack of knowledge, dependencies, multiple versions of third-party systems or some other reason, it was not possible at that time.

Therefore, the best we could do was add this attribute and define why it should be refactored, and why it hasn’t been refactored.

The purpose of this attribute was therefore documentation and planning of the renovation process.
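A sketch of how the three attributes could be declared and applied. The names and properties are hypothetical; ObsoleteWhenAttribute is used because System.ObsoleteAttribute is sealed and cannot be extended. The removal conditions are strings because DateTime is not a valid attribute argument type.

```csharp
using System;

// Marks loosely referenced code (config, content, reflection) as genuinely needed.
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method | AttributeTargets.Interface, AllowMultiple = false)]
public sealed class UsedAttribute : Attribute
{
    public UsedAttribute(string howItIsUsed) => HowItIsUsed = howItIsUsed;
    public string HowItIsUsed { get; }
}

// Code that can be removed once one of the conditions is met.
[AttributeUsage(AttributeTargets.All, AllowMultiple = true)]
public sealed class ObsoleteWhenAttribute : Attribute
{
    public ObsoleteWhenAttribute(string reason) => Reason = reason;
    public string Reason { get; }
    public string RemoveAfterDate { get; set; }      // e.g. "2019-06-30"
    public string ReleaseInProduction { get; set; }  // e.g. "Release 12"
    public string ThirdPartyVersion { get; set; }    // e.g. "CRM 9.1"
}

// Documents why code should be refactored, and why it has not been yet.
[AttributeUsage(AttributeTargets.All, AllowMultiple = false)]
public sealed class RefactorAttribute : Attribute
{
    public RefactorAttribute(string why, string whyNotYet)
    {
        Why = why;
        WhyNotYet = whyNotYet;
    }
    public string Why { get; }
    public string WhyNotYet { get; }
}

// Hypothetical usage.
[Used("Referenced only by type name from a Sitecore config patch file")]
public class LegacyPipelineProcessor
{
    [ObsoleteWhen("Replaced by the new search page", ReleaseInProduction = "Release 12")]
    [Refactor("Mixes data access and presentation", "The old forms still depend on it")]
    public void Process() { }
}
```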

Introducing an “Ensure Code Is Obsolete” Service

It is very difficult to ensure that code is obsolete and never called, which is why it is so difficult to delete code.

What we needed was a somewhat conclusive measurement of whether the running code was being executed.

What we decided to do was to introduce code in the solution that collected data on the code executed across all running solution instances, aggregated the data and presented the results, to confirm that the code was not required.

The IIncrementCountService interface was introduced to count how often the code is executed and then send the results to the content management server, which aggregates them with the results from the other instances.

public interface IIncrementCountService
{
  bool IncrementCount(Type type,string name);
}

Implementation Challenges

The Content Management, Content Delivery, Utility & API instances are in different network zones without access to each other.

The implementation must have a minimal impact on performance, network traffic and database storage.

It must not introduce any new databases or tables.

As we do not have access to the production environment apart from the Sitecore client, it is not possible to log the data to the file system.

Sitecore Remote Events

Remote events (see this blog for a good introduction) provide the perfect mechanism to allow all instances to send their counter data to the Content Management service which is responsible for aggregating the data and presenting the results.

You must be careful with events: if you flood the event queue table, it can kill the performance of ALL your Sitecore instances.

The following configuration was introduced (see my blog post on Type Safe Settings) so the IncrementCount function will only raise an event when one of the following is true:

  • The count exceeds 1000
  • The threshold of 15 minutes is reached
  • A new day starts

This ensures that the event queue is not overloaded and will minimize performance impact, network & database usage.
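The Flush method itself is not shown in this post. As a hedged sketch, the three rules above could be expressed as a small, testable policy; FlushPolicy and its parameters are hypothetical and would be fed from CounterSettings.ThresholdCount and ThresholdTime (in minutes).

```csharp
using System;

public sealed class FlushPolicy
{
    private readonly int _thresholdCount;
    private readonly TimeSpan _thresholdTime;

    public FlushPolicy(int thresholdCount, int thresholdMinutes)
    {
        _thresholdCount = thresholdCount;
        _thresholdTime = TimeSpan.FromMinutes(thresholdMinutes);
    }

    // True when a remote event should be raised with the locally cached counts.
    public bool ShouldFlush(int pendingCount, DateTime lastFlush, DateTime now) =>
        pendingCount > _thresholdCount           // the count exceeds the threshold (1000)
        || now - lastFlush >= _thresholdTime     // the time threshold (15 minutes) is reached
        || now.Date != lastFlush.Date;           // a new day has started
}
```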

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:environment="http://www.sitecore.net/xmlconfig/environment" xmlns:role="http://www.sitecore.net/xmlconfig/role/">
	<sitecore>
		<feature>
			<Diagnostics>
				<CounterSettings type="Feature.Diagnostics.Infrastructure.CounterSettings, Feature.Diagnostics" singleInstance="true">
					<ThresholdCount>1000</ThresholdCount>
					<ThresholdTime>15</ThresholdTime>
					<Enabled>true</Enabled>
				</CounterSettings>
			</Diagnostics>
		</feature>
	</sitecore>
</configuration>
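The `CounterSettings` class referenced by the `type` attribute is not shown in the post. A possible shape, assuming Sitecore's configuration factory assigns each child element to the public property of the same name; the default values mirror the config above, and the `ThresholdTimeSpan` convenience property is a hypothetical addition:

```csharp
using System;

namespace Feature.Diagnostics.Infrastructure
{
    // Hypothetical sketch of the class named in the <CounterSettings> node.
    // Assumption: Sitecore's configuration factory sets each child element
    // (<ThresholdCount>, <ThresholdTime>, <Enabled>) on the property of the same name.
    public class CounterSettings
    {
        // Flush a counter once it has collected this many hits.
        public int ThresholdCount { get; set; } = 1000;

        // Flush a counter once it has been cached for this many minutes.
        public int ThresholdTime { get; set; } = 15;

        // Master switch: when false, nothing is counted or sent.
        public bool Enabled { get; set; }

        // Hypothetical convenience property so callers do not have to remember the unit.
        public TimeSpan ThresholdTimeSpan => TimeSpan.FromMinutes(ThresholdTime);
    }
}
```

Outside of dependency injection, the node could be loaded with `Sitecore.Configuration.Factory.CreateObject("feature/Diagnostics/CounterSettings", true)` and cast to `CounterSettings`; how the original solution registers it is not shown.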

The IncrementLocalCountService class is responsible for incrementing the count, caching it locally and raising the event that notifies the Content Management server when one of the aforementioned thresholds is met.

    public class IncrementLocalCountService : IIncrementCountService
    {
        // The service is shared across requests, so the local cache is guarded by a lock.
        private readonly object _syncRoot = new object();
        private readonly IList<Counter> _counters = new List<Counter>();
        private readonly CounterFactory _counterFactory;
        private readonly CounterUpdateRemoteEventFactory _counterUpdateRemoteEventFactory;
        private readonly CounterSettings _counterSettings;

        public IncrementLocalCountService([NotNull]CounterFactory counterFactory,
            [NotNull]CounterUpdateRemoteEventFactory counterUpdateRemoteEventFactory,
            [NotNull]CounterSettings counterSettings)
        {
            Assert.ArgumentNotNull(counterFactory, nameof(counterFactory));
            Assert.ArgumentNotNull(counterUpdateRemoteEventFactory, nameof(counterUpdateRemoteEventFactory));
            Assert.ArgumentNotNull(counterSettings, nameof(counterSettings));
            _counterFactory = counterFactory;
            _counterUpdateRemoteEventFactory = counterUpdateRemoteEventFactory;
            _counterSettings = counterSettings;
        }

        public bool IncrementCount(Type type, string name)
        {
            if (string.IsNullOrWhiteSpace(name))
                return false;
            if (_counterSettings == null || !_counterSettings.Enabled)
                return false;

            DateTime today = DateTime.Now.Date;
            lock (_syncRoot)
            {
                // Find today's counter for this name and type; older counters are flushed below.
                Counter counter = _counters.FirstOrDefault(c => c.Name == name && c.Date == today && c.Type == type);
                if (counter == null)
                {
                    counter = _counterFactory.Create(name, today, 0);
                    // Create() does not take the type, so record it for the lookup above
                    // (assumes Counter.Type has a public setter).
                    counter.Type = type;
                    _counters.Add(counter);
                }
                counter.Count++;
                Flush(today);
            }
            return true;
        }

        private void Flush(DateTime today)
        {
            // Send and drop every counter that reached the count or time threshold,
            // or that belongs to a previous day.
            foreach (var counter in GetThresholdExceeded(today))
            {
                RaiseEvent(counter);
                _counters.Remove(counter);
            }
        }

        private IEnumerable<Counter> GetThresholdExceeded(DateTime today)
        {
            DateTime timeLimit = DateTime.Now.Subtract(TimeSpan.FromMinutes(_counterSettings.ThresholdTime));
            return _counters.Where(c => c.Date < today
                                        || c.Created < timeLimit
                                        || c.Count >= _counterSettings.ThresholdCount)
                            .ToList();
        }

        private void RaiseEvent(Counter counter)
        {
            if (counter == null)
                return;
            var counterUpdateRemoteEvent = _counterUpdateRemoteEventFactory.Create(counter.Name, counter.Date, counter.Count);
            // true, true: add the event to both the global and the local event queue.
            Sitecore.Eventing.EventManager.QueueEvent(counterUpdateRemoteEvent, true, true);
        }
    }
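The `Counter` model and `CounterFactory` used above are not listed in the post. The sketch below infers their shape from the members the code uses (`Name`, `Date`, `Type`, `Count`, `Created` and `Create(name, date, count)`); the property types and returning `null` for a blank name are assumptions, and the `Create(IDTableEntry[])` overload used by the repository is omitted because it depends on Sitecore:

```csharp
using System;

// Hypothetical Counter model, inferred from the members used by
// IncrementLocalCountService and CounterRepository.
public class Counter
{
    public string Name { get; set; }
    public DateTime Date { get; set; }     // the day being counted
    public Type Type { get; set; }         // the type passed to IncrementCount
    public long Count { get; set; }        // hits accumulated so far
    public DateTime Created { get; set; }  // when the counter was cached locally
}

// Hypothetical factory: normalizes the date and stamps Created,
// which the time threshold is measured from.
public class CounterFactory
{
    public Counter Create(string name, DateTime date, long count)
    {
        // The event handler treats null as "nothing to store".
        if (string.IsNullOrWhiteSpace(name))
            return null;

        return new Counter
        {
            Name = name,
            Date = date.Date,
            Count = count,
            Created = DateTime.Now
        };
    }
}
```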

Who is responsible for aggregating the results?

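The Content Management server is responsible for aggregating the results. This requires some extra configuration (see blog for more information): a handler registered for the counter:update:remote event, and an initialize pipeline processor that subscribes to the remote event and raises it locally so that the handler is called. Both are only enabled on the Standalone and ContentManagement roles.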

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:set="http://www.sitecore.net/xmlconfig/set/" xmlns:role="http://www.sitecore.net/xmlconfig/role/">
	<sitecore role:require="Standalone OR ContentManagement">
		<events>
			<event name="counter:update:remote">
				<handler type="Feature.Diagnostics.Infrastructure.Events.Counter.CounterUpdateRemoteEventHandler, Feature.Diagnostics" method="Update" />
			</event>
		</events>
		<pipelines>
			<initialize>
				<processor type="Feature.Diagnostics.Infrastructure.Pipelines.Counter.SubscribeToCounterRemoteEventService, Feature.Diagnostics" />
			</initialize>
		</pipelines>
	</sitecore>
</configuration>

The code behind the configuration above:

    public class CounterUpdateRemoteEventHandler
    {
        public void Update(object sender, EventArgs args)
        {
            if (args == null)
                return;

            try
            {
                var countRemoteEventArgs = args as RemoteEventArgs<CounterUpdateRemoteEvent>;
                Assert.IsNotNull(countRemoteEventArgs, $"Unexpected event args: {args.GetType().FullName}");
                Assert.IsNotNull(countRemoteEventArgs.Event, $"Event is null: {args.GetType().FullName}");

                var counterRepository = ServiceLocator.ServiceProvider.GetService<CounterRepository>();
                Assert.IsNotNull(counterRepository, $"Could not resolve type:{typeof(CounterRepository).FullName}");

                var counterFactory = ServiceLocator.ServiceProvider.GetService<CounterFactory>();
                Assert.IsNotNull(counterFactory, $"Could not resolve type:{typeof(CounterFactory).FullName}");

                var @event = countRemoteEventArgs.Event;
                var counter = counterFactory.Create(@event.Name, @event.Date, @event.Count);
                if (counter == null)
                    return;
                counterRepository.Update(counter);
            }
            catch (Exception exception)
            {
                Log.Error("CounterUpdateRemoteEventHandler.Update - failed", exception, this);
            }
        }
    }

    public class SubscribeToCounterRemoteEventService
    {
        public void Process(PipelineArgs args)
        {
            Sitecore.Diagnostics.Log.Info("SubscribeToCounterRemoteEventService.Initialize Called",this);
            var action = new Action<CounterUpdateRemoteEvent>(RaiseRemoteEvent);
            EventManager.Subscribe(action);
        }

        public void RaiseRemoteEvent(CounterUpdateRemoteEvent counterUpdateRemoteEvent)
        {
            if (counterUpdateRemoteEvent == null)
                return;
            RemoteEventArgs<CounterUpdateRemoteEvent> remoteEventArgs = new RemoteEventArgs<CounterUpdateRemoteEvent>(counterUpdateRemoteEvent);
            Event.RaiseEvent(counterUpdateRemoteEvent.EventName, remoteEventArgs);
        }
    }
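The `CounterUpdateRemoteEvent` payload is not shown either. A hedged sketch following the `[DataContract]` pattern commonly used for custom Sitecore remote events; the properties are inferred from the code above, and which serializer your Sitecore version uses for the event queue is an assumption to verify:

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical remote event payload. It is written to the event queue table
// and read back on the Content Management server, so it must be serializable.
[DataContract]
public class CounterUpdateRemoteEvent
{
    // Must match <event name="counter:update:remote"> so Event.RaiseEvent reaches the handler.
    public const string RemoteEventName = "counter:update:remote";

    public CounterUpdateRemoteEvent(string name, DateTime date, long count)
    {
        EventName = RemoteEventName;
        Name = name;
        Date = date;
        Count = count;
    }

    [DataMember] public string EventName { get; private set; }
    [DataMember] public string Name { get; private set; }
    [DataMember] public DateTime Date { get; private set; }
    [DataMember] public long Count { get; private set; }
}
```

Keeping the payload to a name, a date and a count keeps each row in the event queue table small, which matters given the warning about flooding it.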

Where is the Data Saved?

Ideally it should be saved in its own SQL database.

Unfortunately, we were not allowed to introduce any new databases or tables, so we had to use the Sitecore IDTable. The CounterRepository is responsible for retrieving, updating and persisting the counters in the IDTable.

    public class CounterRepository
    {
        private readonly CounterFactory _counterFactory;
        private readonly GenerateKeyService _generateKeyService;

        public CounterRepository([NotNull] CounterFactory counterFactory, 
            [NotNull]GenerateKeyService generateKeyService)
        {
            Assert.ArgumentNotNull(counterFactory, nameof(counterFactory));
            Assert.ArgumentNotNull(generateKeyService, nameof(generateKeyService));
            _counterFactory = counterFactory;
            _generateKeyService = generateKeyService;
        }

        public bool Update([NotNull] Counter counter)
        {
            Assert.ArgumentNotNull(counter, nameof(counter));

            var counterInDatabase = Get(counter.Name, counter.Date);
            if (counterInDatabase == null)
                return Add(counter);
            counter.Count += counterInDatabase.Count;
            Delete(counterInDatabase);
            return Add(counter);
        }

        public IEnumerable<Counter> Get()
        {
            var idTableEntries = IDTable.GetKeys(Constants.IdTable.Prefix);
            return idTableEntries == null ? new List<Counter>() : _counterFactory.Create(idTableEntries);
        }

        private bool Add(Counter counter)
        {
            if (counter == null)
                return false;
            var idTableEntry = IDTable.Add(Constants.IdTable.Prefix,
                _generateKeyService.GenerateKey(counter.Name, counter.Date),new ID(Guid.NewGuid()),
                ID.Null,counter.Count.ToString());
            return idTableEntry != null;
        }

        private void Delete(Counter counter)
        {
            if (counter == null)
                return;

            IDTable.RemoveKey(Constants.IdTable.Prefix, _generateKeyService.GenerateKey(counter.Name, counter.Date));
        }

        private Counter Get(string name, DateTime date)
        {
            if (string.IsNullOrWhiteSpace(name))
                return null;

            var idTableEntry = IDTable.GetID(Constants.IdTable.Prefix, _generateKeyService.GenerateKey(name, date));
            if (idTableEntry == null)
                return null;
            if (!long.TryParse(idTableEntry.CustomData, out var count))
                count = 0;
            return _counterFactory.Create(name, date, count);
        }
    }
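The `GenerateKeyService` is not listed. One possible key format is `<name>|<yyyyMMdd>`, sketched below with a hypothetical `TryParseKey` counterpart so that counters can be rebuilt from the keys returned by `IDTable.GetKeys`; both the format and the method name are assumptions:

```csharp
using System;
using System.Globalization;

// Hypothetical key format for the IDTable: "<name>|<yyyyMMdd>".
public class GenerateKeyService
{
    private const string DateFormat = "yyyyMMdd";
    private const char Separator = '|';

    public string GenerateKey(string name, DateTime date) =>
        $"{name}{Separator}{date.ToString(DateFormat, CultureInfo.InvariantCulture)}";

    // Reverses GenerateKey; returns false if the key is not in the expected format.
    public bool TryParseKey(string key, out string name, out DateTime date)
    {
        name = null;
        date = default;
        if (string.IsNullOrEmpty(key))
            return false;

        // Split on the last separator so names containing '|' still round-trip.
        int index = key.LastIndexOf(Separator);
        if (index <= 0)
            return false;

        name = key.Substring(0, index);
        return DateTime.TryParseExact(key.Substring(index + 1), DateFormat,
            CultureInfo.InvariantCulture, DateTimeStyles.None, out date);
    }
}
```

Formatting the date with the invariant culture keeps the keys identical across servers with different regional settings.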

Presenting the results

No magic here: a simple counter.aspx page reads from the CounterRepository and displays the counters in a table, with an option to clear the stored data. Some extra code ensures that only Sitecore administrators can access the page. See Part 2 in the series.