How SQL Index Fragmentation will kill Sitecore’s Performance

Thought I wrote a blog post about this years ago, but apparently I didn’t.

Problem

Poor index maintenance is a major cause of decreased SQL Server performance, which in turn impacts your Sitecore solution's performance. The Sitecore databases contain tables with numerous entries that get updated frequently, so high index fragmentation will occur.

Detecting SQL Server index fragmentation

The following script displays the average fragmentation of each index and, as a help, generates the SQL statement to fix it.

SELECT OBJECT_NAME(ind.object_id) AS TableName,
ind.name AS IndexName, indexstats.index_type_desc AS IndexType,
indexstats.avg_fragmentation_in_percent,
-- schema-qualify the table so the generated statement also works outside dbo
'ALTER INDEX ' + QUOTENAME(ind.name) + ' ON ' + QUOTENAME(OBJECT_SCHEMA_NAME(ind.object_id)) + '.' + QUOTENAME(OBJECT_NAME(ind.object_id)) +
CASE WHEN indexstats.avg_fragmentation_in_percent > 30 THEN ' REBUILD'
WHEN indexstats.avg_fragmentation_in_percent >= 5 THEN ' REORGANIZE'
ELSE NULL END AS [SQLQuery] -- below 5% nothing is required, so the whole expression is NULL
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) indexstats
INNER JOIN sys.indexes ind ON ind.object_id = indexstats.object_id
AND ind.index_id = indexstats.index_id
WHERE ind.name IS NOT NULL -- heaps have no index name
-- optionally filter on fragmentation, e.g. AND indexstats.avg_fragmentation_in_percent > 10
ORDER BY indexstats.avg_fragmentation_in_percent DESC

When I ran the script above, I was shocked: the majority of indexes on my local SQL Server were over 99% fragmented.

Solution

The script above generates the SQL statements needed to defragment the affected indexes, so you can automate the defragmentation process, using SQL Server Maintenance Plans.

Anyway, I hope this helps keep your Sitecore solution running at its best, Alan

Sitecore and Azure Durable Functions

In this post I will show how Azure Durable Functions can complement your Sitecore solution and help enhance performance.

Problem

We took over a Sitecore solution whose content management server was running very slowly; intermittently the Sitecore client would become unresponsive and crash.

The problem was caused by a number of CPU-, data- and bandwidth-intensive scheduled tasks that retrieved a wide range of data from a number of web services, then aggregated the data and performed complicated calculations, of which only a small subset of the results was presented on the website.

Solution

As the solution was already hosted in Azure, a perfect solution was to offload the heavy lifting from the content management server to Azure Functions, which would do the data retrieval and calculations and provide the results for the website. Firstly, a very brief overview of the pros and cons of Azure Functions.

Pros

Serverless execution model
Dynamic Scaling
Micro pricing
Security
Wide range of triggers
- HTTP(S), Timer (CRON), Azure Storage changes, Azure Queue, messages from Service Bus, etc.

Cons

Stateless
Execution time limit (default 5 mins, max 10 on the Consumption plan)
Concurrency

The main challenge with Azure Functions was that most of the scheduled tasks could take more than 10 minutes to complete and required state management. But not to worry, as Azure Durable Functions came to the rescue.

Azure Durable Functions

Durable Functions are an extension of Azure Functions and Azure WebJobs that lets you write stateful functions in a serverless environment. The extension manages state, checkpoints, and restarts for you, so it is possible to implement code that runs for a long time.

In addition, if an Azure function fails, for example because a web request times out, you can define whether the durable function should wait and retry X times before failing. Behind the scenes, the Durable Functions extension is built on top of the Durable Task Framework, an open-source library on GitHub for building durable task orchestrations.

Advantages of Durable Functions

They define workflows in code. No JSON schemas or designers are needed.
They can call other functions either synchronously or asynchronously.
Output from called functions can be saved to local variables.
They automatically checkpoint their progress whenever the function awaits.
Local state is never lost, even if the process recycles or the VM reboots.
Easy to Unit Test
Can run for a very long time, in theory forever
Cost effective, as you do not pay for execution time whilst waiting for tasks to complete.

Here is a brief introduction to the most common Durable Functions patterns

Pattern 1 – Function chaining

Function chaining refers to the pattern of executing a sequence of functions in a particular order. Often the output of one function needs to be applied to the input of another function.

The code below is an example of how you would achieve this
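The original code sample is not reproduced here, so the following is only a minimal sketch of function chaining, assuming version 2.x of the Durable Functions extension. The orchestrator and activity names (DataChain, FetchData, AggregateData, CalculateResult) are hypothetical.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class DataChain
{
    // Orchestrator: each await is a checkpoint; the output of one
    // activity becomes the input of the next.
    [FunctionName("DataChain")]
    public static async Task<string> Run(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        string source = context.GetInput<string>();
        string raw = await context.CallActivityAsync<string>("FetchData", source);
        string aggregated = await context.CallActivityAsync<string>("AggregateData", raw);
        return await context.CallActivityAsync<string>("CalculateResult", aggregated);
    }

    // Hypothetical activities; real ones would call the web services.
    [FunctionName("FetchData")]
    public static string FetchData([ActivityTrigger] string source) => $"raw({source})";

    [FunctionName("AggregateData")]
    public static string AggregateData([ActivityTrigger] string raw) => $"aggregated({raw})";

    [FunctionName("CalculateResult")]
    public static string CalculateResult([ActivityTrigger] string aggregated) => $"result({aggregated})";
}
```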

Pattern 2 – Fan-out/fan-in

Fan-out/fan-in refers to the pattern of executing multiple functions in parallel, and then waiting for all to finish. Often some aggregation work is done on results returned from the functions. This is perfect when you want to do a lot of things in parallel, to reduce the time taken to complete the task and then aggregate/process all the results.

Below is an example of how the code could look
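The original code sample is not reproduced here. As a rough sketch of fan-out/fan-in, again assuming version 2.x of the extension, an orchestrator could look like this. The activity names GetFeedUrls and ProcessFeed are hypothetical and are assumed to be defined elsewhere.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class AggregateFeeds
{
    [FunctionName("AggregateFeeds")]
    public static async Task<int> Run(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // Hypothetical activity returning the list of feeds to process.
        string[] feeds = await context.CallActivityAsync<string[]>("GetFeedUrls", null);

        // Fan out: start one activity per feed without awaiting each one.
        var tasks = new List<Task<int>>();
        foreach (string feed in feeds)
        {
            tasks.Add(context.CallActivityAsync<int>("ProcessFeed", feed));
        }

        // Fan in: wait for all activities, then aggregate the results.
        int[] counts = await Task.WhenAll(tasks);
        return counts.Sum();
    }
}
```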

Pattern 3 – Monitoring

The monitor pattern refers to a flexible recurring process in a workflow – for example, polling until certain conditions are met. A regular timer-trigger can address a simple scenario, such as a periodic clean-up job, but its interval is static and managing instance lifetimes becomes complex. Durable Functions enables flexible recurrence intervals, task lifetime management, and the ability to create multiple monitor processes from a single orchestration.

An example could be instead of exposing an endpoint for an external client to monitor a long-running operation, the long-running monitor consumes an external endpoint, waiting for some state change. See the example below.
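The original example is not reproduced here. A minimal sketch of a monitor, assuming version 2.x of the extension, might poll a hypothetical GetJobStatus activity every five minutes for at most two hours. The intervals and the "Completed" status value are illustrative assumptions.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class JobMonitor
{
    [FunctionName("JobMonitor")]
    public static async Task<bool> Run(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        string jobId = context.GetInput<string>();

        // Use the replay-safe clock, never DateTime.UtcNow, in an orchestrator.
        DateTime expiry = context.CurrentUtcDateTime.AddHours(2);

        while (context.CurrentUtcDateTime < expiry)
        {
            // Hypothetical activity that asks the external system for its state.
            string status = await context.CallActivityAsync<string>("GetJobStatus", jobId);
            if (status == "Completed")
            {
                return true;
            }

            // Durable timer: the orchestration is unloaded while it waits.
            DateTime nextCheck = context.CurrentUtcDateTime.AddMinutes(5);
            await context.CreateTimer(nextCheck, CancellationToken.None);
        }

        return false; // gave up after the expiry time
    }
}
```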

Is Replaying

One thing that catches people out is that the orchestrator code is re-run from the start of the function after each await completes. For logging and other code with side effects, you therefore need to check IsReplaying so that, for example, you only log once.
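As a small sketch, assuming version 2.x of the extension, there are two ways to avoid duplicate log entries: check IsReplaying yourself, or wrap the logger with CreateReplaySafeLogger. The activity name DoWork is hypothetical.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Extensions.Logging;

public static class LoggedOrchestration
{
    [FunctionName("LoggedOrchestration")]
    public static async Task Run(
        [OrchestrationTrigger] IDurableOrchestrationContext context,
        ILogger log)
    {
        // Option 1: only log when the code is not being replayed.
        if (!context.IsReplaying)
        {
            log.LogInformation("Starting the work");
        }

        await context.CallActivityAsync("DoWork", null);

        // Option 2: a logger wrapper that performs the same check internally.
        ILogger safeLog = context.CreateReplaySafeLogger(log);
        safeLog.LogInformation("Work completed");
    }
}
```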

Durable Functions – Orchestrator code constraints

There are a number of code constraints that must be adhered to when using Durable Functions orchestration.

Code must be deterministic.
- It will be replayed multiple times and must produce the same result each time.
- For example, no direct calls to get the current date/time, get random numbers, generate random GUIDs, or call into remote endpoints.
  - For the current time, timers, new GUIDs, etc., use the members provided by IDurableOrchestrationContext (for example CurrentUtcDateTime, CreateTimer and NewGuid); generate random numbers in an activity function.
Non-deterministic operations must be done in activity functions
- This includes any interaction with other input or output bindings. This ensures that any non-deterministic values will be generated once on the first execution and saved into the execution history. Subsequent executions will then use the saved value automatically.
Orchestrator code should be non-blocking.
- For example, that means no I/O and no calls to Thread.Sleep or equivalent APIs
- Orchestrator code must never initiate any async operation, except by using the IDurableOrchestrationContext API.
- For example, no Task.Run, Task.Delay or HttpClient.SendAsync.
- The Durable Task Framework executes orchestrator code on a single thread and cannot interact with any other threads that could be scheduled by other async APIs.
Infinite loops should be avoided
- Because the Durable Task Framework saves execution history as the orchestration function progresses, an infinite loop could cause an orchestrator instance to run out of memory.
- For infinite loop scenarios, use APIs such as ContinueAsNew to restart the function execution and discard previous execution history.
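To illustrate the constraints above, here is a hedged sketch of an "eternal" orchestration, assuming version 2.x of the extension. It uses the replay-safe NewGuid, CurrentUtcDateTime and CreateTimer members, and calls ContinueAsNew instead of looping forever. The activity name RefreshData and the one-hour interval are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class PeriodicRefresh
{
    [FunctionName("PeriodicRefresh")]
    public static async Task Run(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        int runCount = context.GetInput<int>();

        // Deterministic across replays, unlike Guid.NewGuid().
        Guid correlationId = context.NewGuid();

        // Non-deterministic work (I/O, web requests) belongs in an activity.
        await context.CallActivityAsync("RefreshData", correlationId);

        // Durable timer instead of Thread.Sleep or Task.Delay.
        DateTime next = context.CurrentUtcDateTime.AddHours(1);
        await context.CreateTimer(next, CancellationToken.None);

        // Restart with fresh history instead of an infinite loop.
        context.ContinueAsNew(runCount + 1);
    }
}
```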

Result

By migrating all the long running CPU/data/bandwidth intensive tasks to Azure Durable Functions, the performance of the Sitecore solution went from painful to fantastic.

Unfortunately it is very common that Sitecore solutions assume responsibility for tasks that are not the website's responsibility, but pairing with Azure Functions can help mitigate this issue.

An additional benefit was that the website was isolated/protected from 3rd party system changes: when an external system changed, only the Azure Functions had to be modified and deployed, so there was no downtime for the Sitecore solution.

Anyway, I hope Sitecore developers will consider Azure Functions to enhance their Sitecore solutions.

Swagger – Description & Remarks not shown

We used AzureExtensions.Swashbuckle, which makes it so easy to add Swagger documentation to Azure Functions, but we noticed that the description and remarks were missing for a number of functions.

We identified that it was caused by an & character in the remarks section of the XML comment.
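The post does not show the fix, so the following is an assumption: a raw & makes an XML documentation comment malformed, and the usual XML remedy is to escape it as &amp; (or to write "and"). The class and method below are hypothetical.

```csharp
public static class PriceFunctions
{
    /// <summary>Returns the total price for a product.</summary>
    /// <remarks>
    /// The total includes VAT &amp; shipping. A raw ampersand here would make
    /// the XML comment invalid, so it is escaped as an XML entity.
    /// </remarks>
    public static decimal GetTotal(decimal net, decimal vat, decimal shipping)
    {
        return net + vat + shipping;
    }
}
```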

Hope this helps, Alan

Swagger – An item with the same key has already been added. Key: 400

We had a solution that used Azure Functions to provide a REST API, to help reduce the load on the Sitecore content delivery servers.

We used AzureExtensions.Swashbuckle, which makes it so easy to add Swagger documentation to Azure Functions.

Problem

Suddenly the Swagger UI no longer worked and we got the “An item with the same key has already been added. Key: 400” exception.

After going through all the commits one by one, I could not see how the code changes could have introduced this error.

Solution

Until I noticed that there were 2 ProducesResponseType attributes with the same status code (i.e. 400) on the same function.

It was so obvious when I noticed it, but when I look at C# code for changes I guess I ignore comments and attributes.
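As an illustration only (the function below is hypothetical and the HTTP trigger plumbing is left out), the mistake and the fix look roughly like this with the ProducesResponseType attribute from Microsoft.AspNetCore.Mvc:

```csharp
using Microsoft.AspNetCore.Mvc;

public static class OrderFunctions
{
    // The mistake: the same status code declared twice, for example
    //   [ProducesResponseType(typeof(string), 400)]
    //   [ProducesResponseType(400)]
    // The fix: declare each status code at most once.
    [ProducesResponseType(typeof(string), 200)]
    [ProducesResponseType(typeof(string), 400)]
    public static IActionResult GetOrder(string orderId)
    {
        if (string.IsNullOrWhiteSpace(orderId))
        {
            return new BadRequestObjectResult("orderId is required");
        }

        return new OkObjectResult($"Order {orderId}");
    }
}
```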

Therefore I hope by drawing attention to my stupid mistake I can help others that experience this issue, Alan

Hack Attribute to the Rescue

The challenge

At some point in a web project an underlying system, REST API, etc. will not be available and you must make some hacks and or fake code to enable development to continue.

But how do you ensure that the hacks and or fake code never make it into production? Typically TODOs, comments, PBIs etc. are used, but to be honest I have never liked doing that.

Especially in this case as the OAuth Authorization flow was not available, so I had to fake the authentication.

Solution

Introducing the [Hack] custom attribute. It adds the ability to add the hack attribute to classes, properties, and methods.

When the code is compiled for Debug, it generates a warning
- So the code compiles and can be deployed
When the code is compiled for Release, it generates an error
- So the code cannot compile, and can’t be deployed to production.

In our setup, local development, continuous integration and our internal development & test servers build for Debug. For pre-production & production we build for Release.

Therefore I could relax knowing that the hacks could not make it into production.

The Code

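using System;

namespace Foundation.Diagnostics.Infrastructure
{
    // Release builds: using [Hack] anywhere is a compile error.
    // Debug builds: using [Hack] only produces a warning.
#if DEBUG
    [Obsolete("Hack code is still present")]
#else
    [Obsolete("Hack code is still present", true)]
#endif
    [AttributeUsage(AttributeTargets.All)]
    public class HackAttribute : Attribute
    {
        public HackAttribute(string message)
        {
            Message = message;
        }

        // Explains why the hack exists and what should replace it
        public string Message { get; }
    }
}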

It was actually quite simple to achieve by using the Obsolete attribute together with preprocessor directives, which control whether the Obsolete attribute causes a warning or an error, depending on whether the build is Debug or Release.
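A usage sketch, assuming the HackAttribute class above is referenced by the project. The namespace and the FakeAuthenticationService class are hypothetical stand-ins for the faked authentication mentioned earlier.

```csharp
using Foundation.Diagnostics.Infrastructure;

namespace Feature.Account.Services
{
    // Debug build: compiler warning "Hack code is still present".
    // Release build: the same message is reported as a compile error.
    [Hack("Fake sign-in until the OAuth authorization flow is available")]
    public class FakeAuthenticationService
    {
        public bool IsAuthenticated(string userName)
        {
            return !string.IsNullOrEmpty(userName);
        }
    }
}
```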

Hope this helps, Alan

Name Value List – To the Rescue

Challenge

To provide a content API that depending on the path and language would return a different set of key value pairs. The client wanted the ability to define new keys without changing the template and or introducing new templates.

Typical Sitecore solution

Introduce a content item that could have any number of “Key Value” sub items which had a key and value field.

Unfortunately, the customer wanted to have as flat a structure as possible and different languages would have different keys.

Solution – Name Value List Field

I was surprised that I had never noticed that Sitecore has a field type called Name Value List.

The Name Value List field provides a key/value pair interface where you can add pairs dynamically.

How to use with Synthesis

It is easy, as the field is mapped to an IDictionaryField interface, which provides basic functionality for working with key/values out of the box.

If you need some more advanced features you can cast it to a DictionaryField, which is the underlying implementation.

How to use with vanilla Sitecore

The values are stored as a query string, for example key1=value1&key2=value2.

So you can use Sitecore.Web.WebUtil.ParseUrlParameters to convert the raw value to a NameValueCollection and access the key/value pairs.
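A minimal sketch of the vanilla approach, assuming a reference to Sitecore.Kernel. The helper class and its method name are hypothetical, and depending on the content you may still need to URL-decode the values.

```csharp
using System.Collections.Specialized;
using Sitecore.Data.Items;
using Sitecore.Web;

public static class NameValueListReader
{
    // Reads the raw query-string value of a Name Value List field
    // and converts it to key/value pairs.
    public static NameValueCollection Read(Item item, string fieldName)
    {
        string raw = item[fieldName]; // raw field value, e.g. "key1=value1&key2=value2"
        return WebUtil.ParseUrlParameters(raw ?? string.Empty);
    }
}
```

A caller could then read a single value with `NameValueListReader.Read(item, "Settings")["key1"]`, where "Settings" is a made-up field name.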

I was shocked that after working with Sitecore for so long I had missed this field (well, maybe I never needed it), but at any rate I hope this blog post will help, Alan

Untangling the Sitecore Search LINQ to SolR queries

Problem

It can be very difficult to identify why you do not get the search results you expected from Sitecore Search, but there is a simple way to help untangle what is going on.

Solution

It is possible to see the query that Sitecore generates and sends to SolR and then use the query on the SolR instance to see what data is returned to Sitecore.

This is such a huge help when trying to understand why your queries do not work!

Step 1 – Find the Query that was sent to SolR from Sitecore

Sitecore logs all the queries it sends to SolR in the standard Sitecore log folder; look for files named Search.log.xxx.yyy.txt.

Step 2 – Execute the query in your SolR instance

Go to your Solr instance, and use the core selector drop down to select the index your Sitecore Search query is being executed against.

Select Query, from the menu

Then paste the query from the Sitecore log, and you can see the result that is returned to Sitecore.

This has helped me a lot, so I hope this helps others untangling their search results using Sitecore Search 🙂

How IQueryable and Take can kill your Sitecore Solution

We had a solution that had severe performance issues when it got a lot of visitors. Sitecore was throwing the following exception and SolR had similar errors in its logs:

Unable to read data from the transport connection: The connection was closed

We identified that the problem was caused by hitting the network bandwidth limit in Azure!
Yes, there were a lot of visitors, but enough to hit the bandwidth limit? The customer upgraded the plan to get more network bandwidth, but the issues continued.

But what could cause this issue?

I started to review the SolR implementation and found the issue quite quickly.

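// "queryable" stands for the IQueryable<Result> obtained from the search context
return queryable
            .Where(result => result.Date < DateTime.UtcNow)
            .OrderByDescending(result => result.Date)
            .GetResults()   // executes the query and returns every match
            .Take(count)    // applied in memory, after the transfer
            .ToList();

The Take() was called after GetResults(), so the entire data set was returned from SolR to Sitecore, and only then was the Take applied to get the top 5 results.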

This simple mistake was what caused all the network and performance issues.

Solution

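// "queryable" stands for the IQueryable<Result> obtained from the search context
return queryable
            .Where(result => result.Date < DateTime.UtcNow)
            .OrderByDescending(result => result.Date)
            .Take(count)    // becomes part of the query sent to SolR
            .GetResults()
            .ToList();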

It was a simple fix (in 150+ places) to move the Take before GetResults!
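The effect can be shown with a plain in-memory LINQ analogy. This is not Sitecore code: the counter below simply stands in for "documents transferred from SolR", and the class name is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TakeOrderDemo
{
    // Returns how many items were pulled from the source to produce
    // a page of "count" items.
    public static int ItemsFetched(bool takeBeforeMaterialize, int total, int count)
    {
        int fetched = 0;
        IEnumerable<int> source = Enumerable.Range(0, total)
            .Select(i => { fetched++; return i; });

        List<int> page = takeBeforeMaterialize
            ? source.Take(count).ToList()            // the limit is part of the query
            : source.ToList().Take(count).ToList();  // everything is fetched first

        return fetched;
    }
}
```

With 1000 items and a page size of 5, taking before materializing pulls 5 items, while taking afterwards pulls all 1000.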

This is why I believe that you should always introduce a (SolR) Sitecore Search abstraction instead of returning the IQueryable interface; please read my post on this very subject.

Hope this helps, Alan

Reduce Technical Debt Part 3 – Test driven code and PBI tasks

In this blog post I am going to outline how test-driven design/unit test and having a code removal task for each Product Backlog Item (PBI) can help reduce technical debt.

If you have not already done so, please read part 1 and part 2 in this series on reducing technical debt, as they set the scene for what this blog post is trying to address.

Test-driven design/unit test

It is a fact that all developers (myself included) tend to ignore comments and not correct or change them when modifying code. Therefore, test-driven code will reduce technical debt!

Unit tests declaratively define what code should do and are especially useful in describing exceptions that would otherwise lead to misunderstandings.

I could go on all day about the virtues of test-driven design! But I am only going to focus on how it can help reduce technical debt by describing exceptions and or confusing code.

Unit Test for exceptions and or confusing code

We have all come across code where we think WTF! and then spend hours refactoring and or trying to determine why it does what it does.

For example, in France certain types of furniture have different VAT rates: a chaiselong and a sofa have different VAT rates.
Therefore, having a test called EnsureChaiselongAndSofaHasdifferentVatRateInFrance will help explain/document why this complexity and or strange functionality is in the code.

Now whilst it does not directly reduce the technical debt and or code size, it helps explain this code and therefore reduces the cost of maintaining the code.

Having a test that confirms the strange code is in fact correct and required, and why it is required, has value and will reduce maintenance cost and future bugs.
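A minimal sketch of such a test, written without a test framework so it stands on its own (in practice it would carry an xUnit [Fact] or NUnit [Test] attribute). The types are hypothetical and the rates are placeholders for illustration, not real French VAT rates.

```csharp
using System;

public enum FurnitureType { Sofa, Chaiselong }

public static class FrenchVatRates
{
    // Placeholder values only; look up the real rates before using this.
    public static decimal For(FurnitureType type)
    {
        return type == FurnitureType.Chaiselong ? 0.10m : 0.20m;
    }
}

public static class VatRateTests
{
    // The test name documents the business rule that looks odd in the code.
    public static void EnsureChaiselongAndSofaHasdifferentVatRateInFrance()
    {
        decimal sofa = FrenchVatRates.For(FurnitureType.Sofa);
        decimal chaiselong = FrenchVatRates.For(FurnitureType.Chaiselong);

        if (sofa == chaiselong)
        {
            throw new InvalidOperationException(
                "A chaiselong and a sofa must have different VAT rates in France.");
        }
    }
}
```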

How do you ensure that each PBI raises the quality of the code?

Ensure that there is a code removal task for each Product Backlog Item.

This ensures that everyone involved in the project is aware that it takes time to identify and then remove redundant code and it is an essential part of all new development/modifications.

There should always be a task defined for every PBI, even where it is 100% new functionality and there is definitely no redundant code to be removed.

Therefore, the premise is that the team must prove and establish for every PBI that there is no redundant code.

Reduce Technical Debt Part 2 – Empty Try Catch

Here is the second post in the series on how to reduce technical debt. Please read part one, as it gives an insight into the scale and challenges we faced, and outlines what this blog post is trying to address.

As you are aware, the first part introduced a few code examples to help remove redundant code. This post will continue to focus on how to remove redundant code by introducing the EmptyTryCatchService class and the IgnoreEmptyTryCatch custom attribute.

But before that I just briefly want to mention integrations; in my experience this is where a lot of redundant and or unnecessary code can hide.

Integrations

Therefore, an important concept to reduce technical debt, is to identify, separate and isolate dependencies on external systems, especially complex and or legacy systems.

I have already written a blog series about this, so if you missed it, please read it.

Integrations Platform

I believe in an ideal world, most integrations, and especially complex and or legacy system specific code, should be moved out of the website solution to an integration platform!

Most issues, difficulties, problems and costs relating to code maintenance and technical debt for websites are due to the website being responsible for stuff it should not be.

For example, the website is responsible for aggregating data from several systems to provide a unified view of their data. NO, this is the job of an integration/aggregation platform.

Empty Try Catch

So, let me start by stating – ignoring exceptions is a bad idea, because you are silently swallowing an error condition and then continuing execution.

Occasionally this may be the right thing to do, but often it’s a sign that a developer saw an exception, didn’t know what to do about it, and so used an empty catch to silence the problem.

It’s the programming equivalent of putting black tape over an engine warning light.

It’s best to handle exceptions as close as possible to the source, because the closer you are, the more context you have to do something useful with the exception.

Ignore Empty Try Catch – Custom attribute

In some rare cases an empty try catch can be valid. In that case you can use the custom attribute to mark the function and explain why it is OK. But check one last time whether there is a TryParse version of the function and or code you are calling.
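The attribute itself is not shown in the post, so the following is only a guess at what a minimal version could look like. The Reason property, the attribute targets and the TempFileCleaner example are all assumptions.

```csharp
using System;
using System.IO;

// Marks a method whose empty catch block is intentional and says why.
[AttributeUsage(AttributeTargets.Method | AttributeTargets.Constructor)]
public sealed class IgnoreEmptyTryCatchAttribute : Attribute
{
    public IgnoreEmptyTryCatchAttribute(string reason)
    {
        Reason = reason;
    }

    public string Reason { get; }
}

public static class TempFileCleaner
{
    [IgnoreEmptyTryCatch("Best-effort cleanup; a leftover temp file is harmless")]
    public static void TryDelete(string path)
    {
        try
        {
            File.Delete(path);
        }
        catch (IOException)
        {
            // Intentionally empty, see the attribute above.
        }
    }
}
```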

Performance

Slightly off topic, but still a type of technical debt: do not use exceptions for program flow!

Throwing exceptions is expensive (the runtime has to capture the call stack and unwind it to find a handler), so it can have a big impact on performance.

I have seen sites brought to their knees because of the number of exceptions being thrown.
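A small sketch of the difference, using integer parsing as the example. The class and method names are hypothetical; int.Parse and int.TryParse are the standard .NET methods.

```csharp
using System;

public static class QuantityParser
{
    // Exception as program flow: every bad input pays for a throw and catch.
    public static int ParseWithException(string value, int fallback)
    {
        try
        {
            return int.Parse(value);
        }
        catch (FormatException)
        {
            return fallback;
        }
    }

    // Same outcome for malformed input, but no exception is thrown.
    public static int ParseWithTryParse(string value, int fallback)
    {
        return int.TryParse(value, out int result) ? result : fallback;
    }
}
```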

Redundant Code

In the solution we took over there were over 300 empty try-catch statements ☹

But how can it hide redundant code?

When an exception is thrown it can jump over lots of code, which is therefore never called.

Therefore, all the code after the exception is redundant.

Below is the classic Hello World program. It works as expected: it prints out “Hello World”.

But there is a lot of technical debt. Now this might look like a funny example, but I have seen a lot of similar examples in the real world, usually with a lot more code in the try catch, and most often around big complex integrations!
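The original listing is not reproduced here. As a hypothetical reconstruction of the kind of code described above, the following still produces “Hello World”, but only because an exception silently skips the rest of the try block:

```csharp
using System;

public static class HelloWorld
{
    public static string BuildGreeting()
    {
        string greeting = "Hello World";
        try
        {
            string legacyValue = null;
            greeting = legacyValue.Trim();          // throws NullReferenceException
            greeting = greeting.ToUpperInvariant(); // never reached: redundant code
            greeting += " from the legacy system";  // never reached: redundant code
        }
        catch
        {
            // Empty catch: the error is swallowed and nobody notices.
        }

        return greeting;
    }
}
```

Calling `Console.WriteLine(HelloWorld.BuildGreeting())` prints the expected text, which is exactly why the dead code in the try block goes unnoticed.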

Solution – EmptyTryCatchService

For empty try catches I would not recommend that you use Sitecore’s standard logging, as it can create enormous log files, which is enough to kill your Sitecore solution if the empty try catch is called a lot.

For tracking down empty try catches, it is good to have a dedicated log file and a way to limit the amount of data written to the log file.

EmptyTryCatchService class provides the following features:

Report interval – the number of occurrences of an exception with the same owner, name and exception message between report entries written to the log file.
Max log limit – when the number of exceptions with the same owner, name and exception message exceeds this limit, no more data is written to the log file.
Dedicated log file for each day
Disable all logging via configuration.

EmptyTryCatchService is a simple class that relies on MaxUsageLog for most of its functionality (see the code below).

In addition to finding redundant code the EmptyTryCatchService will track down hidden errors and problems in your solution, which will result in a reduction of the technical debt.

You must be careful when reviewing the exceptions logged and deciding how best to deal with the exceptions. See part 3 in the series, to reduce technical debt.

public class EnsureIsObsoleteService
{
private readonly MaxUsageLog _maxUsageLog =
new MaxUsageLog(10000, "EnsureIsObsoleteService",1000);
public void EnsureIsObsolete(object owner, string message)
{
_maxUsageLog.Log(owner, message);
}
}
public class MaxUsageLog
{

public MaxUsageLog(int maxLogLimit,
string fileNamePrefix,
int reportCountInterval=1000000)
{
_maxLogLimit = maxLogLimit;
_fileNamePrefix = !string.IsNullOrEmpty(fileNamePrefix) ? fileNamePrefix : "MaxUsageLog";
_reportCountInterval = reportCountInterval;
}

// Guards the shared Usage dictionary, which is not thread-safe on its own
private static readonly object UsageLock = new object();

public void Log(object owner, string message, Exception ex = null)
{
if (!IsEnabled())
return;

string type = string.Empty;
if (owner != null)
{
// owner can be a Type (static callers) or an instance
type = owner is Type typeObj ? typeObj.FullName : owner.GetType().FullName;
}
string key = GenerateKey(type, message, ex);
lock (UsageLock)
{
if (!ShouldLog(owner, key))
return;
var count = Count(key);
WriteToFile(owner, type, message, ex, count);
}
}

private int Count(string key)
{
return Usage.ContainsKey(key) ? Usage[key] : 0;
}

private void WriteToFile(object owner, string type, string message, Exception exceptionToLog, int count)
{
try
{
StreamWriter log = File.Exists(FileName) ? File.AppendText(FileName) : File.CreateText(FileName);
try
{
log.AutoFlush = true;
log.WriteLine($"{DateTime.Now.ToUniversalTime()}: Type:'{type}' Message:'{message}' Count:{count}");
if (exceptionToLog != null)
{
log.WriteLine($"Exception:{exceptionToLog}");
}
log.Close();
}
finally
{
log.Close();
}
}
catch (Exception ex)
{
if (!Sitecore.Configuration.ConfigReader.ConfigutationIsSet)
return;
Sitecore.Diagnostics.Log.Error(
$"Failed writing log file {FileName}. The following text may be missing from the file: Type:{type} Message:{message}",
ex, owner);
}
}
private bool ShouldLog(object owner, string key)
{
if (!Usage.ContainsKey(key))
{
Usage.Add(key, 1);
return true;
}
var count = Usage[key] = Usage[key] + 1;

if (count % _reportCountInterval == 0)
{
WriteToFile(owner, "******** Report Count Interval ******", $"Key:'{key}'", null,count);
}

if (count > _maxLogLimit)
return false;
if (count == _maxLogLimit)
{
WriteToFile(owner, "******** Usage Max Exceeded ******", $"Key:'{key}' Max Limit:{_maxLogLimit}",null,count);
return false;
}
return true;
}
private string GenerateKey(string type, string message, Exception ex)
{
return ex != null ?
$"{_fileNamePrefix}_{type}_{message}_{ex.HResult}" :
$"{_fileNamePrefix}_{type}_{message}";
}

private string FileName
{
get
{
DateTime date = DateTime.Now;
string fileName = $@"\{_fileNamePrefix}.{date.Year}.{date.Month}.{date.Day}.log";

if (!Sitecore.Configuration.ConfigReader.ConfigutationIsSet)
return ConfigurationManager.AppSettings[Constants.Configuration.Key.LogFolderForApplications] + fileName;

return Sitecore.MainUtil.MapPath(Sitecore.Configuration.Settings.LogFolder) + fileName;
}
}

private bool IsEnabled()
{
if (!Sitecore.Configuration.ConfigReader.ConfigutationIsSet)
return StringToBool(ConfigurationManager.AppSettings[Constants.Configuration.Key.MaxUsageLogEnabled],false);

return Sitecore.Configuration.Settings.GetBoolSetting(Constants.Configuration.Key.MaxUsageLogEnabled, true);
}

private bool StringToBool(string value, bool defaultValue)
{
if (value == null)
return defaultValue;
bool result;
if (!bool.TryParse(value, out result))
return defaultValue;
return result;
}

private readonly int _maxLogLimit;
private readonly string _fileNamePrefix;
private readonly int _reportCountInterval;

// this is to ensure we can count how many times a message has been logged across all threads
private static readonly Dictionary<string, int> Usage = new Dictionary<string, int>();
}
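The listing above shows EnsureIsObsoleteService rather than EmptyTryCatchService. Following the same shape, a sketch of EmptyTryCatchService could look like the code below. This is an assumption, not the original class: the limits are simply copied from EnsureIsObsoleteService, and LegacyImporter is a hypothetical caller.

```csharp
using System;

public class EmptyTryCatchService
{
    // Same limits as EnsureIsObsoleteService above; tune them for your solution.
    private readonly MaxUsageLog _maxUsageLog =
        new MaxUsageLog(10000, "EmptyTryCatchService", 1000);

    public void Log(object owner, string name, Exception ex)
    {
        _maxUsageLog.Log(owner, name, ex);
    }
}

public class LegacyImporter
{
    private readonly EmptyTryCatchService _emptyTryCatch = new EmptyTryCatchService();

    public void Import(Action importStep)
    {
        try
        {
            importStep();
        }
        catch (Exception ex)
        {
            // Previously an empty catch; now the swallowed exception is recorded.
            _emptyTryCatch.Log(this, nameof(Import), ex);
        }
    }
}
```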

Hope this was of help, Alan