Category Archives: Performance

Sitecore and Azure Durable Functions

In this post I will show how Azure Durable functions can complement your sitecore solution and help enhance performance.

Problem

We took over a Sitecore solution and its content management server was running very slowly and intermittently the sitecore client would be unresponsive and crash

The problem was caused a number a lot of CPU/Data/Bandwidth intensive schedule tasks that were running to retrieve a wide range of data from a number of web services, then aggregate the data and perform complicated calculations, of which a small sub set of the result were presented on the website.

Solution

As the solution was already hosted in Azure, a perfect solution was to off load the heavy lifting from the Content Management server to Azure Functions, to do the data retrieval, calculations and provide the results for the website. Firstly, a very brief overview of the pro’s and con’s of Azure functions.

Pro’s

  • Serverless execution model
  • Dynamic Scaling
  • Micro pricing
  • Security
  • Wide range of triggers
    • Https, Timer (CRON), Azure storage changes, Azure Queue, Message from Service bus, etc.

Con’s

  • Stateless
  • Execution time limit (default 5 mins, max 10)
  • Concurrency

The main challenge with Azure functions is that most of the schedule tasks could take more than 10 minutes to complete and require state management. But not to worry as Azure Durable Functions came to the rescue.

Azure Durable Functions

Durable Functions are an extension of Azure Functions and Azure WebJobs that lets you write stateful functions in a serverless environment. The extension manages state, checkpoints, and restarts for you, so it is possible to implement code that run for a long time.

In addition if an Azure function fails, for example the web request times out, you can define if the durable function should wait and retry X times, before failing. Behind the scenes, the Durable Functions extension is built on top of the Durable Task Framework, an open-source library on GitHub for building durable task orchestrations.

Advantages of Durable Functions

  • They define workflows in code. No JSON schemas or designers are needed.
  • They can call other functions either synchronously or asynchronously.
  • Output from called functions can be saved to local variables.
  • They automatically checkpoint their progress whenever the function awaits.
  • Local state is never lost, even if the process recycles or the VM reboots.
  • Easy to Unit Test
  • Can run for a very long time, in theory forever
  • Cost effective, as you do not pay for execution time whilst waiting for tasks to complete.

Here is a brief introduction to the most common Durable Functions patterns

Pattern 1 – Function chaining

Function chaining refers to the pattern of executing a sequence of functions in a particular order. Often the output of one function needs to be applied to the input of another function.

function chaining

The code below is an example of how you would achieve this

chaining code

Pattern 2 – Fan-out/fan-in

Fan-out/fan-in refers to the pattern of executing multiple functions in parallel, and then waiting for all to finish. Often some aggregation work is done on results returned from the functions. This is perfect when you want to do a lot of things in parallel, to reduce the time taken to complete the task and then aggregate/process all the results.

Below is an example of how the code could look

Pattern 3 – Monitoring

The monitor pattern refers to a flexible recurring process in a workflow – for example, polling until certain conditions are met. A regular timer-trigger can address a simple scenario, such as a periodic clean-up job, but its interval is static and managing instance lifetimes becomes complex. Durable Functions enables flexible recurrence intervals, task lifetime management, and the ability to create multiple monitor processes from a single orchestration.

An example could be instead of exposing an endpoint for an external client to monitor a long-running operation, the long-running monitor consumes an external endpoint, waiting for some state change. See the example below.

Is Replaying

One thing that catches people out is that the code is re-run from the start of the function after each await completes, therefore for example with Logging and other code you need to check for IsReplaying so you only log once.

Durable Functions – Orchestrator code constraints

There are a number code constraints, that must be adhered to when using Durable function orchestration.

  • Code must be deterministic.
    • It will be replayed multiple times and must produce the same result each time.
    • For example, no direct calls to get the current date/time, get random numbers, generate random GUIDs, or call into remote endpoints.
  • Non-deterministic operations must be done in activity functions
    • This includes any interaction with other input or output bindings. This ensures that any non-deterministic values will be generated once on the first execution and saved into the execution history. Subsequent executions will then use the saved value automatically.
  • Orchestrator code should be non-blocking.
    • For example, that means no I/O and no calls to Thread.Sleep or equivalent APIs
    • Orchestrator code must never initiate any async operation, except by using the IIDurableOrchestrationContext API.
    • For example, no Task.Run, Task.Delay or HttpClient.SendAsync.
    • The Durable Task Framework executes orchestrator code on a single thread and cannot interact with any other threads that could be scheduled by other async APIs.
  • Infinite loops should be avoided
    • Because the Durable Task Framework saves execution history as the orchestration function progresses, an infinite loop could cause an orchestrator instance to run out of memory.
    • For infinite loop scenarios, use APIs such as ContinueAsNew to restart the function execution and discard previous execution history.

Result

By migrating all the long running CPU/data/bandwidth intensive tasks to Azure Durable Functions, the performance of the Sitecore solution went from painful to fantastic.

Unfortunately it is very common that Sitecore solutions assume responsibility for task that are not the websites responsibility, but pairing with Azure functions can help mitigate this issue.

An additional benefit was that the website was isolated/protected from 3rd party system changes, as when an external system changes only the Azure functions had to be modified and deployed – therefore no down time for the sitecore solution.

Anyway I hope sitecore develops will consider Azure functions to enhance their sitecore solutions.

 

Sitecore client and logon is very slow (properties table AGAIN)

The problem

I was at a customer to help identify why their Sitecore 7.2 client and logon was so slow!

Investigation

There are a number of things I look at when the Sitecore client starts to run slowly:

  1. Slow Event handlers (save, rename, create, etc)
  2. Slow pipeline processors (usually don’t check that it is a sitecore specific requests)
  3. History, Publish queue or Event queue have to many entries, see my blog on how to fix this.
  4. Property change events flooding the Event Queue in the Core database, see my blog on how to fix this

But after ensuring all the previous issues were not the problem, I found a new issue with the properties table.

Properties Table flooded with SC_Ticket entries

The properties table in the core database had over 500000 entries, it was filled with SC_TICKET_xxx entries.

tickets

Unfortunately the properties table does not have a created date column, so I could not write an SQL script to purge all the entries that where more than X days old.

I noticed that in the value column there was a time-stamp embedded in the Value field. My initial solution was to could create an sitecore agent to do the following:

  1. iterate over all the entries in the properties table
  2. Parse the value for SC_Ticket entries
  3. Remove all the entries that were older than X days.

I knew Sitecore must have a class, which had created all these entries. So using DotPeek I started my search and found the TicketManager class. The TicketManager even had a IsTicketExpired function.

is expired function

Solution

I found that there is already a Sitecore agent that checks for any tickets that are expired and removes them. It is called the CleanupAuthenticationTicketsAgent for some reason this was not in the web.config, but it is easy enough to add see below.

agent

But the important step is to reduce the number of days to keep the tickets as the default is 180. The Authentication.ClientPersistentLoginDuration setting is responsible for determining how long before the ticket should expire (see the IsTicketExpired function in the image above).

I set Authentication.ClientPersistentLoginDuration to 5 and it reduced the number of entries in the properties table to around 500, and then sitecore client and logon was much faster.

Prior to writing this post I wasn’t aware but the is a blog about how sitecore sessions can expire.

Sitecore – property change events flooding the Event Queue in the Core database

I had an issue that the EventQueue table in the Core database which was being flooded with events, which in turn was responsible for causing bad performance.

This was caused by the fact that each time the Lucene indexes are updated Sitecore raises the database:propertychanged event.

event queue core datbase

 

The default implementation subscribes to the event and adds it to the EventQueue table in the core database. I set the “Days To Keep” events to 1 day – but it still inserts over 100000 events a day 😦

Sitecore have registered this as a bug, but as yet there is no fix for 7.5 or 8.

Solution

This is not very nice at all, but  it was the best we could do as a workaround for this issue. The credit for this solution must go to my colleague Peter Wind (@peterwind) who is created the hack!

Use reflector to get the code for the default Sitecore.Eventing.Remote.RemoteEventMap class (defined in Sitecore.Kernel.dll), and create your own class, see below.

own class

Modify the SetupGlobalEventSubscribers function, so you subscribe to the database:propertychanged event – but do not add the event to the EventQueue, see below.

code change

The last step is to update the configuration in the web.config to use your new and improved class.

config

note: I have the support case number in the namespace & DLL name – so when the bug is fixed, I can remove this code and in addition any other developer at Pentia can see the history related to this support issue and why I have done such a nasty fix.

Anyway I hope this helps somebody out there 🙂

 

Sitecore Fast Query Syntax – Can kill your SQL Server or website

Over the years I have worked on lots of websites that have performance issue caused by sitecore queries that iterate over too many items, usually searching through descendants.

In development where there are not too many items the query performs OK. The site goes live and they start to add content and soon there are 1000’s of items and the queries becomes slower and slower over time. A typical example is searching for the 10 latest news, articles, blog’s etc.

At this point a fix is made by changing the queries to use the “fast:” syntax. There are a lot of articles already explaining this in detail, so this is just a brief intro, the fast syntax translates the query directly into an SQL server database query, and therefore for some queries it can perform faster, use less memory and less CPU.

But a warning it bypasses all caching that sitecore provides and make a request directly to the database every time the query is executed; in development this tends to outperform the standard queries that would have to iterate over a lot of items.

In development you usually test the normal query against the fast syntax query to establish if it is quicker and if it’s quicker you use it and everybody is happy… but that is not the full story.

Let’s assume each page generates 10-20 queries that rely on fast syntax to retrieve their items. Therefore each page request generates 10-20 calls to the SQL database, I’ve seen sites that generate 100’s of SQL request per page 😦

In development this would typically not be an issue and or not noticed, as only one page at a time is requested, but in a production on a website with lots of requests it can kill the SQL database and or slow the site down as sitecore itself cannot retrieve items form the database, as the SQL server is busy with all the fast queries.

Therefore you have to be VERY VERY CAREFUL with the use of fast queries.

I would say in 97% of cases – if your queries slow and it is caused by iterating over to many items the correct solution is to use Sitecore search i.e. lucene, Solr, Coveo or another indexer to retrieve the items.