Tag Archives: SolR

Sitecore SolR Sorting Challenge

As I promised in my last post (please read it first) here is a solution to address the SolR sorting issues.

The Problem

The issue is that different pages, usually have different date fields to represent how they should be sorted and if we want to adhere to the Helix principles, the Solr feature must NOT KNOW ABOUT PAGE TYPES.

For example, a news page will have a news date, calendar event might use the start date and an some page will not have a date field and therefore will have to use created and or updated.

Typically, I see solutions that deal with this issue at retrieval time i.e. index all the different fields and then have a specific “order by” clause for each page type.

The biggest disadvantages of this approach is that you cannot sort a list with different page types i.e. get the 10 latest items that are either news, event or articles.
In addition, you have to manage all the different order by clauses. Which will destroy the Indexing/SolR abstraction as you will have to expose the IQueryable<T> in order to apply the order by clause.

Solution

I prefer to deal with the sorting issue at indexing time and have a single dedicated SolR field which is used to sort all item types. This allows you to sort news, articles, calendar events, etc. in the same way.

You still must deal with the issue that the SolR implementation should not know about which field to use for a give item type. To overcome this issue we use a configuration file that defines the mapping between an item of a specific type and which field to use for sorting.

Template to Field Mapping

The following configuration defines which field should be stored for sorting for each item template, if a field mapping is not defined, the item updated value is used.

In sitecore,  it is easy to map the configuration below to a C# class (i.e. SortFieldMappingRepository) for more information, about how to do this see my blog post on Structured, Type Safe Settings in Sitecore.

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:environment="http://www.sitecore.net/xmlconfig/environment" xmlns:role="http://www.sitecore.net/xmlconfig/role/">
	<sitecore>
		<feature>
			<SolRIndexing>
				<SortFieldMappingRepository type="Feature.SolRIndexing.Infrastructure.ComputedFields.Sorting.SortFieldMappingRepository, Feature.SolRIndexing" singleInstance="true">
					<mappings hint="raw:Add">
						<!--News, NewsDate -->
						<sortFieldMapping templateId="{AE6B4DF2-DF36-4C6D-ABDA-742EE6B85DE9}" sortFieldIdId="{3D43D709-DFAE-4B4F-8CB2-DF80D9B83857}"/>
						<!-- Calander, StartDate-->
						<sortFieldMapping templateId="{A8DD1F59-08AB-4BF0-BE76-8873A8F00628}" sortFieldIdId="{6369AC75-036B-48D8-95E2-F16998F8E777}"/>
						<!-- Video, VideoDate -->
						<sortFieldMapping templateId="{3D9D8B7A-FCB2-459B-908B-1E31F0C975FB}" sortFieldIdId="{E9993C21-1EF0-4C30-83D4-5F69923CEC3E}"/>
						<!-- Article, ModifiedDate Field -->
						<sortFieldMapping templateId="{F6B599F4-11C4-4C65-B253-95F3C40EBA18}" sortFieldIdId="{DC6C0E49-1705-4F3E-80EF-83176E482DBC}"/>
					</mappings>
				</SortFieldMappingRepository>
			</SolRIndexing>
		</feature>
	</sitecore>
</configuration>

Define the SolR index Field

Then we define the SolR index field used for sorting and specify that the SortComputedIndexField class is responsible for adding the sort date to the index.

<sitecore>
	<contentSearch>
		<indexConfigurations>
			<defaultSolrIndexConfiguration>
				<documentOptions>
					<fields hint="raw:AddComputedIndexField">
							<!-- Sorting-->
							<field fieldName="_sort" returnType="datetime" >Feature.SolRIndexing.Infrastructure.ComputedFields.Sorting.SortComputedIndexField, Feature.SolRIndexing</field>

					</fields>
				</documentOptions>
			</defaultSolrIndexConfiguration>
		</indexConfigurations>
	</contentSearch>
</sitecore>

The SortComputedIndexField class is responsible for providing the value for the sort field and it calls the CalculateSortDateService to determine the sort value.

namespace Feature.SolRIndexing.Infrastructure.ComputedFields.Sorting
{
    public class SortComputedIndexField : AbstractComputedIndexField
    {
        private readonly CalculateSortDateService _calculateSortDateService;

        public SortComputedIndexField(CalculateSortDateService calculateSortDateService)
        {
            _calculateSortDateService = calculateSortDateService;
        }

        public SortComputedIndexField()
        {
            _calculateSortDateService = ServiceLocator.ServiceProvider.GetRequiredService<CalculateSortDateService>();
        }

        public override object ComputeFieldValue(IIndexable indexable)
        {
            Item item = indexable as SitecoreIndexableItem;
            if (item == null)
                return null;

            if (!item.Paths.FullPath.StartsWith(Constants.SitecoreContentRoot))
                return null;
            return _calculateSortDateService.CalculateSortDate(item);
        }
    }
}

The CalculateSortDateService class iterates over the field mappings, defined in the configuration and uses the field value for the date if the field is found, otherwise the updated value for the item is used.

namespace Feature.SolRIndexing.Infrastructure.ComputedFields.Sorting
{
    public class CalculateSortDateService
    {
        private readonly SortFieldMappingRepository _sortFieldMappingRepository;

        public CalculateSortDateService([NotNull]SortFieldMappingRepository sortFieldMappingRepository)
        {
            Assert.ArgumentNotNull(sortFieldMappingRepository, nameof(sortFieldMappingRepository));
            _sortFieldMappingRepository = sortFieldMappingRepository;
        }

 
        public DateTime CalculateSortDate([NotNull] Item item)
        {
            Assert.ArgumentNotNull(item, nameof(item));
            var mappings = _sortFieldMappingRepository.Get();
            if (mappings == null)
                return item.Statistics.Updated;

            foreach (var sortFieldMapping in mappings.Where(m => m != null))
            {
                if (item.TemplateID != sortFieldMapping.TemplateId)
                    continue;

                Field dateField = item.Fields[sortFieldMapping.SortFieldId];
                if (dateField == null || string.IsNullOrWhiteSpace(item[sortFieldMapping.SortFieldId]))
                    continue;

                return new DateField(dateField).DateTime;
            }
            return item.Statistics.Updated;
        }
    }
}

Sorting Extensions

The last part is to provide the ability to sort the result set and for this we introduce the SortDateSearchResultItem class and a few extensions methods to add sort ascending & descending.

namespace Feature.SolRIndexing.Infrastructure
{
    public class SortDateSearchResultItem : SearchResultItem
    {
        [IndexField("_sort")]
        [DataMember]
        public virtual DateTime SortDate { get; set; }
    }
}

namespace Feature.SolRIndexing.Infrastructure.ComputedFields.Sorting
{
    public static class SortingQueryableExtensions
    {
        public static IQueryable<T> SortDescending<T>(this IQueryable<T> query) where T : SortDateSearchResultItem
        {
            return query.OrderByDescending(item => item.SortDate);
        }
        public static IQueryable<T> SortAscending<T>(this IQueryable<T> query) where T : SortDateSearchResultItem
        {
            return query.OrderBy(item => item.SortDate);
        }
    }
}

I hope this post will help, Alan

SolR

Introduce a (SolR) Sitecore Search Abstraction

After my previous post on Supporting Integrations, I received a few comments asking why was SolR was in the integration’s module group, as it is part of the sitecore API.

In this blog i will explain why and in more detail how to isolate a SolR integration.

Yes Sitecore Search is part of the Sitecore API, but it relies on an 3rd party system! Please read my previous post about why you need to identify, separate and isolate modules with external dependencies, as Sitecore Search API faces exactly the same challenges.

With the bonus that there are 3 supported implementations (Lucene, SolR and Azure Search) which are almost the same, but not quite!

Sitecore Search Issues

In most of the helix-based solution I have seen indexing is implemented in the framework layer which provides some helper extensions. Then each feature uses indexing module with Sitecore Search API to implement their requirements. This typically leads to the following issues:

  • Duplicated code across features
  • No clear definition of the indexing/constraints/sorting requirements for the solution.
  • Non-consistent implementation across the solution i.e. Predicate builder vs LINQ.
  • Optimization is difficult.

With each feature implement their indexing requirements, it leads to duplicated code as it feature needs to build the query to add sitecore root item, base templates, language etc. for each request, before adding the feature specific part of the query.

Therefore when fixing a bug or performance issues you must track down all the places where Search is used and then determine if they require the same fix and or the optimization.

How to abstract away the SolR Search Implementation

  • Identify the indexing requirements
    • Introduce an abstraction in the foundation layer (Indexing).
  • Create the implementation (Solr Indexing) that implements the abstraction define by Indexing in the foundation layer.
    • Address the sorting issues (i.e. different items templates have different date fields)
  • Let the features use the indexing abstractions (i.e. Course, News, Calendar, etc.)

Identify the indexing requirements

There are 3 main components to define the indexing requirements constraints, pagination & sorting.

Constraints

Constraints define what the filters can be applied to reduce the number of items that are returned. In this example it will be possible to apply the following constraints:

  • Location in tree sitecore (i.e. site specific news folder, all content, etc.)
  • Language (i.e. return items with an English language version)
  • Template, i.e. does the item inherit from a specific template (i.e. news, calendar, etc.)
  • Taxonomy – return items based on their categorization (i.e. football, skiing, etc.)

Pagination

Defines the number of search result per page and which page you require.

Sorting

Is responsible for defining what is used to sort the result items and the direction (ascending or descending), for example using date to get the 10 latest news.

If you want to sort by date, one challenge is to determine how to sort he results, as different pages will have different fields. Some pages have no date apart from created/updated, news normally has a specific news date and calendar events have start/end dates.

The SolR implementation must NOT KNOW ABOUT PAGE TYPES, see my blog post with a solution.

The following code defines the indexing requirements.

public interface IConstraint
{
    Item RootItem { get; }
    Language Language { get; }
    ID BaseTemplate { get; }
    IEnumerable<Category> Categories { get; }
}
public interface IPagination
{
    int Number { get; }
    int Size { get; }
}
public enum SortDirection
{
    Ascending,
    Descending
}

Then we need to define the result of making a search and a repository to make the search

public interface IPagedSearchResult
{
    IEnumerable<Item> Results { get; }
    IPagination Pagination { get; }
    int TotalHits { get; }
    bool HasMoreResults { get; }
}
public interface IPagedSearchResultRepository
{
    IPagedSearchResult Get([NotNull] IConstraint searchConstraint, [NotNull] IPagination pagination, SortDirection sortDirection);
}

The definition of the search result could of been type safe, i.e. return a model of type T instead of the Item, but I wanted to keep the example simple and not use a specific binding framework.

Anyway I hope this post will help, Alan