Category Archives: SolR

Untangling the Sitecore Search LINQ to SolR queries

Problem

It can be very difficult to identify why you do not get the search results you expected from Sitecore Search, but there is a simple way to help untangle what is going on.

Solution

It is possible to see the query that Sitecore generates and sends to SolR and then use the query on the SolR instance to see what data is returned to Sitecore.

This is such a huge help when trying to understand why your queries do not work!

Step 1 – Find the Query that was sent to SolR from Sitecore

Sitecore logs all the queries it sends to SolR in the standard sitecore log folder, look for files named Search.log.xxx.yyy.txt .

Step 2 – Execute the query in your SolR instance

Go to your Solr instance, and use the core selector drop down to select the index your Sitecore Search query is being executed against.

Select Query, from the menu

Then paste the query from the sitecore log, and you can see the result that is returned to Sitecore.

This has helped me a lot, so I hope this helps others untangling their search results using Sitecore Search 🙂

 

 

 

How IQueryable and Take can kill your Sitecore Solution

We had a solution that had serve performance issue when it got a lot of visitors. Sitecore was casting the following exception and SolR had a similar errors in its logs:

Unable to read data from the transport connection: The connection was closed

We identified that the problem was caused by hitting the network bandwidth in Azure!
Yes, there were a lot of visitors, but enough to hit the bandwidth limit, the customer upgraded the plan to get more network bandwidth, but still the issues continued.

But what could cause this issue?

I started to review the SolR implementation and found the issue quite quickly.

return IQueryable<Result>
            .Where(result => result.Date < DateTime.UtcNow)
            .OrderByDescending(result => result.Date)
            .GetResults()
	    .Take(count)
            .ToList();

The Take() was made after GetResults() was called, so the entire data set is returned to Sitecore from SolR, then the take was applied to get the top 5 results.

This simple mistake was what caused all the network and performance issues.

Solution

return IQueryable<Result>
            .Where(result => result.Date < DateTime.UtcNow)
            .OrderByDescending(result => result.Date)
	    .Take(count)
            .GetResults()
            .ToList();

It was a simple fix (in 150+ places) to move the Take before GetResults!

This is why I believe that you should always Introduce a (SolR) Sitecore Search Abstraction, please read my post on this very subject, instead of returning the IQueryable interface.

Hope this helps, Alan

 

SolR

Introduce a (SolR) Sitecore Search Abstraction

After my previous post on Supporting Integrations, I received a few comments asking why was SolR was in the integration’s module group, as it is part of the sitecore API.

In this blog i will explain why and in more detail how to isolate a SolR integration.

Yes Sitecore Search is part of the Sitecore API, but it relies on an 3rd party system! Please read my previous post about why you need to identify, separate and isolate modules with external dependencies, as Sitecore Search API faces exactly the same challenges.

With the bonus that there are 3 supported implementations (Lucene, SolR and Azure Search) which are almost the same, but not quite!

Sitecore Search Issues

In most of the helix-based solution I have seen indexing is implemented in the framework layer which provides some helper extensions. Then each feature uses indexing module with Sitecore Search API to implement their requirements. This typically leads to the following issues:

  • Duplicated code across features
  • No clear definition of the indexing/constraints/sorting requirements for the solution.
  • Non-consistent implementation across the solution i.e. Predicate builder vs LINQ.
  • Optimization is difficult.

With each feature implement their indexing requirements, it leads to duplicated code as it feature needs to build the query to add sitecore root item, base templates, language etc. for each request, before adding the feature specific part of the query.

Therefore when fixing a bug or performance issues you must track down all the places where Search is used and then determine if they require the same fix and or the optimization.

How to abstract away the SolR Search Implementation

  • Identify the indexing requirements
    • Introduce an abstraction in the foundation layer (Indexing).
  • Create the implementation (Solr Indexing) that implements the abstraction define by Indexing in the foundation layer.
    • Address the sorting issues (i.e. different items templates have different date fields)
  • Let the features use the indexing abstractions (i.e. Course, News, Calendar, etc.)

Identify the indexing requirements

There are 3 main components to define the indexing requirements constraints, pagination & sorting.

Constraints

Constraints define what the filters can be applied to reduce the number of items that are returned. In this example it will be possible to apply the following constraints:

  • Location in tree sitecore (i.e. site specific news folder, all content, etc.)
  • Language (i.e. return items with an English language version)
  • Template, i.e. does the item inherit from a specific template (i.e. news, calendar, etc.)
  • Taxonomy – return items based on their categorization (i.e. football, skiing, etc.)

Pagination

Defines the number of search result per page and which page you require.

Sorting

Is responsible for defining what is used to sort the result items and the direction (ascending or descending), for example using date to get the 10 latest news.

If you want to sort by date, one challenge is to determine how to sort he results, as different pages will have different fields. Some pages have no date apart from created/updated, news normally has a specific news date and calendar events have start/end dates.

The SolR implementation must NOT KNOW ABOUT PAGE TYPES, see my blog post with a solution.

The following code defines the indexing requirements.

public interface IConstraint
{
    Item RootItem { get; }
    Language Language { get; }
    ID BaseTemplate { get; }
    IEnumerable<Category> Categories { get; }
}
public interface IPagination
{
    int Number { get; }
    int Size { get; }
}
public enum SortDirection
{
    Ascending,
    Descending
}

Then we need to define the result of making a search and a repository to make the search

public interface IPagedSearchResult
{
    IEnumerable<Item> Results { get; }
    IPagination Pagination { get; }
    int TotalHits { get; }
    bool HasMoreResults { get; }
}
public interface IPagedSearchResultRepository
{
    IPagedSearchResult Get([NotNull] IConstraint searchConstraint, [NotNull] IPagination pagination, SortDirection sortDirection);
}

The definition of the search result could of been type safe, i.e. return a model of type T instead of the Item, but I wanted to keep the example simple and not use a specific binding framework.

Anyway I hope this post will help, Alan