Index dependencies in Sitecore 7.1

I came across an interesting set of challenges recently when working on a Sitecore 7.1 project (version 140130 to be precise) and using the ContentSearch indexing API.

Typically a Sitecore index is set up to crawl a subtree of items to pull in its data. Using the IntervalAsynchronousStrategy means that any change to an item within your specified subtree causes an incremental update to your index, keeping it up to date. But what if your index depends on items outside of this subtree? In one of our indexes we used some computed fields which pulled in data from items that referred to the item being indexed (using the Links database to find the references). These referrer items were always outside the index's subtree, so a modification to one of them would not update the index in question. (Actually it would, but only because of a bug which means the provided crawler ignores the root item set in the config; use the advice here to fix this!)
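
To make the dependency concrete, here is a minimal C# sketch of such a computed field, written against two hypothetical delegates rather than the real Sitecore API: `getReferrers` stands in for a Links database lookup and `getTitle` for reading a field from a referrer. In a real project this logic would sit inside a class implementing ContentSearch's `IComputableIndexField` interface; the class name `ReferrerTitlesField` and the choice of a title value are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: a computed value built from the items that *refer to* the indexed item.
// _getReferrers stands in for a Links database lookup; _getTitle for reading a field value.
public class ReferrerTitlesField
{
    private readonly Func<Guid, IEnumerable<Guid>> _getReferrers;
    private readonly Func<Guid, string> _getTitle;

    public ReferrerTitlesField(Func<Guid, IEnumerable<Guid>> getReferrers, Func<Guid, string> getTitle)
    {
        _getReferrers = getReferrers;
        _getTitle = getTitle;
    }

    // The referrers may sit anywhere in the content tree, including outside the index's crawl root,
    // so editing one of them does not, by itself, trigger an incremental update of this document.
    public List<string> Compute(Guid indexedItemId)
    {
        return _getReferrers(indexedItemId)
            .Select(_getTitle)
            .Where(title => !string.IsNullOrEmpty(title))
            .ToList();
    }
}
```

Nothing in this computation depends on where the referrer items live, which is exactly why subtree-based change tracking misses edits to them.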

The <indexing.getDependencies> pipeline initially looks like it might help here. Sitecore's documentation states that "This pipeline is designed to address issues when a search document is built from the data coming from more than one item." The idea is that when an item within an index is updated, you have the opportunity to flag that certain other items within the index need to be updated too. However, this is no use in our scenario: the item change we need to intercept happens to an item which is outside of the indexed subtree. We need almost the reverse: the ability to trigger an index update when a non-indexed item is modified. The <indexing.getDependencies> pipeline is probably of more benefit for parent -> child relationships within an index. For instance, if an item stores some data about its children in a computed field, a change to a child item could be caught by the pipeline, and we could request an index update for the parent item.
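
The parent -> child decision could be expressed along these lines. This is a minimal sketch under assumptions: `ParentDependencyResolver`, `getParentId` and `storesChildData` are hypothetical names, and a real processor would add the resulting items to the pipeline's arguments rather than return a list. It only helps because the changed child is itself part of the index, which is why it cannot cover the out-of-subtree referrers described above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a getDependencies-style decision for a parent -> child relationship:
// when an indexed child changes, also request re-indexing of its parent if the parent
// aggregates data about its children in a computed field.
public static class ParentDependencyResolver
{
    public static IList<Guid> GetDependencies(
        Guid changedItemId,
        Func<Guid, Guid?> getParentId,    // stand-in for reading the item's parent
        Func<Guid, bool> storesChildData) // stand-in for, e.g., a template check on the parent
    {
        var dependencies = new List<Guid>();
        Guid? parentId = getParentId(changedItemId);
        if (parentId.HasValue && storesChildData(parentId.Value))
        {
            dependencies.Add(parentId.Value);
        }
        return dependencies;
    }
}
```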

Luckily, ContentSearch provides an IndexCustodian utility class which makes it easy to trigger a standalone update of an indexed item (there's an UpdateIndex method). So let's hook into the good old item:saved event, check whether the item being saved is one relating to our index and, if it is, find the related index items (using the Links database) and trigger an index update on each of them. Note that IndexCustodian also has other useful methods, such as DeleteItem, PauseIndexing and ResumeIndexing (whose actions are hopefully self-explanatory).
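
A minimal sketch of that handler logic follows, with the Sitecore calls replaced by hypothetical delegates: `isAuxiliaryItem` for the check on the saved item, `getLinkedIndexItems` for the Links database lookup and `updateIndexItem` for the IndexCustodian call. It illustrates the flow only; it is not a drop-in Sitecore event handler.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the item:saved approach. All three delegates are stand-ins
// for Sitecore calls (template check, Links database lookup, IndexCustodian update).
public class AuxiliaryItemSavedHandler
{
    private readonly Func<Guid, bool> _isAuxiliaryItem;
    private readonly Func<Guid, IEnumerable<Guid>> _getLinkedIndexItems;
    private readonly Action<Guid> _updateIndexItem;

    public AuxiliaryItemSavedHandler(
        Func<Guid, bool> isAuxiliaryItem,
        Func<Guid, IEnumerable<Guid>> getLinkedIndexItems,
        Action<Guid> updateIndexItem)
    {
        _isAuxiliaryItem = isAuxiliaryItem;
        _getLinkedIndexItems = getLinkedIndexItems;
        _updateIndexItem = updateIndexItem;
    }

    public void OnItemSaved(Guid savedItemId)
    {
        // Ignore saves of items that have nothing to do with the index.
        if (!_isAuxiliaryItem(savedItemId))
        {
            return;
        }

        // Only the links that exist *after* the save are visible at this point.
        foreach (var indexItemId in _getLinkedIndexItems(savedItemId).Distinct())
        {
            _updateIndexItem(indexItemId);
        }
    }
}
```

Because the handler only sees the links present after the save, it inherits the weakness described in the next paragraph.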

Problem solved? Not quite, due to a subtlety which is not peculiar to indexing scenarios. The "auxiliary" item can be linked to zero or more index items via a TreeListEx field. The desired behavior is that when the auxiliary item is linked to a new index item, the index item's relevant computed fields are updated with extra data from the newly linked auxiliary item. Conversely, if the link to an index item is removed, the index item should have the linked data removed when the crawler recomputes the relevant fields. However, the item:saved event knows nothing of links which have been removed (it fires once the save has completed), so the index item continues to contain data from the auxiliary item which it should no longer have, at least until the index is rebuilt.

You could try using the item:saving event instead, which has access to the field values from both before and after the save. However, triggering an index update before the item is saved doesn't work either: the index item will update, but because the computed fields use the Links database to find related auxiliary items, and the save has not yet happened, the link has not yet been removed. The computed field therefore still pulls in data from the auxiliary item.
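
The gap can be shown with a little set arithmetic on the TreeListEx value, which Sitecore stores as pipe-delimited item IDs. In this sketch the `LinkChanges` class is hypothetical and IDs such as `{A1}` are shortened placeholders for real GUIDs. `Removed` yields the index items that a handler reading only the new value never hears about, and `ToRefresh` yields everything that actually needs re-indexing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: compare a TreeListEx value before and after a save.
public static class LinkChanges
{
    // Parse a pipe-delimited list of IDs; IDs are compared case-insensitively.
    public static HashSet<string> Parse(string rawValue)
    {
        return new HashSet<string>(
            (rawValue ?? string.Empty).Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries),
            StringComparer.OrdinalIgnoreCase);
    }

    // Linked before the save but not after: an item:saved handler never sees these.
    public static HashSet<string> Removed(string before, string after)
    {
        var removed = Parse(before);
        removed.ExceptWith(Parse(after));
        return removed;
    }

    // Everything that must be re-indexed: current links plus links that were just removed.
    public static HashSet<string> ToRefresh(string before, string after)
    {
        var all = Parse(before);
        all.UnionWith(Parse(after));
        return all;
    }
}
```

Knowing these IDs is only half the fix: the refresh must also happen after the save, once the Links database reflects the removal.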

For our project, the solution came from the saveUI pipeline: a processor placed after the built-in Save processor still has access to field-level changes (both the original and the new value of each field) once the item save has completed. We can therefore collect the linked index items, both current and past, and trigger index updates for all of them. Some indicative code follows. The processor itself simply delegates to the "worker" class below, in which all Sitecore classes have been wrapped for testability, but you should get the gist of what is being done:

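    using System;
    using System.Linq;
    using Sitecore.Data; // ID

    // IDatabase and ISaveArgs are the project's wrappers around Sitecore classes;
    // FieldIds holds the project's field IDs.
    public class UpdateIndex
    {
        private readonly IDatabase _masterDatabase;

        public UpdateIndex(IDatabase masterDatabase)
        {
            _masterDatabase = masterDatabase;
        }

        public void Update(ISaveArgs args)
        {
            if (args.IsNull)
            {
                return;
            }

            foreach (var saveItem in args.Items)
            {
                foreach (var saveField in saveItem.Fields)
                {
                    if (saveField.Id == FieldIds.FieldIdYouCareAbout)
                    {
                        // OriginalValue = pipe-delimited IDs before the save; Value = IDs after it
                        var linkedItemIdsBefore = saveField.OriginalValue;
                        var linkedItemIdsAfter = saveField.Value;
                        UpdateReferencedIndexItems(linkedItemIdsBefore, linkedItemIdsAfter);
                    }
                }
            }
        }

        private void UpdateReferencedIndexItems(string linkedItemIdsBefore, string linkedItemIdsAfter)
        {
            var separator = new[] { '|' };

            // Union of the old and new links, so items that were just unlinked are refreshed too.
            // This assumes that the item can always be found in the database ...
            var linkedItems = (linkedItemIdsBefore ?? string.Empty).Split(separator, StringSplitOptions.RemoveEmptyEntries)
                .Concat((linkedItemIdsAfter ?? string.Empty).Split(separator, StringSplitOptions.RemoveEmptyEntries))
                .Distinct(StringComparer.OrdinalIgnoreCase)
                .Select(s => _masterDatabase.GetItem(new ID(s)));

            foreach (var item in linkedItems)
            {
                // Code to update each index item
            }
        }
    }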
    
The only downside is that, as the name suggests, the pipeline only runs when an item is saved via the UI, rather than, for example, when it is changed programmatically via the Sitecore content API. This isn't an issue for our project, although it did make obsolete a few basic integration test scripts, which ran as simple .aspx pages triggering item updates programmatically.

By James at 3 Jul 2015, 21:37 PM

