A few weeks ago, my colleagues and I concluded a project for one of our clients to migrate their search index hosting solution from the Coveo Enterprise Search on-premises platform to the modern Coveo Cloud platform. The search solution on the website was implemented using the Coveo for Sitecore module, version 4. In this blog post, I am going to share the details of an issue that we experienced with the security identities permissions synchronization and the lessons learned while troubleshooting it.

The symptoms of the issue

The steps to migrate the search index hosting solution from on-premises to the cloud were clear and easy to implement, as described on the official Coveo documentation portal here. One final step was not documented but was fairly intuitive: the Coveo indexes (coveo_master_index and coveo_web_index in our solution) needed to be rebuilt using the Indexing Manager in the Sitecore Control Panel. This final rebuild is needed to create the new sources and their indexes in the Coveo Cloud platform. This is when we started to see the following symptoms:

  • The coveo_master_index rebuild process never started indexing any Sitecore item. A source for the coveo_master_index was successfully created in Coveo Cloud, but the indexing process hung indefinitely in its initialization phase. The coveo_web_index, by contrast, was rebuilt without any problems.
  • Some security identities for the Expanded Sitecore Security Provider for the website application were in the “In Error” state in the Coveo Cloud platform. The last update result in their detailed view said only “Entity is unavailable”, even though these identities existed in the Sitecore application. This information was reported on the “Security Identities” page under the Content section of the Coveo Cloud platform.
  • No errors were found in the Sitecore logs, which made the issue very difficult to troubleshoot.

Coveo Support to the rescue

I usually like to investigate issues on my own before reaching out to product support teams. This time I knew that I didn’t have enough information to work with, so I immediately decided to open a ticket with Coveo Support. It was the right move.

After reviewing and confirming that our implementation was done correctly, the Coveo Support team asked me to add a different logging configuration to collect additional logs during the index rebuild process.

The logging configuration consisted of two new loggers that were not documented in the Coveo documentation portal. The Coveo for Sitecore module writes to the logs through many log4net loggers, each with a different name. The loggers also differ depending on whether you use the on-premises or the cloud version of the product. Using the correct logger name in the logging configuration is essential to collect logs successfully.

This was the additional logging configuration that I added to the Sitecore application:

<appender name="CoveoPermissionsLogger" type="log4net.Appender.SitecoreLogFileAppender, Sitecore.Logging">
	<file value="$(dataFolder)/logs/Coveo.Permissions.{date}.txt" />
	<appendToFile value="true" />
	<layout type="log4net.Layout.PatternLayout">
		<conversionPattern value="%4t %d{ABSOLUTE} %l %-5p %m%n" />
	</layout>
</appender>
<logger name="Coveo.CloudPlatformClient" additivity="false">
	<level value="INFO" />
	<appender-ref ref="CoveoPermissionsLogger" />
</logger>
<logger name="Coveo.SearchProviderBase" additivity="false">
	<level value="INFO" />
	<appender-ref ref="CoveoPermissionsLogger" />
</logger>

This configuration change was the turning point in identifying the root cause of the issue!

The root cause

The security identities permissions synchronization consists of two consecutive phases. First, the process synchronizes the roles, collecting the list of users in each role. Then the process synchronizes the entire aggregated collection of users returned by the roles synchronization.
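
The two phases can be pictured with a short sketch. This is only an illustration in Python, not Coveo's code: the names `sync_roles`, `sync_users`, and `get_users_in_role` are hypothetical, and the membership data is made up.

```python
from typing import Callable, Dict, Iterable, List, Set


def sync_roles(
    roles: Iterable[str],
    get_users_in_role: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Phase 1: resolve the members of every role."""
    members_by_role: Dict[str, List[str]] = {}
    for role in roles:
        # If this lookup times out for one large role, phase 1 cannot finish.
        members_by_role[role] = get_users_in_role(role)
    return members_by_role


def sync_users(members_by_role: Dict[str, List[str]]) -> Set[str]:
    """Phase 2: process the aggregated, de-duplicated set of users."""
    users: Set[str] = set()
    for members in members_by_role.values():
        users.update(members)
    return users


# Hypothetical data standing in for the Sitecore membership provider.
MEMBERSHIP = {
    "site\\Editors": ["site\\alice"],
    "site\\Everyone": ["site\\alice", "site\\bob"],
}

members_by_role = sync_roles(MEMBERSHIP.keys(), lambda role: MEMBERSHIP[role])
all_users = sync_users(members_by_role)
```

Because phase 2 consumes the output of phase 1, a single role whose member lookup fails is enough to stall the whole synchronization.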

The additional logging revealed that the synchronization process was failing when the system tried to retrieve the list of users in a particular domain role called “Everyone”, a role that every user of the web application belonged to. The website was using a custom SQL membership provider, and the code responsible for retrieving the list of users in a role was not optimized to perform well with hundreds of thousands of users, which caused a timeout error.
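
This post does not show the original provider code or the fix, so the following is only a guess at the general kind of problem, written in Python with an in-memory SQLite database. The table names, column names, and both functions are hypothetical. It contrasts a lookup that issues one query per user with a single set-based query that returns the same members.

```python
import sqlite3
from typing import List

# Hypothetical schema; the real membership database is not described in the post.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE user_roles (user_id INTEGER NOT NULL, role TEXT NOT NULL);
    CREATE INDEX ix_user_roles_role ON user_roles (role, user_id);
    """
)
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 1001)],
)
conn.executemany(
    "INSERT INTO user_roles (user_id, role) VALUES (?, 'Everyone')",
    [(i,) for i in range(1, 1001)],
)


def users_in_role_slow(role: str) -> List[str]:
    """One query per user: the number of queries grows with the user count."""
    names = []
    all_users = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
    for user_id, name in all_users:
        row = conn.execute(
            "SELECT 1 FROM user_roles WHERE user_id = ? AND role = ?",
            (user_id, role),
        ).fetchone()
        if row is not None:
            names.append(name)
    return names


def users_in_role_fast(role: str) -> List[str]:
    """One set-based query: the database performs the join in a single call."""
    rows = conn.execute(
        "SELECT u.name FROM users AS u "
        "JOIN user_roles AS r ON r.user_id = u.id "
        "WHERE r.role = ? ORDER BY u.id",
        (role,),
    )
    return [name for (name,) in rows]
```

Both functions return the same list. The difference is the number of database calls, which is what tends to matter once a role such as “Everyone” contains hundreds of thousands of users.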

After a proper refactoring of the custom membership provider code, the security identities permissions synchronization finally succeeded, and no security identities were left in the “In Error” state at the end of the process.

Why not before?

This issue was not caused by switching to the Coveo Cloud platform, but the switch exposed it. Why? The Coveo Support team helped me answer this question.

The Coveo on-premises solution and the Coveo Cloud solution don’t gather the permissions of security identities from Sitecore in the same way. With Coveo Cloud, the security identities and their permissions are sent to the cloud indexing platform alongside the indexed documents. With the Coveo Enterprise Search on-premises solution, by contrast, the security provider connects back to the Sitecore instance to retrieve the security identities permissions only at the end of an index rebuild, and not as part of the rebuild process itself.

With the Coveo Cloud solution, the coveo_master_index rebuild triggers the security identities permissions synchronization at the beginning of the indexing process. This is why its rebuild process hung indefinitely in the initialization phase after the synchronization failed silently.

Conclusions

If you are experiencing an issue after migrating to the Coveo Cloud platform, I recommend checking whether you have the correct logging configuration in place in your Sitecore solution to collect the information needed to troubleshoot it. And of course, contact and involve the Coveo Support team if you can. I would not have fixed this issue without their invaluable help.

Thank you for reading!
