Monitoring Sitecore on AKS – Application Insights Instrumentation for Distributed Tracing and Sitecore Logs Collection

In a microservice architecture like the containerized Sitecore XP1 topology, the ability to track a single request as it moves through different services is essential for investigating problems, optimizing application code and building more reliable services. This method, called distributed tracing, clearly shows the dependencies and relationships among the services and components of a system. In complex distributed systems, where each service generates its own logs, it is also important to be able to correlate application logs with specific web request traces. This method, called contextual logging, automates the association of logs with traces and spans and adds new logging properties (like error messages or error classes) to the set of tracing dimensions, which can then be used in queries when troubleshooting to identify a specific pattern or clue.
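To make the idea of contextual logging concrete, here is a minimal, language-neutral sketch in Python. It is not the Application Insights SDK and not how Sitecore implements it. It only illustrates the principle of stamping every log record with the identifier of the request being processed, so that logs and traces can be joined later. The names `current_trace_id`, `TraceContextFilter`, `build_logger` and `handle_request` are hypothetical.

```python
import contextvars
import logging
import sys
import uuid

# Holds the identifier of the request currently being processed (assumed name).
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceContextFilter(logging.Filter):
    """Copy the active trace identifier onto every log record."""

    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True  # never drop the record, only enrich it


def build_logger(stream):
    """Create a logger whose output lines start with the trace identifier."""
    logger = logging.getLogger("contextual-logging-sketch")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(trace_id)s %(levelname)s %(message)s"))
    handler.addFilter(TraceContextFilter())
    logger.addHandler(handler)
    return logger


def handle_request(logger, trace_id=None):
    """Simulate one request: set the trace id, then log as usual."""
    trace_id = trace_id or uuid.uuid4().hex
    token = current_trace_id.set(trace_id)
    try:
        logger.info("request started")
        logger.error("item not found")
    finally:
        # Restore the previous context so the id does not leak to other requests.
        current_trace_id.reset(token)
    return trace_id


if __name__ == "__main__":
    handle_request(build_logger(sys.stdout))
```

Because the application code only calls `logger.info` and `logger.error`, the correlation is automatic: every record written while a request is active carries that request's identifier as an extra queryable dimension.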

In the Azure cloud world, these two methods can be implemented by instrumenting the Sitecore services with Application Insights, a feature of Azure Monitor that provides extensible application performance management and monitoring capabilities.

In this blog post, I will describe how to provision Application Insights in an existing AKS cluster and the steps needed to instrument all services of a containerized Sitecore 10.2 XP1 solution to collect distributed traces and correlated Sitecore logs, targeting the latest Application Insights SDK version (2.20).

Monitoring Sitecore on AKS – Enhanced Logs Collection with Structured Logging and Source Filtering in Query

There are several ways to collect logs in Kubernetes. For Sitecore applications running in a Kubernetes cluster, the official Kubernetes specifications and the Sitecore images support the following two approaches to expose logs outside of containers: using the default stdout container logs or using a persistent volume to share logs outside containers.

The first approach relies on the fact that Sitecore images have been instrumented with LogMonitor, a Microsoft container tool that monitors log sources inside the container and pipes a formatted output to the container's stdout. Log sources can include Windows system event logs, IIS logs and Sitecore application logs. The collected logs are all piped together, and it can be challenging to filter and analyze them, because LogMonitor doesn’t enrich the logs with a source property. This challenge becomes even bigger when the Sitecore application generates multiline logs, for example when an error is logged with its stack trace: each line of a multiline log record is collected separately, and the lines can end up split apart and interleaved with logs from other sources (for example IIS logs) in the piped stdout output.
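The multiline problem can be illustrated with a small Python sketch. It assumes a hypothetical log layout in which every new record starts with a numeric thread id, a time and a level, while stack trace lines do not. The pattern `RECORD_START` and the function `group_multiline` are illustrative names, not part of LogMonitor or Sitecore, and the real log format may differ.

```python
import re

# Assumed record prefix: "<thread id> <HH:MM:SS> <LEVEL>", e.g. "4120 10:15:03 ERROR ...".
RECORD_START = re.compile(r"^\d+ \d{2}:\d{2}:\d{2} (?:DEBUG|INFO|WARN|ERROR|FATAL)\b")


def group_multiline(lines):
    """Merge continuation lines (such as stack traces) into the preceding record."""
    records = []
    for line in lines:
        if RECORD_START.match(line) or not records:
            # A line with the record prefix (or the very first line) opens a new record.
            records.append(line)
        else:
            # Anything else is treated as a continuation of the last record.
            records[-1] += "\n" + line
    return records


if __name__ == "__main__":
    sample = [
        "4120 10:15:02 INFO  Cache created",
        "4120 10:15:03 ERROR Application error.",
        "Exception: System.InvalidOperationException",
        "   at Example.Pipeline.Process()",
        "4120 10:15:04 INFO  Done",
    ]
    for record in group_multiline(sample):
        print(repr(record))
```

Note that this kind of regrouping only works while the lines of one record stay contiguous. Once lines from different sources are interleaved in the same stdout stream, a continuation line can no longer be attributed reliably to its record, which is why separating sources matters.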

The second approach consists instead of storing logs in a persistent volume, so that they can be made available and stored outside the containers, in an Azure File storage resource. This configuration was introduced recently with the Sitecore 10.2 Kubernetes specifications and has been configured for Sitecore application logs only. The same configuration could be applied to other file sources inside the containers, for example IIS logs. This approach helps to keep logs from different sources separated outside the containers.

Once the logs are available outside the containers, there are different solutions that can be implemented to collect them in the desired log analytics platform. In Azure Kubernetes Service, the simplest solution is to rely on the out-of-the-box capabilities of Container Insights, which automatically collects the stdout and stderr container logs in the associated Azure Log Analytics workspace resource thanks to its containerized monitoring agents. Alternative solutions that collect and forward logs outside Azure to other analytics platforms (for example Elasticsearch) are more complex. They usually require a log forwarder application (for example FluentBit) that can be set up to run either as a daemon set, with a pod on each cluster node that collects and forwards stdout logs, or as a deployment that reads logs from a persistent volume storage and forwards them.

In the next sections I am going to describe how to enhance the collection of stdout logs from AKS cluster containers to avoid a fragmented collection of multiline logs, and how to query them in Azure Monitor targeting each source separately.

Monitoring Sitecore on AKS – Grafana Data Sources Configuration and Dashboards Collection

In the previous post of this series I described the steps to create a new monitoring namespace in an existing Sitecore AKS cluster and to install two important tools for metrics collection and data visualization: Prometheus and Grafana. In this blog post I will describe how to enhance and configure Grafana data sources to collect metrics from internal and external AKS cluster resources. I will also describe and share a comprehensive collection of Grafana dashboards to visualize the collected metrics.

Monitoring Sitecore on AKS – Prometheus and Grafana Installation and Windows Node Host Metrics Collection

When working with a containerized distributed platform like Sitecore, it is important to be able to monitor metrics of all critical resources in its AKS cluster. The Container Insights monitoring feature available in AKS provides valuable data for the platform and the containerized application, but lacks deeper metric collection at workload level. For this reason, a couple of years ago Microsoft launched a native capability to integrate Azure Monitor with Prometheus, a popular open-source metric monitoring solution.

Prometheus collects metrics as time series: a series of timestamped values belonging to the same metric and indexed in time order. The metrics collection occurs via a pull model over HTTP from collection targets identified dynamically through service discovery or via static configuration. Metrics from a target can be exposed in the Prometheus metrics data format either by implementing a custom HTTP endpoint with a client library (there are many available in different languages, including .NET / C#) or by using a third-party exporter that collects, converts and exposes the metrics over HTTP.

Prometheus metrics data can be queried in the Prometheus Web UI using its own PromQL query language. For a much more intuitive and robust experience, the data can instead be visualized in Grafana, an open source data visualization solution that lets you query, visualize and alert on data collected from multiple sources, including the Prometheus API.

Prometheus and Grafana are both must-have additions to the monitoring stack of an AKS cluster. In the next sections, I will describe how to install and configure Prometheus and Grafana in an existing AKS cluster, how to export host metrics from a Windows node in the cluster and how to access them in the Azure Log Analytics workspace of the cluster and in Grafana.

Monitoring Sitecore on AKS – Initial Overview

With the release of the Sitecore 10 platform a couple of years ago, Sitecore started to support running applications in containers and provided guidance on how to deploy a containerized Sitecore application to a Kubernetes cluster. Since then, Kubernetes specification files have been included in each new Sitecore release, and a full example of the deployment process has been shared in the newly launched Sitecore MVP site GitHub repository.

The containerized approach is meant to compete with the established Sitecore PaaS approach, which instead consists of running the Sitecore platform on the Microsoft Azure cloud platform using PaaS resources (for example app services and app service plans). The adoption of Sitecore containerized applications in production is still in its early phase, but it is gradually growing.

An area that could speed up the adoption and help build confidence in running Sitecore on Kubernetes in production is monitoring. For Sitecore PaaS, monitoring has always been an integrated part of the delivered solution, thanks to the Sitecore Application Level Monitoring module (distributed as ARM templates on the Sitecore Azure Quickstart Templates repository) and the embedded instrumentation for Application Insights for all Sitecore role instances. With the introduction of containers, monitoring assets have been removed (or disabled) from the Sitecore application and there is no direct documentation on how to provision and configure monitoring tools in Kubernetes. This choice has made the Sitecore containerized application agnostic of a particular hosting platform (AKS, EKS, GKE, …), but at the same time has removed a must-have feature for managing a complex application in production.

How to Deploy a Module to an Existing Sitecore 10 Instance on AKS: The Init Jobs and The Data Initialization Containers

Have you already tried to install a Sitecore module package in a Sitecore instance running on Azure Kubernetes Service or in your local containerized environment? If you have, you probably already know that the installation process fails, because the application user identity doesn’t have permission to write the uploaded module package to the packages data folder. The Sitecore documentation describes the recommended approach to add a Sitecore module to the images of a containerized solution using the module asset images, and this approach works well for a development environment where external services (SQL Server, Solr, Redis) run in containers. But in a production environment running in Azure Kubernetes Service, the external services will likely not run in containers and will use PaaS or IaaS resources instead, for example databases running in an Azure SQL Server elastic pool resource. In this production-like scenario, a different solution is required to deploy the data assets of a module to an existing external service layer: data initialization containers that run in Kubernetes initialization jobs. In this blog post I will share what I learned about initialization jobs and data initialization containers and the steps needed to deploy a Sitecore module to an existing Sitecore instance running on Azure Kubernetes Service.

Jenkins Pipelines for Sitecore 10 on AKS – Part 2: The Deploy Pipeline

This post is the second in a mini series where I describe the steps needed to implement a set of Jenkins pipelines to build and deploy the code of a containerized Sitecore 10 custom solution to an existing Azure Kubernetes Services cluster resource. In the first blog post I explored in detail how to set up the Jenkins build pipeline. In this blog post I am going to illustrate the second and last part of this automated deployment process: the deploy pipeline.

Jenkins Pipelines for Sitecore 10 on AKS – Part 1: The Build Pipeline

In my last blog post I shared my experience with the installation process of a clean Sitecore 10 solution on Azure Kubernetes Services. The next natural step is to explore how to automatically deploy a custom containerized solution to my AKS cluster, using the Jenkins CI/CD tool. This blog post is the first of two posts in which I will describe the steps needed to implement Jenkins pipelines to build and deploy a containerized solution to Azure Kubernetes Services. Let’s start in this post with the build pipeline.

Sitecore 10 on Azure Kubernetes Service: My Learning Journey, the Installation Process and a Simple Solution to Save Money

At the end of last summer Sitecore 10 was released with a very detailed installation guide on how to deploy a containerized Sitecore application to the Azure Kubernetes Service (AKS) and with a complete deployment package of Kubernetes specification files. In this blog post I am going to share how I approached my learning journey with Kubernetes and AKS, describe an issue that I encountered during the Sitecore installation and its resolution, and finally show how you can save money by starting and stopping an existing Sitecore AKS cluster when needed.

Sitecore Docker: How to Restore Databases Using Backups in a SQL Container

I have been experimenting with running Sitecore on Docker containers for a couple of months now and I am having a lot of fun with it. If you work with many clients at the same time like I do, running Sitecore on Docker brings the big benefit of setting up and spinning up a client environment in a few minutes, simplifying the local environment setup process and eliminating the conflicts that might arise from hosting multiple versions of Sitecore instances on a single host machine. If you haven’t tried it yet, I highly recommend doing so.

In this blog post I will describe how to override the default command executed in a Docker container running a sitecore-<topology>-sql image to restore Sitecore databases using SQL database backups generated from a non-Docker SQL server.

Eureka!

A blog about my personal discoveries with Sitecore, .NET, DevOps and more.

Category: Docker & Kubernetes