Application Insights in real life: avoid sending garbage to your AI instance

Application Insights is a really nice Azure service that allows you to instrument your application to get a complete set of telemetry and problem diagnostics. Usually when you start using Application Insights you already have some logging in place (log4net, Serilog) and you want to integrate the logs with AI, so that some kinds of logs (e.g. errors) are automatically sent to AI.

All these log libraries allow you to write a sink / appender to send logs to your chosen destination, so sending all your Serilog / log4net logs to Application Insights is usually a matter of minutes.

Integrating an existing log library with Application Insights takes a few minutes of code, or just a reference to an existing library

Even if this seems a perfect solution, it is not without problems: for example, the log library can send too many logs to AI. In a software I’m developing there is a poller with a 50 ms interval on a collection in a MongoDB database. For various reasons a wrong record got into the collection, we had a deserialization error in some of the clients and, boom, we got an exception logged every 50 ms on 4 machines, which is 288k logs per hour. The net result is that the Application Insights instance was full of unnecessary and identical logs.

Another potential problem is logging every Mongo query (polling will flood AI with lots of identical queries with less than 1 ms execution time); this simply pollutes your AI instance with rubbish data.

Luckily, AI has the ability to intercept every telemetry item and filter it out before it is sent to Azure; the secret is implementing the ITelemetryProcessor interface, because it allows you to filter out unwanted telemetry.

public void Process(ITelemetry item)
{
    if (item is DependencyTelemetry dependencyTelemetry)
    {
        if (!OkToSend(dependencyTelemetry))
        {
            return;
        }
    }
    else if (item is ExceptionTelemetry exceptionTelemetry)
    {
        if (!OkToSend(exceptionTelemetry))
        {
            return;
        }
    }
    else if (item is EventTelemetry eventTelemetry)
    {
        if (!OkToSend(eventTelemetry))
        {
            return;
        }
    }
    //if nothing filtered the telemetry out, pass it to the next processor in the chain
    _next.Process(item);
}

As you can see, implementing the Process method leaves you complete freedom in filtering telemetry: if you do not want an item to be sent to AI, simply do not call _next.Process.

Application Insights is really extensible and has tons of extensibility points to customize the information it collects. In this article I’m dealing with ITelemetryProcessor, which allows you to manipulate, and optionally completely discard, a telemetry item before it is sent to the server.
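
To give an idea of the overall structure, here is a minimal skeleton of such a processor. This is only a sketch: the class name and constructor signature come from the registration code shown below, while everything else is an assumption for illustration.

public class JarvisApplicationInsightFilter : ITelemetryProcessor
{
    //the next processor in the chain, invoked only for telemetry we want to keep
    private readonly ITelemetryProcessor _next;

    //threshold used by the dependency duration filter described later
    private readonly Int32 _limitQueryLogInMs;

    public JarvisApplicationInsightFilter(ITelemetryProcessor next, Int32 limitQueryLogInMs)
    {
        _next = next;
        _limitQueryLogInMs = limitQueryLogInMs;
    }

    //the Process method and the OkToSend overloads shown in this article belong here
}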

Once you create your telemetry processor you simply need to add it to the default chain. For this example I have a simple component that does a couple of things: it avoids flooding AI with identical errors and it avoids tracking Mongo dependencies that last less than a certain number of milliseconds.

var builder = TelemetryConfiguration.Active.TelemetryProcessorChainBuilder;

Int32 limitQueryLogInMs = 50;
//MongoDependencyMinDurationInMilliseconds is a string value (e.g. read from configuration);
//fall back to 50 ms when it is missing or not a valid integer
if (Int32.TryParse(MongoDependencyMinDurationInMilliseconds ?? String.Empty, out Int32 limitDuration))
{
    limitQueryLogInMs = limitDuration;
}

builder.Use((next) => new JarvisApplicationInsightFilter(next, limitQueryLogInMs));
builder.Build();

This solution was developed because in this project the customer is starting to use AI mainly to monitor for problems, so we are interested only in slow Mongo dependencies.

The logic inside OkToSend is 100% custom for your needs. In this example we want to filter out any dependency whose data contains certain strings, and we also want to filter out all dependencies whose Url custom property contains certain strings. The first lines ensure that any dependency that failed is sent to AI without any further filtering (we do not want to miss a failed dependency because it falls under the duration filter).

private bool OkToSend(DependencyTelemetry dependencyTelemetry)
{
    //always log dependency that failed,
    if (dependencyTelemetry.Success != true)
        return true;

    //ok filter out
    if (_forbiddenTelemetryStrings.Any(u => dependencyTelemetry.Data.IndexOf(u, StringComparison.OrdinalIgnoreCase) >= 0))
    {
        return false;
    }

    if (dependencyTelemetry.Properties.TryGetValue("Url", out var url))
    {
        if (_forbiddenUrl.Any(u => url.IndexOf(u, StringComparison.OrdinalIgnoreCase) >= 0))
        {
            return false;
        }
    }

    //finally, apply the duration filter described above: dependencies faster than
    //the configured threshold are not interesting (sketch of the check, the original
    //snippet ends before this point)
    return dependencyTelemetry.Duration.TotalMilliseconds >= _limitQueryLogInMs;
}

The second feature of this processor is to avoid flooding AI when exception tracking goes south. The goal is to avoid too many identical exceptions in a short time range; if some part of the application is experiencing errors, we want at most 1 exception trace in a given interval. An error is a symptom that something is wrong: if the DB is not reachable and I’m polling every 50 ms, an exception every 5 minutes is perfectly fine and far better than an exception every 50 ms.

It is always good practice to limit the number of logs you send to Application Insights, to avoid using too much bandwidth and cluttering the server with garbage

For this reason we have a simple antiflood component that monitors how often we are sending telemetry identified by a given string, which uniquely identifies the call. As an example, if I configure my antiflood with a 5 minute interval, whenever I call the antiflood.ShouldPass method with a given string key it returns true no more than once every 5 minutes for the same string.
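
The antiflood component itself is a custom class of the project and is not shown in full here; the following is a minimal sketch of how it could work, under the assumption of a timer-based eviction. The class and event argument names match the handler shown later in this article, but the implementation is an illustration, not the real one used in the project.

using System;
using System.Collections.Concurrent;
using System.Threading;

//Minimal sketch of an antiflood component (assumption: the real implementation may differ).
//A given key passes at most once per interval; when an entry is evicted, an event
//reports how many times the key was seen during that interval.
public class Antiflooder
{
    public class AntiFlooderEntryEvictedEventArgs : EventArgs
    {
        public String Entry { get; set; }
        public Int32 Count { get; set; }
        public DateTime FirstSeen { get; set; }

        public Int32 GetElapsedSeconds() => (Int32)(DateTime.UtcNow - FirstSeen).TotalSeconds;
    }

    public event EventHandler<AntiFlooderEntryEvictedEventArgs> EventEvicted;

    private readonly TimeSpan _interval;
    private readonly ConcurrentDictionary<String, (DateTime FirstSeen, Int32 Count)> _entries =
        new ConcurrentDictionary<String, (DateTime FirstSeen, Int32 Count)>();
    private readonly Timer _evictionTimer;

    public Antiflooder(TimeSpan interval)
    {
        _interval = interval;
        //periodically scan the entries and evict the expired ones
        _evictionTimer = new Timer(_ => Evict(), null, interval, interval);
    }

    //returns true at most once per interval for each distinct key
    public Boolean ShouldPass(String key)
    {
        Boolean passed = false;
        _entries.AddOrUpdate(
            key,
            k => { passed = true; return (DateTime.UtcNow, 1); },
            (k, existing) => (existing.FirstSeen, existing.Count + 1));
        return passed;
    }

    private void Evict()
    {
        foreach (var pair in _entries)
        {
            if (DateTime.UtcNow - pair.Value.FirstSeen >= _interval
                && _entries.TryRemove(pair.Key, out var removed))
            {
                EventEvicted?.Invoke(this, new AntiFlooderEntryEvictedEventArgs
                {
                    Entry = pair.Key,
                    Count = removed.Count,
                    FirstSeen = removed.FirstSeen,
                });
            }
        }
    }
}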

To build a unique string for an exception I use three properties: Loggername (the name of the class that is logging), ServiceName (the name of the logical microservice that is logging) and finally User (if present, the name of the user that issued the command, called the API, etc.). The goal is to avoid flooding the server, but also to avoid missing important exceptions.

private bool OkToSend(ExceptionTelemetry exceptionTelemetry)
{
    //Some exceptions can be rogue, like pollers: they tend to flood the log with useless exceptions.
    String exceptionKey = "EXCEPTION/";
    var propertyNames = new[] { "Loggername", "ServiceName", "User" };
    foreach (var propertyName in propertyNames)
    {
        if (exceptionTelemetry.Properties.TryGetValue(propertyName, out var propertyValue))
        {
            //a single logger can go rogue.
            exceptionKey += propertyValue + "/";
        }
    }

    exceptionKey += exceptionTelemetry.Message.Substring(0, Math.Min(exceptionTelemetry.Message.Length, 50));
    return _antiflooder.ShouldPass(exceptionKey);
}

If a single poller goes wild, I get no more than 1 exception every 5 minutes, but if the entire DB is down I get more exceptions, because every 5 minutes I can have an exception for each logger, service and unique user. Usually this leads to hundreds of exceptions each minute, but this is a clear indication that something catastrophic is happening (the DB is down and every single component that depends on the DB is failing).

Finally, the antiflooder keeps a count of the number of discarded exceptions, and a timer evicts each string after the threshold time has passed. This allows me to log how many exceptions were really generated during the antiflood interval.

private void Antiflooder_EventEvicted(object sender, Antiflooder.AntiFlooderEntryEvictedEventArgs e)
{
    if (e.Count > 1)
    {
        TraceTelemetry traceTelemetry = new TraceTelemetry();
        traceTelemetry.SeverityLevel = e.Entry.StartsWith("EXCEPTION") ? SeverityLevel.Error : SeverityLevel.Information;
        traceTelemetry.Message = $"Antiflood got N°{e.Count} occurrences of {e.Entry} in an interval of {e.GetElapsedSeconds()} seconds";
        ApplicationInsightConfiguration.Instance.GetTelemetryClient().TrackTrace(traceTelemetry);
    }
}
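
The handler just needs to be attached to the antiflooder instance when the processor is created, something along these lines (the event name here is an assumption, inferred from the handler name):

//wire the eviction handler to the antiflood component (assumed event name)
_antiflooder.EventEvicted += Antiflooder_EventEvicted;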

If I get more than one call, it means that the antiflood discarded some logs, and I want to trace how many error logs were really generated.

With a few lines of code I was able to stop my application from sending tons of unnecessary traces (dependencies and errors) to the AI instance, keeping only the interesting ones.

I was really amazed by the flexibility of the Application Insights library, because it allows me to have full control of the trace pipeline with a few lines of code. This example was made with C#, but you can obtain the same result with the SDKs for other technologies.

Gian Maria

Use the right Azure Service Endpoint in build vNext

Build vNext has a task dedicated to uploading files to Azure blob storage, as you can see in Figure 1:


Figure 1: Azure File Copy task configured in a vNext build

The nice part is the Azure Subscription setting, which allows you to choose one of the Azure endpoints configured for the project. Using a service endpoint, you can ask the person that has the password/keys for the Azure account to configure an endpoint. Once it is configured it can be used by team members with sufficient rights to access it, without requiring them to know a password, a token or anything else.

Thanks to Service Endpoints you can allow members of the team to create builds that interact with Azure accounts without giving them any password or token

If you look around you can find a nice blog post that explains how to connect your VSTS account using a service principal.


Figure 2: Configure a service endpoint for Azure with Service Principal Authentication

Another really interesting aspect of Service Endpoints is the ability to choose the people that can administer the endpoint and the people that can use it, giving you full control over who can do what.


Figure 3: You can manage security for each Service Endpoint configured

Finally, using Service Endpoints gives you a centralized way to manage access to your Azure subscription resources: if for some reason a subscription should be removed and not used anymore, you can simply remove the endpoint. This is a better approach than having passwords or tokens scattered all over the VSTS account (builds, etc.).

I followed all the steps in the article to connect your VSTS account using a service principal, but when it was time to execute the Azure File Copy task, I got a strange error.

Executing the powershell script: C:\LR\MMS\Services\Mms\TaskAgentProvisioner\Tools\agents\default\tasks\AzureFileCopy\1.0.25\AzureFileCopy.ps1
Looking for Azure PowerShell module at C:\Program Files (x86)\Microsoft SDKs\Azure\PowerShell\ServiceManagement\Azure\Azure.psd1
AzurePSCmdletsVersion= 0.9.8.1
Get-ServiceEndpoint -Name 75a5dd41-27eb-493a-a4fb-xxxxxxxxxxxx -Context Microsoft.TeamFoundation.DistributedTask.Agent.Worker.Common.TaskContext
tenantId= ********
azureSubscriptionId= xxxxxxxx-xxxxxxx-xxxx-xxxx-xxxxxxxxxx
azureSubscriptionName= MSDN Principal
Add-AzureAccount -ServicePrincipal -Tenant $tenantId -Credential $psCredential
There is no subscription associated with account ********.
Select-AzureSubscription -SubscriptionId xxxxxxxx-xxxxxxx-xxxx-xxxx-xxxxxxxxxx
The subscription id xxxxxxxx-xxxxxxx-xxxx-xxxx-xxxxxxxxxx doesn't exist.
Parameter name: id
The Switch-AzureMode cmdlet is deprecated and will be removed in a future release.
The Switch-AzureMode cmdlet is deprecated and will be removed in a future release.
Storage account: portalvhdsxxxxxxxxxxxxxxxxx1 not found. Please specify existing storage account

This error is really strange, because one of the first error lines tells me:

The subscription id xxxxxx-xxxxxx-xxxxxx-xxxxxxxxxxxxx doesn’t exist.

This cannot be the real error, because I’m really sure that my Azure subscription is active and working everywhere else. Thanks to the help of Roopesh Nair, I was able to find my mistake. It turns out that the storage account I was trying to access is an old one created in Azure Classic mode, and it is not accessible with a Service Principal. A Service Endpoint using a Service Principal can manage only Azure Resource Manager based entities.

Shame on me :) because I was aware of this limitation, but for some reason I completely forgot about it this time.

Another sign of the problem is the error line telling me: Storage account xxxxxxxxx not found. That should ring a warning bell: the script cannot find that specific resource because it was created in Classic mode.

The solution is simple: I could use a blob storage account created with Azure Resource Manager, or I can configure another Service Endpoint, this time based on a management certificate. The second option is preferable, because having two Service Endpoints, one configured with a Service Principal and the other configured with a certificate, allows me to manage all types of Azure resources.

Configuring an endpoint with a certificate is really simple: you only need to copy the data from the management certificate into the endpoint configuration and you are ready to go.


Figure 4: Configure an Endpoint based on Certificate

Now my Azure File Copy build task works as expected and I can choose the right Service Endpoint based on the type of resource I need to access (Classic or ARM).

Gian Maria

Where is my Azure VM using Azure CLI?

The Azure command line interface, known as Azure CLI, is a set of open source, cross platform commands to manage your Azure resources. The most interesting aspect of these tools is that they are cross platform, so you can use them even on Linux boxes.

After you have imported the certificate key to manage your Azure account, you can issue a simple command

azure vm list

to simply list all of the VMs in your account.


Figure 1: Output of azure vm list command

If you wonder why not all of your VMs are listed, the reason is that Azure CLI defaults to the asm command mode, so you are not able to manage resources created with the new Resource Manager. To learn more I suggest you read this couple of articles:

Use Azure CLI with Azure Resource Manager
Use Azure CLI with Azure Service Management

If you want to use the new Resource Manager you should switch mode with the command:

azure config mode arm

But now, if you issue the vm list command, you will probably get an error telling you that you are not authenticated. Unfortunately you cannot use a certificate to manage your account (as you can with Azure PowerShell or Azure CLI in asm mode). To authenticate in Azure Resource Manager mode you should use the command

azure login

But you need to use an account created in your Azure Active Directory, not your primary Microsoft account (at least mine does not work). This article will guide you through creating a user that can manage your account. Basically you should go to the old portal, open the Active Directory page and create a new account. Then, from the global settings pane, you should add that user to the Subscription Administrators group. Please use a really strong password, because that user can do everything with your account.

When you are correctly logged into Azure from the CLI, you can use the same command to list VMs, but now you will see all the VMs created with the new Resource Manager.


Figure 2: In arm mode you are able to list VMs created with the new Resource Manager.

This happens also in the standard Azure portal GUI, where you have two distinct nodes for Virtual Machines, depending on whether they were created with Azure Service Management or Azure Resource Manager.


Figure 3: Even in the portal you should choose which category of VM you want to manage

Gian Maria

Where is my DNS name for an Azure VM with the new Resource Manager?

Azure is changing the management mode for resources, as you can read in this article, and this is the reason why, in the new portal, you can see two different entries for some of the resources, e.g. Virtual Machines.


Figure 1: Classic and new resource management in action

Since the new model gives more control over resources, I’ve created a Linux machine with the new model to do some testing. After the machine was created I opened the blade for the machine (machines created with the new model are visible only in the new portal) and I noticed that there is no DNS name setting.


Figure 2: Summary of my new virtual machine, computer name is not a valid DNS address

Compare Figure 2 with Figure 3, which represents the summary of a VM created with the old resource management. As you can see, the computer name is a valid address in the cloudapp.net domain.


Figure 3: Summary of VM created with old resource management, it has a valid DNS name.

Since these are test VMs that are off most of the time, the IP changes at each reboot, and I really want a stable, friendly name to store my connection in Putty/MremoteNG.

From the home page of the portal you should see the resource group the machine was created in. If you open it, you can see all the resources that belong to that group. In the list you should see your virtual machine as well as the IP address resource (point 2 in Figure 4), which can be configured to have a DNS name label. The label is optional, so it is not automatically set up for you during machine creation, but you can specify it later.


Figure 4: Managing ip addresses of network attached to the Virtual Machine

Now I set the DNS name of my machine to docker.westeurope.cloudapp.azure.com to have a friendly name for my connection.

Enjoy.