Troubleshoot “service unavailable” in TFS

Yesterday I’ve started an old virtual machine with an old version of TFS and when I try to access the instance I got a “Service Unavailable” error.


Figure 1: General Service Unavailable error for TFS Web interface

This error happens most of the time if you have wrong user credentials in the worker process used by IIS to run the TFS Application. To verify this assumption, you can simply open IIS Manager console and verify the status of the worker process that is used to run IIS (Figure 2)


Figure 2: The application pool in IIS is stopped.

As you can verify from Figure 2, the IIS app pool used to run service is stopped, if I tried to start again the pool, it immediately stopped again. This is usually the symptom of  bad authentication, this means that the pool is running with wrong user credentials. You can verify this in Event Viewer log, but absolutely avoid messing with the setting of the Application Pools directly, this is a task that should be demanded to the administration console.

As a general rule, you should never manually edit the configuration used for TFS in your IIS instance, everything should be done through the correct command

To fix this error open the Admin console and verify the Service Account used to run your TFS Instance


Figure 3: Application tier configuration, you can view Service Account user.

You can run Service Account with standard NETWORK SERVICE account, but I prefer using specific domain account, because I have more control on how all TFS Services will authenticate on the other machine of the domain. I changed the password of that account a couple of month ago, but that specific VM was never updated with the new credentials.

This is something that can happens in a domain, especially if you care about security and you force every account to change password every certain number of months. In this scenario my TFS instance cannot start again because it was still configured with the old password, but you can fix it with a couple of clicks.


Figure 4: Reapplying account can save your days when password of service user account of TFS was changed.

If you look in figure 4, the solution is really simple, because the Reapply Account command gives you the ability to re-enter the new password for the account, use the test function to verify that it is correct and once you press OK, the administration console takes care of everything.


Figure 5: Result of re-applying the account.

As you can see in Figure 5, the account is used not only in the application Tier, but also for Message Queue, TFSJobAgent and so on. This is the reason why I warned you not to fix the credential in IIS manually, doing this does not fix every place where wrong authentication are used.

Everything was green now green in the console, so I immediately tried to access the instance again, to verify that indeed everything is up and running again.


Figure 6: Everything is up and running again.

Gian Maria.

VSTS / TFS: use Wildcards for continuous integration in Git

If you setup a build in VSTS / TFS against a git repository, you can choose to trigger the build when some specific branch changed. You can press plus button and a nice combobox appears to select the branch you want to monitor.


Figure 1: Adding a branch as trigger in VSTS / TFS Build

This means that if you add feature/1312_languageSelector, each time a new commit will be pushed on that branch, a new build will trigger. Actually if you use GitFlow you are interested in building each feature branch, so you do not want to add every feature branch in the list of monitored branches.

If you look closely, it turns out that the combo you use to choose branches admit wildcards. If you look at Figure 2, you can verify that, pressing the add button, a new line is added and you have a textbox that allow you to “Filter my branches”.


Figure 2: Filter branch text

Actually the name is misleading, because you can simply write feature/* and press enter, to instruct VSTS to monitor each branch that starts with feature/.

Using wildcards in trigger list allows you to trigger a build for every feature branch.

For people not used to VSTS, this feature goes often unnoticed, because they think that the textbox can be used only to filter one branch. With this technique you can trigger a build for each change in each standard branch in GitFlow process.


Figure 3: Trigger with wildcards

As you can see in Figure 3, I instructed TFS / VSTS to build develop, master and all hotfix, feature or release branches.

Gian Maria

Increase RAM for ElasticSearch for TFS Search

If you are experiencing slow search in TFS with the new Search functionality based on ElasticSearch a typical suggestion is to give more RAM to the machine where ES is installed. Clearly you should use HQ or other tools to really pin down the root cause but most of the time the problem is not enough RAM, or slow disks. The second cause can be easily solved moving data to an SSD disk, but giving more RAM to the machine, usually gives more space for OS File cache and can solve even the problem of slow disk.

ElasticSearch greatly benefit from high amount of RAM, both for query cache and for operating system cache of Memory Mapped Files

There are really lots of resources in internet that explain perfectly how ElasticSearch uses memory and how you can fine tune memory usage, but I want to start with a quick suggestion for all people that absolutely does not know ES, because one typical mistake is giving to ES machine more RAM but leaving ES settings unaltered.

Suppose I have my dedicated search machine with 4GB of RAM, the installation script of TFS configured ES instance to use a fixed memory HEAP of half of the available RAM. This is the most typical settings that works well in most of the situations. Half the memory is dedicated to Java Heap and gives ES space for caching queries, while the other 2 GB of RAM will be used by operating system to cache File access and for standard OS processes.

Now suppose that the index is really big and the response time starts to be slow, so you decided to upgrade search machine to 16 GB of RAM. What happens to ES?


Figure 1: Statistics on ES instance with HQ plugin shows that the instance is still using 2GB of heap.

From Figure 1 you can verify that ES is still using 2 GB of memory for the Heap, leaving 14 GB free for file system cache and OS. This is clearly not the perfect situation, because a better solution is to assign half the memory to java HEAP.

ElasticSearch has a dedicated settings for JAVA Heap usage, do not forget to change that setting if you change the amount of RAM available to the machine.

The amount of HEAP memory that ES uses is ruled by the ES_HEAP_SIZE environment variable, so you can simply change the value to 8G if you want ES to use 8 GB of Heap memory.


Figure 2: Elastic search uses an environment variable to set the amount of memory devoted to HEAP

But this is not enough, if you restart ElasticSearch windows service you will notice that ES still uses 1.9 GB of memory for the HEAP. The reason is: when you install ES as service, the script that install and configure the service will take the variable from the environment variable and copy it to startup script. This means that even if you change the value, the service will use the old value.

To verify this assumption, just stop the ElasticSearch service, go to ES installation folder and manually starts ElasticSearch.bat. Once the engine started, check with HQ to verify that ES is really using use 8GB of ram (Figure 2).


Figure 3: ES now uses the correct amount of RAM.

To solve this problem, open an administrator prompt in the bin directory of ElasticSearch, (you can find the location from the service as shown in previous post) and simply uninstall and reinstall the service. First remove the service with the service remove command, then immediately reinstall again with the service install command. Now start the service and verify that ES is using correctly 8GB of RAM.

When ElasticSearch is installed as a service, settings for Memory Heap Size are written to startup script, so you need to uninstall and reinstall the service again for ES_HEAP_SIZE to be taken in consideration

If you are interested in some more information about ES and RAM usage you can start reading some official article in ES site like:

I finish with a simple tip, ES does not love virtual machine dynamic memory (it uses a Fixed HEAP size), thus it is better to give the machine a fixed amount of ram, and change HEAP size accordingly, instead of simply relying on Dynamic Memory. 

Gian Maria.

Monitor TFS Search data Usage

In previous post I’ve explained how to move searches component in a different machine in a TFS 2018 installation, now it is time to understand how to monitor that instance.

First of all you should monitor the folder that physically contains data, in my installation is C:\tfs\ESDATA (it is a parameter you choose when you configure the Search Server with Configure-TFSSearch.ps1 -Operation install PowerShell script). Data folder should be placed on a fast disk, usually SSD are the best choice. If the SSD will ever fail, you can always restart ES with empty data folder and use scripts to force TFS to reindex everything.

Data in ElasticSearch can be wiped away without problem, like WareHouse database, its content can always be restored.

If you want to have a better idea on what is happening to your search instance you can install plugins to easy management. First step: connect to the machine where the Search Service is running and locate the folder where ElasticSearch was installed. A simple trick is checking corresponding Windows Service.


Figure 1: Locate ElasticSearch binaries from Windows Service

Now simply open a command prompt and change directory to the bin installation directory of ElasticSearch, from here you can install various plugin to simplify ES management, I usually starts with the HQ plugininstallable with the simple command

plugin install royrusso/elasticsearch-HQ/v2.0.3

You should check the correct version in HQ home site, version 2.0.3 is the latest version that works with ES 2.4.1, the version used by TFS Search Service. For this command to being able to run, the machine should have an internet connection and GitHub site should not be blocked.


Figure 1: HQ was correctly installed

Now you can just browse to http://localhost:9200/_plugin/HQ to open the web interface of HQ plugin and connect to the actual instance.


Figure 3: HQ plugin up and running and ready to be connected to local instance of ES.


Figure 4: Home page of HQ, where you can find the total size used by the server (1) and you can also have some Node Diagnostics (2)

From Node Diagnostics tool you can find if some statistics are not good, in my example I have search server in a server with slow disk and query time is sub-optimal. Usually ES is supposed to answer in less than 50 milliseconds, in my installation I have an average time of 660ms.


Figure 5: Statistics on search.

If you move the mouse over the number, it can give you some hint on the reason why the specific number is calculated and why is not considered to be good.


Figure 6: Help on the various number gives you an idea on what is wrong and how you could solve this.

If you are experienced slow search, there are many factors that can impact the instance of ElasticSearch and using a good plugin for diagnose the real root cause of the problem is usually the best approach.

I suggest you to navigate in HQ interface to familiarize with the various information it can gives to you, especially clicking on the name of a single node (there is only one node in my installation), you can get some interesting data on that single node, especially on RAM usage.


Figure7: Ram usage of a single node.

There are lots of other plugins that can show lots of useful information about your ElasticSearch instance, the installation is similar to HQ and you can install and use them with very few clicks.

Gian Maria.

TFS 2018 search components in different server

When it is time to design topology of a TFS installation, for small team the single server is usually the best choice in term of licensing (one one Windows Server license is  needed) and simplicity of maintenance. Traditionally the only problem that can occur is: some component (especially the Analysis and Reporting services) starts to slow down the machine if the amount of data starts to become consistent.

Single machine TFS installation is probably the best choice for small teams..

The usual solution is moving Analysis Service, cube and Warehouse database on a different server. With such a configuration if the analysis service has a spike in CPU, Disk or Memory usage, the main machine with operational DB and AT continue to work with standard performance. The trick is leaving the core services in a server that is not overloaded with other tasks, this is the reason why running a build agent in TFS machine is usually a bad idea.

With TFS 2017 a new search component is introduced, based on ElasticSearch. ES is de facto the most used and most advanced Search Engine on the market, but it tends to eat CPU RAM and Disk if the workload is big. The suggestion is to start configuring search on the very same machine with AT and DB (Single server installation) and move search components if you start to notice that ES uses too many RAM or is using too much CPU when it is indexing data. Moving search components on another machine is a really simple process.

First of all you need to remove the search feature, using the remove feature wizard as shown in Figure 1 (select the first node in administration console to find the Remove Feature Link)


Figure 1: Remove feature from TFS instance

Now you should choose to remove the search feature.


Figure 2: Removing the Search Service functionality

After the wizard finished, you should go to the folder C:\Program Files\Microsoft Team Foundation Server 2018\Search\zip and with powershell run the command

Configure-TFSSearch.ps1 -Operation remove

to completely remove Elastic Search from your TFS instance.

Do not forget to use Configure-TFSSEarch PowerShell script to completely remove any trace of Elastic Search from the TFS machine

Now search component is not working anymore, if you issue a search you should get a message like shown in Figure 3:


Figure 3: Search component is not operational anymore.

At this point you open TFS administration console and start the wizard to configure the search, but this time, instead of configuring everything on the current machine, you will choose to use an existing Search Service.


Figure 4: Use an existing search service instead of install locally the search service

If you see in Figure 4, the instructions to install search service on another computer are really simple, you need to click the “search service package” link in the wizard to open a local folder that contains everything to setup ElasticSearch and the search service. Just copy content of that folder on another machine, install java and set the JAVA_HOME environment variable and you are ready to install Search Service.

You can find a README.txt file that explain how to configure the search service, just open a PowerShell console and then run the command

Configure-TFSSearch.ps1 -Operation install -TFSSearchInstallPath C:\ES -TFSSearchIndexPath C:\ESDATA

Please change the two folder if you want to change the location of ElasticSearch binary and ElasticSearch data. After the script ran without any error, it is time to verify that ElasticSearch is correctly installed and started. As usual, please create a DNS entry to avoid using the real name of the machine where you installed the service. In my installation I’ve configured the name tfssearch.cyberpunk.local to point to the real machine where I configured search services. Just open a browser and issue a request to http://tfssearch.cyberpunk.local:9200


Figure 5: Try to access local instance of ElasticSearch using DNS name

Please pay attention at firewall configuration, because ElasticSearch has no security in base installation and everyone can mess up and read data just browsing in port 9200

Now you should open your firewall to allow connection to port 9200 from every machine where an Application Tier is running. In my situation the machine with TFS installation has IP number Remember that Elastic Search has NO Authentication (it is a module that is not free, called shield), thus it is really better to allow connections only from the TFS machine. All you need to do is creating a rule to open port 9200, but allowing connection only from the IP of the TFS machines.


Figure 6: Firewall configuration, the machine with the search service opens port 9200 only from the IP of the TFS machine.

Now remote desktop to TFS machine, and verify that you are indeed able to browse http://tfssearch.cyberpunk.local:9200, this confirm that configuration of the firewall allows TFS to contact the search service. Then try to access the very same address from another computer in the network and verify that it CANNOT access ElasticSearch instance. This guarantees that no one in the network can access Elastic Search directly and mess up with its data.

Now you can proceed with the Configuration Wizard in TFS instance, specifying the address of your new Search Server


Figure 7: Configure new search service in TFS.

Proceed and finish the wizard. At the end TFS machine will re-index all the data in the new search server, just wait some minutes and you will be able to use again search, but now all Search Components are on a dedicated server.

Gian Maria.