Limit maximum memory of SQL Server in TFS environment

If you are a small shop using on-premises TFS, you probably have a single-machine installation hosting both the Data and App tiers. While installing a Build server on the App or Data tier is highly discouraged, using a single machine for both tiers is a viable solution for small and medium teams, and with virtualization it is quite easy to move the machine to more powerful hardware or give it more RAM if TFS usage increases and performance starts to degrade.

If you are interested in a really good article about how to configure TFS for performance, this post covers all you need.

In this post I want to point out one aspect that is often underestimated but is really critical on a single-server installation.

SQL Server tends to consume all the RAM of the system, and if you do not limit the maximum memory it can use, you usually end up with problems ranging from poor performance to outright malfunction. At a customer site the build controller was suddenly disconnected from the server; looking at the Event Viewer on the TFS machine, we found that some WCF services failed to start because less than 5% of RAM was free. Unsurprisingly, almost all the RAM in the system was used by SQL Server. In that installation the SQL Server max memory limit was not set, and the server slowed down gradually until some part (the build, in this case) stopped working.

If your TFS is a single-server installation, start by limiting SQL Server memory to half the RAM; then, after some real usage, verify that the system still has free RAM and gradually give more memory to SQL Server. This prevents SQL Server from stealing RAM from the App Tier.

Limiting memory is crucial even if SQL Server runs on a dedicated machine; the limit is needed even when the SQL Server database engine is the only role on the box. The article How much memory does my Sql Server actually need is a good read on the subject. Remember also that if Reporting Services is installed on the same machine, you should take it into consideration. A rough formula given by Grant is the following one.

reserve: 1 GB of RAM for the OS, 1 GB for each 4 GB of RAM installed from 4–16 GB, and then 1 GB for every 8 GB RAM installed above 16 GB RAM

If your SQL Server is on a 32 GB RAM machine, you should configure it to use 25 GB max; with 16 GB the right value is 11 GB, with 8 GB the limit is 5 GB, and finally with 4 GB of RAM the right value is 2 GB.
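
The arithmetic behind those numbers can be sketched in a few lines of Python. This is only an illustration of the rough formula quoted above: the function names are made up for the example, and using integer division for RAM sizes that are not multiples of 4 or 8 GB is my own assumption. The generated statements use the standard sp_configure option 'max server memory (MB)'; review them before running anything on a real server.

```python
def reserved_gb(total_gb: int) -> int:
    """RAM (in GB) to leave outside SQL Server, following the rough formula."""
    reserve = 1                              # 1 GB for the OS
    reserve += min(total_gb, 16) // 4        # 1 GB per 4 GB installed, up to 16 GB
    reserve += max(total_gb - 16, 0) // 8    # 1 GB per 8 GB installed above 16 GB
    return reserve


def max_server_memory_gb(total_gb: int) -> int:
    """Suggested SQL Server max memory (in GB) for a machine with total_gb of RAM."""
    return total_gb - reserved_gb(total_gb)


def sp_configure_script(total_gb: int) -> str:
    """Build the T-SQL that applies the limit (the option expects megabytes)."""
    max_mb = max_server_memory_gb(total_gb) * 1024
    return (
        "EXEC sp_configure 'show advanced options', 1; RECONFIGURE;\n"
        f"EXEC sp_configure 'max server memory (MB)', {max_mb}; RECONFIGURE;"
    )


if __name__ == "__main__":
    for ram in (4, 8, 16, 32):
        print(f"{ram} GB installed -> limit SQL Server to {max_server_memory_gb(ram)} GB")
    print(sp_configure_script(32))
```

On a single-server TFS these values are an upper bound at best, because the formula assumes SQL Server is the only role on the machine; the "start with half the RAM" approach described earlier is the safer starting point there.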

Gian Maria.

Tfs2015 Build agent error: Access denied: xxxxx\yyyyy needs Listen permissions for pool zzzzz to perform the action

TFS 2015 introduces a completely new and redesigned build system, and one of the most important changes is the new lightweight agent. Instead of installing TFS and then configuring Build, to create a new agent you only need to download a zip file, uncompress it and launch a PowerShell script. Another great advantage is the ability to run the agent either as a service or interactively in a simple console application.

Once you have configured a new agent, you can check that everything is ok in the TFS Control Panel, in the new Agent pools tab. The new agent should be listed there: it is red if not active, green if up and running.

Figure 1: Management of Pool and Agent in TFS Configuration

If the agent stays red even after you launched it, check the logs in the _diag folder.

Figure 2: Logs are placed in _diag folder

You should be able to understand and fix the error by looking at the log. If you run the agent interactively, it could be that your user does not have permission to listen to the pool.

17:28:46.531831 Microsoft.TeamFoundation.DistributedTask.WebApi.AccessDeniedException: Access denied. CYBERPUNK\Administrator needs Listen permissions for pool Fast to perform the action. For more information, contact the Team Foundation Server administrator.
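
If the log is long, a small script can pull out lines like the one above. The following is a minimal sketch in Python: the regular expression is modelled only on the single message quoted here, so other agent versions may word it differently, and scan_diag_folder is a hypothetical helper that simply reads every file it finds in the folder you pass in.

```python
import re
from pathlib import Path

# Pattern modelled on the one log line quoted in this post (assumption:
# the wording is stable across log entries of this kind).
ACCESS_DENIED = re.compile(
    r"Access denied\. (?P<account>\S+) needs (?P<permission>\w+) permissions "
    r"for pool (?P<pool>.+?) to perform the action"
)


def find_permission_errors(lines):
    """Return (account, permission, pool) for every access-denied line."""
    hits = []
    for line in lines:
        match = ACCESS_DENIED.search(line)
        if match:
            hits.append((match["account"], match["permission"], match["pool"]))
    return hits


def scan_diag_folder(diag_dir):
    """Scan every file in the agent's _diag folder (hypothetical helper)."""
    hits = []
    for path in sorted(Path(diag_dir).iterdir()):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            hits.extend(find_permission_errors(text.splitlines()))
    return hits


if __name__ == "__main__":
    sample = (
        r"17:28:46.531831 Microsoft.TeamFoundation.DistributedTask.WebApi."
        r"AccessDeniedException: Access denied. CYBERPUNK\Administrator needs "
        r"Listen permissions for pool Fast to perform the action."
    )
    for account, permission, pool in find_permission_errors([sample]):
        print(f"{account} is missing {permission} on pool {pool}")
```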

In this situation the user Administrator is in the TFS Administrators group and should have every permission, but the new build system is slightly different. The user that runs the agent must be a member of the Agent Pool Service Accounts group, or it will not be able to run the agent.

Figure 3: Permissions for Agent Pools

Simply adding the user to the Agent Pool Service Accounts group should fix the authorization problem.

Figure 4: Agent is up and running.

Gian Maria

Users added to a Team Project have no permissions after upgrade from TFS 2010 to TFS 2013

I performed an upgrade from TFS 2010 to TFS 2013 at a customer site last week. The upgrade consisted of moving to a different machine and from a Workstation to an Active Directory Domain. The operation was simple, because the customer uses only Source Control and wanted to spend minimal time on it, so we decided on this strategy:

  1. Stop TFS in the old machine
  2. Backup and restore db in the new machine
  3. Upgrade and verify that everything works correctly

They did not care about user remapping, Reporting Services or other stuff; they just wanted a quick migration to the new version to use the local workspaces feature (introduced with TFS 2012), without spending too much time on the upgrade.

The upgrade went smoothly, but we started facing a couple of problems. The first one: after the migration no team project has any users, because the machine is now joined to a domain with different users; but if we add users to a team project, they are not able to connect to it and seem to have no permissions. All the users that are Project Collection Administrators can use TFS with no problem.

The reason is simple: TFS 2012 introduced the concept of Teams. Each Team Project can have multiple Teams, and when you add users from the home page of the Team, you are actually adding people to the TFS Group that corresponds to that Team. For each Team Project a default Team with the same name as the Team Project is automatically created.

Figure 1: Users added to Team through home page.

In the above picture I’ve added two users to the BuildExperiments Team; we can verify this in the Settings page of the Team Project.

Figure 2: Users added through the home page are added to the corresponding TFS Security Group

To understand the permissions of those users, you should use the administration section of TFS. As you can see from Figure 3, the BuildExperiments team has no permissions associated.

Figure 3: Permission associated to the Team Group

The reason is that the Team is not part of the Contributors TFS Group, as can be verified from the Member Of section of the group properties.

Figure 4: Team group belongs only to the Project Valid Users group

When you create a new Team Project, the default team (with the same name as the Team Project) is automatically added to the Contributors group, and it is that membership that gives users the right to access the Team Project. To fix the above problem you can manually add the Team TFS Group to the Contributors group using the Join Group button. Once the Team group is added to the Contributors group, all the people you add through the web interface are able to access the Team Project.

This is the standard behavior in TFS: when you create a new Team, the UI suggests adding the new Team Group to an existing group to inherit its permissions.

Figure 5: Default option for a new group is to be part of the Contributors group.

This choice is optional: you can choose a different security group or no group at all, but you should then remember to explicitly grant permissions to the corresponding Team Group.

When people cannot access TFS and you believe that they should, always double check all the groups they belong to and the effective permissions associated with them.

Gian Maria.

Error TF53001: The database operation was canceled by an administrator

A customer upgraded their TFS 2010 to 2013 on a new machine running Windows Server 2012 R2 and SQL Server 2014. Everything went fine until, after a few days, they started getting an error whenever they tried to do a Get Latest, a Check-in or a Check-out operation.

Error TF53001: The database operation was canceled by an administrator

This error is not really informative, so I asked them to check the Event Viewer on the server (an operation you should always do whenever your TFS misbehaves). For each client operation that gave the error, this Event Error was logged:

Log Name:      Application
Source:        MSSQL$SQL2014TFS
Date:          19/02/2015 17:15:54
Event ID:      17310
Task Category: Server
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      xxxx.xxx.local
Description:
A user request from the session with SPID 70 generated a fatal exception. SQL Server is terminating this session. Contact Product Support Services with the dump produced in the log directory.

This is an internal error of SQL Server, and we verified that SQL Server 2014 was at RTM level, with no Cumulative Update installed. After installing the latest Cumulative Update for SQL Server 2014, everything started working again. Since Cumulative Updates usually address bugs in the SQL Server product, it is always good practice to keep your SQL Server up to date, and if you are experiencing strange SQL errors, it could be the solution to your problems.

Gian Maria.

Build controller and agent ready but icon shows stopped

Today I encountered a strange error during the configuration of a Build Controller in TFS. We installed and configured the first Build Controller for a TFS instance and everything went well, but both controller and agents are marked with the stopped icon, even if the status is “ready”.

Figure 1: Controller and agents are marked as stopped even if they are in Ready State

I immediately looked into the Event Viewer, but there was absolutely no clue of what was happening. I tried creating and scheduling a build: it starts, then remains silent forever. The build system was not working. I remembered a post by Richard where he had the same problem, but I was not in that scenario. I checked DNS and tried to ping the server, and everything was ok, but builds never started and there was absolutely no error in the Event Viewer.

Then I noticed that in the upper section of the Build Server there was another link called Details… that usually is not there. When I clicked on that link, it told me that the controller was not able to communicate with TFS because it got a 500 internal error response.

This is extremely painful, because it means that something in the Application Tier is not working properly, so I immediately opened a remote desktop session to the TFS machine and looked at the Event Viewer of the server. This time the error was there, and luckily enough it was simple to fix.

System.ServiceModel.ServiceHostingEnvironment+HostingManager/42931033
 Exception: System.ServiceModel.ServiceActivationException: The service '/tfs/queue/test/Services/v4.0/MessageQueueService2.svc' cannot be activated due to an exception during compilation.  The exception message is: Memory gates checking failed because the free memory (176160768 bytes) is less than 5% of total memory.  As a result, the service will not be available for incoming requests.  To resolve this, either reduce the load on the machine or adjust the value of minFreeMemoryPercentageToActivateService on the serviceHostingEnvironment config element.. ---> System.InsufficientMemoryException: Memory gates checking failed because the free memory (176160768 bytes) is less than 5% of total memory.  As a result, the service will not be available for incoming requests.  To resolve this, either reduce the load on the machine or adjust the value of minFreeMemoryPercentageToActivateService on the serviceHostingEnvironment config element.
   at System.ServiceModel.Activation.ServiceMemoryGates.Check(Int32 minFreeMemoryPercentage, Boolean throwOnLowMemory, UInt64& availableMemoryBytes)
   at System.ServiceModel.ServiceHostingEnvironment.HostingManager.CheckMemoryCloseIdleServices(EventTraceActivity eventTraceActivity)
   at System.ServiceModel.ServiceHostingEnvironment.HostingManager.EnsureServiceAvailable(String normalizedVirtualPath, EventTraceActivity eventTraceActivity)
   --- End of inner exception stack trace ---
   at System.ServiceModel.ServiceHostingEnvironment.HostingManager.EnsureServiceAvailable(String normalizedVirtualPath, EventTraceActivity eventTraceActivity)
   at System.ServiceModel.ServiceHostingEnvironment.EnsureServiceAvailableFast(String relativeVirtualPath, EventTraceActivity eventTraceActivity)
 Process Name: w3wp
 Process ID: 2768

This is a typical error you can encounter if you install TFS in a single-machine configuration. If you follow the general guidance on MSDN, the single-server approach is ok for groups of up to 500 users, with 4 GB of RAM and one 10k RPM disk. Single-server maintenance is easier and for a small team it is probably the best configuration, but you need to be aware of one possible problem: SQL Server is greedy about memory.

The problem is that SQL Server tends to use all available memory, until the system becomes really, really slow because it has no free memory left for other processes. Whenever you install TFS in a single-machine environment, it is a good idea to limit the maximum amount of memory available to SQL Server, leaving room for the AT to work properly. I have no golden number to give you, but on a single machine with 4 GB of RAM I usually limit SQL Server to a maximum of 2 GB. In this specific situation I remember talking about this configuration, but it was never done; the result was SQL Server using about 3 GB of RAM on a 4 GB machine, leaving no room for the WCF services to start.
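
The memory gate reported in the exception is simple arithmetic, and it helps to see how close to the edge the machine was. The sketch below only mirrors the comparison described in the error message (free memory versus 5% of total memory); it is not WCF's actual code, the function names are invented for the example, and it assumes the machine has exactly 4 GiB of RAM.

```python
GIB = 1024 ** 3


def min_free_bytes(total_bytes: int, min_free_percent: int = 5) -> int:
    """Smallest amount of free memory (in bytes) that still passes the gate."""
    # Ceiling division, done with integers to avoid rounding surprises.
    return -(-total_bytes * min_free_percent // 100)


def memory_gate_passes(free_bytes: int, total_bytes: int, min_free_percent: int = 5) -> bool:
    """True when free memory is at least min_free_percent of total memory."""
    return free_bytes * 100 >= total_bytes * min_free_percent


if __name__ == "__main__":
    total = 4 * GIB          # assumption: the 4 GB machine described above
    free = 176160768         # free memory reported in the exception message
    print("free memory:", free // 1024 ** 2, "MiB")
    print("needed     :", min_free_bytes(total) // 1024 ** 2, "MiB or more")
    print("gate passes:", memory_gate_passes(free, total))  # False
```

With roughly 168 MiB free against a threshold of about 205 MiB, the service activation is refused, which is exactly what leaving SQL Server uncapped on a small single-server box leads to.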

Lesson learned: whenever something goes wrong in TFS, always have a look at the Event Viewer of all machines involved in the process (AT, SQL, Build, etc.), because the root error could originate on another machine and not on the one you are looking at. As a rule of thumb, if something went wrong, always look at the Event Viewer of the AT machine.

Gian Maria.