Optimize your local git repository from time to time

Git is an exceptional piece of software and I really cannot think living without it. Now with superfast SSD, you can use git without performance problem even for large repositories (maybe you converted an old Subversion or TFVC).

When your repository has thousands of object and especially if you adopt flows of work where you rebase often, probably your repository has large number of unnecessary object that can be deleted safely. Git runs for you in the background a special command called

git gc

It is a Garbage Collection command, that will cleanup your local repository, nothing changes in the remote repository. Sometimes you can also run manually a deeper cleaning of your repository with the command

 

git gc --aggressive

You can run from time to time (documentation say you can run every few hundreds commits), it will take longer that a normal git gc, but it will help keeping your local git repository fast even if you work with really large codebase.

Gian Maria.

Check pull request with build, without enforcing pull request

With TFS / VSTS Build system it is possible to configure Git to require that a specific branch is protected, and you need to use Pull Requests to push code into it, and the pull request can be accepted only if a specific build is green. Here is the typical configuration you can do in admin page for your Git repositories.

image

Figure 1: Branch policies in VSTS/TFS

In Figure 1 it is represented the configuration for branch policies; in this specific configuration I require a specific build to run whenever a member create a pull request against develop branch. The effect is: if a developer try to directly push to develop branch, the push is rejected.

image

Figure 2: Push was rejected because Branch Policies are defined for develop branch.

Branch Policies are used to force using pull requests to reintegrate code to specific branches, and sometimes it could be too restrictive. For small teams, it could be perfectly reasonable to avoid always forcing pull request and letting the owner of the feature branch to decide if the code needs a review. Such relaxed policies is also used when you start introducing Pull Request to the team, instead of telling everyone that they will be completely unable to push against a branch except from a Pull Request, you can gradually introduce Pull Request concept doing Pull Request only for some of the branches, so the team can be familiar with the tool.

Always be gentle when introducing new procedure in your process, being too restrictive usually creates resistance, unless the new procedure is enforced and requested by all  members of the team

Luckily enough it is simple to obtain such result, in Figure 1 you should see an option “Block Pull Requests Completion unless there is a valid build”. Simply unchecking that checkbox will allows you to normally push on develop branch, but if you create a pull request, you will have a build that verify the correctness of the code.

Gian Maria.

Update GitVersion for large repositories

As you know I’m a fanatic user of GitVersion in builds, and I’ve written a simple task to use it in a TFS Build automatically. This is the typical task that you write and forget, because it just works and you usually not have the need to upgrade it. But there is a build where I start to see really high execution timing for the task, as an example GitVersion needs 2 minutes to run.

image

Figure 1: GitVersion task run time it is too high, 2 minutes

This behavior is perfectly reproducible on my local machine, that repository is quite big, it has lots of tags, feature branches and seems that GitFlow needs lot of time to determine semantic versioning.

Looking in the GitHub page of the project, I read some issue regarding performance problem, the good part is that all performance problem seems to disappear with the newer version, so it is time to upgrade the task.

Thanks to the new build system, updating a task in VSTS / TFS is really simple, I just deleted the old GitVersion executable and libraries from the folder where the task was defined, I copied over the new Gitversion.exe and libraries and with a tfx build tasks upload command I can immediately push the new version on my account.

Since I changed GitVersion from version 2 to version 3, it is a good practice to update the Major number of the task, to avoid all build to automatically use the new GitVersion executables, because it could break something. What I want is all build to show me that we have a new version of the task to use, so you can try and stick using the old version if Gitversion 3.6 gives you problems.

Whenever you do major change on a TFS / VSTS Build task it is a good practice to upgrade major number to avoid all builds to automatically use the new version of the task.

Thanks to versioning, you can decide if you want to try out the new version of the task for each distinct build.

image 

Figure 2: Thanks to versioning the owner of the build can choose if  the build should be upgraded to use the new version of GitVersion task.

After upgrading the build just queue a new build and verify if the task runs fine and especially if the execution time is reduced.

image

Figure 3: New GitVersion executable have a real boost in performance.

With few minutes of work I upgraded my task and I’m able to use new version of GitVersion.exe in all the builds, reducing the execution time significantly. Comparing this with the old XAML build engine and you understand why you should migrate all of your XAML Build to the new Build System as soon as possible.

Gian Maria.

Bash pornography

Suppose you have two Git repositories that are in sync with a VSTS build, after some time the usual problem is that, branches that are deleted from the “main” repository are not deleted from mirrored repository.

The reason is, whenever someone does a push on main repository, VSTS build push the branch to the other repository, but when someone deleted a branch, there is no build triggered, so the mirrored repository still maintain all branches.

This is usually not a big problem, I can accept that cloned repository still maintain deleted branches for some time after a branch is deleted from the original repository and I can accept that this kind of cleanup could be scheduled or done manually periodically. My question is:

In a git repository with two remotes configured, how can I delete branch in remote B that are not present in remote A?

I am pretty sure that there are lots of possible solutions using node.js or PowerShell or other scripting language. After all we are programmer and it is quite simple to issue a git branch –r command, parsing the output to determine branches in repository B that are not in repository A, then issue a git push remote –delete command.

Since I use git with Bash even in Windows, I really prefer a single line bash instruction that can do everything for me.

The solution is this command and I must admit that it is some sort of Bash Pornography :D.

diff --suppress-common-lines <(git branch -r | grep origin/ | sed 1d | sed "s/origin\///")  " | sed "s/>\s*//" | xargs git push --delete vsts

This is some sort of unreadble line, but it works. It is also not really complex, you only need to consider the basic constituent, first of all I use the diff command to diff two input stream. The general syntax is diff <(1) <(2) where 1 and 2 are output of a bash command. In the above example (1) is:

(git branch -r | grep origin/ | sed 1d | sed "s/origin\///")

This command list all the branches, then keep only lines that start with origin/ (grep origin/ ), then removes the first line (sed 1d) because it contains origin HEAD pointer, and finally remove the trailing origin/ from each line (sed “s/origin\///”). If I run this I got something like this.

SNAGHTML16d0ce1

Figure 1: Output of the command to list all branches in a remote

This is a simple list of all branches that are present on remote origin. If you look at the big command line, you can notice that the diff instruction diffs the output of this command and another identical command that is used to list branches in VSTS.

The output of the diff command is one line for each branch that is in VSTS and it is not in origin, so you can pipe this list to xargs, to issue a push –delete for each branch.

Now you can use this command simply issuing a git fetch for each remotes, then paste the instruction in your bash, press return and the game is done.

image

Figure 2: Results of the command, two branches are deleted from VSTS remote because they were deleted from the origin remote

It is amazing what you can do with bash + awk + sed in a single line 😛

Gian Maria.

Keep Git repository in sync between VSTS / TFS and Git

Scenario: you have a repository in Git, both open source or in private repository and you want to keep a synchronized mirror in VSTS / TFS.

There are some legitimate reason to have a mirrored repository between Github or some external provider and an instance of VSTS / TFS, probably the most common one is keeping all development of a repository private and publish in open source only certain branches. Another reason is having all the code in Github completely in open source, but internally use VSTS Work Item to manage work with all the advanced tooling VSTS has to offer.

The solution to this problem is really simple, just use a build in VSTS that push new commits from GitHub to VSTS or the opposite. Lets suppose that you have a GitHub repository and you want it to be mirrored in VSTS.

Step 1 – install extension to manipulate variables

Before creating the build you should install Variable Toolbox extension from the marketplace. This extension allows you to manipulate build variable and it is necessary if you use GitFlow.

From the list of Build Variables available in the build system there are two variables that contains information about the branch that is to be build. They are  called Build.SourceBranch and Build.SourceBranchName, but noone of them contains the real name of the branch. The SourceBranch contains the full name refs/heads/branchname while SourceBranchName contains the last path segment in the ref. If you use gitflow and have a branch called hotfix/1.2.3 the full name of the branch is refs/heads/hotfix/1.2.3 and the variable SourceBranchName contains the value 1.2.3 …. not really useful.

Thanks to the Variable Toolbox Extension you can simple configure the task to replace the refs/heads part with null string, so you can have a simple way to have a variable that contains the real name of the build even if it contains a slash character.

Step 2 – configure the build

The entire build is composed by three simple task, the very first is a Transform Value task (from Variable Toolbox ) followed by two simple command line.

SNAGHTML3a85c0

Figure 1: The entire build is three simple tasks.

The first task is used to remove the refs/heads/ part from the $(Build.SourceBranch) and copy the result to the GitBranchName variable (you should have it defined in the variables tab).

image

Figure 2: Transformation variable configured to remove refs/heads

Now we need a first command line task that checkout the directory, because the build does not issue a checkout in git, but it simple works in detatched HEAD.

image

Figure 3: Checkout git through commandline

As you can see in Figure 3 this operation is really simple, you can invoke git in command line, issuing the command checkout $(GitBranchName) created in precedent step, finally you should specify that this command should be executed in $(Build.SourcesDirectory).

The last command line pushes the branch to a local VSTS Repository.

image

Figure 4: Git command line to push everything on VSTS

The configuration is really simple, I decided to push to address https://$(token)@myaddress.visualstudio.com. Token variable (2) is a custom secret variable where I store a valid Personal Access Token that has right to access the code. To push on remote repository the syntax $(GitBranchName):$(GitBranchName) to push local branch on remote repository with –force option to allow forcing the push.

Do not forget to make your token variable as a secret variable and configure the continuous integration to keep syncronized only the branch you are interested to.

image

Figure 5: Configure branches you want to keep syncronized

If you need also to keep tags syncronized between builds you can just add another command line git invokation that pushes all tags with the push –tags option.

The result

Thanks to this simple build, whenever we push something on GitHub, a build starts that automatically replicate that branch in VSTS without any user intervention.

image

Figure 5: Build result that shows command line in action during a build.

Thanks to the free build minutes on the hosted build, we have a complete copy in VSTS of a GitHub repository with automatic sync running in few minutes.

The very same configuration can be reversed to automatically push to GitHub some branches of your VSTS account, useful if you want to publish only some branches in open source, automatically.

Gian Maria.