Version control is a saviour of modern development. With nearly everyone using git to manage their code nowadays (at least for the last 5 years), understanding git is an essential part of being a developer. Recently I have become more and more involved in code review, using GitLab, Bitbucket and GitHub to manage code from different developers across different projects. A workable workflow (something like GitFlow) makes this easier. However, as I get involved in more and more projects, I feel a sense of endless dread when I start on a project and find unnecessary, large or sensitive files committed into its history.

I totally understand that people sometimes make mistakes and commit things they don’t want in a repository, but how do you remove them days, weeks or months after they were added? Version control keeps every part of your history, so you can end up with both gigantic repos (I’ve recently seen web projects with 2GB repos) and sensitive data (mainly passwords) that was committed and later removed from the project but still sits in the history.
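If you are not sure what is bloating a repo, something along these lines can list the biggest blobs anywhere in its history, including files that were deleted long ago. Treat it as a rough sketch: largest_blobs is a helper name I made up, and it assumes a git recent enough to support -C and a format string for cat-file --batch-check.

```shell
# largest_blobs is a hypothetical helper, not a git command.
# Usage: largest_blobs [repo-dir] [how-many]
largest_blobs() {
    repo=${1:-.}
    count=${2:-10}
    # List every object reachable from any ref, together with its path
    git -C "$repo" rev-list --objects --all |
        # Ask git for each object's type and size, keeping the path
        git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        # Keep only file contents (blobs) and print "size path"
        awk '$1 == "blob" { sub(/^blob /, ""); print }' |
        # Biggest first
        sort -rn |
        head -n "$count"
}
```

Running largest_blobs some-big-repo.git 5 would print the five largest files ever committed, one per line, as a size in bytes followed by the path.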

git-filter-branch

As we know, to get rid of a file completely from version control (as if it was never there) you need to rewrite history. git-filter-branch provides this, letting you “rewrite Git revision history by rewriting the branches mentioned in the <rev-list options>, applying custom filters on each revision”. However, running it across large git repos with hundreds or thousands of commits can take a long time, and it is not particularly flexible.
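For comparison, this is roughly what removing a single file with git-filter-branch looks like. It is a minimal sketch, not a recommendation: remove_file_from_history is a name I invented, it only matches the exact path you give it, and it assumes the path contains no single quotes.

```shell
# remove_file_from_history is a hypothetical helper name.
# Usage (from inside the repo): remove_file_from_history path/to/file
remove_file_from_history() {
    # --index-filter      rewrite each commit's index without checking files out
    # --ignore-unmatch    do not fail on commits that never contained the file
    # --prune-empty       drop commits that end up with no changes
    # --tag-name-filter   'cat' keeps tag names and moves them to the new commits
    # -- --all            rewrite every branch and tag
    git filter-branch --force \
        --index-filter "git rm --cached --ignore-unmatch --quiet -- '$1'" \
        --prune-empty --tag-name-filter cat -- --all
}
```

Note that git-filter-branch keeps a copy of the old refs under refs/original/, so the old objects stay in the repo until those refs are deleted and the repo is garbage collected.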

BFG

The BFG is a simpler, faster alternative to git-filter-branch for cleansing bad data out of your Git repository history. BFG Repo-Cleaner is a Java application that takes much of the effort and time out of running git-filter-branch. In fact, I never use git-filter-branch, and BFG is one of my go-to tools when taking over a new repo. The process is nicely outlined on the BFG website, but for future reference this is what I normally do. In this demo I am removing the file crazy-zip-file.zip.

Clone the repo

$ git clone --mirror git://example.com/some-big-repo.git

At this stage you might want to make a backup. Don’t say I didn’t warn you.
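One way to take that backup is a git bundle, which packs every ref into a single file you can clone from later. This is only a sketch under my own assumptions: backup_mirror is a made-up helper name, and the bundle path should be absolute because the git commands run from inside the repo directory.

```shell
# backup_mirror is a hypothetical helper name.
# Usage: backup_mirror some-big-repo.git /absolute/path/to/backup.bundle
backup_mirror() {
    repo=$1
    bundle=$2
    # Pack all refs and their history into one file, then sanity-check it
    git -C "$repo" bundle create "$bundle" --all &&
        git -C "$repo" bundle verify "$bundle"
}
```

For example, backup_mirror some-big-repo.git "$PWD/some-big-repo.bundle" leaves a bundle next to the mirror. If things go wrong, running git clone on that bundle file should give you the old history back.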

Run BFG to remove crazy-zip-file.zip

$ java -jar bfg.jar --delete-files crazy-zip-file.zip some-big-repo.git

Expire the reflog and prune the repo

$ cd some-big-repo.git
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
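Before pushing, it can be worth confirming that the file really has gone from every ref. The check below is a small sketch rather than part of the BFG process: path_in_history is a hypothetical helper of mine that succeeds if any reachable object still has a path containing the given name.

```shell
# path_in_history is a hypothetical helper name.
# Usage: path_in_history <repo-dir> <file-name>
# Exit status 0 means the name still appears somewhere in history.
path_in_history() {
    # List every reachable object with its path, then look for the name
    git -C "$1" rev-list --objects --all | grep -F -- "$2" >/dev/null
}
```

For example, path_in_history some-big-repo.git crazy-zip-file.zip should now fail (non-zero exit status), and git count-objects -vH run inside the repo should report a smaller size-pack than before. As far as I understand it, BFG leaves the contents of your latest commit alone by default, so if the check still finds the file, make sure it has been deleted in the current HEAD first.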

Push back new repo, overwriting old repo

$ git push --mirror

As a word of warning, this will overwrite your existing repo. Anyone who has made changes will not be able to merge them (without significant effort), and any commits pushed since you took the clone will be removed. Any other users of the repo will also need to delete their local copy and re-clone it to continue working.
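For everyone else on the project, the re-clone step might look something like this. It is a cautious sketch of my own: reclone is a made-up helper, and rather than deleting the old working copy straight away it moves it aside (with a .pre-rewrite suffix I chose arbitrarily) in case it holds unpushed work.

```shell
# reclone is a hypothetical helper name.
# Usage: reclone <repo-url> <local-dir>
reclone() {
    url=$1
    dir=$2
    # Keep the old clone around under a new name instead of deleting it
    if [ -e "$dir" ]; then
        mv "$dir" "$dir.pre-rewrite" || return 1
    fi
    # Fresh clone containing only the rewritten history
    git clone "$url" "$dir"
}
```

Once you are happy nothing in the old copy is needed, the .pre-rewrite directory can be deleted by hand.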