Altering Git History
The Problem
I have a personal project that I want to publish on GitHub without sensitive data, while preserving the rest of my code and my commit history. Although the project itself is long dead and its passwords are useless, it still contains personal email addresses and API keys that shouldn’t be published.
Two Approaches
- Initial Approach: Since I’m moving the code to a new repository, I could just copy the files, sanitize them by hand, and push a single “initial commit” to the public repository.
- Alternate Approach: I can try to sanitize my repository by altering my git history.
The initial approach accomplishes my objective to publish the code without the sensitive parts, but won’t preserve commit history. Let’s investigate the alternate approach.
An Alternate Approach
The native way is a git command called git-filter-branch, used to run filters on git trees. This command is available wherever git is installed, but the documentation actually suggests a tool called BFG Repo-Cleaner for what I want to do here, so let’s look at that.
Judging by its documentation, BFG is simpler to use than git-filter-branch. BFG provides an easy way to filter strings and files from your commit history without having to write scripts yourself.
Altering History
With a tool in hand, let’s give this a try. Here’s how I did it:
- Before using BFG to clean up the project history, I had to put the repo’s latest commit into my desired state, with the sensitive data already removed. By default, BFG treats the latest commit as protected and leaves its contents alone. Use a standard git clone or git pull to get the latest content, then make those changes.
- Once the changes are made, committed, and pushed back to the git server, check out a “mirror copy” of the repository using the --mirror flag: git clone --mirror {repo_url}
- Next, run BFG: java -jar ~/path/to/bfg.jar --replace-text {text_file_name.txt} {repo.git}. The file {text_file_name.txt} contains the strings to redact, each on its own line. (See below.)
- Finally, push the contents of repo_name.git back to the server:
$> cd repo_name.git
$> git push
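The steps above can be strung together roughly as follows. This is only a sketch: the function name scrub_history and its three arguments are hypothetical, and the reflog/gc housekeeping is a step the BFG documentation suggests rather than something I ran in the walkthrough above. It assumes git and java are on the PATH.

```shell
# Hypothetical helper wrapping the mirror-clone, BFG, and push steps.
# $1: repository URL, $2: file of strings to redact, $3: path to the BFG jar
scrub_history() {
    if [ "$#" -ne 3 ]; then
        echo "usage: scrub_history REPO_URL REPLACEMENTS_FILE BFG_JAR" >&2
        return 2
    fi
    repo_url=$1
    replacements=$2
    bfg_jar=$3
    # Name the clone directory explicitly, e.g. repo_name.git
    mirror_dir="$(basename "${repo_url%/}" .git).git"

    # Fresh mirror clone of the already-cleaned repository.
    git clone --mirror "$repo_url" "$mirror_dir" || return 1

    # Rewrite history, replacing every string listed in the replacements file.
    java -jar "$bfg_jar" --replace-text "$replacements" "$mirror_dir" || return 1

    # Housekeeping suggested by the BFG documentation (an addition to the
    # steps above): expire reflogs and drop the now-unreferenced objects.
    (
        cd "$mirror_dir" &&
        git reflog expire --expire=now --all &&
        git gc --prune=now --aggressive
    ) || return 1

    # Push the rewritten refs back to the server.
    (cd "$mirror_dir" && git push)
}
```

Calling it would look like scrub_history {repo_url} {text_file_name.txt} ~/path/to/bfg.jar. Since the push rewrites history on the server, it is worth trying this against a throwaway copy first.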
Notes
Bare vs Mirror
- A “bare” copy is your repository containing its git state, but without your actual source files. Think of this as the contents of {repo_name}/.git placed directly in the top-level directory.
- A “mirror” copy includes whatever --bare does, plus every ref from the source (remote-tracking branches, notes, and so on) rather than just its local branches. (Following the BFG documentation, I used --mirror.)
Customizing Text Replacements
The file passed into --replace-text may look like this:
password
someAPIKey123
aUsernameABC
By default, each string is replaced with ***REMOVED***. According to BFG’s author, you can customize replacements by adding an arrow and the replacement string, like so:
password==>***REDACTED***
someAPIKey123==>{api_key}
aUsernameABC==>
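One way to create that file without opening an editor is a quoted heredoc. This is a minimal sketch: the temporary directory and the file name replacements.txt are placeholders of my own, and the entries are just the sample strings from this section.

```shell
# Hypothetical location for the replacements file.
workdir=$(mktemp -d)
replacements_file="$workdir/replacements.txt"

# Quoting 'EOF' stops the shell from expanding anything inside the entries.
cat > "$replacements_file" <<'EOF'
password==>***REDACTED***
someAPIKey123==>{api_key}
aUsernameABC==>
EOF

# Count the non-empty entries as a quick check before handing the file to BFG.
entry_count=$(grep -c . "$replacements_file")
echo "entries: $entry_count"
```

The path in $replacements_file is then what gets passed to --replace-text.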
Removing Files
Remove files by name (BFG matches on the file name, not its path within the repo) with:
$> java -jar ~/path/to/bfg.jar --delete-files {file_names} {repo_name.git}
Sanity Check
Double-check your work by inspecting any commits where a redacted string was introduced or modified. If you see the replacement strings there instead of the originals, you’re all set.
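Checking commits by hand gets tedious, so here is a rough sketch of automating the search with git's pickaxe option (git log -S). The helper name find_leftovers is hypothetical, and it treats every entry in the replacements file as a literal string, so it would not handle regex-style entries.

```shell
# Hypothetical helper: prints each search string that still shows up in history.
# $1: path to a repository (a mirror clone works), $2: the file given to --replace-text
find_leftovers() {
    repo_dir=$1
    replacements=$2
    while IFS= read -r line || [ -n "$line" ]; do
        # Keep only the search string, dropping any "==>replacement" suffix.
        needle=${line%%==>*}
        [ -n "$needle" ] || continue
        # -S lists commits that changed how often the string occurs;
        # --all searches every ref, not just the current branch.
        hits=$(git -C "$repo_dir" log --all --oneline -S"$needle")
        if [ -n "$hits" ]; then
            printf 'still present: %s\n' "$needle"
        fi
    done < "$replacements"
}
```

Run against the rewritten repo_name.git, no output means none of the listed strings were found in any reachable commit.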