Altering Git History
The Problem
I have a personal project that I want to publish on GitHub without its sensitive data, while preserving the rest of my code and my commit history. Although the project itself is long dead and its passwords are useless, it still contains personal email addresses and API keys that shouldn’t be published.
Two Approaches
- Initial Approach: Since I’m moving the code to a new repository, I could just copy the files, sanitize them by hand, and push a single “initial commit” to the public repository.
- Alternate Approach: I can try to sanitize my repository by altering my git history.
The initial approach accomplishes my objective to publish the code without the sensitive parts, but won’t preserve commit history. Let’s investigate the alternate approach.
An Alternate Approach
The native tool is a git command called git-filter-branch, which rewrites history by running filters over every commit. This command is available wherever git is installed, but its documentation actually suggests a tool called BFG Repo-Cleaner for what I want to do here, so let’s look at that.
Judging by its documentation, BFG is simpler to use than git-filter-branch. BFG provides an easy way to filter strings and files from your commit history without having to write scripts yourself.
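For comparison, here is roughly what the native tool needs for the simplest case: deleting one file from every commit. This is a sketch, not the approach used in this post; strip_file_with_filter_branch and secrets.txt are hypothetical names. Replacing text inside files, rather than removing whole files, would need a more involved --tree-filter script, which is the kind of scripting BFG spares you.

```shell
#!/usr/bin/env bash
# Sketch: remove one (hypothetical) file from every commit with git-filter-branch.
# Run inside a clean working tree of the repository you want to rewrite.
strip_file_with_filter_branch() {
  local path="$1"
  # --index-filter edits each commit's index without checking files out.
  # --ignore-unmatch keeps commits that never had the file from failing.
  # --prune-empty drops commits that end up changing nothing.
  # "-- --all" rewrites every branch and tag, not just the current one.
  # FILTER_BRANCH_SQUELCH_WARNING=1 skips git's interactive deprecation warning.
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
    --index-filter "git rm --cached --quiet --ignore-unmatch -- '$path'" \
    --prune-empty -- --all
}
```

Usage: `strip_file_with_filter_branch secrets.txt`. Note that git-filter-branch keeps the pre-rewrite refs under refs/original/, so the old history stays reachable locally until those refs are deleted.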
Altering History
With a tool in hand, let’s give this a try. Here’s how I did it:
- Before using BFG to clean up the project history, I had to put the repo’s latest commit into my desired state, with the sensitive data already removed. By default, BFG treats the latest commit on HEAD as protected and leaves its files untouched, so that commit has to be cleaned by hand. Use a standard git clone or git pull to get the latest content, then make those changes.
- Once the changes are made, committed, and pushed back to the git server, clone a “mirror copy” of the repository using the --mirror flag:
  $> git clone --mirror {repo_url}
- Next, run BFG:
  $> java -jar ~/path/to/bfg.jar --replace-text {text_file_name.txt} {repo_name.git}
  The file {text_file_name.txt} contains the strings to redact, each on its own line. (See below.)
- Finally, push the contents of {repo_name.git} back to the server:
  $> cd {repo_name.git}
  $> git push
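Steps 2 through 4 can be strung together in one script. This is a sketch under stated assumptions rather than the exact commands used above: clean_history and mirror_dir_for are hypothetical helper names, the jar path is a placeholder, it assumes step 1 is already done, and the reflog-expire and gc lines come from BFG’s own usage instructions, not from the steps above.

```shell
#!/usr/bin/env bash
# Sketch of steps 2-4. Assumes step 1 (cleaning the latest commit by hand) is done.
# The repo URL, replacements file, and jar path are placeholders.

# Hypothetical helper: the directory name `git clone --mirror` would use.
mirror_dir_for() {
  echo "$(basename "$1" .git).git"
}

# Hypothetical helper: the body runs in a subshell so `set -e` and `cd` stay local.
clean_history() (
  set -eu
  repo_url="$1"                          # e.g. git@github.com:you/repo_name.git
  replacements="$2"                      # strings to redact, one per line
  bfg_jar="${3:-$HOME/path/to/bfg.jar}"  # placeholder jar location
  mirror_dir="$(mirror_dir_for "$repo_url")"

  # Step 2: the mirror clone carries every ref and is set up to push them all back.
  git clone --mirror "$repo_url" "$mirror_dir"

  # Step 3: rewrite history. BFG leaves the latest commit on HEAD untouched by default.
  java -jar "$bfg_jar" --replace-text "$replacements" "$mirror_dir"

  # From BFG's usage notes: expire the reflog and gc so the old, unredacted
  # objects are purged from the local mirror.
  cd "$mirror_dir"
  git reflog expire --expire=now --all
  git gc --prune=now --aggressive

  # Step 4: push the rewritten refs back to the server.
  git push
)
```

Usage: `clean_history git@github.com:you/repo_name.git replacements.txt ~/tools/bfg.jar`, where every argument is a placeholder for your own values.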
Notes
Bare vs Mirror
- A “bare” copy is your repository’s git state without a working copy of your source files. Think of it as the contents of {repo_name}/.git in {repo_name}.
- A “mirror” copy includes everything --bare does, plus all other refs (such as remote-tracking branches and notes) and a remote configuration that maps every ref, so a plain git push sends all of them back to the server. (Following the BFG documentation, I used --mirror.)
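To see the difference locally, the sketch below clones one source both ways and prints the remote settings each records. compare_clones is a hypothetical helper, and src and dest are placeholders for a repository path or URL and an output directory.

```shell
#!/usr/bin/env bash
# Sketch: clone the same source as --bare and as --mirror, then show
# how each one is configured to talk to its remote.
compare_clones() {
  local src="$1" dest="$2"
  git clone -q --bare   "$src" "$dest/bare.git"
  git clone -q --mirror "$src" "$dest/mirror.git"

  # A --bare clone records the remote URL but no fetch refspec.
  echo "bare fetch:   $(git -C "$dest/bare.git" config --get remote.origin.fetch || echo '(none)')"
  # A --mirror clone maps every ref and marks the remote as a mirror,
  # which makes a plain `git push` push all refs.
  echo "mirror fetch: $(git -C "$dest/mirror.git" config --get remote.origin.fetch)"
  echo "mirror flag:  $(git -C "$dest/mirror.git" config --get remote.origin.mirror)"
}
```

For the mirror, the fetch refspec is +refs/*:refs/* and the mirror flag is true; the bare clone reports no fetch refspec.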
Customizing Text Replacements
The file passed into --replace-text may look like this:
password
someAPIKey123
aUsernameABC
By default, each string is replaced with ***REMOVED***. According to BFG’s author, you can customize replacements by adding an arrow (==>) and the replacement string; an arrow with nothing after it replaces the string with an empty string:
password==>***REDACTED***
someAPIKey123==>{api_key}
aUsernameABC==>
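One way to produce that file from a shell is a quoted heredoc, which keeps the asterisks and braces literal. This is a sketch; replacements.txt and the strings inside it are placeholders for your own values.

```shell
#!/usr/bin/env bash
# Sketch: write the replacements file shown above. The quoted 'EOF' stops the
# shell from expanding anything inside the heredoc.
cat > replacements.txt <<'EOF'
password==>***REDACTED***
someAPIKey123==>{api_key}
aUsernameABC==>
EOF

# Then pass it to BFG (jar path and repo name are placeholders):
#   java -jar ~/path/to/bfg.jar --replace-text replacements.txt repo_name.git
```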
Removing Files
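Remove files with the command below. The argument is a single file name or glob, matched against file names rather than paths within the repository:
$> java -jar ~/path/to/bfg.jar --delete-files {file_name_or_glob} {repo_name.git}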
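Because --delete-files takes one pattern, several names can be combined into a single brace glob such as {a,b}. A small sketch, assuming bash; bfg_name_glob is a hypothetical helper (not part of BFG), and credentials.json and .env are example names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join file names into one glob for BFG's --delete-files.
# A single name is returned unchanged; several become {name1,name2,...}.
bfg_name_glob() {
  if [ "$#" -eq 1 ]; then
    printf '%s\n' "$1"
  else
    local IFS=,          # "$*" joins arguments with the first character of IFS
    printf '{%s}\n' "$*"
  fi
}

# Example (jar path and repo name are placeholders). Quoting the result keeps
# the shell from brace-expanding it before BFG sees it:
#   java -jar ~/path/to/bfg.jar --delete-files "$(bfg_name_glob credentials.json .env)" repo_name.git
```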
Sanity Check
Double-check your work by inspecting the commits where a redacted string was introduced or modified. If you see the replacement strings instead of the originals, you’re all set.
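A complementary check from the command line, assuming you run it against the cleaned mirror before pushing: git log -S lists commits that change how many times a string occurs, so after a successful cleanup it should print nothing for each original string. history_contains is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: report whether any commit on any ref still adds or removes a string.
# Exit status 0 means the string was found (bad); 1 means it was not found.
history_contains() {
  local repo="$1" needle="$2" hits
  # -S matches commits that change the number of occurrences of the literal
  # string; --all walks every ref rather than only the current branch.
  hits=$(git -C "$repo" log --all --oneline -S"$needle")
  if [ -n "$hits" ]; then
    printf '%s\n' "$hits"
    return 0
  fi
  return 1
}
```

Usage, with the example strings from above: `for s in password someAPIKey123 aUsernameABC; do history_contains repo_name.git "$s" && echo "still present: $s"; done`.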