The Problem

I have a personal project that I want to publish on GitHub without its sensitive data, while preserving the rest of my code and my commit history. Although the project itself is long dead and its passwords are useless, the history still contains personal email addresses and API keys that shouldn’t be published.

Two Approaches

  • Initial Approach: Since I’m moving the code to a new repository, I could just copy the files, sanitize them by hand, and push a single “initial commit” to the public repository.
  • Alternate Approach: I can try to sanitize my repository by altering my git history.

The initial approach meets my goal of publishing the code without the sensitive parts, but it doesn’t preserve the commit history. Let’s investigate the alternate approach.

An Alternate Approach

Git’s native tool for this is a command called git-filter-branch, which rewrites history by running filters over git trees. It’s available wherever git is installed, but its documentation actually suggests a tool called BFG Repo-Cleaner for what I want to do here, so let’s look at that.

Judging by its documentation, BFG is simpler to use than git-filter-branch. BFG provides an easy way to filter strings and files from your commit history without having to write scripts yourself.

Altering History

With a tool in hand, let’s give this a try. Here’s how I did it:

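  1. Before using BFG to clean up the project history, I had to put the repo’s latest commit into my desired state, with the sensitive strings already removed by hand. By default BFG treats the latest commit as protected and won’t change its contents, so anything left there survives the rewrite. Use a standard git clone or git pull to get the latest content, then make those changes.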
  2. Once the changes are made, committed, and pushed back to the git server, clone a “mirror copy” of the repository, using the --mirror flag: git clone --mirror {repo_url}
  3. Next, run BFG: java -jar ~/path/to/bfg.jar --replace-text {text_file_name.txt} {repo_name.git}. The file {text_file_name.txt} contains the strings to redact, each on its own line. (See below.)
  4. Finally, push the contents of repo_name.git back to the server:
$> cd repo_name.git
$> git push
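
The four steps above can be sketched as a single shell function. This is only an outline under a few assumptions: the argument names (repo_url, bfg_jar, redact_file, mirror_dir) are placeholders, and the reflog-expiry and gc step is the cleanup that BFG’s own documentation recommends after a run, not something taken from the steps above.

```shell
# Sketch of the mirror-clone / BFG / push cycle.
# All four arguments are placeholders supplied by the caller (assumptions).
scrub_history() {
  repo_url="$1"                     # URL you normally clone from
  bfg_jar="$2"                      # path to the downloaded BFG jar
  redact_file="$3"                  # one string to redact per line
  mirror_dir="${4:-repo_name.git}"  # where the mirror clone is created

  # Step 2: mirror clone (no working tree, all refs).
  git clone --mirror "$repo_url" "$mirror_dir" || return 1

  # Step 3: rewrite history, replacing the listed strings.
  java -jar "$bfg_jar" --replace-text "$redact_file" "$mirror_dir" || return 1

  # Cleanup suggested by the BFG documentation, then step 4: push.
  (
    cd "$mirror_dir" &&
    git reflog expire --expire=now --all &&
    git gc --prune=now --aggressive &&
    git push
  )
}
```

A call might look like scrub_history {repo_url} ~/path/to/bfg.jar {text_file_name.txt}. Because the clone was made with --mirror, the plain git push at the end overwrites every ref on the server, so it’s worth trying this against a scratch copy first.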

Notes

Bare vs Mirror

  • A “bare” copy contains your repository’s git state, but no working copy of your actual source files. Think of it as just the {repo_name}/.git directory from a normal clone of {repo_name}.

  • A “mirror” copy includes whatever --bare does, and also maps every ref from the source (remote-tracking branches, notes, and so on) so that the whole repository can be pushed back as-is. (Following the BFG documentation, I used --mirror.)
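
To see the difference concretely, here is a small sketch that clones a throwaway local repository both ways and inspects the resulting configuration. The directory and variable names are made up for the demo; only standard git commands are used.

```shell
# Throwaway demo: build a tiny source repo, then clone it both ways.
demo_dir=$(mktemp -d)
git init -q "$demo_dir/src"
git -C "$demo_dir/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first commit"

git clone -q --bare   "$demo_dir/src" "$demo_dir/bare.git"
git clone -q --mirror "$demo_dir/src" "$demo_dir/mirror.git"

# Neither clone has a working tree.
bare_is_bare=$(git -C "$demo_dir/bare.git" rev-parse --is-bare-repository)
mirror_is_bare=$(git -C "$demo_dir/mirror.git" rev-parse --is-bare-repository)

# Only the mirror is configured to track (and push) every ref.
mirror_flag=$(git -C "$demo_dir/mirror.git" config --get remote.origin.mirror)
mirror_refspec=$(git -C "$demo_dir/mirror.git" config --get remote.origin.fetch)
bare_refspec=$(git -C "$demo_dir/bare.git" config --get remote.origin.fetch || echo "(none)")

echo "bare:   is-bare=$bare_is_bare fetch=$bare_refspec"
echo "mirror: is-bare=$mirror_is_bare fetch=$mirror_refspec mirror=$mirror_flag"
```

The mirror flag is also why a plain git push from the mirror clone sends every rewritten ref back to the server.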

Customizing Text Replacements

The file passed into --replace-text may look like this:

password
someAPIKey123
aUsernameABC

By default, each string is replaced with ***REMOVED***. According to BFG’s author, you can customize a replacement by appending ==> and the replacement string, like so:

password==>***REDACTED***
someAPIKey123==>{api_key}
aUsernameABC==>
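
As a sketch, a replacements file like this can be written with a here-document. The first two lines come from the example above; the third is a hypothetical pattern using the regex: prefix, which BFG’s usage text lists (alongside glob:) for non-literal matches, here masking addresses at a made-up domain.

```shell
# Write the expressions file to a temporary location (the path is arbitrary).
redact_file=$(mktemp)
cat > "$redact_file" <<'EOF'
password==>***REDACTED***
someAPIKey123==>{api_key}
regex:[A-Za-z0-9._+-]+@example\.com==>user@example.invalid
EOF

# One expression per line; count them as a quick check.
expr_count=$(wc -l < "$redact_file" | tr -d ' ')
echo "$expr_count expressions written to $redact_file"
```

Quoting the here-document delimiter ('EOF') keeps the shell from expanding anything inside the file, which matters once a pattern contains backslashes or dollar signs.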

Removing Files

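To remove files from history entirely, pass their names (or a glob) to --delete-files. BFG matches on file name, not on the path within the repo: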

$> java -jar ~/path/to/bfg.jar --delete-files {file_names} {repo_name.git}
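
One way to confirm a deletion is to list every path that has ever appeared in history and look for the file name. The sketch below uses plain git against a throwaway repository; paths_in_history is a helper invented here, and secrets.env and app.py are hypothetical file names.

```shell
# Print every path that appears in any commit on any ref.
paths_in_history() {
  git -C "$1" log --all --name-only --format= | sed '/^$/d' | sort -u
}

# Throwaway repo: add two files, then delete one in a later commit.
files_demo=$(mktemp -d)
git init -q "$files_demo"
git -C "$files_demo" config user.name demo
git -C "$files_demo" config user.email demo@example.com
echo 'TOKEN=abc' > "$files_demo/secrets.env"
echo 'print("hi")' > "$files_demo/app.py"
git -C "$files_demo" add .
git -C "$files_demo" commit -q -m "initial"
git -C "$files_demo" rm -q secrets.env
git -C "$files_demo" commit -q -m "remove secrets"

# The file is gone from the latest commit but still listed in history.
history_paths=$(paths_in_history "$files_demo")
echo "$history_paths"
```

Run against the mirror clone after BFG, for example paths_in_history {repo_name.git}, the deleted name should no longer be listed.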

Sanity Check

Double-check your work by inspecting any commits where a redacted string was introduced or modified. If you see the replacement strings in place of the originals, you’re all set.
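
To find those commits, git’s pickaxe search (git log -S) lists every commit whose diff adds or removes a given string. The sketch below demonstrates it on a throwaway repository that still has a secret in its history; commits_touching is a helper name invented here, and config.ini is a hypothetical file.

```shell
# List commits (on any ref) whose diff adds or removes the given string.
commits_touching() {
  git -C "$1" log --all --oneline -S"$2"
}

# Throwaway repo: one commit introduces the key, the next removes it.
leak_demo=$(mktemp -d)
git init -q "$leak_demo"
git -C "$leak_demo" config user.name demo
git -C "$leak_demo" config user.email demo@example.com
echo 'key = someAPIKey123' > "$leak_demo/config.ini"
git -C "$leak_demo" add config.ini
git -C "$leak_demo" commit -q -m "add config"
echo 'key = {api_key}' > "$leak_demo/config.ini"
git -C "$leak_demo" commit -q -am "scrub config"

# Both commits touch the key, so an unscrubbed history reports them.
leaks=$(commits_touching "$leak_demo" someAPIKey123 | wc -l | tr -d ' ')
echo "commits still touching the key: $leaks"
```

In this demo both commits are reported, because the key is only removed from the latest version of the file. After a successful BFG run, the same search for an original string against {repo_name.git} should come back empty, while a search for the replacement string should find the rewritten commits.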