Signs of Triviality

Opinions, mostly my own, on the importance of being and other things.
[homepage]  [blog]  [jschauma@netmeister.org]  [@jschauma]  [RSS]

Open sourcing code into a separate git repository

March 19th, 2023

Two male Rohde's leaf frogs on a branch, one reaching over the other.
Source: Renato Augusto Martins; CC BY-SA 4.0

Suppose you have a code repository that you want to open source. You go ahead and add your license, contributor agreement, README, and any other files, but you also have a few changes that you know will remain internal. That is, you know that your open-sourced code will not be the authoritative copy, but a fork of your closed copy that you will have to track separately.

Now git is supposed to help you with this, and you know you want a local branch as well as a second remote repository, but git is notoriously unintuitive at times, so here's the quick setup for this scenario:

We start out with a simple git repository with just a single remote and our main branch:

$ ls
Makefile   README.md  doc        private    src        test
$ git branch
* main
$ git remote -v
origin  ssh://git@git.our-internal.example.com:/~jschauma/whatever.git (fetch)
origin  ssh://git@git.our-internal.example.com:/~jschauma/whatever.git (push)
$ 

Now let's create a new branch for open sourcing this code. This will become the authoritative open source upstream, so we want to remove the private data and add our open source files. But note: simply using git rm would only remove the private file(s) from the latest commit; earlier commits in the git history would still contain them, and they would thus be leaked to the open source repository. Instead, we need to perform some more invasive git surgery and rewrite the branch's history using git filter-branch:

$ git checkout -b opensource
Switched to a new branch 'opensource'
$ export FILTER_BRANCH_SQUELCH_WARNING=1
$ git filter-branch --tree-filter 'rm -fr private' HEAD 
Rewrite a98f9f39cba2be0268feee5082272405b433dc53 (1/1) (0 seconds passed, remaining 0 predicted)    
Ref 'refs/heads/opensource' was rewritten
$ $EDITOR LICENSE CONTRIBUTING
$ git add LICENSE CONTRIBUTING
$ git commit -m 'ready for open source'
[opensource 611db4c] ready for open source
 2 files changed, 2 insertions(+)
 create mode 100644 CONTRIBUTING
 create mode 100644 LICENSE
$ 
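To convince yourself that the rewrite did what it should, you can compare it against the naive git rm approach in a throwaway repository. The following is only a sketch: the file names, the temporary directory, and the "naive" branch are made up for illustration and are not part of the setup above.

```shell
#!/bin/sh
set -eu

# Throwaway repository standing in for the internal one; all file names
# and the "naive" branch are hypothetical.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name 'Demo User'
git config user.email 'demo@example.com'

mkdir private src
echo 'secret' > private/key.txt
echo 'public' > src/code.txt
git add private src
git commit -q -m 'initial import'

# Naive approach: delete the directory in a new commit.
git checkout -q -b naive
git rm -q -r private
git commit -q -m 'remove private data'

# Approach from the text: rewrite the history of a fresh branch.
git checkout -q -b opensource main
export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --tree-filter 'rm -fr private' HEAD >/dev/null 2>&1

# Count the commits on each branch that touch private/.
naive_hits=$(git log --oneline naive -- private | wc -l | tr -d ' ')
clean_hits=$(git log --oneline opensource -- private | wc -l | tr -d ' ')
echo "naive:      $naive_hits commit(s) touch private/"
echo "opensource: $clean_hits commit(s) touch private/"
```

On the naive branch, both the commit that added the directory and the one that removed it still reference it, so anybody with a clone can check out the older commit. On the rewritten branch, no commit references it at all.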

Next, we initialize the new remote repository. E.g., on GitHub, you might create a new (empty) repository. This will be the repository that we push our open source branch into. We then add the new repository as a second remote and push our branch there as the main branch:

$ git branch
  main
* opensource
$ git remote add github git@github.com:jschauma/second-remote-example.git
$ git remote -v
github  git@github.com:jschauma/second-remote-example.git (fetch)
github  git@github.com:jschauma/second-remote-example.git (push)
origin  ssh://git@git.our-internal.example.com:/~jschauma/whatever.git (fetch)
origin  ssh://git@git.our-internal.example.com:/~jschauma/whatever.git (push)
$ git push --set-upstream github opensource:main             # "main" here is the remote "main"
Enumerating objects: 25, done.
Counting objects: 100% (25/25), done.
Compressing objects: 100% (20/20), done.
Writing objects: 100% (25/25), 9.32 KiB | 1.86 MiB/s, done.
Total 25 (delta 3), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (3/3), done.
To github.com:jschauma/second-remote-example.git
 * [new branch]      opensource -> main
branch 'opensource' set up to track 'github/main'.
$ 
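If you'd like to try the opensource:main mapping without touching a real remote, a local bare repository can stand in for GitHub. This is a sketch under that assumption; the paths and file contents are hypothetical, and only the remote name "github" and the branch names are taken from the example above.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# A local bare repository plays the role of the empty GitHub repository.
git init -q --bare "$work/public.git"

# Minimal stand-in for the internal working copy.
git init -q "$work/whatever"
cd "$work/whatever"
git symbolic-ref HEAD refs/heads/main
git config user.name 'Demo User'
git config user.email 'demo@example.com'
echo 'hello' > README.md
git add README.md
git commit -q -m 'initial import'
git checkout -q -b opensource

# Push the local "opensource" branch as "main" on the second remote.
git remote add github "$work/public.git"
git push -q --set-upstream github opensource:main

# Inspect what the local branch tracks and which branches the remote has.
upstream=$(git rev-parse --abbrev-ref 'opensource@{upstream}')
remote_heads=$(git ls-remote --heads github | awk '{print $2}')
echo "opensource tracks: $upstream"
echo "remote branches:   $remote_heads"
```

The remote only ever sees a branch called main; the name opensource stays local, and the internal main branch is never sent to the second remote.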

And there you go, we're all set. Your main branch remains untouched, and you can sync changes from there into your opensource branch or pull in changes from others from GitHub.
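How to sync is left open here. Since the rewritten opensource branch no longer shares any commits with main, one possible way (an assumption on my part, not something prescribed above) is to cherry-pick individual commits from main. The sketch below uses hypothetical file names and only works cleanly for commits that do not touch the private directory; a commit that does would stop with a conflict.

```shell
#!/bin/sh
set -eu

# Throwaway repository; all file names are hypothetical.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name 'Demo User'
git config user.email 'demo@example.com'

mkdir private src
echo 'secret' > private/key.txt
echo 'v1' > src/code.txt
git add private src
git commit -q -m 'initial import'

# Create the rewritten branch as described earlier.
git checkout -q -b opensource
export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --tree-filter 'rm -fr private' HEAD >/dev/null 2>&1

# Internal development continues on main.
git checkout -q main
echo 'v2' > src/code.txt
git commit -q -a -m 'update code'

# Bring just that one commit over to the opensource branch.
git checkout -q opensource
git cherry-pick main >/dev/null

synced=$(cat src/code.txt)
leaked=$(git ls-tree -r --name-only HEAD | grep -c '^private/' || true)
echo "src/code.txt on opensource: $synced"
echo "private files on opensource: $leaked"
```

A plain git merge main would not be a good substitute here: the two branches have unrelated histories, and merging would pull the original commits, private files included, right back in.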

On second thought, let's not go to Camelot...

Note that information in the git history / reflog easily leaks. In fact, the first draft of this blog post used just git rm and thus leaked the private information. If you want to be on the safe side, then it's better to have two entirely separate repositories:

$ ls
Makefile   README.md  doc        private    src        test
$ git branch
* main
$ git remote -v
origin  ssh://git@git.our-internal.example.com:/~jschauma/whatever.git (fetch)
origin  ssh://git@git.our-internal.example.com:/~jschauma/whatever.git (push)
$ cd ..
$ git clone git@github.com:jschauma/second-remote-example.git
Cloning into 'second-remote-example'...
warning: You appear to have cloned an empty repository.
$ cd second-remote-example
$ cp -R ../whatever/* .
$ rm -fr private
$ $EDITOR LICENSE CONTRIBUTING
$ git add *
$ git commit -m 'open source version of whatever'
[main (root-commit) ce7145b] open source version of whatever
 13 files changed, 887 insertions(+)
[...]
$ git push
Enumerating objects: 19, done.
Counting objects: 100% (19/19), done.
Compressing objects: 100% (16/16), done.
Writing objects: 100% (19/19), 8.91 KiB | 2.23 MiB/s, done.
Total 19 (delta 1), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (1/1), done.
To github.com:jschauma/second-remote-example.git
 * [new branch]      main -> main
$ 

The disadvantage is that now you have two separate repositories that you have to keep in sync, but on the upside, you can't (quite as) easily leak data from the internal repository to the outside world. It just goes to show once more that when it comes to using git, it's never as easy as one might like...
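Keeping the two repositories in sync can be scripted to some degree. The following is a rough sketch, not a recommendation: the sync_public helper is hypothetical, both repositories are throwaway stand-ins, and the approach only copies new and changed files (it will not notice files that were deleted internally).

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Stand-in for the internal repository.
git init -q "$work/whatever"
cd "$work/whatever"
git config user.name 'Demo User'
git config user.email 'demo@example.com'
mkdir private src
echo 'secret' > private/key.txt
echo 'v1' > src/code.txt
git add private src
git commit -q -m 'initial import'

# Stand-in for the local clone of the public repository.
git init -q "$work/second-remote-example"
cd "$work/second-remote-example"
git config user.name 'Demo User'
git config user.email 'demo@example.com'

# Hypothetical helper: export the tracked files of the internal HEAD into
# a staging directory, drop private/, and copy the rest into the current
# directory. Archiving the tree (rather than the commit) avoids embedding
# the internal commit ID in the tar stream.
sync_public() {
    staging=$(mktemp -d)
    git -C "$1" archive 'HEAD^{tree}' | tar -x -f - -C "$staging"
    rm -fr "$staging/private"
    cp -R "$staging"/. .
    rm -fr "$staging"
}

sync_public "$work/whatever"
git add -A
git commit -q -m 'sync from internal repository'

published=$(git ls-files)
echo "$published"
```

Because only file contents cross over and never git objects, nothing from the internal history can end up in the public repository this way.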


This page mainly exists because I forget the order and syntax of the right git commands. :-)

