Sharing External Code using Git Submodules

This page is for anyone modifying code that is part of an external brought in via a git submodule. Git submodules are used for larger external projects where ACME development is expected to take place in the external repository, such as MPAS. For other applications, see git subtree.

Before delving into the details of the workflow, we'll explain a bit about what submodules are and some of their benefits. A submodule is in essence a git repository nested within another git repository. The parent repository stores information that tells git that it has a child repository, what its clone URL is, what commit the parent wants from within that repository, and where the repository should be cloned to (i.e. the local path within the parent repository). For example, if the external were MPAS-O, the information the parent might store is:

  1. Clone URL: git@github.com:ACME-Climate/MPAS.git
  2. Hash: 7621a8cb
  3. Local path: components/mpas-o/model
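
To see where these three pieces of information live, the following minimal sketch builds two throwaway repositories and inspects what the parent records. The temporary directory, the stand-in repositories, and the demo commit identity are assumptions made for illustration; the submodule name mpaso and the local path are borrowed from the examples on this page.

```shell
#!/bin/sh
# Demo only: builds two throwaway repositories under a temporary directory.
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work=$(mktemp -d)

# Stand-in for the external repository (one commit).
git init -q "$work/child"
echo "external code" > "$work/child/README"
git -C "$work/child" add README
git -C "$work/child" commit -q -m "External: initial commit"
child_hash=$(git -C "$work/child" rev-parse HEAD)

# Stand-in for the parent repository.
git init -q "$work/parent"
cd "$work/parent"
echo "parent code" > README
git add README
git commit -q -m "Parent: initial commit"

# protocol.file.allow=always is used only because this demo clones from a
# local path; it is not normally needed for an ssh or https clone URL.
git -c protocol.file.allow=always submodule --quiet add --name mpaso -- \
    "$work/child" components/mpas-o/model
git commit -q -m "Add the external as a submodule"

# The clone URL and local path are recorded in the tracked .gitmodules file.
stored_url=$(git config -f .gitmodules --get submodule.mpaso.url)
stored_path=$(git config -f .gitmodules --get submodule.mpaso.path)

# The wanted hash is recorded as a "gitlink" entry in the parent's tree.
stored_hash=$(git rev-parse HEAD:components/mpas-o/model)

echo "url:  $stored_url"
echo "path: $stored_path"
echo "hash: $stored_hash"
```

In a real clone, only the last few commands are needed: reading .gitmodules for the URL and path, and resolving HEAD:<local_path> for the hash.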

When the submodule is first cloned into the parent, the current hash the parent wants will be checked out. One of the major benefits of using submodules is that the child repository is a complete git repository. This means a developer of the code brought in via an external can cd into the submodule, add remotes, change branches, and perform their regular workflow without modifying things in the parent. One major downside of using submodules is that they can be very fragile if not coupled with a thorough workflow.

This page describes the workflow for working with submodules, including how to create them and how to update the code the parent repository wants.


Submodule Setup (One Time Only)

Once the parent repository is cloned, the following steps can be used to set up a new submodule.

  1. Add the submodule: git submodule add --name <name> -- <clone_url> <local_path>
    1. e.g. git submodule add --name mpaso -- git@github.com:ACME-Climate/MPAS.git components/mpas-o/model
  2. Ensure the correct commit is checked out: cd <local_path>; git checkout <ref> (<ref> can be a hash, tag, or a branch)
    1. e.g. cd components/mpas-o/model; git checkout v3.3
  3. Add the submodule changes to the parent repo: cd <parent_root>; git add <local_path>
    1. e.g. cd ACME; git add components/mpas-o/model
  4. Commit the changes to the parent repo: git commit
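
The four steps above can be rehearsed end to end without touching a real ACME clone. This is a sketch under stated assumptions: the external and parent repositories are throwaway stand-ins created in a temporary directory, the commit identity is a placeholder, and the tag v3.3 is created locally only to mirror the example above.

```shell
#!/bin/sh
# Demo only: rehearses the setup steps in throwaway repositories.
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work=$(mktemp -d)

# Stand-in external: a tagged release followed by later work.
git init -q "$work/child"
echo "release" > "$work/child/README"
git -C "$work/child" add README
git -C "$work/child" commit -q -m "External: release"
git -C "$work/child" tag v3.3
echo "later work" >> "$work/child/README"
git -C "$work/child" commit -q -a -m "External: later work"
tag_hash=$(git -C "$work/child" rev-parse "v3.3^{commit}")

# Stand-in parent repository.
git init -q "$work/parent"
cd "$work/parent"
echo "parent code" > README
git add README
git commit -q -m "Parent: initial commit"

# Step 1: add the submodule (the -c option is only for the local-path demo URL).
git -c protocol.file.allow=always submodule --quiet add --name mpaso -- \
    "$work/child" components/mpas-o/model

# Step 2: check out the wanted ref inside the submodule.
(cd components/mpas-o/model && git checkout -q v3.3)

# Step 3: stage the submodule change in the parent.
git add components/mpas-o/model

# Step 4: commit the change to the parent.
git commit -q -m "Add MPAS-O submodule at v3.3"

# The parent now records the tagged commit, not the tip of the external.
recorded_hash=$(git rev-parse HEAD:components/mpas-o/model)
echo "parent records: $recorded_hash"
```

Note that step 1 already stages both .gitmodules and the submodule path; step 3 is what updates the staged hash after the checkout in step 2.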

Once these changes are pushed, anyone who clones the repo or checks out the branch these changes live on will initially see an empty directory where the submodule will go, until the submodule is initialized.


Using a submodule (i.e. ensuring you have the correct code in the submodule)

Immediately after cloning a repository (e.g. ACME), the submodule directories will not contain any code. To initialize the submodules, you can use the following command:

git submodule update --init --recursive

This will recursively initialize all submodules within the parent and all of its child repositories. You only need to use the --init flag the first time you're setting up the submodules.

Subsequently, after changing the hash in the parent repository, you will want to ensure the version of the submodules is correct for the version of the parent. To do this, you can use the following command:

git submodule update --recursive

This will ensure each submodule and its children have fetched and checked out the correct hash.
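
The behavior described in this section can be observed in a small, local rehearsal. The sketch below assumes throwaway stand-in repositories in a temporary directory and a placeholder commit identity; the submodule name and path mirror the examples on this page.

```shell
#!/bin/sh
# Demo only: shows an empty submodule directory after cloning, then fills it.
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work=$(mktemp -d)

# Stand-in external and a parent that already has it as a submodule.
git init -q "$work/child"
echo "external code" > "$work/child/README"
git -C "$work/child" add README
git -C "$work/child" commit -q -m "External: initial commit"

git init -q "$work/parent"
cd "$work/parent"
echo "parent code" > README
git add README
git commit -q -m "Parent: initial commit"
# The -c option is only needed because the demo URL is a local path.
git -c protocol.file.allow=always submodule --quiet add --name mpaso -- \
    "$work/child" components/mpas-o/model
git commit -q -m "Add the external as a submodule"

# A fresh clone of the parent: the submodule directory exists but is empty.
git clone -q "$work/parent" "$work/clone"
cd "$work/clone"
if [ -z "$(ls -A components/mpas-o/model)" ]; then
    before=empty
else
    before=populated
fi

# First-time initialization.
git -c protocol.file.allow=always submodule --quiet update --init --recursive

# The submodule is now checked out at exactly the hash the parent records.
wanted=$(git rev-parse HEAD:components/mpas-o/model)
checked_out=$(git -C components/mpas-o/model rev-parse HEAD)

# Later refreshes (e.g. after switching parent branches) omit --init.
git -c protocol.file.allow=always submodule --quiet update --recursive

echo "before init: $before"
echo "wanted:      $wanted"
echo "checked out: $checked_out"
```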


Contributing changes to the external through a submodule

One of the nice things about submodules is that you can develop on branches within them, and add remotes to fetch changes from and push changes to alternate repositories. However, not everyone within ACME will have access to the remotes you add: remotes are local to your clone and are not recorded in the parent repository, so other people will not get them by default.
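
As a quick check that remotes and branches added inside a submodule stay local, the sketch below adds both and then confirms the parent sees no change. The remote name myfork, its URL, and the branch name my-feature are hypothetical, as are the throwaway repositories and commit identity; adding a remote does not contact the network.

```shell
#!/bin/sh
# Demo only: remotes and branches inside a submodule do not alter the parent.
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work=$(mktemp -d)

# Stand-in external and a parent that has it as a submodule.
git init -q "$work/child"
echo "external code" > "$work/child/README"
git -C "$work/child" add README
git -C "$work/child" commit -q -m "External: initial commit"

git init -q "$work/parent"
cd "$work/parent"
echo "parent code" > README
git add README
git commit -q -m "Parent: initial commit"
# The -c option is only needed because the demo URL is a local path.
git -c protocol.file.allow=always submodule --quiet add --name mpaso -- \
    "$work/child" components/mpas-o/model
git commit -q -m "Add the external as a submodule"

sub=components/mpas-o/model

# Add a personal remote (hypothetical URL) and start a branch in the submodule.
git -C "$sub" remote add myfork git@github.com:example-user/MPAS.git
git -C "$sub" checkout -q -b my-feature

# The remote is known inside the submodule ...
fork_url=$(git -C "$sub" remote get-url myfork)

# ... but the parent has nothing to commit, and .gitmodules is unchanged.
parent_status=$(git status --porcelain)
if grep -q myfork .gitmodules; then
    recorded_in_parent=yes
else
    recorded_in_parent=no
fi

echo "myfork url:         $fork_url"
echo "parent status:      '${parent_status}'"
echo "recorded in parent: $recorded_in_parent"
```

The parent only notices the submodule once the commit checked out inside it differs from the recorded hash, or its working tree has modifications.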

In general, code should be contributed through the submodule using the workflow for the external. For example, when making changes to MPAS-O, you should follow the MPAS workflow.

NOTE: When changing the code in a submodule, the parent changes that depend on the child changes cannot be permanently fixed in the parent's history (i.e. merged to master) until they are permanently fixed in the child's repository.


Incorporating new changes in the submodule

Sometimes, a developer will want to update the code in the submodule to make the coupled model use a more recent (or perhaps bug-fixed) version of the child repository.

NOTE: Before adding the new changes to the submodule in the parent repo, it's best practice to ensure the default repository pointed to by the parent contains the new hash. For all MPAS components this will be ACME-Climate/MPAS, to provide all ACME developers with access to new private MPAS developments, but in the future this might change to MPAS-Dev/MPAS-Release.

Again, as the submodule is a git repository itself, this is as simple as checking out the correct hash in the submodule and committing that change to the parent repo, using the following steps.

  1. Ensure the correct commit is checked out: cd <local_path>; git checkout <ref>
    1. e.g. cd components/mpas-o/model; git checkout v3.3
  2. Add the submodule change to the parent repo: cd <parent_root>; git add <local_path>
    1. e.g. cd ACME; git add components/mpas-o/model
  3. Commit the changes to the parent repo: git commit

After these changes are pushed, anyone who checks out the newly pushed commits and runs git submodule update will have the newer version of the child repository.
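
The full round trip (update the hash in the parent, then pick it up in another clone) can be rehearsed locally. This is a sketch under stated assumptions: all repositories are throwaway stand-ins in a temporary directory, the commit identity is a placeholder, the tag v3.3 is created locally to mirror the example above, and the second clone pulls directly from the demo parent instead of from a shared remote.

```shell
#!/bin/sh
# Demo only: update the submodule hash in the parent, then consume it elsewhere.
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work=$(mktemp -d)

# Stand-in external and a parent that has it as a submodule.
git init -q "$work/child"
echo "v1" > "$work/child/README"
git -C "$work/child" add README
git -C "$work/child" commit -q -m "External: first version"

git init -q "$work/parent"
cd "$work/parent"
echo "parent code" > README
git add README
git commit -q -m "Parent: initial commit"
# The -c option (here and below) is only needed for the local-path demo URL.
git -c protocol.file.allow=always submodule --quiet add --name mpaso -- \
    "$work/child" components/mpas-o/model
git commit -q -m "Add the external as a submodule"

# Another developer's clone, with the submodule initialized.
git clone -q "$work/parent" "$work/consumer"
git -C "$work/consumer" -c protocol.file.allow=always \
    submodule --quiet update --init --recursive

# The external gains a bug fix, tagged v3.3.
echo "v2" > "$work/child/README"
git -C "$work/child" commit -q -a -m "External: bug fix"
git -C "$work/child" tag v3.3
new_hash=$(git -C "$work/child" rev-parse HEAD)

# Step 1: fetch and check out the new ref inside the submodule.
(cd components/mpas-o/model && git fetch -q --tags origin && git checkout -q v3.3)
# Step 2: stage the submodule change in the parent.
git add components/mpas-o/model
# Step 3: commit it.
git commit -q -m "Update MPAS-O submodule to v3.3"

# The other developer picks up the parent commit and refreshes the submodule.
cd "$work/consumer"
git -c protocol.file.allow=always pull -q
git -c protocol.file.allow=always submodule --quiet update --recursive
consumer_hash=$(git -C components/mpas-o/model rev-parse HEAD)

echo "new external hash:  $new_hash"
echo "consumer submodule: $consumer_hash"
```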