Introduction
In this two-part article series, I will help you learn to experiment with and test Git submodules in Azure DevOps. You are reading Part II. I will show you how to add, update, clone, and remove submodules in Git using the lab environment I set up in Part I.
You are reading the labs I give to fellow IT Pros who are responsible for maintaining and deploying Azure Firewall Policies and rules in Infrastructure as Code (IaC). Here, I focus on working with Git submodules. If you want to read more about the intricacies of getting a pipeline to play nice with submodules, read “Getting Git submodules in private Azure DevOps repositories to work in a pipeline” on the StarWind Blog (starwindsoftware.com).
Working with submodules
We continue where we left off in Part I. Launch Visual Studio Code and open the MainRepoProjectOne folder. Make sure to have a PowerShell terminal open.
Adding a submodule
I add a submodule when I want the code, files, and artifacts from another repository to be available transparently in a subfolder in my main repository. The cool thing is it keeps the paths locally and in the remote repository the same, so I can do local deploys without worrying about my folder structure. You can add one or more. The repositories can be from the same or different locations (Azure DevOps, GitHub, GitLab), from the same or other organizations or projects. If it is a public repository, or a private one to which you can authenticate, you can add it as a Git submodule. I organize my submodules in a subfolder to show that you can choose the path where a submodule is created.
How to add a submodule
Navigate to the MainRepoProjectOne root folder and run:
md MySubModules
ls
git submodule add https://workinghardinit@dev.azure.com/workinghardinit/ProjectOne/_git/SubRepoProjectOne .\MySubModules\SubRepoProjectOne
You receive a prompt to authenticate if you have not done that yet.
The result of your handiwork looks like the image below.
How to verify if adding the submodule was successful
Now, to check if this worked, I first run
git status
Notice that .gitmodules has appeared and that there is now a subfolder SubRepoProjectOne in MainRepoProjectOne/MySubModules. Look at its content.
You see the content from the repository SubRepoProjectOne in Azure DevOps (WorkingHardInIT/ProjectOne/SubRepoProjectOne) that we just added. If you run git status in that subfolder, you see that it is up to date.
Secondly, navigate to the root folder and look at the contents of .gitmodules. Notice the reference there for your submodule with the path and the URL.
WARNING!
I added the submodule via its URL. That works, but as it is a private repository, I have authentication issues with Azure DevOps pipelines. For a private repository that resides in the same DevOps organization, I should add it as a relative path. I have another article on just that subject.
You can change that path in the file or add it directly using a relative path.
git submodule add ../SubRepoProjectOne .\MySubModules\SubRepoProjectOne
If the submodule already exists, you’d have to remove it first. So, in this case, editing the file is easier.
Adding a submodule with its private repository in another project in the same DevOps Organization works very similarly. Again, use a relative path to avoid issues with the Azure DevOps pipeline later. As the repo lives in another project (ProjectTwo) than the main repository (ProjectOne), we need to specify the path to the project as follows.
git submodule add ../../../ProjectTwo/_git/SubRepoProjectTwo .\MySubModules\SubRepoProjectTwo
You cannot add submodules that refer to a project in another DevOps Organization and expect the Azure DevOps pipeline to work. See article
Finally, check out the \.git\modules folder.
Under the .git folder, a modules folder that contains a subfolder for the submodule I just added should now exist.
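To make these checks repeatable, here is a minimal sketch that builds a throwaway lab and runs the three verifications from this section. It is written for a POSIX shell such as Git Bash rather than PowerShell, and it assumes two local scratch repositories stand in for the Azure DevOps ones. The g helper, the "Lab User" identity, and the temp folder are illustrative names of mine, not part of the lab from Part I.

```shell
set -eu

# Hypothetical helper: a throwaway identity, plus permission to use local
# (file-based) submodule URLs, which recent Git versions block by default.
g() {
  git -c protocol.file.allow=always -c init.defaultBranch=master \
      -c user.name="Lab User" -c user.email="lab@example.com" "$@"
}

lab="$(mktemp -d)"   # scratch area standing in for Azure DevOps

# A tiny repository that plays the role of SubRepoProjectOne.
g init -q "$lab/SubRepoProjectOne"
cd "$lab/SubRepoProjectOne"
echo "sub" > ReadMe.md
g add .
g commit -q -m "Initial commit in sub repo"

# A main repository that adds it under MySubModules.
g init -q "$lab/MainRepoProjectOne"
cd "$lab/MainRepoProjectOne"
echo "main" > ReadMe.md
g add .
g commit -q -m "Initial commit in main repo"
g submodule add "$lab/SubRepoProjectOne" MySubModules/SubRepoProjectOne

# The three places to check: .gitmodules, the working folder, .git/modules.
sub_path="$(g config -f .gitmodules --get submodule.MySubModules/SubRepoProjectOne.path)"
echo "path recorded in .gitmodules: $sub_path"
g submodule status
ls .git/modules/MySubModules
```

The protocol.file.allow override is only there because the sketch uses local folders as submodule URLs. With the https or relative URLs used in this article, it should not be needed.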
I am now sure I added a submodule to my git repository successfully. Now, I should play with it and check out the behavior when adding, committing, and pushing items to the remote repositories.
Updating the main repo and its submodule(s)
Navigate to the submodule folder (MainRepoProjectOne\MySubModules\SubRepoProjectOne) and create some items.
ni FileCreatedInSubModulesSubRepoProjectOne.html
md NewSubfolderInSubModulesSubRepoProjectOne
cd NewSubfolderInSubModulesSubRepoProjectOne
ni NewFileInNewSubfolderInSubModulesSubRepoProjectOne.html
# Return to the submodule root so that git add . also stages the file created there
cd ..
git add .
git commit -m "Added some items to my MainRepoProjectOne folder"
git status
As you can see in the picture above, we have committed these files to the local repository but have not pushed them out to the remote repository.
Please navigate to the main repository folder MainRepoProjectOne. Remember we have staged files to push to our main repo?
git add .
git commit -m "Added some items to my MainRepoProjectOne folder"
We have not yet pushed these to the remote repository either. For now, they only exist locally.
When you look at Azure DevOps, these do not exist there yet.
Now, push these to their remote repository.
git push
After refreshing the view, these items should be visible in the Azure DevOps repository.
Do note the SubRepoProjectOne link under MySubModules. It contains a link to the commit ID of the repository it refers to, not the actual content. The content lives in its own repository, SubRepoProjectOne.
So far, so good. Let’s take a peek at the Azure DevOps SubRepoProjectOne repository. The files and folders we created are not there yet, as our commit and push in the main repository did not touch that.
To push those to its repository, we must navigate to MainRepoProjectOne\MySubModules\SubRepoProjectOne and run git push there.
Cool huh!
In general, add, commit, and push the changes in your submodule first, before you add, commit, and push the changes in the main repository. I did not do that here to demonstrate their relationship and behavior.
In collaboration scenarios, you would probably work directly on the submodule repository, not your main project’s submodule subfolder. You’d then manually or automatically pull in the changes in that repository to your submodule in the main repository. More about that later! So, this direct editing in the submodule folder is more of a demo than a real-life scenario. Still, it nicely demonstrates the submodule repository’s behavior in relation to the main repository.
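The relationship described in this section, where the main repository stores only a commit ID for the submodule and not its content, can be inspected directly. The sketch below is one way to look at it, assuming a POSIX shell and two local scratch repositories in place of the Azure DevOps ones. The g helper and the identity are hypothetical names used only for this illustration.

```shell
set -eu

# Hypothetical helper: throwaway identity and local submodule URLs allowed.
g() {
  git -c protocol.file.allow=always -c init.defaultBranch=master \
      -c user.name="Lab User" -c user.email="lab@example.com" "$@"
}

lab="$(mktemp -d)"

g init -q "$lab/SubRepoProjectOne"
cd "$lab/SubRepoProjectOne"
echo "sub" > ReadMe.md
g add .
g commit -q -m "Initial commit in sub repo"

g init -q "$lab/MainRepoProjectOne"
cd "$lab/MainRepoProjectOne"
echo "main" > ReadMe.md
g add .
g commit -q -m "Initial commit in main repo"
g submodule add "$lab/SubRepoProjectOne" MySubModules/SubRepoProjectOne
g commit -q -m "Add SubRepoProjectOne as a submodule"

# What the main repository stores for the submodule folder: a single tree
# entry of type "commit" (mode 160000), not the files themselves.
entry="$(g ls-tree HEAD MySubModules/SubRepoProjectOne)"
echo "$entry"

# The commit ID recorded by the main repository...
recorded="$(g rev-parse HEAD:MySubModules/SubRepoProjectOne)"
# ...and the commit that is checked out inside the submodule folder.
checked_out="$(g -C MySubModules/SubRepoProjectOne rev-parse HEAD)"
echo "recorded:    $recorded"
echo "checked out: $checked_out"
```

Right after adding the submodule, both IDs match. They drift apart as soon as you commit inside the submodule folder without committing the new reference in the main repository.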
Working directly on the repository of the submodules
Let’s pretend we are the maintainer of the repository that the submodule references.
Open your Git repo for SubRepoProjectOne and create a file
ni WorkingOnThisSubRepo.html
git add .
git commit -m "WorkingOnThisSubRepo.html added"
git push
You see that the newly created file is present when looking at the local folders and the repo in Azure DevOps.
Note that WorkingOnThisSubRepo.html is not visible locally in the main project’s submodule folder for SubRepoProjectOne.
Yes, WorkingOnThisSubRepo.html is visible locally and remotely in the SubRepoProjectOne repository itself, but not in the submodule folder of the main project. Why is this? Well, look at the combined image of the Azure DevOps repositories below. Everything is indeed up to date in the repository for SubRepoProjectOne.
But from the main repository’s point of view, the submodule is not up to date. The updates made directly to the SubRepoProjectOne repository are not there because the SubRepoProjectOne submodule folder in MainRepoProjectOne still refers to the previous commit ID.
Try and run
git submodule update
You notice this does not change anything. The trick to update this is to run
git submodule update --remote
The first command checks out the commit ID that the main repository currently references, while --remote fetches the latest commit from the Azure DevOps SubRepoProjectOne repository (or from the repository of any other submodule). That is the key! Another option would be to run git pull in MySubModules/SubRepoProjectOne. But if you have more than one submodule, that means more jumping around and more work.
Are you confused yet? That is normal. You have to do the lab work and play with it to gain a better understanding, but that is very much worth it!
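If you want to see the difference between the two commands without clicking through Azure DevOps, the following sketch replays it with local scratch repositories. It assumes a POSIX shell, and the g helper, the identity, and the temp folder are hypothetical stand-ins of mine rather than anything from the lab itself.

```shell
set -eu

# Hypothetical helper: throwaway identity and local submodule URLs allowed.
g() {
  git -c protocol.file.allow=always -c init.defaultBranch=master \
      -c user.name="Lab User" -c user.email="lab@example.com" "$@"
}

lab="$(mktemp -d)"

g init -q "$lab/SubRepoProjectOne"
cd "$lab/SubRepoProjectOne"
echo "sub" > ReadMe.md
g add .
g commit -q -m "Initial commit in sub repo"

g init -q "$lab/MainRepoProjectOne"
cd "$lab/MainRepoProjectOne"
echo "main" > ReadMe.md
g add .
g commit -q -m "Initial commit in main repo"
g submodule add "$lab/SubRepoProjectOne" MySubModules/SubRepoProjectOne
g commit -q -m "Add SubRepoProjectOne as a submodule"

# The maintainer commits directly to the sub repository.
cd "$lab/SubRepoProjectOne"
echo "new" > WorkingOnThisSubRepo.html
g add .
g commit -q -m "WorkingOnThisSubRepo.html added"
latest="$(g rev-parse HEAD)"

cd "$lab/MainRepoProjectOne"

# Plain update: stays on the commit the main repository references.
g submodule update
after_plain="$(g -C MySubModules/SubRepoProjectOne rev-parse HEAD)"

# With --remote: moves to the newest commit of the sub repository.
g submodule update --remote
after_remote="$(g -C MySubModules/SubRepoProjectOne rev-parse HEAD)"

echo "latest in sub repo:  $latest"
echo "after plain update:  $after_plain"
echo "after remote update: $after_remote"
```

After the second command, git status in the main repository reports the submodule folder as modified, because the reference to the new commit still has to be committed there.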
Now, running git submodule update --remote does have a side effect. It causes the detached HEAD situation for the submodule repository, as you can see below.
Help, it says my Head is detached
Usually, the message “HEAD is detached at XXXXXX…” sounds very worrisome. In Git, it can be annoying, but it is not as deadly as it sounds.
Your issue is that a git push won’t work as you might expect, but you can still do it. Let me demonstrate what I mean.
In our git submodule repository, execute:
ni IsMyHeadDetached.html
git add .
git commit -m "Added IsMyHeadDetached.html"
git push
You already get a warning about the detached HEAD during git commit, but when pushing this to the remote repository, we get an error because Git has no clue what branch to send it to. The fix is what the error message suggests:
git push origin HEAD:master
Cool, the file is now in the remote repository as well.
However, the Git submodule folder (MySubModules\SubRepoProjectOne) on my workstation still has its HEAD detached at d69a7a7. And the most recent commit ID on the remote repo (SubRepoProjectOne) is 97cbe01.
Maybe we are OK with this, as the behavior is known, and you want to run this manually. That way, no newer or latest commit version of your submodule’s master ever gets deployed without your updating it explicitly. You are in control and use a well-known version of the submodule commit history.
But that can be tiresome, and your submodule remote repository’s master branch is supposed to contain only stable, production-ready items. So, what if we want to avoid the detached HEAD behaviors and ensure that our submodule’s repository always updates to the latest master branch version of its remote repository?
What you choose here depends on what behavior you want. If you wish for any approved pull request in the remote submodule repository to be pulled into your main repository’s submodule, you can achieve this as follows.
First, fix the fact that the submodule Git repo is detached.
cd path/to/submodule
# Normally and by default, you track the master branch
git checkout master
OK, the submodule’s Git repository is no longer detached. Now we have two options to prevent this from happening the next time you run git submodule update --remote.
Option 1: Use options in the command line
Navigate to the main repo root folder and execute
git submodule update --remote --merge
or
git submodule update --remote --rebase
Remembering this can be hard. So, to make sure this happens automatically, I can use a Git alias like the one below:
git config alias.subupdate 'submodule update --remote --merge'
We can now update the submodule without getting into a HEAD detached at XXXXXXX using
git subupdate
The thing is, now we need to remember to use the alias. Maybe there is a better way? Well, see option 2.
Option 2: Configure this in the .gitmodules file
You can change the submodule update behavior in the .gitmodules file, which makes the command git submodule update --remote behave like git submodule update --remote --merge or git submodule update --remote --rebase. There is no need to specify this explicitly; Git reads the instructions from the file, so it happens automatically. If you use this, document it and tell your team.
You can achieve this in two ways.
Edit the .gitmodules file
[submodule "MySubModules/SubRepoProjectOne"]
    path = MySubModules/SubRepoProjectOne
    url = ../SubRepoProjectOne
    update = merge
Configure it through the command line,
git config -f .gitmodules submodule.MySubModules/SubRepoProjectOne.update merge
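Both ways end up with the same line in .gitmodules. As a small sanity check, the sketch below writes a sample .gitmodules file into a scratch folder, sets the update mode from the command line, and reads it back. It assumes a POSIX shell; the scratch folder is hypothetical, and the sample content mirrors the entry shown above.

```shell
set -eu

work="$(mktemp -d)"   # scratch folder, not a real repository
cd "$work"

# A .gitmodules file as it looks before the update mode is configured.
cat > .gitmodules <<'EOF'
[submodule "MySubModules/SubRepoProjectOne"]
    path = MySubModules/SubRepoProjectOne
    url = ../SubRepoProjectOne
EOF

# Set the update mode. The key is submodule.<name>.update, and the name
# is the part between the quotes in the section header.
git config -f .gitmodules submodule.MySubModules/SubRepoProjectOne.update merge

# Read it back and show the resulting file.
update_mode="$(git config -f .gitmodules --get submodule.MySubModules/SubRepoProjectOne.update)"
echo "update mode: $update_mode"
cat .gitmodules
```

Note that the submodule name in the key must match the section header exactly, including the MySubModules/ prefix and the capitalization.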
Why does a Git submodule get into a head-detached situation?
That is the default behavior when one executes git submodule update --remote. It has nothing to do with what branch you are tracking or are not tracking. It is a result of what that command does, which is:
git clone https://workinghardinit@dev.azure.com/workinghardinit/ProjectOne/_git/SubRepoProjectOne .\MySubModules\SubRepoProjectOne
cd .\MySubModules\SubRepoProjectOne\
git checkout <commit-id>
When you add a submodule, you clone the repo it refers to and then check out the commit ID from that submodule repository at that time. Checking out a commit ID rather than a branch is what leads to the detached HEAD state.
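A script can tell whether a submodule is in this state by asking Git which branch HEAD points at. The sketch below is one way to do that, assuming a POSIX shell and local scratch repositories in place of Azure DevOps. The g helper, the identity, and the variable names are hypothetical and only serve the illustration.

```shell
set -eu

# Hypothetical helper: throwaway identity and local submodule URLs allowed.
g() {
  git -c protocol.file.allow=always -c init.defaultBranch=master \
      -c user.name="Lab User" -c user.email="lab@example.com" "$@"
}

lab="$(mktemp -d)"

g init -q "$lab/SubRepoProjectOne"
cd "$lab/SubRepoProjectOne"
echo "sub" > ReadMe.md
g add .
g commit -q -m "Initial commit in sub repo"

g init -q "$lab/MainRepoProjectOne"
cd "$lab/MainRepoProjectOne"
echo "main" > ReadMe.md
g add .
g commit -q -m "Initial commit in main repo"
g submodule add "$lab/SubRepoProjectOne" MySubModules/SubRepoProjectOne
g commit -q -m "Add SubRepoProjectOne as a submodule"

# A new commit appears in the sub repository, and we pull it in.
cd "$lab/SubRepoProjectOne"
echo "new" > WorkingOnThisSubRepo.html
g add .
g commit -q -m "WorkingOnThisSubRepo.html added"
cd "$lab/MainRepoProjectOne"
g submodule update --remote

# symbolic-ref fails when HEAD is a bare commit ID instead of a branch.
cd MySubModules/SubRepoProjectOne
if g symbolic-ref -q HEAD >/dev/null; then
  state_after_update="attached"
else
  state_after_update="detached"
fi
echo "after update --remote: HEAD is $state_after_update"

# Checking out the branch re-attaches HEAD.
g checkout -q master
branch_after_checkout="$(g symbolic-ref --short HEAD)"
echo "after checkout: HEAD points at branch $branch_after_checkout"
```

Be aware that checking out master here puts the folder back on the local master branch, which may be behind the commit that update --remote just checked out.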
Does it become even messier with branches?
Branches can make things messier or more confusing when used in submodules. Use them when you have a good reason and know what you are doing. Chances are that, in combination with submodules, you won’t need them for many scenarios.
When you add a Git submodule to your main repository, the default branch it tracks is origin/master. That is what you need and want in most cases. People working on the repository you added as a submodule can and should use branches to organize their development work. But in the Git submodule, I’d stick with the master branch, which should contain the code, files, and artifacts that need deploying into production. Whatever is in there should be good to go. In branches, not so much. And, just in case you are wondering, tracking different branches or not has nothing to do with the detached behavior of submodules. That is just due to the nature of what a submodule is, as we have discussed above.
Updating nested submodules
It is possible to nest submodules. If you need to do that, the --recursive option is your friend! For example:
git submodule update --remote --recursive
This updates all submodules, including the nested ones.
Cloning a repository that contains one or more submodules
Cloning a main project that has submodules is pretty straightforward.
git clone --recurse-submodules -j4 https://workinghardinit@dev.azure.com/workinghardinit/ProjectOne/_git/MainRepoProjectOne
It is pretty much what you’d expect. There is one potential issue, however. You might want or need to update the code from the submodule’s remote repository. As we have already seen, the content of the submodule is determined by the commit ID the main repository references. Suppose that commit is older than the most recent commit in the submodule’s repository. In that case, you must execute git submodule update --remote to get the most recent commits down into your local submodule folder.
If that’s what you want for future clones, commit and push it up to the remote repository so it has the submodule referencing its repository’s most recent commit.
What if I forgot the --recurse-submodules option? If you forget, you will see that the submodule folders are empty. No worries, you can still fix this manually. Run:
git submodule update --init --recursive
That beats going into every potentially nested submodule folder, running git init there, and cloning the repo.
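The forgotten-option scenario is easy to reproduce. The sketch below clones a main repository without --recurse-submodules, shows that the submodule folder is empty, and then repairs it. It assumes a POSIX shell and local scratch repositories; the g helper, the identity, and the FreshClone folder name are hypothetical.

```shell
set -eu

# Hypothetical helper: throwaway identity and local submodule URLs allowed.
g() {
  git -c protocol.file.allow=always -c init.defaultBranch=master \
      -c user.name="Lab User" -c user.email="lab@example.com" "$@"
}

lab="$(mktemp -d)"

g init -q "$lab/SubRepoProjectOne"
cd "$lab/SubRepoProjectOne"
echo "sub" > ReadMe.md
g add .
g commit -q -m "Initial commit in sub repo"

g init -q "$lab/MainRepoProjectOne"
cd "$lab/MainRepoProjectOne"
echo "main" > ReadMe.md
g add .
g commit -q -m "Initial commit in main repo"
g submodule add "$lab/SubRepoProjectOne" MySubModules/SubRepoProjectOne
g commit -q -m "Add SubRepoProjectOne as a submodule"

# Clone the main repository and "forget" --recurse-submodules.
g clone -q "$lab/MainRepoProjectOne" "$lab/FreshClone"
cd "$lab/FreshClone"

if [ -f MySubModules/SubRepoProjectOne/ReadMe.md ]; then
  before="present"
else
  before="missing"
fi
echo "submodule content before the fix: $before"

# Initialize and fetch every submodule, including nested ones.
g submodule update --init --recursive

if [ -f MySubModules/SubRepoProjectOne/ReadMe.md ]; then
  after="present"
else
  after="missing"
fi
echo "submodule content after the fix: $after"
```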
How do we ensure submodules always get cloned?
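Git has a configuration option that helps here.
git config --global submodule.recurse true
By setting submodule.recurse to true in the global Git configuration, commands that support --recurse-submodules, such as checkout, fetch, pull, and switch, recurse into submodules by default. Note that the Git documentation explicitly excludes git clone from this setting, so keep passing --recurse-submodules when you clone, or run git submodule update --init --recursive afterwards. That way, you are less likely to forget, which could lead to issues with automation and workflows.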
Shallow submodules
You can use shallow submodules to reduce clone time and save on some disk space. A shallow submodule fetches only the latest commit of the submodule branch being tracked. That can be useful with large repos. In DevSecOps Infrastructure as Code, that is rarely the case unless people put tons of Docs, PowerPoints, and Visio files in the repo.
git clone --recurse-submodules --shallow-submodules https://workinghardinit@dev.azure.com/workinghardinit/ProjectOne/_git/MainRepoProjectOne
Removing a submodule
Deleting a git submodule is a three-step process. Don’t worry, it is not difficult, but you need to do all three. First, I delete the submodule from Git. Secondly, I delete the folder that refers to it in .git/modules. Last, I clean out the reference to the submodule in the .git/config file.
Delete the submodule from Git
git rm MySubModules/SubRepoProjectOne
Delete the submodules folder under .git/modules
Now, that does not remove it from the .git/modules folder. We can delete this by running
rm .git/modules/MySubModules/SubRepoProjectOne -Recurse -Force
Clean up the submodule reference in the .git/config file
The reference to the submodule is still in the config file in your .git folder. Open that file in your favorite editor and delete the block that refers to the submodule or modules you deleted via Git.
Alternatively, execute
git config --remove-section submodule.MySubModules/SubRepoProjectOne
Finally
All that is left to do now is to make sure that you are at the root of your main repository and execute
git add .
git commit -m "deleted git submodule X"
git push
Voila, the submodules are gone, and everything is tidy! Removing the second repository works exactly the same.
git rm MySubModules/SubRepoProjectTwo
rm .git/modules/MySubModules/SubRepoProjectTwo -Recurse -Force
git config --remove-section submodule.MySubModules/SubRepoProjectTwo
git add .
git commit -m "deleted git submodule X"
git push
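To check that nothing is left behind, you can replay the removal in a throwaway lab and inspect the result. The sketch below does that under a few assumptions: a POSIX shell (so rm -rf takes the place of the PowerShell rm -Recurse -Force), local scratch repositories instead of Azure DevOps, no git push because the lab has no remote, and a hypothetical g helper and identity.

```shell
set -eu

# Hypothetical helper: throwaway identity and local submodule URLs allowed.
g() {
  git -c protocol.file.allow=always -c init.defaultBranch=master \
      -c user.name="Lab User" -c user.email="lab@example.com" "$@"
}

lab="$(mktemp -d)"

g init -q "$lab/SubRepoProjectOne"
cd "$lab/SubRepoProjectOne"
echo "sub" > ReadMe.md
g add .
g commit -q -m "Initial commit in sub repo"

g init -q "$lab/MainRepoProjectOne"
cd "$lab/MainRepoProjectOne"
echo "main" > ReadMe.md
g add .
g commit -q -m "Initial commit in main repo"
g submodule add "$lab/SubRepoProjectOne" MySubModules/SubRepoProjectOne
g commit -q -m "Add SubRepoProjectOne as a submodule"

# Step 1: delete the submodule from Git (index, folder, .gitmodules entry).
g rm -q MySubModules/SubRepoProjectOne
# Step 2: delete its folder under .git/modules.
rm -rf .git/modules/MySubModules/SubRepoProjectOne
# Step 3: clean up the reference in .git/config.
g config --remove-section submodule.MySubModules/SubRepoProjectOne
# Finally: commit the removal.
g add .
g commit -q -m "deleted git submodule SubRepoProjectOne"

# Anything left? Both of these should come back empty.
leftover_config="$(g config --local --get-regexp '^submodule\.MySubModules' || true)"
leftover_status="$(g submodule status)"
echo "leftover config: [$leftover_config]"
echo "leftover status: [$leftover_status]"
```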
Testing your work in a pipeline
To test your work in an Azure Pipeline, we create a YAML file that does nothing but check out the main and submodule repositories.
In the root folder of your main repository, execute the following
md Pipelines
cd Pipelines
ni AzureDevOpsPipeline.yaml
Paste the text below into that YAML file:
#Pipeline with submodules for testing
trigger: none

pool:
  vmImage: windows-latest

resources:
  repositories:
    - repository: SubRepoProjectOneID #Create a repository ID to reference this resource in uses
      type: git
      name: SubRepoProjectOne #The name of the repository in the same project as main, used for my submodule
    - repository: SubRepoProjectTwoID #Create a repository ID to reference this resource in uses
      type: git
      name: ProjectTwo/SubRepoProjectTwo #The name of the project/repository in another project as main, used for my submodule

stages:
  - stage: checkout
    jobs:
      - job:
        steps:
          - checkout: self
            submodules: true
          - task: PublishBuildArtifacts@1
            inputs:
              PathtoPublish: '$(Build.Repository.LocalPath)'
              ArtifactName: 'iac'
              publishLocation: 'Container'
        uses:
          repositories:
            - SubRepoProjectOneID
            - SubRepoProjectTwoID
In Azure DevOps, go to ProjectOne and select Pipelines. Click New Pipeline.
Select Azure Repos Git YAML (do not bother with classic pipelines anymore).
Select the repository where this pipeline lives.
Choose to use an existing Azure Pipeline YAML file.
Select the file we just created earlier.
Click “Continue”. Your pipeline will be created, and you can save and run it. If that run succeeds, you did everything correctly. If not, see this article for guidance on what could be wrong.
If a submodule references a repository in a project other than the main project, you must set the option “Limit job authorization scope to the current project for non-release pipelines” to “Off” in your main repository’s project settings. Otherwise, you’ll experience permission issues! Also, see this article.
Some Tips for use in real life
As with any powerful tool, Git has a lot of options and capabilities to achieve what’s needed. But, it can be intimidating. So, spend some time in labs playing with the concepts. It works wonders. Getting things to work, breaking them, and fixing things is a great way to wrap your head around any technology.
When working with Git submodules, try to:
- Push submodule changes before pushing the changes to the main project. That way, you avoid surprises of missing changes in the remote repository.
- Use nested submodules with care. Nesting can be fun but also confusing.
- Have a plan for leveraging submodules and document it in the read-me notes.
- Decide on the branch to follow and what commit ID is the “approved” one.
- When you choose a specific branch, specify it when adding the submodule.
- If possible, consume submodules in the main repository. Don’t edit and push them to the repository. In other words, work on the submodule repository directly, not via the main repository.
- Update your submodule frequently. Know and manage what code you are running.
Use common sense, and be pragmatic in your approach. You can organize it the way you want as long as you master the Git intricacies of what you are doing. In short, use Git wisely, and don’t jump around frantically between branches and commits. Stability is worth something in an Agile DevSecOps world 😉. All the above is some common-sense Git advice to adapt to your needs and environment.
Conclusion
I hope these examples of playing with Git submodules made what you can do with them more palatable. I know it can be a bit intimidating. It is a powerful concept that helped me solve several challenges when developing an IaC/DevOps workflow between multiple teams, security boundaries, etc. The instructions should help you test it out yourself. Maybe it is that extra help you need when figuring it all out. Good luck!