Using Git submodules in your main Azure DevOps repository – Part II

  • May 23, 2024
  • 35 min read
Cloud and Virtualization Architect. Didier is an IT veteran with over 20 years of expertise in Microsoft technologies, storage, virtualization, and networking. Didier primarily works as an expert advisor and infrastructure architect.

Introduction

In this two-part article series, I will help you learn to experiment with and test Git submodules in Azure DevOps. You are reading Part II. I will show you how to add, update, clone, and remove submodules in Git using the lab environment I set up in Part I.

You are reading the labs I give to fellow IT Pros who are responsible for maintaining and deploying Azure Firewall Policies and rules in Infrastructure as Code (IaC). Here, I focus on working with Git submodules. If you want to read a bit more about the intricacies of getting a pipeline to play nice with submodules, read "Getting Git submodules in private Azure DevOps repositories to work in a pipeline" on the StarWind Blog (starwindsoftware.com).

Working with submodules

We continue where we left off in Part I. Launch Visual Studio Code and open the MainRepoProjectOne folder. Make sure to have a PowerShell terminal open.

Adding a submodule

I add a submodule when I want the code, files, and artifacts from another repository to be available transparently in a subfolder in my main repository. The cool thing is that it keeps the paths the same locally and in the remote repository, so I can do local deploys without worrying about my folder structure. You can add one or more. The repositories can be from the same or different locations (Azure DevOps, GitHub, GitLab), from the same or other organizations or projects. If it is a public repository, or a private one to which you can authenticate, you can add it as a Git submodule. I organize my submodules in a subfolder (MySubModules) to show that you can choose the path where a submodule is created.

How to add a submodule

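Navigate to the MainRepoProjectOne root folder and run git submodule add with the URL of the repository and the folder in which the submodule should live:

git submodule add https://workinghardinit@dev.azure.com/workinghardinit/ProjectOne/_git/SubRepoProjectOne .\MySubModules\SubRepoProjectOne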
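If you want to rehearse this step without touching Azure DevOps, the following is a minimal sketch that uses two throwaway local repositories as stand-ins. The temporary lab folder, the README.md file, and the commit messages are assumptions for illustration only; the protocol.file.allow setting is needed only because this lab clones the submodule from a local path.

```shell
set -e
lab="$(mktemp -d)"   # throwaway lab folder (assumption)
export GIT_AUTHOR_NAME=lab GIT_AUTHOR_EMAIL=lab@example.com
export GIT_COMMITTER_NAME=lab GIT_COMMITTER_EMAIL=lab@example.com

# Local stand-in for the SubRepoProjectOne repository.
git init -q "$lab/SubRepoProjectOne"
echo "demo" > "$lab/SubRepoProjectOne/README.md"
git -C "$lab/SubRepoProjectOne" add README.md
git -C "$lab/SubRepoProjectOne" commit -q -m "Initial commit"

# Local stand-in for the MainRepoProjectOne repository.
git init -q "$lab/MainRepoProjectOne"
cd "$lab/MainRepoProjectOne"
git commit -q --allow-empty -m "Initial commit"

# Same shape as: git submodule add <repository URL> <path in main repo>
git -c protocol.file.allow=always submodule --quiet add \
  "$lab/SubRepoProjectOne" MySubModules/SubRepoProjectOne

# The submodule reference and .gitmodules are staged; commit them.
git commit -q -m "Add SubRepoProjectOne as a submodule"
```

Against a real remote, you would replace the local path with the repository URL and drop the -c protocol.file.allow=always part.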

You receive a prompt to authenticate if you have not done that yet.

The result of your handiwork looks like the image below.

How to verify if adding the submodule was successful

Now, to check if this worked, I first run

Notice that .gitmodules has appeared and that there is now a subfolder SubRepoProjectOne in MainRepoProjectOne/MySubModules. Look at its content.

You see the content from the repository SubRepoProjectOne in Azure DevOps (WorkingHardInIT/ProjectOne/SubRepoProjectOne) that we just added. If you run git status in that subfolder, you will see that it is up to date.

Secondly, navigate to the root folder and look at the contents of .gitmodules. Notice the reference there for your submodule with the path and the URL.
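The checks described above can be sketched as follows, again with local throwaway repositories standing in for the Azure DevOps ones (the lab folder and file names are assumptions). It shows three read-only commands: git submodule status, a look at .gitmodules, and git status inside the submodule folder.

```shell
set -e
lab="$(mktemp -d)"   # throwaway lab folder (assumption)
export GIT_AUTHOR_NAME=lab GIT_AUTHOR_EMAIL=lab@example.com
export GIT_COMMITTER_NAME=lab GIT_COMMITTER_EMAIL=lab@example.com

# Minimal lab: a sub repository and a main repository that embeds it.
git init -q "$lab/SubRepoProjectOne"
git -C "$lab/SubRepoProjectOne" commit -q --allow-empty -m "Initial commit"
git init -q "$lab/MainRepoProjectOne"
cd "$lab/MainRepoProjectOne"
git commit -q --allow-empty -m "Initial commit"
git -c protocol.file.allow=always submodule --quiet add \
  "$lab/SubRepoProjectOne" MySubModules/SubRepoProjectOne
git commit -q -m "Add submodule"

# 1. Which commit does each submodule reference, and at which path?
git submodule status

# 2. What did Git record in .gitmodules (path and URL)?
cat .gitmodules

# 3. Is the submodule's own working tree clean?
git -C MySubModules/SubRepoProjectOne status --short --branch
```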

WARNING!

I added the submodule via its URL. That works, but as it is a private repository, I run into authentication issues with Azure DevOps pipelines. For a private repository that resides in the same DevOps organization, I should add it using a relative path. I have another article on just that subject.

You can change the URL in the .gitmodules file or add the submodule directly using a relative path:

git submodule add ../SubRepoProjectOne .\MySubModules\SubRepoProjectOne

If the submodule already exists, you’d have to remove it first. So, in this case, editing the file is easier.
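Instead of editing the file by hand, you can let Git rewrite the entry. The sketch below assumes two sibling local repositories under a hypothetical _git folder, mimicking how both repositories sit next to each other in one Azure DevOps project. It changes the recorded URL with git config -f .gitmodules and then runs git submodule sync so the local configuration follows.

```shell
set -e
lab="$(mktemp -d)"   # throwaway lab folder (assumption)
mkdir -p "$lab/_git"
export GIT_AUTHOR_NAME=lab GIT_AUTHOR_EMAIL=lab@example.com
export GIT_COMMITTER_NAME=lab GIT_COMMITTER_EMAIL=lab@example.com
# Helper for this lab only: allow submodule clones from local paths.
g() { git -c protocol.file.allow=always "$@"; }

# Sibling repositories, like .../_git/MainRepoProjectOne and .../_git/SubRepoProjectOne.
for name in MainRepoProjectOne SubRepoProjectOne; do
  git init -q "$lab/_git/$name"
  git -C "$lab/_git/$name" commit -q --allow-empty -m "Initial commit"
done

git clone -q "$lab/_git/MainRepoProjectOne" "$lab/work"
cd "$lab/work"

# Added by absolute URL first, as in the article.
g submodule --quiet add "$lab/_git/SubRepoProjectOne" MySubModules/SubRepoProjectOne

# Record a URL relative to the main repository's remote instead.
git config -f .gitmodules submodule.MySubModules/SubRepoProjectOne.url ../SubRepoProjectOne

# Resolve the relative URL and copy it into .git/config and the submodule.
g submodule --quiet sync

git add .gitmodules
git commit -q -m "Use a relative submodule URL"
```

The relative path is resolved against the URL of the main repository's remote, which is why ../SubRepoProjectOne works for a repository in the same project.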

Adding a submodule whose private repository lives in another project in the same DevOps organization works very similarly. Again, use a relative path to avoid issues with the Azure DevOps pipeline later. As the repo lives in a different project (ProjectTwo) from the main repository (ProjectOne), we need to specify the path to the project as follows.

git submodule add ../../../ProjectTwo/_git/SubRepoProjectTwo .\MySubModules\SubRepoProjectTwo

You cannot add submodules that refer to a project in another DevOps organization and expect the Azure DevOps pipeline to work. See the pipeline article mentioned in the introduction.

Finally, check out the .git\modules folder.

Under the .git folder, a modules folder that contains a subfolder for the submodule I just added should now exist.

I am now sure I added a submodule to my git repository successfully. Now, I should play with it and check out the behavior when adding, committing, and pushing items to the remote repositories.

Updating the main repo and its submodule(s)

Navigate to the submodule folder (MainRepoProjectOne\MySubModules\SubRepoProjectOne) and create some items.

As you can see in the picture above, we have committed these files to the local repository but have not pushed them out to the remote repository.

Please navigate to the main repository folder MainRepoProjectOne. Remember we have staged files to push to our main repo?

We have not yet pushed these to the remote repository either. For now, they only exist locally.

When you look at Azure DevOps, these do not exist there yet.

Now, push these to its remote repository.

git push

After refreshing the view, these items should be visible in the Azure DevOps repository.

Do note the SubRepoProjectOne link under MySubModules, which points to the commit ID of the repository it refers to, not to the actual content. The content lives in its own repository, SubRepoProjectOne.

So far, so good. Let’s take a peek at the Azure DevOps SubRepoProjectOne repository. The files and folders we created are not there yet, as our commit and push in the main repository did not touch that.

To push those to its repository, we must navigate to MainRepoProjectOne\MySubModules\SubRepoProjectOne and run git push there.

Cool, huh!

In general, add, commit, and push the changes in your submodule first, before you add, commit, and push the changes in the main repository. I did not do that here to demonstrate their relationship and behavior.
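That recommended order can be sketched as follows. The bare repositories under remote/ are local stand-ins for the two Azure DevOps repositories, and NewRule.txt is a hypothetical file. The --recurse-submodules=check option on the last push is an optional safety net: it makes Git refuse to push the main repository if it references a submodule commit that has not been pushed.

```shell
set -e
lab="$(mktemp -d)"   # throwaway lab folder (assumption)
mkdir -p "$lab/seed" "$lab/remote"
export GIT_AUTHOR_NAME=lab GIT_AUTHOR_EMAIL=lab@example.com
export GIT_COMMITTER_NAME=lab GIT_COMMITTER_EMAIL=lab@example.com
# Helper for this lab only: allow submodule clones from local paths.
g() { git -c protocol.file.allow=always "$@"; }

# Bare stand-ins for the two remote repositories.
for name in MainRepoProjectOne SubRepoProjectOne; do
  git init -q "$lab/seed/$name"
  git -C "$lab/seed/$name" commit -q --allow-empty -m "Initial commit"
  git clone -q --bare "$lab/seed/$name" "$lab/remote/$name"
done

git clone -q "$lab/remote/MainRepoProjectOne" "$lab/MainRepoProjectOne"
cd "$lab/MainRepoProjectOne"
g submodule --quiet add "$lab/remote/SubRepoProjectOne" MySubModules/SubRepoProjectOne
git commit -q -m "Add submodule"

# 1. Submodule first: add, commit, and push inside the submodule folder.
cd MySubModules/SubRepoProjectOne
echo "allow 443" > NewRule.txt
git add NewRule.txt
git commit -q -m "Add NewRule.txt"
git push -q origin HEAD

# 2. Main repository second: record the new submodule commit and push.
cd "$lab/MainRepoProjectOne"
git add MySubModules/SubRepoProjectOne
git commit -q -m "Point submodule at its new commit"
g push -q --recurse-submodules=check origin HEAD
```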

In collaboration scenarios, you would probably work directly on the submodule repository, not in your main project’s submodule subfolder. You’d then manually or automatically pull the changes in that repository into your submodule in the main repository. More about that later! So, this direct editing in the submodule folder is more of a demo than a real-life scenario. Still, it nicely demonstrates the submodule repository’s behavior in relation to the main repository.

Working directly on the repository of the submodules

Let’s pretend we are the maintainer of the repository that the submodule references.

Open your Git repo for SubRepoProjectOne, create a file (WorkingOnThisSubRepo.html in this lab), and commit and push it.

You see that the newly created file is present when looking at the local folders and the repo in Azure DevOps.

Note that WorkingOnThisSubRepo.html is not visible locally in the main project's submodule folder for SubRepoProjectOne.

Yes, WorkingOnThisSubRepo.html is visible locally and remotely in the SubRepoProjectOne repository itself, but not in the submodule folder of the main project. Why is this? Well, look at the combined image of the Azure DevOps repositories below. Everything is indeed up to date in the repository for SubRepoProjectOne.

But when we look at it from the main repository’s point of view, it is not. That is because the updates made directly to the SubRepoProjectOne repository are not there. You can see why: the SubRepoProjectOne submodule folder in MainRepoProjectOne still refers to the previous commit ID.

Try and run

git submodule update

You notice this does not change anything. The trick to update this is to run

git submodule update --remote

The plain command only checks out the commit ID that the main repository currently references, while --remote grabs the latest commit from the Azure DevOps SubRepoProjectOne repository or any other submodule. That is the key! Another option would be to run git pull in the MySubModules/SubRepoProjectOne folder. But if you have more than one submodule, that means more jumping around and more work.
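To see the difference between the two commands side by side, here is a minimal sketch with local stand-in repositories (the lab folder, the seed working copy, and the helper g are assumptions). A maintainer pushes a new commit to the submodule's own repository; the plain update leaves the submodule on the pinned commit, while --remote moves it to the newest commit and leaves HEAD detached.

```shell
set -e
lab="$(mktemp -d)"   # throwaway lab folder (assumption)
mkdir -p "$lab/remote"
export GIT_AUTHOR_NAME=lab GIT_AUTHOR_EMAIL=lab@example.com
export GIT_COMMITTER_NAME=lab GIT_COMMITTER_EMAIL=lab@example.com
# Helper for this lab only: allow submodule clones from local paths.
g() { git -c protocol.file.allow=always "$@"; }

# Bare stand-in for SubRepoProjectOne with one commit on master.
git init -q "$lab/seed"
git -C "$lab/seed" symbolic-ref HEAD refs/heads/master
git -C "$lab/seed" commit -q --allow-empty -m "Initial commit"
git clone -q --bare "$lab/seed" "$lab/remote/SubRepoProjectOne"

# Main repository with the submodule pinned to that first commit.
git init -q "$lab/MainRepoProjectOne"
cd "$lab/MainRepoProjectOne"
git commit -q --allow-empty -m "Initial commit"
g submodule --quiet add "$lab/remote/SubRepoProjectOne" MySubModules/SubRepoProjectOne
git commit -q -m "Add submodule"
sub=MySubModules/SubRepoProjectOne
pinned="$(git -C "$sub" rev-parse HEAD)"

# The maintainer pushes a new file straight to the submodule's repository.
echo "<html></html>" > "$lab/seed/WorkingOnThisSubRepo.html"
git -C "$lab/seed" add WorkingOnThisSubRepo.html
git -C "$lab/seed" commit -q -m "Add WorkingOnThisSubRepo.html"
git -C "$lab/seed" push -q "$lab/remote/SubRepoProjectOne" master
latest="$(git -C "$lab/seed" rev-parse HEAD)"

# Plain update: nothing changes, the pinned commit is already checked out.
g submodule --quiet update
after_plain="$(git -C "$sub" rev-parse HEAD)"

# --remote: fetch and check out the latest commit of the tracked branch.
g submodule --quiet update --remote
after_remote="$(git -C "$sub" rev-parse HEAD)"

# The submodule now reports a detached HEAD.
git -C "$sub" status --short --branch
```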

Are you confused yet? That is normal. You have to do the lab work and play with it to gain a better understanding, but that is very much worth it!

Now, running git submodule update --remote does have a side effect. It leaves the submodule repository in a detached HEAD state, which git status in the submodule folder reports.

Help, it says my HEAD is detached

Usually, the message “HEAD is detached at XXXXXX…” sounds very worrisome. In Git, it can be annoying, but it is not as deadly as it sounds.

Your issue is that a git push won’t work as you might expect, but you can still do it. Let me demonstrate what I mean.

In our git submodule repository, execute:

You already get a warning during git commit about the detached HEAD, but when pushing this to the remote repository, we get an error, as Git has no clue what branch to send it to. The fix is what the error message suggests:

git push origin HEAD:master

Cool, the file is now in the remote repository as well.

However, the Git submodule folder (MySubModules\SubRepoProjectOne) on my workstation still has its HEAD detached at d69a7a7. And the most recent commit ID on the remote repo (SubRepoProjectOne) is 97cbe01.

Maybe we are OK with this, as the behavior is known, and you want to run this manually. That way, no newer or latest commit version of your submodule’s master ever gets deployed without your updating it explicitly. You are in control and use a well-known version of the submodule commit history.

But that can be tiresome, and your submodule remote repository’s master branch is supposed to contain only stable, production-ready items. So, what if we want to avoid the detached HEAD behaviors and ensure that our submodule’s repository always updates to the latest master branch version of its remote repository?

What you choose here depends on what behavior you want. If you wish for any approved pull request in the remote submodule repository to be pulled into your main repository’s submodule, you can achieve this as follows.

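First, fix the fact that the submodule Git repo is detached by checking out the master branch again in the submodule folder (for example, with git checkout master).

OK, the submodule’s Git repository is no longer detached. Now we have two options to prevent this from happening the next time you run git submodule update --remote.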

Option 1: Use options in the command line

Navigate to the main repo root folder and execute

git submodule update --remote --merge

or

git submodule update --remote --rebase

Remembering this can be hard. So, to make sure this happens automatically, I can use a Git alias that wraps one of these commands.

With the alias in place, we can update the submodule without ending up with a HEAD detached at XXXXXXX.
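A minimal sketch of Option 1 with an alias, using local stand-in repositories. The alias name supdate is hypothetical, and it is stored in the lab repository's own config rather than globally so the sketch does not touch your personal Git settings; use git config --global if you want it everywhere.

```shell
set -e
lab="$(mktemp -d)"   # throwaway lab folder (assumption)
mkdir -p "$lab/remote"
export GIT_AUTHOR_NAME=lab GIT_AUTHOR_EMAIL=lab@example.com
export GIT_COMMITTER_NAME=lab GIT_COMMITTER_EMAIL=lab@example.com
# Helper for this lab only: allow submodule clones from local paths.
g() { git -c protocol.file.allow=always "$@"; }

# Bare stand-in for SubRepoProjectOne with one commit on master.
git init -q "$lab/seed"
git -C "$lab/seed" symbolic-ref HEAD refs/heads/master
git -C "$lab/seed" commit -q --allow-empty -m "Initial commit"
git clone -q --bare "$lab/seed" "$lab/remote/SubRepoProjectOne"

# Main repository with the submodule added.
git init -q "$lab/MainRepoProjectOne"
cd "$lab/MainRepoProjectOne"
git commit -q --allow-empty -m "Initial commit"
g submodule --quiet add "$lab/remote/SubRepoProjectOne" MySubModules/SubRepoProjectOne
git commit -q -m "Add submodule"
sub=MySubModules/SubRepoProjectOne

# A new commit lands in the submodule's own repository.
git -C "$lab/seed" commit -q --allow-empty -m "Upstream change"
git -C "$lab/seed" push -q "$lab/remote/SubRepoProjectOne" master
latest="$(git -C "$lab/seed" rev-parse HEAD)"

# Default behavior: this leaves the submodule on a detached HEAD.
g submodule --quiet update --remote

# Fix: re-attach HEAD to the local master branch in the submodule.
git -C "$sub" checkout -q master

# Hypothetical alias so nobody has to remember the --merge option.
git config alias.supdate "submodule update --remote --merge"

# Merges the latest remote commit into master; HEAD stays attached.
g supdate
git -C "$sub" status --short --branch
```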

The thing is, now we need to remember to use the alias. Maybe there is a better way? Well, see option 2.

Option 2: Configure this in the .gitmodules file

You can change the submodule update behavior in the .gitmodules file, which makes the command git submodule update --remote behave like git submodule update --remote --merge or git submodule update --remote --rebase. There is no need to specify this explicitly; Git reads the instruction from the file, so it happens automatically. If you use this, document it and tell your team.

You can achieve this in two ways.

Edit the .gitmodules file

[submodule "MySubModules/SubRepoProjectOne"]
    path = MySubModules/SubRepoProjectOne
    url = ../SubRepoProjectOne
    update = merge

Configure it through the command line with git config.
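One way to do that from the command line is git config with the -f option, which writes to a named file instead of .git/config. The sketch below recreates the entry in an empty temporary folder (an assumption, so it is safe to run anywhere) and then adds the update strategy. In your lab, you would run only the last git config line from the root of the main repository.

```shell
set -e
lab="$(mktemp -d)"   # throwaway lab folder (assumption)
cd "$lab"

# Recreate the existing .gitmodules entry from the article.
git config -f .gitmodules submodule.MySubModules/SubRepoProjectOne.path MySubModules/SubRepoProjectOne
git config -f .gitmodules submodule.MySubModules/SubRepoProjectOne.url ../SubRepoProjectOne

# Add the update strategy: merge (or rebase) instead of the default checkout.
git config -f .gitmodules submodule.MySubModules/SubRepoProjectOne.update merge

# Show the resulting file.
cat .gitmodules
```

If you prefer to keep the setting to yourself instead of sharing it with the team through .gitmodules, leave out -f .gitmodules; Git then writes the same key to the main repository's .git/config.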

Why does a Git submodule get into a head-detached situation?

That is the default behavior when one executes git submodule update --remote. It has nothing to do with what branch you are tracking or are not tracking. It is a result of what that command does, which is:

git clone https://workinghardinit@dev.azure.com/workinghardinit/ProjectOne/_git/SubRepoProjectOne .\MySubModules\SubRepoProjectOne

cd .\MySubModules\SubRepoProjectOne\

git checkout <commit-id>

When you add a submodule, you clone the repo it refers to and then check out the commit ID from that submodule repository at that time. That is what leads to the detached HEAD state.

Does it become even messier with branches?

Branches can make things messier or more confusing when used in submodules. Use them when you have a good reason and know what you are doing. Chances are that, combined with submodules, you won’t need them in many scenarios.

When you add a Git submodule to your main repository, the default branch it tracks is origin/master. That is what you need and want in most cases. People working on the repository you added as a submodule can and should use branches to organize their development work. But in the Git submodule, I’d stick with the master branch, which should contain the code, files, and artifacts that need deploying into production. Whatever is in there should be good to go. In branches, not so much. And, just in case you are wondering, tracking different branches or not has nothing to do with the detached behavior of submodules. That is just due to the nature of what a submodule is, as we have discussed above.

Updating nested submodules

It is possible to nest submodules. If you need to do that, the --recursive option is your friend! For example, adding --recursive to git submodule update updates all submodules, including the nested ones.

Cloning a repository that contains one or more submodules

Cloning a main project that has submodules is pretty straightforward.

git clone --recurse-submodules -j4 https://workinghardinit@dev.azure.com/workinghardinit/ProjectOne/_git/MainRepoProjectOne

It is pretty much what you’d expect. There is one potential issue, however. You might want or need to update the code from the submodule’s remote repository. As we have already seen, the content of the submodule is determined by the commit ID the main repository references. If that commit ID is older than the most recent commit in the submodule’s repository, you must execute git submodule update --remote to get the most recent commits down into your local submodule folder.

If that’s what you want for future clones, commit and push it up to the remote repository so it has the submodule referencing its repository’s most recent commit.

What if I forgot the --recurse-submodules option? If you forget, you will see that the submodule folders are empty. No worries, you can still fix this manually. Run:

git submodule update --init --recursive

That beats going into every potentially nested submodule folder, running git init there, and cloning the repo.
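The following sketch reproduces that situation with local stand-in repositories (lab folder and file names are assumptions): it clones the main repository without --recurse-submodules, confirms the submodule folder is empty, and then repairs it with git submodule update --init --recursive.

```shell
set -e
lab="$(mktemp -d)"   # throwaway lab folder (assumption)
export GIT_AUTHOR_NAME=lab GIT_AUTHOR_EMAIL=lab@example.com
export GIT_COMMITTER_NAME=lab GIT_COMMITTER_EMAIL=lab@example.com
# Helper for this lab only: allow submodule clones from local paths.
g() { git -c protocol.file.allow=always "$@"; }

# A sub repository with one file, and a main repository that embeds it.
git init -q "$lab/SubRepoProjectOne"
echo "demo" > "$lab/SubRepoProjectOne/README.md"
git -C "$lab/SubRepoProjectOne" add README.md
git -C "$lab/SubRepoProjectOne" commit -q -m "Initial commit"
git init -q "$lab/MainRepoProjectOne"
git -C "$lab/MainRepoProjectOne" commit -q --allow-empty -m "Initial commit"
g -C "$lab/MainRepoProjectOne" submodule --quiet add \
  "$lab/SubRepoProjectOne" MySubModules/SubRepoProjectOne
git -C "$lab/MainRepoProjectOne" commit -q -m "Add submodule"

# Clone WITHOUT --recurse-submodules: the submodule folder is empty.
git clone -q "$lab/MainRepoProjectOne" "$lab/clone"
cd "$lab/clone"
before="$(ls -A MySubModules/SubRepoProjectOne)"

# Repair: initialize and fetch all submodules, including nested ones.
g submodule --quiet update --init --recursive
ls MySubModules/SubRepoProjectOne
```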

How do we ensure submodules always get cloned?

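If you need this, Git has a configuration option:

git config --global submodule.recurse true

Setting submodule.recurse to true in the global Git configuration makes commands such as checkout, fetch, pull, and push recurse into submodules by default. That way, you cannot forget to do so, which could otherwise lead to issues with automation and workflows. Note that, according to the Git documentation, this setting does not apply to git clone itself, so you still need to pass --recurse-submodules when cloning.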

Shallow submodules

You can use shallow submodules to reduce clone time and save some disk space. A shallow submodule fetches only the latest commit of the submodule branch being tracked. That can be useful with large repos. In DevSecOps Infrastructure as Code, that is rarely the case unless people put tons of documents, PowerPoint decks, and Visio files in the repo.

git clone --recurse-submodules --shallow-submodules https://workinghardinit@dev.azure.com/workinghardinit/ProjectOne/_git/MainRepoProjectOne

Removing a submodule

Deleting a git submodule is a three-step process. Don’t worry, it is not difficult, but you need to do all three. First, I delete the submodule from Git. Secondly, I delete the folder that refers to it in .git/modules. Last, I clean out the reference to the submodule in the .git/config file.

Delete the submodule from Git

Delete the submodules folder under .git/modules

Now, that does not remove it from the .git/modules folder. We can delete this by running

Clean up the submodule reference in the .git/config file

The reference to the submodule is still in the config file in your .git folder. Open that file in your favorite editor and delete the block that refers to the submodule or modules you deleted via Git.

Alternatively, execute

Finally

All that is left to do now is to make sure that you are at the root of your main repository and execute

Voila, the submodules are gone, and everything is tidy! Removing the second repository works exactly the same.
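The whole removal sequence can be sketched as follows, again against local stand-in repositories. The exact commands are an assumption based on the three steps described above: git rm for the submodule itself, deleting its folder under .git/modules, and git config --remove-section for the leftover block in .git/config, followed by a commit. In PowerShell, you would use Remove-Item -Recurse -Force instead of rm -rf.

```shell
set -e
lab="$(mktemp -d)"   # throwaway lab folder (assumption)
export GIT_AUTHOR_NAME=lab GIT_AUTHOR_EMAIL=lab@example.com
export GIT_COMMITTER_NAME=lab GIT_COMMITTER_EMAIL=lab@example.com

# Lab setup: a main repository with one committed submodule.
git init -q "$lab/SubRepoProjectOne"
git -C "$lab/SubRepoProjectOne" commit -q --allow-empty -m "Initial commit"
git init -q "$lab/MainRepoProjectOne"
cd "$lab/MainRepoProjectOne"
git commit -q --allow-empty -m "Initial commit"
git -c protocol.file.allow=always submodule --quiet add \
  "$lab/SubRepoProjectOne" MySubModules/SubRepoProjectOne
git commit -q -m "Add submodule"

# Step 1: delete the submodule from Git (working folder and .gitmodules entry).
git rm -q MySubModules/SubRepoProjectOne

# Step 2: delete the submodule's Git data under .git/modules.
rm -rf .git/modules/MySubModules/SubRepoProjectOne

# Step 3: remove the leftover section from .git/config.
git config --remove-section submodule.MySubModules/SubRepoProjectOne

# Finally: commit the removal from the root of the main repository.
git commit -q -m "Remove SubRepoProjectOne submodule"
```

With a real remote, a git push after the final commit publishes the removal.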

Testing your work in a pipeline

To test your work in an Azure Pipeline, we create a YAML file that does nothing but check out the main and submodule repositories.

In the root folder of your main repository, execute the following

In that YAML file, paste the pipeline definition.
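A minimal sketch of such a checkout-only pipeline, written to a file from the shell. The file name azure-pipelines.yml, the trigger, the agent pool, and the extra script step are assumptions, and the author's actual file may differ. The relevant part is the checkout step with submodules enabled. The sketch writes to a temporary folder; in your lab, you would create the file in the root of the main repository.

```shell
set -e
lab="$(mktemp -d)"   # throwaway lab folder (assumption)
cd "$lab"

# Write a checkout-only pipeline definition (hypothetical file name).
cat > azure-pipelines.yml <<'EOF'
# Checkout-only pipeline sketch; trigger, pool, and step names are assumptions.
trigger: none

pool:
  vmImage: ubuntu-latest

steps:
  - checkout: self
    submodules: true          # use "recursive" for nested submodules
    persistCredentials: true  # keep the token available for later git commands
  - script: git submodule status
    displayName: Show submodule commits
EOF

cat azure-pipelines.yml
```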

In Azure DevOps, go to Project One and select Pipelines. Click New Pipeline

Select Azure Repos Git YAML (do not bother with classic pipelines anymore)

Select the repository where this pipeline lives.

Choose to use an existing Azure Pipeline YAML file.

Select the file we just created earlier.

Click “Continue”. Your pipeline will be created, and you can save and run it. If that run succeeds, you did everything correctly. If not, see this article for guidance on what could be wrong.

If a submodule references a repository in a project other than the main project, you must set the option “Limit job authorization scope to the current project for non-release pipelines” to “Off” in your main repository’s project settings. Otherwise, you’ll experience permission issues! Also, see this article.

Some Tips for use in real life

As with any powerful tool, Git has a lot of options and capabilities to achieve what’s needed. But, it can be intimidating. So, spend some time in labs playing with the concepts. It works wonders. Getting things to work, breaking them, and fixing things is a great way to wrap your head around any technology.

When working with Git submodules, try to:

  • Push submodule changes before pushing the changes to the main project. That way, you avoid surprises of missing changes in the remote repository.
  • Nesting submodules can be fun but also confusing. Use that capability with care.
  • Have a plan for leveraging submodules and document it in the read-me notes.
  • Decide on the branch to follow and what commit ID is the “approved” one.
  • When you choose a specific branch, specify it when adding the submodule.
  • If possible, consume submodules in the main repository. Don’t edit and push them to the repository. In other words, work on the submodule repository directly, not via the main repository.
  • Update your submodule frequently. Know and manage what code you are running.

Use common sense, and be pragmatic in your approach. You can organize it the way you want as long as you master the Git intricacies of what you are doing. In short, use Git wisely, and don’t jump around frantically between branches and commits. Stability is worth something in an Agile DevSecOps world 😉. All the above is some common-sense Git advice to adapt to your needs and environment.

Conclusion

I hope these examples of playing with Git submodules made what you can do with them more palatable. I know it can be a bit intimidating. It is a powerful concept that helped me solve several challenges when developing an IaC/DevOps workflow between multiple teams, security boundaries, etc. The instructions should help you test it out yourself. Maybe it is that extra help you need when figuring it all out. Good luck!
