A deeper look into Git

In the last Git article I gave a general overview of Git and why we use it during our sprints. Now I want to give a deeper insight into this tool.

What about the Git index?

The index can be thought of as an intermediate stage between the working directory and the local repository: a pre-commit stage. As said in the last article, a commit describes a specific state in the history of your project. When you make changes in your working directory, Git does not record them until you tell it to.

By adding a file to the index you tell Git what will be included in the next commit you make. For example, you may not want to include all your changes in the next commit. Let’s say you’ve changed 4 files and you want to commit only the changes made in 2 of them. The first step is to add these 2 files to the index (with the add command) and then create the new commit (with the commit command). The new commit contains the information (tree and blob objects) needed to recreate the repository in a state that includes the changes made in those 2 files.

Note here that you may have added changes for only 2 files, but a commit contains the information to recreate the entire working directory, not just the 2 files you changed.

That is a key concept in Git. Once again: each commit in Git contains the information to construct your entire project at a specific state. When you check for differences in your project between 2 commits, Git actually compares 2 project states. A commit does not contain the changes you made, but a project state.
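The 4-files example above can be sketched in a few lines of Python. This is a toy model and not Git's real data structures: the file names, the contents and the helper functions add and commit are all made up for illustration, and a project state is just a dict.

```python
# Toy model, not Git's real data structures: a project state is a plain
# dict that maps file names to file contents.

# State recorded by the last commit: 4 files.
last_commit = {"a.txt": "a1", "b.txt": "b1", "c.txt": "c1", "d.txt": "d1"}

# All 4 files have since been edited in the working directory.
working_dir = {"a.txt": "a2", "b.txt": "b2", "c.txt": "c2", "d.txt": "d2"}

# Right after a commit, the index matches that commit.
index = dict(last_commit)


def add(name):
    """Stage one file: copy its current content into the index."""
    index[name] = working_dir[name]


def commit():
    """Create a commit: a snapshot of everything in the index."""
    return dict(index)


# Stage only 2 of the 4 changed files, then commit.
add("a.txt")
add("b.txt")
new_commit = commit()

for name in sorted(new_commit):
    print(name, new_commit[name])
# a.txt a2
# b.txt b2
# c.txt c1
# d.txt d1
```

The new commit still describes all 4 files: the 2 staged ones in their new version and the other 2 as they were in the previous commit.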

What about branches?

Because of the way many graphical tools display them, branches are often misinterpreted as part of Git’s structure. For example, somebody may think that a branch consists of all the commits displayed in a graphical tool from the point where the branch splits from another branch. That is wrong. The right way to think of a branch is as a post-it note that references a specific commit.

[Image: branches]

When you merge 2 branches, Git follows the parents of parents of parents … of the 2 branch tip commits until it finds a commit object that both commit sequences have in common. It then creates a new commit with 2 parent commits. The merge commit just describes a state of your project: a state that contains all the changes from the commit sequences on both branches since the common commit.
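The search for the common commit can be sketched as a walk over parent links. The history below is a toy one with made-up commit ids, and the functions ancestors and merge_bases are hypothetical helpers written for this article, not Git's own implementation.

```python
# Toy history: each made-up commit id maps to the list of its parent ids.
commits = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],  # tip of branch1
    "c4": ["c2"],
    "c5": ["c4"],  # tip of branch2
}


def ancestors(start):
    """Return start plus every commit reachable through parent links."""
    seen = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(commits[commit])
    return seen


def merge_bases(a, b):
    """Return the common ancestors of a and b that are not themselves
    an ancestor of another common ancestor."""
    common = ancestors(a) & ancestors(b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(d) for d in common)
    }


bases = merge_bases("c3", "c5")
print(bases)  # {'c2'}

# The merge commit is an ordinary commit that happens to have 2 parents.
commits["m"] = ["c3", "c5"]
```

Both c1 and c2 are shared by the two histories, but only c2 is reported, because c1 is already contained in the history of c2.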

In the image above, the purple commit does not “belong” to either branch1 or branch2. It simply contains information about a new state of the repository. We can reference it with the branch1 post-it, with the branch2 post-it, or with both of them.

To sum up, a branch is NOT literally a branch. It is just a post-it. It has no structure – no content. It is just a way of referencing commits.
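The post-it idea fits in a small Python sketch. Again this is a toy model with made-up commit ids and a hypothetical helper commit_on; the only point is that a branch is a name stored next to the commits, not inside them.

```python
# Toy model: commit ids are made-up strings, not real hashes.
# Each commit only knows its parents; it knows nothing about branches.
commits = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],
}

# A branch is only a name that points at one commit.
branches = {"branch1": "c3", "branch2": "c2"}


def commit_on(branch, new_id):
    """Create a commit on top of the branch tip, then move the post-it."""
    commits[new_id] = [branches[branch]]
    branches[branch] = new_id


commit_on("branch2", "c4")

# Sticking a second post-it on the same commit copies nothing.
branches["branch3"] = branches["branch1"]

# Removing a post-it leaves the commits untouched.
del branches["branch3"]

print(branches)  # {'branch1': 'c3', 'branch2': 'c4'}
```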

About References

Branches are one kind of what Git calls references. References are stored inside the .git/refs folder. Inside this folder, there are 3 subfolders:

  • Branches are stored inside the ‘heads’ folder.
  • Remote branches are stored inside the ‘remotes’ folder.
  • Tags are stored in the ‘tags’ folder (they won’t be covered here).

Remote branches are tricky, as there are 2 types of them:

  • The actual remote branches that live in the remote repository.
  • The local remote branches (Git calls them remote-tracking branches) stored in the ‘remotes’ folder of the local repository.

The local remote branches are used by Git to represent the state of the real branches in the remote repository. Technically they work like any other local branch, which means they simply reference a commit object. They are mistaken for a special sort of branch only because of their name and the way Git uses them. However, it is not recommended to use them as normal branches, precisely because Git uses them to represent the state of the remote repository.
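To show that a local branch and a local remote branch are the same kind of thing, here is a minimal sketch that reads both from disk. It builds a throwaway folder laid out like .git/refs instead of touching a real repository. The helper read_ref, the branch name main, the remote name origin and the hash are assumptions made for the example, and the sketch ignores refs that Git has moved into the .git/packed-refs file.

```python
import tempfile
from pathlib import Path

# Made-up 40-character id standing in for a real commit hash.
FAKE_HASH = "0123456789abcdef0123456789abcdef01234567"


def read_ref(git_dir, ref_name):
    """Return the commit hash stored in a loose ref file.

    Simplification: refs packed into .git/packed-refs are not handled.
    """
    return (Path(git_dir) / ref_name).read_text().strip()


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"

    # A local branch and a local remote branch: both are small text files.
    for ref in ("refs/heads/main", "refs/remotes/origin/main"):
        path = git_dir / ref
        path.parent.mkdir(parents=True)
        path.write_text(FAKE_HASH + "\n")

    local = read_ref(git_dir, "refs/heads/main")
    tracking = read_ref(git_dir, "refs/remotes/origin/main")

print(local == tracking)  # True
```

Both files contain nothing but a commit hash, which is all a reference needs.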

How about the communication with the remote repository?

There are only 2 commands to enable this communication: fetch and push (remember that pull is a combination of fetch and merge):

  • Fetch retrieves any new commits from the remote repository. Remember that commits have nothing to do with the working directory; to a developer they are just opaque binary objects. After fetching, the new commits are simply added inside the .git/objects folder according to their hash codes, and the ‘local remote branches’ are updated to match the real remote ones.
  • Push sends information about new commits from the local to the remote repository. After a successful push, the ‘local remote branches’ are also updated to match the real remote ones. Fetch and push operations should be the only way that the local remote branches are updated (which means indirectly through Git and not directly by the user).
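What “according to their hash codes” means can be sketched with Python’s hashlib. The sketch assumes a repository using Git’s default SHA-1 object format and only covers loose objects; in practice Git often stores fetched objects inside packfiles under .git/objects/pack instead. The function names object_id and loose_object_path are made up for this example.

```python
import hashlib


def object_id(kind, content):
    """Hash an object the way Git does for SHA-1 repositories:
    the header '<kind> <size>' plus a NUL byte, then the raw content."""
    header = f"{kind} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(oid):
    """The first 2 hex digits name the folder, the rest names the file."""
    return f".git/objects/{oid[:2]}/{oid[2:]}"


oid = object_id("blob", b"test content\n")
print(oid)                     # d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(loose_object_path(oid))  # .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
```

The example hashes a blob because its content is easy to write down. Commits follow the same scheme with the kind “commit”, so two repositories that hold the same commit store it under the same name.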

How does Git know which state to construct in the working directory? This is defined by a special reference called HEAD. The working directory contains the project state defined by the commit that HEAD points to. HEAD is typically moved with the checkout command. So when we check out a branch or a commit, we simply attach HEAD to that branch or commit. Git notes that and updates the working directory according to the blob and tree objects referenced by that commit.
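On disk, HEAD is a small text file, .git/HEAD. As a rough sketch, its content can be interpreted as follows; parse_head is a hypothetical helper, and the branch name and hash in the usage lines are made up.

```python
def parse_head(text):
    """Interpret the content of .git/HEAD.

    Returns ("branch", ref_name) when HEAD is attached to a branch and
    ("detached", commit_hash) when it points directly at a commit.
    """
    text = text.strip()
    prefix = "ref: "
    if text.startswith(prefix):
        return ("branch", text[len(prefix):])
    return ("detached", text)


# After checking out a branch, HEAD names the branch reference.
print(parse_head("ref: refs/heads/main\n"))
# ('branch', 'refs/heads/main')

# After checking out a commit directly, HEAD holds the hash itself.
print(parse_head("0123456789abcdef0123456789abcdef01234567\n"))
# ('detached', '0123456789abcdef0123456789abcdef01234567')
```

In the first case Git follows the branch reference to reach the commit; in the second case there is no post-it in between.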

Is that all about Git?

Of course not! Git provides lots of functionality. This article was an attempt to highlight the approach of understanding before applying. This way any further Git functionality can be understood more easily, as it builds on these root concepts. Moreover, when a problem appears, it is easier to get to its root and find the appropriate solution faster.