Rebase is a strategy for integrating changes from different branches. It produces a linear history made up of the commits from the branches involved. Compared with a merge, rebase lets people tidy up their history before sharing it, which can make collaboration easier.
Let’s imagine a team that follows git-flow and wants to merge its feature branch A1 into develop. The team is Nacho and Pepe. They create the develop branch with 4 commits, and then Nacho starts working on a feature called A1 and commits only once. You can look at this from different perspectives:
The first way is to understand that, in a history, branches are only references to commits (or snapshots). In the history above, the develop branch starts at commit c0 and ends at commit c3. The branch A1 also starts at commit c0 and ends at commit c4. A1 does not depend on develop: if you delete develop, the commits are still there, because you only delete the reference.
As you can see in the image above, everything is connected in a straight line. Talking about different branches does not mean talking about different things. You can think of commits as snapshots of the code that people commit, and of branches as tags attached to a commit that mark where that branch last left off.
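To make the "branches are only references" idea concrete, here is a minimal sketch that can be run in a throwaway directory. The repository, the identity values and the empty commits are assumptions made for the demo; only the branch names and commit labels come from the example above.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/develop   # name the first branch develop
git config user.name "Nacho"               # placeholder identity
git config user.email "nacho@example.com"

# develop: c0..c3 (empty commits are enough to show the idea)
for i in 0 1 2 3; do
  git commit -q --allow-empty -m "c$i"
done

# A1 starts where develop is (c3) and adds c4
git checkout -q -b A1
git commit -q --allow-empty -m "c4"

c3=$(git rev-parse develop)       # remember the commit develop points to

# Delete the develop reference; git allows -d because A1 contains its commits
git branch -q -d develop

# The commit that develop pointed to is still there, reachable from A1
type_of_c3=$(git cat-file -t "$c3")
count=$(git rev-list --count A1)
echo "c3 is still a $type_of_c3, and A1 reaches $count commits"
```

Deleting develop only removes the label: c0 to c3 are still reachable through A1, so nothing is lost.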
Another way to think about git and its branches is in terms of "contexts". When you check out a branch, let’s say A1, the changes you make there affect nothing in the ‘context’ of develop.
Until you merge them, develop knows nothing about what you are doing in the ‘context’ of A1, and vice versa. You could actually delete everything in develop and the A1 ‘context’ would stay intact until you merge them.
So you can perform operations like resets and, unless you push and merge them, the other ‘context’ will know nothing about it. You could say A1 is a copy of develop with new commits on top, but in reality it is the same succession of commits with a different label on top.
So having this in mind, let's imagine that Nacho keeps implementing things in A1, Pepe keeps developing things and adding them into develop and the overall state reaches this stage:
Because they want to keep the history in a straight line, avoiding a merge commit (basically a commit that joins c10 and c7 to integrate A1 into develop), they decide to rebase.
So what they do is:
- git checkout A1
- git rebase develop
- Solve conflicts (if any)
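The steps above can be tried out in a scratch repository. This is only a sketch: the commit helper, the one-file-per-commit layout (chosen so that nothing conflicts) and the identity values are assumptions for the demo, not part of Nacho and Pepe's real project.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/develop
git config user.name "Nacho"               # placeholder identity
git config user.email "nacho@example.com"

# Hypothetical helper: each commit adds its own file, so there are no conflicts
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

for c in c0 c1 c2 c3; do commit "$c"; done   # shared start of develop
git checkout -q -b A1
for c in c4 c5 c6 c7; do commit "$c"; done   # Nacho's feature work
git checkout -q develop
for c in c8 c9 c10; do commit "$c"; done     # Pepe's work on develop

# The steps from the list
git checkout -q A1
git rebase -q develop

# History of A1, oldest commit first, on a single line
history=$(git log --reverse --format=%s A1 | xargs)
echo "$history"
```

After the rebase, the log of A1 lists the develop commits first and the A1 commits after them, in one straight line.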
What the ‘git rebase develop’ command does is take the A1 commits that are not in develop (c4 to c7) and replay them, one by one, on top of the last commit of develop (c10), in the following way:
This is where all the explanations I found finished, and I never understood what was going on… This is the deal:
If we go back to the contexts of develop and A1: when we operate on one branch, no branch apart from the one being operated on is affected, unless you merge it with others.
Now, because the developers did a git checkout A1, we are operating in the ‘context’ of A1, so the develop branch is intact. You can solve conflicts, finish the rebase and force-push, and if you then do git checkout develop, develop will still look like this:
Whereas, in the context of A1, A1 will look like this:
So to make it clear, when you follow the process above, you are operating on A1 only. Until you merge into develop, develop knows nothing about it. If you do merge them, A1 already sits directly on top of develop, so the merge can be a plain fast-forward with no merge commit, and this is the result:
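Here is a small sketch to check both claims: that develop does not move while A1 is being rebased, and that the later merge needs no merge commit. The scratch repository, the commit helper and the use of --ff-only (to make git refuse anything other than a fast-forward) are choices made for this demo.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/develop
git config user.name "Nacho"               # placeholder identity
git config user.email "nacho@example.com"

# Hypothetical helper: one new file per commit, so nothing conflicts
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

for c in c0 c1 c2 c3; do commit "$c"; done
git checkout -q -b A1
for c in c4 c5 c6 c7; do commit "$c"; done
git checkout -q develop
for c in c8 c9 c10; do commit "$c"; done

develop_before=$(git rev-parse develop)

git checkout -q A1
git rebase -q develop                      # only A1 is rewritten

develop_after=$(git rev-parse develop)     # develop has not moved

# Now integrate A1: develop just moves forward to the tip of A1
git checkout -q develop
git merge -q --ff-only A1

merge_commits=$(git rev-list --merges --count develop)
tip=$(git log -1 --format=%s develop)
echo "develop tip: $tip, merge commits: $merge_commits"
```

Teams that always merge with --no-ff would still get a merge commit here; the point is only that the rebase makes the fast-forward possible.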
Now, the reason the commits from c4 to c7 are marked with a prime (c4’) is that those commits were rewritten. The commit hash of c4 is not the same as that of c4’, but the change it introduces is. The hashes differ because a hash is computed from several pieces of information about the commit, and one of those pieces is the parent commit.
For c4 the parent is c3, but because we are rebasing, the parent of c4’ is now c10. So c4’ does not have the same parent, does not have the same information and, when hashed, does not produce the same hash. That is why we say that the history of A1 is rewritten, and this can cause trouble if the branch is shared. Let’s see why:
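To see c4 and c4’ side by side, the sketch below records the hash of A1 before and after a rebase and compares the parents and the changes. It is a simplified setup: A1 has a single commit and c10 stands in for all of Pepe's work on develop. The commit helper and identity values are assumptions; git patch-id is used here as one way to fingerprint the change a commit introduces.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/develop
git config user.name "Nacho"               # placeholder identity
git config user.email "nacho@example.com"

# Hypothetical helper: one new file per commit
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

for c in c0 c1 c2 c3; do commit "$c"; done
git checkout -q -b A1
commit c4
git checkout -q develop
commit c10                                 # stands in for c8..c10

old=$(git rev-parse A1)                    # hash of c4
git checkout -q A1
git rebase -q develop
new=$(git rev-parse A1)                    # hash of c4'

# Same change, different parent, different hash
old_parent=$(git log -1 --format=%s "$old^")
new_parent=$(git log -1 --format=%s "$new^")
old_patch=$(git show "$old" | git patch-id | cut -d' ' -f1)
new_patch=$(git show "$new" | git patch-id | cut -d' ' -f1)

echo "c4  = $old (parent $old_parent)"
echo "c4' = $new (parent $new_parent)"
```

The two hashes differ and the parents are c3 and c10, while the patch fingerprints match.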
Imagine we are here again:
This is how the remote looks right now. Nacho pushed changes (c7) and Pepe has been working and pushed c10.
At this point, in his local, Nacho has A1 up to date at commit c7 (because he made those commits and pushed them). Pepe wants to check how it is going and checks out A1 too. So the state of the tree regarding A1 is the same for Nacho and Pepe, and it is like in the image above, but let’s focus on A1 even more:
If we really focus on A1, the image above is what Nacho and Pepe have in their local area (and remote) when they checkout the branch A1. All good.
Now Nacho performs the rebase, solving conflicts. As we saw before, only his local looks like this, because he performed the rebase but did not push, so nobody outside Nacho’s local knows anything:
If they do not communicate, we have a problem, because remote and Pepe’s local look like:
Because Nacho rewrote the history of A1, the branches are different. Git knows this, and that is why, after you rebase, you need to force-push or push with force-with-lease (we will see this later) to remote: you acknowledge that the branches are not the same and that you want remote to match your local. Now imagine Nacho force-pushes and remote looks like this:
If Pepe pulls this branch, he will have an awful day, because remote has a history that does not match Pepe’s local. If he did any work during this time, he risks losing it when he makes his local match remote, since the ‘real deal’ is what remote says.
Force or Force with lease?
The idea of force is that if I have this in remote:
Imagine that, for some reason, c2 is not correct, so we get rid of it and introduce c3 and c4, and that is the history we want:
When we try to push this to remote, git will not allow it because the branches have diverged. If you want to, and you are sure, you can git push --force, and no matter what is in the remote develop, it will be overwritten by your branch.
So… what if somebody else pushed a change to remote while I was fixing my local? Let’s say our local looks like the image above but the remote looks like below:
So if we git push -f this, the remote will lose c5, which was added by another person. To avoid this situation, you can git push --force-with-lease instead, and in this case the push will be rejected so that information is not lost.
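The difference between the two flags can be reproduced with a bare repository playing the role of remote and two working copies. Everything in this sketch (the directory names remote.git, local and other, the commit helper, the identity values) is made up for the demo; the commit labels follow the example above.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Placeholder identity used by every repository in this sketch
export GIT_AUTHOR_NAME="Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# Hypothetical helper: one new file per commit
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

git init -q --bare remote.git              # plays the role of remote

# Our local: c0, c1, c2 pushed to remote
git init -q local
cd local
git symbolic-ref HEAD refs/heads/develop
git remote add origin "$work/remote.git"
for c in c0 c1 c2; do commit "$c"; done
git push -q -u origin develop
cd ..

# Somebody else clones and pushes c5 on top of c2
git clone -q -b develop "$work/remote.git" other
cd other
commit c5
git push -q origin develop
cd ..

# Meanwhile we drop c2 and add c3 and c4, without fetching
cd local
git reset -q --hard HEAD~1
commit c3
commit c4

# force-with-lease: remote is not where we last saw it, so git refuses
if git push -q --force-with-lease origin develop 2>/dev/null; then
  lease=accepted
else
  lease=rejected
fi
remote_after_lease=$(git --git-dir="$work/remote.git" log -1 --format=%s develop)

# plain force: remote is overwritten and c5 is no longer on the branch
git push -q --force origin develop 2>/dev/null
remote_after_force=$(git --git-dir="$work/remote.git" log -1 --format=%s develop)

echo "lease: $lease (remote tip $remote_after_lease), force: remote tip $remote_after_force"
```

Note that force-with-lease compares remote with what your local last saw of it, so fetching right before pushing makes the check pass even though you may not have looked at the new commits.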
Rebase and conflicts
When rebasing, you solve conflicts commit by commit, which can be a bit of a headache. Every time the conflicts in a commit are resolved, we stage the files and run git rebase --continue to confirm we are happy with the outcome of that rebased commit, and git moves on to the next one. If things go south and it gets complicated, you can always cancel the rebase with git rebase --abort.
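Both exits from a conflicted rebase can be tried in a scratch repository. In this sketch the file name file.txt, its contents and the commit messages are invented to force a conflict, and GIT_EDITOR=true is set only so that git does not open an editor for the commit message.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/develop
git config user.name "Nacho"               # placeholder identity
git config user.email "nacho@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "c0"

# Both branches change the same line, so the rebase will conflict
git checkout -q -b A1
echo "from A1" > file.txt
git commit -q -am "c1 on A1"

git checkout -q develop
echo "from develop" > file.txt
git commit -q -am "c2 on develop"

git checkout -q A1
before=$(git rev-parse A1)

# First attempt: the rebase stops at the conflict and we bail out
if git rebase develop >/dev/null 2>&1; then conflict=no; else conflict=yes; fi
git rebase --abort
after_abort=$(git rev-parse A1)            # A1 is back where it started

# Second attempt: resolve the conflict, stage the file and continue
git rebase develop >/dev/null 2>&1 || true
echo "from develop and A1" > file.txt      # our manual resolution
git add file.txt
GIT_EDITOR=true git rebase --continue >/dev/null 2>&1

parent=$(git log -1 --format=%s "A1^")
content=$(cat file.txt)
echo "conflict: $conflict, parent of A1 tip: $parent, file: $content"
```

With several conflicting commits, the resolve, stage and continue cycle repeats once per commit.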
Rebase interactive, fun for everybody
This is out of the scope of this post, but I just wanted to mention that an interactive rebase exists, via the git rebase -i command. Here git gives you the flexibility to operate on multiple commits at the same time: squash them, change their messages, etc. It is a really useful tool!