The Uncherrypick Identity Crisis

This was a scenario presented to me at the Mercurial Sprint days that poses a subtle problem to VCSes like JJ and GitButler that choose to constantly replay history.

We start with a small commit graph that looks like:

* B
| * C
|/
* A

Let’s imagine they have the file contents as follows:

A: a
B: ab
C: abc

We can imagine this scenario as having some bug fix in B, while C contains both the same bug fix and an additional feature.

In our scenario, we then perform two operations:

  • Rebase C on top of B, to get C'
  • Rebase C' back on top of A, to get C''

After the first step, we achieve a history A ← B ← C', where the contents of C' are abc as expected.

The problem appears when we cherry-pick C' back onto A. In that operation, the merge base is B, the parent of C': one side removes b to get from B back to A, while the other side adds only c to get from B to C'. The three-way merge combines those changes and produces ac, not abc.

That is surprising because the two rebases look like they should cancel out. We started with C, moved it onto B, then moved it back onto A; but the result is no longer equal to C.

This example shows that when performing the cherry-pick in a VCS that makes use of three-way merges, merge(l=A, base=B, r=merge(l=B, base=A, r=C)) does not always equal C.
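
The composition above can be checked with a small sketch. It assumes a toy merge that treats each file as a set of characters and combines the additions and removals from both sides. That is a simplification for illustration, not how Git, Mercurial, JJ or GitButler actually diff and merge files, and the names merge, c_prime and c_double_prime are hypothetical.

```python
def merge(l: str, base: str, r: str) -> str:
    """Toy three-way merge over sets of characters.

    Assumption: a file is modelled as an unordered set of characters,
    so this is not a real line-based diff3.
    """
    base_set, l_set, r_set = set(base), set(l), set(r)
    # Anything either side deleted relative to the base is dropped.
    removed = (base_set - l_set) | (base_set - r_set)
    # Anything either side added relative to the base is kept.
    added = (l_set - base_set) | (r_set - base_set)
    return "".join(sorted((base_set - removed) | added))


A, B, C = "a", "ab", "abc"

# Step 1: rebase C (whose parent is A) onto B.
c_prime = merge(l=B, base=A, r=C)

# Step 2: rebase C' (whose parent is now B) back onto A.
c_double_prime = merge(l=A, base=B, r=c_prime)

print(c_prime)         # abc
print(c_double_prime)  # ac
```

In this model the second merge sees b only as a removal (from B to A) and never as an addition, because C' relative to B adds just c. That is why b disappears from the result.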

If we go back to imagining that the content b was some sort of bug fix, then we have potentially introduced a regression: the fix has been reverted by what should otherwise have been an identity operation.

This affects both Git and Mercurial, but it is less of a concern there since users rarely perform this sequence by hand. However, in a VCS like JJ or GitButler we encourage users to drag and drop changes about the place with the promise of some notion of consistency and reversibility in these operations.

As a result, there is really an onus on us to take scenarios like this seriously and to try and define better primitives that can avoid pitfalls like this.