Loop inversion

Source: Wikipedia, the free encyclopedia.

In

jump instructions to reduce branch mis-prediction.[1]

Example in Java

void pre_inversion() {
  while (/* condition */) {
    /* loop body */
  }
}

is equivalent to:

void post_inversion() {
  if (/* condition */) {
    do {
      /* loop body */
    } while (/* condition */);
  }
}

No change in performance occurs for initial and non-final iterations through the loop, including when the loop is not entered because the condition is false. However, the final iteration has 2 fewer jump instructions when the loop is entered because the do-while loop does not need to jump to the start of the while loop to evaluate the loop condition.[1]

Example in C

  int i, a[100];
  i = 0;
  while (i < 100) {
    a[i] = 0;
    i++;
  }

is equivalent to:

  int i, a[100];
  i = 0;
  if (i < 100) {
    do {
      a[i] = 0;
      i++;
    } while (i < 100);
  }

Despite the seemingly greater complexity of the second example, it may actually run faster on modern

instruction pipeline. By nature, any jump in the code causes a pipeline stall, which is a detriment to performance. [citation needed
]

Additionally, loop inversion allows safe loop-invariant code motion.[citation needed][clarification needed]

Example in three-address code

[clarification needed]

      i := 0
 L1:  if i >= 100 goto L2
      a[i] := 0
      i := i + 1
      goto L1
 L2:  

If i had been initialized at 100, the instructions executed at runtime would have been:

  if i >= 100
  goto L2

Let us assume that i had been initialized to some value less than 100. Now let us look at the instructions executed at the moment after i has been incremented to 99 in the loop:

  goto L1
  if i < 100
  a[i] := 0
  i := i + 1
  goto L1
  if i >= 100
  goto L2
  <<at L2>>

Now, let's look at the optimized version:

      i := 0
      if i >= 100 goto L2
 L1:  a[i] := 0
      i := i + 1
      if i < 100 goto L1
 L2:

Again, let's look at the instructions executed if i is initialized to 100:

  if i >= 100
  goto L2

We didn't waste any cycles compared to the original version. Now consider the case where i has been incremented to 99:

  if i < 100
  goto L1
  a[i] := 0
  i := i + 1
  if i < 100
  <<at L2>>

As you can see, two gotos (and thus, two pipeline stalls) have been eliminated in the execution.

References

  1. ^ a b c d Jubb, Chae, Loop Optimizations in Modern C Compilers (PDF)