@@ -7869,7 +7869,7 @@ A comma-separated list of C expressions read by the instructions in the
@item Clobbers
A comma-separated list of registers or other values changed by the
@var{AssemblerTemplate}, beyond those listed as outputs.
-An empty list is permitted. @xref{Clobbers}.
+An empty list is permitted. @xref{Clobbers and Scratch Registers}.
@item GotoLabels
When you are using the @code{goto} form of @code{asm}, this section contains
@@ -8229,7 +8229,7 @@ The enclosing parentheses are a required part of the syntax.
When the compiler selects the registers to use to
represent the output operands, it does not use any of the clobbered registers
-(@pxref{Clobbers}).
+(@pxref{Clobbers and Scratch Registers}).
Output operand expressions must be lvalues. The compiler cannot check whether
the operands have data types that are reasonable for the instruction being
@@ -8465,7 +8465,8 @@ as input. The enclosing parentheses are a required part of the syntax.
@end table
When the compiler selects the registers to use to represent the input
-operands, it does not use any of the clobbered registers (@pxref{Clobbers}).
+operands, it does not use any of the clobbered registers
+(@pxref{Clobbers and Scratch Registers}).
If there are no output operands but there are input operands, place two
consecutive colons where the output operands would go:
@@ -8516,9 +8517,10 @@ asm ("cmoveq %1, %2, %[result]"
: "r" (test), "r" (new), "[result]" (old));
@end example
-@anchor{Clobbers}
-@subsubsection Clobbers
+@anchor{Clobbers and Scratch Registers}
+@subsubsection Clobbers and Scratch Registers
@cindex @code{asm} clobbers
+@cindex @code{asm} scratch registers
While the compiler is aware of changes to entries listed in the output
operands, the inline @code{asm} code may modify more than just the outputs. For
@@ -8589,6 +8591,110 @@ ten bytes of a string, use a memory input like:
@end table
+Clobbering fixed registers is not the only way to provide scratch
+registers for an @code{asm} statement; other techniques give the
+compiler more freedom in register allocation. There are also better
+ways than using a @code{"memory"} clobber to tell GCC that an
+@code{asm} statement accesses or modifies memory. The following
+PowerPC example taken from OpenBLAS illustrates some of these
+techniques.
+
+In the function shown below, all of the function parameters are inputs
+except for the @code{y} array, which is modified by the function.
+Only the first few lines of assembly in the @code{asm} statement are
+shown, along with a comment that is handy for checking register
+assignments. These insns set up some registers for later use in loops
+and, in particular, set up four pointers into the @code{ap} array:
+@code{a0=ap}, @code{a1=ap+lda}, @code{a2=ap+2*lda}, and
+@code{a3=ap+3*lda}. The rest of the assembly is too large to include here.
+
+@smallexample
+static void
+dgemv_kernel_4x4 (long n, const double *ap, long lda,
+ const double *x, double *y, double alpha)
+@{
+ double *a0;
+ double *a1;
+ double *a2;
+ double *a3;
+
+ __asm__
+ (
+ "lxvd2x 34, 0, %10 \n\t" // x0, x1
+ "lxvd2x 35, %11, %10 \n\t" // x2, x3
+ "xxspltd 32, %x9, 0 \n\t" // alpha, alpha
+ "sldi %6, %13, 3 \n\t" // lda * sizeof (double)
+ "xvmuldp 34, 34, 32 \n\t" // x0 * alpha, x1 * alpha
+ "xvmuldp 35, 35, 32 \n\t" // x2 * alpha, x3 * alpha
+ "add %4, %3, %6 \n\t" // a0 = ap, a1 = a0 + lda
+ "add %6, %6, %6 \n\t" // 2 * lda
+ "xxspltd 32, 34, 0 \n\t" // x0 * alpha, x0 * alpha
+ "xxspltd 33, 34, 1 \n\t" // x1 * alpha, x1 * alpha
+ "xxspltd 34, 35, 0 \n\t" // x2 * alpha, x2 * alpha
+ "xxspltd 35, 35, 1 \n\t" // x3 * alpha, x3 * alpha
+ "add %5, %3, %6 \n\t" // a2 = a0 + 2 * lda
+ "add %6, %4, %6 \n\t" // a3 = a1 + 2 * lda
+ ...
+ "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
+ "#a0=%3 a1=%4 a2=%5 a3=%6"
+ :
+ "+m" (*y),
+ "+r" (n), // 1
+ "+b" (y), // 2
+ "=b" (a0), // 3
+ "=b" (a1), // 4
+ "=&b" (a2), // 5
+ "=&b" (a3) // 6
+ :
+ "m" (*x),
+ "m" (*ap),
+ "d" (alpha), // 9
+ "r" (x), // 10
+ "b" (16), // 11
+ "3" (ap), // 12
+ "4" (lda) // 13
+ :
+ "cr0",
+ "vs32","vs33","vs34","vs35","vs36","vs37",
+ "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
+ );
+@}
+@end smallexample
+
+Allocating scratch registers is done by declaring a variable and
+making it an early-clobber @code{asm} output as with @code{a2} and
+@code{a3}, or making it an output tied to an input as with @code{a0}
+and @code{a1}. You can use a normal @code{asm} output if all inputs
+that might share the same register are consumed before the scratch is
+used. The VSX registers clobbered by the @code{asm} statement could
+have used the same technique were it not for GCC's limit on the number
+of @code{asm} parameters. Given the description above, it shouldn't be
+surprising that @code{a0} is tied to @code{ap}; @code{lda} is only
+used in the fourth machine insn shown above, so its register is
+available for reuse as @code{a1}. Note that tying an input to an
+output is the way to set up an initialized temporary register modified
+by an @code{asm} statement. The example also shows an initialized
+register unchanged by the @code{asm} statement; @code{"b" (16)} sets
+up @code{%11} to 16.
+
+Rather than using a @code{"memory"} clobber, the @code{asm} has
+@code{"+m" (*y)} in the list of outputs to tell GCC that the @code{y}
+array is both read and written by the @code{asm} statement.
+@code{"m" (*x)} and @code{"m" (*ap)} in the inputs tell GCC that these
+arrays are read. At a minimum, aliasing rules allow GCC to know what
+memory @emph{doesn't} need to be flushed, and if the function were
+inlined then GCC might be able to do even better. Also, if GCC can
+prove that all of the outputs of an @code{asm} statement are unused,
+then the @code{asm} may be deleted. Removal of dead @code{asm}
+statements will not happen if they clobber @code{"memory"}. Notice
+that @code{x}, @code{y}, and @code{ap} all appear twice in the
+@code{asm} parameters, once to specify memory accessed, and once to
+specify a base register used by the @code{asm}. You won't normally be
+wasting a register by doing this as GCC can use the same register for
+both purposes. However, it would be foolish to use both @code{%0} and
+@code{%2} for @code{y} in this @code{asm} statement and expect them to
+be the same.
+
@anchor{GotoLabels}
@subsubsection Goto Labels
@cindex @code{asm} goto labels