@@ -1,3 +1,7 @@
+2017-03-31 Alan Modra <amodra@gmail.com>
+
+ * doc/extend.texi (Extended Asm): Add OpenBLAS example.
+
2017-03-31 Matthew Fortune <matthew.fortune@imgtec.com>
* config/mips/mips-msa.md (msa_vec_extract_<msafmt_f>): Update
@@ -8516,6 +8516,84 @@ asm ("cmoveq %1, %2, %[result]"
: "r" (test), "r" (new), "[result]" (old));
@end example
+Here is a larger PowerPC example taken from OpenBLAS. More than 150
+lines of assembly have been removed, leaving only the comments added
+to check gcc's register assignments, because the assembly itself isn't
+that important. You do need to know that all of the function
+parameters are inputs except for the @code{y} array, which is modified
+by the function, and that early assembly sets up four pointers into
+the @code{ap} array: @code{a0=ap}, @code{a1=ap+lda},
+@code{a2=ap+2*lda}, and @code{a3=ap+3*lda}.
+
+Illustrated here is a technique you can use to have gcc allocate
+temporary registers for an asm, which gives the compiler more freedom
+than if the programmer allocates fixed registers via clobbers. This
+is done by declaring a variable and making it an early-clobber asm
+output as with @code{a2} and @code{a3}, or making it an output tied to
+an input as with @code{a0} and @code{a1}. The VSX registers used by
+the asm could have used the same technique except for gcc's limit on
+the number of asm parameters. Given the description above, it
+shouldn't be surprising that @code{a0} is tied to @code{ap};
+@code{lda} is only used early, so its register is available for reuse
+as @code{a1}. Tying an input to an output is the way to set up an
+initialised temporary register that is modified by an asm. The
+example also shows an initialised register unchanged by the asm;
+@code{"b" (16)} sets up @code{%11} to 16.
+
+Also shown is a somewhat better method than using a @code{"memory"}
+clobber to tell gcc that an asm accesses or modifies memory. Here we
+use @code{"+m" (*y)} in the list of outputs to tell gcc that the
+@code{y} array is both read and written by the asm. @code{"m" (*x)}
+and @code{"m" (*ap)} in the inputs tell gcc that these arrays are
+read. At a minimum, aliasing rules will allow gcc to know what memory
+@emph{doesn't} need to be flushed, and if the function were inlined
+then gcc might be able to do even better. Notice that @code{x},
+@code{y}, and @code{ap} all appear twice in the asm parameters, once
+to specify memory accessed, and once to specify a base register used
+by the asm. You won't normally be wasting a register by doing this as
+gcc can use the same register for both purposes. However, it would be
+foolish to use both @code{%0} and @code{%2} for @code{y} in your asm
+and expect them to be the same.
+
+@example
+static void
+dgemv_kernel_4x4 (long n, const double *ap, long lda,
+ const double *x, double *y, double alpha)
+@{
+ double *a0;
+ double *a1;
+ double *a2;
+ double *a3;
+
+ __asm__
+ (
+ ...
+ "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
+ "#a0=%3 a1=%4 a2=%5 a3=%6"
+ :
+ "+m" (*y), // 0
+ "+r" (n), // 1
+ "+b" (y), // 2
+ "=b" (a0), // 3
+ "=b" (a1), // 4
+ "=&b" (a2), // 5
+ "=&b" (a3) // 6
+ :
+ "m" (*x), // 7
+ "m" (*ap), // 8
+ "d" (alpha), // 9
+ "r" (x), // 10
+ "b" (16), // 11
+ "3" (ap), // 12
+ "4" (lda) // 13
+ :
+ "cr0",
+ "vs32","vs33","vs34","vs35","vs36","vs37",
+ "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
+ );
+@}
+@end example
+
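As a smaller, target-independent illustration of the operand techniques used in the OpenBLAS example, here is a minimal sketch in GNU C. It is not taken from OpenBLAS: the helper names `tied_copy`, `scratch_demo`, and `touch_memory` are hypothetical, and the asm templates are deliberately left empty so that only the constraints matter and the code should build with GCC or a compatible compiler on any target.

```c
#include <assert.h>

/* Hypothetical helper: an output tied to an input.  The "0" constraint
   makes the input share operand 0's register, so the asm starts with an
   initialised register that it is allowed to modify.  The template is
   empty, so the value passes through unchanged.  */
static long
tied_copy (long in)
{
  long out;
  __asm__ ("" : "=r" (out) : "0" (in));
  return out;
}

/* Hypothetical helper: an early-clobber output used as a scratch
   register.  "=&r" lets the compiler pick the register while
   guaranteeing that it does not overlap the input operand.  */
static long
scratch_demo (long in)
{
  long tmp;
  __asm__ ("" : "=&r" (tmp) : "r" (in));
  (void) tmp;   /* The scratch value is deliberately not used.  */
  return in;
}

/* Hypothetical helper: describe the memory an asm touches with "m"
   operands instead of a "memory" clobber.  *y is declared as read and
   written, *x as read only.  */
static void
touch_memory (const double *x, double *y)
{
  __asm__ ("" : "+m" (*y) : "m" (*x));
}
```

Because the templates are empty, calling these helpers leaves every value unchanged; the point is only to show which constraint expresses which intent. A real asm would refer to the operands as `%0`, `%1`, and so on in its template.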
@anchor{Clobbers}
@subsubsection Clobbers
@cindex @code{asm} clobbers