From patchwork Sun Nov 14 21:08:08 2021
X-Patchwork-Submitter: "Maciej W. Rozycki"
X-Patchwork-Id: 1555035
Date: Sun, 14 Nov 2021 21:08:08 +0000 (GMT)
From: "Maciej W. Rozycki"
To: gcc-patches@gcc.gnu.org
Subject: [committed] VAX: Add the `setmemhi' instruction

The MOVC5 machine instruction has `memset' semantics if encoded with a
zero source length[1]:

"4. MOVC5 with a zero source length operand is the preferred way
    to fill a block of memory with the fill character."

Use that instruction to implement the `setmemhi' instruction then.  Use
the AP register in the register deferred mode for the source address to
yield the shortest possible encoding of the otherwise unused operand,
observing that the address is never dereferenced if the source length is
zero.
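As a rough illustration of the semantics relied upon here, the C sketch
below models the memory effect of MOVC5 as summarized above: the source
is copied up to the shorter of the two lengths and the rest of the
destination is filled.  The names `movc5_model' and `memset_via_movc5'
are made up for this note and are not part of the change; the model
ignores the R0-R5 results and the condition codes, and uses `memmove'
merely as a convenient stand-in for the copy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the memory effect of MOVC5 only; the register
   results (R0-R5) and the condition codes are not modelled.  */
static void
movc5_model (unsigned short srclen, const unsigned char *src,
             unsigned char fill, unsigned short dstlen,
             unsigned char *dst)
{
  size_t ncopy = srclen < dstlen ? srclen : dstlen;

  /* SRC is only touched if there is something to copy, so with a zero
     source length any address is acceptable.  */
  if (ncopy != 0)
    memmove (dst, src, ncopy);

  /* Whatever remains of the destination is set to FILL.  */
  if (dstlen > ncopy)
    memset (dst + ncopy, fill, dstlen - ncopy);
}

/* With a zero source length the model degenerates to `memset'.  */
static void *
memset_via_movc5 (void *block, int c, unsigned short size)
{
  movc5_model (0, NULL, (unsigned char) c, size, block);
  return block;
}
```

Passing NULL for the source in `memset_via_movc5' plays the role that
(%ap) plays in the emitted instruction: a placeholder that is never
read.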

The use of this instruction yields steadily better performance, at least
with the Mariah VAX implementation, for a variable-length `memset' call
expanded inline as a single MOVC5 operation compared to an equivalent
libcall invocation:

Length:   1, time elapsed:  0.971789 (builtin),  2.847303 (libcall)
Length:   2, time elapsed:  0.907904 (builtin),  2.728259 (libcall)
Length:   3, time elapsed:  1.038311 (builtin),  2.917245 (libcall)
Length:   4, time elapsed:  0.775305 (builtin),  2.686088 (libcall)
Length:   7, time elapsed:  1.112331 (builtin),  2.992968 (libcall)
Length:   8, time elapsed:  0.856882 (builtin),  2.764885 (libcall)
Length:  15, time elapsed:  1.256086 (builtin),  3.096660 (libcall)
Length:  16, time elapsed:  1.001962 (builtin),  2.888131 (libcall)
Length:  31, time elapsed:  1.590456 (builtin),  3.774164 (libcall)
Length:  32, time elapsed:  1.288909 (builtin),  3.629622 (libcall)
Length:  63, time elapsed:  3.430285 (builtin),  5.269789 (libcall)
Length:  64, time elapsed:  3.265147 (builtin),  5.113156 (libcall)
Length: 127, time elapsed:  6.438772 (builtin),  8.268305 (libcall)
Length: 128, time elapsed:  6.268991 (builtin),  8.114557 (libcall)
Length: 255, time elapsed: 12.417338 (builtin), 14.259678 (libcall)

(times given in seconds per 1000000 `memset' invocations for the given
length made in a loop).  It is clear from these figures that hardware
does data coalescence for consecutive bytes rather than naively copying
them one by one, as for lengths that are powers of 2 the figures are
consistently lower than ones for their respective next lower lengths.

The use of MOVC5 also requires at least 4 bytes less in terms of machine
code as it avoids encoding the address of `memset' needed for the CALLS
instruction used to make a libcall, as well as extra PUSHL instructions
needed to pass arguments to the call as those can be encoded directly as
the respective operands of the MOVC5 instruction.
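
The observation about powers of 2 can be checked mechanically against
the timing figures quoted above.  The C sketch below does just that; the
structure and the helper names (`timings', `count_faster_powers_of_2',
`min_speedup') are made up for this note, and only the numbers come from
the table.

```c
#include <stddef.h>

struct timing
{
  int length;
  double builtin;  /* Seconds per 1000000 calls, inline MOVC5.  */
  double libcall;  /* Seconds per 1000000 calls, `memset' libcall.  */
};

/* Figures copied from the table in this message.  */
static const struct timing timings[] = {
  {   1,  0.971789,  2.847303 }, {   2,  0.907904,  2.728259 },
  {   3,  1.038311,  2.917245 }, {   4,  0.775305,  2.686088 },
  {   7,  1.112331,  2.992968 }, {   8,  0.856882,  2.764885 },
  {  15,  1.256086,  3.096660 }, {  16,  1.001962,  2.888131 },
  {  31,  1.590456,  3.774164 }, {  32,  1.288909,  3.629622 },
  {  63,  3.430285,  5.269789 }, {  64,  3.265147,  5.113156 },
  { 127,  6.438772,  8.268305 }, { 128,  6.268991,  8.114557 },
  { 255, 12.417338, 14.259678 },
};

#define N_TIMINGS (sizeof timings / sizeof timings[0])

/* Count the adjacent pairs (2^n - 1, 2^n) in the table for which the
   power-of-2 length is the faster one of the two.  */
static int
count_faster_powers_of_2 (int use_libcall)
{
  int count = 0;

  for (size_t i = 0; i + 1 < N_TIMINGS; i++)
    {
      const struct timing *lo = &timings[i];
      const struct timing *hi = &timings[i + 1];

      /* Only consider LENGTH - 1 followed by a power-of-2 LENGTH.  */
      if (hi->length != lo->length + 1
          || (hi->length & (hi->length - 1)) != 0)
        continue;
      if ((use_libcall ? hi->libcall : hi->builtin)
          < (use_libcall ? lo->libcall : lo->builtin))
        count++;
    }
  return count;
}

/* Smallest libcall-to-builtin time ratio across the table.  */
static double
min_speedup (void)
{
  double min = timings[0].libcall / timings[0].builtin;

  for (size_t i = 1; i < N_TIMINGS; i++)
    {
      double ratio = timings[i].libcall / timings[i].builtin;
      if (ratio < min)
        min = ratio;
    }
  return min;
}
```

The table has 7 such pairs and the power-of-2 length is the faster one
in all of them, for the builtin and the libcall columns alike.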

It is perhaps worth noting too that for constant lengths we prefer to
emit up to 5 individual MOVx instructions rather than a single MOVC5
instruction to clear memory and for consistency we copy this behavior
here for filling memory with another value too, even though there may be
a performance advantage with a string copy in comparison to a piecemeal
copy, e.g.:

Length: 40, time elapsed: 2.183192 (string), 2.638878 (piecemeal)

But this is something for another change as it will have to be carefully
evaluated.

[1] DEC STD 032-0 "VAX Architecture Standard", Digital Equipment
    Corporation, A-DS-EL-00032-00-0 Rev J, December 15, 1989, Section
    3.10 "Character-String Instructions", p. 3-163

	gcc/
	* config/vax/vax.h (SET_RATIO): New macro.
	* config/vax/vax.md (UNSPEC_SETMEM_FILL): New constant.
	(setmemhi): New expander.
	(setmemhi1): New insn and splitter.
	(*setmemhi1): New insn.

	gcc/testsuite/
	* gcc.target/vax/setmem.c: New test.
---
 Regression-tested with no change in results.  Committed.
---
 gcc/config/vax/vax.h                  |    1
 gcc/config/vax/vax.md                 |   64 ++++++++++++++++++++++++++++++++++
 gcc/testsuite/gcc.target/vax/setmem.c |   22 +++++++++++
 3 files changed, 87 insertions(+)

gcc-vax-setmem.diff
Index: gcc/gcc/config/vax/vax.h
===================================================================
--- gcc.orig/gcc/config/vax/vax.h
+++ gcc/gcc/config/vax/vax.h
@@ -433,6 +433,7 @@ enum reg_class { NO_REGS, ALL_REGS, LIM_
    move-instruction pairs, we will do a cpymem or libcall instead.  */
 #define MOVE_RATIO(speed) ((speed) ? 6 : 3)
 #define CLEAR_RATIO(speed) ((speed) ? 6 : 2)
+#define SET_RATIO(speed) ((speed) ? 6 : 2)
 
 /* Nonzero if access to memory by bytes is slow and undesirable.  */
 #define SLOW_BYTE_ACCESS 0
Index: gcc/gcc/config/vax/vax.md
===================================================================
--- gcc.orig/gcc/config/vax/vax.md
+++ gcc/gcc/config/vax/vax.md
@@ -32,6 +32,12 @@
   VUNSPEC_PEM               ; 'procedure_entry_mask' insn.
 ])
 
+;; UNSPEC usage:
+
+(define_c_enum "unspec" [
+  UNSPEC_SETMEM_FILL        ; 'fill' operand to 'setmem' insn.
+])
+
 (define_constants
   [(VAX_AP_REGNUM 12)       ; Register 12 contains the argument pointer
    (VAX_FP_REGNUM 13)       ; Register 13 contains the frame pointer
@@ -438,6 +444,64 @@
    (clobber (reg:CC VAX_PSL_REGNUM))]
   "reload_completed"
   "movc3 %2,%1,%0")
+
+;; This is here to accept 4 arguments and pass the first 3 along
+;; to the setmemhi1 pattern that really does the work.
+(define_expand "setmemhi"
+  [(set (match_operand:BLK 0 "memory_operand" "")
+        (match_operand:QI 2 "general_operand" ""))
+   (use (match_operand:HI 1 "general_operand" ""))
+   (match_operand 3 "" "")]
+  ""
+  "
+{
+  emit_insn (gen_setmemhi1 (operands[0], operands[1], operands[2]));
+  DONE;
+}")
+
+;; The srcaddr operand of MOVC5 is not dereferenced if srclen is zero, so we
+;; set it to (%ap) somewhat arbitrarily chosen for the shortest encoding.
+(define_insn_and_split "setmemhi1"
+  [(set (match_operand:BLK 0 "memory_operand" "=o")
+        (unspec:BLK [(use (match_operand:QI 2 "general_operand" "g"))]
+                    UNSPEC_SETMEM_FILL))
+   (use (match_operand:HI 1 "general_operand" "g"))
+   (clobber (reg:SI 0))
+   (clobber (reg:SI 1))
+   (clobber (reg:SI 2))
+   (clobber (reg:SI 3))
+   (clobber (reg:SI 4))
+   (clobber (reg:SI 5))]
+  ""
+  "#"
+  "reload_completed"
+  [(parallel
+     [(set (match_dup 0)
+           (unspec:BLK [(use (match_dup 2))] UNSPEC_SETMEM_FILL))
+      (use (match_dup 1))
+      (clobber (reg:SI 0))
+      (clobber (reg:SI 1))
+      (clobber (reg:SI 2))
+      (clobber (reg:SI 3))
+      (clobber (reg:SI 4))
+      (clobber (reg:SI 5))
+      (clobber (reg:CC VAX_PSL_REGNUM))])]
+  "")
+
+(define_insn "*setmemhi1"
+  [(set (match_operand:BLK 0 "memory_operand" "=o")
+        (unspec:BLK [(use (match_operand:QI 2 "general_operand" "g"))]
+                    UNSPEC_SETMEM_FILL))
+   (use (match_operand:HI 1 "general_operand" "g"))
+   (clobber (reg:SI 0))
+   (clobber (reg:SI 1))
+   (clobber (reg:SI 2))
+   (clobber (reg:SI 3))
+   (clobber (reg:SI 4))
+   (clobber (reg:SI 5))
+   (clobber (reg:CC VAX_PSL_REGNUM))]
+  "reload_completed"
+  "movc5 $0,(%%ap),%2,%1,%0")
 
 ;; Extension and truncation insns.
 
Index: gcc/gcc/testsuite/gcc.target/vax/setmem.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/vax/setmem.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
+
+#include <stddef.h>
+
+void *
+memset8 (void *block, int c, size_t size)
+{
+  unsigned char s8 = size;
+  return __builtin_memset (block, c, s8);
+}
+
+/* Expect assembly like:
+
+	movl 4(%ap),%r6
+	movzbl 12(%ap),%r7
+	movc5 $0,(%ap),8(%ap),%r7,(%r6)
+	movl %r6,%r0
+
+ */
+
+/* { dg-final { scan-assembler "\tmovc5 \\\$0,\\\(%ap\\\)," } } */