Message ID: 20171013161353.uvlix6gfxz7ir4y7@virgil.suse.cz
State:      New
Series:     [RFC, PR 80689] Copy small aggregates element-wise
On Fri, Oct 13, 2017 at 6:13 PM, Martin Jambor <mjambor@suse.cz> wrote:
> Hi,
>
> I'd like to request comments on the patch below, which aims to fix PR
> 80689, an instance of a store-to-load forwarding stall on x86_64 CPUs
> in the Image Magick benchmark that is responsible for a slowdown of up
> to 9% compared to gcc 6, depending on options and HW used.  (Actually,
> I have just seen 24% in one specific combination but for various
> reasons can no longer verify it today.)
>
> The revision causing the regression is 237074, which increased the
> size of the mode for copying aggregates "by pieces" to 128 bits,
> incurring big stalls when the values being copied are also still being
> stored in a smaller data type or if the copied values are loaded in
> smaller types shortly afterwards.  Such situations happen in Image
> Magick even across calls, which means that any non-IPA flow-sensitive
> approach would not detect them.  Therefore, the patch simply changes
> the way we copy small BLKmode data that are simple combinations of
> records and arrays (meaning no unions or bit-fields; character arrays
> are also disallowed) and copies them one field and/or element at a
> time.
>
> "Small" in this RFC patch means up to 35 bytes on x86_64 and i386 CPUs
> (the structure in the benchmark has 32 bytes) but is subject to change
> after more benchmarking, and is actually zero - meaning element-wise
> copying never happens - on other architectures.  I believe that any
> architecture with a store buffer can benefit, but it's probably better
> to leave it to their maintainers to find a different default value.  I
> am not sure this is how such HW-dependent decisions should be made,
> and that is the primary reason why I am sending this RFC first.
>
> I have decided to implement this change at the expansion level because
> at that point the type information is still readily available, and at
> the same time we can also handle various implicit copies, for example
> those passing a parameter.  I found I could re-use some bits and
> pieces of tree-SRA and so I did, creating the tree-sra.h header file
> in the process.
>
> I am fully aware that in the final patch the new parameter, or indeed
> any new parameters, need to be documented.  I have skipped that
> intentionally for now and will write the documentation if feedback
> here is generally good.
>
> I have bootstrapped and tested this patch on x86_64-linux with
> different values of the parameter and only found problems with
> unreasonably high values leading to OOM.  I have done the same with a
> previous version of the patch which was equivalent to the limit being
> 64 bytes on aarch64-linux, ppc64le-linux and ia64-linux and only ran
> into failures of tests which assumed that structure padding was copied
> in aggregate copies (mostly gcc.target/aarch64/aapcs64/ stuff but also
> for example gcc.dg/vmx/varargs-4.c).
>
> The patch decreases the SPEC 2017 "rate" run-time of imagick by 9% and
> 8% at the -O2 and -Ofast compilation levels respectively on one
> particular new AMD CPU, and by 6% and 3% on one particular old Intel
> machine.
>
> Thanks in advance for any comments,

I wonder if you can, at the place you choose to hook this in, elide any
copying of padding between fields.

I'd rather have hooked such "high level" optimization in
expand_assignment where you can be reasonably sure you're seeing an
actual source-level construct.

35 bytes seems to be much - what is the code-size impact?

IIRC the reason this may be slow isn't loading in smaller types than
stored before by the copy - the store buffers can handle this reasonably
well.  It's solely when previous smaller stores are

 a1) not mergeable in the store buffer
 a2) not merged because earlier stores are already committed

and

 b) loaded afterwards as a type that would access multiple store buffers

a) would be sure to happen in case b) involves accessing padding.  Is the
Image Magick case one that involves padding in the structure?

Richard.

> Martin
>
>
> 2017-10-12  Martin Jambor  <mjambor@suse.cz>
>
>         PR target/80689
>         * tree-sra.h: New file.
>         * ipa-prop.h: Moved declaration of build_ref_for_offset to
>         tree-sra.h.
>         * expr.c: Include params.h and tree-sra.h.
>         (emit_move_elementwise): New function.
>         (store_expr_with_bounds): Optionally use it.
>         * ipa-cp.c: Include tree-sra.h.
>         * params.def (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY): New.
>         * config/i386/i386.c (ix86_option_override_internal): Set
>         PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY to 35.
>         * tree-sra.c: Include tree-sra.h.
>         (scalarizable_type_p): Renamed to
>         simple_mix_of_records_and_arrays_p, made public, renamed the
>         second parameter to allow_char_arrays.
>         (extract_min_max_idx_from_array): New function.
>         (completely_scalarize): Moved bits of the function to
>         extract_min_max_idx_from_array.
>
> testsuite/
>         * gcc.target/i386/pr80689-1.c: New test.
> ---
>  gcc/config/i386/i386.c                    |   4 ++
>  gcc/expr.c                                | 103 ++++++++++++++++++++++++++++--
>  gcc/ipa-cp.c                              |   1 +
>  gcc/ipa-prop.h                            |   4 --
>  gcc/params.def                            |   6 ++
>  gcc/testsuite/gcc.target/i386/pr80689-1.c |  38 +++++++++++
>  gcc/tree-sra.c                            |  86 +++++++++++++++----------
>  gcc/tree-sra.h                            |  33 ++++++++++
>  8 files changed, 233 insertions(+), 42 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr80689-1.c
>  create mode 100644 gcc/tree-sra.h
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 1ee8351c21f..87f602e7ead 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -6511,6 +6511,10 @@ ix86_option_override_internal (bool main_args_p,
>                            ix86_tune_cost->l2_cache_size,
>                            opts->x_param_values,
>                            opts_set->x_param_values);
> +  maybe_set_param_value (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
> +                         35,
> +                         opts->x_param_values,
> +                         opts_set->x_param_values);
>
>    /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
>    if (opts->x_flag_prefetch_loop_arrays < 0
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 134ee731c29..dff24e7f166 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -61,7 +61,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-chkp.h"
>  #include "rtl-chkp.h"
>  #include "ccmp.h"
> -
> +#include "params.h"
> +#include "tree-sra.h"
>
>  /* If this is nonzero, we do not bother generating VOLATILE
>     around volatile memory references, and we are willing to
> @@ -5340,6 +5341,80 @@ emit_storent_insn (rtx to, rtx from)
>    return maybe_expand_insn (code, 2, ops);
>  }
>
> +/* Generate code for copying data of type TYPE at SOURCE plus OFFSET to TARGET
> +   plus OFFSET, but do so element-wise and/or field-wise for each record and
> +   array within TYPE.  TYPE must either be a register type or an aggregate
> +   complying with scalarizable_type_p.
> +
> +   If CALL_PARAM_P is nonzero, this is a store into a call param on the
> +   stack, and block moves may need to be treated specially.  */
> +
> +static void
> +emit_move_elementwise (tree type, rtx target, rtx source, HOST_WIDE_INT offset,
> +                       int call_param_p)
> +{
> +  switch (TREE_CODE (type))
> +    {
> +    case RECORD_TYPE:
> +      for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
> +        if (TREE_CODE (fld) == FIELD_DECL)
> +          {
> +            HOST_WIDE_INT fld_offset = offset + int_bit_position (fld);
> +            tree ft = TREE_TYPE (fld);
> +            emit_move_elementwise (ft, target, source, fld_offset,
> +                                   call_param_p);
> +          }
> +      break;
> +
> +    case ARRAY_TYPE:
> +      {
> +        tree elem_type = TREE_TYPE (type);
> +        HOST_WIDE_INT el_size = tree_to_shwi (TYPE_SIZE (elem_type));
> +        gcc_assert (el_size > 0);
> +
> +        offset_int idx, max;
> +        /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
> +        if (extract_min_max_idx_from_array (type, &idx, &max))
> +          {
> +            HOST_WIDE_INT el_offset = offset;
> +            for (; idx <= max; ++idx)
> +              {
> +                emit_move_elementwise (elem_type, target, source, el_offset,
> +                                       call_param_p);
> +                el_offset += el_size;
> +              }
> +          }
> +      }
> +      break;
> +    default:
> +      machine_mode mode = TYPE_MODE (type);
> +
> +      rtx ntgt = adjust_address (target, mode, offset / BITS_PER_UNIT);
> +      rtx nsrc = adjust_address (source, mode, offset / BITS_PER_UNIT);
> +
> +      /* TODO: Figure out whether the following is actually necessary.  */
> +      if (target == ntgt)
> +        ntgt = copy_rtx (target);
> +      if (source == nsrc)
> +        nsrc = copy_rtx (source);
> +
> +      gcc_assert (mode != VOIDmode);
> +      if (mode != BLKmode)
> +        emit_move_insn (ntgt, nsrc);
> +      else
> +        {
> +          /* For example vector gimple registers can end up here.  */
> +          rtx size = expand_expr (TYPE_SIZE_UNIT (type), NULL_RTX,
> +                                  TYPE_MODE (sizetype), EXPAND_NORMAL);
> +          emit_block_move (ntgt, nsrc, size,
> +                           (call_param_p
> +                            ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
> +        }
> +      break;
> +    }
> +  return;
> +}
> +
>  /* Generate code for computing expression EXP,
>     and storing the value into TARGET.
>
> @@ -5713,9 +5788,29 @@ store_expr_with_bounds (tree exp, rtx target, int call_param_p,
>         emit_group_store (target, temp, TREE_TYPE (exp),
>                           int_size_in_bytes (TREE_TYPE (exp)));
>        else if (GET_MODE (temp) == BLKmode)
> -       emit_block_move (target, temp, expr_size (exp),
> -                        (call_param_p
> -                         ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
> +       {
> +         /* Copying smallish BLKmode structures with emit_block_move and thus
> +            by-pieces can result in store-to-load stalls.  So copy some simple
> +            small aggregates element or field-wise.  */
> +         if (GET_MODE (target) == BLKmode
> +             && AGGREGATE_TYPE_P (TREE_TYPE (exp))
> +             && !TREE_ADDRESSABLE (TREE_TYPE (exp))
> +             && tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (exp)))
> +             && (tree_to_shwi (TYPE_SIZE (TREE_TYPE (exp)))
> +                 <= (PARAM_VALUE (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY)
> +                     * BITS_PER_UNIT))
> +             && simple_mix_of_records_and_arrays_p (TREE_TYPE (exp), false))
> +           {
> +             /* FIXME: Can this happen?  What would it mean?  */
> +             gcc_assert (!reverse);
> +             emit_move_elementwise (TREE_TYPE (exp), target, temp, 0,
> +                                    call_param_p);
> +           }
> +         else
> +           emit_block_move (target, temp, expr_size (exp),
> +                            (call_param_p
> +                             ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
> +       }
>        /* If we emit a nontemporal store, there is nothing else to do.  */
>        else if (nontemporal && emit_storent_insn (target, temp))
>         ;
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 6b3d8d7364c..7d6019bbd30 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -124,6 +124,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa-ccp.h"
>  #include "stringpool.h"
>  #include "attribs.h"
> +#include "tree-sra.h"
>
>  template <typename valtype> class ipcp_value;
>
> diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
> index fa5bed49ee0..2313cc884ed 100644
> --- a/gcc/ipa-prop.h
> +++ b/gcc/ipa-prop.h
> @@ -877,10 +877,6 @@ ipa_parm_adjustment *ipa_get_adjustment_candidate (tree **, bool *,
>  void ipa_release_body_info (struct ipa_func_body_info *);
>  tree ipa_get_callee_param_type (struct cgraph_edge *e, int i);
>
> -/* From tree-sra.c:  */
> -tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree,
> -                          gimple_stmt_iterator *, bool);
> -
>  /* In ipa-cp.c  */
>  void ipa_cp_c_finalize (void);
>
> diff --git a/gcc/params.def b/gcc/params.def
> index e55afc28053..5e19f1414a0 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -1294,6 +1294,12 @@ DEFPARAM (PARAM_VECT_EPILOGUES_NOMASK,
>           "Enable loop epilogue vectorization using smaller vector size.",
>           0, 0, 1)
>
> +DEFPARAM (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
> +         "max-size-for-elementwise-copy",
> +         "Maximum size in bytes of a structure or array to by considered for "
> +         "copying by its individual fields or elements",
> +         0, 0, 512)
> +
>  /*
>
>  Local variables:
> diff --git a/gcc/testsuite/gcc.target/i386/pr80689-1.c b/gcc/testsuite/gcc.target/i386/pr80689-1.c
> new file mode 100644
> index 00000000000..4156d4fba45
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr80689-1.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +typedef struct st1
> +{
> +  long unsigned int a,b;
> +  long int c,d;
> +}R;
> +
> +typedef struct st2
> +{
> +  int t;
> +  R reg;
> +}N;
> +
> +void Set (const R *region, N *n_info );
> +
> +void test(N *n_obj ,const long unsigned int a, const long unsigned int b, const long int c,const long int d)
> +{
> +  R reg;
> +
> +  reg.a=a;
> +  reg.b=b;
> +  reg.c=c;
> +  reg.d=d;
> +  Set (&reg, n_obj);
> +
> +}
> +
> +void Set (const R *reg, N *n_obj )
> +{
> +  n_obj->reg=(*reg);
> +}
> +
> +
> +/* { dg-final { scan-assembler-not "%(x|y|z)mm\[0-9\]+" } } */
> +/* { dg-final { scan-assembler-not "movdqu" } } */
> +/* { dg-final { scan-assembler-not "movups" } } */
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index bac593951e7..ade97964205 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -104,6 +104,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "ipa-fnsummary.h"
>  #include "ipa-utils.h"
>  #include "builtins.h"
> +#include "tree-sra.h"
>
>  /* Enumeration of all aggregate reductions we can do.  */
>  enum sra_mode { SRA_MODE_EARLY_IPA,   /* early call regularization */
> @@ -952,14 +953,14 @@ create_access (tree expr, gimple *stmt, bool write)
>  }
>
>
> -/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or fixed-length
> -   ARRAY_TYPE with fields that are either of gimple register types (excluding
> -   bit-fields) or (recursively) scalarizable types.  CONST_DECL must be true if
> -   we are considering a decl from constant pool.  If it is false, char arrays
> -   will be refused.  */
> +/* Return true if TYPE consists of RECORD_TYPE or fixed-length ARRAY_TYPE with
> +   fields/elements that are not bit-fields and are either register types or
> +   recursively comply with simple_mix_of_records_and_arrays_p.  Furthermore, if
> +   ALLOW_CHAR_ARRAYS is false, the function will return false also if TYPE
> +   contains an array of elements that only have one byte.  */
>
> -static bool
> -scalarizable_type_p (tree type, bool const_decl)
> +bool
> +simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays)
>  {
>    gcc_assert (!is_gimple_reg_type (type));
>    if (type_contains_placeholder_p (type))
> @@ -977,7 +978,7 @@ scalarizable_type_p (tree type, bool const_decl)
>           return false;
>
>         if (!is_gimple_reg_type (ft)
> -           && !scalarizable_type_p (ft, const_decl))
> +           && !simple_mix_of_records_and_arrays_p (ft, allow_char_arrays))
>           return false;
>        }
>
> @@ -986,7 +987,7 @@ scalarizable_type_p (tree type, bool const_decl)
>    case ARRAY_TYPE:
>      {
>        HOST_WIDE_INT min_elem_size;
> -      if (const_decl)
> +      if (allow_char_arrays)
>         min_elem_size = 0;
>        else
>         min_elem_size = BITS_PER_UNIT;
> @@ -1008,7 +1009,7 @@ scalarizable_type_p (tree type, bool const_decl)
>
>        tree elem = TREE_TYPE (type);
>        if (!is_gimple_reg_type (elem)
> -         && !scalarizable_type_p (elem, const_decl))
> +         && !simple_mix_of_records_and_arrays_p (elem, allow_char_arrays))
>         return false;
>        return true;
>      }
> @@ -1017,10 +1018,38 @@ scalarizable_type_p (tree type, bool const_decl)
>      }
>  }
>
> -static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree, tree);
> +static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree,
> +                           tree);
> +
> +/* For a given array TYPE, return false if its domain does not have any maximum
> +   value.  Otherwise calculate MIN and MAX indices of the first and the last
> +   element.  */
> +
> +bool
> +extract_min_max_idx_from_array (tree type, offset_int *min, offset_int *max)
> +{
> +  tree domain = TYPE_DOMAIN (type);
> +  tree minidx = TYPE_MIN_VALUE (domain);
> +  gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
> +  tree maxidx = TYPE_MAX_VALUE (domain);
> +  if (!maxidx)
> +    return false;
> +  gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
> +
> +  /* MINIDX and MAXIDX are inclusive, and must be interpreted in
> +     DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
> +  *min = wi::to_offset (minidx);
> +  *max = wi::to_offset (maxidx);
> +  if (!TYPE_UNSIGNED (domain))
> +    {
> +      *min = wi::sext (*min, TYPE_PRECISION (domain));
> +      *max = wi::sext (*max, TYPE_PRECISION (domain));
> +    }
> +  return true;
> +}
>
>  /* Create total_scalarization accesses for all scalar fields of a member
> -   of type DECL_TYPE conforming to scalarizable_type_p.  BASE
> +   of type DECL_TYPE conforming to simple_mix_of_records_and_arrays_p.  BASE
>     must be the top-most VAR_DECL representing the variable; within that,
>     OFFSET locates the member and REF must be the memory reference expression for
>     the member.  */
> @@ -1047,27 +1076,14 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>      {
>        tree elemtype = TREE_TYPE (decl_type);
>        tree elem_size = TYPE_SIZE (elemtype);
> -      gcc_assert (elem_size && tree_fits_shwi_p (elem_size));
>        HOST_WIDE_INT el_size = tree_to_shwi (elem_size);
>        gcc_assert (el_size > 0);
>
> -      tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (decl_type));
> -      gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
> -      tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (decl_type));
> +      offset_int idx, max;
>        /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
> -      if (maxidx)
> +      if (extract_min_max_idx_from_array (decl_type, &idx, &max))
>         {
> -         gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
>           tree domain = TYPE_DOMAIN (decl_type);
> -         /* MINIDX and MAXIDX are inclusive, and must be interpreted in
> -            DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
> -         offset_int idx = wi::to_offset (minidx);
> -         offset_int max = wi::to_offset (maxidx);
> -         if (!TYPE_UNSIGNED (domain))
> -           {
> -             idx = wi::sext (idx, TYPE_PRECISION (domain));
> -             max = wi::sext (max, TYPE_PRECISION (domain));
> -           }
>           for (int el_off = offset; idx <= max; ++idx)
>             {
>               tree nref = build4 (ARRAY_REF, elemtype,
> @@ -1088,10 +1104,10 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>  }
>
>  /* Create total_scalarization accesses for a member of type TYPE, which must
> -   satisfy either is_gimple_reg_type or scalarizable_type_p.  BASE must be the
> -   top-most VAR_DECL representing the variable; within that, POS and SIZE locate
> -   the member, REVERSE gives its torage order. and REF must be the reference
> -   expression for it.  */
> +   satisfy either is_gimple_reg_type or simple_mix_of_records_and_arrays_p.
> +   BASE must be the top-most VAR_DECL representing the variable; within that,
> +   POS and SIZE locate the member, REVERSE gives its torage order. and REF must
> +   be the reference expression for it.  */
>
>  static void
>  scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse,
> @@ -1111,7 +1127,8 @@ scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse,
>  }
>
>  /* Create a total_scalarization access for VAR as a whole.  VAR must be of a
> -   RECORD_TYPE or ARRAY_TYPE conforming to scalarizable_type_p.  */
> +   RECORD_TYPE or ARRAY_TYPE conforming to
> +   simple_mix_of_records_and_arrays_p.  */
>
>  static void
>  create_total_scalarization_access (tree var)
> @@ -2803,8 +2820,9 @@ analyze_all_variable_accesses (void)
>        {
>         tree var = candidate (i);
>
> -       if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var),
> -                                               constant_decl_p (var)))
> +       if (VAR_P (var)
> +           && simple_mix_of_records_and_arrays_p (TREE_TYPE (var),
> +                                                  constant_decl_p (var)))
>           {
>             if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
>                 <= max_scalarization_size)
> diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h
> new file mode 100644
> index 00000000000..dc901385994
> --- /dev/null
> +++ b/gcc/tree-sra.h
> @@ -0,0 +1,33 @@
> +/* tree-sra.h - Run-time parameters.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#ifndef TREE_SRA_H
> +#define TREE_SRA_H
> +
> +
> +bool simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays);
> +bool extract_min_max_idx_from_array (tree type, offset_int *idx,
> +                                    offset_int *max);
> +tree build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset,
> +                          bool reverse, tree exp_type,
> +                          gimple_stmt_iterator *gsi, bool insert_after);
> +
> +
> +
> +#endif /* TREE_SRA_H */
> --
> 2.14.1
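[For illustration only: the two copy strategies under discussion,
pictured at the source level.  This is a sketch, not compiler output or
patch code; the struct mirrors the benchmark's RectangleInfo quoted
below and the function names are made up.]

  typedef struct { unsigned long width, height; long x, y; } rect;

  /* Roughly what emit_block_move amounts to today: one opaque block
     copy, expanded with 128-bit moves on x86_64 since r237074, so each
     16-byte load may read bytes still sitting in two pending 8-byte
     store-buffer entries.  */
  void
  copy_block (rect *dst, const rect *src)
  {
    __builtin_memcpy (dst, src, sizeof (rect));
  }

  /* Roughly what emit_move_elementwise does instead: one move per
     field, matching the width of the stores that initialized *src, so
     each load can be forwarded from a single store-buffer entry.  */
  void
  copy_elementwise (rect *dst, const rect *src)
  {
    dst->width = src->width;
    dst->height = src->height;
    dst->x = src->x;
    dst->y = src->y;
  }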
Hi,

On Tue, Oct 17, 2017 at 01:34:54PM +0200, Richard Biener wrote:
> I wonder if you can, at the place you choose to hook this in, elide any
> copying of padding between fields.
>
> I'd rather have hooked such "high level" optimization in
> expand_assignment where you can be reasonably sure you're seeing an
> actual source-level construct.

I have discussed this with Honza and we eventually decided to make the
element-wise copy an alternative to emit_block_move (which uses the
larger mode for moving since GCC 7) exactly so that we handle not only
source-level assignments but also passing parameters by value and
other situations.

> 35 bytes seems to be much - what is the code-size impact?

I will find out and report on that.  I need at least 32 bytes (four
long ints) to fix imagemagick, where the problematic structure is:

  typedef struct _RectangleInfo
  {
    size_t
      width,
      height;

    ssize_t
      x,
      y;
  } RectangleInfo;

...so 8 longs, no padding.  Since any aggregate of between 33 and 35
bytes needs to consist of smaller fields/elements, it seemed
reasonable to also copy them element-wise.

Nevertheless, I still intend to experiment with the limit; I sent out
this RFC exactly so that I don't spend a lot of time benchmarking
something that is eventually not deemed acceptable on principle.

> IIRC the reason this may be slow isn't loading in smaller types than
> stored before by the copy - the store buffers can handle this reasonably
> well.  It's solely when previous smaller stores are
>
>  a1) not mergeable in the store buffer
>  a2) not merged because earlier stores are already committed
>
> and
>
>  b) loaded afterwards as a type that would access multiple store buffers
>
> a) would be sure to happen in case b) involves accessing padding.  Is the
> Image Magick case one that involves padding in the structure?

As I said above, there is no padding.

Basically, what happens is that in a number of places there is a
variable region of the aforementioned type, and it is initialized and
passed to the function SetPixelCacheNexusPixels in the following manner:

  ...
  region.width=cache_info->columns;
  region.height=1;
  region.x=0;
  region.y=y;
  pixels=SetPixelCacheNexusPixels(cache_info,ReadMode,&region,MagickTrue,
    cache_nexus[id],exception);
  ...

and the first four statements in SetPixelCacheNexusPixels are:

  assert(cache_info != (const CacheInfo *) NULL);
  assert(cache_info->signature == MagickSignature);
  if (cache_info->type == UndefinedCache)
    return((PixelPacket *) NULL);
  nexus_info->region=(*region);

with the last one generating the stalls, on both Zen-based machines
and also on 2-3 year old Intel CPUs.

I have had a look at what Agner Fog's micro-architecture document says
about store forwarding stalls and:

- on Broadwells and Haswells, any "write of any size is followed by
  a read of a larger size" incurs a stall, which fits our example,

- on Skylakes: "A read that is bigger than the write, or a read that
  covers both written and unwritten bytes, takes approximately 11
  clock cycles extra" seems to apply,

- on Intel Silvermont, there will also be a stall because "A memory
  write can be forwarded to a subsequent read of the same size or a
  smaller size...",

- on Zens, Agner Fog says they work perfectly except when crossing a
  page or when "A read that has a partial overlap with a preceding
  write has a penalty of 6-7 clock cycles," which must be why I see
  stalls.

So I guess the pending stores are not really merged even without
padding,

Martin
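[With the patch applied, the limit Martin mentions can be experimented
with directly from the command line using GCC's standard --param
syntax; the value is in bytes, 0 disables element-wise copying
entirely, and the file name here is only a placeholder:]

  gcc -O2 --param max-size-for-elementwise-copy=64 imagick.c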
On Thu, Oct 26, 2017 at 2:18 PM, Martin Jambor <mjambor@suse.cz> wrote:
>> 35 bytes seems to be much - what is the code-size impact?
>
> I will find out and report on that.  I need at least 32 bytes (four
> long ints) to fix imagemagick, where the problematic structure is:
>
>   typedef struct _RectangleInfo
>   {
>     size_t
>       width,
>       height;
>
>     ssize_t
>       x,
>       y;
>   } RectangleInfo;
>
> ...so 8 longs, no padding.  Since any aggregate of between 33 and 35
> bytes needs to consist of smaller fields/elements, it seemed
> reasonable to also copy them element-wise.
>
> Nevertheless, I still intend to experiment with the limit; I sent out
> this RFC exactly so that I don't spend a lot of time benchmarking
> something that is eventually not deemed acceptable on principle.

I think the limit should be on the number of generated copies and not
the overall size of the structure...  If the struct were composed of
32 individual chars we wouldn't want to emit 32 loads and 32 stores...

I wonder how rep; movb; interacts with store to load forwarding?  Is
that maybe optimized well on some archs?  movb should always forward
and wasn't the setup cost for small N reasonable on modern CPUs?

> I have had a look at what Agner Fog's micro-architecture document says
> about store forwarding stalls and:
>
> - on Broadwells and Haswells, any "write of any size is followed by
>   a read of a larger size" incurs a stall, which fits our example,
>
> - on Skylakes: "A read that is bigger than the write, or a read that
>   covers both written and unwritten bytes, takes approximately 11
>   clock cycles extra" seems to apply,
>
> - on Intel Silvermont, there will also be a stall because "A memory
>   write can be forwarded to a subsequent read of the same size or a
>   smaller size...",
>
> - on Zens, Agner Fog says they work perfectly except when crossing a
>   page or when "A read that has a partial overlap with a preceding
>   write has a penalty of 6-7 clock cycles," which must be why I see
>   stalls.
>
> So I guess the pending stores are not really merged even without
> padding,

It probably depends on the width of the entries in the store buffer,
if they appear in-order and the alignment of the stores (if they are
larger than 8 bytes they are surely aligned).  IIRC CPUs had smaller
store buffer entries than cache line size.

Given that load bandwidth is usually higher than store bandwidth it
might make sense to do the store combining in our copying sequence,
like for the 8 byte entry case use sth like

  movq 0(%eax), %xmm0
  movhps 8(%eax), %xmm0 // or vpinsert
  mov[au]ps %xmm0, 0(%ebx)

...  thus do two loads per store and perform the stores in wider
mode?

As said, a general concern was you not copying padding.  If you put
this into an even more common place you surely will break stuff, no?

Richard.
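[For concreteness, Richard's two-loads-per-store idea can be rendered
as the following self-contained C sketch using SSE2 intrinsics.  It
combines the two 8-byte loads with punpcklqdq rather than movhps so it
stays expressible with integer intrinsics; copy16 is a made-up name and
this is only an illustration of the idea, not code from any patch.]

  #include <emmintrin.h>

  /* Copy 16 bytes as two 8-byte loads combined into one 16-byte store,
     i.e. the movq + movhps + movups shape sketched above.  Each load
     stays within one pending 8-byte store-buffer entry, while the
     store side still uses the full vector width.  */
  static void
  copy16 (void *dst, const void *src)
  {
    __m128i lo = _mm_loadl_epi64 ((const __m128i *) src);   /* movq */
    __m128i hi = _mm_loadl_epi64 ((const __m128i *)
                                  ((const char *) src + 8)); /* movq */
    __m128i v = _mm_unpacklo_epi64 (lo, hi);     /* punpcklqdq */
    _mm_storeu_si128 ((__m128i *) dst, v);       /* movups/movdqu */
  }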
*/ >> > +/* Return true if TYPE consists of RECORD_TYPE or fixed-length ARRAY_TYPE with >> > + fields/elements that are not bit-fields and are either register types or >> > + recursively comply with simple_mix_of_records_and_arrays_p. Furthermore, if >> > + ALLOW_CHAR_ARRAYS is false, the function will return false also if TYPE >> > + contains an array of elements that only have one byte. */ >> > >> > -static bool >> > -scalarizable_type_p (tree type, bool const_decl) >> > +bool >> > +simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays) >> > { >> > gcc_assert (!is_gimple_reg_type (type)); >> > if (type_contains_placeholder_p (type)) >> > @@ -977,7 +978,7 @@ scalarizable_type_p (tree type, bool const_decl) >> > return false; >> > >> > if (!is_gimple_reg_type (ft) >> > - && !scalarizable_type_p (ft, const_decl)) >> > + && !simple_mix_of_records_and_arrays_p (ft, allow_char_arrays)) >> > return false; >> > } >> > >> > @@ -986,7 +987,7 @@ scalarizable_type_p (tree type, bool const_decl) >> > case ARRAY_TYPE: >> > { >> > HOST_WIDE_INT min_elem_size; >> > - if (const_decl) >> > + if (allow_char_arrays) >> > min_elem_size = 0; >> > else >> > min_elem_size = BITS_PER_UNIT; >> > @@ -1008,7 +1009,7 @@ scalarizable_type_p (tree type, bool const_decl) >> > >> > tree elem = TREE_TYPE (type); >> > if (!is_gimple_reg_type (elem) >> > - && !scalarizable_type_p (elem, const_decl)) >> > + && !simple_mix_of_records_and_arrays_p (elem, allow_char_arrays)) >> > return false; >> > return true; >> > } >> > @@ -1017,10 +1018,38 @@ scalarizable_type_p (tree type, bool const_decl) >> > } >> > } >> > >> > -static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree, tree); >> > +static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree, >> > + tree); >> > + >> > +/* For a given array TYPE, return false if its domain does not have any maximum >> > + value. Otherwise calculate MIN and MAX indices of the first and the last >> > + element. */ >> > + >> > +bool >> > +extract_min_max_idx_from_array (tree type, offset_int *min, offset_int *max) >> > +{ >> > + tree domain = TYPE_DOMAIN (type); >> > + tree minidx = TYPE_MIN_VALUE (domain); >> > + gcc_assert (TREE_CODE (minidx) == INTEGER_CST); >> > + tree maxidx = TYPE_MAX_VALUE (domain); >> > + if (!maxidx) >> > + return false; >> > + gcc_assert (TREE_CODE (maxidx) == INTEGER_CST); >> > + >> > + /* MINIDX and MAXIDX are inclusive, and must be interpreted in >> > + DOMAIN (e.g. signed int, whereas min/max may be size_int). */ >> > + *min = wi::to_offset (minidx); >> > + *max = wi::to_offset (maxidx); >> > + if (!TYPE_UNSIGNED (domain)) >> > + { >> > + *min = wi::sext (*min, TYPE_PRECISION (domain)); >> > + *max = wi::sext (*max, TYPE_PRECISION (domain)); >> > + } >> > + return true; >> > +} >> > >> > /* Create total_scalarization accesses for all scalar fields of a member >> > - of type DECL_TYPE conforming to scalarizable_type_p. BASE >> > + of type DECL_TYPE conforming to simple_mix_of_records_and_arrays_p. BASE >> > must be the top-most VAR_DECL representing the variable; within that, >> > OFFSET locates the member and REF must be the memory reference expression for >> > the member. 
*/ >> > @@ -1047,27 +1076,14 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref) >> > { >> > tree elemtype = TREE_TYPE (decl_type); >> > tree elem_size = TYPE_SIZE (elemtype); >> > - gcc_assert (elem_size && tree_fits_shwi_p (elem_size)); >> > HOST_WIDE_INT el_size = tree_to_shwi (elem_size); >> > gcc_assert (el_size > 0); >> > >> > - tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (decl_type)); >> > - gcc_assert (TREE_CODE (minidx) == INTEGER_CST); >> > - tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (decl_type)); >> > + offset_int idx, max; >> > /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1. */ >> > - if (maxidx) >> > + if (extract_min_max_idx_from_array (decl_type, &idx, &max)) >> > { >> > - gcc_assert (TREE_CODE (maxidx) == INTEGER_CST); >> > tree domain = TYPE_DOMAIN (decl_type); >> > - /* MINIDX and MAXIDX are inclusive, and must be interpreted in >> > - DOMAIN (e.g. signed int, whereas min/max may be size_int). */ >> > - offset_int idx = wi::to_offset (minidx); >> > - offset_int max = wi::to_offset (maxidx); >> > - if (!TYPE_UNSIGNED (domain)) >> > - { >> > - idx = wi::sext (idx, TYPE_PRECISION (domain)); >> > - max = wi::sext (max, TYPE_PRECISION (domain)); >> > - } >> > for (int el_off = offset; idx <= max; ++idx) >> > { >> > tree nref = build4 (ARRAY_REF, elemtype, >> > @@ -1088,10 +1104,10 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref) >> > } >> > >> > /* Create total_scalarization accesses for a member of type TYPE, which must >> > - satisfy either is_gimple_reg_type or scalarizable_type_p. BASE must be the >> > - top-most VAR_DECL representing the variable; within that, POS and SIZE locate >> > - the member, REVERSE gives its torage order. and REF must be the reference >> > - expression for it. */ >> > + satisfy either is_gimple_reg_type or simple_mix_of_records_and_arrays_p. >> > + BASE must be the top-most VAR_DECL representing the variable; within that, >> > + POS and SIZE locate the member, REVERSE gives its torage order. and REF must >> > + be the reference expression for it. */ >> > >> > static void >> > scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse, >> > @@ -1111,7 +1127,8 @@ scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse, >> > } >> > >> > /* Create a total_scalarization access for VAR as a whole. VAR must be of a >> > - RECORD_TYPE or ARRAY_TYPE conforming to scalarizable_type_p. */ >> > + RECORD_TYPE or ARRAY_TYPE conforming to >> > + simple_mix_of_records_and_arrays_p. */ >> > >> > static void >> > create_total_scalarization_access (tree var) >> > @@ -2803,8 +2820,9 @@ analyze_all_variable_accesses (void) >> > { >> > tree var = candidate (i); >> > >> > - if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var), >> > - constant_decl_p (var))) >> > + if (VAR_P (var) >> > + && simple_mix_of_records_and_arrays_p (TREE_TYPE (var), >> > + constant_decl_p (var))) >> > { >> > if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) >> > <= max_scalarization_size) >> > diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h >> > new file mode 100644 >> > index 00000000000..dc901385994 >> > --- /dev/null >> > +++ b/gcc/tree-sra.h >> > @@ -0,0 +1,33 @@ >> > +/* tree-sra.h - Run-time parameters. >> > + Copyright (C) 2017 Free Software Foundation, Inc. >> > + >> > +This file is part of GCC. 
>> > + >> > +GCC is free software; you can redistribute it and/or modify it under >> > +the terms of the GNU General Public License as published by the Free >> > +Software Foundation; either version 3, or (at your option) any later >> > +version. >> > + >> > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY >> > +WARRANTY; without even the implied warranty of MERCHANTABILITY or >> > +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License >> > +for more details. >> > + >> > +You should have received a copy of the GNU General Public License >> > +along with GCC; see the file COPYING3. If not see >> > +<http://www.gnu.org/licenses/>. */ >> > + >> > +#ifndef TREE_SRA_H >> > +#define TREE_SRA_H >> > + >> > + >> > +bool simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays); >> > +bool extract_min_max_idx_from_array (tree type, offset_int *idx, >> > + offset_int *max); >> > +tree build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset, >> > + bool reverse, tree exp_type, >> > + gimple_stmt_iterator *gsi, bool insert_after); >> > + >> > + >> > + >> > +#endif /* TREE_SRA_H */ >> > -- >> > 2.14.1 >> >
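For concreteness, the movq/movhps sequence suggested above can be
written with SSE2 intrinsics roughly as follows.  This is a hand-written
sketch for illustration only; copy16_combined is a hypothetical name and
is not part of the patch:

#include <emmintrin.h>

/* Copy 16 bytes as two 8-byte loads and a single 16-byte store,
   i.e. roughly the movq + movhps + mov[au]ps sequence above.
   Sketch only; assumes the halves may be read narrowly while the
   destination is written wide.  */
static void
copy16_combined (const double *src, double *dst)
{
  __m128d v = _mm_load_sd (src);   /* movsd/movq: low 8 bytes  */
  v = _mm_loadh_pd (v, src + 1);   /* movhps: high 8 bytes     */
  _mm_storeu_pd (dst, v);          /* movups: one wide store   */
}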
> I think the limit should be on the number of generated copies and not
> the overall size of the structure...  If the struct were composed of
> 32 individual chars we wouldn't want to emit 32 loads and 32 stores...
>
> I wonder how rep; movb; interacts with store to load forwarding?  Is
> that maybe optimized well on some archs?  movb should always
> forward and wasn't the setup cost for small N reasonable on modern
> CPUs?

rep mov is a win over a loop for blocks over 128 bytes on Core, and for
blocks in the range 24-128 on Zen.  This is w/o store/load forwarding,
but I doubt those provide a cheap way around it.

>
> It probably depends on the width of the entries in the store buffer,
> whether they appear in-order, and on the alignment of the stores (if they
> are larger than 8 bytes they are surely aligned).  IIRC CPUs had smaller
> store buffer entries than cache line size.
>
> Given that load bandwidth is usually higher than store bandwidth it
> might make sense to do the store combining in our copying sequence,
> like for the 8 byte entry case use something like
>
> movq 0(%eax), %xmm0
> movhps 8(%eax), %xmm0 // or vpinsert
> mov[au]ps %xmm0, 0(%ebx)
> ...
>
> thus do two loads per store and perform the stores in a wider
> mode?

This may be somewhat faster indeed.  I am not sure if store to load
forwarding will work for the latter half when read again by halves.
It would not happen on older CPUs :)

Honza

> As said, a general concern was you not copying padding.  If you
> put this into an even more common place you surely will break
> stuff, no?
>
> Richard.

[... rest of the quoted message, including the full patch, snipped ...]
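To recap the access pattern the thread is about, here is a minimal,
self-contained C illustration modelled on the pr80689-1.c testcase
quoted earlier (stall_pattern is a hypothetical name, not code from the
patch):

struct R { unsigned long a, b; long c, d; };  /* 32 bytes, as in the PR */

void use (const struct R *);

/* Four 8-byte stores followed by an aggregate copy: if the copy is
   expanded as two 16-byte moves, each load reads two pending 8-byte
   stores, which is the store-to-load forwarding stall described
   above.  */
void
stall_pattern (unsigned long a, unsigned long b, long c, long d)
{
  struct R r;
  r.a = a;
  r.b = b;
  r.c = c;
  r.d = d;
  struct R copy = r;  /* aggregate copy, expanded by-pieces */
  use (&copy);
}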
Hi,

On Thu, 26 Oct 2017, Martin Jambor wrote:

> > 35 bytes seems to be much - what is the code-size impact?
>
> I will find out and report on that.  I need at least 32 bytes (four
> long ints) to fix imagemagick, where the problematic structure is:

Surely the final heuristic should look at the size and the number of
elements of the struct in question, not only at its size.

Ciao,
Michael.
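A copy-counting limit along those lines could reuse the recursion shape
of the patch's simple_mix_of_records_and_arrays_p and its
extract_min_max_idx_from_array helper.  The following is only a rough,
untested sketch (count_elementwise_copies is hypothetical), meant to
live next to those helpers inside GCC:

/* Count the scalar moves emit_move_elementwise would generate for
   TYPE, so that the heuristic can cap the number of copies rather
   than the size in bytes.  Sketch only.  */
static HOST_WIDE_INT
count_elementwise_copies (tree type)
{
  switch (TREE_CODE (type))
    {
    case RECORD_TYPE:
      {
	HOST_WIDE_INT n = 0;
	for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
	  if (TREE_CODE (fld) == FIELD_DECL)
	    n += count_elementwise_copies (TREE_TYPE (fld));
	return n;
      }
    case ARRAY_TYPE:
      {
	offset_int min, max;
	if (!extract_min_max_idx_from_array (type, &min, &max))
	  return 0;
	return (count_elementwise_copies (TREE_TYPE (type))
		* (max - min + 1).to_shwi ());
      }
    default:
      return 1;  /* A single scalar move.  */
    }
}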
On Thu, Oct 26, 2017 at 2:55 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> I think the limit should be on the number of generated copies and not
>> the overall size of the structure...  If the struct were composed of
>> 32 individual chars we wouldn't want to emit 32 loads and 32 stores...
>>
>> I wonder how rep; movb; interacts with store to load forwarding?  Is
>> that maybe optimized well on some archs?  movb should always
>> forward and wasn't the setup cost for small N reasonable on modern
>> CPUs?
>
> rep mov is a win over a loop for blocks over 128 bytes on Core, and for
> blocks in the range 24-128 on Zen.  This is w/o store/load forwarding,
> but I doubt those provide a cheap way around it.
>
>>
>> It probably depends on the width of the entries in the store buffer,
>> whether they appear in-order, and on the alignment of the stores (if they
>> are larger than 8 bytes they are surely aligned).  IIRC CPUs had smaller
>> store buffer entries than cache line size.
>>
>> Given that load bandwidth is usually higher than store bandwidth it
>> might make sense to do the store combining in our copying sequence,
>> like for the 8 byte entry case use something like
>>
>> movq 0(%eax), %xmm0
>> movhps 8(%eax), %xmm0 // or vpinsert
>> mov[au]ps %xmm0, 0(%ebx)
>> ...
>>
>> thus do two loads per store and perform the stores in a wider
>> mode?
>
> This may be somewhat faster indeed.  I am not sure if store to load
> forwarding will work for the latter half when read again by halves.
> It would not happen on older CPUs :)

Yes, forwarding larger stores to smaller loads generally works fine
since forever, with the usual restrictions of alignment/size being
power-of-two "halves".

The question is of course what to do for 4 byte or smaller elements or
mixed size elements.  We can do zero-extending loads
(do we have them for QI, HI mode loads as well?) and
do shifts and ors.  I'm quite sure the CPUs wouldn't like to
see vpinserts of different vector mode destinations.  So it
would be 8 byte stores from GPRs and values built up via
shift & or.

As said, the important part is that IIRC CPUs can usually
have more loads in flight than stores.  Esp. Bulldozer
with the split core was store buffer size limited (but it
could do merging of store buffer entries IIRC).

Richard.

> Honza

[... rest of the quoted message, including the full patch, snipped ...]
On Thu, Oct 26, 2017 at 4:38 PM, Richard Biener <richard.guenther@gmail.com> wrote:
> On Thu, Oct 26, 2017 at 2:55 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> I think the limit should be on the number of generated copies and not
>>> the overall size of the structure... If the struct were composed of
>>> 32 individual chars we wouldn't want to emit 32 loads and 32 stores...
>>>
>>> I wonder how rep; movb; interacts with store to load forwarding? Is
>>> that maybe optimized well on some archs? movb should always
>>> forward and wasn't the setup cost for small N reasonable on modern
>>> CPUs?
>>
>> rep mov is a win over a loop for blocks over 128 bytes on core, and for
>> blocks in the range 24-128 on zen. This is w/o store/load forwarding,
>> but I doubt those provide a cheap way around.
>>
>>> It probably depends on the width of the entries in the store buffer,
>>> if they appear in-order and the alignment of the stores (if they are larger than
>>> 8 bytes they are surely aligned). IIRC CPUs had smaller store buffer
>>> entries than cache line size.
>>>
>>> Given that load bandwidth is usually higher than store bandwidth it
>>> might make sense to do the store combining in our copying sequence,
>>> like for the 8 byte entry case use sth like
>>>
>>> movq 0(%eax), %xmm0
>>> movhps 8(%eax), %xmm0 // or vpinsert
>>> mov[au]ps %xmm0, 0(%ebx)
>>> ...
>>>
>>> thus do two loads per store and perform the stores in wider
>>> mode?
>>
>> This may be somewhat faster indeed. I am not sure if store to load
>> forwarding will work for the latter half when read again by halves.
>> It would not happen on older CPUs :)
>
> Yes, forwarding larger stores to smaller loads generally works fine
> since forever with the usual restrictions of alignment/size being
> power of two "halves".
>
> The question is of course what to do for 4 byte or smaller elements or
> mixed size elements. We can do zero-extending loads
> (do we have them for QI, HI mode loads as well?) and
> do shift and or's. I'm quite sure the CPUs wouldn't like to
> see vpinsert's of different vector mode destinations. So it
> would be 8 byte stores from GPRs and values built up via
> shift & or.

Like we generate

foo:
.LFB0:
        .cfi_startproc
        movl    4(%rdi), %eax
        movzwl  2(%rdi), %edx
        salq    $16, %rax
        orq     %rdx, %rax
        movzbl  1(%rdi), %edx
        salq    $8, %rax
        orq     %rdx, %rax
        movzbl  (%rdi), %edx
        salq    $8, %rax
        orq     %rdx, %rax
        movq    %rax, (%rsi)
        ret

for

struct x { char e; char f; short c; int i; } a;
void foo (struct x *p, long *q)
{
  *q = (((((((unsigned long)(unsigned int)p->i) << 16)
           | (((unsigned long)(unsigned short)p->c))) << 8)
         | (((unsigned long)(unsigned char)p->f))) << 8)
       | ((unsigned long)(unsigned char)p->e);
}

if you disable the bswap pass. Doing 4 byte stores in this case would
save some prefixes at least. I expected the ORs and shifts to have
smaller encodings... With 4 byte stores we end up with the same size
as with individual loads & stores.

> As said, the important part is that IIRC CPUs can usually
> have more loads in flight than stores. Esp. Bulldozer
> with the split core was store buffer size limited (but it
> could do merging of store buffer entries IIRC).

Also if we do the stores in smaller chunks we are more likely hitting
the same store-to-load-forwarding issue elsewhere. Like in case the
destination is memcpy'ed away.

So the proposed change isn't necessarily a win without a possible
similar regression that it tries to fix.

Whole-program analysis of accesses might allow marking affected
objects.

Richard.

> Richard.
>
>> Honza
>>>
>>> As said a general concern was you not copying padding. If you
>>> put this into an even more common place you surely will break
>>> stuff, no?
>>>
>>> Richard.
>>>
>>> > Martin
>>>
>>> >> Richard.
>>>
>>> >> > Martin
>>> >> >
>>> >> > [snip: the 2017-10-12 ChangeLog and the full patch, quoted verbatim once more]
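The two-loads-per-store idea quoted above corresponds roughly to the following SSE2 intrinsics sketch; the function name is made up, and this is only an illustration of the idea under discussion, not code the patch or GCC emits:

#include <emmintrin.h>

/* Copy 16 bytes with two 8-byte loads combined into a single 16-byte
   store, mirroring the movq/movhps sequence sketched above.  */
static void
copy16_two_loads_one_store (const char *src, char *dst)
{
  __m128i lo = _mm_loadl_epi64 ((const __m128i *) src);        /* movq */
  __m128i hi = _mm_loadl_epi64 ((const __m128i *) (src + 8));  /* movq */
  __m128i v = _mm_unpacklo_epi64 (lo, hi);  /* combine the two halves */
  _mm_storeu_si128 ((__m128i *) dst, v);    /* one wide store: movdqu */
}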
> On Thu, Oct 26, 2017 at 2:55 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> [snip: the copy-count-limit and rep mov discussion, quoted above in full]
>
> Yes, forwarding larger stores to smaller loads generally works fine
> since forever with the usual restrictions of alignment/size being
> power of two "halves".
>
> The question is of course what to do for 4 byte or smaller elements or
> mixed size elements. We can do zero-extending loads
> (do we have them for QI, HI mode loads as well?) and
> do shift and or's. I'm quite sure the CPUs wouldn't like to
> see vpinsert's of different vector mode destinations. So it
> would be 8 byte stores from GPRs and values built up via
> shift & or.
>
> As said, the important part is that IIRC CPUs can usually
> have more loads in flight than stores. Esp. Bulldozer
> with the split core was store buffer size limited (but it
> could do merging of store buffer entries IIRC).

In a way this seems an independent optimization to me (store
forwarding), because for sure this can work for user code which does
not originate from a copy sequence. Seems like something a bit tricky
to implement on top of RTL though.

Honza

> Richard.
>
> [snip: the rest of the quoted mail, including the full patch once more]
On Fri, Nov 3, 2017 at 5:38 PM, Martin Jambor <mjambor@suse.cz> wrote:
> Hi,
>
> On Thu, Oct 26, 2017 at 02:43:02PM +0200, Richard Biener wrote:
>> On Thu, Oct 26, 2017 at 2:18 PM, Martin Jambor <mjambor@suse.cz> wrote:
>> >
>> > Nevertheless, I still intend to experiment with the limit; I sent out
>> > this RFC exactly so that I don't spend a lot of time benchmarking
>> > something that is eventually not deemed acceptable on principle.
>>
>> I think the limit should be on the number of generated copies and not
>> the overall size of the structure... If the struct were composed of
>> 32 individual chars we wouldn't want to emit 32 loads and 32 stores...
>
> I have added another parameter to also limit the number of generated
> element copies. I have kept the size limit so that we don't even
> attempt to count them for large structures.
>
>> Given that load bandwidth is usually higher than store bandwidth it
>> might make sense to do the store combining in our copying sequence,
>> like for the 8 byte entry case use sth like
>>
>> movq 0(%eax), %xmm0
>> movhps 8(%eax), %xmm0 // or vpinsert
>> mov[au]ps %xmm0, 0(%ebx)
>
> I would be concerned about the cost of GPR->XMM moves when the value
> being stored is in a GPR, especially with generic tuning which (with
> -O2) is the main thing I am targeting here. Wouldn't we actually pass
> it through the stack with all the associated penalties?
>
> Also, while such store combining might work for ImageMagick, if a
> programmer did:
>
>   region1->x = x1;
>   region2->x = x2;
>   region1->y = 0;
>   region2->y = 20;
>   ...
>   SetPixelCacheNexusPixels(cache_info, ReadMode, region1, ...)
>
> The transformation would not work unless it could prove region1 and
> region2 are not the same thing.
>
>> As said a general concern was you not copying padding. If you
>> put this into an even more common place you surely will break
>> stuff, no?
>
> I don't understand, what even more common place do you mean?
>
> I have been testing the patch also on a bunch of other architectures
> and those have tests in their testsuite that check that padding is
> copied, for example some tests in gcc.target/aarch64/aapcs64/ check
> whether a structure passed to a function is binary the same as the
> original, and the tests fail because of padding. That is the only
> "breakage" I know about, but I believe that the assumption that padding
> must always be copied is wrong (if it is not, then we need to make SRA
> quite a bit more conservative).

The main concern here is that GIMPLE is not very well defined for
aggregate copies and that gimple-fold.c happily optimizes
memcpy (&a, &b, sizeof (a)) into a = b;

struct A { short s; long i; long j; };
struct A a, b;
void foo ()
{
  __builtin_memcpy (&a, &b, sizeof (struct A));
}

gets folded to

  MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})&b];
  return;

You see we're careful about TBAA, but (you don't see that above, though
it can be verified by, for example, debugging expand_assignment)
TREE_TYPE (MEM[...]) is actually 'struct A'.

And yes, I've been worried about SRA as well here... it _does_ have
some early outs when seeing VIEW_CONVERT_EXPR but apparently not for
the above.
Testcase that aborts with SRA but not without:

struct A { short s; long i; long j; };
struct A a, b;
void foo ()
{
  struct A c;
  __builtin_memcpy (&c, &b, sizeof (struct A));
  __builtin_memcpy (&a, &c, sizeof (struct A));
}
int main()
{
  __builtin_memset (&b, 0, sizeof (struct A));
  b.s = 1;
  __builtin_memcpy ((char *)&b+2, &b, 2);
  foo ();
  __builtin_memcpy (&a, (char *)&a+2, 2);
  if (a.s != 1)
    __builtin_abort ();
  return 0;
}

> On Thu, Oct 26, 2017 at 05:09:42PM +0200, Richard Biener wrote:
>> Also if we do the stores in smaller chunks we are more
>> likely hitting the same store-to-load-forwarding issue
>> elsewhere. Like in case the destination is memcpy'ed
>> away.
>>
>> So the proposed change isn't necessarily a win without
>> a possible similar regression that it tries to fix.
>
> With some encouragement by Honza, I have done some benchmarking anyway
> and I did not see anything of that kind.

The regression would be visible when the aggregate copy is followed by
SLP vectorized code, for example. Then we'd get a vector load in, say,
V4SI mode where we earlier had four SImode stores -> STLF issue again.
The copying via xmm registers would have made a perfect forwarding
possibility.

I'm not saying you'll hit this in SPEC, but it's easy to construct a
case that didn't have a STLF issue but after the "fix" does.

So the fix is to _not_ split the stores but only the loads ... unless
you can do sophisticated analysis of the context.

That said, splitting the loads is fine if the CPU can handle enough
loads in flight, etc., but splitting stores is dangerous (and CPU
resources on the store side are usually more limited).

>> Whole-program analysis of accesses might allow
>> marking affected objects.
>
> Attempting to save access patterns before IPA and then tracking them
> and keeping them in sync across inlining and all late gimple passes
> seems like a nightmarish task. If this approach is indeed rejected I
> might attempt to do the store combining, but a WPA analysis seems just
> too complex.

Ok.
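To make the scenario concrete, a hypothetical pattern in which element-wise copying would introduce the very stall the patch is trying to avoid (the function and names are made up for illustration):

struct quad { int a, b, c, d; };

void
bar (const struct quad *src, struct quad *tmp, char *dst)
{
  *tmp = *src;                               /* four 4-byte stores if
                                                copied element-wise */
  __builtin_memcpy (dst, tmp, sizeof *tmp);  /* typically one 16-byte
                                                load: the narrow stores
                                                cannot be forwarded */
}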
> Anyway, here are the numbers. They were taken on two different
> Zen-based machines. I am also in the process of measuring at least
> something on a Haswell machine but I started later and the machine is
> quite a bit slower so I will not have the numbers until next week (and
> not all equivalents in any way). I found out I do not have access to
> any more modern .*Lake Intel CPU.
>
> trunk is pristine trunk revision 254205. All benchmarks were run
> three times and the median was chosen.
>
> s or strict means the patch with the strictest possible settings to
> speed up ImageMagick, i.e. --param max-size-for-elementwise-copy=32
> --param max-insns-for-elementwise-copy=4. Also run three times.
>
> x1 is patched trunk with the parameters having the default values I
> was going to propose, i.e. --param max-size-for-elementwise-copy=35
> --param max-insns-for-elementwise-copy=6. Also run three times.
>
> I then increased the parameters, in search of further missed
> opportunities and to see what would start to regress, and how soon.
> x2 is roughly twice x1, --param max-size-for-elementwise-copy=67
> --param max-insns-for-elementwise-copy=12. Run twice, outliers
> manually checked.
>
> x4 is roughly four times x1, namely --param max-size-for-elementwise-copy=143
> --param max-insns-for-elementwise-copy=24. Run only once.
>
> The times below are of course "non-reportable," for a whole bunch of
> reasons.
>
> Zen SPECINT 2006 -O2 generic tuning
> ====================================
>
> Run-time
> --------
>
> | Benchmark | trunk | s | % | x1 | % | x2 | % | x4 | % |
> |----------------+-------+-----+-------+-----+-------+-----+-------+-----+-------|
> | 400.perlbench | 237 | 236 | -0.42 | 236 | -0.42 | 238 | +0.42 | 237 | +0.00 |
> | 401.bzip2 | 341 | 342 | +0.29 | 341 | +0.00 | 341 | +0.00 | 341 | +0.00 |
> | 403.gcc | 217 | 217 | +0.00 | 217 | +0.00 | 216 | -0.46 | 217 | +0.00 |
> | 429.mcf | 224 | 218 | -2.68 | 223 | -0.45 | 221 | -1.34 | 226 | +0.89 |
> | 445.gobmk | 361 | 361 | +0.00 | 361 | +0.00 | 360 | -0.28 | 363 | +0.55 |
> | 456.hmmer | 296 | 296 | +0.00 | 296 | +0.00 | 297 | +0.34 | 296 | +0.00 |
> | 458.sjeng | 453 | 452 | -0.22 | 454 | +0.22 | 454 | +0.22 | 460 | +1.55 |
> | 462.libquantum | 289 | 289 | +0.00 | 291 | +0.69 | 289 | +0.00 | 291 | +0.69 |
> | 464.h264ref | 391 | 391 | +0.00 | 385 | -1.53 | 385 | -1.53 | 385 | -1.53 |
> | 471.omnetpp | 269 | 255 | -5.20 | 250 | -7.06 | 247 | -8.18 | 268 | -0.37 |
> | 473.astar | 320 | 321 | +0.31 | 317 | -0.94 | 320 | +0.00 | 320 | +0.00 |
> | 483.xalancbmk | 187 | 188 | +0.53 | 188 | +0.53 | 187 | +0.00 | 187 | +0.00 |
>
> Although the omnetpp looks like a sizeable improvement I should warn
> that this is one of the few slightly jumpy benchmarks. However, I
> re-ran it a few more times and it seems like it is jumping around a
> lower value when compiled with the patched compiler. It might not be
> the 5-8% though.
>
> Text size
> ---------
>
> | Benchmark | trunk | strict | % | x1 | % | x2 | % | x4 | % |
> |----------------+---------+---------+-------+---------+-------+---------+-------+---------+-------|
> | 400.perlbench | 875874 | 875954 | +0.01 | 875954 | +0.01 | 876018 | +0.02 | 876146 | +0.03 |
> | 401.bzip2 | 44754 | 44754 | +0.00 | 44754 | +0.00 | 44754 | +0.00 | 44754 | +0.00 |
> | 403.gcc | 2294466 | 2294930 | +0.02 | 2296098 | +0.07 | 2296306 | +0.08 | 2296466 | +0.09 |
> | 429.mcf | 8226 | 8226 | +0.00 | 8226 | +0.00 | 8258 | +0.39 | 8258 | +0.39 |
> | 445.gobmk | 579778 | 579778 | +0.00 | 579826 | +0.01 | 579826 | +0.01 | 580402 | +0.11 |
> | 456.hmmer | 221058 | 221058 | +0.00 | 221058 | +0.00 | 221058 | +0.00 | 221058 | +0.00 |
> | 458.sjeng | 93362 | 93362 | +0.00 | 94882 | +1.63 | 94882 | +1.63 | 96066 | +2.90 |
> | 462.libquantum | 28314 | 28314 | +0.00 | 28362 | +0.17 | 28362 | +0.17 | 28362 | +0.17 |
> | 464.h264ref | 393874 | 393874 | +0.00 | 393922 | +0.01 | 393922 | +0.01 | 394226 | +0.09 |
> | 471.omnetpp | 430306 | 430306 | +0.00 | 430418 | +0.03 | 430418 | +0.03 | 430418 | +0.03 |
> | 473.astar | 29362 | 29538 | +0.60 | 29538 | +0.60 | 29554 | +0.65 | 29554 | +0.65 |
> | 483.xalancbmk | 2361298 | 2361506 | +0.01 | 2361506 | +0.01 | 2361506 | +0.01 | 2361506 | +0.01 |
>
> Zen SPECINT 2006 -Ofast native tuning
> ======================================
>
> Run-time
> --------
>
> | Benchmark | trunk | s | % | x1 | % | x2 | % | x4 | % |
> |----------------+-------+-----+-------+-----+-------+-----+-------+-----+-------|
> | 400.perlbench | 240 | 239 | -0.42 | 239 | -0.42 | 241 | +0.42 | 238 | -0.83 |
> | 401.bzip2 | 341 | 341 | +0.00 | 341 | +0.00 | 341 | +0.00 | 340 | -0.29 |
> | 403.gcc | 210 | 208 | -0.95 | 207 | -1.43 | 209 | -0.48 | 208 | -0.95 |
> | 429.mcf | 225 | 225 | +0.00 | 225 | +0.00 | 228 | +1.33 | 226 | +0.44 |
> | 445.gobmk | 352 | 352 | +0.00 | 352 | +0.00 | 351 | -0.28 | 352 | +0.00 |
> | 456.hmmer | 131 | 131 | +0.00 | 131 | +0.00 | 131 | +0.00 | 131 | +0.00 |
> | 458.sjeng | 442 | 442 | +0.00 | 438 | -0.90 | 438 | -0.90 | 437 | -1.13 |
> | 462.libquantum | 291 | 292 | +0.34 | 286 | -1.72 | 287 | -1.37 | 287 | -1.37 |
> | 464.h264ref | 364 | 365 | +0.27 | 364 | +0.00 | 364 | +0.00 | 363 | -0.27 |
> | 471.omnetpp | 266 | 266 | +0.00 | 265 | -0.38 | 265 | -0.38 | 265 | -0.38 |
> | 473.astar | 306 | 307 | +0.33 | 306 | +0.00 | 306 | +0.00 | 309 | +0.98 |
> | 483.xalancbmk | 177 | 173 | -2.26 | 170 | -3.95 | 170 | -3.95 | 170 | -3.95 |
>
> Text size
> ---------
>
> | Benchmark | trunk | strict | % | x1 | % | x2 | % | x4 | % |
> |----------------+---------+---------+-------+---------+-------+---------+-------+---------+-------|
> | 400.perlbench | 1161762 | 1161874 | +0.01 | 1161874 | +0.01 | 1162226 | +0.04 | 1162338 | +0.05 |
> | 401.bzip2 | 80834 | 80834 | +0.00 | 80834 | +0.00 | 80834 | +0.00 | 80834 | +0.00 |
> | 403.gcc | 3170946 | 3171394 | +0.01 | 3172914 | +0.06 | 3173170 | +0.07 | 3174818 | +0.12 |
> | 429.mcf | 10418 | 10418 | +0.00 | 10418 | +0.00 | 10450 | +0.31 | 10450 | +0.31 |
> | 445.gobmk | 779778 | 779778 | +0.00 | 779842 | +0.01 | 779842 | +0.01 | 780418 | +0.08 |
> | 456.hmmer | 328258 | 328258 | +0.00 | 328258 | +0.00 | 328258 | +0.00 | 328258 | +0.00 |
> | 458.sjeng | 146386 | 146386 | +0.00 | 148162 | +1.21 | 148162 | +1.21 | 149330 | +2.01 |
> | 462.libquantum | 30666 | 30666 | +0.00 | 30730 | +0.21 | 30730 | +0.21 | 30730 | +0.21 |
> | 464.h264ref | 737826 | 737826 | +0.00 | 737890 | +0.01 | 737890 | +0.01 | 739186 | +0.18 |
> | 471.omnetpp | 561570 | 561570 | +0.00 | 561826 | +0.05 | 561826 | +0.05 | 561826 | +0.05 |
> | 473.astar | 39314 | 39522 | +0.53 | 39522 | +0.53 | 39538 | +0.57 | 39538 | +0.57 |
> | 483.xalancbmk | 3319682 | 3319842 | +0.00 | 3319842 | +0.00 | 3319842 | +0.00 | 3319842 | +0.00 |
>
> Zen SPECFP 2006 -O2 generic tuning
> ==================================
>
> Run-time
> --------
>
> | Benchmark | trunk | s | % | x1 | % | x2 | % | x4 | % |
> |---------------+-------+-----+-------+-----+-------+-----+-------+-----+-------|
> | 410.bwaves | 214 | 213 | -0.47 | 214 | +0.00 | 214 | +0.00 | 214 | +0.00 |
> | 433.milc | 290 | 291 | +0.34 | 290 | +0.00 | 295 | +1.72 | 289 | -0.34 |
> | 434.zeusmp | 182 | 182 | +0.00 | 182 | +0.00 | 184 | +1.10 | 182 | +0.00 |
> | 435.gromacs | 218 | 218 | +0.00 | 217 | -0.46 | 216 | -0.92 | 220 | +0.92 |
> | 436.cactusADM | 350 | 349 | -0.29 | 349 | -0.29 | 343 | -2.00 | 349 | -0.29 |
> | 437.leslie3d | 196 | 195 | -0.51 | 196 | +0.00 | 194 | -1.02 | 196 | +0.00 |
> | 444.namd | 273 | 273 | +0.00 | 273 | +0.00 | 273 | +0.00 | 273 | +0.00 |
> | 447.dealII | 211 | 211 | +0.00 | 210 | -0.47 | 210 | -0.47 | 211 | +0.00 |
> | 450.soplex | 187 | 188 | +0.53 | 188 | +0.53 | 187 | +0.00 | 187 | +0.00 |
> | 453.povray | 119 | 118 | -0.84 | 119 | +0.00 | 119 | +0.00 | 118 | -0.84 |
> | 454.calculix | 534 | 533 | -0.19 | 531 | -0.56 | 531 | -0.56 | 532 | -0.37 |
> | 459.GemsFDTD | 236 | 235 | -0.42 | 235 | -0.42 | 242 | +2.54 | 237 | +0.42 |
> | 465.tonto | 366 | 365 | -0.27 | 365 | -0.27 | 364 | -0.55 | 365 | -0.27 |
> | 470.lbm | 181 | 180 | -0.55 | 180 | -0.55 | 180 | -0.55 | 180 | -0.55 |
> | 481.wrf | 303 | 303 | +0.00 | 302 | -0.33 | 304 | +0.33 | 304 | +0.33 |
> | 482.sphinx3 | 362 | 362 | +0.00 | 360 | -0.55 | 361 | -0.28 | 363 | +0.28 |
>
> Text size
> ---------
>
> | Benchmark | trunk | strict | % | x1 | % | x2 | % | x4 | % |
> |---------------+---------+---------+-------+---------+-------+---------+-------+---------+-------|
> | 410.bwaves | 25954 | 25954 | +0.00 | 25954 | +0.00 | 25954 | +0.00 | 25954 | +0.00 |
> | 433.milc | 87922 | 87922 | +0.00 | 87922 | +0.00 | 88610 | +0.78 | 89042 | +1.27 |
> | 434.zeusmp | 212034 | 212034 | +0.00 | 212034 | +0.00 | 212034 | +0.00 | 212034 | +0.00 |
> | 435.gromacs | 747026 | 747026 | +0.00 | 747026 | +0.00 | 747026 | +0.00 | 747026 | +0.00 |
> | 436.cactusADM | 526178 | 526178 | +0.00 | 526178 | +0.00 | 526274 | +0.02 | 526274 | +0.02 |
> | 437.leslie3d | 83234 | 83234 | +0.00 | 83234 | +0.00 | 83234 | +0.00 | 83234 | +0.00 |
> | 444.namd | 297234 | 297266 | +0.01 | 297266 | +0.01 | 297266 | +0.01 | 297266 | +0.01 |
> | 447.dealII | 2165282 | 2167650 | +0.11 | 2172290 | +0.32 | 2174034 | +0.40 | 2174082 | +0.41 |
> | 450.soplex | 347122 | 347122 | +0.00 | 347122 | +0.00 | 347122 | +0.00 | 347122 | +0.00 |
> | 453.povray | 800914 | 800962 | +0.01 | 801570 | +0.08 | 802002 | +0.14 | 803138 | +0.28 |
> | 454.calculix | 1342802 | 1342802 | +0.00 | 1342802 | +0.00 | 1342802 | +0.00 | 1342802 | +0.00 |
> | 459.GemsFDTD | 353410 | 354050 | +0.18 | 354050 | +0.18 | 354050 | +0.18 | 354098 | +0.19 |
> | 465.tonto | 3464210 | 3465058 | +0.02 | 3465058 | +0.02 | 3468434 | +0.12 | 3476594 | +0.36 |
> | 470.lbm | 9202 | 9202 | +0.00 | 9202 | +0.00 | 9202 | +0.00 | 9202 | +0.00 |
> | 481.wrf | 3345170 | 3345170 | +0.00 | 3345170 | +0.00 | 3351586 | +0.19 | 3351586 | +0.19 |
> | 482.sphinx3 | 125026 | 125026 | +0.00 | 125026 | +0.00 | 125026 | +0.00 | 125026 | +0.00 |
>
> Zen SPECFP 2006 -Ofast native tuning
> ====================================
>
> Run-time
> --------
>
> | Benchmark | trunk | s | % | x1 | % | x2 | % | x4 | % |
> |---------------+-------+-----+-------+-----+-------+-----+-------+-----+-------|
> | 410.bwaves | 151 | 150 | -0.66 | 151 | +0.00 | 151 | +0.00 | 151 | +0.00 |
> | 433.milc | 197 | 197 | +0.00 | 197 | +0.00 | 194 | -1.52 | 186 | -5.58 |
> | 434.zeusmp | 128 | 128 | +0.00 | 128 | +0.00 | 128 | +0.00 | 128 | +0.00 |
> | 435.gromacs | 181 | 181 | +0.00 | 180 | -0.55 | 180 | -0.55 | 181 | +0.00 |
> | 436.cactusADM | 139 | 139 | +0.00 | 139 | +0.00 | 132 | -5.04 | 139 | +0.00 |
> | 437.leslie3d | 159 | 160 | +0.63 | 160 | +0.63 | 159 | +0.00 | 159 | +0.00 |
> | 444.namd | 256 | 256 | +0.00 | 255 | -0.39 | 255 | -0.39 | 256 | +0.00 |
> | 447.dealII | 200 | 200 | +0.00 | 199 | -0.50 | 201 | +0.50 | 201 | +0.50 |
> | 450.soplex | 184 | 184 | +0.00 | 185 | +0.54 | 184 | +0.00 | 184 | +0.00 |
> | 453.povray | 124 | 122 | -1.61 | 123 | -0.81 | 124 | +0.00 | 122 | -1.61 |
> | 454.calculix | 192 | 192 | +0.00 | 192 | +0.00 | 193 | +0.52 | 193 | +0.52 |
> | 459.GemsFDTD | 208 | 208 | +0.00 | 208 | +0.00 | 214 | +2.88 | 208 | +0.00 |
> | 465.tonto | 320 | 320 | +0.00 | 320 | +0.00 | 320 | +0.00 | 320 | +0.00 |
> | 470.lbm | 142 | 142 | +0.00 | 142 | +0.00 | 142 | +0.00 | 142 | +0.00 |
> | 481.wrf | 195 | 195 | +0.00 | 195 | +0.00 | 195 | +0.00 | 195 | +0.00 |
> | 482.sphinx3 | 256 | 258 | +0.78 | 256 | +0.00 | 256 | +0.00 | 257 | +0.39 |
>
> Text size
> ---------
>
> | Benchmark | trunk | strict | % | x1 | % | x2 | % | x4 | % |
> |---------------+---------+---------+-------+---------+-------+---------+-------+---------+-------|
> | 410.bwaves | 27490 | 27490 | +0.00 | 27490 | +0.00 | 27490 | +0.00 | 27490 | +0.00 |
> | 433.milc | 118178 | 118178 | +0.00 | 118178 | +0.00 | 118962 | +0.66 | 119634 | +1.23 |
> | 434.zeusmp | 411106 | 411106 | +0.00 | 411106 | +0.00 | 411106 | +0.00 | 411106 | +0.00 |
> | 435.gromacs | 935970 | 935970 | +0.00 | 935970 | +0.00 | 935970 | +0.00 | 936162 | +0.02 |
> | 436.cactusADM | 750546 | 750546 |
+0.00 | 750546 | +0.00 | 750626 | +0.01 | 750626 | +0.01 | > | 437.leslie3d | 123410 | 123410 | +0.00 | 123410 | +0.00 | 123410 | +0.00 | 123410 | +0.00 | > | 444.namd | 284082 | 284114 | +0.01 | 284114 | +0.01 | 284114 | +0.01 | 284114 | +0.01 | > | 447.dealII | 2438610 | 2440946 | +0.10 | 2444978 | +0.26 | 2446882 | +0.34 | 2446930 | +0.34 | > | 450.soplex | 443218 | 443218 | +0.00 | 443218 | +0.00 | 443218 | +0.00 | 443218 | +0.00 | > | 453.povray | 1077778 | 1077890 | +0.01 | 1078658 | +0.08 | 1079026 | +0.12 | 1080370 | +0.24 | > | 454.calculix | 1639138 | 1639138 | +0.00 | 1639138 | +0.00 | 1639474 | +0.02 | 1639474 | +0.02 | > | 459.GemsFDTD | 451202 | 451234 | +0.01 | 451234 | +0.01 | 451234 | +0.01 | 451282 | +0.02 | > | 465.tonto | 4584690 | 4585250 | +0.01 | 4585250 | +0.01 | 4588130 | +0.08 | 4595442 | +0.23 | > | 470.lbm | 9858 | 9858 | +0.00 | 9858 | +0.00 | 9858 | +0.00 | 9858 | +0.00 | > | 481.wrf | 4588002 | 4588002 | +0.00 | 4588290 | +0.01 | 4621010 | +0.72 | 4621922 | +0.74 | > | 482.sphinx3 | 179602 | 179602 | +0.00 | 179602 | +0.00 | 179602 | +0.00 | 179602 | +0.00 | > > > > Zen SPEC INT 2017 -O2 generic tuning > ==================================== > > Run-time > -------- > > | Benchmark | trunk | s | % | x1 | % | x2 | % | x4 | % | > |-----------------+-------+-----+-------+-----+-------+-----+-------+-----+-------| > | 500.perlbench_r | 529 | 529 | +0.00 | 531 | +0.38 | 530 | +0.19 | 534 | +0.95 | > | 502.gcc_r | 338 | 333 | -1.48 | 334 | -1.18 | 339 | +0.30 | 339 | +0.30 | > | 505.mcf_r | 382 | 381 | -0.26 | 382 | +0.00 | 382 | +0.00 | 381 | -0.26 | > | 520.omnetpp_r | 511 | 503 | -1.57 | 497 | -2.74 | 497 | -2.74 | 497 | -2.74 | > | 523.xalancbmk_r | 391 | 388 | -0.77 | 389 | -0.51 | 390 | -0.26 | 391 | +0.00 | > | 525.x264_r | 590 | 590 | +0.00 | 591 | +0.17 | 592 | +0.34 | 593 | +0.51 | > | 531.deepsjeng_r | 427 | 427 | +0.00 | 427 | +0.00 | 428 | +0.23 | 427 | +0.00 | > | 541.leela_r | 716 | 716 | +0.00 | 716 | +0.00 | 719 | +0.42 | 719 | +0.42 | > | 548.exchange2_r | 593 | 593 | +0.00 | 593 | +0.00 | 593 | +0.00 | 593 | +0.00 | > | 557.xz_r | 452 | 452 | +0.00 | 453 | +0.22 | 454 | +0.44 | 452 | +0.00 | > > Text size > --------- > > | Benchmark | trunk | strict | % | x1 | % | x2 | % | x4 | % | > |-----------------+---------+---------+-------+---------+-------+---------+-------+---------+-------| > | 500.perlbench_r | 1599442 | 1599522 | +0.01 | 1599522 | +0.01 | 1599522 | +0.01 | 1600082 | +0.04 | > | 502.gcc_r | 6757602 | 6758978 | +0.02 | 6759090 | +0.02 | 6759842 | +0.03 | 6760306 | +0.04 | > | 505.mcf_r | 16098 | 16098 | +0.00 | 16098 | +0.00 | 16098 | +0.00 | 16306 | +1.29 | > | 520.omnetpp_r | 1262498 | 1262562 | +0.01 | 1264034 | +0.12 | 1264034 | +0.12 | 1264034 | +0.12 | > | 523.xalancbmk_r | 3989026 | 3989202 | +0.00 | 3989202 | +0.00 | 3989202 | +0.00 | 3989202 | +0.00 | > | 525.x264_r | 414130 | 414194 | +0.02 | 414194 | +0.02 | 414738 | +0.15 | 415122 | +0.24 | > | 531.deepsjeng_r | 67426 | 67426 | +0.00 | 67458 | +0.05 | 67458 | +0.05 | 67458 | +0.05 | > | 541.leela_r | 219378 | 219378 | +0.00 | 219378 | +0.00 | 224082 | +2.14 | 237026 | +8.04 | > | 548.exchange2_r | 61234 | 61234 | +0.00 | 61234 | +0.00 | 61234 | +0.00 | 61234 | +0.00 | > | 557.xz_r | 111490 | 111490 | +0.00 | 111490 | +0.00 | 111506 | +0.01 | 111890 | +0.36 | > > > > Zen SPEC INT 2017 -Ofast native tuning > ====================================== > > Run-time > --------- > > | Benchmark | trunk | s | % | x1 | % | x2 | % | x4 | % | > 
|-----------------+-------+-----+-------+-----+-------+-----+-------+-----+-------| > | 500.perlbench_r | 525 | 524 | -0.19 | 525 | +0.00 | 525 | +0.00 | 534 | +1.71 | > | 502.gcc_r | 331 | 329 | -0.60 | 324 | -2.11 | 330 | -0.30 | 324 | -2.11 | > | 505.mcf_r | 380 | 380 | +0.00 | 381 | +0.26 | 380 | +0.00 | 379 | -0.26 | > | 520.omnetpp_r | 487 | 486 | -0.21 | 488 | +0.21 | 489 | +0.41 | 488 | +0.21 | > | 523.xalancbmk_r | 373 | 369 | -1.07 | 367 | -1.61 | 370 | -0.80 | 368 | -1.34 | > | 525.x264_r | 319 | 319 | +0.00 | 320 | +0.31 | 321 | +0.63 | 322 | +0.94 | > | 531.deepsjeng_r | 418 | 418 | +0.00 | 418 | +0.00 | 418 | +0.00 | 419 | +0.24 | > | 541.leela_r | 674 | 674 | +0.00 | 674 | +0.00 | 672 | -0.30 | 672 | -0.30 | > | 548.exchange2_r | 466 | 466 | +0.00 | 466 | +0.00 | 466 | +0.00 | 466 | +0.00 | > | 557.xz_r | 443 | 443 | +0.00 | 443 | +0.00 | 449 | +1.35 | 449 | +1.35 | > > Text size > --------- > > | Benchmark | trunk | strict | % | x1 | % | x2 | % | x4 | % | > |-----------------+---------+---------+-------+---------+-------+---------+-------+---------+-------| > | 500.perlbench_r | 2122882 | 2122962 | +0.00 | 2122962 | +0.00 | 2122962 | +0.00 | 2122514 | -0.02 | > | 502.gcc_r | 8566290 | 8567794 | +0.02 | 8569138 | +0.03 | 8570066 | +0.04 | 8570642 | +0.05 | > | 505.mcf_r | 26770 | 26770 | +0.00 | 26770 | +0.00 | 26770 | +0.00 | 26962 | +0.72 | > | 520.omnetpp_r | 1713938 | 1713954 | +0.00 | 1714754 | +0.05 | 1714754 | +0.05 | 1714754 | +0.05 | > | 523.xalancbmk_r | 4881890 | 4882114 | +0.00 | 4882114 | +0.00 | 4882114 | +0.00 | 4882114 | +0.00 | > | 525.x264_r | 601522 | 601602 | +0.01 | 601602 | +0.01 | 602130 | +0.10 | 602834 | +0.22 | > | 531.deepsjeng_r | 90306 | 90306 | +0.00 | 90338 | +0.04 | 90338 | +0.04 | 90338 | +0.04 | > | 541.leela_r | 277634 | 277650 | +0.01 | 277650 | +0.01 | 282386 | +1.71 | 295778 | +6.54 | > | 548.exchange2_r | 109058 | 109058 | +0.00 | 109058 | +0.00 | 109058 | +0.00 | 109058 | +0.00 | > | 557.xz_r | 154594 | 154594 | +0.00 | 154594 | +0.00 | 154610 | +0.01 | 154930 | +0.22 | > > > > Zen SPEC 2017 FP -O2 generic tuning > =================================== > > Run-time > -------- > | Benchmark | trunk | s | % | x1 | % | x2 | % | x4 | % | > |-----------------+-------+-----+--------+-----+--------+-----+--------+-----+--------| > | 503.bwaves_r | 801 | 801 | +0.00 | 801 | +0.00 | 801 | +0.00 | 801 | +0.00 | > | 507.cactuBSSN_r | 303 | 302 | -0.33 | 299 | -1.32 | 302 | -0.33 | 307 | +1.32 | > | 508.namd_r | 306 | 306 | +0.00 | 307 | +0.33 | 306 | +0.00 | 306 | +0.00 | > | 510.parest_r | 558 | 553 | -0.90 | 561 | +0.54 | 554 | -0.72 | 562 | +0.72 | > | 511.povray_r | 679 | 672 | -1.03 | 673 | -0.88 | 680 | +0.15 | 644 | -5.15 | > | 519.lbm_r | 240 | 240 | +0.00 | 240 | +0.00 | 240 | +0.00 | 240 | +0.00 | > | 521.wrf_r | 851 | 827 | -2.82 | 827 | -2.82 | 827 | -2.82 | 828 | -2.70 | > | 526.blender_r | 376 | 376 | +0.00 | 379 | +0.80 | 377 | +0.27 | 376 | +0.00 | > | 527.cam4_r | 529 | 527 | -0.38 | 533 | +0.76 | 536 | +1.32 | 528 | -0.19 | > | 538.imagick_r | 646 | 570 | -11.76 | 570 | -11.76 | 569 | -11.92 | 570 | -11.76 | > | 544.nab_r | 467 | 467 | +0.00 | 467 | +0.00 | 467 | +0.00 | 467 | +0.00 | > | 549.fotonik3d_r | 413 | 413 | +0.00 | 414 | +0.24 | 415 | +0.48 | 413 | +0.00 | > | 554.roms_r | 459 | 455 | -0.87 | 456 | -0.65 | 456 | -0.65 | 456 | -0.65 | > > Text size > --------- > > | Benchmark | trunk | strict | % | x1 | % | x2 | % | x4 | % | > 
|-----------------+----------+----------+-------+----------+-------+----------+-------+----------+-------| > | 503.bwaves_r | 32034 | 32034 | +0.00 | 32034 | +0.00 | 32034 | +0.00 | 32034 | +0.00 | > | 507.cactuBSSN_r | 2951634 | 2951634 | +0.00 | 2951634 | +0.00 | 2951698 | +0.00 | 2951730 | +0.00 | > | 508.namd_r | 837458 | 837490 | +0.00 | 837490 | +0.00 | 837490 | +0.00 | 837490 | +0.00 | > | 510.parest_r | 6540866 | 6545618 | +0.07 | 6546754 | +0.09 | 6561426 | +0.31 | 6569426 | +0.44 | > | 511.povray_r | 803618 | 803666 | +0.01 | 804274 | +0.08 | 804706 | +0.14 | 805842 | +0.28 | > | 519.lbm_r | 12018 | 12018 | +0.00 | 12018 | +0.00 | 12018 | +0.00 | 12018 | +0.00 | > | 521.wrf_r | 16292962 | 16296786 | +0.02 | 16296978 | +0.02 | 16302594 | +0.06 | 16419842 | +0.78 | > | 526.blender_r | 7268224 | 7281264 | +0.18 | 7282608 | +0.20 | 7289168 | +0.29 | 7295296 | +0.37 | > | 527.cam4_r | 5063666 | 5063922 | +0.01 | 5065010 | +0.03 | 5068114 | +0.09 | 5072946 | +0.18 | > | 538.imagick_r | 1608178 | 1609282 | +0.07 | 1609282 | +0.07 | 1613458 | +0.33 | 1613970 | +0.36 | > | 544.nab_r | 156242 | 156242 | +0.00 | 156242 | +0.00 | 156242 | +0.00 | 156242 | +0.00 | > | 549.fotonik3d_r | 326738 | 326738 | +0.00 | 326738 | +0.00 | 326738 | +0.00 | 326738 | +0.00 | > | 554.roms_r | 728546 | 728546 | +0.00 | 728546 | +0.00 | 728546 | +0.00 | 728546 | +0.00 | > > > > Zen SPEC 2017 FP -Ofast native tuning > ===================================== > > Run-time > -------- > > | Benchmark | trunk | s | % | x1 | % | x2 | % | x4 | % | > |-----------------+-------+-----+-------+-----+-------+-----+-------+-----+-------| > | 503.bwaves_r | 310 | 310 | +0.00 | 310 | +0.00 | 310 | +0.00 | 309 | -0.32 | > | 507.cactuBSSN_r | 269 | 266 | -1.12 | 266 | -1.12 | 268 | -0.37 | 270 | +0.37 | > | 508.namd_r | 270 | 269 | -0.37 | 269 | -0.37 | 268 | -0.74 | 268 | -0.74 | > | 510.parest_r | 607 | 601 | -0.99 | 599 | -1.32 | 599 | -1.32 | 604 | -0.49 | > | 511.povray_r | 662 | 664 | +0.30 | 671 | +1.36 | 680 | +2.72 | 675 | +1.96 | > | 519.lbm_r | 186 | 186 | +0.00 | 186 | +0.00 | 186 | +0.00 | 186 | +0.00 | > | 521.wrf_r | 550 | 554 | +0.73 | 550 | +0.00 | 550 | +0.00 | 549 | -0.18 | > | 526.blender_r | 355 | 354 | -0.28 | 355 | +0.00 | 354 | -0.28 | 354 | -0.28 | > | 527.cam4_r | 434 | 437 | +0.69 | 435 | +0.23 | 437 | +0.69 | 435 | +0.23 | > | 538.imagick_r | 433 | 420 | -3.00 | 420 | -3.00 | 420 | -3.00 | 419 | -3.23 | > | 544.nab_r | 424 | 425 | +0.24 | 425 | +0.24 | 425 | +0.24 | 425 | +0.24 | > | 549.fotonik3d_r | 421 | 422 | +0.24 | 422 | +0.24 | 422 | +0.24 | 422 | +0.24 | > | 554.roms_r | 360 | 361 | +0.28 | 361 | +0.28 | 361 | +0.28 | 361 | +0.28 | > > +1.36% for 511.povray_r is the worst regression for the proposed x1 > defaults, by the way. I have not investigated it further, however. 
> > Text size > --------- > > | Benchmark | trunk | strict | % | x1 | % | x2 | % | x4 | % | > |-----------------+----------+----------+-------+----------+-------+----------+-------+----------+-------| > | 503.bwaves_r | 34562 | 34562 | +0.00 | 34562 | +0.00 | 34562 | +0.00 | 34562 | +0.00 | > | 507.cactuBSSN_r | 3978402 | 3978402 | +0.00 | 3978402 | +0.00 | 3978514 | +0.00 | 3978546 | +0.00 | > | 508.namd_r | 869106 | 869154 | +0.01 | 869154 | +0.01 | 869154 | +0.01 | 869154 | +0.01 | > | 510.parest_r | 7186258 | 7189298 | +0.04 | 7190370 | +0.06 | 7203890 | +0.25 | 7211202 | +0.35 | > | 511.povray_r | 1063314 | 1063410 | +0.01 | 1064178 | +0.08 | 1064546 | +0.12 | 1065890 | +0.24 | > | 519.lbm_r | 12178 | 12178 | +0.00 | 12178 | +0.00 | 12178 | +0.00 | 12178 | +0.00 | > | 521.wrf_r | 19480946 | 19484146 | +0.02 | 19484466 | +0.02 | 19607538 | +0.65 | 19716178 | +1.21 | > | 526.blender_r | 9708752 | 9719952 | +0.12 | 9722768 | +0.14 | 9730224 | +0.22 | 9737760 | +0.30 | > | 527.cam4_r | 6217970 | 6218162 | +0.00 | 6219570 | +0.03 | 6223362 | +0.09 | 6227762 | +0.16 | > | 538.imagick_r | 2255682 | 2256162 | +0.02 | 2256162 | +0.02 | 2261346 | +0.25 | 2261938 | +0.28 | > | 544.nab_r | 212418 | 212418 | +0.00 | 212418 | +0.00 | 212418 | +0.00 | 212578 | +0.08 | > | 549.fotonik3d_r | 454738 | 454738 | +0.00 | 454738 | +0.00 | 454738 | +0.00 | 454738 | +0.00 | > | 554.roms_r | 910978 | 910978 | +0.00 | 910978 | +0.00 | 910978 | +0.00 | 910978 | +0.00 | > > > I believe the numbers are good and thus I would like to ask-for > re-consideration of the objection and for approval to commit the patch > below. Needless to say, it has passed bootstrap and testing on > x86_64-linux. > > Thanks > > Martin > > > 2017-10-27 Martin Jambor <mjambor@suse.cz> > > PR target/80689 > * tree-sra.h: New file. > * ipa-prop.h: Moved declaration of build_ref_for_offset to > tree-sra.h. > * expr.c: Include params.h and tree-sra.h. > (emit_move_elementwise): New function. > (store_expr_with_bounds): Optionally use it. > * ipa-cp.c: Include tree-sra.h. > * params.def (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY): New. > (PARAM_MAX_INSNS_FOR_ELEMENTWISE_COPY): Likewise. > * config/i386/i386.c (ix86_option_override_internal): Set > PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY to 35. > * tree-sra.c: Include tree-sra.h. > (scalarizable_type_p): Renamed to > simple_mix_of_records_and_arrays_p, made public, renamed the > second parameter to allow_char_arrays, added count_p parameter. > (extract_min_max_idx_from_array): New function. > (completely_scalarize): Moved bits of the function to > extract_min_max_idx_from_array. > > testsuite/ > * gcc.target/i386/pr80689-1.c: New test. 
> > Added insns count param limit > --- > gcc/config/i386/i386.c | 4 + > gcc/expr.c | 106 ++++++++++++++++++++++- > gcc/ipa-cp.c | 1 + > gcc/ipa-prop.h | 4 - > gcc/params.def | 12 +++ > gcc/testsuite/gcc.target/i386/pr80689-1.c | 38 +++++++++ > gcc/tree-sra.c | 134 +++++++++++++++++++++--------- > gcc/tree-sra.h | 34 ++++++++ > 8 files changed, 288 insertions(+), 45 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr80689-1.c > create mode 100644 gcc/tree-sra.h > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index 80c8ce7ecb9..0bff2da72dd 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -4580,6 +4580,10 @@ ix86_option_override_internal (bool main_args_p, > ix86_tune_cost->l2_cache_size, > opts->x_param_values, > opts_set->x_param_values); > + maybe_set_param_value (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY, > + 35, > + opts->x_param_values, > + opts_set->x_param_values); > > /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. */ > if (opts->x_flag_prefetch_loop_arrays < 0 > diff --git a/gcc/expr.c b/gcc/expr.c > index 496d492c9fa..971880b635d 100644 > --- a/gcc/expr.c > +++ b/gcc/expr.c > @@ -61,7 +61,8 @@ along with GCC; see the file COPYING3. If not see > #include "tree-chkp.h" > #include "rtl-chkp.h" > #include "ccmp.h" > - > +#include "params.h" > +#include "tree-sra.h" > > /* If this is nonzero, we do not bother generating VOLATILE > around volatile memory references, and we are willing to > @@ -5340,6 +5341,80 @@ emit_storent_insn (rtx to, rtx from) > return maybe_expand_insn (code, 2, ops); > } > > +/* Generate code for copying data of type TYPE at SOURCE plus OFFSET to TARGET > + plus OFFSET, but do so element-wise and/or field-wise for each record and > + array within TYPE. TYPE must either be a register type or an aggregate > + complying with scalarizable_type_p. > + > + If CALL_PARAM_P is nonzero, this is a store into a call param on the > + stack, and block moves may need to be treated specially. */ > + > +static void > +emit_move_elementwise (tree type, rtx target, rtx source, HOST_WIDE_INT offset, > + int call_param_p) > +{ > + switch (TREE_CODE (type)) > + { > + case RECORD_TYPE: > + for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld)) > + if (TREE_CODE (fld) == FIELD_DECL) > + { > + HOST_WIDE_INT fld_offset = offset + int_bit_position (fld); > + tree ft = TREE_TYPE (fld); > + emit_move_elementwise (ft, target, source, fld_offset, > + call_param_p); > + } > + break; > + > + case ARRAY_TYPE: > + { > + tree elem_type = TREE_TYPE (type); > + HOST_WIDE_INT el_size = tree_to_shwi (TYPE_SIZE (elem_type)); > + gcc_assert (el_size > 0); > + > + offset_int idx, max; > + /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1. */ > + if (extract_min_max_idx_from_array (type, &idx, &max)) > + { > + HOST_WIDE_INT el_offset = offset; > + for (; idx <= max; ++idx) > + { > + emit_move_elementwise (elem_type, target, source, el_offset, > + call_param_p); > + el_offset += el_size; > + } > + } > + } > + break; > + default: > + machine_mode mode = TYPE_MODE (type); > + > + rtx ntgt = adjust_address (target, mode, offset / BITS_PER_UNIT); > + rtx nsrc = adjust_address (source, mode, offset / BITS_PER_UNIT); > + > + /* TODO: Figure out whether the following is actually necessary. 
*/ > + if (target == ntgt) > + ntgt = copy_rtx (target); > + if (source == nsrc) > + nsrc = copy_rtx (source); > + > + gcc_assert (mode != VOIDmode); > + if (mode != BLKmode) > + emit_move_insn (ntgt, nsrc); > + else > + { > + /* For example vector gimple registers can end up here. */ > + rtx size = expand_expr (TYPE_SIZE_UNIT (type), NULL_RTX, > + TYPE_MODE (sizetype), EXPAND_NORMAL); > + emit_block_move (ntgt, nsrc, size, > + (call_param_p > + ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL)); > + } > + break; > + } > + return; > +} > + > /* Generate code for computing expression EXP, > and storing the value into TARGET. > > @@ -5713,9 +5788,32 @@ store_expr_with_bounds (tree exp, rtx target, int call_param_p, > emit_group_store (target, temp, TREE_TYPE (exp), > int_size_in_bytes (TREE_TYPE (exp))); > else if (GET_MODE (temp) == BLKmode) > - emit_block_move (target, temp, expr_size (exp), > - (call_param_p > - ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL)); > + { > + /* Copying smallish BLKmode structures with emit_block_move and thus > + by-pieces can result in store-to-load stalls. So copy some simple > + small aggregates element or field-wise. */ > + int count = 0; > + if (GET_MODE (target) == BLKmode > + && AGGREGATE_TYPE_P (TREE_TYPE (exp)) > + && !TREE_ADDRESSABLE (TREE_TYPE (exp)) > + && tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (exp))) > + && (tree_to_shwi (TYPE_SIZE (TREE_TYPE (exp))) > + <= (PARAM_VALUE (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY) > + * BITS_PER_UNIT)) > + && simple_mix_of_records_and_arrays_p (TREE_TYPE (exp), false, > + &count) > + && (count <= PARAM_VALUE (PARAM_MAX_INSNS_FOR_ELEMENTWISE_COPY))) > + { > + /* FIXME: Can this happen? What would it mean? */ > + gcc_assert (!reverse); > + emit_move_elementwise (TREE_TYPE (exp), target, temp, 0, > + call_param_p); > + } > + else > + emit_block_move (target, temp, expr_size (exp), > + (call_param_p > + ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL)); > + } > /* If we emit a nontemporal store, there is nothing else to do. */ > else if (nontemporal && emit_storent_insn (target, temp)) > ; > diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c > index d23c1d8ba3e..30f91e70c22 100644 > --- a/gcc/ipa-cp.c > +++ b/gcc/ipa-cp.c > @@ -124,6 +124,7 @@ along with GCC; see the file COPYING3. 
If not see > #include "tree-ssa-ccp.h" > #include "stringpool.h" > #include "attribs.h" > +#include "tree-sra.h" > > template <typename valtype> class ipcp_value; > > diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h > index fa5bed49ee0..2313cc884ed 100644 > --- a/gcc/ipa-prop.h > +++ b/gcc/ipa-prop.h > @@ -877,10 +877,6 @@ ipa_parm_adjustment *ipa_get_adjustment_candidate (tree **, bool *, > void ipa_release_body_info (struct ipa_func_body_info *); > tree ipa_get_callee_param_type (struct cgraph_edge *e, int i); > > -/* From tree-sra.c: */ > -tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree, > - gimple_stmt_iterator *, bool); > - > /* In ipa-cp.c */ > void ipa_cp_c_finalize (void); > > diff --git a/gcc/params.def b/gcc/params.def > index 8881f4c403a..9c778f9540a 100644 > --- a/gcc/params.def > +++ b/gcc/params.def > @@ -1287,6 +1287,18 @@ DEFPARAM (PARAM_VECT_EPILOGUES_NOMASK, > "Enable loop epilogue vectorization using smaller vector size.", > 0, 0, 1) > > +DEFPARAM (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY, > + "max-size-for-elementwise-copy", > + "Maximum size in bytes of a structure or an array to by considered " > + "for copying by its individual fields or elements", > + 0, 0, 512) > + > +DEFPARAM (PARAM_MAX_INSNS_FOR_ELEMENTWISE_COPY, > + "max-insns-for-elementwise-copy", > + "Maximum number of instructions needed to consider copying " > + "a structure or an array by its individual fields or elements", > + 6, 0, 64) > + > /* > > Local variables: > diff --git a/gcc/testsuite/gcc.target/i386/pr80689-1.c b/gcc/testsuite/gcc.target/i386/pr80689-1.c > new file mode 100644 > index 00000000000..4156d4fba45 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr80689-1.c > @@ -0,0 +1,38 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +typedef struct st1 > +{ > + long unsigned int a,b; > + long int c,d; > +}R; > + > +typedef struct st2 > +{ > + int t; > + R reg; > +}N; > + > +void Set (const R *region, N *n_info ); > + > +void test(N *n_obj ,const long unsigned int a, const long unsigned int b, const long int c,const long int d) > +{ > + R reg; > + > + reg.a=a; > + reg.b=b; > + reg.c=c; > + reg.d=d; > + Set (®, n_obj); > + > +} > + > +void Set (const R *reg, N *n_obj ) > +{ > + n_obj->reg=(*reg); > +} > + > + > +/* { dg-final { scan-assembler-not "%(x|y|z)mm\[0-9\]+" } } */ > +/* { dg-final { scan-assembler-not "movdqu" } } */ > +/* { dg-final { scan-assembler-not "movups" } } */ > diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c > index bac593951e7..d06463ce21c 100644 > --- a/gcc/tree-sra.c > +++ b/gcc/tree-sra.c > @@ -104,6 +104,7 @@ along with GCC; see the file COPYING3. If not see > #include "ipa-fnsummary.h" > #include "ipa-utils.h" > #include "builtins.h" > +#include "tree-sra.h" > > /* Enumeration of all aggregate reductions we can do. */ > enum sra_mode { SRA_MODE_EARLY_IPA, /* early call regularization */ > @@ -952,14 +953,15 @@ create_access (tree expr, gimple *stmt, bool write) > } > > > -/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or fixed-length > - ARRAY_TYPE with fields that are either of gimple register types (excluding > - bit-fields) or (recursively) scalarizable types. CONST_DECL must be true if > - we are considering a decl from constant pool. If it is false, char arrays > - will be refused. */ > +/* Return true if TYPE consists of RECORD_TYPE or fixed-length ARRAY_TYPE with > + fields/elements that are not bit-fields and are either register types or > + recursively comply with simple_mix_of_records_and_arrays_p. 
Furthermore, if > + ALLOW_CHAR_ARRAYS is false, the function will return false also if TYPE > + contains an array of elements that only have one byte. */ > > -static bool > -scalarizable_type_p (tree type, bool const_decl) > +bool > +simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays, > + int *count_p) > { > gcc_assert (!is_gimple_reg_type (type)); > if (type_contains_placeholder_p (type)) > @@ -976,8 +978,13 @@ scalarizable_type_p (tree type, bool const_decl) > if (DECL_BIT_FIELD (fld)) > return false; > > - if (!is_gimple_reg_type (ft) > - && !scalarizable_type_p (ft, const_decl)) > + if (is_gimple_reg_type (ft)) > + { > + if (count_p) > + (*count_p)++; > + } > + else if (!simple_mix_of_records_and_arrays_p (ft, allow_char_arrays, > + count_p)) > return false; > } > > @@ -986,7 +993,7 @@ scalarizable_type_p (tree type, bool const_decl) > case ARRAY_TYPE: > { > HOST_WIDE_INT min_elem_size; > - if (const_decl) > + if (allow_char_arrays) > min_elem_size = 0; > else > min_elem_size = BITS_PER_UNIT; > @@ -1007,9 +1014,45 @@ scalarizable_type_p (tree type, bool const_decl) > return false; > > tree elem = TREE_TYPE (type); > - if (!is_gimple_reg_type (elem) > - && !scalarizable_type_p (elem, const_decl)) > - return false; > + if (!count_p) > + { > + if (!is_gimple_reg_type (elem) > + && !simple_mix_of_records_and_arrays_p (elem, allow_char_arrays, > + NULL)) > + return false; > + else > + return true; > + } > + > + offset_int min, max; > + HOST_WIDE_INT ds; > + bool nonzero = extract_min_max_idx_from_array (type, &min, &max); > + > + if (nonzero && (min <= max)) > + { > + offset_int d = max - min + 1; > + if (!wi::fits_shwi_p (d)) > + return false; > + ds = d.to_shwi (); > + if (ds > INT_MAX) > + return false; > + } > + else > + ds = 0; > + > + if (is_gimple_reg_type (elem)) > + *count_p += (int) ds; > + else > + { > + int elc = 0; > + if (!simple_mix_of_records_and_arrays_p (elem, allow_char_arrays, > + &elc)) > + return false; > + ds *= elc; > + if (ds > INT_MAX) > + return false; > + *count_p += (unsigned) ds; > + } > return true; > } > default: > @@ -1017,10 +1060,38 @@ scalarizable_type_p (tree type, bool const_decl) > } > } > > -static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree, tree); > +static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree, > + tree); > + > +/* For a given array TYPE, return false if its domain does not have any maximum > + value. Otherwise calculate MIN and MAX indices of the first and the last > + element. */ > + > +bool > +extract_min_max_idx_from_array (tree type, offset_int *min, offset_int *max) > +{ > + tree domain = TYPE_DOMAIN (type); > + tree minidx = TYPE_MIN_VALUE (domain); > + gcc_assert (TREE_CODE (minidx) == INTEGER_CST); > + tree maxidx = TYPE_MAX_VALUE (domain); > + if (!maxidx) > + return false; > + gcc_assert (TREE_CODE (maxidx) == INTEGER_CST); > + > + /* MINIDX and MAXIDX are inclusive, and must be interpreted in > + DOMAIN (e.g. signed int, whereas min/max may be size_int). */ > + *min = wi::to_offset (minidx); > + *max = wi::to_offset (maxidx); > + if (!TYPE_UNSIGNED (domain)) > + { > + *min = wi::sext (*min, TYPE_PRECISION (domain)); > + *max = wi::sext (*max, TYPE_PRECISION (domain)); > + } > + return true; > +} > > /* Create total_scalarization accesses for all scalar fields of a member > - of type DECL_TYPE conforming to scalarizable_type_p. BASE > + of type DECL_TYPE conforming to simple_mix_of_records_and_arrays_p. 
BASE > must be the top-most VAR_DECL representing the variable; within that, > OFFSET locates the member and REF must be the memory reference expression for > the member. */ > @@ -1047,27 +1118,14 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref) > { > tree elemtype = TREE_TYPE (decl_type); > tree elem_size = TYPE_SIZE (elemtype); > - gcc_assert (elem_size && tree_fits_shwi_p (elem_size)); > HOST_WIDE_INT el_size = tree_to_shwi (elem_size); > gcc_assert (el_size > 0); > > - tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (decl_type)); > - gcc_assert (TREE_CODE (minidx) == INTEGER_CST); > - tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (decl_type)); > + offset_int idx, max; > /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1. */ > - if (maxidx) > + if (extract_min_max_idx_from_array (decl_type, &idx, &max)) > { > - gcc_assert (TREE_CODE (maxidx) == INTEGER_CST); > tree domain = TYPE_DOMAIN (decl_type); > - /* MINIDX and MAXIDX are inclusive, and must be interpreted in > - DOMAIN (e.g. signed int, whereas min/max may be size_int). */ > - offset_int idx = wi::to_offset (minidx); > - offset_int max = wi::to_offset (maxidx); > - if (!TYPE_UNSIGNED (domain)) > - { > - idx = wi::sext (idx, TYPE_PRECISION (domain)); > - max = wi::sext (max, TYPE_PRECISION (domain)); > - } > for (int el_off = offset; idx <= max; ++idx) > { > tree nref = build4 (ARRAY_REF, elemtype, > @@ -1088,10 +1146,10 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref) > } > > /* Create total_scalarization accesses for a member of type TYPE, which must > - satisfy either is_gimple_reg_type or scalarizable_type_p. BASE must be the > - top-most VAR_DECL representing the variable; within that, POS and SIZE locate > - the member, REVERSE gives its torage order. and REF must be the reference > - expression for it. */ > + satisfy either is_gimple_reg_type or simple_mix_of_records_and_arrays_p. > + BASE must be the top-most VAR_DECL representing the variable; within that, > + POS and SIZE locate the member, REVERSE gives its torage order. and REF must > + be the reference expression for it. */ > > static void > scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse, > @@ -1111,7 +1169,8 @@ scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse, > } > > /* Create a total_scalarization access for VAR as a whole. VAR must be of a > - RECORD_TYPE or ARRAY_TYPE conforming to scalarizable_type_p. */ > + RECORD_TYPE or ARRAY_TYPE conforming to > + simple_mix_of_records_and_arrays_p. */ > > static void > create_total_scalarization_access (tree var) > @@ -2803,8 +2862,9 @@ analyze_all_variable_accesses (void) > { > tree var = candidate (i); > > - if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var), > - constant_decl_p (var))) > + if (VAR_P (var) > + && simple_mix_of_records_and_arrays_p (TREE_TYPE (var), > + constant_decl_p (var), NULL)) > { > if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) > <= max_scalarization_size) > diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h > new file mode 100644 > index 00000000000..2857688b21e > --- /dev/null > +++ b/gcc/tree-sra.h > @@ -0,0 +1,34 @@ > +/* tree-sra.h - Run-time parameters. > + Copyright (C) 2017 Free Software Foundation, Inc. > + > +This file is part of GCC. 
> + > +GCC is free software; you can redistribute it and/or modify it under > +the terms of the GNU General Public License as published by the Free > +Software Foundation; either version 3, or (at your option) any later > +version. > + > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY > +WARRANTY; without even the implied warranty of MERCHANTABILITY or > +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License > +for more details. > + > +You should have received a copy of the GNU General Public License > +along with GCC; see the file COPYING3. If not see > +<http://www.gnu.org/licenses/>. */ > + > +#ifndef TREE_SRA_H > +#define TREE_SRA_H > + > + > +bool simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays, > + int *count_pg); > +bool extract_min_max_idx_from_array (tree type, offset_int *idx, > + offset_int *max); > +tree build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset, > + bool reverse, tree exp_type, > + gimple_stmt_iterator *gsi, bool insert_after); > + > + > + > +#endif /* TREE_SRA_H */ > -- > 2.14.2 >
Hi, On Mon, 13 Nov 2017, Richard Biener wrote: > The main concern here is that GIMPLE is not very well defined for > aggregate copies and that gimple-fold.c happily optimizes > memcpy (&a, &b, sizeof (a)) into a = b; What you missed to mention is that we then discussed about rectifying this situation by defining GIMPLE more precisely :) Effectively an aggregate assignment in GIMPLE (right now) is always defined to be a dumb block copy. We need a way to describe a member-wise copy as well. That can either be a flag on the statement or implicit by the alias type (i.e. block copies always need the ref-all alias type, all others would be member-wise copies). Then a user-written memcpy can only be rewritten into a member-wise assignment when the types contain no padding, and SRA can only look through member-wise assignments when it doesn't see all accesses to the destination (when it does then it can look also through block copies). Then this patch can be restricted to the member-wise assignments (which still helps imagemagick as the struct in question doesn't contain padding IIRC). That, or we must dumb down SRA quite much, which I don't think would be a good idea. (I'm not sure if your example would be really valid C as it changes the dynamic type of a statically typed declaration; but OTOH we shouldn't care, as in GIMPLE the example should of course be expressible) Ciao, Michael.
On Mon, Nov 13, 2017 at 2:46 PM, Michael Matz <matz@suse.de> wrote: > Hi, > > On Mon, 13 Nov 2017, Richard Biener wrote: > >> The main concern here is that GIMPLE is not very well defined for >> aggregate copies and that gimple-fold.c happily optimizes >> memcpy (&a, &b, sizeof (a)) into a = b; > > What you missed to mention is that we then discussed about rectifying this > situation by defining GIMPLE more precisely :) Effectively an aggregate > assignment in GIMPLE (right now) is always defined to be a dumb block > copy. We need a way to describe a member-wise copy as well. That can > either be a flag on the statement or implicit by the alias type (i.e. > block copies always need the ref-all alias type, all others would be > member-wise copies). Yes. Note that it's already GENERIC that needs to nail down the difference. For the folding there's the possibility of using a char[n] type with n being constant, thus a new array type for each size, or a char[] type with variable size, using a WITH_SIZE_EXPR on the RHS (but support for WITH_SIZE_EXPR isn't so good in passes so I'd rather avoid this for constant sizes). The chance here is, of course (find the PR, it exists...), that SRA then decomposes the char[] copy bytewise... That said, memcpy folding is easy to fix. The question is of course what the semantic of VIEW_CONVERTs is (SRA _does_ contain bail-outs on those). Like if you have struct A { short s; int i; } x; struct B { int i; short s; } y; void foo () { x = VIEW_CONVERT <struct A> (y); } so can you access padding via view-converting its value? Ada uses VIEW_CONVERT punning on structures a _lot_ (probably the reason for the SRA bailout). The above assignment would still be going through that aggregate copy expansion path. > Then a user-written memcpy can only be rewritten into a member-wise > assignment when the types contain no padding, and SRA can only look > through member-wise assignments when it doesn't see all accesses to the > destination (when it does then it can look also through block copies). > > Then this patch can be restricted to the member-wise assignments (which > still helps imagemagick as the struct in question doesn't contain padding > IIRC). > > That, or we must dumb down SRA quite much, which I don't think would be a > good idea. > > (I'm not sure if your example would be really valid C as it changes the > dynamic type of a statically typed declaration; but OTOH we shouldn't > care, as in GIMPLE the example should of course be expressible) Yeah, we can equally use allocated storage (but our memcpy folding then won't apply ...). Richard.
Hi, On Mon, 13 Nov 2017, Richard Biener wrote: > The chance here is, of course (find the PR, it exists...), that SRA then > decomposes the char[] copy bytewise... > > That said, memcpy folding is easy to fix. The question is of course > what the semantic of VIEW_CONVERTs is (SRA _does_ contain > bail-outs on those). Like if you have > > struct A { short s; int i; } x; > struct B { int i; short s; } y; > > void foo () > { > x = VIEW_CONVERT <struct A> (y); > } > > so can you access padding via view-converting its value? Ada uses > VIEW_CONVERT punning on structures a _lot_ (probably the reason for the > SRA bailout). I would say a VIEW_CONVERT shouldn't be allowed to inspect padding on the RHS (and expected to clobber padding on the LHS). That is, if you want to really really access padding on some struct type you can only use memcpy. (Or view-convert it to some char[N] array, perhaps there it makes sense to copy padding, i.e. regard that as a block copy). The above example shows why I'm of this opinion. Both structs have padding at different place, and it overlaps a member in the other struct. I don't see how to give that any sane meaning (beyond always handling it as block copy, and which point we can as well give up and get rid of VIEW_CONVERT_EXPR in favor of explicit memcpy). Ciao, Michael.
On November 13, 2017 3:20:16 PM GMT+01:00, Michael Matz <matz@suse.de> wrote: >Hi, > >On Mon, 13 Nov 2017, Richard Biener wrote: > >> The chance here is, of course (find the PR, it exists...), that SRA >then >> decomposes the char[] copy bytewise... >> >> That said, memcpy folding is easy to fix. The question is of course >> what the semantic of VIEW_CONVERTs is (SRA _does_ contain >> bail-outs on those). Like if you have >> >> struct A { short s; int i; } x; >> struct B { int i; short s; } y; >> >> void foo () >> { >> x = VIEW_CONVERT <struct A> (y); >> } >> >> so can you access padding via view-converting its value? Ada uses >> VIEW_CONVERT punning on structures a _lot_ (probably the reason for >the >> SRA bailout). > >I would say a VIEW_CONVERT shouldn't be allowed to inspect padding on >the >RHS (and expected to clobber padding on the LHS). That is, if you want >to >really really access padding on some struct type you can only use >memcpy. >(Or view-convert it to some char[N] array, perhaps there it makes sense >to >copy padding, i.e. regard that as a block copy). > >The above example shows why I'm of this opinion. Both structs have >padding at different place, and it overlaps a member in the other >struct. I don't see how to give that any sane meaning (beyond always >handling it as block copy, and which point we can as well give up and >get >rid of VIEW_CONVERT_EXPR in favor of explicit memcpy). Eric should know constraints important for Ada. Richard. > >Ciao, >Michael.
> The chance here is, of course (find the PR, it exists...), that SRA then > decomposes the char[] copy bytewise... > > That said, memcpy folding is easy to fix. The question is of course > what the semantic of VIEW_CONVERTs is (SRA _does_ contain > bail-outs on those). Like if you have > > struct A { short s; int i; } x; > struct B { int i; short s; } y; > > void foo () > { > x = VIEW_CONVERT <struct A> (y); > } > > so can you access padding via view-converting its value? Ada uses > VIEW_CONVERT punning on structures a _lot_ (probably the reason > for the SRA bailout). Couple of things: 1. We have been trying to get rid of VIEW_CONVERT as much as possible in the Ada compiler for a number of years now. 2. Padding is garbage in Ada and thus its contents cannot have any effect on the execution of legal programs (and SRA already killed any hope of preserving padding a long time ago for not-so-legal programs anyway).
Hi, I thought I sent the following email last Friday but found it in my drafts folder right now, so let me send it now so that anybody interested can see what the patch does on Haswell. I have only skimmed through new messages in the thread. I am now looking into something else right now but will get back to this matter next week at the latest. On Fri, Nov 03, 2017 at 05:38:30PM +0100, Martin Jambor wrote: > ... > > Anyway, here are the numbers. They were taken on two different > Zen-based machines. I am also in the process of measuring at least > something on a Haswell machine but I started later and the machine is > quite a bit slower so I will not have the numbers until next week (and > not all equivalents in any way). I found out I do not have access to > any more modern .*Lake intel CPU. > OK, I have the numbers now too. So far I do not know why, in addition to 416.gamess, also 465.tonto failed to compile, I will investigate why only later. Because the machine is quite a bit slower and everything took forever, I have measured only unpatched trunk three times and then re-run only those benchmarks which were more than 2% off when compiled with the patched compiler. Haswell SPECINT 2006 -O2 generic tuning ======================================= Run-time -------- | Benchmark | trunk | x1 | % | |----------------+-------+------+-------| | 400.perlbench | 775 | 777 | +0.26 | | 401.bzip2 | 1200 | 1200 | +0.00 | | 403.gcc | 655 | 656 | +0.15 | | 429.mcf | 547 | 517 | -5.48 | | 445.gobmk | 1140 | 1140 | +0.00 | | 456.hmmer | 1130 | 1130 | +0.00 | | 458.sjeng | 1310 | 1300 | -0.76 | | 462.libquantum | 758 | 751 | -0.92 | | 464.h264ref | 1370 | 1390 | +1.46 | | 471.omnetpp | 475 | 471 | -0.84 | | 473.astar | 870 | 867 | -0.34 | | 483.xalancbmk | 488 | 486 | -0.41 | Text size --------- | Benchmark | trunk | x1 | % | |----------------+---------+---------+-------| | 400.perlbench | 875874 | 875954 | +0.01 | | 401.bzip2 | 44754 | 44754 | +0.00 | | 403.gcc | 2294466 | 2296098 | +0.07 | | 429.mcf | 8226 | 8226 | +0.00 | | 445.gobmk | 579778 | 579826 | +0.01 | | 456.hmmer | 221058 | 221058 | +0.00 | | 458.sjeng | 93362 | 94882 | +1.63 | | 462.libquantum | 28314 | 28362 | +0.17 | | 464.h264ref | 393874 | 393922 | +0.01 | | 471.omnetpp | 430306 | 430418 | +0.03 | | 473.astar | 29362 | 29538 | +0.60 | | 483.xalancbmk | 2361298 | 2361506 | +0.01 | Haswell SPECINT 2006 -Ofast native tuning ========================================= Run-time -------- | Benchmark | trunk | x1 | % | |----------------+-------+------+-------| | 400.perlbench | 802 | 803 | +0.12 | | 401.bzip2 | 1180 | 1170 | -0.85 | | 403.gcc | 646 | 647 | +0.15 | | 429.mcf | 543 | 508 | -6.45 | | 445.gobmk | 1130 | 1130 | +0.00 | | 456.hmmer | 529 | 532 | +0.57 | | 458.sjeng | 1260 | 1260 | +0.00 | | 462.libquantum | 764 | 761 | -0.39 | | 464.h264ref | 1280 | 1290 | +0.78 | | 471.omnetpp | 476 | 464 | -2.52 | | 473.astar | 844 | 843 | -0.12 | | 483.xalancbmk | 480 | 476 | -0.83 | Text size --------- | Benchmark | trunk | x1 | % | |----------------+---------+---------+-------| | 400.perlbench | 1130994 | 1131058 | +0.01 | | 401.bzip2 | 77346 | 77346 | +0.00 | | 403.gcc | 3099938 | 3101826 | +0.06 | | 429.mcf | 10162 | 10162 | +0.00 | | 445.gobmk | 766706 | 766786 | +0.01 | | 456.hmmer | 346610 | 346610 | +0.00 | | 458.sjeng | 143650 | 145522 | +1.30 | | 462.libquantum | 30986 | 31066 | +0.26 | | 464.h264ref | 725218 | 725266 | +0.01 | | 471.omnetpp | 546386 | 546642 | +0.05 | | 473.astar | 38690 | 38914 | +0.58 | | 483.xalancbmk | 3313746 | 3313922 
| +0.01 | Haswell SPECFP 2006 -O2 generic tuning ====================================== Run-time -------- | Benchmark | trunk | x1 | % | |---------------+-------+------+-------| | 410.bwaves | 833 | 831 | -0.24 | | 416.gamess | NR | NR | | | 433.milc | 820 | 814 | -0.73 | | 434.zeusmp | 950 | 949 | -0.11 | | 435.gromacs | 945 | 946 | +0.11 | | 436.cactusADM | 1380 | 1380 | +0.00 | | 437.leslie3d | 813 | 812 | -0.12 | | 444.namd | 983 | 983 | +0.00 | | 447.dealII | 755 | 759 | +0.53 | | 450.soplex | 467 | 464 | -0.64 | | 453.povray | 402 | 395 | -1.74 | | 454.calculix | 1980 | 1980 | +0.00 | | 459.GemsFDTD | 765 | 753 | -1.57 | | 465.tonto | NR | NR | | | 470.lbm | 806 | 806 | +0.00 | | 481.wrf | 1330 | 1330 | +0.00 | | 482.sphinx3 | 1380 | 1380 | +0.00 | Text size --------- | Benchmark | trunk | x1 | % | |---------------+---------+---------+-------| | 410.bwaves | 25954 | 25954 | +0.00 | | 433.milc | 87922 | 87922 | +0.00 | | 434.zeusmp | 212034 | 212034 | +0.00 | | 435.gromacs | 747026 | 747026 | +0.00 | | 436.cactusADM | 526178 | 526178 | +0.00 | | 437.leslie3d | 83234 | 83234 | +0.00 | | 444.namd | 297234 | 297266 | +0.01 | | 447.dealII | 2165282 | 2172290 | +0.32 | | 450.soplex | 347122 | 347122 | +0.00 | | 453.povray | 800914 | 801570 | +0.08 | | 454.calculix | 1342802 | 1342802 | +0.00 | | 459.GemsFDTD | 353410 | 354050 | +0.18 | | 470.lbm | 9202 | 9202 | +0.00 | | 481.wrf | 3345170 | 3345170 | +0.00 | | 482.sphinx3 | 125026 | 125026 | +0.00 | Haswell SPECFP 2006 -Ofast native tuning ======================================== Run-time -------- | Benchmark | trunk | x1 | % | |---------------+-------+------+-------| | 410.bwaves | 551 | 550 | -0.18 | | 416.gamess | NR | NR | | | 433.milc | 773 | 776 | +0.39 | | 434.zeusmp | 660 | 660 | +0.00 | | 435.gromacs | 876 | 874 | -0.23 | | 436.cactusADM | 620 | 619 | -0.16 | | 437.leslie3d | 501 | 501 | +0.00 | | 444.namd | 974 | 974 | +0.00 | | 447.dealII | 722 | 720 | -0.28 | | 450.soplex | 459 | 457 | -0.44 | | 453.povray | 416 | 410 | -1.44 | | 454.calculix | 883 | 882 | -0.11 | | 459.GemsFDTD | 625 | 614 | -1.76 | | 465.tonto | NR | NR | | | 470.lbm | 783 | 781 | -0.26 | | 481.wrf | 748 | 746 | -0.27 | | 482.sphinx3 | 1020 | 1020 | +0.00 | Text size --------- | Benchmark | trunk | x1 | % | |---------------+---------+---------+-------| | 410.bwaves | 30802 | 30802 | +0.00 | | 433.milc | 122450 | 122450 | +0.00 | | 434.zeusmp | 613458 | 613458 | +0.00 | | 435.gromacs | 957922 | 957922 | +0.00 | | 436.cactusADM | 763794 | 763794 | +0.00 | | 437.leslie3d | 154690 | 154690 | +0.00 | | 444.namd | 311282 | 311314 | +0.01 | | 447.dealII | 2486482 | 2493202 | +0.27 | | 450.soplex | 436322 | 436322 | +0.00 | | 453.povray | 1088034 | 1088962 | +0.09 | | 454.calculix | 1701410 | 1701410 | +0.00 | | 459.GemsFDTD | 560642 | 560658 | +0.00 | | 470.lbm | 9458 | 9458 | +0.00 | | 481.wrf | 5413554 | 5413778 | +0.00 | | 482.sphinx3 | 190034 | 190034 | +0.00 | Haswell SPEC INTrate 2017 -O2 generic tuning ============================================ Run-time -------- | Benchmark | trunk | x1 | % | |-----------------+-------+------+-------| | 500.perlbench_r | 1201 | 1204 | +0.25 | | 502.gcc_r | 798 | 793 | -0.63 | | 505.mcf_r | 1038 | 1049 | +1.06 | | 520.omnetpp_r | 825 | 824 | -0.12 | | 523.xalancbmk_r | 985 | 981 | -0.41 | | 525.x264_r | 1463 | 1463 | +0.00 | | 531.deepsjeng_r | 954 | 956 | +0.21 | | 541.leela_r | 1570 | 1569 | -0.06 | | 548.exchange2_r | 1266 | 1267 | +0.08 | | 557.xz_r | 1033 | 1029 | -0.39 | Test size --------- | Benchmark | trunk | x1 | % 
| |-----------------+---------+---------+-------| | 500.perlbench_r | 1599442 | 1599522 | +0.01 | | 502.gcc_r | 6757602 | 6759090 | +0.02 | | 505.mcf_r | 16098 | 16098 | +0.00 | | 520.omnetpp_r | 1262498 | 1264034 | +0.12 | | 523.xalancbmk_r | 3989026 | 3989202 | +0.00 | | 525.x264_r | 414130 | 414194 | +0.02 | | 531.deepsjeng_r | 67426 | 67458 | +0.05 | | 541.leela_r | 219378 | 219378 | +0.00 | | 548.exchange2_r | 61234 | 61234 | +0.00 | | 557.xz_r | 111490 | 111490 | +0.00 | Haswell SPEC INTrate 2017 -Ofast native tuning ============================================== Run-time -------- | Benchmark | trunk | x1 | % | |-----------------+-------+------+--------| | 500.perlbench_r | 1169 | 1170 | +0.09 | | 502.gcc_r | 786 | 788 | +0.25 | | 505.mcf_r | 1034 | 1032 | -0.19 | | 520.omnetpp_r | 804 | 794 | -1.24 | | 523.xalancbmk_r | 962 | 971 | +0.94 | | 525.x264_r | 886 | 887 | +0.11 | | 531.deepsjeng_r | 939 | 944 | +0.53 | | 541.leela_r | 1462 | 1461 | -0.07 | | 548.exchange2_r | 1078 | 1082 | +0.37 | | 557.xz_r | 960 | 950 | -1.04 | Text size --------- | Benchmark | trunk | x1 | % | |-----------------+---------+---------+-------| | 500.perlbench_r | 2074450 | 2074498 | +0.00 | | 502.gcc_r | 8434514 | 8437250 | +0.03 | | 505.mcf_r | 26322 | 26322 | +0.00 | | 520.omnetpp_r | 1680082 | 1682130 | +0.12 | | 523.xalancbmk_r | 4853458 | 4853682 | +0.00 | | 525.x264_r | 594210 | 594210 | +0.00 | | 531.deepsjeng_r | 88050 | 88082 | +0.04 | | 541.leela_r | 269298 | 269314 | +0.01 | | 548.exchange2_r | 114098 | 114098 | +0.00 | | 557.xz_r | 152354 | 152354 | +0.00 | Haswell SPEC FP rate 2017 - generic tuning ========================================== Run-time -------- | Benchmark | trunk | x1 | % | |-----------------+-------+------+--------| | 503.bwaves_r | 2319 | 2343 | +1.03 | | 507.cactuBSSN_r | 1023 | 975 | -4.69 | | 508.namd_r | 934 | 935 | +0.11 | | 510.parest_r | 1391 | 1413 | +1.58 | | 511.povray_r | 1544 | 1570 | +1.68 | | 519.lbm_r | 920 | 920 | +0.00 | | 521.wrf_r | 2955 | 2958 | +0.10 | | 526.blender_r | 976 | 974 | -0.20 | | 527.cam4_r | 1580 | 1586 | +0.38 | | 538.imagick_r | 1758 | 1581 | -10.07 | | 544.nab_r | 1357 | 1356 | -0.07 | | 549.fotonik3d_r | 1063 | 1077 | +1.32 | | 554.roms_r | 1280 | 1283 | +0.23 | Text size --------- | Benchmark | trunk | x1 | % | |-----------------+----------+----------+-------| | 503.bwaves_r | 32034 | 32034 | +0.00 | | 507.cactuBSSN_r | 2951634 | 2951634 | +0.00 | | 508.namd_r | 837458 | 837490 | +0.00 | | 510.parest_r | 6540866 | 6546754 | +0.09 | | 511.povray_r | 803618 | 804274 | +0.08 | | 519.lbm_r | 12018 | 12018 | +0.00 | | 521.wrf_r | 16292962 | 16296978 | +0.02 | | 526.blender_r | 7268224 | 7282608 | +0.20 | | 527.cam4_r | 5063666 | 5065010 | +0.03 | | 538.imagick_r | 1608178 | 1609282 | +0.07 | | 544.nab_r | 156242 | 156242 | +0.00 | | 549.fotonik3d_r | 326738 | 326738 | +0.00 | | 554.roms_r | 728546 | 728546 | +0.00 | Haswell SPEC FP rate 2017 - native tuning ========================================= Run-time -------- | Benchmark | trunk | x1 | % | |-----------------+-------+------+-------| | 503.bwaves_r | 919 | 919 | +0.00 | | 507.cactuBSSN_r | 864 | 853 | -1.27 | | 508.namd_r | 924 | 924 | +0.00 | | 510.parest_r | 1219 | 1220 | +0.08 | | 511.povray_r | 1597 | 1624 | +1.69 | | 519.lbm_r | 851 | 851 | +0.00 | | 521.wrf_r | 1591 | 1594 | +0.19 | | 526.blender_r | 912 | 920 | +0.88 | | 527.cam4_r | 1296 | 1309 | +1.00 | | 538.imagick_r | 1227 | 1207 | -1.63 | | 544.nab_r | 1278 | 1278 | +0.00 | | 549.fotonik3d_r | VE | VE | | | 554.roms_r | 1036 
| 1037 | +0.10 | Text size --------- | Benchmark | trunk | x1 | % | |-----------------+----------+----------+-------| | 503.bwaves_r | 39426 | 39426 | +0.00 | | 507.cactuBSSN_r | 3991794 | 3991794 | +0.00 | | 508.namd_r | 956450 | 956466 | +0.00 | | 510.parest_r | 7341122 | 7345426 | +0.06 | | 511.povray_r | 1083010 | 1083938 | +0.09 | | 519.lbm_r | 11826 | 11826 | +0.00 | | 521.wrf_r | 22028578 | 22032098 | +0.02 | | 526.blender_r | 9698768 | 9718544 | +0.20 | | 527.cam4_r | 6738562 | 6740050 | +0.02 | | 538.imagick_r | 2246674 | 2247122 | +0.02 | | 544.nab_r | 211378 | 211378 | +0.00 | | 549.fotonik3d_r | 582626 | 582626 | +0.00 | | 554.roms_r | 1085234 | 1085234 | +0.00 | Martin
Hi,

On Mon, Nov 13 2017, Richard Biener wrote:
> The main concern here is that GIMPLE is not very well defined for
> aggregate copies and that gimple-fold.c happily optimizes
> memcpy (&a, &b, sizeof (a)) into a = b;
>
> struct A { short s; long i; long j; };
> struct A a, b;
> void foo ()
> {
>   __builtin_memcpy (&a, &b, sizeof (struct A));
> }
>
> gets folded to
>
>   MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})&b];
>   return;
>
> you see we're careful about TBAA but (you don't see it above, though it
> can be verified by for example debugging expand_assignment)
> TREE_TYPE (MEM[...]) is actually 'struct A'.
>
> And yes, I've been worried about SRA as well here... it _does_
> have some early outs when seeing VIEW_CONVERT_EXPR but
> apparently not for the above.  Testcase that aborts with SRA but
> not without:
>
> struct A { short s; long i; long j; };
> struct A a, b;
> void foo ()
> {
>   struct A c;
>   __builtin_memcpy (&c, &b, sizeof (struct A));
>   __builtin_memcpy (&a, &c, sizeof (struct A));
> }
> int main()
> {
>   __builtin_memset (&b, 0, sizeof (struct A));
>   b.s = 1;
>   __builtin_memcpy ((char *)&b+2, &b, 2);
>   foo ();
>   __builtin_memcpy (&a, (char *)&a+2, 2);
>   if (a.s != 1)
>     __builtin_abort ();
>   return 0;
> }

Thanks for the testcase, I agree that is a fairly big problem.  Do you
think that the following (untested) patch is an appropriate way of
fixing it and generally of extending gimple to capture that a statement
is a bit-copy?

If so, I'll add the testcase, bootstrap it and formally propose it.
Subsequently I will of course make sure that any element-wise copying
patch would test the predicate.

Thanks,

Martin


2017-11-23  Martin Jambor  <mjambor@suse.cz>

	* gimple.c (gimple_bit_copy_p): New function.
	* gimple.h (gimple_bit_copy_p): Declare it.
	* tree-sra.c (sra_modify_assign): Use it.
---
 gcc/gimple.c   | 20 ++++++++++++++++++++
 gcc/gimple.h   |  1 +
 gcc/tree-sra.c |  1 +
 3 files changed, 22 insertions(+)

diff --git a/gcc/gimple.c b/gcc/gimple.c
index c986a732004..e1b428d91bb 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -3087,6 +3087,26 @@ gimple_inexpensive_call_p (gcall *stmt)
   return false;
 }

+/* Return true if STMT is an assignment performing bit copy and so is also
+   expected to copy any padding.  */
+
+bool
+gimple_bit_copy_p (gassign *stmt)
+{
+  if (!gimple_assign_single_p (stmt))
+    return false;
+
+  tree lhs = gimple_assign_lhs (stmt);
+  if (TREE_CODE (lhs) == MEM_REF
+      && TYPE_REF_CAN_ALIAS_ALL (reference_alias_ptr_type (lhs)))
+    return true;
+  tree rhs = gimple_assign_rhs1 (stmt);
+  if (TREE_CODE (rhs) == MEM_REF
+      && TYPE_REF_CAN_ALIAS_ALL (reference_alias_ptr_type (rhs)))
+    return true;
+  return false;
+}
+
 #if CHECKING_P

 namespace selftest {
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 334def89398..60929473361 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -1531,6 +1531,7 @@ extern void gimple_seq_discard (gimple_seq);
 extern void maybe_remove_unused_call_args (struct function *, gimple *);
 extern bool gimple_inexpensive_call_p (gcall *);
 extern bool stmt_can_terminate_bb_p (gimple *);
+extern bool gimple_bit_copy_p (gassign *);

 /* Formal (expression) temporary table handling: multiple occurrences of
    the same scalar expression are evaluated into the same temporary.  */
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index db490b20c3e..fc0a8fe60bf 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -3591,6 +3591,7 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
       || gimple_has_volatile_ops (stmt)
       || contains_vce_or_bfcref_p (rhs)
       || contains_vce_or_bfcref_p (lhs)
+      || gimple_bit_copy_p (as_a <gassign *> (stmt))
       || stmt_ends_bb_p (stmt))
     {
       /* No need to copy into a constant-pool, it comes pre-initialized.  */
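As an illustration of the statement shape the predicate above is after, a
minimal compilable sketch (x86_64 assumed; the function name is made up,
and the dump line is the one quoted from the discussion):

/* Compile with e.g. -O2 -fdump-tree-ccp1; ccp1 is the dump the existing
   testsuite greps for this folding.  gimple-fold turns the memcpy into
   an aggregate copy through ref-all MEM_REFs:

     MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})&b];

   which is exactly the pattern gimple_bit_copy_p tests for.  */

struct A { short s; long i; long j; };
struct A a, b;

void
copy_a (void)
{
  __builtin_memcpy (&a, &b, sizeof (struct A));
}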
On Thu, Nov 23, 2017 at 04:32:43PM +0100, Martin Jambor wrote:
> > struct A { short s; long i; long j; };
> > struct A a, b;
> > void foo ()
> > {
> >   struct A c;
> >   __builtin_memcpy (&c, &b, sizeof (struct A));
> >   __builtin_memcpy (&a, &c, sizeof (struct A));
> > }
> > int main()
> > {
> >   __builtin_memset (&b, 0, sizeof (struct A));
> >   b.s = 1;
> >   __builtin_memcpy ((char *)&b+2, &b, 2);
> >   foo ();
> >   __builtin_memcpy (&a, (char *)&a+2, 2);
> >   if (a.s != 1)
> >     __builtin_abort ();
> >   return 0;
> > }

Note the testcase would need to be guarded with sizeof (short) == 2 and
offsetof (struct A, i) >= 4.

> Thanks for the testcase, I agree that is a fairly big problem.  Do you
> think that the following (untested) patch is an appropriate way of
> fixing it and generally of extending gimple to capture that a statement
> is a bit-copy?

Can you bail out just if the type contains any padding?  If there is no
padding, then perhaps SRA still might do its stuff (though, e.g. if it
contains bitfields, we'd need to hope store-merging merges it all back
again).

	Jakub
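A minimal sketch of the guard Jakub suggests (hypothetical test
scaffolding; offsetof comes from <stddef.h>):

#include <stddef.h>

struct A { short s; long i; long j; };

int
main (void)
{
  /* The test is only meaningful when there really is padding between
     's' and 'i'; otherwise exit quietly and let it pass vacuously.  */
  if (sizeof (short) != 2 || offsetof (struct A, i) < 4)
    return 0;

  /* ... body of the original testcase would follow here ...  */
  return 0;
}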
On Thu, Nov 23, 2017 at 4:32 PM, Martin Jambor <mjambor@suse.cz> wrote:
> Hi,
>
> On Mon, Nov 13 2017, Richard Biener wrote:
>> The main concern here is that GIMPLE is not very well defined for
>> aggregate copies and that gimple-fold.c happily optimizes
>> memcpy (&a, &b, sizeof (a)) into a = b;
>>
>> [... snip: testcase quoted in full earlier in the thread ...]
>
> Thanks for the testcase, I agree that is a fairly big problem.  Do you
> think that the following (untested) patch is an appropriate way of
> fixing it and generally of extending gimple to capture that a statement
> is a bit-copy?

I think the place to fix is the memcpy folding.  That is, we'd say that
aggregate assignments are not bit-copies but do element-wise assignments.
For memcpy folding we'd then need to use a type that doesn't contain
padding.  Which effectively means char[].

Of course we need to stop SRA from decomposing that copy to
individual characters then ;)

So iff we decide that all aggregate copies are element copies,
maybe only those where TYPE_MAIN_VARIANT of lhs and rhs match
(currently we allow TYPE_CANONICAL compatibility and thus there
might be some mismatches), then we have to fix nothing but
the memcpy folding.

> If so, I'll add the testcase, bootstrap it and formally propose it.
> Subsequently I will of course make sure that any element-wise copying
> patch would test the predicate.

I don't think the alias-set should determine whether a copy is
bit-wise or not.

Richard.

> [... snip: proposed gimple_bit_copy_p patch quoted in full earlier in
> the thread ...]
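To visualize the proposal, a hedged before/after sketch of the folded
statement (dump syntax approximate; the 24-byte figure assumes the usual
x86_64 layout of struct A):

  /* current: access type is 'struct A', padding semantics unclear */
  MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})&b];

  /* proposed: access type is a padding-free char array, a true bit-copy */
  MEM <unsigned char[24]> [(char * {ref-all})&a]
    = MEM <unsigned char[24]> [(char * {ref-all})&b];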
On Fri, Nov 24, 2017 at 11:31 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Thu, Nov 23, 2017 at 4:32 PM, Martin Jambor <mjambor@suse.cz> wrote:
>> [... snip: exchange quoted in full earlier in the thread ...]
>
> I don't think the alias-set should determine whether a copy is
> bit-wise or not.

Like the attached.  At least FAILs

FAIL: gcc.dg/tree-ssa/ssa-ccp-27.c scan-tree-dump-times ccp1
"memcpy[^\n]*123456" 2 (found 0 times)

not sure why we have this test.

Richard.
On Fri, Nov 24, 2017 at 11:57 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Nov 24, 2017 at 11:31 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> [... snip: exchange quoted in full earlier in the thread ...]
>
> Like the attached.  At least FAILs
>
> FAIL: gcc.dg/tree-ssa/ssa-ccp-27.c scan-tree-dump-times ccp1
> "memcpy[^\n]*123456" 2 (found 0 times)
>
> not sure why we have this test.

Hum.  And SRA still decomposes the copy to struct elements w/o padding
even though the access is done using char[].  So somehow it ignores
VIEW_CONVERT_EXPRs (well, those implicitly present on MEM_REFs).

Looks like this is because total scalarization is done on the decls
and does not at all honor how the variable is accessed?  The following
seems to fix that, otherwise untested.

Index: gcc/tree-sra.c
===================================================================
--- gcc/tree-sra.c      (revision 255137)
+++ gcc/tree-sra.c      (working copy)
@@ -1338,7 +1338,9 @@ build_accesses_from_assign (gimple *stmt
     {
       racc->grp_assignment_read = 1;
       if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
-          && !is_gimple_reg_type (racc->type))
+          && !is_gimple_reg_type (racc->type)
+          && (TYPE_MAIN_VARIANT (racc->type)
+              == TYPE_MAIN_VARIANT (TREE_TYPE (racc->base))))
         bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID (racc->base));
       if (storage_order_barrier_p (lhs))
         racc->grp_unscalarizable_region = 1;

I'm giving this full testing with the folding fix.

Richard.
Hi Richi,

On Fri, Nov 24 2017, Richard Biener wrote:
> [... snip: exchange quoted in full earlier in the thread ...]
>
> Hum.  And SRA still decomposes the copy to struct elements w/o padding
> even though the access is done using char[].  So somehow it ignores
> VIEW_CONVERT_EXPRs (well, those implicitly present on MEM_REFs).

Yes.  SRA is not even too afraid of top-level V_C_Es.  It really bails
out only if they are buried under a handled_component.  And it does not
remove aggregate assignments containing them.

> Looks like this is because total scalarization is done on the decls
> and does not at all honor how the variable is accessed?  The following
> seems to fix that, otherwise untested.
>
> Index: gcc/tree-sra.c
> ===================================================================
> --- gcc/tree-sra.c      (revision 255137)
> +++ gcc/tree-sra.c      (working copy)
> @@ -1338,7 +1338,9 @@ build_accesses_from_assign (gimple *stmt
>      {
>        racc->grp_assignment_read = 1;
>        if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
> -          && !is_gimple_reg_type (racc->type))
> +          && !is_gimple_reg_type (racc->type)
> +          && (TYPE_MAIN_VARIANT (racc->type)
> +              == TYPE_MAIN_VARIANT (TREE_TYPE (racc->base))))
>          bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID (racc->base));
>        if (storage_order_barrier_p (lhs))
>          racc->grp_unscalarizable_region = 1;

I believe that the added condition is not what you want, this seems
to trigger also for ordinary:

  s1 = s2.field

where racc->type is the type of the field but racc->base is s2 and its
type is the type of the structure.

I also think you want to be setting a bit in
cannot_scalarize_away_bitmap in order to guarantee that total
scalarization will not happen for the given candidate.  Otherwise some
other regular assignment might trigger it ... except if we then also
checked the statement for bit-copying types in sra_modify_assign (in the
condition after the big comment), which I suppose is actually the
correct thing to do.

Thanks a lot for the folding patch, I can take over the SRA bits if you
want to.

Martin
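For readers following along, a small compilable sketch of the ordinary
case Martin describes (hypothetical types; the point is only that
racc->type and TREE_TYPE (racc->base) differ for a plain field read):

struct Inner { int x; int y; };
struct Outer { struct Inner inner; int z; };

struct Outer s2;

struct Inner
read_field (void)
{
  /* Here racc->base is 's2' (type struct Outer) while racc->type is
     struct Inner, so a TYPE_MAIN_VARIANT (racc->type)
     == TYPE_MAIN_VARIANT (TREE_TYPE (racc->base)) test fails even
     though no type punning is going on.  */
  struct Inner s1 = s2.inner;
  return s1;
}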
On Fri, Nov 24, 2017 at 12:53 PM, Martin Jambor <mjambor@suse.cz> wrote:
> [... snip: exchange and quoted patch snipped, shown in full earlier in
> the thread ...]
>
> I believe that the added condition is not what you want, this seems
> to trigger also for ordinary:
>
>   s1 = s2.field
>
> where racc->type is the type of the field but racc->base is s2 and its
> type is the type of the structure.

Yes.  But do we want to totally scalarize s2 in this case?  We only
access parts of it.  We don't seem to have a testcase that fails
(well, full testing still in progress).

> I also think you want to be setting a bit in
> cannot_scalarize_away_bitmap in order to guarantee that total
> scalarization will not happen for the given candidate.  Otherwise some
> other regular assignment might trigger it ...

Yeah, figured that out myself.

> except if we then also
> checked the statement for bit-copying types in sra_modify_assign (in the
> condition after the big comment), which I suppose is actually the
> correct thing to do.

But modification is too late, no?

> Thanks a lot for the folding patch, I can take over the SRA bits if you
> want to.

For reference below is the full patch.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Ok for the SRA parts?

Thanks,
Richard.

2017-11-24  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/83141
	* gimple-fold.c (gimple_fold_builtin_memory_op): Simplify
	aggregate copy generation by always using an unsigned char[]
	type to perform the copying.
	* tree-sra.c (build_accesses_from_assign): Disqualify accesses
	in non-native type for total scalarization.

	* gcc.dg/torture/pr83141.c: New testcase.
	* gcc.dg/tree-ssa/ssa-ccp-27.c: Adjust.
On Fri, Nov 24 2017, Richard Biener wrote:
> On Fri, Nov 24, 2017 at 12:53 PM, Martin Jambor <mjambor@suse.cz> wrote:
>> [... snip: exchange and quoted patches snipped, shown in full earlier
>> in the thread ...]
>>
>> I believe that the added condition is not what you want, this seems
>> to trigger also for ordinary:
>>
>>   s1 = s2.field
>>
>> where racc->type is the type of the field but racc->base is s2 and its
>> type is the type of the structure.
>
> Yes.  But do we want to totally scalarize s2 in this case?  We only
> access parts of it.  We don't seem to have a testcase that fails
> (well, full testing still in progress).

If we start with

  small_s1 = src;
  dst = small_s1->even_smaller_struct_field;

and small_s1 is otherwise unused, I think that we want to facilitate
copy propagation with total scalarization too.

>> I also think you want to be setting a bit in
>> cannot_scalarize_away_bitmap in order to guarantee that total
>> scalarization will not happen for the given candidate.  Otherwise some
>> other regular assignment might trigger it ...
>
> Yeah, figured that out myself.
>
>> except if we then also
>> checked the statement for bit-copying types in sra_modify_assign (in the
>> condition after the big comment), which I suppose is actually the
>> correct thing to do.
>
> But modification is too late, no?

No, at modification time we still decide how to deal with an aggregate
assignment.  That can be done either pessimistically, by storing all RHS
replacements back into the original aggregate, leaving the original
assignment in place and then loading all LHS replacements from the LHS
aggregate, or optimistically, by trying to load LHS replacements either
from RHS replacements or at least from the RHS, and by storing RHS
replacements directly to the LHS, hoping to eliminate the original load.
It is this optimistic approach, rather than total scalarization, that we
need to disable.  In fact, just disabling total scalarization is not
enough if SRA can come across accesses to the components from elsewhere
in the function body; your patch unfortunately still fails for:

volatile short vs;
volatile long vl;

struct A { short s; long i; long j; };
struct A a, b;
void foo ()
{
  struct A c;
  __builtin_memcpy (&c, &b, sizeof (struct A));
  __builtin_memcpy (&a, &c, sizeof (struct A));

  vs = c.s;
  vl = c.i;
  vl = c.j;
}
int main()
{
  __builtin_memset (&b, 0, sizeof (struct A));
  b.s = 1;
  __builtin_memcpy ((char *)&b+2, &b, 2);
  foo ();
  __builtin_memcpy (&a, (char *)&a+2, 2);
  if (a.s != 1)
    __builtin_abort ();
  return 0;
}

>> Thanks a lot for the folding patch, I can take over the SRA bits if you
>> want to.
>
> For reference below is the full patch.
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>
> Ok for the SRA parts?

My preferred SRA part would be:

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index db490b20c3e..7a0e4d1ae26 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -1302,6 +1302,17 @@ comes_initialized_p (tree base)
   return TREE_CODE (base) == PARM_DECL || constant_decl_p (base);
 }

+/* Return true if REF is a MEM_REF which changes the type of the data it
+   accesses.  */
+
+static bool
+type_changing_mem_ref_p (tree ref)
+{
+  return (TREE_CODE (ref) == MEM_REF
+          && (TYPE_MAIN_VARIANT (TREE_TYPE (ref))
+              != TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (ref, 0)))));
+}
+
 /* Scan expressions occurring in STMT, create access structures for all accesses
    to candidates for scalarization and remove those candidates which occur in
    statements or expressions that prevent them from being split apart.  Return
@@ -1338,7 +1349,8 @@ build_accesses_from_assign (gimple *stmt)
     {
       racc->grp_assignment_read = 1;
       if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
-          && !is_gimple_reg_type (racc->type))
+          && !is_gimple_reg_type (racc->type)
+          && !type_changing_mem_ref_p (rhs))
         bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID (racc->base));
       if (storage_order_barrier_p (lhs))
         racc->grp_unscalarizable_region = 1;
@@ -3589,6 +3601,8 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)

   if (modify_this_stmt
       || gimple_has_volatile_ops (stmt)
+      || type_changing_mem_ref_p (lhs)
+      || type_changing_mem_ref_p (rhs)
       || contains_vce_or_bfcref_p (rhs)
       || contains_vce_or_bfcref_p (lhs)
       || stmt_ends_bb_p (stmt))

I kept the condition in build_accesses_from_assign in order not to do
unnecessary total-scalarization work, but it is the conditions in
sra_modify_assign that actually ensure we keep the assignment intact.

It still would not do what Jakub asked for, i.e. keep the old behavior
if there is no padding.  But that should be done at the gimple-fold
level too, I believe.

Thanks,

Martin
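To make the intended behaviour concrete, a small sketch (my reading of
the patch above, not part of it; function names are made up):

struct A { short s; long i; long j; };
struct A a, b;

void
plain_copy (void)
{
  /* Source-level aggregate copy: the access type and the object type
     agree, so the new predicate stays false and SRA may still optimize
     the statement.  */
  a = b;
}

void
folded_copy (void)
{
  /* Folds to a copy through type-punning MEM_REFs; with the patch the
     statement is treated much like one containing a VIEW_CONVERT_EXPR
     and is kept intact, so padding keeps being copied.  */
  __builtin_memcpy (&a, &b, sizeof (struct A));
}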
On Fri, Nov 24, 2017 at 2:00 PM, Martin Jambor <mjambor@suse.cz> wrote:
> [... snip: exchange, testcase and patch snipped, shown in full earlier
> in the thread ...]
>
> My preferred SRA part would be:
>
> [... snip: type_changing_mem_ref_p patch quoted in full above ...]

But a type-changing MEM_REF can appear at each ref base.  Maybe it's
only relevant for toplevel ones though, you should know.  So we really
need to handle it like we do VIEW_CONVERT_EXPR, and/or we need to handle
VIEW_CONVERT_EXPRs somewhere in the ref chain the same way.

So if you want to take over the SRA parts that would be nice.  I have
to dumb down the memcpy folding a bit, as we get too much fallout for
the amount I want to fix right now.

Richard.

> I kept the condition in build_accesses_from_assign in order not to do
> unnecessary total-scalarization work, but it is the conditions in
> sra_modify_assign that actually ensure we keep the assignment intact.
>
> It still would not do what Jakub asked for, i.e. keep the old behavior
> if there is no padding.  But that should be done at the gimple-fold
> level too, I believe.
>
> Thanks,
>
> Martin
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1ee8351c21f..87f602e7ead 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6511,6 +6511,10 @@ ix86_option_override_internal (bool main_args_p,
                          ix86_tune_cost->l2_cache_size,
                          opts->x_param_values,
                          opts_set->x_param_values);
+  maybe_set_param_value (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
+                         35,
+                         opts->x_param_values,
+                         opts_set->x_param_values);

   /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
   if (opts->x_flag_prefetch_loop_arrays < 0
diff --git a/gcc/expr.c b/gcc/expr.c
index 134ee731c29..dff24e7f166 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -61,7 +61,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-chkp.h"
 #include "rtl-chkp.h"
 #include "ccmp.h"
-
+#include "params.h"
+#include "tree-sra.h"

 /* If this is nonzero, we do not bother generating VOLATILE
    around volatile memory references, and we are willing to
@@ -5340,6 +5341,80 @@ emit_storent_insn (rtx to, rtx from)
   return maybe_expand_insn (code, 2, ops);
 }

+/* Generate code for copying data of type TYPE at SOURCE plus OFFSET to TARGET
+   plus OFFSET, but do so element-wise and/or field-wise for each record and
+   array within TYPE.  TYPE must either be a register type or an aggregate
+   complying with simple_mix_of_records_and_arrays_p.
+
+   If CALL_PARAM_P is nonzero, this is a store into a call param on the
+   stack, and block moves may need to be treated specially.  */
+
+static void
+emit_move_elementwise (tree type, rtx target, rtx source, HOST_WIDE_INT offset,
+                       int call_param_p)
+{
+  switch (TREE_CODE (type))
+    {
+    case RECORD_TYPE:
+      for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
+        if (TREE_CODE (fld) == FIELD_DECL)
+          {
+            HOST_WIDE_INT fld_offset = offset + int_bit_position (fld);
+            tree ft = TREE_TYPE (fld);
+            emit_move_elementwise (ft, target, source, fld_offset,
+                                   call_param_p);
+          }
+      break;

+    case ARRAY_TYPE:
+      {
+        tree elem_type = TREE_TYPE (type);
+        HOST_WIDE_INT el_size = tree_to_shwi (TYPE_SIZE (elem_type));
+        gcc_assert (el_size > 0);
+
+        offset_int idx, max;
+        /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
+        if (extract_min_max_idx_from_array (type, &idx, &max))
+          {
+            HOST_WIDE_INT el_offset = offset;
+            for (; idx <= max; ++idx)
+              {
+                emit_move_elementwise (elem_type, target, source, el_offset,
+                                       call_param_p);
+                el_offset += el_size;
+              }
+          }
+      }
+      break;
+    default:
+      machine_mode mode = TYPE_MODE (type);
+
+      rtx ntgt = adjust_address (target, mode, offset / BITS_PER_UNIT);
+      rtx nsrc = adjust_address (source, mode, offset / BITS_PER_UNIT);
+
+      /* TODO: Figure out whether the following is actually necessary.  */
+      if (target == ntgt)
+        ntgt = copy_rtx (target);
+      if (source == nsrc)
+        nsrc = copy_rtx (source);
+
+      gcc_assert (mode != VOIDmode);
+      if (mode != BLKmode)
+        emit_move_insn (ntgt, nsrc);
+      else
+        {
+          /* For example vector gimple registers can end up here.  */
+          rtx size = expand_expr (TYPE_SIZE_UNIT (type), NULL_RTX,
+                                  TYPE_MODE (sizetype), EXPAND_NORMAL);
+          emit_block_move (ntgt, nsrc, size,
+                           (call_param_p
+                            ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
+        }
+      break;
+    }
+  return;
+}
+
 /* Generate code for computing expression EXP,
    and storing the value into TARGET.
@@ -5713,9 +5788,29 @@ store_expr_with_bounds (tree exp, rtx target, int call_param_p,
         emit_group_store (target, temp, TREE_TYPE (exp),
                           int_size_in_bytes (TREE_TYPE (exp)));
       else if (GET_MODE (temp) == BLKmode)
-        emit_block_move (target, temp, expr_size (exp),
-                         (call_param_p
-                          ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
+        {
+          /* Copying smallish BLKmode structures with emit_block_move and thus
+             by-pieces can result in store-to-load stalls.  So copy some simple
+             small aggregates element or field-wise.  */
+          if (GET_MODE (target) == BLKmode
+              && AGGREGATE_TYPE_P (TREE_TYPE (exp))
+              && !TREE_ADDRESSABLE (TREE_TYPE (exp))
+              && tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (exp)))
+              && (tree_to_shwi (TYPE_SIZE (TREE_TYPE (exp)))
+                  <= (PARAM_VALUE (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY)
+                      * BITS_PER_UNIT))
+              && simple_mix_of_records_and_arrays_p (TREE_TYPE (exp), false))
+            {
+              /* FIXME: Can this happen?  What would it mean?  */
+              gcc_assert (!reverse);
+              emit_move_elementwise (TREE_TYPE (exp), target, temp, 0,
+                                     call_param_p);
+            }
+          else
+            emit_block_move (target, temp, expr_size (exp),
+                             (call_param_p
+                              ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
+        }
       /* If we emit a nontemporal store, there is nothing else to do.  */
       else if (nontemporal && emit_storent_insn (target, temp))
         ;
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 6b3d8d7364c..7d6019bbd30 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -124,6 +124,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-ccp.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "tree-sra.h"

 template <typename valtype> class ipcp_value;
diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index fa5bed49ee0..2313cc884ed 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -877,10 +877,6 @@ ipa_parm_adjustment *ipa_get_adjustment_candidate (tree **, bool *,
 void ipa_release_body_info (struct ipa_func_body_info *);
 tree ipa_get_callee_param_type (struct cgraph_edge *e, int i);

-/* From tree-sra.c:  */
-tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree,
-                           gimple_stmt_iterator *, bool);
-
 /* In ipa-cp.c  */
 void ipa_cp_c_finalize (void);
diff --git a/gcc/params.def b/gcc/params.def
index e55afc28053..5e19f1414a0 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1294,6 +1294,12 @@ DEFPARAM (PARAM_VECT_EPILOGUES_NOMASK,
           "Enable loop epilogue vectorization using smaller vector size.",
           0, 0, 1)

+DEFPARAM (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
+          "max-size-for-elementwise-copy",
+          "Maximum size in bytes of a structure or array to be considered for "
+          "copying by its individual fields or elements",
+          0, 0, 512)
+
 /*

 Local variables:
diff --git a/gcc/testsuite/gcc.target/i386/pr80689-1.c b/gcc/testsuite/gcc.target/i386/pr80689-1.c
new file mode 100644
index 00000000000..4156d4fba45
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr80689-1.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef struct st1
+{
+  long unsigned int a,b;
+  long int c,d;
+}R;
+
+typedef struct st2
+{
+  int t;
+  R reg;
+}N;
+
+void Set (const R *region, N *n_info );
+
+void test(N *n_obj, const long unsigned int a, const long unsigned int b, const long int c, const long int d)
+{
+  R reg;
+
+  reg.a=a;
+  reg.b=b;
+  reg.c=c;
+  reg.d=d;
+  Set (&reg, n_obj);
+}
+
+void Set (const R *reg, N *n_obj )
+{
+  n_obj->reg=(*reg);
+}
+
+/* { dg-final { scan-assembler-not "%(x|y|z)mm\[0-9\]+" } } */
+/* { dg-final { scan-assembler-not "movdqu" } } */
+/* { dg-final { scan-assembler-not "movups" } } */
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index bac593951e7..ade97964205 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -104,6 +104,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-fnsummary.h"
 #include "ipa-utils.h"
 #include "builtins.h"
+#include "tree-sra.h"

 /* Enumeration of all aggregate reductions we can do.  */
 enum sra_mode { SRA_MODE_EARLY_IPA,   /* early call regularization */
@@ -952,14 +953,14 @@ create_access (tree expr, gimple *stmt, bool write)
 }


-/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or fixed-length
-   ARRAY_TYPE with fields that are either of gimple register types (excluding
-   bit-fields) or (recursively) scalarizable types.  CONST_DECL must be true if
-   we are considering a decl from constant pool.  If it is false, char arrays
-   will be refused.  */
+/* Return true if TYPE consists of RECORD_TYPE or fixed-length ARRAY_TYPE with
+   fields/elements that are not bit-fields and are either register types or
+   recursively comply with simple_mix_of_records_and_arrays_p.  Furthermore,
+   if ALLOW_CHAR_ARRAYS is false, the function will also return false if TYPE
+   contains an array of single-byte elements.  */

-static bool
-scalarizable_type_p (tree type, bool const_decl)
+bool
+simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays)
 {
   gcc_assert (!is_gimple_reg_type (type));
   if (type_contains_placeholder_p (type))
@@ -977,7 +978,7 @@ scalarizable_type_p (tree type, bool const_decl)
             return false;

           if (!is_gimple_reg_type (ft)
-              && !scalarizable_type_p (ft, const_decl))
+              && !simple_mix_of_records_and_arrays_p (ft, allow_char_arrays))
             return false;
         }

@@ -986,7 +987,7 @@ scalarizable_type_p (tree type, bool const_decl)
     case ARRAY_TYPE:
       {
         HOST_WIDE_INT min_elem_size;
-        if (const_decl)
+        if (allow_char_arrays)
           min_elem_size = 0;
         else
           min_elem_size = BITS_PER_UNIT;
@@ -1008,7 +1009,7 @@ scalarizable_type_p (tree type, bool const_decl)
         tree elem = TREE_TYPE (type);
         if (!is_gimple_reg_type (elem)
-            && !scalarizable_type_p (elem, const_decl))
+            && !simple_mix_of_records_and_arrays_p (elem, allow_char_arrays))
           return false;
         return true;
       }
@@ -1017,10 +1018,38 @@ scalarizable_type_p (tree type, bool const_decl)
     }
 }

-static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree, tree);
+static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree,
+                            tree);
+
+/* For a given array TYPE, return false if its domain does not have any maximum
+   value.  Otherwise calculate MIN and MAX indices of the first and the last
+   element.  */
+
+bool
+extract_min_max_idx_from_array (tree type, offset_int *min, offset_int *max)
+{
+  tree domain = TYPE_DOMAIN (type);
+  tree minidx = TYPE_MIN_VALUE (domain);
+  gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
+  tree maxidx = TYPE_MAX_VALUE (domain);
+  if (!maxidx)
+    return false;
+  gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
+
+  /* MINIDX and MAXIDX are inclusive, and must be interpreted in
+     DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
+  *min = wi::to_offset (minidx);
+  *max = wi::to_offset (maxidx);
+  if (!TYPE_UNSIGNED (domain))
+    {
+      *min = wi::sext (*min, TYPE_PRECISION (domain));
+      *max = wi::sext (*max, TYPE_PRECISION (domain));
+    }
+  return true;
+}

 /* Create total_scalarization accesses for all scalar fields of a member
-   of type DECL_TYPE conforming to scalarizable_type_p.  BASE
+   of type DECL_TYPE conforming to simple_mix_of_records_and_arrays_p.  BASE
    must be the top-most VAR_DECL representing the variable; within that,
    OFFSET locates the member and REF must be the memory reference expression for
    the member.  */
@@ -1047,27 +1076,14 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
       {
         tree elemtype = TREE_TYPE (decl_type);
         tree elem_size = TYPE_SIZE (elemtype);
-        gcc_assert (elem_size && tree_fits_shwi_p (elem_size));
         HOST_WIDE_INT el_size = tree_to_shwi (elem_size);
         gcc_assert (el_size > 0);

-        tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (decl_type));
-        gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
-        tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (decl_type));
+        offset_int idx, max;
         /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
-        if (maxidx)
+        if (extract_min_max_idx_from_array (decl_type, &idx, &max))
           {
-            gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
             tree domain = TYPE_DOMAIN (decl_type);
-            /* MINIDX and MAXIDX are inclusive, and must be interpreted in
-               DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
-            offset_int idx = wi::to_offset (minidx);
-            offset_int max = wi::to_offset (maxidx);
-            if (!TYPE_UNSIGNED (domain))
-              {
-                idx = wi::sext (idx, TYPE_PRECISION (domain));
-                max = wi::sext (max, TYPE_PRECISION (domain));
-              }
             for (int el_off = offset; idx <= max; ++idx)
               {
                 tree nref = build4 (ARRAY_REF, elemtype,
@@ -1088,10 +1104,10 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
 }

 /* Create total_scalarization accesses for a member of type TYPE, which must
-   satisfy either is_gimple_reg_type or scalarizable_type_p.  BASE must be the
-   top-most VAR_DECL representing the variable; within that, POS and SIZE locate
-   the member, REVERSE gives its torage order. and REF must be the reference
-   expression for it.  */
+   satisfy either is_gimple_reg_type or simple_mix_of_records_and_arrays_p.
+   BASE must be the top-most VAR_DECL representing the variable; within that,
+   POS and SIZE locate the member, REVERSE gives its storage order, and REF
+   must be the reference expression for it.  */

 static void
 scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse,
@@ -1111,7 +1127,8 @@ scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse,
 }

 /* Create a total_scalarization access for VAR as a whole.  VAR must be of a
-   RECORD_TYPE or ARRAY_TYPE conforming to scalarizable_type_p.  */
+   RECORD_TYPE or ARRAY_TYPE conforming to
+   simple_mix_of_records_and_arrays_p.  */

 static void
 create_total_scalarization_access (tree var)
@@ -2803,8 +2820,9 @@ analyze_all_variable_accesses (void)
       {
         tree var = candidate (i);

-        if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var),
-                                                constant_decl_p (var)))
+        if (VAR_P (var)
+            && simple_mix_of_records_and_arrays_p (TREE_TYPE (var),
+                                                   constant_decl_p (var)))
           {
             if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
                 <= max_scalarization_size)
diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h
new file mode 100644
index 00000000000..dc901385994
--- /dev/null
+++ b/gcc/tree-sra.h
@@ -0,0 +1,33 @@
+/* tree-sra.h - Scalar Replacement of Aggregates.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef TREE_SRA_H
+#define TREE_SRA_H
+
+
+bool simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays);
+bool extract_min_max_idx_from_array (tree type, offset_int *idx,
+                                     offset_int *max);
+tree build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset,
+                           bool reverse, tree exp_type,
+                           gimple_stmt_iterator *gsi, bool insert_after);
+
+
+
+#endif /* TREE_SRA_H */
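For completeness, a hedged usage sketch: with the patch applied, the
cut-off would be adjusted like any other GCC parameter (the name comes
from the params.def hunk above; 35 is the x86 default set in the i386.c
hunk, and 0 disables element-wise copying entirely):

  gcc -O2 --param max-size-for-elementwise-copy=35 -S pr80689-1.c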