From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com, hjl.tools@gmail.com
Subject: [PATCH] Align ix86_{move_max,store_max} with vectorizer.
Date: Wed, 21 Aug 2024 13:39:53 +0800
Message-Id: <20240821053953.1727019-1-hongtao.liu@intel.com>

When none of -mprefer-vector-width, the avx256_optimal/avx128_optimal
tuning flags, or the avx256_store_by_pieces/avx512_store_by_pieces
tuning flags is specified, GCC sets ix86_{move_max,store_max} to the
maximum available vector length, except for AVX/AVX2, where it falls
back to 128 bits:

  if (TARGET_AVX512F_P (opts->x_ix86_isa_flags)
      && TARGET_EVEX512_P (opts->x_ix86_isa_flags2))
    opts->x_ix86_move_max = PVW_AVX512;
  else
    opts->x_ix86_move_max = PVW_AVX128;

So with -mavx2 the vectorizer chooses 256-bit vectors, but struct
copies are done with 128-bit moves.  This mismatch can cause
store-to-load forwarding (STLF) failures, for example when a 256-bit
load reads data that was just written by two 128-bit stores.

The patch uses 256-bit pieces whenever AVX is available (and 512-bit
EVEX is not), which improves 538.imagick_r by ~30% with
-march=x86-64-v3 -O2.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Any comments?

gcc/ChangeLog:

	* config/i386/i386-options.cc (ix86_option_override_internal):
	Set ix86_{move_max,store_max} to PVW_AVX256 instead of PVW_AVX128
	when TARGET_AVX.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pieces-memcpy-10.c: Add -mprefer-vector-width=128.
	* gcc.target/i386/pieces-memcpy-6.c: Ditto.
	* gcc.target/i386/pieces-memset-38.c: Ditto.
	* gcc.target/i386/pieces-memset-40.c: Ditto.
	* gcc.target/i386/pieces-memset-41.c: Ditto.
	* gcc.target/i386/pieces-memset-42.c: Ditto.
	* gcc.target/i386/pieces-memset-43.c: Ditto.
	* gcc.target/i386/pieces-strcpy-2.c: Ditto.
	* gcc.target/i386/pieces-memcpy-22.c: New test.
	* gcc.target/i386/pieces-memset-51.c: New test.
	* gcc.target/i386/pieces-strcpy-3.c: New test.
---
 gcc/config/i386/i386-options.cc                  |  6 ++++++
 gcc/testsuite/gcc.target/i386/pieces-memcpy-10.c |  2 +-
 gcc/testsuite/gcc.target/i386/pieces-memcpy-22.c | 12 ++++++++++++
 gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c  |  2 +-
 gcc/testsuite/gcc.target/i386/pieces-memset-38.c |  2 +-
 gcc/testsuite/gcc.target/i386/pieces-memset-40.c |  2 +-
 gcc/testsuite/gcc.target/i386/pieces-memset-41.c |  2 +-
 gcc/testsuite/gcc.target/i386/pieces-memset-42.c |  2 +-
 gcc/testsuite/gcc.target/i386/pieces-memset-43.c |  2 +-
 gcc/testsuite/gcc.target/i386/pieces-memset-51.c | 12 ++++++++++++
 gcc/testsuite/gcc.target/i386/pieces-strcpy-2.c  |  2 +-
 gcc/testsuite/gcc.target/i386/pieces-strcpy-3.c  | 15 +++++++++++++++
 12 files changed, 53 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-22.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-51.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-strcpy-3.c

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index f423455b363..f79257cc764 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -3023,6 +3023,9 @@ ix86_option_override_internal (bool main_args_p,
       if (TARGET_AVX512F_P (opts->x_ix86_isa_flags)
           && TARGET_EVEX512_P (opts->x_ix86_isa_flags2))
         opts->x_ix86_move_max = PVW_AVX512;
+      /* Align with vectorizer to avoid potential STLF issue.  */
+      else if (TARGET_AVX_P (opts->x_ix86_isa_flags))
+        opts->x_ix86_move_max = PVW_AVX256;
       else
         opts->x_ix86_move_max = PVW_AVX128;
     }
@@ -3047,6 +3050,9 @@ ix86_option_override_internal (bool main_args_p,
       if (TARGET_AVX512F_P (opts->x_ix86_isa_flags)
           && TARGET_EVEX512_P (opts->x_ix86_isa_flags2))
         opts->x_ix86_store_max = PVW_AVX512;
+      /* Align with vectorizer to avoid potential STLF issue.  */
+      else if (TARGET_AVX_P (opts->x_ix86_isa_flags))
+        opts->x_ix86_store_max = PVW_AVX256;
       else
         opts->x_ix86_store_max = PVW_AVX128;
     }
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-10.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-10.c
index 5faee21f9b9..53ad0b3be44 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memcpy-10.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-10.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mno-avx2 -mavx -mtune=sandybridge" } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mprefer-vector-width=128 -mtune=sandybridge" } */
 
 extern char *dst, *src;
 
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-22.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-22.c
new file mode 100644
index 00000000000..605b3623ffc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-22.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=generic" } */
+
+extern char *dst, *src;
+
+void
+foo (void)
+{
+  __builtin_memcpy (dst, src, 33);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c
index 5f99cc98c47..cfd2a86cf33 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ! ia32 } } } */
-/* { dg-options "-O2 -mno-avx2 -mavx -mtune=sandybridge" } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mprefer-vector-width=128 -mtune=sandybridge" } */
 
 extern char *dst, *src;
 
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-38.c b/gcc/testsuite/gcc.target/i386/pieces-memset-38.c
index ed4a24a54fd..ddd194debd5 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memset-38.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-38.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mno-avx512f -mavx2 -mtune=sandybridge" } */
+/* { dg-options "-O2 -mno-avx512f -mavx2 -mprefer-vector-width=128 -mtune=sandybridge" } */
 
 extern char *dst;
 
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-40.c b/gcc/testsuite/gcc.target/i386/pieces-memset-40.c
index 86358c99a83..5878876550c 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memset-40.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-40.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mno-avx512f -mavx2 -mtune=sandybridge" } */
+/* { dg-options "-O2 -mno-avx512f -mavx2 -mprefer-vector-width=128 -mtune=sandybridge" } */
 /* Cope with --enable-frame-pointer, Solaris/x86 -mstackrealign default.  */
 /* { dg-additional-options "-fomit-frame-pointer -mno-stackrealign" } */
 
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-41.c b/gcc/testsuite/gcc.target/i386/pieces-memset-41.c
index d7a27f52983..27a6c8ad139 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memset-41.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-41.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mno-avx2 -mavx -mtune=sandybridge -mno-stackrealign" } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mprefer-vector-width=128 -mtune=sandybridge -mno-stackrealign" } */
 /* Cope with --enable-frame-pointer.  */
 /* { dg-additional-options "-fomit-frame-pointer" } */
 
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-42.c b/gcc/testsuite/gcc.target/i386/pieces-memset-42.c
index df0c122aae7..103da699ae5 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memset-42.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-42.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mno-avx2 -mavx -mtune=sandybridge" } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mprefer-vector-width=128 -mtune=sandybridge" } */
 
 extern char *dst;
 
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-43.c b/gcc/testsuite/gcc.target/i386/pieces-memset-43.c
index 2f2179c2df9..f1494e17610 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memset-43.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-43.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mno-avx2 -mavx -mtune=sandybridge" } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mprefer-vector-width=128 -mtune=sandybridge" } */
 
 extern char *dst;
 
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-51.c b/gcc/testsuite/gcc.target/i386/pieces-memset-51.c
new file mode 100644
index 00000000000..192ec0d1647
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-51.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=generic" } */
+
+extern char *dst;
+
+void
+foo (int x)
+{
+  __builtin_memset (dst, x, 64);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-strcpy-2.c b/gcc/testsuite/gcc.target/i386/pieces-strcpy-2.c
index 90446edb4f3..9bb94b7419b 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-strcpy-2.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-strcpy-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ! ia32 } } } */
-/* { dg-options "-O2 -mno-avx2 -mavx -mtune=sandybridge" } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mprefer-vector-width=128 -mtune=sandybridge" } */
 
 extern char *strcpy (char *, const char *);
 
diff --git a/gcc/testsuite/gcc.target/i386/pieces-strcpy-3.c b/gcc/testsuite/gcc.target/i386/pieces-strcpy-3.c
new file mode 100644
index 00000000000..df7571b547f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-strcpy-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mno-avx2 -mavx -mtune=generic" } */
+
+extern char *strcpy (char *, const char *);
+
+void
+foo (char *s)
+{
+  strcpy (s,
+          "1234567890abcdef123456abcdef5678123456abcdef567abcdef678"
+          "1234567");
+}
+
+/* { dg-final { scan-assembler-times "vmovdqa\[ \\t\]+\[^\n\]*%ymm" 2 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */
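For illustration, the default choice of ix86_{move_max,store_max} before and after this patch can be modelled as a small standalone C function. This is a sketch, not GCC code: enum model_pvw, model_default_move_max and its boolean parameters are hypothetical stand-ins for PVW_*, TARGET_AVX512F_P, TARGET_EVEX512_P and TARGET_AVX_P, and the model only covers the case where neither -mprefer-vector-width nor any of the avx*_optimal / avx*_store_by_pieces tuning flags is in effect.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for GCC's PVW_* values; in this model each
   enumerator simply holds the piece width in bits.  */
enum model_pvw
{
  MODEL_PVW_AVX128 = 128,
  MODEL_PVW_AVX256 = 256,
  MODEL_PVW_AVX512 = 512
};

/* Hypothetical model of the default ix86_move_max / ix86_store_max
   selection.  WITH_PATCH selects the behaviour after this patch.  */
static enum model_pvw
model_default_move_max (bool avx512f, bool evex512, bool avx,
                        bool with_patch)
{
  /* In the modelled ISA flags, AVX-512F always implies AVX.  */
  assert (!avx512f || avx);

  if (avx512f && evex512)
    return MODEL_PVW_AVX512;
  /* The arm added by the patch: use 256-bit pieces whenever AVX is
     enabled, to align with the 256-bit vectors the vectorizer uses.  */
  if (with_patch && avx)
    return MODEL_PVW_AVX256;
  return MODEL_PVW_AVX128;
}
```

Under this model an AVX2-only target such as -march=x86-64-v3 gets 128 from the old logic and 256 with the patch, while a target with AVX-512F and EVEX512 still gets 512.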
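The scan-assembler counts in the new tests can be sanity-checked with a simplified model of by-pieces expansion. The sketch below is hypothetical and is not GCC's move_by_pieces/store_by_pieces code: count_pieces_of_width greedily splits a length into power-of-two pieces no wider than a maximum and counts the pieces of a given width. It assumes each memcpy piece becomes one load plus one store and each memset piece one store, and it ignores refinements in the real implementation such as overlapping pieces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of by-pieces expansion: split LEN bytes
   greedily into power-of-two pieces of at most MAX_PIECE bytes and return
   how many of those pieces are exactly WIDTH bytes wide.  */
static size_t
count_pieces_of_width (size_t len, size_t max_piece, size_t width)
{
  /* MAX_PIECE must be a nonzero power of two (e.g. 16 or 32 bytes).  */
  assert (max_piece != 0 && (max_piece & (max_piece - 1)) == 0);

  size_t count = 0;
  size_t piece = max_piece;
  while (len > 0)
    {
      /* Halve the piece until it fits in the bytes still to be handled.  */
      while (piece > len)
        piece /= 2;
      if (piece == width)
        count++;
      len -= piece;
    }
  return count;
}
```

With 32-byte pieces, a 33-byte memcpy yields one 32-byte piece (one ymm load and one ymm store, consistent with the two vmovdqu %ymm that pieces-memcpy-22.c scans for), and a 64-byte memset yields two 32-byte stores. The literal in pieces-strcpy-3.c is 63 characters plus the terminating NUL, i.e. 64 bytes, which gives two 32-byte pieces, consistent with the two vmovdqa and two vmovdqu %ymm matches in that test. With 16-byte pieces, the same 33-byte copy uses no 32-byte pieces at all.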