From patchwork Wed Jun 12 09:15:31 2024
Date: Wed, 12 Jun 2024 11:15:31 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 1/3][v3] tree-optimization/114107 - avoid peeling for gaps in more cases

The following refactors the code to detect necessary peeling for gaps,
in particular the PR103116 case when there is no gap but the group size
is smaller than the vector size.  The testcase in PR114107 shows we
fail to SLP for (int i=0; i
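To make the PR103116 situation above concrete (no gap, but the group is smaller than a vector), here is a minimal C sketch of the arithmetic. It is not GCC code: `overrun_elements` and `single_peel_sufficient` are hypothetical helpers, and the sketch assumes gap-free groups of `group_size` elements, a vectorization factor `vf`, and full vector accesses of `nunits` lanes. The last vector access reads `nunits - remain` elements past what the scalar loop touches, while peeling one scalar iteration only leaves `group_size` elements of slack.

```c
#include <assert.h>
#include <stdbool.h>

/* Elements read past the last scalar access when each vector iteration
   handles VF gap-free groups of GROUP_SIZE elements using full vector
   accesses of NUNITS lanes.  */
unsigned
overrun_elements (unsigned group_size, unsigned vf, unsigned nunits)
{
  assert (group_size > 0 && vf > 0 && nunits > 0);
  unsigned remain = (group_size * vf) % nunits;
  /* A partially used last vector still reads all NUNITS lanes.  */
  return remain == 0 ? 0 : nunits - remain;
}

/* Peeling one scalar iteration leaves GROUP_SIZE elements that the
   vector loop may still read safely.  */
bool
single_peel_sufficient (unsigned group_size, unsigned vf, unsigned nunits)
{
  return overrun_elements (group_size, vf, nunits) <= group_size;
}
```

For example, `overrun_elements (2, 1, 4)` is 2, so one peeled iteration suffices, while `overrun_elements (3, 1, 16)` is 13, far more than the 3 elements a single peeled iteration provides.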

Date: Wed, 12 Jun 2024 11:15:52 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 2/3][v3] tree-optimization/115385 - handle more gaps with peeling of a single iteration

The following makes peeling of a single scalar iteration handle more
gaps, including non-power-of-two cases.  This can be done by rounding
up the remaining access to the next power of two, which ensures that
the next scalar iteration accesses at least the number of excess
elements we read.

I've added a correctness testcase and an x86-specific one that scans
for the optimization.

	PR tree-optimization/115385
	* tree-vect-stmts.cc (get_group_load_store_type): Peeling
	of a single scalar iteration is sufficient if we can narrow
	the access to the next power of two of the bits in the last
	access.
	(vectorizable_load): Ensure that the last access is narrowed.

	* gcc.dg/vect/pr115385.c: New testcase.
	* gcc.target/i386/vect-pr115385.c: Likewise.
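The rounding described above can be sketched in C as follows. This is an illustration under stated assumptions, not the GCC implementation: `next_pow2` stands in for `1 << ceil_log2 (cremain)`, `remain` is assumed to be the number of elements the last vector access actually needs, and the check compares the resulting overrun with `group_size` directly instead of reproducing the exact condition in `get_group_load_store_type`. It also assumes the target can compose the vector from `nunits / part` pieces, which the patch checks with `vector_vector_composition_type`.

```c
#include <assert.h>
#include <stdbool.h>

/* Smallest power of two >= X; stands in for 1 << ceil_log2 (x).  */
unsigned
next_pow2 (unsigned x)
{
  assert (x > 0);
  unsigned p = 1;
  while (p < x)
    p <<= 1;
  return p;
}

/* REMAIN is the number of elements the last vector access really needs
   (0 < REMAIN < NUNITS).  Peeling one scalar iteration leaves GROUP_SIZE
   elements of slack after the last vector iteration.  */
bool
single_peel_sufficient_narrowed (unsigned remain, unsigned group_size,
                                 unsigned nunits)
{
  assert (remain > 0 && remain < nunits);
  /* A full NUNITS-lane access overruns by NUNITS - REMAIN elements;
     this is the remain + group_size >= nunits test.  */
  if (nunits - remain <= group_size)
    return true;
  /* Narrow the last access to a power-of-two part of at least REMAIN
     elements; the vector is then composed of NUNITS / PART pieces.  */
  unsigned part = next_pow2 (remain);
  return part - remain <= group_size;
}
```

For the `foo` loop in the testcase (group size 3, 16 byte lanes, and remain 3 under the assumption of a VF of one) the last access narrows to a 4-element piece that overruns by 1; for `bar` (group size 5) it narrows to 8 elements with an overrun of 3.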
---
 gcc/testsuite/gcc.dg/vect/pr115385.c          | 88 +++++++++++++++++++
 gcc/testsuite/gcc.target/i386/vect-pr115385.c | 53 +++++++++++
 gcc/tree-vect-stmts.cc                        | 44 ++++++++--
 3 files changed, 180 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr115385.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-pr115385.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr115385.c b/gcc/testsuite/gcc.dg/vect/pr115385.c
new file mode 100644
index 00000000000..a18cd665d7d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115385.c
@@ -0,0 +1,88 @@
+/* { dg-require-effective-target mmap } */
+
+#include <sys/mman.h>
+#include <stdio.h>
+
+#define COUNT 511
+#define MMAP_SIZE 0x20000
+#define ADDRESS 0x1122000000
+#define TYPE unsigned char
+
+#ifndef MAP_ANONYMOUS
+#define MAP_ANONYMOUS MAP_ANON
+#endif
+
+void __attribute__((noipa)) foo(TYPE * __restrict x,
+                                TYPE *y, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[16*i+0] = y[3*i+0];
+      x[16*i+1] = y[3*i+1];
+      x[16*i+2] = y[3*i+2];
+      x[16*i+3] = y[3*i+0];
+      x[16*i+4] = y[3*i+1];
+      x[16*i+5] = y[3*i+2];
+      x[16*i+6] = y[3*i+0];
+      x[16*i+7] = y[3*i+1];
+      x[16*i+8] = y[3*i+2];
+      x[16*i+9] = y[3*i+0];
+      x[16*i+10] = y[3*i+1];
+      x[16*i+11] = y[3*i+2];
+      x[16*i+12] = y[3*i+0];
+      x[16*i+13] = y[3*i+1];
+      x[16*i+14] = y[3*i+2];
+      x[16*i+15] = y[3*i+0];
+    }
+}
+
+void __attribute__((noipa)) bar(TYPE * __restrict x,
+                                TYPE *y, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[16*i+0] = y[5*i+0];
+      x[16*i+1] = y[5*i+1];
+      x[16*i+2] = y[5*i+2];
+      x[16*i+3] = y[5*i+3];
+      x[16*i+4] = y[5*i+4];
+      x[16*i+5] = y[5*i+0];
+      x[16*i+6] = y[5*i+1];
+      x[16*i+7] = y[5*i+2];
+      x[16*i+8] = y[5*i+3];
+      x[16*i+9] = y[5*i+4];
+      x[16*i+10] = y[5*i+0];
+      x[16*i+11] = y[5*i+1];
+      x[16*i+12] = y[5*i+2];
+      x[16*i+13] = y[5*i+3];
+      x[16*i+14] = y[5*i+4];
+      x[16*i+15] = y[5*i+0];
+    }
+}
+
+TYPE x[COUNT * 16];
+
+int
+main (void)
+{
+  void *y;
+  TYPE *end_y;
+
+  y = mmap ((void *) ADDRESS, MMAP_SIZE, PROT_READ | PROT_WRITE,
+            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+  if (y == MAP_FAILED)
+    {
+      perror ("mmap");
+      return 1;
+    }
+
+  end_y = (TYPE *) ((char *) y + MMAP_SIZE);
+
+  foo (x, end_y - COUNT * 3, COUNT);
+  bar (x, end_y - COUNT * 5, COUNT);
+
+  return 0;
+}
+
+/* We always require a scalar epilogue here but we don't know which
+   targets support vector composition this way.  */
diff --git a/gcc/testsuite/gcc.target/i386/vect-pr115385.c b/gcc/testsuite/gcc.target/i386/vect-pr115385.c
new file mode 100644
index 00000000000..a6be9ce4e54
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-pr115385.c
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -msse4.1 -mno-avx -fdump-tree-vect-details" } */
+
+void __attribute__((noipa)) foo(unsigned char * __restrict x,
+                                unsigned char *y, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[16*i+0] = y[3*i+0];
+      x[16*i+1] = y[3*i+1];
+      x[16*i+2] = y[3*i+2];
+      x[16*i+3] = y[3*i+0];
+      x[16*i+4] = y[3*i+1];
+      x[16*i+5] = y[3*i+2];
+      x[16*i+6] = y[3*i+0];
+      x[16*i+7] = y[3*i+1];
+      x[16*i+8] = y[3*i+2];
+      x[16*i+9] = y[3*i+0];
+      x[16*i+10] = y[3*i+1];
+      x[16*i+11] = y[3*i+2];
+      x[16*i+12] = y[3*i+0];
+      x[16*i+13] = y[3*i+1];
+      x[16*i+14] = y[3*i+2];
+      x[16*i+15] = y[3*i+0];
+    }
+}
+
+void __attribute__((noipa)) bar(unsigned char * __restrict x,
+                                unsigned char *y, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      x[16*i+0] = y[5*i+0];
+      x[16*i+1] = y[5*i+1];
+      x[16*i+2] = y[5*i+2];
+      x[16*i+3] = y[5*i+3];
+      x[16*i+4] = y[5*i+4];
+      x[16*i+5] = y[5*i+0];
+      x[16*i+6] = y[5*i+1];
+      x[16*i+7] = y[5*i+2];
+      x[16*i+8] = y[5*i+3];
+      x[16*i+9] = y[5*i+4];
+      x[16*i+10] = y[5*i+0];
+      x[16*i+11] = y[5*i+1];
+      x[16*i+12] = y[5*i+2];
+      x[16*i+13] = y[5*i+3];
+      x[16*i+14] = y[5*i+4];
+      x[16*i+15] = y[5*i+0];
+    }
+}
+
+/* { dg-final { scan-tree-dump "Data access with gaps requires scalar epilogue loop" "vect"} } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect"} } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index f8c4b33878d..701a44e44cd 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2151,11 +2151,24 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
                                         nunits, &tem, &remain)
                       || maybe_lt (remain + group_size, nunits)))
                 {
-                  if (dump_enabled_p ())
-                    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                                     "peeling for gaps insufficient for "
-                                     "access\n");
-                  return false;
+                  /* But peeling a single scalar iteration is enough if
+                     we can use the next power-of-two sized partial
+                     access.  */
+                  unsigned HOST_WIDE_INT cnunits, cvf, cremain, cpart_size;
+                  if (!nunits.is_constant (&cnunits)
+                      || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&cvf)
+                      || ((cremain = remain.to_constant (), true)
+                          && ((cpart_size = (1 << ceil_log2 (cremain))) != cnunits)
+                          && vector_vector_composition_type
+                               (vectype, cnunits / cpart_size,
+                                &half_vtype) == NULL_TREE))
+                    {
+                      if (dump_enabled_p ())
+                        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                                         "peeling for gaps insufficient for "
+                                         "access\n");
+                      return false;
+                    }
                 }
 
               /* If this is single-element interleaving with an element
@@ -11597,6 +11610,27 @@ vectorizable_load (vec_info *vinfo,
                   gcc_assert (new_vtype
                               || LOOP_VINFO_PEELING_FOR_GAPS
                                    (loop_vinfo));
+                  /* But still reduce the access size to the next
+                     required power-of-two so peeling a single
+                     scalar iteration is sufficient.  */
+                  unsigned HOST_WIDE_INT cremain;
+                  if (remain.is_constant (&cremain))
+                    {
+                      unsigned HOST_WIDE_INT cpart_size
+                        = 1 << ceil_log2 (cremain);
+                      if (known_gt (nunits, cpart_size)
+                          && constant_multiple_p (nunits, cpart_size,
+                                                  &num))
+                        {
+                          tree ptype;
+                          new_vtype
+                            = vector_vector_composition_type (vectype,
+                                                              num,
+                                                              &ptype);
+                          if (new_vtype)
+                            ltype = ptype;
+                        }
+                    }
                 }
             }
           tree offset

From patchwork Wed Jun 12 09:16:03 2024
Date: Wed, 12 Jun 2024 11:16:03 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 3/3][v3] Improve code generation of strided SLP loads

This avoids falling back to elementwise accesses for strided SLP loads
when the number of vector elements is not a multiple of the group size.
Instead we can use a smaller vector or integer type for the load.

For stores we can do the same, though the restrictions on the stores we
handle and the fact that store-merging covers up the difference make
this mostly effective for cost modeling.  This shows for
gcc.target/i386/vect-strided-3.c, which we now vectorize with V4SI
vectors rather than just V2SI ones.

For all of this there's still the opportunity to use non-uniform
accesses, say for a 6-element group with a VF of two do
V4SI, { V2SI, V2SI }, V4SI.  But that's for a possible followup.

	* gcc.target/i386/vect-strided-1.c: New testcase.
	* gcc.target/i386/vect-strided-2.c: Likewise.
	* gcc.target/i386/vect-strided-3.c: Likewise.
	* gcc.target/i386/vect-strided-4.c: Likewise.
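A small C sketch of the piece-size choice that the vectorizable_load and vectorizable_store changes in this patch make with `gcd (group_size, const_nunits)`, plus the non-uniform split mentioned as a possible follow-up. It is not GCC code: `gcd_u`, `strided_piece_size` and `nonuniform_pieces` are hypothetical helpers. The non-uniform split does not check that each piece has a size the target can actually load, and for loads the uniform scheme additionally relies on target support for composing the vector (`vector_vector_composition_type`).

```c
#include <assert.h>

/* Greatest common divisor, standing in for GCC's gcd ().  */
unsigned
gcd_u (unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;
      a = b;
      b = t;
    }
  return a;
}

/* Uniform scheme: every vector of NUNITS lanes is assembled from
   NUNITS / N contiguous pieces of N = gcd (GROUP_SIZE, NUNITS) elements,
   so no piece crosses a group boundary.  N == NUNITS means one full
   vector access, N == 1 means element-wise access.  */
unsigned
strided_piece_size (unsigned group_size, unsigned nunits)
{
  assert (group_size > 0 && nunits > 0);
  return gcd_u (group_size, nunits);
}

/* Possible non-uniform follow-up: split the lanes of vector number VEC
   into the largest pieces that stay within one group.  Stores the piece
   lengths in PIECES (room for NUNITS entries) and returns their count.  */
unsigned
nonuniform_pieces (unsigned group_size, unsigned nunits, unsigned vec,
                   unsigned *pieces)
{
  assert (group_size > 0 && nunits > 0);
  unsigned pos = vec * nunits;
  unsigned end = pos + nunits;
  unsigned count = 0;
  while (pos < end)
    {
      /* First lane of the group following the one POS is in.  */
      unsigned boundary = (pos / group_size + 1) * group_size;
      unsigned stop = boundary < end ? boundary : end;
      pieces[count++] = stop - pos;
      pos = stop;
    }
  return count;
}
```

With SSE2 and int elements (V4SI, four lanes) the 6-element group of vect-strided-1.c gives `strided_piece_size (6, 4) == 2`, matching the two-element loads that test scans for, while `nonuniform_pieces` for a 6-element group with a VF of two yields the 4 | 2 2 | 4 split, i.e. V4SI, { V2SI, V2SI }, V4SI.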
---
 .../gcc.target/i386/vect-strided-1.c          |  24 +++++
 .../gcc.target/i386/vect-strided-2.c          |  17 +++
 .../gcc.target/i386/vect-strided-3.c          |  20 ++++
 .../gcc.target/i386/vect-strided-4.c          |  20 ++++
 gcc/tree-vect-stmts.cc                        | 100 ++++++++----------
 5 files changed, 127 insertions(+), 54 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-strided-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-strided-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-strided-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-strided-4.c

diff --git a/gcc/testsuite/gcc.target/i386/vect-strided-1.c b/gcc/testsuite/gcc.target/i386/vect-strided-1.c
new file mode 100644
index 00000000000..db4a06711f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-strided-1.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-avx" } */
+
+void foo (int * __restrict a, int *b, int s)
+{
+  for (int i = 0; i < 1024; ++i)
+    {
+      a[8*i+0] = b[s*i+0];
+      a[8*i+1] = b[s*i+1];
+      a[8*i+2] = b[s*i+2];
+      a[8*i+3] = b[s*i+3];
+      a[8*i+4] = b[s*i+4];
+      a[8*i+5] = b[s*i+5];
+      a[8*i+6] = b[s*i+4];
+      a[8*i+7] = b[s*i+5];
+    }
+}
+
+/* Three two-element loads, two four-element stores.  On ia32 we elide
+   a permute and perform a redundant load.  */
+/* { dg-final { scan-assembler-times "movq" 2 } } */
+/* { dg-final { scan-assembler-times "movhps" 2 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movhps" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "movups" 2 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-strided-2.c b/gcc/testsuite/gcc.target/i386/vect-strided-2.c
new file mode 100644
index 00000000000..6fd64e28cf0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-strided-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-avx" } */
+
+void foo (int * __restrict a, int *b, int s)
+{
+  for (int i = 0; i < 1024; ++i)
+    {
+      a[4*i+0] = b[s*i+0];
+      a[4*i+1] = b[s*i+1];
+      a[4*i+2] = b[s*i+0];
+      a[4*i+3] = b[s*i+1];
+    }
+}
+
+/* One two-element load, one four-element store.  */
+/* { dg-final { scan-assembler-times "movq" 1 } } */
+/* { dg-final { scan-assembler-times "movups" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-strided-3.c b/gcc/testsuite/gcc.target/i386/vect-strided-3.c
new file mode 100644
index 00000000000..b462701a0b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-strided-3.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-avx -fno-tree-slp-vectorize" } */
+
+void foo (int * __restrict a, int *b, int s)
+{
+  if (s >= 6)
+    for (int i = 0; i < 1024; ++i)
+      {
+        a[s*i+0] = b[4*i+0];
+        a[s*i+1] = b[4*i+1];
+        a[s*i+2] = b[4*i+2];
+        a[s*i+3] = b[4*i+3];
+        a[s*i+4] = b[4*i+0];
+        a[s*i+5] = b[4*i+1];
+      }
+}
+
+/* While the vectorizer generates 6 uint64 stores.  */
+/* { dg-final { scan-assembler-times "movq" 4 } } */
+/* { dg-final { scan-assembler-times "movhps" 2 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-strided-4.c b/gcc/testsuite/gcc.target/i386/vect-strided-4.c
new file mode 100644
index 00000000000..dd922926a2a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-strided-4.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse4.2 -mno-avx -fno-tree-slp-vectorize" } */
+
+void foo (int * __restrict a, int * __restrict b, int *c, int s)
+{
+  if (s >= 2)
+    for (int i = 0; i < 1024; ++i)
+      {
+        a[s*i+0] = c[4*i+0];
+        a[s*i+1] = c[4*i+1];
+        b[s*i+0] = c[4*i+2];
+        b[s*i+1] = c[4*i+3];
+      }
+}
+
+/* Vectorization factor two, two two-element stores to a using movq
+   and two two-element stores to b via pextrq/movhps of the high part.  */
+/* { dg-final { scan-assembler-times "movq" 2 } } */
+/* { dg-final { scan-assembler-times "pextrq" 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "movhps" 2 { target { ia32 } } } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 701a44e44cd..d148e11a514 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2036,15 +2036,10 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
       first_dr_info
         = STMT_VINFO_DR_INFO (SLP_TREE_SCALAR_STMTS (slp_node)[0]);
       if (STMT_VINFO_STRIDED_P (first_stmt_info))
-        {
-          /* Try to use consecutive accesses of DR_GROUP_SIZE elements,
-             separated by the stride, until we have a complete vector.
-             Fall back to scalar accesses if that isn't possible.  */
-          if (multiple_p (nunits, group_size))
-            *memory_access_type = VMAT_STRIDED_SLP;
-          else
-            *memory_access_type = VMAT_ELEMENTWISE;
-        }
+        /* Try to use consecutive accesses of as many elements as possible,
+           separated by the stride, until we have a complete vector.
+           Fall back to scalar accesses if that isn't possible.  */
+        *memory_access_type = VMAT_STRIDED_SLP;
       else
         {
           int cmp = compare_step_with_zero (vinfo, stmt_info);
@@ -8514,12 +8509,29 @@ vectorizable_store (vec_info *vinfo,
       tree lvectype = vectype;
       if (slp)
         {
-          if (group_size < const_nunits
-              && const_nunits % group_size == 0)
+          HOST_WIDE_INT n = gcd (group_size, const_nunits);
+          if (n == const_nunits)
             {
-              nstores = const_nunits / group_size;
-              lnel = group_size;
-              ltype = build_vector_type (elem_type, group_size);
+              int mis_align = dr_misalignment (first_dr_info, vectype);
+              dr_alignment_support dr_align
+                = vect_supportable_dr_alignment (vinfo, dr_info, vectype,
+                                                 mis_align);
+              if (dr_align == dr_aligned
+                  || dr_align == dr_unaligned_supported)
+                {
+                  nstores = 1;
+                  lnel = const_nunits;
+                  ltype = vectype;
+                  lvectype = vectype;
+                  alignment_support_scheme = dr_align;
+                  misalignment = mis_align;
+                }
+            }
+          else if (n > 1)
+            {
+              nstores = const_nunits / n;
+              lnel = n;
+              ltype = build_vector_type (elem_type, n);
               lvectype = vectype;
 
               /* First check if vec_extract optab doesn't support extraction
@@ -8528,7 +8540,7 @@ vectorizable_store (vec_info *vinfo,
               machine_mode vmode;
               if (!VECTOR_MODE_P (TYPE_MODE (vectype))
                   || !related_vector_mode (TYPE_MODE (vectype), elmode,
-                                           group_size).exists (&vmode)
+                                           n).exists (&vmode)
                   || (convert_optab_handler (vec_extract_optab,
                                              TYPE_MODE (vectype), vmode)
                       == CODE_FOR_nothing))
@@ -8539,8 +8551,8 @@ vectorizable_store (vec_info *vinfo,
                      re-interpreting it as the original vector type if
                      supported.  */
                   unsigned lsize
-                    = group_size * GET_MODE_BITSIZE (elmode);
-                  unsigned int lnunits = const_nunits / group_size;
+                    = n * GET_MODE_BITSIZE (elmode);
+                  unsigned int lnunits = const_nunits / n;
                   /* If we can't construct such a vector fall back to
                      element extracts from the original vector type and
                      element size stores.  */
@@ -8553,7 +8565,7 @@ vectorizable_store (vec_info *vinfo,
                       != CODE_FOR_nothing))
                     {
                       nstores = lnunits;
-                      lnel = group_size;
+                      lnel = n;
                       ltype = build_nonstandard_integer_type (lsize, 1);
                       lvectype = build_vector_type (ltype, nstores);
                     }
@@ -8564,24 +8576,6 @@ vectorizable_store (vec_info *vinfo,
                      issue exists here for reasonable archs.  */
                 }
             }
-          else if (group_size >= const_nunits
-                   && group_size % const_nunits == 0)
-            {
-              int mis_align = dr_misalignment (first_dr_info, vectype);
-              dr_alignment_support dr_align
-                = vect_supportable_dr_alignment (vinfo, dr_info, vectype,
-                                                 mis_align);
-              if (dr_align == dr_aligned
-                  || dr_align == dr_unaligned_supported)
-                {
-                  nstores = 1;
-                  lnel = const_nunits;
-                  ltype = vectype;
-                  lvectype = vectype;
-                  alignment_support_scheme = dr_align;
-                  misalignment = mis_align;
-                }
-            }
           ltype = build_aligned_type (ltype, TYPE_ALIGN (elem_type));
           ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
         }
@@ -10366,34 +10360,32 @@ vectorizable_load (vec_info *vinfo,
   auto_vec<tree> dr_chain;
   if (memory_access_type == VMAT_STRIDED_SLP)
     {
-      if (group_size < const_nunits)
+      HOST_WIDE_INT n = gcd (group_size, const_nunits);
+      /* Use the target vector type if the group size is a multiple
+         of it.  */
+      if (n == const_nunits)
+        {
+          nloads = 1;
+          lnel = const_nunits;
+          ltype = vectype;
+        }
+      /* Else use the biggest vector we can load the group without
+         accessing excess elements.  */
+      else if (n > 1)
         {
-          /* First check if vec_init optab supports construction from vector
-             elts directly.  Otherwise avoid emitting a constructor of
-             vector elements by performing the loads using an integer type
-             of the same size, constructing a vector of those and then
-             re-interpreting it as the original vector type.  This avoids a
-             huge runtime penalty due to the general inability to perform
-             store forwarding from smaller stores to a larger load.  */
           tree ptype;
           tree vtype
-            = vector_vector_composition_type (vectype,
-                                              const_nunits / group_size,
+            = vector_vector_composition_type (vectype, const_nunits / n,
                                               &ptype);
           if (vtype != NULL_TREE)
             {
-              nloads = const_nunits / group_size;
-              lnel = group_size;
+              nloads = const_nunits / n;
+              lnel = n;
               lvectype = vtype;
               ltype = ptype;
             }
         }
-      else
-        {
-          nloads = 1;
-          lnel = const_nunits;
-          ltype = vectype;
-        }
+      /* Else fall back to the default element-wise access.  */
       ltype = build_aligned_type (ltype, TYPE_ALIGN (TREE_TYPE (vectype)));
     }
   /* Load vector(1) scalar_type if it's 1 element-wise vectype.  */
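The vectorizable_store change above keeps a fallback that moves each n-element piece as one integer of `lsize = n * GET_MODE_BITSIZE (elmode)` bits, taken from a vector of `lnunits = const_nunits / n` such integers. Below is a scalar C analogue for the assumed case of four 32-bit lanes and n == 2, as for the 6-element store group in vect-strided-3.c. `store_pairs_strided` and `NUNITS` are hypothetical names, and `memcpy` stands in for the re-interpretation, so the sketch only illustrates the data movement, not GCC's code generation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUNITS 4 /* Assumed: V4SI, four 32-bit lanes.  */

/* Store the NUNITS lanes of VEC as NUNITS / 2 pieces of two lanes,
   piece K going to DEST + K * STRIDE.  Each piece moves as a single
   64-bit integer (lsize = 2 * 32), mirroring the re-interpretation of
   V4SI as a vector of two 64-bit integers.  */
void
store_pairs_strided (int32_t *dest, long stride, const int32_t vec[NUNITS])
{
  uint64_t lanes[NUNITS / 2];
  /* Re-interpret the vector as NUNITS / 2 64-bit integers.  */
  memcpy (lanes, vec, sizeof lanes);
  for (int k = 0; k < NUNITS / 2; ++k)
    memcpy (dest + k * stride, &lanes[k], sizeof lanes[k]);
}
```

Called with a stride of 6, the lanes {1, 2, 3, 4} land in `dest[0..1]` and `dest[6..7]`, leaving the elements in between untouched.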