From patchwork Thu Oct 24 14:19:39 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 2001770
Date: Thu, 24 Oct 2024 16:19:39 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 1/2][v2] Relax vect_check_scalar_mask check
Message-Id: <20241024141947.B861D136F5@imap1.dmz-prg2.suse.org>
List-Id: Gcc-patches mailing list

When the mask is not a constant or external def there's no need to
check the scalar type.  In particular, with SLP and the mask being a
VEC_PERM_EXPR, there isn't a scalar operand ready to check (not one
vect_is_simple_use will get you).  We later check the vector type
and reject non-mask types there.

	* tree-vect-stmts.cc (vect_check_scalar_mask): Only check
	the scalar type for constant or extern defs.
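
For illustration only, the relaxed condition can be modelled outside of GCC
roughly as below.  This is a sketch under stated assumptions, not the
vectorizer's code: def_kind, scalar_type_check_applies and
rejected_by_scalar_check are hypothetical names standing in for
vect_def_type, the new guard and the early-reject path.

```cpp
#include <cassert>

// Hypothetical stand-in for the vect_def_type values that matter here.
enum class def_kind { constant_def, external_def, internal_def };

// Models the new guard: the scalar type is only inspected for
// constant or external defs.
constexpr bool scalar_type_check_applies (def_kind dt)
{
  return dt == def_kind::constant_def || dt == def_kind::external_def;
}

// True when the mask would be rejected by the scalar-type check.
// Internal defs are never rejected here; per the description they are
// left to the later vector-type check.
constexpr bool rejected_by_scalar_check (def_kind dt, bool scalar_is_boolean)
{
  return scalar_type_check_applies (dt) && !scalar_is_boolean;
}
```

With this model an internal def (for example a mask produced by a
VEC_PERM_EXPR node) passes the early check regardless of its scalar type,
while a non-boolean constant or external mask is still rejected.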
---
 gcc/tree-vect-stmts.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index e7f14c3144c..3ff0519ff17 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2520,7 +2520,8 @@ vect_check_scalar_mask (vec_info *vinfo, stmt_vec_info stmt_info,
       return false;
     }
 
-  if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (*mask)))
+  if ((mask_dt == vect_constant_def || mask_dt == vect_external_def)
+      && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (*mask)))
     {
       if (dump_enabled_p ())
         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

From patchwork Thu Oct 24 14:20:24 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 2001772
Date: Thu, 24 Oct 2024 16:20:24 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
cc: RISC-V
Subject: [PATCH 2/2][v2] tree-optimization/116575 - SLP masked load-lanes discovery
Message-Id: <20241024142025.2D1D3136F5@imap1.dmz-prg2.suse.org>
List-Id: Gcc-patches mailing list

The following implements masked load-lane discovery for SLP.  The
challenge here is that a masked load has a full-width mask with
group-size number of elements; when this becomes a masked load-lanes
instruction, one mask element gates all group members.
We already have some discovery hints in place, namely
STMT_VINFO_SLP_VECT_ONLY to guard non-uniform masks, but we need to
choose a way for SLP discovery to handle possible masked load-lanes
SLP trees.

I have this time chosen to handle load-lanes discovery where we
have performed permute optimization already and conveniently got
the graph with predecessor edges built.  This is because, unlike
non-masked loads, masked loads with a load_permutation are never
produced by SLP discovery (because load permutation handling
doesn't handle un-permuting the mask) and thus the load-permutation
lowering which handles non-masked load-lanes discovery doesn't
trigger.

With this, SLP discovery for a possible masked load-lanes, that is
a masked load with uniform mask, produces a splat of a single-lane
sub-graph as the mask SLP operand.  This is a representation that
shouldn't pessimize the masked load case and allows the masked
load-lanes transform to simply elide this splat.

This fixes the aarch64-sve.exp mask_struct_load*.c testcases with
--param vect-force-slp=1.

Re-bootstrap & regtest running on x86_64-unknown-linux-gnu, the
observed CI FAILs are gone.

	PR tree-optimization/116575
	* tree-vect-slp.cc (vect_get_and_check_slp_defs): Handle gaps,
	aka NULL scalar stmt.
	(vect_build_slp_tree_2): Allow gaps in the middle of a
	grouped mask load.  When the mask of a grouped mask load is
	uniform do single-lane discovery for the mask and insert a
	splat VEC_PERM_EXPR node.
	(vect_optimize_slp_pass::decide_masked_load_lanes): New
	function.
	(vect_optimize_slp_pass::run): Call it.
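
As a rough illustration of the description above (a sketch, not taken from
the testsuite or from GCC), the first function below is a hypothetical
kernel in which one condition gates both members of a two-element load
group, i.e. a grouped masked load with a uniform mask.  The second function
models the "splat of lane zero" test on a lane permutation; the names
masked_pair_sum, lane_permutation and is_splat_of_lane_zero are assumptions
made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical kernel: cond[i] gates the loads of both src[2*i] and
// src[2*i + 1], so every member of the load group shares one mask element.
void masked_pair_sum (int *dest, const int *src, const int *cond,
                      std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    if (cond[i])
      dest[i] = src[i * 2] + src[i * 2 + 1];
}

// Stand-in for a lane permutation: (child index, lane index) pairs.
using lane_permutation = std::vector<std::pair<unsigned, unsigned>>;

// True if every output lane selects lane 0 of child 0, which is how a
// uniform mask is represented in the description above.
bool is_splat_of_lane_zero (const lane_permutation &perm)
{
  for (const auto &p : perm)
    if (p.first != 0 || p.second != 0)
      return false;
  return true;
}
```

A permutation of group_size copies of (0, 0) is accepted by the second
function, while any entry referring to another child or lane is not.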
---
 gcc/tree-vect-slp.cc | 141 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 138 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 53f5400a961..b192328e3eb 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -641,6 +641,16 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
   unsigned int commutative_op = -1U;
   bool first = stmt_num == 0;
 
+  if (!stmt_info)
+    {
+      for (auto oi : *oprnds_info)
+        {
+          oi->def_stmts.quick_push (NULL);
+          oi->ops.quick_push (NULL_TREE);
+        }
+      return 0;
+    }
+
   if (!is_a<gcall *> (stmt_info->stmt)
       && !is_a<gassign *> (stmt_info->stmt)
       && !is_a<gphi *> (stmt_info->stmt))
@@ -2029,9 +2039,11 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
             has_gaps = true;
           /* We cannot handle permuted masked loads directly, see
              PR114375.  We cannot handle strided masked loads or masked
-             loads with gaps.  */
+             loads with gaps unless the mask is uniform.  */
           if ((STMT_VINFO_GROUPED_ACCESS (stmt_info)
-               && (DR_GROUP_GAP (first_stmt_info) != 0 || has_gaps))
+               && (DR_GROUP_GAP (first_stmt_info) != 0
+                   || (has_gaps
+                       && STMT_VINFO_SLP_VECT_ONLY (first_stmt_info))))
               || STMT_VINFO_STRIDED_P (stmt_info))
             {
               load_permutation.release ();
@@ -2054,7 +2066,12 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
               unsigned i = 0;
               for (stmt_vec_info si = first_stmt_info;
                    si; si = DR_GROUP_NEXT_ELEMENT (si))
-                stmts2[i++] = si;
+                {
+                  if (si != first_stmt_info)
+                    for (unsigned k = 1; k < DR_GROUP_GAP (si); ++k)
+                      stmts2[i++] = NULL;
+                  stmts2[i++] = si;
+                }
               bool *matches2 = XALLOCAVEC (bool, dr_group_size);
               slp_tree unperm_load
                 = vect_build_slp_tree (vinfo, stmts2, dr_group_size,
@@ -2683,6 +2700,46 @@ out:
           continue;
         }
 
+      /* When we have a masked load with uniform mask discover this
+         as a single-lane mask with a splat permute.  This way we can
+         recognize this as a masked load-lane by stripping the splat.  */
+      if (is_a <gcall *> (STMT_VINFO_STMT (stmt_info))
+          && gimple_call_internal_p (STMT_VINFO_STMT (stmt_info),
+                                     IFN_MASK_LOAD)
+          && STMT_VINFO_GROUPED_ACCESS (stmt_info)
+          && ! STMT_VINFO_SLP_VECT_ONLY (DR_GROUP_FIRST_ELEMENT (stmt_info)))
+        {
+          vec<stmt_vec_info> def_stmts2;
+          def_stmts2.create (1);
+          def_stmts2.quick_push (oprnd_info->def_stmts[0]);
+          child = vect_build_slp_tree (vinfo, def_stmts2, 1,
+                                       &this_max_nunits,
+                                       matches, limit,
+                                       &this_tree_size, bst_map);
+          if (child)
+            {
+              slp_tree pnode = vect_create_new_slp_node (1, VEC_PERM_EXPR);
+              SLP_TREE_VECTYPE (pnode) = SLP_TREE_VECTYPE (child);
+              SLP_TREE_LANES (pnode) = group_size;
+              SLP_TREE_SCALAR_STMTS (pnode).create (group_size);
+              SLP_TREE_LANE_PERMUTATION (pnode).create (group_size);
+              for (unsigned k = 0; k < group_size; ++k)
+                {
+                  SLP_TREE_SCALAR_STMTS (pnode)
+                    .quick_push (oprnd_info->def_stmts[0]);
+                  SLP_TREE_LANE_PERMUTATION (pnode)
+                    .quick_push (std::make_pair (0u, 0u));
+                }
+              SLP_TREE_CHILDREN (pnode).quick_push (child);
+              pnode->max_nunits = child->max_nunits;
+              children.safe_push (pnode);
+              oprnd_info->def_stmts = vNULL;
+              continue;
+            }
+          else
+            def_stmts2.release ();
+        }
+
       if ((child = vect_build_slp_tree (vinfo, oprnd_info->def_stmts,
                                         group_size, &this_max_nunits,
                                         matches, limit,
@@ -5462,6 +5519,9 @@ private:
   /* Clean-up.  */
   void remove_redundant_permutations ();
 
+  /* Masked load lanes discovery.  */
+  void decide_masked_load_lanes ();
+
   void dump ();
 
   vec_info *m_vinfo;
@@ -7090,6 +7150,80 @@ vect_optimize_slp_pass::dump ()
     }
 }
 
+/* Masked load lanes discovery.  */
+
+void
+vect_optimize_slp_pass::decide_masked_load_lanes ()
+{
+  for (auto v : m_vertices)
+    {
+      slp_tree node = v.node;
+      if (SLP_TREE_DEF_TYPE (node) != vect_internal_def
+          || SLP_TREE_CODE (node) == VEC_PERM_EXPR)
+        continue;
+      stmt_vec_info stmt_info = SLP_TREE_REPRESENTATIVE (node);
+      if (! STMT_VINFO_GROUPED_ACCESS (stmt_info)
+          /* The mask has to be uniform.  */
+          || STMT_VINFO_SLP_VECT_ONLY (stmt_info)
+          || ! is_a <gcall *> (STMT_VINFO_STMT (stmt_info))
+          || ! gimple_call_internal_p (STMT_VINFO_STMT (stmt_info),
+                                       IFN_MASK_LOAD))
+        continue;
+      stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
+      if (STMT_VINFO_STRIDED_P (stmt_info)
+          || compare_step_with_zero (m_vinfo, stmt_info) <= 0
+          || vect_load_lanes_supported (SLP_TREE_VECTYPE (node),
+                                        DR_GROUP_SIZE (stmt_info),
+                                        true) == IFN_LAST)
+        continue;
+
+      /* Uniform masks need to be suitably represented.  */
+      slp_tree mask = SLP_TREE_CHILDREN (node)[0];
+      if (SLP_TREE_CODE (mask) != VEC_PERM_EXPR
+          || SLP_TREE_CHILDREN (mask).length () != 1)
+        continue;
+      bool match = true;
+      for (auto perm : SLP_TREE_LANE_PERMUTATION (mask))
+        if (perm.first != 0 || perm.second != 0)
+          {
+            match = false;
+            break;
+          }
+      if (!match)
+        continue;
+
+      /* Now see if the consumer side matches.  */
+      for (graph_edge *pred = m_slpg->vertices[node->vertex].pred;
+           pred; pred = pred->pred_next)
+        {
+          slp_tree pred_node = m_vertices[pred->src].node;
+          /* All consumers should be a permute with a single outgoing lane.  */
+          if (SLP_TREE_CODE (pred_node) != VEC_PERM_EXPR
+              || SLP_TREE_LANES (pred_node) != 1)
+            {
+              match = false;
+              break;
+            }
+          gcc_assert (SLP_TREE_CHILDREN (pred_node).length () == 1);
+        }
+      if (!match)
+        continue;
+      /* Now we can mark the nodes as to use load lanes.  */
+      node->ldst_lanes = true;
+      for (graph_edge *pred = m_slpg->vertices[node->vertex].pred;
+           pred; pred = pred->pred_next)
+        m_vertices[pred->src].node->ldst_lanes = true;
+      /* The catch is we have to massage the mask.  We have arranged
+         analyzed uniform masks to be represented by a splat VEC_PERM
+         which we can now simply elide as we cannot easily re-do SLP
+         discovery here.  */
+      slp_tree new_mask = SLP_TREE_CHILDREN (mask)[0];
+      SLP_TREE_REF_COUNT (new_mask)++;
+      SLP_TREE_CHILDREN (node)[0] = new_mask;
+      vect_free_slp_tree (mask);
+    }
+}
+
 /* Main entry point for the SLP graph optimization pass.  */
 
 void
@@ -7110,6 +7244,7 @@ vect_optimize_slp_pass::run ()
     }
   else
     remove_redundant_permutations ();
+  decide_masked_load_lanes ();
   free_graph (m_slpg);
 }
 
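
For readers following the gap handling in vect_build_slp_tree_2, here is a
standalone model of that loop.  It is a sketch only: group_member and
expand_group_with_gaps are hypothetical names, and it assumes that the gap
of a non-first member counts positions from the previous member (1 meaning
adjacent), with the first member's gap value ignored as in the loop above.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a group member: an id plus its distance
// from the previous member (1 == adjacent, 2 == one element skipped).
struct group_member
{
  int id;
  unsigned gap;
};

// Lay the members out lane by lane, inserting an empty lane for every
// skipped position in the middle of the group.  Empty lanes play the
// role of the NULL scalar stmts in the patch.
std::vector<std::optional<int>>
expand_group_with_gaps (const std::vector<group_member> &members)
{
  std::vector<std::optional<int>> lanes;
  for (std::size_t m = 0; m < members.size (); ++m)
    {
      // The first member's gap is not a distance to a predecessor,
      // so it is skipped here.
      if (m != 0)
        for (unsigned k = 1; k < members[m].gap; ++k)
          lanes.push_back (std::nullopt);
      lanes.push_back (members[m].id);
    }
  return lanes;
}
```

For members with ids 0, 1 and 3, where the last one has a gap of 2, this
yields four lanes with the third one empty.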