From patchwork Wed Nov 1 20:30:26 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Noah Goldstein
X-Patchwork-Id: 1858063
From: Noah Goldstein
To: libc-alpha@sourceware.org
Cc: goldstein.w.n@gmail.com, hjl.tools@gmail.com, carlos@systemhalted.org
Subject: x86: Only align destination to 1x VEC_SIZE in memset 4x loop
Date: Wed, 1 Nov 2023 15:30:26 -0500
Message-Id: <20231101203026.2608879-1-goldstein.w.n@gmail.com>
X-Mailer: git-send-email 2.34.1
List-Id: Libc-alpha mailing list

Current code aligns to 2x VEC_SIZE. Aligning to 2x has no effect on
performance other than potentially resulting in an additional
iteration of the loop.
1x maintains aligned stores (the only reason to align in this case)
and doesn't incur any unnecessary loop iterations.

Reviewed-by: Sunil K Pandey
---
 sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
index 3d9ad49cb9..0f0636b90f 100644
--- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
+++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
@@ -293,7 +293,7 @@ L(more_2x_vec):
 	leaq	(VEC_SIZE * 4)(%rax), %LOOP_REG
 #endif
 	/* Align dst for loop. */
-	andq	$(VEC_SIZE * -2), %LOOP_REG
+	andq	$(VEC_SIZE * -1), %LOOP_REG
 	.p2align 4
 L(loop):
 	VMOVA	%VMM(0), LOOP_4X_OFFSET(%LOOP_REG)
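
The claim about the extra iteration can be checked with a small model. The
Python sketch below is not part of the patch and is only an illustration
under stated assumptions: VEC_SIZE is fixed at 32, the first 4 vectors and
the last 4 vectors are written with unaligned stores outside the loop, and
the loop body stores 4 aligned vectors per pass and always runs at least
once. The helper names loop_start and loop_iterations are hypothetical.

```python
VEC_SIZE = 32  # assumption: 32-byte vectors; the real file is built for several sizes


def loop_start(dst, align):
    """Round the address just past the 4 head vectors down to `align`.

    Models `leaq (VEC_SIZE * 4)(%rax), %LOOP_REG` followed by
    `andq $-align, %LOOP_REG`.
    """
    return (dst + 4 * VEC_SIZE) & -align


def loop_iterations(dst, length, align):
    """Count passes through the 4x loop for a given alignment.

    Simplified model (an assumption, not the exact glibc control flow):
    the loop advances 4 vectors per pass, runs at least once, and stops
    once the pointer reaches dst + length - 4 * VEC_SIZE, because the
    last 4 vectors are stored separately.
    """
    assert length > 8 * VEC_SIZE, "model only covers sizes that reach the loop"
    ptr = loop_start(dst, align)
    end = dst + length - 4 * VEC_SIZE
    passes = 0
    while True:
        ptr += 4 * VEC_SIZE
        passes += 1
        if ptr >= end:
            return passes


if __name__ == "__main__":
    dst, length = 32, 384  # dst is aligned to VEC_SIZE but not to 2 * VEC_SIZE
    print("2x mask:", loop_iterations(dst, length, 2 * VEC_SIZE))
    print("1x mask:", loop_iterations(dst, length, 1 * VEC_SIZE))
```

In this model the 2x mask rounds the loop start further down, re-storing
bytes already written by the head stores, so for some (dst, length) pairs it
needs one more pass than the 1x mask. The 1x mask still yields a start
address that is a multiple of VEC_SIZE, which is all the aligned stores
require.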