From patchwork Fri Jun 7 23:04:47 2024
X-Patchwork-Submitter: Joe Damato
X-Patchwork-Id: 1945322
From: Joe Damato
To: libc-alpha@sourceware.org
Cc: goldstein.w.n@gmail.com, Joe Damato
Subject: [PATCH] x86: Enable non-temporal memset tunable for AMD
Date: Fri, 7 Jun 2024 23:04:47 +0000
Message-Id: <20240607230447.52478-1-jdamato@fastly.com>

In commit 46b5e98ef6f1 ("x86: Add seperate non-temporal tunable for
memset") a tunable threshold for enabling non-temporal memset was added,
but only for Intel hardware.

Since that commit, new benchmark results suggest that non-temporal memset
is beneficial on AMD, as well, so allow this tunable to be set for AMD.
See:
https://docs.google.com/spreadsheets/d/1opzukzvum4n6-RUVHTGddV6RjAEil4P2uMjjQGLbLcU/edit?usp=sharing
which has been updated to include data using different strategies for
large memset on AMD Zen2, Zen3, and Zen4.

Signed-off-by: Joe Damato
Reviewed-by: Noah Goldstein
---
 sysdeps/x86/dl-cacheinfo.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index d375a7cba6..d2fe61b997 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -986,11 +986,11 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
   if (CPU_FEATURE_USABLE_P (cpu_features, FSRM))
     rep_movsb_threshold = 2112;
 
-  /* Non-temporal stores in memset have only been tested on Intel hardware.
-     Until we benchmark data on other x86 processor, disable non-temporal
-     stores in memset. */
+  /* Non-temporal stores are more performant on Intel and AMD hardware above
+     non_temporal_threshold. Enable this for both Intel and AMD hardware. */
   unsigned long int memset_non_temporal_threshold = SIZE_MAX;
-  if (cpu_features->basic.kind == arch_kind_intel)
+  if (cpu_features->basic.kind == arch_kind_intel
+      || cpu_features->basic.kind == arch_kind_amd)
     memset_non_temporal_threshold = non_temporal_threshold;
 
   /* For AMD CPUs that support ERMS (Zen3+), REP MOVSB is in a lot of
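To illustrate what the patched threshold gates, here is a minimal, self-contained C sketch under stated assumptions. The enum `sketch_arch_kind` and the functions `pick_memset_nt_threshold` and `sketch_memset` are hypothetical names invented for this illustration; glibc's real x86 memset is written in assembly, and its exact size comparison and loop structure are not reproduced here. The sketch mirrors the patch's selection logic (SIZE_MAX by default, which effectively disables the non-temporal path, and `non_temporal_threshold` for Intel and AMD), then uses the SSE2 intrinsics `_mm_stream_si128` and `_mm_sfence` for fills at or above the threshold, falling back to plain `memset` when SSE2 is not available.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical stand-in for glibc's internal arch_kind values.  */
enum sketch_arch_kind { SKETCH_ARCH_INTEL, SKETCH_ARCH_AMD, SKETCH_ARCH_OTHER };

/* Mirrors the patched selection: SIZE_MAX effectively disables the
   non-temporal path; Intel and AMD inherit non_temporal_threshold.  */
static size_t
pick_memset_nt_threshold (enum sketch_arch_kind kind,
                          size_t non_temporal_threshold)
{
  size_t threshold = SIZE_MAX;
  if (kind == SKETCH_ARCH_INTEL || kind == SKETCH_ARCH_AMD)
    threshold = non_temporal_threshold;
  return threshold;
}

/* Toy memset (hypothetical): ordinary stores below the threshold,
   streaming (non-temporal) stores at or above it when SSE2 exists.  */
static void *
sketch_memset (void *dst, int c, size_t n, size_t nt_threshold)
{
#if defined(__SSE2__)
  if (n >= nt_threshold)
    {
      unsigned char *p = dst;
      /* Fill the unaligned head with ordinary stores so the streaming
         loop below works on 16-byte-aligned addresses.  */
      size_t head = (16 - ((uintptr_t) p & 15)) & 15;
      if (head > n)
        head = n;
      memset (p, c, head);
      p += head;
      n -= head;

      __m128i v = _mm_set1_epi8 ((char) c);
      /* Streaming stores write around the cache hierarchy.  */
      for (; n >= 16; p += 16, n -= 16)
        _mm_stream_si128 ((__m128i *) p, v);
      /* Non-temporal stores are weakly ordered; SFENCE orders them
         before any later stores.  */
      _mm_sfence ();

      memset (p, c, n); /* Remaining tail bytes.  */
      return dst;
    }
#endif
  return memset (dst, c, n);
}
```

For example, with `SKETCH_ARCH_AMD` and an arbitrary placeholder `non_temporal_threshold` of 1 MiB, a 2 MiB fill takes the streaming path on an SSE2 machine, while any kind other than Intel or AMD keeps SIZE_MAX and always falls back to ordinary `memset`, which is the pre-patch behavior for AMD.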