From patchwork Fri Jan 15 09:22:11 2016
X-Patchwork-Submitter: Ilya Enkovich
X-Patchwork-Id: 567983
Date: Fri, 15 Jan 2016 12:22:11 +0300
From: Ilya Enkovich
To: gcc-patches@gcc.gnu.org
Subject: [PATCH, i386] Delay DI mode xor split when expanding comparison
Message-ID: <20160115092211.GA13418@msticlxl57.ims.intel.com>

Hi,

When the scalar-to-vector (STV) pass was introduced in the i386 target,
the split of DI mode instructions was delayed to the split2 pass (it
was previously performed at expand).  It appears this causes a ~5%
degradation on the libquantum benchmark.  This happens because in the
original code AND and XOR were combined into ANDN.  That no longer
happens because AND is not split at expand any more, but XOR still is.

This patch delays the split of the XOR generated for a DI mode
comparison.  This resolves the regression on libquantum.  Unfortunately
we still don't generate the SSE version of ANDN; I'll look into this
later.

Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?

Thanks,
Ilya
---
gcc/

2016-01-14  Ilya Enkovich

	* config/i386/i386.c (ix86_expand_branch): Don't split DI mode xor
	instruction to SI mode.

gcc/testsuite/

2016-01-14  Ilya Enkovich

	* gcc.target/i386/pr65105-5.c: New test.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ed91e5d..504ac55 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21699,6 +21699,19 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
     case DImode:
       if (TARGET_64BIT)
         goto simple;
+      /* For 32-bit target DI comparison may be performed on
+         SSE registers.  To allow this we should avoid split
+         to SI mode which is achieved by doing xor in DI mode
+         and then comparing with zero (which is recognized by
+         STV pass).  We don't compare using xor when optimizing
+         for size.  */
+      if (!optimize_insn_for_size_p ()
+          && TARGET_STV
+          && (code == EQ || code == NE))
+        {
+          op0 = force_reg (mode, gen_rtx_XOR (mode, op0, op1));
+          op1 = const0_rtx;
+        }
     case TImode:
       /* Expand DImode branch into multiple compare+branch.  */
       {
diff --git a/gcc/testsuite/gcc.target/i386/pr65105-5.c b/gcc/testsuite/gcc.target/i386/pr65105-5.c
new file mode 100644
index 0000000..5818c1c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr65105-5.c
@@ -0,0 +1,22 @@
+/* PR target/pr65105 */
+/* { dg-do compile { target { ia32 } } } */
+/* { dg-options "-O2 -march=core-avx2" } */
+/* { dg-final { scan-assembler "pand" } } */
+/* { dg-final { scan-assembler "pxor" } } */
+/* { dg-final { scan-assembler "ptest" } } */
+
+struct S1
+{
+  unsigned long long a;
+  unsigned long long b;
+};
+
+void
+test (int p1, unsigned long long p2, int p3, struct S1 *p4)
+{
+  int i;
+
+  for (i = 0; i < p1; i++)
+    if ((p4[i].a & p2) != p2)
+      p4[i].a ^= (1ULL << p3);
+}
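
For readers following the reasoning in the message, here is a minimal C
sketch (not part of the patch) of the two scalar identities involved.
A 64-bit equality test can be written as an XOR followed by a comparison
with zero, which is the shape the new expand code produces.  The test
case's (a & m) != m check is the same as testing ~a & m against zero,
which is the ANDN form.  All function names below are hypothetical and
exist only for illustration; the sketch says nothing about which
instructions GCC actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: equality as XOR followed by a compare with zero.
   This mirrors, in scalar C, the DImode XOR + compare-with-zero shape.  */
static int
eq_via_xor (uint64_t a, uint64_t b)
{
  return (a ^ b) == 0;
}

/* Hypothetical helper: the condition from the test case, written directly.  */
static int
mask_missing_direct (uint64_t a, uint64_t m)
{
  return (a & m) != m;
}

/* The same condition as AND, then XOR, then compare with zero.  */
static int
mask_missing_xor (uint64_t a, uint64_t m)
{
  return ((a & m) ^ m) != 0;
}

/* The same condition in ANDN form, since (a & m) ^ m == ~a & m.  */
static int
mask_missing_andn (uint64_t a, uint64_t m)
{
  return (~a & m) != 0;
}

/* Check that the three forms agree, and that eq_via_xor matches ==,
   on a handful of values that exercise both 32-bit halves.  */
static void
self_check (void)
{
  static const uint64_t v[] = {
    0, 1, 0xffffffffULL, 1ULL << 32, 0x8000000000000001ULL, ~0ULL
  };
  const int n = (int) (sizeof v / sizeof v[0]);

  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      {
        assert (eq_via_xor (v[i], v[j]) == (v[i] == v[j]));
        assert (mask_missing_xor (v[i], v[j])
                == mask_missing_direct (v[i], v[j]));
        assert (mask_missing_andn (v[i], v[j])
                == mask_missing_direct (v[i], v[j]));
      }
}
```

Calling self_check () from a small main is enough to exercise the
sketch; it only demonstrates that the rewritten conditions are
equivalent in C, not how the STV pass handles them.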