From patchwork Tue Apr 16 12:08:23 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Roman Zhuykov X-Patchwork-Id: 1086266 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-499309-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=ispras.ru Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="RvnXjMEI"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 44k4031n5Nz9s5c for ; Tue, 16 Apr 2019 22:08:35 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:content-type:content-transfer-encoding:date:from :to:cc:subject:in-reply-to:references:message-id; q=dns; s= default; b=BU2RcFgAjpU8hGM4p+kQ1usoBR77QDcAk7kmGDzsQsHAGKyNXeGzQ hQMcvP4itmIN+/oA5m4MgfI6wvHOxbBYVfgbH7lhunf3jj/DMe+Gn0yh5gNH7h7c nkhWbYJL0pG1/L4SGc7TjqKmft0abQkXRY/LhOIIBi2oZCTdG05vQA= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:content-type:content-transfer-encoding:date:from :to:cc:subject:in-reply-to:references:message-id; s=default; bh= CXBVMgX60PmaakeuLAxkumr+GOs=; b=RvnXjMEI+GJcKqhPLhxKwQTu9pqmdbif Ccd8j8gHD+qdyWfozudd12Gp8WbG9P41uHFO47Bi7RkzznHzdiuldmwOYa2iGRFW qC3M7FrMxBIKQxTm5dIW/8DHHK3hOuYOshDuLlZiLjNwaSldG7Ef0uPs2hf1EjSV T/ZMNhgonUE= Received: (qmail 84170 invoked by alias); 16 Apr 2019 12:08:28 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 84160 invoked by uid 89); 16 Apr 2019 12:08:27 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-14.2 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_2, GIT_PATCH_3, KAM_NUMSUBJECT, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_PASS autolearn=ham version=3.3.1 spammy=anti, scanner, organize, 7448 X-HELO: mail.ispras.ru Received: from mail.ispras.ru (HELO mail.ispras.ru) (83.149.199.45) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 16 Apr 2019 12:08:25 +0000 Received: from mail.ispras.ru (localhost [127.0.0.1]) by mail.ispras.ru (Postfix) with ESMTPSA id 8E38D540094; Tue, 16 Apr 2019 15:08:23 +0300 (MSK) MIME-Version: 1.0 Date: Tue, 16 Apr 2019 15:08:23 +0300 From: Roman Zhuykov To: gcc-patches@gcc.gnu.org Cc: Jakub Jelinek Subject: [4/4][PATCH] Discussing PR83507 In-Reply-To: References: Message-ID: X-Sender: zhroma@ispras.ru User-Agent: Roundcube Webmail/1.1.2 This issue unfortunately was not solved correctly. In that example we don’t have -fmodulo-sched-allow-regmoves enabled and we should not create any register moves at all. I’ll describe here everything I found while looking on the issue. At the first step I have patched trunk compiler like this: } diff --git a/gcc/modulo-sched.c b/gcc/modulo-sched.c /* Create NREG_MOVES register moves. */ + gcc_assert (flag_modulo_sched_allow_regmoves); first_move = ps->reg_moves.length (); ps->reg_moves.safe_grow_cleared (first_move + nreg_moves); extend_node_sched_params (ps); @@ -744,8 +745,7 @@ schedule_reg_moves (partial_schedule_ptr ps) /* Generate each move. */ old_reg = prev_reg = SET_DEST (set); - if (HARD_REGISTER_P (old_reg)) - return false; + gcc_assert (!HARD_REGISTER_P (old_reg)); for (i_reg_move = 0; i_reg_move < nreg_moves; i_reg_move++) { @@ -1371,7 +1371,12 @@ sms_schedule (void) else issue_rate = 1; - /* Initialize the scheduler. */ + /* Initialize the scheduler, we need live info even at O1. */ + if (optimize == 1) + { + df_live_add_problem (); + df_live_set_all_dirty (); + } setup_sched_infos (); haifa_sched_init (); @@ -1473,6 +1478,8 @@ sms_schedule (void) if (CALL_P (insn) || BARRIER_P (insn) + || (NONDEBUG_INSN_P (insn) && extract_asm_operands (PATTERN (insn)) + && MEM_VOLATILE_P (extract_asm_operands (PATTERN (insn)))) || (NONDEBUG_INSN_P (insn) && !JUMP_P (insn) && !single_set (insn) && GET_CODE (PATTERN (insn)) != USE && !reg_mentioned_p (count_reg, insn)) @@ -1489,6 +1496,11 @@ sms_schedule (void) fprintf (dump_file, "SMS loop-with-call\n"); else if (BARRIER_P (insn)) fprintf (dump_file, "SMS loop-with-barrier\n"); + else if (NONDEBUG_INSN_P (insn) + && extract_asm_operands (PATTERN (insn)) + && MEM_VOLATILE_P (extract_asm_operands + (PATTERN (insn)))) + fprintf (dump_file, "SMS loop-with-volatile-asm\n"); else if ((NONDEBUG_INSN_P (insn) && !JUMP_P (insn) && !single_set (insn) && GET_CODE (PATTERN (insn)) != USE)) fprintf (dump_file, "SMS loop-with-not-single-set\n"); @@ -1752,6 +1764,9 @@ sms_schedule (void) /* Release scheduler data, needed until now because of DFA. */ haifa_sched_finish (); loop_optimizer_finalize (); + + if (optimize == 1) + df_remove_problem (df_live); } /* The SMS scheduling algorithm itself diff --git a/gcc/modulo-sched.c b/gcc/modulo-sched.c --- a/gcc/modulo-sched.c +++ b/gcc/modulo-sched.c @@ -735,6 +735,7 @@ schedule_reg_moves (partial_schedule_ptr ps) continue; /* Create NREG_MOVES register moves. */ + gcc_assert (flag_modulo_sched_allow_regmoves); first_move = ps->reg_moves.length (); ps->reg_moves.safe_grow_cleared (first_move + nreg_moves); extend_node_sched_params (ps); @@ -744,8 +745,7 @@ schedule_reg_moves (partial_schedule_ptr ps) /* Generate each move. */ old_reg = prev_reg = SET_DEST (set); - if (HARD_REGISTER_P (old_reg)) - return false; + gcc_assert (!HARD_REGISTER_P (old_reg)); for (i_reg_move = 0; i_reg_move < nreg_moves; i_reg_move++) { On current trunk this patch doesn’t give any failures on powerpc, but I’ve checked other platforms and found gcc.c-torture/execute/pr84524.c ICEing with -O1 -fmodulo-sched on aarch64. Speaking about sms-13.c, it happens not to be a proper loop for modulo-scheduling at current trunk. Some changes in other passes caused this, and that file actually tests nothing right now. Than, I added 8 branch into my testing, and decide first to find a proper way to fix sms-13.c on 8 branch. Without -fmodulo-sched-allow-regmoves each TRUE_DEP edge should be paired with another (sometimes ANTI_DEP or OUTPUT_DEP) edge in the opposite direction. My assertion failure means that DDG was odd. As Alexander mentioned here https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00028.html “SMS uses sched-deps for intra-loop deps, and then separately uses DF for cross-iteration deps, which means that it should be ready for surprises when the two scanners are not 100% in sync.” Situation here is one of such surprises, and debugging shows that we loose one of dependencies here. Considering this two instructions: (insn 30 29 31 8 (parallel [ (set (reg:SI 142) (minus:SI (reg/v:SI 133 [ x+-2 ]) (reg/v:SI 130 [ b+-2 ]))) (set (reg:SI 76 ca) (leu:SI (reg/v:SI 130 [ b+-2 ]) (reg/v:SI 133 [ x+-2 ]))) ]) "test.c":22 107 {subfsi3_carry} (expr_list:REG_UNUSED (reg:SI 142) (nil))) (insn 31 30 32 8 (parallel [ (set (reg:SI 143) (plus:SI (reg:SI 76 ca) (const_int -1 [0xffffffffffffffff]))) (clobber (reg:SI 76 ca)) ]) "test.c":22 116 {subfsi3_carry_in_xx} (expr_list:REG_DEAD (reg:SI 76 ca) (expr_list:REG_UNUSED (reg:SI 76 ca) (nil)))) Sched-deps organize obvious 30->31 true dependency, and we want DF to help us with 31->30 output dependency here. CA is clobbered in 31, and if we move insn 31 forward we have to stop at insn 30 on the next iteration. Build_inter_loop_deps doesn’t create such output dependency because reaching definitions “rd_bb_info->gen” dependency set doesn’t contain CA register. Maybe, because it is a clobber, not a set. So we need somehow organize that dependency, and I’ve found that df_live set can be used here instead. I proceed with the following change: diff --git a/gcc/ddg.c b/gcc/ddg.c --- a/gcc/ddg.c +++ b/gcc/ddg.c @@ -365,16 +365,14 @@ add_cross_iteration_register_deps (ddg_ptr g, df_ref last_def) static void build_inter_loop_deps (ddg_ptr g) { - unsigned rd_num; - struct df_rd_bb_info *rd_bb_info; + unsigned regno; bitmap_iterator bi; - - rd_bb_info = DF_RD_BB_INFO (g->bb); + struct df_live_bb_info *live_bb_info = DF_LIVE_BB_INFO (g->bb); /* Find inter-loop register output, true and anti deps. */ - EXECUTE_IF_SET_IN_BITMAP (&rd_bb_info->gen, 0, rd_num, bi) + EXECUTE_IF_SET_IN_BITMAP (&live_bb_info->gen, 0, regno, bi) { - df_ref rd = DF_DEFS_GET (rd_num); + df_ref rd = df_bb_regno_last_def_find (g->bb, regno); add_cross_iteration_register_deps (g, rd); } (Here I skipped minor hunks which properly add/remove df_live problem in modulo-sched.c when optimize == 1). This fix corrected sms-13.c on 8 branch, but pr84524.c mentioned above was still ICEing on both trunk and 8-branch. No other failures were introduces while testing this intermediate patch on different branches and platforms. Tested patch also contained temporarily additional checks that the only difference between “rd_bb_info->gen” and “live_bb_info->gen” sets may be additional hard registers in live set. In pr84524.c we got a loop with an extended inline asm: asm volatile ("" : "+r" (v)) which also gives us a “surprising” situation Alexander predicts. For sched-deps scanner such volatile asm is a “true barrier” and we create dependencies to almost all other instructions, while DF scanners don’t give us such information. Maybe it is a good idea somehow re-implement modulo scheduler using only one scanner instead of two, but at the moment the best thing to do is to detect the situation earlier and skip such loops. So, the last two hunks are: @@ -1473,6 +1478,8 @@ sms_schedule (void) if (CALL_P (insn) || BARRIER_P (insn) + || (NONDEBUG_INSN_P (insn) && extract_asm_operands (PATTERN (insn)) + && MEM_VOLATILE_P (extract_asm_operands (PATTERN (insn)))) || (NONDEBUG_INSN_P (insn) && !JUMP_P (insn) && !single_set (insn) && GET_CODE (PATTERN (insn)) != USE && !reg_mentioned_p (count_reg, insn)) @@ -1489,6 +1496,11 @@ sms_schedule (void) fprintf (dump_file, "SMS loop-with-call\n"); else if (BARRIER_P (insn)) fprintf (dump_file, "SMS loop-with-barrier\n"); + else if (NONDEBUG_INSN_P (insn) + && extract_asm_operands (PATTERN (insn)) + && MEM_VOLATILE_P (extract_asm_operands + (PATTERN (insn)))) + fprintf (dump_file, "SMS loop-with-volatile-asm\n"); else if ((NONDEBUG_INSN_P (insn) && !JUMP_P (insn) && !single_set (insn) && GET_CODE (PATTERN (insn)) != USE)) Patch testing addressed in my cover letter. Ok for trunk? gcc/ChangeLog: 2019-04-10 Roman Zhuykov PR target/83507 * ddg.c (build_inter_loop_deps): Use live_bb_info instead of rd_bb_info to search last register definitions. * modulo-sched.c (schedule_reg_moves): Add an assertion which prevents creating register moves when it is not allowed. (sms_schedule): Add/remove df_live problem when optimize == 1. Bail out when we got volatile asm inside the loop. diff --git a/gcc/ddg.c b/gcc/ddg.c --- a/gcc/ddg.c +++ b/gcc/ddg.c @@ -365,16 +365,14 @@ add_cross_iteration_register_deps (ddg_ptr g, df_ref last_def) static void build_inter_loop_deps (ddg_ptr g) { - unsigned rd_num; - struct df_rd_bb_info *rd_bb_info; + unsigned regno; bitmap_iterator bi; - - rd_bb_info = DF_RD_BB_INFO (g->bb); + struct df_live_bb_info *live_bb_info = DF_LIVE_BB_INFO (g->bb); /* Find inter-loop register output, true and anti deps. */ - EXECUTE_IF_SET_IN_BITMAP (&rd_bb_info->gen, 0, rd_num, bi) + EXECUTE_IF_SET_IN_BITMAP (&live_bb_info->gen, 0, regno, bi) { - df_ref rd = DF_DEFS_GET (rd_num); + df_ref rd = df_bb_regno_last_def_find (g->bb, regno); add_cross_iteration_register_deps (g, rd);