From patchwork Sat Aug 25 19:38:22 2018
X-Patchwork-Submitter: Thomas Koenig
X-Patchwork-Id: 962196
Mailing-List: gcc-patches@gcc.gnu.org
To: "fortran@gcc.gnu.org", gcc-patches
From: Thomas Koenig
Subject: [patch, libfortran] correct buffer size in library matmul
Date: Sat, 25 Aug 2018 21:38:22 +0200

Hello world,

the attached patch fixes a regression where the calculation of the size
of the buffer for matmul was too small for one special case.

Regression-tested. OK for trunk?

Regards

	Thomas

2018-08-25  Thomas Koenig

	PR libfortran/86704
	* m4/matmul_internal.m4: Correct calculation of needed buffer
	size for arrays of shape (1,n).
	* generated/matmul_c10.c: Regenerated.
	* generated/matmul_c16.c: Regenerated.
	* generated/matmul_c4.c: Regenerated.
	* generated/matmul_c8.c: Regenerated.
	* generated/matmul_i1.c: Regenerated.
	* generated/matmul_i16.c: Regenerated.
	* generated/matmul_i2.c: Regenerated.
	* generated/matmul_i4.c: Regenerated.
	* generated/matmul_i8.c: Regenerated.
	* generated/matmul_r10.c: Regenerated.
	* generated/matmul_r16.c: Regenerated.
	* generated/matmul_r4.c: Regenerated.
	* generated/matmul_r8.c: Regenerated.
	* generated/matmulavx128_c10.c: Regenerated.
	* generated/matmulavx128_c16.c: Regenerated.
	* generated/matmulavx128_c4.c: Regenerated.
	* generated/matmulavx128_c8.c: Regenerated.
	* generated/matmulavx128_i1.c: Regenerated.
	* generated/matmulavx128_i16.c: Regenerated.
	* generated/matmulavx128_i2.c: Regenerated.
	* generated/matmulavx128_i4.c: Regenerated.
	* generated/matmulavx128_i8.c: Regenerated.
	* generated/matmulavx128_r10.c: Regenerated.
	* generated/matmulavx128_r16.c: Regenerated.
	* generated/matmulavx128_r4.c: Regenerated.
	* generated/matmulavx128_r8.c: Regenerated.

2018-08-25  Thomas Koenig

	PR libfortran/86704
	* gfortran.dg/matmul_19.f90: New test.
Index: m4/matmul_internal.m4
===================================================================
--- m4/matmul_internal.m4	(Revision 263752)
+++ m4/matmul_internal.m4	(Arbeitskopie)
@@ -233,8 +233,13 @@ sinclude(`matmul_asm_'rtype_code`.m4')dnl
 	return;
 
       /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
Index: generated/matmul_c10.c
===================================================================
--- generated/matmul_c10.c	(Revision 263752)
+++ generated/matmul_c10.c	(Arbeitskopie)
@@ -317,8 +317,13 @@ matmul_c10_avx (gfc_array_c10 * const restrict ret
 	return;
 
       /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -869,8 +874,13 @@ matmul_c10_avx2 (gfc_array_c10 * const restrict re
 	return;
 
       /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1421,8 +1431,13 @@ matmul_c10_avx512f (gfc_array_c10 * const restrict
 	return;
 
       /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1987,8 +2002,13 @@ matmul_c10_vanilla (gfc_array_c10 * const restrict
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -2613,8 +2633,13 @@ matmul_c10 (gfc_array_c10 * const restrict retarra
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
Index: generated/matmul_c16.c
===================================================================
--- generated/matmul_c16.c	(Revision 263752)
+++ generated/matmul_c16.c	(Arbeitskopie)
@@ -317,8 +317,13 @@ matmul_c16_avx (gfc_array_c16 * const restrict ret
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -869,8 +874,13 @@ matmul_c16_avx2 (gfc_array_c16 * const restrict re
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1421,8 +1431,13 @@ matmul_c16_avx512f (gfc_array_c16 * const restrict
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1987,8 +2002,13 @@ matmul_c16_vanilla (gfc_array_c16 * const restrict
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -2613,8 +2633,13 @@ matmul_c16 (gfc_array_c16 * const restrict retarra
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
Index: generated/matmul_c4.c
===================================================================
--- generated/matmul_c4.c	(Revision 263752)
+++ generated/matmul_c4.c	(Arbeitskopie)
@@ -317,8 +317,13 @@ matmul_c4_avx (gfc_array_c4 * const restrict retar
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -869,8 +874,13 @@ matmul_c4_avx2 (gfc_array_c4 * const restrict reta
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1421,8 +1431,13 @@ matmul_c4_avx512f (gfc_array_c4 * const restrict r
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1987,8 +2002,13 @@ matmul_c4_vanilla (gfc_array_c4 * const restrict r
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -2613,8 +2633,13 @@ matmul_c4 (gfc_array_c4 * const restrict retarray,
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
Index: generated/matmul_c8.c
===================================================================
--- generated/matmul_c8.c	(Revision 263752)
+++ generated/matmul_c8.c	(Arbeitskopie)
@@ -317,8 +317,13 @@ matmul_c8_avx (gfc_array_c8 * const restrict retar
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -869,8 +874,13 @@ matmul_c8_avx2 (gfc_array_c8 * const restrict reta
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1421,8 +1431,13 @@ matmul_c8_avx512f (gfc_array_c8 * const restrict r
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1987,8 +2002,13 @@ matmul_c8_vanilla (gfc_array_c8 * const restrict r
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -2613,8 +2633,13 @@ matmul_c8 (gfc_array_c8 * const restrict retarray,
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
Index: generated/matmul_i1.c
===================================================================
--- generated/matmul_i1.c	(Revision 263752)
+++ generated/matmul_i1.c	(Arbeitskopie)
@@ -317,8 +317,13 @@ matmul_i1_avx (gfc_array_i1 * const restrict retar
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -869,8 +874,13 @@ matmul_i1_avx2 (gfc_array_i1 * const restrict reta
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1421,8 +1431,13 @@ matmul_i1_avx512f (gfc_array_i1 * const restrict r
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1987,8 +2002,13 @@ matmul_i1_vanilla (gfc_array_i1 * const restrict r
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -2613,8 +2633,13 @@ matmul_i1 (gfc_array_i1 * const restrict retarray,
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
Index: generated/matmul_i16.c
===================================================================
--- generated/matmul_i16.c	(Revision 263752)
+++ generated/matmul_i16.c	(Arbeitskopie)
@@ -317,8 +317,13 @@ matmul_i16_avx (gfc_array_i16 * const restrict ret
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -869,8 +874,13 @@ matmul_i16_avx2 (gfc_array_i16 * const restrict re
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1421,8 +1431,13 @@ matmul_i16_avx512f (gfc_array_i16 * const restrict
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1987,8 +2002,13 @@ matmul_i16_vanilla (gfc_array_i16 * const restrict
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -2613,8 +2633,13 @@ matmul_i16 (gfc_array_i16 * const restrict retarra
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
Index: generated/matmul_i2.c
===================================================================
--- generated/matmul_i2.c	(Revision 263752)
+++ generated/matmul_i2.c	(Arbeitskopie)
@@ -317,8 +317,13 @@ matmul_i2_avx (gfc_array_i2 * const restrict retar
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -869,8 +874,13 @@ matmul_i2_avx2 (gfc_array_i2 * const restrict reta
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1421,8 +1431,13 @@ matmul_i2_avx512f (gfc_array_i2 * const restrict r
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1987,8 +2002,13 @@ matmul_i2_vanilla (gfc_array_i2 * const restrict r
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -2613,8 +2633,13 @@ matmul_i2 (gfc_array_i2 * const restrict retarray,
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
Index: generated/matmul_i4.c
===================================================================
--- generated/matmul_i4.c	(Revision 263752)
+++ generated/matmul_i4.c	(Arbeitskopie)
@@ -317,8 +317,13 @@ matmul_i4_avx (gfc_array_i4 * const restrict retar
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -869,8 +874,13 @@ matmul_i4_avx2 (gfc_array_i4 * const restrict reta
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1421,8 +1431,13 @@ matmul_i4_avx512f (gfc_array_i4 * const restrict r
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1987,8 +2002,13 @@ matmul_i4_vanilla (gfc_array_i4 * const restrict r
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -2613,8 +2633,13 @@ matmul_i4 (gfc_array_i4 * const restrict retarray,
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
Index: generated/matmul_i8.c
===================================================================
--- generated/matmul_i8.c	(Revision 263752)
+++ generated/matmul_i8.c	(Arbeitskopie)
@@ -317,8 +317,13 @@ matmul_i8_avx (gfc_array_i8 * const restrict retar
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -869,8 +874,13 @@ matmul_i8_avx2 (gfc_array_i8 * const restrict reta
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1421,8 +1431,13 @@ matmul_i8_avx512f (gfc_array_i8 * const restrict r
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1987,8 +2002,13 @@ matmul_i8_vanilla (gfc_array_i8 * const restrict r
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -2613,8 +2633,13 @@ matmul_i8 (gfc_array_i8 * const restrict retarray,
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
Index: generated/matmul_r10.c
===================================================================
--- generated/matmul_r10.c	(Revision 263752)
+++ generated/matmul_r10.c	(Arbeitskopie)
@@ -317,8 +317,13 @@ matmul_r10_avx (gfc_array_r10 * const restrict ret
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -869,8 +874,13 @@ matmul_r10_avx2 (gfc_array_r10 * const restrict re
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1421,8 +1431,13 @@ matmul_r10_avx512f (gfc_array_r10 * const restrict
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1987,8 +2002,13 @@ matmul_r10_vanilla (gfc_array_r10 * const restrict
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -2613,8 +2633,13 @@ matmul_r10 (gfc_array_r10 * const restrict retarra
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
Index: generated/matmul_r16.c
===================================================================
--- generated/matmul_r16.c	(Revision 263752)
+++ generated/matmul_r16.c	(Arbeitskopie)
@@ -317,8 +317,13 @@ matmul_r16_avx (gfc_array_r16 * const restrict ret
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -869,8 +874,13 @@ matmul_r16_avx2 (gfc_array_r16 * const restrict re
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1421,8 +1431,13 @@ matmul_r16_avx512f (gfc_array_r16 * const restrict
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1987,8 +2002,13 @@ matmul_r16_vanilla (gfc_array_r16 * const restrict
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -2613,8 +2633,13 @@ matmul_r16 (gfc_array_r16 * const restrict retarra
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
Index: generated/matmul_r4.c
===================================================================
--- generated/matmul_r4.c	(Revision 263752)
+++ generated/matmul_r4.c	(Arbeitskopie)
@@ -317,8 +317,13 @@ matmul_r4_avx (gfc_array_r4 * const restrict retar
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -869,8 +874,13 @@ matmul_r4_avx2 (gfc_array_r4 * const restrict reta
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1421,8 +1431,13 @@ matmul_r4_avx512f (gfc_array_r4 * const restrict r
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1987,8 +2002,13 @@ matmul_r4_vanilla (gfc_array_r4 * const restrict r
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -2613,8 +2633,13 @@ matmul_r4 (gfc_array_r4 * const restrict retarray,
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
Index: generated/matmul_r8.c
===================================================================
--- generated/matmul_r8.c	(Revision 263752)
+++ generated/matmul_r8.c	(Arbeitskopie)
@@ -317,8 +317,13 @@ matmul_r8_avx (gfc_array_r8 * const restrict retar
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -869,8 +874,13 @@ matmul_r8_avx2 (gfc_array_r8 * const restrict reta
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1421,8 +1431,13 @@ matmul_r8_avx512f (gfc_array_r8 * const restrict r
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -1987,8 +2002,13 @@ matmul_r8_vanilla (gfc_array_r8 * const restrict r
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -2613,8 +2633,13 @@ matmul_r8 (gfc_array_r8 * const restrict retarray,
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
Index: generated/matmulavx128_c10.c
===================================================================
--- generated/matmulavx128_c10.c	(Revision 263752)
+++ generated/matmulavx128_c10.c	(Arbeitskopie)
@@ -282,8 +282,13 @@ matmul_c10_avx128_fma3 (gfc_array_c10 * const rest
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -835,8 +840,13 @@ matmul_c10_avx128_fma4 (gfc_array_c10 * const rest
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
Index: generated/matmulavx128_c16.c
===================================================================
--- generated/matmulavx128_c16.c	(Revision 263752)
+++ generated/matmulavx128_c16.c	(Arbeitskopie)
@@ -282,8 +282,13 @@ matmul_c16_avx128_fma3 (gfc_array_c16 * const rest
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -835,8 +840,13 @@ matmul_c16_avx128_fma4 (gfc_array_c16 * const rest
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
Index: generated/matmulavx128_c4.c
===================================================================
--- generated/matmulavx128_c4.c	(Revision 263752)
+++ generated/matmulavx128_c4.c	(Arbeitskopie)
@@ -282,8 +282,13 @@ matmul_c4_avx128_fma3 (gfc_array_c4 * const restri
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -835,8 +840,13 @@ matmul_c4_avx128_fma4 (gfc_array_c4 * const restri
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
Index: generated/matmulavx128_c8.c
===================================================================
--- generated/matmulavx128_c8.c	(Revision 263752)
+++ generated/matmulavx128_c8.c	(Arbeitskopie)
@@ -282,8 +282,13 @@ matmul_c8_avx128_fma3 (gfc_array_c8 * const restri
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -835,8 +840,13 @@ matmul_c8_avx128_fma4 (gfc_array_c8 * const restri
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
Index: generated/matmulavx128_i1.c
===================================================================
--- generated/matmulavx128_i1.c	(Revision 263752)
+++ generated/matmulavx128_i1.c	(Arbeitskopie)
@@ -282,8 +282,13 @@ matmul_i1_avx128_fma3 (gfc_array_i1 * const restri
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
@@ -835,8 +840,13 @@ matmul_i1_avx128_fma4 (gfc_array_i1 * const restri
 	return;
 
      /* Adjust size of t1 to what is needed.  */
-      index_type t1_dim;
-      t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
+      index_type t1_dim, a_sz;
+      if (aystride == 1)
+	a_sz = rystride;
+      else
+	a_sz = a_dim1;
+
+      t1_dim = a_sz * 256 + b_dim1;
 
       if (t1_dim > 65536)
 	t1_dim = 65536;
Index: generated/matmulavx128_i16.c
===================================================================
--- generated/matmulavx128_i16.c	(Revision 263752)
+++ generated/matmulavx128_i16.c	(Arbeitskopie)
@@ -282,8 +282,13 @@ matmul_i16_avx128_fma3 (gfc_array_i16 * const rest
 	return;
 
      /* Adjust size of t1 to what is needed.
*/ - index_type t1_dim; - t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1; + index_type t1_dim, a_sz; + if (aystride == 1) + a_sz = rystride; + else + a_sz = a_dim1; + + t1_dim = a_sz * 256 + b_dim1; if (t1_dim > 65536) t1_dim = 65536; @@ -835,8 +840,13 @@ matmul_i16_avx128_fma4 (gfc_array_i16 * const rest return; /* Adjust size of t1 to what is needed. */ - index_type t1_dim; - t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1; + index_type t1_dim, a_sz; + if (aystride == 1) + a_sz = rystride; + else + a_sz = a_dim1; + + t1_dim = a_sz * 256 + b_dim1; if (t1_dim > 65536) t1_dim = 65536; Index: generated/matmulavx128_i2.c =================================================================== --- generated/matmulavx128_i2.c (Revision 263752) +++ generated/matmulavx128_i2.c (Arbeitskopie) @@ -282,8 +282,13 @@ matmul_i2_avx128_fma3 (gfc_array_i2 * const restri return; /* Adjust size of t1 to what is needed. */ - index_type t1_dim; - t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1; + index_type t1_dim, a_sz; + if (aystride == 1) + a_sz = rystride; + else + a_sz = a_dim1; + + t1_dim = a_sz * 256 + b_dim1; if (t1_dim > 65536) t1_dim = 65536; @@ -835,8 +840,13 @@ matmul_i2_avx128_fma4 (gfc_array_i2 * const restri return; /* Adjust size of t1 to what is needed. */ - index_type t1_dim; - t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1; + index_type t1_dim, a_sz; + if (aystride == 1) + a_sz = rystride; + else + a_sz = a_dim1; + + t1_dim = a_sz * 256 + b_dim1; if (t1_dim > 65536) t1_dim = 65536; Index: generated/matmulavx128_i4.c =================================================================== --- generated/matmulavx128_i4.c (Revision 263752) +++ generated/matmulavx128_i4.c (Arbeitskopie) @@ -282,8 +282,13 @@ matmul_i4_avx128_fma3 (gfc_array_i4 * const restri return; /* Adjust size of t1 to what is needed. 
*/ - index_type t1_dim; - t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1; + index_type t1_dim, a_sz; + if (aystride == 1) + a_sz = rystride; + else + a_sz = a_dim1; + + t1_dim = a_sz * 256 + b_dim1; if (t1_dim > 65536) t1_dim = 65536; @@ -835,8 +840,13 @@ matmul_i4_avx128_fma4 (gfc_array_i4 * const restri return; /* Adjust size of t1 to what is needed. */ - index_type t1_dim; - t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1; + index_type t1_dim, a_sz; + if (aystride == 1) + a_sz = rystride; + else + a_sz = a_dim1; + + t1_dim = a_sz * 256 + b_dim1; if (t1_dim > 65536) t1_dim = 65536; Index: generated/matmulavx128_i8.c =================================================================== --- generated/matmulavx128_i8.c (Revision 263752) +++ generated/matmulavx128_i8.c (Arbeitskopie) @@ -282,8 +282,13 @@ matmul_i8_avx128_fma3 (gfc_array_i8 * const restri return; /* Adjust size of t1 to what is needed. */ - index_type t1_dim; - t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1; + index_type t1_dim, a_sz; + if (aystride == 1) + a_sz = rystride; + else + a_sz = a_dim1; + + t1_dim = a_sz * 256 + b_dim1; if (t1_dim > 65536) t1_dim = 65536; @@ -835,8 +840,13 @@ matmul_i8_avx128_fma4 (gfc_array_i8 * const restri return; /* Adjust size of t1 to what is needed. */ - index_type t1_dim; - t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1; + index_type t1_dim, a_sz; + if (aystride == 1) + a_sz = rystride; + else + a_sz = a_dim1; + + t1_dim = a_sz * 256 + b_dim1; if (t1_dim > 65536) t1_dim = 65536; Index: generated/matmulavx128_r10.c =================================================================== --- generated/matmulavx128_r10.c (Revision 263752) +++ generated/matmulavx128_r10.c (Arbeitskopie) @@ -282,8 +282,13 @@ matmul_r10_avx128_fma3 (gfc_array_r10 * const rest return; /* Adjust size of t1 to what is needed. 
*/ - index_type t1_dim; - t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1; + index_type t1_dim, a_sz; + if (aystride == 1) + a_sz = rystride; + else + a_sz = a_dim1; + + t1_dim = a_sz * 256 + b_dim1; if (t1_dim > 65536) t1_dim = 65536; @@ -835,8 +840,13 @@ matmul_r10_avx128_fma4 (gfc_array_r10 * const rest return; /* Adjust size of t1 to what is needed. */ - index_type t1_dim; - t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1; + index_type t1_dim, a_sz; + if (aystride == 1) + a_sz = rystride; + else + a_sz = a_dim1; + + t1_dim = a_sz * 256 + b_dim1; if (t1_dim > 65536) t1_dim = 65536; Index: generated/matmulavx128_r16.c =================================================================== --- generated/matmulavx128_r16.c (Revision 263752) +++ generated/matmulavx128_r16.c (Arbeitskopie) @@ -282,8 +282,13 @@ matmul_r16_avx128_fma3 (gfc_array_r16 * const rest return; /* Adjust size of t1 to what is needed. */ - index_type t1_dim; - t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1; + index_type t1_dim, a_sz; + if (aystride == 1) + a_sz = rystride; + else + a_sz = a_dim1; + + t1_dim = a_sz * 256 + b_dim1; if (t1_dim > 65536) t1_dim = 65536; @@ -835,8 +840,13 @@ matmul_r16_avx128_fma4 (gfc_array_r16 * const rest return; /* Adjust size of t1 to what is needed. */ - index_type t1_dim; - t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1; + index_type t1_dim, a_sz; + if (aystride == 1) + a_sz = rystride; + else + a_sz = a_dim1; + + t1_dim = a_sz * 256 + b_dim1; if (t1_dim > 65536) t1_dim = 65536; Index: generated/matmulavx128_r4.c =================================================================== --- generated/matmulavx128_r4.c (Revision 263752) +++ generated/matmulavx128_r4.c (Arbeitskopie) @@ -282,8 +282,13 @@ matmul_r4_avx128_fma3 (gfc_array_r4 * const restri return; /* Adjust size of t1 to what is needed. 
*/ - index_type t1_dim; - t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1; + index_type t1_dim, a_sz; + if (aystride == 1) + a_sz = rystride; + else + a_sz = a_dim1; + + t1_dim = a_sz * 256 + b_dim1; if (t1_dim > 65536) t1_dim = 65536; @@ -835,8 +840,13 @@ matmul_r4_avx128_fma4 (gfc_array_r4 * const restri return; /* Adjust size of t1 to what is needed. */ - index_type t1_dim; - t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1; + index_type t1_dim, a_sz; + if (aystride == 1) + a_sz = rystride; + else + a_sz = a_dim1; + + t1_dim = a_sz * 256 + b_dim1; if (t1_dim > 65536) t1_dim = 65536; Index: generated/matmulavx128_r8.c =================================================================== --- generated/matmulavx128_r8.c (Revision 263752) +++ generated/matmulavx128_r8.c (Arbeitskopie) @@ -282,8 +282,13 @@ matmul_r8_avx128_fma3 (gfc_array_r8 * const restri return; /* Adjust size of t1 to what is needed. */ - index_type t1_dim; - t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1; + index_type t1_dim, a_sz; + if (aystride == 1) + a_sz = rystride; + else + a_sz = a_dim1; + + t1_dim = a_sz * 256 + b_dim1; if (t1_dim > 65536) t1_dim = 65536; @@ -835,8 +840,13 @@ matmul_r8_avx128_fma4 (gfc_array_r8 * const restri return; /* Adjust size of t1 to what is needed. */ - index_type t1_dim; - t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1; + index_type t1_dim, a_sz; + if (aystride == 1) + a_sz = rystride; + else + a_sz = a_dim1; + + t1_dim = a_sz * 256 + b_dim1; if (t1_dim > 65536) t1_dim = 65536;