From patchwork Fri Nov 11 02:56:54 2016
X-Patchwork-Submitter: Andrew Pinski
X-Patchwork-Id: 693543
From: Andrew Pinski
Date: Thu, 10 Nov 2016 18:56:54 -0800
Subject: [PATCH/AARCH64] Improved -mcpu/mtune/march=native handling
To: GCC Patches

As I mentioned in my other emails, parsing /proc/cpuinfo has one issue:
the current parsing assumes many different things about the format. So
the best way to do this is to parse the
/sys/devices/system/cpu/cpuN/regs/identification/midr_el1 files instead.
To find out which CPUs are present (though not necessarily online), we
parse the "/sys/devices/system/cpu/present" file.
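
For illustration, the format of the present file (comma-separated entries, each either a single CPU number or a min-max range) could be parsed along the following lines. This is a minimal standalone sketch, not the code in the patch; parse_cpu_list is a hypothetical helper name, and stopping at the first newline is an assumption about how the file is terminated.

```c
#include <assert.h>
#include <stdlib.h>

/* Expand a sysfs CPU list such as "0-3,8" into individual CPU numbers.
   Returns the number of CPUs written to OUT, or -1 on a parse error or
   when more than MAX_OUT CPUs are listed.
   Hypothetical helper for illustration; it is not part of the patch.  */
static int
parse_cpu_list (const char *list, int *out, int max_out)
{
  int n = 0;
  const char *p = list;

  assert (list != NULL && out != NULL);

  for (;;)
    {
      char *end;
      long min = strtol (p, &end, 10);
      long max = min;

      /* No digits where a number was expected, or a negative number.  */
      if (end == p || min < 0)
        return -1;

      /* "min-max" range: read the upper bound.  */
      if (*end == '-')
        {
          p = end + 1;
          max = strtol (p, &end, 10);
          if (end == p || max < min)
            return -1;
        }

      for (long cpu = min; cpu <= max; cpu++)
        {
          if (n == max_out)
            return -1;
          out[n++] = (int) cpu;
        }

      /* End of the list.  */
      if (*end == '\n' || *end == '\0')
        return n;
      /* Anything other than a comma between entries is an error.  */
      if (*end != ',')
        return -1;
      p = end + 1;
    }
}
```

Each CPU number returned this way would then be substituted into the midr_el1 path to read that CPU's MIDR value.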

We fall back to parsing /proc/cpuinfo if parsing any of these files
fails, including when we cannot find out which CPU we are on. The main
reason for the fallback is that only newer kernels support exporting
this file.

To get the features I just look at the hwcap that the kernel passes to
userspace, so I needed to add an extra argument to AARCH64_OPT_EXTENSION.
I also had to define some HWCAP_* macros in driver-aarch64.c since older
kernel headers don't have these values defined.

It should also be possible to parse the
/sys/devices/system/cpu/cpu%d/cache%d directories to get cache
information too, but that is left for another patch and another time.

Since I don't have access to a big.LITTLE system, someone should test
there with a new enough kernel; I was using stock 4.9.0-rc3.

OK? Bootstrapped and tested on ThunderX on aarch64-linux-gnu with no
regressions, and made sure /proc/cpuinfo is not read (by using strace).

Thanks,
Andrew Pinski

ChangeLog:

* config/aarch64/aarch64-option-extensions.def: Document extra
argument to AARCH64_OPT_EXTENSION.
Update for the extra argument for all of the option extensions.
* config/aarch64/driver-aarch64.c: Include sys/auxv.h and asm/hwcap.h.
(HWCAP_CRC32): Define if needed.
(HWCAP_ATOMICS): Likewise.
(HWCAP_FPHP): Likewise.
(HWCAP_ASIMDHP): Likewise.
(aarch64_arch_extension): New field hwcap_mask.
(AARCH64_OPT_EXTENSION): Handle extra argument.
(AARCH64_BIG_LITTLE): Always put the larger core number first.
(valid_bL_core_p): Don't check AARCH64_BIG_LITTLE for the opposite
order as it already handles the order.
(implementor_from_midr): New function.
(part_no_from_midr): New function.
(sysfsformat): New define.
(host_detect_local_cpu_sys): New function.
(host_detect_local_cpu): Call host_detect_local_cpu_sys if opening
"/sys/devices/system/cpu/present" file worked.
* common/config/aarch64/aarch64-common.c (AARCH64_OPT_EXTENSION):
Handle extra argument.
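
As a side note on the hwcap masks: some extensions map to more than one hwcap bit (fp16 uses HWCAP_FPHP | HWCAP_ASIMDHP), so it matters whether "present" means any bit of the mask or all of them. The sketch below contrasts the two tests. It is a standalone illustration under assumptions: ext_entry, ext_present_any and ext_present_all are hypothetical names, and only the four bit values that the patch itself defines are used.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values as the patch defines them for older kernel headers.  */
#define HWCAP_CRC32   (1UL << 7)
#define HWCAP_ATOMICS (1UL << 8)
#define HWCAP_FPHP    (1UL << 9)
#define HWCAP_ASIMDHP (1UL << 10)

/* Hypothetical, cut-down version of the extension table.  */
struct ext_entry
{
  const char *name;
  unsigned long hwcap_mask;
};

static const struct ext_entry exts[] = {
  { "crc", HWCAP_CRC32 },
  { "lse", HWCAP_ATOMICS },
  { "fp16", HWCAP_FPHP | HWCAP_ASIMDHP },
};

/* True if at least one bit of MASK is set in HWCAP.  */
static bool
ext_present_any (unsigned long hwcap, unsigned long mask)
{
  return (hwcap & mask) != 0;
}

/* True only if every bit of MASK is set in HWCAP.  */
static bool
ext_present_all (unsigned long hwcap, unsigned long mask)
{
  /* An empty mask would trivially match everything.  */
  assert (mask != 0);
  return (hwcap & mask) == mask;
}
```

On a real system the hwcap argument would come from getauxval (AT_HWCAP), declared in sys/auxv.h. With only HWCAP_FPHP set, the "any" test reports fp16 as present while the "all" test does not.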
Index: common/config/aarch64/aarch64-common.c
===================================================================
--- common/config/aarch64/aarch64-common.c  (revision 242061)
+++ common/config/aarch64/aarch64-common.c  (working copy)
@@ -121,7 +121,7 @@ struct aarch64_option_extension
 /* ISA extensions in AArch64. */
 static const struct aarch64_option_extension all_extensions[] =
 {
-#define AARCH64_OPT_EXTENSION(NAME, FLAG_CANONICAL, FLAGS_ON, FLAGS_OFF, Z) \
+#define AARCH64_OPT_EXTENSION(NAME, FLAG_CANONICAL, FLAGS_ON, FLAGS_OFF, Z, YY) \
   {NAME, FLAG_CANONICAL, FLAGS_ON, FLAGS_OFF},
 #include "config/aarch64/aarch64-option-extensions.def"
   {NULL, 0, 0, 0}
Index: config/aarch64/aarch64-option-extensions.def
===================================================================
--- config/aarch64/aarch64-option-extensions.def  (revision 242061)
+++ config/aarch64/aarch64-option-extensions.def  (working copy)
@@ -21,7 +21,7 @@
 
    Before using #include to read this file, define a macro:
 
-      AARCH64_OPT_EXTENSION(EXT_NAME, FLAG_CANONICAL, FLAGS_ON, FLAGS_OFF, FEATURE_STRING)
+      AARCH64_OPT_EXTENSION(EXT_NAME, FLAG_CANONICAL, FLAGS_ON, FLAGS_OFF, FEATURE_STRING, HWCAP)
 
    EXT_NAME is the name of the extension, represented as a string constant.
    FLAGS_CANONICAL is the canonical internal name for this flag.
@@ -36,28 +36,29 @@
    the extension (for example, the 'crypto' extension depends on four
    entries: aes, pmull, sha1, sha2 being present). In that case this field
    should contain a space (" ") separated list of the strings in 'Features'
-   that are required. Their order is not important. */
+   that are required. Their order is not important.
+   HWCAP is the required hwcap mask for this feature. */
 
 /* Enabling "fp" just enables "fp".
    Disabling "fp" also disables "simd", "crypto" and "fp16". */
-AARCH64_OPT_EXTENSION("fp", AARCH64_FL_FP, 0, AARCH64_FL_SIMD | AARCH64_FL_CRYPTO | AARCH64_FL_F16, "fp")
+AARCH64_OPT_EXTENSION("fp", AARCH64_FL_FP, 0, AARCH64_FL_SIMD | AARCH64_FL_CRYPTO | AARCH64_FL_F16, "fp", HWCAP_FP)
 
 /* Enabling "simd" also enables "fp".
    Disabling "simd" also disables "crypto". */
-AARCH64_OPT_EXTENSION("simd", AARCH64_FL_SIMD, AARCH64_FL_FP, AARCH64_FL_CRYPTO, "asimd")
+AARCH64_OPT_EXTENSION("simd", AARCH64_FL_SIMD, AARCH64_FL_FP, AARCH64_FL_CRYPTO, "asimd", HWCAP_ASIMD)
 
 /* Enabling "crypto" also enables "fp", "simd".
    Disabling "crypto" just disables "crypto". */
-AARCH64_OPT_EXTENSION("crypto", AARCH64_FL_CRYPTO, AARCH64_FL_FP | AARCH64_FL_SIMD, 0, "aes pmull sha1 sha2")
+AARCH64_OPT_EXTENSION("crypto", AARCH64_FL_CRYPTO, AARCH64_FL_FP | AARCH64_FL_SIMD, 0, "aes pmull sha1 sha2", HWCAP_AES | HWCAP_PMULL | HWCAP_SHA1 | HWCAP_SHA2)
 
 /* Enabling or disabling "crc" only changes "crc". */
-AARCH64_OPT_EXTENSION("crc", AARCH64_FL_CRC, 0, 0, "crc32")
+AARCH64_OPT_EXTENSION("crc", AARCH64_FL_CRC, 0, 0, "crc32", HWCAP_CRC32)
 
 /* Enabling or disabling "lse" only changes "lse". */
-AARCH64_OPT_EXTENSION("lse", AARCH64_FL_LSE, 0, 0, "atomics")
+AARCH64_OPT_EXTENSION("lse", AARCH64_FL_LSE, 0, 0, "atomics", HWCAP_ATOMICS)
 
 /* Enabling "fp16" also enables "fp".
    Disabling "fp16" just disables "fp16". */
-AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP, 0, "fp16")
+AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP, 0, "fp16", HWCAP_FPHP | HWCAP_ASIMDHP)
 
 #undef AARCH64_OPT_EXTENSION
Index: config/aarch64/driver-aarch64.c
===================================================================
--- config/aarch64/driver-aarch64.c  (revision 242061)
+++ config/aarch64/driver-aarch64.c  (working copy)
@@ -22,20 +22,40 @@
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
+#include <sys/auxv.h>
+#include <asm/hwcap.h>
 
 /* Defined in common/config/aarch64/aarch64-common.c. */
 std::string aarch64_get_extension_string_for_isa_flags (unsigned long,
                                                         unsigned long);
 
+/* Not always defined hwcaps. */
+#ifndef HWCAP_CRC32
+#define HWCAP_CRC32 (1 << 7)
+#endif
+
+#ifndef HWCAP_ATOMICS
+#define HWCAP_ATOMICS (1 << 8)
+#endif
+
+#ifndef HWCAP_FPHP
+#define HWCAP_FPHP (1 << 9)
+#endif
+
+#ifndef HWCAP_ASIMDHP
+#define HWCAP_ASIMDHP (1 << 10)
+#endif
+
 struct aarch64_arch_extension
 {
   const char *ext;
   unsigned int flag;
   const char *feat_string;
+  unsigned long hwcap_mask;
 };
 
-#define AARCH64_OPT_EXTENSION(EXT_NAME, FLAG_CANONICAL, FLAGS_ON, FLAGS_OFF, FEATURE_STRING) \
-  { EXT_NAME, FLAG_CANONICAL, FEATURE_STRING },
+#define AARCH64_OPT_EXTENSION(EXT_NAME, FLAG_CANONICAL, FLAGS_ON, FLAGS_OFF, FEATURE_STRING, HWCAP_MASK) \
+  { EXT_NAME, FLAG_CANONICAL, FEATURE_STRING, HWCAP_MASK },
 static struct aarch64_arch_extension aarch64_extensions[] =
 {
 #include "aarch64-option-extensions.def"
@@ -51,8 +71,10 @@ struct aarch64_core_data
   const unsigned long flags;
 };
 
-#define AARCH64_BIG_LITTLE(BIG, LITTLE) \
-  (((BIG)&0xFFFu) << 12 | ((LITTLE) & 0xFFFu))
+#define AARCH64_BIG_LITTLE(BIG, LITTLE) \
+  (((BIG) > (LITTLE)) \
+   ? (((BIG)&0xFFFu) << 12 | ((LITTLE) & 0xFFFu)) \
+   : (((LITTLE)&0xFFFu) << 12 | ((BIG) & 0xFFFu)))
 #define INVALID_IMP ((unsigned char) -1)
 #define INVALID_CORE ((unsigned)-1)
 
@@ -107,8 +129,7 @@ get_arch_from_id (const char* id)
 static bool
 valid_bL_core_p (unsigned int *core, unsigned int bL_core)
 {
-  return AARCH64_BIG_LITTLE (core[0], core[1]) == bL_core
-         || AARCH64_BIG_LITTLE (core[1], core[0]) == bL_core;
+  return AARCH64_BIG_LITTLE (core[0], core[1]) == bL_core;
 }
 
 /* Returns the hex integer that is after ':' for the FIELD.
@@ -141,6 +162,186 @@ contains_core_p (unsigned *arr, unsigned
   return false;
 }
 
+/* Returns the implementator from MIDR. */
+
+static inline unsigned
+implementor_from_midr (unsigned midr)
+{
+  return midr >> 24;
+}
+
+
+/* Returns the part number from MIDR. */
+
+static inline unsigned
+part_no_from_midr (unsigned midr)
+{
+  return (midr & 0xffff) >> 4;
+}
+
+#define sysfsformat "/sys/devices/system/cpu/cpu%d/regs/identification/midr_el1"
+
+
+/* Parses the sysfs if it exits. This has higher prority than parsing
+   /proc/cpuinfo.
+   TODO: This should also parse out the cache info too. */
+
+static const char *
+host_detect_local_cpu_sys (FILE *f, bool cpu, bool tune, bool arch)
+{
+  static const int num_exts = ARRAY_SIZE (aarch64_extensions);
+  const char *res = NULL;
+  char present[128];
+  unsigned midrs[2] = { INVALID_CORE, INVALID_CORE };
+  int n_midr = 0;
+  unsigned long extension_flags = 0;
+  unsigned long default_flags = 0;
+  const char *ext_string = "";
+
+  /* Read present, it is formated as comma seperated min-max. */
+  bool r = fgets (present, sizeof (present), f) == NULL;
+  fclose (f);
+  if (r)
+    return NULL;
+  /* Remove the newline. */
+  if (present[strlen(present)-1]=='\n')
+    present[strlen(present)-1] = 0;
+
+  char *nextfield = present;
+  do
+    {
+      int min;
+      int max;
+      char *current = nextfield;
+      char *dash;
+      min = strtol (current, &dash, 0);
+      if (dash[0] == 0)
+        {
+          max = min;
+          nextfield = NULL;
+        }
+      if (dash[0] == ',')
+        {
+          max = min;
+          nextfield = dash + 1;
+        }
+      /* Parse error */
+      else if (dash[0] != '-')
+        return NULL;
+      else
+        {
+          max = strtol (dash+1, &dash, 0);
+          if (dash[0] == 0)
+            nextfield = NULL;
+          else if (dash[0] != ',')
+            return NULL;
+          else
+            nextfield = dash + 1;
+        }
+
+      for (int cpu = min; cpu <= max; cpu++)
+        {
+          char buf[128];
+          char *end;
+          char filename[sizeof(sysfsformat)+5];
+          sprintf(filename, sysfsformat, cpu);
+          f = fopen (filename, "r");
+          if (!f)
+            return NULL;
+          r = fgets (buf, sizeof (buf), f) == NULL;
+          fclose (f);
+          if (r)
+            return NULL;
+          unsigned midr = strtoul(buf, &end, 0);
+          if (end[0] != 0 && end[0] != '\n')
+            return NULL;
+          if (!contains_core_p (midrs, midr))
+            {
+              if (n_midr == 2)
+                return NULL;
+              midrs[n_midr++] = midr;
+            }
+        }
+    } while (nextfield);
+
+  /* No processor, then can't find anything. */
+  if (n_midr == 0)
+    return NULL;
+
+  /* Find the features via the HWCAP. */
+  unsigned long hwcap = getauxval (AT_HWCAP);
+  for (int i = 0; i < num_exts; i++)
+    if (hwcap & aarch64_extensions[i].hwcap_mask)
+      extension_flags |= aarch64_extensions[i].flag;
+    else
+      extension_flags &= ~aarch64_extensions[i].flag;
+
+  /* Found one type of core. */
+  unsigned imp;
+  unsigned part_no;
+  if (n_midr == 1)
+    {
+      imp = implementor_from_midr (midrs[0]);
+      part_no = (midrs[0] & 0xffff) >> 4;
+    }
+  else
+    {
+      unsigned imp0 = implementor_from_midr (midrs[0]);
+      unsigned part_no0 = (midrs[0] & 0xffff) >> 4;
+      unsigned imp1 = implementor_from_midr (midrs[1]);
+      unsigned part_no1 = (midrs[1] & 0xffff) >> 4;
+      /* FIXME: Handle the case where there are two implementors. */
+      if (imp1 != imp0)
+        return NULL;
+      imp = imp0;
+      if (part_no0 != part_no1)
+        part_no = AARCH64_BIG_LITTLE (part_no0, part_no1);
+      else
+        part_no = part_no0;
+    }
+
+  int core_idx = -1;
+  for (int i = 0; aarch64_cpu_data[i].name != NULL; i++)
+    if (part_no == aarch64_cpu_data[i].part_no
+        && aarch64_cpu_data[i].implementer_id == imp)
+      {
+        core_idx = i;
+        break;
+      }
+  if (core_idx == -1)
+    return NULL;
+
+  if (arch)
+    {
+      struct aarch64_arch_driver_info* arch_info
+        = get_arch_from_id (aarch64_cpu_data [core_idx].arch);
+      /* We got some arch indentifier that's not in aarch64-arches.def? */
+      if (!arch_info)
+        return NULL;
+      res = concat ("-march=", arch_info->name, NULL);
+      default_flags = arch_info->flags;
+    }
+  else
+    {
+      res = concat ("-m", cpu ? "cpu" : "tune", "=",
+                    aarch64_cpu_data[core_idx].name, NULL);
+      default_flags = aarch64_cpu_data[core_idx].flags;
+    }
+
+  //FIXME: Detect the caches here via /sys/devices/system/cpu/cpu%d/cache%d.
+
+  if (tune)
+    return res;
+
+  ext_string
+    = aarch64_get_extension_string_for_isa_flags (extension_flags,
+                                                  default_flags).c_str ();
+
+  res = concat (res, ext_string, NULL);
+  return res;
+}
+
+
 /* This will be called by the spec parser in gcc.c when it sees
    a %:local_cpu_detect(args) construct. Currently it will be called
    with either "arch", "cpu" or "tune" as argument depending on if
@@ -193,6 +394,16 @@ host_detect_local_cpu (int argc, const c
   if (!arch && !tune && !cpu)
     goto not_found;
 
+
+  f = fopen ("/sys/devices/system/cpu/present", "r");
+
+  if (f)
+    {
+      res = host_detect_local_cpu_sys (f, cpu, tune, arch);
+      if (res)
+        return res;
+    }
+
   f = fopen ("/proc/cpuinfo", "r");
 
   if (f == NULL)
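
For reference, the MIDR decoding and the order-independent big.LITTLE key used above can be tried in isolation with the following sketch. It is a standalone illustration, not code from the patch: midr_implementer, midr_part_no and big_little_key are hypothetical names, the explicit field masks are an addition (the patch shifts without masking the implementer), and the MIDR value in the usage note is only an example.

```c
#include <assert.h>

/* MIDR_EL1 layout: implementer in bits [31:24], part number in
   bits [15:4].  */
static unsigned
midr_implementer (unsigned long midr)
{
  return (midr >> 24) & 0xff;
}

static unsigned
midr_part_no (unsigned long midr)
{
  return (midr >> 4) & 0xfff;
}

/* Combine two part numbers into one key with the larger value in the
   upper 12 bits, so that the result does not depend on argument order.
   This mirrors what the reworked AARCH64_BIG_LITTLE macro computes.  */
static unsigned
big_little_key (unsigned a, unsigned b)
{
  unsigned hi = a > b ? a : b;
  unsigned lo = a > b ? b : a;

  /* Part numbers are 12-bit fields.  */
  assert (hi <= 0xfff);
  return (hi << 12) | lo;
}
```

For an example MIDR value of 0x410fd034, midr_implementer gives 0x41 and midr_part_no gives 0xd03; big_little_key (0xd03, 0xd07) and big_little_key (0xd07, 0xd03) both give 0xd07d03.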