From patchwork Tue Apr 30 07:25:42 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 1929341
Date: Tue, 30 Apr 2024 09:25:42 +0200
From: Jakub Jelinek
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] gimple-ssa-sprintf: Use [0, 1] range for %lc with
 (wint_t) 0 argument [PR114876]
List-Id: Gcc-patches mailing list
Reply-To: Jakub Jelinek

Hi!

It seems that when Martin S. implemented this, he coded a strict reading
of the standard, which said that %lc with a (wint_t) 0 argument is handled
as
  wchar_t[2] temp = { arg, 0 };
  %ls with temp arg
and so shouldn't print anything.  But most libc implementations actually
handled that case like %c with a '\0' argument, adding a single NUL
character; the only known exception is musl.
Recently, C23 changed this in response to GB-141, and so did POSIX in
https://austingroupbugs.net/view.php?id=1647
so that it should have the same behavior as %c with '\0'.

Because there is implementation divergence, the following patch uses a
range rather than hardcoding it to all 1s (i.e. the %c behavior), though
the likely case is still 1 (forward looking, plus it matches most
implementations).
The removed res.knownrange = true; assignment is redundant because the
same assignment is done unconditionally before the if statement; the rest
is formatting fixes.

I don't think the min >= 0 && min < 128 case is right either.  I'd think it
should be min >= 0 && max < 128, otherwise it only says that some possible
inputs are (maybe) ASCII while there can be others.  But this code is a
total mess anyway, with the min, max, likely (somewhere in [min, max]?) and
then unlikely possibly larger than max.  Dunno, perhaps for at least some
chars in the ASCII range the likely case could be the ASCII case; so
perhaps just the one_2_one_ascii case shouldn't set max to 1 and mayfail
should be true for max >= 128.  Anyway, I didn't feel I should touch that
right now.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?  Shall
it go to 14.1, or wait for 14.2?

2024-04-30  Jakub Jelinek

	PR tree-optimization/114876
	* gimple-ssa-sprintf.cc (format_character): For min == 0 && max == 0,
	set max, likely and unlikely members to 1 rather than 0.  Remove
	useless res.knownrange = true;.  Formatting fixes.

	* gcc.dg/pr114876.c: New test.
	* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Adjust expected
	diagnostics.

	Jakub

--- gcc/gimple-ssa-sprintf.cc.jj	2024-01-03 11:51:22.225860346 +0100
+++ gcc/gimple-ssa-sprintf.cc	2024-04-29 12:52:59.760668894 +0200
@@ -2177,8 +2177,7 @@ format_character (const directive &dir,
 
   res.knownrange = true;
 
-  if (dir.specifier == 'C'
-      || dir.modifier == FMT_LEN_l)
+  if (dir.specifier == 'C' || dir.modifier == FMT_LEN_l)
     {
       /* A wide character can result in as few as zero bytes.  */
       res.range.min = 0;
@@ -2189,10 +2188,13 @@ format_character (const directive &dir,
         {
           if (min == 0 && max == 0)
             {
-              /* The NUL wide character results in no bytes.  */
-              res.range.max = 0;
-              res.range.likely = 0;
-              res.range.unlikely = 0;
+              /* In strict reading of older ISO C or POSIX, this required
+                 no characters to be emitted.  ISO C23 changes that, so
+                 does POSIX, to match what has been implemented in most of the
+                 implementations, namely emitting a single NUL character.
+                 Let's use 0 for minimum and 1 for all the other values.  */
+              res.range.max = 1;
+              res.range.likely = res.range.unlikely = 1;
             }
           else if (min >= 0 && min < 128)
             {
@@ -2200,11 +2202,12 @@ format_character (const directive &dir,
                  is not a 1-to-1 mapping to the source character set or
                  if the source set is not ASCII.  */
               bool one_2_one_ascii
-                = (target_to_host_charmap[0] == 1 && target_to_host ('a') == 97);
+                = (target_to_host_charmap[0] == 1
+                   && target_to_host ('a') == 97);
 
               /* A wide character in the ASCII range most likely results
                  in a single byte, and only unlikely in up to MB_LEN_MAX.  */
-              res.range.max = one_2_one_ascii ? 1 : target_mb_len_max ();;
+              res.range.max = one_2_one_ascii ? 1 : target_mb_len_max ();
               res.range.likely = 1;
               res.range.unlikely = target_mb_len_max ();
               res.mayfail = !one_2_one_ascii;
@@ -2235,7 +2238,6 @@ format_character (const directive &dir,
       /* A plain '%c' directive.  Its output is exactly 1.  */
       res.range.min = res.range.max = 1;
       res.range.likely = res.range.unlikely = 1;
-      res.knownrange = true;
     }
 
   /* Bump up the byte counters if WIDTH is greater.  */
--- gcc/testsuite/gcc.dg/pr114876.c.jj	2024-04-29 12:26:45.774965158 +0200
+++ gcc/testsuite/gcc.dg/pr114876.c	2024-04-29 12:51:37.863777055 +0200
@@ -0,0 +1,34 @@
+/* PR tree-optimization/114876 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-not "return \[01\];" "optimized" } } */
+/* { dg-final { scan-tree-dump "return 3;" "optimized" } } */
+/* { dg-final { scan-tree-dump "return 4;" "optimized" } } */
+
+int
+foo (void)
+{
+  char buf[64];
+  return __builtin_sprintf (buf, "%lc%lc%lc", (__WINT_TYPE__) 0, (__WINT_TYPE__) 0, (__WINT_TYPE__) 0);
+}
+
+int
+bar (void)
+{
+  char buf[64];
+  return __builtin_sprintf (buf, "%c%c%c", 0, 0, 0);
+}
+
+int
+baz (void)
+{
+  char buf[64];
+  return __builtin_sprintf (buf, "%lc%lc%lca", (__WINT_TYPE__) 0, (__WINT_TYPE__) 0, (__WINT_TYPE__) 0);
+}
+
+int
+qux (void)
+{
+  char buf[64];
+  return __builtin_sprintf (buf, "%c%c%ca", 0, 0, 0);
+}
--- gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c.jj	2020-12-03 10:04:35.888092988 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c	2024-04-29 12:49:14.452717581 +0200
@@ -200,11 +200,11 @@ void test_sprintf_chk_c_const (void)
   T (3, "%c%c", '1', '2');
 
   /* Wide characters.  */
-  T (0, "%lc", (wint_t)0); /* { dg-warning "nul past the end" } */
-  T (1, "%lc", (wint_t)0);
-  T (1, "%lc%lc", (wint_t)0, (wint_t)0);
+  T (0, "%lc", (wint_t)0); /* { dg-warning ".%lc. directive writing up to 1 bytes into a region of size 0" } */
+  T (1, "%lc", (wint_t)0); /* { dg-warning "nul past the end" } */
+  T (1, "%lc%lc", (wint_t)0, (wint_t)0); /* { dg-warning ".%lc. directive writing up to 1 bytes into a region of size between 0 and 1" } */
   T (2, "%lc", (wint_t)0);
-  T (2, "%lc%lc", (wint_t)0, (wint_t)0);
+  T (2, "%lc%lc", (wint_t)0, (wint_t)0); /* { dg-warning "nul past the end" } */
 
   /* The following could result in as few as no bytes and in as many as
      MB_CUR_MAX, but since the MB_CUR_MAX value is a runtime property
@@ -1550,7 +1550,7 @@ void test_snprintf_c_const (char *d)
 
   /* Wide characters.  */
   T (0, "%lc", (wint_t)0);
-  T (1, "%lc", (wint_t)0);
+  T (1, "%lc", (wint_t)0); /* { dg-warning "output may be truncated before the last format character" } */
   T (2, "%lc", (wint_t)0);
 
   /* The following could result in as few as a single byte and in as many
@@ -1603,7 +1603,7 @@ void test_snprintf_chk_c_const (void)
 
   /* Wide characters.  */
   T (0, "%lc", (wint_t)0);
-  T (1, "%lc", (wint_t)0);
+  T (1, "%lc", (wint_t)0); /* { dg-warning "output may be truncated before the last format character" } */
   T (2, "%lc", (wint_t)0);
 
   /* The following could result in as few as a single byte and in as many