From patchwork Fri Oct 18 11:20:26 2024
X-Patchwork-Submitter: Avinal Kumar
X-Patchwork-Id: 1999076
From: Avinal Kumar
To: libc-alpha@sourceware.org
Cc: Avinal Kumar
Subject: [PATCH] stdio-common: Fix scanf parsing for NaN types [BZ #30647]
Date: Fri, 18 Oct 2024 16:50:26 +0530
Message-ID: <20241018112325.1515086-1-avinal.xlvii@gmail.com>

The scanf family of functions, such as sscanf and fscanf, currently
consumes only the "nan" prefix of nan() and nan(n-char-sequence) and
leaves the parenthesized part unread.  This happens because
__vfscanf_internal only checks for "nan".  This commit adds support for
all valid NaN forms, i.e. nan, nan() and nan(n-char-sequence), where
n-char-sequence matches [a-zA-Z0-9_]+, thus fixing bug 30647.  Any other
NaN representation, for example a '(' without a closing ')', results in
a conversion error.  New tests are added to verify the parsing of these
forms.

Signed-off-by: Avinal Kumar
---
Please refer to https://sourceware.org/bugzilla/show_bug.cgi?id=30647

 stdio-common/Makefile           |  1 +
 stdio-common/tst-scanf-nan.c    | 81 +++++++++++++++++++++++++++++++++
 stdio-common/vfscanf-internal.c | 49 +++++++++++++++++++-
 3 files changed, 130 insertions(+), 1 deletion(-)
 create mode 100644 stdio-common/tst-scanf-nan.c

diff --git a/stdio-common/Makefile b/stdio-common/Makefile
index 88105b3c1b..a166eb7cf8 100644
--- a/stdio-common/Makefile
+++ b/stdio-common/Makefile
@@ -261,6 +261,7 @@ tests := \
   tst-scanf-binary-gnu89 \
   tst-scanf-bz27650 \
   tst-scanf-intn \
+  tst-scanf-nan \
   tst-scanf-round \
   tst-scanf-to_inpunct \
   tst-setvbuf1 \
diff --git a/stdio-common/tst-scanf-nan.c b/stdio-common/tst-scanf-nan.c
new file mode 100644
index 0000000000..53658ecc9a
--- /dev/null
+++ b/stdio-common/tst-scanf-nan.c
@@ -0,0 +1,81 @@
+/* Test scanf formats for nan, nan(), nan(n-char-sequence) types.
+   Copyright The GNU Toolchain Authors.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include
+#include
+#include
+
+#include <support/check.h>
+
+#define CHECK_SCANF_RET(OK, STR, FMT, ...)                                  \
+  do                                                                       \
+    {                                                                      \
+      int ret = sscanf (STR, FMT, __VA_ARGS__);                            \
+      TEST_VERIFY (ret == (OK));                                           \
+    }                                                                      \
+  while (0)
+
+/* Valid nan types:
+   1. nan
+   2. nan()
+   3. nan([a-zA-Z0-9_]+)
+   Any other nan format is invalid and should produce a conversion error.
+   The return value denotes the number of valid conversions.  On a
+   conversion error the rest of the input is not processed.  */
+static int
+do_test (void)
+{
+  double a, b, c;
+  int d;
+
+  /* All valid inputs.  */
+  CHECK_SCANF_RET (1, "nan", "%lf", &a);
+  CHECK_SCANF_RET (1, "nan()", "%lf", &a);
+  CHECK_SCANF_RET (1, "nan(12345)", "%lf", &a);
+  CHECK_SCANF_RET (2, "nan12", "%lf%d", &a, &d);
+  CHECK_SCANF_RET (2, "nan nan()", "%lf%lf", &a, &b);
+  CHECK_SCANF_RET (2, "nan nan(12345foo)", "%lf%lf", &a, &b);
+  CHECK_SCANF_RET (3, "nan nan() 12.234", "%lf%lf%lf", &a, &b, &c);
+  CHECK_SCANF_RET (4, "nannan()nan(foo)1234", "%lf%lf%lf%d", &a, &b, &c, &d);
+
+  /* Partially valid inputs.  */
+  CHECK_SCANF_RET (1, "nan( )", "%3lf", &a);
+  CHECK_SCANF_RET (1, "nan nan(", "%lf%lf", &a, &b);
+
+  /* Invalid inputs.  */
+
+  /* Dangling parentheses.  */
+  CHECK_SCANF_RET (0, "nan(", "%lf", &a);
+  CHECK_SCANF_RET (0, "nan(12345", "%lf", &a);
+  CHECK_SCANF_RET (0, "nan(12345", "%lf%d", &a, &d);
+
+  /* Field width is not sufficient for valid conversion.  */
+  CHECK_SCANF_RET (0, "nan()", "%4lf", &a);
+
+  /* Space is not a valid character.  */
+  CHECK_SCANF_RET (0, "nan( )", "%lf", &a);
+  CHECK_SCANF_RET (0, "nan( )12.34", "%lf%lf", &a, &b);
+  CHECK_SCANF_RET (0, "nan(12 foo)", "%lf", &a);
+
+  /* Period '.' is not a valid character.  */
+  CHECK_SCANF_RET (0, "nan(12.34) nan(FooBar)", "%lf%lf", &a, &b);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/stdio-common/vfscanf-internal.c b/stdio-common/vfscanf-internal.c
index 1b82deffa7..e20048dded 100644
--- a/stdio-common/vfscanf-internal.c
+++ b/stdio-common/vfscanf-internal.c
@@ -2028,7 +2028,54 @@ digits_extended_fail:
               if (width > 0)
                 --width;
               char_buffer_add (&charbuf, c);
-              /* It is "nan".  */
+              /* It is at least "nan".  Now we check for nan() and
+                 nan(n-char-sequence).  */
+              if (width != 0 && inchar () != EOF)
+                {
+                  if (c == L_ ('('))
+                    {
+                      if (width > 0)
+                        --width;
+                      char_buffer_add (&charbuf, c);
+                      /* A '(' was observed; check for a closing ')'.  There
+                         may or may not be an n-char-sequence in between.  We
+                         have to check the longest prefix until there is a
+                         conversion error or a closing parenthesis.  */
+                      do
+                        {
+                          if (__builtin_expect (width == 0
+                                                || inchar () == EOF, 0))
+                            {
+                              /* Conversion error because we ran out of
+                                 characters.  */
+                              conv_error ();
+                              break;
+                            }
+                          if (!((c >= L_ ('0') && c <= L_ ('9'))
+                                || (c >= L_ ('A') && c <= L_ ('Z'))
+                                || (c >= L_ ('a') && c <= L_ ('z'))
+                                || c == L_ ('_') || c == L_ (')')))
+                            {
+                              /* An invalid character was observed.  The only
+                                 valid characters are [a-zA-Z0-9_] and ')'.  */
+                              conv_error ();
+                              break;
+                            }
+                          if (width > 0)
+                            --width;
+                          char_buffer_add (&charbuf, c);
+
+                          /* The loop only exits successfully when ')' is the
+                             last character.  */
+                          if (c == L_ (')'))
+                            break;
+                        }
+                      while (width != 0);
+                    }
+                  else
+                    /* It is only 'nan'.  */
+                    ungetc (c, s);
+                }
               goto scan_float;
             }
           else if (TOLOWER (c) == L_('i'))
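As a reading aid, the following is a minimal standalone sketch (not glibc code) of the matching rule the commit message describes, applied to a narrow string with an optional field width. The names `match_nan` and `is_nan_char` are hypothetical and exist only for illustration; the sketch models the syntax check alone and does not reproduce `__vfscanf_internal`'s character buffer, wide-character input or pushback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only (not part of glibc).
   Return nonzero if C may appear inside nan(...), i.e. C is in
   [a-zA-Z0-9_].  Explicit ranges keep the check locale-independent.  */
static int
is_nan_char (char c)
{
  return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z')
         || (c >= 'a' && c <= 'z') || c == '_';
}

/* Hypothetical helper: return the length of the NaN input item at the
   start of S, reading at most WIDTH characters (a negative WIDTH means
   no field width), or 0 on a matching failure.  "nan" is matched in any
   letter case.  Once '(' has been read, the item is complete only when
   the closing ')' is read within the width.  */
size_t
match_nan (const char *s, int width)
{
  static const char prefix[] = "nan";
  size_t limit = width < 0 ? (size_t) -1 : (size_t) width;
  size_t n;

  assert (s != NULL);

  /* Match "nan" case-insensitively ('n' or 'N', 'a' or 'A').  */
  for (n = 0; n < 3; n++)
    {
      if (n >= limit)
        return 0;
      if (s[n] != prefix[n] && s[n] != prefix[n] - 'a' + 'A')
        return 0;
    }

  /* Plain "nan" is already a complete item; a following character
     other than '(' is left for the next conversion.  */
  if (n >= limit || s[n] != '(')
    return n;
  n++;

  /* After '(', accept an optional n-char-sequence up to ')'.  */
  for (;;)
    {
      if (n >= limit || s[n] == '\0')
        return 0;               /* Dangling '(': matching failure.  */
      if (s[n] == ')')
        return n + 1;
      if (!is_nan_char (s[n]))
        return 0;               /* E.g. a space or a '.'.  */
      n++;
    }
}
```

For example, `match_nan ("nan(12345)", -1)` yields 10 and `match_nan ("nan12", -1)` yields 3, leaving "12" for a following `%d`, while `match_nan ("nan()", 4)` yields 0 because the width runs out before the closing parenthesis, as in the `%4lf` case of tst-scanf-nan.c. Treating '(' as committing the item follows the C rule that the input item is the longest prefix of a matching input sequence: once "nan(" has been read, the item fails unless it is completed by ')'.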