Message ID | 20230622150021.3828261-1-lancasterharp@gmail.com |
---|---|
State | New |
Headers | show |
Series | Fix %Z parsing in strptime [BZ #16088] | expand |
On Jun 22 2023, Stanley Lancaster via Libc-alpha wrote: > %Z parsing terminated on space, should've terminated on next character in format string > > Author: Stanley Lancast <lancasterharp@gmail.com> > --- > time/strptime_l.c | 5 +++-- > time/tst-strptime.c | 1 + > 2 files changed, 4 insertions(+), 2 deletions(-) > > diff --git a/time/strptime_l.c b/time/strptime_l.c > index 85c3249fcc..5f96795406 100644 > --- a/time/strptime_l.c > +++ b/time/strptime_l.c > @@ -772,8 +772,9 @@ __strptime_internal (const char *rp, const char *fmt, struct tm *tmp, > /* Read timezone but perform no conversion. */ > while (ISSPACE (*rp)) > rp++; > - while (!ISSPACE (*rp) && *rp != '\0') > - rp++; > + > + while (*rp != *(fmt) && *rp != '\0') > + rp++; The next character could be '%'.
---------- Forwarded message --------- From: Stanley Lancaster <lancasterharp@gmail.com> Date: Thu, Jun 22, 2023 at 10:51 AM Subject: Re: [PATCH] Fix %Z parsing in strptime [BZ #16088] To: Andreas Schwab <schwab@suse.de> Yes. I thought about this possibility. %Z does not provide enough information about what a "time zone name" looks like. The parser cannot know when a timezone name ends without the user telling them. If we have something like "%Z%B" as the format string, and "CSTJUN" as the input, the parser cannot have any idea if CSTJUN is the timezone name, or if CSTJ is the timezone name, or if CSTJU is the timezone name, and so on and so forth... There has to be some type of failure mode here to allow the programmer to understand that they need to provide some sort of terminator for %Z. I'm open to suggestions, if a different failure mode, besides simply parsing the input incorrectly would be appropriate. On Thu, Jun 22, 2023 at 10:21 AM Andreas Schwab <schwab@suse.de> wrote: > On Jun 22 2023, Stanley Lancaster via Libc-alpha wrote: > > > %Z parsing terminated on space, should've terminated on next character > in format string > > > > Author: Stanley Lancast <lancasterharp@gmail.com> > > --- > > time/strptime_l.c | 5 +++-- > > time/tst-strptime.c | 1 + > > 2 files changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/time/strptime_l.c b/time/strptime_l.c > > index 85c3249fcc..5f96795406 100644 > > --- a/time/strptime_l.c > > +++ b/time/strptime_l.c > > @@ -772,8 +772,9 @@ __strptime_internal (const char *rp, const char > *fmt, struct tm *tmp, > > /* Read timezone but perform no conversion. */ > > while (ISSPACE (*rp)) > > rp++; > > - while (!ISSPACE (*rp) && *rp != '\0') > > - rp++; > > + > > + while (*rp != *(fmt) && *rp != '\0') > > + rp++; > > The next character could be '%'. > > -- > Andreas Schwab, SUSE Labs, schwab@suse.de > GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 > "And now for something completely different." >
On 2023-06-22 10:05, Stanley Lancaster via Libc-alpha wrote: > Yes. I thought about this possibility. %Z does not provide enough > information about what a "time zone name" looks like. tzdb and C-locale POSIX time zone names can contain only ASCII alphanumerics, "+", and "-", and must contain at least three characters. So you could parse only instances of [-+a-zA-Z0-9]{3,}. Although not perfect, that would be better than parsing until the next space. By the way, it was misleading for the glibc manual's strptime section to document %Z as "The timezone name". It's not a timezone name - it's a time zone abbreviation. The correct term is used in the strftime section. I installed the attached documentation patch to fix this issue where I found it in glibc.
I'll rework the patch to parse the string in this manner Thanks so much for your suggestions On Thu, Jun 22, 2023, 3:53 PM Paul Eggert <eggert@cs.ucla.edu> wrote: > On 2023-06-22 10:05, Stanley Lancaster via Libc-alpha wrote: > > Yes. I thought about this possibility. %Z does not provide enough > > information about what a "time zone name" looks like. > > tzdb and C-locale POSIX time zone names can contain only ASCII > alphanumerics, "+", and "-", and must contain at least three characters. > So you could parse only instances of [-+a-zA-Z0-9]{3,}. Although not > perfect, that would be better than parsing until the next space. > > By the way, it was misleading for the glibc manual's strptime section to > document %Z as "The timezone name". It's not a timezone name - it's a > time zone abbreviation. The correct term is used in the strftime > section. I installed the attached documentation patch to fix this issue > where I found it in glibc.
* Paul Eggert: > From 21fbc0a19366f89638a30eef2b53c6d4baafdb88 Mon Sep 17 00:00:00 2001 > From: Paul Eggert <eggert@cs.ucla.edu> > Date: Thu, 22 Jun 2023 13:44:50 -0700 > Subject: [PATCH] Call "CST" a time zone abbreviation, not a name > > In documentation, call strings like "CST" time zone abbreviations, not > time zone names. This terminology is more precise, and is what tzdb uses. > A string like "CST" is ambiguous and does not fully name a time zone. This patch is okay, thanks. Reviewed-by: Florian Weimer <fweimer@redhat.com> Florian
diff --git a/time/strptime_l.c b/time/strptime_l.c index 85c3249fcc..5f96795406 100644 --- a/time/strptime_l.c +++ b/time/strptime_l.c @@ -772,8 +772,9 @@ __strptime_internal (const char *rp, const char *fmt, struct tm *tmp, /* Read timezone but perform no conversion. */ while (ISSPACE (*rp)) rp++; - while (!ISSPACE (*rp) && *rp != '\0') - rp++; + + while (*rp != *(fmt) && *rp != '\0') + rp++; break; case 'z': /* We recognize four formats: diff --git a/time/tst-strptime.c b/time/tst-strptime.c index 3dae9e0594..6eb1af5c15 100644 --- a/time/tst-strptime.c +++ b/time/tst-strptime.c @@ -37,6 +37,7 @@ static const struct { "C", "2000-01-01", "%Y-%m-%d", 6, 0, 0, 1 }, { "C", "03/03/00", "%D", 5, 62, 2, 3 }, { "C", "9/9/99", "%x", 4, 251, 8, 9 }, + { "C", "CST-2000-01-01", "%Z-%Y-%m-%d", 6, 0, 0, 1}, /* Ensure %Z terminates properly */ { "C", "19990502123412", "%Y%m%d%H%M%S", 0, 121, 4, 2 }, { "C", "2001 20 Mon", "%Y %U %a", 1, 140, 4, 21 }, { "C", "2001 21 Mon", "%Y %W %a", 1, 140, 4, 21 },