diff mbox series

[v3] ungetc: Guarantee single char pushback

Message ID 20241206200400.3987637-1-siddhesh@sourceware.org
State New
Headers show
Series [v3] ungetc: Guarantee single char pushback | expand

Commit Message

Siddhesh Poyarekar Dec. 6, 2024, 8:04 p.m. UTC
The C standard requires that ungetc guarantees at least one pushback, so
put a single byte pushback buffer in the FILE struct to enable that.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
---
Changes from v2:
- Fixed nits
- Used _IO_free_backup_buf in oldfileops and wfileops as well.
- Enhanced test to try some more cases

Changes from v1:

- Drop ungetwc from scope of the patchset
- Fixed nits
- Retain old behaviour for legacy applications
- Minimize changes to fileops
- Namespace-ize free_backup_buf
- Add a test to verify that the subsequent malloc failure results in ungetc
  failure too
- Add GNU Toolchain Authors copyright notice.

 libio/bits/types/struct_FILE.h  |   4 +-
 libio/fileops.c                 |   7 +-
 libio/genops.c                  |  26 ++++++--
 libio/libioP.h                  |   3 +
 libio/oldfileops.c              |   5 +-
 libio/wfileops.c                |   3 +-
 stdio-common/Makefile           |   2 +
 stdio-common/tst-ungetc-nomem.c | 115 ++++++++++++++++++++++++++++++++
 8 files changed, 152 insertions(+), 13 deletions(-)
 create mode 100644 stdio-common/tst-ungetc-nomem.c

Comments

Maciej W. Rozycki Dec. 9, 2024, 1:40 a.m. UTC | #1
On Fri, 6 Dec 2024, Siddhesh Poyarekar wrote:

> The C standard requires that ungetc guarantees at least one pushback, so
> put a single byte pushback buffer in the FILE struct to enable that.

 So what would the problem be if we instead replaced:

  int _flags2;

with

  short int _flags2;
  char _short_backupbuf[1];
  char _unused;

or if we wanted to be super cautious and used a union to prevent any issue 
with alignment with some obscure psABI (which I doubt is needed given that 
the preceding member is of the int type), then:

  union
    {
      int _overlay;
      struct
	{
	  short int _flags2;
	  char _short_backupbuf[1];
	  char _unused;
	};
    };

?  There's no need to change existing code that uses _flags2 and there's 
no need to penalise legacy calls.  I guess saving _unused2 space is hardly 
an argument, though I note that placing a char[1] member first will ask 
for rearrangement to avoid wasting space for padding once a wider member 
follows in the future.

 I've run native `powerpc64le-linux-gnu' regression testing and saw no 
issues with either patch attached applied on top of your change.

  Maciej
Siddhesh Poyarekar Dec. 9, 2024, 12:41 p.m. UTC | #2
On 2024-12-08 20:40, Maciej W. Rozycki wrote:
> On Fri, 6 Dec 2024, Siddhesh Poyarekar wrote:
> 
>> The C standard requires that ungetc guarantees at least one pushback, so
>> put a single byte pushback buffer in the FILE struct to enable that.
> 
>   So what would the problem be if we instead replaced:
> 
>    int _flags2;
> 
> with
> 
>    short int _flags2;
>    char _short_backupbuf[1];
>    char _unused;

I don't see an actual problem other than that of an application maybe 
using the upper bits of _flags2 for their own logic.  One could make the 
same argument for unused2, but the defence there is that there's 
precedent of struct expansion into unused2; the same can't be said about 
_flags2.

I personally don't care to retain such abuses, so it seems OK to shrink 
_flags2.  Florian, what do you think?

Thanks,
Sid
Florian Weimer Dec. 9, 2024, 12:53 p.m. UTC | #3
* Siddhesh Poyarekar:

> On 2024-12-08 20:40, Maciej W. Rozycki wrote:
>> On Fri, 6 Dec 2024, Siddhesh Poyarekar wrote:
>> 
>>> The C standard requires that ungetc guarantees at least one pushback, so
>>> put a single byte pushback buffer in the FILE struct to enable that.
>>   So what would the problem be if we instead replaced:
>>    int _flags2;
>> with
>>    short int _flags2;
>>    char _short_backupbuf[1];
>>    char _unused;
>
> I don't see an actual problem other than that of an application maybe
> using the upper bits of _flags2 for their own logic.  One could make
> the same argument for unused2, but the defence there is that there's
> precedent of struct expansion into unused2; the same can't be said
> about _flags2.
>
> I personally don't care to retain such abuses, so it seems OK to
> shrink _flags2.  Florian, what do you think?

From my perspective, either way is fine.

Thanks,
Florian
Andreas Schwab Dec. 9, 2024, 1:14 p.m. UTC | #4
On Dez 09 2024, Siddhesh Poyarekar wrote:

> On 2024-12-08 20:40, Maciej W. Rozycki wrote:
>> On Fri, 6 Dec 2024, Siddhesh Poyarekar wrote:
>> 
>>> The C standard requires that ungetc guarantees at least one pushback, so
>>> put a single byte pushback buffer in the FILE struct to enable that.
>>   So what would the problem be if we instead replaced:
>>    int _flags2;
>> with
>>    short int _flags2;
>>    char _short_backupbuf[1];
>>    char _unused;
>
> I don't see an actual problem other than that of an application maybe
> using the upper bits of _flags2 for their own logic.  One could make the
> same argument for unused2, but the defence there is that there's precedent
> of struct expansion into unused2; the same can't be said about _flags2.

There is also the question of the probability that we eventually need
more than 16 flags here.
Siddhesh Poyarekar Dec. 9, 2024, 3:05 p.m. UTC | #5
On 2024-12-09 08:14, Andreas Schwab wrote:
> On Dez 09 2024, Siddhesh Poyarekar wrote:
> 
>> On 2024-12-08 20:40, Maciej W. Rozycki wrote:
>>> On Fri, 6 Dec 2024, Siddhesh Poyarekar wrote:
>>>
>>>> The C standard requires that ungetc guarantees at least one pushback, so
>>>> put a single byte pushback buffer in the FILE struct to enable that.
>>>    So what would the problem be if we instead replaced:
>>>     int _flags2;
>>> with
>>>     short int _flags2;
>>>     char _short_backupbuf[1];
>>>     char _unused;
>>
>> I don't see an actual problem other than that of an application maybe
>> using the upper bits of _flags2 for their own logic.  One could make the
>> same argument for unused2, but the defence there is that there's precedent
>> of struct expansion into unused2; the same can't be said about _flags2.
> 
> There is also the question of the probability that we eventually need
> more than 16 flags here.

Maciej's argument (in a previous thread) was that the probability of 
that is low enough that we need not care.  Do you disagree?

Sid
Maciej W. Rozycki Dec. 10, 2024, 1:08 p.m. UTC | #6
On Mon, 9 Dec 2024, Siddhesh Poyarekar wrote:

> > > >    So what would the problem be if we instead replaced:
> > > >     int _flags2;
> > > > with
> > > >     short int _flags2;
> > > >     char _short_backupbuf[1];
> > > >     char _unused;
> > > 
> > > I don't see an actual problem other than that of an application maybe
> > > using the upper bits of _flags2 for their own logic.  One could make the

 I don't thing an app is allowed to poke at FILE in the first place, it's 
supposed to be an opaque data type.  We have a note at the top to this 
effect (in the first sentence):

/* Caution: The contents of this file are not part of the official
   stdio.h API.  However, much of it is part of the official *binary*
   interface, and therefore cannot be changed.  */

It's not clear to me what "much of it" refers to in the second, except 
(obviously) for the size of the structure.

> > > same argument for unused2, but the defence there is that there's precedent
> > > of struct expansion into unused2; the same can't be said about _flags2.
> > 
> > There is also the question of the probability that we eventually need
> > more than 16 flags here.
> 
> Maciej's argument (in a previous thread) was that the probability of that is
> low enough that we need not care.  Do you disagree?

 Also I don't think the flags need to be contiguous.  It's not clear to me 
even why we're using plain integers and explicit masks in the first place 
rather than bit-fields, which I think would make code just a little bit 
cleaner.

 My only concern would be if we ever needed to consume a FILE data object 
produced by a different version of glibc, in which case we'd have to vary 
the new member ordering by the endianness.  I don't think we need, though; 
the only case I can think of would be pulling a different version of libc 
via dlopen from a static binary and passing a FILE reference to it, but we 
don't support such a scenario AFAIK.

 Have I missed anything here?

  Maciej
Andreas Schwab Dec. 10, 2024, 1:21 p.m. UTC | #7
On Dez 10 2024, Maciej W. Rozycki wrote:

>  I don't thing an app is allowed to poke at FILE in the first place, it's 
> supposed to be an opaque data type.

Right, except through the macros provided by <stdio.h>, the only reason
why the details are visible.
Maciej W. Rozycki Dec. 10, 2024, 2:48 p.m. UTC | #8
On Tue, 10 Dec 2024, Andreas Schwab wrote:

> >  I don't thing an app is allowed to poke at FILE in the first place, it's 
> > supposed to be an opaque data type.
> 
> Right, except through the macros provided by <stdio.h>, the only reason
> why the details are visible.

 Indeed, so these parts have become a part of the ABI.  This has fixed 
`_IO_read_ptr', `_IO_read_end', `_IO_write_ptr', `_IO_write_end', and 
`_flags' AFAICT, but not `_flags2' or anything else.

  Maciej
diff mbox series

Patch

diff --git a/libio/bits/types/struct_FILE.h b/libio/bits/types/struct_FILE.h
index d8d26639d1..5d08509078 100644
--- a/libio/bits/types/struct_FILE.h
+++ b/libio/bits/types/struct_FILE.h
@@ -1,4 +1,5 @@ 
 /* Copyright (C) 1991-2024 Free Software Foundation, Inc.
+   Copyright The GNU Toolchain Authors.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -94,8 +95,9 @@  struct _IO_FILE_complete
   void *_freeres_buf;
   struct _IO_FILE **_prevchain;
   int _mode;
+  char _short_backupbuf[1];
   /* Make sure we don't get into trouble again.  */
-  char _unused2[15 * sizeof (int) - 5 * sizeof (void *)];
+  char _unused2[15 * sizeof (int) - 5 * sizeof (void *) - sizeof (char)];
 };
 
 /* These macros are used by bits/stdio.h and internal headers.  */
diff --git a/libio/fileops.c b/libio/fileops.c
index 759d737ec7..d49e489f55 100644
--- a/libio/fileops.c
+++ b/libio/fileops.c
@@ -1,4 +1,5 @@ 
 /* Copyright (C) 1993-2024 Free Software Foundation, Inc.
+   Copyright The GNU Toolchain Authors.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -480,7 +481,7 @@  _IO_new_file_underflow (FILE *fp)
       /* Maybe we already have a push back pointer.  */
       if (fp->_IO_save_base != NULL)
 	{
-	  free (fp->_IO_save_base);
+	  _IO_free_backup_buf (fp, fp->_IO_save_base);
 	  fp->_flags &= ~_IO_IN_BACKUP;
 	}
       _IO_doallocbuf (fp);
@@ -932,7 +933,7 @@  _IO_new_file_seekoff (FILE *fp, off64_t offset, int dir, int mode)
       /* It could be that we already have a pushback buffer.  */
       if (fp->_IO_read_base != NULL)
 	{
-	  free (fp->_IO_read_base);
+	  _IO_free_backup_buf (fp, fp->_IO_read_base);
 	  fp->_flags &= ~_IO_IN_BACKUP;
 	}
       _IO_doallocbuf (fp);
@@ -1282,7 +1283,7 @@  _IO_file_xsgetn (FILE *fp, void *data, size_t n)
       /* Maybe we already have a push back pointer.  */
       if (fp->_IO_save_base != NULL)
 	{
-	  free (fp->_IO_save_base);
+	  _IO_free_backup_buf (fp, fp->_IO_save_base);
 	  fp->_flags &= ~_IO_IN_BACKUP;
 	}
       _IO_doallocbuf (fp);
diff --git a/libio/genops.c b/libio/genops.c
index d7e35e67d5..1a782508e4 100644
--- a/libio/genops.c
+++ b/libio/genops.c
@@ -1,4 +1,5 @@ 
 /* Copyright (C) 1993-2024 Free Software Foundation, Inc.
+   Copyright The GNU Toolchain Authors.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -48,6 +49,13 @@  flush_cleanup (void *not_used)
 }
 #endif
 
+void
+_IO_free_backup_buf (FILE *fp, char *ptr)
+{
+  if (_IO_vtable_offset (fp) == 0 && ptr != fp->_short_backupbuf)
+    free (ptr);
+}
+
 /* Fields in struct _IO_FILE after the _lock field are internal to
    glibc and opaque to applications.  We can change them as long as
    the size of struct _IO_FILE is unchanged, which is checked as the
@@ -212,7 +220,7 @@  _IO_free_backup_area (FILE *fp)
 {
   if (_IO_in_backup (fp))
     _IO_switch_to_main_get_area (fp);  /* Just in case. */
-  free (fp->_IO_save_base);
+  _IO_free_backup_buf (fp, fp->_IO_save_base);
   fp->_IO_save_base = NULL;
   fp->_IO_save_end = NULL;
   fp->_IO_backup_base = NULL;
@@ -260,7 +268,7 @@  save_for_backup (FILE *fp, char *end_p)
 	memcpy (new_buffer + avail,
 		fp->_IO_read_base + least_mark,
 		needed_size);
-      free (fp->_IO_save_base);
+      _IO_free_backup_buf (fp, fp->_IO_save_base);
       fp->_IO_save_base = new_buffer;
       fp->_IO_save_end = new_buffer + avail + needed_size;
     }
@@ -636,7 +644,7 @@  _IO_default_finish (FILE *fp, int dummy)
 
   if (fp->_IO_save_base)
     {
-      free (fp->_IO_save_base);
+      _IO_free_backup_buf (fp, fp->_IO_save_base);
       fp->_IO_save_base = NULL;
     }
 
@@ -998,11 +1006,17 @@  _IO_default_pbackfail (FILE *fp, int c)
 	  else if (!_IO_have_backup (fp))
 	    {
 	      /* No backup buffer: allocate one. */
-	      /* Use nshort buffer, if unused? (probably not)  FIXME */
 	      int backup_size = 128;
 	      char *bbuf = (char *) malloc (backup_size);
 	      if (bbuf == NULL)
-		return EOF;
+		{
+		  /* Guarantee a 1-char pushback, except for legacy code where
+		     we don't have the extended part of FILE.  */
+		  if (__glibc_unlikely (_IO_vtable_offset (fp) != 0))
+		    return EOF;
+		  bbuf = fp->_short_backupbuf;
+		  backup_size = 1;
+		}
 	      fp->_IO_save_base = bbuf;
 	      fp->_IO_save_end = fp->_IO_save_base + backup_size;
 	      fp->_IO_backup_base = fp->_IO_save_end;
@@ -1022,7 +1036,7 @@  _IO_default_pbackfail (FILE *fp, int c)
 	    return EOF;
 	  memcpy (new_buf + (new_size - old_size), fp->_IO_read_base,
 		  old_size);
-	  free (fp->_IO_read_base);
+	  _IO_free_backup_buf (fp, fp->_IO_read_base);
 	  _IO_setg (fp, new_buf, new_buf + (new_size - old_size),
 		    new_buf + new_size);
 	  fp->_IO_backup_base = fp->_IO_read_ptr;
diff --git a/libio/libioP.h b/libio/libioP.h
index 34bf91fcd8..90ef8e90be 100644
--- a/libio/libioP.h
+++ b/libio/libioP.h
@@ -1,4 +1,5 @@ 
 /* Copyright (C) 1993-2024 Free Software Foundation, Inc.
+   Copyright The GNU Toolchain Authors.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -357,6 +358,8 @@  typedef FILE *_IO_ITER;
 
 /* Generic functions */
 
+extern void _IO_free_backup_buf (FILE *, char *);
+libc_hidden_proto (_IO_free_backup_buf)
 extern void _IO_switch_to_main_get_area (FILE *) __THROW;
 extern void _IO_switch_to_backup_area (FILE *) __THROW;
 extern int _IO_switch_to_get_mode (FILE *);
diff --git a/libio/oldfileops.c b/libio/oldfileops.c
index 8f775c9094..03f4d76a57 100644
--- a/libio/oldfileops.c
+++ b/libio/oldfileops.c
@@ -1,4 +1,5 @@ 
 /* Copyright (C) 1993-2024 Free Software Foundation, Inc.
+   Copyright The GNU Toolchain Authors.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -311,7 +312,7 @@  _IO_old_file_underflow (FILE *fp)
       /* Maybe we already have a push back pointer.  */
       if (fp->_IO_save_base != NULL)
 	{
-	  free (fp->_IO_save_base);
+	  _IO_free_backup_buf (fp, fp->_IO_save_base);
 	  fp->_flags &= ~_IO_IN_BACKUP;
 	}
       _IO_doallocbuf (fp);
@@ -464,7 +465,7 @@  _IO_old_file_seekoff (FILE *fp, off64_t offset, int dir, int mode)
       /* It could be that we already have a pushback buffer.  */
       if (fp->_IO_read_base != NULL)
 	{
-	  free (fp->_IO_read_base);
+	  _IO_free_backup_buf (fp, fp->_IO_read_base);
 	  fp->_flags &= ~_IO_IN_BACKUP;
 	}
       _IO_doallocbuf (fp);
diff --git a/libio/wfileops.c b/libio/wfileops.c
index 16beab1f3a..a96bfa589b 100644
--- a/libio/wfileops.c
+++ b/libio/wfileops.c
@@ -1,4 +1,5 @@ 
 /* Copyright (C) 1993-2024 Free Software Foundation, Inc.
+   Copyright The GNU Toolchain Authors.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -175,7 +176,7 @@  _IO_wfile_underflow (FILE *fp)
       /* Maybe we already have a push back pointer.  */
       if (fp->_IO_save_base != NULL)
 	{
-	  free (fp->_IO_save_base);
+	  _IO_free_backup_buf (fp, fp->_IO_save_base);
 	  fp->_flags &= ~_IO_IN_BACKUP;
 	}
       _IO_doallocbuf (fp);
diff --git a/stdio-common/Makefile b/stdio-common/Makefile
index e76e40e587..b1a04fd064 100644
--- a/stdio-common/Makefile
+++ b/stdio-common/Makefile
@@ -1,4 +1,5 @@ 
 # Copyright (C) 1991-2024 Free Software Foundation, Inc.
+# Copyright The GNU Toolchain Authors.
 # This file is part of the GNU C Library.
 
 # The GNU C Library is free software; you can redistribute it and/or
@@ -303,6 +304,7 @@  tests := \
   tst-tmpnam \
   tst-ungetc \
   tst-ungetc-leak \
+  tst-ungetc-nomem \
   tst-unlockedio \
   tst-vfprintf-mbs-prec \
   tst-vfprintf-user-type \
diff --git a/stdio-common/tst-ungetc-nomem.c b/stdio-common/tst-ungetc-nomem.c
new file mode 100644
index 0000000000..49db33fff3
--- /dev/null
+++ b/stdio-common/tst-ungetc-nomem.c
@@ -0,0 +1,115 @@ 
+/* Test ungetc behavior with malloc failures.
+   Copyright The GNU Toolchain Authors.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <stdio.h>
+#include <string.h>
+#include <support/check.h>
+#include <support/support.h>
+#include <support/temp_file.h>
+#include <support/xstdio.h>
+
+extern void *__libc_malloc (size_t)
+     __attribute__ ((malloc)) __attribute__ ((alloc_size (1)));
+
+static volatile bool fail = false;
+
+void *
+malloc (size_t sz)
+{
+  if (fail)
+    return NULL;
+
+  return __libc_malloc (sz);
+}
+
+static int
+do_test (void)
+{
+  char *filename = NULL;
+  struct stat props = {};
+  size_t bufsz = 0;
+
+  create_temp_file ("tst-ungetc-nomem.", &filename);
+  if (stat (filename, &props) != 0)
+    FAIL_EXIT1 ("Could not get file status: %m\n");
+
+  FILE *fp = fopen (filename, "w");
+
+  /* The libio buffer sizes are the same as block size.  */
+  bufsz = props.st_blksize + 2;
+
+  char *buf = xmalloc (bufsz);
+  memset (buf, 'a', bufsz);
+
+  if (fwrite (buf, sizeof (char), bufsz, fp) != bufsz)
+    FAIL_EXIT1 ("fwrite failed: %m\n");
+  xfclose (fp);
+
+  /* Begin test.  */
+  fp = xfopen (filename, "r");
+
+  while (!feof (fp))
+    {
+      /* Reset the pushback buffer state.  */
+      fseek (fp, 0, SEEK_CUR);
+
+      fail = true;
+      /* 1: First ungetc should always succeed, as the standard requires.  */
+      TEST_COMPARE (ungetc ('y', fp), 'y');
+
+      /* 2: This will result in resizing, which should fail.  */
+      TEST_COMPARE (ungetc ('w', fp), EOF);
+
+      /* 3: Now allow the resizing, which should immediately fill up the buffer
+         too, since this allocates only double the current buffer, i.e.
+         2-bytes.  */
+      fail = false;
+      TEST_COMPARE (ungetc ('x', fp), 'x');
+
+      /* 4: And fail again because this again forces an alloc, which fails.  */
+      fail = true;
+      TEST_COMPARE (ungetc ('x', fp), EOF);
+
+      /* 5: Enable allocations again so that we now get a 4-byte buffer.  Now
+         both calls should work.  */
+      fail = false;
+      TEST_COMPARE (ungetc ('x', fp), 'x');
+      fail = true;
+      TEST_COMPARE (ungetc ('x', fp), 'x');
+
+      /* Drain out the x's.  */
+      TEST_COMPARE (fgetc (fp), 'x');
+      TEST_COMPARE (fgetc (fp), 'x');
+      TEST_COMPARE (fgetc (fp), 'x');
+
+      /* Finally, drain out the first char we had pushed back, followed by one more char
+	 from the stream, if present.  */
+      TEST_COMPARE (fgetc (fp), 'y');
+      char c = fgetc (fp);
+      if (!feof (fp))
+	TEST_COMPARE (c, 'a');
+    }
+
+  /* Final sanity check before we're done.  */
+  TEST_COMPARE (ferror (fp), 0);
+  xfclose (fp);
+
+  return 0;
+}
+
+#include <support/test-driver.c>