
Document further requirement on mixing streams / file descriptors

Message ID 46008e45-db75-c168-70fe-3c5b5009a9b5@redhat.com
State New
Series Document further requirement on mixing streams / file descriptors

Commit Message

Joseph Myers Sept. 25, 2024, 9:28 p.m. UTC
The gilbc manual has some documentation in llio.texi of requirements
for moving between I/O on FILE * streams and file descriptors on the
same open file description.

The documentation of what must be done on a FILE * stream to move from
it to either a file descriptor or another FILE * for the same open
file description seems to match POSIX.  However, there is an
additional requirement in POSIX on the *second* of the two handles
being moved between, which is not mentioned in the glibc manual: "If
any previous active handle has been used by a function that explicitly
changed the file offset, except as required above for the first
handle, the application shall perform an lseek() or fseek() (as
appropriate to the type of handle) to an appropriate location.".
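For illustration only (not part of the patch), here is a minimal C sketch of the hand-off the quoted POSIX text describes, assuming a POSIX system. The function name stream_to_fd_handoff and the offset 6 are hypothetical. The stream is the first handle and calls fseek, which explicitly changes the file offset. It is cleaned with fflush. The descriptor underneath it is the second handle, and it is repositioned with lseek before use.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: move from a FILE * stream (first handle) to its
   underlying descriptor (second handle) on the same open file
   description.  Returns bytes read via the descriptor, or -1 on error.  */
ssize_t stream_to_fd_handoff(char *buf, size_t len)
{
    FILE *fp = tmpfile();                 /* first handle: a stream */
    if (fp == NULL)
        return -1;

    fputs("hello world", fp);

    /* The first handle explicitly changes the file offset.  */
    if (fseek(fp, 6, SEEK_SET) != 0) {
        fclose(fp);
        return -1;
    }

    /* Clean the old handle before switching: flush pending output.  */
    if (fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }

    /* Second handle: the descriptor.  Because a previous handle changed
       the offset explicitly, seek the new handle to the intended place.  */
    int fd = fileno(fp);
    if (lseek(fd, 6, SEEK_SET) == (off_t) -1) {
        fclose(fp);
        return -1;
    }

    ssize_t n = read(fd, buf, len);       /* expected to read "world" */
    fclose(fp);
    return n;
}
```

The stream is not used again after the hand-off except to close it. That keeps the sketch within the "one active handle at a time" model the manual describes.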

Document this requirement on seeking in the glibc manual.  Note that
I'm not sure what the "except as required above for the first handle"
is meant to be about, so I haven't documented anything for it.  As far
as I can tell, nothing specified for moving from the first handle
actually lists calling a seek function as one of the steps to be done.
(Current POSIX doesn't seem to have any relevant rationale for this
section.  The rationale in the 1996 edition says "In requiring the
seek to an appropriate location for the new handle, the application is
required to know what it is doing if it is passing streams with seeks
involved.  If the required seek is not done, the results are undefined
(and in fact the program probably will not work on many common
implementations)." - which also doesn't help in understanding the
purpose of "except as required above for the first handle".)

Tested with "make info" and "make pdf".

Comments

Florian Weimer Sept. 26, 2024, 8:28 a.m. UTC | #1
* Joseph Myers:

> The gilbc manual has some documentation in llio.texi of requirements

Typo: g[lib]c

> diff --git a/manual/llio.texi b/manual/llio.texi
> index a035c3e20f..3ea5c352ee 100644
> --- a/manual/llio.texi
> +++ b/manual/llio.texi
> @@ -1097,6 +1097,21 @@ streams persist in other processes, their file positions become
>  undefined as a result.  To prevent this, you must clean up the streams
>  before destroying them.
>  
> +In addition to cleaning up a stream before doing I/O using another
> +linked channel, additional precautions are needed to ensure a
> +well-defined file position indicator in some cases.  If both the
> +following conditions hold, you must set the file position indicator on
> +the new channel (either a stream or a descriptor) using a function
> +such as @code{fseek} or @code{lseek}.
> +
> +@itemize @bullet
> +@item At least one of the old and new linked channels is a stream.
> +
> +@item The file position indicator was previously set (using the old
> +linked channel or a previous channel linked to it) with a function
> +such as @code{fseek} or @code{lseek}.
> +@end itemize

For context, this updates the Linked Channels subsection, which is about
channels with the same underlying file description.

I do not think this rule is accurate.  The standard streams are linked
channels, typically with descriptors for the file description in the
parent process.  They are streams.  A freshly started program does not
know if another program seeked any of the descriptors before.  Does this
mean programs need to add fseek calls for the standard streams?  What if
those streams are not seekable?

I think we have a step missing in the cleaning process: the new channel
may indeed need seeking.  The current manual suggests that cleaning is
only needed on the old channel, but I don't think this is accurate, for
both linked and independent channels.  For example, an input stream may
have old file contents buffered.

Thanks,
Florian
Joseph Myers Sept. 26, 2024, 10:39 p.m. UTC | #2
On Thu, 26 Sep 2024, Florian Weimer wrote:

> I do not think this rule is accurate.  The standard streams are linked
> channels, typically with descriptors for the file description in the
> parent process.  They are streams.  A freshly started program does not
> know if another program seeked any of the descriptors before.  Does this
> mean programs need to add fseek calls for the standard streams?  What if
> those streams are not seekable?

This seems like it might be an omission in the POSIX specification.

What I'd expect is: after execve, the standard streams are set up from the 
relevant file descriptors.  If the previous process seeked on a handle for 
that open file description, then it (possibly in the child after fork) 
must make the file descriptor active, including seeking on it to get a 
defined offset, but then after execve nothing more is needed regarding 
seeking on the stream (assuming that other processes aren't using the 
same open file description at the same time).

I can't however find anything in POSIX that says that this is what happens 
with handles for file descriptors 0, 1, 2 on execve (and, in particular, 
that the requirement to seek on the stream does not apply).  If we think 
this is what the semantics should be for glibc, we could still document it 
as such.

> I think we have a step missing in the cleaning process: the new channel
> may indeed need seeking.  The current manual suggests that cleaning is
> only needed on the old channel, but I don't think this is accurate, for
> both linked and independent channels.  For example, an input stream may
> have old file contents buffered.

For an input stream with old contents buffered (that is the new, linked 
handle), I think it would have been the active handle earlier, and so have 
needed to be cleaned when it ceased to be the active handle.  (In the case 
of independent channels, the manual already says "You should clean an 
input stream before reading data that may have been modified using an 
independent channel.  Otherwise, you might read obsolete data that had 
been in the stream's buffer.".)
Florian Weimer Sept. 30, 2024, 10:10 a.m. UTC | #3
* Joseph Myers:

> On Thu, 26 Sep 2024, Florian Weimer wrote:
>
>> I do not think this rule is accurate.  The standard streams are linked
>> channels, typically with descriptors for the file description in the
>> parent process.  They are streams.  A freshly started program does not
>> know if another program seeked any of the descriptors before.  Does this
>> mean programs need to add fseek calls for the standard streams?  What if
>> those streams are not seekable?
>
> This seems like it might be an omission in the POSIX specification.
>
> What I'd expect is: after execve, the standard streams are set up from the 
> relevant file descriptors.  If the previous process seeked on a handle for 
> that open file description, then it (possibly in the child after fork) 
> must make the file descriptor active, including seeking on it to get a 
> defined offset, but then after execve nothing more is needed regarding 
> seeking on the stream (assuming that other processes aren't using the 
> same open file description at the same time).

I'm puzzled by this seeking requirement on the newly created descriptors.
Why would one have to seek after a dup on the new descriptor?  Do you
think that's relevant in a GNU/Linux context?  After all, the new
descriptor shares the underlying file description, and does not maintain
its offset.
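Here is a minimal C sketch of the dup point, assuming a POSIX system and a temporary file created with mkstemp. The helper name offset_seen_through_dup and the path template are hypothetical. The sketch seeks the original descriptor and then queries the offset through the duplicate without moving it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: seek one descriptor, then report the offset seen
   through a dup of it.  Returns -1 on error.  */
off_t offset_seen_through_dup(void)
{
    char path[] = "/tmp/dup-offset-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                         /* file remains open via fd */

    off_t seen = -1;
    int fd2 = dup(fd);                    /* shares the open file description */
    if (fd2 >= 0) {
        if (write(fd, "hello world", 11) == 11
            && lseek(fd, 6, SEEK_SET) == 6)
            seen = lseek(fd2, 0, SEEK_CUR);  /* query only; no movement */
        close(fd2);
    }
    close(fd);
    return seen;
}
```

The duplicate should report offset 6. The offset belongs to the shared open file description, not to either descriptor.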

>> I think we have a step missing in the cleaning process: the new channel
>> may indeed need seeking.  The current manual suggests that cleaning is
>> only needed on the old channel, but I don't think this is accurate, for
>> both linked and independent channels.  For example, an input stream may
>> have old file contents buffered.
>
> For an input stream with old contents buffered (that is the new, linked 
> handle), I think it would have been the active handle earlier, and so have 
> needed to be cleaned when it ceased to be the active handle.  (In the case 
> of independent channels, the manual already says "You should clean an 
> input stream before reading data that may have been modified using an 
> independent channel.  Otherwise, you might read obsolete data that had 
> been in the stream's buffer.".)

That's a good point.  Maybe it's possible to tweak the language you
proposed to apply only if the new stream was previously active?

The standard streams were not active before, so maybe that change is
sufficient to avoid the unnecessary requirement about seeking for them?

Thanks,
Florian

Patch

diff --git a/manual/llio.texi b/manual/llio.texi
index a035c3e20f..3ea5c352ee 100644
--- a/manual/llio.texi
+++ b/manual/llio.texi
@@ -1097,6 +1097,21 @@  streams persist in other processes, their file positions become
 undefined as a result.  To prevent this, you must clean up the streams
 before destroying them.
 
+In addition to cleaning up a stream before doing I/O using another
+linked channel, additional precautions are needed to ensure a
+well-defined file position indicator in some cases.  If both the
+following conditions hold, you must set the file position indicator on
+the new channel (either a stream or a descriptor) using a function
+such as @code{fseek} or @code{lseek}.
+
+@itemize @bullet
+@item At least one of the old and new linked channels is a stream.
+
+@item The file position indicator was previously set (using the old
+linked channel or a previous channel linked to it) with a function
+such as @code{fseek} or @code{lseek}.
+@end itemize
+
 @node Independent Channels
 @subsection Independent Channels
 @cindex independent channels