Message ID | 20231108221638.37101-2-alx@kernel.org |
---|---|
State | New |
Series | stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string |
On 11/8/23 14:17, Alejandro Colomar wrote:
> These copy *from* a string
Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be
a string.
By the way, have you looked at the recent (i.e., this-year) changes to
the glibc manual's string section? They're relevant.
Paul Eggert <eggert@cs.ucla.edu> writes:
> Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be
> a string.

But it will be treated as one, for the purposes of this function.
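Both points above can be seen in a small sketch. The helper name `copy_field` and the width `FIELD_WIDTH` are made up for illustration; the only library call is strncpy(3). The source is a fixed-width array that need not contain a null byte, yet strncpy still stops copying at the first null byte it finds, as if the source were a string.

```c
#include <string.h>

enum { FIELD_WIDTH = 4 };   /* hypothetical field width */

/*
 * Hypothetical helper: copy one fixed-width field into another.
 * src must hold at least FIELD_WIDTH readable bytes, but it does not
 * have to be null-terminated.
 */
void copy_field(char dst[FIELD_WIDTH], const char src[FIELD_WIDTH])
{
    /* Reads at most FIELD_WIDTH bytes; if a null byte is found earlier,
     * copying stops there and the rest of dst is filled with null bytes. */
    strncpy(dst, src, FIELD_WIDTH);
}
```

With src = { 'a', 'b', 'c', 'd' } all four bytes are copied and dst is not terminated. With src = { 'a', 0, 'x', 'y' } the bytes after the null byte are ignored and dst becomes { 'a', 0, 0, 0 }.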
Hi Paul,

On Wed, Nov 08, 2023 at 03:06:40PM -0800, Paul Eggert wrote:
> On 11/8/23 14:17, Alejandro Colomar wrote:
> > These copy *from* a string
>
> Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a
> string.

Pedantically, true.  But since it's quite rare to copy from a
fixed-width null-padded array into another, I didn't want to waste space
on that and possibly confuse readers.  In such a case, the source buffer
must be at least as large as the destination buffer, and will likely be
the same size (because having fixed-width stuff, why make it different),
so memcpy(3) will probably be simpler.

> By the way, have you looked at the recent (i.e., this-year) changes to the
> glibc manual's string section? They're relevant.

I hadn't; after your message, I have.
<https://sourceware.org/glibc/manual/2.38/html_mono/libc.html#String-and-Array-Utilities>

I like how it connects all the functions, and it explains the concepts
and gives advice (e.g., avoid truncation as it's usually evil), and
compares the different functions.  However, I think it misses a few
things:

-  strncpy(3) and strncat(3) are not related at all.  They don't have
   the same relation that strcpy(3) and strcat(3) have.  You can't write
   the following code in any case:

	strncpy(dst, foo, sizeof(dst));
	strncat(dst, bar, sizeof(dst));

   as you would with strcpy(3) or strlcpy(3).  strncpy(3) and strncat(3)
   are opposite functions: the former reads from a string and writes to
   a fixed-width null-padded buffer, and the latter reads from a
   fixed-width buffer and writes to a string.  (You can use them in
   other cases, pedantically, as you said above, but those cases are
   rather unreal.)

-  strncpy(3) is in a section that starts by saying:

   > The functions described in this section copy or concatenate the
   > possibly-truncated contents of a string or array to another

   This may mislead programmers to believe it is useful for producing
   strings, when it's not.

In general, I would like the manual to put some more distance between
these functions and the term "string".  As DJ mentioned, it might be
useful to mention utmp(5) and tar(1) as niche use cases for
st[rp]ncpy(3).

And now for some typo:

-  In the following sentence under "5.2 String and Array Conventions":

   > The array arguments and return values for these functions have type
   > void * or wchar_t.

   I believe it meant `void *` or `wchar_t *`

Cheers,
Alex
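A minimal sketch of the "opposite functions" point, assuming a hypothetical 8-byte name field such as one might find in a utmp-like record. The names `NAME_WIDTH`, `name_to_field` and `field_to_string` are invented for this example; only strncpy(3) and strncat(3) are real library calls.

```c
#include <string.h>

enum { NAME_WIDTH = 8 };   /* hypothetical fixed field width */

/* String -> fixed-width null-padded field.  The field ends up without a
 * terminating null byte when name has NAME_WIDTH or more characters. */
void name_to_field(char field[NAME_WIDTH], const char *name)
{
    strncpy(field, name, NAME_WIDTH);
}

/* Fixed-width field -> string.  out needs room for NAME_WIDTH + 1 bytes,
 * because strncat always writes a terminating null byte. */
void field_to_string(char out[NAME_WIDTH + 1], const char field[NAME_WIDTH])
{
    out[0] = '\0';                    /* strncat appends to a string, so start empty */
    strncat(out, field, NAME_WIDTH);  /* reads at most NAME_WIDTH bytes from field */
}
```

Here name_to_field(field, "abcdefghij") leaves field holding the eight bytes "abcdefgh" with no terminator, and field_to_string(out, field) turns that back into the string "abcdefgh". The two calls go in opposite directions, which is why chaining them the way strcpy(3) and strcat(3) are chained does not make sense.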
On Wed, Nov 08, 2023 at 23:17:07 +0100, Alejandro Colomar wrote:
> These copy *from* a string.  But the destination is a simple character
> sequence within an array; not a string.
>
> Suggested-by: DJ Delorie <dj@redhat.com>
> Cc: Jonny Grant <jg@jguk.org>
> Cc: Matthew House <mattlloydhouse@gmail.com>
> Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
> Cc: Thorsten Kukuk <kukuk@suse.com>
> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
> Cc: Zack Weinberg <zack@owlfolio.org>
> Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
> Cc: Carlos O'Donell <carlos@redhat.com>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
> ---

I like the "with bytes from a string" wording. Good call.

- Oskari

>
> Resending, including the mailing lists, which I forgot.
>
>  man3/stpncpy.3        | 17 +++++++++++++----
>  man7/string_copying.7 | 20 ++++++++++----------
>  2 files changed, 23 insertions(+), 14 deletions(-)
>
> diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
> index b6bbfd0a3..f86ff8c29 100644
> --- a/man3/stpncpy.3
> +++ b/man3/stpncpy.3
> @@ -6,9 +6,8 @@
>  .TH stpncpy 3 (date) "Linux man-pages (unreleased)"
>  .SH NAME
>  stpncpy, strncpy
> -\- zero a fixed-width buffer and
> -copy a string into a character sequence with truncation
> -and zero the rest of it
> +\-
> +fill a fixed-width null-padded buffer with bytes from a string
>  .SH LIBRARY
>  Standard C library
>  .RI ( libc ", " \-lc )
> @@ -37,7 +36,7 @@ .SH SYNOPSIS
>  _GNU_SOURCE
>  .fi
>  .SH DESCRIPTION
> -These functions copy the string pointed to by
> +These functions copy bytes from the string pointed to by
>  .I src
>  into a null-padded character sequence at the fixed-width buffer pointed to by
>  .IR dst .
> @@ -110,6 +109,16 @@ .SH CAVEATS
>  These functions produce a null-padded character sequence,
>  not a string (see
>  .BR string_copying (7)).
> +For example:
> +.P
> +.in +4n
> +.EX
> +strncpy(buf, "1", 5);       // { \[aq]1\[aq], 0, 0, 0, 0 }
> +strncpy(buf, "1234", 5);    // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], 0 }
> +strncpy(buf, "12345", 5);   // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
> +strncpy(buf, "123456", 5);  // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
> +.EE
> +.in
>  .P
>  It's impossible to distinguish truncation by the result of the call,
>  from a character sequence that just fits the destination buffer;
> diff --git a/man7/string_copying.7 b/man7/string_copying.7
> index cadf1c539..0e179ba34 100644
> --- a/man7/string_copying.7
> +++ b/man7/string_copying.7
> @@ -41,15 +41,11 @@ .SS Strings
>  .\" ----- SYNOPSIS :: Null-padded character sequences --------/
>  .SS Null-padded character sequences
>  .nf
> -// Zero a fixed-width buffer, and
> -// copy a string into a character sequence with truncation.
> -.BI "char *stpncpy(char " dst "[restrict ." sz "], \
> +// Fill a fixed-width null-padded buffer with bytes from a string.
> +.BI "char *strncpy(char " dst "[restrict ." sz "], \
>   const char *restrict " src ,
>  .BI " size_t " sz );
> -.P
> -// Zero a fixed-width buffer, and
> -// copy a string into a character sequence with truncation.
> -.BI "char *strncpy(char " dst "[restrict ." sz "], \
> +.BI "char *stpncpy(char " dst "[restrict ." sz "], \
>   const char *restrict " src ,
>  .BI " size_t " sz );
>  .P
> @@ -240,14 +236,18 @@ .SS Truncate or not?
>  .\" ----- DESCRIPTION :: Null-padded character sequences --------------/
>  .SS Null-padded character sequences
>  For historic reasons,
> -some standard APIs,
> +some standard APIs and file formats,
>  such as
> -.BR utmpx (5),
> +.BR utmpx (5)
> +and
> +.BR tar (1),
>  use null-padded character sequences in fixed-width buffers.
>  To interface with them,
>  specialized functions need to be used.
>  .P
> -To copy strings into them, use
> +To copy bytes from strings into these buffers, use
> +.BR strncpy (3)
> +or
>  .BR stpncpy (3).
>  .P
>  To copy from an unterminated string within a fixed-width buffer into a string,
> --
> 2.42.0
On 08/11/2023 23:06, Paul Eggert wrote:
> On 11/8/23 14:17, Alejandro Colomar wrote:
>> These copy *from* a string
>
> Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a string.
>
> By the way, have you looked at the recent (i.e., this-year) changes to the glibc manual's string section? They're relevant.

That's a great reference page Paul, lots of useful information in the manual.
https://www.gnu.org/software/libc/manual/html_node/String-and-Array-Utilities.html

Re this man page:

https://man7.org/linux/man-pages/man3/string.3.html

Obsolete functions
       char *strncpy(char dest[restrict .n], const char src[restrict .n],
              size_t n);
              Copy at most n bytes from string src to dest, returning a
              pointer to the start of dest.

It could clarify
"Copy at most n bytes from string src to ARRAY dest, returning a
pointer to the start of ARRAY dest."

(caps for my emphasis in this email)

Kind regards
Jonny
Hi Jonny,

On Thu, Nov 09, 2023 at 02:11:14PM +0000, Jonny Grant wrote:
> On 08/11/2023 23:06, Paul Eggert wrote:
> > On 11/8/23 14:17, Alejandro Colomar wrote:
> >> These copy *from* a string
> >
> > Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a string.
> >
> > By the way, have you looked at the recent (i.e., this-year) changes to the glibc manual's string section? They're relevant.
>
> That's a great reference page Paul, lots of useful information in the manual.
> https://www.gnu.org/software/libc/manual/html_node/String-and-Array-Utilities.html
>
> Re this man page:
>
> https://man7.org/linux/man-pages/man3/string.3.html
>
> Obsolete functions
>        char *strncpy(char dest[restrict .n], const char src[restrict .n],
>               size_t n);
>               Copy at most n bytes from string src to dest, returning a
>               pointer to the start of dest.

Uh, I forgot about that page.  I'll have a look at it and update it.  At
least, I need to remove that "Obsolete functions".

> It could clarify
> "Copy at most n bytes from string src to ARRAY dest, returning a
> pointer to the start of ARRAY dest."

I think I prefer DJ's suggestion:

"Fill a fixed‐width null‐padded buffer with bytes from a string."

Thanks!
Alex

> (caps for my emphasis in this email)
>
> Kind regards
> Jonny
On 09/11/2023 14:35, Alejandro Colomar wrote:
> Hi Jonny,
>
> On Thu, Nov 09, 2023 at 02:11:14PM +0000, Jonny Grant wrote:
>> On 08/11/2023 23:06, Paul Eggert wrote:
>>> On 11/8/23 14:17, Alejandro Colomar wrote:
>>>> These copy *from* a string
>>>
>>> Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a string.
>>>
>>> By the way, have you looked at the recent (i.e., this-year) changes to the glibc manual's string section? They're relevant.
>>
>> That's a great reference page Paul, lots of useful information in the manual.
>> https://www.gnu.org/software/libc/manual/html_node/String-and-Array-Utilities.html
>>
>> Re this man page:
>>
>> https://man7.org/linux/man-pages/man3/string.3.html
>>
>> Obsolete functions
>>        char *strncpy(char dest[restrict .n], const char src[restrict .n],
>>               size_t n);
>>               Copy at most n bytes from string src to dest, returning a
>>               pointer to the start of dest.
>
> Uh, I forgot about that page.  I'll have a look at it and update it.  At
> least, I need to remove that "Obsolete functions".
>
>> It could clarify
>> "Copy at most n bytes from string src to ARRAY dest, returning a
>> pointer to the start of ARRAY dest."
>
> I think I prefer DJ's suggestion:
>
> "Fill a fixed‐width null‐padded buffer with bytes from a string."

Better to make it clear it's null-padded after?

"Fill a fixed‐width buffer with bytes from a string and pad with null bytes."

I'll leave it with you.

Kind regards
Jonny
On Thu, Nov 09, 2023 at 02:47:05PM +0000, Jonny Grant wrote:
> >> It could clarify
> >> "Copy at most n bytes from string src to ARRAY dest, returning a
> >> pointer to the start of ARRAY dest."
> >
> > I think I prefer DJ's suggestion:
> >
> > "Fill a fixed‐width null‐padded buffer with bytes from a string."
>
> Better to make it clear it's null-padded after?
>
> "Fill a fixed‐width buffer with bytes from a string and pad with null bytes."

Yes, that looks even better.  And I wasn't very happy with "bytes".
Maybe:

"Fill a fixed-width buffer with characters from a string and pad with
null bytes."

Thanks,
Alex

> I'll leave it with you.
>
> Kind regards
> Jonny
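The proposed wording can be read almost literally as code. The following is only a sketch of the documented behaviour of strncpy(3), not glibc's implementation; `fill_and_pad` is a made-up name, and memchr(3) is used to find the end of the source so that nothing beyond ISO C is needed.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a library function): behaves like strncpy(3),
 * written to mirror "fill with characters from a string, then pad with
 * null bytes".
 */
char *fill_and_pad(char *dst, const char *src, size_t sz)
{
    /* Number of characters to take: up to the first null byte in src,
     * or sz if no null byte is found within the first sz bytes. */
    const char *end = memchr(src, '\0', sz);
    size_t len = (end != NULL) ? (size_t)(end - src) : sz;

    memcpy(dst, src, len);           /* characters from the string */
    memset(dst + len, 0, sz - len);  /* pad the remainder with null bytes */
    return dst;
}
```

When the source has sz or more characters, len equals sz, the memset writes nothing, and the buffer ends up without any null byte, which is the reason the result is not a string.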
Alejandro Colomar <alx@kernel.org> writes:
> "Fill a fixed-width buffer with characters from a string and pad with
> null bytes."

The pedant in me says it should be NUL bytes (or NUL's), not null bytes.
nul/NUL is a character, null/NULL is a pointer.
On Nov 09 2023, DJ Delorie wrote:
> The pedant in me says it should be NUL bytes (or NUL's), not null bytes.
> nul/NUL is a character, null/NULL is a pointer.

NUL is the ASCII abbreviation for Null (see RFC 20).
Hi DJ,

On Thu, Nov 09, 2023 at 12:30:17PM -0500, DJ Delorie wrote:
> Alejandro Colomar <alx@kernel.org> writes:
> > "Fill a fixed-width buffer with characters from a string and pad with
> > null bytes."
>
> The pedant in me says it should be NUL bytes (or NUL's), not null bytes.
> nul/NUL is a character, null/NULL is a pointer.

Here's what man-pages(7) (written by Michael Kerrisk) says:

   NULL, NUL, null pointer, and null byte
       A null pointer is a pointer that points to nothing, and is
       normally indicated by the constant NULL.  On the other hand, NUL
       is the null byte, a byte with the value 0, represented in C via
       the character constant '\0'.

       The preferred term for the pointer is "null pointer" or simply
       "NULL"; avoid writing "NULL pointer".

       The preferred term for the byte is "null byte".  Avoid writing
       "NUL", since it is too easily confused with "NULL".  Avoid also
       the terms "zero byte" and "null character".  The byte that
       terminates a C string should be described as "the terminating
       null byte"; strings may be described as "null‐terminated", but
       avoid the use of "NUL‐terminated".

I don't necessarily agree with all of that, but mostly.  I don't agree
with not saying null character, because as well as we have the null wide
character (L'\0'), using null character for '\0' makes it symmetric.

Other than that, I mostly agree with Michael.  Here's what I think of
these terms:

-  NULL is a null pointer constant (as well as 0 is another null pointer
   constant).

-  A null pointer is a more generic term that includes a run-time null
   pointer as well.

-  The null byte is 0.

-  The null character, '\0', is composed of a null byte.

-  The null wide character, L'\0' is composed of several null bytes.

-  NUL is the ASCII name of the null byte, or maybe is it null character
   here?  It's a bit muddy.

I use null byte for padding, and null character for the string
terminator, to make a stronger difference between strings and
null-padded fixed-width arrays.  I need to review string_copying(7) to
make sure I was consistent in this regard.

Colloquially, I find it fine to write NULL instead of null pointer (even
for non-constant cases), and NUL instead of any of "null character",
"null byte", or "null wide character", but for being precise, I prefer
"null something".

Cheers,
Alex
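The distinction between a null character and the null bytes it is composed of can be checked with a small sketch. `all_null_bytes` is a hypothetical helper, and the claim that L'\0' spans several bytes assumes a platform where sizeof(wchar_t) is greater than 1 (it is 4 with glibc on Linux; ISO C does not require that).

```c
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper: returns 1 if the n bytes at p are all null bytes,
 * 0 otherwise.
 */
int all_null_bytes(const void *p, size_t n)
{
    const unsigned char *b = p;

    for (size_t i = 0; i < n; i++) {
        if (b[i] != 0)
            return 0;
    }
    return 1;
}
```

For `char c = '\0';` the call all_null_bytes(&c, sizeof(c)) inspects a single null byte, while for `wchar_t wc = L'\0';` the call all_null_bytes(&wc, sizeof(wc)) inspects sizeof(wchar_t) null bytes.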
On 09/11/2023 17:30, DJ Delorie wrote:
> Alejandro Colomar <alx@kernel.org> writes:
>> "Fill a fixed-width buffer with characters from a string and pad with
>> null bytes."
>
> The pedant in me says it should be NUL bytes (or NUL's), not null bytes.
> nul/NUL is a character, null/NULL is a pointer.
>

NUL would be a big improvement.

Kind regards, Jonny
diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index b6bbfd0a3..f86ff8c29 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -6,9 +6,8 @@
 .TH stpncpy 3 (date) "Linux man-pages (unreleased)"
 .SH NAME
 stpncpy, strncpy
-\- zero a fixed-width buffer and
-copy a string into a character sequence with truncation
-and zero the rest of it
+\-
+fill a fixed-width null-padded buffer with bytes from a string
 .SH LIBRARY
 Standard C library
 .RI ( libc ", " \-lc )
@@ -37,7 +36,7 @@ .SH SYNOPSIS
 _GNU_SOURCE
 .fi
 .SH DESCRIPTION
-These functions copy the string pointed to by
+These functions copy bytes from the string pointed to by
 .I src
 into a null-padded character sequence at the fixed-width buffer pointed to by
 .IR dst .
@@ -110,6 +109,16 @@ .SH CAVEATS
 These functions produce a null-padded character sequence,
 not a string (see
 .BR string_copying (7)).
+For example:
+.P
+.in +4n
+.EX
+strncpy(buf, "1", 5);       // { \[aq]1\[aq], 0, 0, 0, 0 }
+strncpy(buf, "1234", 5);    // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], 0 }
+strncpy(buf, "12345", 5);   // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
+strncpy(buf, "123456", 5);  // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
+.EE
+.in
 .P
 It's impossible to distinguish truncation by the result of the call,
 from a character sequence that just fits the destination buffer;
diff --git a/man7/string_copying.7 b/man7/string_copying.7
index cadf1c539..0e179ba34 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -41,15 +41,11 @@ .SS Strings
 .\" ----- SYNOPSIS :: Null-padded character sequences --------/
 .SS Null-padded character sequences
 .nf
-// Zero a fixed-width buffer, and
-// copy a string into a character sequence with truncation.
-.BI "char *stpncpy(char " dst "[restrict ." sz "], \
+// Fill a fixed-width null-padded buffer with bytes from a string.
+.BI "char *strncpy(char " dst "[restrict ." sz "], \
  const char *restrict " src ,
 .BI " size_t " sz );
-.P
-// Zero a fixed-width buffer, and
-// copy a string into a character sequence with truncation.
-.BI "char *strncpy(char " dst "[restrict ." sz "], \
+.BI "char *stpncpy(char " dst "[restrict ." sz "], \
  const char *restrict " src ,
 .BI " size_t " sz );
 .P
@@ -240,14 +236,18 @@ .SS Truncate or not?
 .\" ----- DESCRIPTION :: Null-padded character sequences --------------/
 .SS Null-padded character sequences
 For historic reasons,
-some standard APIs,
+some standard APIs and file formats,
 such as
-.BR utmpx (5),
+.BR utmpx (5)
+and
+.BR tar (1),
 use null-padded character sequences in fixed-width buffers.
 To interface with them,
 specialized functions need to be used.
 .P
-To copy strings into them, use
+To copy bytes from strings into these buffers, use
+.BR strncpy (3)
+or
 .BR stpncpy (3).
 .P
 To copy from an unterminated string within a fixed-width buffer into a string,
These copy *from* a string.  But the destination is a simple character
sequence within an array; not a string.

Suggested-by: DJ Delorie <dj@redhat.com>
Cc: Jonny Grant <jg@jguk.org>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---

Resending, including the mailing lists, which I forgot.

 man3/stpncpy.3        | 17 +++++++++++++----
 man7/string_copying.7 | 20 ++++++++++----------
 2 files changed, 23 insertions(+), 14 deletions(-)
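The four cases from the CAVEATS example in the patch can be checked with a short sketch. `BUF_WIDTH` and `fill_is_unterminated` are hypothetical names chosen for this illustration; the 5-byte width matches the example in the patch.

```c
#include <string.h>

enum { BUF_WIDTH = 5 };   /* same width as in the CAVEATS example */

/*
 * Hypothetical helper: fill buf from src with strncpy(3) and report
 * whether buf ended up without any null byte (1) or not (0).
 */
int fill_is_unterminated(char buf[BUF_WIDTH], const char *src)
{
    strncpy(buf, src, BUF_WIDTH);
    return memchr(buf, '\0', BUF_WIDTH) == NULL;
}
```

For "1" and "1234" the helper returns 0, since at least one null byte of padding remains. For both "12345" and "123456" it returns 1 and the buffer holds the same five bytes, which is the ambiguity the CAVEATS text describes: an exact fit and a truncation cannot be told apart from the resulting buffer alone.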