manual: Clarify the documentation of strverscmp [BZ #20524]

On 08/30/2016 12:06 AM, Michael Kerrisk wrote:

> s/version comparison/version-comparison/
>
>> +implementation is based on a finite state machine, whose behavior is
>
> s/finite state machine/finite-state machine/

Thank you for your corrections.

>>  @item
>> -fractional/fractional: the things become a bit more complex.
>> -If the common prefix contains only leading zeroes, the longest part is less
>> -than the other one; else the comparison behaves normally.
>> +Corresponding non-digit sequences in both strings are compared
>> +lexicographically.  If their lengths differ, the shorter non-digit
>
> Should this be "If their lengths differ, and the shorter string is
> equal to the corresponding prefix in the longer string..."?

I don't think it matters.  If the shorter sequence is not a prefix of 
the other sequence, the lexicographic ordering will find a difference 
before the extension character.

What about this?

@item
Corresponding non-digit sequences in both strings are compared
lexicographically if their lengths are equal.  If the lengths differ,
the shorter non-digit sequence is extended with the input string
character immediately following it (which can be the null terminator),
the other sequence is truncated to be of the same (extended) length, and
these two sequences are compared lexicographically.  In this last case,
the sequence comparison determines the result of the function because
the extension character (or some character before it) is necessarily
different from the character at the same offset in the other input
string.
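As a rough illustration of the proposed wording (not part of the patch),
the non-digit rule can be exercised as sketched below.  This assumes a
glibc-style strverscmp; sign, vercmp_sign and check_non_digit_cases are
hypothetical helper names, and only the sign of the result is checked,
not its magnitude.

```c
#define _GNU_SOURCE   /* strverscmp is a GNU extension in <string.h> */
#include <assert.h>
#include <string.h>

/* Hypothetical helper: reduce a comparison result to -1, 0 or 1.  */
static int
sign (int x)
{
  return (x > 0) - (x < 0);
}

/* Hypothetical helper: the sign of strverscmp (a, b).  */
static int
vercmp_sign (const char *a, const char *b)
{
  return sign (strverscmp (a, b));
}

/* Hypothetical checks for the non-digit sequence rule.  */
static void
check_non_digit_cases (void)
{
  /* Equal lengths: plain lexicographic comparison.  */
  assert (vercmp_sign ("abc", "abd") < 0);

  /* Lengths differ: "ab" is extended with its null terminator and
     compared against "abc", so the shorter string is smaller.  */
  assert (vercmp_sign ("ab", "abc") < 0);

  /* Lengths differ: "a" is extended with the digit '1' that follows
     it, "ab" already has that length, and '1' sorts before 'b'.  */
  assert (vercmp_sign ("a1", "ab") < 0);
  assert (vercmp_sign ("ab", "a1") > 0);
}
```

Calling check_non_digit_cases () from main is enough to run the checks.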

>> +@item
>> +If the two digit sequences have no leading zeros, they are compared as
>> +integers, that is, the string with the longer digit sequence is deemed
>> +larger, and if both sequences are of equal length, they are compared
>> +lexicographically.
>> +
>> +@item
>> +If both digit sequences have an equal, positive number of leading zeros,
>
> Why is the word "positive" here?

It's used as a synonym for “non-zero” (to discriminate this case from 
the previous one).

>> +they are compared lexicographically.  If their length differs, another
>> +character is added to to the shorter sequence,
>
> "another character is added to to the shorter sequence" is vague. You
> want wording like you used above.

Is this better?

@item
If both digit sequences have an equal, positive number of leading zeros,
they are compared lexicographically if their lengths are the same.  If
the lengths differ, the shorter sequence is extended with the following
character in its input string, the other sequence is truncated to the
same length, and both sequences are compared lexicographically (similar
to the non-digit sequence case above).
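A small sketch of how I read this rule, again assuming a glibc-style
strverscmp (the helper names vercmp_sign and check_leading_zero_cases
are hypothetical, and only the sign of the result is checked):

```c
#define _GNU_SOURCE   /* strverscmp is a GNU extension in <string.h> */
#include <assert.h>
#include <string.h>

/* Hypothetical helper: the sign (-1, 0 or 1) of strverscmp (a, b).  */
static int
vercmp_sign (const char *a, const char *b)
{
  int r = strverscmp (a, b);
  return (r > 0) - (r < 0);
}

/* Hypothetical checks for digit sequences that have the same,
   non-zero number of leading zeros.  */
static void
check_leading_zero_cases (void)
{
  /* One leading zero each, equal lengths: lexicographic order.  */
  assert (vercmp_sign ("x01", "x02") < 0);

  /* One leading zero each, lengths differ: "01" is extended with the
     null terminator that follows it and compared against "012".  */
  assert (vercmp_sign ("x01", "x012") < 0);

  /* The same situation with the arguments swapped, using an example
     from the existing manual text.  */
  assert (vercmp_sign ("part1_f012", "part1_f01") > 0);
}
```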

I'm attaching a new patch.

> I've not completely checked all of the details, but what you write
> certainly matches what I did check, and is a great deal better than
> the existing text. But, the algorithm it describes *is* strange.

The description is based on fuzz-strverscmp-alt.c, which is intended to
run under a fuzzer to show that both implementations are equivalent.
fuzz-strverscmp.c was a previous attempt at resolving the state
machine, but it was still not very clear.  fuzz-strverscmp3.c tests that
the comparison is indeed a linear order.  fuzz-strverscmp3-old2.c
demonstrates that this test works: with a start file of
\003\0031.11.21.3, it quickly finds the ordering violations in the old
implementation (from commit 4546646233574f321f9deedff928b980d82f4fc7).
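
To give an idea of what a linear-order check involves, here is a minimal
sketch.  It is not the code from fuzz-strverscmp3.c; the names cmp_fn,
count_order_violations, always_less, samples and NSAMPLES are made up
for illustration, and it brute-forces a fixed list of strings instead of
taking fuzzer input.

```c
#define _GNU_SOURCE   /* strverscmp is a GNU extension in <string.h> */
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*cmp_fn) (const char *, const char *);

/* Reduce a comparison result to -1, 0 or 1.  */
static int
sign (int x)
{
  return (x > 0) - (x < 0);
}

/* Count violations of antisymmetry (over all pairs) and of
   transitivity (over all triples) among the N strings in S.  */
static size_t
count_order_violations (cmp_fn cmp, const char *const *s, size_t n)
{
  size_t bad = 0;
  for (size_t i = 0; i < n; ++i)
    for (size_t j = 0; j < n; ++j)
      {
        /* cmp (a, b) and cmp (b, a) must have opposite signs.  */
        if (sign (cmp (s[i], s[j])) != -sign (cmp (s[j], s[i])))
          ++bad;
        /* a <= b and b <= c must imply a <= c.  */
        for (size_t k = 0; k < n; ++k)
          if (cmp (s[i], s[j]) <= 0 && cmp (s[j], s[k]) <= 0
              && cmp (s[i], s[k]) > 0)
            ++bad;
      }
  return bad;
}

/* Deliberately broken comparator: claims a < b for any two distinct
   strings, which violates antisymmetry.  */
static int
always_less (const char *a, const char *b)
{
  return strcmp (a, b) == 0 ? 0 : -1;
}

/* Sample inputs, chosen here only for illustration.  */
static const char *const samples[] =
  { "000", "00", "01", "010", "09", "0", "1", "9", "10" };
#define NSAMPLES (sizeof samples / sizeof samples[0])
```

Passing strverscmp as the comparator, as in
count_order_violations (strverscmp, samples, NSAMPLES), should report no
violations with the current implementation, while always_less is there
to show that the check can fail at all.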

Thanks,
Florian
