From patchwork Mon Aug 29 20:21:45 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Weimer X-Patchwork-Id: 663814 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3sNNNN2QrWz9ryT for ; Tue, 30 Aug 2016 06:21:56 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.b=J0DVdHqK; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:date:to:subject:mime-version:content-type :content-transfer-encoding:message-id:from; q=dns; s=default; b= Qibj9nUWyffAbIX3kCZpk75+s2UjFRm2A5rz0bp+6d3luyHLZoKejIgSumThHeWC OI4+DM5cWWu/YyOc/j3U4+4eqwZp3bK3g47Flj8vvakdQa028xJxNkZknv1Kn/Wg o6iPCsrO+nxy3H4ORYEm0VdKzcM+9v1lOSzbPx1/t+4= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:date:to:subject:mime-version:content-type :content-transfer-encoding:message-id:from; s=default; bh=YaWTq9 1QPhygRnSktcLi9ph3BDE=; b=J0DVdHqKonvU+bV1oXn5NhF8WiP/iewFMZSL4x jtvypQndY3CNR0y3bafMq2da9L1GBUuj0xP6ZruN03HQoBvKTZ6i5p4S0OoxJJ/z CK5GGs5awL63zHCKNBBHviPtheHdXMmVHm3MMxg/y5xYWGdNOA1z6Qu/BCjCZc+E ahOI0= Received: (qmail 40536 invoked by alias); 29 Aug 2016 20:21:50 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 40519 invoked by uid 89); 29 Aug 2016 20:21:49 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-3.4 required=5.0 tests=BAYES_00, RP_MATCHES_RCVD, SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=integers, lengths, sk:lexicog, shorter X-HELO: mx1.redhat.com Date: Mon, 29 Aug 2016 22:21:45 +0200 To: libc-alpha@sourceware.org Subject: [PATCH] manual: Clarify the documentation of strverscmp [BZ #20524] User-Agent: Heirloom mailx 12.5 7/5/10 MIME-Version: 1.0 Message-Id: <20160829202145.A2548403DC4A9@oldenburg.str.redhat.com> From: fweimer@redhat.com (Florian Weimer) 2016-08-29 Florian Weimer [BZ #20524] * manual/string.texi (String/Array Comparison): Clarify the strverscmp behavior. diff --git a/manual/string.texi b/manual/string.texi index bce81a7..3577017 100644 --- a/manual/string.texi +++ b/manual/string.texi @@ -1374,46 +1374,70 @@ The @code{strverscmp} function compares the string @var{s1} against @var{s2}, considering them as holding indices/version numbers. The return value follows the same conventions as found in the @code{strcmp} function. In fact, if @var{s1} and @var{s2} contain no -digits, @code{strverscmp} behaves like @code{strcmp}. +digits, @code{strverscmp} behaves like @code{strcmp} +(in the sense that the sign of the result is the same). -Basically, we compare strings normally (byte by byte), until -we find a digit in each string - then we enter a special comparison -mode, where each sequence of digits is taken as a whole. If we reach the -end of these two parts without noticing a difference, we return to the -standard comparison mode. There are two types of numeric parts: -"integral" and "fractional" (those begin with a '0'). The types -of the numeric parts affect the way we sort them: +The comparison algorithm which the @code{strverscmp} function implements +differs slightly from other version comparison algorithms. The +implementation is based on a finite state machine, whose behavior is +approximated below. @itemize @bullet @item -integral/integral: we compare values as you would expect. +The input strings are each split into sequences of non-digits and +digits. These sequences can be empty at the beginning and end of the +string. Digits are determined by the @code{isdigit} function and are +thus subject to the current locale. @item -fractional/integral: the fractional part is less than the integral one. -Again, no surprise. +Comparison starts with a (possibly empty) non-digit sequence. The first +non-equal sequences of non-digits or digits determines the outcome of +the comparison. @item -fractional/fractional: the things become a bit more complex. -If the common prefix contains only leading zeroes, the longest part is less -than the other one; else the comparison behaves normally. +Corresponding non-digit sequences in both strings are compared +lexicographically. If their lengths differ, the shorter non-digit +sequence is amended with the input string character immediately +following it (which can be the null terminator). This character is +compared against the non-digit character in the other sequence, at the +same string offset, and their difference determines the outcome of the +comparison. + +@item +For two digit sequences, the number of leading zeros is counted (which +can be zero). If the count differs, the string with more leading zeros +in the digit sequence is considered smaller than the other string. + +@item +If the two digit sequences have no leading zeros, they are compared as +integers, that is, the string with the longer digit sequence is deemed +larger, and if both sequences are of equal length, they are compared +lexicographically. + +@item +If both digit sequences have an equal, positive number of leading zeros, +they are compared lexicographically. If their length differs, another +character is added to to the shorter sequence, to determine the outcome +of the comparison, as described above for the non-digit sequence case. @end itemize +The treatment of leading zeros and the tie-breaking characters (which in +effect propagate across non-digit/digit sequence boundaries) differ from +other version comparison algorithms. + @smallexample strverscmp ("no digit", "no digit") @result{} 0 /* @r{same behavior as strcmp.} */ strverscmp ("item#99", "item#100") @result{} <0 /* @r{same prefix, but 99 < 100.} */ strverscmp ("alpha1", "alpha001") - @result{} >0 /* @r{fractional part inferior to integral one.} */ + @result{} >0 /* @r{different number of leading zeros (0 and 2).} */ strverscmp ("part1_f012", "part1_f01") - @result{} >0 /* @r{two fractional parts.} */ + @result{} >0 /* @r{lexicographical comparison with leading zeros.} */ strverscmp ("foo.009", "foo.0") - @result{} <0 /* @r{idem, but with leading zeroes only.} */ + @result{} <0 /* @r{different number of leading zeros (2 and 1).} */ @end smallexample -This function is especially useful when dealing with filename sorting, -because filenames frequently hold indices/version numbers. - @code{strverscmp} is a GNU extension. @end deftypefun