From patchwork Mon Jun 1 08:55:46 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?b?T25kxZllaiBCw61sa2E=?= X-Patchwork-Id: 478857 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FD3414027F for ; Mon, 1 Jun 2015 18:56:04 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=sourceware.org header.i=@sourceware.org header.b=lpfetre4; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:date:from:to:subject:message-id:references :mime-version:content-type:content-transfer-encoding :in-reply-to; q=dns; s=default; b=d7fDEiPu3a3Niiq2o30hK4DqX2+GFP qNYd/mJCiivIf4b1eqKmKXpnpC5Zrfv/1Ll9uQ/Y35+NKKpynwijSpvm6x47yzFD Fqmon1sdczxqyAP67paG0NrIe19ySHmjeH8b3GG5ktpnvHglYWEj5IMVTeYykRME 59ItNMyA9BCH0= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:date:from:to:subject:message-id:references :mime-version:content-type:content-transfer-encoding :in-reply-to; s=default; bh=AlTYNURxPn1bAuNLnnSoxYZsjIU=; b=lpfe tre4icGjFwRweUMNgboM/+TbHaqnJuK7bhZt5EU2RMR5jGZwI9veiY25fVmTk/CC Y28OcfWe91rgo+PdYBCLs0T+EvgFYlAH+XgFMONfvzdakKpJSHgVZCNaQ42gL2xr ERqVmAi2wzYXGyzxJqtCH1cmo07igXb7CXmrVts= Received: (qmail 100821 invoked by alias); 1 Jun 2015 08:55:58 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 100809 invoked by uid 89); 1 Jun 2015 08:55:57 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=0.7 required=5.0 tests=AWL, BAYES_50, FREEMAIL_FROM, SPF_NEUTRAL autolearn=no version=3.3.2 X-HELO: popelka.ms.mff.cuni.cz Date: Mon, 1 Jun 2015 10:55:46 +0200 From: =?utf-8?B?T25kxZllaiBCw61sa2E=?= To: libc-alpha@sourceware.org Subject: [RFC] Avoid cltz in comparison functions. Message-ID: <20150601085546.GA17458@domone> References: <20150531190013.GA8530@domone> <20150531204725.GA9108@domone> <20150531220131.GA666@domone> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20150531220131.GA666@domone> User-Agent: Mutt/1.5.20 (2009-06-14) On Mon, Jun 01, 2015 at 12:01:31AM +0200, Ondřej Bílka wrote: > And here is updated version of comparing functions. > I recalled another trick that I knew but couldn't use for sse2 code. A trick for strcmp is that you don't need access individual bytes to get inequality. Just mask bytes after first difference and compare them. That could be done with following. // find first bit where they differ. mask = (x ^ y) ^ ((x ^ y) - 1) // set whole bytes of mask mask = (mask & ones * 255) // also remove bytes after terminating 0 mask &= zero ^ (zero -1); if (x & mask > y & mask) return 1; if (x & mask < y & mask) return -1; return 0; Then you could replace these checks with branchless ones as I did in following RFC. A problem is that it would make diff skeleton more messy as sometimes its used for finding byte location like in strcasecmp. So how is best way to integrate tricks like that? diff --git a/sysdeps/generic/string_vector.h b/sysdeps/generic/string_vector.h index a3c4741..18ca118 100644 --- a/sysdeps/generic/string_vector.h +++ b/sysdeps/generic/string_vector.h @@ -191,6 +191,47 @@ bytes_equal_nocarry (vector_int x, vector_int y) } #endif +#ifndef CUSTOM_COMPARE_BYTES +# if __BYTE_ORDER == __LITTLE_ENDIAN +static __always_inline +int +compare_bytes (vector_int x, vector_int y, vector_int zero_mask) +{ + /* Removes bytes beyond terminating '\0'. When zero mask its 0 then it + evaluates to ~0 and doesn't mask anything. */ + zero_mask = zero_mask ^ (zero_mask - 1); + + /* Find first bit that differs. Make mask with bits upto that bit set + to 1. */ + vector_int and_mask = (((x ^ y) ^ ((x ^ y) - 1)) + + /* With difference in highest bit we can't compare them by subtraction. */ + + if (and_mask == ~((vector_int) 0)) + { + if (x > y) + return 1; + if (x < y) + return -1; + return 0; + } + + /* Convert mask to bytewise one that has in given byte 255 or 0 if bitwise + mask had some bit selected or not. */ + + and_mask = (and_mask & ones) * 255; + + and_mask = and_mask & zero_mask; + + /* As check above ensures that difference fits signed vector_int we could + subtract them. */ + + vector_int diff = (x & and_mask) - (y_greater & and_mask); + return (int)(diff >> (8 * (LSIZE - sizeof (int)))); +} +# endif +#endif + /* When you have hardware ctz/clz its probably best bet. However for softare emulation you could get better than generic one as you