From patchwork Tue May 12 23:53:39 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?b?T25kxZllaiBCw61sa2E=?=
X-Patchwork-Id: 471791
Date: Wed, 13 May 2015 01:53:39 +0200
From: =?utf-8?B?T25kxZllaiBCw61sa2E=?=
To: libc-alpha@sourceware.org
Cc: eggert@cs.ucla.edu
Subject: [PATCH v2] Improve fnmatch performance.
Message-ID: <20150512235339.GA27716@domone>
MIME-Version: 1.0
Content-Disposition: inline
User-Agent: Mutt/1.5.20 (2009-06-14)

Hi,

this is a revised version of the fnmatch improvement. It speeds up queries
from the locate command by around three times in a UTF-8 locale on x64 by
finding the start of the pattern with a fast strstr. It uses the fast locale
type check introduced in strdiff.

How should this be synchronized with gnulib? The only implementation-specific
detail is the UTF-8 detection.

There are several possible improvements. The main idea is to reject strings
as fast as possible; if a string matches, you will likely do some expensive
operation with it anyway.

For simplicity this just tries to find the starting quartet of the pattern
with strstr. It does not apply to patterns that start with a special
character. A natural extension would be to parse the pattern, find a quartet
sequence that must occur, and do the same strstr test. A second would be to
use whole characters in a case-folding strstr with non-ASCII UTF-8.

This passes the tests. OK to commit?

	* posix/fnmatch.c (fnmatch): Improve performance.

diff --git a/posix/fnmatch.c b/posix/fnmatch.c
index a707847..7152055 100644
--- a/posix/fnmatch.c
+++ b/posix/fnmatch.c
@@ -333,7 +333,48 @@ fnmatch (pattern, string, flags)
      int flags;
 {
 # if HANDLE_MULTIBYTE
-  if (__builtin_expect (MB_CUR_MAX, 1) != 1)
+
+  struct __locale_data *current = _NL_CURRENT_LOCALE->__locales[LC_COLLATE];
+  uint_fast32_t encoding =
+    current->values[_NL_ITEM_INDEX (_NL_COLLATE_ENCODING_TYPE)].word;
+
+  /* ASCII with \+/.*?[{(@! excluded.  */
+  static unsigned char normal[256] = {
+    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+    1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0,
+    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
+    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1,
+    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
+    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
+  };
+
+  if (encoding == !__cet_other)
+    {
+      char start[16];
+      char *string2;
+      size_t i;
+      for (i = 0; i < 4 && normal[(unsigned char) pattern[i]]; i++)
+        start[i] = pattern[i];
+      start[i] = 0;
+      if (flags & FNM_CASEFOLD)
+        string2 = strcasestr (string, start);
+      else
+        string2 = strstr (string, start);
+      if (!string2)
+        return FNM_NOMATCH;
+    }
+
+  if (MB_CUR_MAX != 1)
     {
       mbstate_t ps;
       size_t n;
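
For readers who want to try the idea outside glibc, here is a minimal
user-space sketch of the same prefilter. It is not the patch itself: the names
is_plain_byte and fnmatch_prefiltered are hypothetical, the lookup table is
replaced by a strchr test over the characters named in the comment above, and
it assumes the caller already knows the locale is single-byte or UTF-8 and
that FNM_CASEFOLD is not in use.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of glibc): nonzero for bytes that are
   treated as ordinary literals.  NUL, non-ASCII bytes and the characters
   \ + / . * ? [ { ( @ ! are rejected, which is conservative: rejecting
   more bytes only shortens the literal prefix.  */
static int
is_plain_byte (unsigned char c)
{
  if (c == '\0' || c >= 0x80)
    return 0;
  return strchr ("\\+/.*?[{(@!", c) == NULL;
}

/* Hypothetical wrapper: copy at most four leading literal bytes of
   PATTERN, reject STRING early when they do not occur in it, and
   otherwise defer to the real fnmatch.  Assumes a single-byte or UTF-8
   locale and no case folding.  */
static int
fnmatch_prefiltered (const char *pattern, const char *string, int flags)
{
  char start[5];
  size_t i;

  assert (pattern != NULL && string != NULL);

  /* The loop stops at the terminating NUL because is_plain_byte
     rejects it, so PATTERN is never read past its end.  */
  for (i = 0; i < 4 && is_plain_byte ((unsigned char) pattern[i]); i++)
    start[i] = pattern[i];
  start[i] = '\0';

  /* An empty START always succeeds here, so patterns that begin with a
     special character fall straight through to fnmatch.  */
  if (strstr (string, start) == NULL)
    return FNM_NOMATCH;

  return fnmatch (pattern, string, flags);
}
```

The strstr test is only a necessary condition, so a string that passes it
still goes through the full matcher; the gain comes from the strings that are
rejected before any pattern parsing happens.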