From patchwork Sat Nov 30 22:14:32 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stephen Finucane
X-Patchwork-Id: 1202757
From: Stephen Finucane
To: patchwork@lists.ozlabs.org
Subject: [PATCH v3 2/2] parser: Use a second query to weed out duplicate series
Date: Sat, 30 Nov 2019 22:14:32 +0000
Message-Id: <20191130221432.73118-2-stephen@that.guru>
X-Mailer: git-send-email 2.23.0
In-Reply-To: <20191130221432.73118-1-stephen@that.guru>
References: <20191130221432.73118-1-stephen@that.guru>
List-Id: Patchwork development

Annoyingly, not all email clients properly thread
emails using the message ID fields originally specified in RFC 822 [1].
Worse, some MTAs (cough, outlook.com, cough) actually override what the
client configures, breaking the world in the process.

Because of this, Patchwork supports threading using other metadata in
addition to the RFC 822 metadata. Specifically, it uses a combination of
submitter and list-id extracted from the headers, along with the series
version and total count extracted from the subject. In addition, we
timebox things so that two or more series that match on all of this
metadata, but which are sent some time apart from each other, aren't
combined by accident. This does leave one edge case: duplicate series
received within the timebox will be combined. We've resigned ourselves
to this on the basis that it's extremely unlikely for all of these
things to go wrong at once.

Given all the above, there should be no reason for a lookup by series
markers to return more than one series. The timeboxing prevents us from
grouping similar-looking series by accident, so the only other way this
can happen is that we lost a race, in which case we should try again.

Stop swallowing MultipleObjectsReturned in '_find_series_by_markers' and
instead query for the series a second time after creating it, so that a
lost race raises and sends us back around the retry loop.

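For illustration only, and not part of the patch: a minimal,
self-contained sketch of the marker-based lookup described above. It
assumes an in-memory list in place of the Django ORM. The 'Series'
dataclass, 'MultipleSeriesFound', 'find_series_by_markers' and the
ten-minute window are all hypothetical stand-ins; the real window is
not shown in this patch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


class MultipleSeriesFound(Exception):
    """Hypothetical stand-in for Series.MultipleObjectsReturned."""


@dataclass(frozen=True)
class Series:
    # Hypothetical, simplified model: only the markers named above.
    submitter: str
    list_id: str
    version: int
    total: int
    date: datetime


def find_series_by_markers(all_series, submitter, list_id, version, total,
                           date, window=timedelta(minutes=10)):
    """Return the one series matching the markers inside the timebox.

    Returns None when nothing matches (a new series is needed) and
    raises MultipleSeriesFound when more than one matches, which the
    caller treats as a lost race. The window size is an assumption.
    """
    matches = [
        s for s in all_series
        if (s.submitter, s.list_id, s.version, s.total)
        == (submitter, list_id, version, total)
        and abs(s.date - date) <= window
    ]
    if not matches:
        return None
    if len(matches) > 1:
        raise MultipleSeriesFound('%d series match' % len(matches))
    return matches[0]
```

A caller would catch 'MultipleSeriesFound' alongside its integrity
error and go around the retry loop again, rather than silently treating
the duplicate as "no series found".
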
[1] https://tools.ietf.org/html/rfc822

Signed-off-by: Stephen Finucane
Cc: Daniel Axtens
---
 patchwork/parser.py | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/patchwork/parser.py b/patchwork/parser.py
index 425684f6..563338ff 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -298,7 +298,8 @@ def _find_series_by_markers(project, mail, author):
         return Series.objects.get(
             submitter=author, project=project, version=version, total=total,
             date__range=[start_date, end_date])
-    except (Series.DoesNotExist, Series.MultipleObjectsReturned):
+    except Series.DoesNotExist:
+        # we're creating a new series
         return
 
 
@@ -1130,8 +1131,14 @@ def parse_mail(mail, list_id=None):
                         except SeriesReference.DoesNotExist:
                             SeriesReference.objects.create(
                                 msgid=ref, project=project, series=series)
+
+                # attempt to pull the series in again, raising an exception
+                # if we lost the race when creating a series and force us
+                # to go through this again
+                series = find_series(project, mail, author)
+
                 break
-            except IntegrityError:
+            except (IntegrityError, Series.MultipleObjectsReturned):
                 # we lost the race so go again
                 logger.warning('Conflict while saving series. This is '
                                'probably because multiple patches belonging '