Patch not showing in patchwork

Message ID	20140611103312.GE24193@spoyarek.pnq.redhat.com
State	New
Headers	show Return-Path: <libc-alpha-return-50694-incoming=patchwork.ozlabs.org@sourceware.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:date:from:to:cc:subject:message-id:references :mime-version:content-type:in-reply-to; q=dns; s=default; b=nHTy +vbxi/lW3/+bLXlNUupf4DZt2UtVpxBqKCdKpMf+cfnIpCktfU78MSED/7OorNFr 2V1kJppa13PU7OYCSgkYLP7sTdU/qS3XnKVZ01v7MerXApf0EMvJFSkP4gfPaeOu b3qgVAi7ycAqIFWIgJMqgCLiZF6nVpahbexqu0E= Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk Sender: libc-alpha-owner@sourceware.org Date: Wed, 11 Jun 2014 16:03:12 +0530 From: Siddhesh Poyarekar <siddhesh@redhat.com> To: Pedro Alves <palves@redhat.com> Cc: "Joseph S. Myers" <joseph@codesourcery.com>, libc-alpha@sourceware.org Subject: Re: Patch not showing in patchwork Message-ID: <20140611103312.GE24193@spoyarek.pnq.redhat.com> References: <Pine.LNX.4.64.1406101636120.8581@digraph.polyomino.org.uk> <53976045.8050602@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="bi5JUZtvcfApsciF" Content-Disposition: inline In-Reply-To: <53976045.8050602@redhat.com> User-Agent: Mutt/1.5.22.1-rc1 (2013-10-16)

Message ID

20140611103312.GE24193@spoyarek.pnq.redhat.com

State

New

Headers

DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id
	:list-unsubscribe:list-subscribe:list-archive:list-post
	:list-help:sender:date:from:to:cc:subject:message-id:references
	:mime-version:content-type:in-reply-to; q=dns; s=default; b=nHTy
	+vbxi/lW3/+bLXlNUupf4DZt2UtVpxBqKCdKpMf+cfnIpCktfU78MSED/7OorNFr
	2V1kJppa13PU7OYCSgkYLP7sTdU/qS3XnKVZ01v7MerXApf0EMvJFSkP4gfPaeOu
	b3qgVAi7ycAqIFWIgJMqgCLiZF6nVpahbexqu0E=
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: libc-alpha-owner@sourceware.org
Date: Wed, 11 Jun 2014 16:03:12 +0530
From: Siddhesh Poyarekar <siddhesh@redhat.com>
To: Pedro Alves <palves@redhat.com>
Cc: "Joseph S. Myers" <joseph@codesourcery.com>, libc-alpha@sourceware.org
Subject: Re: Patch not showing in patchwork
Message-ID: <20140611103312.GE24193@spoyarek.pnq.redhat.com>
References: <Pine.LNX.4.64.1406101636120.8581@digraph.polyomino.org.uk>
	<53976045.8050602@redhat.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="bi5JUZtvcfApsciF"
Content-Disposition: inline
In-Reply-To: <53976045.8050602@redhat.com>
User-Agent: Mutt/1.5.22.1-rc1 (2013-10-16)

Commit Message

Siddhesh Poyarekar June 11, 2014, 10:33 a.m. UTC

On Tue, Jun 10, 2014 at 08:45:09PM +0100, Pedro Alves wrote:
> On 06/10/2014 05:39 PM, Joseph S. Myers wrote:
> > I noticed that my patch 
> > <https://sourceware.org/ml/libc-alpha/2014-06/msg00077.html> does not 
> > appear in the list of patches in patchwork (it appears not to have a patch 
> > entry in patchwork at all, rather than having been mistakenly marked as 
> > committed).  Is there some way to tell why it didn't get recognised by 
> > patchwork?
> > 
> 
> I'll guess due to:
> 
>  Content-Type: text/plain; charset="X-UNKNOWN"

This was the problem.  I have pushed the patch manually:

http://patchwork.sourceware.org/patch/1430/

I have also applied the following patch to patchwork to allow it to
fall back on alternate encodings if the charset is unknown.  I'll post
it to the patchwork mailing list.

Siddhesh

Comments

Joseph Myers June 11, 2014, 12:11 p.m. UTC | #1

On Wed, 11 Jun 2014, Siddhesh Poyarekar wrote:

> +                try_charsets = ['utf-8', 'windows-1252', 'ascii', 'iso-8859']

I suspect you mean iso-8859-1.  (Both windows-1252 and iso-8859-1 provide 
useful fallbacks for mixed-character-set patches - what's relevant is 
actually the bytes of the patch, after all, when it's changing files that 
don't all use the same character set, and misinterpreting those bytes as 
the wrong characters doesn't matter so much.  This particular patch was 
actually iso-8859-1 - +/- signs in pre-existing comments in patch 
context.)

Siddhesh Poyarekar June 11, 2014, 12:55 p.m. UTC | #2

On Wed, Jun 11, 2014 at 12:11:12PM +0000, Joseph S. Myers wrote:
> On Wed, 11 Jun 2014, Siddhesh Poyarekar wrote:
> 
> > +                try_charsets = ['utf-8', 'windows-1252', 'ascii', 'iso-8859']
> 
> I suspect you mean iso-8859-1.  (Both windows-1252 and iso-8859-1 provide 
> useful fallbacks for mixed-character-set patches - what's relevant is 
> actually the bytes of the patch, after all, when it's changing files that 
> don't all use the same character set, and misinterpreting those bytes as 
> the wrong characters doesn't matter so much.  This particular patch was 
> actually iso-8859-1 - +/- signs in pre-existing comments in patch 
> context.)

Thanks, fixed.

Siddhesh

--- a/apps/patchwork/bin/parsemail.py	2014-06-11 15:53:12.685666812 +0530
+++ b/apps/patchwork/bin/parsemail.py	2014-06-11 15:53:03.991667186 +0530
@@ -147,6 +147,13 @@ 
         return match.group(1)
     return None
 
+def try_decode(payload, charset):
+    try:
+        payload = unicode(payload, charset)
+    except UnicodeDecodeError:
+        return None
+    return payload
+
 def find_content(project, mail):
     patchbuf = None
     commentbuf = ''
@@ -157,15 +164,27 @@ 
             continue
 
         payload = part.get_payload(decode=True)
-        charset = part.get_content_charset()
         subtype = part.get_content_subtype()
 
-        # if we don't have a charset, assume utf-8
-        if charset is None:
-            charset = 'utf-8'
-
         if not isinstance(payload, unicode):
-            payload = unicode(payload, charset)
+            charset = part.get_content_charset()
+
+            # If there is no charset or if it is unknown, then try some common
+            # charsets before we fail.
+            if charset is None or charset == 'x-unknown':
+                try_charsets = ['utf-8', 'windows-1252', 'ascii', 'iso-8859']
+            else:
+                try_charsets = [charset]
+
+            for cset in try_charsets:
+                decoded_payload = try_decode(payload, cset)
+                if decoded_payload is not None:
+                    break
+            payload = decoded_payload
+
+            # Could not find a valid decoded payload.  Fail.
+            if payload is None:
+                return (None, None)
 
         if subtype in ['x-patch', 'x-diff']:
             patchbuf = payload