Message ID | 20191111222741.u77idj6ijpljvetx@chatter.i7.local |
---|---|
State | Superseded |
Series | Improve pull request URL matching regex |
On 12/11/19 9:27 am, Konstantin Ryabitsev wrote:
> Existing regex was missing several important use cases, such as:
>
> - tag/branch info wrapping to the next line, e.g.:
>
> ----
> are available in the Git repository at:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux.git/
>   tags/v5.4-next-soc
>
> ----
> (see example: https://patchwork.kernel.org/patch/11236893/)
>
> - tag/branch info being wrapped to the next line with a backslash, e.g.:
>
> ----
> are available in the Git repository at:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux.git/ \
>   tags/v5.4-next-soc
>
> ----
> (no example, but I've seen this before)
>
> The proposed change deals with these edge-cases.
>
> Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>

This needs a test :) Should be as simple as adding the examples you link
to in patchwork/tests/mail, and then adding a couple of one-line test
cases in PatchParseTest in patchwork/tests/test_parser.py.

> ---
>  patchwork/parser.py | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index c794f09..d25c0df 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -939,11 +939,11 @@ def parse_patch(content):
>  def parse_pull_request(content):
>      git_re = re.compile(r'^The following changes since commit.*'
>                          r'^are available in the git repository at:\n'
> -                        r'^\s*([\S]+://[^\n]+)$',
> +                        r'^\s*([\w+-]+(?:://|@)[\w/.@:~-]+[\s\\]*[\w/._-]*)\s*$',
>                          re.DOTALL | re.MULTILINE | re.IGNORECASE)
>      match = git_re.search(content)
>      if match:
> -        return match.group(1)
> +        return re.sub('\s+', ' ', match.group(1)).strip()
>      return None
>
>
>
> base-commit: 239fbd2ca1bf140bc61fdee922944624b23c812c
>
diff --git a/patchwork/parser.py b/patchwork/parser.py
index c794f09..d25c0df 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -939,11 +939,11 @@ def parse_patch(content):
 def parse_pull_request(content):
     git_re = re.compile(r'^The following changes since commit.*'
                         r'^are available in the git repository at:\n'
-                        r'^\s*([\S]+://[^\n]+)$',
+                        r'^\s*([\w+-]+(?:://|@)[\w/.@:~-]+[\s\\]*[\w/._-]*)\s*$',
                         re.DOTALL | re.MULTILINE | re.IGNORECASE)
     match = git_re.search(content)
     if match:
-        return match.group(1)
+        return re.sub('\s+', ' ', match.group(1)).strip()
     return None
Existing regex was missing several important use cases, such as:

- tag/branch info wrapping to the next line, e.g.:

----
are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux.git/
  tags/v5.4-next-soc

----
(see example: https://patchwork.kernel.org/patch/11236893/)

- tag/branch info being wrapped to the next line with a backslash, e.g.:

----
are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux.git/ \
  tags/v5.4-next-soc

----
(no example, but I've seen this before)

The proposed change deals with these edge-cases.

Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
---
 patchwork/parser.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

base-commit: 239fbd2ca1bf140bc61fdee922944624b23c812c
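
To see what the proposed pattern does with the two wrapped forms described above, here is a minimal standalone sketch. The regex and the whitespace collapse are copied from the patch (with the substitution pattern written as a raw string); the mail header, commit hashes, and footer text are made-up placeholders, and the module-level names (GIT_RE, URL, HEADER, FOOTER) exist only for this illustration, not in patchwork itself.

```python
import re

# Pattern copied from the proposed patch; flags are unchanged.
GIT_RE = re.compile(
    r'^The following changes since commit.*'
    r'^are available in the git repository at:\n'
    r'^\s*([\w+-]+(?:://|@)[\w/.@:~-]+[\s\\]*[\w/._-]*)\s*$',
    re.DOTALL | re.MULTILINE | re.IGNORECASE)


def parse_pull_request(content):
    """Return the pull URL (plus any tag/branch info), or None."""
    match = GIT_RE.search(content)
    if match:
        # Collapse the line wrap (newline plus indentation) into one space.
        return re.sub(r'\s+', ' ', match.group(1)).strip()
    return None


URL = 'https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux.git/'

# Hypothetical mail text: the hashes and subject line are invented.
HEADER = ('The following changes since commit 0123abc:\n\n'
          '  Example subject\n\n'
          'are available in the Git repository at:\n\n')
FOOTER = '\n\nfor you to fetch changes up to 4567def:\n'

# Case 1: tag wraps to the next line.
wrapped = HEADER + '  ' + URL + '\n  tags/v5.4-next-soc' + FOOTER
# Case 2: tag wraps to the next line after a trailing backslash.
backslash = HEADER + '  ' + URL + ' \\\n  tags/v5.4-next-soc' + FOOTER

print(parse_pull_request(wrapped))
print(parse_pull_request(backslash))
```

With these sample inputs the first call returns the URL and tag joined by a single space. In the backslash case the substitution only collapses whitespace, so the backslash itself appears to remain in the returned string; that may be worth covering in the requested test cases.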