
[ovs-dev] CONTRIBUTING: Strengthen testing guidelines

Message ID 1453734923-36714-1-git-send-email-mark.b.kavanagh@intel.com
State Deferred

Commit Message

Mark Kavanagh Jan. 25, 2016, 3:15 p.m. UTC
Current testing guidelines do not take account of two distinct
OVS builds:
   a) 'standard' OVS
   b) OVS with DPDK integration

It is critical that all patches are tested against both builds; if not,
there is a strong possibility that code which adversely affects one
of the builds may be upstreamed, leading to issues such as a broken build,
e.g. http://openvswitch.org/pipermail/dev/2016-January/064328.html

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
---
 CONTRIBUTING.md | 3 +++
 1 file changed, 3 insertions(+)
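
For reference, the two runs the new bullet has in mind look roughly like
this; the DPDK build directory below is only an example, not a required
layout:

    # 'Standard' OVS build and tests:
    ./boot.sh
    ./configure
    make
    make check        # plus 'make distcheck' if files were added/deleted

    # DPDK-enabled build and the same tests; point --with-dpdk at
    # wherever DPDK was built on your system:
    make distclean
    ./configure --with-dpdk=$HOME/dpdk/x86_64-native-linuxapp-gcc
    make
    make check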

Comments

Ben Pfaff Feb. 6, 2016, 12:35 a.m. UTC | #1
On Mon, Jan 25, 2016 at 03:15:23PM +0000, Mark Kavanagh wrote:
> Current testing guidelines do not take account of two distinct
> OVS builds:
>    a) 'standard' OVS
>    b) OVS with DPDK integration
> 
> It is critical that all patches are tested against both builds; if not,
> there is a strong possibility that code which adversely affects one
> of the builds may be upstreamed, leading to issues such as a broken build,
> e.g. http://openvswitch.org/pipermail/dev/2016-January/064328.html
> 
> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
> ---
>  CONTRIBUTING.md | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
> index 086b6e8..fe9400a 100644
> --- a/CONTRIBUTING.md
> +++ b/CONTRIBUTING.md
> @@ -35,6 +35,9 @@ Testing is also important:
>    - A patch that adds or deletes files should also be tested with
>      `make distcheck` before submission.
> 
> +  - The above tests should be performed with OVS compiled as 'normal',
> +    and also compiled with DPDK support.

It's something of a pain to do that when DPDK is so hardware dependent.

I think the example that you show is actually a pretty good example of
how quickly we catch problems.
Mark Kavanagh Feb. 8, 2016, 5:34 p.m. UTC | #2
>
>On Mon, Jan 25, 2016 at 03:15:23PM +0000, Mark Kavanagh wrote:
>> Current testing guidelines do not take account of two distinct
>> OVS builds:
>>    a) 'standard' OVS
>>    b) OVS with DPDK integration
>>
>> It is critical that all patches are tested against both builds; if not,
>> there is a strong possibility that code which adversely affects one
>> of the builds may be upstreamed, leading to issues such as a broken build,
>> e.g. http://openvswitch.org/pipermail/dev/2016-January/064328.html
>>
>> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
>> ---
>>  CONTRIBUTING.md | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
>> index 086b6e8..fe9400a 100644
>> --- a/CONTRIBUTING.md
>> +++ b/CONTRIBUTING.md
>> @@ -35,6 +35,9 @@ Testing is also important:
>>    - A patch that adds or deletes files should also be tested with
>>      `make distcheck` before submission.
>>
>> +  - The above tests should be performed with OVS compiled as 'normal',
>> +    and also compiled with DPDK support.
>
>It's something of a pain to do that when DPDK is so hardware dependent.
>

Fair point. 

>I think the example that you show is actually a pretty good example of
>how quickly we catch problems.

Sure. At the same time though, this is a reactive approach, rather than a proactive one, which inevitably results in more work for everyone (think fixing broken builds, failed CI builds, root-causing issues, etc., etc.).

If the validation can't be performed by patch authors, would you perhaps consider setting up an automated build environment on Nicira's end? This could be as simple as a git hook that would automatically build and test patches that have been approved by maintainers when they are pushed to the upstream repo. If a test or build fails, the patch is rejected. It may be a little bit of work upfront, but could reduce the effort involved in fixing 'erroneous' patches, ultimately saving time for everyone.
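
Purely as a sketch of the kind of hook I mean (the paths, build steps,
and reject-on-failure policy below are illustrative only, not a concrete
proposal):

    #!/bin/sh
    # pre-receive hook: build and test whatever is being pushed, and
    # reject the push if either step fails.
    while read old_sha new_sha ref_name; do
        tmp=$(mktemp -d) || exit 1
        git archive "$new_sha" | tar -x -C "$tmp" || { rm -rf "$tmp"; exit 1; }
        if ! (cd "$tmp" && ./boot.sh && ./configure && make && make check); then
            echo "Rejecting $ref_name: build or testsuite failed at $new_sha" >&2
            rm -rf "$tmp"
            exit 1    # non-zero exit from pre-receive rejects the push
        fi
        # A second pass configured with --with-dpdk could be added here.
        rm -rf "$tmp"
    done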

Thanks in advance,
Mark
Justin Pettit Feb. 8, 2016, 6:03 p.m. UTC | #3
> On Feb 8, 2016, at 9:34 AM, Kavanagh, Mark B <mark.b.kavanagh@intel.com> wrote:
> 
>> I think the example that you show is actually a pretty good example of
>> how quickly we catch problems.
> 
> Sure. At the same time though, this is a reactive approach, rather than a proactive one, which inevitably results in more work for everyone (think fixing broken builds, failed CI builds, root-causing issues, etc., etc.).
> 
> If the validation can't be performed by patch authors, would you perhaps consider setting up an automated build environment on Nicira's end? This could be as simple as a git hook that would automatically build and test patches that have been approved by maintainers when they are pushed to the upstream repo. If a test or build fails, the patch is rejected. It may be a little bit of work upfront, but could reduce the effort involved in fixing 'erroneous' patches, ultimately saving time for everyone.

We don't have the infrastructure to host that, but I don't think it would be difficult to get a third party to handle it.  If it can't be made to work someplace like Travis, maybe Intel could host it.  :-)

I would suggest that an email be generated and sent to the author and the mailing list rather than revert the change, which could be difficult if further patches were applied after the commit but before the build-check completed.  The Linux kernel does something similar.  Shame is often a good motivator.

--Justin
Russell Bryant Feb. 8, 2016, 6:54 p.m. UTC | #4
On 02/08/2016 01:03 PM, Justin Pettit wrote:
> 
>> On Feb 8, 2016, at 9:34 AM, Kavanagh, Mark B <mark.b.kavanagh@intel.com> wrote:
>>
>>> I think the example that you show is actually a pretty good example of
>>> how quickly we catch problems.
>>
>> Sure. At the same time though, this is a reactive approach, rather than a proactive one, which inevitably results in more work for everyone (think fixing broken builds, failed CI builds, root-causing issues, etc., etc.).
>>
>> If the validation can't be performed by patch authors, would you perhaps consider setting up an automated build environment on Nicira's end? This could be as simple as a git hook that would automatically build and test patches that have been approved by maintainers when they are pushed to the upstream repo. If a test or build fails, the patch is rejected. It may be a little bit of work upfront, but could reduce the effort involved in fixing 'erroneous' patches, ultimately saving time for everyone.
> 
> We don't have the infrastructure to host that, but I don't think it
> would be difficult to get a third party to handle it.  If it can't be
> made to work someplace like Travis, maybe Intel could host it.  :-)

travis-ci is already doing a dpdk build at least, right?  So, build
errors are already quickly caught by CI (visibility of those failures is
another question).

So the issue is running the test suite?  I'd love to see a way to run
the test suite with OVS+DPDK that can be run in a virtual test
environment.  That'd remove the barriers from devs running it and also
let us run it in travis-ci.
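
Roughly the kind of job I have in mind -- everything below should run in
a plain VM with no special NICs, and the DPDK version, target, and paths
are just examples:

    # Build DPDK from source inside the job so no host support is needed.
    DPDK_VER=2.2.0
    DPDK_TARGET=x86_64-native-linuxapp-gcc
    tar xf dpdk-${DPDK_VER}.tar.gz            # tarball fetched beforehand
    (cd dpdk-${DPDK_VER} && make install T=${DPDK_TARGET})

    # Build OVS against it and run the unit testsuite; the unit tests use
    # the dummy datapath, so no DPDK ports are needed.
    ./boot.sh
    ./configure --with-dpdk=$PWD/dpdk-${DPDK_VER}/${DPDK_TARGET}
    make
    make check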

> I would suggest that an email be generated and sent to the author and
> the mailing list rather than revert the change, which could be
> difficult if further patches were applied after the commit but before
> the build-check completed.  

There are solutions to this problem, but it requires letting a CI system
do the merging of patches for you.

> The Linux kernel does something similar.
> Shame is often a good motivator.

I don't think shame is exactly the culture I'd like to promote.
However, having travis-ci email committers whenever CI fails on a patch
they pushed would be helpful.  It's easy to miss otherwise.

Maybe that's possible and I'm just missing the option in travis-ci ...
Justin Pettit Feb. 8, 2016, 11:18 p.m. UTC | #5
> On Feb 8, 2016, at 10:54 AM, Russell Bryant <russell@ovn.org> wrote:
> 
> On 02/08/2016 01:03 PM, Justin Pettit wrote:
> 
> There are solutions to this problem, but it requires letting a CI system
> do the merging of patches for you.
> 
>> The Linux kernel does something similar.
>> Shame is often a good motivator.
> 
> I don't think shame is exactly the culture I'd like to promote.
> However, having travis-ci email committers whenever CI fails on a patch
> they pushed would be helpful.  It's easy to miss otherwise.
> 
> Maybe that's possible and I'm just missing the option in travis-ci ...

The "shame" comment was meant tongue-in-cheek, but I didn't want to overwhelm the message with smiley faces.  Sorry that didn't come through.

I just don't think this has been enough of an issue at this point that we need to introduce new commit infrastructure and procedures.  I think notifying people shortly after a commit from something like Travis that there is an issue will be sufficient.  If we start seeing regular problems that are not addressed promptly, then we should look more seriously at how to address it.

My guess is that switching to something like a gating infrastructure will introduce its own set of problems.  OVS works on a number of different platforms and gating on all those different builds and test suite runs will likely mean that commits are delayed by many minutes and possibly hours.  In my experience, those systems are often fragile, so it could result in indefinite delays for commits.

Do you think this has been that serious of a problem in practice?

--Justin
Russell Bryant Feb. 9, 2016, 1:38 a.m. UTC | #6
On 02/08/2016 06:18 PM, Justin Pettit wrote:
> 
>> On Feb 8, 2016, at 10:54 AM, Russell Bryant <russell@ovn.org> wrote:
>>
>> On 02/08/2016 01:03 PM, Justin Pettit wrote:
>>
>> There are solutions to this problem, but it requires letting a CI system
>> do the merging of patches for you.
>>
>>> The Linux kernel does something similar.
>>> Shame is often a good motivator.
>>
>> I don't think shame is exactly the culture I'd like to promote.
>> However, having travis-ci email committers whenever CI fails on a patch
>> they pushed would be helpful.  It's easy to miss otherwise.
>>
>> Maybe that's possible and I'm just missing the option in travis-ci ...
> 
> The "shame" comment was meant tongue-in-cheek, but I didn't want to
> overwhelm the message with smiley faces.  Sorry that didn't come
> through.

Ah, well I'm sorry too.  You had indeed already met your smiley limit of
1, what were you to do?!

Maybe we should be using github pull requests exclusively, for the sole
reason of having access to emoji.

> I just don't think this has been enough of an issue at this point
> that we need to introduce new commit infrastructure and procedures.
> I think notifying people shortly after a commit from something like
> Travis that there is an issue will be sufficient.  If we start seeing
> regular problems that are not addressed promptly, then we should look
> more seriously at how to address it.
> 
> My guess is that switching to something like a gating infrastructure
> will introduce its own set of problems.  OVS works on a number of
> different platforms and gating on all those different builds and test
> suite runs will likely mean that commits are delayed by many minutes
> and possibly hours.  In my experience, those systems are often
> fragile, so it could result in indefinite delays for commits.
> 
> Do you think this has been that serious of a problem in practice?

I do not.

Somewhat related - I'd like to be more proactive at keeping an eye on
travis-ci failures.  I just signed up for the build mailing list.  It
looks like travis-ci is configured to post there, but I don't see any
posts from travis-ci on that list since about Feb 2015 or so.  Could it
be a ML config issue?
Justin Pettit Feb. 9, 2016, 7:45 a.m. UTC | #7
> On Feb 8, 2016, at 5:38 PM, Russell Bryant <russell@ovn.org> wrote:
> 
> On 02/08/2016 06:18 PM, Justin Pettit wrote:
>> 
>> The "shame" comment was meant tongue-in-cheek, but I didn't want to
>> overwhelm the message with smiley faces.  Sorry that didn't come
>> through.
> 
> Ah, well I'm sorry too.  You had indeed already met your smiley limit of
> 1, what were you to do?!

You and my wife should get together and discuss my inability to open myself up emotionally through email.

> Somewhat related - I'd like to be more proactive at keeping an eye on
> travis-ci failures.  I just signed up for the build mailing list.  It
> looks like travis-ci is configured to post there, but I don't see any
> posts from travis-ci on that list since about Feb 2015 or so.  Could it
> be a ML config issue?

I'd need to check with Ben, but I think we needed to respond to some cookie in order to allow posting from travis-ci.

One concern we had about sending messages to ovs-dev was that the tests are somewhat unreliable due to timing issues, so they could end up raising so many false alarms that people ignored them (and, worse, annoyed everyone else on the list).  What we really need to do is improve the tests to be reliable so that when there's a reported error we know there's actually a bug.  Barring actually doing things right, I wonder if we could re-run failed tests to see if they pass on a second run before emailing out a warning...
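
Something along those lines might be as simple as re-running the whole
suite once (the log path and list address below are placeholders):

    # Alert only if 'make check' fails twice in a row; a second clean run
    # suggests a timing flake rather than a real regression.  Re-running
    # only the failed tests would be better still, but this keeps the
    # sketch short.
    if ! make check; then
        if ! make check; then
            mail -s "OVS testsuite failure" build@openvswitch.org \
                < tests/testsuite.log
        fi
    fi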

--Justin
Justin Pettit Feb. 10, 2016, 7:13 a.m. UTC | #8
> On Feb 8, 2016, at 5:38 PM, Russell Bryant <russell@ovn.org> wrote:
> 
> Somewhat related - I'd like to be more proactive at keeping an eye on
> travis-ci failures.  I just signed up for the build mailing list.  It
> looks like travis-ci is configured to post there, but I don't see any
> posts from travis-ci on that list since about Feb 2015 or so.  Could it
> be a ML config issue?

I think I fixed the issue(s).  There were a couple of travis-ci failure notifications today.  Did you receive them a little while ago?

--Justin
Russell Bryant Feb. 10, 2016, 1:21 p.m. UTC | #9
On 02/10/2016 02:13 AM, Justin Pettit wrote:
> 
>> On Feb 8, 2016, at 5:38 PM, Russell Bryant <russell@ovn.org> wrote:
>>
>> Somewhat related - I'd like to be more proactive at keeping an eye on
>> travis-ci failures.  I just signed up for the build mailing list.  It
>> looks like travis-ci is configured to post there, but I don't see any
>> posts from travis-ci on that list since about Feb 2015 or so.  Could it
>> be a ML config issue?
> 
> I think I fixed the issue(s).  There were a couple of travis-ci
> failure notifications today.  Did you receive them a little while
> ago?

I see several in the archives here:

http://www.openvswitch.org/pipermail/build/2016-February/date.html

but I only received one:

http://www.openvswitch.org/pipermail/build/2016-February/000367.html

I don't see the rest in my spam folder, either.  I've been missing some
messages from the -dev list as well, though.  I don't know what the
problem is.

Patch

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 086b6e8..fe9400a 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -35,6 +35,9 @@  Testing is also important:
   - A patch that adds or deletes files should also be tested with
     `make distcheck` before submission.

+  - The above tests should be performed with OVS compiled as 'normal',
+    and also compiled with DPDK support.
+
   - A patch that modifies Linux kernel code should be at least
     build-tested on various Linux kernel versions before
     submission.  I suggest versions 2.6.32 and whatever