diff mbox

[BUG] net_cls: Panic occured when net_cls subsystem use

Message ID 1243605269.3966.28.camel@dogo.mojatatu.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

jamal May 29, 2009, 1:54 p.m. UTC
Hi Minoru,

I hate to do this to you after i made suggestions on how to
make the changes.... (as an adult i hate it when people do it to
me;->)
But sometimes code helps. So what i meant is the attached patch.
I havent even compiled it yet.
If it works, please submit it and add yourself as the author
(and a signed-off from me). Then we can revisit the init()
issue in cls_group..
You should also cc tgraf in your cls_grp config questions.

cheers,
jamal

On Fri, 2009-05-29 at 09:46 -0400, jamal wrote:
> 
> This is incorrect. tp may already exist and you dont want to destroy
> for failure to change its parameters. You also dont want to reattach
> an existing tp because it succeeded in parameter change. 
> So the soln is to check if this is a new tp and then do what you did
> above...
> Did that make sense?
> 
> cheers,
> jamal

Comments

Jarek Poplawski May 29, 2009, 10:59 p.m. UTC | #1
On Fri, May 29, 2009 at 09:54:29AM -0400, jamal wrote:
> Hi Minoru,
> 
> I hate to do this to you after i made suggestions on how to
> make the changes.... (as an adult i hate it when people do it to
> me;->)
> But sometimes code helps. So what i meant is the attached patch.
> I havent even compiled it yet.
> If it works, please submit it and add yourself as the author
> (and a signed-off from me). Then we can revisit the init()
> issue in cls_group..
> You should also cc tgraf in your cls_grp config questions.
> 
> cheers,
> jamal
> 
> On Fri, 2009-05-29 at 09:46 -0400, jamal wrote:
> > 
> > This is incorrect. tp may already exist and you dont want to destroy
> > for failure to change its parameters. You also dont want to reattach
> > an existing tp because it succeeded in parameter change. 
> > So the soln is to check if this is a new tp and then do what you did
> > above...
> > Did that make sense?
> > 
> > cheers,
> > jamal

> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 0759f32..8760a48 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -266,11 +266,6 @@ replay:
>  			goto errout;
>  		}
>  
> -		spin_lock_bh(root_lock);
> -		tp->next = *back;
> -		*back = tp;
> -		spin_unlock_bh(root_lock);
> -
>  	} else if (tca[TCA_KIND] && nla_strcmp(tca[TCA_KIND], tp->ops->kind))
>  		goto errout;
>  
...
        } else {
                switch (n->nlmsg_type) {
                case RTM_NEWTFILTER:
                        err = -EEXIST;
                        if (n->nlmsg_flags & NLM_F_EXCL)
                                goto errout;
                        break;

Probably this case needs tcf_destroy() too.

Cheers,
Jarek P.

> @@ -314,8 +309,21 @@ replay:
>  	}
>  
>  	err = tp->ops->change(tp, cl, t->tcm_handle, tca, &fh);
> -	if (err == 0)
> +	if (err == 0) {
> +		if (n->nlmsg_type == RTM_NEWTFILTER &&
> +		    (n->nlmsg_flags&NLM_F_CREATE)) {
> +			spin_lock_bh(root_lock);
> +			tp->next = *back;
> +			*back = tp;
> +			spin_unlock_bh(root_lock);
> +		}
>  		tfilter_notify(skb, n, tp, fh, RTM_NEWTFILTER);
> +	} else {
> +		if (n->nlmsg_type == RTM_NEWTFILTER &&
> +		    (n->nlmsg_flags&NLM_F_CREATE)) {
> +			tcf_destroy(tp);
> +		}
> +	}
>  
>  errout:
>  	if (cl)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski May 30, 2009, 11:45 a.m. UTC | #2
On Sat, May 30, 2009 at 08:17:02PM +0900, Minoru Usui wrote:
> Hi Jamal and Jarek
> 
> Thank you for your review and advice.

But there is a bit more...

> 
> 2009/5/30 Jarek Poplawski <jarkao2@gmail.com>
> 
> > On Fri, May 29, 2009 at 09:54:29AM -0400, jamal wrote:
> > > Hi Minoru,
> > >
> > > I hate to do this to you after i made suggestions on how to
> > > make the changes.... (as an adult i hate it when people do it to
> > > me;->)
> > > But sometimes code helps. So what i meant is the attached patch.
> > > I havent even compiled it yet.
> > > If it works, please submit it and add yourself as the author
> > > (and a signed-off from me). Then we can revisit the init()
> > > issue in cls_group..
> > > You should also cc tgraf in your cls_grp config questions.
> > >
> > > cheers,
> > > jamal
> > >
> > > On Fri, 2009-05-29 at 09:46 -0400, jamal wrote:
> > > >
> > > > This is incorrect. tp may already exist and you dont want to destroy
> > > > for failure to change its parameters. You also dont want to reattach
> > > > an existing tp because it succeeded in parameter change.
> > > > So the soln is to check if this is a new tp and then do what you did
> > > > above...
> > > > Did that make sense?
> > > >
> > > > cheers,
> > > > jamal
> >
> > > diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> > > index 0759f32..8760a48 100644
> > > --- a/net/sched/cls_api.c
> > > +++ b/net/sched/cls_api.c
> > > @@ -266,11 +266,6 @@ replay:
> > >                       goto errout;
> > >               }
> > >
> > > -             spin_lock_bh(root_lock);
> > > -             tp->next = *back;
> > > -             *back = tp;
> > > -             spin_unlock_bh(root_lock);
> > > -
> > >       } else if (tca[TCA_KIND] && nla_strcmp(tca[TCA_KIND],
> > tp->ops->kind))
> > >               goto errout;
> > >
> > ...
> >        } else {
> >                switch (n->nlmsg_type) {
> >                case RTM_NEWTFILTER:
> >                        err = -EEXIST;
> >                        if (n->nlmsg_flags & NLM_F_EXCL)
> >                                goto errout;
> >                        break;
> >
> > Probably this case needs tcf_destroy() too.
> 
> 
> > Cheers,
> > Jarek P.
> 
> 
> I'll check and test it next week.
> 
> 
> >
> > > @@ -314,8 +309,21 @@ replay:
> > >       }
> > >
> > >       err = tp->ops->change(tp, cl, t->tcm_handle, tca, &fh);
> > > -     if (err == 0)
> > > +     if (err == 0) {
> > > +             if (n->nlmsg_type == RTM_NEWTFILTER &&
> > > +                 (n->nlmsg_flags&NLM_F_CREATE)) {

Since "tc filter replace" uses this type and flag too without creating
tp, this check is not enough. I guess we could simply use a variable
like tp_created etc. Anyway, changing this place looks tricky to me,
so maybe it would be safer to do a separate cls_cgroup fix just for
-stable, and this one patch for -next only?

Jarek P.

> > > +                     spin_lock_bh(root_lock);
> > > +                     tp->next = *back;
> > > +                     *back = tp;
> > > +                     spin_unlock_bh(root_lock);
> > > +             }
> > >               tfilter_notify(skb, n, tp, fh, RTM_NEWTFILTER);
> > > +     } else {
> > > +             if (n->nlmsg_type == RTM_NEWTFILTER &&
> > > +                 (n->nlmsg_flags&NLM_F_CREATE)) {
> > > +                     tcf_destroy(tp);
> > > +             }
> > > +     }
> > >
> > >  errout:
> > >       if (cl)
> >
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
jamal May 30, 2009, 11:50 a.m. UTC | #3
On Sat, 2009-05-30 at 00:59 +0200, Jarek Poplawski wrote:
  
> ...
>         } else {
>                 switch (n->nlmsg_type) {
>                 case RTM_NEWTFILTER:
>                         err = -EEXIST;
>                         if (n->nlmsg_flags & NLM_F_EXCL)
>                                 goto errout;
>                         break;
> 
> Probably this case needs tcf_destroy() too.

No - that if stmnt will fail if this is a new filter being
created.

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
jamal May 30, 2009, 11:56 a.m. UTC | #4
On Sat, 2009-05-30 at 13:45 +0200, Jarek Poplawski wrote:

> > > >       }
> > > >
> > > >       err = tp->ops->change(tp, cl, t->tcm_handle, tca, &fh);
> > > > -     if (err == 0)
> > > > +     if (err == 0) {
> > > > +             if (n->nlmsg_type == RTM_NEWTFILTER &&
> > > > +                 (n->nlmsg_flags&NLM_F_CREATE)) {
> 
> Since "tc filter replace" uses this type and flag too without creating
> tp, this check is not enough. I guess we could simply use a variable
> like tp_created etc. 

It will be superfluos. 
tp_created is the check
n->nlmsg_type == RTM_NEWTFILTER && n->nlmsg_flags&NLM_F_CREATE
replace will be
n->nlmsg_type == RTM_NEWTFILTER && n->nlmsg_flags & NLM_F_EXCL

> Anyway, changing this place looks tricky to me,
> so maybe it would be safer to do a separate cls_cgroup fix just for
> -stable, and this one patch for -next only?

I think they are two separate issues.
The fact that we dont destroy an allocated tp on failure is an issue
regardless of what cls_group does. In the case of Minoru's issue
it is because he is misconfiguring cls_group.

cheers,
jamal



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski May 30, 2009, 12:07 p.m. UTC | #5
On Sat, May 30, 2009 at 07:56:34AM -0400, jamal wrote:
> On Sat, 2009-05-30 at 13:45 +0200, Jarek Poplawski wrote:
> 
> > > > >       }
> > > > >
> > > > >       err = tp->ops->change(tp, cl, t->tcm_handle, tca, &fh);
> > > > > -     if (err == 0)
> > > > > +     if (err == 0) {
> > > > > +             if (n->nlmsg_type == RTM_NEWTFILTER &&
> > > > > +                 (n->nlmsg_flags&NLM_F_CREATE)) {
> > 
> > Since "tc filter replace" uses this type and flag too without creating
> > tp, this check is not enough. I guess we could simply use a variable
> > like tp_created etc. 
> 
> It will be superfluos. 
> tp_created is the check
> n->nlmsg_type == RTM_NEWTFILTER && n->nlmsg_flags&NLM_F_CREATE
> replace will be
> n->nlmsg_type == RTM_NEWTFILTER && n->nlmsg_flags & NLM_F_EXCL

Hmm... Probably I miss something, but I've just seen this prink during
tc filter replace with:

err = tp->ops->change();
if (n->nlmsg_type == RTM_NEWTFILTER && (n->nlmsg_flags&NLM_F_CREATE))
	printk(...);

> 
> > Anyway, changing this place looks tricky to me,
> > so maybe it would be safer to do a separate cls_cgroup fix just for
> > -stable, and this one patch for -next only?
> 
> I think they are two separate issues.
> The fact that we dont destroy an allocated tp on failure is an issue
> regardless of what cls_group does. In the case of Minoru's issue
> it is because he is misconfiguring cls_group.

Sure, but we don't want people to get oops in such a case, I guess.

Cheers,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
jamal May 30, 2009, 12:31 p.m. UTC | #6
On Sat, 2009-05-30 at 14:07 +0200, Jarek Poplawski wrote:
> On Sat, May 30, 2009 at 07:56:34AM -0400, jamal wrote:

> > tp_created is the check
> > n->nlmsg_type == RTM_NEWTFILTER && n->nlmsg_flags&NLM_F_CREATE
> > replace will be
> > n->nlmsg_type == RTM_NEWTFILTER && n->nlmsg_flags & NLM_F_EXCL
> 
> Hmm... Probably I miss something, but I've just seen this prink during
> tc filter replace with:
> 
> err = tp->ops->change();
> if (n->nlmsg_type == RTM_NEWTFILTER && (n->nlmsg_flags&NLM_F_CREATE))
> 	printk(...);

That sounds right. 
Remeber, you could have NLM_F_EXCL|NLM_F_CREATE to indicate "create this
thing if it doesnt exist; if it exists  it is an error"
If it doesnt exist we will enter that (tp == NULL) path
also fh will be 0 ==> So you will never enter the code
path you are refering to.
If it exists (i.e you found it) and you enter the code path you refer
to, then you surely dont want to destroy it if NLM_F_EXCL is set.

> > I think they are two separate issues.
> > The fact that we dont destroy an allocated tp on failure is an issue
> > regardless of what cls_group does. In the case of Minoru's issue
> > it is because he is misconfiguring cls_group.
> 
> Sure, but we don't want people to get oops in such a case, I guess.
> 

The ops is caused by the code fixed in the patch - did i miss something?

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski May 30, 2009, 12:45 p.m. UTC | #7
On Sat, May 30, 2009 at 08:31:23AM -0400, jamal wrote:
> On Sat, 2009-05-30 at 14:07 +0200, Jarek Poplawski wrote:
> > On Sat, May 30, 2009 at 07:56:34AM -0400, jamal wrote:
> 
> > > tp_created is the check
> > > n->nlmsg_type == RTM_NEWTFILTER && n->nlmsg_flags&NLM_F_CREATE
> > > replace will be
> > > n->nlmsg_type == RTM_NEWTFILTER && n->nlmsg_flags & NLM_F_EXCL
> > 
> > Hmm... Probably I miss something, but I've just seen this prink during
> > tc filter replace with:
> > 
> > err = tp->ops->change();
> > if (n->nlmsg_type == RTM_NEWTFILTER && (n->nlmsg_flags&NLM_F_CREATE))
> > 	printk(...);
> 
> That sounds right. 
> Remeber, you could have NLM_F_EXCL|NLM_F_CREATE to indicate "create this
> thing if it doesnt exist; if it exists  it is an error"
> If it doesnt exist we will enter that (tp == NULL) path
> also fh will be 0 ==> So you will never enter the code
> path you are refering to.
> If it exists (i.e you found it) and you enter the code path you refer
> to, then you surely dont want to destroy it if NLM_F_EXCL is set.

I mean we don't want to link it again or destroy after ->change() err
if we run replace (n->nlmsg_type == RTM_NEWTFILTER &&
(n->nlmsg_flags&NLM_F_CREATE)).

> 
> > > I think they are two separate issues.
> > > The fact that we dont destroy an allocated tp on failure is an issue
> > > regardless of what cls_group does. In the case of Minoru's issue
> > > it is because he is misconfiguring cls_group.
> > 
> > Sure, but we don't want people to get oops in such a case, I guess.
> > 
> 
> The ops is caused by the code fixed in the patch - did i miss something?

IMHO it could be fixed "old way" in cls_group code too.

Cheers,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
jamal May 30, 2009, 1:03 p.m. UTC | #8
On Sat, 2009-05-30 at 14:45 +0200, Jarek Poplawski wrote:
> On Sat, May 30, 2009 at 08:31:23AM -0400, jamal wrote:

> > Remeber, you could have NLM_F_EXCL|NLM_F_CREATE to indicate "create this
> > thing if it doesnt exist; if it exists  it is an error"
> > If it doesnt exist we will enter that (tp == NULL) path
> > also fh will be 0 ==> So you will never enter the code
> > path you are refering to.
> > If it exists (i.e you found it) and you enter the code path you refer
> > to, then you surely dont want to destroy it if NLM_F_EXCL is set.
> 
> I mean we don't want to link it again or destroy after ->change() err
> if we run replace (n->nlmsg_type == RTM_NEWTFILTER &&
> (n->nlmsg_flags&NLM_F_CREATE)).

excellent point: an additional flag is needed then
n->nlmsg_flags& (NLM_F_CREATE|NLM_F_EXCL).
Minoru, please add this change in the patch before testing...

> > The ops is caused by the code fixed in the patch - did i miss something?
> 
> IMHO it could be fixed "old way" in cls_group code too.

Is the code oopsing in cls_group? It didnt seem to be so to me, and
yes cls_group is quarky in its practise (but thats a separate issue)
and even if it did oops in cls_group - this change above is a memory
leak and needs to be fixed in -stable. 

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski May 30, 2009, 1:20 p.m. UTC | #9
On Sat, May 30, 2009 at 09:03:48AM -0400, jamal wrote:
> On Sat, 2009-05-30 at 14:45 +0200, Jarek Poplawski wrote:
> > On Sat, May 30, 2009 at 08:31:23AM -0400, jamal wrote:
...
> > > The ops is caused by the code fixed in the patch - did i miss something?
> > 
> > IMHO it could be fixed "old way" in cls_group code too.
> 
> Is the code oopsing in cls_group? It didnt seem to be so to me, and
> yes cls_group is quarky in its practise (but thats a separate issue)
> and even if it did oops in cls_group - this change above is a memory
> leak and needs to be fixed in -stable. 

Yes it oopses in cls_cgroup_classify(). Here is the first report:
http://permalink.gmane.org/gmane.linux.network/128593

We add/link unconfigured tp, but it could be destroyed later, so I
wouldn't call this a memory leak.

Cheers,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
jamal May 30, 2009, 1:31 p.m. UTC | #10
On Sat, 2009-05-30 at 15:20 +0200, Jarek Poplawski wrote:

> Yes it oopses in cls_cgroup_classify(). Here is the first report:
> http://permalink.gmane.org/gmane.linux.network/128593
> 

Oopsing in classify is after the fact though. It should not have been
linked to begin with (because wrong config was passed from user space).
As far as cls_group is concerned that is an illegitimate config - thats
why it failed it.

What were you suggesting to change in cls_group to avoid this oops?

> We add/link unconfigured tp, but it could be destroyed later, so I
> wouldn't call this a memory leak.

Indeed - call it "We add/link unconfigured tp".

Anyways, a nice sun just came out over here and i am off to run 
some chores. If you respond you will hear from me in a few hours.

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski May 30, 2009, 2 p.m. UTC | #11
On Sat, May 30, 2009 at 09:31:37AM -0400, jamal wrote:
> On Sat, 2009-05-30 at 15:20 +0200, Jarek Poplawski wrote:
> 
> > Yes it oopses in cls_cgroup_classify(). Here is the first report:
> > http://permalink.gmane.org/gmane.linux.network/128593
> > 
> 
> Oopsing in classify is after the fact though. It should not have been
> linked to begin with (because wrong config was passed from user space).
> As far as cls_group is concerned that is an illegitimate config - thats
> why it failed it.
> 
> What were you suggesting to change in cls_group to avoid this oops?

I think checking the head (tp->root) for NULL like in cls_fw or
cls_route should work.

> 
> > We add/link unconfigured tp, but it could be destroyed later, so I
> > wouldn't call this a memory leak.
> 
> Indeed - call it "We add/link unconfigured tp".

Anyway, it's worked for other classifiers like this for some time...

> Anyways, a nice sun just came out over here and i am off to run 
> some chores. If you respond you will hear from me in a few hours.

Sure, we could probably stop this till Monday.

Cheers,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Minoru Usui May 30, 2009, 10:22 p.m. UTC | #12
Hi, jamal

2009/5/30 jamal <hadi@cyberus.ca>
>
> On Sat, 2009-05-30 at 14:45 +0200, Jarek Poplawski wrote:
> > On Sat, May 30, 2009 at 08:31:23AM -0400, jamal wrote:
>
> > > Remeber, you could have NLM_F_EXCL|NLM_F_CREATE to indicate "create this
> > > thing if it doesnt exist; if it exists  it is an error"
> > > If it doesnt exist we will enter that (tp == NULL) path
> > > also fh will be 0 ==> So you will never enter the code
> > > path you are refering to.
> > > If it exists (i.e you found it) and you enter the code path you refer
> > > to, then you surely dont want to destroy it if NLM_F_EXCL is set.
> >
> > I mean we don't want to link it again or destroy after ->change() err
> > if we run replace (n->nlmsg_type == RTM_NEWTFILTER &&
> > (n->nlmsg_flags&NLM_F_CREATE)).
>
> excellent point: an additional flag is needed then
> n->nlmsg_flags& (NLM_F_CREATE|NLM_F_EXCL).
> Minoru, please add this change in the patch before testing...

OK.
I'll do it after understanding above code.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Minoru Usui May 30, 2009, 10:38 p.m. UTC | #13
Hi, jarek

2009/5/30 Jarek Poplawski <jarkao2@gmail.com>:
> On Sat, May 30, 2009 at 09:31:37AM -0400, jamal wrote:
>> On Sat, 2009-05-30 at 15:20 +0200, Jarek Poplawski wrote:
>>
>> > Yes it oopses in cls_cgroup_classify(). Here is the first report:
>> > http://permalink.gmane.org/gmane.linux.network/128593
>> >
>>
>> Oopsing in classify is after the fact though. It should not have been
>> linked to begin with (because wrong config was passed from user space).
>> As far as cls_group is concerned that is an illegitimate config - thats
>> why it failed it.
>>
>> What were you suggesting to change in cls_group to avoid this oops?
>
> I think checking the head (tp->root) for NULL like in cls_fw or
> cls_route should work.

I agree.
I think cls_cgroup should check head(tp->root) whether NULL or not
like other classifiers, too.

But I think it's problem adding/linking unconfigured tp, too.

>>
>> > We add/link unconfigured tp, but it could be destroyed later, so I
>> > wouldn't call this a memory leak.
>>
>> Indeed - call it "We add/link unconfigured tp".
>
> Anyway, it's worked for other classifiers like this for some time...

I'm sorry, I don't investigate well, so I don't understand where
unconfigured tp is destroyed, yet.
If you don't mind, could you tell me where it's destoryed?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
jamal May 30, 2009, 11:08 p.m. UTC | #14
On Sat, 2009-05-30 at 16:00 +0200, Jarek Poplawski wrote:
> On Sat, May 30, 2009 at 09:31:37AM -0400, jamal wrote:
> > What were you suggesting to change in cls_group to avoid this oops?
> 
> I think checking the head (tp->root) for NULL like in cls_fw or
> cls_route should work.

IMHO that is a workaround for the tp linking bug
[IOW, there is no need to check for tp->root == NULL (in the fast path)
if such an illegal tp was never linked to begin with (on the slow
path)].

So those classifiers you point to need to be fixed afterwards (but 
not -stable material).
My thinking of fixing them to do proper init/get as well later.

> Anyway, it's worked for other classifiers like this for some time...

Would you agree that it is/was a bandaid?
Or maybe you have some other fear that this may break something else and
prefer the workaround instead?

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
jamal May 30, 2009, 11:13 p.m. UTC | #15
On Sun, 2009-05-31 at 07:22 +0900, Minoru Usui wrote:

> I'll do it after understanding above code.

Current patch has two places with:

+               if (n->nlmsg_type == RTM_NEWTFILTER &&
+                   (n->nlmsg_flags&NLM_F_CREATE)) {

The change (in the two spots) is:
+               if (n->nlmsg_type == RTM_NEWTFILTER &&
+                   (n->nlmsg_flags&(NLM_F_CREATE|NLM_F_EXCL))) {

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
jamal May 30, 2009, 11:34 p.m. UTC | #16
On Sun, 2009-05-31 at 07:38 +0900, Minoru Usui wrote:

> I agree.
> I think cls_cgroup should check head(tp->root) whether NULL or not
> like other classifiers, too.
> 

I dont think this is necessary if the adding/linking unconfigured tp
doesnt happen on failed config.

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Minoru Usui May 31, 2009, 1:45 a.m. UTC | #17
2009/5/31 jamal <hadi@cyberus.ca>:
> On Sun, 2009-05-31 at 07:38 +0900, Minoru Usui wrote:
>
>> I agree.
>> I think cls_cgroup should check head(tp->root) whether NULL or not
>> like other classifiers, too.
>>
>
> I dont think this is necessary if the adding/linking unconfigured tp
> doesnt happen on failed config.

Ah, you are right.
Sorry, I don't explain enough.

I mean it needs just for safety.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski May 31, 2009, 7:55 a.m. UTC | #18
On Sat, May 30, 2009 at 07:08:53PM -0400, jamal wrote:
> On Sat, 2009-05-30 at 16:00 +0200, Jarek Poplawski wrote:
> > On Sat, May 30, 2009 at 09:31:37AM -0400, jamal wrote:
> > > What were you suggesting to change in cls_group to avoid this oops?
> > 
> > I think checking the head (tp->root) for NULL like in cls_fw or
> > cls_route should work.
> 
> IMHO that is a workaround for the tp linking bug
> [IOW, there is no need to check for tp->root == NULL (in the fast path)
> if such an illegal tp was never linked to begin with (on the slow
> path)].
> 
> So those classifiers you point to need to be fixed afterwards (but 
> not -stable material).
> My thinking of fixing them to do proper init/get as well later.

Sure, after fixing it properly we should get rid of unneeded checks.

> > Anyway, it's worked for other classifiers like this for some time...
> 
> Would you agree that it is/was a bandaid?
> Or maybe you have some other fear that this may break something else and
> prefer the workaround instead?

If somebody decided to do it this way instead of the "proper" fix then
it looks to me more like a bandaid "by design". And, yes, I have some
fear we could break some strange configs, which could depend on this
wrong but working design.

Cheers,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski May 31, 2009, 8:07 a.m. UTC | #19
On Sat, May 30, 2009 at 07:13:14PM -0400, jamal wrote:
> On Sun, 2009-05-31 at 07:22 +0900, Minoru Usui wrote:
> 
> > I'll do it after understanding above code.
> 
> Current patch has two places with:
> 
> +               if (n->nlmsg_type == RTM_NEWTFILTER &&
> +                   (n->nlmsg_flags&NLM_F_CREATE)) {
> 
> The change (in the two spots) is:
> +               if (n->nlmsg_type == RTM_NEWTFILTER &&
> +                   (n->nlmsg_flags&(NLM_F_CREATE|NLM_F_EXCL))) {
> 

Sorry, but I don't think this change is enough; tc filter replace
with only this (n->nlmsg_type == RTM_NEWTFILTER &&
(n->nlmsg_flags&(NLM_F_CREATE))) can get here with an "old" tp
and will relink it or destroy depending on the ->change() return.

Cheers,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski May 31, 2009, 8:12 a.m. UTC | #20
On Sat, May 30, 2009 at 07:50:18AM -0400, jamal wrote:
> On Sat, 2009-05-30 at 00:59 +0200, Jarek Poplawski wrote:
>   
> > ...
> >         } else {
> >                 switch (n->nlmsg_type) {
> >                 case RTM_NEWTFILTER:
> >                         err = -EEXIST;
> >                         if (n->nlmsg_flags & NLM_F_EXCL)
> >                                 goto errout;
> >                         break;
> > 
> > Probably this case needs tcf_destroy() too.
> 
> No - that if stmnt will fail if this is a new filter being
> created.

If somebody runs tc add filter with a new priority but existing handle
a newly created (and not linked now) tc would be handled with goto
errout and would leak, I guess.

Cheers,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski May 31, 2009, 8:15 a.m. UTC | #21
On Sun, May 31, 2009 at 07:38:51AM +0900, Minoru Usui wrote:
> Hi, jarek
Hi, Minoru
...
> I'm sorry, I don't investigate well, so I don't understand where
> unconfigured tp is destroyed, yet.
> If you don't mind, could you tell me where it's destoryed?

Just like configured ones, by deleting it or the whole tree.

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
jamal May 31, 2009, 1:17 p.m. UTC | #22
On Sun, 2009-05-31 at 09:55 +0200, Jarek Poplawski wrote:

> Sure, after fixing it properly we should get rid of unneeded checks.

> > > Anyway, it's worked for other classifiers like this for some time...
> > 
> > Would you agree that it is/was a bandaid?
> > Or maybe you have some other fear that this may break something else and
> > prefer the workaround instead?
> 
> If somebody decided to do it this way instead of the "proper" fix then
> it looks to me more like a bandaid "by design". 

I think your and my definition of "proper" are at odds here ;->
My logic says there's a causality effect and you always fix the cause
in such a situation. 
Here's the anology of our conversation (some captured above) as i see it
centred around a bug of leaky pipe which just messed up the carpet
overnight in some room:
 
handyman Jamal: fix the pipe so it doesnt leak.
handyman Jarek: put a little bucket below the dripping spot
handyman Jamal: fixing this pipe is cheap - it is on warranty too
handyman Jarek: I fixed two other rooms by putting buckets
handyman Jamal: Proper: fix the pipe - you dont need the buckets forever
handyman Jarek: Proper fix is to put the bucket below any drip

Of course i am exagerating to make a point on our logical approaches:
leaking pipes dont quiet match software bugs i.e.
a dripping pipe will have the short term sense of emergency of needing
buckets but in this case the cost of the damage and time is the same
if you put a bucket or fixed the pipe.

> And, yes, I have some
> fear we could break some strange configs, which could depend on this
> wrong but working design.

To continue our discussion
handyman Jamal: What is your fear about fixing the pipe?
handyman Jarek: Someone may have plants which depend on the drips;
so if you fix it his plants wont get water anymore.

I hope the above doesnt offend you - it is meant in good spirit.

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
jamal May 31, 2009, 1:24 p.m. UTC | #23
On Sun, 2009-05-31 at 10:12 +0200, Jarek Poplawski wrote:

> If somebody runs tc add filter with a new priority but existing handle
> a newly created (and not linked now) tc would be handled with goto
> errout and would leak, I guess.

This would imply the classifier is buggy. I will stare at the different
classifier - and if any exhibits such traits it needs to be fixed

> The change (in the two spots) is:
> > +               if (n->nlmsg_type == RTM_NEWTFILTER &&
> > +                   (n->nlmsg_flags&(NLM_F_CREATE|NLM_F_EXCL))) {
> > 
> 
> Sorry, but I don't think this change is enough; tc filter replace
> with only this (n->nlmsg_type == RTM_NEWTFILTER &&
> (n->nlmsg_flags&(NLM_F_CREATE))) can get here with an "old" tp
> and will relink it or destroy depending on the ->change() return.
> 

Excellent point - there could be buggy user space apps that will do
that. Minoru change the check to:

+               if (n->nlmsg_type == RTM_NEWTFILTER &&
+                   (n->nlmsg_flags&NLM_F_CREATE &&
+                    n->nlmsg_flags&NLM_F_EXCL)) {

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
jamal May 31, 2009, 1:43 p.m. UTC | #24
On Sun, 2009-05-31 at 09:24 -0400, jamal wrote:

> This would imply the classifier is buggy. I will stare at the different
> classifier - and if any exhibits such traits it needs to be fixed

I couldnt find any case where this is possible in the current code. If
you have a specific example, we need to fix it.

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski May 31, 2009, 7:55 p.m. UTC | #25
On Sun, May 31, 2009 at 09:24:53AM -0400, jamal wrote:
> On Sun, 2009-05-31 at 10:12 +0200, Jarek Poplawski wrote:
> 
> > If somebody runs tc add filter with a new priority but existing handle
> > a newly created (and not linked now) tc would be handled with goto
> > errout and would leak, I guess.
> 
> This would imply the classifier is buggy. I will stare at the different
> classifier - and if any exhibits such traits it needs to be fixed
> 
> > The change (in the two spots) is:
> > > +               if (n->nlmsg_type == RTM_NEWTFILTER &&
> > > +                   (n->nlmsg_flags&(NLM_F_CREATE|NLM_F_EXCL))) {
> > > 
> > 
> > Sorry, but I don't think this change is enough; tc filter replace
> > with only this (n->nlmsg_type == RTM_NEWTFILTER &&
> > (n->nlmsg_flags&(NLM_F_CREATE))) can get here with an "old" tp
> > and will relink it or destroy depending on the ->change() return.
> > 
> 
> Excellent point - there could be buggy user space apps that will do
> that. Minoru change the check to:
> 
> +               if (n->nlmsg_type == RTM_NEWTFILTER &&
> +                   (n->nlmsg_flags&NLM_F_CREATE &&
> +                    n->nlmsg_flags&NLM_F_EXCL)) {


But then, there could be "tc filter replace" with only this
(n->nlmsg_type == RTM_NEWTFILTER && (n->nlmsg_flags&(NLM_F_CREATE)))
which can't get here with a newly created tp, I guess.

Cheers,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski May 31, 2009, 8:33 p.m. UTC | #26
On Sun, May 31, 2009 at 09:17:10AM -0400, jamal wrote:
> On Sun, 2009-05-31 at 09:55 +0200, Jarek Poplawski wrote:
> 
> > Sure, after fixing it properly we should get rid of unneeded checks.
> 
> > > > Anyway, it's worked for other classifiers like this for some time...
> > > 
> > > Would you agree that it is/was a bandaid?
> > > Or maybe you have some other fear that this may break something else and
> > > prefer the workaround instead?
> > 
> > If somebody decided to do it this way instead of the "proper" fix then
> > it looks to me more like a bandaid "by design". 
> 
> I think your and my definition of "proper" are at odds here ;->
> My logic says there's a causality effect and you always fix the cause
> in such a situation. 
> Here's the anology of our conversation (some captured above) as i see it
> centred around a bug of leaky pipe which just messed up the carpet
> overnight in some room:
>  
> handyman Jamal: fix the pipe so it doesnt leak.
> handyman Jarek: put a little bucket below the dripping spot
> handyman Jamal: fixing this pipe is cheap - it is on warranty too
> handyman Jarek: I fixed two other rooms by putting buckets
> handyman Jamal: Proper: fix the pipe - you dont need the buckets forever
> handyman Jarek: Proper fix is to put the bucket below any drip
> 
> Of course i am exagerating to make a point on our logical approaches:
> leaking pipes dont quiet match software bugs i.e.
> a dripping pipe will have the short term sense of emergency of needing
> buckets but in this case the cost of the damage and time is the same
> if you put a bucket or fixed the pipe.

Yes, you're exagerating; my point is:
handyman Jarek: Proper fix is to fix the pipe, but since this hole
is so old and there were other handymans who couldn't miss this let's
do it with caution (2 way).

> 
> > And, yes, I have some
> > fear we could break some strange configs, which could depend on this
> > wrong but working design.
> 
> To continue our discussion
> handyman Jamal: What is your fear about fixing the pipe?
> handyman Jarek: Someone may have plants which depend on the drips;
> so if you fix it his plants wont get water anymore.
> 
> I hope the above doesnt offend you - it is meant in good spirit.

No, dead plants don't offend me. But, if there is any kitten harmed -
better beware! ;-)

Cheers,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski May 31, 2009, 9:12 p.m. UTC | #27
On Sun, May 31, 2009 at 09:43:47AM -0400, jamal wrote:
> On Sun, 2009-05-31 at 09:24 -0400, jamal wrote:
> 
> > This would imply the classifier is buggy. I will stare at the different
> > classifier - and if any exhibits such traits it needs to be fixed
> 
> I couldnt find any case where this is possible in the current code. If
> you have a specific example, we need to fix it.

I'm almost sure these commands (right or wrong) can trigger such a leak:

$ sudo tc qdisc add dev lo root handle 1: htb
$ sudo tc filter add dev lo proto ip pref 1 handle 800::1 u32 match u32 0 0 classid 1:1
$ sudo tc filter add dev lo proto ip pref 2 handle 800::1 u32 match u32 0 0 classid 1:1
RTNETLINK answers: File exists

Cheers,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
jamal May 31, 2009, 11:40 p.m. UTC | #28
On Sun, 2009-05-31 at 21:55 +0200, Jarek Poplawski wrote:

> But then, there could be "tc filter replace" with only this
> (n->nlmsg_type == RTM_NEWTFILTER && (n->nlmsg_flags&(NLM_F_CREATE)))
> which can't get here with a newly created tp, I guess.

Right - because in that case tp would already exist and would
be found in the lookup.

> I'm almost sure these commands (right or wrong) can trigger such a 
> leak:
> 
> $ sudo tc qdisc add dev lo root handle 1: htb
> $ sudo tc filter add dev lo proto ip pref 1 handle 800::1 u32 match
u32 0 0 classid 1:1
> $ sudo tc filter add dev lo proto ip pref 2 handle 800::1 u32 match
u32 0 0 classid 1:1
> RTNETLINK answers: File exists

You are good - A tip of my hat and a wag of my finger at you;->
Ok, now i looked a lot closer at all 8 classifiers
and u32 is the only one that can fault. It just needs to be fixed.
I will wait until Minoru's patch and then i will submit a fix for it.
Of course the commands from user space are a little rude ;->

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski June 1, 2009, 6:06 a.m. UTC | #29
On Sun, May 31, 2009 at 07:40:16PM -0400, jamal wrote:
> On Sun, 2009-05-31 at 21:55 +0200, Jarek Poplawski wrote:
> 
> > But then, there could be "tc filter replace" with only this
> > (n->nlmsg_type == RTM_NEWTFILTER && (n->nlmsg_flags&(NLM_F_CREATE)))
> > which can't get here with a newly created tp, I guess.
> 
> Right - because in that case tp would already exist and would
> be found in the lookup.

But how about that (of course extremely rude) case "tc filter replace"
is run with a new prio?

Thanks,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 0759f32..8760a48 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -266,11 +266,6 @@  replay:
 			goto errout;
 		}
 
-		spin_lock_bh(root_lock);
-		tp->next = *back;
-		*back = tp;
-		spin_unlock_bh(root_lock);
-
 	} else if (tca[TCA_KIND] && nla_strcmp(tca[TCA_KIND], tp->ops->kind))
 		goto errout;
 
@@ -314,8 +309,21 @@  replay:
 	}
 
 	err = tp->ops->change(tp, cl, t->tcm_handle, tca, &fh);
-	if (err == 0)
+	if (err == 0) {
+		if (n->nlmsg_type == RTM_NEWTFILTER &&
+		    (n->nlmsg_flags&NLM_F_CREATE)) {
+			spin_lock_bh(root_lock);
+			tp->next = *back;
+			*back = tp;
+			spin_unlock_bh(root_lock);
+		}
 		tfilter_notify(skb, n, tp, fh, RTM_NEWTFILTER);
+	} else {
+		if (n->nlmsg_type == RTM_NEWTFILTER &&
+		    (n->nlmsg_flags&NLM_F_CREATE)) {
+			tcf_destroy(tp);
+		}
+	}
 
 errout:
 	if (cl)