Message ID | 20170817000251.38469-1-mahesh@bandewar.net |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
On (08/16/17 17:02), Mahesh Bandewar wrote: > > From: Mahesh Bandewar <maheshb@google.com> > > If the ARP processing creates a neigh entry, it's immediately marked > as STALE without timer and stays that way in that state as long as > host do not send traffic to that neighbour. Perhaps I dont understand the specific packet exchange case that your patch is trying to fix, but if the neighbor entry is created as a result of an incoming packet (but we have not yet sent anything to this neighbor) then it should be marked STALE? IOW, STALE means "Ingress path claims this adjacency, but egress path has not been verified". Is the problem that the neigh never goes into PROBE? > + if (neigh) { > + if (neigh->nud_state & NUD_VALID) > + neigh_update(neigh, lladdr, NUD_STALE, > + NEIGH_UPDATE_F_OVERRIDE, 0); > + else > + neigh_event_send(neigh, NULL); > + } NUD_VALID is already a mask of (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE|NUD_PROBE|NUD_STALE|NUD_DELAY) are you sure you want to force the transition of probe/delay -> stale here? Maybe it woudl help to describe the exact wire packet exchange that is broken today, but fixed by your patch. --Sowmini
Hi, 2017-08-17 9:02 GMT+09:00 Mahesh Bandewar <mahesh@bandewar.net>: > From: Mahesh Bandewar <maheshb@google.com> > > If the ARP processing creates a neigh entry, it's immediately marked > as STALE without timer and stays that way in that state as long as > host do not send traffic to that neighbour. > > I observed this on hosts which are in IPv6 environment, where there is > very little to no IPv4 traffic and neigh-entries are stuck in STALE > mode. Ideally, the host should have PROBEd these neighbours before it > can send the first packet out. No, we do not probe neighbors until we have packet for/through it. > > It happens as a result of following call sequence in an environment > where host is mostly quiet as far as IPv4 traffic but few connected > hosts/gateways are sending ARPs. > > arp_process() > neigh_event_ns() > neigh_lookup() > neigh_create() > neigh_alloc() > nud_state=NUD_NONE > neigh_update(nud_state=NUD_STALE) > > In the above scenario, the neighbour entry does not get a chance to get > PROBEd as subsequent call to neigh_update() marks this entry STALE. > This patch initializes the neigh-entry correctly if it was created as a > result of neigh_lookup instead of just updating it in neigh_event_ns() > right after creating it. > > Signed-off-by: Mahesh Bandewar <maheshb@google.com> > --- > net/core/neighbour.c | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/net/core/neighbour.c b/net/core/neighbour.c > index 16a1a4c4eb57..d8a35db6c43b 100644 > --- a/net/core/neighbour.c > +++ b/net/core/neighbour.c > @@ -1300,9 +1300,13 @@ struct neighbour *neigh_event_ns(struct neigh_table *tbl, > { > struct neighbour *neigh = __neigh_lookup(tbl, saddr, dev, > lladdr || !dev->addr_len); > - if (neigh) > - neigh_update(neigh, lladdr, NUD_STALE, > - NEIGH_UPDATE_F_OVERRIDE, 0); > + if (neigh) { > + if (neigh->nud_state & NUD_VALID) > + neigh_update(neigh, lladdr, NUD_STALE, > + NEIGH_UPDATE_F_OVERRIDE, 0); > + else > + neigh_event_send(neigh, NULL); > + } > return neigh; > } > EXPORT_SYMBOL(neigh_event_ns); > -- > 2.14.1.480.gb18f417b89-goog >
On Wed, Aug 16, 2017 at 5:32 PM, Sowmini Varadhan <sowmini.varadhan@oracle.com> wrote: > On (08/16/17 17:02), Mahesh Bandewar wrote: >> >> From: Mahesh Bandewar <maheshb@google.com> >> >> If the ARP processing creates a neigh entry, it's immediately marked >> as STALE without timer and stays that way in that state as long as >> host do not send traffic to that neighbour. > > Perhaps I dont understand the specific packet exchange case > that your patch is trying to fix, but if the neighbor entry > is created as a result of an incoming packet (but we have not > yet sent anything to this neighbor) then it should be marked STALE? > IOW, STALE means "Ingress path claims this adjacency, but egress > path has not been verified". Is the problem that the neigh never > goes into PROBE? > Correct. The entry gets created (NUD_NONE) and few jiffies later it gets marked as STALE. It's doesn't even get a chance to get PROBEd. >> + if (neigh) { >> + if (neigh->nud_state & NUD_VALID) >> + neigh_update(neigh, lladdr, NUD_STALE, >> + NEIGH_UPDATE_F_OVERRIDE, 0); >> + else >> + neigh_event_send(neigh, NULL); >> + } > > NUD_VALID is already a mask of > (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE|NUD_PROBE|NUD_STALE|NUD_DELAY) > are you sure you want to force the transition of probe/delay -> stale > here? It's not forced and neigh_update() has guards / provision to consider these type of transition and wont force it. Just that if the state is not-valid (NUD_INCOMPLETE | NUD_NONE) etc then it marks it STALE and deletes the timer. > Maybe it woudl help to describe the exact wire packet > exchange that is broken today, but fixed by your patch. > > --Sowmini > >
From: Mahesh Bandewar <mahesh@bandewar.net> Date: Wed, 16 Aug 2017 17:02:51 -0700 > From: Mahesh Bandewar <maheshb@google.com> > > If the ARP processing creates a neigh entry, it's immediately marked > as STALE without timer and stays that way in that state as long as > host do not send traffic to that neighbour. > > I observed this on hosts which are in IPv6 environment, where there is > very little to no IPv4 traffic and neigh-entries are stuck in STALE > mode. Ideally, the host should have PROBEd these neighbours before it > can send the first packet out. > > It happens as a result of following call sequence in an environment > where host is mostly quiet as far as IPv4 traffic but few connected > hosts/gateways are sending ARPs. > > arp_process() > neigh_event_ns() > neigh_lookup() > neigh_create() > neigh_alloc() > nud_state=NUD_NONE > neigh_update(nud_state=NUD_STALE) > > In the above scenario, the neighbour entry does not get a chance to get > PROBEd as subsequent call to neigh_update() marks this entry STALE. > This patch initializes the neigh-entry correctly if it was created as a > result of neigh_lookup instead of just updating it in neigh_event_ns() > right after creating it. > > Signed-off-by: Mahesh Bandewar <maheshb@google.com> Please provide, in the commit message, a list of events including an example packet exchange and timers firing which explains how this situation arises. Thank you.
diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 16a1a4c4eb57..d8a35db6c43b 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -1300,9 +1300,13 @@ struct neighbour *neigh_event_ns(struct neigh_table *tbl, { struct neighbour *neigh = __neigh_lookup(tbl, saddr, dev, lladdr || !dev->addr_len); - if (neigh) - neigh_update(neigh, lladdr, NUD_STALE, - NEIGH_UPDATE_F_OVERRIDE, 0); + if (neigh) { + if (neigh->nud_state & NUD_VALID) + neigh_update(neigh, lladdr, NUD_STALE, + NEIGH_UPDATE_F_OVERRIDE, 0); + else + neigh_event_send(neigh, NULL); + } return neigh; } EXPORT_SYMBOL(neigh_event_ns);