From patchwork Fri Mar 14 17:49:57 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Stevens
X-Patchwork-Id: 330414
X-Patchwork-Delegate: davem@davemloft.net
Message-Id: <201403141749.s2EHnvgc030976@lab1.dls>
From: David L Stevens
To: David Miller, Stephen Hemminger
cc: netdev@vger.kernel.org
Subject: [PATCH VXLAN] fix nonfunctional neigh_reduce
Date: Fri, 14 Mar 2014 13:49:57 -0400
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

The VXLAN neigh_reduce() code has been completely non-functional since
it was checked in. Specific errors:

1) The original code drops all packets with a multicast destination
   address, even though neighbor solicitations are sent to the
   solicited-node address, which is a multicast address. The code
   after this check was never run.

2) The neighbor table lookup used the IPv6 header destination, which
   is the solicited-node address, rather than the target address from
   the neighbor solicitation. So neighbor lookups would always fail
   even if the code got this far. The same wrong address was also
   reported for L3MISSes.

3) The code calls ndisc_send_na(), which does a send on the tunnel
   device. The context for neigh_reduce() is the transmit path,
   vxlan_xmit(), where the host or a bridge-attached neighbor is
   trying to transmit a neighbor solicitation. To respond to it, the
   tunnel endpoint needs to do a *receive* of the appropriate neighbor
   advertisement. Doing a send would only try to send the
   advertisement, encapsulated, to the remote destinations in the fdb,
   that is, to hosts that definitely did not send the corresponding
   solicitation.

4) The code uses the tunnel endpoint's IPv6 forwarding flag to
   determine the isrouter flag in the advertisement. This has nothing
   to do with whether or not the target is a router, and generally
   won't be set since the tunnel endpoint is bridging, not routing,
   traffic.

The patch below creates a proxy neighbor advertisement to respond to
neighbor solicitations as intended, providing proper IPv6 support for
neighbor reduction.
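
As an illustration of items 1 and 2 above, here is a small user-space
sketch (not kernel code; struct ip6 and the helper names are made up for
this example) that derives the solicited-node multicast address defined
in RFC 4291 from a target address. It shows that the IPv6 destination of
a solicitation is always multicast and never equals the target, which is
why the old multicast check dropped every solicitation and why a lookup
keyed on the header destination could not match.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct in6_addr: 16 bytes, network order. */
struct ip6 {
	uint8_t b[16];
};

/*
 * Solicited-node multicast address (RFC 4291): ff02::1:ffXX:XXXX,
 * where XX:XXXX are the low 24 bits of the target address.
 */
static struct ip6 solicited_node(const struct ip6 *target)
{
	struct ip6 sn;

	memset(&sn, 0, sizeof(sn));
	sn.b[0] = 0xff;
	sn.b[1] = 0x02;
	sn.b[11] = 0x01;
	sn.b[12] = 0xff;
	memcpy(&sn.b[13], &target->b[13], 3);
	return sn;
}

/* Multicast test on the first byte, as the old check effectively did. */
static int ip6_is_multicast(const struct ip6 *a)
{
	return a->b[0] == 0xff;
}
```

For a target ending in aa:bbcc, solicited_node() yields ff02::1:ffaa:bbcc;
ip6_is_multicast() is true for that address and false for the unicast
target itself.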
Signed-off-by: David L Stevens

---
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index eb59b14..b905d57 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1336,14 +1336,109 @@ out:
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
+
+static struct sk_buff *vxlan_na_create(struct sk_buff *request,
+	struct neighbour *n, bool isrouter)
+{
+	struct net_device *dev = request->dev;
+	struct sk_buff *reply = NULL;
+	struct nd_msg *ns, *na;
+	struct ipv6hdr *pip6;
+	u8 *daddr;
+	int olen = 8; /* opt hdr + ETH_ALEN for target */
+	int i, len;
+
+	if (dev == NULL)
+		return NULL;
+
+	ns = (struct nd_msg *)skb_transport_header(request);
+
+	len = LL_RESERVED_SPACE(dev) + sizeof(struct ipv6hdr) +
+		sizeof(*na) + olen + dev->needed_tailroom;
+	reply = alloc_skb(len, GFP_ATOMIC);
+	if (reply == NULL)
+		goto out;
+
+	reply->protocol = htons(ETH_P_IPV6);
+	reply->dev = dev;
+	skb_reserve(reply, LL_RESERVED_SPACE(request->dev));
+	skb_push(reply, sizeof(struct ethhdr));
+	skb_set_mac_header(reply, 0);
+
+	daddr = eth_hdr(request)->h_source;
+	olen = request->len - skb_transport_offset(request) - sizeof(*ns);
+	for (i = 0; i < olen-1; i += (ns->opt[i+1]<<3)) {
+		if (ns->opt[i] == ND_OPT_SOURCE_LL_ADDR) {
+			daddr = ns->opt + i + sizeof(struct nd_opt_hdr);
+			break;
+		}
+	}
+
+	/* Ethernet header */
+	memcpy(eth_hdr(reply)->h_dest, daddr, ETH_ALEN);
+	memcpy(eth_hdr(reply)->h_source, n->ha, ETH_ALEN);
+	eth_hdr(reply)->h_proto = htons(ETH_P_IPV6);
+	reply->protocol = htons(ETH_P_IPV6);
+
+	skb_pull(reply, sizeof(struct ethhdr));
+	skb_set_network_header(reply, 0);
+	skb_put(reply, sizeof(struct ipv6hdr));
+
+	/* IPv6 header */
+
+	pip6 = ipv6_hdr(reply);
+	memset(pip6, 0, sizeof(struct ipv6hdr));
+	pip6->version = 6;
+	pip6->priority = ipv6_hdr(request)->priority;
+	pip6->nexthdr = IPPROTO_ICMPV6;
+	pip6->hop_limit = 255;
+	pip6->daddr = ipv6_hdr(request)->saddr;
+	pip6->saddr = *(struct in6_addr *)n->primary_key;
+
+	skb_pull(reply, sizeof(struct ipv6hdr));
+	skb_set_transport_header(reply, 0);
+
+	olen = 8; /* ND_OPT_TARGET_LL_ADDR */
+	na = (struct nd_msg *)skb_put(reply, sizeof(*na) + olen);
+
+	/* Neighbor Advertisement */
+	memset(na, 0, sizeof(*na)+olen);
+	na->icmph.icmp6_type = NDISC_NEIGHBOUR_ADVERTISEMENT;
+	na->icmph.icmp6_router = isrouter;
+	na->icmph.icmp6_override = 1;
+	na->icmph.icmp6_solicited = 1;
+	na->target = ns->target;
+	memcpy(&na->opt[2], n->ha, ETH_ALEN);
+	na->opt[0] = ND_OPT_TARGET_LL_ADDR;
+	na->opt[1] = 1; /* 8 bytes */
+
+	na->icmph.icmp6_cksum = csum_ipv6_magic(&pip6->saddr,
+		&pip6->daddr, sizeof(*na)+olen, IPPROTO_ICMPV6,
+		csum_partial(na, sizeof(*na)+olen, 0));
+
+	pip6->payload_len = htons(sizeof(*na)+olen);
+
+	skb_push(reply, sizeof(struct ipv6hdr));
+
+	reply->ip_summed = CHECKSUM_UNNECESSARY;
+
+	return reply;
+out:
+	if (reply)
+		kfree_skb(reply);
+	return NULL;
+}
+
 static int neigh_reduce(struct net_device *dev, struct sk_buff *skb)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
+	struct vxlan_fdb *f;
 	struct neighbour *n;
 	union vxlan_addr ipa;
 	const struct ipv6hdr *iphdr;
 	const struct in6_addr *saddr, *daddr;
 	struct nd_msg *msg;
+	struct sk_buff *reply;
 	struct inet6_dev *in6_dev = NULL;
 
 	in6_dev = __in6_dev_get(dev);
@@ -1357,8 +1452,7 @@ static int neigh_reduce(struct net_device *dev, struct sk_buff *skb)
 	saddr = &iphdr->saddr;
 	daddr = &iphdr->daddr;
 
-	if (ipv6_addr_loopback(daddr) ||
-	    ipv6_addr_is_multicast(daddr))
+	if (ipv6_addr_loopback(daddr))
 		goto out;
 
 	msg = (struct nd_msg *)skb_transport_header(skb);
@@ -1366,33 +1460,35 @@ static int neigh_reduce(struct net_device *dev, struct sk_buff *skb)
 	    msg->icmph.icmp6_type != NDISC_NEIGHBOUR_SOLICITATION)
 		goto out;
 
-	n = neigh_lookup(ipv6_stub->nd_tbl, daddr, dev);
+	n = neigh_lookup(ipv6_stub->nd_tbl, &msg->target, dev);
 
-	if (n) {
-		struct vxlan_fdb *f;
-
-		if (!(n->nud_state & NUD_CONNECTED)) {
-			neigh_release(n);
-			goto out;
+	if (!n) {
+		if (vxlan->flags & VXLAN_F_L3MISS) {
+			ipa.sin6.sin6_addr = msg->target;
+			ipa.sa.sa_family = AF_INET6;
+			vxlan_ip_miss(dev, &ipa);
 		}
+		goto out;
+	}
 
-		f = vxlan_find_mac(vxlan, n->ha);
-		if (f && vxlan_addr_any(&(first_remote_rcu(f)->remote_ip))) {
-			/* bridge-local neighbor */
-			neigh_release(n);
-			goto out;
-		}
+	if (!(n->nud_state & NUD_CONNECTED)) {
+		neigh_release(n);
+		goto out;
+	}
 
-		ipv6_stub->ndisc_send_na(dev, n, saddr, &msg->target,
-					 !!in6_dev->cnf.forwarding,
-					 true, false, false);
+	f = vxlan_find_mac(vxlan, n->ha);
+	if (f && vxlan_addr_any(&(first_remote_rcu(f)->remote_ip))) {
+		/* bridge-local neighbor */
 		neigh_release(n);
-	} else if (vxlan->flags & VXLAN_F_L3MISS) {
-		ipa.sin6.sin6_addr = *daddr;
-		ipa.sa.sa_family = AF_INET6;
-		vxlan_ip_miss(dev, &ipa);
+		goto out;
 	}
 
+	reply = vxlan_na_create(skb, n, !!(f ? f->flags & NTF_ROUTER : 0));
+
+	neigh_release(n);
+
+	if (reply && netif_rx_ni(reply) == NET_RX_DROP)
+		dev->stats.rx_dropped++;
 out:
 	consume_skb(skb);
 	return NETDEV_TX_OK;
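
A note on the option scan in vxlan_na_create(): the loop steps through
the solicitation's options using each option's length byte (in units of
8 bytes) and picks the source link-layer address as the Ethernet
destination of the reply. The user-space sketch below walks the same
type/length layout. The function name find_source_lladdr() and the local
constants are assumptions made for illustration, and the zero-length and
overlong-length checks are a guard added by this sketch, not something
the patch does; without such a guard a zero length byte would keep the
scan from advancing.

```c
#include <stddef.h>
#include <stdint.h>

#define OPT_SOURCE_LL_ADDR 1	/* same value as ND_OPT_SOURCE_LL_ADDR */
#define OPT_HDR_LEN 2		/* type byte + length byte */

/*
 * Scan ND options laid out as: type (1 byte), length (1 byte, in units
 * of 8 bytes), payload. Returns a pointer to the link-layer address
 * inside the first source link-layer address option, or NULL if none
 * is found or an option is malformed.
 */
static const uint8_t *find_source_lladdr(const uint8_t *opt, size_t olen)
{
	size_t i = 0;

	while (i + OPT_HDR_LEN <= olen) {
		size_t len = (size_t)opt[i + 1] << 3;

		if (len == 0 || len > olen - i)
			return NULL;	/* zero or overlong length */
		if (opt[i] == OPT_SOURCE_LL_ADDR)
			return opt + i + OPT_HDR_LEN;
		i += len;
	}
	return NULL;
}
```

Given an option area holding an 8-byte nonce option followed by an
8-byte source link-layer address option, the function returns a pointer
10 bytes into the area, where the MAC address starts.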