| Message ID | 19363.14702.909265.380669@gargle.gargle.HOWL |
|---|---|
| State | Accepted, archived |
| Delegated to | David Miller |
On Friday 19 March 2010 at 09:44 +0100, Robert Olsson wrote:
> Hi,
> Here is a patch to manipulate packet node allocation and implicitly
> how packets are DMA'd etc.
>
> The flag NODE_ALLOC enables the function and numa_node_id();
> when enabled it can also be explicitly controlled via a new
> node parameter.
>
> Tested this with 10 Intel 82599 ports w. TYAN S7025 E5520 CPU's.
> Was able to TX/DMA ~80 Gbit/s to Ethernet wires.
>
> Cheers
> --ro

I cannot understand how this can help.

__netdev_alloc_skb() is supposed to already take NUMA properties into account:

	int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;

If this doesn't work, we should correct the core stack, not only pktgen :)

Are you allocating memory on the node where the pktgen CPU is running, or on the node close to the NIC?

Thanks

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet writes:
> On Friday 19 March 2010 at 09:44 +0100, Robert Olsson wrote:
>
> I cannot understand how this can help.
>
> __netdev_alloc_skb() is supposed to already take NUMA properties into account:
>
> 	int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
>
> If this doesn't work, we should correct the core stack, not only pktgen :)
>
> Are you allocating memory on the node where the pktgen CPU is running,
> or on the node close to the NIC?

I didn't say it should help; the idea was to give some hooks to
experiment with and see the effects of different node memory allocations.
There are many degrees of freedom wrt buses(device)/CPU/memory.

Cheers
--ro
On Friday 19 March 2010 at 14:35 +0100, robert@herjulf.net wrote:
> Eric Dumazet writes:
> > On Friday 19 March 2010 at 09:44 +0100, Robert Olsson wrote:
> >
> > I cannot understand how this can help.
> >
> > __netdev_alloc_skb() is supposed to already take NUMA properties into account:
> >
> > 	int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
> >
> > If this doesn't work, we should correct the core stack, not only pktgen :)
> >
> > Are you allocating memory on the node where the pktgen CPU is running,
> > or on the node close to the NIC?
>
> I didn't say it should help; the idea was to give some hooks to
> experiment with and see the effects of different node memory allocations.
> There are many degrees of freedom wrt buses(device)/CPU/memory.

Well, you said "Tested this with 10 Intel 82599 ports w. TYAN S7025
E5520 CPU's. Was able to TX/DMA ~80 Gbit/s to Ethernet wires."

I am interested to know what particular setup you did to maximize
throughput then, or are you saying you managed to reduce it? :)
From: robert@herjulf.net
Date: Fri, 19 Mar 2010 14:35:22 +0100

> Eric Dumazet writes:
> > On Friday 19 March 2010 at 09:44 +0100, Robert Olsson wrote:
> >
> > I cannot understand how this can help.
> >
> > __netdev_alloc_skb() is supposed to already take NUMA properties into account:
> >
> > 	int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
> >
> > If this doesn't work, we should correct the core stack, not only pktgen :)
> >
> > Are you allocating memory on the node where the pktgen CPU is running,
> > or on the node close to the NIC?
>
> I didn't say it should help; the idea was to give some hooks to
> experiment with and see the effects of different node memory allocations.
> There are many degrees of freedom wrt buses(device)/CPU/memory.

I think it's a useful feature, and by default the netdev alloc is still
used, so... applied to net-next-2.6.
Eric Dumazet writes:
> Well, you said "Tested this with 10 Intel 82599 ports w. TYAN S7025
> E5520 CPU's. Was able to TX/DMA ~80 Gbit/s to Ethernet wires."
>
> I am interested to know what particular setup you did to maximize
> throughput then, or are you saying you managed to reduce it? :)

Some notes from the experiment. It's getting
complex and hairy. Anyway, results from the first
tests to give you an idea... My colleague Olof
might have some comments/details.

pktgen sending on 10 * 10g interfaces.

[From pktgen script]
fn()
{
	i=$1 # ifname
	c=$2 # queue / cpu core
	n=$3 # numa node
	PGDEV=/proc/net/pktgen/kpktgend_$c
	pgset "add_device eth$i@$c"
	PGDEV=/proc/net/pktgen/eth$i@$c
	pgset "node $n"
	pgset "$COUNT"
	pgset "flag NODE_ALLOC"
	pgset "$CLONE_SKB"
	pgset "$PKT_SIZE"
	pgset "$DELAY"
	pgset "dst 10.0.0.0"
}

remove_all
# Setup

# TYAN S7025 with two nodes.
# Each node has its own bus with its own TYLERSBURG bridge,
# so eth0-eth3 are closest to node0, which in turn "owns"
# CPU-cores 0-3 in this HW setup. So we set up
# pktgen according to this. clone_skb=1000000.
# Used slots are PCIe-x16 except when PCIe-x8 is indicated.

# eth0 queue=0(CPU) node=0
fn 0 0 0
fn 1 1 0
fn 2 2 0
fn 3 3 0
fn 4 4 1
fn 5 5 1
fn 6 6 1
fn 7 7 1
fn 8 12 1
fn 9 13 1

Result "manually" tuned.

eth0 9617.7 M bit/s 822 k pps
eth1 9619.1 M bit/s 823 k pps
eth2 9619.1 M bit/s 823 k pps
eth3 9619.2 M bit/s 823 k pps
eth4 5995.2 M bit/s 512 k pps <- PCIe-x8
eth5 5995.3 M bit/s 512 k pps <- PCIe-x8
eth6 9619.2 M bit/s 823 k pps
eth7 9619.2 M bit/s 823 k pps
eth8 9619.1 M bit/s 823 k pps
eth9 9619.0 M bit/s 823 k pps

> 90 Gbit/s

Result "manually" mistuned by switching node 0 and 1.

eth0 9613.6 M bit/s 822 k pps
eth1 9614.9 M bit/s 822 k pps
eth2 9615.0 M bit/s 822 k pps
eth3 9615.1 M bit/s 822 k pps
eth4 2918.5 M bit/s 249 k pps <- PCIe-x8
eth5 2918.4 M bit/s 249 k pps <- PCIe-x8
eth6 8597.0 M bit/s 735 k pps
eth7 8597.0 M bit/s 735 k pps
eth8 8568.3 M bit/s 733 k pps
eth9 8568.3 M bit/s 733 k pps

A lot of things are still to be investigated...

Cheers
--ro
On Monday 22 March 2010 at 07:24 +0100, Robert Olsson wrote:
> Eric Dumazet writes:
> > Well, you said "Tested this with 10 Intel 82599 ports w. TYAN S7025
> > E5520 CPU's. Was able to TX/DMA ~80 Gbit/s to Ethernet wires."
> >
> > I am interested to know what particular setup you did to maximize
> > throughput then, or are you saying you managed to reduce it? :)
>
> Some notes from the experiment. It's getting
> complex and hairy. Anyway, results from the first
> tests to give you an idea... My colleague Olof
> might have some comments/details.
>
> pktgen sending on 10 * 10g interfaces.
>
> [From pktgen script]
> fn()
> {
> 	i=$1 # ifname
> 	c=$2 # queue / cpu core
> 	n=$3 # numa node
> 	PGDEV=/proc/net/pktgen/kpktgend_$c
> 	pgset "add_device eth$i@$c"
> 	PGDEV=/proc/net/pktgen/eth$i@$c
> 	pgset "node $n"
> 	pgset "$COUNT"
> 	pgset "flag NODE_ALLOC"
> 	pgset "$CLONE_SKB"
> 	pgset "$PKT_SIZE"
> 	pgset "$DELAY"
> 	pgset "dst 10.0.0.0"
> }
>
> remove_all
> # Setup
>
> # TYAN S7025 with two nodes.
> # Each node has its own bus with its own TYLERSBURG bridge,
> # so eth0-eth3 are closest to node0, which in turn "owns"
> # CPU-cores 0-3 in this HW setup. So we set up
> # pktgen according to this. clone_skb=1000000.
> # Used slots are PCIe-x16 except when PCIe-x8 is indicated.
>
> # eth0 queue=0(CPU) node=0
> fn 0 0 0
> fn 1 1 0
> fn 2 2 0
> fn 3 3 0
> fn 4 4 1
> fn 5 5 1
> fn 6 6 1
> fn 7 7 1
> fn 8 12 1
> fn 9 13 1
>
> Result "manually" tuned.
>
> eth0 9617.7 M bit/s 822 k pps
> eth1 9619.1 M bit/s 823 k pps
> eth2 9619.1 M bit/s 823 k pps
> eth3 9619.2 M bit/s 823 k pps
> eth4 5995.2 M bit/s 512 k pps <- PCIe-x8
> eth5 5995.3 M bit/s 512 k pps <- PCIe-x8
> eth6 9619.2 M bit/s 823 k pps
> eth7 9619.2 M bit/s 823 k pps
> eth8 9619.1 M bit/s 823 k pps
> eth9 9619.0 M bit/s 823 k pps
>
> > 90 Gbit/s
>
> Result "manually" mistuned by switching node 0 and 1.
>
> eth0 9613.6 M bit/s 822 k pps
> eth1 9614.9 M bit/s 822 k pps
> eth2 9615.0 M bit/s 822 k pps
> eth3 9615.1 M bit/s 822 k pps
> eth4 2918.5 M bit/s 249 k pps <- PCIe-x8
> eth5 2918.4 M bit/s 249 k pps <- PCIe-x8
> eth6 8597.0 M bit/s 735 k pps
> eth7 8597.0 M bit/s 735 k pps
> eth8 8568.3 M bit/s 733 k pps
> eth9 8568.3 M bit/s 733 k pps
>
> A lot of things are still to be investigated...

Sure :)

I wonder why eth0-eth3 results are unchanged after a node flip.

Thanks for sharing
Eric Dumazet writes:
> > Result "manually" tuned.
> >
> > eth0 9617.7 M bit/s 822 k pps
> > eth1 9619.1 M bit/s 823 k pps
> > eth2 9619.1 M bit/s 823 k pps
> > eth3 9619.2 M bit/s 823 k pps
> > eth4 5995.2 M bit/s 512 k pps <- PCIe-x8
> > eth5 5995.3 M bit/s 512 k pps <- PCIe-x8
> > eth6 9619.2 M bit/s 823 k pps
> > eth7 9619.2 M bit/s 823 k pps
> > eth8 9619.1 M bit/s 823 k pps
> > eth9 9619.0 M bit/s 823 k pps
> >
> > > 90 Gbit/s

The DMA potential of this box is about four 10g ports.

> > Result "manually" mistuned by switching node 0 and 1.
> >
> > eth0 9613.6 M bit/s 822 k pps
> > eth1 9614.9 M bit/s 822 k pps
> > eth2 9615.0 M bit/s 822 k pps
> > eth3 9615.1 M bit/s 822 k pps
> > eth4 2918.5 M bit/s 249 k pps <- PCIe-x8
> > eth5 2918.4 M bit/s 249 k pps <- PCIe-x8
> > eth6 8597.0 M bit/s 735 k pps
> > eth7 8597.0 M bit/s 735 k pps
> > eth8 8568.3 M bit/s 733 k pps
> > eth9 8568.3 M bit/s 733 k pps
>
> I wonder why eth0-eth3 results are unchanged after a node flip.

Yes, it's strange. With clone_skb=1 we could see differences with just
one GIGE interface using 64 byte pkts, so it might be very different on
10g. We're unfortunately getting closer to hardware...

Cheers
--ro
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 4392381..c195fd0 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -169,7 +169,7 @@
 #include <asm/dma.h>
 #include <asm/div64.h>		/* do_div */
 
-#define VERSION "2.72"
+#define VERSION "2.73"
 #define IP_NAME_SZ 32
 #define MAX_MPLS_LABELS 16 /* This is the max label stack depth */
 #define MPLS_STACK_BOTTOM htonl(0x00000100)
@@ -190,6 +190,7 @@
 #define F_IPSEC_ON    (1<<12)	/* ipsec on for flows */
 #define F_QUEUE_MAP_RND (1<<13)	/* queue map Random */
 #define F_QUEUE_MAP_CPU (1<<14)	/* queue map mirrors smp_processor_id() */
+#define F_NODE          (1<<15)	/* Node memory alloc*/
 
 /* Thread control flag bits */
 #define T_STOP        (1<<0)	/* Stop run */
@@ -372,6 +373,7 @@ struct pktgen_dev {
 
 	u16 queue_map_min;
 	u16 queue_map_max;
+	int node;		/* Memory node */
 
 #ifdef CONFIG_XFRM
 	__u8	ipsmode;	/* IPSEC mode (config) */
@@ -607,6 +609,9 @@ static int pktgen_if_show(struct seq_file *seq, void *v)
 	if (pkt_dev->traffic_class)
 		seq_printf(seq, "     traffic_class: 0x%02x\n", pkt_dev->traffic_class);
 
+	if (pkt_dev->node >= 0)
+		seq_printf(seq, "     node: %d\n", pkt_dev->node);
+
 	seq_printf(seq, "     Flags: ");
 
 	if (pkt_dev->flags & F_IPV6)
@@ -660,6 +665,9 @@ static int pktgen_if_show(struct seq_file *seq, void *v)
 	if (pkt_dev->flags & F_SVID_RND)
 		seq_printf(seq, "SVID_RND  ");
 
+	if (pkt_dev->flags & F_NODE)
+		seq_printf(seq, "NODE_ALLOC  ");
+
 	seq_puts(seq, "\n");
 
 	/* not really stopped, more like last-running-at */
@@ -1074,6 +1082,21 @@ static ssize_t pktgen_if_write(struct file *file,
 			pkt_dev->dst_mac_count);
 		return count;
 	}
+	if (!strcmp(name, "node")) {
+		len = num_arg(&user_buffer[i], 10, &value);
+		if (len < 0)
+			return len;
+
+		i += len;
+
+		if (node_possible(value)) {
+			pkt_dev->node = value;
+			sprintf(pg_result, "OK: node=%d", pkt_dev->node);
+		}
+		else
+			sprintf(pg_result, "ERROR: node not possible");
+		return count;
+	}
 	if (!strcmp(name, "flag")) {
 		char f[32];
 		memset(f, 0, 32);
@@ -1166,12 +1189,18 @@ static ssize_t pktgen_if_write(struct file *file,
 		else if (strcmp(f, "!IPV6") == 0)
 			pkt_dev->flags &= ~F_IPV6;
 
+		else if (strcmp(f, "NODE_ALLOC") == 0)
+			pkt_dev->flags |= F_NODE;
+
+		else if (strcmp(f, "!NODE_ALLOC") == 0)
+			pkt_dev->flags &= ~F_NODE;
+
 		else {
 			sprintf(pg_result,
 				"Flag -:%s:- unknown\nAvailable flags, (prepend ! to un-set flag):\n%s",
 				f,
 				"IPSRC_RND, IPDST_RND, UDPSRC_RND, UDPDST_RND, "
-				"MACSRC_RND, MACDST_RND, TXSIZE_RND, IPV6, MPLS_RND, VID_RND, SVID_RND, FLOW_SEQ, IPSEC\n");
+				"MACSRC_RND, MACDST_RND, TXSIZE_RND, IPV6, MPLS_RND, VID_RND, SVID_RND, FLOW_SEQ, IPSEC, NODE_ALLOC\n");
 			return count;
 		}
 		sprintf(pg_result, "OK: flags=0x%x", pkt_dev->flags);
@@ -2572,9 +2601,27 @@ static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
 	mod_cur_headers(pkt_dev);
 
 	datalen = (odev->hard_header_len + 16) & ~0xf;
-	skb = __netdev_alloc_skb(odev,
-				 pkt_dev->cur_pkt_size + 64
-				 + datalen + pkt_dev->pkt_overhead, GFP_NOWAIT);
+
+	if (pkt_dev->flags & F_NODE) {
+		int node;
+
+		if (pkt_dev->node >= 0)
+			node = pkt_dev->node;
+		else
+			node = numa_node_id();
+
+		skb = __alloc_skb(NET_SKB_PAD + pkt_dev->cur_pkt_size + 64
+				  + datalen + pkt_dev->pkt_overhead, GFP_NOWAIT, 0, node);
+		if (likely(skb)) {
+			skb_reserve(skb, NET_SKB_PAD);
+			skb->dev = odev;
+		}
+	}
+	else
+		skb = __netdev_alloc_skb(odev,
+					 pkt_dev->cur_pkt_size + 64
+					 + datalen + pkt_dev->pkt_overhead, GFP_NOWAIT);
+
 	if (!skb) {
 		sprintf(pkt_dev->result, "No memory");
 		return NULL;
@@ -3674,6 +3721,7 @@ static int pktgen_add_device(struct pktgen_thread *t, const char *ifname)
 	pkt_dev->svlan_p = 0;
 	pkt_dev->svlan_cfi = 0;
 	pkt_dev->svlan_id = 0xffff;
+	pkt_dev->node = -1;
 
 	err = pktgen_setup_dev(pkt_dev, ifname);
 	if (err)
Hi,

Here is a patch to manipulate packet node allocation and implicitly how
packets are DMA'd etc.

The flag NODE_ALLOC enables the function and numa_node_id(); when
enabled it can also be explicitly controlled via a new node parameter.

Tested this with 10 Intel 82599 ports w. TYAN S7025 E5520 CPU's.
Was able to TX/DMA ~80 Gbit/s to Ethernet wires.

Cheers
--ro

Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
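Driving the two new knobs follows the usual pktgen /proc convention, as in the fn() script earlier in the thread. A minimal sketch for one interface (the simplified `pgset` helper is paraphrased from the standard pktgen example scripts; the device name, CPU core, and node number are illustrative):

```shell
#!/bin/sh
# pgset writes one pktgen command to the currently selected /proc file
pgset() {
    echo "$1" > $PGDEV
}

# Register eth0 on the pktgen kernel thread for CPU core 0
PGDEV=/proc/net/pktgen/kpktgend_0
pgset "add_device eth0@0"

# Configure the device: NODE_ALLOC alone allocates on numa_node_id()
# (the node of the CPU running the thread); "node N" pins allocations
# to an explicit memory node instead.
PGDEV=/proc/net/pktgen/eth0@0
pgset "flag NODE_ALLOC"
pgset "node 0"
```

This is a configuration fragment: it requires the pktgen module to be loaded, and the chosen node only takes effect because NODE_ALLOC is also set.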