Message ID | 1548934830-2389-1-git-send-email-makita.toshiaki@lab.ntt.co.jp |
---|---|
State | Accepted |
Delegated to: | David Miller |
Headers | show |
Series | [net] virtio_net: Account for tx bytes and packets on sending xdp_frames | expand |
On Thu, Jan 31, 2019 at 08:40:30PM +0900, Toshiaki Makita wrote: > Previously virtnet_xdp_xmit() did not account for device tx counters, > which caused confusions. > To be consistent with SKBs, account them on freeing xdp_frames. > > Reported-by: David Ahern <dsahern@gmail.com> > Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Well we count them on receive so I guess it makes sense for consistency Acked-by: Michael S. Tsirkin <mst@redhat.com> however, I really wonder whether adding more and more standard net stack things like this will end up costing most of XDP its speed. Should we instead make sure *not* to account XDP packets in any counters at all? XDP programs can use maps to do their own counting... > --- > drivers/net/virtio_net.c | 20 ++++++++++++++++---- > 1 file changed, 16 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 2594481..4cfceb7 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -503,6 +503,8 @@ static int virtnet_xdp_xmit(struct net_device *dev, > struct bpf_prog *xdp_prog; > struct send_queue *sq; > unsigned int len; > + int packets = 0; > + int bytes = 0; > int drops = 0; > int kicks = 0; > int ret, err; > @@ -526,10 +528,18 @@ static int virtnet_xdp_xmit(struct net_device *dev, > > /* Free up any pending old buffers before queueing new ones. */ > while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) { > - if (likely(is_xdp_frame(ptr))) > - xdp_return_frame(ptr_to_xdp(ptr)); > - else > - napi_consume_skb(ptr, false); > + if (likely(is_xdp_frame(ptr))) { > + struct xdp_frame *frame = ptr_to_xdp(ptr); > + > + bytes += frame->len; > + xdp_return_frame(frame); > + } else { > + struct sk_buff *skb = ptr; > + > + bytes += skb->len; > + napi_consume_skb(skb, false); > + } > + packets++; > } > > for (i = 0; i < n; i++) { > @@ -549,6 +559,8 @@ static int virtnet_xdp_xmit(struct net_device *dev, > } > out: > u64_stats_update_begin(&sq->stats.syncp); > + sq->stats.bytes += bytes; > + sq->stats.packets += packets; > sq->stats.xdp_tx += n; > sq->stats.xdp_tx_drops += drops; > sq->stats.kicks += kicks; > -- > 1.8.3.1 >
From: "Michael S. Tsirkin" <mst@redhat.com> Date: Thu, 31 Jan 2019 10:25:17 -0500 > On Thu, Jan 31, 2019 at 08:40:30PM +0900, Toshiaki Makita wrote: >> Previously virtnet_xdp_xmit() did not account for device tx counters, >> which caused confusions. >> To be consistent with SKBs, account them on freeing xdp_frames. >> >> Reported-by: David Ahern <dsahern@gmail.com> >> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> > > Well we count them on receive so I guess it makes sense for consistency > > Acked-by: Michael S. Tsirkin <mst@redhat.com> > > however, I really wonder whether adding more and more standard net stack > things like this will end up costing most of XDP its speed. > > Should we instead make sure *not* to account XDP packets > in any counters at all? XDP programs can use maps > to do their own counting... This has been definitely a discussion point, and something we should develop a clear, strong, policy on. David, Jesper, care to chime in where we ended up in that last thread discussion this?
On Thu, 31 Jan 2019 09:45:23 -0800 (PST) David Miller <davem@davemloft.net> wrote: > From: "Michael S. Tsirkin" <mst@redhat.com> > Date: Thu, 31 Jan 2019 10:25:17 -0500 > > > On Thu, Jan 31, 2019 at 08:40:30PM +0900, Toshiaki Makita wrote: > >> Previously virtnet_xdp_xmit() did not account for device tx counters, > >> which caused confusions. > >> To be consistent with SKBs, account them on freeing xdp_frames. > >> > >> Reported-by: David Ahern <dsahern@gmail.com> > >> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> > > > > Well we count them on receive so I guess it makes sense for consistency > > > > Acked-by: Michael S. Tsirkin <mst@redhat.com> > > > > however, I really wonder whether adding more and more standard net stack > > things like this will end up costing most of XDP its speed. > > > > Should we instead make sure *not* to account XDP packets > > in any counters at all? XDP programs can use maps > > to do their own counting... > > This has been definitely a discussion point, and something we should > develop a clear, strong, policy on. > > David, Jesper, care to chime in where we ended up in that last thread > discussion this? IHMO packets RX and TX on a device need to be accounted, in standard counters, regardless of XDP. For XDP RX the packet is counted as RX, regardless if XDP choose to XDP_DROP. On XDP TX which is via XDP_REDIRECT or XDP_TX, the driver that transmit the packet need to account the packet in a TX counter (this if often delayed to DMA TX completion handling). We cannot break the expectation that RX and TX counter are visible to userspace stats tools. XDP should not make these packets invisible. Performance wise, I don't see an issue. As updating these counters (packets and bytes) can be done as a bulk, either when driver NAPI RX func ends, or in TX DMA completion, like most drivers already do today. Further more, most drivers save this in per RX ring data-area, which are only summed when userspace read these. A separate question (and project) raised by David Ahern, was if we should have more detailed stats on the different XDP action return codes, as an easy means for sysadms to diagnose running XDP programs. That is something that require more discussions, as it can impact performance, and likely need to be opt-in. My opinion is yes we should do this for the sake of better User eXperience, BUT *only* if we can find a technical solution that does not hurt performance. --Jesper
On 2019/02/01 5:15, Jesper Dangaard Brouer wrote: > On Thu, 31 Jan 2019 09:45:23 -0800 (PST) > David Miller <davem@davemloft.net> wrote: > >> From: "Michael S. Tsirkin" <mst@redhat.com> >> Date: Thu, 31 Jan 2019 10:25:17 -0500 >> >>> On Thu, Jan 31, 2019 at 08:40:30PM +0900, Toshiaki Makita wrote: >>>> Previously virtnet_xdp_xmit() did not account for device tx counters, >>>> which caused confusions. >>>> To be consistent with SKBs, account them on freeing xdp_frames. >>>> >>>> Reported-by: David Ahern <dsahern@gmail.com> >>>> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> >>> >>> Well we count them on receive so I guess it makes sense for consistency >>> >>> Acked-by: Michael S. Tsirkin <mst@redhat.com> >>> >>> however, I really wonder whether adding more and more standard net stack >>> things like this will end up costing most of XDP its speed. >>> >>> Should we instead make sure *not* to account XDP packets >>> in any counters at all? XDP programs can use maps >>> to do their own counting... >> >> This has been definitely a discussion point, and something we should >> develop a clear, strong, policy on. >> >> David, Jesper, care to chime in where we ended up in that last thread >> discussion this? > > IHMO packets RX and TX on a device need to be accounted, in standard > counters, regardless of XDP. For XDP RX the packet is counted as RX, > regardless if XDP choose to XDP_DROP. On XDP TX which is via > XDP_REDIRECT or XDP_TX, the driver that transmit the packet need to > account the packet in a TX counter (this if often delayed to DMA TX > completion handling). We cannot break the expectation that RX and TX > counter are visible to userspace stats tools. XDP should not make these > packets invisible. > > Performance wise, I don't see an issue. As updating these counters > (packets and bytes) can be done as a bulk, either when driver NAPI RX > func ends, or in TX DMA completion, like most drivers already do today. > Further more, most drivers save this in per RX ring data-area, which > are only summed when userspace read these. Agreed. > A separate question (and project) raised by David Ahern, was if we > should have more detailed stats on the different XDP action return > codes, as an easy means for sysadms to diagnose running XDP programs. > That is something that require more discussions, as it can impact > performance, and likely need to be opt-in. My opinion is yes we should > do this for the sake of better User eXperience, BUT *only* if we can find > a technical solution that does not hurt performance. Basically the situation for the detailed stats is the same as standard stats, at least in virtio_net. Stats are updated as a bulk, and the counters reside in RX/TX ring structures. Probably this way of implementation would be ok performance-wise? But as other drivers may have different situations, if it is generally difficult to avoid performance penalty I'm OK with making them opt-in as a standard way.
On 1/31/19 1:15 PM, Jesper Dangaard Brouer wrote: >> >> David, Jesper, care to chime in where we ended up in that last thread >> discussion this? > > IHMO packets RX and TX on a device need to be accounted, in standard > counters, regardless of XDP. For XDP RX the packet is counted as RX, > regardless if XDP choose to XDP_DROP. On XDP TX which is via > XDP_REDIRECT or XDP_TX, the driver that transmit the packet need to > account the packet in a TX counter (this if often delayed to DMA TX > completion handling). We cannot break the expectation that RX and TX > counter are visible to userspace stats tools. XDP should not make these > packets invisible. Agreed. What I was pushing on that last thread was Rx, Tx and dropped are all accounted by the driver in standard stats. Basically if the driver touched it, the driver's counters should indicate that. The push back was on dropped packets and whether that counter should be bumped on XDP_DROP.
From: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Date: Thu, 31 Jan 2019 20:40:30 +0900 > Previously virtnet_xdp_xmit() did not account for device tx counters, > which caused confusions. > To be consistent with SKBs, account them on freeing xdp_frames. > > Reported-by: David Ahern <dsahern@gmail.com> > Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Applied, thank you.
On Sat, 2 Feb 2019 14:27:26 -0700 David Ahern <dsahern@gmail.com> wrote: > On 1/31/19 1:15 PM, Jesper Dangaard Brouer wrote: > >> > >> David, Jesper, care to chime in where we ended up in that last thread > >> discussion this? > > > > IHMO packets RX and TX on a device need to be accounted, in standard > > counters, regardless of XDP. For XDP RX the packet is counted as RX, > > regardless if XDP choose to XDP_DROP. On XDP TX which is via > > XDP_REDIRECT or XDP_TX, the driver that transmit the packet need to > > account the packet in a TX counter (this if often delayed to DMA TX > > completion handling). We cannot break the expectation that RX and TX > > counter are visible to userspace stats tools. XDP should not make these > > packets invisible. > > Agreed. What I was pushing on that last thread was Rx, Tx and dropped > are all accounted by the driver in standard stats. Basically if the > driver touched it, the driver's counters should indicate that. Sound like we all agree (except with the dropped counter, see below). Do notice that mlx5 driver doesn't do this. It is actually rather confusing to use XDP on mlx5, as when XDP "consume" which include XDP_DROP, XDP_REDIRECT or XDP_TX, then the driver standard stats are not incremented... the packet is invisible to "ifconfig" stat based tools. > The push back was on dropped packets and whether that counter should be > bumped on XDP_DROP. My opinion is the XDP_DROP action should NOT increment the drivers drop counter. First of all the "dropped" counter is also use for other stuff, which will confuse that this counter express. Second, choosing XDP_DROP is a policy choice, it still means it was RX-ed at the driver level.
On 2/4/19 3:53 AM, Jesper Dangaard Brouer wrote: > On Sat, 2 Feb 2019 14:27:26 -0700 > David Ahern <dsahern@gmail.com> wrote: > >> On 1/31/19 1:15 PM, Jesper Dangaard Brouer wrote: >>>> >>>> David, Jesper, care to chime in where we ended up in that last thread >>>> discussion this? >>> >>> IHMO packets RX and TX on a device need to be accounted, in standard >>> counters, regardless of XDP. For XDP RX the packet is counted as RX, >>> regardless if XDP choose to XDP_DROP. On XDP TX which is via >>> XDP_REDIRECT or XDP_TX, the driver that transmit the packet need to >>> account the packet in a TX counter (this if often delayed to DMA TX >>> completion handling). We cannot break the expectation that RX and TX >>> counter are visible to userspace stats tools. XDP should not make these >>> packets invisible. >> >> Agreed. What I was pushing on that last thread was Rx, Tx and dropped >> are all accounted by the driver in standard stats. Basically if the >> driver touched it, the driver's counters should indicate that. > > Sound like we all agree (except with the dropped counter, see below). > > Do notice that mlx5 driver doesn't do this. It is actually rather > confusing to use XDP on mlx5, as when XDP "consume" which include > XDP_DROP, XDP_REDIRECT or XDP_TX, then the driver standard stats are > not incremented... the packet is invisible to "ifconfig" stat based > tools. mlx5 needs some work. As I recall it still has the bug/panic removing xdp programs - at least I don't recall seeing a patch for it. > > >> The push back was on dropped packets and whether that counter should be >> bumped on XDP_DROP. > > My opinion is the XDP_DROP action should NOT increment the drivers drop > counter. First of all the "dropped" counter is also use for other > stuff, which will confuse that this counter express. Second, choosing > XDP_DROP is a policy choice, it still means it was RX-ed at the driver > level. > Understood. Hopefully in March I will get some time to come back to this and propose an idea on what I would like to see - namely, the admin has a config option at load time to enable driver counters versus custom map counters. (meaning the operator of the node chooses standard stats over strict performance.) But of course that means the drivers have the code to collect those stats.
On Mon, 2019-02-04 at 19:13 -0800, David Ahern wrote: > On 2/4/19 3:53 AM, Jesper Dangaard Brouer wrote: > > On Sat, 2 Feb 2019 14:27:26 -0700 > > David Ahern <dsahern@gmail.com> wrote: > > > > > On 1/31/19 1:15 PM, Jesper Dangaard Brouer wrote: > > > > > David, Jesper, care to chime in where we ended up in that > > > > > last thread > > > > > discussion this? > > > > > > > > IHMO packets RX and TX on a device need to be accounted, in > > > > standard > > > > counters, regardless of XDP. For XDP RX the packet is counted > > > > as RX, > > > > regardless if XDP choose to XDP_DROP. On XDP TX which is via > > > > XDP_REDIRECT or XDP_TX, the driver that transmit the packet > > > > need to > > > > account the packet in a TX counter (this if often delayed to > > > > DMA TX > > > > completion handling). We cannot break the expectation that RX > > > > and TX > > > > counter are visible to userspace stats tools. XDP should not > > > > make these > > > > packets invisible. > > > > > > Agreed. What I was pushing on that last thread was Rx, Tx and > > > dropped > > > are all accounted by the driver in standard stats. Basically if > > > the > > > driver touched it, the driver's counters should indicate that. > > > > Sound like we all agree (except with the dropped counter, see > > below). > > > > Do notice that mlx5 driver doesn't do this. It is actually rather > > confusing to use XDP on mlx5, as when XDP "consume" which include > > XDP_DROP, XDP_REDIRECT or XDP_TX, then the driver standard stats > > are > > not incremented... the packet is invisible to "ifconfig" stat based > > tools. > > mlx5 needs some work. As I recall it still has the bug/panic removing > xdp programs - at least I don't recall seeing a patch for it. Only when xdp_redirect to mlx5, and removing the program while redirect is happening, this is actually due to a lack of synchronization means between different drivers, we have some ideas to overcome this using a standard XDP API, or just use a hack in mlx5 driver which i don't like: https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/commit/?h=topic/xdp-redirect-fix&id=a3652d03cc35fd3ad62744986c8ccaca74c9f20c I will be working on this towards the end of this week. > > > > > > The push back was on dropped packets and whether that counter > > > should be > > > bumped on XDP_DROP. > > > > My opinion is the XDP_DROP action should NOT increment the drivers > > drop > > counter. First of all the "dropped" counter is also use for other > > stuff, which will confuse that this counter express. Second, > > choosing > > XDP_DROP is a policy choice, it still means it was RX-ed at the > > driver > > level. > > > > Understood. Hopefully in March I will get some time to come back to > this > and propose an idea on what I would like to see - namely, the admin > has > a config option at load time to enable driver counters versus custom > map > counters. (meaning the operator of the node chooses standard stats > over > strict performance.) But of course that means the drivers have the > code > to collect those stats. So bottom line: 1) Driver will count rx packets as rx-ed packets regardless of XDP decision. 2) Driver should keep track of XDP decisions statistics, report them in ethtool and in the new API suggested by David. track even (XDP_PASS) ? Maybe instead of having all drivers track the statistics on their own, we should move the responsibility to upper layer. Idea: since we already have rxq_info structure per XDP ring (no false sharing) and available per xdp_buff we can do: +++ b/include/linux/filter.h @@ -651,7 +651,9 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog, * already takes rcu_read_lock() when fetching the program, so * it's not necessary here anymore. */ - return BPF_PROG_RUN(prog, xdp); + u32 ret = BPF_PROG_RUN(prog, xdp); + xdp->xdp_rxq_info.stats[ret]++ + return ret; } still we need a way (API) to report the rxq_info to whoever needs to read current XDP stats 3) Unrelated, In non XDP case, if skb allocation fails or driver fails to pass the skb up to the stack for somereason, should the driver increase rx packets ? IMHO the answer should be yes if we want to have similar behavior between XDP and non XDP cases. But this could result in netdev->stats.rx_packets + netdev- >stats.rx_dropped to be more than the actual rx-ed packets, is this acceptable ?
On Wed, 6 Feb 2019 00:06:33 +0000 Saeed Mahameed <saeedm@mellanox.com> wrote: > 3) Unrelated, In non XDP case, if skb allocation fails or driver fails > to pass the skb up to the stack for somereason, should the driver > increase rx packets ? IMHO the answer should be yes if we want to have > similar behavior between XDP and non XDP cases. I don't think "skb allocation fails" should increase rx packets counter. The difference is that these events are outside sysadm/users control, and is an error detected inside the driver. The XDP program takes a policy choice to XDP_DROP a packet, which can be accounted inside the XDP prog (as the samples show) or as we also discuss via a more generic XDP-action counters. That said, I took at quick look at driver code, and it seems this behavior differs per driver... ixgbe and mlx5 does not count "skb allocation fails" as RX-ed packets, while mlx4 seems to count them. > But this could result in netdev->stats.rx_packets + > netdev->stats.rx_dropped to be more than the actual rx-ed packets, is > this acceptable ? This is one reasons I think this is wrong. --Jesper
On Wed, 6 Feb 2019 14:48:14 +0100, Jesper Dangaard Brouer wrote: > On Wed, 6 Feb 2019 00:06:33 +0000 > Saeed Mahameed <saeedm@mellanox.com> wrote: > > > 3) Unrelated, In non XDP case, if skb allocation fails or driver fails > > to pass the skb up to the stack for somereason, should the driver > > increase rx packets ? IMHO the answer should be yes if we want to have > > similar behavior between XDP and non XDP cases. > > I don't think "skb allocation fails" should increase rx packets > counter. The difference is that these events are outside sysadm/users > control, and is an error detected inside the driver. The XDP program > takes a policy choice to XDP_DROP a packet, which can be accounted > inside the XDP prog (as the samples show) or as we also discuss via a > more generic XDP-action counters. FWIW that's my understanding as well. My understanding of Linux stats is that they are incrementing one counter per packet. I.e. in RX direction success packets are those given to the stack, and for TX those given to the hardware. Standards (IETF/IEEE) usually count stats on the same layer boundary, but I think software generally counts when it's done with the packet. I haven't seen it documented anywhere, yet. I have tried to document it in the docs of the recent RFC: https://patchwork.ozlabs.org/patch/1032332/ Incidentally XDP_DROP may have been better named XDP_DISCARD from stats perspective ;)
On Wed, 6 Feb 2019 00:06:33 +0000 Saeed Mahameed <saeedm@mellanox.com> wrote: > On Mon, 2019-02-04 at 19:13 -0800, David Ahern wrote: [...] > > > > mlx5 needs some work. As I recall it still has the bug/panic > > removing xdp programs - at least I don't recall seeing a patch for > > it. > > Only when xdp_redirect to mlx5, and removing the program while > redirect is happening, this is actually due to a lack of > synchronization means between different drivers, we have some ideas > to overcome this using a standard XDP API, or just use a hack in mlx5 > driver which i don't like: > > https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/commit/?h=topic/xdp-redirect-fix&id=a3652d03cc35fd3ad62744986c8ccaca74c9f20c > > I will be working on this towards the end of this week. Toke and I have been discussing how to solve this. The main idea for fixing this is to tie resource allocation to interface insertion into interface maps (kernel/bpf/devmap.c). As the =devmap= already have the needed synchronisation mechanisms and steps for safely adding and removing =net_devices= (e.g. stopping RX side, flushing remaining frames, waiting RCU period before freeing objects, etc.) As described here: https://github.com/xdp-project/xdp-project/blob/master/xdp-project.org#better-ndo_xdp_xmit-resource-management --Jesper
On Thu, 2019-02-07 at 08:48 +0100, Jesper Dangaard Brouer wrote: > On Wed, 6 Feb 2019 00:06:33 +0000 Saeed Mahameed <saeedm@mellanox.com > > wrote: > > > On Mon, 2019-02-04 at 19:13 -0800, David Ahern wrote: > [...] > > > mlx5 needs some work. As I recall it still has the bug/panic > > > removing xdp programs - at least I don't recall seeing a patch > > > for > > > it. > > > > Only when xdp_redirect to mlx5, and removing the program while > > redirect is happening, this is actually due to a lack of > > synchronization means between different drivers, we have some ideas > > to overcome this using a standard XDP API, or just use a hack in > > mlx5 > > driver which i don't like: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/commit/?h=topic/xdp-redirect-fix&id=a3652d03cc35fd3ad62744986c8ccaca74c9f20c > > > > I will be working on this towards the end of this week. > > Toke and I have been discussing how to solve this. > > The main idea for fixing this is to tie resource allocation to > interface > insertion into interface maps (kernel/bpf/devmap.c). As the =devmap= > already have the needed synchronisation mechanisms and steps for > safely > adding and removing =net_devices= (e.g. stopping RX side, flushing > remaining frames, waiting RCU period before freeing objects, etc.) > > As described here: > > https://github.com/xdp-project/xdp-project/blob/master/xdp-project.org#better-ndo_xdp_xmit-resource-management > > --Jesper Yes you already suggested this approach @LPC: So 1) on dev_map_update_elem() we will call dev->dev->ndo_bpf() to notify the device on the intention to start/stop redirect, and wait for it to create/destroy the HW resources before/after actually updating the map But: 2) this won't totally solve our problem, since sometimes the driver can decide to recreate (change of configuration) hw resources on the fly while redirect/devmap is already happening, so we need some kind of a dev_map_notification or a flag with rcu synch, for when the driver want to make the xdp redirect resources unavailable. Thanks, Saeed.
On Wed, 6 Feb 2019 00:06:33 +0000 Saeed Mahameed <saeedm@mellanox.com> wrote: > 2) Driver should keep track of XDP decisions statistics, report them in > ethtool and in the new API suggested by David. track even (XDP_PASS) ? > > Maybe instead of having all drivers track the statistics on their own, > we should move the responsibility to upper layer. > > Idea: since we already have rxq_info structure per XDP ring (no false > sharing) and available per xdp_buff we can do: > > +++ b/include/linux/filter.h > @@ -651,7 +651,9 @@ static __always_inline u32 bpf_prog_run_xdp(const > struct bpf_prog *prog, > * already takes rcu_read_lock() when fetching the program, so > * it's not necessary here anymore. > */ > - return BPF_PROG_RUN(prog, xdp); > + u32 ret = BPF_PROG_RUN(prog, xdp); > + xdp->xdp_rxq_info.stats[ret]++ > + return ret; > } > > still we need a way (API) to report the rxq_info to whoever needs to > read current XDP stats I'm capturing this as tasks under the XDP-project github page: https://github.com/xdp-project/xdp-project/pull/13/files
Saeed Mahameed <saeedm@mellanox.com> writes: > But: > 2) this won't totally solve our problem, since sometimes the driver can > decide to recreate (change of configuration) hw resources on the fly > while redirect/devmap is already happening, so we need some kind of a > dev_map_notification or a flag with rcu synch, for when the driver want > to make the xdp redirect resources unavailable. Good point, I'll make a note of this. Do you have a pointer to where the mlx5 driver does this kind of change currently? -Toke
On Fri, 2019-02-08 at 17:55 +0100, Toke Høiland-Jørgensen wrote: > Saeed Mahameed <saeedm@mellanox.com> writes: > > > But: > > 2) this won't totally solve our problem, since sometimes the driver > > can > > decide to recreate (change of configuration) hw resources on the > > fly > > while redirect/devmap is already happening, so we need some kind of > > a > > dev_map_notification or a flag with rcu synch, for when the driver > > want > > to make the xdp redirect resources unavailable. > > Good point, I'll make a note of this. Do you have a pointer to where > the > mlx5 driver does this kind of change currently? > example: ethtool -L to reduce/increase the number of rings e.g. @mlx5e_ethtool_set_ringparam or virtually anywhere mlx5e_switch_priv_channels is called when xdp prog redirect is attached to mlx5. > -Toke
On Thu, 2019-02-07 at 19:08 +0000, Saeed Mahameed wrote: > On Thu, 2019-02-07 at 08:48 +0100, Jesper Dangaard Brouer wrote: > > On Wed, 6 Feb 2019 00:06:33 +0000 Saeed Mahameed < > > saeedm@mellanox.com > > > wrote: > > > On Mon, 2019-02-04 at 19:13 -0800, David Ahern wrote: > > [...] > > > > mlx5 needs some work. As I recall it still has the bug/panic > > > > removing xdp programs - at least I don't recall seeing a patch > > > > for > > > > it. > > > > > > Only when xdp_redirect to mlx5, and removing the program while > > > redirect is happening, this is actually due to a lack of > > > synchronization means between different drivers, we have some > > > ideas > > > to overcome this using a standard XDP API, or just use a hack in > > > mlx5 > > > driver which i don't like: > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/commit/?h=topic/xdp-redirect-fix&id=a3652d03cc35fd3ad62744986c8ccaca74c9f20c > > > > > > I will be working on this towards the end of this week. > > > > Toke and I have been discussing how to solve this. > > > > The main idea for fixing this is to tie resource allocation to > > interface > > insertion into interface maps (kernel/bpf/devmap.c). As the > > =devmap= > > already have the needed synchronisation mechanisms and steps for > > safely > > adding and removing =net_devices= (e.g. stopping RX side, flushing > > remaining frames, waiting RCU period before freeing objects, etc.) > > > > As described here: > > > > https://github.com/xdp-project/xdp-project/blob/master/xdp-project.org#better-ndo_xdp_xmit-resource-management > > > > --Jesper > > Yes you already suggested this approach @LPC: > > So > 1) on dev_map_update_elem() we will call > dev->dev->ndo_bpf() to notify the device on the intention to > start/stop > redirect, and wait for it to create/destroy the HW resources > before/after actually updating the map > silly me, dev_map_update_elem must be atomic, we can't hook driver resource allocation to it, it must come as a separate request (syscall) from user space to request to create XDP redirect resources. > But: > 2) this won't totally solve our problem, since sometimes the driver > can > decide to recreate (change of configuration) hw resources on the fly > while redirect/devmap is already happening, so we need some kind of a > dev_map_notification or a flag with rcu synch, for when the driver > want > to make the xdp redirect resources unavailable. > I will focus on this problem first, then figure out how to create XDP redirect resources without actullay attaching a dummy xdp program. > Thanks, > Saeed.
On Fri, 2019-02-08 at 15:17 -0800, Saeed Mahameed wrote: > On Thu, 2019-02-07 at 19:08 +0000, Saeed Mahameed wrote: > > > > So > > 1) on dev_map_update_elem() we will call > > dev->dev->ndo_bpf() to notify the device on the intention to > > start/stop > > redirect, and wait for it to create/destroy the HW resources > > before/after actually updating the map > > > > silly me, dev_map_update_elem must be atomic, we can't hook driver > resource allocation to it, it must come as a separate request > (syscall) > from user space to request to create XDP redirect resources. > Well, it is possible to render dev_map_update_elem non-atomic and fail BPF programs who try to update it in the verifier check_map_func_compatibility. if you know of any case where devmap needs to be updated from the BPF program please let me know.
On Sat, 9 Feb 2019 00:18:31 +0000, Saeed Mahameed wrote: > On Fri, 2019-02-08 at 15:17 -0800, Saeed Mahameed wrote: > > On Thu, 2019-02-07 at 19:08 +0000, Saeed Mahameed wrote: > > > > > > So > > > 1) on dev_map_update_elem() we will call > > > dev->dev->ndo_bpf() to notify the device on the intention to > > > start/stop > > > redirect, and wait for it to create/destroy the HW resources > > > before/after actually updating the map > > > > > > > silly me, dev_map_update_elem must be atomic, we can't hook driver > > resource allocation to it, it must come as a separate request > > (syscall) > > from user space to request to create XDP redirect resources. > > > > Well, it is possible to render dev_map_update_elem non-atomic and fail > BPF programs who try to update it in the verifier > check_map_func_compatibility. > > if you know of any case where devmap needs to be updated from the BPF > program please let me know. Did we find a solution to non-map redirect? Sorry if I missed the discussion, I couldn't make the iovisor call this week due to travel.
Saeed Mahameed <saeedm@mellanox.com> writes: > On Fri, 2019-02-08 at 15:17 -0800, Saeed Mahameed wrote: >> On Thu, 2019-02-07 at 19:08 +0000, Saeed Mahameed wrote: >> > >> > So >> > 1) on dev_map_update_elem() we will call >> > dev->dev->ndo_bpf() to notify the device on the intention to >> > start/stop >> > redirect, and wait for it to create/destroy the HW resources >> > before/after actually updating the map >> > >> >> silly me, dev_map_update_elem must be atomic, we can't hook driver >> resource allocation to it, it must come as a separate request >> (syscall) >> from user space to request to create XDP redirect resources. >> > > Well, it is possible to render dev_map_update_elem non-atomic and fail > BPF programs who try to update it in the verifier > check_map_func_compatibility. > > if you know of any case where devmap needs to be updated from the BPF > program please let me know. Well, maybe? My plan for dealing with the non-map redirect variant is essentially to have a hidden map that does just-in-time insertion of ifindexes if they are not already there; and rework XDP_REDIRECT to use that. However, this would essentially be an insert from eBPF; but I guess maybe it could be deferred until the RX-side driver exits its NAPI loop (as we're buffering frames in the map anyway)? -Toke
Jakub Kicinski <jakub.kicinski@netronome.com> writes: > On Sat, 9 Feb 2019 00:18:31 +0000, Saeed Mahameed wrote: >> On Fri, 2019-02-08 at 15:17 -0800, Saeed Mahameed wrote: >> > On Thu, 2019-02-07 at 19:08 +0000, Saeed Mahameed wrote: >> > > >> > > So >> > > 1) on dev_map_update_elem() we will call >> > > dev->dev->ndo_bpf() to notify the device on the intention to >> > > start/stop >> > > redirect, and wait for it to create/destroy the HW resources >> > > before/after actually updating the map >> > > >> > >> > silly me, dev_map_update_elem must be atomic, we can't hook driver >> > resource allocation to it, it must come as a separate request >> > (syscall) >> > from user space to request to create XDP redirect resources. >> > >> >> Well, it is possible to render dev_map_update_elem non-atomic and fail >> BPF programs who try to update it in the verifier >> check_map_func_compatibility. >> >> if you know of any case where devmap needs to be updated from the BPF >> program please let me know. > > Did we find a solution to non-map redirect? See my other reply to Saeed :) -Toke
David Ahern <dsahern@gmail.com> writes: > On 2/4/19 3:53 AM, Jesper Dangaard Brouer wrote: >> On Sat, 2 Feb 2019 14:27:26 -0700 >> David Ahern <dsahern@gmail.com> wrote: >> >>> On 1/31/19 1:15 PM, Jesper Dangaard Brouer wrote: >>>>> >>>>> David, Jesper, care to chime in where we ended up in that last thread >>>>> discussion this? >>>> >>>> IHMO packets RX and TX on a device need to be accounted, in standard >>>> counters, regardless of XDP. For XDP RX the packet is counted as RX, >>>> regardless if XDP choose to XDP_DROP. On XDP TX which is via >>>> XDP_REDIRECT or XDP_TX, the driver that transmit the packet need to >>>> account the packet in a TX counter (this if often delayed to DMA TX >>>> completion handling). We cannot break the expectation that RX and TX >>>> counter are visible to userspace stats tools. XDP should not make these >>>> packets invisible. >>> >>> Agreed. What I was pushing on that last thread was Rx, Tx and dropped >>> are all accounted by the driver in standard stats. Basically if the >>> driver touched it, the driver's counters should indicate that. >> >> Sound like we all agree (except with the dropped counter, see below). >> >> Do notice that mlx5 driver doesn't do this. It is actually rather >> confusing to use XDP on mlx5, as when XDP "consume" which include >> XDP_DROP, XDP_REDIRECT or XDP_TX, then the driver standard stats are >> not incremented... the packet is invisible to "ifconfig" stat based >> tools. > > mlx5 needs some work. As I recall it still has the bug/panic removing > xdp programs - at least I don't recall seeing a patch for it. > >> >> >>> The push back was on dropped packets and whether that counter should be >>> bumped on XDP_DROP. >> >> My opinion is the XDP_DROP action should NOT increment the drivers drop >> counter. First of all the "dropped" counter is also use for other >> stuff, which will confuse that this counter express. Second, choosing >> XDP_DROP is a policy choice, it still means it was RX-ed at the driver >> level. >> > > Understood. Hopefully in March I will get some time to come back to this > and propose an idea on what I would like to see - namely, the admin has > a config option at load time to enable driver counters versus custom map > counters. (meaning the operator of the node chooses standard stats over > strict performance.) But of course that means the drivers have the code > to collect those stats. Hi David I don't recall seeing any follow-up on this. Did you have a chance to formulate your ideas? :) -Toke
On 4/18/19 8:24 AM, Toke Høiland-Jørgensen wrote: >>> >> >> Understood. Hopefully in March I will get some time to come back to this >> and propose an idea on what I would like to see - namely, the admin has >> a config option at load time to enable driver counters versus custom map >> counters. (meaning the operator of the node chooses standard stats over >> strict performance.) But of course that means the drivers have the code >> to collect those stats. > > Hi David > > I don't recall seeing any follow-up on this. Did you have a chance to > formulate your ideas? :) > Not yet. Almost done with the nexthop changes. Once that is out of the way I can come back to this.
David Ahern <dsahern@gmail.com> writes: > On 4/18/19 8:24 AM, Toke Høiland-Jørgensen wrote: >>>> >>> >>> Understood. Hopefully in March I will get some time to come back to this >>> and propose an idea on what I would like to see - namely, the admin has >>> a config option at load time to enable driver counters versus custom map >>> counters. (meaning the operator of the node chooses standard stats over >>> strict performance.) But of course that means the drivers have the code >>> to collect those stats. >> >> Hi David >> >> I don't recall seeing any follow-up on this. Did you have a chance to >> formulate your ideas? :) >> > > Not yet. Almost done with the nexthop changes. Once that is out of the > way I can come back to this. Ping? :) -Toke
On 6/20/19 2:42 PM, Toke Høiland-Jørgensen wrote: >>> I don't recall seeing any follow-up on this. Did you have a chance to >>> formulate your ideas? :) >>> >> >> Not yet. Almost done with the nexthop changes. Once that is out of the >> way I can come back to this. > > Ping? :) > Definitely back to this after the July 4th holiday.
David Ahern <dsahern@gmail.com> writes: > On 6/20/19 2:42 PM, Toke Høiland-Jørgensen wrote: >>>> I don't recall seeing any follow-up on this. Did you have a chance to >>>> formulate your ideas? :) >>>> >>> >>> Not yet. Almost done with the nexthop changes. Once that is out of the >>> way I can come back to this. >> >> Ping? :) >> > > Definitely back to this after the July 4th holiday. Awesome! I'll wait until then before bugging you again... ;) -Toke
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 2594481..4cfceb7 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -503,6 +503,8 @@ static int virtnet_xdp_xmit(struct net_device *dev, struct bpf_prog *xdp_prog; struct send_queue *sq; unsigned int len; + int packets = 0; + int bytes = 0; int drops = 0; int kicks = 0; int ret, err; @@ -526,10 +528,18 @@ static int virtnet_xdp_xmit(struct net_device *dev, /* Free up any pending old buffers before queueing new ones. */ while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) { - if (likely(is_xdp_frame(ptr))) - xdp_return_frame(ptr_to_xdp(ptr)); - else - napi_consume_skb(ptr, false); + if (likely(is_xdp_frame(ptr))) { + struct xdp_frame *frame = ptr_to_xdp(ptr); + + bytes += frame->len; + xdp_return_frame(frame); + } else { + struct sk_buff *skb = ptr; + + bytes += skb->len; + napi_consume_skb(skb, false); + } + packets++; } for (i = 0; i < n; i++) { @@ -549,6 +559,8 @@ static int virtnet_xdp_xmit(struct net_device *dev, } out: u64_stats_update_begin(&sq->stats.syncp); + sq->stats.bytes += bytes; + sq->stats.packets += packets; sq->stats.xdp_tx += n; sq->stats.xdp_tx_drops += drops; sq->stats.kicks += kicks;
Previously virtnet_xdp_xmit() did not account for device tx counters, which caused confusions. To be consistent with SKBs, account them on freeing xdp_frames. Reported-by: David Ahern <dsahern@gmail.com> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> --- drivers/net/virtio_net.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-)