Message ID | 20200904135332.60259-1-bjorn.topel@gmail.com
---|---
Series | xsk: exit NAPI loop when AF_XDP Rx ring is full
On 2020-09-04 15:53, Björn Töpel wrote:
> This series addresses a problem that arises when AF_XDP zero-copy is
> enabled, and the kernel softirq Rx processing and userland process
> is running on the same core.
>
[...]

@Maxim I'm not well versed in Mellanox drivers. Would this be relevant
to mlx5 as well?

Cheers,
Björn
On Fri, 4 Sep 2020 15:53:25 +0200 Björn Töpel <bjorn.topel@gmail.com> wrote:

> On my machine the "one core scenario Rx drop" performance went from
> ~65Kpps to 21Mpps. In other words, from "not usable" to
> "usable". YMMV.

We have observed this kind of dropping off an edge before with softirq
(when the userspace process runs on the same RX-CPU), but I thought that
Eric Dumazet solved it in 4cd13c21b207 ("softirq: Let ksoftirqd do its
job").

I wonder what makes AF_XDP different, or if the problem has come back.
On 2020-09-04 16:27, Jesper Dangaard Brouer wrote:
> On Fri, 4 Sep 2020 15:53:25 +0200
> Björn Töpel <bjorn.topel@gmail.com> wrote:
>
>> On my machine the "one core scenario Rx drop" performance went from
>> ~65Kpps to 21Mpps. In other words, from "not usable" to
>> "usable". YMMV.
>
> We have observed this kind of dropping off an edge before with softirq
> (when the userspace process runs on the same RX-CPU), but I thought that
> Eric Dumazet solved it in 4cd13c21b207 ("softirq: Let ksoftirqd do its
> job").
>
> I wonder what makes AF_XDP different, or if the problem has come back.
>

I would say this is not the same issue. The problem is that the softirq
is busy dropping packets since the AF_XDP Rx ring is full. So, the
cycles *are* split 50/50, which is not what we want in this case. :-)

This issue is more of an "Intel AF_XDP ZC drivers do stupid work" issue
than a fairness one. If the Rx ring is full, there is really no use in
letting the NAPI loop continue.

Would you agree, or am I rambling? :-P

Björn
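[In driver terms, the early-exit idea reads roughly like the sketch below; a hedged illustration, not the actual series, with all ring and helper names hypothetical.]

static int my_clean_rx_irq_zc(struct my_rx_ring *rx_ring, int budget)
{
	int done = 0;

	while (done < budget && my_rx_desc_pending(rx_ring)) {
		/* The new part: if the AF_XDP Rx ring has no room,
		 * stop polling instead of dropping the rest. */
		if (my_xsk_rx_ring_full(rx_ring))
			break;

		my_process_one_desc(rx_ring);	/* redirect to the XSK ring */
		done++;
	}

	/* Returning less than budget lets NAPI complete; whether the
	 * IRQ should then be re-armed is what this thread debates. */
	return done;
}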
On Fri, 4 Sep 2020 16:32:56 +0200 Björn Töpel wrote:
> On 2020-09-04 16:27, Jesper Dangaard Brouer wrote:
> > On Fri, 4 Sep 2020 15:53:25 +0200
> > Björn Töpel <bjorn.topel@gmail.com> wrote:
> >
> >> On my machine the "one core scenario Rx drop" performance went from
> >> ~65Kpps to 21Mpps. In other words, from "not usable" to
> >> "usable". YMMV.
> >
> > We have observed this kind of dropping off an edge before with softirq
> > (when the userspace process runs on the same RX-CPU), but I thought that
> > Eric Dumazet solved it in 4cd13c21b207 ("softirq: Let ksoftirqd do its
> > job").
> >
> > I wonder what makes AF_XDP different, or if the problem has come back.
>
> I would say this is not the same issue. The problem is that the softirq
> is busy dropping packets since the AF_XDP Rx ring is full. So, the
> cycles *are* split 50/50, which is not what we want in this case. :-)
>
> This issue is more of an "Intel AF_XDP ZC drivers do stupid work" issue
> than a fairness one. If the Rx ring is full, there is really no use in
> letting the NAPI loop continue.
>
> Would you agree, or am I rambling? :-P

I wonder if ksoftirqd never kicks in because we are able to discard the
entire ring before we run out of softirq "slice".

I've been pondering the exact problem you're solving with Maciej
recently: the efficiency of AF_XDP on one core with NAPI processing.

Your solution (even though it admittedly helps, and is quite simple)
still has the application potentially not able to process packets until
the queue fills up. This will be bad for latency.

Why don't we move closer to application polling? Never re-arm the NAPI
after RX, let the application ask for packets, re-arm if 0 polled.
You'd get max batching, min latency.

Who's the rambling one now? :-D
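[Sketched in driver terms, the "application polling" idea could look roughly like this; a hedged illustration where all names are made up, and the re-arm policy is the interesting part.]

static int my_xsk_napi_poll(struct napi_struct *napi, int budget)
{
	struct my_rx_ring *ring = container_of(napi, struct my_rx_ring, napi);
	int done = my_clean_rx(ring, budget);	/* hypothetical RX pass */

	if (done == 0) {
		/* Nothing polled: go idle and re-arm the interrupt so a
		 * new burst wakes the queue up again. */
		napi_complete_done(napi, 0);
		my_irq_enable(ring);
		return 0;
	}

	/* Packets were delivered to the AF_XDP ring: finish the softirq
	 * pass but leave the IRQ masked. The application is expected to
	 * ask for packets via a syscall, which re-schedules this NAPI;
	 * that is the "max batching, min latency" part. */
	if (done < budget)
		napi_complete_done(napi, done);
	return done;
}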
On 2020-09-05 01:58, Jakub Kicinski wrote:
> On Fri, 4 Sep 2020 16:32:56 +0200 Björn Töpel wrote:
>> On 2020-09-04 16:27, Jesper Dangaard Brouer wrote:
>>> On Fri, 4 Sep 2020 15:53:25 +0200
>>> Björn Töpel <bjorn.topel@gmail.com> wrote:
>>>
>>>> On my machine the "one core scenario Rx drop" performance went from
>>>> ~65Kpps to 21Mpps. In other words, from "not usable" to
>>>> "usable". YMMV.
>>>
>>> We have observed this kind of dropping off an edge before with softirq
>>> (when the userspace process runs on the same RX-CPU), but I thought that
>>> Eric Dumazet solved it in 4cd13c21b207 ("softirq: Let ksoftirqd do its
>>> job").
>>>
>>> I wonder what makes AF_XDP different, or if the problem has come back.
>>>
>>
>> I would say this is not the same issue. The problem is that the softirq
>> is busy dropping packets since the AF_XDP Rx ring is full. So, the
>> cycles *are* split 50/50, which is not what we want in this case. :-)
>>
>> This issue is more of an "Intel AF_XDP ZC drivers do stupid work" issue
>> than a fairness one. If the Rx ring is full, there is really no use in
>> letting the NAPI loop continue.
>>
>> Would you agree, or am I rambling? :-P
>
> I wonder if ksoftirqd never kicks in because we are able to discard
> the entire ring before we run out of softirq "slice".
>

This is exactly what's happening, so we're entering a "busy poll like"
behavior: syscall, return from syscall, softirq/napi, userland.

> I've been pondering the exact problem you're solving with Maciej
> recently: the efficiency of AF_XDP on one core with NAPI processing.
>
> Your solution (even though it admittedly helps, and is quite simple)
> still has the application potentially not able to process packets
> until the queue fills up. This will be bad for latency.
>
> Why don't we move closer to application polling? Never re-arm the NAPI
> after RX, let the application ask for packets, re-arm if 0 polled.
> You'd get max batching, min latency.
>
> Who's the rambling one now? :-D
>

:-D No, these are all very good ideas! We actually experimented with
this in the busy-poll series a while back -- NAPI busy-polling does
exactly "application polling".

However, I wonder if the busy-polling would have better performance
than the scenario above (i.e. when the ksoftirqd never kicks in):
executing the NAPI poll *explicitly* in the syscall, versus implicitly
from the softirq.

Hmm, thinking out loud here. A simple(r) patch enabling busy poll:
export the napi_id to the AF_XDP socket (xdp->rxq->napi_id to
sk->sk_napi_id), and do the sk_busy_poll_loop() in sendmsg.

Or did you have something completely different in mind?

As for this patch set, I think it would make sense to pull it in since
it makes the single-core scenario *much* better, and it is pretty
simple. Then do the application polling as another potential
improvement series.

Thoughts? Thanks a lot for the feedback!

Björn
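[A hedged sketch of that simple(r) patch, assuming the napi_id has already been propagated from xdp->rxq to the socket at bind time. sk_busy_loop() is the existing in-kernel busy-poll entry point; whether it can be reused here unchanged is exactly the open question.]

#include <net/busy_poll.h>

static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
{
	struct sock *sk = sock->sk;

	/* Drive the NAPI context from the syscall, "application
	 * polling" style, before doing the actual TX work. */
	if (sk_can_busy_loop(sk))
		sk_busy_loop(sk, 1 /* nonblock */);

	return xsk_generic_xmit(sk);	/* the existing TX path, simplified */
}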
On Mon, 7 Sep 2020 15:37:40 +0200 Björn Töpel wrote:
> > I've been pondering the exact problem you're solving with Maciej
> > recently: the efficiency of AF_XDP on one core with NAPI processing.
> >
> > Your solution (even though it admittedly helps, and is quite simple)
> > still has the application potentially not able to process packets
> > until the queue fills up. This will be bad for latency.
> >
> > Why don't we move closer to application polling? Never re-arm the NAPI
> > after RX, let the application ask for packets, re-arm if 0 polled.
> > You'd get max batching, min latency.
> >
> > Who's the rambling one now? :-D
> >
>
> :-D No, these are all very good ideas! We actually experimented with
> this in the busy-poll series a while back -- NAPI busy-polling does
> exactly "application polling".
>
> However, I wonder if the busy-polling would have better performance
> than the scenario above (i.e. when the ksoftirqd never kicks in):
> executing the NAPI poll *explicitly* in the syscall, versus implicitly
> from the softirq.
>
> Hmm, thinking out loud here. A simple(r) patch enabling busy poll:
> export the napi_id to the AF_XDP socket (xdp->rxq->napi_id to
> sk->sk_napi_id), and do the sk_busy_poll_loop() in sendmsg.
>
> Or did you have something completely different in mind?

My understanding is that busy-polling is allowing the application to
pick up packets from the ring before the IRQ fires.

What we're more concerned about is the IRQ firing in the first place.

 application:  busy     | needs packets | idle
 -------------+---------+---------------+----------------------
 standard     |         | polls NAPI    | keep polling? sleep?
 busy poll    | IRQ on  | IRQ off       | IRQ off  IRQ on
 -------------+---------+---------------+----------------------
              |         | polls once    |
 AF_XDP       | IRQ off | IRQ off       | IRQ on

So busy polling is pretty orthogonal. It only applies to the
"application needs packets" time. What we'd need is for the application
to be able to suppress NAPI polls, promising the kernel that it will
busy poll when appropriate.

> As for this patch set, I think it would make sense to pull it in since
> it makes the single-core scenario *much* better, and it is pretty
> simple. Then do the application polling as another potential
> improvement series.

Up to you, it's extra code in the driver so mostly your code to
maintain.

I think that if we implement what I described above - everyone will
use that on a single core setup, so this set would be dead code
(assuming the RQ is sized appropriately). But again, your call :)
On 2020-09-07 20:40, Jakub Kicinski wrote:
> On Mon, 7 Sep 2020 15:37:40 +0200 Björn Töpel wrote:
>> > I've been pondering the exact problem you're solving with Maciej
>> > recently: the efficiency of AF_XDP on one core with NAPI processing.
>> >
>> > Your solution (even though it admittedly helps, and is quite simple)
>> > still has the application potentially not able to process packets
>> > until the queue fills up. This will be bad for latency.
>> >
>> > Why don't we move closer to application polling? Never re-arm the NAPI
>> > after RX, let the application ask for packets, re-arm if 0 polled.
>> > You'd get max batching, min latency.
>> >
>> > Who's the rambling one now? :-D
>> >
>>
>> :-D No, these are all very good ideas! We actually experimented with
>> this in the busy-poll series a while back -- NAPI busy-polling does
>> exactly "application polling".
>>
>> However, I wonder if the busy-polling would have better performance
>> than the scenario above (i.e. when the ksoftirqd never kicks in):
>> executing the NAPI poll *explicitly* in the syscall, versus implicitly
>> from the softirq.
>>
>> Hmm, thinking out loud here. A simple(r) patch enabling busy poll:
>> export the napi_id to the AF_XDP socket (xdp->rxq->napi_id to
>> sk->sk_napi_id), and do the sk_busy_poll_loop() in sendmsg.
>>
>> Or did you have something completely different in mind?
>
> My understanding is that busy-polling is allowing the application to
> pick up packets from the ring before the IRQ fires.
>
> What we're more concerned about is the IRQ firing in the first place.
>
>  application:  busy     | needs packets | idle
>  -------------+---------+---------------+----------------------
>  standard     |         | polls NAPI    | keep polling? sleep?
>  busy poll    | IRQ on  | IRQ off       | IRQ off  IRQ on
>  -------------+---------+---------------+----------------------
>               |         | polls once    |
>  AF_XDP       | IRQ off | IRQ off       | IRQ on
>
> So busy polling is pretty orthogonal. It only applies to the
> "application needs packets" time. What we'd need is for the application
> to be able to suppress NAPI polls, promising the kernel that it will
> busy poll when appropriate.
>

Ah, nice write-up! Thanks!

A strict busy-poll mechanism, not the opportunistic (existing) NAPI
busy-poll. This would be a new kind of mechanism, and a very much
welcome one in AF_XDP-land. More below.

>> As for this patch set, I think it would make sense to pull it in since
>> it makes the single-core scenario *much* better, and it is pretty
>> simple. Then do the application polling as another potential
>> improvement series.
>
> Up to you, it's extra code in the driver so mostly your code to
> maintain.
>
> I think that if we implement what I described above - everyone will
> use that on a single core setup, so this set would be dead code
> (assuming the RQ is sized appropriately). But again, your call :)
>

Now, I agree that the busy-poll you describe above would be the best
option, but from my perspective it's a much larger set that involves
experimenting. I will explore that, but I still think this series should
go in sooner to make the single core scenario usable *today*.

Ok, back to the busy-poll ideas. I'll call your idea "strict busy-poll",
i.e. the NAPI loop is *only* driven by userland, and interrupts stay
disabled. "Syscall driven poll-mode driver". :-)

On the driver side (again, only talking Intel here, since that's what I
know the details of), the NAPI context would only cover AF_XDP queues,
so that other queues are not starved.

Any ideas how strict busy-poll would look, API/implementation-wise? An
option only for AF_XDP sockets? Would this make sense for regular
sockets? If so, maybe extend the existing NAPI busy-poll with a
"strict" mode?

I'll start playing around a bit, but again, I think this simple series
should go in just to make AF_XDP single core usable *today*.

Thanks!
Björn
On 2020-09-04 16:59, Björn Töpel wrote:
> On 2020-09-04 15:53, Björn Töpel wrote:
>> This series addresses a problem that arises when AF_XDP zero-copy is
>> enabled, and the kernel softirq Rx processing and userland process
>> is running on the same core.
>>
> [...]
>
> @Maxim I'm not well versed in Mellanox drivers. Would this be relevant
> to mlx5 as well?

Thanks for letting me know about this series! So the basic idea is to
stop processing hardware completions if the RX ring gets full, because
the application didn't have a chance to run?

Yes, I think it's also relevant to mlx5; the issue is not
driver-specific, and a similar fix is applicable. However, it may lead
to completion queue overflows - some analysis is needed to understand
what happens then and how to handle it.

Regarding the feature, I think it should be opt-in (disabled by
default), because old applications may not wake up RX after they
process packets in the RX ring. Is it required to change xdpsock
accordingly?

Also, when need_wakeup is disabled, your driver implementation seems to
quit NAPI anyway, but that shouldn't happen, because no one will send a
wakeup.

Waiting until the RX ring fills up, then passing control to the
application and waiting until the hardware completion queue fills up,
and so on, increases latency - the busy polling approach sounds more
legit here.

The behavior may be different depending on the driver implementation:

1. If you arm the completion queue and leave interrupts enabled on
early exit too, the application will soon be interrupted anyway and
won't have much time to process many packets, leading to an app <->
NAPI ping-pong one packet at a time, making NAPI inefficient.

2. If you don't arm the completion queue on early exit and wait for the
explicit wakeup from the application, it will easily overflow the
hardware completion queue, because we don't have a symmetric mechanism
to stop the application on imminent hardware queue overflow. It doesn't
feel correct and may be trickier to handle: if the application is too
slow, such drops should happen on driver/kernel level, not in hardware.

Which behavior is used in your drivers? Or am I missing some more
options?

BTW, it would be better to pass control to the application before the
first dropped packet, not after it has been dropped.

Some workloads different from pure AF_XDP, for example, 50/50 AF_XDP
and XDP_TX, may suffer from such behavior, so it's another point in
favor of a knob on the application layer to enable/disable it.

From the driver API perspective, I would prefer to see a simpler API if
possible. The current API exposes things that the driver shouldn't know
(BPF map type) and requires XSK-specific handling. It would be better
if some specific error code returned from xdp_do_redirect was reserved
to mean "exit NAPI early if you support it". This way we wouldn't need
two new helpers and two xdp_do_redirect functions, and this approach
would be extensible to other non-XSK use cases without further changes
in the driver; also, the logic to opt in to the feature could be put
inside the kernel.

Thanks,
Max

>
> Cheers,
> Björn
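[In driver code, that simpler contract could reduce to something like the fragment below; the choice of -ENOBUFS as the reserved "ring full, stop early" error code is purely an assumption for illustration, as are the MY_XDP_* driver-local result codes.]

	err = xdp_do_redirect(rx_ring->netdev, &xdp, xdp_prog);
	if (!err) {
		result = MY_XDP_REDIR;
	} else if (err == -ENOBUFS) {
		/* Destination (e.g. the XSK Rx ring) is full: end this
		 * NAPI pass early instead of dropping the rest. */
		result = MY_XDP_EXIT_NAPI;
	} else {
		result = MY_XDP_CONSUMED;	/* plain failure: drop */
	}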
On Tue, Sep 8, 2020 at 12:33 PM Maxim Mikityanskiy <maximmi@nvidia.com> wrote:
>
> On 2020-09-04 16:59, Björn Töpel wrote:
> > On 2020-09-04 15:53, Björn Töpel wrote:
> >> This series addresses a problem that arises when AF_XDP zero-copy is
> >> enabled, and the kernel softirq Rx processing and userland process
> >> is running on the same core.
> >>
> > [...]
> >
> > @Maxim I'm not well versed in Mellanox drivers. Would this be relevant
> > to mlx5 as well?
>
> Thanks for letting me know about this series! So the basic idea is to
> stop processing hardware completions if the RX ring gets full, because
> the application didn't have a chance to run?
>
> Yes, I think it's also relevant to mlx5; the issue is not
> driver-specific, and a similar fix is applicable. However, it may lead
> to completion queue overflows - some analysis is needed to understand
> what happens then and how to handle it.
>
> Regarding the feature, I think it should be opt-in (disabled by
> default), because old applications may not wake up RX after they
> process packets in the RX ring.

How about need_wakeup enable/disable at bind time being that opt-in,
instead of a new option? It is off by default, and when it is off, the
driver busy-spins on the Rx ring until it can put an entry there. It
will not yield to the application by returning something less than
budget. Applications need not check the need_wakeup flag.

If need_wakeup is enabled by the user, the contract is that user space
needs to check the need_wakeup flag and act on it (see the sketch after
this message). If it does not, then that is a programming error, and
the flag can be set for any unspecified reason. There is no reason to
modify the application, if it checks need_wakeup. Whether this patch
behaves like that I have not checked.

Good points in the rest of the mail, that I think should be addressed.

/Magnus

> Is it required to change xdpsock accordingly?
>
> Also, when need_wakeup is disabled, your driver implementation seems to
> quit NAPI anyway, but that shouldn't happen, because no one will send a
> wakeup.
>
> Waiting until the RX ring fills up, then passing control to the
> application and waiting until the hardware completion queue fills up,
> and so on, increases latency - the busy polling approach sounds more
> legit here.
>
> The behavior may be different depending on the driver implementation:
>
> 1. If you arm the completion queue and leave interrupts enabled on
> early exit too, the application will soon be interrupted anyway and
> won't have much time to process many packets, leading to an app <->
> NAPI ping-pong one packet at a time, making NAPI inefficient.
>
> 2. If you don't arm the completion queue on early exit and wait for the
> explicit wakeup from the application, it will easily overflow the
> hardware completion queue, because we don't have a symmetric mechanism
> to stop the application on imminent hardware queue overflow. It doesn't
> feel correct and may be trickier to handle: if the application is too
> slow, such drops should happen on driver/kernel level, not in hardware.
>
> Which behavior is used in your drivers? Or am I missing some more
> options?
>
> BTW, it would be better to pass control to the application before the
> first dropped packet, not after it has been dropped.
>
> Some workloads different from pure AF_XDP, for example, 50/50 AF_XDP
> and XDP_TX, may suffer from such behavior, so it's another point in
> favor of a knob on the application layer to enable/disable it.
>
> From the driver API perspective, I would prefer to see a simpler API if
> possible.
> The current API exposes things that the driver shouldn't know
> (BPF map type) and requires XSK-specific handling. It would be better
> if some specific error code returned from xdp_do_redirect was reserved
> to mean "exit NAPI early if you support it". This way we wouldn't need
> two new helpers and two xdp_do_redirect functions, and this approach
> would be extensible to other non-XSK use cases without further changes
> in the driver; also, the logic to opt in to the feature could be put
> inside the kernel.
>
> Thanks,
> Max
>
> >
> > Cheers,
> > Björn
>
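[A minimal userspace sketch of the need_wakeup contract Magnus describes above, using the libbpf xsk.h helpers; the surrounding application logic is omitted.]

#include <bpf/xsk.h>
#include <sys/socket.h>

/* Kick the kernel Rx path only when it asked for it via the
 * need_wakeup flag on the fill ring; an empty recvfrom() is the
 * conventional wakeup call. */
static void kick_rx_if_needed(struct xsk_socket *xsk,
			      struct xsk_ring_prod *fill)
{
	if (xsk_ring_prod__needs_wakeup(fill))
		recvfrom(xsk_socket__fd(xsk), NULL, 0, MSG_DONTWAIT,
			 NULL, NULL);
}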
On 2020-09-08 13:37, Magnus Karlsson wrote:
> On Tue, Sep 8, 2020 at 12:33 PM Maxim Mikityanskiy <maximmi@nvidia.com> wrote:
>>
>> On 2020-09-04 16:59, Björn Töpel wrote:
>>> On 2020-09-04 15:53, Björn Töpel wrote:
>>>> This series addresses a problem that arises when AF_XDP zero-copy is
>>>> enabled, and the kernel softirq Rx processing and userland process
>>>> is running on the same core.
>>>>
>>> [...]
>>>
>>> @Maxim I'm not well versed in Mellanox drivers. Would this be relevant
>>> to mlx5 as well?
>>
>> Thanks for letting me know about this series! So the basic idea is to
>> stop processing hardware completions if the RX ring gets full, because
>> the application didn't have a chance to run?
>>
>> Yes, I think it's also relevant to mlx5; the issue is not
>> driver-specific, and a similar fix is applicable. However, it may lead
>> to completion queue overflows - some analysis is needed to understand
>> what happens then and how to handle it.
>>
>> Regarding the feature, I think it should be opt-in (disabled by
>> default), because old applications may not wake up RX after they
>> process packets in the RX ring.

First of all: Max, thanks for some really good input, as usual!

> How about need_wakeup enable/disable at bind time being that opt-in,
> instead of a new option? It is off by default, and when it is off, the
> driver busy-spins on the Rx ring until it can put an entry there. It
> will not yield to the application by returning something less than
> budget. Applications need not check the need_wakeup flag.
>
> If need_wakeup is enabled by the user, the contract is that user space
> needs to check the need_wakeup flag and act on it. If it does not, then
> that is a programming error, and the flag can be set for any
> unspecified reason. There is no reason to modify the application, if it
> checks need_wakeup. Whether this patch behaves like that I have not
> checked.

It does not behave exactly like this. If need_wakeup is enabled, the
napi loop exits, but if not, the napi is rearmed -- so we'll get a less
efficient loop. I'll need to address this. I'm leaning towards a more
explicit opt-in, like Max suggested.

As Max pointed out, AF_XDP/XDP_TX is actually a non-insane(!) way of
using AF_XDP zero-copy, which will suffer from the early exit.

> Good points in the rest of the mail, that I think should be addressed.

Yeah, I agree. I will take a step back and rethink. I'll experiment
with the busy-polling suggested by Jakub, and also an opt-in early
exit.

Thanks for all the suggestions folks!

Cheers,
Björn

> /Magnus
>
>> Is it required to change xdpsock accordingly?
>>
>> Also, when need_wakeup is disabled, your driver implementation seems to
>> quit NAPI anyway, but that shouldn't happen, because no one will send a
>> wakeup.
>>
>> Waiting until the RX ring fills up, then passing control to the
>> application and waiting until the hardware completion queue fills up,
>> and so on, increases latency - the busy polling approach sounds more
>> legit here.
>>
>> The behavior may be different depending on the driver implementation:
>>
>> 1. If you arm the completion queue and leave interrupts enabled on
>> early exit too, the application will soon be interrupted anyway and
>> won't have much time to process many packets, leading to an app <->
>> NAPI ping-pong one packet at a time, making NAPI inefficient.
>>
>> 2. If you don't arm the completion queue on early exit and wait for the
>> explicit wakeup from the application, it will easily overflow the
>> hardware completion queue, because we don't have a symmetric mechanism
>> to stop the application on imminent hardware queue overflow. It doesn't
>> feel correct and may be trickier to handle: if the application is too
>> slow, such drops should happen on driver/kernel level, not in hardware.
>>
>> Which behavior is used in your drivers? Or am I missing some more
>> options?
>>
>> BTW, it would be better to pass control to the application before the
>> first dropped packet, not after it has been dropped.
>>
>> Some workloads different from pure AF_XDP, for example, 50/50 AF_XDP
>> and XDP_TX, may suffer from such behavior, so it's another point in
>> favor of a knob on the application layer to enable/disable it.
>>
>> From the driver API perspective, I would prefer to see a simpler API if
>> possible. The current API exposes things that the driver shouldn't know
>> (BPF map type) and requires XSK-specific handling. It would be better
>> if some specific error code returned from xdp_do_redirect was reserved
>> to mean "exit NAPI early if you support it". This way we wouldn't need
>> two new helpers and two xdp_do_redirect functions, and this approach
>> would be extensible to other non-XSK use cases without further changes
>> in the driver; also, the logic to opt in to the feature could be put
>> inside the kernel.
>>
>> Thanks,
>> Max
>>
>>>
>>> Cheers,
>>> Björn
>>
On Tue, 8 Sep 2020 08:58:30 +0200 Björn Töpel wrote:
> >> As for this patch set, I think it would make sense to pull it in since
> >> it makes the single-core scenario *much* better, and it is pretty
> >> simple. Then do the application polling as another potential
> >> improvement series.
> >
> > Up to you, it's extra code in the driver so mostly your code to
> > maintain.
> >
> > I think that if we implement what I described above - everyone will
> > use that on a single core setup, so this set would be dead code
> > (assuming the RQ is sized appropriately). But again, your call :)
>
> Now, I agree that the busy-poll you describe above would be the best
> option, but from my perspective it's a much larger set that involves
> experimenting. I will explore that, but I still think this series should
> go in sooner to make the single core scenario usable *today*.
>
> Ok, back to the busy-poll ideas. I'll call your idea "strict busy-poll",
> i.e. the NAPI loop is *only* driven by userland, and interrupts stay
> disabled. "Syscall driven poll-mode driver". :-)
>
> On the driver side (again, only talking Intel here, since that's what I
> know the details of), the NAPI context would only cover AF_XDP queues,
> so that other queues are not starved.
>
> Any ideas how strict busy-poll would look, API/implementation-wise? An
> option only for AF_XDP sockets? Would this make sense for regular
> sockets? If so, maybe extend the existing NAPI busy-poll with a
> "strict" mode?

For AF_XDP and other sockets I think it should be quite
straightforward. For AF_XDP, just implement the current busy poll.

Then, for all socket types, add a new sockopt which sets a "timeout" on
how long the IRQs can be suppressed for (we don't want an application
crash or hang to knock the system off the network), or which just
enables the feature, with the timeout coming from a sysctl.

Then make sure that at the end of polling the napi doesn't get
scheduled, and set some bit which will prevent napi_schedule_prep()
from letting normal IRQ processing schedule it, too. Set a timer for
the timeout handling to undo all this.

What I haven't figured out in my head is how/if this relates to the
ongoing wq/threaded NAPI polling work 🤔 but that shouldn't stop you.

> I'll start playing around a bit, but again, I think this simple series
> should go in just to make AF_XDP single core usable *today*.

No objection from me.
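[From the application side, the opt-in Jakub sketches could be as small as a single sockopt. SO_PREFER_BUSY_POLL here is a hypothetical name; no option with these semantics existed at the time of this thread.]

#include <sys/socket.h>

/* Promise the kernel that this socket will busy poll, so NAPI/IRQ
 * processing may be suppressed for it (hypothetical option name). */
static int enable_irq_suppression(int xsk_fd)
{
	int one = 1;

	return setsockopt(xsk_fd, SOL_SOCKET, SO_PREFER_BUSY_POLL,
			  &one, sizeof(one));
}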
On 2020-09-08 19:24, Jakub Kicinski wrote:
>> I'll start playing around a bit, but again, I think this simple series
>> should go in just to make AF_XDP single core usable *today*.
>
> No objection from me.

Thanks Jakub, but as you (probably) noticed in the other thread, Maxim
had some valid concerns. Let's drop this for now, and I'll get back
after some experimentation/hacking.

Again, thanks for the ideas! Very much appreciated!

Björn
On Tue, 8 Sep 2020 20:28:14 +0200 Björn Töpel wrote:
> On 2020-09-08 19:24, Jakub Kicinski wrote:
> >> I'll start playing around a bit, but again, I think this simple series
> >> should go in just to make AF_XDP single core usable *today*.
> > No objection from me.
>
> Thanks Jakub, but as you (probably) noticed in the other thread, Maxim
> had some valid concerns. Let's drop this for now, and I'll get back
> after some experimentation/hacking.

Yeah, I sort of assumed you got the wake-up problem down :S
If it gets complicated it may not be worth pursuing this optimization.
On Tue, 8 Sep 2020 13:32:01 +0300 Maxim Mikityanskiy <maximmi@nvidia.com> wrote:

> From the driver API perspective, I would prefer to see a simpler API if
> possible. The current API exposes things that the driver shouldn't know
> (BPF map type) and requires XSK-specific handling. It would be better
> if some specific error code returned from xdp_do_redirect was reserved
> to mean "exit NAPI early if you support it". This way we wouldn't need
> two new helpers and two xdp_do_redirect functions, and this approach
> would be extensible to other non-XSK use cases without further changes
> in the driver; also, the logic to opt in to the feature could be put
> inside the kernel.

I agree.