Message ID: 1395343961-4744-1-git-send-email-zoltan.kiss@citrix.com
State: Changes Requested, archived
Delegated to: David Miller
On Thu, 2014-03-20 at 19:32 +0000, Zoltan Kiss wrote:
> Since the early days TX stops if there isn't enough free pending slots to
> consume a maximum sized (slot-wise) packet. Probably the reason for that is to
> avoid the case when we don't have enough free pending slot in the ring to finish
> the packet. But if we make sure that the pending ring has the same size as the
> shared ring, that shouldn't really happen. The frontend can only post packets
> which fit the to the free space of the shared ring. If it doesn't, the frontend
> has to stop, as it can only increase the req_prod when the whole packet fits
> onto the ring.

My only real concern here is that by removing these checks we are
introducing a way for a malicious or buggy guest to trigger misbehaviour
in the backend, leading to e.g. a DoS.

Should we add some sanity checks which shut down the ring if
something like this occurs? i.e. if we come to consume a packet and
there is insufficient space on the pending ring, we kill the vif.

What's the invariant we are relying on here, is it:
req_prod >= req_cons >= pending_prod >= pending_cons >= rsp_prod >= rsp_cons
?

> This patch avoid using this checking, makes sure the 2 ring has the same size,
> and remove a checking from the callback. As now we don't stop the NAPI instance
> on this condition, we don't have to wake it up if we free pending slots up.
>
> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> ---
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index bef37be..a800a8e 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -81,7 +81,7 @@ struct xenvif_rx_meta {
>
>  #define MAX_BUFFER_OFFSET PAGE_SIZE
>
> -#define MAX_PENDING_REQS 256
> +#define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE

XEN_NETIF_TX_RING_SIZE is already == 256, right? (Just want to make sure
this is semantically no change.)

>  /* It's possible for an skb to have a maximal number of frags
>   * but still be less than MAX_BUFFER_OFFSET in size. Thus the
> @@ -253,12 +253,6 @@ static inline pending_ring_idx_t nr_pending_reqs(struct xenvif *vif)
>  		vif->pending_prod + vif->pending_cons;
>  }
>
> -static inline bool xenvif_tx_pending_slots_available(struct xenvif *vif)
> -{
> -	return nr_pending_reqs(vif) + XEN_NETBK_LEGACY_SLOTS_MAX
> -		< MAX_PENDING_REQS;
> -}
> -
>  /* Callback from stack when TX packet can be released */
>  void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success);
>
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> index 83a71ac..7bb7734 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -88,8 +88,7 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
>  		local_irq_save(flags);
>
>  		RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
> -		if (!(more_to_do &&
> -		      xenvif_tx_pending_slots_available(vif)))
> +		if (!more_to_do)
>  			__napi_complete(napi);
>
>  		local_irq_restore(flags);
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index bc94320..5df8d63 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -1175,8 +1175,7 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
>  	struct sk_buff *skb;
>  	int ret;
>
> -	while (xenvif_tx_pending_slots_available(vif) &&
> -	       (skb_queue_len(&vif->tx_queue) < budget)) {
> +	while (skb_queue_len(&vif->tx_queue) < budget) {
>  		struct xen_netif_tx_request txreq;
>  		struct xen_netif_tx_request txfrags[XEN_NETBK_LEGACY_SLOTS_MAX];
>  		struct xen_netif_extra_info extras[XEN_NETIF_EXTRA_TYPE_MAX-1];
> @@ -1516,13 +1515,6 @@ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
>  	wake_up(&vif->dealloc_wq);
>  	spin_unlock_irqrestore(&vif->callback_lock, flags);
>
> -	if (RING_HAS_UNCONSUMED_REQUESTS(&vif->tx) &&
> -	    xenvif_tx_pending_slots_available(vif)) {
> -		local_bh_disable();
> -		napi_schedule(&vif->napi);
> -		local_bh_enable();
> -	}
> -
>  	if (likely(zerocopy_success))
>  		vif->tx_zerocopy_success++;
>  	else
> @@ -1714,8 +1706,7 @@ static inline int rx_work_todo(struct xenvif *vif)
>  static inline int tx_work_todo(struct xenvif *vif)
>  {
>
> -	if (likely(RING_HAS_UNCONSUMED_REQUESTS(&vif->tx)) &&
> -	    xenvif_tx_pending_slots_available(vif))
> +	if (likely(RING_HAS_UNCONSUMED_REQUESTS(&vif->tx)))
>  		return 1;
>
>  	return 0;
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 21/03/14 09:36, Ian Campbell wrote:
> On Thu, 2014-03-20 at 19:32 +0000, Zoltan Kiss wrote:
>> Since the early days TX stops if there isn't enough free pending slots to
>> consume a maximum sized (slot-wise) packet. Probably the reason for that is to
>> avoid the case when we don't have enough free pending slot in the ring to finish
>> the packet. But if we make sure that the pending ring has the same size as the
>> shared ring, that shouldn't really happen. The frontend can only post packets
>> which fit the to the free space of the shared ring. If it doesn't, the frontend
>> has to stop, as it can only increase the req_prod when the whole packet fits
>> onto the ring.
>
> My only real concern here is that by removing these checks we are
> introducing a way for a malicious or buggy guest to trigger misbehaviour
> in the backend, leading to e.g. a DoS.
>
> Should we need to some sanity checks which shutdown the ring if
> something like this occurs? i.e. if we come to consume a packet and
> there is insufficient space on the pending ring we kill the vif.

The backend doesn't see what the guest does with the responses, and that's
OK; it's the guest's problem. After netback has increased rsp_prod_pvt it
doesn't really care. But as soon as the guest starts placing new requests
after rsp_prod_pvt, or just increases req_prod so that
req_prod - rsp_prod_pvt > XEN_NETIF_TX_RING_SIZE, it becomes an issue.

So far xenvif_tx_pending_slots_available indirectly saved us from consuming
requests which would overwrite still-pending requests, though the guest
could still overwrite our responses. But again, that's still the guest's
problem: we had the original request saved in the pending ring data. If the
guest went too far, build_gops killed the vif when
req_prod - req_cons > XEN_NETIF_TX_RING_SIZE.

"Indirectly" above means it only happened because the pending ring has the
same size, and the used slots have a 1-to-1 mapping to the slots between
rsp_prod_pvt and req_cons. So xenvif_tx_pending_slots_available also means:

(req_cons - rsp_prod_pvt) + XEN_NETBK_LEGACY_SLOTS_MAX < XEN_NETIF_TX_RING_SIZE

(does this look familiar? :)

But consuming overrunning requests after rsp_prod_pvt is a problem:
- the NAPI instance races with the dealloc thread over those slots: the
  first reads them as requests, the second writes them as responses
- the NAPI instance overwrites used pending slots as well, so skb frag
  release goes wrong, etc.

Fortunately the current RING_HAS_UNCONSUMED_REQUESTS saves us here; let me
explain it through an example:

rsp_prod_pvt = 0
req_cons = 253
req_prod = 258

Therefore:
req = 5
rsp = 3

So in xenvif_tx_build_gops work_to_do will be 3, and in
xenvif_count_requests we bail out when we see that the packet actually
exceeds that. So in the end we are safe here, but we shouldn't change that
macro I suggested refactoring :)

Zoli

>
> What's the invariant we are relying on here, is it:
> req_prod >= req_cons >= pending_prod >= pending_cons >= rsp_prod >= rsp_cons
> ?
>
>> This patch avoid using this checking, makes sure the 2 ring has the same size,
>> and remove a checking from the callback. As now we don't stop the NAPI instance
>> on this condition, we don't have to wake it up if we free pending slots up.
>>
>> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
>> ---
>> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
>> index bef37be..a800a8e 100644
>> --- a/drivers/net/xen-netback/common.h
>> +++ b/drivers/net/xen-netback/common.h
>> @@ -81,7 +81,7 @@ struct xenvif_rx_meta {
>>
>>  #define MAX_BUFFER_OFFSET PAGE_SIZE
>>
>> -#define MAX_PENDING_REQS 256
>> +#define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE
>
> XEN_NETIF_TX_RING_SIZE is already == 256, right? (Just want to make sure
> this is semantically no change).

Yes, it is __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index bef37be..a800a8e 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -81,7 +81,7 @@ struct xenvif_rx_meta {
 
 #define MAX_BUFFER_OFFSET PAGE_SIZE
 
-#define MAX_PENDING_REQS 256
+#define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE
 
 /* It's possible for an skb to have a maximal number of frags
  * but still be less than MAX_BUFFER_OFFSET in size. Thus the
@@ -253,12 +253,6 @@ static inline pending_ring_idx_t nr_pending_reqs(struct xenvif *vif)
 		vif->pending_prod + vif->pending_cons;
 }
 
-static inline bool xenvif_tx_pending_slots_available(struct xenvif *vif)
-{
-	return nr_pending_reqs(vif) + XEN_NETBK_LEGACY_SLOTS_MAX
-		< MAX_PENDING_REQS;
-}
-
 /* Callback from stack when TX packet can be released */
 void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success);
 
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 83a71ac..7bb7734 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -88,8 +88,7 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
 		local_irq_save(flags);
 
 		RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
-		if (!(more_to_do &&
-		      xenvif_tx_pending_slots_available(vif)))
+		if (!more_to_do)
 			__napi_complete(napi);
 
 		local_irq_restore(flags);
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index bc94320..5df8d63 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1175,8 +1175,7 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 	struct sk_buff *skb;
 	int ret;
 
-	while (xenvif_tx_pending_slots_available(vif) &&
-	       (skb_queue_len(&vif->tx_queue) < budget)) {
+	while (skb_queue_len(&vif->tx_queue) < budget) {
 		struct xen_netif_tx_request txreq;
 		struct xen_netif_tx_request txfrags[XEN_NETBK_LEGACY_SLOTS_MAX];
 		struct xen_netif_extra_info extras[XEN_NETIF_EXTRA_TYPE_MAX-1];
@@ -1516,13 +1515,6 @@ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
 	wake_up(&vif->dealloc_wq);
 	spin_unlock_irqrestore(&vif->callback_lock, flags);
 
-	if (RING_HAS_UNCONSUMED_REQUESTS(&vif->tx) &&
-	    xenvif_tx_pending_slots_available(vif)) {
-		local_bh_disable();
-		napi_schedule(&vif->napi);
-		local_bh_enable();
-	}
-
 	if (likely(zerocopy_success))
 		vif->tx_zerocopy_success++;
 	else
@@ -1714,8 +1706,7 @@ static inline int rx_work_todo(struct xenvif *vif)
 static inline int tx_work_todo(struct xenvif *vif)
 {
 
-	if (likely(RING_HAS_UNCONSUMED_REQUESTS(&vif->tx)) &&
-	    xenvif_tx_pending_slots_available(vif))
+	if (likely(RING_HAS_UNCONSUMED_REQUESTS(&vif->tx)))
 		return 1;
 
 	return 0;
Since the early days TX stops if there aren't enough free pending slots to
consume a maximum-sized (slot-wise) packet. Probably the reason for that is
to avoid the case when we don't have enough free pending slots in the ring
to finish the packet. But if we make sure that the pending ring has the
same size as the shared ring, that shouldn't really happen. The frontend
can only post packets which fit into the free space of the shared ring; if
a packet doesn't fit, the frontend has to stop, as it can only increase
req_prod when the whole packet fits onto the ring.

This patch removes that check, makes sure the two rings have the same size,
and removes a check from the callback. As we no longer stop the NAPI
instance on this condition, we don't have to wake it up when we free
pending slots.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
---