Message ID | 1265655334.31760.9.camel@w-sridhar.beaverton.ibm.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
On Monday 08 February 2010, Sridhar Samudrala wrote:
> I am also seeing this issue with net-next-2.6.
> Basically macvtap_put_user() and macvtap_get_user() call copy_to/from_user
> from within a RCU read-side critical section.
>
> The following patch fixes this issue by releasing the RCU read lock before
> calling these routines, but instead hold a reference to q->sk.
>
> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>

Yes, we need something like this, but we also need to protect the
device from going away. The concept right now is to use file_get_queue
to protect both the macvtap_queue and the macvlan_dev from going
away. The sock_hold will keep the macvtap_queue around, but
as far as I can tell, a user could still destroy the macvlan_dev
using netlink at the same time, which still breaks.

>  /* Get packet from user space buffer */
> -static ssize_t macvtap_get_user(struct macvtap_queue *q,
> +static ssize_t macvtap_get_user(struct macvlan_dev *vlan, struct sock *sk,
>  				const struct iovec *iv, size_t count,
>  				int noblock)
>  {
> @@ -331,10 +331,10 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q,
>  	if (unlikely(len < ETH_HLEN))
>  		return -EINVAL;
>
> -	skb = sock_alloc_send_skb(&q->sk, NET_IP_ALIGN + len, noblock, &err);
> +	skb = sock_alloc_send_skb(sk, NET_IP_ALIGN + len, noblock, &err);
>
>  	if (!skb) {
> -		macvlan_count_rx(q->vlan, 0, false, false);
> +		macvlan_count_rx(vlan, 0, false, false);
>  		return err;
>  	}
>
> @@ -342,14 +342,14 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q,
>  	skb_put(skb, count);
>
>  	if (skb_copy_datagram_from_iovec(skb, 0, iv, 0, len)) {
> -		macvlan_count_rx(q->vlan, 0, false, false);
> +		macvlan_count_rx(vlan, 0, false, false);
>  		kfree_skb(skb);
>  		return -EFAULT;
>  	}
>
>  	skb_set_network_header(skb, ETH_HLEN);
>
> -	macvlan_start_xmit(skb, q->vlan->dev);
> +	macvlan_start_xmit(skb, vlan->dev);
>
>  	return count;
>  }

What are these changes for? The lifetime of q is the same as &q->sk,
so it won't change anything, right? Moving the macvlan_count_rx and
macvlan_start_xmit under the lock should be fine though, but we'd
have to take it twice then for each transmit.

I'd hope that this could get simpler by adding zero-copy transmit,
where we first get_user() the whole buffer and do the rest under
rcu_read_lock_bh().

> @@ -393,15 +399,20 @@ static ssize_t macvtap_aio_read(struct kiocb *iocb, const struct iovec *iv,
>  {
>  	struct file *file = iocb->ki_filp;
>  	struct macvtap_queue *q = macvtap_file_get_queue(file);
> +	struct macvlan_dev *vlan;
> +	struct sock *sk;
>
>  	DECLARE_WAITQUEUE(wait, current);
>  	struct sk_buff *skb;
>  	ssize_t len, ret = 0;
>
> -	if (!q) {
> -		ret = -ENOLINK;
> -		goto out;
> -	}
> +	if (!q)
> +		return -ENOLINK;
> +
> +	vlan = q->vlan;
> +	sk = &q->sk;
> +	sock_hold(sk);
> +	macvtap_file_put_queue();

Here, we probably need to prevent vlan from going away by dev_hold(),
not just sock_hold(). Or is one implied by the other?

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
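To make the question above concrete, here is a minimal userspace sketch of the pattern under discussion: take references to both the socket and the device while the lookup lock is held, drop the lock, do the work that may sleep, then drop the references. Everything in it is a hypothetical stand-in and not kernel API: struct obj models a refcounted object, lock_depth models the RCU read-side section, and the teardown callback models a netlink removal racing with the copy. The counters are plain ints and the model is single-threaded; real code would use the kernel's atomic refcounts. Whether dev_hold() is the right tool here is exactly what the rest of the thread debates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted kernel object
 * (struct sock, struct net_device). Not kernel API. */
struct obj {
	int refcnt;	/* outstanding references */
	bool freed;	/* set once the last reference is dropped */
};

static void obj_hold(struct obj *o)
{
	assert(!o->freed);
	o->refcnt++;
}

static void obj_put(struct obj *o)
{
	assert(o->refcnt > 0);
	if (--o->refcnt == 0)
		o->freed = true;
}

/* Models macvtap_queue: an embedded socket plus a pointer to the device. */
struct queue {
	struct obj sk;
	struct obj *dev;
};

/* Models the nesting depth of the RCU read-side critical section. */
static int lock_depth;

/* Models the owner tearing everything down while a write is in flight. */
static void drop_owner_refs(struct queue *q)
{
	obj_put(&q->sk);
	obj_put(q->dev);
}

/* Returns 0 if the "copy" ran outside the lock with both objects alive. */
static int model_write(struct queue *q, void (*teardown)(struct queue *))
{
	struct obj *sk, *dev;
	int ok;

	lock_depth++;		/* like macvtap_file_get_queue() */
	sk = &q->sk;
	dev = q->dev;
	obj_hold(sk);		/* like sock_hold() */
	obj_hold(dev);		/* like dev_hold(), the step Arnd asks about */
	lock_depth--;		/* like macvtap_file_put_queue() */

	if (teardown)
		teardown(q);	/* concurrent removal during the user copy */

	/* The copy may sleep: no lock held, both objects must still exist. */
	ok = lock_depth == 0 && !sk->freed && !dev->freed;

	obj_put(dev);
	obj_put(sk);
	return ok ? 0 : -1;
}
```

In this model the references are held only for the duration of one call, which is a narrower commitment than pinning the device for as long as the file descriptor stays open.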
On Wed, 2010-02-10 at 15:48 +0100, Arnd Bergmann wrote:
> On Monday 08 February 2010, Sridhar Samudrala wrote:
> > I am also seeing this issue with net-next-2.6.
> > Basically macvtap_put_user() and macvtap_get_user() call copy_to/from_user
> > from within a RCU read-side critical section.
> >
> > The following patch fixes this issue by releasing the RCU read lock before
> > calling these routines, but instead hold a reference to q->sk.
> >
> > Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
>
> Yes, we need something like this, but we also need to protect the
> device from going away. The concept right now is to use file_get_queue
> to protect both the macvtap_queue and the macvlan_dev from going
> away. The sock_hold will keep the macvtap_queue around, but
> as far as I can tell, a user could still destroy the macvlan_dev
> using netlink at the same time, which still breaks.

Maybe we should do a dev_hold() in macvtap_set_queue() and dev_put()
in macvtap_del_queue() so that the underlying device cannot go away as
long as the macvtap fd is open.

Thanks
Sridhar
Sridhar Samudrala wrote:
> On Wed, 2010-02-10 at 15:48 +0100, Arnd Bergmann wrote:
>> On Monday 08 February 2010, Sridhar Samudrala wrote:
>>> I am also seeing this issue with net-next-2.6.
>>> Basically macvtap_put_user() and macvtap_get_user() call copy_to/from_user
>>> from within a RCU read-side critical section.
>>>
>>> The following patch fixes this issue by releasing the RCU read lock before
>>> calling these routines, but instead hold a reference to q->sk.
>>>
>>> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
>> Yes, we need something like this, but we also need to protect the
>> device from going away. The concept right now is to use file_get_queue
>> to protect both the macvtap_queue and the macvlan_dev from going
>> away. The sock_hold will keep the macvtap_queue around, but
>> as far as I can tell, a user could still destroy the macvlan_dev
>> using netlink at the same time, which still breaks.
>
> may be we should do a dev_hold() in macvtap_set_queue() and dev_put()
> in macvtap_del_queue() so that the underlying device cannot go away as
> long the macvtap fd is open.

You either need some kind of loose binding (e.g. using the ifindex)
or need to handle the case that the device goes away asynchronously
by indicating an error to the socket and unbinding it. But you can't
make the lifetime of the device dependent on the socket.
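The "loose binding" option can be pictured with a small userspace sketch: the queue stores only an ifindex, resolves it on every operation, and reports an error once the device is gone, so the queue never keeps the device alive. All names here (dev_table, register_dev, unregister_dev, lookup_by_ifindex, tap_xmit) are hypothetical and exist only for illustration; in the kernel a lookup such as dev_get_by_index() would play the role of lookup_by_ifindex(), and real code would also have to keep the device alive between the lookup and its use.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_DEVS 8

/* Hypothetical stand-in for a network device. */
struct dev {
	int ifindex;
	unsigned long tx_packets;
};

/* Hypothetical registry of live devices; NULL marks a free slot. */
static struct dev *dev_table[MAX_DEVS];

static int register_dev(struct dev *d)
{
	assert(d != NULL);
	for (size_t i = 0; i < MAX_DEVS; i++) {
		if (!dev_table[i]) {
			dev_table[i] = d;
			return 0;
		}
	}
	return -ENOSPC;
}

/* The device may be removed at any time; queues are not consulted. */
static void unregister_dev(int ifindex)
{
	for (size_t i = 0; i < MAX_DEVS; i++)
		if (dev_table[i] && dev_table[i]->ifindex == ifindex)
			dev_table[i] = NULL;
}

static struct dev *lookup_by_ifindex(int ifindex)
{
	for (size_t i = 0; i < MAX_DEVS; i++)
		if (dev_table[i] && dev_table[i]->ifindex == ifindex)
			return dev_table[i];
	return NULL;
}

/* The queue remembers only the index, never a pointer to the device. */
struct tap_queue {
	int ifindex;
};

/* Resolve the binding on each transmit; fail cleanly if the device is gone. */
static int tap_xmit(struct tap_queue *q)
{
	struct dev *d = lookup_by_ifindex(q->ifindex);

	if (!d)
		return -ENODEV;
	d->tx_packets++;
	return 0;
}
```

The trade-off in this sketch is one lookup per operation in exchange for letting device removal proceed without waiting on open file descriptors.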
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index ad1f6ef..e3102ab 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -320,7 +320,7 @@ out:
 }

 /* Get packet from user space buffer */
-static ssize_t macvtap_get_user(struct macvtap_queue *q,
+static ssize_t macvtap_get_user(struct macvlan_dev *vlan, struct sock *sk,
 				const struct iovec *iv, size_t count,
 				int noblock)
 {
@@ -331,10 +331,10 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q,
 	if (unlikely(len < ETH_HLEN))
 		return -EINVAL;

-	skb = sock_alloc_send_skb(&q->sk, NET_IP_ALIGN + len, noblock, &err);
+	skb = sock_alloc_send_skb(sk, NET_IP_ALIGN + len, noblock, &err);

 	if (!skb) {
-		macvlan_count_rx(q->vlan, 0, false, false);
+		macvlan_count_rx(vlan, 0, false, false);
 		return err;
 	}

@@ -342,14 +342,14 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q,
 	skb_put(skb, count);

 	if (skb_copy_datagram_from_iovec(skb, 0, iv, 0, len)) {
-		macvlan_count_rx(q->vlan, 0, false, false);
+		macvlan_count_rx(vlan, 0, false, false);
 		kfree_skb(skb);
 		return -EFAULT;
 	}

 	skb_set_network_header(skb, ETH_HLEN);

-	macvlan_start_xmit(skb, q->vlan->dev);
+	macvlan_start_xmit(skb, vlan->dev);

 	return count;
 }
@@ -360,23 +360,29 @@ static ssize_t macvtap_aio_write(struct kiocb *iocb, const struct iovec *iv,
 	struct file *file = iocb->ki_filp;
 	ssize_t result = -ENOLINK;
 	struct macvtap_queue *q = macvtap_file_get_queue(file);
+	struct macvlan_dev *vlan;
+	struct sock *sk;

 	if (!q)
 		goto out;

-	result = macvtap_get_user(q, iv, iov_length(iv, count),
+	vlan = q->vlan;
+	sk = &q->sk;
+	sock_hold(sk);
+	macvtap_file_put_queue();
+
+	result = macvtap_get_user(vlan, sk, iv, iov_length(iv, count),
 				  file->f_flags & O_NONBLOCK);
+	sock_put(sk);
 out:
-	macvtap_file_put_queue();
 	return result;
 }

 /* Put packet to the user space buffer */
-static ssize_t macvtap_put_user(struct macvtap_queue *q,
+static ssize_t macvtap_put_user(struct macvlan_dev *vlan,
 				const struct sk_buff *skb,
 				const struct iovec *iv, int len)
 {
-	struct macvlan_dev *vlan = q->vlan;
 	int ret;

 	len = min_t(int, skb->len, len);
@@ -393,15 +399,20 @@ static ssize_t macvtap_aio_read(struct kiocb *iocb, const struct iovec *iv,
 {
 	struct file *file = iocb->ki_filp;
 	struct macvtap_queue *q = macvtap_file_get_queue(file);
+	struct macvlan_dev *vlan;
+	struct sock *sk;

 	DECLARE_WAITQUEUE(wait, current);
 	struct sk_buff *skb;
 	ssize_t len, ret = 0;

-	if (!q) {
-		ret = -ENOLINK;
-		goto out;
-	}
+	if (!q)
+		return -ENOLINK;
+
+	vlan = q->vlan;
+	sk = &q->sk;
+	sock_hold(sk);
+	macvtap_file_put_queue();

 	len = iov_length(iv, count);
 	if (len < 0) {
@@ -409,12 +420,12 @@ static ssize_t macvtap_aio_read(struct kiocb *iocb, const struct iovec *iv,
 		goto out;
 	}

-	add_wait_queue(q->sk.sk_sleep, &wait);
+	add_wait_queue(sk->sk_sleep, &wait);
 	while (len) {
 		current->state = TASK_INTERRUPTIBLE;

 		/* Read frames from the queue */
-		skb = skb_dequeue(&q->sk.sk_receive_queue);
+		skb = skb_dequeue(&sk->sk_receive_queue);
 		if (!skb) {
 			if (file->f_flags & O_NONBLOCK) {
 				ret = -EAGAIN;
@@ -428,16 +439,16 @@ static ssize_t macvtap_aio_read(struct kiocb *iocb, const struct iovec *iv,
 			schedule();
 			continue;
 		}
-		ret = macvtap_put_user(q, skb, iv, len);
+		ret = macvtap_put_user(vlan, skb, iv, len);
 		kfree_skb(skb);
 		break;
 	}

 	current->state = TASK_RUNNING;
-	remove_wait_queue(q->sk.sk_sleep, &wait);
+	remove_wait_queue(sk->sk_sleep, &wait);

 out:
-	macvtap_file_put_queue();
+	sock_put(sk);
 	return ret;
 }