Message ID | 4F352182.6060601@parallels.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
Le vendredi 10 février 2012 à 17:54 +0400, Pavel Emelyanov a écrit : > We're working on the checkpoint-restore project. To checkpoint a unix socket > we need to read its skb queue. Analogous task for TCP sockets Tejun proposed > to solve with parasite + recvmsg + MSG_PEEK. That's nice, but doesn't work > for unix sockets, because for them MSG_PEEK always peeks a single skb from the > head of the queue. > > I propose to extend the MSG_PEEK functionality with two more flags that peek > either next not picked skb in queue or pick the last picked one. The latter > ability is required to make it possible to re-read a message if MSG_TRUNC > was reported on it. > > These two flags can be used for unix stream sockets, since making the MSG_PEEK > just report all data that fits the buffer length is bad -- we may have scms > in queue thus turning stream socket into dgram one. > > Signed-off-by: Pavel Emelyanov <xemul@parallels.com> > > --- Nice ! So this CR stuff assumes an application wont use itself MSG_PEEK / MORE / AGAIN ? (skb->peeked can only be set, never unset) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 02/10/2012 06:41 PM, Eric Dumazet wrote: > Le vendredi 10 février 2012 à 17:54 +0400, Pavel Emelyanov a écrit : >> We're working on the checkpoint-restore project. To checkpoint a unix socket >> we need to read its skb queue. Analogous task for TCP sockets Tejun proposed >> to solve with parasite + recvmsg + MSG_PEEK. That's nice, but doesn't work >> for unix sockets, because for them MSG_PEEK always peeks a single skb from the >> head of the queue. >> >> I propose to extend the MSG_PEEK functionality with two more flags that peek >> either next not picked skb in queue or pick the last picked one. The latter >> ability is required to make it possible to re-read a message if MSG_TRUNC >> was reported on it. >> >> These two flags can be used for unix stream sockets, since making the MSG_PEEK >> just report all data that fits the buffer length is bad -- we may have scms >> in queue thus turning stream socket into dgram one. >> >> Signed-off-by: Pavel Emelyanov <xemul@parallels.com> >> >> --- > > Nice ! > > So this CR stuff assumes an application wont use itself MSG_PEEK / > MORE / AGAIN ? :( Not exactly. MSG_PEEK will still work, but PEEK_MORE/PEEK_AGAIN will not. Hm... This means that the state of "what was peek-ed already" should be on the user side. Can we pass some "offset" (in bytes for stream and in packets for datagram) to the recvmsg? > (skb->peeked can only be set, never unset) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Pavel Emelyanov <xemul@parallels.com> Date: Fri, 10 Feb 2012 17:54:10 +0400 > We're working on the checkpoint-restore project. To checkpoint a unix socket > we need to read its skb queue. Analogous task for TCP sockets Tejun proposed > to solve with parasite + recvmsg + MSG_PEEK. That's nice, but doesn't work > for unix sockets, because for them MSG_PEEK always peeks a single skb from the > head of the queue. > > I propose to extend the MSG_PEEK functionality with two more flags that peek > either next not picked skb in queue or pick the last picked one. The latter > ability is required to make it possible to re-read a message if MSG_TRUNC > was reported on it. > > These two flags can be used for unix stream sockets, since making the MSG_PEEK > just report all data that fits the buffer length is bad -- we may have scms > in queue thus turning stream socket into dgram one. > > Signed-off-by: Pavel Emelyanov <xemul@parallels.com> I'm still thinking about how I feel about this. But meanwhile you can fix some thing up, for example I really feel that you should make sure that multiple bits aren't set at the same time. You either mean MSG_PEEK_MORE or MSG_PEEK_AGAIN, so setting both makes no sense and should be flagged as an error. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/socket.h b/include/linux/socket.h index d0e77f6..ab3aa19 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -266,6 +266,9 @@ struct ucred { #define MSG_MORE 0x8000 /* Sender will send more */ #define MSG_WAITFORONE 0x10000 /* recvmmsg(): block until 1+ packets avail */ +#define MSG_PEEK_MORE 0x20000 +#define MSG_PEEK_AGAIN 0x40000 + #define MSG_EOF MSG_FIN #define MSG_CMSG_CLOEXEC 0x40000000 /* Set close_on_exit for file diff --git a/net/core/datagram.c b/net/core/datagram.c index 68bbf9f..c330e40 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -157,6 +157,40 @@ out_noerr: * quite explicitly by POSIX 1003.1g, don't change them without having * the standard around please. */ + +/* + * Peek the last skb marked as peeked + */ + +static struct sk_buff *skb_peek_again(struct sk_buff_head *queue) +{ + struct sk_buff *skb, *prev = NULL; + + skb_queue_walk(queue, skb) { + if (skb->peeked) + prev = skb; + else + break; + } + + return prev; +} + +/* + * Peek the first not peeked skb + */ + +static struct sk_buff *skb_peek_more(struct sk_buff_head *queue) +{ + struct sk_buff *skb; + + skb_queue_walk(queue, skb) + if (!skb->peeked) + return skb; + + return NULL; +} + struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned flags, int *peeked, int *err) { @@ -180,18 +214,28 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned flags, * However, this function was correct in any case. 8) */ unsigned long cpu_flags; - - spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags); - skb = skb_peek(&sk->sk_receive_queue); + struct sk_buff_head *queue = &sk->sk_receive_queue; + + spin_lock_irqsave(&queue->lock, cpu_flags); + if (flags & MSG_PEEK) { + if (flags & MSG_PEEK_MORE) + skb = skb_peek_more(queue); + else if (flags & MSG_PEEK_AGAIN) + skb = skb_peek_again(queue); + else + skb = skb_peek(queue); + } else + skb = skb_peek(queue); + if (skb) { *peeked = skb->peeked; if (flags & MSG_PEEK) { skb->peeked = 1; atomic_inc(&skb->users); } else - __skb_unlink(skb, &sk->sk_receive_queue); + __skb_unlink(skb, queue); } - spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags); + spin_unlock_irqrestore(&queue->lock, cpu_flags); if (skb) return skb;
We're working on the checkpoint-restore project. To checkpoint a unix socket we need to read its skb queue. Analogous task for TCP sockets Tejun proposed to solve with parasite + recvmsg + MSG_PEEK. That's nice, but doesn't work for unix sockets, because for them MSG_PEEK always peeks a single skb from the head of the queue. I propose to extend the MSG_PEEK functionality with two more flags that peek either next not picked skb in queue or pick the last picked one. The latter ability is required to make it possible to re-read a message if MSG_TRUNC was reported on it. These two flags can be used for unix stream sockets, since making the MSG_PEEK just report all data that fits the buffer length is bad -- we may have scms in queue thus turning stream socket into dgram one. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> --- include/linux/socket.h | 3 ++ net/core/datagram.c | 52 ++++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 51 insertions(+), 4 deletions(-)