Message ID | 200904141944.JFE64074.FHtOMOFQLFJOVS@I-love.SAKURA.ne.jp |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Tuesday 14 April 2009 06:44:35 am Tetsuo Handa wrote: > Hello. > > The security_socket_post_accept() hook was recently removed because this > hook was not used by any in-tree modules and its existence continued to > cause problems by confusing people about what can be safely accomplished > using this hook. Now, TOMOYO became in-tree and TOMOYO wants to add network > access control in 2.6.31. > > TOMOYO is not a label based access control and won't cause packet labeling > problem. TOMOYO won't care as long as packets are not copied to userspace. We've discussed this issue several times on the mailing list so I won't go over all of the details again, but I will say that my main concern with the post_accept() hook is that you are rejecting a connection after is has already gone through the connection handshake. I personally don't feel this is a good approach but I don't want to stand in your way if I am alone on this point. I'm less concerned about the recv_datagram() hook although the issues you point out about sharing sockets across multiple processes does highlight what I believe are some fundamental issues regarding the TOMOYO network security model. Once again, I'm not going to object if I am the only one. Lastly, I think in order to review these patches we need to see how they are used. Please submit a patch set that includes both this patch as well as a patch to TOMOYO which makes use of these changes; this way we can properly review your patches in context. Regardless, I took a quick look and noticed a few things (below). Thanks. > --- security-testing-2.6.git.orig/net/core/datagram.c > +++ security-testing-2.6.git/net/core/datagram.c > @@ -55,6 +55,7 @@ > #include <net/checksum.h> > #include <net/sock.h> > #include <net/tcp_states.h> > +#include <linux/security.h> > > /* > * Is a socket 'connection oriented' ? > @@ -148,6 +149,7 @@ struct sk_buff *__skb_recv_datagram(stru > { > struct sk_buff *skb; > long timeo; > + unsigned long cpu_flags; > /* > * Caller is allowed not to check sk->sk_err before skb_recv_datagram() > */ > @@ -165,7 +167,6 @@ struct sk_buff *__skb_recv_datagram(stru > * Look at current nfs client by the way... > * However, this function was corrent in any case. 8) > */ > - unsigned long cpu_flags; > > spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags); > skb = skb_peek(&sk->sk_receive_queue); > @@ -179,6 +180,14 @@ struct sk_buff *__skb_recv_datagram(stru > } > spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags); > > + /* Filter packets from unwanted peers. */ It might be best to drop this comment, we don't want to speculate how the LSM might be making access controls decisions here. > + if (skb) { > + error = security_socket_post_recv_datagram(sk, skb, > + flags); > + if (error) > + goto force_dequeue; > + } > + > if (skb) > return skb; Why check to see if skb != NULL twice? I think you should be able to move the skb return statement into the body of the first if block. > @@ -191,6 +200,24 @@ struct sk_buff *__skb_recv_datagram(stru > > return NULL; > > +force_dequeue: > + /* Drop this packet because LSM says "Don't pass it to the caller". */ Once again, remove this comment since we don't know what a LSM might decide. > + if (!(flags & MSG_PEEK)) > + goto no_peek; > + /* > + * If this packet is MSG_PEEK'ed, dequeue it forcibly > + * so that this packet won't prevent the caller from picking up > + * next packet. > + */ > + spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags); > + if (skb == skb_peek(&sk->sk_receive_queue)) { > + __skb_unlink(skb, &sk->sk_receive_queue); > + atomic_dec(&skb->users); > + /* Do I have something to do with skb->peeked ? */ I don't know but you should find out before this code is merged :) > + } > + spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags); > +no_peek: > + kfree_skb(skb); > no_packet: > *err = error; > return NULL;
Hello. Paul Moore wrote: > Please submit a patch set that includes both this patch as well as a > patch to TOMOYO which makes use of these changes; this way we can properly > review your patches in context. I see. I have some questions. > > + if (!(flags & MSG_PEEK)) > > + goto no_peek; > > + /* > > + * If this packet is MSG_PEEK'ed, dequeue it forcibly > > + * so that this packet won't prevent the caller from picking up > > + * next packet. > > + */ > > + spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags); > > + if (skb == skb_peek(&sk->sk_receive_queue)) { > > + __skb_unlink(skb, &sk->sk_receive_queue); > > + atomic_dec(&skb->users); > > + /* Do I have something to do with skb->peeked ? */ > > I don't know but you should find out before this code is merged :) Q1: Can I use skb_kill_datagram() here? skb_kill_datagram() uses spin_lock_bh() while __skb_recv_datagram() uses spin_lock_irqsave(). Since this codepath is called inside __skb_recv_datagram(), I used spin_lock_irqsave() rather than calling skb_kill_datagram(). > > > + } > > + spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags); > > +no_peek: > > + kfree_skb(skb); Q2: Do I need to use skb_free_datagram() here rather than kfree_skb()? In the past ( http://lkml.org/lkml/2007/11/16/406 ), there was no difference between skb_free_datagram() and kfree_skb(). | void skb_free_datagram(struct sock *sk, struct sk_buff *skb) | { | kfree_skb(skb); | } But now (as of 2.6.30-rc2), there is a difference. | void skb_free_datagram(struct sock *sk, struct sk_buff *skb) | { | consume_skb(skb); | sk_mem_reclaim_partial(sk); | } > > no_packet: > > *err = error; > > return NULL; Q3: Is __skb_recv_datagram() called from contexts that are not permitted to sleep? If so, TOMOYO has to check whether it is allowed to sleep, for TOMOYO will prompt the user "whether to allow App1 to read this datagram or not". Q4: Is there a way to distinguish requests from userland programs and requests from kernel code? Some kernel code (e.g. NFS) sends/receives UDP packets to deal requests from userland program's requests. TOMOYO wants to distinguish "direct requests" (requests issued by userland programs, such as open()/read()/ write() against files on NFS) and "indirect requests" (requests issued by reasons of kernel's own which are needed to handle "direct requests", such as fetching file data from NFS server). But currently, TOMOYO can't distinguish these requests. As a result, those who use NFS have to give permissions for sending/receiving UDP packets to/from NFS server to all userland programs. This means that TOMOYO allows userland programs to send/receive crafted packets to/from NFS server. I want to solve this problem. Regards. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wednesday 15 April 2009 01:12:41 am Tetsuo Handa wrote: > Hello. > > Paul Moore wrote: > > Please submit a patch set that includes both this patch as well as a > > patch to TOMOYO which makes use of these changes; this way we can > > properly review your patches in context. Thank you for sending a patchset with both TOMOYO and LSM/stack changes; this should give other developers who may not be familiar with TOMOYO a chance to review your code. My comments/concerns about the LSM changes still stand, if it is decided to merge these changes I'll be happy to review the TOMOYO changes further. > > > + if (!(flags & MSG_PEEK)) > > > + goto no_peek; > > > + /* > > > + * If this packet is MSG_PEEK'ed, dequeue it forcibly > > > + * so that this packet won't prevent the caller from picking up > > > + * next packet. > > > + */ > > > + spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags); > > > + if (skb == skb_peek(&sk->sk_receive_queue)) { > > > + __skb_unlink(skb, &sk->sk_receive_queue); > > > + atomic_dec(&skb->users); > > > + /* Do I have something to do with skb->peeked ? */ > > > > I don't know but you should find out before this code is merged :) > > Q1: Can I use skb_kill_datagram() here? > > skb_kill_datagram() uses spin_lock_bh() while __skb_recv_datagram() > uses spin_lock_irqsave(). Since this codepath is called inside > __skb_recv_datagram(), I used spin_lock_irqsave() rather than calling > skb_kill_datagram(). Since __skb_recv_datagram() is already using spin_lock_irqsave() it seems reasonable to do the same in your changes. As far as skb->peeked is concerned I don't think it matters much since you are destroying the skb anyway. > > > +no_peek: > > > + kfree_skb(skb); > > Q2: Do I need to use skb_free_datagram() here rather than kfree_skb()? > > In the past ( http://lkml.org/lkml/2007/11/16/406 ), there was no > difference between skb_free_datagram() and kfree_skb(). > > | void skb_free_datagram(struct sock *sk, struct sk_buff *skb) > | { > | kfree_skb(skb); > | } > > But now (as of 2.6.30-rc2), there is a difference. > > | void skb_free_datagram(struct sock *sk, struct sk_buff *skb) > | { > | consume_skb(skb); > | sk_mem_reclaim_partial(sk); > | } I don't know for certain, I would have to go look at other users of skb_free_datagram(), but it does look like using skb_free_datagram() instead of skb_free() might be preferable. > Q3: Is __skb_recv_datagram() called from contexts that are not permitted to > sleep? > > If so, TOMOYO has to check whether it is allowed to sleep, for TOMOYO > will prompt the user "whether to allow App1 to read this datagram or not". I believe __skb_recv_datagram() is only called via userspace so sleeping should not be an issue. > Q4: Is there a way to distinguish requests from userland programs and > requests from kernel code? I'm not sure if this is the 100% correct way to do it, but in the past I have always checked current->mm, for kernel threads this will be NULL.
Hello. Thank you for answering my questions. Paul Moore wrote: > > Q1: Can I use skb_kill_datagram() here? > > > > skb_kill_datagram() uses spin_lock_bh() while __skb_recv_datagram() > > uses spin_lock_irqsave(). Since this codepath is called inside > > __skb_recv_datagram(), I used spin_lock_irqsave() rather than calling > > skb_kill_datagram(). > > Since __skb_recv_datagram() is already using spin_lock_irqsave() it seems > reasonable to do the same in your changes. I see. I'll use spin_lock_irqsave() here. > > > > +no_peek: > > > > + kfree_skb(skb); > > > > Q2: Do I need to use skb_free_datagram() here rather than kfree_skb()? > > > > In the past ( http://lkml.org/lkml/2007/11/16/406 ), there was no > > difference between skb_free_datagram() and kfree_skb(). > > > > | void skb_free_datagram(struct sock *sk, struct sk_buff *skb) > > | { > > | kfree_skb(skb); > > | } > > > > But now (as of 2.6.30-rc2), there is a difference. > > > > | void skb_free_datagram(struct sock *sk, struct sk_buff *skb) > > | { > > | consume_skb(skb); > > | sk_mem_reclaim_partial(sk); > > | } > > I don't know for certain, I would have to go look at other users of > skb_free_datagram(), but it does look like using skb_free_datagram() instead > of skb_free() might be preferable. I see. I'll use skb_free_datagram() here. Also, I'll need to protect skb_free_datagram() with lock_sock()/ release_sock(). (Thanks to Eric Dumazet.) > > Q3: Is __skb_recv_datagram() called from contexts that are not permitted to > > sleep? > > > > If so, TOMOYO has to check whether it is allowed to sleep, for TOMOYO > > will prompt the user "whether to allow App1 to read this datagram or not". > > I believe __skb_recv_datagram() is only called via userspace so sleeping > should not be an issue. > NFS code needs to issue UDP send()/recv() requests from the kernel. Therefore, I think __skb_recv_datagram() is called from kernel space. I'm worrying that __skb_recv_datagram() is called with a spinlock held. > > Q4: Is there a way to distinguish requests from userland programs and > > requests from kernel code? > > I'm not sure if this is the 100% correct way to do it, but in the past I have > always checked current->mm, for kernel threads this will be NULL. Nowadays, it will be "current->mm && !(current->flags & PF_ KTHREAD)" because get_task_mm() says a kernel workthread may temporarily have a user mm to do its AIO. Sorry for confusing question. What I wanted to know is not "how can I distinguish kernel processes and userland processes". What I wanted to know is "how can I distinguish requests issued for processing a request >from userland process". (a) If a file is on a simple filesystem like ext3, an open()/ read()/write() request from userspace application will not issue another open()/read()/write() request. (b) If a file is on a stackable filesystem like unionfs, an open ()/create()/ unlink() request from userspace application will issue another open()/create()/unlink() request from kernel space. (c) If a file is on a networking filesystem like nfs, an open()/ create()/ unlink() request from userspace application will issue send ()/recv() request from kernel space. If (b) and (c) use a dedicated task_struct for handling requests from kernel space, TOMOYO has nothing to do. But unfortunately, (b) and (c) use a task_struct of the userspace application for handling requests from kernel space. TOMOYO wants to distinguish an open()/create()/unlink() request from userspace application and open()/create()/unlink()/send()/recv() requests needed for handling the open()/create()/unlink() request from userspace application. I wished that there is an indicator that tells TOMOYO that whether an open()/create()/unlink()/send()/recv() request is issued for processing a request from userland process. I know little about NFS, but I think NFS does not use a kernel workthread for sending/receiving UDP packets to handle an open()/read()/write() request from a userspace application. I'm afraid that checking "current->mm && !(current->flags & PF_ KTHREAD)" will not help TOMOYO in distinguishing "direct request" and "indirect request". By the way, I need to tell you one more thing about security_socket_post_accept() hook's usage. Not now, but in future, I want to introduce task's state variable which is used for dividing permissions within a domain. # Example policy for /usr/sbin/sshd allow_network TCP accept 10.0.0.0-10.255.255.255.255 1024- 65535 ; set task.state[0]=1 allow_network TCP accept 192.168.0.0-192.168.255.255 1024- 65535 ; set task.state[0]=2 allow_execute /bin/bash if task.state[0]=1 allow_execute /bin/rbash if task.state[0]=2 The above example policy allows an instance of /usr/sbin/sshd to (a) execute /bin/bash if that instance accepted a TCP connection from 10.0.0.0/8 (b) execute /bin/rbash if that instance accepted a TCP connection from 192.168.0.0/16. (c) abort TCP connections if that instance accepted a TCP connection from neither 10.0.0.0/8 nor 192.168.0.0/16. The security_socket_post_accept() hook is used for not only aborting TCP connections from unwanted peers but also associating client's information with a process who handles that TCP connection. The task's state variable definitely requires a LSM hook which is called after sock->ops-> accept() call. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Saturday 18 April 2009 04:34:02 am Tetsuo Handa wrote: > Hello. > > Thank you for answering my questions. No problem. > > I believe __skb_recv_datagram() is only called via userspace so sleeping > > should not be an issue. > > NFS code needs to issue UDP send()/recv() requests from the > kernel. Therefore, I think __skb_recv_datagram() is called from kernel > space. My mistake, I trusted (or misread) the comment at the start of the function. > I'm worrying that __skb_recv_datagram() is called with a > spinlock held. It looks like you've already solved that issue. > > > Q4: Is there a way to distinguish requests from userland programs and > > > requests from kernel code? > > > > I'm not sure if this is the 100% correct way to do it, but in the past I > > have always checked current->mm, for kernel threads this will be NULL. > > Nowadays, it will be "current->mm && !(current->flags & PF_ > KTHREAD)" because get_task_mm() says a kernel workthread may temporarily > have a user mm to do its AIO. Thanks, that is good to know. > Sorry for confusing question. What I wanted to know is not "how > can I distinguish kernel processes and userland processes". What I > wanted to know is "how can I distinguish requests issued for processing a > request from userland process". I do not know of a way but someone else reading this might. > By the way, I need to tell you one more thing about > security_socket_post_accept() hook's usage. Not now, but in future, > I want to introduce task's state variable which is used for dividing > permissions within a domain. > > # Example policy for /usr/sbin/sshd > allow_network TCP accept 10.0.0.0-10.255.255.255.255 1024- > 65535 ; set task.state[0]=1 > allow_network TCP accept 192.168.0.0-192.168.255.255 1024- > 65535 ; set task.state[0]=2 > allow_execute /bin/bash if task.state[0]=1 > allow_execute /bin/rbash if task.state[0]=2 > > The above example policy allows an instance of /usr/sbin/sshd to > (a) execute /bin/bash if that instance accepted a TCP connection > from > 10.0.0.0/8 > (b) execute /bin/rbash if that instance accepted a TCP > connection from > 192.168.0.0/16. > (c) abort TCP connections if that instance accepted a TCP > connection from > neither 10.0.0.0/8 nor 192.168.0.0/16. > > The security_socket_post_accept() hook is used for not only > aborting TCP connections from unwanted peers but also associating client's > information with a process who handles that TCP connection. The task's state > variable definitely requires a LSM hook which is called after sock->ops-> > accept() call. I don't have a problem with using a socket_post_accept() hook to assign/modify state, however, I still not like the idea of using the socket_post_accept() hook to abort connections.
Paul Moore wrote: > > The security_socket_post_accept() hook is used for not only > > aborting TCP connections from unwanted peers but also associating client's > > information with a process who handles that TCP connection. The task's state > > variable definitely requires a LSM hook which is called after sock->ops-> > > accept() call. > > I don't have a problem with using a socket_post_accept() hook to assign/modify > state, however, I still not like the idea of using the socket_post_accept() > hook to abort connections. What do you think that assigning/modifying state at socket_post_accept() could fail? TOMOYO 1.x never fails since task's state variable is directly attached to "struct task_struct" but TOMOYO 2.x can fail since task's state variable will be indirectly attached via current->cred->security and memory allocation failure for new credential could occur. Though it unlikely fails, I have to abort connections by returning an error at socket_post_accept() hook if failed. I think the socket_post_accept() hook anyway need to be able to abort connections. (Or prepare new credential at socket_accept() and add a hook for discarding prepared credential when sock->ops->accept() returned an error?) Regards. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Date: Tue, 21 Apr 2009 19:54:06 +0900 > Paul Moore wrote: >> > The security_socket_post_accept() hook is used for not only >> > aborting TCP connections from unwanted peers but also associating client's >> > information with a process who handles that TCP connection. The task's state >> > variable definitely requires a LSM hook which is called after sock->ops-> >> > accept() call. >> >> I don't have a problem with using a socket_post_accept() hook to assign/modify >> state, however, I still not like the idea of using the socket_post_accept() >> hook to abort connections. > > What do you think that assigning/modifying state at socket_post_accept() could > fail? I have to agree with Paul. There is no sane way for the user to handle this connection being aborted, and there is no way to insert the connection back into the listening socket queue once we get to this point so we can't replay this situation either. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Miller wrote: > There is no sane way for the user to handle this connection > being aborted, and there is no way to insert the connection > back into the listening socket queue once we get to this > point so we can't replay this situation either. > Excuse me, I couldn't catch. I don't have a problem that there is no way to insert the connection back into the listening socket queue once we get to this point. I want to drop the connection rather than pushing the connection back to the queue. I meant that "if we allow doing something after sock->ops->accept(), we need to drop the connection if something returned an error". An example of something is to assign/modify state. To modify task's state, I need to allocate memory for new credential which is an operation that could fail. Therefore, the socket_post_accept() hook should allow aborting the connection. Paul doesn't like using the socket_post_accept() hook to abort connections. But the connection is aborted if either newsock->ops->getname() or move_addr_to_user() returned an error. I wonder what's so problematic with using socket_post_accept() for dropping the connection. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Date: Tue, 21 Apr 2009 20:39:08 +0900 > David Miller wrote: >> There is no sane way for the user to handle this connection >> being aborted, and there is no way to insert the connection >> back into the listening socket queue once we get to this >> point so we can't replay this situation either. >> > Excuse me, I couldn't catch. > > I don't have a problem that there is no way to insert the connection back into > the listening socket queue once we get to this point. I want to drop the > connection rather than pushing the connection back to the queue. I am saying that you shouldn't be dropping connections like this. You've already accepted the packet, you cannot reneg on this. You must either allow the socket to have the connection with current labels or give it a label appropriate for the socket. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Miller wrote: > I am saying that you shouldn't be dropping connections like > this. Please don't assume that we are talking about labeled networking. I want to implement TCPwrapper-like filtering (which is automatically linked to all processes). > You've already accepted the packet, you cannot reneg on this. TCPwrapper accepts the packet before it decides to drop the connection. > You must either allow the socket to have the connection with > current labels or give it a label appropriate for the socket. The socket_post_accept() hook is not designed for modifying labels of a socket. It is too late to modify because we have already accepted the packet. + * @socket_post_accept: + * This hook allows a security module to filter connections from unwanted + * peers based on the process accepting this connection. + * The connection will be aborted if this hook returns nonzero. + * This hook is not designed for updating security attributes of + * an accept()ed socket, for the accept()ed socket has already sent + * several packets (e.g. TCP's SYN/ACK packet and some ACK packets for + * incoming data) before this hook is called. + * @sock contains the listening socket structure. + * @newsock contains the newly created server socket for connection. + * Return 0 if permission is granted. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Date: Tue, 21 Apr 2009 21:26:36 +0900 > It is too late to modify because we have already accepted the > packet. Then queue it up somewhere, and deliver your verdict and continue packet processing later. In fact, I do not like killing accept()'s like this at all. If we tell the application, via poll() or similar, that connections are available and no other thread can get in there to steal the connection, we must deliver a connection to the listening socket when it calls accept() except in very unusual circumstances (out of file descriptors, memory allocation failure, etc.)! I believe this idea is conceptually very broken, sorry. The semantics are absolutely horrible. Put this stuff in userspace, where you say it already is. The kernel isn't the place to dump every cool feature you want to bring in from userspace. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Miller wrote: > Put this stuff in userspace, where you say it already is. The kernel > isn't the place to dump every cool feature you want to bring in from > userspace. If I can do this stuff in userspace, I already put this stuff in userspace. Access control in userspace is easily bypassed, no good for policy based mandatory access control. The reason I propose these hooks in kernel is that nobody can bypass these hooks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Date: Tue, 21 Apr 2009 21:52:57 +0900 > David Miller wrote: >> Put this stuff in userspace, where you say it already is. The kernel >> isn't the place to dump every cool feature you want to bring in from >> userspace. > If I can do this stuff in userspace, I already put this stuff in userspace. > Access control in userspace is easily bypassed, no good for policy based > mandatory access control. The reason I propose these hooks in kernel is that > nobody can bypass these hooks. They you have to make your decisions when the packet arrives in order to not break application expected semantics. If poll() says to a listening socket that connections are available, we MUST return a connection from accept() unless there is a hard error. The current proposal is not acceptable. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Miller wrote: > If poll() says to a listening socket that connections are > available, we MUST return a connection from accept() unless > there is a hard error. No application is permitted to assume that accept() returns a connection if poll() says that connections are available even if there is an assumption that none of sock->ops->accept(), newsock->ops->getname(), move_addr_to_user() fails, for there is security_socket_accept() hook which can interrupt between "poll()" and "accept()". There is a possibility that LSM module's policy changes from "allow picking the connection up from the listening socket" to "deny picking the connection up from the listening socket" after poll() said "connections are available". We allow this policy change and we tolerate breakage of the application expected semantics. > If we tell the application, via poll() or similar, that connections > are available and no other thread can get in there to steal the > connection, we must deliver a connection to the listening socket when > it calls accept() except in very unusual circumstances (out of file > descriptors, memory allocation failure, etc.)! If you think policy changes between poll() and security_socket_accept() is one of unusual circumstances, I think policy changes between security_socket_accept() and security_socket_post_accept() (the old policy before poll() said "connections are available" was "allow picking the connection up from any client" while the new policy after poll() said "connections are available" is "allow picking the connection up from only 127.0.0.1") is also one of unusual circumstances. The only difference between security_socket_accept() and security_socket_post_accept() is that whether the connection remains in the listening socket's queue or not. We cannot avoid breakage of the application expected semantics from the beginning. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Date: Wed, 22 Apr 2009 09:55:26 +0900 > David Miller wrote: >> If poll() says to a listening socket that connections are >> available, we MUST return a connection from accept() unless >> there is a hard error. > > No application is permitted to assume that accept() returns a connection if > poll() says that connections are available even if there is an assumption that > none of sock->ops->accept(), newsock->ops->getname(), move_addr_to_user() > fails, for there is security_socket_accept() hook which can interrupt between > "poll()" and "accept()". > > There is a possibility that LSM module's policy changes from "allow picking > the connection up from the listening socket" to "deny picking the connection > up from the listening socket" after poll() said "connections are available". > We allow this policy change and we tolerate breakage of the application > expected semantics. We had a similar situation with read()'s on UDP sockets. When poll() says something, it has to stick. This is not discussable. If these semantics are "necessary", then what you're doing is definitely misdesigned in my opinion. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Miller wrote: > We had a similar situation with read()'s on UDP sockets. > > When poll() says something, it has to stick. To adhere what poll() said (i.e. "connections are ready" or "datagrams are ready"), security_socket_accept() and security_socket_recvmsg() hooks must be removed. Otherwise, LSM users cannot adhere what poll() said. However, security_socket_accept() and security_socket_recvmsg() hooks remain there. LSM users are already using semantics which may not adhere what poll() said. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 21, 2009 at 06:14:11PM -0700, David Miller wrote: > We had a similar situation with read()'s on UDP sockets. > > When poll() says something, it has to stick. Isn't that completely different? Anyone who writes code that calls accept() quickly finds out that in the real world it fails for all kinds of reasons worth ignoring. As an example, a comment in ircd at the only accept call (circa 1998): /* ** There may be many reasons for error return, but in otherwise ** correctly working environment the probable cause is running ** out of file descriptors (EMFILE, ENFILE or others?). The ** man pages for accept don't seem to list these as possible, ** although it's obvious that it may happen here. ** Thus no specific errors are tested at this point, just ** assume that connections cannot be accepted until some old ** is closed first. */ And it silently ignores EAGAIN, which of course is a can't happen when used with select(). The recently-written only-runs-on-Linux system I'm working on ignores EAGAIN, even though it's a can't happen with epoll. I can ask the guy who wrote it, but he's probably ignoring it because he was frequently seeing them. I'd be surprised if you found much real-life code that didn't gracefully tolerate accept failures. Can anyone come up with an example? -- greg -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Date: Wed, 22 Apr 2009 10:49:42 +0900 > David Miller wrote: >> We had a similar situation with read()'s on UDP sockets. >> >> When poll() says something, it has to stick. > To adhere what poll() said (i.e. "connections are ready" or "datagrams are > ready"), security_socket_accept() and security_socket_recvmsg() hooks must be > removed. Otherwise, LSM users cannot adhere what poll() said. > > However, security_socket_accept() and security_socket_recvmsg() hooks remain > there. LSM users are already using semantics which may not adhere what poll() > said. So what does your TOMOTO stuff do if the mapping changes again and that incoming connection that became unacceptable is now acceptable? We've lost the connection, and can never get it back. These semantics don't make any sense at all, and the point at which you make your choices here is totally arbitrary. Read that carefully: the point at which you are making this drop decision is arbitrary. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Greg Lindahl <greg@blekko.com> Date: Tue, 21 Apr 2009 18:52:28 -0700 > On Tue, Apr 21, 2009 at 06:14:11PM -0700, David Miller wrote: > >> We had a similar situation with read()'s on UDP sockets. >> >> When poll() says something, it has to stick. > > Isn't that completely different? Anyone who writes code that calls > accept() quickly finds out that in the real world it fails for all > kinds of reasons worth ignoring. As an example, a comment in ircd at > the only accept call (circa 1998): I said explicitly that hard errors are allows (out of file descriptors, memory allocation failure) Feel free to ignore what I'm saying, and I'll feel free to ignore you too. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Miller wrote: > So what does your TOMOTO stuff do if the mapping changes again and > that incoming connection that became unacceptable is now acceptable? Nothing. > We've lost the connection, and can never get it back. TOMOYO won't keep the connection. That's fine. TOMOYO thinks that dropping the connection/datagram is better than choking the queue/buffer by keeping the connection/datagram for future policy changes. I browsed the poll() logic for TCP/IPv4's listening socket. SYSCALL_DEFINE3(poll, struct pollfd __user *, ufds, unsigned int, nfds, long, timeout_msecs) { int do_sys_poll(struct pollfd __user *ufds, unsigned int nfds, struct timespec *end_time) { static int do_poll(unsigned int nfds, struct poll_list *list, struct poll_wqueues *wait, struct timespec *end_time) { static inline unsigned int do_pollfd(struct pollfd *pollfd, poll_table *pwait) { file->f_op->poll(file, pwait); } } } } (file->f_op->poll == sock_poll in net/socket.c) static unsigned int sock_poll(struct file *file, poll_table *wait) { sock->ops->poll(file, sock, wait); } (sock->ops->poll == tcp_poll in net/ipv4/af_inet.c) unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait) { if (sk->sk_state == TCP_LISTEN) { static inline unsigned int inet_csk_listen_poll(const struct sock *sk) { return !reqsk_queue_empty(&inet_csk(sk)->icsk_accept_queue) ? (POLLIN | POLLRDNORM) : 0; } } } (and reqsk_queue_empty() is defined as) static inline int reqsk_queue_empty(struct request_sock_queue *queue) { return queue->rskq_accept_head == NULL; } If I read the code correctly, there is no LSM hook called for controlling whether poll() is allowed to say "connections are ready" or not. If security_socket_accept() rejects the accept() request, a connection remains in the listening socket's queue. This will cause the subsequent poll() requests to say "connections are ready" but the subsequent accept() requests to say "you are not permitted to pick a connection up", and will continue consuming 100% of CPU resource until security_socket_accept() says "you are permitted to pick a connection up". Did I miss something? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Date: Wed, 22 Apr 2009 14:02:16 +0900 > Did I miss something? Do me a favor. We have mechanisms by which you can queue up packets (even into userspace) to make policy decisions. You can make your decision there, and if appropriate, reinject the packet back into the kernel input processing. This is how netfilter ip_queue allows people to implement policies in userspace, and you could use it in the kernel and then need none of these ugly hooks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Miller wrote: > Do me a favor. We have mechanisms by which you can queue up packets > (even into userspace) to make policy decisions. I know. > > You can make your decision there, and if appropriate, reinject the > packet back into the kernel input processing. No I can't. > > This is how netfilter ip_queue allows people to implement policies > in userspace, and you could use it in the kernel and then need > none of these ugly hooks. I was suggested to use that approach in the past discussion. I want to make policy decisions based on the "task_struct" who picks up the connection/datagram. The netfilter can't predict which "task_struct" will pick up the connection/datagram. Nobody can predict it. Even security_socket_accept() and security_socket_recvmsg() can't predict it. If TOMOYO is allowed to make policy decisions based on the "task_struct" who picks up the connection/datagram, the only possible approach is to introduce security_socket_post_accept() and security_socket_post_recv_datagram(). -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Date: Wed, 22 Apr 2009 14:38:04 +0900 > If TOMOYO is allowed to make policy decisions based on the "task_struct" who > picks up the connection/datagram, the only possible approach is to introduce > security_socket_post_accept() and security_socket_post_recv_datagram(). Fine. FWIW I do not agree with TOMOYO conceptually. But if people are generally going to let such a scheme into the kernel, I guess I have to accept these socket layer hacks :-/ -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 21, 2009 at 09:23:42PM -0700, David Miller wrote: > I said explicitly that hard errors are allows (out of file > descriptors, memory allocation failure) I have no idea what you meant by a "hard" error. Note that I also discussed EAGAIN, which appears to happen commonly historically and today, and appears to be what the security module folks would want to have happen and you're rejecting. Do you consider that to be a hard error? I'm betting not. -- greg -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Greg Lindahl <greg@blekko.com> Date: Tue, 21 Apr 2009 23:10:06 -0700 > On Tue, Apr 21, 2009 at 09:23:42PM -0700, David Miller wrote: > >> I said explicitly that hard errors are allows (out of file >> descriptors, memory allocation failure) > > I have no idea what you meant by a "hard" error. Note that I also > discussed EAGAIN, which appears to happen commonly historically and > today, and appears to be what the security module folks would want to > have happen and you're rejecting. Do you consider that to be a hard > error? I'm betting not. People use poll() to avoid -EAGAIN and blocking, they expect the bits to tell them what fd's they can work on to do real work. But this whole task_struct based security is bogus from the start. If I dup a file descriptor for a listening socket, and accept() in the "wrong" task, the other task has no way to accept() that connection even if it's security settings allow it. The connection is lost forever. The realm of file descriptors overlaps that of tasks. Trying to pretend this isn't the case results in all kinds of crazy things like we see being attempted here. I consider TOMOYO conceptually broken in many regards. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 21, 2009 at 11:34:03PM -0700, David Miller wrote: > People use poll() to avoid -EAGAIN and blocking, they expect the bits > to tell them what fd's they can work on to do real work. My point is that EAGAIN happens already. So you can't claim that returning it in accept() breaks the interface, when it's common enough that today's user-level network code already handles it. I have no opinion about TOMOYO. There are many reasons other than EAGAIN from accept() to complain about. -- greg -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Greg Lindahl <greg@blekko.com> Date: Tue, 21 Apr 2009 23:41:59 -0700 > On Tue, Apr 21, 2009 at 11:34:03PM -0700, David Miller wrote: > >> People use poll() to avoid -EAGAIN and blocking, they expect the bits >> to tell them what fd's they can work on to do real work. > > My point is that EAGAIN happens already. So you can't claim that > returning it in accept() breaks the interface, when it's common enough > that today's user-level network code already handles it. EAGAIN does not happen if the application calls poll(), gets an indication that connections are available, and then immediately calls accept() on the indicated FD. It can only happen if it calls accept() multiple times after such a poll() call _OR_ it does not control multi-threaded access to the listen file descriptor. This new behavior from TOMOYO would make accept() return -EAGAIN in cases which are of no fault of the application. It is definitely unexpected behavior. If overly anal apps "code for it" that is entirely besides the point. What we have to be concerned for, from a kernel behavioral standpoint, is that some apps "might not code for it". This is why we don't change behavior. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 21, 2009 at 11:46:18PM -0700, David Miller wrote: > EAGAIN does not happen if the application calls poll(), gets > an indication that connections are available, and then > immediately calls accept() on the indicated FD. I have observed it recently and historically, and not by calling accept() repeatedly. I don't know what you mean by "immediately" since I don't think you're advocating race conditions; the other end can always exit/reset/whatever between the poll() and the accept(). > If overly anal apps "code for it" that is entirely besides the point. > What we have to be concerned for, from a kernel behavioral standpoint, > is that some apps "might not code for it". This is why we don't > change behavior. I am suggesting that you survey actual apps. If you find that they're all overly anal, then maybe you've learned something about EAGAIN already happening today. I assure you that the co-worker who stuck in the "ignore EAGAIN without logging it" only did so because he saw it fairly frequently. He's that way. -- greg -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Greg Lindahl <greg@blekko.com> Date: Tue, 21 Apr 2009 23:54:43 -0700 > On Tue, Apr 21, 2009 at 11:46:18PM -0700, David Miller wrote: > >> If overly anal apps "code for it" that is entirely besides the point. >> What we have to be concerned for, from a kernel behavioral standpoint, >> is that some apps "might not code for it". This is why we don't >> change behavior. > > I am suggesting that you survey actual apps. If you find that they're > all overly anal, then maybe you've learned something about EAGAIN > already happening today. I assure you that the co-worker who stuck in > the "ignore EAGAIN without logging it" only did so because he saw it > fairly frequently. He's that way. It's likely a bug and should have been reported. Unexplainable behavior isn't something to ignore. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Miller wrote: > If I dup a file descriptor for a listening socket, and accept() in the > "wrong" task, the other task has no way to accept() that connection > even if it's security settings allow it. The connection is lost > forever. Why the connection gets lost? If two tasks' security settings are the same, the process whichever reached sock->ops->accept() first will get the connetion. If two tasks' security settings are not the same, I warned it on the patch descripption. > This new behavior from TOMOYO would make accept() return -EAGAIN in > cases which are of no fault of the application. It is definitely > unexpected behavior. TOMOYO will return -ECONNABORTED, which is also returned by failure of newsock->ops->getname(). If there were some application which can't handle accept() returning -ECONNABORTED error, we can simply disable this filtering (by giving such application permission to accept connection from all addresses). Applications should be able to handle accept() error other than -EAGAIN. It is legal to return (for example) -ENOMEM, -EPERM. "man 2 accept" says: ERRORS accept() shall fail if: EAGAIN or EWOULDBLOCK The socket is marked non-blocking and no connections are present to be accepted. EBADF The descriptor is invalid. ECONNABORTED A connection has been aborted. EINTR The system call was interrupted by a signal that was caught before a valid connection arrived. EINVAL Socket is not listening for connections, or addrlen is invalid (e.g., is negative). EMFILE The per-process limit of open file descriptors has been reached. ENFILE The system limit on the total number of open files has been reached. ENOTSOCK The descriptor references a file, not a socket. EOPNOTSUPP The referenced socket is not of type SOCK_STREAM. accept() may fail if: EFAULT The addr argument is not in a writable part of the user address space. ENOBUFS, ENOMEM Not enough free memory. This often means that the memory allocation is limited by the socket buffer limits, not by the system memory. EPROTO Protocol error. Linux accept() may fail if: EPERM Firewall rules forbid connection. In addition, network errors for the new socket and as defined for the protocol may be returned. Various Linux kernels can return other errors such as ENOSR, ESOCKTNOSUPPORT, EPROTONOSUPPORT, ETIMEDOUT. The value ERESTARTSYS may be seen during a trace. Linux 2.6.7 2004-06-17 ACCEPT(2) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Miller wrote:
> FWIW I do not agree with TOMOYO conceptually.
That will be my fault. I haven't explained you the background of this proposal.
Would you please be patient and read below explanation?
Thanks.
----------
TOMOYO is a security module which focuses on behavior of a system. A process is
created to achieve something. TOMOYO lets each process declare behaviors and
resources needed to achieve its purpose (like an immigration officer) and
permits only declared behaviors and resources (like an operation watchdog).
TOMOYO has an unprecedented concept called "process invocation history" (in
short, PIH). TOMOYO utilizes the PIH for categorizing the purpose of a process.
The PIH is stored into current->cred->security and is defined as concatenation
of program's pathnames ever executed. For example, /sbin/init invoked from the
kernel is defined as "<kernel> /sbin/init", /etc/rc.d/rc.sysinit invoked from
/sbin/init invoked from the kernel is defined as
"<kernel> /sbin/init /etc/rc.d/rc.sysinit". (There are some exceptions, but I
omit explanation because exceptions have no linkage with this proposal.)
**TOMOYO's policy is PIH-driven.** For example,
<kernel> /sbin/init
allow_read /etc/inittab
means that any process with PIH "<kernel> /sbin/init" is allowed to open a file
named /etc/inittab for reading.
<kernel> /usr/sbin/sshd
allow_create /var/run/sshd.pid
means that any process with PIH "<kernel> /usr/sbin/sshd" is allowed to create
a file named /var/run/sshd.pid .
<kernel> /usr/sbin/sshd /bin/bash /usr/bin/curl
allow_network TCP connect 192.168.1.1 80
allow_network UDP connect 192.168.1.2 53
means that any process with PIH
"<kernel> /usr/sbin/sshd /bin/bash /usr/bin/curl" is allowed to send TCP
connect() requests to 192.168.1.1 port 80 and is allowed to send UDP datagrams
to 192.168.1.2 port 53.
TOMOYO wants to allow writing policy for incoming connections/datagrams in the
same manner. For example,
<kernel> /usr/sbin/sshd
allow_network TCP accept 10.0.0.0-10.255.255.255 1024-65535
means that any process with PIH "<kernel> /usr/sbin/sshd" is allowed to pick up
TCP connections from 10.0.0.0/8 port 1024-65535.
To be able to write in the same manner, TOMOYO needs to know the PIH of a
process who is about to pick up the incoming connection/datagram.
The PIH (i.e. current->cred->security) is different from the security context
of a socket which is going to enqueue the incoming connection/datagram
(i.e. "struct sock"->sk_security). And LSM has no hooks which allow TOMOYO to
use current->cred->security for incoming connections/datagrams.
There could be some programs which get confused by accept()/recvmsg() returning
an error when poll() said "connections are ready" or "datagrams are ready".
If we find such programs, we can tell TOMOYO to disable filtering for such
programs.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Date: Thu, 23 Apr 2009 23:00:09 +0900 > David Miller wrote: >> FWIW I do not agree with TOMOYO conceptually. > That will be my fault. I haven't explained you the background of this proposal. > Would you please be patient and read below explanation? Save your typing, I've read all of these descriptions, I still do not like it. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes: > There could be some programs which get confused by accept()/recvmsg() returning > an error when poll() said "connections are ready" or "datagrams are ready". > If we find such programs, we can tell TOMOYO to disable filtering for such > programs. Hello Tetsuo, this will introduce a way to bypass the security system (?) If TOMOYO won't filter such programs, people may add this "poll()" feature to their code, in order to escape the security system. I think it's strange for a security system to allow some programs because of specific code issue, and not because of security reasons. sam -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Miller wrote: > People use poll() to avoid -EAGAIN and blocking, they expect the bits > to tell them what fd's they can work on to do real work. I found that "man 2 select" says Under Linux, select() may report a socket file descriptor as "ready for reading", while nevertheless a subsequent read blocks. This could for example happen when data has arrived but upon examination has wrong checksum and is discarded. There may be other circumstances in which a file descriptor is spuriously reported as ready. Thus it may be safer to use O_NONBLOCK on sockets that should not block. Linux 2.6.16 2006-03-11 SELECT(2) People cannot use "poll()" to avoid blocking. Applications had better not to completely depend on what poll() says. > The current proposal is not acceptable. You don't like TOMOYO's concept. I see. But I don't see the reason you can't accept this proposal. What does this proposal break? Please explain me. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Date: Fri, 24 Apr 2009 11:07:28 +0900 > David Miller wrote: >> People use poll() to avoid -EAGAIN and blocking, they expect the bits >> to tell them what fd's they can work on to do real work. > > I found that "man 2 select" says > > Under Linux, select() may report a socket file descriptor as "ready for > reading", while nevertheless a subsequent read blocks. This could for example > happen when data has arrived but upon examination has wrong checksum and is > discarded. You won't give up will you? If you're trying to irritate me, it's working. This behavior mentioned in that manpage snippet is a BUG which we FIXED! You're just proving my point even more! The poll() paths now cause a bypass of the delayed checksum verification, which used to cause that above mentioned incorrect behavior. Don't push man page crap at me. Instead, actually understand what the code does and how it behaves. This man page snipped above is completely wrong and buggy. Never trust documentation over code. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Miller wrote: > This behavior mentioned in that manpage snippet is a BUG which we > FIXED! You're just proving my point even more! So, people can use poll() to avoid -EAGAIN and blocking. I see. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
OK. I understood that security_socket_post_recv_datagram() must not return -EAGAIN if a process calls recvmsg() after poll() said "ready". That will be also true for security_socket_recvmsg(). Is it OK for security_socket_recvmsg()/security_socket_accept() to return an error other than -EAGAIN? (In other words, security_socket_recvmsg()/security_socket_accept() errors are one of "hard" errors?) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Date: Fri, 24 Apr 2009 14:26:24 +0900 > OK. I understood that security_socket_post_recv_datagram() must not return > -EAGAIN if a process calls recvmsg() after poll() said "ready". > That will be also true for security_socket_recvmsg(). > > > Is it OK for security_socket_recvmsg()/security_socket_accept() to return > an error other than -EAGAIN? > (In other words, security_socket_recvmsg()/security_socket_accept() errors are > one of "hard" errors?) I think you really need to wrap your head around the fact that you can't decide after you've accepted a packet, that it's no longer acceptable. Once it's in the socket's queue, and you tell the application it's there at poll() time, you simply cannot reneg. You just can't. Otherwise you're breaking the whole premise upon which these UNIX system calls are based. This is how people use these things. Are you beginning to understand the fundamental problems I have with your work? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Miller wrote: > Are you beginning to understand the fundamental problems I have with > your work? You are saying that we must not discard what we have in the queue after we told somebody that we have something in the queue unless we get "hard" errors. > If poll() says to a listening socket that connections are > available, we MUST return a connection from accept() unless > there is a hard error. I haven't understood what you meant by a "hard" error. What I want to know is, are errors returned by security_socket_accept() and security_socket_recvmsg() "hard" errors? If these errors are not "hard" errors, security_socket_accept() and security_socket_recvmsg() must not return an error because it will result in "not returning a connection/datagram", for you are saying that only "hard" errors can justify "not returning a connection/datagram". If these errors are "hard" errors, security_socket_accept() and security_socket_recvmsg() can justify "not returning a connection/datagram". But errors by newsock->ops->getname() or move_addr_to_user() (these are also "hard" errors, aren't they?) justified "discarding a connection/datagram". Then, if a "hard" error was returned by LSM hooks after a connection/datagram was dequeued, I think we can justify "discarding that connection/datagram". -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--- security-testing-2.6.git.orig/include/linux/security.h +++ security-testing-2.6.git/include/linux/security.h @@ -880,6 +880,17 @@ static inline void security_free_mnt_opt * @sock contains the listening socket structure. * @newsock contains the newly created server socket for connection. * Return 0 if permission is granted. + * @socket_post_accept: + * This hook allows a security module to filter connections from unwanted + * peers based on the process accepting this connection. + * The connection will be aborted if this hook returns nonzero. + * This hook is not designed for updating security attributes of + * an accept()ed socket, for the accept()ed socket has already sent + * several packets (e.g. TCP's SYN/ACK packet and some ACK packets for + * incoming data) before this hook is called. + * @sock contains the listening socket structure. + * @newsock contains the newly created server socket for connection. + * Return 0 if permission is granted. * @socket_sendmsg: * Check permission before transmitting a message to another socket. * @sock contains the socket structure. @@ -893,6 +904,15 @@ static inline void security_free_mnt_opt * @size contains the size of message structure. * @flags contains the operational flags. * Return 0 if permission is granted. + * @socket_post_recv_datagram: + * Check permission after receiving a datagram. + * This hook allows a security module to filter packets + * from unwanted peers based on the process receiving this datagram. + * The packet will be discarded if this hook returns nonzero. + * @sk contains the socket. + * @skb contains the socket buffer. + * @flags contains the operational flags. + * Return 0 if permission is granted. * @socket_getsockname: * Check permission before the local address (name) of the socket object * @sock is retrieved. @@ -1549,10 +1569,13 @@ struct security_operations { struct sockaddr *address, int addrlen); int (*socket_listen) (struct socket *sock, int backlog); int (*socket_accept) (struct socket *sock, struct socket *newsock); + int (*socket_post_accept) (struct socket *sock, struct socket *newsock); int (*socket_sendmsg) (struct socket *sock, struct msghdr *msg, int size); int (*socket_recvmsg) (struct socket *sock, struct msghdr *msg, int size, int flags); + int (*socket_post_recv_datagram) (struct sock *sk, struct sk_buff *skb, + unsigned int flags); int (*socket_getsockname) (struct socket *sock); int (*socket_getpeername) (struct socket *sock); int (*socket_getsockopt) (struct socket *sock, int level, int optname); @@ -2530,9 +2553,12 @@ int security_socket_bind(struct socket * int security_socket_connect(struct socket *sock, struct sockaddr *address, int addrlen); int security_socket_listen(struct socket *sock, int backlog); int security_socket_accept(struct socket *sock, struct socket *newsock); +int security_socket_post_accept(struct socket *sock, struct socket *newsock); int security_socket_sendmsg(struct socket *sock, struct msghdr *msg, int size); int security_socket_recvmsg(struct socket *sock, struct msghdr *msg, int size, int flags); +int security_socket_post_recv_datagram(struct sock *sk, struct sk_buff *skb, + unsigned int flags); int security_socket_getsockname(struct socket *sock); int security_socket_getpeername(struct socket *sock); int security_socket_getsockopt(struct socket *sock, int level, int optname); @@ -2608,6 +2634,12 @@ static inline int security_socket_accept return 0; } +static inline int security_socket_post_accept(struct socket *sock, + struct socket *newsock) +{ + return 0; +} + static inline int security_socket_sendmsg(struct socket *sock, struct msghdr *msg, int size) { @@ -2621,6 +2653,13 @@ static inline int security_socket_recvms return 0; } +static inline int security_socket_post_recv_datagram(struct sock *sk, + struct sk_buff *skb, + unsigned int flags) +{ + return 0; +} + static inline int security_socket_getsockname(struct socket *sock) { return 0; --- security-testing-2.6.git.orig/net/core/datagram.c +++ security-testing-2.6.git/net/core/datagram.c @@ -55,6 +55,7 @@ #include <net/checksum.h> #include <net/sock.h> #include <net/tcp_states.h> +#include <linux/security.h> /* * Is a socket 'connection oriented' ? @@ -148,6 +149,7 @@ struct sk_buff *__skb_recv_datagram(stru { struct sk_buff *skb; long timeo; + unsigned long cpu_flags; /* * Caller is allowed not to check sk->sk_err before skb_recv_datagram() */ @@ -165,7 +167,6 @@ struct sk_buff *__skb_recv_datagram(stru * Look at current nfs client by the way... * However, this function was corrent in any case. 8) */ - unsigned long cpu_flags; spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags); skb = skb_peek(&sk->sk_receive_queue); @@ -179,6 +180,14 @@ struct sk_buff *__skb_recv_datagram(stru } spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags); + /* Filter packets from unwanted peers. */ + if (skb) { + error = security_socket_post_recv_datagram(sk, skb, + flags); + if (error) + goto force_dequeue; + } + if (skb) return skb; @@ -191,6 +200,24 @@ struct sk_buff *__skb_recv_datagram(stru return NULL; +force_dequeue: + /* Drop this packet because LSM says "Don't pass it to the caller". */ + if (!(flags & MSG_PEEK)) + goto no_peek; + /* + * If this packet is MSG_PEEK'ed, dequeue it forcibly + * so that this packet won't prevent the caller from picking up + * next packet. + */ + spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags); + if (skb == skb_peek(&sk->sk_receive_queue)) { + __skb_unlink(skb, &sk->sk_receive_queue); + atomic_dec(&skb->users); + /* Do I have something to do with skb->peeked ? */ + } + spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags); +no_peek: + kfree_skb(skb); no_packet: *err = error; return NULL; --- security-testing-2.6.git.orig/net/socket.c +++ security-testing-2.6.git/net/socket.c @@ -1519,6 +1519,11 @@ SYSCALL_DEFINE4(accept4, int, fd, struct if (err < 0) goto out_fd; + /* Filter connections from unwanted peers. */ + err = security_socket_post_accept(sock, newsock); + if (err) + goto out_fd; + if (upeer_sockaddr) { if (newsock->ops->getname(newsock, (struct sockaddr *)&address, &len, 2) < 0) { --- security-testing-2.6.git.orig/security/capability.c +++ security-testing-2.6.git/security/capability.c @@ -620,6 +620,11 @@ static int cap_socket_accept(struct sock return 0; } +static int cap_socket_post_accept(struct socket *sock, struct socket *newsock) +{ + return 0; +} + static int cap_socket_sendmsg(struct socket *sock, struct msghdr *msg, int size) { return 0; @@ -631,6 +636,12 @@ static int cap_socket_recvmsg(struct soc return 0; } +static int cap_socket_post_recv_datagram(struct sock *sk, struct sk_buff *skb, + unsigned int flags) +{ + return 0; +} + static int cap_socket_getsockname(struct socket *sock) { return 0; @@ -1010,8 +1021,10 @@ void security_fixup_ops(struct security_ set_to_cap_if_null(ops, socket_connect); set_to_cap_if_null(ops, socket_listen); set_to_cap_if_null(ops, socket_accept); + set_to_cap_if_null(ops, socket_post_accept); set_to_cap_if_null(ops, socket_sendmsg); set_to_cap_if_null(ops, socket_recvmsg); + set_to_cap_if_null(ops, socket_post_recv_datagram); set_to_cap_if_null(ops, socket_getsockname); set_to_cap_if_null(ops, socket_getpeername); set_to_cap_if_null(ops, socket_setsockopt); --- security-testing-2.6.git.orig/security/security.c +++ security-testing-2.6.git/security/security.c @@ -1007,6 +1007,11 @@ int security_socket_accept(struct socket return security_ops->socket_accept(sock, newsock); } +int security_socket_post_accept(struct socket *sock, struct socket *newsock) +{ + return security_ops->socket_post_accept(sock, newsock); +} + int security_socket_sendmsg(struct socket *sock, struct msghdr *msg, int size) { return security_ops->socket_sendmsg(sock, msg, size); @@ -1018,6 +1023,12 @@ int security_socket_recvmsg(struct socke return security_ops->socket_recvmsg(sock, msg, size, flags); } +int security_socket_post_recv_datagram(struct sock *sk, struct sk_buff *skb, + unsigned int flags) +{ + return security_ops->socket_post_recv_datagram(sk, skb, flags); +} + int security_socket_getsockname(struct socket *sock) { return security_ops->socket_getsockname(sock);