Message ID | 20190307000316.31133-1-viro@ZenIV.linux.org.uk |
---|---|
State | Not Applicable |
Delegated to: | David Miller |
Series | [1/8] aio: make sure file is pinned |
On Wed, Mar 6, 2019 at 4:03 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> "aio: remove the extra get_file/fput pair in io_submit_one" was
> too optimistic - not dereferencing file pointer after e.g.
> ->write_iter() returns is not enough; that reference might've been
> the only thing that kept alive objects that are referenced
> *before* the method returns. Such as inode, for example...

I still think that this is actually _worse_ than just having the
refcount on the req instead.

As it is, we have that completely insane "ref can go away from under
us", because nothing keeps that around, which then causes all those
other crazy issues with "woken" etc garbage.

I think we should be able to get rid of those entirely. Make the
poll() case just return zero if it has added the entry successfully to
poll queue. No need for "woken", no need for all that odd "oh, but
now the req might no longer exist".

The refcount wasn't the problem. Everything *else* was the problem,
including only using the refcount for the poll case etc.

Linus
On Wed, Mar 06, 2019 at 04:23:04PM -0800, Linus Torvalds wrote:
> I still think that this is actually _worse_ than just having the
> refcount on the req instead.
>
> As it is, we have that completely insane "ref can go away from under
> us", because nothing keeps that around, which then causes all those
> other crazy issues with "woken" etc garbage.
>
> I think we should be able to get rid of those entirely. Make the
> poll() case just return zero if it has added the entry successfully to
> poll queue. No need for "woken", no need for all that odd "oh, but
> now the req might no longer exist".

Not really. Sure, you can get rid of "might no longer exist"
considerations, but you still need to decide which way do we want to
handle it. There are 4 cases:

* it's already taken up; don't put on the list for possible
  cancel, don't call aio_complete().
* will eventually be woken up; put on the list for possible
  cancel, don't call aio_complete().
* wanted to be on several queues, fortunately not woken up
  yet. Make sure it's gone from queue, return an error.
* none of the above, and ->poll() has reported what we wanted
  from the very beginning. Remove from queue, call aio_complete().

You'll need some logics to handle that. I can buy the "if we know
the req is still alive, we can check if it's still queued instead of
separate woken flag", but it won't win you much ;-/
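The four cases listed above can be written down as a small decision table. The following is only a userspace sketch of that enumeration, not code from fs/aio.c: `struct poll_state`, `struct poll_action`, `decide()` and the choice of `-EINVAL` as "an error" are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of what the submitter sees once ->poll() returns. */
struct poll_state {
	bool completed_by_wakeup; /* an early wakeup already took the request */
	bool queue_error;         /* it wanted to sit on several waitqueues */
	bool event_ready;         /* ->poll() reported a wanted event at once */
};

/* Hypothetical description of what the submitter should do next. */
struct poll_action {
	bool add_to_cancel_list;
	bool remove_from_queue;
	bool call_complete;
	int ret;
};

static struct poll_action decide(struct poll_state s)
{
	struct poll_action a = { false, false, false, 0 };

	if (s.completed_by_wakeup)       /* case 1: nothing left to do */
		return a;
	if (s.queue_error) {             /* case 3: dequeue and fail */
		a.remove_from_queue = true;
		a.ret = -EINVAL;         /* assumed error code */
		return a;
	}
	if (s.event_ready) {             /* case 4: dequeue and complete */
		a.remove_from_queue = true;
		a.call_complete = true;
		return a;
	}
	a.add_to_cancel_list = true;     /* case 2: wait for the wakeup */
	return a;
}

/* Usage: a pending request is left cancellable, a ready one is completed. */
static void decide_selfcheck(void)
{
	struct poll_state pending = { false, false, false };
	struct poll_state ready = { false, false, true };

	assert(decide(pending).add_to_cancel_list);
	assert(decide(ready).call_complete && decide(ready).remove_from_queue);
}
```

The ordering of the checks reflects the wording of the list: "already taken up" wins over everything else, and the several-queues error only applies while the request has not been woken yet.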
On Thu, Mar 07, 2019 at 12:41:59AM +0000, Al Viro wrote:
> You'll need some logics to handle that. I can buy the "if we know
> the req is still alive, we can check if it's still queued instead of
> separate woken flag", but it won't win you much ;-/

If anything, the one good reason for refcount would be the risk that
some ->read_iter() or ->write_iter() will try to dereference iocb
after having decided to return -EIOCBQUEUED and submitted all bios.
I think that doesn't happen, but making sure it doesn't would be
a good argument in favour of that refcount.
On Thu, Mar 07, 2019 at 12:48:28AM +0000, Al Viro wrote:
> If anything, the one good reason for refcount would be the risk that
> some ->read_iter() or ->write_iter() will try to dereference iocb
> after having decided to return -EIOCBQUEUED and submitted all bios.
> I think that doesn't happen, but making sure it doesn't would be
> a good argument in favour of that refcount.

*grumble*

It is a good argument, unfortunately ;-/ Proof that instances do not
step into that is rather subtle and won't take much to break. OK...

I'll try to massage that series on top of your patch; I still hate the
post-vfs_poll() logics in aio_poll() ;-/ Give me about half an hour
and I'll have something to post.
On Wed, Mar 6, 2019 at 5:20 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> I'll try to massage that series on top of your patch; I still hate the
> post-vfs_poll() logics in aio_poll() ;-/ Give me about half an hour
> and I'll have something to post.

No inherent hurry, I sent the ping just to make sure it hadn't gotten lost.

And yeah, I think the post-vfs_poll() logic cannot possibly be
necessary. My gut feel is that *if* we have the refcounting right,
then we should be able to just let the wakeup come in at any later
point, and ordering shouldn't matter all that much, and we shouldn't
even need any locking.

I'd like to think that it can be done with something like "just 'or'
in the mask atomically" (so that we don't care about ordering between
the synchronous vfs_poll() and the async poll wakeup), together with
"when refcount goes to zero, finish the thing off and complete it" (so
that we don't care who finishes first).

No "woken" logic, no "who fired first" logic, no BS. Just make the
operations work regardless of ordering.

And maybe it can't be done. But the current model seems just so hacky
that it can't be the right model.

Linus
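The "or in the mask atomically, complete when the refcount hits zero" idea can be illustrated in userspace with C11 atomics. This is a sketch of the ordering-independence argument only, under the assumption that each side reports exactly once; `struct poll_req`, `req_init()` and `req_report_and_put()` are hypothetical names and nothing here is taken from fs/aio.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct poll_req {
	atomic_int refs;      /* one reference per party still using the request */
	atomic_uint events;   /* union of all event bits reported so far */
	bool completed;       /* written only by whoever drops the last reference */
	unsigned int result;  /* event bits delivered at completion */
};

static void req_init(struct poll_req *req)
{
	atomic_init(&req->refs, 2);   /* synchronous poll + async wakeup */
	atomic_init(&req->events, 0);
	req->completed = false;
	req->result = 0;
}

/*
 * Called once by each side, in either order: merge the reported bits,
 * then drop a reference. Returns true for the caller that completed.
 */
static bool req_report_and_put(struct poll_req *req, unsigned int mask)
{
	int old;

	atomic_fetch_or(&req->events, mask);
	old = atomic_fetch_sub(&req->refs, 1);
	assert(old >= 1);             /* dropping a reference nobody holds is a bug */
	if (old != 1)
		return false;         /* someone else still holds a reference */
	req->result = atomic_load(&req->events);
	req->completed = true;
	return true;
}
```

Because the bits are merged before the reference is dropped, whichever side drops the last reference sees the union of both reports, so neither a "woken" flag nor a "who fired first" check appears in this toy model. Whether the real aio_poll() can be reduced to this is exactly what the thread leaves open.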
On Wed, Mar 06, 2019 at 05:30:21PM -0800, Linus Torvalds wrote:
> And yeah, I think the post-vfs_poll() logic cannot possibly be
> necessary. My gut feel is that *if* we have the refcounting right,
> then we should be able to just let the wakeup come in at any later
> point, and ordering shouldn't matter all that much, and we shouldn't
> even need any locking.
>
> I'd like to think that it can be done with something like "just 'or'
> in the mask atomically" (so that we don't care about ordering between
> the synchronous vfs_poll() and the async poll wakeup), together with
> "when refcount goes to zero, finish the thing off and complete it" (so
> that we don't care who finishes first).
>
> No "woken" logic, no "who fired first" logic, no BS. Just make the
> operations work regardless of ordering.
>
> And maybe it can't be done. But the current model seems just so hacky
> that it can't be the right model.

Umm... It is kinda-sorta doable; we do need something vaguely similar
to ->woken ("should we add it to the list of cancellables, or is the
async reference already gone?"), but other than that it seems to be
feasible.

See vfs.git#work.aio; the crucial bits are in these commits:

    keep io_event in aio_kiocb
    get rid of aio_complete() res/res2 arguments
    move aio_complete() to final iocb_put(), try to fix aio_poll() logics

The first two are preparations, the last is where the fixes (hopefully)
happen. The logics in aio_poll() after vfs_poll():

* we might want to steal the async reference (e.g. due to event
  returned from the very beginning, or due to attempt to put on more
  than one waitqueue, which makes results unreliable). That's _NOT_
  possible if the thing had been put on a waitqueue, but currently
  isn't there. It might be either due to early wakeup having done
  everything or the same having scheduled aio_poll_complete_work().
  In either case, the best we can do is to ignore the return value of
  vfs_poll() and, in case of error, mark the sucker cancelled. We
  *can't* return an error in that case.
* if we want and can steal the async reference, rip it from waitqueue;
  otherwise, put it on the "cancellable" list, unless it's already gone
  or unless we are simulating the cancel ourselves.
* if vfs_poll() has reported something we want and we have successfully
  stolen the iocb, put it there, have the reference we'd taken over
  dropped and return 0

Comments?
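The three rules for the post-vfs_poll() step can be modelled as a pure function. This is a sketch written from the description in this message, not a copy of the code in vfs.git#work.aio; the struct and field names (`was_queued`, `still_queued`, `done` and so on) are hypothetical, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

struct after_poll_in {
	bool was_queued;      /* request was put on a waitqueue at some point */
	bool still_queued;    /* ...and is still sitting on it right now */
	bool done;            /* the async side has already finished with it */
	unsigned int mask;    /* wanted events reported by the poll call */
	int error;            /* nonzero e.g. for a second waitqueue attempt */
};

struct after_poll_out {
	bool dequeue;         /* steal: rip the request off the waitqueue */
	bool mark_cancelled;  /* simulate a cancel instead of failing */
	bool add_cancellable; /* leave it pending, reachable by cancel */
	bool complete;        /* store the event, drop the stolen reference */
	int ret;
};

static struct after_poll_out after_poll(struct after_poll_in in)
{
	struct after_poll_out out = { false, false, false, false, 0 };
	bool cancel = false;

	assert(in.was_queued || !in.still_queued);
	if (in.was_queued) {
		if (!in.still_queued) {
			/* Too late to steal: ignore what the poll call said. */
			cancel = in.error != 0;
			in.error = 0;
			in.mask = 0;
		}
		if (in.mask || in.error)
			out.dequeue = true;
		else if (cancel)
			out.mark_cancelled = true;
		else if (!in.done)
			out.add_cancellable = true;
	}
	if (in.mask) {
		/* Stolen with a wanted event: complete it and report success. */
		out.complete = true;
		in.error = 0;
	}
	out.ret = in.error;
	return out;
}
```

For example, a request that was queued, has already left the waitqueue, and hit the more-than-one-waitqueue error comes back with `mark_cancelled` set and `ret == 0`, which is the "we *can't* return an error in that case" rule.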
On Fri, Mar 08, 2019 at 03:36:50AM +0000, Al Viro wrote:
> See vfs.git#work.aio; the crucial bits are in these commits:
>     keep io_event in aio_kiocb
>     get rid of aio_complete() res/res2 arguments
>     move aio_complete() to final iocb_put(), try to fix aio_poll() logics
> The first two are preparations, the last is where the fixes (hopefully)
> happen.

Looks sensible. I'll try to run the tests over it, and I've added Avi
so that maybe he can make sure that scylladb is also happy with it;
that was usually the best way to find aio poll bugs.
On Fri, Mar 08, 2019 at 03:36:50AM +0000, Al Viro wrote:
> See vfs.git#work.aio; the crucial bits are in these commits:
>     keep io_event in aio_kiocb
>     get rid of aio_complete() res/res2 arguments
>     move aio_complete() to final iocb_put(), try to fix aio_poll() logics
> The first two are preparations, the last is where the fixes (hopefully)
> happen.

OK, refactored, cleaned up and force-pushed. Current state:

Al Viro (7):
      keep io_event in aio_kiocb
      aio: store event at final iocb_put()
      Fix aio_poll() races
      make aio_read()/aio_write() return int
      move dropping ->ki_eventfd into iocb_destroy()
      deal with get_reqs_available() in aio_get_req() itself
      aio: move sanity checks and request allocation to io_submit_one()

Linus Torvalds (1):
      pin iocb through aio.

 fs/aio.c | 327 ++++++++++++++++++++++++++++-----------------------------------
 1 file changed, 146 insertions(+), 181 deletions(-)
On Sun, Mar 10, 2019 at 07:06:18AM +0000, Al Viro wrote:
> OK, refactored, cleaned up and force-pushed. Current state:
This survives the libaio test suite at least.
diff --git a/fs/aio.c b/fs/aio.c
index 3d9669d011b9..ea30b78187ed 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1790,6 +1790,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 			   struct iocb __user *user_iocb, bool compat)
 {
 	struct aio_kiocb *req;
+	struct file *file;
 	ssize_t ret;
 
 	/* enforce forwards compatibility on users */
@@ -1844,6 +1845,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 
 	req->ki_user_iocb = user_iocb;
 	req->ki_user_data = iocb->aio_data;
+	file = get_file(req->ki_filp); /* req can die too early */
 
 	switch (iocb->aio_lio_opcode) {
 	case IOCB_CMD_PREAD:
@@ -1872,6 +1874,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 		ret = -EINVAL;
 		break;
 	}
+	fput(file);
 
 	/*
 	 * If ret is 0, we'd either done aio_complete() ourselves or have
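The pattern in this patch (take an extra reference before dispatching, drop it afterwards) can be shown with a toy refcounted object. This is a single-threaded userspace sketch under stated assumptions: `struct file_like`, `file_get()`, `file_put()` and `submit_pinned()` are hypothetical stand-ins, the counter is a plain `int` rather than an atomic, and "freeing" is modelled by a flag so that the example never touches freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct file_like {
	int refs;
	bool released;        /* stands in for the object being freed */
};

static struct file_like *file_get(struct file_like *f)
{
	f->refs++;
	return f;
}

static void file_put(struct file_like *f)
{
	assert(f->refs > 0);
	if (--f->refs == 0)
		f->released = true;
}

struct request {
	struct file_like *file;   /* the request's own reference */
};

/* A handler that completes at once, dropping the request's reference. */
static int handler_completes_early(struct request *req)
{
	file_put(req->file);
	req->file = NULL;
	return 0;
}

/* Returns true if the file was still alive right after the handler ran. */
static bool submit_pinned(struct request *req)
{
	struct file_like *file = file_get(req->file);  /* pin before dispatch */
	bool alive_after_dispatch;

	handler_completes_early(req);
	alive_after_dispatch = !file->released;  /* safe: we hold our own ref */
	file_put(file);                          /* the last reference goes here */
	return alive_after_dispatch;
}
```

Without the `file_get()` in `submit_pinned()`, the handler's `file_put()` would be the last one and the object would already be gone by the time the handler returned, which is the situation the commit message describes for objects such as the inode.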