From patchwork Fri Oct 25 17:30:34 2019
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 1184346
X-Patchwork-Delegate: davem@davemloft.net
From: Jens Axboe
To: linux-block@vger.kernel.org
Cc: davem@davemloft.net, netdev@vger.kernel.org, jannh@google.com, Jens Axboe
Subject: [PATCH 1/4] io_uring: reorder struct sqe_submit
Date: Fri, 25 Oct 2019 11:30:34 -0600
Message-Id: <20191025173037.13486-2-axboe@kernel.dk>
In-Reply-To: <20191025173037.13486-1-axboe@kernel.dk>
References: <20191025173037.13486-1-axboe@kernel.dk>

Reorder the members to pack the struct better. This takes struct sqe_submit
from 24 bytes down to 16 bytes, and struct io_kiocb from 192 down to 184
bytes. No functional changes in this patch.

Signed-off-by: Jens Axboe
---
 fs/io_uring.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 13c1ebf96626..effa385ebe72 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -260,10 +260,10 @@ struct io_ring_ctx {
 struct sqe_submit {
         const struct io_uring_sqe *sqe;
         unsigned short index;
+        bool has_user : 1;
+        bool in_async : 1;
+        bool needs_fixed_file : 1;
         u32 sequence;
-        bool has_user;
-        bool in_async;
-        bool needs_fixed_file;
 };

 /*
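The saving is pure layout: the three bools move ahead of the u32 and become
single-bit bitfields, so they fit into what used to be alignment padding. The
effect can be reproduced outside the kernel with a small standalone program;
this is a sketch with user-space stand-ins for the kernel types (void pointer
for the sqe pointer, uint32_t for u32), assuming a typical x86-64/LP64 layout:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* old member order: sequence sits between the short and the bools */
struct sqe_submit_old {
        const void *sqe;                /* 8 bytes */
        unsigned short index;           /* 2 bytes + 2 bytes padding */
        uint32_t sequence;              /* 4 bytes */
        bool has_user;                  /* 3 x 1 byte ... */
        bool in_async;
        bool needs_fixed_file;          /* ... + 5 bytes tail padding */
};

/* new member order: the flags share one byte ahead of the u32 */
struct sqe_submit_new {
        const void *sqe;                /* 8 bytes */
        unsigned short index;           /* 2 bytes */
        bool has_user : 1;              /* three flags in a single byte */
        bool in_async : 1;
        bool needs_fixed_file : 1;      /* + 1 byte padding */
        uint32_t sequence;              /* 4 bytes */
};

int main(void)
{
        printf("old: %zu bytes, new: %zu bytes\n",
               sizeof(struct sqe_submit_old), sizeof(struct sqe_submit_new));
        /* typically prints "old: 24 bytes, new: 16 bytes" on x86-64 */
        return 0;
}

Since io_kiocb embeds a struct sqe_submit, the 8 bytes saved there account for
the 192 -> 184 byte change as well.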
From patchwork Fri Oct 25 17:30:35 2019
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 1184345
X-Patchwork-Delegate: davem@davemloft.net
From: Jens Axboe
To: linux-block@vger.kernel.org
Cc: davem@davemloft.net, netdev@vger.kernel.org, jannh@google.com, Jens Axboe
Subject: [PATCH 2/4] io_uring: add support for async work inheriting files
Date: Fri, 25 Oct 2019 11:30:35 -0600
Message-Id: <20191025173037.13486-3-axboe@kernel.dk>
In-Reply-To: <20191025173037.13486-1-axboe@kernel.dk>
References: <20191025173037.13486-1-axboe@kernel.dk>

This is in preparation for adding opcodes that need to add new files to a
process file table, such as the open(2) or accept4(2) system calls. If an
opcode needs this, it must set IO_WQ_WORK_NEEDS_FILES in the work item. If
work with this flag set gets punted to async context, the async worker
assumes the original task's file table before executing the work.

Note that opcodes that need access to the current files of an application
cannot be done through IORING_SETUP_SQPOLL.

Signed-off-by: Jens Axboe
---
 fs/io-wq.c    |  14 ++++++
 fs/io-wq.h    |   3 ++
 fs/io_uring.c | 116 ++++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 129 insertions(+), 4 deletions(-)

diff --git a/fs/io-wq.c b/fs/io-wq.c
index 99ac5e338d99..134c4632c0be 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -52,6 +52,7 @@ struct io_worker {

         struct rcu_head rcu;
         struct mm_struct *mm;
+        struct files_struct *restore_files;
 };

 struct io_wq_nulls_list {
@@ -128,6 +129,12 @@ static bool __io_worker_unuse(struct io_wqe *wqe, struct io_worker *worker)
         __must_hold(wqe->lock)
         __releases(wqe->lock)
 {
+        if (current->files != worker->restore_files) {
+                task_lock(current);
+                current->files = worker->restore_files;
+                task_unlock(current);
+        }
+
         /*
          * If we have an active mm, we need to drop the wq lock before unusing
          * it. If we do, return true and let the caller retry the idle loop.
@@ -188,6 +195,7 @@ static void io_worker_start(struct io_wqe *wqe, struct io_worker *worker)
         current->flags |= PF_IO_WORKER;

         worker->flags |= (IO_WORKER_F_UP | IO_WORKER_F_RUNNING);
+        worker->restore_files = current->files;

         atomic_inc(&wqe->nr_running);
 }
@@ -278,6 +286,12 @@ static void io_worker_handle_work(struct io_worker *worker)
                 if (!work)
                         break;
 next:
+                if ((work->flags & IO_WQ_WORK_NEEDS_FILES) &&
+                    current->files != work->files) {
+                        task_lock(current);
+                        current->files = work->files;
+                        task_unlock(current);
+                }
                 if ((work->flags & IO_WQ_WORK_NEEDS_USER) && !worker->mm &&
                     wq->mm && mmget_not_zero(wq->mm)) {
                         use_mm(wq->mm);
diff --git a/fs/io-wq.h b/fs/io-wq.h
index be8f22c8937b..e93f764b1fa4 100644
--- a/fs/io-wq.h
+++ b/fs/io-wq.h
@@ -8,6 +8,7 @@ enum {
         IO_WQ_WORK_HAS_MM = 2,
         IO_WQ_WORK_HASHED = 4,
         IO_WQ_WORK_NEEDS_USER = 8,
+        IO_WQ_WORK_NEEDS_FILES = 16,

         IO_WQ_HASH_SHIFT = 24,  /* upper 8 bits are used for hash key */
 };
@@ -22,12 +23,14 @@ struct io_wq_work {
         struct list_head list;
         void (*func)(struct io_wq_work **);
         unsigned flags;
+        struct files_struct *files;
 };

 #define INIT_IO_WORK(work, _func)                       \
         do {                                            \
                 (work)->func = _func;                   \
                 (work)->flags = 0;                      \
+                (work)->files = NULL;                   \
         } while (0)                                     \

 struct io_wq *io_wq_create(unsigned concurrency, struct mm_struct *mm);
diff --git a/fs/io_uring.c b/fs/io_uring.c
index effa385ebe72..5a6f8e1dc718 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -196,6 +196,8 @@ struct io_ring_ctx {

                 struct list_head defer_list;
                 struct list_head timeout_list;
+
+                wait_queue_head_t inflight_wait;
         } ____cacheline_aligned_in_smp;

         /* IO offload */
@@ -250,6 +252,9 @@ struct io_ring_ctx {
                  */
                 struct list_head poll_list;
                 struct list_head cancel_list;
+
+                spinlock_t inflight_lock;
+                struct list_head inflight_list;
         } ____cacheline_aligned_in_smp;

 #if defined(CONFIG_UNIX)
@@ -259,11 +264,13 @@ struct io_ring_ctx {

 struct sqe_submit {
         const struct io_uring_sqe *sqe;
+        struct file *ring_file;
         unsigned short index;
         bool has_user : 1;
         bool in_async : 1;
         bool needs_fixed_file : 1;
         u32 sequence;
+        int ring_fd;
 };

 /*
@@ -318,10 +325,13 @@ struct io_kiocb {
 #define REQ_F_TIMEOUT           1024    /* timeout request */
 #define REQ_F_ISREG             2048    /* regular file */
 #define REQ_F_MUST_PUNT         4096    /* must be punted even for NONBLOCK */
+#define REQ_F_INFLIGHT          8192    /* on inflight list */
         u64 user_data;
         u32 result;
         u32 sequence;

+        struct list_head inflight_entry;
+
         struct io_wq_work work;
 };
@@ -402,6 +412,9 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
         INIT_LIST_HEAD(&ctx->cancel_list);
         INIT_LIST_HEAD(&ctx->defer_list);
         INIT_LIST_HEAD(&ctx->timeout_list);
+        init_waitqueue_head(&ctx->inflight_wait);
+        spin_lock_init(&ctx->inflight_lock);
+        INIT_LIST_HEAD(&ctx->inflight_list);
         return ctx;
 }
@@ -671,9 +684,20 @@ static void io_free_req_many(struct io_ring_ctx *ctx, void **reqs, int *nr)

 static void __io_free_req(struct io_kiocb *req)
 {
+        struct io_ring_ctx *ctx = req->ctx;
+
         if (req->file && !(req->flags & REQ_F_FIXED_FILE))
                 fput(req->file);
-        percpu_ref_put(&req->ctx->refs);
+        if (req->flags & REQ_F_INFLIGHT) {
+                unsigned long flags;
+
+                spin_lock_irqsave(&ctx->inflight_lock, flags);
+                list_del(&req->inflight_entry);
+                if (waitqueue_active(&ctx->inflight_wait))
+                        wake_up(&ctx->inflight_wait);
+                spin_unlock_irqrestore(&ctx->inflight_lock, flags);
+        }
+        percpu_ref_put(&ctx->refs);
         kmem_cache_free(req_cachep, req);
 }
@@ -2277,6 +2301,30 @@ static int io_req_set_file(struct io_ring_ctx *ctx, const struct sqe_submit *s,
         return 0;
 }

+static int io_grab_files(struct io_ring_ctx *ctx, struct io_kiocb *req)
+{
+        int ret = -EBADF;
+
+        rcu_read_lock();
+        spin_lock_irq(&ctx->inflight_lock);
+        /*
+         * We use the f_ops->flush() handler to ensure that we can flush
+         * out work accessing these files if the fd is closed. Check if
+         * the fd has changed since we started down this path, and disallow
+         * this operation if it has.
+         */
+        if (fcheck(req->submit.ring_fd) == req->submit.ring_file) {
+                list_add(&req->inflight_entry, &ctx->inflight_list);
+                req->flags |= REQ_F_INFLIGHT;
+                req->work.files = current->files;
+                ret = 0;
+        }
+        spin_unlock_irq(&ctx->inflight_lock);
+        rcu_read_unlock();
+
+        return ret;
+}
+
 static int __io_queue_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
                           struct sqe_submit *s)
 {
@@ -2296,17 +2344,25 @@ static int __io_queue_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
                 if (sqe_copy) {
                         s->sqe = sqe_copy;
                         memcpy(&req->submit, s, sizeof(*s));
-                        io_queue_async_work(ctx, req);
+                        if (req->work.flags & IO_WQ_WORK_NEEDS_FILES) {
+                                ret = io_grab_files(ctx, req);
+                                if (ret) {
+                                        kfree(sqe_copy);
+                                        goto err;
+                                }
+                        }

                         /*
                          * Queued up for async execution, worker will release
                          * submit reference when the iocb is actually submitted.
                          */
+                        io_queue_async_work(ctx, req);
                         return 0;
                 }
         }

         /* drop submission reference */
+err:
         io_put_req(req, NULL);

         /* and drop final reference, if we failed */
@@ -2509,6 +2565,7 @@ static bool io_get_sqring(struct io_ring_ctx *ctx, struct sqe_submit *s)

         head = READ_ONCE(sq_array[head & ctx->sq_mask]);
         if (head < ctx->sq_entries) {
+                s->ring_file = NULL;
                 s->index = head;
                 s->sqe = &ctx->sq_sqes[head];
                 s->sequence = ctx->cached_sq_head;
@@ -2716,7 +2773,8 @@ static int io_sq_thread(void *data)
         return 0;
 }

-static int io_ring_submit(struct io_ring_ctx *ctx, unsigned int to_submit)
+static int io_ring_submit(struct io_ring_ctx *ctx, unsigned int to_submit,
+                          struct file *ring_file, int ring_fd)
 {
         struct io_submit_state state, *statep = NULL;
         struct io_kiocb *link = NULL;
@@ -2758,9 +2816,11 @@ static int io_ring_submit(struct io_ring_ctx *ctx, unsigned int to_submit)
                 }

out:
+                s.ring_file = ring_file;
                 s.has_user = true;
                 s.in_async = false;
                 s.needs_fixed_file = false;
+                s.ring_fd = ring_fd;
                 submit++;
                 trace_io_uring_submit_sqe(ctx, true, false);
                 io_submit_sqe(ctx, &s, statep, &link);
@@ -3722,6 +3782,53 @@ static int io_uring_release(struct inode *inode, struct file *file)
         return 0;
 }

+static void io_uring_cancel_files(struct io_ring_ctx *ctx,
+                                  struct files_struct *files)
+{
+        struct io_kiocb *req;
+        DEFINE_WAIT(wait);
+
+        while (!list_empty_careful(&ctx->inflight_list)) {
+                enum io_wq_cancel ret = IO_WQ_CANCEL_NOTFOUND;
+
+                spin_lock_irq(&ctx->inflight_lock);
+                list_for_each_entry(req, &ctx->inflight_list, inflight_entry) {
+                        if (req->work.files == files) {
+                                ret = io_wq_cancel_work(ctx->io_wq, &req->work);
+                                break;
+                        }
+                }
+                if (ret == IO_WQ_CANCEL_RUNNING)
+                        prepare_to_wait(&ctx->inflight_wait, &wait,
+                                        TASK_UNINTERRUPTIBLE);
+
+                spin_unlock_irq(&ctx->inflight_lock);
+
+                /*
+                 * We need to keep going until we get NOTFOUND. We only cancel
+                 * one work at the time.
+                 *
+                 * If we get CANCEL_RUNNING, then wait for a work to complete
+                 * before continuing.
+                 */
+                if (ret == IO_WQ_CANCEL_OK)
+                        continue;
+                else if (ret != IO_WQ_CANCEL_RUNNING)
+                        break;
+                schedule();
+        }
+}
+
+static int io_uring_flush(struct file *file, void *data)
+{
+        struct io_ring_ctx *ctx = file->private_data;
+
+        io_uring_cancel_files(ctx, data);
+        if (fatal_signal_pending(current) || (current->flags & PF_EXITING))
+                io_wq_cancel_all(ctx->io_wq);
+        return 0;
+}
+
 static int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
 {
         loff_t offset = (loff_t) vma->vm_pgoff << PAGE_SHIFT;
@@ -3790,7 +3897,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
                 to_submit = min(to_submit, ctx->sq_entries);

                 mutex_lock(&ctx->uring_lock);
-                submitted = io_ring_submit(ctx, to_submit);
+                submitted = io_ring_submit(ctx, to_submit, f.file, fd);
                 mutex_unlock(&ctx->uring_lock);
         }
         if (flags & IORING_ENTER_GETEVENTS) {
@@ -3813,6 +3920,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,

 static const struct file_operations io_uring_fops = {
         .release        = io_uring_release,
+        .flush          = io_uring_flush,
         .mmap           = io_uring_mmap,
         .poll           = io_uring_poll,
         .fasync         = io_uring_fasync,
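The mechanism above is a borrow-and-restore pattern: the worker remembers its
own file table, adopts the submitting task's table for work flagged with
IO_WQ_WORK_NEEDS_FILES, and later puts its own table back. The following
user-space sketch models that flow with ordinary structs and a file-scope
stand-in for current->files (which is per-task in the kernel); all names here
(files_table, work_item, worker_run) are illustrative, not kernel API:

#include <stdio.h>

struct files_table {
        const char *owner;              /* stand-in for struct files_struct */
};

struct work_item {
        struct files_table *files;      /* set when the op needs the submitter's
                                         * table, cf. IO_WQ_WORK_NEEDS_FILES */
        void (*func)(struct work_item *);
};

/* models current->files; a single global is enough for this sketch */
static struct files_table *current_files;

static void worker_run(struct work_item *work)
{
        /* like worker->restore_files, remembered when the worker starts */
        struct files_table *restore_files = current_files;

        if (work->files && work->files != current_files)
                current_files = work->files;    /* adopt the submitter's table */

        work->func(work);                       /* execute the punted work */

        /* put the worker's own table back; the kernel defers this until
         * the worker goes idle (__io_worker_unuse) */
        current_files = restore_files;
}

static void accept_like_op(struct work_item *work)
{
        (void)work;
        /* any fd installed here would land in the submitter's table */
        printf("running with files owned by %s\n", current_files->owner);
}

int main(void)
{
        struct files_table worker_tbl = { "worker" };
        struct files_table task_tbl = { "submitting task" };
        struct work_item work = { .files = &task_tbl, .func = accept_like_op };

        current_files = &worker_tbl;
        worker_run(&work);
        printf("back to files owned by %s\n", current_files->owner);
        return 0;
}

The inflight list and the new ->flush() handler exist so that closing the ring
fd can cancel or wait out any punted work still referencing that file table.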
From patchwork Fri Oct 25 17:30:36 2019
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 1184344
X-Patchwork-Delegate: davem@davemloft.net
From: Jens Axboe
To: linux-block@vger.kernel.org
Cc: davem@davemloft.net, netdev@vger.kernel.org, jannh@google.com, Jens Axboe
Subject: [PATCH 3/4] net: add __sys_accept4_file() helper
Date: Fri, 25 Oct 2019 11:30:36 -0600
Message-Id: <20191025173037.13486-4-axboe@kernel.dk>
In-Reply-To: <20191025173037.13486-1-axboe@kernel.dk>
References: <20191025173037.13486-1-axboe@kernel.dk>

This is identical to __sys_accept4(), except it takes a struct file instead
of an fd, and it also allows passing in extra file->f_flags flags. The latter
is done to support masking in O_NONBLOCK without manipulating the original
file flags.

No functional changes in this patch.

Cc: David Miller
Cc: netdev@vger.kernel.org
Signed-off-by: Jens Axboe
---
 include/linux/socket.h |  3 ++
 net/socket.c           | 65 ++++++++++++++++++++++++++----------------
 2 files changed, 44 insertions(+), 24 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index fc0bed59fc84..dd061f741bc1 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -392,6 +392,9 @@ extern int __sys_recvfrom(int fd, void __user *ubuf, size_t size,
 extern int __sys_sendto(int fd, void __user *buff, size_t len,
                         unsigned int flags, struct sockaddr __user *addr,
                         int addr_len);
+extern int __sys_accept4_file(struct file *file, unsigned file_flags,
+                        struct sockaddr __user *upeer_sockaddr,
+                        int __user *upeer_addrlen, int flags);
 extern int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr,
                          int __user *upeer_addrlen, int flags);
 extern int __sys_socket(int family, int type, int protocol);
diff --git a/net/socket.c b/net/socket.c
index 6a9ab7a8b1d2..40ab39f6c5d8 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1690,24 +1690,13 @@ SYSCALL_DEFINE2(listen, int, fd, int, backlog)
         return __sys_listen(fd, backlog);
 }

-/*
- * For accept, we attempt to create a new socket, set up the link
- * with the client, wake up the client, then return the new
- * connected fd. We collect the address of the connector in kernel
- * space and move it to user at the very end. This is unclean because
- * we open the socket then return an error.
- *
- * 1003.1g adds the ability to recvmsg() to query connection pending
- * status to recvmsg. We need to add that support in a way thats
- * clean when we restructure accept also.
- */
-
-int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr,
-                  int __user *upeer_addrlen, int flags)
+int __sys_accept4_file(struct file *file, unsigned file_flags,
+                       struct sockaddr __user *upeer_sockaddr,
+                       int __user *upeer_addrlen, int flags)
 {
         struct socket *sock, *newsock;
         struct file *newfile;
-        int err, len, newfd, fput_needed;
+        int err, len, newfd;
         struct sockaddr_storage address;

         if (flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK))
@@ -1716,14 +1705,14 @@ int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr,
         if (SOCK_NONBLOCK != O_NONBLOCK && (flags & SOCK_NONBLOCK))
                 flags = (flags & ~SOCK_NONBLOCK) | O_NONBLOCK;

-        sock = sockfd_lookup_light(fd, &err, &fput_needed);
+        sock = sock_from_file(file, &err);
         if (!sock)
                 goto out;

         err = -ENFILE;
         newsock = sock_alloc();
         if (!newsock)
-                goto out_put;
+                goto out;

         newsock->type = sock->type;
         newsock->ops = sock->ops;
@@ -1738,20 +1727,21 @@ int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr,
         if (unlikely(newfd < 0)) {
                 err = newfd;
                 sock_release(newsock);
-                goto out_put;
+                goto out;
         }

         newfile = sock_alloc_file(newsock, flags, sock->sk->sk_prot_creator->name);
         if (IS_ERR(newfile)) {
                 err = PTR_ERR(newfile);
                 put_unused_fd(newfd);
-                goto out_put;
+                goto out;
         }

         err = security_socket_accept(sock, newsock);
         if (err)
                 goto out_fd;

-        err = sock->ops->accept(sock, newsock, sock->file->f_flags, false);
+        err = sock->ops->accept(sock, newsock, sock->file->f_flags | file_flags,
+                                false);
         if (err < 0)
                 goto out_fd;
@@ -1772,15 +1762,42 @@ int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr,

         fd_install(newfd, newfile);
         err = newfd;
-
-out_put:
-        fput_light(sock->file, fput_needed);
 out:
         return err;
 out_fd:
         fput(newfile);
         put_unused_fd(newfd);
-        goto out_put;
+        goto out;
+
+}
+
+/*
+ * For accept, we attempt to create a new socket, set up the link
+ * with the client, wake up the client, then return the new
+ * connected fd. We collect the address of the connector in kernel
+ * space and move it to user at the very end. This is unclean because
+ * we open the socket then return an error.
+ *
+ * 1003.1g adds the ability to recvmsg() to query connection pending
+ * status to recvmsg. We need to add that support in a way thats
+ * clean when we restructure accept also.
+ */
+
+int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr,
+                  int __user *upeer_addrlen, int flags)
+{
+        int ret = -EBADF;
+        struct fd f;
+
+        f = fdget(fd);
+        if (f.file) {
+                ret = __sys_accept4_file(f.file, 0, upeer_sockaddr,
+                                         upeer_addrlen, flags);
+                if (f.flags)
+                        fput(f.file);
+        }
+
+        return ret;
 }

 SYSCALL_DEFINE4(accept4, int, fd, struct sockaddr __user *, upeer_sockaddr,
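Note the division of labour between the two flag sets: accept4()'s flags
argument (SOCK_NONBLOCK, SOCK_CLOEXEC) applies to the accepted socket, while
whether the accept attempt itself blocks is governed by O_NONBLOCK on the
listening file, which is exactly what the new file_flags argument lets an
in-kernel caller OR in for a single call without touching the file's flags.
A short user-space illustration of that distinction (error handling omitted):

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
        struct sockaddr_in addr = {
                .sin_family = AF_INET,
                .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
                .sin_port = 0,                  /* any free port */
        };
        int lfd = socket(AF_INET, SOCK_STREAM, 0);

        bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
        listen(lfd, 8);

        /* make the listening *file* nonblocking, akin to file_flags = O_NONBLOCK */
        fcntl(lfd, F_SETFL, fcntl(lfd, F_GETFL) | O_NONBLOCK);

        /* no connection is pending, so the attempt returns immediately */
        int cfd = accept4(lfd, NULL, NULL, SOCK_CLOEXEC);
        if (cfd < 0)
                printf("accept4: %s (expected EAGAIN/EWOULDBLOCK)\n",
                       strerror(errno));

        close(lfd);
        return 0;
}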
From patchwork Fri Oct 25 17:30:37 2019
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 1184343
X-Patchwork-Delegate: davem@davemloft.net
From: Jens Axboe
To: linux-block@vger.kernel.org
Cc: davem@davemloft.net, netdev@vger.kernel.org, jannh@google.com, Jens Axboe
Subject: [PATCH 4/4] io_uring: add support for IORING_OP_ACCEPT
Date: Fri, 25 Oct 2019 11:30:37 -0600
Message-Id: <20191025173037.13486-5-axboe@kernel.dk>
In-Reply-To: <20191025173037.13486-1-axboe@kernel.dk>
References: <20191025173037.13486-1-axboe@kernel.dk>

This allows an application to call accept4() in an async fashion. Like other
opcodes, we first try a non-blocking accept, then punt to async context if
we have to.

Signed-off-by: Jens Axboe
---
 fs/io_uring.c                 | 35 +++++++++++++++++++++++++++++++++++
 include/uapi/linux/io_uring.h |  7 ++++++-
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 5a6f8e1dc718..4402485f0879 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1687,6 +1687,38 @@ static int io_recvmsg(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 #endif
 }

+static int io_accept(struct io_kiocb *req, const struct io_uring_sqe *sqe,
+                     struct io_kiocb **nxt, bool force_nonblock)
+{
+#if defined(CONFIG_NET)
+        struct sockaddr __user *addr;
+        int __user *addr_len;
+        unsigned file_flags;
+        int flags, ret;
+
+        if (unlikely(req->ctx->flags & (IORING_SETUP_IOPOLL|IORING_SETUP_SQPOLL)))
+                return -EINVAL;
+
+        addr = (struct sockaddr __user *) (unsigned long) READ_ONCE(sqe->addr);
+        addr_len = (int __user *) (unsigned long) READ_ONCE(sqe->addr2);
+        flags = READ_ONCE(sqe->accept_flags);
+        file_flags = force_nonblock ? O_NONBLOCK : 0;
+
+        ret = __sys_accept4_file(req->file, file_flags, addr, addr_len, flags);
+        if (ret == -EAGAIN && force_nonblock) {
+                req->work.flags |= IO_WQ_WORK_NEEDS_FILES;
+                return -EAGAIN;
+        }
+        if (ret < 0 && (req->flags & REQ_F_LINK))
+                req->flags |= REQ_F_FAIL_LINK;
+        io_cqring_add_event(req->ctx, sqe->user_data, ret);
+        io_put_req(req, nxt);
+        return 0;
+#else
+        return -EOPNOTSUPP;
+#endif
+}
+
 static void io_poll_remove_one(struct io_kiocb *req)
 {
         struct io_poll_iocb *poll = &req->poll;
@@ -2174,6 +2206,9 @@ static int __io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
         case IORING_OP_TIMEOUT_REMOVE:
                 ret = io_timeout_remove(req, s->sqe);
                 break;
+        case IORING_OP_ACCEPT:
+                ret = io_accept(req, s->sqe, nxt, force_nonblock);
+                break;
         default:
                 ret = -EINVAL;
                 break;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 6dc5ced1c37a..f82d90e617a6 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -19,7 +19,10 @@ struct io_uring_sqe {
         __u8 flags;             /* IOSQE_ flags */
         __u16 ioprio;           /* ioprio for the request */
         __s32 fd;               /* file descriptor to do IO on */
-        __u64 off;              /* offset into file */
+        union {
+                __u64 off;      /* offset into file */
+                __u64 addr2;
+        };
         __u64 addr;             /* pointer to buffer or iovecs */
         __u32 len;              /* buffer size or number of iovecs */
         union {
@@ -29,6 +32,7 @@ struct io_uring_sqe {
                 __u32 sync_range_flags;
                 __u32 msg_flags;
                 __u32 timeout_flags;
+                __u32 accept_flags;
         };
         __u64 user_data;        /* data to be passed back at completion time */
         union {
@@ -65,6 +69,7 @@ struct io_uring_sqe {
 #define IORING_OP_RECVMSG       10
 #define IORING_OP_TIMEOUT       11
 #define IORING_OP_TIMEOUT_REMOVE        12
+#define IORING_OP_ACCEPT        13

 /*
  * sqe->fsync_flags
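With the series applied, an application drives an async accept by filling the
SQE fields the way io_accept() reads them: fd is the listening socket, addr
and addr2 carry the usual sockaddr and length pointers, and accept_flags
holds the accept4() flags. The sketch below is only an illustration: it
assumes liburing for ring setup and submission (io_uring_queue_init(),
io_uring_get_sqe(), io_uring_submit(), io_uring_wait_cqe()) plus headers that
already contain IORING_OP_ACCEPT and the new addr2/accept_flags fields, and
it trims error handling; the port number is arbitrary.

#include <liburing.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        struct sockaddr_in peer;
        socklen_t peer_len = sizeof(peer);
        int lfd;

        /* plain blocking listening socket; io_uring handles the async part */
        lfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {
                .sin_family = AF_INET,
                .sin_addr.s_addr = htonl(INADDR_ANY),
                .sin_port = htons(12345),       /* example port */
        };
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
        listen(lfd, 128);

        io_uring_queue_init(8, &ring, 0);

        /* fill the SQE the way io_accept() expects to read it */
        sqe = io_uring_get_sqe(&ring);
        memset(sqe, 0, sizeof(*sqe));
        sqe->opcode = IORING_OP_ACCEPT;
        sqe->fd = lfd;                                  /* listening socket */
        sqe->addr = (unsigned long) &peer;              /* struct sockaddr * */
        sqe->addr2 = (unsigned long) &peer_len;         /* socklen_t * */
        sqe->accept_flags = SOCK_CLOEXEC;               /* accept4() flags */
        sqe->user_data = 1;

        io_uring_submit(&ring);
        io_uring_wait_cqe(&ring, &cqe);         /* completes when a client connects */
        printf("accepted fd %d\n", cqe->res);   /* negative errno on failure */
        io_uring_cqe_seen(&ring, cqe);

        io_uring_queue_exit(&ring);
        return 0;
}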