Message ID: 20241108025004.255862-2-chengen.du@canonical.com
State: New
Series: CVE-2023-21400
Hi,

I'd like to provide additional explanation for this patch, as it differs significantly from both the upstream version and SUSE's fix for 5.3 [1]. The CVE affects kernel 5.4 in a unique way.

When io_uring is set up with the IORING_SETUP_IOPOLL flag and operates on a filesystem that implements the iopoll function hook, the io_uring_enter system call uses io_iopoll_check to retrieve completed events. If completed events are present, io_iopoll_complete is called, which in turn calls io_commit_cqring without holding completion_lock. io_commit_cqring extracts entries from defer_list for processing; entries are appended to defer_list when a request is submitted with the IOSQE_IO_DRAIN flag. Removing entries from defer_list without holding the lock therefore risks corrupting the list structure. (A rough sketch of this path is appended after the quoted patch below.)

To address this, completion_lock should be held before calling io_commit_cqring. It should be taken with spin_lock_irq, as the same lock is also taken in the timer callback function (i.e., io_timeout_fn). If I understand correctly, spin_lock_irqsave isn't necessary here, since io_iopoll_complete is never reached from a context where IRQs may already be masked, as the following call chains show (see also the second fragment at the end of this mail):

io_iopoll_complete
  <- io_do_iopoll
    <- io_iopoll_getevents
      <- io_sq_thread
      <- io_iopoll_check
        <- io_uring_enter syscall
      <- io_iopoll_reap_events
        <- io_ring_ctx_free
          <- io_ring_ctx_wait_and_kill
        <- io_ring_ctx_wait_and_kill
          <- io_uring_create
            <- io_uring_setup
              <- io_uring_setup syscall
          <- io_uring_release
            <- io_uring_fops release function hook

Please let me know if there's anything I may have overlooked.

Best regards,
Chengen Du

[1] https://github.com/SUSE/kernel/commit/a16c5d2daa1cf45f8694ff72bd2665d474a763a7

On Fri, Nov 8, 2024 at 10:50 AM Chengen Du <chengen.du@canonical.com> wrote:
>
> From: Jens Axboe <axboe@kernel.dk>
>
> CVE-2023-21400
>
> BugLink: https://bugs.launchpad.net/bugs/2078659
>
> No direct upstream commit exists for this issue. It was fixed in
> 5.18 as part of a larger rework of the completion side.
>
> io_commit_cqring() writes the CQ ring tail to make it visible, but it
> also kicks off any deferred work we have. A ring setup with IOPOLL
> does not need any locking around the CQ ring updates, as we're always
> under the ctx uring_lock. But if we have deferred work that needs
> processing, then io_queue_deferred() assumes that the completion_lock
> is held, as it is for !IOPOLL.
>
> Add a lockdep assertion to check and document this fact, and have
> io_iopoll_complete() check if we have deferred work and run that
> separately with the appropriate lock grabbed.
>
> Cc: stable@vger.kernel.org # 5.10, 5.15
> Reported-by: dghost david <daviduniverse18@gmail.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> (backported from commit 810e401b34c4c4c244d8b93b9947ea5b3d4d49f8 linux-5.10.y)
> [chengen - apply the locking logic to the corresponding functions /
> use spin_lock_irq because the same lock is used in the timer callback
> function.]
> Signed-off-by: Chengen Du <chengen.du@canonical.com>
> ---
>  fs/io_uring.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 875dd8e0f766..aa75c32f2d16 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -512,6 +512,8 @@ static void __io_commit_cqring(struct io_ring_ctx *ctx)
>  {
>  	struct io_rings *rings = ctx->rings;
>
> +	lockdep_assert_held(&ctx->completion_lock);
> +
>  	if (ctx->cached_cq_tail != READ_ONCE(rings->cq.tail)) {
>  		/* order cqe stores with ring update */
>  		smp_store_release(&rings->cq.tail, ctx->cached_cq_tail);
> @@ -841,7 +843,9 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
>  		}
>  	}
>
> +	spin_lock_irq(&ctx->completion_lock);
>  	io_commit_cqring(ctx);
> +	spin_unlock_irq(&ctx->completion_lock);
>  	io_free_req_many(ctx, reqs, &to_free);
>  }
>
> --
> 2.43.0
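
To make the race more concrete, here is a rough sketch of the commit path that touches defer_list. This is reconstructed from memory of the 5.4 tree, so treat the exact shape as approximate rather than verbatim:

/* Sketch only, not verbatim 5.4 code. */
static void io_commit_cqring(struct io_ring_ctx *ctx)
{
	struct io_kiocb *req;

	__io_commit_cqring(ctx);	/* publishes rings->cq.tail */

	/*
	 * io_get_deferred_req() does a list_del on ctx->defer_list.
	 * Without completion_lock held, that list_del can interleave
	 * with the submission side's list_add_tail for an
	 * IOSQE_IO_DRAIN request, corrupting the list.
	 */
	while ((req = io_get_deferred_req(ctx)) != NULL)
		io_queue_async_work(ctx, req);
}

With the patch applied, io_iopoll_complete() takes completion_lock around this whole call, so the list_del and list_add_tail can no longer run concurrently.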
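As for spin_lock_irq versus spin_lock_irqsave, the following fragment (illustrative only, not part of the patch) shows the two variants side by side:

/*
 * completion_lock is also taken from the hrtimer callback
 * io_timeout_fn(), which runs in hard IRQ context. A process-context
 * holder must therefore disable local IRQs, or the timer could fire
 * on the same CPU while the lock is held and deadlock.
 */
spin_lock_irq(&ctx->completion_lock);	/* fine: callers run with IRQs enabled */
io_commit_cqring(ctx);
spin_unlock_irq(&ctx->completion_lock);

/*
 * The state-saving variant would only be needed if some caller might
 * already have IRQs disabled. The call chains above show that
 * io_iopoll_complete() is reached from plain process context only,
 * so the cheaper _irq variant is sufficient here.
 */
unsigned long flags;

spin_lock_irqsave(&ctx->completion_lock, flags);
io_commit_cqring(ctx);
spin_unlock_irqrestore(&ctx->completion_lock, flags);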