| Message ID | 20240927063755.112103-2-chengen.du@canonical.com |
|---|---|
| State | New |
| Series | CVE-2023-21400 |
On Fri, 27 Sep 2024 14:37:51 +0800 Chengen Du <chengen.du@canonical.com> wrote:

> From: Jens Axboe <axboe@kernel.dk>
>
> CVE-2023-21400
>
> BugLink: https://bugs.launchpad.net/bugs/2078659
>
> No direct upstream commit exists for this issue. It was fixed in
> 5.18 as part of a larger rework of the completion side.
>
> io_commit_cqring() writes the CQ ring tail to make it visible, but it
> also kicks off any deferred work we have. A ring setup with IOPOLL
> does not need any locking around the CQ ring updates, as we're always
> under the ctx uring_lock. But if we have deferred work that needs
> processing, then io_queue_deferred() assumes that the completion_lock
> is held, as it is for !IOPOLL.
>
> Add a lockdep assertion to check and document this fact, and have
> io_iopoll_complete() check if we have deferred work and run that
> separately with the appropriate lock grabbed.
>
> Cc: stable@vger.kernel.org # 5.10, 5.15
> Reported-by: dghost david <daviduniverse18@gmail.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> (backported from commit fb348857e7b67eefe365052f1423427b66dedbf3)
> [chengen - apply the locking logic to the corresponding functions /
> use spin_lock_irq because the same lock is used in the timer callback
> function.]
> Signed-off-by: Chengen Du <chengen.du@canonical.com>
> ---
>  fs/io_uring.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 875dd8e0f766..aa75c32f2d16 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -512,6 +512,8 @@ static void __io_commit_cqring(struct io_ring_ctx *ctx)
>  {
>  	struct io_rings *rings = ctx->rings;
>
> +	lockdep_assert_held(&ctx->completion_lock);
> +
>  	if (ctx->cached_cq_tail != READ_ONCE(rings->cq.tail)) {
>  		/* order cqe stores with ring update */
>  		smp_store_release(&rings->cq.tail, ctx->cached_cq_tail);
> @@ -841,7 +843,9 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
>  		}
>  	}
>
> +	spin_lock_irq(&ctx->completion_lock);
>  	io_commit_cqring(ctx);
> +	spin_unlock_irq(&ctx->completion_lock);
>  	io_free_req_many(ctx, reqs, &to_free);
>  }

Not sure if that is sufficient. Your patch is (obviously) vastly different
from the original. You might want to check the SUSE fix for 5.3:
https://github.com/SUSE/kernel/commit/a16c5d2daa1cf45f8694ff72bd2665d474a763a7

Do you have a reproducer to actually validate the fix?

...Juerg
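To make the race concrete, the following is a simplified sketch of the two
fs/io_uring.c paths that touch ctx->defer_list (5.4-era function names,
bodies heavily elided; a sketch for illustration, not the actual patch):
submission adds deferred requests under completion_lock, while
io_commit_cqring() drains them — unlocked when reached via the IOPOLL path.

	/*
	 * Simplified sketch of the race (5.4-era names, bodies elided).
	 * Writer side: submission defers a drained request under the lock.
	 */
	static int io_req_defer(struct io_ring_ctx *ctx, struct io_kiocb *req)
	{
		spin_lock_irq(&ctx->completion_lock);
		list_add_tail(&req->list, &ctx->defer_list);	/* locked add */
		spin_unlock_irq(&ctx->completion_lock);
		return -EIOCBQUEUED;
	}

	/*
	 * Reader side: committing the CQ ring also drains defer_list. When
	 * called from io_iopoll_complete() without completion_lock, this
	 * list walk races with the locked add above.
	 */
	static void io_commit_cqring(struct io_ring_ctx *ctx)
	{
		struct io_kiocb *req;

		__io_commit_cqring(ctx);
		while ((req = io_get_deferred_req(ctx)) != NULL)
			io_queue_async_work(ctx, req);	/* unlocked removal */
	}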
Hi,

I apologize for the delayed response. This issue is difficult to
reproduce, as it is timing-related, but there is an article [1] that
suggests an approach to trigger it. The problem occurs when
ctx->defer_list is accessed in the io_commit_cqring function without
holding ctx->completion_lock.

Regarding the SUSE fix for kernel 5.3 that you mentioned, its
io_commit_cqring function appears to be quite different from ours, as
ours also handles killing timeout requests. One thing that caught my
attention is its use of spin_lock_irqsave to take ctx->completion_lock.
I have gone through the code in detail but am not fully certain why
spin_lock_irqsave is used there instead of spin_lock_irq.

To make the patch more robust, I plan to develop a reproducer rather
than simply verifying that the patch does not introduce deadlocks. I
might use spin_lock_irqsave to take ctx->completion_lock, and I will
study the code further to explore ways to limit the lock range,
similar to the SUSE fix.

Best regards,
Chengen Du

[1] https://yanglingxi1993.github.io/dirty_pagetable/dirty_pagetable.html

On Wed, Oct 2, 2024 at 5:25 PM Juerg Haefliger
<juerg.haefliger@canonical.com> wrote:
>
> [...]
>
> Not sure if that is sufficient. Your patch is (obviously) vastly
> different from the original. You might want to check the SUSE fix for 5.3:
> https://github.com/SUSE/kernel/commit/a16c5d2daa1cf45f8694ff72bd2665d474a763a7
>
> Do you have a reproducer to actually validate the fix?
>
> ...Juerg
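As background on the irqsave question above: spin_lock_irq()
unconditionally re-enables interrupts when the lock is released, so it is
only safe where interrupts are known to be enabled on entry;
spin_lock_irqsave() saves and restores the caller's interrupt state, which
is the conservative choice when the same lock is also taken from hardirq
context, as the timeout timer callback does here. A minimal sketch of that
pattern (hypothetical wrapper for illustration, not the actual SUSE or
Ubuntu patch):

	/*
	 * Sketch only (hypothetical helper): take completion_lock around
	 * the commit using irqsave, so the code is correct regardless of
	 * whether the caller already has interrupts disabled.
	 */
	static void io_iopoll_commit_cqring(struct io_ring_ctx *ctx)
	{
		unsigned long flags;

		spin_lock_irqsave(&ctx->completion_lock, flags);
		io_commit_cqring(ctx);
		spin_unlock_irqrestore(&ctx->completion_lock, flags);
	}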
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 875dd8e0f766..aa75c32f2d16 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -512,6 +512,8 @@ static void __io_commit_cqring(struct io_ring_ctx *ctx)
 {
 	struct io_rings *rings = ctx->rings;
 
+	lockdep_assert_held(&ctx->completion_lock);
+
 	if (ctx->cached_cq_tail != READ_ONCE(rings->cq.tail)) {
 		/* order cqe stores with ring update */
 		smp_store_release(&rings->cq.tail, ctx->cached_cq_tail);
@@ -841,7 +843,9 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
 		}
 	}
 
+	spin_lock_irq(&ctx->completion_lock);
 	io_commit_cqring(ctx);
+	spin_unlock_irq(&ctx->completion_lock);
 	io_free_req_many(ctx, reqs, &to_free);
 }
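For completeness, a hedged outline of the kind of reproducer discussed
above (untested sketch using liburing; the file name, queue depth, and
loop counts are illustrative assumptions): an IORING_SETUP_IOPOLL ring
combined with IOSQE_IO_DRAIN requests keeps ctx->defer_list populated
while the IOPOLL completion path commits the CQ ring. The race is
timing-dependent, so a real reproducer would loop this from several
threads.

	/* Untested sketch, not a confirmed PoC for CVE-2023-21400. */
	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdlib.h>
	#include <sys/uio.h>
	#include <liburing.h>

	int main(void)
	{
		struct io_uring ring;
		struct io_uring_cqe *cqe;
		struct iovec iov;
		void *buf;
		int fd, i;

		/* IOPOLL requires O_DIRECT-capable files. */
		fd = open("testfile", O_RDONLY | O_DIRECT);
		if (fd < 0 || posix_memalign(&buf, 4096, 4096))
			return 1;
		iov.iov_base = buf;
		iov.iov_len = 4096;

		if (io_uring_queue_init(64, &ring, IORING_SETUP_IOPOLL))
			return 1;

		for (i = 0; i < 64; i++) {
			struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

			io_uring_prep_readv(sqe, fd, &iov, 1, 0);
			/* Drained requests land on ctx->defer_list. */
			if (i & 1)
				sqe->flags |= IOSQE_IO_DRAIN;
		}
		io_uring_submit(&ring);

		for (i = 0; i < 64; i++) {
			if (io_uring_wait_cqe(&ring, &cqe))
				break;
			io_uring_cqe_seen(&ring, cqe);
		}
		io_uring_queue_exit(&ring);
		return 0;
	}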