
[SRU,F,1/1] io_uring: ensure IOPOLL locks around deferred work

Message ID 20240927063755.112103-2-chengen.du@canonical.com
State New
Series CVE-2023-21400

Commit Message

Chengen Du Sept. 27, 2024, 6:37 a.m. UTC
From: Jens Axboe <axboe@kernel.dk>

CVE-2023-21400

BugLink: https://bugs.launchpad.net/bugs/2078659

No direct upstream commit exists for this issue. It was fixed in
5.18 as part of a larger rework of the completion side.

io_commit_cqring() writes the CQ ring tail to make it visible, but it
also kicks off any deferred work we have. A ring setup with IOPOLL
does not need any locking around the CQ ring updates, as we're always
under the ctx uring_lock. But if we have deferred work that needs
processing, then io_queue_deferred() assumes that the completion_lock
is held, as it is for !IOPOLL.

Add a lockdep assertion to check and document this fact, and have
io_iopoll_complete() check if we have deferred work and run that
separately with the appropriate lock grabbed.

Cc: stable@vger.kernel.org # 5.10, 5.15
Reported-by: dghost david <daviduniverse18@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(backported from commit fb348857e7b67eefe365052f1423427b66dedbf3)
[chengen - apply the locking logic to the corresponding functions /
use spin_lock_irq because the same lock is used in the timer callback
function.]
Signed-off-by: Chengen Du <chengen.du@canonical.com>
---
 fs/io_uring.c | 4 ++++
 1 file changed, 4 insertions(+)
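
For context, the sketch below shows the call chain this patch is concerned with. It is a minimal illustration, not the exact Focal source: helper names and ordering are borrowed loosely from nearby kernel versions, and the timeout handling the Focal tree also performs here is omitted. The point is that io_commit_cqring() does more than publish the CQ tail; it also drains ctx->defer_list, and that list is protected by ctx->completion_lock, which the IOPOLL completion path did not take before this change.

/*
 * Minimal sketch, not the exact Focal source: helper names follow
 * nearby kernel versions, and the timeout flushing that the Focal
 * tree also does here is omitted.
 */
static void io_commit_cqring(struct io_ring_ctx *ctx)
{
	struct io_kiocb *req;

	/* make new CQEs visible by publishing the CQ tail */
	__io_commit_cqring(ctx);

	/*
	 * ...but also kick off any deferred work. Walking ctx->defer_list
	 * assumes ctx->completion_lock is held, which is true for the
	 * normal (!IOPOLL) completion path but was not true for
	 * io_iopoll_complete() before this patch.
	 */
	while ((req = io_get_deferred_req(ctx)) != NULL)
		io_queue_async_work(ctx, req);
}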

Comments

Juerg Haefliger Oct. 2, 2024, 9:25 a.m. UTC | #1
On Fri, 27 Sep 2024 14:37:51 +0800
Chengen Du <chengen.du@canonical.com> wrote:

> From: Jens Axboe <axboe@kernel.dk>
> 
> CVE-2023-21400
> 
> BugLink: https://bugs.launchpad.net/bugs/2078659
> 
> No direct upstream commit exists for this issue. It was fixed in
> 5.18 as part of a larger rework of the completion side.
> 
> io_commit_cqring() writes the CQ ring tail to make it visible, but it
> also kicks off any deferred work we have. A ring setup with IOPOLL
> does not need any locking around the CQ ring updates, as we're always
> under the ctx uring_lock. But if we have deferred work that needs
> processing, then io_queue_deferred() assumes that the completion_lock
> is held, as it is for !IOPOLL.
> 
> Add a lockdep assertion to check and document this fact, and have
> io_iopoll_complete() check if we have deferred work and run that
> separately with the appropriate lock grabbed.
> 
> Cc: stable@vger.kernel.org # 5.10, 5.15
> Reported-by: dghost david <daviduniverse18@gmail.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> (backported from commit fb348857e7b67eefe365052f1423427b66dedbf3)
> [chengen - apply the locking logic to the corresponding functions /
> use spin_lock_irq because the same lock is used in the timer callback
> function.]
> Signed-off-by: Chengen Du <chengen.du@canonical.com>
> ---
>  fs/io_uring.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 875dd8e0f766..aa75c32f2d16 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -512,6 +512,8 @@ static void __io_commit_cqring(struct io_ring_ctx *ctx)
>  {
>  	struct io_rings *rings = ctx->rings;
>  
> +	lockdep_assert_held(&ctx->completion_lock);
> +
>  	if (ctx->cached_cq_tail != READ_ONCE(rings->cq.tail)) {
>  		/* order cqe stores with ring update */
>  		smp_store_release(&rings->cq.tail, ctx->cached_cq_tail);
> @@ -841,7 +843,9 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
>  		}
>  	}
>  
> +	spin_lock_irq(&ctx->completion_lock);
>  	io_commit_cqring(ctx);
> +	spin_unlock_irq(&ctx->completion_lock);
>  	io_free_req_many(ctx, reqs, &to_free);
>  }
>  

Not sure if that is sufficient. Your patch is (obviously) vastly different
from the original. You might want to check the SUSE fix for 5.3:
https://github.com/SUSE/kernel/commit/a16c5d2daa1cf45f8694ff72bd2665d474a763a7

Do you have a reproducer to actually validate the fix?

...Juerg
Chengen Du Oct. 16, 2024, 7:52 a.m. UTC | #2
Hi,

I apologize for the delayed response.

This issue is difficult to reproduce as it’s related to timing, but
there is an article [1] that suggests an approach to trigger it.
The problem occurs when ctx->defer_list is accessed in the
io_commit_cqring function without holding ctx->completion_lock.

Regarding the SUSE fix for kernel 5.3 that you mentioned, its
io_commit_cqring function appears to be quite different from ours, as
ours also includes handling for killing timeout requests.
One thing that caught my attention there is the use of spin_lock_irqsave
to take ctx->completion_lock.
I've gone through the code in detail but am not fully certain why
spin_lock_irqsave is used there instead of spin_lock_irq.

To make the patch more robust, I plan to develop a reproducer rather
than only verifying that the patch doesn't introduce deadlocks.
I might switch to spin_lock_irqsave for ctx->completion_lock, and I
will study the code further to explore possible ways to narrow the
locked region, similar to the SUSE fix.

Best regards,
Chengen Du

[1] https://yanglingxi1993.github.io/dirty_pagetable/dirty_pagetable.html

On Wed, Oct 2, 2024 at 5:25 PM Juerg Haefliger
<juerg.haefliger@canonical.com> wrote:
>
> Not sure if that is sufficient. Your patch is (obviously) vastly different
> from the original. You might want to check the SUSE fix for 5.3:
> https://github.com/SUSE/kernel/commit/a16c5d2daa1cf45f8694ff72bd2665d474a763a7
>
> Do you have a reproducer to actually validate the fix?
>
> ...Juerg
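
On the spin_lock_irq vs spin_lock_irqsave question discussed above, the following generic illustration (hypothetical helper functions, not actual io_uring code) shows the practical difference: spin_lock_irq() unconditionally re-enables interrupts on unlock, so it is only safe from contexts known to run with interrupts enabled, while spin_lock_irqsave() restores whatever interrupt state the caller had, which also makes it safe when the same lock is taken from a timer callback or another context that may already have interrupts disabled.

/* Hypothetical helpers, for illustration only. */
static void completion_from_process_context(struct io_ring_ctx *ctx)
{
	/* safe only if interrupts are known to be enabled on entry */
	spin_lock_irq(&ctx->completion_lock);
	/* ... touch ctx->defer_list ... */
	spin_unlock_irq(&ctx->completion_lock);	/* unconditionally re-enables IRQs */
}

static void completion_from_any_context(struct io_ring_ctx *ctx)
{
	unsigned long flags;

	/* save the current IRQ state so it can be restored exactly */
	spin_lock_irqsave(&ctx->completion_lock, flags);
	/* ... touch ctx->defer_list ... */
	spin_unlock_irqrestore(&ctx->completion_lock, flags);
}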

Patch

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 875dd8e0f766..aa75c32f2d16 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -512,6 +512,8 @@ static void __io_commit_cqring(struct io_ring_ctx *ctx)
 {
 	struct io_rings *rings = ctx->rings;
 
+	lockdep_assert_held(&ctx->completion_lock);
+
 	if (ctx->cached_cq_tail != READ_ONCE(rings->cq.tail)) {
 		/* order cqe stores with ring update */
 		smp_store_release(&rings->cq.tail, ctx->cached_cq_tail);
@@ -841,7 +843,9 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
 		}
 	}
 
+	spin_lock_irq(&ctx->completion_lock);
 	io_commit_cqring(ctx);
+	spin_unlock_irq(&ctx->completion_lock);
 	io_free_req_many(ctx, reqs, &to_free);
 }
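
For completeness, here is a rough sketch of the kind of userspace test discussed in the thread: an IOPOLL ring mixed with IOSQE_IO_DRAIN requests, so that the polled completion path finds entries on ctx->defer_list. This is not a validated reproducer for CVE-2023-21400; the device path is a placeholder, the race itself is timing-dependent, and it assumes liburing plus a device that supports polled O_DIRECT reads.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	struct iovec iov;
	void *buf;
	int fd, i;

	/* IOPOLL needs O_DIRECT I/O to a device that supports polling */
	fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);	/* placeholder path */
	if (fd < 0 || posix_memalign(&buf, 4096, 4096))
		return 1;
	iov.iov_base = buf;
	iov.iov_len = 4096;

	if (io_uring_queue_init(64, &ring, IORING_SETUP_IOPOLL) < 0)
		return 1;

	for (i = 0; i < 32; i++) {
		struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

		if (!sqe)
			break;
		io_uring_prep_readv(sqe, fd, &iov, 1, 0);
		/* every other request is drained, i.e. goes on the defer list */
		if (i & 1)
			io_uring_sqe_set_flags(sqe, IOSQE_IO_DRAIN);
	}
	io_uring_submit(&ring);

	/* reap completions; with IOPOLL this drives io_iopoll_complete() */
	for (i = 0; i < 32; i++) {
		if (io_uring_wait_cqe(&ring, &cqe) < 0)
			break;
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	return 0;
}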