
[SRU,F,1/1] io_uring: ensure IOPOLL locks around deferred work

Message ID 20241108025004.255862-2-chengen.du@canonical.com
State New
Series CVE-2023-21400

Commit Message

Chengen Du Nov. 8, 2024, 2:49 a.m. UTC
From: Jens Axboe <axboe@kernel.dk>

CVE-2023-21400

BugLink: https://bugs.launchpad.net/bugs/2078659

No direct upstream commit exists for this issue. It was fixed in
5.18 as part of a larger rework of the completion side.

io_commit_cqring() writes the CQ ring tail to make it visible, but it
also kicks off any deferred work we have. A ring setup with IOPOLL
does not need any locking around the CQ ring updates, as we're always
under the ctx uring_lock. But if we have deferred work that needs
processing, then io_queue_deferred() assumes that the completion_lock
is held, as it is for !IOPOLL.

Add a lockdep assertion to check and document this fact, and have
io_iopoll_complete() check if we have deferred work and run that
separately with the appropriate lock grabbed.

Cc: stable@vger.kernel.org # 5.10, 5.15
Reported-by: dghost david <daviduniverse18@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(backported from commit 810e401b34c4c4c244d8b93b9947ea5b3d4d49f8 linux-5.10.y)
[chengen - apply the locking logic to the corresponding functions /
use spin_lock_irq because the same lock is used in the timer callback
function.]
Signed-off-by: Chengen Du <chengen.du@canonical.com>
---
 fs/io_uring.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Chengen Du Nov. 8, 2024, 2:53 a.m. UTC | #1
Hi,

I’d like to provide additional explanations for this patch, as it
differs significantly from both the upstream version and SUSE’s fix
for 5.3 [1].

The CVE affected the 5.4 kernel in a unique way.
When io_uring is set up with the IORING_SETUP_IOPOLL flag and operates
on a filesystem implementing the iopoll function hook, the
io_uring_enter system call will use io_iopoll_check to retrieve
completed events.
If completed events are present, the io_iopoll_complete function is
called, which in turn calls io_commit_cqring without holding
completion_lock.

The io_commit_cqring function extracts entries from defer_list for processing.
The defer_list is appended to when a request is submitted with the
IOSQE_IO_DRAIN flag.
However, removing entries from defer_list without a lock risks
breaking the list structure.
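
For reference, the Focal-era (5.4) io_commit_cqring() has roughly the
following shape (a sketch reconstructed from that tree, so details may
differ slightly). The deferred-request drain at the end is the part
that assumes completion_lock is held:

static void io_commit_cqring(struct io_ring_ctx *ctx)
{
	struct io_kiocb *req;

	while ((req = io_get_timeout_req(ctx)) != NULL)
		io_kill_timeout(req);

	__io_commit_cqring(ctx);	/* publish the CQ tail */

	/*
	 * Pops entries off ctx->defer_list; racing against a submitter
	 * appending to the list can corrupt it unless completion_lock
	 * is held.
	 */
	while ((req = io_get_deferred_req(ctx)) != NULL)
		io_queue_async_work(req);
}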

To address this issue, completion_lock should be held before calling
io_commit_cqring.
It’s recommended to hold completion_lock using spin_lock_irq, as the
same lock is also taken in a timer callback (i.e., io_timeout_fn).
If I understand correctly, spin_lock_irqsave isn’t necessary here,
since io_iopoll_complete is never called in a context where IRQs may
already be masked.
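
As a minimal sketch of the distinction (two alternative fragments,
assuming, per the reasoning above, that io_iopoll_complete only runs
in process context with IRQs enabled):

	/* IRQs are known to be enabled here, so the plain _irq pair
	 * used by this patch suffices: */
	spin_lock_irq(&ctx->completion_lock);
	io_commit_cqring(ctx);
	spin_unlock_irq(&ctx->completion_lock);

	/* By contrast, if a caller could already have IRQs masked,
	 * the IRQ state would have to be saved and restored: */
	unsigned long flags;

	spin_lock_irqsave(&ctx->completion_lock, flags);
	io_commit_cqring(ctx);
	spin_unlock_irqrestore(&ctx->completion_lock, flags);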

io_iopoll_complete <- io_do_iopoll <- io_iopoll_getevents
    <- io_sq_thread
    <- io_iopoll_check <- io_uring_enter syscall
    <- io_iopoll_reap_events
        <- io_ring_ctx_free <- io_ring_ctx_wait_and_kill
        <- io_ring_ctx_wait_and_kill
            <- io_uring_create <- io_uring_setup <- io_uring_setup syscall
            <- io_uring_release <- io_uring_fops release function hook

Please let me know if there's anything I may have overlooked.

Best regards,
Chengen Du

[1] https://github.com/SUSE/kernel/commit/a16c5d2daa1cf45f8694ff72bd2665d474a763a7

On Fri, Nov 8, 2024 at 10:50 AM Chengen Du <chengen.du@canonical.com> wrote:
>
> From: Jens Axboe <axboe@kernel.dk>
>
> CVE-2023-21400
>
> BugLink: https://bugs.launchpad.net/bugs/2078659
>
> No direct upstream commit exists for this issue. It was fixed in
> 5.18 as part of a larger rework of the completion side.
>
> io_commit_cqring() writes the CQ ring tail to make it visible, but it
> also kicks off any deferred work we have. A ring setup with IOPOLL
> does not need any locking around the CQ ring updates, as we're always
> under the ctx uring_lock. But if we have deferred work that needs
> processing, then io_queue_deferred() assumes that the completion_lock
> is held, as it is for !IOPOLL.
>
> Add a lockdep assertion to check and document this fact, and have
> io_iopoll_complete() check if we have deferred work and run that
> separately with the appropriate lock grabbed.
>
> Cc: stable@vger.kernel.org # 5.10, 5.15
> Reported-by: dghost david <daviduniverse18@gmail.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> (backported from commit 810e401b34c4c4c244d8b93b9947ea5b3d4d49f8 linux-5.10.y)
> [chengen - apply the locking logic to the corresponding functions /
> use spin_lock_irq because the same lock is used in the timer callback
> function.]
> Signed-off-by: Chengen Du <chengen.du@canonical.com>
> ---
>  fs/io_uring.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 875dd8e0f766..aa75c32f2d16 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -512,6 +512,8 @@ static void __io_commit_cqring(struct io_ring_ctx *ctx)
>  {
>         struct io_rings *rings = ctx->rings;
>
> +       lockdep_assert_held(&ctx->completion_lock);
> +
>         if (ctx->cached_cq_tail != READ_ONCE(rings->cq.tail)) {
>                 /* order cqe stores with ring update */
>                 smp_store_release(&rings->cq.tail, ctx->cached_cq_tail);
> @@ -841,7 +843,9 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
>                 }
>         }
>
> +       spin_lock_irq(&ctx->completion_lock);
>         io_commit_cqring(ctx);
> +       spin_unlock_irq(&ctx->completion_lock);
>         io_free_req_many(ctx, reqs, &to_free);
>  }
>
> --
> 2.43.0
>
Yuxuan Luo Nov. 18, 2024, 7:59 p.m. UTC | #2
On Fri, Nov 08, 2024 at 10:49:55AM +0800, Chengen Du wrote:

LGTM

Acked-by: Yuxuan Luo <yuxuan.luo@canonical.com>

> From: Jens Axboe <axboe@kernel.dk>
> 
> CVE-2023-21400
> 
> BugLink: https://bugs.launchpad.net/bugs/2078659
> 
> No direct upstream commit exists for this issue. It was fixed in
> 5.18 as part of a larger rework of the completion side.
> 
> io_commit_cqring() writes the CQ ring tail to make it visible, but it
> also kicks off any deferred work we have. A ring setup with IOPOLL
> does not need any locking around the CQ ring updates, as we're always
> under the ctx uring_lock. But if we have deferred work that needs
> processing, then io_queue_deferred() assumes that the completion_lock
> is held, as it is for !IOPOLL.
> 
> Add a lockdep assertion to check and document this fact, and have
> io_iopoll_complete() check if we have deferred work and run that
> separately with the appropriate lock grabbed.
> 
> Cc: stable@vger.kernel.org # 5.10, 5.15
> Reported-by: dghost david <daviduniverse18@gmail.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> (backported from commit 810e401b34c4c4c244d8b93b9947ea5b3d4d49f8 linux-5.10.y)
> [chengen - apply the locking logic to the corresponding functions /
> use spin_lock_irq because the same lock is used in the timer callback
> function.]
> Signed-off-by: Chengen Du <chengen.du@canonical.com>
> ---
>  fs/io_uring.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 875dd8e0f766..aa75c32f2d16 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -512,6 +512,8 @@ static void __io_commit_cqring(struct io_ring_ctx *ctx)
>  {
>  	struct io_rings *rings = ctx->rings;
>  
> +	lockdep_assert_held(&ctx->completion_lock);
> +
>  	if (ctx->cached_cq_tail != READ_ONCE(rings->cq.tail)) {
>  		/* order cqe stores with ring update */
>  		smp_store_release(&rings->cq.tail, ctx->cached_cq_tail);
> @@ -841,7 +843,9 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
>  		}
>  	}
>  
> +	spin_lock_irq(&ctx->completion_lock);
>  	io_commit_cqring(ctx);
> +	spin_unlock_irq(&ctx->completion_lock);
>  	io_free_req_many(ctx, reqs, &to_free);
>  }
>  
> -- 
> 2.43.0
Guoqing Jiang Nov. 19, 2024, 8:31 a.m. UTC | #3
Hi,

On 11/8/24 10:49, Chengen Du wrote:
> From: Jens Axboe <axboe@kernel.dk>
>
> CVE-2023-21400
>
> BugLink: https://bugs.launchpad.net/bugs/2078659
>
> No direct upstream commit exists for this issue. It was fixed in
> 5.18 as part of a larger rework of the completion side.
>
> io_commit_cqring() writes the CQ ring tail to make it visible, but it
> also kicks off any deferred work we have. A ring setup with IOPOLL
> does not need any locking around the CQ ring updates, as we're always
> under the ctx uring_lock. But if we have deferred work that needs
> processing, then io_queue_deferred() assumes that the completion_lock
> is held, as it is for !IOPOLL.
>
> Add a lockdep assertion to check and document this fact, and have
> io_iopoll_complete() check if we have deferred work and run that
> separately with the appropriate lock grabbed.

IIUC, the change in this patch doesn't align with the above description.

> Cc: stable@vger.kernel.org # 5.10, 5.15
> Reported-by: dghost david <daviduniverse18@gmail.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> (backported from commit 810e401b34c4c4c244d8b93b9947ea5b3d4d49f8 linux-5.10.y)
> [chengen - apply the locking logic to the corresponding functions /
> use spin_lock_irq because the same lock is used in the timer callback
> function.]
> Signed-off-by: Chengen Du <chengen.du@canonical.com>
> ---
>   fs/io_uring.c | 4 ++++
>   1 file changed, 4 insertions(+)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 875dd8e0f766..aa75c32f2d16 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -512,6 +512,8 @@ static void __io_commit_cqring(struct io_ring_ctx *ctx)
>   {
>   	struct io_rings *rings = ctx->rings;
>   
> +	lockdep_assert_held(&ctx->completion_lock);
> +
>   	if (ctx->cached_cq_tail != READ_ONCE(rings->cq.tail)) {
>   		/* order cqe stores with ring update */
>   		smp_store_release(&rings->cq.tail, ctx->cached_cq_tail);
> @@ -841,7 +843,9 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
>   		}
>   	}
>   
> +	spin_lock_irq(&ctx->completion_lock);
>   	io_commit_cqring(ctx);
> +	spin_unlock_irq(&ctx->completion_lock);
>   	io_free_req_many(ctx, reqs, &to_free);
>   }

For focal, io_commit_cqring was implemented between L577 [1] and
L595 [2]. I think the part from L581 to L582 has been changed to
io_flush_timeouts() per commit 360428f8c0cd ("io_uring: move timeouts
flushing to a helper"). And io_queue_deferred() was based on the code
between L586 and L594 per these commits:

045189452210 io_uring: separate DRAIN flushing into a cold path
87987898a1db io_uring: remove REQ_F_IO_DRAINED
1b4a51b6d03d io_uring: drain next sqe instead of shadowing

Personally, I'd prefer a change similar to SUSE's fix instead of
locking the whole io_commit_cqring function.

[1] https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/focal/tree/fs/io_uring.c?h=master-next#n577
[2] https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/focal/tree/fs/io_uring.c?h=master-next#n595

Thanks,
Guoqing
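
For reference, the narrower-scope shape Guoqing suggests would look
roughly like the following against Focal. This is only a sketch
modeled on linux-5.10.y commit 810e401b34c4: io_commit_needs_flush()
and __io_commit_cqring_flush() are names taken from that commit, the
io_iopoll_commit_cqring() wrapper is hypothetical, and how the split
maps onto the Focal tree is an assumption, not actual Focal code.

/* Sketch only; see the caveats above. */
static inline bool io_commit_needs_flush(struct io_ring_ctx *ctx)
{
	return !list_empty(&ctx->timeout_list) ||
	       !list_empty(&ctx->defer_list);
}

static void io_iopoll_commit_cqring(struct io_ring_ctx *ctx)
{
	if (io_commit_needs_flush(ctx)) {
		/* deferred/timeout work requires completion_lock */
		spin_lock_irq(&ctx->completion_lock);
		__io_commit_cqring_flush(ctx);
		spin_unlock_irq(&ctx->completion_lock);
	}
	/* the plain CQ tail publish stays lock-free under uring_lock */
	__io_commit_cqring(ctx);
}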
Chengen Du Nov. 21, 2024, 7:21 a.m. UTC | #4
Hi Guoqing,

Thank you for your response.

Before submitting this patch, I analyzed the level of effort required
to make our patch align more closely with SUSE’s implementation.
I understand that narrowing the lock range would be beneficial, but
there are several considerations to take into account.

Regarding commit 810e401b34c4 (io_uring: ensure IOPOLL locks around
deferred work), it uses the io_commit_needs_flush function to
determine whether to invoke __io_commit_cqring_flush.
In the Focal version, we lack the attributes used in the
io_commit_needs_flush logic.
Additionally, the __io_commit_cqring function in Focal includes tasks
beyond simply updating rings->cq.tail.
This raises uncertainty about the tangible benefits of separating the
logic in our context.

Moreover, the commits you've referenced cannot be backported unchanged.
We would need to investigate them in detail to prevent regressions.
The execution order within the io_commit_cqring function differs as
well—__io_commit_cqring is invoked in the middle of the flow instead
of at the head or tail.
To narrow the lock range as SUSE has done, we would need to backport
other commits unrelated to the CVE to adjust the flow accordingly.

Another factor to consider is that the liburing package, which
provides user-space helpers for kernel io_uring functionality, is not
available in Focal.
The CVE in question has also existed for some time in Focal.
Therefore, we need to weigh the necessity of backporting multiple
unrelated commits to address a CVE that may not significantly impact
many users, especially given the risk of introducing regressions to
io_uring functionality.

These are my initial thoughts, though I recognize there may be aspects
I have overlooked.
If you believe it is worthwhile to proceed with separating the logic,
I am happy to focus on that approach.

Best regards,
Chengen Du

On Tue, Nov 19, 2024 at 4:31 PM Guoqing Jiang
<guoqing.jiang@canonical.com> wrote:
>
> Hi,
>
> On 11/8/24 10:49, Chengen Du wrote:
> > From: Jens Axboe <axboe@kernel.dk>
> >
> > CVE-2023-21400
> >
> > BugLink: https://bugs.launchpad.net/bugs/2078659
> >
> > No direct upstream commit exists for this issue. It was fixed in
> > 5.18 as part of a larger rework of the completion side.
> >
> > io_commit_cqring() writes the CQ ring tail to make it visible, but it
> > also kicks off any deferred work we have. A ring setup with IOPOLL
> > does not need any locking around the CQ ring updates, as we're always
> > under the ctx uring_lock. But if we have deferred work that needs
> > processing, then io_queue_deferred() assumes that the completion_lock
> > is held, as it is for !IOPOLL.
> >
> > Add a lockdep assertion to check and document this fact, and have
> > io_iopoll_complete() check if we have deferred work and run that
> > separately with the appropriate lock grabbed.
>
> IIUC, the change in this patch doesn't align with the above description.
>
> > Cc: stable@vger.kernel.org # 5.10, 5.15
> > Reported-by: dghost david <daviduniverse18@gmail.com>
> > Signed-off-by: Jens Axboe <axboe@kernel.dk>
> > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > (backported from commit 810e401b34c4c4c244d8b93b9947ea5b3d4d49f8 linux-5.10.y)
> > [chengen - apply the locking logic to the corresponding functions /
> > use spin_lock_irq because the same lock is used in the timer callback
> > function.]
> > Signed-off-by: Chengen Du <chengen.du@canonical.com>
> > ---
> >   fs/io_uring.c | 4 ++++
> >   1 file changed, 4 insertions(+)
> >
> > diff --git a/fs/io_uring.c b/fs/io_uring.c
> > index 875dd8e0f766..aa75c32f2d16 100644
> > --- a/fs/io_uring.c
> > +++ b/fs/io_uring.c
> > @@ -512,6 +512,8 @@ static void __io_commit_cqring(struct io_ring_ctx *ctx)
> >   {
> >       struct io_rings *rings = ctx->rings;
> >
> > +     lockdep_assert_held(&ctx->completion_lock);
> > +
> >       if (ctx->cached_cq_tail != READ_ONCE(rings->cq.tail)) {
> >               /* order cqe stores with ring update */
> >               smp_store_release(&rings->cq.tail, ctx->cached_cq_tail);
> > @@ -841,7 +843,9 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
> >               }
> >       }
> >
> > +     spin_lock_irq(&ctx->completion_lock);
> >       io_commit_cqring(ctx);
> > +     spin_unlock_irq(&ctx->completion_lock);
> >       io_free_req_many(ctx, reqs, &to_free);
> >   }
>
> For focal, io_commit_cqring was implemented between L577 [1] and
> L595 [2]. I think the part from L581 to L582 has been changed to
> io_flush_timeouts() per commit 360428f8c0cd ("io_uring: move timeouts
> flushing to a helper"). And io_queue_deferred() was based on the code
> between L586 and L594 per these commits:
>
> 045189452210 io_uring: separate DRAIN flushing into a cold path
> 87987898a1db io_uring: remove REQ_F_IO_DRAINED
> 1b4a51b6d03d io_uring: drain next sqe instead of shadowing
>
> Personally, I'd prefer a change similar to SUSE's fix instead of
> locking the whole io_commit_cqring function.
> [1] https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/focal/tree/fs/io_uring.c?h=master-next#n577
> [2] https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/focal/tree/fs/io_uring.c?h=master-next#n595
>
> Thanks,
> Guoqing
Guoqing Jiang Nov. 22, 2024, 9:22 a.m. UTC | #5
On 11/21/24 15:21, Chengen Du wrote:
> Hi Guoqing,
>
> Thank you for your response.
>
> Before submitting this patch, I analyzed the level of effort required
> to make our patch align more closely with SUSE’s implementation.
> I understand that narrowing the lock range would be beneficial, but
> there are several considerations to take into account.
>
> Regarding commit 810e401b34c4 (io_uring: ensure IOPOLL locks around
> deferred work), it uses the io_commit_needs_flush function to
> determine whether to invoke __io_commit_cqring_flush.
> In the Focal version, we lack the attributes used in the
> io_commit_needs_flush logic.
> Additionally, the __io_commit_cqring function in Focal includes tasks
> beyond simply updating rings->cq.tail.
> This raises uncertainty about the tangible benefits of separating the
> logic in our context.
>
> Moreover, the commits you've referenced cannot be backported unchanged.
> We would need to investigate them in detail to prevent regressions.
> The execution order within the io_commit_cqring function differs as
> well—__io_commit_cqring is invoked in the middle of the flow instead
> of at the head or tail.
> To narrow the lock range as SUSE has done, we would need to backport
> other commits unrelated to the CVE to adjust the flow accordingly.
>
> Another factor to consider is that the liburing package, which
> provides user-space helpers for kernel io_uring functionality, is not
> available in Focal.
> The CVE in question has also existed for some time in Focal.
> Therefore, we need to weigh the necessity of backporting multiple
> unrelated commits to address a CVE that may not significantly impact
> many users, especially given the risk of introducing regressions to
> io_uring functionality.

Don't get me wrong, I didn't ask to backport those commits I listed
in my last mail; what I preferred is to follow, as closely as
possible, what commit 810e401b34 did, which only locks the
flush-relevant function.

> These are my initial thoughts, though I recognize there may be aspects
> I have overlooked.
> If you believe it is worthwhile to proceed with separating the logic,
> I am happy to focus on that approach.

I personally think it is better to separate the logic, but it is your 
call 🙂.

And I am not sure adding "lockdep_assert_held(&ctx->completion_lock)"
in __io_commit_cqring is appropriate; it looks like this path doesn't
hold completion_lock, if I am not misreading.

io_uring_release -> io_ring_ctx_wait_and_kill -> io_iopoll_reap_events
    -> io_iopoll_getevents -> io_do_iopoll -> io_iopoll_complete
    -> io_commit_cqring -> __io_commit_cqring

Thanks,
Guoqing
Chengen Du Nov. 25, 2024, 3:57 a.m. UTC | #6
On Fri, Nov 22, 2024 at 5:23 PM Guoqing Jiang
<guoqing.jiang@canonical.com> wrote:
>
>
>
> On 11/21/24 15:21, Chengen Du wrote:
> > Hi Guoqing,
> >
> > Thank you for your response.
> >
> > Before submitting this patch, I analyzed the level of effort required
> > to make our patch align more closely with SUSE’s implementation.
> > I understand that narrowing the lock range would be beneficial, but
> > there are several considerations to take into account.
> >
> > Regarding commit 810e401b34c4 (io_uring: ensure IOPOLL locks around
> > deferred work), it uses the io_commit_needs_flush function to
> > determine whether to invoke __io_commit_cqring_flush.
> > In the Focal version, we lack the attributes used in the
> > io_commit_needs_flush logic.
> > Additionally, the __io_commit_cqring function in Focal includes tasks
> > beyond simply updating rings->cq.tail.
> > This raises uncertainty about the tangible benefits of separating the
> > logic in our context.
> >
> > Moreover, the commits you've referenced cannot be backported unchanged.
> > We would need to investigate them in detail to prevent regressions.
> > The execution order within the io_commit_cqring function differs as
> > well—__io_commit_cqring is invoked in the middle of the flow instead
> > of at the head or tail.
> > To narrow the lock range as SUSE has done, we would need to backport
> > other commits unrelated to the CVE to adjust the flow accordingly.
> >
> > Another factor to consider is that the liburing package, which
> > provides user-space helpers for kernel io_uring functionality, is not
> > available in Focal.
> > The CVE in question has also existed for some time in Focal.
> > Therefore, we need to weigh the necessity of backporting multiple
> > unrelated commits to address a CVE that may not significantly impact
> > many users, especially given the risk of introducing regressions to
> > io_uring functionality.
>
> Don't get me wrong, I didn't ask to backport those commits I listed
> in my last mail; what I preferred is to follow, as closely as
> possible, what commit 810e401b34 did, which only locks the
> flush-relevant function.
>
> > These are my initial thoughts, though I recognize there may be aspects
> > I have overlooked.
> > If you believe it is worthwhile to proceed with separating the logic,
> > I am happy to focus on that approach.
>
> I personally think it is better to separate the logic, but it is your
> call 🙂.
Thank you for bringing this to my attention—I truly appreciate it.

Upon revisiting the code in detail, I discovered a bug related to
calling kill_fasync under completion_lock.
While this is a separate existing issue in Focal, it would be
fantastic if we could address it together.

To resolve this, we need to backport at least the following two commits first:
0791015837f1 - io_uring: remove extra check in __io_commit_cqring
4aa84f2ffa81 - io_uring: don't kill fasync under completion_lock

Once these are backported, we can address the CVE in a manner similar
to the patch applied in linux-5.10.y.
I will prepare version 2 of the patch for this CVE.
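
The fasync problem has roughly this shape (a simplified sketch of what
4aa84f2ffa81 does upstream, not the literal Focal diff): kill_fasync()
takes fa_lock internally, so calling it from __io_commit_cqring() while
completion_lock is held nests the two locks; moving the wakeup and
fasync signalling into the ev_posted path, which callers run after
dropping completion_lock, breaks that nesting.

/* after the change, __io_commit_cqring() only publishes the tail */
static void __io_commit_cqring(struct io_ring_ctx *ctx)
{
	/* order cqe stores with ring update */
	smp_store_release(&ctx->rings->cq.tail, ctx->cached_cq_tail);
}

/* ...and the cq_wait wakeup plus kill_fasync() move here, invoked
 * by callers after completion_lock has been dropped */
static void io_cqring_ev_posted(struct io_ring_ctx *ctx)
{
	if (waitqueue_active(&ctx->wait))
		wake_up(&ctx->wait);
	if (waitqueue_active(&ctx->cq_wait)) {
		wake_up_interruptible(&ctx->cq_wait);
		kill_fasync(&ctx->cq_fasync, SIGIO, POLL_IN);
	}
	if (ctx->cq_ev_fd)
		eventfd_signal(ctx->cq_ev_fd, 1);
}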
>
> And I am not sure adding "lockdep_assert_held(&ctx->completion_lock)"
> in __io_commit_cqring is appropriate; it looks like this path doesn't
> hold completion_lock, if I am not misreading.
>
> io_uring_release -> io_ring_ctx_wait_and_kill -> io_iopoll_reap_events
>     -> io_iopoll_getevents -> io_do_iopoll -> io_iopoll_complete
>     -> io_commit_cqring -> __io_commit_cqring
This is precisely the issue we aim to resolve with the CVE fix.
>
> Thanks,
> Guoqing

Best regards,
Chengen Du

Patch

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 875dd8e0f766..aa75c32f2d16 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -512,6 +512,8 @@  static void __io_commit_cqring(struct io_ring_ctx *ctx)
 {
 	struct io_rings *rings = ctx->rings;
 
+	lockdep_assert_held(&ctx->completion_lock);
+
 	if (ctx->cached_cq_tail != READ_ONCE(rings->cq.tail)) {
 		/* order cqe stores with ring update */
 		smp_store_release(&rings->cq.tail, ctx->cached_cq_tail);
@@ -841,7 +843,9 @@  static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
 		}
 	}
 
+	spin_lock_irq(&ctx->completion_lock);
 	io_commit_cqring(ctx);
+	spin_unlock_irq(&ctx->completion_lock);
 	io_free_req_many(ctx, reqs, &to_free);
 }