virtio: use shadow_avail_idx while checking number of heads

Message ID 20230825170448.1953409-1-i.maximets@ovn.org
State New
Series: virtio: use shadow_avail_idx while checking number of heads

Commit Message

Ilya Maximets Aug. 25, 2023, 5:04 p.m. UTC
We do not need the most up-to-date number of heads; we only want to
know if there is at least one.

Use the shadow variable as long as it is not equal to the last available
index checked.  This avoids an expensive qatomic dereference of the
RCU-protected memory region cache, as well as the memory access itself
and the subsequent memory barrier.

The change improves performance of the af-xdp network backend by 2-3%.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---
 hw/virtio/virtio.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
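
For context, the dereference being avoided lives in vring_avail_idx(), which
refreshes the shadow index from guest memory on every call.  A rough sketch of
the relevant helpers, paraphrased from hw/virtio/virtio.c (details may differ
between QEMU versions):

    /* Paraphrased sketch -- not the verbatim QEMU code. */
    static VRingMemoryRegionCaches *vring_get_region_caches(struct VirtQueue *vq)
    {
        /* The qatomic RCU dereference mentioned in the commit message. */
        return qatomic_rcu_read(&vq->vring.caches);
    }

    /* Called within rcu_read_lock().  */
    static inline uint16_t vring_avail_idx(VirtQueue *vq)
    {
        VRingMemoryRegionCaches *caches = vring_get_region_caches(vq);
        hwaddr pa = offsetof(VRingAvail, idx);

        if (!caches) {
            return 0;
        }

        /* Guest memory access that the patch skips on the fast path. */
        vq->shadow_avail_idx = virtio_lduw_phys_cached(vq->vdev,
                                                       &caches->avail, pa);
        return vq->shadow_avail_idx;
    }

The patch below short-circuits all of that whenever the shadow index already
shows that the ring is non-empty.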

Comments

Ilya Maximets Sept. 25, 2023, 11:20 a.m. UTC | #1
On 8/25/23 19:04, Ilya Maximets wrote:
> [...]

Kind reminder.

Best regards, Ilya Maximets.

Stefan Hajnoczi Sept. 25, 2023, 2:23 p.m. UTC | #2
On Fri, 25 Aug 2023 at 13:04, Ilya Maximets <i.maximets@ovn.org> wrote:
>
> [...]
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 309038fd46..04bf7cc977 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -999,7 +999,15 @@ void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
>  /* Called within rcu_read_lock().  */
>  static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
>  {
> -    uint16_t num_heads = vring_avail_idx(vq) - idx;
> +    uint16_t num_heads;
> +
> +    if (vq->shadow_avail_idx != idx) {
> +        num_heads = vq->shadow_avail_idx - idx;
> +
> +        return num_heads;

This still needs to check num_heads > vq->vring.num and return -EINVAL
as is done below.

> +    }
> +
> +    num_heads = vring_avail_idx(vq) - idx;
>
>      /* Check it isn't doing very strange things with descriptor numbers. */
>      if (num_heads > vq->vring.num) {
Ilya Maximets Sept. 25, 2023, 3:02 p.m. UTC | #3
On 9/25/23 16:23, Stefan Hajnoczi wrote:
> On Fri, 25 Aug 2023 at 13:04, Ilya Maximets <i.maximets@ovn.org> wrote:
>>
>> [...]
>> @@ -999,7 +999,15 @@ void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
>>  /* Called within rcu_read_lock().  */
>>  static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
>>  {
>> -    uint16_t num_heads = vring_avail_idx(vq) - idx;
>> +    uint16_t num_heads;
>> +
>> +    if (vq->shadow_avail_idx != idx) {
>> +        num_heads = vq->shadow_avail_idx - idx;
>> +
>> +        return num_heads;
> 
> This still needs to check num_heads > vq->vring.num and return -EINVAL
> as is done below.

Hmm, yes, you're right.  If the value was incorrect initially, the shadow
will be incorrect.  However, I think we should just not return here in this
case and let vring_avail_idx() grab an actual new value below.  Otherwise
we may never break out of this error.

Does that make sense?

Stefan Hajnoczi Sept. 25, 2023, 3:12 p.m. UTC | #4
On Mon, 25 Sept 2023 at 11:02, Ilya Maximets <i.maximets@ovn.org> wrote:
>
> [...]
> >
> > This still needs to check num_heads > vq->vring.num and return -EINVAL
> > as is done below.
>
> Hmm, yes, you're right.  If the value was incorrect initially, the shadow
> will be incorrect.  However, I think we should just not return here in this
> case and let vring_avail_idx() grab an actual new value below.  Otherwise
> we may never break out of this error.
>
> Does that make sense?

No, because virtio_error() marks the device as broken. The device
requires a reset in order to function again. Fetching
vring_avail_idx() again won't help.
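
For reference, a simplified sketch of what virtio_error() does, paraphrased
from hw/virtio/virtio.c (details may differ between QEMU versions):

    /* Paraphrased sketch -- not the verbatim QEMU code. */
    void virtio_error(VirtIODevice *vdev, const char *fmt, ...)
    {
        va_list ap;

        va_start(ap, fmt);
        error_vreport(fmt, ap);
        va_end(ap);

        if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
            vdev->status = vdev->status | VIRTIO_CONFIG_S_NEEDS_RESET;
            virtio_notify_config(vdev);
        }

        /* The device stays unusable until the guest resets it. */
        vdev->broken = true;
    }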

Ilya Maximets Sept. 25, 2023, 3:36 p.m. UTC | #5
On 9/25/23 17:12, Stefan Hajnoczi wrote:
>> [...]
>>
>> Does that make sense?
> 
> No, because virtio_error() marks the device as broken. The device
> requires a reset in order to function again. Fetching
> vring_avail_idx() again won't help.

OK, I see.  In this case we're talking about a situation where
vring_avail_idx() was called in some other place and stored a bad value
in the shadow variable, then virtqueue_num_heads() got called.  Right?

AFAIU, we can still just fall through here and let vring_avail_idx()
read the index again and fail the existing check.  That would happen
today without this patch applied.

I'm just trying to avoid duplication of the virtio_error call, i.e.:

    if (vq->shadow_avail_idx != idx) {
        num_heads = vq->shadow_avail_idx - idx;

        /* Check it isn't doing very strange things with descriptor numbers. */
        if (num_heads > vq->vring.num) {
            virtio_error(vq->vdev, "Guest moved used index from %u to %u",
                         idx, vq->shadow_avail_idx);
            return -EINVAL;
        }
        return num_heads;
    }

vs

    if (vq->shadow_avail_idx != idx) {
        num_heads = vq->shadow_avail_idx - idx;

        /* Only use the shadow value if it was good initially. */
        if (num_heads <= vq->vring.num) {
            return num_heads;
        }
    }


What do you think?

Best regards, Ilya Maximets.
Stefan Hajnoczi Sept. 25, 2023, 3:38 p.m. UTC | #6
On Mon, 25 Sept 2023 at 11:36, Ilya Maximets <i.maximets@ovn.org> wrote:
>
> [...]
>
> OK, I see.  In this case we're talking about a situation where
> vring_avail_idx() was called in some other place and stored a bad value
> in the shadow variable, then virtqueue_num_heads() got called.  Right?
>
> AFAIU, we can still just fall through here and let vring_avail_idx()
> read the index again and fail the existing check.  That would happen
> today without this patch applied.

Yes, that is fine.

>
> I'm just trying to avoid duplication of the virtio_error call, i.e.:
>
> [...]
>
> What do you think?

Sounds good.

Ilya Maximets Sept. 25, 2023, 8:58 p.m. UTC | #7
On 9/25/23 17:38, Stefan Hajnoczi wrote:
> [...]

Hmm, I suppose we also need a read barrier after all, even if we use
the shadow index.  If the index is correct but the shadow variable
was updated by a call outside of this function, we may miss a barrier
and read the descriptor out of order, in theory.  The read barrier is
going to be a compiler barrier on x86, so the performance gain from
this patch should still be mostly there.  I'll test that.

Michael S. Tsirkin Sept. 25, 2023, 9:24 p.m. UTC | #8
On Mon, Sep 25, 2023 at 10:58:05PM +0200, Ilya Maximets wrote:
> [...]
> 
> Hmm, I suppose we also need a read barrier after all, even if we use
> the shadow index.  If the index is correct but the shadow variable
> was updated by a call outside of this function, we may miss a barrier
> and read the descriptor out of order, in theory.  The read barrier is
> going to be a compiler barrier on x86, so the performance gain from
> this patch should still be mostly there.  I'll test that.

I can't say I understand, generally.  The shadow is under QEMU control;
I don't think it can be updated concurrently by multiple CPUs.


Ilya Maximets Sept. 25, 2023, 10:13 p.m. UTC | #9
On 9/25/23 23:24, Michael S. Tsirkin wrote:
>> [...]
>>
>> Hmm, I suppose we also need a read barrier after all, even if we use
>> the shadow index.  If the index is correct but the shadow variable
>> was updated by a call outside of this function, we may miss a barrier
>> and read the descriptor out of order, in theory.  The read barrier is
>> going to be a compiler barrier on x86, so the performance gain from
>> this patch should still be mostly there.  I'll test that.
> 
> I can't say I understand, generally.  The shadow is under QEMU control;
> I don't think it can be updated concurrently by multiple CPUs.

It can't, I agree.  The scenario I'm thinking about is the following:

1. vring_avail_idx() is called from one of the places other than
   virtqueue_num_heads().  The shadow is updated with the current value.
   Some users of vring_avail_idx() do not use barriers after the call.

2. virtqueue_split_get_avail_bytes() is called.

3. virtqueue_split_get_avail_bytes() calls virtqueue_num_heads().

4. virtqueue_num_heads() checks the shadow and returns early.

5. virtqueue_split_get_avail_bytes() calls vring_split_desc_read() and
   reads the descriptor.

If between steps 1 and 5 we do not have a read barrier, we potentially
risk reading descriptor data that is not yet fully written, because
there is no guarantee that reading the last_avail_idx in step 1 wasn't
reordered with the descriptor read.

In the current code we always have smp_rmb() in virtqueue_num_heads().
But if we return from this function without a barrier, we may have an
issue, IIUC.

I agree that it's a very unlikely scenario, and we will probably have
a control dependency between steps 1 and 5 that will prevent the issue,
but it might be safer to just have an explicit barrier in
virtqueue_num_heads().

Does that make sense?  Or am I missing something else here?
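
A sketch of that placement, combining the num_heads check from earlier in
the thread with an explicit smp_rmb() (the exact shape in v2 may differ):

    if (vq->shadow_avail_idx != idx) {
        num_heads = vq->shadow_avail_idx - idx;

        /* Only use the shadow value if it was good initially. */
        if (num_heads <= vq->vring.num) {
            /*
             * The shadow may have been filled by a vring_avail_idx()
             * call site that issued no barrier; order that index read
             * against the descriptor reads that follow a successful
             * return, just like the smp_rmb() on the slow path.
             */
            smp_rmb();
            return num_heads;
        }
    }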

Michael S. Tsirkin Sept. 25, 2023, 10:24 p.m. UTC | #10
On Tue, Sep 26, 2023 at 12:13:11AM +0200, Ilya Maximets wrote:
> [...]
> 
> Does that make sense?  Or am I missing something else here?

Aha, got it. Good point, yes. Please document this in a code comment.


Ilya Maximets Sept. 27, 2023, 2:01 p.m. UTC | #11
On 9/26/23 00:24, Michael S. Tsirkin wrote:
>> [...]
>>
>> Does that make sense?  Or am I missing something else here?
> 
> Aha, got it. Good point, yes. Please document this in a code comment.

Done.  Posted v2:
  https://lore.kernel.org/qemu-devel/20230927135157.2316982-1-i.maximets@ovn.org/
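
For reference, the pieces agreed in this thread (the shadow check, the
fall-through on a bad shadow value, and the explicit barrier) combine roughly
as follows; this is a sketch reconstructed from the discussion above, not
necessarily the exact v2 code:

    /* Called within rcu_read_lock().  */
    static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
    {
        uint16_t num_heads;

        if (vq->shadow_avail_idx != idx) {
            num_heads = vq->shadow_avail_idx - idx;

            /* Only use the shadow value if it was good initially. */
            if (num_heads <= vq->vring.num) {
                /* Order the index read against later descriptor reads. */
                smp_rmb();
                return num_heads;
            }
            /* Otherwise fall through and let the fresh read below hit
             * the existing error path. */
        }

        num_heads = vring_avail_idx(vq) - idx;

        /* Check it isn't doing very strange things with descriptor numbers. */
        if (num_heads > vq->vring.num) {
            virtio_error(vq->vdev, "Guest moved used index from %u to %u",
                         idx, vq->shadow_avail_idx);
            return -EINVAL;
        }

        /* On success, callers read a descriptor at vq->last_avail_idx.
         * Make sure the descriptor read does not bypass the index read. */
        if (num_heads) {
            smp_rmb();
        }

        return num_heads;
    }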


Patch

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 309038fd46..04bf7cc977 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -999,7 +999,15 @@ void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
 /* Called within rcu_read_lock().  */
 static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
 {
-    uint16_t num_heads = vring_avail_idx(vq) - idx;
+    uint16_t num_heads;
+
+    if (vq->shadow_avail_idx != idx) {
+        num_heads = vq->shadow_avail_idx - idx;
+
+        return num_heads;
+    }
+
+    num_heads = vring_avail_idx(vq) - idx;
 
     /* Check it isn't doing very strange things with descriptor numbers. */
     if (num_heads > vq->vring.num) {