diff mbox series

xive/p9: Remove assert from xive_eq_for_target()

Message ID 20201127073212.219148-1-clg@kaod.org
State Accepted
Headers show
Series xive/p9: Remove assert from xive_eq_for_target() | expand

Commit Message

Cédric Le Goater Nov. 27, 2020, 7:32 a.m. UTC
XIVE VPs are structures describing the vCPUs of guests. When starting
a guest, these are allocated and enabled and some checks are done on
the location of the associated ENDs, which describe the event
queues. If the block of the VP and the block of the ENDs do not match,
the XIVE driver asserts.

Unfortunately, there is no way to check that a VP identifier is part
of a VP block that was previously allocated and it is relatively easy
to crash the host with a bogus VP id. That can be done with a QEMU
hack on a machine using vsmt.

Simply remove the assert, the OS should gracefully handle the error.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/xive.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Greg Kurz Nov. 27, 2020, 8:08 a.m. UTC | #1
On Fri, 27 Nov 2020 08:32:12 +0100
Cédric Le Goater <clg@kaod.org> wrote:

> XIVE VPs are structures describing the vCPUs of guests. When starting
> a guest, these are allocated and enabled and some checks are done on
> the location of the associated ENDs, which describe the event
> queues. If the block of the VP and the block of the ENDs do not match,
> the XIVE driver asserts.
> 
> Unfortunately, there is no way to check that a VP identifier is part
> of a VP block that was previously allocated and it is relatively easy
> to crash the host with a bogus VP id. That can be done with a QEMU
> hack on a machine using vsmt.
> 

Reported-by: Greg Kurz <groug@kaod.org>

:)

> Simply remove the assert, the OS should gracefully handle the error.
> 

This seems to be reasonable when xive_eq_for_target() is called from
an OPAL call since they'd all return OPAL_PARAMETER to the OS in this
case.

Some other paths maybe need more care though, eg:

xive_ipi_init()
    __xive_set_irq_config()
        xive_set_irq_targetting()
            xive_eq_for_target()

If xive_eq_for_target() fails to map the target to a valid EQ,
this ends up being ignored in xive_ipi_init() with this patch.
Is it okay ?

> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/xive.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/xive.c b/hw/xive.c
> index 7d4e029f19cb..c442ea5e30ed 100644
> --- a/hw/xive.c
> +++ b/hw/xive.c
> @@ -2152,7 +2152,7 @@ static inline bool xive_eq_for_target(uint32_t target, uint8_t prio,
>  	if (eq_blk != vp_blk) {
>  		xive_err(x, "eq_blk != vp_blk (%d vs. %d) for target 0x%08x/%d\n",
>  			 eq_blk, vp_blk, target, prio);
> -		assert(false);
> +		return false;
>  	}
>  
>  	if (out_eq_blk)
Cédric Le Goater Nov. 27, 2020, 8:32 a.m. UTC | #2
On 11/27/20 9:08 AM, Greg Kurz wrote:
> On Fri, 27 Nov 2020 08:32:12 +0100
> Cédric Le Goater <clg@kaod.org> wrote:
> 
>> XIVE VPs are structures describing the vCPUs of guests. When starting
>> a guest, these are allocated and enabled and some checks are done on
>> the location of the associated ENDs, which describe the event
>> queues. If the block of the VP and the block of the ENDs do not match,
>> the XIVE driver asserts.
>>
>> Unfortunately, there is no way to check that a VP identifier is part
>> of a VP block that was previously allocated and it is relatively easy
>> to crash the host with a bogus VP id. That can be done with a QEMU
>> hack on a machine using vsmt.
>>
> 
> Reported-by: Greg Kurz <groug@kaod.org>
> 
> :)
> 
>> Simply remove the assert, the OS should gracefully handle the error.
>>
> 
> This seems to be reasonable when xive_eq_for_target() is called from
> an OPAL call since they'd all return OPAL_PARAMETER to the OS in this
> case.

Linux behaves correctly. There are errors but the rollback is correct.

 
> Some other paths maybe need more care though, eg:
> 
> xive_ipi_init()
>     __xive_set_irq_config()
>         xive_set_irq_targetting()
>             xive_eq_for_target()
> 
> If xive_eq_for_target() fails to map the target to a valid EQ,
> this ends up being ignored in xive_ipi_init() with this patch.
> Is it okay ?

This is the P8 emulation layer fully handled by OPAL. It the VP
is bogus, then it means that OPAL just screwed up. 

And this seems to be the case because it's using cpu->pir and not 
a vp identifier ... If this is working, it's pure luck. Another 
good reason to drop support for this emulation which is a source 
of complexity. It's dropped for P10. 

See :

  patchwork.ozlabs.org/patch/1157193  xive/p9: remove XICS emulation
  patchwork.ozlabs.org/patch/1157182  xive/p9: remove percpu logging
  patchwork.ozlabs.org/patch/1157172  xive/p9: remove XICS emulation OPAL calls

I would be very happy to push these again and get rid of 1100 
lines of useless bringup complexity. 

C.


> 
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/xive.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/xive.c b/hw/xive.c
>> index 7d4e029f19cb..c442ea5e30ed 100644
>> --- a/hw/xive.c
>> +++ b/hw/xive.c
>> @@ -2152,7 +2152,7 @@ static inline bool xive_eq_for_target(uint32_t target, uint8_t prio,
>>  	if (eq_blk != vp_blk) {
>>  		xive_err(x, "eq_blk != vp_blk (%d vs. %d) for target 0x%08x/%d\n",
>>  			 eq_blk, vp_blk, target, prio);
>> -		assert(false);
>> +		return false;
>>  	}
>>  
>>  	if (out_eq_blk)
>
Greg Kurz Nov. 27, 2020, 9:23 a.m. UTC | #3
On Fri, 27 Nov 2020 09:32:43 +0100
Cédric Le Goater <clg@kaod.org> wrote:

> On 11/27/20 9:08 AM, Greg Kurz wrote:
> > On Fri, 27 Nov 2020 08:32:12 +0100
> > Cédric Le Goater <clg@kaod.org> wrote:
> > 
> >> XIVE VPs are structures describing the vCPUs of guests. When starting
> >> a guest, these are allocated and enabled and some checks are done on
> >> the location of the associated ENDs, which describe the event
> >> queues. If the block of the VP and the block of the ENDs do not match,
> >> the XIVE driver asserts.
> >>
> >> Unfortunately, there is no way to check that a VP identifier is part
> >> of a VP block that was previously allocated and it is relatively easy
> >> to crash the host with a bogus VP id. That can be done with a QEMU
> >> hack on a machine using vsmt.
> >>
> > 
> > Reported-by: Greg Kurz <groug@kaod.org>
> > 
> > :)
> > 
> >> Simply remove the assert, the OS should gracefully handle the error.
> >>
> > 
> > This seems to be reasonable when xive_eq_for_target() is called from
> > an OPAL call since they'd all return OPAL_PARAMETER to the OS in this
> > case.
> 
> Linux behaves correctly. There are errors but the rollback is correct.
> 

Cool.

>  
> > Some other paths maybe need more care though, eg:
> > 
> > xive_ipi_init()
> >     __xive_set_irq_config()
> >         xive_set_irq_targetting()
> >             xive_eq_for_target()
> > 
> > If xive_eq_for_target() fails to map the target to a valid EQ,
> > this ends up being ignored in xive_ipi_init() with this patch.
> > Is it okay ?
> 
> This is the P8 emulation layer fully handled by OPAL. It the VP
> is bogus, then it means that OPAL just screwed up. 
> 
> And this seems to be the case because it's using cpu->pir and not 
> a vp identifier ... If this is working, it's pure luck. Another 
> good reason to drop support for this emulation which is a source 
> of complexity. It's dropped for P10. 
> 
> See :
> 
>   patchwork.ozlabs.org/patch/1157193  xive/p9: remove XICS emulation
>   patchwork.ozlabs.org/patch/1157182  xive/p9: remove percpu logging
>   patchwork.ozlabs.org/patch/1157172  xive/p9: remove XICS emulation OPAL calls
> 
> I would be very happy to push these again and get rid of 1100 
> lines of useless bringup complexity. 
> 

Hmm... some existing setups might still be using power8 compat mode
for legitimate reasons, eg. workload only certified on POWER8. I'm
not sure you can get rid of the XICS emulation that easily.

> C.
> 
> 
> > 
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>  hw/xive.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/hw/xive.c b/hw/xive.c
> >> index 7d4e029f19cb..c442ea5e30ed 100644
> >> --- a/hw/xive.c
> >> +++ b/hw/xive.c
> >> @@ -2152,7 +2152,7 @@ static inline bool xive_eq_for_target(uint32_t target, uint8_t prio,
> >>  	if (eq_blk != vp_blk) {
> >>  		xive_err(x, "eq_blk != vp_blk (%d vs. %d) for target 0x%08x/%d\n",
> >>  			 eq_blk, vp_blk, target, prio);
> >> -		assert(false);
> >> +		return false;
> >>  	}
> >>  
> >>  	if (out_eq_blk)
> > 
>
Cédric Le Goater Nov. 27, 2020, 9:44 a.m. UTC | #4
On 11/27/20 10:23 AM, Greg Kurz wrote:
> On Fri, 27 Nov 2020 09:32:43 +0100
> Cédric Le Goater <clg@kaod.org> wrote:
> 
>> On 11/27/20 9:08 AM, Greg Kurz wrote:
>>> On Fri, 27 Nov 2020 08:32:12 +0100
>>> Cédric Le Goater <clg@kaod.org> wrote:
>>>
>>>> XIVE VPs are structures describing the vCPUs of guests. When starting
>>>> a guest, these are allocated and enabled and some checks are done on
>>>> the location of the associated ENDs, which describe the event
>>>> queues. If the block of the VP and the block of the ENDs do not match,
>>>> the XIVE driver asserts.
>>>>
>>>> Unfortunately, there is no way to check that a VP identifier is part
>>>> of a VP block that was previously allocated and it is relatively easy
>>>> to crash the host with a bogus VP id. That can be done with a QEMU
>>>> hack on a machine using vsmt.
>>>>
>>>
>>> Reported-by: Greg Kurz <groug@kaod.org>
>>>
>>> :)
>>>
>>>> Simply remove the assert, the OS should gracefully handle the error.
>>>>
>>>
>>> This seems to be reasonable when xive_eq_for_target() is called from
>>> an OPAL call since they'd all return OPAL_PARAMETER to the OS in this
>>> case.
>>
>> Linux behaves correctly. There are errors but the rollback is correct.
>>
> 
> Cool.
> 
>>  
>>> Some other paths maybe need more care though, eg:
>>>
>>> xive_ipi_init()
>>>     __xive_set_irq_config()
>>>         xive_set_irq_targetting()
>>>             xive_eq_for_target()
>>>
>>> If xive_eq_for_target() fails to map the target to a valid EQ,
>>> this ends up being ignored in xive_ipi_init() with this patch.
>>> Is it okay ?
>>
>> This is the P8 emulation layer fully handled by OPAL. It the VP
>> is bogus, then it means that OPAL just screwed up. 
>>
>> And this seems to be the case because it's using cpu->pir and not 
>> a vp identifier ... If this is working, it's pure luck. Another 
>> good reason to drop support for this emulation which is a source 
>> of complexity. It's dropped for P10. 
>>
>> See :
>>
>>   patchwork.ozlabs.org/patch/1157193  xive/p9: remove XICS emulation
>>   patchwork.ozlabs.org/patch/1157182  xive/p9: remove percpu logging
>>   patchwork.ozlabs.org/patch/1157172  xive/p9: remove XICS emulation OPAL calls
>>
>> I would be very happy to push these again and get rid of 1100 
>> lines of useless bringup complexity. 
>>
> 
> Hmm... some existing setups might still be using power8 compat mode
> for legitimate reasons, eg. workload only certified on POWER8. 

To be clear, I am talking of XICS emulation in OPAL/BML and not in KVM
or PowerVM. 

If I am correct, the certification of an application is tied to an OS 
which is tied to a set of HW. RH7 is certified to run on POWER8 but 
not on POWER9. So, if it might work to run a RH7 on a POWER9 but 
I doubt Redhat would guarantee any POWER8 workload there. 

C.
Greg Kurz Nov. 27, 2020, 11:36 a.m. UTC | #5
On Fri, 27 Nov 2020 10:44:01 +0100
Cédric Le Goater <clg@kaod.org> wrote:

[...]
> > 
> > Hmm... some existing setups might still be using power8 compat mode
> > for legitimate reasons, eg. workload only certified on POWER8. 
> 
> To be clear, I am talking of XICS emulation in OPAL/BML and not in KVM
> or PowerVM. 
> 

Ah, I got confused, sorry. Thanks for the clarification.

> If I am correct, the certification of an application is tied to an OS 
> which is tied to a set of HW. RH7 is certified to run on POWER8 but 
> not on POWER9. So, if it might work to run a RH7 on a POWER9 but 
> I doubt Redhat would guarantee any POWER8 workload there. 
> 

RH guarantees that a RHEL7 guest runs in POWER8 compat mode on a
POWER9 host AFAIK.

> C.
Cédric Le Goater Nov. 27, 2020, 11:57 a.m. UTC | #6
On 11/27/20 12:36 PM, Greg Kurz wrote:
> On Fri, 27 Nov 2020 10:44:01 +0100
> Cédric Le Goater <clg@kaod.org> wrote:
> 
> [...]
>>>
>>> Hmm... some existing setups might still be using power8 compat mode
>>> for legitimate reasons, eg. workload only certified on POWER8. 
>>
>> To be clear, I am talking of XICS emulation in OPAL/BML and not in KVM
>> or PowerVM. 
>>
> 
> Ah, I got confused, sorry. Thanks for the clarification.
> 
>> If I am correct, the certification of an application is tied to an OS 
>> which is tied to a set of HW. RH7 is certified to run on POWER8 but 
>> not on POWER9. So, if it might work to run a RH7 on a POWER9 but 
>> I doubt Redhat would guarantee any POWER8 workload there. 
>>
> 
> RH guarantees that a RHEL7 guest runs in POWER8 compat mode on a
> POWER9 host AFAIK.

Yes. That's make sense to me. It's the whole purpose of the compat
mode for KVM guests or LPARs if PowerVM.

On OPAL/BML, we are more in the experimental domain or HW bringup.

C.
Vasant Hegde Dec. 16, 2020, 8:03 a.m. UTC | #7
On 11/27/20 1:02 PM, Cédric Le Goater wrote:
> XIVE VPs are structures describing the vCPUs of guests. When starting
> a guest, these are allocated and enabled and some checks are done on
> the location of the associated ENDs, which describe the event
> queues. If the block of the VP and the block of the ENDs do not match,
> the XIVE driver asserts.
> 
> Unfortunately, there is no way to check that a VP identifier is part
> of a VP block that was previously allocated and it is relatively easy
> to crash the host with a bogus VP id. That can be done with a QEMU
> hack on a machine using vsmt.
> 
> Simply remove the assert, the OS should gracefully handle the error.

Thanks! Merged to master as of f07ea956.

Cedric,

I see that you have marked this to stable tree. Do we need this in all supported 
stable branches or only in latest stable i.e. v6.7.x?

-Vasant
Cédric Le Goater Dec. 16, 2020, 8:33 a.m. UTC | #8
On 12/16/20 9:03 AM, Vasant Hegde wrote:
> On 11/27/20 1:02 PM, Cédric Le Goater wrote:
>> XIVE VPs are structures describing the vCPUs of guests. When starting
>> a guest, these are allocated and enabled and some checks are done on
>> the location of the associated ENDs, which describe the event
>> queues. If the block of the VP and the block of the ENDs do not match,
>> the XIVE driver asserts.
>>
>> Unfortunately, there is no way to check that a VP identifier is part
>> of a VP block that was previously allocated and it is relatively easy
>> to crash the host with a bogus VP id. That can be done with a QEMU
>> hack on a machine using vsmt.
>>
>> Simply remove the assert, the OS should gracefully handle the error.
> 
> Thanks! Merged to master as of f07ea956.
> 
> Cedric,
> 
> I see that you have marked this to stable tree. Do we need this in 
> all supported stable branches or only in latest stable i.e. v6.7.x?

Since KVM XIVE is supported, so that's v6.3+

Thanks,

C.
diff mbox series

Patch

diff --git a/hw/xive.c b/hw/xive.c
index 7d4e029f19cb..c442ea5e30ed 100644
--- a/hw/xive.c
+++ b/hw/xive.c
@@ -2152,7 +2152,7 @@  static inline bool xive_eq_for_target(uint32_t target, uint8_t prio,
 	if (eq_blk != vp_blk) {
 		xive_err(x, "eq_blk != vp_blk (%d vs. %d) for target 0x%08x/%d\n",
 			 eq_blk, vp_blk, target, prio);
-		assert(false);
+		return false;
 	}
 
 	if (out_eq_blk)