[v2,03/19] spapr: introduce the XIVE interrupt sources

Message ID 20171209084338.29395-4-clg@kaod.org
State New
Series spapr: Guest exploitation of the XIVE interrupt controller (POWER9)

Commit Message

Cédric Le Goater Dec. 9, 2017, 8:43 a.m. UTC
Each XIVE interrupt source is associated with a two-bit state machine
called an Event State Buffer (ESB): the first bit "P" means that an
interrupt is "pending" and waiting for an EOI, and the second bit "Q"
(queued) means a new interrupt was triggered while another was still
pending.

When an event is triggered, the associated interrupt state bits are
fetched and modified, and the event notification is forwarded to the
virtualization engine of the controller, which does the routing. The
state bits can also be controlled by MMIO, for instance to trigger
events or turn off the sources. See code for more details on the
states and transitions.
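
For readers who want the transitions at a glance, here is a minimal
standalone sketch of the PQ state machine. It mirrors the trigger and
EOI transitions of spapr_xive_pq_trigger() and spapr_xive_pq_eoi() in
this patch, but operates on a single 2-bit value rather than on the SBE
table. The names esb_trigger() and esb_eoi() are hypothetical and exist
only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PQ encodings as defined in the patch: P is bit 1, Q is bit 0. */
#define ESB_RESET   0x0  /* P=0 Q=0: idle, next trigger is forwarded */
#define ESB_OFF     0x1  /* P=0 Q=1: source is turned off            */
#define ESB_PENDING 0x2  /* P=1 Q=0: forwarded, waiting for an EOI   */
#define ESB_QUEUED  0x3  /* P=1 Q=1: triggered again while pending   */

/* Hypothetical helper: apply a trigger to *pq.
 * Returns true when the event must be forwarded for routing. */
static bool esb_trigger(uint8_t *pq)
{
    assert(*pq <= ESB_QUEUED);

    switch (*pq) {
    case ESB_RESET:
        *pq = ESB_PENDING;
        return true;
    case ESB_PENDING:
        *pq = ESB_QUEUED;   /* coalesce: remember it, do not forward */
        return false;
    default:                /* QUEUED and OFF are left unchanged */
        return false;
    }
}

/* Hypothetical helper: apply an EOI to *pq.
 * Returns true when a queued event must now be forwarded. */
static bool esb_eoi(uint8_t *pq)
{
    assert(*pq <= ESB_QUEUED);

    switch (*pq) {
    case ESB_PENDING:
        *pq = ESB_RESET;
        return false;
    case ESB_QUEUED:
        *pq = ESB_PENDING;  /* replay the coalesced event */
        return true;
    default:                /* RESET and OFF are left unchanged */
        return false;
    }
}
```

The Q bit is what gives the coalescing behaviour: a second trigger
while pending is not forwarded, and the EOI replays it exactly once.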

The MMIO space for the ESBs is 512GB large on the bare-metal system
(PowerNV) and the BAR depends on the chip id. In our model for the
sPAPR machine, we choose to only map the sub-region for the
provisioned IRQ numbers and to use the mapping address of chip 0 of a
real system.
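
As a rough illustration of the resulting layout, the sketch below
computes the size of the mapped sub-region and the address of an ESB
operation for a given source, using the constants from this patch
(P9_MMIO_BASE, VC_BAR_DEFAULT, ESB_SHIFT). The helper names are
hypothetical, and the sketch assumes one 64k page per source, as the
sPAPR model does.

```c
#include <assert.h>
#include <stdint.h>

/* Constants taken from xive-internal.h in this patch. */
#define P9_MMIO_BASE   0x006000000000000ull
#define VC_BAR_DEFAULT 0x10000000000ull
#define ESB_SHIFT      16   /* one 64k page per source */

#define ESB_BASE       (P9_MMIO_BASE | VC_BAR_DEFAULT)

/* Size of the sub-region mapped for the provisioned IRQ numbers. */
static uint64_t esb_region_size(uint32_t nr_irqs)
{
    return (1ull << ESB_SHIFT) * nr_irqs;
}

/* Address of ESB operation 'op' (e.g. 0x800 for a GET) for 'lisn'. */
static uint64_t esb_mmio_addr(uint32_t lisn, uint32_t nr_irqs, uint32_t op)
{
    assert(lisn < nr_irqs);
    assert((op & ~0xF00u) == 0);

    return ESB_BASE + ((uint64_t)lisn << ESB_SHIFT) + op;
}

/* Inverse mapping, as done in the MMIO handlers: a region-relative
 * address is split into the source number and the ESB operation. */
static void esb_decode(uint64_t region_addr, uint32_t *lisn, uint32_t *op)
{
    *lisn = (uint32_t)(region_addr >> ESB_SHIFT);
    *op   = (uint32_t)(region_addr & 0xF00);
}
```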

In the real world, each source may have different characteristics
depending on the revision of a controller or the CPU. Early systems
had two different MMIO pages for trigger and for EOI. We choose to use
the same characteristics for all sources to simplify the model. The
minimum CPU level for XIVE exploitation mode will be DD2.X as it has
full support.

The OS will obtain the address of the MMIO page of the ESB entry
associated with a source, along with the source characteristics, using
the H_INT_GET_SOURCE_INFO hcall. This will be addressed in the patch
introducing the hcalls.

The spapr_xive_irq() routine in charge of triggering the CPU interrupt
line will be filled in later on.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---

 Changes since v1:

 - merged in the same patch the qemu_irq handlers
 - reworked the event notification logic of the qemu_irq handlers.  
 - introduced XIVE_ESB_STORE_EOI support
 - removed 'esb_shift' field 
 - removed a useless check on the validity of the IVE in the memory
   region handlers.
 - fixed spapr_xive_pq_trigger() to return true when XIVE_ESB_QUEUED
   is set
 - removed the overall ESB memory region. We now have only one region
   for the provisioned sources.
 - improved 'info pic' output

 hw/intc/spapr_xive.c        | 254 +++++++++++++++++++++++++++++++++++++++++++-
 hw/intc/xive-internal.h     |  10 ++
 include/hw/ppc/spapr_xive.h |   9 ++
 3 files changed, 271 insertions(+), 2 deletions(-)

Comments

Cédric Le Goater Dec. 14, 2017, 3:24 p.m. UTC | #1
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index 5b1f78e06a1e..ecc15d889b74 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -24,8 +24,17 @@ struct sPAPRXive {
>      /* Properties */
>      uint32_t     nr_irqs;
>  
> +    /* IRQ */
> +    qemu_irq     *qirqs;
> +
>      /* XIVE internal tables */
>      XiveIVE      *ivt;
> +    uint8_t      *sbe;
> +    uint32_t     sbe_size;
> +
> +    /* ESB memory region */
> +    hwaddr       esb_base;
> +    MemoryRegion esb_iomem;
>  };
>  
>  bool spapr_xive_irq_enable(sPAPRXive *xive, uint32_t lisn);
> 

The addition of the XIVE source fields directly under the sPAPRXive 
object is really a design choice. But I am starting to think that 
having multiple XIVE source objects would be a good idea.

Roughly speaking, a XIVE source is a bunch of PQ bits plus an MMIO
region to manipulate them and, in the QEMU model, a set of associated
qemu_irqs to do the same from the handlers.

In real HW, in the PSI host bridge controller on the P9 for instance,
a register holds all the P bits of the IRQs (there are no Q bits
because the IRQs are only LSIs) and there is a specific MMIO region
for them. PSIHB also has a register to store the assertion level of
each IRQ. So this is quite similar to what we are adding above and in
the next patch for the LSI support.

The source triggering only depends on the PQ bits (plus the LSI
level) and the result is a simple forward of the event notification
to the central XIVE engine, the IVRE, which does the routing. The IVRE
is really our sPAPRXive object.

The API between the source and the IVRE is extremely simple:

  static void spapr_xive_irq(sPAPRXive *xive, int lisn)

The IVRE then scans its IVT, finds the EQ, and moves on to the 
presenter.

So we can keep the IVRE engine (sPAPRXive) attached directly to
the machine as it is today, which is good, and introduce multiple
XIVE source objects. The sPAPR machine would have:

 - one for the IPIs [ 0 - nr_servers ]
 - one generic for the devices [ 4096 -  ]
 - one for each phb ? 

The source address in the overall ESB MMIO region would be calculated
from the offset of the source IRQ numbers in the IRQ number space.
The offset could very well be hardcoded for each device. I don't see
any XICS compatibility problems as we are already sharing the IRQ
number space correctly.
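
To make the idea concrete, here is a small sketch of what a "block of
sources" with an offset in the IRQ number space could look like. The
XiveSourceSketch type and its helpers are hypothetical names used only
for illustration. The ranges in the usage note are example values, and
one 64k ESB page per source is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ESB_SHIFT 16   /* one 64k ESB page per source, as in the patch */

/* Hypothetical source block: a contiguous range of IRQ numbers. */
typedef struct XiveSourceSketch {
    const char *name;
    uint32_t    offset;    /* first IRQ number in the global space */
    uint32_t    nr_irqs;   /* number of sources in this block */
} XiveSourceSketch;

/* Global IRQ number (LISN) of source 'srcno' within a block. */
static uint32_t source_lisn(const XiveSourceSketch *src, uint32_t srcno)
{
    assert(srcno < src->nr_irqs);
    return src->offset + srcno;
}

/* Offset of the source's ESB page in the overall ESB MMIO region. */
static uint64_t source_esb_offset(const XiveSourceSketch *src,
                                  uint32_t srcno)
{
    return (uint64_t)source_lisn(src, srcno) << ESB_SHIFT;
}

/* Find the block owning a global IRQ number, or NULL if none does. */
static const XiveSourceSketch *source_for_lisn(const XiveSourceSketch *srcs,
                                               size_t n, uint32_t lisn)
{
    for (size_t i = 0; i < n; i++) {
        if (lisn >= srcs[i].offset &&
            lisn - srcs[i].offset < srcs[i].nr_irqs) {
            return &srcs[i];
        }
    }
    return NULL;
}
```

With an "ipi" block at offset 0 and a "devices" block at offset 4096,
source 3 of the devices block gets LISN 4099, and its ESB page sits at
4099 << ESB_SHIFT in the overall region.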


I am starting this discussion because the support for XIVE in the 
QEMU PowerNV machine will need multiple sources, just like for 
POWER8. PnvXive will be a bit different because the IVRE tables 
(IVT and EQDT) are in the virtual machine memory. Most of the settings 
are done in the VM. The QEMU PowerNV machine will still have to 
implement the triggering and the routing logic using the guest tables. 


Regards,

C.
Benjamin Herrenschmidt Dec. 18, 2017, 12:59 a.m. UTC | #2
On Thu, 2017-12-14 at 16:24 +0100, Cédric Le Goater wrote:
> The API between the source and the IVRE is extremely simple :
> 
>   static void spapr_xive_irq(sPAPRXive *xive, int lisn)
> 
> The IVRE then scans its IVT, finds the EQ, and moves on to the 
> presenter.

In HW it's an MMIO store between the two units (from the source to the
IVRE notification port). I wonder in the long run if we should model
that the same way...

> So, we can keep the IVRE engine (sPAPRXive) attached directly to 
> the machine like we have today, this is good, and introduce multiple 
> XIVE source objects. The sPAPR machine would have : 
> 
>  - one for the IPIs [ 0 - nr_servers ]
>  - one generic for the devices [ 4096 -  ]
>  - one for each phb ? 
> 
> The source address in the overall ESB MMIO region would be calculated 
> from the offset of the source IRQ numbers in the IRQ number space. 
> The offset could very well be hardcoded for each device. I don't see 
> any XICS compatibility problems as we are sharing correctly the IRQ 
> number space already.
> 
> 
> I am starting this discussion because the support for XIVE in the 
> QEMU PowerNV machine will need multiple sources, just like for 
> POWER8. PnvXive will be a bit different because the IVRE tables 
> (IVT and EQDT) are in the virtual machine memory. Most of the settings 
> are done in the VM. The QEMU PowerNV machine will still have to 
> implement the triggering and the routing logic using the guest tables.
Cédric Le Goater Dec. 19, 2017, 6:37 a.m. UTC | #3
On 12/18/2017 01:59 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2017-12-14 at 16:24 +0100, Cédric Le Goater wrote:
>> The API between the source and the IVRE is extremely simple :
>>
>>   static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>>
>> The IVRE then scans its IVT, finds the EQ, and moves on to the 
>> presenter.
> 
> In HW it's an MMIO store between the two units (from the source to the
> IVRE notification port). I wonder in the long run if we should model
> that the same way...

It's a problem for PowerNV. IVSEs should all have an 'IVT offset'
register and a 'notify trigger port address' register for
this purpose. Real HW performs a 4-byte store of the IRQ number
to forward the notification to the IVRE. It even makes the model
a little simpler because we don't have to look for the appropriate
PnvXive object to handle the routing.
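
A rough sketch of what such a notification could carry is shown below.
It assumes the stored word is the source number combined with the
block's IVT offset, in big-endian byte order. The helper names are
hypothetical, and the non-overlap check is an assumption of this
sketch, not something the hardware description above states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build the 4-byte big-endian payload a source
 * would store to the IVRE notify port. */
static void notify_encode(uint32_t ivt_offset, uint32_t lisn,
                          uint8_t out[4])
{
    uint32_t data;

    /* Sketch assumption: the offset and the source number do not
     * share any bits, so OR-ing them is lossless. */
    assert((ivt_offset & lisn) == 0);

    data = ivt_offset | lisn;
    out[0] = (uint8_t)(data >> 24);
    out[1] = (uint8_t)(data >> 16);
    out[2] = (uint8_t)(data >> 8);
    out[3] = (uint8_t)data;
}

/* Hypothetical helper: what the IVRE side would read back from the
 * notify port before looking up its IVT. */
static uint32_t notify_decode(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```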

For sPAPR, we don't have such MMIOs but still, we could trigger
the sPAPRXive object directly without using the qemu_irq objects
which stand in the middle. XIVE IPIs don't use them at all and
only use MMIOs.

>> So, we can keep the IVRE engine (sPAPRXive) attached directly to 
>> the machine like we have today, this is good, and introduce multiple 
>> XIVE source objects. The sPAPR machine would have : 
>>
>>  - one for the IPIs [ 0 - nr_servers ]
>>  - one generic for the devices [ 4096 -  ]
>>  - one for each phb ? 
>>
>> The source address in the overall ESB MMIO region would be calculated 
>> from the offset of the source IRQ numbers in the IRQ number space. 
>> The offset could very well be hardcoded for each device. I don't see 
>> any XICS compatibility problems as we are sharing correctly the IRQ 
>> number space already.
>>
>>
>> I am starting this discussion because the support for XIVE in the 
>> QEMU PowerNV machine will need multiple sources, just like for 
>> POWER8. PnvXive will be a bit different because the IVRE tables 
>> (IVT and EQDT) are in the virtual machine memory. Most of the settings 
>> are done in the VM. The QEMU PowerNV machine will still have to 
>> implement the triggering and the routing logic using the guest tables.
David Gibson Dec. 20, 2017, 5:13 a.m. UTC | #4
On Tue, Dec 19, 2017 at 07:37:31AM +0100, Cédric Le Goater wrote:
> On 12/18/2017 01:59 AM, Benjamin Herrenschmidt wrote:
> > On Thu, 2017-12-14 at 16:24 +0100, Cédric Le Goater wrote:
> >> The API between the source and the IVRE is extremely simple :
> >>
> >>   static void spapr_xive_irq(sPAPRXive *xive, int lisn)
> >>
> >> The IVRE then scans its IVT, finds the EQ, and moves on to the 
> >> presenter.
> > 
> > In HW it's an MMIO store between the two units (from the source to the
> > IVRE notification port). I wonder in the long run if we should model
> > that the same way...
> 
> It's a problem for PowerNV. IVSEs should all have an 'IVT offset' 
> register and a 'notify trigger port address' address register for 
> this purpose. Real HW performs a 4bytes store of the IRQ number 
> to forward the notification to the IVRE. It even makes the model 
> a little simpler because we don't have to look for the appropriate 
> PnvXive object to handle the routing.  
> 
> For sPAPR, we don't have such MMIOs but still, we could trigger 
> directly the sPAPRXive object without using the qemu_irq objects
> which stand in the middle. XIVE IPIs don't use them at all and
> only use MMIOs.

Yeah, I think we're going to want a model more explicitly close to
what the hardware does.  It's tempting to shortcut it for PAPR, but a)
it'll probably cause us less trouble when we need to implement powernv
and b) I think it's less likely to break as we fill out the various
details we need.

> 
> >> So, we can keep the IVRE engine (sPAPRXive) attached directly to 
> >> the machine like we have today, this is good, and introduce multiple 
> >> XIVE source objects. The sPAPR machine would have : 
> >>
> >>  - one for the IPIs [ 0 - nr_servers ]
> >>  - one generic for the devices [ 4096 -  ]
> >>  - one for each phb ? 
> >>
> >> The source address in the overall ESB MMIO region would be calculated 
> >> from the offset of the source IRQ numbers in the IRQ number space. 
> >> The offset could very well be hardcoded for each device. I don't see 
> >> any XICS compatibility problems as we are sharing correctly the IRQ 
> >> number space already.
> >>
> >>
> >> I am starting this discussion because the support for XIVE in the 
> >> QEMU PowerNV machine will need multiple sources, just like for 
> >> POWER8. PnvXive will be a bit different because the IVRE tables 
> >> (IVT and EQDT) are in the virtual machine memory. Most of the settings 
> >> are done in the VM. The QEMU PowerNV machine will still have to 
> >> implement the triggering and the routing logic using the guest tables. 
>
David Gibson Dec. 20, 2017, 5:22 a.m. UTC | #5
On Sat, Dec 09, 2017 at 09:43:22AM +0100, Cédric Le Goater wrote:
> Each XIVE interrupt source is associated with a two bit state machine
> called an Event State Buffer (ESB) : the first bit "P" means that an
> interrupt is "pending" and waiting for an EOI and the bit "Q" (queued)
> means a new interrupt was triggered while another was still pending.
> 
> When an event is triggered, the associated interrupt state bits are
> fetched and modified and forwarded to the virtualization engine of the
> controller doing the routing. These can also be controlled by MMIO, to
> trigger events or turn off the sources for instance. See code for more
> details on the states and transitions.
> 
> The MMIO space for the ESBs is 512GB large on the bare-metal system
> (PowerNV) and the BAR depends on the chip id. In our model for the
> sPAPR machine, we choose to only map the sub-region for the
> provisioned IRQ numbers and to use the mapping address of chip 0 of a
> real system.

I think we probably want a device property to make the virtualized
base address arbitrary.  It's fine for it to default to the chip 0
base, but that'll make it easier to adapt if we need to later on.

As noted in the followup messages, I think you're going to want to
move this stuff from the current xive object into a "block of sources"
object.

Apart from that this looks pretty sound.

> In the real world, each source may have different characteristics
> depending on the revision of a controller or the CPU. Early systems
> had two different MMIO pages for trigger and for EOI. We choose to use
> the same characteristics for all sources to simplify the model. The
> minimum CPU level for XIVE exploitation mode will be DD2.X as it has
> full support.
> 
> The OS will obtain the address of the MMIO page of the ESB entry
> associated with a source and its characteristic using the
> H_INT_GET_SOURCE_INFO hcall. This will be addressed in the patch
> introducing the hcalls.
> 
> The spapr_xive_irq() routine in charge of triggering the CPU interrupt
> line will be filled later on.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
> 
>  Changes since v1:
> 
>  - merged in the same patch the qemu_irq handlers
>  - reworked the event notification logic of the qemu_irq handlers.  
>  - introduced XIVE_ESB_STORE_EOI support
>  - removed 'esb_shift' field 
>  - removed a useless check on the validity of the IVE in the memory
>    region handlers.
>  - fixed spapr_xive_pq_trigger() to return true when XIVE_ESB_QUEUED
>    is set
>  - removed the overall ESB memory region. We now have only one region
>    for the provisioned sources.
>  - improved 'info pic' output
> 
>  hw/intc/spapr_xive.c        | 254 +++++++++++++++++++++++++++++++++++++++++++-
>  hw/intc/xive-internal.h     |  10 ++
>  include/hw/ppc/spapr_xive.h |   9 ++
>  3 files changed, 271 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index e6e8841add17..43df6814619d 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -18,23 +18,252 @@
>  
>  #include "xive-internal.h"
>  
> +static void spapr_xive_irq(sPAPRXive *xive, int lisn)
> +{
> +
> +}
> +
>  /*
> - * Main XIVE object
> + * XIVE Interrupt Source
> + */
> +
> +/*
> + * "magic" Event State Buffer (ESB) MMIO offsets.
> + *
> + * Each interrupt source has a 2-bit state machine called ESB
> + * which can be controlled by MMIO. It's made of 2 bits, P and
> + * Q. P indicates that an interrupt is pending (has been sent
> + * to a queue and is waiting for an EOI). Q indicates that the
> + * interrupt has been triggered while pending.
> + *
> + * This acts as a coalescing mechanism in order to guarantee
> + * that a given interrupt only occurs at most once in a queue.
> + *
> + * When doing an EOI, the Q bit will indicate if the interrupt
> + * needs to be re-triggered.
> + *
> + * The following offsets into the ESB MMIO allow to read or
> + * manipulate the PQ bits. They must be used with an 8-bytes
> + * load instruction. They all return the previous state of the
> + * interrupt (atomically).
> + *
> + * Additionally, some ESB pages support doing an EOI via a
> + * store at 0 and some ESBs support doing a trigger via a
> + * separate trigger page.
> + */
> +#define XIVE_ESB_STORE_EOI      0x400 /* Store */
> +#define XIVE_ESB_LOAD_EOI       0x000 /* Load */
> +#define XIVE_ESB_GET            0x800 /* Load */
> +#define XIVE_ESB_SET_PQ_00      0xc00 /* Load */
> +#define XIVE_ESB_SET_PQ_01      0xd00 /* Load */
> +#define XIVE_ESB_SET_PQ_10      0xe00 /* Load */
> +#define XIVE_ESB_SET_PQ_11      0xf00 /* Load */
> +
> +#define XIVE_ESB_VAL_P          0x2
> +#define XIVE_ESB_VAL_Q          0x1
> +
> +#define XIVE_ESB_RESET          0x0
> +#define XIVE_ESB_PENDING        XIVE_ESB_VAL_P
> +#define XIVE_ESB_QUEUED         (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)
> +#define XIVE_ESB_OFF            XIVE_ESB_VAL_Q
> +
> +static uint8_t spapr_xive_pq_get(sPAPRXive *xive, uint32_t lisn)
> +{
> +    uint32_t byte = lisn / 4;
> +    uint32_t bit  = (lisn % 4) * 2;
> +
> +    assert(byte < xive->sbe_size);
> +
> +    return (xive->sbe[byte] >> bit) & 0x3;
> +}
> +
> +static uint8_t spapr_xive_pq_set(sPAPRXive *xive, uint32_t lisn, uint8_t pq)
> +{
> +    uint32_t byte = lisn / 4;
> +    uint32_t bit  = (lisn % 4) * 2;
> +    uint8_t old, new;
> +
> +    assert(byte < xive->sbe_size);
> +
> +    old = xive->sbe[byte];
> +
> +    new = xive->sbe[byte] & ~(0x3 << bit);
> +    new |= (pq & 0x3) << bit;
> +
> +    xive->sbe[byte] = new;
> +
> +    return (old >> bit) & 0x3;
> +}
> +
> +static bool spapr_xive_pq_eoi(sPAPRXive *xive, uint32_t lisn)
> +{
> +    uint8_t old_pq = spapr_xive_pq_get(xive, lisn);
> +
> +    switch (old_pq) {
> +    case XIVE_ESB_RESET:
> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_RESET);
> +        return false;
> +    case XIVE_ESB_PENDING:
> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_RESET);
> +        return false;
> +    case XIVE_ESB_QUEUED:
> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_PENDING);
> +        return true;
> +    case XIVE_ESB_OFF:
> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_OFF);
> +        return false;
> +    default:
> +         g_assert_not_reached();
> +    }
> +}
> +
> +/*
> + * Returns whether the event notification should be forwarded to the
> + * IVE for routing.
>   */
> +static bool spapr_xive_pq_trigger(sPAPRXive *xive, uint32_t lisn)
> +{
> +    uint8_t old_pq = spapr_xive_pq_get(xive, lisn);
>  
> +    switch (old_pq) {
> +    case XIVE_ESB_RESET:
> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_PENDING);
> +        return true;
> +    case XIVE_ESB_PENDING:
> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_QUEUED);
> +        return false;
> +    case XIVE_ESB_QUEUED:
> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_QUEUED);
> +        return false;
> +    case XIVE_ESB_OFF:
> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_OFF);
> +        return false;
> +    default:
> +         g_assert_not_reached();
> +    }
> +}
> +
> +/*
> + * XIVE Interrupt Source MMIOs
> + */
> +
> +/*
> + * Some HW use a separate page for trigger. We only support the case
> + * in which the trigger can be done in the same page as the EOI.
> + */
> +static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
> +    uint32_t offset = addr & 0xF00;
> +    uint32_t lisn = addr >> ESB_SHIFT;
> +    uint64_t ret = -1;
> +
> +    switch (offset) {
> +    case XIVE_ESB_LOAD_EOI:
> +        /*
> +         * EOI on load is not used anymore as we now advertise
> +         * XIVE_ESB_STORE_EOI support for the interrupt sources
> +         */
> +        ret = spapr_xive_pq_eoi(xive, lisn);
> +        break;
> +
> +    case XIVE_ESB_GET:
> +        ret = spapr_xive_pq_get(xive, lisn);
> +        break;
> +
> +    case XIVE_ESB_SET_PQ_00:
> +    case XIVE_ESB_SET_PQ_01:
> +    case XIVE_ESB_SET_PQ_10:
> +    case XIVE_ESB_SET_PQ_11:
> +        ret = spapr_xive_pq_set(xive, lisn, (offset >> 8) & 0x3);
> +        break;
> +    default:
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB addr %d\n", offset);
> +    }
> +
> +    return ret;
> +}
> +
> +static void spapr_xive_esb_write(void *opaque, hwaddr addr,
> +                                 uint64_t value, unsigned size)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
> +    uint32_t offset = addr & 0xF00;
> +    uint32_t lisn = addr >> ESB_SHIFT;
> +    bool notify = false;
> +
> +    switch (offset) {
> +    case 0:
> +        notify = spapr_xive_pq_trigger(xive, lisn);
> +        break;
> +    case XIVE_ESB_STORE_EOI:
> +        /* If the Q bit is set, we should forward a new source event
> +         * notification
> +         */
> +        notify = spapr_xive_pq_eoi(xive, lisn);
> +        break;
> +    default:
> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr %d\n",
> +                      offset);
> +        return;
> +    }
> +
> +    /* Forward the source event notification for routing */
> +    if (notify) {
> +        spapr_xive_irq(xive, lisn);
> +    }
> +}
> +
> +static const MemoryRegionOps spapr_xive_esb_ops = {
> +    .read = spapr_xive_esb_read,
> +    .write = spapr_xive_esb_write,
> +    .endianness = DEVICE_BIG_ENDIAN,
> +    .valid = {
> +        .min_access_size = 8,
> +        .max_access_size = 8,
> +    },
> +    .impl = {
> +        .min_access_size = 8,
> +        .max_access_size = 8,
> +    },
> +};
> +
> +static void spapr_xive_source_set_irq(void *opaque, int lisn, int val)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
> +    bool notify = false;
> +
> +    if (val) {
> +        notify = spapr_xive_pq_trigger(xive, lisn);
> +    }
> +
> +    /* Forward the source event notification for routing */
> +    if (notify) {
> +        spapr_xive_irq(xive, lisn);
> +    }
> +}
> +
> +/*
> + * Main XIVE object
> + */
>  void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
>  {
>      int i;
>  
>      for (i = 0; i < xive->nr_irqs; i++) {
>          XiveIVE *ive = &xive->ivt[i];
> +        uint8_t pq;
>  
>          if (!(ive->w & IVE_VALID)) {
>              continue;
>          }
>  
> -        monitor_printf(mon, "  %4x %s %08x %08x\n", i,
> +        pq = spapr_xive_pq_get(xive, i);
> +
> +        monitor_printf(mon, "  %4x %s %c%c %08x %08x\n", i,
>                         ive->w & IVE_MASKED ? "M" : " ",
> +                       pq & XIVE_ESB_VAL_P ? 'P' : '-',
> +                       pq & XIVE_ESB_VAL_Q ? 'Q' : '-',
>                         (int) GETFIELD(IVE_EQ_INDEX, ive->w),
>                         (int) GETFIELD(IVE_EQ_DATA, ive->w));
>      }
> @@ -52,6 +281,9 @@ static void spapr_xive_reset(DeviceState *dev)
>              ive->w |= IVE_MASKED;
>          }
>      }
> +
> +    /* SBEs are initialized to 0b01 which corresponds to "ints off" */
> +    memset(xive->sbe, 0x55, xive->sbe_size);
>  }
>  
>  static void spapr_xive_realize(DeviceState *dev, Error **errp)
> @@ -65,6 +297,23 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>  
>      /* Allocate the IVT (Interrupt Virtualization Table) */
>      xive->ivt = g_new0(XiveIVE, xive->nr_irqs);
> +
> +    /* QEMU IRQs */
> +    xive->qirqs = qemu_allocate_irqs(spapr_xive_source_set_irq, xive,
> +                                     xive->nr_irqs);
> +
> +    /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
> +    xive->sbe_size = DIV_ROUND_UP(xive->nr_irqs, 4);
> +    xive->sbe = g_malloc0(xive->sbe_size);
> +
> +    /* VC BAR. Use address of chip 0 to install the ESB memory region
> +     * for *all* interrupt sources */
> +    xive->esb_base = (P9_MMIO_BASE | VC_BAR_DEFAULT);
> +
> +    memory_region_init_io(&xive->esb_iomem, OBJECT(xive),
> +                          &spapr_xive_esb_ops, xive, "xive.esb",
> +                          (1ull << ESB_SHIFT) * xive->nr_irqs);
> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->esb_iomem);
>  }
>  
>  static const VMStateDescription vmstate_spapr_xive_ive = {
> @@ -92,6 +341,7 @@ static const VMStateDescription vmstate_spapr_xive = {
>          VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
>          VMSTATE_STRUCT_VARRAY_UINT32(ivt, sPAPRXive, nr_irqs, 1,
>                                       vmstate_spapr_xive_ive, XiveIVE),
> +        VMSTATE_VBUFFER_UINT32(sbe, sPAPRXive, 1, NULL, sbe_size),
>          VMSTATE_END_OF_LIST()
>      },
>  };
> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
> index 132b71a6daf0..872648dd96a2 100644
> --- a/hw/intc/xive-internal.h
> +++ b/hw/intc/xive-internal.h
> @@ -16,6 +16,16 @@
>  #define SETFIELD(m, v, val)                             \
>          (((v) & ~(m)) | ((((typeof(v))(val)) << MASK_TO_LSH(m)) & (m)))
>  
> +/*
> + * XIVE MMIO regions
> + */
> +#define P9_MMIO_BASE     0x006000000000000ull
> +
> +/* VC BAR contains set translations for the ESBs and the EQs. */
> +#define VC_BAR_DEFAULT   0x10000000000ull
> +#define VC_BAR_SIZE      0x08000000000ull
> +#define ESB_SHIFT        16 /* One 64k page. OPAL has two */
> +
>  /* IVE/EAS
>   *
>   * One per interrupt source. Targets that interrupt to a given EQ
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index 5b1f78e06a1e..ecc15d889b74 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -24,8 +24,17 @@ struct sPAPRXive {
>      /* Properties */
>      uint32_t     nr_irqs;
>  
> +    /* IRQ */
> +    qemu_irq     *qirqs;
> +
>      /* XIVE internal tables */
>      XiveIVE      *ivt;
> +    uint8_t      *sbe;
> +    uint32_t     sbe_size;
> +
> +    /* ESB memory region */
> +    hwaddr       esb_base;
> +    MemoryRegion esb_iomem;
>  };
>  
>  bool spapr_xive_irq_enable(sPAPRXive *xive, uint32_t lisn);
Cédric Le Goater Dec. 20, 2017, 7:54 a.m. UTC | #6
On 12/20/2017 06:22 AM, David Gibson wrote:
> On Sat, Dec 09, 2017 at 09:43:22AM +0100, Cédric Le Goater wrote:
>> Each XIVE interrupt source is associated with a two bit state machine
>> called an Event State Buffer (ESB) : the first bit "P" means that an
>> interrupt is "pending" and waiting for an EOI and the bit "Q" (queued)
>> means a new interrupt was triggered while another was still pending.
>>
>> When an event is triggered, the associated interrupt state bits are
>> fetched and modified and forwarded to the virtualization engine of the
>> controller doing the routing. These can also be controlled by MMIO, to
>> trigger events or turn off the sources for instance. See code for more
>> details on the states and transitions.
>>
>> The MMIO space for the ESBs is 512GB large on the bare-metal system
>> (PowerNV) and the BAR depends on the chip id. In our model for the
>> sPAPR machine, we choose to only map the sub-region for the
>> provisioned IRQ numbers and to use the mapping address of chip 0 of a
>> real system.
> 
> I think we probably want a device property to make the virtualized
> base address arbitrary.  It's fine for it to default to the chip 0
> base, but that'll make it easier to adapt if we need to later on.

Yes. We can add a "bar" property for this purpose, like for some of
the pnv models.

> As noted in the followup messages, I think you're going to want to
> move this stuff from the current xive object into a "block of sources"
> object.

Yes. I now have a new XIVE source model for the POWER9 PSIHB controller.
It should help to find common ground. This is what I added to support
XIVE in the current PSIHB:

  +    /* P9 */
  +    MemoryRegion esb_iomem;
  +    uint8_t sbe[4]; /* enough for 13 P&Q bits */
  +    uint32_t ivt_offset;

The ESB region mapping is handled at the machine level as it depends 
on the chip id.

The 'ivt_offset' is only used to forward the event notification to
the routing engine:

  +static void pnv_psi_notify(PnvPsi *psi, uint32_t lisn)
  +{
  +    uint64_t notif_port =
  +        psi->regs[PSIHB_REG(PSIHB9_ESB_NOTIF_ADDR)];
  +    bool valid = notif_port & PSIHB9_ESB_NOTIF_VALID;
  +    uint64_t notify_addr = notif_port & ~PSIHB9_ESB_NOTIF_VALID;
  +    uint32_t data = cpu_to_be32(psi->ivt_offset | lisn);
  +
  +    if (valid) {
  +        cpu_physical_memory_write(notify_addr, &data, sizeof(data));
  +    }
  +}

So it really depends on the controller type. I think that could be a
class handler.
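
One possible shape for such a handler, sketched in plain C rather than
with QOM classes: each source carries a notify callback, so an sPAPR
source can call the router directly while a PowerNV-style source
performs a store. All names here are hypothetical, and the counters
only stand in for the real router call and the real memory write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct SourceSketch SourceSketch;
typedef void (*source_notify_fn)(SourceSketch *src, uint32_t srcno);

struct SourceSketch {
    source_notify_fn notify;   /* per-controller notification handler */
    uint32_t ivt_offset;       /* only used by the store variant */
    uint32_t last_lisn;        /* stand-in for what the router receives */
    unsigned direct_calls;     /* stand-in for a direct router call */
    unsigned mmio_stores;      /* stand-in for a notify port store */
};

/* sPAPR-like variant: hand the IRQ number to the router directly. */
static void notify_direct(SourceSketch *src, uint32_t srcno)
{
    src->last_lisn = srcno;
    src->direct_calls++;
}

/* PowerNV-like variant: combine with the IVT offset and "store" it. */
static void notify_store(SourceSketch *src, uint32_t srcno)
{
    src->last_lisn = src->ivt_offset | srcno;
    src->mmio_stores++;
}

/* Common trigger path: the PQ logic would run here, then the
 * controller-specific handler forwards the event. */
static void source_trigger(SourceSketch *src, uint32_t srcno)
{
    assert(src->notify != NULL);
    src->notify(src, srcno);
}
```

The common code never needs to know which kind of controller it is
driving, which is the point of making the notification a class handler.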

Thanks,

C. 


> Apart from that this looks pretty sound.
> 
>> In the real world, each source may have different characteristics
>> depending on the revision of a controller or the CPU. Early systems
>> had two different MMIO pages for trigger and for EOI. We choose to use
>> the same characteristics for all sources to simplify the model. The
>> minimum CPU level for XIVE exploitation mode will be DD2.X as it has
>> full support.
>>
>> The OS will obtain the address of the MMIO page of the ESB entry
>> associated with a source and its characteristic using the
>> H_INT_GET_SOURCE_INFO hcall. This will be addressed in the patch
>> introducing the hcalls.
>>
>> The spapr_xive_irq() routine in charge of triggering the CPU interrupt
>> line will be filled later on.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>
>>  Changes since v1:
>>
>>  - merged in the same patch the qemu_irq handlers
>>  - reworked the event notification logic of the qemu_irq handlers.  
>>  - introduced XIVE_ESB_STORE_EOI support
>>  - removed 'esb_shift' field 
>>  - removed a useless check on the validity of the IVE in the memory
>>    region handlers.
>>  - fixed spapr_xive_pq_trigger() to return true when XIVE_ESB_QUEUED
>>    is set
>>  - removed the overall ESB memory region. We now have only one region
>>    for the provisioned sources.
>>  - improved 'info pic' output
>>
>>  hw/intc/spapr_xive.c        | 254 +++++++++++++++++++++++++++++++++++++++++++-
>>  hw/intc/xive-internal.h     |  10 ++
>>  include/hw/ppc/spapr_xive.h |   9 ++
>>  3 files changed, 271 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index e6e8841add17..43df6814619d 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -18,23 +18,252 @@
>>  
>>  #include "xive-internal.h"
>>  
>> +static void spapr_xive_irq(sPAPRXive *xive, int lisn)
>> +{
>> +
>> +}
>> +
>>  /*
>> - * Main XIVE object
>> + * XIVE Interrupt Source
>> + */
>> +
>> +/*
>> + * "magic" Event State Buffer (ESB) MMIO offsets.
>> + *
>> + * Each interrupt source has a 2-bit state machine called ESB
>> + * which can be controlled by MMIO. It's made of 2 bits, P and
>> + * Q. P indicates that an interrupt is pending (has been sent
>> + * to a queue and is waiting for an EOI). Q indicates that the
>> + * interrupt has been triggered while pending.
>> + *
>> + * This acts as a coalescing mechanism in order to guarantee
>> + * that a given interrupt only occurs at most once in a queue.
>> + *
>> + * When doing an EOI, the Q bit will indicate if the interrupt
>> + * needs to be re-triggered.
>> + *
>> + * The following offsets into the ESB MMIO allow to read or
>> + * manipulate the PQ bits. They must be used with an 8-bytes
>> + * load instruction. They all return the previous state of the
>> + * interrupt (atomically).
>> + *
>> + * Additionally, some ESB pages support doing an EOI via a
>> + * store at 0 and some ESBs support doing a trigger via a
>> + * separate trigger page.
>> + */
>> +#define XIVE_ESB_STORE_EOI      0x400 /* Store */
>> +#define XIVE_ESB_LOAD_EOI       0x000 /* Load */
>> +#define XIVE_ESB_GET            0x800 /* Load */
>> +#define XIVE_ESB_SET_PQ_00      0xc00 /* Load */
>> +#define XIVE_ESB_SET_PQ_01      0xd00 /* Load */
>> +#define XIVE_ESB_SET_PQ_10      0xe00 /* Load */
>> +#define XIVE_ESB_SET_PQ_11      0xf00 /* Load */
>> +
>> +#define XIVE_ESB_VAL_P          0x2
>> +#define XIVE_ESB_VAL_Q          0x1
>> +
>> +#define XIVE_ESB_RESET          0x0
>> +#define XIVE_ESB_PENDING        XIVE_ESB_VAL_P
>> +#define XIVE_ESB_QUEUED         (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)
>> +#define XIVE_ESB_OFF            XIVE_ESB_VAL_Q
>> +
>> +static uint8_t spapr_xive_pq_get(sPAPRXive *xive, uint32_t lisn)
>> +{
>> +    uint32_t byte = lisn / 4;
>> +    uint32_t bit  = (lisn % 4) * 2;
>> +
>> +    assert(byte < xive->sbe_size);
>> +
>> +    return (xive->sbe[byte] >> bit) & 0x3;
>> +}
>> +
>> +static uint8_t spapr_xive_pq_set(sPAPRXive *xive, uint32_t lisn, uint8_t pq)
>> +{
>> +    uint32_t byte = lisn / 4;
>> +    uint32_t bit  = (lisn % 4) * 2;
>> +    uint8_t old, new;
>> +
>> +    assert(byte < xive->sbe_size);
>> +
>> +    old = xive->sbe[byte];
>> +
>> +    new = xive->sbe[byte] & ~(0x3 << bit);
>> +    new |= (pq & 0x3) << bit;
>> +
>> +    xive->sbe[byte] = new;
>> +
>> +    return (old >> bit) & 0x3;
>> +}
>> +
>> +static bool spapr_xive_pq_eoi(sPAPRXive *xive, uint32_t lisn)
>> +{
>> +    uint8_t old_pq = spapr_xive_pq_get(xive, lisn);
>> +
>> +    switch (old_pq) {
>> +    case XIVE_ESB_RESET:
>> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_RESET);
>> +        return false;
>> +    case XIVE_ESB_PENDING:
>> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_RESET);
>> +        return false;
>> +    case XIVE_ESB_QUEUED:
>> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_PENDING);
>> +        return true;
>> +    case XIVE_ESB_OFF:
>> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_OFF);
>> +        return false;
>> +    default:
>> +         g_assert_not_reached();
>> +    }
>> +}
>> +
>> +/*
>> + * Returns whether the event notification should be forwarded to the
>> + * IVE for routing.
>>   */
>> +static bool spapr_xive_pq_trigger(sPAPRXive *xive, uint32_t lisn)
>> +{
>> +    uint8_t old_pq = spapr_xive_pq_get(xive, lisn);
>>  
>> +    switch (old_pq) {
>> +    case XIVE_ESB_RESET:
>> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_PENDING);
>> +        return true;
>> +    case XIVE_ESB_PENDING:
>> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_QUEUED);
>> +        return false;
>> +    case XIVE_ESB_QUEUED:
>> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_QUEUED);
>> +        return false;
>> +    case XIVE_ESB_OFF:
>> +        spapr_xive_pq_set(xive, lisn, XIVE_ESB_OFF);
>> +        return false;
>> +    default:
>> +         g_assert_not_reached();
>> +    }
>> +}
>> +
>> +/*
>> + * XIVE Interrupt Source MMIOs
>> + */
>> +
>> +/*
>> + * Some HW use a separate page for trigger. We only support the case
>> + * in which the trigger can be done in the same page as the EOI.
>> + */
>> +static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)
>> +{
>> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
>> +    uint32_t offset = addr & 0xF00;
>> +    uint32_t lisn = addr >> ESB_SHIFT;
>> +    uint64_t ret = -1;
>> +
>> +    switch (offset) {
>> +    case XIVE_ESB_LOAD_EOI:
>> +        /*
>> +         * EOI on load is not used anymore as we now advertise
>> +         * XIVE_ESB_STORE_EOI support for the interrupt sources
>> +         */
>> +        ret = spapr_xive_pq_eoi(xive, lisn);
>> +        break;
>> +
>> +    case XIVE_ESB_GET:
>> +        ret = spapr_xive_pq_get(xive, lisn);
>> +        break;
>> +
>> +    case XIVE_ESB_SET_PQ_00:
>> +    case XIVE_ESB_SET_PQ_01:
>> +    case XIVE_ESB_SET_PQ_10:
>> +    case XIVE_ESB_SET_PQ_11:
>> +        ret = spapr_xive_pq_set(xive, lisn, (offset >> 8) & 0x3);
>> +        break;
>> +    default:
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB addr %d\n", offset);
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +static void spapr_xive_esb_write(void *opaque, hwaddr addr,
>> +                                 uint64_t value, unsigned size)
>> +{
>> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
>> +    uint32_t offset = addr & 0xF00;
>> +    uint32_t lisn = addr >> ESB_SHIFT;
>> +    bool notify = false;
>> +
>> +    switch (offset) {
>> +    case 0:
>> +        notify = spapr_xive_pq_trigger(xive, lisn);
>> +        break;
>> +    case XIVE_ESB_STORE_EOI:
>> +        /* If the Q bit is set, we should forward a new source event
>> +         * notification
>> +         */
>> +        notify = spapr_xive_pq_eoi(xive, lisn);
>> +        break;
>> +    default:
>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr %d\n",
>> +                      offset);
>> +        return;
>> +    }
>> +
>> +    /* Forward the source event notification for routing */
>> +    if (notify) {
>> +        spapr_xive_irq(xive, lisn);
>> +    }
>> +}
>> +
>> +static const MemoryRegionOps spapr_xive_esb_ops = {
>> +    .read = spapr_xive_esb_read,
>> +    .write = spapr_xive_esb_write,
>> +    .endianness = DEVICE_BIG_ENDIAN,
>> +    .valid = {
>> +        .min_access_size = 8,
>> +        .max_access_size = 8,
>> +    },
>> +    .impl = {
>> +        .min_access_size = 8,
>> +        .max_access_size = 8,
>> +    },
>> +};
>> +
>> +static void spapr_xive_source_set_irq(void *opaque, int lisn, int val)
>> +{
>> +    sPAPRXive *xive = SPAPR_XIVE(opaque);
>> +    bool notify = false;
>> +
>> +    if (val) {
>> +        notify = spapr_xive_pq_trigger(xive, lisn);
>> +    }
>> +
>> +    /* Forward the source event notification for routing */
>> +    if (notify) {
>> +        spapr_xive_irq(xive, lisn);
>> +    }
>> +}
>> +
>> +/*
>> + * Main XIVE object
>> + */
>>  void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
>>  {
>>      int i;
>>  
>>      for (i = 0; i < xive->nr_irqs; i++) {
>>          XiveIVE *ive = &xive->ivt[i];
>> +        uint8_t pq;
>>  
>>          if (!(ive->w & IVE_VALID)) {
>>              continue;
>>          }
>>  
>> -        monitor_printf(mon, "  %4x %s %08x %08x\n", i,
>> +        pq = spapr_xive_pq_get(xive, i);
>> +
>> +        monitor_printf(mon, "  %4x %s %c%c %08x %08x\n", i,
>>                         ive->w & IVE_MASKED ? "M" : " ",
>> +                       pq & XIVE_ESB_VAL_P ? 'P' : '-',
>> +                       pq & XIVE_ESB_VAL_Q ? 'Q' : '-',
>>                         (int) GETFIELD(IVE_EQ_INDEX, ive->w),
>>                         (int) GETFIELD(IVE_EQ_DATA, ive->w));
>>      }
>> @@ -52,6 +281,9 @@ static void spapr_xive_reset(DeviceState *dev)
>>              ive->w |= IVE_MASKED;
>>          }
>>      }
>> +
>> +    /* SBEs are initialized to 0b01 which corresponds to "ints off" */
>> +    memset(xive->sbe, 0x55, xive->sbe_size);
>>  }
>>  
>>  static void spapr_xive_realize(DeviceState *dev, Error **errp)
>> @@ -65,6 +297,23 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>  
>>      /* Allocate the IVT (Interrupt Virtualization Table) */
>>      xive->ivt = g_new0(XiveIVE, xive->nr_irqs);
>> +
>> +    /* QEMU IRQs */
>> +    xive->qirqs = qemu_allocate_irqs(spapr_xive_source_set_irq, xive,
>> +                                     xive->nr_irqs);
>> +
>> +    /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
>> +    xive->sbe_size = DIV_ROUND_UP(xive->nr_irqs, 4);
>> +    xive->sbe = g_malloc0(xive->sbe_size);
>> +
>> +    /* VC BAR. Use address of chip 0 to install the ESB memory region
>> +     * for *all* interrupt sources */
>> +    xive->esb_base = (P9_MMIO_BASE | VC_BAR_DEFAULT);
>> +
>> +    memory_region_init_io(&xive->esb_iomem, OBJECT(xive),
>> +                          &spapr_xive_esb_ops, xive, "xive.esb",
>> +                          (1ull << ESB_SHIFT) * xive->nr_irqs);
>> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->esb_iomem);
>>  }
>>  
>>  static const VMStateDescription vmstate_spapr_xive_ive = {
>> @@ -92,6 +341,7 @@ static const VMStateDescription vmstate_spapr_xive = {
>>          VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
>>          VMSTATE_STRUCT_VARRAY_UINT32(ivt, sPAPRXive, nr_irqs, 1,
>>                                       vmstate_spapr_xive_ive, XiveIVE),
>> +        VMSTATE_VBUFFER_UINT32(sbe, sPAPRXive, 1, NULL, sbe_size),
>>          VMSTATE_END_OF_LIST()
>>      },
>>  };
>> diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
>> index 132b71a6daf0..872648dd96a2 100644
>> --- a/hw/intc/xive-internal.h
>> +++ b/hw/intc/xive-internal.h
>> @@ -16,6 +16,16 @@
>>  #define SETFIELD(m, v, val)                             \
>>          (((v) & ~(m)) | ((((typeof(v))(val)) << MASK_TO_LSH(m)) & (m)))
>>  
>> +/*
>> + * XIVE MMIO regions
>> + */
>> +#define P9_MMIO_BASE     0x006000000000000ull
>> +
>> +/* VC BAR contains set translations for the ESBs and the EQs. */
>> +#define VC_BAR_DEFAULT   0x10000000000ull
>> +#define VC_BAR_SIZE      0x08000000000ull
>> +#define ESB_SHIFT        16 /* One 64k page. OPAL has two */
>> +
>>  /* IVE/EAS
>>   *
>>   * One per interrupt source. Targets that interrupt to a given EQ
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> index 5b1f78e06a1e..ecc15d889b74 100644
>> --- a/include/hw/ppc/spapr_xive.h
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -24,8 +24,17 @@ struct sPAPRXive {
>>      /* Properties */
>>      uint32_t     nr_irqs;
>>  
>> +    /* IRQ */
>> +    qemu_irq     *qirqs;
>> +
>>      /* XIVE internal tables */
>>      XiveIVE      *ivt;
>> +    uint8_t      *sbe;
>> +    uint32_t     sbe_size;
>> +
>> +    /* ESB memory region */
>> +    hwaddr       esb_base;
>> +    MemoryRegion esb_iomem;
>>  };
>>  
>>  bool spapr_xive_irq_enable(sPAPRXive *xive, uint32_t lisn);
>
Cédric Le Goater Dec. 20, 2017, 6:08 p.m. UTC | #7
On 12/20/2017 08:54 AM, Cédric Le Goater wrote:
> On 12/20/2017 06:22 AM, David Gibson wrote:
>> On Sat, Dec 09, 2017 at 09:43:22AM +0100, Cédric Le Goater wrote:
>>> Each XIVE interrupt source is associated with a two bit state machine
>>> called an Event State Buffer (ESB) : the first bit "P" means that an
>>> interrupt is "pending" and waiting for an EOI and the bit "Q" (queued)
>>> means a new interrupt was triggered while another was still pending.
>>>
>>> When an event is triggered, the associated interrupt state bits are
>>> fetched and modified and forwarded to the virtualization engine of the
>>> controller doing the routing. These can also be controlled by MMIO, to
>>> trigger events or turn off the sources for instance. See code for more
>>> details on the states and transitions.
>>>
>>> The MMIO space for the ESBs is 512GB large on the bare-metal system
>>> (PowerNV) and the BAR depends on the chip id. In our model for the
>>> sPAPR machine, we choose to only map the sub-region for the
>>> provisioned IRQ numbers and to use the mapping address of chip 0 of a
>>> real system.
>>
>> I think we probably want a device property to make the virtualized
>> base address arbitrary.  It's fine for it to default to the chip 0
>> base, but that'll make it easier to adapt if we need to later on.
> 
> Yes. We can add a "bar" property for this purpose, as some of the
> pnv models do.
> 
>> As noted in the followup messages, I think you're going to want to
>> move this stuff from the current xive object into a "block of sources"
>> object.

I have (re)introduced a XiveSource object. There is only a single
instance for now, created under the sPAPRXive object because that is
the easiest place to create it. Adding a list of sources later should
not be too problematic if needed.

So the XiveSource is generic and I hope to be able to do the same for 
the presenter. 

Just like for XICS, I am also adding an interface class:

	typedef struct XiveFabricClass {
	    InterfaceClass parent;
	    void (*notify)(XiveFabric *xive, int lisn);
	} XiveFabricClass;

which we can use for both the pnv and pseries machines. The fabric is
not the machine itself, though: it is the XIVE routing engine, an
object below it.
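
A minimal sketch of that indirection in plain C, without QOM. The
Fabric, FabricOps and Source names and the recording router are
placeholders for illustration only, not the types from the series; the
point is just that a source calls notify() on whatever routing engine
it is attached to and never needs to know about the machine:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plain-C stand-ins; the real code would use QOM interfaces. */
typedef struct Fabric Fabric;

typedef struct FabricOps {
    void (*notify)(Fabric *fabric, int lisn);
} FabricOps;

struct Fabric {
    const FabricOps *ops;
    int last_lisn;      /* recorded by the example router below */
    int nr_notified;
};

/* A source only knows the fabric it is attached to, not the machine. */
typedef struct Source {
    Fabric *fabric;
} Source;

/* Forward a source event to the routing engine, if one is attached. */
static void source_notify(Source *src, int lisn)
{
    assert(src != NULL);

    if (src->fabric && src->fabric->ops && src->fabric->ops->notify) {
        src->fabric->ops->notify(src->fabric, lisn);
    }
}

/* Example routing engine: it only records what it was asked to route. */
static void router_notify(Fabric *fabric, int lisn)
{
    fabric->last_lisn = lisn;
    fabric->nr_notified++;
}

static const FabricOps router_ops = { .notify = router_notify };
```

Under these assumptions, a pnv or pseries model would differ only in
which FabricOps it plugs in; the source code stays the same.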

C.

Patch

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index e6e8841add17..43df6814619d 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -18,23 +18,252 @@ 
 
 #include "xive-internal.h"
 
+static void spapr_xive_irq(sPAPRXive *xive, int lisn)
+{
+
+}
+
 /*
- * Main XIVE object
+ * XIVE Interrupt Source
+ */
+
+/*
+ * "magic" Event State Buffer (ESB) MMIO offsets.
+ *
+ * Each interrupt source has a 2-bit state machine called ESB
+ * which can be controlled by MMIO. It's made of 2 bits, P and
+ * Q. P indicates that an interrupt is pending (has been sent
+ * to a queue and is waiting for an EOI). Q indicates that the
+ * interrupt has been triggered while pending.
+ *
+ * This acts as a coalescing mechanism in order to guarantee
+ * that a given interrupt only occurs at most once in a queue.
+ *
+ * When doing an EOI, the Q bit will indicate if the interrupt
+ * needs to be re-triggered.
+ *
+ * The following offsets into the ESB MMIO allow to read or
+ * manipulate the PQ bits. They must be used with an 8-bytes
+ * load instruction. They all return the previous state of the
+ * interrupt (atomically).
+ *
+ * Additionally, some ESB pages support doing an EOI via a
+ * store at 0 and some ESBs support doing a trigger via a
+ * separate trigger page.
+ */
+#define XIVE_ESB_STORE_EOI      0x400 /* Store */
+#define XIVE_ESB_LOAD_EOI       0x000 /* Load */
+#define XIVE_ESB_GET            0x800 /* Load */
+#define XIVE_ESB_SET_PQ_00      0xc00 /* Load */
+#define XIVE_ESB_SET_PQ_01      0xd00 /* Load */
+#define XIVE_ESB_SET_PQ_10      0xe00 /* Load */
+#define XIVE_ESB_SET_PQ_11      0xf00 /* Load */
+
+#define XIVE_ESB_VAL_P          0x2
+#define XIVE_ESB_VAL_Q          0x1
+
+#define XIVE_ESB_RESET          0x0
+#define XIVE_ESB_PENDING        XIVE_ESB_VAL_P
+#define XIVE_ESB_QUEUED         (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)
+#define XIVE_ESB_OFF            XIVE_ESB_VAL_Q
+
+static uint8_t spapr_xive_pq_get(sPAPRXive *xive, uint32_t lisn)
+{
+    uint32_t byte = lisn / 4;
+    uint32_t bit  = (lisn % 4) * 2;
+
+    assert(byte < xive->sbe_size);
+
+    return (xive->sbe[byte] >> bit) & 0x3;
+}
+
+static uint8_t spapr_xive_pq_set(sPAPRXive *xive, uint32_t lisn, uint8_t pq)
+{
+    uint32_t byte = lisn / 4;
+    uint32_t bit  = (lisn % 4) * 2;
+    uint8_t old, new;
+
+    assert(byte < xive->sbe_size);
+
+    old = xive->sbe[byte];
+
+    new = xive->sbe[byte] & ~(0x3 << bit);
+    new |= (pq & 0x3) << bit;
+
+    xive->sbe[byte] = new;
+
+    return (old >> bit) & 0x3;
+}
+
+static bool spapr_xive_pq_eoi(sPAPRXive *xive, uint32_t lisn)
+{
+    uint8_t old_pq = spapr_xive_pq_get(xive, lisn);
+
+    switch (old_pq) {
+    case XIVE_ESB_RESET:
+        spapr_xive_pq_set(xive, lisn, XIVE_ESB_RESET);
+        return false;
+    case XIVE_ESB_PENDING:
+        spapr_xive_pq_set(xive, lisn, XIVE_ESB_RESET);
+        return false;
+    case XIVE_ESB_QUEUED:
+        spapr_xive_pq_set(xive, lisn, XIVE_ESB_PENDING);
+        return true;
+    case XIVE_ESB_OFF:
+        spapr_xive_pq_set(xive, lisn, XIVE_ESB_OFF);
+        return false;
+    default:
+         g_assert_not_reached();
+    }
+}
+
+/*
+ * Returns whether the event notification should be forwarded to the
+ * IVE for routing.
  */
+static bool spapr_xive_pq_trigger(sPAPRXive *xive, uint32_t lisn)
+{
+    uint8_t old_pq = spapr_xive_pq_get(xive, lisn);
 
+    switch (old_pq) {
+    case XIVE_ESB_RESET:
+        spapr_xive_pq_set(xive, lisn, XIVE_ESB_PENDING);
+        return true;
+    case XIVE_ESB_PENDING:
+        spapr_xive_pq_set(xive, lisn, XIVE_ESB_QUEUED);
+        return false;
+    case XIVE_ESB_QUEUED:
+        spapr_xive_pq_set(xive, lisn, XIVE_ESB_QUEUED);
+        return false;
+    case XIVE_ESB_OFF:
+        spapr_xive_pq_set(xive, lisn, XIVE_ESB_OFF);
+        return false;
+    default:
+         g_assert_not_reached();
+    }
+}
+
+/*
+ * XIVE Interrupt Source MMIOs
+ */
+
+/*
+ * Some HW use a separate page for trigger. We only support the case
+ * in which the trigger can be done in the same page as the EOI.
+ */
+static uint64_t spapr_xive_esb_read(void *opaque, hwaddr addr, unsigned size)
+{
+    sPAPRXive *xive = SPAPR_XIVE(opaque);
+    uint32_t offset = addr & 0xF00;
+    uint32_t lisn = addr >> ESB_SHIFT;
+    uint64_t ret = -1;
+
+    switch (offset) {
+    case XIVE_ESB_LOAD_EOI:
+        /*
+         * EOI on load is not used anymore as we now advertise
+         * XIVE_ESB_STORE_EOI support for the interrupt sources
+         */
+        ret = spapr_xive_pq_eoi(xive, lisn);
+        break;
+
+    case XIVE_ESB_GET:
+        ret = spapr_xive_pq_get(xive, lisn);
+        break;
+
+    case XIVE_ESB_SET_PQ_00:
+    case XIVE_ESB_SET_PQ_01:
+    case XIVE_ESB_SET_PQ_10:
+    case XIVE_ESB_SET_PQ_11:
+        ret = spapr_xive_pq_set(xive, lisn, (offset >> 8) & 0x3);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB addr %d\n", offset);
+    }
+
+    return ret;
+}
+
+static void spapr_xive_esb_write(void *opaque, hwaddr addr,
+                                 uint64_t value, unsigned size)
+{
+    sPAPRXive *xive = SPAPR_XIVE(opaque);
+    uint32_t offset = addr & 0xF00;
+    uint32_t lisn = addr >> ESB_SHIFT;
+    bool notify = false;
+
+    switch (offset) {
+    case 0:
+        notify = spapr_xive_pq_trigger(xive, lisn);
+        break;
+    case XIVE_ESB_STORE_EOI:
+        /* If the Q bit is set, we should forward a new source event
+         * notification
+         */
+        notify = spapr_xive_pq_eoi(xive, lisn);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr %d\n",
+                      offset);
+        return;
+    }
+
+    /* Forward the source event notification for routing */
+    if (notify) {
+        spapr_xive_irq(xive, lisn);
+    }
+}
+
+static const MemoryRegionOps spapr_xive_esb_ops = {
+    .read = spapr_xive_esb_read,
+    .write = spapr_xive_esb_write,
+    .endianness = DEVICE_BIG_ENDIAN,
+    .valid = {
+        .min_access_size = 8,
+        .max_access_size = 8,
+    },
+    .impl = {
+        .min_access_size = 8,
+        .max_access_size = 8,
+    },
+};
+
+static void spapr_xive_source_set_irq(void *opaque, int lisn, int val)
+{
+    sPAPRXive *xive = SPAPR_XIVE(opaque);
+    bool notify = false;
+
+    if (val) {
+        notify = spapr_xive_pq_trigger(xive, lisn);
+    }
+
+    /* Forward the source event notification for routing */
+    if (notify) {
+        spapr_xive_irq(xive, lisn);
+    }
+}
+
+/*
+ * Main XIVE object
+ */
 void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
 {
     int i;
 
     for (i = 0; i < xive->nr_irqs; i++) {
         XiveIVE *ive = &xive->ivt[i];
+        uint8_t pq;
 
         if (!(ive->w & IVE_VALID)) {
             continue;
         }
 
-        monitor_printf(mon, "  %4x %s %08x %08x\n", i,
+        pq = spapr_xive_pq_get(xive, i);
+
+        monitor_printf(mon, "  %4x %s %c%c %08x %08x\n", i,
                        ive->w & IVE_MASKED ? "M" : " ",
+                       pq & XIVE_ESB_VAL_P ? 'P' : '-',
+                       pq & XIVE_ESB_VAL_Q ? 'Q' : '-',
                        (int) GETFIELD(IVE_EQ_INDEX, ive->w),
                        (int) GETFIELD(IVE_EQ_DATA, ive->w));
     }
@@ -52,6 +281,9 @@  static void spapr_xive_reset(DeviceState *dev)
             ive->w |= IVE_MASKED;
         }
     }
+
+    /* SBEs are initialized to 0b01 which corresponds to "ints off" */
+    memset(xive->sbe, 0x55, xive->sbe_size);
 }
 
 static void spapr_xive_realize(DeviceState *dev, Error **errp)
@@ -65,6 +297,23 @@  static void spapr_xive_realize(DeviceState *dev, Error **errp)
 
     /* Allocate the IVT (Interrupt Virtualization Table) */
     xive->ivt = g_new0(XiveIVE, xive->nr_irqs);
+
+    /* QEMU IRQs */
+    xive->qirqs = qemu_allocate_irqs(spapr_xive_source_set_irq, xive,
+                                     xive->nr_irqs);
+
+    /* Allocate SBEs (State Bit Entry). 2 bits, so 4 entries per byte */
+    xive->sbe_size = DIV_ROUND_UP(xive->nr_irqs, 4);
+    xive->sbe = g_malloc0(xive->sbe_size);
+
+    /* VC BAR. Use address of chip 0 to install the ESB memory region
+     * for *all* interrupt sources */
+    xive->esb_base = (P9_MMIO_BASE | VC_BAR_DEFAULT);
+
+    memory_region_init_io(&xive->esb_iomem, OBJECT(xive),
+                          &spapr_xive_esb_ops, xive, "xive.esb",
+                          (1ull << ESB_SHIFT) * xive->nr_irqs);
+    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->esb_iomem);
 }
 
 static const VMStateDescription vmstate_spapr_xive_ive = {
@@ -92,6 +341,7 @@  static const VMStateDescription vmstate_spapr_xive = {
         VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
         VMSTATE_STRUCT_VARRAY_UINT32(ivt, sPAPRXive, nr_irqs, 1,
                                      vmstate_spapr_xive_ive, XiveIVE),
+        VMSTATE_VBUFFER_UINT32(sbe, sPAPRXive, 1, NULL, sbe_size),
         VMSTATE_END_OF_LIST()
     },
 };
diff --git a/hw/intc/xive-internal.h b/hw/intc/xive-internal.h
index 132b71a6daf0..872648dd96a2 100644
--- a/hw/intc/xive-internal.h
+++ b/hw/intc/xive-internal.h
@@ -16,6 +16,16 @@ 
 #define SETFIELD(m, v, val)                             \
         (((v) & ~(m)) | ((((typeof(v))(val)) << MASK_TO_LSH(m)) & (m)))
 
+/*
+ * XIVE MMIO regions
+ */
+#define P9_MMIO_BASE     0x006000000000000ull
+
+/* VC BAR contains set translations for the ESBs and the EQs. */
+#define VC_BAR_DEFAULT   0x10000000000ull
+#define VC_BAR_SIZE      0x08000000000ull
+#define ESB_SHIFT        16 /* One 64k page. OPAL has two */
+
 /* IVE/EAS
  *
  * One per interrupt source. Targets that interrupt to a given EQ
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 5b1f78e06a1e..ecc15d889b74 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -24,8 +24,17 @@  struct sPAPRXive {
     /* Properties */
     uint32_t     nr_irqs;
 
+    /* IRQ */
+    qemu_irq     *qirqs;
+
     /* XIVE internal tables */
     XiveIVE      *ivt;
+    uint8_t      *sbe;
+    uint32_t     sbe_size;
+
+    /* ESB memory region */
+    hwaddr       esb_base;
+    MemoryRegion esb_iomem;
 };
 
 bool spapr_xive_irq_enable(sPAPRXive *xive, uint32_t lisn);
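
The ESB transitions in the patch above can be tried outside QEMU with a
small standalone sketch. It assumes the same PQ encodings and the same
packing of four 2-bit entries per byte as the patch; EsbTable and the
esb_* helpers are hypothetical names standing in for the sbe/sbe_size
fields of sPAPRXive and the spapr_xive_pq_* routines, and the table
size is fixed for brevity:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* PQ encodings, as in the patch: P is bit 1, Q is bit 0. */
enum { ESB_RESET = 0x0, ESB_OFF = 0x1, ESB_PENDING = 0x2, ESB_QUEUED = 0x3 };

/* Hypothetical stand-in for the sbe/sbe_size fields of sPAPRXive. */
typedef struct {
    uint8_t sbe[16];    /* 2 bits per source, 4 sources per byte */
    size_t  sbe_size;
} EsbTable;

static void esb_table_reset(EsbTable *t)
{
    t->sbe_size = sizeof(t->sbe);
    memset(t->sbe, 0x55, t->sbe_size);  /* 0b01 everywhere: sources off */
}

static uint8_t esb_get(const EsbTable *t, uint32_t lisn)
{
    uint32_t byte = lisn / 4, shift = (lisn % 4) * 2;

    assert(byte < t->sbe_size);
    return (t->sbe[byte] >> shift) & 0x3;
}

/* Store a new PQ value and return the previous one. */
static uint8_t esb_set(EsbTable *t, uint32_t lisn, uint8_t pq)
{
    uint32_t byte = lisn / 4, shift = (lisn % 4) * 2;
    uint8_t old;

    assert(byte < t->sbe_size);
    old = (t->sbe[byte] >> shift) & 0x3;
    t->sbe[byte] = (uint8_t)((t->sbe[byte] & ~(0x3u << shift)) |
                             ((pq & 0x3u) << shift));
    return old;
}

/* Returns true when the event must be forwarded for routing. */
static bool esb_trigger(EsbTable *t, uint32_t lisn)
{
    switch (esb_get(t, lisn)) {
    case ESB_RESET:   esb_set(t, lisn, ESB_PENDING); return true;
    case ESB_PENDING: esb_set(t, lisn, ESB_QUEUED);  return false;
    default:          return false;  /* QUEUED and OFF stay unchanged */
    }
}

/* Returns true when a queued event must be re-sent after the EOI. */
static bool esb_eoi(EsbTable *t, uint32_t lisn)
{
    switch (esb_get(t, lisn)) {
    case ESB_PENDING: esb_set(t, lisn, ESB_RESET);   return false;
    case ESB_QUEUED:  esb_set(t, lisn, ESB_PENDING); return true;
    default:          return false;  /* RESET and OFF stay unchanged */
    }
}
```

After esb_table_reset() every source reads back as ESB_OFF, so a
trigger is dropped until the source is moved to ESB_RESET with
esb_set(). From there, two triggers followed by two EOIs walk through
PENDING, QUEUED, PENDING and back to RESET, with only the first trigger
and the first EOI asking for a notification.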