mbox series

[RFC,v2,0/5] physmem: Have flaview API check bus permission from MemTxAttrs argument

Message ID 20210823164157.751807-1-philmd@redhat.com
Headers show
Series physmem: Have flaview API check bus permission from MemTxAttrs argument | expand

Message

Philippe Mathieu-Daudé Aug. 23, 2021, 4:41 p.m. UTC
This series aim to kill a recent class of bug, the infamous
"DMA reentrancy" issues found by Alexander while fuzzing.

Introduce the 'bus_perm' field in MemTxAttrs, defining 3 bits:

- MEMTXPERM_UNSPECIFIED (current default, unchanged behavior)
- MEMTXPERM_UNRESTRICTED (allow list approach)
- MEMTXPERM_RAM_DEVICE (example of deny list approach)

If a transaction permission is not allowed (for example access
to non-RAM device), we return the specific MEMTX_BUS_ERROR.

Permissions are checked in after the flatview is resolved, and
before the access is done, in a new function: flatview_access_allowed().

I'll post another series on top as example, fixing the SD card
bugs.

Since v1 ("hw: Forbid DMA write accesses to MMIO regions") [1]:
- rewrite based on Peter / Stefan feedbacks

Based on "hw: Let the DMA API take a MemTxAttrs argument" [2].

Supersedes: <20200903110831.353476-1-philmd@redhat.com>
Based-on: <20210702092439.989969-1-philmd@redhat.com>

[1] https://www.mail-archive.com/qemu-block@nongnu.org/msg72924.html
[2] https://www.mail-archive.com/qemu-devel@nongnu.org/msg820359.html

Philippe Mathieu-Daudé (5):
  softmmu/physmem: Simplify flatview_write and
    address_space_access_valid
  hw/intc/arm_gicv3: Check for !MEMTX_OK instead of MEMTX_ERROR
  exec/memattrs: Introduce MemTxAttrs::bus_perm field
  softmmu/physmem: Introduce flatview_access_allowed() to check bus
    perms
  softmmu/physmem: Have flaview API check MemTxAttrs::bus_perm field

 include/exec/memattrs.h    | 21 +++++++++++++
 hw/intc/arm_gicv3_dist.c   |  4 +--
 hw/intc/arm_gicv3_redist.c |  4 +--
 softmmu/physmem.c          | 61 ++++++++++++++++++++++++++++++++------
 4 files changed, 77 insertions(+), 13 deletions(-)

Comments

Peter Maydell Aug. 23, 2021, 7:10 p.m. UTC | #1
On Mon, 23 Aug 2021 at 17:42, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
>
> This series aim to kill a recent class of bug, the infamous
> "DMA reentrancy" issues found by Alexander while fuzzing.
>
> Introduce the 'bus_perm' field in MemTxAttrs, defining 3 bits:
>
> - MEMTXPERM_UNSPECIFIED (current default, unchanged behavior)
> - MEMTXPERM_UNRESTRICTED (allow list approach)
> - MEMTXPERM_RAM_DEVICE (example of deny list approach)
>
> If a transaction permission is not allowed (for example access
> to non-RAM device), we return the specific MEMTX_BUS_ERROR.
>
> Permissions are checked in after the flatview is resolved, and
> before the access is done, in a new function: flatview_access_allowed().

So I'm not going to say 'no' to this, because we have a real
recursive-device-handling problem and I don't have a better
idea to hand, but the thing about this is that we end up with
behaviour which is not what the real hardware does. I'm not
aware of any DMA device which has this kind of "can only DMA
to/from RAM, and aborts on access to a device" behaviour...

-- PMM
Peter Xu Aug. 23, 2021, 8:50 p.m. UTC | #2
On Mon, Aug 23, 2021 at 08:10:50PM +0100, Peter Maydell wrote:
> On Mon, 23 Aug 2021 at 17:42, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
> >
> > This series aim to kill a recent class of bug, the infamous
> > "DMA reentrancy" issues found by Alexander while fuzzing.
> >
> > Introduce the 'bus_perm' field in MemTxAttrs, defining 3 bits:
> >
> > - MEMTXPERM_UNSPECIFIED (current default, unchanged behavior)
> > - MEMTXPERM_UNRESTRICTED (allow list approach)
> > - MEMTXPERM_RAM_DEVICE (example of deny list approach)
> >
> > If a transaction permission is not allowed (for example access
> > to non-RAM device), we return the specific MEMTX_BUS_ERROR.
> >
> > Permissions are checked in after the flatview is resolved, and
> > before the access is done, in a new function: flatview_access_allowed().
> 
> So I'm not going to say 'no' to this, because we have a real
> recursive-device-handling problem and I don't have a better
> idea to hand, but the thing about this is that we end up with
> behaviour which is not what the real hardware does. I'm not
> aware of any DMA device which has this kind of "can only DMA
> to/from RAM, and aborts on access to a device" behaviour...

Sorry for not being familiar with the context - is there more info regarding
the problem to fix?  I'm looking at the links mentioned in the old series:

https://lore.kernel.org/qemu-devel/20200903110831.353476-12-philmd@redhat.com/
https://bugs.launchpad.net/qemu/+bug/1886362
https://bugs.launchpad.net/qemu/+bug/1888606

They seem all marked as fixed now.

Thanks,
Alexander Bulekov Aug. 23, 2021, 10:26 p.m. UTC | #3
On 210823 1650, Peter Xu wrote:
> On Mon, Aug 23, 2021 at 08:10:50PM +0100, Peter Maydell wrote:
> > On Mon, 23 Aug 2021 at 17:42, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
> > >
> > > This series aim to kill a recent class of bug, the infamous
> > > "DMA reentrancy" issues found by Alexander while fuzzing.
> > >
> > > Introduce the 'bus_perm' field in MemTxAttrs, defining 3 bits:
> > >
> > > - MEMTXPERM_UNSPECIFIED (current default, unchanged behavior)
> > > - MEMTXPERM_UNRESTRICTED (allow list approach)
> > > - MEMTXPERM_RAM_DEVICE (example of deny list approach)
> > >
> > > If a transaction permission is not allowed (for example access
> > > to non-RAM device), we return the specific MEMTX_BUS_ERROR.
> > >
> > > Permissions are checked in after the flatview is resolved, and
> > > before the access is done, in a new function: flatview_access_allowed().
> > 
> > So I'm not going to say 'no' to this, because we have a real
> > recursive-device-handling problem and I don't have a better
> > idea to hand, but the thing about this is that we end up with
> > behaviour which is not what the real hardware does. I'm not
> > aware of any DMA device which has this kind of "can only DMA
> > to/from RAM, and aborts on access to a device" behaviour...
> 
> Sorry for not being familiar with the context - is there more info regarding
> the problem to fix?  I'm looking at the links mentioned in the old series:
> 
> https://lore.kernel.org/qemu-devel/20200903110831.353476-12-philmd@redhat.com/
> https://bugs.launchpad.net/qemu/+bug/1886362
> https://bugs.launchpad.net/qemu/+bug/1888606
> 
> They seem all marked as fixed now.

Here are some that should still reproduce:
https://gitlab.com/qemu-project/qemu/-/issues/542
https://gitlab.com/qemu-project/qemu/-/issues/540
https://gitlab.com/qemu-project/qemu/-/issues/541
https://gitlab.com/qemu-project/qemu/-/issues/62
https://lore.kernel.org/qemu-devel/20210218140629.373646-1-ppandit@redhat.com/ (CVE-2021-20255)

There's also this one, that I don't think I ever created a bug report
for (working on it now):
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=33247
-Alex

> 
> Thanks,
> 
> -- 
> Peter Xu
>
Philippe Mathieu-Daudé Aug. 24, 2021, 7:24 a.m. UTC | #4
On 8/24/21 12:26 AM, Alexander Bulekov wrote:
> On 210823 1650, Peter Xu wrote:
>> On Mon, Aug 23, 2021 at 08:10:50PM +0100, Peter Maydell wrote:
>>> On Mon, 23 Aug 2021 at 17:42, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
>>>>
>>>> This series aim to kill a recent class of bug, the infamous
>>>> "DMA reentrancy" issues found by Alexander while fuzzing.
>>>>
>>>> Introduce the 'bus_perm' field in MemTxAttrs, defining 3 bits:
>>>>
>>>> - MEMTXPERM_UNSPECIFIED (current default, unchanged behavior)
>>>> - MEMTXPERM_UNRESTRICTED (allow list approach)
>>>> - MEMTXPERM_RAM_DEVICE (example of deny list approach)
>>>>
>>>> If a transaction permission is not allowed (for example access
>>>> to non-RAM device), we return the specific MEMTX_BUS_ERROR.
>>>>
>>>> Permissions are checked in after the flatview is resolved, and
>>>> before the access is done, in a new function: flatview_access_allowed().
>>>
>>> So I'm not going to say 'no' to this, because we have a real
>>> recursive-device-handling problem and I don't have a better
>>> idea to hand, but the thing about this is that we end up with
>>> behaviour which is not what the real hardware does. I'm not
>>> aware of any DMA device which has this kind of "can only DMA
>>> to/from RAM, and aborts on access to a device" behaviour...
>>
>> Sorry for not being familiar with the context - is there more info regarding
>> the problem to fix?  I'm looking at the links mentioned in the old series:
>>
>> https://lore.kernel.org/qemu-devel/20200903110831.353476-12-philmd@redhat.com/
>> https://bugs.launchpad.net/qemu/+bug/1886362
>> https://bugs.launchpad.net/qemu/+bug/1888606
>>
>> They seem all marked as fixed now.
> 
> Here are some that should still reproduce:
> https://gitlab.com/qemu-project/qemu/-/issues/542
> https://gitlab.com/qemu-project/qemu/-/issues/540
> https://gitlab.com/qemu-project/qemu/-/issues/541
> https://gitlab.com/qemu-project/qemu/-/issues/62
> https://lore.kernel.org/qemu-devel/20210218140629.373646-1-ppandit@redhat.com/ (CVE-2021-20255)

Also 305, 451, 557.

Issues list tracked here:
https://gitlab.com/qemu-project/qemu/-/issues/556
(Thanks Alex for updating it!)

> 
> There's also this one, that I don't think I ever created a bug report
> for (working on it now):
> https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=33247
> -Alex
> 
>>
>> Thanks,
>>
>> -- 
>> Peter Xu
>>
>
Stefan Hajnoczi Aug. 24, 2021, 8:58 a.m. UTC | #5
On Mon, Aug 23, 2021 at 06:41:52PM +0200, Philippe Mathieu-Daudé wrote:
> This series aim to kill a recent class of bug, the infamous
> "DMA reentrancy" issues found by Alexander while fuzzing.
> 
> Introduce the 'bus_perm' field in MemTxAttrs, defining 3 bits:
> 
> - MEMTXPERM_UNSPECIFIED (current default, unchanged behavior)
> - MEMTXPERM_UNRESTRICTED (allow list approach)
> - MEMTXPERM_RAM_DEVICE (example of deny list approach)

These names don't hint at their descriptions. I wouldn't be able to tell
what they do based on the name. I'll think more about naming after
reviewing the patches.
Edgar E. Iglesias Aug. 24, 2021, 9:25 a.m. UTC | #6
On Mon, Aug 23, 2021 at 08:10:50PM +0100, Peter Maydell wrote:
> On Mon, 23 Aug 2021 at 17:42, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
> >
> > This series aim to kill a recent class of bug, the infamous
> > "DMA reentrancy" issues found by Alexander while fuzzing.
> >
> > Introduce the 'bus_perm' field in MemTxAttrs, defining 3 bits:
> >
> > - MEMTXPERM_UNSPECIFIED (current default, unchanged behavior)
> > - MEMTXPERM_UNRESTRICTED (allow list approach)
> > - MEMTXPERM_RAM_DEVICE (example of deny list approach)
> >
> > If a transaction permission is not allowed (for example access
> > to non-RAM device), we return the specific MEMTX_BUS_ERROR.
> >
> > Permissions are checked in after the flatview is resolved, and
> > before the access is done, in a new function: flatview_access_allowed().
> 
> So I'm not going to say 'no' to this, because we have a real
> recursive-device-handling problem and I don't have a better
> idea to hand, but the thing about this is that we end up with
> behaviour which is not what the real hardware does. I'm not
> aware of any DMA device which has this kind of "can only DMA
> to/from RAM, and aborts on access to a device" behaviour...
>

Yes, I agree.

Having said that, There are DMA devices that do indicate to the
interconnect and peripherals if they are targeting "normal" memory
or device (together with cacheability, buffering and ordering
attributes). Accessing registers of a device with "normal" memory
and cache attributes is not a good idea and may lead to all kinds
of weirdness on real HW.

IMO, it would be better to model something like those attributes
rather than permissions. And it's probably a good idea to not
call it MEMTXPERM_RAM_DEVICE since in the AMBA documentation
it's Normal Memory vs Device (calling it RAM_DEVICE is confusing
to me at least).

Adding a "memory" attribute to MemTxAttrs may be enough?
If it's set, the access would only target memories. If targeting a device
it could perhaps be logged and dropped rather than aborted?
If not set (the default), accesses would target anything...

Cheers,
Edgar
Peter Maydell Aug. 24, 2021, 9:49 a.m. UTC | #7
On Mon, 23 Aug 2021 at 21:50, Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, Aug 23, 2021 at 08:10:50PM +0100, Peter Maydell wrote:
> > On Mon, 23 Aug 2021 at 17:42, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
> > >
> > > This series aim to kill a recent class of bug, the infamous
> > > "DMA reentrancy" issues found by Alexander while fuzzing.
> > >
> > > Introduce the 'bus_perm' field in MemTxAttrs, defining 3 bits:
> > >
> > > - MEMTXPERM_UNSPECIFIED (current default, unchanged behavior)
> > > - MEMTXPERM_UNRESTRICTED (allow list approach)
> > > - MEMTXPERM_RAM_DEVICE (example of deny list approach)
> > >
> > > If a transaction permission is not allowed (for example access
> > > to non-RAM device), we return the specific MEMTX_BUS_ERROR.
> > >
> > > Permissions are checked in after the flatview is resolved, and
> > > before the access is done, in a new function: flatview_access_allowed().
> >
> > So I'm not going to say 'no' to this, because we have a real
> > recursive-device-handling problem and I don't have a better
> > idea to hand, but the thing about this is that we end up with
> > behaviour which is not what the real hardware does. I'm not
> > aware of any DMA device which has this kind of "can only DMA
> > to/from RAM, and aborts on access to a device" behaviour...
>
> Sorry for not being familiar with the context - is there more info regarding
> the problem to fix?

So, the general problem is that we have a whole class of bugs
that look like this:

 * Device A is DMA-capable. It also has a set of memory
   mapped registers which can be used to control it.
 * Malicious guest code (or the fuzzer) programs A's DMA
   engine to do a DMA read or write to the address where
   A's own registers are mapped.
 * Typically, the MemoryRegionOps write function for the
   register block will handle the "write to start-dma
   register" by doing the DMA, ie calling address_space_write(),
   pci_dma_write(), or equivalent. Because of the target address
   the guest code has set, that will result in the memory
   subsystem calling back into the same MemoryRegionOps
   write function, recursively.
 * Our code implementing the model of device A is not at
   all expecting this re-entrancy, and might crash, access
   freed memory, or otherwise misbehave.

You can elaborate on that basic scenario, for instance with
a loop of multiple devices where you program device A to
do a DMA write to device B's registers which starts device B doing
a DMA write to A's registers. Nor is it inherently limited to
DMA -- device A could be made to assert a qemu_irq that is
connected to device B in a way that causes device B to
do something that results in code calling back into device A;
or maybe device A DMAs to a register in device B that implements
a device-reset on device A. DMA is just the easiest for guest
code to set up and has the least restrictions on how it's
connected up.

In some specific cases we have "fixed" individual instances
of this bug by putting in checks or changes to whatever device
model the fuzzer happened to find a problem with, or put in
slightly wider-scale attempts to catch this (eg commit 22dc8663d9f
which prevents re-entering a NIC device's packet-rx callback
function if the guest has set it up so that a received packet
results in DMA that triggers another received packet). But
we don't have a coherent model of how we ought to structure
device models that can avoid this problem in a general way,
and I think that until we do we're liable to keep running into
specific bugs, some of which will be (or at least be labelled
as) "security issues".

Philippe's series here tries to fix this for at least any
variety of this bug where there is a DMA access in the
loop, by forbidding DMA accesses to MMIO regions not backed
by RAM. That works, in that it breaks the loop; as I
mentioned in my other email, it does it in a way that's
not the way real h/w behaves. Unfortunately "what does real
h/w do?" is not in this situation necessarily a helpful
guide for QEMU design: I think that real hardware:
(a) often doesn't see the same kind of problem because
   the design will usually decouple the DMA engine from
   the register-access logic in a way that means it
   naturally doesn't literally lock up
(b) often won't have been designed to deal with "software
   programs a DMA-to-self" either, but the threat model
   for real hw is different, in that software has many ways
   of making the overall system crash, hang or misbehave;
   it often doesn't have the "need to allow untrusted
   software to touch this device" situation.

One could have QEMU work somewhat like (a) by mandating
that all DMA of any kind was done in separate bottom-half
routines and not directly from the register-write code.
That would probably reduce performance and be a lot of
code to restructure. It would deal also with another class
of "guest code can make QEMU hang by programming it to do
an enormous DMA all at once or by setting up an infinitely
looping chain of DMA commands" bugs, though.

I was vaguely tossing an idea around in the back of my mind
about whether you could have a flag on devices that marked
them as "this device is currently involved in IO", such that
you could then just fail the last DMA (or qemu_irq_set, or
whatever) that would complete the loop back to a device that
was already doing IO. But that would need a lot of thinking
through to figure out if it's feasible, and it's probably
a lot of code change.

thanks
-- PMM
Gerd Hoffmann Aug. 24, 2021, 12:01 p.m. UTC | #8
Hi,

> I was vaguely tossing an idea around in the back of my mind
> about whether you could have a flag on devices that marked
> them as "this device is currently involved in IO", such that
> you could then just fail the last DMA (or qemu_irq_set, or
> whatever) that would complete the loop back to a device that
> was already doing IO. But that would need a lot of thinking
> through to figure out if it's feasible, and it's probably
> a lot of code change.

Quick & dirty hack trying the above.  Not much code, it is opt-in per
MemoryRegion (so less overhead for devices which already handle all DMA
in a BH), tracks state in DeviceState.  Adds a check to a rather hot
code path though.  Not tested yet (stopped investigating when I noticed
Philippe tries to fix the same thing with another approach).  Not
benchmarked.

Maybe it helps ...

take care,
  Gerd

From 80e58a2cd2c630f0bddd9d0eaee71abb7eeb9440 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann <kraxel@redhat.com>
Date: Tue, 17 Aug 2021 07:35:37 +0200
Subject: [PATCH] allow track active mmio handlers

---
 include/exec/memory.h  |  1 +
 include/hw/qdev-core.h |  1 +
 softmmu/memory.c       | 24 ++++++++++++++++++++++--
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index c3d417d317f0..b1883d45e817 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -265,6 +265,7 @@ struct MemoryRegionOps {
          */
         bool unaligned;
     } impl;
+    bool block_reenter;
 };
 
 typedef struct MemoryRegionClass {
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index bafc311bfa1b..4cf281a81fa9 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -191,6 +191,7 @@ struct DeviceState {
     int instance_id_alias;
     int alias_required_for_version;
     ResettableState reset;
+    bool io_handler_active;
 };
 
 struct DeviceListener {
diff --git a/softmmu/memory.c b/softmmu/memory.c
index bfedaf9c4dfc..5eb5dd465dd2 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -437,7 +437,18 @@ static MemTxResult  memory_region_read_accessor(MemoryRegion *mr,
 {
     uint64_t tmp;
 
-    tmp = mr->ops->read(mr->opaque, addr, size);
+    if (mr->ops->block_reenter) {
+        DeviceState *dev = DEVICE(mr->owner);
+        if (!dev->io_handler_active) {
+            dev->io_handler_active = true;
+            tmp = mr->ops->read(mr->opaque, addr, size);
+            dev->io_handler_active = false;
+        } else {
+            tmp = MEMTX_OK;
+        }
+    } else {
+        tmp = mr->ops->read(mr->opaque, addr, size);
+    }
     if (mr->subpage) {
         trace_memory_region_subpage_read(get_cpu_index(), mr, addr, tmp, size);
     } else if (trace_event_get_state_backends(TRACE_MEMORY_REGION_OPS_READ)) {
@@ -489,7 +500,16 @@ static MemTxResult memory_region_write_accessor(MemoryRegion *mr,
         trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size,
                                       memory_region_name(mr));
     }
-    mr->ops->write(mr->opaque, addr, tmp, size);
+    if (mr->ops->block_reenter) {
+        DeviceState *dev = DEVICE(mr->owner);
+        if (!dev->io_handler_active) {
+            dev->io_handler_active = true;
+            mr->ops->write(mr->opaque, addr, tmp, size);
+            dev->io_handler_active = false;
+        }
+    } else {
+        mr->ops->write(mr->opaque, addr, tmp, size);
+    }
     return MEMTX_OK;
 }
Li Qiang Aug. 24, 2021, 12:12 p.m. UTC | #9
Gerd Hoffmann <kraxel@redhat.com> 于2021年8月24日周二 下午8:02写道:
>
>   Hi,
>
> > I was vaguely tossing an idea around in the back of my mind
> > about whether you could have a flag on devices that marked
> > them as "this device is currently involved in IO", such that
> > you could then just fail the last DMA (or qemu_irq_set, or
> > whatever) that would complete the loop back to a device that
> > was already doing IO. But that would need a lot of thinking
> > through to figure out if it's feasible, and it's probably
> > a lot of code change.
>
> Quick & dirty hack trying the above.  Not much code, it is opt-in per
> MemoryRegion (so less overhead for devices which already handle all DMA
> in a BH), tracks state in DeviceState.  Adds a check to a rather hot
> code path though.  Not tested yet (stopped investigating when I noticed
> Philippe tries to fix the same thing with another approach).  Not
> benchmarked.
>
> Maybe it helps ...
>

Gerd's patch just remind my approach here, Just add here:

https://mail.gnu.org/archive/html/qemu-devel/2020-09/msg00906.html

But I check and record it in the device MR handler.

Thanks,
Li Qiang


> take care,
>   Gerd
>
> From 80e58a2cd2c630f0bddd9d0eaee71abb7eeb9440 Mon Sep 17 00:00:00 2001
> From: Gerd Hoffmann <kraxel@redhat.com>
> Date: Tue, 17 Aug 2021 07:35:37 +0200
> Subject: [PATCH] allow track active mmio handlers
>
> ---
>  include/exec/memory.h  |  1 +
>  include/hw/qdev-core.h |  1 +
>  softmmu/memory.c       | 24 ++++++++++++++++++++++--
>  3 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index c3d417d317f0..b1883d45e817 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -265,6 +265,7 @@ struct MemoryRegionOps {
>           */
>          bool unaligned;
>      } impl;
> +    bool block_reenter;
>  };
>
>  typedef struct MemoryRegionClass {
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index bafc311bfa1b..4cf281a81fa9 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -191,6 +191,7 @@ struct DeviceState {
>      int instance_id_alias;
>      int alias_required_for_version;
>      ResettableState reset;
> +    bool io_handler_active;
>  };
>
>  struct DeviceListener {
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index bfedaf9c4dfc..5eb5dd465dd2 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -437,7 +437,18 @@ static MemTxResult  memory_region_read_accessor(MemoryRegion *mr,
>  {
>      uint64_t tmp;
>
> -    tmp = mr->ops->read(mr->opaque, addr, size);
> +    if (mr->ops->block_reenter) {
> +        DeviceState *dev = DEVICE(mr->owner);
> +        if (!dev->io_handler_active) {
> +            dev->io_handler_active = true;
> +            tmp = mr->ops->read(mr->opaque, addr, size);
> +            dev->io_handler_active = false;
> +        } else {
> +            tmp = MEMTX_OK;
> +        }
> +    } else {
> +        tmp = mr->ops->read(mr->opaque, addr, size);
> +    }
>      if (mr->subpage) {
>          trace_memory_region_subpage_read(get_cpu_index(), mr, addr, tmp, size);
>      } else if (trace_event_get_state_backends(TRACE_MEMORY_REGION_OPS_READ)) {
> @@ -489,7 +500,16 @@ static MemTxResult memory_region_write_accessor(MemoryRegion *mr,
>          trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size,
>                                        memory_region_name(mr));
>      }
> -    mr->ops->write(mr->opaque, addr, tmp, size);
> +    if (mr->ops->block_reenter) {
> +        DeviceState *dev = DEVICE(mr->owner);
> +        if (!dev->io_handler_active) {
> +            dev->io_handler_active = true;
> +            mr->ops->write(mr->opaque, addr, tmp, size);
> +            dev->io_handler_active = false;
> +        }
> +    } else {
> +        mr->ops->write(mr->opaque, addr, tmp, size);
> +    }
>      return MEMTX_OK;
>  }
>
> --
> 2.31.1
>
Stefan Hajnoczi Aug. 24, 2021, 1:26 p.m. UTC | #10
On Mon, Aug 23, 2021 at 08:10:50PM +0100, Peter Maydell wrote:
> On Mon, 23 Aug 2021 at 17:42, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
> >
> > This series aim to kill a recent class of bug, the infamous
> > "DMA reentrancy" issues found by Alexander while fuzzing.
> >
> > Introduce the 'bus_perm' field in MemTxAttrs, defining 3 bits:
> >
> > - MEMTXPERM_UNSPECIFIED (current default, unchanged behavior)
> > - MEMTXPERM_UNRESTRICTED (allow list approach)
> > - MEMTXPERM_RAM_DEVICE (example of deny list approach)
> >
> > If a transaction permission is not allowed (for example access
> > to non-RAM device), we return the specific MEMTX_BUS_ERROR.
> >
> > Permissions are checked in after the flatview is resolved, and
> > before the access is done, in a new function: flatview_access_allowed().
> 
> So I'm not going to say 'no' to this, because we have a real
> recursive-device-handling problem and I don't have a better
> idea to hand, but the thing about this is that we end up with
> behaviour which is not what the real hardware does. I'm not
> aware of any DMA device which has this kind of "can only DMA
> to/from RAM, and aborts on access to a device" behaviour...

Points that have come up in previous discussions on this topic:

- We probably won't be able to find out the actual hardware behavior for
  all device models in QEMU. Strict RAM-only DMA restrictions can be
  merged early in the QEMU 6.2 development cycle so there's plenty of
  time to identify regressions. The benefit of a strict policy is that
  we eliminate this class of bugs for most devices now and in the
  future.

- If the risk of regressions is too high, then this API can be used on a
  case-by-case basis to fix bugs such as those identified by Alexander's
  fuzzer. We'll be plagued with this class of bugs in the future though,
  so I prefer a strict policy.

- DMA capabilities depend on the host bus adapter/controller. In order
  to faithfully emulate real hardware we need to know how it behaves.
  That needs to be done for each host bus adapter (e.g. PCI
  controllers).

- SysBus devices each have their own behavior wrt device-to-device DMA.

Stefan
Peter Xu Aug. 24, 2021, 7:34 p.m. UTC | #11
Hi, Peter, Gerd,

On Tue, Aug 24, 2021 at 02:01:53PM +0200, Gerd Hoffmann wrote:
>   Hi,
> 
> > I was vaguely tossing an idea around in the back of my mind
> > about whether you could have a flag on devices that marked
> > them as "this device is currently involved in IO", such that
> > you could then just fail the last DMA (or qemu_irq_set, or
> > whatever) that would complete the loop back to a device that
> > was already doing IO. But that would need a lot of thinking
> > through to figure out if it's feasible, and it's probably
> > a lot of code change.

(Thanks for the write-up, Peter; it helps a lot)

> 
> Quick & dirty hack trying the above.  Not much code, it is opt-in per
> MemoryRegion (so less overhead for devices which already handle all DMA
> in a BH), tracks state in DeviceState.  Adds a check to a rather hot
> code path though.  Not tested yet (stopped investigating when I noticed
> Philippe tries to fix the same thing with another approach).  Not
> benchmarked.
> 
> Maybe it helps ...
> 
> take care,
>   Gerd
> 
> From 80e58a2cd2c630f0bddd9d0eaee71abb7eeb9440 Mon Sep 17 00:00:00 2001
> From: Gerd Hoffmann <kraxel@redhat.com>
> Date: Tue, 17 Aug 2021 07:35:37 +0200
> Subject: [PATCH] allow track active mmio handlers
> 
> ---
>  include/exec/memory.h  |  1 +
>  include/hw/qdev-core.h |  1 +
>  softmmu/memory.c       | 24 ++++++++++++++++++++++--
>  3 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index c3d417d317f0..b1883d45e817 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -265,6 +265,7 @@ struct MemoryRegionOps {
>           */
>          bool unaligned;
>      } impl;
> +    bool block_reenter;
>  };
>  
>  typedef struct MemoryRegionClass {
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index bafc311bfa1b..4cf281a81fa9 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -191,6 +191,7 @@ struct DeviceState {
>      int instance_id_alias;
>      int alias_required_for_version;
>      ResettableState reset;
> +    bool io_handler_active;
>  };
>  
>  struct DeviceListener {
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index bfedaf9c4dfc..5eb5dd465dd2 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -437,7 +437,18 @@ static MemTxResult  memory_region_read_accessor(MemoryRegion *mr,
>  {
>      uint64_t tmp;
>  
> -    tmp = mr->ops->read(mr->opaque, addr, size);
> +    if (mr->ops->block_reenter) {
> +        DeviceState *dev = DEVICE(mr->owner);
> +        if (!dev->io_handler_active) {
> +            dev->io_handler_active = true;
> +            tmp = mr->ops->read(mr->opaque, addr, size);
> +            dev->io_handler_active = false;
> +        } else {
> +            tmp = MEMTX_OK;
> +        }
> +    } else {
> +        tmp = mr->ops->read(mr->opaque, addr, size);
> +    }
>      if (mr->subpage) {
>          trace_memory_region_subpage_read(get_cpu_index(), mr, addr, tmp, size);
>      } else if (trace_event_get_state_backends(TRACE_MEMORY_REGION_OPS_READ)) {
> @@ -489,7 +500,16 @@ static MemTxResult memory_region_write_accessor(MemoryRegion *mr,
>          trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size,
>                                        memory_region_name(mr));
>      }
> -    mr->ops->write(mr->opaque, addr, tmp, size);
> +    if (mr->ops->block_reenter) {
> +        DeviceState *dev = DEVICE(mr->owner);
> +        if (!dev->io_handler_active) {
> +            dev->io_handler_active = true;
> +            mr->ops->write(mr->opaque, addr, tmp, size);
> +            dev->io_handler_active = false;
> +        }
> +    } else {
> +        mr->ops->write(mr->opaque, addr, tmp, size);
> +    }
>      return MEMTX_OK;
>  }

Can I read this as a better approach if it still allows P2P so things that
Paolo and Qiang used to mention will still work?

https://lore.kernel.org/qemu-devel/7e4fd726-07e9-dc09-d66b-5692dd51820f@redhat.com/
https://lore.kernel.org/qemu-devel/CAKXe6S+v4z_PYbZ6MMzEZk7Q0Qc+q9tzL+a8918U_-XR=aj7RA@mail.gmail.com/

Can we do that similarly for qemu_set_irq() and friends but based on IRQState?

Thanks,