[v2,06/15] ppc/vof: Fix unaligned FDT property access

Message ID	20240627-san-v2-6-750bb0946dbd@daynix.com
State	New
Headers	show Return-Path: <qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org> From: Akihiko Odaki <akihiko.odaki@daynix.com> Date: Thu, 27 Jun 2024 22:37:49 +0900 Subject: [PATCH v2 06/15] ppc/vof: Fix unaligned FDT property access MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <20240627-san-v2-6-750bb0946dbd@daynix.com> References: <20240627-san-v2-0-750bb0946dbd@daynix.com> In-Reply-To: <20240627-san-v2-0-750bb0946dbd@daynix.com> To: Eduardo Habkost <eduardo@habkost.net>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, =?utf-8?q?Philippe_Mathieu-?= =?utf-8?q?Daud=C3=A9?= <philmd@linaro.org>, Yanan Wang <wangyanan55@huawei.com>, John Snow <jsnow@redhat.com>, BALATON Zoltan <balaton@eik.bme.hu>, Jiaxun Yang <jiaxun.yang@flygoat.com>, Nicholas Piggin <npiggin@gmail.com>, Daniel Henrique Barboza <danielhb413@gmail.com>, David Gibson <david@gibson.dropbear.id.au>, Harsh Prateek Bora <harshpb@linux.ibm.com>, Alexey Kardashevskiy <aik@ozlabs.ru>, "Michael S. Tsirkin" <mst@redhat.com>, =?utf-8?q?Alex_Benn=C3=A9e?= <alex.bennee@linaro.org>, Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>, Paolo Bonzini <pbonzini@redhat.com>, David Hildenbrand <david@redhat.com>, Thomas Huth <thuth@redhat.com>, Laurent Vivier <lvivier@redhat.com> Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, qemu-ppc@nongnu.org, Akihiko Odaki <akihiko.odaki@daynix.com> Received-SPF: none client-ip=2607:f8b0:4864:20::32b; envelope-from=akihiko.odaki@daynix.com; helo=mail-ot1-x32b.google.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_NONE=0.001 autolearn=unavailable autolearn_force=no X-Spam_action: no action Precedence: list Errors-To: qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-ppc-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Series	Fix check-qtest-ppc64 sanitizer errors \| expand [v2,00/15] Fix check-qtest-ppc64 sanitizer errors [v2,01/15] cpu: Free cpu_ases [v2,02/15] hw/ide: Convert macio ide_irq into GPIO line [v2,03/15] hw/ide: Remove internal DMA qemu_irq [v2,04/15] hw/isa/vt82c686: Define a GPIO line between vt82c686 and i8259 [v2,05/15] spapr: Free stdout path [v2,06/15] ppc/vof: Fix unaligned FDT property access [v2,07/15] hw/virtio: Free vqs after vhost_dev_cleanup() [v2,08/15] migration: Free removed SaveStateEntry [v2,09/15] memory: Do not create circular reference with subregion [v2,10/15] tests/qtest: Use qtest_add_data_func_full() [v2,11/15] tests/qtest: Free unused QMP response [v2,12/15] tests/qtest: Free old machine variable name [v2,13/15] tests/qtest: Delete previous boot file [v2,14/15] tests/qtest: Free paths [v2,15/15] tests/qtest: Free GThread

Akihiko Odaki June 27, 2024, 1:37 p.m. UTC

FDT properties are aligned by 4 bytes, not 8 bytes.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/ppc/vof.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Peter Maydell June 28, 2024, 3:20 p.m. UTC | #1

On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>
> FDT properties are aligned by 4 bytes, not 8 bytes.
>
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> ---
>  hw/ppc/vof.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> index e3b430a81f4f..b5b6514d79fc 100644
> --- a/hw/ppc/vof.c
> +++ b/hw/ppc/vof.c
> @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base)
>      mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
>      g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
>      if (sc == 2) {
> -        mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac));
> +        mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
>      } else {
>          mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac));
>      }

I did wonder if there was a better way to do what this is doing,
but neither we (in system/device_tree.c) nor libfdt seem to
provide one.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

David Gibson June 29, 2024, 3:16 a.m. UTC | #2

On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> >
> > FDT properties are aligned by 4 bytes, not 8 bytes.
> >
> > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > ---
> >  hw/ppc/vof.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > index e3b430a81f4f..b5b6514d79fc 100644
> > --- a/hw/ppc/vof.c
> > +++ b/hw/ppc/vof.c
> > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base)
> >      mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> >      g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
> >      if (sc == 2) {
> > -        mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > +        mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
> >      } else {
> >          mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac));
> >      }
> 
> I did wonder if there was a better way to do what this is doing,
> but neither we (in system/device_tree.c) nor libfdt seem to
> provide one.

libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
not an automatic aligned-or-unaligned helper.   Maybe we should add that?

Nicholas Piggin July 4, 2024, 11:54 a.m. UTC | #3

On Sat Jun 29, 2024 at 1:16 PM AEST, David Gibson wrote:
> On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> > >
> > > FDT properties are aligned by 4 bytes, not 8 bytes.
> > >
> > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > ---
> > >  hw/ppc/vof.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > index e3b430a81f4f..b5b6514d79fc 100644
> > > --- a/hw/ppc/vof.c
> > > +++ b/hw/ppc/vof.c
> > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base)
> > >      mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> > >      g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
> > >      if (sc == 2) {
> > > -        mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > > +        mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
> > >      } else {
> > >          mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > >      }
> > 
> > I did wonder if there was a better way to do what this is doing,
> > but neither we (in system/device_tree.c) nor libfdt seem to
> > provide one.
>
> libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
> not an automatic aligned-or-unaligned helper.   Maybe we should add that?

Runtime test if the pointer is aligned?

What about just fdt_prop32_ld() and fdt_prop64_ld() where you know it's
4 byte aligned. Then just do 2 x 4 byte loads for the 64-bit, I don't
think performance would matter so much to try get a single load.

Thanks,
Nick

Peter Maydell July 4, 2024, 12:15 p.m. UTC | #4

On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote:
>
> On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> > >
> > > FDT properties are aligned by 4 bytes, not 8 bytes.
> > >
> > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > ---
> > >  hw/ppc/vof.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > index e3b430a81f4f..b5b6514d79fc 100644
> > > --- a/hw/ppc/vof.c
> > > +++ b/hw/ppc/vof.c
> > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base)
> > >      mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> > >      g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
> > >      if (sc == 2) {
> > > -        mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > > +        mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
> > >      } else {
> > >          mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > >      }
> >
> > I did wonder if there was a better way to do what this is doing,
> > but neither we (in system/device_tree.c) nor libfdt seem to
> > provide one.
>
> libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
> not an automatic aligned-or-unaligned helper.   Maybe we should add that?

fdt32_ld() and friends only do the "load from this bit of memory"
part, which we already have QEMU utility functions for (and which
are this patch uses).

This particular bit of code is dealing with an fdt property ("memory")
that is an array of (address, size) tuples where address and size
can independently be either 32 or 64 bits, and it wants the
size value of tuple 0. So the missing functionality is something at
a higher level than fdt32_ld() which would let you say "give me
tuple N field X" with some way to specify the tuple layout. (Which
is an awkward kind of API to write in C.)

Slightly less general, but for this case we could perhaps have
something like the getprop equivalent of qemu_fdt_setprop_sized_cells():

  uint64_t value_array[2];
  qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array,
                               ac, sc);
  /*
   * fills in value_array[0] with address, value_array[1] with size,
   * probably barfs if the varargs-list of cell-sizes doesn't
   * cover the whole property, similar to the current assert on
   * proplen.
   */
  mem0_end = value_array[0];

thanks
-- PMM

Nicholas Piggin July 5, 2024, 1:18 a.m. UTC | #5

On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote:
> On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote:
> >
> > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> > > >
> > > > FDT properties are aligned by 4 bytes, not 8 bytes.
> > > >
> > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > > ---
> > > >  hw/ppc/vof.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > > index e3b430a81f4f..b5b6514d79fc 100644
> > > > --- a/hw/ppc/vof.c
> > > > +++ b/hw/ppc/vof.c
> > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base)
> > > >      mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> > > >      g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
> > > >      if (sc == 2) {
> > > > -        mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > > > +        mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
> > > >      } else {
> > > >          mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > > >      }
> > >
> > > I did wonder if there was a better way to do what this is doing,
> > > but neither we (in system/device_tree.c) nor libfdt seem to
> > > provide one.
> >
> > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
> > not an automatic aligned-or-unaligned helper.   Maybe we should add that?
>
> fdt32_ld() and friends only do the "load from this bit of memory"
> part, which we already have QEMU utility functions for (and which
> are this patch uses).
>
> This particular bit of code is dealing with an fdt property ("memory")
> that is an array of (address, size) tuples where address and size
> can independently be either 32 or 64 bits, and it wants the
> size value of tuple 0. So the missing functionality is something at
> a higher level than fdt32_ld() which would let you say "give me
> tuple N field X" with some way to specify the tuple layout. (Which
> is an awkward kind of API to write in C.)
>
> Slightly less general, but for this case we could perhaps have
> something like the getprop equivalent of qemu_fdt_setprop_sized_cells():
>
>   uint64_t value_array[2];
>   qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array,
>                                ac, sc);
>   /*
>    * fills in value_array[0] with address, value_array[1] with size,
>    * probably barfs if the varargs-list of cell-sizes doesn't
>    * cover the whole property, similar to the current assert on
>    * proplen.
>    */
>   mem0_end = value_array[0];

Since 4/8 byte cells are most common and size is probably
normally known, what about something simpler to start with?

Thanks,
Nick

---
diff --git a/libfdt/libfdt.h b/libfdt/libfdt.h
index 0677fea..c4b6355 100644
--- a/libfdt/libfdt.h
+++ b/libfdt/libfdt.h
@@ -148,6 +148,15 @@ static inline uint32_t fdt32_ld(const fdt32_t *p)
 		| bp[3];
 }
 
+/*
+ * Load the value from a 32-bit cell of a property. Cells are 32-bit aligned
+ * so can use a single load.
+ */
+static inline uint32_t fdt32_ld_prop(const fdt32_t *p)
+{
+	return fdt32_to_cpu(*p);
+}
+
 static inline void fdt32_st(void *property, uint32_t value)
 {
 	uint8_t *bp = (uint8_t *)property;
@@ -172,6 +181,18 @@ static inline uint64_t fdt64_ld(const fdt64_t *p)
 		| bp[7];
 }
 
+/*
+ * Load the value from a 64-bit cell of a property. Cells are 32-bit aligned
+ * so can use two loads.
+ */
+static inline uint64_t fdt64_ld_prop(const fdt64_t *p)
+{
+	const fdt64_t *_p = p;
+
+	return ((uint64_t)fdt32_to_cpu(_p[0]) << 32)
+		| fdt32_to_cpu(_p[1]);
+}
+
 static inline void fdt64_st(void *property, uint64_t value)
 {
 	uint8_t *bp = (uint8_t *)property;

David Gibson July 5, 2024, 1:33 a.m. UTC | #6

On Thu, Jul 04, 2024 at 01:15:57PM +0100, Peter Maydell wrote:
> On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote:
> >
> > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> > > >
> > > > FDT properties are aligned by 4 bytes, not 8 bytes.
> > > >
> > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > > ---
> > > >  hw/ppc/vof.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > > index e3b430a81f4f..b5b6514d79fc 100644
> > > > --- a/hw/ppc/vof.c
> > > > +++ b/hw/ppc/vof.c
> > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base)
> > > >      mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> > > >      g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
> > > >      if (sc == 2) {
> > > > -        mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > > > +        mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
> > > >      } else {
> > > >          mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > > >      }
> > >
> > > I did wonder if there was a better way to do what this is doing,
> > > but neither we (in system/device_tree.c) nor libfdt seem to
> > > provide one.
> >
> > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
> > not an automatic aligned-or-unaligned helper.   Maybe we should add that?
> 
> fdt32_ld() and friends only do the "load from this bit of memory"
> part, which we already have QEMU utility functions for (and which
> are this patch uses).
> 
> This particular bit of code is dealing with an fdt property ("memory")
> that is an array of (address, size) tuples where address and size
> can independently be either 32 or 64 bits, and it wants the
> size value of tuple 0. So the missing functionality is something at
> a higher level than fdt32_ld() which would let you say "give me
> tuple N field X" with some way to specify the tuple layout. (Which
> is an awkward kind of API to write in C.)

Ah, right.  Yeah.. that's a pretty awkward API in C.

> Slightly less general, but for this case we could perhaps have
> something like the getprop equivalent of qemu_fdt_setprop_sized_cells():
> 
>   uint64_t value_array[2];
>   qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array,
>                                ac, sc);
>   /*
>    * fills in value_array[0] with address, value_array[1] with size,
>    * probably barfs if the varargs-list of cell-sizes doesn't
>    * cover the whole property, similar to the current assert on
>    * proplen.
>    */
>   mem0_end = value_array[0];

Seems reasonable to me.  The only other thought I had was something
like Python's struct.unpack() [0].  But your suggestion is probably
more natural in C.

[0] https://docs.python.org/3/library/struct.html#struct.unpack

David Gibson July 5, 2024, 1:41 a.m. UTC | #7

On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote:
> On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote:
> > On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote:
> > >
> > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> > > > >
> > > > > FDT properties are aligned by 4 bytes, not 8 bytes.
> > > > >
> > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > > > ---
> > > > >  hw/ppc/vof.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > > > index e3b430a81f4f..b5b6514d79fc 100644
> > > > > --- a/hw/ppc/vof.c
> > > > > +++ b/hw/ppc/vof.c
> > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base)
> > > > >      mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> > > > >      g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
> > > > >      if (sc == 2) {
> > > > > -        mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > > > > +        mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
> > > > >      } else {
> > > > >          mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > > > >      }
> > > >
> > > > I did wonder if there was a better way to do what this is doing,
> > > > but neither we (in system/device_tree.c) nor libfdt seem to
> > > > provide one.
> > >
> > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
> > > not an automatic aligned-or-unaligned helper.   Maybe we should add that?
> >
> > fdt32_ld() and friends only do the "load from this bit of memory"
> > part, which we already have QEMU utility functions for (and which
> > are this patch uses).
> >
> > This particular bit of code is dealing with an fdt property ("memory")
> > that is an array of (address, size) tuples where address and size
> > can independently be either 32 or 64 bits, and it wants the
> > size value of tuple 0. So the missing functionality is something at
> > a higher level than fdt32_ld() which would let you say "give me
> > tuple N field X" with some way to specify the tuple layout. (Which
> > is an awkward kind of API to write in C.)
> >
> > Slightly less general, but for this case we could perhaps have
> > something like the getprop equivalent of qemu_fdt_setprop_sized_cells():
> >
> >   uint64_t value_array[2];
> >   qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array,
> >                                ac, sc);
> >   /*
> >    * fills in value_array[0] with address, value_array[1] with size,
> >    * probably barfs if the varargs-list of cell-sizes doesn't
> >    * cover the whole property, similar to the current assert on
> >    * proplen.
> >    */
> >   mem0_end = value_array[0];
> 
> Since 4/8 byte cells are most common and size is probably
> normally known, what about something simpler to start with?

Hrm, I don't think this helps much.  As Peter points out the actual
load isn't really the issue, it's locating the right spot for it.

> 
> Thanks,
> Nick
> 
> ---
> diff --git a/libfdt/libfdt.h b/libfdt/libfdt.h
> index 0677fea..c4b6355 100644
> --- a/libfdt/libfdt.h
> +++ b/libfdt/libfdt.h
> @@ -148,6 +148,15 @@ static inline uint32_t fdt32_ld(const fdt32_t *p)
>  		| bp[3];
>  }
>  
> +/*
> + * Load the value from a 32-bit cell of a property. Cells are 32-bit aligned
> + * so can use a single load.
> + */
> +static inline uint32_t fdt32_ld_prop(const fdt32_t *p)
> +{
> +	return fdt32_to_cpu(*p);
> +}
> +
>  static inline void fdt32_st(void *property, uint32_t value)
>  {
>  	uint8_t *bp = (uint8_t *)property;
> @@ -172,6 +181,18 @@ static inline uint64_t fdt64_ld(const fdt64_t *p)
>  		| bp[7];
>  }
>  
> +/*
> + * Load the value from a 64-bit cell of a property. Cells are 32-bit aligned
> + * so can use two loads.
> + */
> +static inline uint64_t fdt64_ld_prop(const fdt64_t *p)
> +{
> +	const fdt64_t *_p = p;
> +
> +	return ((uint64_t)fdt32_to_cpu(_p[0]) << 32)
> +		| fdt32_to_cpu(_p[1]);
> +}
> +
>  static inline void fdt64_st(void *property, uint64_t value)
>  {
>  	uint8_t *bp = (uint8_t *)property;
>

Nicholas Piggin July 5, 2024, 4:40 a.m. UTC | #8

On Fri Jul 5, 2024 at 11:41 AM AEST, David Gibson wrote:
> On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote:
> > On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote:
> > > On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >
> > > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> > > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> > > > > >
> > > > > > FDT properties are aligned by 4 bytes, not 8 bytes.
> > > > > >
> > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > > > > ---
> > > > > >  hw/ppc/vof.c | 2 +-
> > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > > > > index e3b430a81f4f..b5b6514d79fc 100644
> > > > > > --- a/hw/ppc/vof.c
> > > > > > +++ b/hw/ppc/vof.c
> > > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base)
> > > > > >      mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> > > > > >      g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
> > > > > >      if (sc == 2) {
> > > > > > -        mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > > > > > +        mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
> > > > > >      } else {
> > > > > >          mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > > > > >      }
> > > > >
> > > > > I did wonder if there was a better way to do what this is doing,
> > > > > but neither we (in system/device_tree.c) nor libfdt seem to
> > > > > provide one.
> > > >
> > > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
> > > > not an automatic aligned-or-unaligned helper.   Maybe we should add that?
> > >
> > > fdt32_ld() and friends only do the "load from this bit of memory"
> > > part, which we already have QEMU utility functions for (and which
> > > are this patch uses).
> > >
> > > This particular bit of code is dealing with an fdt property ("memory")
> > > that is an array of (address, size) tuples where address and size
> > > can independently be either 32 or 64 bits, and it wants the
> > > size value of tuple 0. So the missing functionality is something at
> > > a higher level than fdt32_ld() which would let you say "give me
> > > tuple N field X" with some way to specify the tuple layout. (Which
> > > is an awkward kind of API to write in C.)
> > >
> > > Slightly less general, but for this case we could perhaps have
> > > something like the getprop equivalent of qemu_fdt_setprop_sized_cells():
> > >
> > >   uint64_t value_array[2];
> > >   qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array,
> > >                                ac, sc);
> > >   /*
> > >    * fills in value_array[0] with address, value_array[1] with size,
> > >    * probably barfs if the varargs-list of cell-sizes doesn't
> > >    * cover the whole property, similar to the current assert on
> > >    * proplen.
> > >    */
> > >   mem0_end = value_array[0];
> > 
> > Since 4/8 byte cells are most common and size is probably
> > normally known, what about something simpler to start with?
>
> Hrm, I don't think this helps much.  As Peter points out the actual
> load isn't really the issue, it's locating the right spot for it.

I don't really see why that's a problem, it's just a pointer
addition - base + fdt_address_cells * 4. The problem was in
the memory access (yes it's fixed with the patch but you could
add a general libfdt way to do it).

Some fancy function like above could be used, But is it really
worth implementing such a thing for this?

Thanks,
Nick

David Gibson July 5, 2024, 5:12 a.m. UTC | #9

On Fri, Jul 05, 2024 at 02:40:19PM +1000, Nicholas Piggin wrote:
> On Fri Jul 5, 2024 at 11:41 AM AEST, David Gibson wrote:
> > On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote:
> > > On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote:
> > > > On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > >
> > > > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> > > > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> > > > > > >
> > > > > > > FDT properties are aligned by 4 bytes, not 8 bytes.
> > > > > > >
> > > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > > > > > ---
> > > > > > >  hw/ppc/vof.c | 2 +-
> > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > > > > > index e3b430a81f4f..b5b6514d79fc 100644
> > > > > > > --- a/hw/ppc/vof.c
> > > > > > > +++ b/hw/ppc/vof.c
> > > > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base)
> > > > > > >      mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> > > > > > >      g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
> > > > > > >      if (sc == 2) {
> > > > > > > -        mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > > > > > > +        mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
> > > > > > >      } else {
> > > > > > >          mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > > > > > >      }
> > > > > >
> > > > > > I did wonder if there was a better way to do what this is doing,
> > > > > > but neither we (in system/device_tree.c) nor libfdt seem to
> > > > > > provide one.
> > > > >
> > > > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
> > > > > not an automatic aligned-or-unaligned helper.   Maybe we should add that?
> > > >
> > > > fdt32_ld() and friends only do the "load from this bit of memory"
> > > > part, which we already have QEMU utility functions for (and which
> > > > are this patch uses).
> > > >
> > > > This particular bit of code is dealing with an fdt property ("memory")
> > > > that is an array of (address, size) tuples where address and size
> > > > can independently be either 32 or 64 bits, and it wants the
> > > > size value of tuple 0. So the missing functionality is something at
> > > > a higher level than fdt32_ld() which would let you say "give me
> > > > tuple N field X" with some way to specify the tuple layout. (Which
> > > > is an awkward kind of API to write in C.)
> > > >
> > > > Slightly less general, but for this case we could perhaps have
> > > > something like the getprop equivalent of qemu_fdt_setprop_sized_cells():
> > > >
> > > >   uint64_t value_array[2];
> > > >   qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array,
> > > >                                ac, sc);
> > > >   /*
> > > >    * fills in value_array[0] with address, value_array[1] with size,
> > > >    * probably barfs if the varargs-list of cell-sizes doesn't
> > > >    * cover the whole property, similar to the current assert on
> > > >    * proplen.
> > > >    */
> > > >   mem0_end = value_array[0];
> > > 
> > > Since 4/8 byte cells are most common and size is probably
> > > normally known, what about something simpler to start with?
> >
> > Hrm, I don't think this helps much.  As Peter points out the actual
> > load isn't really the issue, it's locating the right spot for it.
> 
> I don't really see why that's a problem, it's just a pointer
> addition - base + fdt_address_cells * 4. The problem was in

This is harder if #address-cells and #size-cells are different, or if
you're parsing ranges and #address-cells is different between parent
and child node.

> the memory access (yes it's fixed with the patch but you could
> add a general libfdt way to do it).

Huh.. well I'm getting different impressions of what the problem
actually is from what I initially read versus Peter Maydell's
comments, so I don't really know what to think.

If it's just the load then fdt32_ld() etc. already exist.  Or is it
really such a hot path that unconditionally handling unaligned
accesses isn't tenable?

> Some fancy function like above could be used, But is it really
> worth implementing such a thing for this?
> 
> Thanks,
> Nick
>

Nicholas Piggin July 5, 2024, 7:50 a.m. UTC | #10

On Fri Jul 5, 2024 at 3:12 PM AEST, David Gibson wrote:
> On Fri, Jul 05, 2024 at 02:40:19PM +1000, Nicholas Piggin wrote:
> > On Fri Jul 5, 2024 at 11:41 AM AEST, David Gibson wrote:
> > > On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote:
> > > > On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote:
> > > > > On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > >
> > > > > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> > > > > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> > > > > > > >
> > > > > > > > FDT properties are aligned by 4 bytes, not 8 bytes.
> > > > > > > >
> > > > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > > > > > > ---
> > > > > > > >  hw/ppc/vof.c | 2 +-
> > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > > > > > > index e3b430a81f4f..b5b6514d79fc 100644
> > > > > > > > --- a/hw/ppc/vof.c
> > > > > > > > +++ b/hw/ppc/vof.c
> > > > > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base)
> > > > > > > >      mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> > > > > > > >      g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
> > > > > > > >      if (sc == 2) {
> > > > > > > > -        mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > > > > > > > +        mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
> > > > > > > >      } else {
> > > > > > > >          mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > > > > > > >      }
> > > > > > >
> > > > > > > I did wonder if there was a better way to do what this is doing,
> > > > > > > but neither we (in system/device_tree.c) nor libfdt seem to
> > > > > > > provide one.
> > > > > >
> > > > > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
> > > > > > not an automatic aligned-or-unaligned helper.   Maybe we should add that?
> > > > >
> > > > > fdt32_ld() and friends only do the "load from this bit of memory"
> > > > > part, which we already have QEMU utility functions for (and which
> > > > > are this patch uses).
> > > > >
> > > > > This particular bit of code is dealing with an fdt property ("memory")
> > > > > that is an array of (address, size) tuples where address and size
> > > > > can independently be either 32 or 64 bits, and it wants the
> > > > > size value of tuple 0. So the missing functionality is something at
> > > > > a higher level than fdt32_ld() which would let you say "give me
> > > > > tuple N field X" with some way to specify the tuple layout. (Which
> > > > > is an awkward kind of API to write in C.)
> > > > >
> > > > > Slightly less general, but for this case we could perhaps have
> > > > > something like the getprop equivalent of qemu_fdt_setprop_sized_cells():
> > > > >
> > > > >   uint64_t value_array[2];
> > > > >   qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array,
> > > > >                                ac, sc);
> > > > >   /*
> > > > >    * fills in value_array[0] with address, value_array[1] with size,
> > > > >    * probably barfs if the varargs-list of cell-sizes doesn't
> > > > >    * cover the whole property, similar to the current assert on
> > > > >    * proplen.
> > > > >    */
> > > > >   mem0_end = value_array[0];
> > > > 
> > > > Since 4/8 byte cells are most common and size is probably
> > > > normally known, what about something simpler to start with?
> > >
> > > Hrm, I don't think this helps much.  As Peter points out the actual
> > > load isn't really the issue, it's locating the right spot for it.
> > 
> > I don't really see why that's a problem, it's just a pointer
> > addition - base + fdt_address_cells * 4. The problem was in
>
> This is harder if #address-cells and #size-cells are different, or if
> you're parsing ranges and #address-cells is different between parent
> and child node.
>
> > the memory access (yes it's fixed with the patch but you could
> > add a general libfdt way to do it).
>
> Huh.. well I'm getting different impressions of what the problem
> actually is from what I initially read versus Peter Maydell's
> comments, so I don't really know what to think.

If I'm not mistaken, the sanitizer caught an unaligned 64-bit
load which is the bug.

The tuple address calculation itself I think is not buggy. I suppose
Peter was thinking of an accessor that takes care of addressing and
alignment. I don't think we're at the point it warrants it here, but
could be convinced (maybe a bunch of other code would use it).

I think the API is a little dangerous for overflows though, hard to
static check. sscanf() style could be checked by the compiler but
seems overkill to implement.

> If it's just the load then fdt32_ld() etc. already exist.  Or is it
> really such a hot path that unconditionally handling unaligned
> accesses isn't tenable?

Yeah that's true, hardly any point to adding the faster variant.

It could just be fixed like this then? The original patch is a
fix too, but I do prefer using the same style for both, and
I think using the fdt accessor is nicer to read.

Thanks,
Nick

---

diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
index e3b430a81f..a666a133d7 100644
--- a/hw/ppc/vof.c
+++ b/hw/ppc/vof.c
@@ -646,9 +646,9 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base)
     mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
     g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
     if (sc == 2) {
-        mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac));
+        mem0_end = fdt64_ld((fdt64_t *)(mem0_reg + sizeof(uint32_t) * ac));
     } else {
-        mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac));
+        mem0_end = fdt32_ld((fdt32_t *)(mem0_reg + sizeof(uint32_t) * ac));
     }

     g_array_sort(claimed, of_claimed_compare_func);

Akihiko Odaki July 6, 2024, 9:07 a.m. UTC | #11

On 2024/07/05 16:50, Nicholas Piggin wrote:
> On Fri Jul 5, 2024 at 3:12 PM AEST, David Gibson wrote:
>> On Fri, Jul 05, 2024 at 02:40:19PM +1000, Nicholas Piggin wrote:
>>> On Fri Jul 5, 2024 at 11:41 AM AEST, David Gibson wrote:
>>>> On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote:
>>>>> On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote:
>>>>>> On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote:
>>>>>>>
>>>>>>> On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
>>>>>>>> On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>>>>>>>>
>>>>>>>>> FDT properties are aligned by 4 bytes, not 8 bytes.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>>>>>>>>> ---
>>>>>>>>>   hw/ppc/vof.c | 2 +-
>>>>>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
>>>>>>>>> index e3b430a81f4f..b5b6514d79fc 100644
>>>>>>>>> --- a/hw/ppc/vof.c
>>>>>>>>> +++ b/hw/ppc/vof.c
>>>>>>>>> @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base)
>>>>>>>>>       mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
>>>>>>>>>       g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
>>>>>>>>>       if (sc == 2) {
>>>>>>>>> -        mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac));
>>>>>>>>> +        mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
>>>>>>>>>       } else {
>>>>>>>>>           mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac));
>>>>>>>>>       }
>>>>>>>>
>>>>>>>> I did wonder if there was a better way to do what this is doing,
>>>>>>>> but neither we (in system/device_tree.c) nor libfdt seem to
>>>>>>>> provide one.
>>>>>>>
>>>>>>> libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
>>>>>>> not an automatic aligned-or-unaligned helper.   Maybe we should add that?
>>>>>>
>>>>>> fdt32_ld() and friends only do the "load from this bit of memory"
>>>>>> part, which we already have QEMU utility functions for (and which
>>>>>> are this patch uses).
>>>>>>
>>>>>> This particular bit of code is dealing with an fdt property ("memory")
>>>>>> that is an array of (address, size) tuples where address and size
>>>>>> can independently be either 32 or 64 bits, and it wants the
>>>>>> size value of tuple 0. So the missing functionality is something at
>>>>>> a higher level than fdt32_ld() which would let you say "give me
>>>>>> tuple N field X" with some way to specify the tuple layout. (Which
>>>>>> is an awkward kind of API to write in C.)
>>>>>>
>>>>>> Slightly less general, but for this case we could perhaps have
>>>>>> something like the getprop equivalent of qemu_fdt_setprop_sized_cells():
>>>>>>
>>>>>>    uint64_t value_array[2];
>>>>>>    qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array,
>>>>>>                                 ac, sc);
>>>>>>    /*
>>>>>>     * fills in value_array[0] with address, value_array[1] with size,
>>>>>>     * probably barfs if the varargs-list of cell-sizes doesn't
>>>>>>     * cover the whole property, similar to the current assert on
>>>>>>     * proplen.
>>>>>>     */
>>>>>>    mem0_end = value_array[0];
>>>>>
>>>>> Since 4/8 byte cells are most common and size is probably
>>>>> normally known, what about something simpler to start with?
>>>>
>>>> Hrm, I don't think this helps much.  As Peter points out the actual
>>>> load isn't really the issue, it's locating the right spot for it.
>>>
>>> I don't really see why that's a problem, it's just a pointer
>>> addition - base + fdt_address_cells * 4. The problem was in
>>
>> This is harder if #address-cells and #size-cells are different, or if
>> you're parsing ranges and #address-cells is different between parent
>> and child node.
>>
>>> the memory access (yes it's fixed with the patch but you could
>>> add a general libfdt way to do it).
>>
>> Huh.. well I'm getting different impressions of what the problem
>> actually is from what I initially read versus Peter Maydell's
>> comments, so I don't really know what to think.
> 
> If I'm not mistaken, the sanitizer caught an unaligned 64-bit
> load which is the bug.
> 
> The tuple address calculation itself I think is not buggy. I suppose
> Peter was thinking of an accessor that takes care of addressing and
> alignment. I don't think we're at the point it warrants it here, but
> could be convinced (maybe a bunch of other code would use it).
> 
> I think the API is a little dangerous for overflows though, hard to
> static check. sscanf() style could be checked by the compiler but
> seems overkill to implement.
> 
>> If it's just the load then fdt32_ld() etc. already exist.  Or is it
>> really such a hot path that unconditionally handling unaligned
>> accesses isn't tenable?
> 
> Yeah that's true, hardly any point to adding the faster variant.
> 
> It could just be fixed like this then? The original patch is a
> fix too, but I do prefer using the same style for both, and
> I think using the fdt accessor is nicer to read.
> 
> Thanks,
> Nick
> 
> ---
> 
> diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> index e3b430a81f..a666a133d7 100644
> --- a/hw/ppc/vof.c
> +++ b/hw/ppc/vof.c
> @@ -646,9 +646,9 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base)
>       mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
>       g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
>       if (sc == 2) {
> -        mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac));
> +        mem0_end = fdt64_ld((fdt64_t *)(mem0_reg + sizeof(uint32_t) * ac));

I don't like the extra cast to fdt64_t. Strictly speaking, casting into 
uint64_t * is undefined in the standard and the compiler is free to 
optimize it as an aligned access after the cast. clang and gcc does not 
perform such an optimization, but it is better to avoid such a construct 
if possible.* It is unfortunate that libfdt requires it.

Nevertheless, I won't object to use fdt64_ld() and fdt32_ld(). That's 
what the upstream provides anyway.

Regards,
Akihiko Odaki

* By the way, I had a related discussion with sanitizer developers:
   https://github.com/llvm/llvm-project/issues/83710

Peter Maydell July 6, 2024, 10:37 a.m. UTC | #12

On Fri, 5 Jul 2024 at 06:13, David Gibson <david@gibson.dropbear.id.au> wrote:
>
> On Fri, Jul 05, 2024 at 02:40:19PM +1000, Nicholas Piggin wrote:
> > On Fri Jul 5, 2024 at 11:41 AM AEST, David Gibson wrote:
> > > On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote:
> > > > On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote:
> > > > > On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > >
> > > > > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> > > > > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> > > > > > > >
> > > > > > > > FDT properties are aligned by 4 bytes, not 8 bytes.
> > > > > > > >
> > > > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > > > > > > ---
> > > > > > > >  hw/ppc/vof.c | 2 +-
> > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > > > > > > index e3b430a81f4f..b5b6514d79fc 100644
> > > > > > > > --- a/hw/ppc/vof.c
> > > > > > > > +++ b/hw/ppc/vof.c
> > > > > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base)
> > > > > > > >      mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> > > > > > > >      g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
> > > > > > > >      if (sc == 2) {
> > > > > > > > -        mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > > > > > > > +        mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
> > > > > > > >      } else {
> > > > > > > >          mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > > > > > > >      }
> > > > > > >
> > > > > > > I did wonder if there was a better way to do what this is doing,
> > > > > > > but neither we (in system/device_tree.c) nor libfdt seem to
> > > > > > > provide one.
> > > > > >
> > > > > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
> > > > > > not an automatic aligned-or-unaligned helper.   Maybe we should add that?
> > > > >
> > > > > fdt32_ld() and friends only do the "load from this bit of memory"
> > > > > part, which we already have QEMU utility functions for (and which
> > > > > are this patch uses).
> > > > >
> > > > > This particular bit of code is dealing with an fdt property ("memory")
> > > > > that is an array of (address, size) tuples where address and size
> > > > > can independently be either 32 or 64 bits, and it wants the
> > > > > size value of tuple 0. So the missing functionality is something at
> > > > > a higher level than fdt32_ld() which would let you say "give me
> > > > > tuple N field X" with some way to specify the tuple layout. (Which
> > > > > is an awkward kind of API to write in C.)
> > > > >
> > > > > Slightly less general, but for this case we could perhaps have
> > > > > something like the getprop equivalent of qemu_fdt_setprop_sized_cells():
> > > > >
> > > > >   uint64_t value_array[2];
> > > > >   qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array,
> > > > >                                ac, sc);
> > > > >   /*
> > > > >    * fills in value_array[0] with address, value_array[1] with size,
> > > > >    * probably barfs if the varargs-list of cell-sizes doesn't
> > > > >    * cover the whole property, similar to the current assert on
> > > > >    * proplen.
> > > > >    */
> > > > >   mem0_end = value_array[0];
> > > >
> > > > Since 4/8 byte cells are most common and size is probably
> > > > normally known, what about something simpler to start with?
> > >
> > > Hrm, I don't think this helps much.  As Peter points out the actual
> > > load isn't really the issue, it's locating the right spot for it.
> >
> > I don't really see why that's a problem, it's just a pointer
> > addition - base + fdt_address_cells * 4. The problem was in
>
> This is harder if #address-cells and #size-cells are different, or if
> you're parsing ranges and #address-cells is different between parent
> and child node.
>
> > the memory access (yes it's fixed with the patch but you could
> > add a general libfdt way to do it).
>
> Huh.. well I'm getting different impressions of what the problem
> actually is from what I initially read versus Peter Maydell's
> comments, so I don't really know what to think.
>
> If it's just the load then fdt32_ld() etc. already exist.  Or is it
> really such a hot path that unconditionally handling unaligned
> accesses isn't tenable?

The specific problem here is that the code as written tries to
cast a not-aligned-enough pointer to uint64_t* to do the load,
which is UB. The patch submitted fixes that, and personally I
think it would be entirely fine to say that's all we need to do here.

*If* we want to look at the broader question of "why is this
code that's reading something out of an fdt having to do the
pretty low-level action of getting the start address of the
fdt property and then doing pointer arithmetic and then fishing
a value out of it as a 64-bit unaligned load?" then we get into
"do we want to provide a helper function that lets the caller
say 'give me element X from tuple Y'?". But that seems like a
lot of effort for what's basically a single callsite we would
be tidying up...

thanks
-- PMM

David Gibson July 6, 2024, 11:46 p.m. UTC | #13

On Sat, Jul 06, 2024 at 11:37:08AM +0100, Peter Maydell wrote:
> On Fri, 5 Jul 2024 at 06:13, David Gibson <david@gibson.dropbear.id.au> wrote:
> >
> > On Fri, Jul 05, 2024 at 02:40:19PM +1000, Nicholas Piggin wrote:
> > > On Fri Jul 5, 2024 at 11:41 AM AEST, David Gibson wrote:
> > > > On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote:
> > > > > On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote:
> > > > > > On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > > >
> > > > > > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> > > > > > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> > > > > > > > >
> > > > > > > > > FDT properties are aligned by 4 bytes, not 8 bytes.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > > > > > > > ---
> > > > > > > > >  hw/ppc/vof.c | 2 +-
> > > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > >
> > > > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > > > > > > > index e3b430a81f4f..b5b6514d79fc 100644
> > > > > > > > > --- a/hw/ppc/vof.c
> > > > > > > > > +++ b/hw/ppc/vof.c
> > > > > > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base)
> > > > > > > > >      mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> > > > > > > > >      g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
> > > > > > > > >      if (sc == 2) {
> > > > > > > > > -        mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > > > > > > > > +        mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
> > > > > > > > >      } else {
> > > > > > > > >          mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > > > > > > > >      }
> > > > > > > >
> > > > > > > > I did wonder if there was a better way to do what this is doing,
> > > > > > > > but neither we (in system/device_tree.c) nor libfdt seem to
> > > > > > > > provide one.
> > > > > > >
> > > > > > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
> > > > > > > not an automatic aligned-or-unaligned helper.   Maybe we should add that?
> > > > > >
> > > > > > fdt32_ld() and friends only do the "load from this bit of memory"
> > > > > > part, which we already have QEMU utility functions for (and which
> > > > > > are this patch uses).
> > > > > >
> > > > > > This particular bit of code is dealing with an fdt property ("memory")
> > > > > > that is an array of (address, size) tuples where address and size
> > > > > > can independently be either 32 or 64 bits, and it wants the
> > > > > > size value of tuple 0. So the missing functionality is something at
> > > > > > a higher level than fdt32_ld() which would let you say "give me
> > > > > > tuple N field X" with some way to specify the tuple layout. (Which
> > > > > > is an awkward kind of API to write in C.)
> > > > > >
> > > > > > Slightly less general, but for this case we could perhaps have
> > > > > > something like the getprop equivalent of qemu_fdt_setprop_sized_cells():
> > > > > >
> > > > > >   uint64_t value_array[2];
> > > > > >   qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array,
> > > > > >                                ac, sc);
> > > > > >   /*
> > > > > >    * fills in value_array[0] with address, value_array[1] with size,
> > > > > >    * probably barfs if the varargs-list of cell-sizes doesn't
> > > > > >    * cover the whole property, similar to the current assert on
> > > > > >    * proplen.
> > > > > >    */
> > > > > >   mem0_end = value_array[0];
> > > > >
> > > > > Since 4/8 byte cells are most common and size is probably
> > > > > normally known, what about something simpler to start with?
> > > >
> > > > Hrm, I don't think this helps much.  As Peter points out the actual
> > > > load isn't really the issue, it's locating the right spot for it.
> > >
> > > I don't really see why that's a problem, it's just a pointer
> > > addition - base + fdt_address_cells * 4. The problem was in
> >
> > This is harder if #address-cells and #size-cells are different, or if
> > you're parsing ranges and #address-cells is different between parent
> > and child node.
> >
> > > the memory access (yes it's fixed with the patch but you could
> > > add a general libfdt way to do it).
> >
> > Huh.. well I'm getting different impressions of what the problem
> > actually is from what I initially read versus Peter Maydell's
> > comments, so I don't really know what to think.
> >
> > If it's just the load then fdt32_ld() etc. already exist.  Or is it
> > really such a hot path that unconditionally handling unaligned
> > accesses isn't tenable?
> 
> The specific problem here is that the code as written tries to
> cast a not-aligned-enough pointer to uint64_t* to do the load,
> which is UB.

Ah... and I'm assuming it's the cast itself which triggers the UB, not
just dereferencing it.  Which makes the interface of fdt32_ld()
etc. unusable for their intended purpose.  Well.. damn.  Now... how do
I fix it without breaking compatibility any more than I have to.

> The patch submitted fixes that, and personally I
> think it would be entirely fine to say that's all we need to do here.
> 
> *If* we want to look at the broader question of "why is this
> code that's reading something out of an fdt having to do the
> pretty low-level action of getting the start address of the
> fdt property and then doing pointer arithmetic and then fishing
> a value out of it as a 64-bit unaligned load?" then we get into
> "do we want to provide a helper function that lets the caller
> say 'give me element X from tuple Y'?". But that seems like a
> lot of effort for what's basically a single callsite we would
> be tidying up...

I tend to agree.

Nicholas Piggin July 8, 2024, 7:49 a.m. UTC | #14

On Sun Jul 7, 2024 at 9:46 AM AEST, David Gibson wrote:
> On Sat, Jul 06, 2024 at 11:37:08AM +0100, Peter Maydell wrote:
> > On Fri, 5 Jul 2024 at 06:13, David Gibson <david@gibson.dropbear.id.au> wrote:
> > >
> > > On Fri, Jul 05, 2024 at 02:40:19PM +1000, Nicholas Piggin wrote:
> > > > On Fri Jul 5, 2024 at 11:41 AM AEST, David Gibson wrote:
> > > > > On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote:
> > > > > > On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote:
> > > > > > > On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > > > >
> > > > > > > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> > > > > > > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> > > > > > > > > >
> > > > > > > > > > FDT properties are aligned by 4 bytes, not 8 bytes.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > > > > > > > > ---
> > > > > > > > > >  hw/ppc/vof.c | 2 +-
> > > > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > > > > > > > > index e3b430a81f4f..b5b6514d79fc 100644
> > > > > > > > > > --- a/hw/ppc/vof.c
> > > > > > > > > > +++ b/hw/ppc/vof.c
> > > > > > > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base)
> > > > > > > > > >      mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> > > > > > > > > >      g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
> > > > > > > > > >      if (sc == 2) {
> > > > > > > > > > -        mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > > > > > > > > > +        mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
> > > > > > > > > >      } else {
> > > > > > > > > >          mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > > > > > > > > >      }
> > > > > > > > >
> > > > > > > > > I did wonder if there was a better way to do what this is doing,
> > > > > > > > > but neither we (in system/device_tree.c) nor libfdt seem to
> > > > > > > > > provide one.
> > > > > > > >
> > > > > > > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
> > > > > > > > not an automatic aligned-or-unaligned helper.   Maybe we should add that?
> > > > > > >
> > > > > > > fdt32_ld() and friends only do the "load from this bit of memory"
> > > > > > > part, which we already have QEMU utility functions for (and which
> > > > > > > are this patch uses).
> > > > > > >
> > > > > > > This particular bit of code is dealing with an fdt property ("memory")
> > > > > > > that is an array of (address, size) tuples where address and size
> > > > > > > can independently be either 32 or 64 bits, and it wants the
> > > > > > > size value of tuple 0. So the missing functionality is something at
> > > > > > > a higher level than fdt32_ld() which would let you say "give me
> > > > > > > tuple N field X" with some way to specify the tuple layout. (Which
> > > > > > > is an awkward kind of API to write in C.)
> > > > > > >
> > > > > > > Slightly less general, but for this case we could perhaps have
> > > > > > > something like the getprop equivalent of qemu_fdt_setprop_sized_cells():
> > > > > > >
> > > > > > >   uint64_t value_array[2];
> > > > > > >   qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array,
> > > > > > >                                ac, sc);
> > > > > > >   /*
> > > > > > >    * fills in value_array[0] with address, value_array[1] with size,
> > > > > > >    * probably barfs if the varargs-list of cell-sizes doesn't
> > > > > > >    * cover the whole property, similar to the current assert on
> > > > > > >    * proplen.
> > > > > > >    */
> > > > > > >   mem0_end = value_array[0];
> > > > > >
> > > > > > Since 4/8 byte cells are most common and size is probably
> > > > > > normally known, what about something simpler to start with?
> > > > >
> > > > > Hrm, I don't think this helps much.  As Peter points out the actual
> > > > > load isn't really the issue, it's locating the right spot for it.
> > > >
> > > > I don't really see why that's a problem, it's just a pointer
> > > > addition - base + fdt_address_cells * 4. The problem was in
> > >
> > > This is harder if #address-cells and #size-cells are different, or if
> > > you're parsing ranges and #address-cells is different between parent
> > > and child node.
> > >
> > > > the memory access (yes it's fixed with the patch but you could
> > > > add a general libfdt way to do it).
> > >
> > > Huh.. well I'm getting different impressions of what the problem
> > > actually is from what I initially read versus Peter Maydell's
> > > comments, so I don't really know what to think.
> > >
> > > If it's just the load then fdt32_ld() etc. already exist.  Or is it
> > > really such a hot path that unconditionally handling unaligned
> > > accesses isn't tenable?
> > 
> > The specific problem here is that the code as written tries to
> > cast a not-aligned-enough pointer to uint64_t* to do the load,
> > which is UB.
>
> Ah... and I'm assuming it's the cast itself which triggers the UB, not
> just dereferencing it.

Oh it's just the cast itself that is UB? Looks like that's true.
Interesting gcc and clang don't flag it, I guess they care about
warning on practical breakage first.

> Which makes the interface of fdt32_ld()
> etc. unusable for their intended purpose.  Well.. damn.  Now... how do
> I fix it without breaking compatibility any more than I have to.

Why not just make them take a void * ptr? I don't think that would
break anything but existing code that was forced to add the cast
may be at risk of the UB.

Could also add fdt64_unaligned_t types with aligned(1) attribute
for new code. Those can just be dereferenced directly and the
caller the compiler can choose the appropriate access supported by
the host. (Actually gcc can recognise that load unaligned and
byteswap pattern and do the same anyway, but clang at least can
not yet).

Thanks,
Nick

Peter Maydell July 8, 2024, 3:59 p.m. UTC | #15

On Mon, 8 Jul 2024 at 08:49, Nicholas Piggin <npiggin@gmail.com> wrote:
>
> On Sun Jul 7, 2024 at 9:46 AM AEST, David Gibson wrote:
> > On Sat, Jul 06, 2024 at 11:37:08AM +0100, Peter Maydell wrote:
> > > On Fri, 5 Jul 2024 at 06:13, David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > Huh.. well I'm getting different impressions of what the problem
> > > > actually is from what I initially read versus Peter Maydell's
> > > > comments, so I don't really know what to think.
> > > >
> > > > If it's just the load then fdt32_ld() etc. already exist.  Or is it
> > > > really such a hot path that unconditionally handling unaligned
> > > > accesses isn't tenable?
> > >
> > > The specific problem here is that the code as written tries to
> > > cast a not-aligned-enough pointer to uint64_t* to do the load,
> > > which is UB.
> >
> > Ah... and I'm assuming it's the cast itself which triggers the UB, not
> > just dereferencing it.
>
> Oh it's just the cast itself that is UB? Looks like that's true.
> Interesting gcc and clang don't flag it, I guess they care about
> warning on practical breakage first.

Er, I was speaking a bit vaguely there, don't take my word for
it without going and looking at the text of the C standard.

What I *meant* was that the practical problem here is that we
really do dereference a pointer for a 64-bit load when the
pointer isn't necessarily 64-bit-aligned.

As it happens, C99 says that it is the cast that is UB:
section 6.3.2.3 para 7 says:
 "A pointer to an object or incomplete type may be converted to
  a pointer to a different object or incomplete type. If the
  resulting pointer is not correctly aligned for the pointed-to
  type, the behavior is undefined. Otherwise, when converted back
  again, the result shall compare equal to the original pointer."

Presumably this is envisaging the possibility of a pointer cast
being a destructive operation somehow, such that e.g. a uint64_t*
can only represent 64-bit-aligned values. But I bet QEMU does
a lot of casting pointers around that might fall foul of this
rule, so I'm not particularly worried about trying to clean up
that kind of thing (until/unless analysers start warning about
it, in which case we have a specific set of things to clean up).

What I care about from the point of view of this patch
is that we fix the actually-broken-on-some-real-hardware problem
of doing the load as a misaligned access. My vote would be for
"take Akihiko's patch as-is, rather than gating fixing the bug
on deciding on an improvement/change to the fdt API or our
wrappers of it".

thanks
-- PMM

David Gibson July 9, 2024, 7:41 a.m. UTC | #16

On Mon, Jul 08, 2024 at 05:49:32PM +1000, Nicholas Piggin wrote:
> On Sun Jul 7, 2024 at 9:46 AM AEST, David Gibson wrote:
> > On Sat, Jul 06, 2024 at 11:37:08AM +0100, Peter Maydell wrote:
> > > On Fri, 5 Jul 2024 at 06:13, David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >
> > > > On Fri, Jul 05, 2024 at 02:40:19PM +1000, Nicholas Piggin wrote:
> > > > > On Fri Jul 5, 2024 at 11:41 AM AEST, David Gibson wrote:
> > > > > > On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote:
> > > > > > > On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote:
> > > > > > > > On Sat, 29 Jun 2024 at 04:17, David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> > > > > > > > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > FDT properties are aligned by 4 bytes, not 8 bytes.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > > > > > > > > > ---
> > > > > > > > > > >  hw/ppc/vof.c | 2 +-
> > > > > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > > > > > > > > > index e3b430a81f4f..b5b6514d79fc 100644
> > > > > > > > > > > --- a/hw/ppc/vof.c
> > > > > > > > > > > +++ b/hw/ppc/vof.c
> > > > > > > > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base)
> > > > > > > > > > >      mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> > > > > > > > > > >      g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
> > > > > > > > > > >      if (sc == 2) {
> > > > > > > > > > > -        mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > > > > > > > > > > +        mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
> > > > > > > > > > >      } else {
> > > > > > > > > > >          mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > > > > > > > > > >      }
> > > > > > > > > >
> > > > > > > > > > I did wonder if there was a better way to do what this is doing,
> > > > > > > > > > but neither we (in system/device_tree.c) nor libfdt seem to
> > > > > > > > > > provide one.
> > > > > > > > >
> > > > > > > > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
> > > > > > > > > not an automatic aligned-or-unaligned helper.   Maybe we should add that?
> > > > > > > >
> > > > > > > > fdt32_ld() and friends only do the "load from this bit of memory"
> > > > > > > > part, which we already have QEMU utility functions for (and which
> > > > > > > > are this patch uses).
> > > > > > > >
> > > > > > > > This particular bit of code is dealing with an fdt property ("memory")
> > > > > > > > that is an array of (address, size) tuples where address and size
> > > > > > > > can independently be either 32 or 64 bits, and it wants the
> > > > > > > > size value of tuple 0. So the missing functionality is something at
> > > > > > > > a higher level than fdt32_ld() which would let you say "give me
> > > > > > > > tuple N field X" with some way to specify the tuple layout. (Which
> > > > > > > > is an awkward kind of API to write in C.)
> > > > > > > >
> > > > > > > > Slightly less general, but for this case we could perhaps have
> > > > > > > > something like the getprop equivalent of qemu_fdt_setprop_sized_cells():
> > > > > > > >
> > > > > > > >   uint64_t value_array[2];
> > > > > > > >   qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array,
> > > > > > > >                                ac, sc);
> > > > > > > >   /*
> > > > > > > >    * fills in value_array[0] with address, value_array[1] with size,
> > > > > > > >    * probably barfs if the varargs-list of cell-sizes doesn't
> > > > > > > >    * cover the whole property, similar to the current assert on
> > > > > > > >    * proplen.
> > > > > > > >    */
> > > > > > > >   mem0_end = value_array[0];
> > > > > > >
> > > > > > > Since 4/8 byte cells are most common and size is probably
> > > > > > > normally known, what about something simpler to start with?
> > > > > >
> > > > > > Hrm, I don't think this helps much.  As Peter points out the actual
> > > > > > load isn't really the issue, it's locating the right spot for it.
> > > > >
> > > > > I don't really see why that's a problem, it's just a pointer
> > > > > addition - base + fdt_address_cells * 4. The problem was in
> > > >
> > > > This is harder if #address-cells and #size-cells are different, or if
> > > > you're parsing ranges and #address-cells is different between parent
> > > > and child node.
> > > >
> > > > > the memory access (yes it's fixed with the patch but you could
> > > > > add a general libfdt way to do it).
> > > >
> > > > Huh.. well I'm getting different impressions of what the problem
> > > > actually is from what I initially read versus Peter Maydell's
> > > > comments, so I don't really know what to think.
> > > >
> > > > If it's just the load then fdt32_ld() etc. already exist.  Or is it
> > > > really such a hot path that unconditionally handling unaligned
> > > > accesses isn't tenable?
> > > 
> > > The specific problem here is that the code as written tries to
> > > cast a not-aligned-enough pointer to uint64_t* to do the load,
> > > which is UB.
> >
> > Ah... and I'm assuming it's the cast itself which triggers the UB, not
> > just dereferencing it.
> 
> Oh it's just the cast itself that is UB? Looks like that's true.
> Interesting gcc and clang don't flag it, I guess they care about
> warning on practical breakage first.
> 
> > Which makes the interface of fdt32_ld()
> > etc. unusable for their intended purpose.  Well.. damn.  Now... how do
> > I fix it without breaking compatibility any more than I have to.
> 
> Why not just make them take a void * ptr? I don't think that would
> break anything but existing code that was forced to add the cast
> may be at risk of the UB.

That works if recompiling, but I believe it's an ABI change.
Typically library users would get these functions as inlines, but I
believe it's not impossible something could have taken function
pointers to them and get versions from the shared library.

Or... maybe it's not possible, since these aren't listed our
version.lds.  Ugh.

Given the current state is something basically impossible to use
without UB, wearing an ABI breakage might be the lesser evil.

> Could also add fdt64_unaligned_t types with aligned(1) attribute
> for new code. Those can just be dereferenced directly and the
> caller the compiler can choose the appropriate access supported by
> the host. (Actually gcc can recognise that load unaligned and
> byteswap pattern and do the same anyway, but clang at least can
> not yet).

That might be a nice idea, but doesn't solve the immediate versioning
problem.

David Gibson July 9, 2024, 7:46 a.m. UTC | #17

On Mon, Jul 08, 2024 at 04:59:30PM +0100, Peter Maydell wrote:
> On Mon, 8 Jul 2024 at 08:49, Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > On Sun Jul 7, 2024 at 9:46 AM AEST, David Gibson wrote:
> > > On Sat, Jul 06, 2024 at 11:37:08AM +0100, Peter Maydell wrote:
> > > > On Fri, 5 Jul 2024 at 06:13, David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > Huh.. well I'm getting different impressions of what the problem
> > > > > actually is from what I initially read versus Peter Maydell's
> > > > > comments, so I don't really know what to think.
> > > > >
> > > > > If it's just the load then fdt32_ld() etc. already exist.  Or is it
> > > > > really such a hot path that unconditionally handling unaligned
> > > > > accesses isn't tenable?
> > > >
> > > > The specific problem here is that the code as written tries to
> > > > cast a not-aligned-enough pointer to uint64_t* to do the load,
> > > > which is UB.
> > >
> > > Ah... and I'm assuming it's the cast itself which triggers the UB, not
> > > just dereferencing it.
> >
> > Oh it's just the cast itself that is UB? Looks like that's true.
> > Interesting gcc and clang don't flag it, I guess they care about
> > warning on practical breakage first.
> 
> Er, I was speaking a bit vaguely there, don't take my word for
> it without going and looking at the text of the C standard.

Sure.

> What I *meant* was that the practical problem here is that we
> really do dereference a pointer for a 64-bit load when the
> pointer isn't necessarily 64-bit-aligned.

From the qemu point of view, yes.  And theoretically, the fix is easy,
since libfdt provides fdt32_ld() etc. for exactly this use case.  But..

> As it happens, C99 says that it is the cast that is UB:
> section 6.3.2.3 para 7 says:
>  "A pointer to an object or incomplete type may be converted to
>   a pointer to a different object or incomplete type. If the
>   resulting pointer is not correctly aligned for the pointed-to
>   type, the behavior is undefined. Otherwise, when converted back
>   again, the result shall compare equal to the original pointer."

.. this makes fdt32_ld() etc. unusable by design.

> Presumably this is envisaging the possibility of a pointer cast
> being a destructive operation somehow, such that e.g. a uint64_t*
> can only represent 64-bit-aligned values. But I bet QEMU does
> a lot of casting pointers around that might fall foul of this
> rule, so I'm not particularly worried about trying to clean up
> that kind of thing (until/unless analysers start warning about
> it, in which case we have a specific set of things to clean up).

Fair enough from the qemu point of view.  However, this unusable by
design interface was written by me as part of a library I maintain, so
it certainly worries *me*.

> What I care about from the point of view of this patch
> is that we fix the actually-broken-on-some-real-hardware problem
> of doing the load as a misaligned access. My vote would be for
> "take Akihiko's patch as-is, rather than gating fixing the bug
> on deciding on an improvement/change to the fdt API or our
> wrappers of it".
> 
> thanks
> -- PMM
>

[v2,06/15] ppc/vof: Fix unaligned FDT property access

Commit Message

Comments

Patch