diff mbox series

[7/7] qobject atomics osdep: Make a few macros more hygienic

Message ID 20230831132546.3525721-8-armbru@redhat.com
State New
Headers show
Series [1/7] migration/rdma: Fix save_page method to fail on polling error | expand

Commit Message

Markus Armbruster Aug. 31, 2023, 1:25 p.m. UTC
Variables declared in macros can shadow other variables.  Much of the
time, this is harmless, e.g.:

    #define _FDT(exp)                                                  \
        do {                                                           \
            int ret = (exp);                                           \
            if (ret < 0) {                                             \
                error_report("error creating device tree: %s: %s",   \
                        #exp, fdt_strerror(ret));                      \
                exit(1);                                               \
            }                                                          \
        } while (0)

Harmless shadowing in h_client_architecture_support():

        target_ulong ret;

        [...]

        ret = do_client_architecture_support(cpu, spapr, vec, fdt_bufsize);
        if (ret == H_SUCCESS) {
            _FDT((fdt_pack(spapr->fdt_blob)));
            [...]
        }

        return ret;

However, we can get in trouble when the shadowed variable is used in a
macro argument:

    #define QOBJECT(obj) ({                                 \
        typeof(obj) o = (obj);                              \
        o ? container_of(&(o)->base, QObject, base) : NULL; \
     })

QOBJECT(o) expands into

    ({
--->    typeof(o) o = (o);
        o ? container_of(&(o)->base, QObject, base) : NULL;
    })

Unintended variable name capture at --->.  We'd be saved by
-Winit-self.  But I could certainly construct more elaborate death
traps that don't trigger it.

To reduce the risk of trapping ourselves, we use variable names in
macros that no sane person would use elsewhere.  Here's our actual
definition of QOBJECT():

    #define QOBJECT(obj) ({                                         \
        typeof(obj) _obj = (obj);                                   \
        _obj ? container_of(&(_obj)->base, QObject, base) : NULL;   \
    })

Works well enough until we nest macro calls.  For instance, with

    #define qobject_ref(obj) ({                     \
        typeof(obj) _obj = (obj);                   \
        qobject_ref_impl(QOBJECT(_obj));            \
        _obj;                                       \
    })

the expression qobject_ref(obj) expands into

    ({
        typeof(obj) _obj = (obj);
        qobject_ref_impl(
            ({
--->            typeof(_obj) _obj = (_obj);
                _obj ? container_of(&(_obj)->base, QObject, base) : NULL;
            }));
        _obj;
    })

Unintended variable name capture at --->.

The only reliable way to prevent unintended variable name capture is
-Wshadow.

One blocker for enabling it is shadowing hiding in function-like
macros like

     qdict_put(dict, "name", qobject_ref(...))

qdict_put() wraps its last argument in QOBJECT(), and the last
argument here contains another QOBJECT().

Use dark preprocessor sorcery to make the macros that give us this
problem use different variable names on every call.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/qapi/qmp/qobject.h |  8 +++++---
 include/qemu/atomic.h      | 11 ++++++-----
 include/qemu/osdep.h       | 34 +++++++++++++++++++---------------
 3 files changed, 30 insertions(+), 23 deletions(-)

Comments

Eric Blake Aug. 31, 2023, 2:30 p.m. UTC | #1
On Thu, Aug 31, 2023 at 03:25:46PM +0200, Markus Armbruster wrote:

[This paragraph written last: Bear with my stream of consciousness
review below, where I end up duplicating some of the conslusions you
reached before the point where I saw where the patch was headed]

> Variables declared in macros can shadow other variables.  Much of the
> time, this is harmless, e.g.:
> 
>     #define _FDT(exp)                                                  \
>         do {                                                           \
>             int ret = (exp);                                           \
>             if (ret < 0) {                                             \
>                 error_report("error creating device tree: %s: %s",   \
>                         #exp, fdt_strerror(ret));                      \
>                 exit(1);                                               \
>             }                                                          \
>         } while (0)

Which is why I've seen some projects require a strict namespace
separation: if all macro parameters and any identifiers declared in
macros use either a leading or a trailing _ (I prefer a trailing one,
to avoid risking conflicts with libc reserved namespace; but leading
is usually okay), and all other identifiers avoid that namespace, then
you will never have shadowing by calling a macro from normal code.

> 
> Harmless shadowing in h_client_architecture_support():
> 
>         target_ulong ret;
> 
>         [...]
> 
>         ret = do_client_architecture_support(cpu, spapr, vec, fdt_bufsize);
>         if (ret == H_SUCCESS) {
>             _FDT((fdt_pack(spapr->fdt_blob)));
>             [...]
>         }
> 
>         return ret;
> 
> However, we can get in trouble when the shadowed variable is used in a
> macro argument:
> 
>     #define QOBJECT(obj) ({                                 \
>         typeof(obj) o = (obj);                              \
>         o ? container_of(&(o)->base, QObject, base) : NULL; \
>      })
> 
> QOBJECT(o) expands into
> 
>     ({
> --->    typeof(o) o = (o);
>         o ? container_of(&(o)->base, QObject, base) : NULL;
>     })
> 
> Unintended variable name capture at --->.  We'd be saved by
> -Winit-self.  But I could certainly construct more elaborate death
> traps that don't trigger it.

Indeed, not fully understanding the preprocessor makes for some
interesting death traps.

> 
> To reduce the risk of trapping ourselves, we use variable names in
> macros that no sane person would use elsewhere.  Here's our actual
> definition of QOBJECT():
> 
>     #define QOBJECT(obj) ({                                         \
>         typeof(obj) _obj = (obj);                                   \
>         _obj ? container_of(&(_obj)->base, QObject, base) : NULL;   \
>     })

Yep, goes back to the policy I've seen of enforcing naming conventions
that ensure macros can't shadow normal code.

> 
> Works well enough until we nest macro calls.  For instance, with
> 
>     #define qobject_ref(obj) ({                     \
>         typeof(obj) _obj = (obj);                   \
>         qobject_ref_impl(QOBJECT(_obj));            \
>         _obj;                                       \
>     })
> 
> the expression qobject_ref(obj) expands into
> 
>     ({
>         typeof(obj) _obj = (obj);
>         qobject_ref_impl(
>             ({
> --->            typeof(_obj) _obj = (_obj);
>                 _obj ? container_of(&(_obj)->base, QObject, base) : NULL;
>             }));
>         _obj;
>     })
> 
> Unintended variable name capture at --->.

Yep, you've just proven how macros calling macros requires care, as we
no longer have the namespace protections.  If macro calls can only
form a DAG, it is possible to choose unique intermediate declarations
for every macro (although remembering to do that, and still keeping
the macro definition legible, may not be easy).  But if you can have
macros that form any sort of nesting loop (and we WANT to allow things
like MIN(MIN(a, b), c) - which is such a loop), dynamic naming becomes
the only solution.

> 
> The only reliable way to prevent unintended variable name capture is
> -Wshadow.

Yes, I would love to have that enabled eventually.

> 
> One blocker for enabling it is shadowing hiding in function-like
> macros like
> 
>      qdict_put(dict, "name", qobject_ref(...))
> 
> qdict_put() wraps its last argument in QOBJECT(), and the last
> argument here contains another QOBJECT().
> 
> Use dark preprocessor sorcery to make the macros that give us this
> problem use different variable names on every call.

Sounds foreboding; hopefully not many macros are affected...

> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  include/qapi/qmp/qobject.h |  8 +++++---
>  include/qemu/atomic.h      | 11 ++++++-----
>  include/qemu/osdep.h       | 34 +++++++++++++++++++---------------
>  3 files changed, 30 insertions(+), 23 deletions(-)

...okay, the size of the diffstat seems tolerable (good that we don't
have many macros that expand to a temporary variable declaration and
which are likely to be heavily reused from nested contexts).

> 
> diff --git a/include/qapi/qmp/qobject.h b/include/qapi/qmp/qobject.h
> index 9003b71fd3..7b50fc905d 100644
> --- a/include/qapi/qmp/qobject.h
> +++ b/include/qapi/qmp/qobject.h
> @@ -45,10 +45,12 @@ struct QObject {
>      struct QObjectBase_ base;
>  };
>  
> -#define QOBJECT(obj) ({                                         \
> -    typeof(obj) _obj = (obj);                                   \
> -    _obj ? container_of(&(_obj)->base, QObject, base) : NULL;   \
> +#define QOBJECT_INTERNAL(obj, l) ({                                     \
> +    typeof(obj) PASTE(_obj, l) = (obj);                                 \
> +    PASTE(_obj, l)                                                      \
> +        ? container_of(&(PASTE(_obj, l))->base, QObject, base) : NULL;  \
>  })
> +#define QOBJECT(obj) QOBJECT_INTERNAL((obj), __COUNTER__)

Slick!  Every call to QOBJECT() defines a unique temporary variable
name.  But as written, QOBJECT(o) expands to this (when __COUNTER__ is
1):

({
  typeof((o)) _obj1 = ((o));
  _obj1 ? container_of(&(_obj1)->base, QObject, base) : NULL;
})

which has three sets of redundant parens that could be elided.  Why do
you need to add () around obj when forwarding to QOBJECT_INTERNAL()?
The only way the expansion of the text passed through the 'obj'
parameter can contain a comma is if the user has already parenthesized
it on their end (not that I expect a comma expression to be a frequent
argument to QOBJECT(), but who knows).  Arguing that it is to silence
a syntax checker is weak; since we must NOT add parens around the
parameters to QOBJECT_INTERNAL (calling PASTE((_obj), (l)) is
obviously wrong).

Meanwhile, the repetition of three calls to PASTE() is annoying.  How
about:

#define QOBJECT_INTERNAL_2(_obj, _tmp) ({ \
  typeof _obj _tmp = _obj; \
  _tmp ? container_of(&_tmp->base, QObject, base) : NULL; \
  )}
#define QOBJECT_INTERNAL_1(_arg, _tmp) QOBJECT_INTERNAL_2(_arg, _tmp)
#define QOBJECT_INTERNAL(_arg, _ctr) \
  QOBJECT_INTERNAL_1(_arg, PASTE(_obj, _ctr))
#define QOBJECT(obj) QOBJECT_INTERNAL((obj), __COUNTER__)

or:

#define QOBJECT_INTERNAL_2(_obj, _tmp) ({ \
  typeof(_obj) _tmp = (_obj); \
  _tmp ? container_of(&_tmp->base, QObject, base) : NULL; \
  )}
#define QOBJECT_INTERNAL_1(_arg, _tmp) QOBJECT_INTERNAL_2(_arg, _tmp)
#define QOBJECT_INTERNAL(_arg, _ctr) \
  QOBJECT_INTERNAL_1(_arg, PASTE(_obj, _ctr))
#define QOBJECT(obj) QOBJECT_INTERNAL(obj, __COUNTER__)

where, in either case, QOBJECT(o) should then expand to

({
  typeof (o) _obj1 = (o);
  _obj1 ? container_of(&_obj1->base, QObject, base) : NULL;
})

>  
>  /* Required for qobject_to() */
>  #define QTYPE_CAST_TO_QNull     QTYPE_QNULL
> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
> index d95612f7a0..3f80ffac69 100644
> --- a/include/qemu/atomic.h
> +++ b/include/qemu/atomic.h
> @@ -157,13 +157,14 @@
>      smp_read_barrier_depends();
>  #endif
>  
> -#define qatomic_rcu_read(ptr)                          \
> -    ({                                                 \
> +#define qatomic_rcu_read_internal(ptr, l)               \
> +    ({                                                  \
>      qemu_build_assert(sizeof(*ptr) <= ATOMIC_REG_SIZE); \
> -    typeof_strip_qual(*ptr) _val;                      \
> -    qatomic_rcu_read__nocheck(ptr, &_val);             \
> -    _val;                                              \
> +    typeof_strip_qual(*ptr) PASTE(_val, l);             \
> +    qatomic_rcu_read__nocheck(ptr, &PASTE(_val, l));    \
> +    PASTE(_val, l);                                     \
>      })
> +#define qatomic_rcu_read(ptr) qatomic_rcu_read_internal((ptr), __COUNTER__)

Same observation about being able to reduce the number of PASTE()
calls by adding yet another layer of macro invocations.

>  
>  #define qatomic_rcu_set(ptr, i) do {                   \
>      qemu_build_assert(sizeof(*ptr) <= ATOMIC_REG_SIZE); \
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index 21ef8f1699..9c191ebe99 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -371,18 +371,21 @@ void QEMU_ERROR("code path is reachable")
>   * have to open-code it.  Sadly, Coverity is severely confused by the
>   * constant variants, so we have to dumb things down there.
>   */
> +#define PASTE(a, b) a##b
> +#define MIN_INTERNAL(a, b, l)                                           \
> +    ({                                                                  \
> +        typeof(1 ? (a) : (b)) PASTE(_a, l) = (a), PASTE(_b, l) = (b);   \
> +        PASTE(_a, l) < PASTE(_b, l) ? PASTE(_a, l) : PASTE(_b, l);      \
> +    })

And again.

I think you are definitely on the right track to have all internal
variable declarations within a macro definition use multi-layered
expansion with the help of __COUNTER__ to ensure that the macro's
temporary variable is globally unique; so if you leave it as written,
I could live with:

Reviewed-by: Eric Blake <eblake@redhat.com>

But if you respin it to pick up any of my suggestions, I'll definitely
spend another review cycle on the result.

If it helps, here's what I ended up using in nbdkit:

https://gitlab.com/nbdkit/nbdkit/-/blame/master/common/include/unique-name.h#L36
/* https://stackoverflow.com/a/1597129
 * https://stackoverflow.com/a/12711226
 */
#define XXUNIQUE_NAME(name, counter) name ## counter
#define XUNIQUE_NAME(name, counter) XXUNIQUE_NAME (name, counter)
#define NBDKIT_UNIQUE_NAME(name) XUNIQUE_NAME (name, __COUNTER__)

https://gitlab.com/nbdkit/nbdkit/-/blame/master/common/include/minmax.h#L47
#include "unique-name.h"

#undef MIN
#define MIN(x, y) \
  MIN_1 ((x), (y), NBDKIT_UNIQUE_NAME (_x), NBDKIT_UNIQUE_NAME (_y))

...
#define MIN_1(x, y, _x, _y) ({                 \
      __auto_type _x = (x);                    \
      __auto_type _y = (y);                    \
      _x < _y ? _x : _y;                       \
    })
Richard Henderson Aug. 31, 2023, 3:52 p.m. UTC | #2
On 8/31/23 06:25, Markus Armbruster wrote:
> +#define PASTE(a, b) a##b

We already have glue() in qemu/compiler.h.

The rest of it looks quite sensible.


r~
Markus Armbruster Sept. 1, 2023, 8:12 a.m. UTC | #3
Richard Henderson <richard.henderson@linaro.org> writes:

> On 8/31/23 06:25, Markus Armbruster wrote:
>> +#define PASTE(a, b) a##b
>
> We already have glue() in qemu/compiler.h.

Missed it, will fix.

> The rest of it looks quite sensible.

Thanks!
Markus Armbruster Sept. 1, 2023, 8:48 a.m. UTC | #4
Eric Blake <eblake@redhat.com> writes:

> On Thu, Aug 31, 2023 at 03:25:46PM +0200, Markus Armbruster wrote:
>
> [This paragraph written last: Bear with my stream of consciousness
> review below, where I end up duplicating some of the conslusions you
> reached before the point where I saw where the patch was headed]
>
>> Variables declared in macros can shadow other variables.  Much of the
>> time, this is harmless, e.g.:
>> 
>>     #define _FDT(exp)                                                  \
>>         do {                                                           \
>>             int ret = (exp);                                           \
>>             if (ret < 0) {                                             \
>>                 error_report("error creating device tree: %s: %s",   \
>>                         #exp, fdt_strerror(ret));                      \
>>                 exit(1);                                               \
>>             }                                                          \
>>         } while (0)
>
> Which is why I've seen some projects require a strict namespace
> separation: if all macro parameters and any identifiers declared in
> macros use either a leading or a trailing _ (I prefer a trailing one,
> to avoid risking conflicts with libc reserved namespace; but leading
> is usually okay), and all other identifiers avoid that namespace, then
> you will never have shadowing by calling a macro from normal code.
>
>> 
>> Harmless shadowing in h_client_architecture_support():
>> 
>>         target_ulong ret;
>> 
>>         [...]
>> 
>>         ret = do_client_architecture_support(cpu, spapr, vec, fdt_bufsize);
>>         if (ret == H_SUCCESS) {
>>             _FDT((fdt_pack(spapr->fdt_blob)));
>>             [...]
>>         }
>> 
>>         return ret;
>> 
>> However, we can get in trouble when the shadowed variable is used in a
>> macro argument:
>> 
>>     #define QOBJECT(obj) ({                                 \
>>         typeof(obj) o = (obj);                              \
>>         o ? container_of(&(o)->base, QObject, base) : NULL; \
>>      })
>> 
>> QOBJECT(o) expands into
>> 
>>     ({
>> --->    typeof(o) o = (o);
>>         o ? container_of(&(o)->base, QObject, base) : NULL;
>>     })
>> 
>> Unintended variable name capture at --->.  We'd be saved by
>> -Winit-self.  But I could certainly construct more elaborate death
>> traps that don't trigger it.
>
> Indeed, not fully understanding the preprocessor makes for some
> interesting death traps.

We use ALL_CAPS for macros to signal "watch out for traps".

>> To reduce the risk of trapping ourselves, we use variable names in
>> macros that no sane person would use elsewhere.  Here's our actual
>> definition of QOBJECT():
>> 
>>     #define QOBJECT(obj) ({                                         \
>>         typeof(obj) _obj = (obj);                                   \
>>         _obj ? container_of(&(_obj)->base, QObject, base) : NULL;   \
>>     })
>
> Yep, goes back to the policy I've seen of enforcing naming conventions
> that ensure macros can't shadow normal code.
>
>> 
>> Works well enough until we nest macro calls.  For instance, with
>> 
>>     #define qobject_ref(obj) ({                     \
>>         typeof(obj) _obj = (obj);                   \
>>         qobject_ref_impl(QOBJECT(_obj));            \
>>         _obj;                                       \
>>     })
>> 
>> the expression qobject_ref(obj) expands into
>> 
>>     ({
>>         typeof(obj) _obj = (obj);
>>         qobject_ref_impl(
>>             ({
>> --->            typeof(_obj) _obj = (_obj);
>>                 _obj ? container_of(&(_obj)->base, QObject, base) : NULL;
>>             }));
>>         _obj;
>>     })
>> 
>> Unintended variable name capture at --->.
>
> Yep, you've just proven how macros calling macros requires care, as we
> no longer have the namespace protections.  If macro calls can only
> form a DAG, it is possible to choose unique intermediate declarations
> for every macro (although remembering to do that, and still keeping
> the macro definition legible, may not be easy).  But if you can have
> macros that form any sort of nesting loop (and we WANT to allow things
> like MIN(MIN(a, b), c) - which is such a loop), dynamic naming becomes
> the only solution.
>
>> 
>> The only reliable way to prevent unintended variable name capture is
>> -Wshadow.
>
> Yes, I would love to have that enabled eventually.
>
>> 
>> One blocker for enabling it is shadowing hiding in function-like
>> macros like
>> 
>>      qdict_put(dict, "name", qobject_ref(...))
>> 
>> qdict_put() wraps its last argument in QOBJECT(), and the last
>> argument here contains another QOBJECT().
>> 
>> Use dark preprocessor sorcery to make the macros that give us this
>> problem use different variable names on every call.
>
> Sounds foreboding; hopefully not many macros are affected...
>
>> 
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  include/qapi/qmp/qobject.h |  8 +++++---
>>  include/qemu/atomic.h      | 11 ++++++-----
>>  include/qemu/osdep.h       | 34 +++++++++++++++++++---------------
>>  3 files changed, 30 insertions(+), 23 deletions(-)
>
> ...okay, the size of the diffstat seems tolerable (good that we don't
> have many macros that expand to a temporary variable declaration and
> which are likely to be heavily reused from nested contexts).
>
>> diff --git a/include/qapi/qmp/qobject.h b/include/qapi/qmp/qobject.h
>> index 9003b71fd3..7b50fc905d 100644
>> --- a/include/qapi/qmp/qobject.h
>> +++ b/include/qapi/qmp/qobject.h
>> @@ -45,10 +45,12 @@ struct QObject {
>>      struct QObjectBase_ base;
>>  };
>>  
>> -#define QOBJECT(obj) ({                                         \
>> -    typeof(obj) _obj = (obj);                                   \
>> -    _obj ? container_of(&(_obj)->base, QObject, base) : NULL;   \
>> +#define QOBJECT_INTERNAL(obj, l) ({                                     \
>> +    typeof(obj) PASTE(_obj, l) = (obj);                                 \
>> +    PASTE(_obj, l)                                                      \
>> +        ? container_of(&(PASTE(_obj, l))->base, QObject, base) : NULL;  \
>>  })
>> +#define QOBJECT(obj) QOBJECT_INTERNAL((obj), __COUNTER__)
>
> Slick!  Every call to QOBJECT() defines a unique temporary variable
> name.  But as written, QOBJECT(o) expands to this (when __COUNTER__ is
> 1):
>
> ({
>   typeof((o)) _obj1 = ((o));
>   _obj1 ? container_of(&(_obj1)->base, QObject, base) : NULL;
> })
>
> which has three sets of redundant parens that could be elided.  Why do
> you need to add () around obj when forwarding to QOBJECT_INTERNAL()?

Habit: enclose macro arguments in parenthesis unless there's a specific
need not to.

> The only way the expansion of the text passed through the 'obj'
> parameter can contain a comma is if the user has already parenthesized
> it on their end (not that I expect a comma expression to be a frequent
> argument to QOBJECT(), but who knows).  Arguing that it is to silence
> a syntax checker is weak; since we must NOT add parens around the
> parameters to QOBJECT_INTERNAL (calling PASTE((_obj), (l)) is
> obviously wrong).
>
> Meanwhile, the repetition of three calls to PASTE() is annoying.  How
> about:
>
> #define QOBJECT_INTERNAL_2(_obj, _tmp) ({ \
>   typeof _obj _tmp = _obj; \
>   _tmp ? container_of(&_tmp->base, QObject, base) : NULL; \
>   )}
> #define QOBJECT_INTERNAL_1(_arg, _tmp) QOBJECT_INTERNAL_2(_arg, _tmp)
> #define QOBJECT_INTERNAL(_arg, _ctr) \
>   QOBJECT_INTERNAL_1(_arg, PASTE(_obj, _ctr))
> #define QOBJECT(obj) QOBJECT_INTERNAL((obj), __COUNTER__)
>
> or:
>
> #define QOBJECT_INTERNAL_2(_obj, _tmp) ({ \
>   typeof(_obj) _tmp = (_obj); \
>   _tmp ? container_of(&_tmp->base, QObject, base) : NULL; \
>   )}
> #define QOBJECT_INTERNAL_1(_arg, _tmp) QOBJECT_INTERNAL_2(_arg, _tmp)
> #define QOBJECT_INTERNAL(_arg, _ctr) \
>   QOBJECT_INTERNAL_1(_arg, PASTE(_obj, _ctr))
> #define QOBJECT(obj) QOBJECT_INTERNAL(obj, __COUNTER__)
>
> where, in either case, QOBJECT(o) should then expand to
>
> ({
>   typeof (o) _obj1 = (o);
>   _obj1 ? container_of(&_obj1->base, QObject, base) : NULL;
> })

The idea is to have the innermost macro take the variable name instead
of the counter.  "PASTE(_obj, l)" then becomes "_tmp" there, which is
more legible.  However, we pay for it by going through two more macros.

Can you explain why you need two more?

Complication: the innermost macro needs to take all its variable names
as arguments.  The macro above has just one.  Macros below have two.

Not quite sure which version is easier to understand.

>>  /* Required for qobject_to() */
>>  #define QTYPE_CAST_TO_QNull     QTYPE_QNULL
>> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
>> index d95612f7a0..3f80ffac69 100644
>> --- a/include/qemu/atomic.h
>> +++ b/include/qemu/atomic.h
>> @@ -157,13 +157,14 @@
>>      smp_read_barrier_depends();
>>  #endif
>>  
>> -#define qatomic_rcu_read(ptr)                          \
>> -    ({                                                 \
>> +#define qatomic_rcu_read_internal(ptr, l)               \
>> +    ({                                                  \
>>      qemu_build_assert(sizeof(*ptr) <= ATOMIC_REG_SIZE); \
>> -    typeof_strip_qual(*ptr) _val;                      \
>> -    qatomic_rcu_read__nocheck(ptr, &_val);             \
>> -    _val;                                              \
>> +    typeof_strip_qual(*ptr) PASTE(_val, l);             \
>> +    qatomic_rcu_read__nocheck(ptr, &PASTE(_val, l));    \
>> +    PASTE(_val, l);                                     \
>>      })
>> +#define qatomic_rcu_read(ptr) qatomic_rcu_read_internal((ptr), __COUNTER__)
>
> Same observation about being able to reduce the number of PASTE()
> calls by adding yet another layer of macro invocations.
>
>>  
>>  #define qatomic_rcu_set(ptr, i) do {                   \
>>      qemu_build_assert(sizeof(*ptr) <= ATOMIC_REG_SIZE); \
>> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
>> index 21ef8f1699..9c191ebe99 100644
>> --- a/include/qemu/osdep.h
>> +++ b/include/qemu/osdep.h
>> @@ -371,18 +371,21 @@ void QEMU_ERROR("code path is reachable")
>>   * have to open-code it.  Sadly, Coverity is severely confused by the
>>   * constant variants, so we have to dumb things down there.
>>   */
>> +#define PASTE(a, b) a##b
>> +#define MIN_INTERNAL(a, b, l)                                           \
>> +    ({                                                                  \
>> +        typeof(1 ? (a) : (b)) PASTE(_a, l) = (a), PASTE(_b, l) = (b);   \
>> +        PASTE(_a, l) < PASTE(_b, l) ? PASTE(_a, l) : PASTE(_b, l);      \
>> +    })
>
> And again.
>
> I think you are definitely on the right track to have all internal
> variable declarations within a macro definition use multi-layered
> expansion with the help of __COUNTER__ to ensure that the macro's
> temporary variable is globally unique; so if you leave it as written,
> I could live with:
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
>
> But if you respin it to pick up any of my suggestions, I'll definitely
> spend another review cycle on the result.
>
> If it helps, here's what I ended up using in nbdkit:
>
> https://gitlab.com/nbdkit/nbdkit/-/blame/master/common/include/unique-name.h#L36
> /* https://stackoverflow.com/a/1597129
>  * https://stackoverflow.com/a/12711226
>  */
> #define XXUNIQUE_NAME(name, counter) name ## counter
> #define XUNIQUE_NAME(name, counter) XXUNIQUE_NAME (name, counter)
> #define NBDKIT_UNIQUE_NAME(name) XUNIQUE_NAME (name, __COUNTER__)
>
> https://gitlab.com/nbdkit/nbdkit/-/blame/master/common/include/minmax.h#L47
> #include "unique-name.h"
>
> #undef MIN
> #define MIN(x, y) \
>   MIN_1 ((x), (y), NBDKIT_UNIQUE_NAME (_x), NBDKIT_UNIQUE_NAME (_y))
>
> ...
> #define MIN_1(x, y, _x, _y) ({                 \
>       __auto_type _x = (x);                    \
>       __auto_type _y = (y);                    \
>       _x < _y ? _x : _y;                       \
>     })

Thanks!
Cédric Le Goater Sept. 1, 2023, 12:59 p.m. UTC | #5
On 8/31/23 16:30, Eric Blake wrote:
> On Thu, Aug 31, 2023 at 03:25:46PM +0200, Markus Armbruster wrote:
> 
> [This paragraph written last: Bear with my stream of consciousness
> review below, where I end up duplicating some of the conslusions you
> reached before the point where I saw where the patch was headed]
> 
>> Variables declared in macros can shadow other variables.  Much of the
>> time, this is harmless, e.g.:
>>
>>      #define _FDT(exp)                                                  \
>>          do {                                                           \
>>              int ret = (exp);                                           \
>>              if (ret < 0) {                                             \
>>                  error_report("error creating device tree: %s: %s",   \
>>                          #exp, fdt_strerror(ret));                      \
>>                  exit(1);                                               \
>>              }                                                          \
>>          } while (0)
> 
> Which is why I've seen some projects require a strict namespace
> separation: if all macro parameters and any identifiers declared in
> macros use either a leading or a trailing _ (I prefer a trailing one,
> to avoid risking conflicts with libc reserved namespace; but leading
> is usually okay), and all other identifiers avoid that namespace, then
> you will never have shadowing by calling a macro from normal code.

I started fixing the _FDT() macro since it is quite noisy at compile.
Same for qemu_fdt_setprop_cells(). So are we ok with names like 'ret_'
and 'i_' ? I used a 'local_' prefix for now but I can change.

I also have a bunch of fixes for ppc.

C.

> 
>>
>> Harmless shadowing in h_client_architecture_support():
>>
>>          target_ulong ret;
>>
>>          [...]
>>
>>          ret = do_client_architecture_support(cpu, spapr, vec, fdt_bufsize);
>>          if (ret == H_SUCCESS) {
>>              _FDT((fdt_pack(spapr->fdt_blob)));
>>              [...]
>>          }
>>
>>          return ret;
>>
>> However, we can get in trouble when the shadowed variable is used in a
>> macro argument:
>>
>>      #define QOBJECT(obj) ({                                 \
>>          typeof(obj) o = (obj);                              \
>>          o ? container_of(&(o)->base, QObject, base) : NULL; \
>>       })
>>
>> QOBJECT(o) expands into
>>
>>      ({
>> --->    typeof(o) o = (o);
>>          o ? container_of(&(o)->base, QObject, base) : NULL;
>>      })
>>
>> Unintended variable name capture at --->.  We'd be saved by
>> -Winit-self.  But I could certainly construct more elaborate death
>> traps that don't trigger it.
> 
> Indeed, not fully understanding the preprocessor makes for some
> interesting death traps.
> 
>>
>> To reduce the risk of trapping ourselves, we use variable names in
>> macros that no sane person would use elsewhere.  Here's our actual
>> definition of QOBJECT():
>>
>>      #define QOBJECT(obj) ({                                         \
>>          typeof(obj) _obj = (obj);                                   \
>>          _obj ? container_of(&(_obj)->base, QObject, base) : NULL;   \
>>      })
> 
> Yep, goes back to the policy I've seen of enforcing naming conventions
> that ensure macros can't shadow normal code.
> 
>>
>> Works well enough until we nest macro calls.  For instance, with
>>
>>      #define qobject_ref(obj) ({                     \
>>          typeof(obj) _obj = (obj);                   \
>>          qobject_ref_impl(QOBJECT(_obj));            \
>>          _obj;                                       \
>>      })
>>
>> the expression qobject_ref(obj) expands into
>>
>>      ({
>>          typeof(obj) _obj = (obj);
>>          qobject_ref_impl(
>>              ({
>> --->            typeof(_obj) _obj = (_obj);
>>                  _obj ? container_of(&(_obj)->base, QObject, base) : NULL;
>>              }));
>>          _obj;
>>      })
>>
>> Unintended variable name capture at --->.
> 
> Yep, you've just proven how macros calling macros requires care, as we
> no longer have the namespace protections.  If macro calls can only
> form a DAG, it is possible to choose unique intermediate declarations
> for every macro (although remembering to do that, and still keeping
> the macro definition legible, may not be easy).  But if you can have
> macros that form any sort of nesting loop (and we WANT to allow things
> like MIN(MIN(a, b), c) - which is such a loop), dynamic naming becomes
> the only solution.
> 
>>
>> The only reliable way to prevent unintended variable name capture is
>> -Wshadow.
> 
> Yes, I would love to have that enabled eventually.
> 
>>
>> One blocker for enabling it is shadowing hiding in function-like
>> macros like
>>
>>       qdict_put(dict, "name", qobject_ref(...))
>>
>> qdict_put() wraps its last argument in QOBJECT(), and the last
>> argument here contains another QOBJECT().
>>
>> Use dark preprocessor sorcery to make the macros that give us this
>> problem use different variable names on every call.
> 
> Sounds foreboding; hopefully not many macros are affected...
> 
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   include/qapi/qmp/qobject.h |  8 +++++---
>>   include/qemu/atomic.h      | 11 ++++++-----
>>   include/qemu/osdep.h       | 34 +++++++++++++++++++---------------
>>   3 files changed, 30 insertions(+), 23 deletions(-)
> 
> ...okay, the size of the diffstat seems tolerable (good that we don't
> have many macros that expand to a temporary variable declaration and
> which are likely to be heavily reused from nested contexts).
> 
>>
>> diff --git a/include/qapi/qmp/qobject.h b/include/qapi/qmp/qobject.h
>> index 9003b71fd3..7b50fc905d 100644
>> --- a/include/qapi/qmp/qobject.h
>> +++ b/include/qapi/qmp/qobject.h
>> @@ -45,10 +45,12 @@ struct QObject {
>>       struct QObjectBase_ base;
>>   };
>>   
>> -#define QOBJECT(obj) ({                                         \
>> -    typeof(obj) _obj = (obj);                                   \
>> -    _obj ? container_of(&(_obj)->base, QObject, base) : NULL;   \
>> +#define QOBJECT_INTERNAL(obj, l) ({                                     \
>> +    typeof(obj) PASTE(_obj, l) = (obj);                                 \
>> +    PASTE(_obj, l)                                                      \
>> +        ? container_of(&(PASTE(_obj, l))->base, QObject, base) : NULL;  \
>>   })
>> +#define QOBJECT(obj) QOBJECT_INTERNAL((obj), __COUNTER__)
> 
> Slick!  Every call to QOBJECT() defines a unique temporary variable
> name.  But as written, QOBJECT(o) expands to this (when __COUNTER__ is
> 1):
> 
> ({
>    typeof((o)) _obj1 = ((o));
>    _obj1 ? container_of(&(_obj1)->base, QObject, base) : NULL;
> })
> 
> which has three sets of redundant parens that could be elided.  Why do
> you need to add () around obj when forwarding to QOBJECT_INTERNAL()?
> The only way the expansion of the text passed through the 'obj'
> parameter can contain a comma is if the user has already parenthesized
> it on their end (not that I expect a comma expression to be a frequent
> argument to QOBJECT(), but who knows).  Arguing that it is to silence
> a syntax checker is weak; since we must NOT add parens around the
> parameters to QOBJECT_INTERNAL (calling PASTE((_obj), (l)) is
> obviously wrong).
> 
> Meanwhile, the repetition of three calls to PASTE() is annoying.  How
> about:
> 
> #define QOBJECT_INTERNAL_2(_obj, _tmp) ({ \
>    typeof _obj _tmp = _obj; \
>    _tmp ? container_of(&_tmp->base, QObject, base) : NULL; \
>    )}
> #define QOBJECT_INTERNAL_1(_arg, _tmp) QOBJECT_INTERNAL_2(_arg, _tmp)
> #define QOBJECT_INTERNAL(_arg, _ctr) \
>    QOBJECT_INTERNAL_1(_arg, PASTE(_obj, _ctr))
> #define QOBJECT(obj) QOBJECT_INTERNAL((obj), __COUNTER__)
> 
> or:
> 
> #define QOBJECT_INTERNAL_2(_obj, _tmp) ({ \
>    typeof(_obj) _tmp = (_obj); \
>    _tmp ? container_of(&_tmp->base, QObject, base) : NULL; \
>    )}
> #define QOBJECT_INTERNAL_1(_arg, _tmp) QOBJECT_INTERNAL_2(_arg, _tmp)
> #define QOBJECT_INTERNAL(_arg, _ctr) \
>    QOBJECT_INTERNAL_1(_arg, PASTE(_obj, _ctr))
> #define QOBJECT(obj) QOBJECT_INTERNAL(obj, __COUNTER__)
> 
> where, in either case, QOBJECT(o) should then expand to
> 
> ({
>    typeof (o) _obj1 = (o);
>    _obj1 ? container_of(&_obj1->base, QObject, base) : NULL;
> })
> 
>>   
>>   /* Required for qobject_to() */
>>   #define QTYPE_CAST_TO_QNull     QTYPE_QNULL
>> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
>> index d95612f7a0..3f80ffac69 100644
>> --- a/include/qemu/atomic.h
>> +++ b/include/qemu/atomic.h
>> @@ -157,13 +157,14 @@
>>       smp_read_barrier_depends();
>>   #endif
>>   
>> -#define qatomic_rcu_read(ptr)                          \
>> -    ({                                                 \
>> +#define qatomic_rcu_read_internal(ptr, l)               \
>> +    ({                                                  \
>>       qemu_build_assert(sizeof(*ptr) <= ATOMIC_REG_SIZE); \
>> -    typeof_strip_qual(*ptr) _val;                      \
>> -    qatomic_rcu_read__nocheck(ptr, &_val);             \
>> -    _val;                                              \
>> +    typeof_strip_qual(*ptr) PASTE(_val, l);             \
>> +    qatomic_rcu_read__nocheck(ptr, &PASTE(_val, l));    \
>> +    PASTE(_val, l);                                     \
>>       })
>> +#define qatomic_rcu_read(ptr) qatomic_rcu_read_internal((ptr), __COUNTER__)
> 
> Same observation about being able to reduce the number of PASTE()
> calls by adding yet another layer of macro invocations.
> 
>>   
>>   #define qatomic_rcu_set(ptr, i) do {                   \
>>       qemu_build_assert(sizeof(*ptr) <= ATOMIC_REG_SIZE); \
>> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
>> index 21ef8f1699..9c191ebe99 100644
>> --- a/include/qemu/osdep.h
>> +++ b/include/qemu/osdep.h
>> @@ -371,18 +371,21 @@ void QEMU_ERROR("code path is reachable")
>>    * have to open-code it.  Sadly, Coverity is severely confused by the
>>    * constant variants, so we have to dumb things down there.
>>    */
>> +#define PASTE(a, b) a##b
>> +#define MIN_INTERNAL(a, b, l)                                           \
>> +    ({                                                                  \
>> +        typeof(1 ? (a) : (b)) PASTE(_a, l) = (a), PASTE(_b, l) = (b);   \
>> +        PASTE(_a, l) < PASTE(_b, l) ? PASTE(_a, l) : PASTE(_b, l);      \
>> +    })
> 
> And again.
> 
> I think you are definitely on the right track to have all internal
> variable declarations within a macro definition use multi-layered
> expansion with the help of __COUNTER__ to ensure that the macro's
> temporary variable is globally unique; so if you leave it as written,
> I could live with:
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 
> But if you respin it to pick up any of my suggestions, I'll definitely
> spend another review cycle on the result.
> 
> If it helps, here's what I ended up using in nbdkit:
> 
> https://gitlab.com/nbdkit/nbdkit/-/blame/master/common/include/unique-name.h#L36
> /* https://stackoverflow.com/a/1597129
>   * https://stackoverflow.com/a/12711226
>   */
> #define XXUNIQUE_NAME(name, counter) name ## counter
> #define XUNIQUE_NAME(name, counter) XXUNIQUE_NAME (name, counter)
> #define NBDKIT_UNIQUE_NAME(name) XUNIQUE_NAME (name, __COUNTER__)
> 
> https://gitlab.com/nbdkit/nbdkit/-/blame/master/common/include/minmax.h#L47
> #include "unique-name.h"
> 
> #undef MIN
> #define MIN(x, y) \
>    MIN_1 ((x), (y), NBDKIT_UNIQUE_NAME (_x), NBDKIT_UNIQUE_NAME (_y))
> 
> ...
> #define MIN_1(x, y, _x, _y) ({                 \
>        __auto_type _x = (x);                    \
>        __auto_type _y = (y);                    \
>        _x < _y ? _x : _y;                       \
>      })
>
Eric Blake Sept. 1, 2023, 1:18 p.m. UTC | #6
On Fri, Sep 01, 2023 at 10:48:26AM +0200, Markus Armbruster wrote:
> > Indeed, not fully understanding the preprocessor makes for some
> > interesting death traps.
> 
> We use ALL_CAPS for macros to signal "watch out for traps".
> 

> >> -#define QOBJECT(obj) ({                                         \
> >> -    typeof(obj) _obj = (obj);                                   \
> >> -    _obj ? container_of(&(_obj)->base, QObject, base) : NULL;   \
> >> +#define QOBJECT_INTERNAL(obj, l) ({                                     \
> >> +    typeof(obj) PASTE(_obj, l) = (obj);                                 \
> >> +    PASTE(_obj, l)                                                      \
> >> +        ? container_of(&(PASTE(_obj, l))->base, QObject, base) : NULL;  \
> >>  })
> >> +#define QOBJECT(obj) QOBJECT_INTERNAL((obj), __COUNTER__)
> >
> > Slick!  Every call to QOBJECT() defines a unique temporary variable
> > name.  But as written, QOBJECT(o) expands to this (when __COUNTER__ is
> > 1):
> >
> > ({
> >   typeof((o)) _obj1 = ((o));
> >   _obj1 ? container_of(&(_obj1)->base, QObject, base) : NULL;
> > })
> >
> > which has three sets of redundant parens that could be elided.  Why do
> > you need to add () around obj when forwarding to QOBJECT_INTERNAL()?
> 
> Habit: enclose macro arguments in parenthesis unless there's a specific
> need not to.
> 
> > The only way the expansion of the text passed through the 'obj'
> > parameter can contain a comma is if the user has already parenthesized
> > it on their end (not that I expect a comma expression to be a frequent
> > argument to QOBJECT(), but who knows).  Arguing that it is to silence
> > a syntax checker is weak; since we must NOT add parens around the
> > parameters to QOBJECT_INTERNAL (calling PASTE((_obj), (l)) is
> > obviously wrong).
> >
> > Meanwhile, the repetition of three calls to PASTE() is annoying.  How
> > about:
> >
> > #define QOBJECT_INTERNAL_2(_obj, _tmp) ({ \
> >   typeof _obj _tmp = _obj; \
> >   _tmp ? container_of(&_tmp->base, QObject, base) : NULL; \
> >   )}
> > #define QOBJECT_INTERNAL_1(_arg, _tmp) QOBJECT_INTERNAL_2(_arg, _tmp)
> > #define QOBJECT_INTERNAL(_arg, _ctr) \
> >   QOBJECT_INTERNAL_1(_arg, PASTE(_obj, _ctr))
> > #define QOBJECT(obj) QOBJECT_INTERNAL((obj), __COUNTER__)
> >
> > or:
> >
> > #define QOBJECT_INTERNAL_2(_obj, _tmp) ({ \
> >   typeof(_obj) _tmp = (_obj); \
> >   _tmp ? container_of(&_tmp->base, QObject, base) : NULL; \
> >   )}
> > #define QOBJECT_INTERNAL_1(_arg, _tmp) QOBJECT_INTERNAL_2(_arg, _tmp)
> > #define QOBJECT_INTERNAL(_arg, _ctr) \
> >   QOBJECT_INTERNAL_1(_arg, PASTE(_obj, _ctr))
> > #define QOBJECT(obj) QOBJECT_INTERNAL(obj, __COUNTER__)
> >
> > where, in either case, QOBJECT(o) should then expand to
> >
> > ({
> >   typeof (o) _obj1 = (o);
> >   _obj1 ? container_of(&_obj1->base, QObject, base) : NULL;
> > })
> 
> The idea is to have the innermost macro take the variable name instead
> of the counter.  "PASTE(_obj, l)" then becomes "_tmp" there, which is
> more legible.  However, we pay for it by going through two more macros.
> 
> Can you explain why you need two more?

Likewise habit, on my part (and laziness - I wrote the previous email
without testing anything). I've learned over the years that pasting is
hard (you can't mix ## with a macro that you want expanded without 2
layers of indirection), so I started out by adding two layers of
QOBJECT_INTERNAL, then wrote QOBJECT to forward to QOBJECT_INTERNAL.
But now that you asked, I actually spent the time to test with the
preprocessor, and your hunch is indeed correct: I over-compensated
becaues of my habit.

$cat foo.c
#define PASTE(_a, _b) _a##_b

#define QOBJECT_INTERNAL_2(_obj, _tmp) ({       \
  typeof _obj _tmp = _obj; \
  _tmp ? container_of(&_tmp->base, QObject, base) : NULL; \
  )}
#define QOBJECT_INTERNAL_1(_arg, _tmp) QOBJECT_INTERNAL_2(_arg, _tmp)
#define QOBJECT_INTERNAL(_arg, _ctr) \
  QOBJECT_INTERNAL_1(_arg, PASTE(_obj, _ctr))
#define QOBJECT(obj) QOBJECT_INTERNAL((obj), __COUNTER__)

QOBJECT(o)

#define Q_INTERNAL_1(_obj, _tmp) ({ \
  typeof _obj _tmp = _obj; \
  _tmp ? container_of(&_tmp->base, QObject, base) : NULL; \
  )}
#define Q_INTERNAL(_arg, _ctr) \
  Q_INTERNAL_1(_arg, PASTE(_obj, _ctr))
#define Q(obj) Q_INTERNAL((obj), __COUNTER__)

Q(o)
$ gcc -E foo.c
# 0 "foo.c"
# 0 "<built-in>"
# 0 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 0 "<command-line>" 2
# 1 "foo.c"
# 12 "foo.c"
({ typeof (o) _obj0 = (o); _obj0 ? container_of(&_obj0->base, QObject, base) : NULL; )}
# 22 "foo.c"
({ typeof (o) _obj1 = (o); _obj1 ? container_of(&_obj1->base, QObject, base) : NULL; )}

So the important part was merely ensuring that __COUNTER__ has a
chance to be expanded PRIOR to the point where ## gets its argument,
and one less layer of indirection was still sufficient for that.

> 
> Complication: the innermost macro needs to take all its variable names
> as arguments.  The macro above has just one.  Macros below have two.
> 
> Not quite sure which version is easier to understand.

Proper comments may help; once you realize that you are passing in the
unique variable name(s) to be used as the temporary identifier(s),
rather than an integer that still needs to be glued, then separating
the task of generating name(s) (which is done once per name, instead
of repeated 3 times) makes sense to me.  I also like minimizing the
number of times I have to spell out __COUNTER__; wrapping unique name
generation behind a well-named helper macro gives a better view of why
it is needed in the first place.

At any rate, this comment still applies:

> >
> > I think you are definitely on the right track to have all internal
> > variable declarations within a macro definition use multi-layered
> > expansion with the help of __COUNTER__ to ensure that the macro's
> > temporary variable is globally unique; so if you leave it as written,
> > I could live with:
> >
> > Reviewed-by: Eric Blake <eblake@redhat.com>
> >
> > But if you respin it to pick up any of my suggestions, I'll definitely
> > spend another review cycle on the result.
> >
> > If it helps, here's what I ended up using in nbdkit:
> >
> > https://gitlab.com/nbdkit/nbdkit/-/blame/master/common/include/unique-name.h#L36
> > /* https://stackoverflow.com/a/1597129
> >  * https://stackoverflow.com/a/12711226
> >  */
> > #define XXUNIQUE_NAME(name, counter) name ## counter
> > #define XUNIQUE_NAME(name, counter) XXUNIQUE_NAME (name, counter)
> > #define NBDKIT_UNIQUE_NAME(name) XUNIQUE_NAME (name, __COUNTER__)
> >
> > https://gitlab.com/nbdkit/nbdkit/-/blame/master/common/include/minmax.h#L47
> > #include "unique-name.h"
> >
> > #undef MIN
> > #define MIN(x, y) \
> >   MIN_1 ((x), (y), NBDKIT_UNIQUE_NAME (_x), NBDKIT_UNIQUE_NAME (_y))
> >
> > ...
> > #define MIN_1(x, y, _x, _y) ({                 \
> >       __auto_type _x = (x);                    \
> >       __auto_type _y = (y);                    \
> >       _x < _y ? _x : _y;                       \
> >     })
> 
> Thanks!

and so I'm fine leaving it up to you if you want to copy the paradigm
I've seen elsewhere, or if you want to leave it as you first proposed.
The end result is the same, and it is more of a judgment call on which
form is easier for you to reason about.

But as other commenters mentioned, we already have a glue() macro, so
use that instead of PASTE(), no matter what else you choose.

Looking forward to v2!
Philippe Mathieu-Daudé Sept. 1, 2023, 2:31 p.m. UTC | #7
On 1/9/23 14:59, Cédric Le Goater wrote:
> On 8/31/23 16:30, Eric Blake wrote:
>> On Thu, Aug 31, 2023 at 03:25:46PM +0200, Markus Armbruster wrote:
>>
>> [This paragraph written last: Bear with my stream of consciousness
>> review below, where I end up duplicating some of the conslusions you
>> reached before the point where I saw where the patch was headed]
>>
>>> Variables declared in macros can shadow other variables.  Much of the
>>> time, this is harmless, e.g.:
>>>
>>>      #define 
>>> _FDT(exp)                                                  \
>>>          do 
>>> {                                                           \
>>>              int ret = 
>>> (exp);                                           \
>>>              if (ret < 0) 
>>> {                                             \
>>>                  error_report("error creating device tree: %s: %s",   \
>>>                          #exp, 
>>> fdt_strerror(ret));                      \
>>>                  
>>> exit(1);                                               \
>>>              
>>> }                                                          \
>>>          } while (0)
>>
>> Which is why I've seen some projects require a strict namespace
>> separation: if all macro parameters and any identifiers declared in
>> macros use either a leading or a trailing _ (I prefer a trailing one,
>> to avoid risking conflicts with libc reserved namespace; but leading
>> is usually okay), and all other identifiers avoid that namespace, then
>> you will never have shadowing by calling a macro from normal code.
> 
> I started fixing the _FDT() macro since it is quite noisy at compile.
> Same for qemu_fdt_setprop_cells(). So are we ok with names like 'ret_'
> and 'i_' ? I used a 'local_' prefix for now but I can change.

See 
https://lore.kernel.org/qemu-devel/20230831225607.30829-12-philmd@linaro.org/
Markus Armbruster Sept. 1, 2023, 2:50 p.m. UTC | #8
Cédric Le Goater <clg@kaod.org> writes:

> On 8/31/23 16:30, Eric Blake wrote:
>> On Thu, Aug 31, 2023 at 03:25:46PM +0200, Markus Armbruster wrote:
>> [This paragraph written last: Bear with my stream of consciousness
>> review below, where I end up duplicating some of the conslusions you
>> reached before the point where I saw where the patch was headed]
>> 
>>> Variables declared in macros can shadow other variables.  Much of the
>>> time, this is harmless, e.g.:
>>>
>>>      #define _FDT(exp)                                                  \
>>>          do {                                                           \
>>>              int ret = (exp);                                           \
>>>              if (ret < 0) {                                             \
>>>                  error_report("error creating device tree: %s: %s",   \
>>>                          #exp, fdt_strerror(ret));                      \
>>>                  exit(1);                                               \
>>>              }                                                          \
>>>          } while (0)
>> Which is why I've seen some projects require a strict namespace
>> separation: if all macro parameters and any identifiers declared in
>> macros use either a leading or a trailing _ (I prefer a trailing one,
>> to avoid risking conflicts with libc reserved namespace; but leading
>> is usually okay), and all other identifiers avoid that namespace, then
>> you will never have shadowing by calling a macro from normal code.
>
> I started fixing the _FDT() macro since it is quite noisy at compile.
> Same for qemu_fdt_setprop_cells(). So are we ok with names like 'ret_'
> and 'i_' ? I used a 'local_' prefix for now but I can change.

I believe identifiers with a '_' suffix are just fine in macros.  We
have quite a few of them already.

> I also have a bunch of fixes for ppc.

Appreciated!
Cédric Le Goater Sept. 1, 2023, 2:54 p.m. UTC | #9
On 9/1/23 16:50, Markus Armbruster wrote:
> Cédric Le Goater <clg@kaod.org> writes:
> 
>> On 8/31/23 16:30, Eric Blake wrote:
>>> On Thu, Aug 31, 2023 at 03:25:46PM +0200, Markus Armbruster wrote:
>>> [This paragraph written last: Bear with my stream of consciousness
>>> review below, where I end up duplicating some of the conslusions you
>>> reached before the point where I saw where the patch was headed]
>>>
>>>> Variables declared in macros can shadow other variables.  Much of the
>>>> time, this is harmless, e.g.:
>>>>
>>>>       #define _FDT(exp)                                                  \
>>>>           do {                                                           \
>>>>               int ret = (exp);                                           \
>>>>               if (ret < 0) {                                             \
>>>>                   error_report("error creating device tree: %s: %s",   \
>>>>                           #exp, fdt_strerror(ret));                      \
>>>>                   exit(1);                                               \
>>>>               }                                                          \
>>>>           } while (0)
>>> Which is why I've seen some projects require a strict namespace
>>> separation: if all macro parameters and any identifiers declared in
>>> macros use either a leading or a trailing _ (I prefer a trailing one,
>>> to avoid risking conflicts with libc reserved namespace; but leading
>>> is usually okay), and all other identifiers avoid that namespace, then
>>> you will never have shadowing by calling a macro from normal code.
>>
>> I started fixing the _FDT() macro since it is quite noisy at compile.
>> Same for qemu_fdt_setprop_cells(). So are we ok with names like 'ret_'
>> and 'i_' ? I used a 'local_' prefix for now but I can change.
> 
> I believe identifiers with a '_' suffix are just fine in macros.  We
> have quite a few of them already.

ok

>> I also have a bunch of fixes for ppc.
> 
> Appreciated!

count me on for the ppc files :

  hw/ppc/pnv_psi.c
  hw/ppc/spapr.c
  hw/ppc/spapr_drc.c
  hw/ppc/spapr_pci.c
  include/hw/ppc/fdt.h

and surely some other files under target/ppc/

This one was taken care of by Phil:

  include/sysemu/device_tree.h

C.
Markus Armbruster Sept. 19, 2023, 6:29 a.m. UTC | #10
Eric Blake <eblake@redhat.com> writes:

> On Fri, Sep 01, 2023 at 10:48:26AM +0200, Markus Armbruster wrote:
>> > Indeed, not fully understanding the preprocessor makes for some
>> > interesting death traps.
>> 
>> We use ALL_CAPS for macros to signal "watch out for traps".
>> 
>
>> >> -#define QOBJECT(obj) ({                                         \
>> >> -    typeof(obj) _obj = (obj);                                   \
>> >> -    _obj ? container_of(&(_obj)->base, QObject, base) : NULL;   \
>> >> +#define QOBJECT_INTERNAL(obj, l) ({                                     \
>> >> +    typeof(obj) PASTE(_obj, l) = (obj);                                 \
>> >> +    PASTE(_obj, l)                                                      \
>> >> +        ? container_of(&(PASTE(_obj, l))->base, QObject, base) : NULL;  \
>> >>  })
>> >> +#define QOBJECT(obj) QOBJECT_INTERNAL((obj), __COUNTER__)
>> >
>> > Slick!  Every call to QOBJECT() defines a unique temporary variable
>> > name.  But as written, QOBJECT(o) expands to this (when __COUNTER__ is
>> > 1):
>> >
>> > ({
>> >   typeof((o)) _obj1 = ((o));
>> >   _obj1 ? container_of(&(_obj1)->base, QObject, base) : NULL;
>> > })
>> >
>> > which has three sets of redundant parens that could be elided.  Why do
>> > you need to add () around obj when forwarding to QOBJECT_INTERNAL()?
>> 
>> Habit: enclose macro arguments in parenthesis unless there's a specific
>> need not to.
>> 
>> > The only way the expansion of the text passed through the 'obj'
>> > parameter can contain a comma is if the user has already parenthesized
>> > it on their end (not that I expect a comma expression to be a frequent
>> > argument to QOBJECT(), but who knows).  Arguing that it is to silence
>> > a syntax checker is weak; since we must NOT add parens around the
>> > parameters to QOBJECT_INTERNAL (calling PASTE((_obj), (l)) is
>> > obviously wrong).
>> >
>> > Meanwhile, the repetition of three calls to PASTE() is annoying.  How
>> > about:
>> >
>> > #define QOBJECT_INTERNAL_2(_obj, _tmp) ({ \
>> >   typeof _obj _tmp = _obj; \
>> >   _tmp ? container_of(&_tmp->base, QObject, base) : NULL; \
>> >   )}
>> > #define QOBJECT_INTERNAL_1(_arg, _tmp) QOBJECT_INTERNAL_2(_arg, _tmp)
>> > #define QOBJECT_INTERNAL(_arg, _ctr) \
>> >   QOBJECT_INTERNAL_1(_arg, PASTE(_obj, _ctr))
>> > #define QOBJECT(obj) QOBJECT_INTERNAL((obj), __COUNTER__)
>> >
>> > or:
>> >
>> > #define QOBJECT_INTERNAL_2(_obj, _tmp) ({ \
>> >   typeof(_obj) _tmp = (_obj); \
>> >   _tmp ? container_of(&_tmp->base, QObject, base) : NULL; \
>> >   )}
>> > #define QOBJECT_INTERNAL_1(_arg, _tmp) QOBJECT_INTERNAL_2(_arg, _tmp)
>> > #define QOBJECT_INTERNAL(_arg, _ctr) \
>> >   QOBJECT_INTERNAL_1(_arg, PASTE(_obj, _ctr))
>> > #define QOBJECT(obj) QOBJECT_INTERNAL(obj, __COUNTER__)
>> >
>> > where, in either case, QOBJECT(o) should then expand to
>> >
>> > ({
>> >   typeof (o) _obj1 = (o);
>> >   _obj1 ? container_of(&_obj1->base, QObject, base) : NULL;
>> > })
>> 
>> The idea is to have the innermost macro take the variable name instead
>> of the counter.  "PASTE(_obj, l)" then becomes "_tmp" there, which is
>> more legible.  However, we pay for it by going through two more macros.
>> 
>> Can you explain why you need two more?
>
> Likewise habit, on my part (and laziness - I wrote the previous email
> without testing anything). I've learned over the years that pasting is
> hard (you can't mix ## with a macro that you want expanded without 2
> layers of indirection), so I started out by adding two layers of
> QOBJECT_INTERNAL, then wrote QOBJECT to forward to QOBJECT_INTERNAL.
> But now that you asked, I actually spent the time to test with the
> preprocessor, and your hunch is indeed correct: I over-compensated
> becaues of my habit.
>
> $cat foo.c
> #define PASTE(_a, _b) _a##_b
>
> #define QOBJECT_INTERNAL_2(_obj, _tmp) ({       \
>   typeof _obj _tmp = _obj; \
>   _tmp ? container_of(&_tmp->base, QObject, base) : NULL; \
>   )}
> #define QOBJECT_INTERNAL_1(_arg, _tmp) QOBJECT_INTERNAL_2(_arg, _tmp)
> #define QOBJECT_INTERNAL(_arg, _ctr) \
>   QOBJECT_INTERNAL_1(_arg, PASTE(_obj, _ctr))
> #define QOBJECT(obj) QOBJECT_INTERNAL((obj), __COUNTER__)
>
> QOBJECT(o)
>
> #define Q_INTERNAL_1(_obj, _tmp) ({ \
>   typeof _obj _tmp = _obj; \
>   _tmp ? container_of(&_tmp->base, QObject, base) : NULL; \
>   )}
> #define Q_INTERNAL(_arg, _ctr) \
>   Q_INTERNAL_1(_arg, PASTE(_obj, _ctr))
> #define Q(obj) Q_INTERNAL((obj), __COUNTER__)
>
> Q(o)
> $ gcc -E foo.c
> # 0 "foo.c"
> # 0 "<built-in>"
> # 0 "<command-line>"
> # 1 "/usr/include/stdc-predef.h" 1 3 4
> # 0 "<command-line>" 2
> # 1 "foo.c"
> # 12 "foo.c"
> ({ typeof (o) _obj0 = (o); _obj0 ? container_of(&_obj0->base, QObject, base) : NULL; )}
> # 22 "foo.c"
> ({ typeof (o) _obj1 = (o); _obj1 ? container_of(&_obj1->base, QObject, base) : NULL; )}
>
> So the important part was merely ensuring that __COUNTER__ has a
> chance to be expanded PRIOR to the point where ## gets its argument,
> and one less layer of indirection was still sufficient for that.

The version with less indirection is easier to understand for me.

>> 
>> Complication: the innermost macro needs to take all its variable names
>> as arguments.  The macro above has just one.  Macros below have two.
>> 
>> Not quite sure which version is easier to understand.
>
> Proper comments may help; once you realize that you are passing in the
> unique variable name(s) to be used as the temporary identifier(s),
> rather than an integer that still needs to be glued, then separating
> the task of generating name(s) (which is done once per name, instead
> of repeated 3 times) makes sense to me.  I also like minimizing the
> number of times I have to spell out __COUNTER__; wrapping unique name
> generation behind a well-named helper macro gives a better view of why
> it is needed in the first place.

I can add this comment to every instance of the __COUNTER__ trick:

    /*
     * Preprocessor wizardry ahead: glue(_val, l) expands to a new
     * identifier in each macro expansion.  Helps avoid shadowing
     * variables and the unwanted name captures that come with it.
     */

> At any rate, this comment still applies:
>
>> >
>> > I think you are definitely on the right track to have all internal
>> > variable declarations within a macro definition use multi-layered
>> > expansion with the help of __COUNTER__ to ensure that the macro's
>> > temporary variable is globally unique; so if you leave it as written,
>> > I could live with:
>> >
>> > Reviewed-by: Eric Blake <eblake@redhat.com>
>> >
>> > But if you respin it to pick up any of my suggestions, I'll definitely
>> > spend another review cycle on the result.
>> >
>> > If it helps, here's what I ended up using in nbdkit:
>> >
>> > https://gitlab.com/nbdkit/nbdkit/-/blame/master/common/include/unique-name.h#L36
>> > /* https://stackoverflow.com/a/1597129
>> >  * https://stackoverflow.com/a/12711226
>> >  */
>> > #define XXUNIQUE_NAME(name, counter) name ## counter
>> > #define XUNIQUE_NAME(name, counter) XXUNIQUE_NAME (name, counter)
>> > #define NBDKIT_UNIQUE_NAME(name) XUNIQUE_NAME (name, __COUNTER__)
>> >
>> > https://gitlab.com/nbdkit/nbdkit/-/blame/master/common/include/minmax.h#L47
>> > #include "unique-name.h"
>> >
>> > #undef MIN
>> > #define MIN(x, y) \
>> >   MIN_1 ((x), (y), NBDKIT_UNIQUE_NAME (_x), NBDKIT_UNIQUE_NAME (_y))
>> >
>> > ...
>> > #define MIN_1(x, y, _x, _y) ({                 \
>> >       __auto_type _x = (x);                    \
>> >       __auto_type _y = (y);                    \
>> >       _x < _y ? _x : _y;                       \
>> >     })
>> 
>> Thanks!
>
> and so I'm fine leaving it up to you if you want to copy the paradigm
> I've seen elsewhere, or if you want to leave it as you first proposed.
> The end result is the same, and it is more of a judgment call on which
> form is easier for you to reason about.

I need to review the two competing versions once more to decide which I
find easier to read.  Or shall I say less hard; preprocessor wizardry is
never really easy.

> But as other commenters mentioned, we already have a glue() macro, so
> use that instead of PASTE(), no matter what else you choose.
>
> Looking forward to v2!

Thanks again!
diff mbox series

Patch

diff --git a/include/qapi/qmp/qobject.h b/include/qapi/qmp/qobject.h
index 9003b71fd3..7b50fc905d 100644
--- a/include/qapi/qmp/qobject.h
+++ b/include/qapi/qmp/qobject.h
@@ -45,10 +45,12 @@  struct QObject {
     struct QObjectBase_ base;
 };
 
-#define QOBJECT(obj) ({                                         \
-    typeof(obj) _obj = (obj);                                   \
-    _obj ? container_of(&(_obj)->base, QObject, base) : NULL;   \
+#define QOBJECT_INTERNAL(obj, l) ({                                     \
+    typeof(obj) PASTE(_obj, l) = (obj);                                 \
+    PASTE(_obj, l)                                                      \
+        ? container_of(&(PASTE(_obj, l))->base, QObject, base) : NULL;  \
 })
+#define QOBJECT(obj) QOBJECT_INTERNAL((obj), __COUNTER__)
 
 /* Required for qobject_to() */
 #define QTYPE_CAST_TO_QNull     QTYPE_QNULL
diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
index d95612f7a0..3f80ffac69 100644
--- a/include/qemu/atomic.h
+++ b/include/qemu/atomic.h
@@ -157,13 +157,14 @@ 
     smp_read_barrier_depends();
 #endif
 
-#define qatomic_rcu_read(ptr)                          \
-    ({                                                 \
+#define qatomic_rcu_read_internal(ptr, l)               \
+    ({                                                  \
     qemu_build_assert(sizeof(*ptr) <= ATOMIC_REG_SIZE); \
-    typeof_strip_qual(*ptr) _val;                      \
-    qatomic_rcu_read__nocheck(ptr, &_val);             \
-    _val;                                              \
+    typeof_strip_qual(*ptr) PASTE(_val, l);             \
+    qatomic_rcu_read__nocheck(ptr, &PASTE(_val, l));    \
+    PASTE(_val, l);                                     \
     })
+#define qatomic_rcu_read(ptr) qatomic_rcu_read_internal((ptr), __COUNTER__)
 
 #define qatomic_rcu_set(ptr, i) do {                   \
     qemu_build_assert(sizeof(*ptr) <= ATOMIC_REG_SIZE); \
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 21ef8f1699..9c191ebe99 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -371,18 +371,21 @@  void QEMU_ERROR("code path is reachable")
  * have to open-code it.  Sadly, Coverity is severely confused by the
  * constant variants, so we have to dumb things down there.
  */
+#define PASTE(a, b) a##b
+#define MIN_INTERNAL(a, b, l)                                           \
+    ({                                                                  \
+        typeof(1 ? (a) : (b)) PASTE(_a, l) = (a), PASTE(_b, l) = (b);   \
+        PASTE(_a, l) < PASTE(_b, l) ? PASTE(_a, l) : PASTE(_b, l);      \
+    })
 #undef MIN
-#define MIN(a, b)                                       \
-    ({                                                  \
-        typeof(1 ? (a) : (b)) _a = (a), _b = (b);       \
-        _a < _b ? _a : _b;                              \
+#define MIN(a, b) MIN_INTERNAL((a), (b), __COUNTER__)
+#define MAX_INTERNAL(a, b, l)                                           \
+    ({                                                                  \
+        typeof(1 ? (a) : (b)) PASTE(_a, l) = (a), PASTE(_b, l) = (b);   \
+        PASTE(_a, l) > PASTE(_b, l) ? PASTE(_a, l) : PASTE(_b, l);      \
     })
 #undef MAX
-#define MAX(a, b)                                       \
-    ({                                                  \
-        typeof(1 ? (a) : (b)) _a = (a), _b = (b);       \
-        _a > _b ? _a : _b;                              \
-    })
+#define MAX(a, b) MAX_INTERNAL((a), (b), __COUNTER__)
 
 #ifdef __COVERITY__
 # define MIN_CONST(a, b) ((a) < (b) ? (a) : (b))
@@ -404,13 +407,14 @@  void QEMU_ERROR("code path is reachable")
  * Minimum function that returns zero only if both values are zero.
  * Intended for use with unsigned values only.
  */
-#ifndef MIN_NON_ZERO
-#define MIN_NON_ZERO(a, b)                              \
-    ({                                                  \
-        typeof(1 ? (a) : (b)) _a = (a), _b = (b);       \
-        _a == 0 ? _b : (_b == 0 || _b > _a) ? _a : _b;  \
+#define MIN_NON_ZERO_INTERNAL(a, b, l)                                  \
+    ({                                                                  \
+        typeof(1 ? (a) : (b)) PASTE(_a, l) = (a), PASTE(_b, l) = (b);   \
+        PASTE(_a, l) == 0 ? PASTE(_b, l)                                \
+        : (PASTE(_b, l) == 0 || PASTE(_b, l) > PASTE(_a, l)) ? PASTE(_a, l) \
+        : PASTE(_b, l);                                                 \
     })
-#endif
+#define MIN_NON_ZERO(a, b) MIN_NON_ZERO_INTERNAL((a), (b), __COUNTER__)
 
 /*
  * Round number down to multiple. Safe when m is not a power of 2 (see