diff mbox series

gimple ssa: Teach switch conversion to optimize powers of 2 switches

Message ID ZlhsVV7W0Fv8SCAk@fkdesktop.suse.cz
State New
Headers show
Series gimple ssa: Teach switch conversion to optimize powers of 2 switches | expand

Commit Message

Filip Kastl May 30, 2024, 12:08 p.m. UTC
Hi,

This patch adds a transformation into the switch conversion pass --
the "exponential index transform".  This transformation can help switch
conversion convert switches it otherwise could not.  The transformation is
intended for switches whose cases are all powers of 2.  Here is a more detailed
description of what and how this patch tries to address:


The problem
-----------

Consider this piece of C code

switch (i)
  {
    case (1 << 0): return 0;
    case (1 << 1): return 1;
    case (1 << 2): return 2;
    ...
    case (1 << 30): return 30;
    default: return 31;
  }

If i is a power of 2 (2^k), the switch just returns the exponent (k).  This can
be seen as taking the base 2 logarithm of i or as returning the position of the
singular 1 bit in i.

Currently, GCC fails to see this and doesn't optimize the switch in any way.

Switch conversion is able to optimize similar switches to an array lookup.
This is not possible here because the range of cases is too wide.  Another
optimization that switch conversion is able to do is the "linear
transformation" -- switch conversion is able to notice a linear relationship
between the index variable (variable i in the case) and the result value and
rewrite switch to just an assignment (or multiple assignments in case of
multiple result values). Sadly, linear transformation isn't applicable here
because the linear relationship is based on the exponent of i, not on i itself.


The solution
------------

The exponential index transformation does the following.  When it recognises
that a switch only has case numbers that are powers of 2 it replaces them with
their exponents.  It also replaces the index variable by its exponent.  This is
done by inserting a statement that takes the logarithm of i and using the
result as the new index variable.  Actually we use the FFS operation for this
-- since we expect a power of two, we may just ask for the position of the
first 1 bit.

We also need to insert a conditional that checks at runtime that the index
variable is a power of two.  If it isn't, the resulting value should just be
the default case value (31 in the example above).

With exponential index transform, switch conversion is able to simplify the
above example into something like this

if (i is power of 2)
  return log2(i); // actually implemented as ffs(i) - 1
else
  return 31;

Switch conversion bails if the range of case numbers is too big.  Exponential
index transform shrinks this range (exponentially).  So even if there is no
linear relationship in the switch, exponential index transform can still help
convert the switch at least to an array lookup.


Limitations
-----------

Currently we only run the exponential index transform if the target has the
POPCOUNT (for checking a number is a power of 2) and FFS (for taking the
logarithm) instructions -- we check direct_internal_fn_supported_p () for
POPCOUNT and FFS internal functions.  Otherwise maybe computing FFS could be
less efficient than just using a jump table.  We try to avoid transforming a
switch into a less efficient form.  Maybe this is too conservative and could be
tweaked in the future.


Bootstrapped and regtested on x86_64 linux.  I have additionally run bootstrap
and regtest on a version where I removed the check that the target has the
POPCOUNT and FFS instructions so that the transformation would be triggered
more often.  That testing also went well.

Are there any things I should tweak?  Or is the patch ready to be applied?

Cheers,
Filip Kastl


-- 8< --


Sometimes a switch has case numbers that are powers of 2.  Switch
conversion usually isn't able to optimize switches.  This patch adds
"exponential index transformation" to switch conversion.  After switch
conversion applies this transformation on the switch the index variable
of the switch becomes the exponent instead of the whole value.  For
example:

switch (i)
  {
    case (1 << 0): return 0;
    case (1 << 1): return 1;
    case (1 << 2): return 2;
    ...
    case (1 << 30): return 30;
    default: return 31;
  }

gets transformed roughly into

switch (log2(i))
  {
    case 0: return 0;
    case 1: return 1;
    case 2: return 2;
    ...
    case 30: return 30;
    default: return 31;
  }

This enables switch conversion to further optimize the switch.

This patch only enables this transformation if there are optabs for
POPCOUNT and FFS so that the base 2 logarithm can be computed
efficiently at runtime.

gcc/ChangeLog:

	* tree-switch-conversion.cc (switch_conversion::switch_conversion):
	Track if the transformation happened.
	(switch_conversion::is_exp_index_transform_viable): New function
	to decide whether the transformation should be applied.
	(switch_conversion::exp_index_transform): New function to
	execute the transformation.
	(switch_conversion::gen_inbound_check): Don't remove the default
	BB if the transformation happened.
	(switch_conversion::expand): Execute the transform if it is
	viable.  Skip test for sufficiently small case range if the
	transformation is going to be executed.
	* tree-switch-conversion.h: Add is_exp_index_transform viable
	and exp_index_transform.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/switch-exp-transform-1.c: New test.
	* gcc.target/i386/switch-exp-transform-2.c: New test.
	* gcc.target/i386/switch-exp-transform-3.c: New test.

Signed-off-by: Filip Kastl <fkastl@suse.cz>
---
 .../gcc.target/i386/switch-exp-transform-1.c  |  34 +++
 .../gcc.target/i386/switch-exp-transform-2.c  |  36 +++
 .../gcc.target/i386/switch-exp-transform-3.c  | 151 +++++++++
 gcc/tree-switch-conversion.cc                 | 289 +++++++++++++++++-
 gcc/tree-switch-conversion.h                  |  18 ++
 5 files changed, 523 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/switch-exp-transform-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c

Comments

Andrew Pinski May 30, 2024, 12:21 p.m. UTC | #1
On Thu, May 30, 2024 at 5:09 AM Filip Kastl <fkastl@suse.cz> wrote:
>
> Hi,
>
> This patch adds a transformation into the switch conversion pass --
> the "exponential index transform".  This transformation can help switch
> conversion convert switches it otherwise could not.  The transformation is
> intended for switches whose cases are all powers of 2.  Here is a more detailed
> description of what and how this patch tries to address:
>
>
> The problem
> -----------
>
> Consider this piece of C code
>
> switch (i)
>   {
>     case (1 << 0): return 0;
>     case (1 << 1): return 1;
>     case (1 << 2): return 2;
>     ...
>     case (1 << 30): return 30;
>     default: return 31;
>   }
>
> If i is a power of 2 (2^k), the switch just returns the exponent (k).  This can
> be seen as taking the base 2 logarithm of i or as returning the position of the
> singular 1 bit in i.
>
> Currently, GCC fails to see this and doesn't optimize the switch in any way.
>
> Switch conversion is able to optimize similar switches to an array lookup.
> This is not possible here because the range of cases is too wide.  Another
> optimization that switch conversion is able to do is the "linear
> transformation" -- switch conversion is able to notice a linear relationship
> between the index variable (variable i in the case) and the result value and
> rewrite switch to just an assignment (or multiple assignments in case of
> multiple result values). Sadly, linear transformation isn't applicable here
> because the linear relationship is based on the exponent of i, not on i itself.
>
>
> The solution
> ------------
>
> The exponential index transformation does the following.  When it recognises
> that a switch only has case numbers that are powers of 2 it replaces them with
> their exponents.  It also replaces the index variable by its exponent.  This is
> done by inserting a statement that takes the logarithm of i and using the
> result as the new index variable.  Actually we use the FFS operation for this
> -- since we expect a power of two, we may just ask for the position of the
> first 1 bit.
>
> We also need to insert a conditional that checks at runtime that the index
> variable is a power of two.  If it isn't, the resulting value should just be
> the default case value (31 in the example above).
>
> With exponential index transform, switch conversion is able to simplify the
> above example into something like this
>
> if (i is power of 2)
>   return log2(i); // actually implemented as ffs(i) - 1
> else
>   return 31;
>
> Switch conversion bails if the range of case numbers is too big.  Exponential
> index transform shrinks this range (exponentially).  So even if there is no
> linear relationship in the switch, exponential index transform can still help
> convert the switch at least to an array lookup.
>
>
> Limitations
> -----------
>
> Currently we only run the exponential index transform if the target has the
> POPCOUNT (for checking a number is a power of 2) and FFS (for taking the
> logarithm) instructions -- we check direct_internal_fn_supported_p () for
> POPCOUNT and FFS internal functions.  Otherwise maybe computing FFS could be
> less efficient than just using a jump table.  We try to avoid transforming a
> switch into a less efficient form.  Maybe this is too conservative and could be
> tweaked in the future.
>
>
> Bootstrapped and regtested on x86_64 linux.  I have additionally run bootstrap
> and regtest on a version where I removed the check that the target has the
> POPCOUNT and FFS instructions so that the transformation would be triggered
> more often.  That testing also went well.
>
> Are there any things I should tweak?  Or is the patch ready to be applied?
>
> Cheers,
> Filip Kastl
>
>
> -- 8< --
>
>
> Sometimes a switch has case numbers that are powers of 2.  Switch
> conversion usually isn't able to optimize switches.  This patch adds
> "exponential index transformation" to switch conversion.  After switch
> conversion applies this transformation on the switch the index variable
> of the switch becomes the exponent instead of the whole value.  For
> example:
>
> switch (i)
>   {
>     case (1 << 0): return 0;
>     case (1 << 1): return 1;
>     case (1 << 2): return 2;
>     ...
>     case (1 << 30): return 30;
>     default: return 31;
>   }
>
> gets transformed roughly into
>
> switch (log2(i))
>   {
>     case 0: return 0;
>     case 1: return 1;
>     case 2: return 2;
>     ...
>     case 30: return 30;
>     default: return 31;
>   }
>
> This enables switch conversion to further optimize the switch.
>
> This patch only enables this transformation if there are optabs for
> POPCOUNT and FFS so that the base 2 logarithm can be computed
> efficiently at runtime.
>
> gcc/ChangeLog:
>
>         * tree-switch-conversion.cc (switch_conversion::switch_conversion):
>         Track if the transformation happened.
>         (switch_conversion::is_exp_index_transform_viable): New function
>         to decide whether the transformation should be applied.
>         (switch_conversion::exp_index_transform): New function to
>         execute the transformation.
>         (switch_conversion::gen_inbound_check): Don't remove the default
>         BB if the transformation happened.
>         (switch_conversion::expand): Execute the transform if it is
>         viable.  Skip test for sufficiently small case range if the
>         transformation is going to be executed.
>         * tree-switch-conversion.h: Add is_exp_index_transform viable
>         and exp_index_transform.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/switch-exp-transform-1.c: New test.
>         * gcc.target/i386/switch-exp-transform-2.c: New test.
>         * gcc.target/i386/switch-exp-transform-3.c: New test.
>
> Signed-off-by: Filip Kastl <fkastl@suse.cz>
> ---
>  .../gcc.target/i386/switch-exp-transform-1.c  |  34 +++
>  .../gcc.target/i386/switch-exp-transform-2.c  |  36 +++
>  .../gcc.target/i386/switch-exp-transform-3.c  | 151 +++++++++
>  gcc/tree-switch-conversion.cc                 | 289 +++++++++++++++++-
>  gcc/tree-switch-conversion.h                  |  18 ++
>  5 files changed, 523 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/switch-exp-transform-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
> new file mode 100644
> index 00000000000..d51dd110623
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
> @@ -0,0 +1,34 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-switchconv -march=znver3" } */
> +
> +/* Checks that exponential index transform enables switch conversion to convert
> +   this switch into an array lookup.  Also checks that the "index variable is a
> +   power of two" check has been generated.  -march=znver3 because that
> +   microarchitecture supports POPCOUNT and FFS instructions and exponential
> +   index transform requires that.  */
> +
> +int foo(unsigned bar)
> +{
> +    switch (bar)
> +    {
> +        case (1 << 0):
> +            return 1;
> +        case (1 << 1):
> +            return 2;
> +        case (1 << 2):
> +            return 3;
> +        case (1 << 3):
> +            return 4;
> +        case (1 << 4):
> +            return 8;
> +        case (1 << 5):
> +            return 13;
> +        case (1 << 6):
> +            return 21;
> +        default:
> +            return 0;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "CSWTCH" "switchconv" } } */
> +/* { dg-final { scan-tree-dump "POPCOUNT" "switchconv" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-2.c b/gcc/testsuite/gcc.target/i386/switch-exp-transform-2.c
> new file mode 100644
> index 00000000000..9f2b536aa74
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-2.c
> @@ -0,0 +1,36 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-switchconv -march=znver3" } */
> +
> +/* Checks that when exponential index transform is viable but switch conversion
> +   decides that the switch cannot be converted, the exponential index transform
> +   is not done.  -march=znver3 because that microarchitecture supports POPCOUNT
> +   and FFS instructions and exponential index transform requires that.  */
> +
> +int a;
> +
> +int foo(unsigned bar)
> +{
> +    switch (bar)
> +    {
> +        case (1 << 0):
> +            return 0;
> +        case (1 << 1):
> +            return 1;
> +        case (1 << 2):
> +            return 2;
> +        case (1 << 3):
> +            return 3;
> +        case (1 << 4):
> +            return 4;
> +        case (1 << 5):
> +            a = 3;
> +            return 5;
> +        case (1 << 6):
> +            return 6;
> +        default:
> +            return 0;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "Exponential index transform viable" "switchconv" } } */
> +/* { dg-final { scan-tree-dump-not "Applying exponential index transform" "switchconv" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c b/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
> new file mode 100644
> index 00000000000..e9665d4a38d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
> @@ -0,0 +1,151 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-switchconv -march=znver3" } */
> +
> +/* Checks that the exponential index transformation is done for all these types
> +   of the index variable:
> +   - (unsigned) int
> +   - (unsigned) long
> +   - (unsigned) long long
> +
> +   -march=znver3 because that microarchitecture supports POPCOUNT
> +   and FFS instructions and exponential index transform requires that.  */
> +
> +int unopt_int(int bit_position)
> +{
> +    switch (bit_position)
> +    {
> +        case (1 << 0):
> +            return 0;
> +        case (1 << 1):
> +            return 1;
> +        case (1 << 2):
> +            return 2;
> +        case (1 << 3):
> +            return 3;
> +        case (1 << 4):
> +            return 4;
> +        case (1 << 5):
> +            return 5;
> +        case (1 << 6):
> +            return 6;
> +        default:
> +            return 0;
> +    }
> +}
> +
> +int unopt_unsigned_int(unsigned int bit_position)
> +{
> +    switch (bit_position)
> +    {
> +        case (1 << 0):
> +            return 0;
> +        case (1 << 1):
> +            return 1;
> +        case (1 << 2):
> +            return 2;
> +        case (1 << 3):
> +            return 3;
> +        case (1 << 4):
> +            return 4;
> +        case (1 << 5):
> +            return 5;
> +        case (1 << 6):
> +            return 6;
> +        default:
> +            return 0;
> +    }
> +}
> +
> +int unopt_long(long bit_position)
> +{
> +    switch (bit_position)
> +    {
> +        case (1 << 0):
> +            return 0;
> +        case (1 << 1):
> +            return 1;
> +        case (1 << 2):
> +            return 2;
> +        case (1 << 3):
> +            return 3;
> +        case (1 << 4):
> +            return 4;
> +        case (1 << 5):
> +            return 5;
> +        case (1 << 6):
> +            return 6;
> +        default:
> +            return 0;
> +    }
> +}
> +
> +int unopt_unsigned_long(unsigned long bit_position)
> +{
> +    switch (bit_position)
> +    {
> +        case (1 << 0):
> +            return 0;
> +        case (1 << 1):
> +            return 1;
> +        case (1 << 2):
> +            return 2;
> +        case (1 << 3):
> +            return 3;
> +        case (1 << 4):
> +            return 4;
> +        case (1 << 5):
> +            return 5;
> +        case (1 << 6):
> +            return 6;
> +        default:
> +            return 0;
> +    }
> +}
> +
> +int unopt_long_long(long long bit_position)
> +{
> +    switch (bit_position)
> +    {
> +        case (1 << 0):
> +            return 0;
> +        case (1 << 1):
> +            return 1;
> +        case (1 << 2):
> +            return 2;
> +        case (1 << 3):
> +            return 3;
> +        case (1 << 4):
> +            return 4;
> +        case (1 << 5):
> +            return 5;
> +        case (1 << 6):
> +            return 6;
> +        default:
> +            return 0;
> +    }
> +}
> +
> +int unopt_unsigned_long_long(unsigned long long bit_position)
> +{
> +    switch (bit_position)
> +    {
> +        case (1 << 0):
> +            return 0;
> +        case (1 << 1):
> +            return 1;
> +        case (1 << 2):
> +            return 2;
> +        case (1 << 3):
> +            return 3;
> +        case (1 << 4):
> +            return 4;
> +        case (1 << 5):
> +            return 5;
> +        case (1 << 6):
> +            return 6;
> +        default:
> +            return 0;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Applying exponential index transform" 6 "switchconv" } } */
> diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
> index 3a5b84c09e2..975888c0969 100644
> --- a/gcc/tree-switch-conversion.cc
> +++ b/gcc/tree-switch-conversion.cc
> @@ -52,6 +52,8 @@ Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
>  #include "omp-general.h"
>  #include "gimple-range.h"
>  #include "tree-cfgcleanup.h"
> +#include "hwint.h"
> +#include "internal-fn.h"
>
>  /* ??? For lang_hooks.types.type_for_mode, but is there a word_mode
>     type in the GIMPLE type system that is language-independent?  */
> @@ -66,7 +68,8 @@ using namespace tree_switch_conversion;
>  switch_conversion::switch_conversion (): m_final_bb (NULL),
>    m_constructors (NULL), m_default_values (NULL),
>    m_arr_ref_first (NULL), m_arr_ref_last (NULL),
> -  m_reason (NULL), m_default_case_nonstandard (false), m_cfg_altered (false)
> +  m_reason (NULL), m_default_case_nonstandard (false), m_cfg_altered (false),
> +  m_exp_index_transform_applied (false)
>  {
>  }
>
> @@ -202,6 +205,267 @@ switch_conversion::collect (gswitch *swtch)
>      }
>  }
>
> +/* Check that the "exponential index transform" can be applied to this switch.
> +
> +   See comment of the exp_index_transform function for details about this
> +   transformation.
> +
> +   We want:
> +   - This form of the switch is more efficient
> +   - Cases are powers of 2
> +
> +   Expects that SWTCH has at least one case.  */
> +
> +bool
> +switch_conversion::is_exp_index_transform_viable (gswitch *swtch)
> +{
> +  tree index = gimple_switch_index (swtch);
> +  tree index_type = TREE_TYPE (index);
> +  basic_block swtch_bb = gimple_bb (swtch);
> +  unsigned num_labels = gimple_switch_num_labels (swtch);
> +
> +  /* Check that we can efficiently compute logarithm of 2^k (using FFS) and
> +     test that a given number is 2^k for some k (using POPCOUNT).  */
> +  optimization_type opt_type = bb_optimization_type (swtch_bb);
> +  if (!direct_internal_fn_supported_p (IFN_FFS, index_type, opt_type)
> +    || !direct_internal_fn_supported_p (IFN_POPCOUNT, index_type, opt_type))
> +    return false;
> +
> +  /* Check that each case label corresponds only to one value
> +     (no case 1..3).  */
> +  unsigned i;
> +  for (i = 1; i < num_labels; i++)
> +    {
> +      tree label = gimple_switch_label (swtch, i);
> +      if (CASE_HIGH (label))
> +       return false;
> +    }
> +
> +  /* Check that each label is nonnegative.  */
> +  for (i = 1; i < num_labels; i++)
> +    {
> +      tree label = gimple_switch_label (swtch, i);
> +      if (!tree_fits_uhwi_p (CASE_LOW (label)))
> +       return false;
> +    }
> +
> +  /* Check that each case is a power of 2.  */
> +  for (i = 1; i < num_labels; i++)
> +    {
> +      tree label = gimple_switch_label (swtch, i);
> +      unsigned HOST_WIDE_INT label_hwi = tree_to_uhwi (CASE_LOW (label));

I Know you check tree_fits_uhwi_p above but couldn't you use
wide_int::exact_log2 here instead?
That is I am not a fan of using tree_to_uhwi but rather using wide_int.
Also the use of direct_internal_fn_supported_p seems too early since
you could figure out the widest value in use and use that to see what
mode/type to use.

Thanks,
Andrew

> +      if (!pow2p_hwi (label_hwi))
> +       return false;
> +    }
> +
> +  if (dump_file)
> +    fprintf (dump_file, "Exponential index transform viable\n");
> +
> +  return true;
> +}
> +
> +/* Perform the "exponential index transform".
> +
> +   Assume that cases of SWTCH are powers of 2.  The transformation replaces the
> +   cases by their exponents (2^k -> k).  It also inserts a statement that
> +   computes the exponent of the original index variable (basically taking the
> +   logarithm) and then sets the result as the new index variable.
> +
> +   The transformation also inserts a conditional statement checking that the
> +   incoming original index variable is a power of 2 with the false edge leading
> +   to the default case.
> +
> +   The exponential index transform shrinks the range of case numbers which
> +   helps switch conversion convert switches it otherwise could not.
> +
> +   Consider for example:
> +
> +   switch (i)
> +     {
> +       case (1 << 0): return 0;
> +       case (1 << 1): return 1;
> +       case (1 << 2): return 2;
> +       ...
> +       case (1 << 30): return 30;
> +       default: return 31;
> +     }
> +
> +   First, exponential index transform gets applied.  Since each case becomes
> +   case x: return x;, the rest of switch conversion is then able to get rid of
> +   the switch statement.
> +
> +   if (i is power of 2)
> +     return log2 (i);
> +   else
> +     return 31;
> +
> +   */
> +
> +void
> +switch_conversion::exp_index_transform (gswitch *swtch)
> +{
> +  if (dump_file)
> +    fprintf (dump_file, "Applying exponential index transform\n");
> +
> +  tree index = gimple_switch_index (swtch);
> +  tree index_type = TREE_TYPE (index);
> +  basic_block swtch_bb = gimple_bb (swtch);
> +  unsigned num_labels = gimple_switch_num_labels (swtch);
> +  tree one = build_one_cst (index_type);
> +
> +  /* Insert a cond stmt that checks if the index variable is a power of 2.  */
> +  gimple_stmt_iterator gsi = gsi_for_stmt (swtch);
> +  gsi_prev (&gsi);
> +  gimple *foo = gsi_stmt (gsi);
> +  edge new_edge1 = split_block (swtch_bb, foo);
> +
> +  swtch_bb = new_edge1->dest;
> +  basic_block cond_bb = new_edge1->src;
> +  new_edge1->flags |= EDGE_TRUE_VALUE;
> +  new_edge1->flags &= ~EDGE_FALLTHRU;
> +  new_edge1->probability = profile_probability::even ();
> +
> +  basic_block default_bb = gimple_switch_default_bb (cfun, swtch);
> +  edge new_edge2 = make_edge (cond_bb, default_bb, EDGE_FALSE_VALUE);
> +  new_edge2->probability = profile_probability::even ();
> +
> +  gsi = gsi_last_bb (cond_bb);
> +
> +  tree tmp1 = make_ssa_name (index_type);
> +  gcall *stmt_popcount = gimple_build_call_internal (IFN_POPCOUNT, 2, index,
> +                                                    one);
> +  gimple_call_set_lhs (stmt_popcount, tmp1);
> +  gsi_insert_after (&gsi, stmt_popcount, GSI_NEW_STMT);
> +
> +  gcond *stmt_cond = gimple_build_cond (EQ_EXPR, tmp1, one, NULL, NULL);
> +  gsi_insert_after (&gsi, stmt_cond, GSI_NEW_STMT);
> +
> +  /* We just added an edge going to default bb so fix PHI nodes in that bb:
> +     For each PHI add new PHI arg.  It will be the same arg as when comming to
> +     the default bb from the switch bb.  */
> +  edge default_edge = find_edge (swtch_bb, default_bb);
> +  for (gphi_iterator gsi = gsi_start_phis (default_bb);
> +       !gsi_end_p (gsi); gsi_next (&gsi))
> +    {
> +      gphi *phi = gsi.phi ();
> +      tree arg = PHI_ARG_DEF_FROM_EDGE (phi, default_edge);
> +      location_t loc = gimple_phi_arg_location_from_edge (phi, default_edge);
> +      add_phi_arg (phi, arg, new_edge2, loc);
> +    }
> +
> +  /* Insert a statement that takes the logarithm of the index variable.  */
> +  tree tmp2 = make_ssa_name (index_type);
> +  gsi = gsi_start_bb (swtch_bb);
> +  gcall *stmt_ffs = gimple_build_call_internal (IFN_FFS, 1, index);
> +  gimple_call_set_lhs (stmt_ffs, tmp2);
> +  gsi_insert_before (&gsi, stmt_ffs, GSI_SAME_STMT);
> +
> +  tree tmp3 = make_ssa_name (index_type);
> +  gassign *stmt_minus_one = gimple_build_assign (tmp3, MINUS_EXPR, tmp2, one);
> +  gsi_insert_before (&gsi, stmt_minus_one, GSI_SAME_STMT);
> +
> +  /* Use the result of the logarithm as the new index variable.  */
> +  gimple_switch_set_index (swtch, tmp3);
> +  update_stmt (swtch);
> +
> +  /* Replace each case number with its logarithm.  */
> +  unsigned i;
> +  for (i = 1; i < num_labels; i++)
> +    {
> +      tree label = gimple_switch_label (swtch, i);
> +      unsigned HOST_WIDE_INT label_hwi = tree_to_uhwi (CASE_LOW (label));
> +      CASE_LOW (label) = build_int_cst (index_type, floor_log2 (label_hwi));
> +    }
> +
> +  /* Fix the dominator tree, if it is available.  */
> +  if (dom_info_available_p (CDI_DOMINATORS))
> +    {
> +      /* Analysis of how dominators should look after we add the edge E going
> +        from the cond block to the default block.
> +
> +        1 For the blocks between the switch block and the final block
> +        (excluding the final block itself):  They had the switch block as
> +        their immediate dominator.  That shouldn't change.
> +
> +        2 The final block may now have the switch block or the cond block as
> +        its immediate dominator.  There's no easy way of knowing (consider
> +        two cases where in both m_default_case_nonstandard = true, in one a
> +        path through default intersects the final block and in one all paths
> +        through default avoid the final block but intersect a successor of the
> +        final block).
> +
> +        3 Other blocks that had the switch block as their immediate dominator
> +        should now have the cond block as their immediate dominator.
> +
> +        4 Immediate dominators of the rest of the blocks shouldn't change.
> +
> +        Reasoning for 3 and 4:
> +
> +        We'll only consider blocks that do not fall into 1 or 2.
> +
> +        Consider a block X whose original imm dom was the switch block.  All
> +        paths to X must also intersect the cond block since it's the only
> +        pred of the switch block.  The final block doesn't dominate X so at
> +        least one path P must lead through the default block.  Let P' be P but
> +        instead of going through the switch block, take E.  The switch block
> +        doesn't dominate X so its imm dom must now be the cond block.
> +
> +        Consider a block X whose original imm dom was Y != the switch block.
> +        We only added an edge so all original paths to X are still present.
> +        So X gained no new dominators.  Observe that Y still dominates X.
> +        There would have to be a path that avoids Y otherwise.  But any block
> +        we can avoid now except for the switch block we were able to avoid
> +        before adding E.  */
> +
> +      redirect_immediate_dominators (CDI_DOMINATORS, swtch_bb, cond_bb);
> +
> +      /* FIXME: Since exponential transform is run only if we know that the
> +        switch will be converted, we know these blocks will be removed and we
> +        maybe don't have to bother updating their dominators.  */
> +      edge e;
> +      edge_iterator ei;
> +      FOR_EACH_EDGE (e, ei, swtch_bb->succs)
> +       {
> +         basic_block bb = e->dest;
> +         if (bb == m_final_bb || bb == default_bb)
> +           continue;
> +         set_immediate_dominator (CDI_DOMINATORS, bb, swtch_bb);
> +       }
> +
> +      vec<basic_block> v;
> +      v.create (1);
> +      v.quick_push (m_final_bb);
> +      iterate_fix_dominators (CDI_DOMINATORS, v, true);
> +    }
> +
> +  /* Update information about the switch statement.  */
> +  tree first_label = gimple_switch_label (swtch, 1);
> +  tree last_label = gimple_switch_label (swtch, num_labels - 1);
> +
> +  m_range_min = CASE_LOW (first_label);
> +  m_range_max = CASE_LOW (last_label);
> +  m_index_expr = gimple_switch_index (swtch);
> +  m_switch_bb = swtch_bb;
> +
> +  unsigned HOST_WIDE_INT range_size_hwi = tree_to_uhwi (m_range_max)
> +    - tree_to_uhwi (m_range_min);
> +  m_range_size = build_int_cst (index_type, range_size_hwi);
> +
> +  m_cfg_altered = true;
> +
> +  m_contiguous_range = true;
> +  unsigned HOST_WIDE_INT last_hwi = tree_to_uhwi (CASE_LOW (first_label));
> +  for (i = 2; i < num_labels; i++)
> +    {
> +      tree label = gimple_switch_label (swtch, i);
> +      unsigned HOST_WIDE_INT label_hwi = tree_to_uhwi (CASE_LOW (label));
> +      m_contiguous_range &= last_hwi + 1 == label_hwi;
> +      last_hwi = label_hwi;
> +    }
> +
> +  m_exp_index_transform_applied = true;
> +}
> +
>  /* Checks whether the range given by individual case statements of the switch
>     switch statement isn't too big and whether the number of branches actually
>     satisfies the size of the new array.  */
> @@ -973,8 +1237,9 @@ switch_conversion::gen_inbound_check ()
>      bbf->count = e1f->count () + e2f->count ();
>
>    /* Tidy blocks that have become unreachable.  */
> -  prune_bbs (bbd, m_final_bb,
> -            m_default_case_nonstandard ? m_default_bb : NULL);
> +  bool prune_default_bb = !m_default_case_nonstandard
> +    && !m_exp_index_transform_applied;
> +  prune_bbs (bbd, m_final_bb, prune_default_bb ? NULL : m_default_bb);
>
>    /* Fixup the PHI nodes in bbF.  */
>    fix_phi_nodes (e1f, e2f, bbf);
> @@ -1053,8 +1318,19 @@ switch_conversion::expand (gswitch *swtch)
>        return;
>      }
>
> -  /* Check the case label values are within reasonable range:  */
> -  if (!check_range ())
> +  /* Sometimes it is possible to use the "exponential index transform" to help
> +     switch conversion convert switches which it otherwise could not convert.
> +     However, we want to do this transform only when we know that switch
> +     conversion will then really be able to convert the switch.  So we first
> +     check if the transformation is applicable and then maybe later do the
> +     transformation.  */
> +  bool exp_transform_viable = is_exp_index_transform_viable (swtch);
> +
> +  /* Check the case label values are within reasonable range.
> +
> +     If we will be doing exponential index transform, the range will be always
> +     reasonable.  */
> +  if (!exp_transform_viable && !check_range ())
>      {
>        gcc_assert (m_reason);
>        return;
> @@ -1076,6 +1352,9 @@ switch_conversion::expand (gswitch *swtch)
>    /* At this point all checks have passed and we can proceed with the
>       transformation.  */
>
> +  if (exp_transform_viable)
> +    exp_index_transform (swtch);
> +
>    create_temp_arrays ();
>    gather_default_values (m_default_case_nonstandard
>                          ? gimple_switch_label (swtch, 1)
> diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h
> index 6939eec6018..1a865f85f3a 100644
> --- a/gcc/tree-switch-conversion.h
> +++ b/gcc/tree-switch-conversion.h
> @@ -743,6 +743,19 @@ public:
>    /* Collection information about SWTCH statement.  */
>    void collect (gswitch *swtch);
>
> +  /* Check that the 'exponential index transform' can be applied.
> +
> +     See the comment at the function definition for more details.  */
> +  bool is_exp_index_transform_viable (gswitch *swtch);
> +
> +  /* Perform the 'exponential index transform'.
> +
> +     The exponential index transform shrinks the range of case numbers which
> +     helps switch conversion convert switches it otherwise could not.
> +
> +     See the comment at the function definition for more details.  */
> +  void exp_index_transform (gswitch *swtch);
> +
>    /* Checks whether the range given by individual case statements of the switch
>       switch statement isn't too big and whether the number of branches actually
>       satisfies the size of the new array.  */
> @@ -900,6 +913,11 @@ public:
>
>    /* True if CFG has been changed.  */
>    bool m_cfg_altered;
> +
> +  /* True if exponential index transform has been applied.  See the comment at
> +     the definition of exp_index_transform for details about the
> +     transformation.  */
> +  bool m_exp_index_transform_applied;
>  };
>
>  void
> --
> 2.45.0
>
>
Richard Biener June 11, 2024, 12:48 p.m. UTC | #2
On Thu, 30 May 2024, Filip Kastl wrote:

> Hi,
> 
> This patch adds a transformation into the switch conversion pass --
> the "exponential index transform".  This transformation can help switch
> conversion convert switches it otherwise could not.  The transformation is
> intended for switches whose cases are all powers of 2.  Here is a more detailed
> description of what and how this patch tries to address:
> 
> 
> The problem
> -----------
> 
> Consider this piece of C code
> 
> switch (i)
>   {
>     case (1 << 0): return 0;
>     case (1 << 1): return 1;
>     case (1 << 2): return 2;
>     ...
>     case (1 << 30): return 30;
>     default: return 31;
>   }
> 
> If i is a power of 2 (2^k), the switch just returns the exponent (k).  This can
> be seen as taking the base 2 logarithm of i or as returning the position of the
> singular 1 bit in i.
> 
> Currently, GCC fails to see this and doesn't optimize the switch in any way.
> 
> Switch conversion is able to optimize similar switches to an array lookup.
> This is not possible here because the range of cases is too wide.  Another
> optimization that switch conversion is able to do is the "linear
> transformation" -- switch conversion is able to notice a linear relationship
> between the index variable (variable i in the case) and the result value and
> rewrite switch to just an assignment (or multiple assignments in case of
> multiple result values). Sadly, linear transformation isn't applicable here
> because the linear relationship is based on the exponent of i, not on i itself.
> 
> 
> The solution
> ------------
> 
> The exponential index transformation does the following.  When it recognises
> that a switch only has case numbers that are powers of 2 it replaces them with
> their exponents.  It also replaces the index variable by its exponent.  This is
> done by inserting a statement that takes the logarithm of i and using the
> result as the new index variable.  Actually we use the FFS operation for this
> -- since we expect a power of two, we may just ask for the position of the
> first 1 bit.
> 
> We also need to insert a conditional that checks at runtime that the index
> variable is a power of two.  If it isn't, the resulting value should just be
> the default case value (31 in the example above).
> 
> With exponential index transform, switch conversion is able to simplify the
> above example into something like this
> 
> if (i is power of 2)
>   return log2(i); // actually implemented as ffs(i) - 1
> else
>   return 31;
> 
> Switch conversion bails if the range of case numbers is too big.  Exponential
> index transform shrinks this range (exponentially).  So even if there is no
> linear relationship in the switch, exponential index transform can still help
> convert the switch at least to an array lookup.
> 
> 
> Limitations
> -----------
> 
> Currently we only run the exponential index transform if the target has the
> POPCOUNT (for checking a number is a power of 2) and FFS (for taking the
> logarithm) instructions -- we check direct_internal_fn_supported_p () for
> POPCOUNT and FFS internal functions.  Otherwise maybe computing FFS could be
> less efficient than just using a jump table.  We try to avoid transforming a
> switch into a less efficient form.  Maybe this is too conservative and could be
> tweaked in the future.

I think you can use (x & -x) == x to check if x is power-of-two or zero
and avoid requiring .POPCOUNT.  If you know x is power-of-two the 
logarithm can also be computed with CTZ.

> 
> Bootstrapped and regtested on x86_64 linux.  I have additionally run bootstrap
> and regtest on a version where I removed the check that the target has the
> POPCOUNT and FFS instructions so that the transformation would be triggered
> more often.  That testing also went well.
> 
> Are there any things I should tweak?  Or is the patch ready to be applied?
> 
> Cheers,
> Filip Kastl
> 
> 
> -- 8< --
> 
> 
> Sometimes a switch has case numbers that are powers of 2.  Switch
> conversion usually isn't able to optimize switches.  This patch adds
> "exponential index transformation" to switch conversion.  After switch
> conversion applies this transformation on the switch the index variable
> of the switch becomes the exponent instead of the whole value.  For
> example:
> 
> switch (i)
>   {
>     case (1 << 0): return 0;
>     case (1 << 1): return 1;
>     case (1 << 2): return 2;
>     ...
>     case (1 << 30): return 30;
>     default: return 31;
>   }
> 
> gets transformed roughly into
> 
> switch (log2(i))
>   {
>     case 0: return 0;
>     case 1: return 1;
>     case 2: return 2;
>     ...
>     case 30: return 30;
>     default: return 31;
>   }
> 
> This enables switch conversion to further optimize the switch.
> 
> This patch only enables this transformation if there are optabs for
> POPCOUNT and FFS so that the base 2 logarithm can be computed
> efficiently at runtime.
> 
> gcc/ChangeLog:
> 
> 	* tree-switch-conversion.cc (switch_conversion::switch_conversion):
> 	Track if the transformation happened.
> 	(switch_conversion::is_exp_index_transform_viable): New function
> 	to decide whether the transformation should be applied.
> 	(switch_conversion::exp_index_transform): New function to
> 	execute the transformation.
> 	(switch_conversion::gen_inbound_check): Don't remove the default
> 	BB if the transformation happened.
> 	(switch_conversion::expand): Execute the transform if it is
> 	viable.  Skip test for sufficiently small case range if the
> 	transformation is going to be executed.
> 	* tree-switch-conversion.h: Add is_exp_index_transform viable
> 	and exp_index_transform.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/i386/switch-exp-transform-1.c: New test.
> 	* gcc.target/i386/switch-exp-transform-2.c: New test.
> 	* gcc.target/i386/switch-exp-transform-3.c: New test.
> 
> Signed-off-by: Filip Kastl <fkastl@suse.cz>
> ---
>  .../gcc.target/i386/switch-exp-transform-1.c  |  34 +++
>  .../gcc.target/i386/switch-exp-transform-2.c  |  36 +++
>  .../gcc.target/i386/switch-exp-transform-3.c  | 151 +++++++++
>  gcc/tree-switch-conversion.cc                 | 289 +++++++++++++++++-
>  gcc/tree-switch-conversion.h                  |  18 ++
>  5 files changed, 523 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/switch-exp-transform-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
> 
> diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
> new file mode 100644
> index 00000000000..d51dd110623
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
> @@ -0,0 +1,34 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-switchconv -march=znver3" } */

I think it's better to enable -mpopcnt and -mbmi (or what remains
as minimal requirement).

> +
> +/* Checks that exponential index transform enables switch conversion to convert
> +   this switch into an array lookup.  Also checks that the "index variable is a
> +   power of two" check has been generated.  -march=znver3 because that
> +   microarchitecture supports POPCOUNT and FFS instructions and exponential
> +   index transform requires that.  */
> +
> +int foo(unsigned bar)
> +{
> +    switch (bar)
> +    {
> +        case (1 << 0):
> +            return 1;
> +        case (1 << 1):
> +            return 2;
> +        case (1 << 2):
> +            return 3;
> +        case (1 << 3):
> +            return 4;
> +        case (1 << 4):
> +            return 8;
> +        case (1 << 5):
> +            return 13;
> +        case (1 << 6):
> +            return 21;
> +        default:
> +            return 0;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "CSWTCH" "switchconv" } } */
> +/* { dg-final { scan-tree-dump "POPCOUNT" "switchconv" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-2.c b/gcc/testsuite/gcc.target/i386/switch-exp-transform-2.c
> new file mode 100644
> index 00000000000..9f2b536aa74
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-2.c
> @@ -0,0 +1,36 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-switchconv -march=znver3" } */
> +
> +/* Checks that when exponential index transform is viable but switch conversion
> +   decides that the switch cannot be converted, the exponential index transform
> +   is not done.  -march=znver3 because that microarchitecture supports POPCOUNT
> +   and FFS instructions and exponential index transform requires that.  */
> +
> +int a;
> +
> +int foo(unsigned bar)
> +{
> +    switch (bar)
> +    {
> +        case (1 << 0):
> +            return 0;
> +        case (1 << 1):
> +            return 1;
> +        case (1 << 2):
> +            return 2;
> +        case (1 << 3):
> +            return 3;
> +        case (1 << 4):
> +            return 4;
> +        case (1 << 5):
> +            a = 3;
> +            return 5;
> +        case (1 << 6):
> +            return 6;
> +        default:
> +            return 0;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "Exponential index transform viable" "switchconv" } } */
> +/* { dg-final { scan-tree-dump-not "Applying exponential index transform" "switchconv" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c b/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
> new file mode 100644
> index 00000000000..e9665d4a38d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
> @@ -0,0 +1,151 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-switchconv -march=znver3" } */
> +
> +/* Checks that the exponential index transformation is done for all these types
> +   of the index variable:
> +   - (unsigned) int
> +   - (unsigned) long
> +   - (unsigned) long long
> +
> +   -march=znver3 because that microarchitecture supports POPCOUNT
> +   and FFS instructions and exponential index transform requires that.  */
> +
> +int unopt_int(int bit_position)
> +{
> +    switch (bit_position)
> +    {
> +        case (1 << 0):
> +            return 0;
> +        case (1 << 1):
> +            return 1;
> +        case (1 << 2):
> +            return 2;
> +        case (1 << 3):
> +            return 3;
> +        case (1 << 4):
> +            return 4;
> +        case (1 << 5):
> +            return 5;
> +        case (1 << 6):
> +            return 6;
> +        default:
> +            return 0;
> +    }
> +}
> +
> +int unopt_unsigned_int(unsigned int bit_position)
> +{
> +    switch (bit_position)
> +    {
> +        case (1 << 0):
> +            return 0;
> +        case (1 << 1):
> +            return 1;
> +        case (1 << 2):
> +            return 2;
> +        case (1 << 3):
> +            return 3;
> +        case (1 << 4):
> +            return 4;
> +        case (1 << 5):
> +            return 5;
> +        case (1 << 6):
> +            return 6;
> +        default:
> +            return 0;
> +    }
> +}
> +
> +int unopt_long(long bit_position)
> +{
> +    switch (bit_position)
> +    {
> +        case (1 << 0):
> +            return 0;
> +        case (1 << 1):
> +            return 1;
> +        case (1 << 2):
> +            return 2;
> +        case (1 << 3):
> +            return 3;
> +        case (1 << 4):
> +            return 4;
> +        case (1 << 5):
> +            return 5;
> +        case (1 << 6):
> +            return 6;
> +        default:
> +            return 0;
> +    }
> +}
> +
> +int unopt_unsigned_long(unsigned long bit_position)
> +{
> +    switch (bit_position)
> +    {
> +        case (1 << 0):
> +            return 0;
> +        case (1 << 1):
> +            return 1;
> +        case (1 << 2):
> +            return 2;
> +        case (1 << 3):
> +            return 3;
> +        case (1 << 4):
> +            return 4;
> +        case (1 << 5):
> +            return 5;
> +        case (1 << 6):
> +            return 6;
> +        default:
> +            return 0;
> +    }
> +}
> +
> +int unopt_long_long(long long bit_position)
> +{
> +    switch (bit_position)
> +    {
> +        case (1 << 0):
> +            return 0;
> +        case (1 << 1):
> +            return 1;
> +        case (1 << 2):
> +            return 2;
> +        case (1 << 3):
> +            return 3;
> +        case (1 << 4):
> +            return 4;
> +        case (1 << 5):
> +            return 5;
> +        case (1 << 6):
> +            return 6;
> +        default:
> +            return 0;
> +    }
> +}
> +
> +int unopt_unsigned_long_long(unsigned long long bit_position)
> +{
> +    switch (bit_position)
> +    {
> +        case (1 << 0):
> +            return 0;
> +        case (1 << 1):
> +            return 1;
> +        case (1 << 2):
> +            return 2;
> +        case (1 << 3):
> +            return 3;
> +        case (1 << 4):
> +            return 4;
> +        case (1 << 5):
> +            return 5;
> +        case (1 << 6):
> +            return 6;
> +        default:
> +            return 0;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Applying exponential index transform" 6 "switchconv" } } */
> diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
> index 3a5b84c09e2..975888c0969 100644
> --- a/gcc/tree-switch-conversion.cc
> +++ b/gcc/tree-switch-conversion.cc
> @@ -52,6 +52,8 @@ Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
>  #include "omp-general.h"
>  #include "gimple-range.h"
>  #include "tree-cfgcleanup.h"
> +#include "hwint.h"
> +#include "internal-fn.h"
>  
>  /* ??? For lang_hooks.types.type_for_mode, but is there a word_mode
>     type in the GIMPLE type system that is language-independent?  */
> @@ -66,7 +68,8 @@ using namespace tree_switch_conversion;
>  switch_conversion::switch_conversion (): m_final_bb (NULL),
>    m_constructors (NULL), m_default_values (NULL),
>    m_arr_ref_first (NULL), m_arr_ref_last (NULL),
> -  m_reason (NULL), m_default_case_nonstandard (false), m_cfg_altered (false)
> +  m_reason (NULL), m_default_case_nonstandard (false), m_cfg_altered (false),
> +  m_exp_index_transform_applied (false)
>  {
>  }
>  
> @@ -202,6 +205,267 @@ switch_conversion::collect (gswitch *swtch)
>      }
>  }
>  
> +/* Check that the "exponential index transform" can be applied to this switch.
> +
> +   See comment of the exp_index_transform function for details about this
> +   transformation.
> +
> +   We want:
> +   - This form of the switch is more efficient
> +   - Cases are powers of 2
> +
> +   Expects that SWTCH has at least one case.  */
> +
> +bool
> +switch_conversion::is_exp_index_transform_viable (gswitch *swtch)
> +{
> +  tree index = gimple_switch_index (swtch);
> +  tree index_type = TREE_TYPE (index);
> +  basic_block swtch_bb = gimple_bb (swtch);
> +  unsigned num_labels = gimple_switch_num_labels (swtch);
> +
> +  /* Check that we can efficiently compute logarithm of 2^k (using FFS) and
> +     test that a given number is 2^k for some k (using POPCOUNT).  */
> +  optimization_type opt_type = bb_optimization_type (swtch_bb);
> +  if (!direct_internal_fn_supported_p (IFN_FFS, index_type, opt_type)
> +    || !direct_internal_fn_supported_p (IFN_POPCOUNT, index_type, opt_type))
> +    return false;
> +

See above, I think this can be improved.  Would be nice to split out
a can_pow2p (type) and can_log2 (type) and a corresponding
gen_pow2p (op) and gen_log2 (op) function so this could be re-used
and alternate variants added when required.

I think we already demote the switch index type as much as possible
but for FFS/POPCOUNT availability it might be useful to check whether
a wider type has target support for this.  I'd expect this to be
an issue when the index type is short or char (not sure if that
happens).  This is what Andrew means I think.

> +  /* Check that each case label corresponds only to one value
> +     (no case 1..3).  */
> +  unsigned i;
> +  for (i = 1; i < num_labels; i++)
> +    {
> +      tree label = gimple_switch_label (swtch, i);
> +      if (CASE_HIGH (label))
> +	return false;
> +    }
> +
> +  /* Check that each label is nonnegative.  */
> +  for (i = 1; i < num_labels; i++)
> +    {
> +      tree label = gimple_switch_label (swtch, i);
> +      if (!tree_fits_uhwi_p (CASE_LOW (label)))
> +	return false;
> +    }
> +
> +  /* Check that each case is a power of 2.  */
> +  for (i = 1; i < num_labels; i++)
> +    {
> +      tree label = gimple_switch_label (swtch, i);
> +      unsigned HOST_WIDE_INT label_hwi = tree_to_uhwi (CASE_LOW (label));
> +      if (!pow2p_hwi (label_hwi))
> +	return false;
> +    }

Can you merge the loop bodies please?  There's no good reason to
iterate over labels multiple times.

As future enhancement it would be possible to handle a more
general [-]pow2(x) +- constant by switching on log2([-x] -+ constant).
I guess we already handle linear scaling/shifting thus A * (x +- C1) +- 
C2.

> +  if (dump_file)
> +    fprintf (dump_file, "Exponential index transform viable\n");
> +
> +  return true;
> +}
> +
> +/* Perform the "exponential index transform".
> +
> +   Assume that cases of SWTCH are powers of 2.  The transformation replaces the
> +   cases by their exponents (2^k -> k).  It also inserts a statement that
> +   computes the exponent of the original index variable (basically taking the
> +   logarithm) and then sets the result as the new index variable.
> +
> +   The transformation also inserts a conditional statement checking that the
> +   incoming original index variable is a power of 2 with the false edge leading
> +   to the default case.
> +
> +   The exponential index transform shrinks the range of case numbers which
> +   helps switch conversion convert switches it otherwise could not.
> +
> +   Consider for example:
> +
> +   switch (i)
> +     {
> +       case (1 << 0): return 0;
> +       case (1 << 1): return 1;
> +       case (1 << 2): return 2;
> +       ...
> +       case (1 << 30): return 30;
> +       default: return 31;
> +     }
> +
> +   First, exponential index transform gets applied.  Since each case becomes
> +   case x: return x;, the rest of switch conversion is then able to get rid of
> +   the switch statement.
> +
> +   if (i is power of 2)
> +     return log2 (i);
> +   else
> +     return 31;
> +
> +   */
> +
> +void
> +switch_conversion::exp_index_transform (gswitch *swtch)
> +{
> +  if (dump_file)
> +    fprintf (dump_file, "Applying exponential index transform\n");
> +
> +  tree index = gimple_switch_index (swtch);
> +  tree index_type = TREE_TYPE (index);
> +  basic_block swtch_bb = gimple_bb (swtch);
> +  unsigned num_labels = gimple_switch_num_labels (swtch);
> +  tree one = build_one_cst (index_type);
> +
> +  /* Insert a cond stmt that checks if the index variable is a power of 2.  */
> +  gimple_stmt_iterator gsi = gsi_for_stmt (swtch);
> +  gsi_prev (&gsi);
> +  gimple *foo = gsi_stmt (gsi);
> +  edge new_edge1 = split_block (swtch_bb, foo);
> +
> +  swtch_bb = new_edge1->dest;
> +  basic_block cond_bb = new_edge1->src;
> +  new_edge1->flags |= EDGE_TRUE_VALUE;
> +  new_edge1->flags &= ~EDGE_FALLTHRU;
> +  new_edge1->probability = profile_probability::even ();
> +
> +  basic_block default_bb = gimple_switch_default_bb (cfun, swtch);
> +  edge new_edge2 = make_edge (cond_bb, default_bb, EDGE_FALSE_VALUE);
> +  new_edge2->probability = profile_probability::even ();
> +
> +  gsi = gsi_last_bb (cond_bb);
> +
> +  tree tmp1 = make_ssa_name (index_type);
> +  gcall *stmt_popcount = gimple_build_call_internal (IFN_POPCOUNT, 2, index,
> +						     one);
> +  gimple_call_set_lhs (stmt_popcount, tmp1);
> +  gsi_insert_after (&gsi, stmt_popcount, GSI_NEW_STMT);
> +
> +  gcond *stmt_cond = gimple_build_cond (EQ_EXPR, tmp1, one, NULL, NULL);
> +  gsi_insert_after (&gsi, stmt_cond, GSI_NEW_STMT);
> +
> +  /* We just added an edge going to default bb so fix PHI nodes in that bb:
> +     For each PHI add new PHI arg.  It will be the same arg as when comming to
> +     the default bb from the switch bb.  */
> +  edge default_edge = find_edge (swtch_bb, default_bb);
> +  for (gphi_iterator gsi = gsi_start_phis (default_bb);
> +       !gsi_end_p (gsi); gsi_next (&gsi))
> +    {
> +      gphi *phi = gsi.phi ();
> +      tree arg = PHI_ARG_DEF_FROM_EDGE (phi, default_edge);
> +      location_t loc = gimple_phi_arg_location_from_edge (phi, default_edge);
> +      add_phi_arg (phi, arg, new_edge2, loc);
> +    }
> +
> +  /* Insert a statement that takes the logarithm of the index variable.  */
> +  tree tmp2 = make_ssa_name (index_type);
> +  gsi = gsi_start_bb (swtch_bb);

Please use gsi_after_labels (swtch_bb) even though you know there's no
labels there.

> +  gcall *stmt_ffs = gimple_build_call_internal (IFN_FFS, 1, index);
> +  gimple_call_set_lhs (stmt_ffs, tmp2);
> +  gsi_insert_before (&gsi, stmt_ffs, GSI_SAME_STMT);
> +
> +  tree tmp3 = make_ssa_name (index_type);
> +  gassign *stmt_minus_one = gimple_build_assign (tmp3, MINUS_EXPR, tmp2, one);
> +  gsi_insert_before (&gsi, stmt_minus_one, GSI_SAME_STMT);

You could also use

     tree tmp2 = gimple_build (gsi, true, GSI_SAME_STMT,
			       UNKNOWN_LOCATION, IFN_FFS, index);
     tree tmp3 = gimple_build (gsi, true, GSI_SAME_STMT,
                               UNKNOWN_LOCATION, MINUS_EXPR, tmp2, one);

which does the stmt building, temporary SSA name creation and insertion
plus eventually folding things with a definition.  There isn't a
gimple_build_cond with this API, but it would still work for the
popcount build above.

Of course as said splitting this out would make extending it to handle
alternate code generation when popcount or ffs are not available
easier.

> +  /* Use the result of the logarithm as the new index variable.  */
> +  gimple_switch_set_index (swtch, tmp3);
> +  update_stmt (swtch);
> +
> +  /* Replace each case number with its logarithm.  */
> +  unsigned i;
> +  for (i = 1; i < num_labels; i++)
> +    {
> +      tree label = gimple_switch_label (swtch, i);
> +      unsigned HOST_WIDE_INT label_hwi = tree_to_uhwi (CASE_LOW (label));
> +      CASE_LOW (label) = build_int_cst (index_type, floor_log2 
(label_hwi));

You should be able to directly use

         CASE_LOW (label) = tree_log2 (CASE_LOW (label));

Do you have test coverage for a signed index and a 0x80000000 case?
It would satisfy popcount and ffs should work but the constant
doesnt fit an uhwi and likely tree_log2 would ICE on it.

As Andrew said working with wide_int might be best though signedness
might interfere here as well.

> +    }
> +
> +  /* Fix the dominator tree, if it is available.  */
> +  if (dom_info_available_p (CDI_DOMINATORS))
> +    {
> +      /* Analysis of how dominators should look after we add the edge E going
> +	 from the cond block to the default block.
> +
> +	 1 For the blocks between the switch block and the final block
> +	 (excluding the final block itself):  They had the switch block as
> +	 their immediate dominator.  That shouldn't change.
> +
> +	 2 The final block may now have the switch block or the cond block as
> +	 its immediate dominator.  There's no easy way of knowing (consider
> +	 two cases where in both m_default_case_nonstandard = true, in one a
> +	 path through default intersects the final block and in one all paths
> +	 through default avoid the final block but intersect a successor of the
> +	 final block).
> +
> +	 3 Other blocks that had the switch block as their immediate dominator
> +	 should now have the cond block as their immediate dominator.
> +
> +	 4 Immediate dominators of the rest of the blocks shouldn't change.
> +
> +	 Reasoning for 3 and 4:
> +
> +	 We'll only consider blocks that do not fall into 1 or 2.
> +
> +	 Consider a block X whose original imm dom was the switch block.  All
> +	 paths to X must also intersect the cond block since it's the only
> +	 pred of the switch block.  The final block doesn't dominate X so at
> +	 least one path P must lead through the default block.  Let P' be P but
> +	 instead of going through the switch block, take E.  The switch block
> +	 doesn't dominate X so its imm dom must now be the cond block.
> +
> +	 Consider a block X whose original imm dom was Y != the switch block.
> +	 We only added an edge so all original paths to X are still present.
> +	 So X gained no new dominators.  Observe that Y still dominates X.
> +	 There would have to be a path that avoids Y otherwise.  But any block
> +	 we can avoid now except for the switch block we were able to avoid
> +	 before adding E.  */
> +
> +      redirect_immediate_dominators (CDI_DOMINATORS, swtch_bb, cond_bb);
> +
> +      /* FIXME: Since exponential transform is run only if we know that the
> +	 switch will be converted, we know these blocks will be removed and we
> +	 maybe don't have to bother updating their dominators.  */

It's good to not leave the IL in an intermediate broken state.

> +      edge e;
> +      edge_iterator ei;
> +      FOR_EACH_EDGE (e, ei, swtch_bb->succs)
> +	{
> +	  basic_block bb = e->dest;
> +	  if (bb == m_final_bb || bb == default_bb)
> +	    continue;
> +	  set_immediate_dominator (CDI_DOMINATORS, bb, swtch_bb);

If there's an alternate edge into the cases, thus the original
dominator wasn't the swtch_bb the dominator shouldn't change.
I wonder why it would change at all - you are only creating
a new edge into the default case so only that block dominance
relationship changes?

> +	}
> +
> +      vec<basic_block> v;
> +      v.create (1);
> +      v.quick_push (m_final_bb);
> +      iterate_fix_dominators (CDI_DOMINATORS, v, true);

The final BB should have the condition block as immediate dominator
if it's old immediate dominator was the switch block, otherwise
it's immediate dominator shouldn't change.

> +    }
> +
> +  /* Update information about the switch statement.  */
> +  tree first_label = gimple_switch_label (swtch, 1);
> +  tree last_label = gimple_switch_label (swtch, num_labels - 1);
> +
> +  m_range_min = CASE_LOW (first_label);
> +  m_range_max = CASE_LOW (last_label);
> +  m_index_expr = gimple_switch_index (swtch);
> +  m_switch_bb = swtch_bb;
> +
> +  unsigned HOST_WIDE_INT range_size_hwi = tree_to_uhwi (m_range_max)
> +    - tree_to_uhwi (m_range_min);
> +  m_range_size = build_int_cst (index_type, range_size_hwi);

Use int_const_binop (MINUS_EXPR, m_range_max, m_range_min) as is done
elsewhere (m_range_size should be a wide_int, but that's out of scope 
here).

Otherwise the patch looks good to me.  You can leave the requirement
for popcount/ffs but please split out the check and code-gen.

Thanks,
Richard.

> +
> +  m_cfg_altered = true;
> +
> +  m_contiguous_range = true;
> +  unsigned HOST_WIDE_INT last_hwi = tree_to_uhwi (CASE_LOW (first_label));
> +  for (i = 2; i < num_labels; i++)
> +    {
> +      tree label = gimple_switch_label (swtch, i);
> +      unsigned HOST_WIDE_INT label_hwi = tree_to_uhwi (CASE_LOW (label));
> +      m_contiguous_range &= last_hwi + 1 == label_hwi;
> +      last_hwi = label_hwi;
> +    }
> +
> +  m_exp_index_transform_applied = true;
> +}
> +
>  /* Checks whether the range given by individual case statements of the switch
>     switch statement isn't too big and whether the number of branches actually
>     satisfies the size of the new array.  */
> @@ -973,8 +1237,9 @@ switch_conversion::gen_inbound_check ()
>      bbf->count = e1f->count () + e2f->count ();
>  
>    /* Tidy blocks that have become unreachable.  */
> -  prune_bbs (bbd, m_final_bb,
> -	     m_default_case_nonstandard ? m_default_bb : NULL);
> +  bool prune_default_bb = !m_default_case_nonstandard
> +    && !m_exp_index_transform_applied;
> +  prune_bbs (bbd, m_final_bb, prune_default_bb ? NULL : m_default_bb);
>  
>    /* Fixup the PHI nodes in bbF.  */
>    fix_phi_nodes (e1f, e2f, bbf);
> @@ -1053,8 +1318,19 @@ switch_conversion::expand (gswitch *swtch)
>        return;
>      }
>  
> -  /* Check the case label values are within reasonable range:  */
> -  if (!check_range ())
> +  /* Sometimes it is possible to use the "exponential index transform" to help
> +     switch conversion convert switches which it otherwise could not convert.
> +     However, we want to do this transform only when we know that switch
> +     conversion will then really be able to convert the switch.  So we first
> +     check if the transformation is applicable and then maybe later do the
> +     transformation.  */
> +  bool exp_transform_viable = is_exp_index_transform_viable (swtch);
> +
> +  /* Check the case label values are within reasonable range.
> +
> +     If we will be doing exponential index transform, the range will be always
> +     reasonable.  */
> +  if (!exp_transform_viable && !check_range ())
>      {
>        gcc_assert (m_reason);
>        return;
> @@ -1076,6 +1352,9 @@ switch_conversion::expand (gswitch *swtch)
>    /* At this point all checks have passed and we can proceed with the
>       transformation.  */
>  
> +  if (exp_transform_viable)
> +    exp_index_transform (swtch);
> +
>    create_temp_arrays ();
>    gather_default_values (m_default_case_nonstandard
>  			 ? gimple_switch_label (swtch, 1)
> diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h
> index 6939eec6018..1a865f85f3a 100644
> --- a/gcc/tree-switch-conversion.h
> +++ b/gcc/tree-switch-conversion.h
> @@ -743,6 +743,19 @@ public:
>    /* Collection information about SWTCH statement.  */
>    void collect (gswitch *swtch);
>  
> +  /* Check that the 'exponential index transform' can be applied.
> +
> +     See the comment at the function definition for more details.  */
> +  bool is_exp_index_transform_viable (gswitch *swtch);
> +
> +  /* Perform the 'exponential index transform'.
> +
> +     The exponential index transform shrinks the range of case numbers which
> +     helps switch conversion convert switches it otherwise could not.
> +
> +     See the comment at the function definition for more details.  */
> +  void exp_index_transform (gswitch *swtch);
> +
>    /* Checks whether the range given by individual case statements of the switch
>       switch statement isn't too big and whether the number of branches actually
>       satisfies the size of the new array.  */
> @@ -900,6 +913,11 @@ public:
>  
>    /* True if CFG has been changed.  */
>    bool m_cfg_altered;
> +
> +  /* True if exponential index transform has been applied.  See the comment at
> +     the definition of exp_index_transform for details about the
> +     transformation.  */
> +  bool m_exp_index_transform_applied;
>  };
>  
>  void
>
Filip Kastl July 8, 2024, 3:06 p.m. UTC | #3
Hi,

I'm replying to Richard and keeping Andrew in cc since your suggestions
overlap.


On Tue 2024-06-11 14:48:06, Richard Biener wrote:
> On Thu, 30 May 2024, Filip Kastl wrote:
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-switchconv -march=znver3" } */
> 
> I think it's better to enable -mpopcnt and -mbmi (or what remains
> as minimal requirement).

Will do.  Currently the testcases are in the i386 directory.  After I exchange
the -march for -mpopcnt -mbmi can I put these testcases into gcc.dg/tree-ssa?
Will the -mpopcnt -mbmi options work with all target architectures?

> > +/* Check that the "exponential index transform" can be applied to this switch.
> > +
> > +   See comment of the exp_index_transform function for details about this
> > +   transformation.
> > +
> > +   We want:
> > +   - This form of the switch is more efficient
> > +   - Cases are powers of 2
> > +
> > +   Expects that SWTCH has at least one case.  */
> > +
> > +bool
> > +switch_conversion::is_exp_index_transform_viable (gswitch *swtch)
> > +{
> > +  tree index = gimple_switch_index (swtch);
> > +  tree index_type = TREE_TYPE (index);
> > +  basic_block swtch_bb = gimple_bb (swtch);
> > +  unsigned num_labels = gimple_switch_num_labels (swtch);
> > +
> > +  /* Check that we can efficiently compute logarithm of 2^k (using FFS) and
> > +     test that a given number is 2^k for some k (using POPCOUNT).  */
> > +  optimization_type opt_type = bb_optimization_type (swtch_bb);
> > +  if (!direct_internal_fn_supported_p (IFN_FFS, index_type, opt_type)
> > +    || !direct_internal_fn_supported_p (IFN_POPCOUNT, index_type, opt_type))
> > +    return false;
> > +
> 
> See above, I think this can be improved.  Would be nice to split out
> a can_pow2p (type) and can_log2 (type) and a corresponding
> gen_pow2p (op) and gen_log2 (op) function so this could be re-used
> and alternate variants added when required.
> 

Just to check that I understand this correctly:  You'd like me to create
functions can_pow2p, can_log2.  Those functions will check that there are
optabs for the target machine which allow us to efficiently test
power-of-2-ness of a number and which allow us to efficiently compute the
base-2 log of a power-of-2 number.  You'd also like me to create functions
gen_pow2p and gen_log2 which generate this code.  For now these functions will
just use POPCOUNT and FFS but they can be later extended to also consider
different instructions.  Is that right?

Into which file should I put these functions?

Is can_pow2p and gen_pow2p necessary?  As you noted one can always use
(x & -x) == x so testing pow2p can always be done efficiently.

> > +  /* Insert a statement that takes the logarithm of the index variable.  */
> > +  tree tmp2 = make_ssa_name (index_type);
> > +  gsi = gsi_start_bb (swtch_bb);
> 
> Please use gsi_after_labels (swtch_bb) even though you know there's no
> labels there.
> 
> > +  gcall *stmt_ffs = gimple_build_call_internal (IFN_FFS, 1, index);
> > +  gimple_call_set_lhs (stmt_ffs, tmp2);
> > +  gsi_insert_before (&gsi, stmt_ffs, GSI_SAME_STMT);
> > +
> > +  tree tmp3 = make_ssa_name (index_type);
> > +  gassign *stmt_minus_one = gimple_build_assign (tmp3, MINUS_EXPR, tmp2, one);
> > +  gsi_insert_before (&gsi, stmt_minus_one, GSI_SAME_STMT);
> 
> You could also use
> 
>      tree tmp2 = gimple_build (gsi, true, GSI_SAME_STMT,
> 			       UNKNOWN_LOCATION, IFN_FFS, index);
>      tree tmp3 = gimple_build (gsi, true, GSI_SAME_STMT,
>                                UNKNOWN_LOCATION, MINUS_EXPR, tmp2, one);
> 
> which does the stmt building, temporary SSA name creation and insertion
> plus eventually folding things with a definition.  There isn't a
> gimple_build_cond with this API, but it would still work for the
> popcount build above.

I've tried using the gimple_build API and it is indeed nicer.  However, I
wasn't able to get it working with the FFS internal function.  IFN_FFS is of a
different type than what gimple_build accepts.  I've tried this

  tree tmp2 = gimple_build (&gsi, true, GSI_SAME_STMT, UNKNOWN_LOCATION,
                            as_combined_fn (IFN_FFS), index);

but that only caused an ICE.  I tried looking for an example of using
gimple_build to build an internal function call in the sources but I haven't
found any.

If you'd be willing to help me with this, I will happily use the gimple_build
API.  If I don't figure this out I will just stick with the less clean original
version.

> > +  /* Use the result of the logarithm as the new index variable.  */
> > +  gimple_switch_set_index (swtch, tmp3);
> > +  update_stmt (swtch);
> > +
> > +  /* Replace each case number with its logarithm.  */
> > +  unsigned i;
> > +  for (i = 1; i < num_labels; i++)
> > +    {
> > +      tree label = gimple_switch_label (swtch, i);
> > +      unsigned HOST_WIDE_INT label_hwi = tree_to_uhwi (CASE_LOW (label));
> > +      CASE_LOW (label) = build_int_cst (index_type, floor_log2 
> (label_hwi));
> 
> You should be able to directly use
> 
>          CASE_LOW (label) = tree_log2 (CASE_LOW (label));
> 
> Do you have test coverage for a signed index and a 0x80000000 case?
> It would satisfy popcount and ffs should work but the constant
> doesnt fit an uhwi and likely tree_log2 would ICE on it.

A situation with signed index variable and a 0x80000000 case shouldn't be a
problem since I check all cases for tree_fits_uhwi_p.  To make sure I made this
testcase

int foo()    
{    
    int a, b;    
    switch (a)    
    {    
        case 0x10000000:    
            b = 0;    
            break;    
        case 0x20000000:    
            b = 1;    
            break;    
        case 0x40000000:    
            b = 2;    
            break;    
        case 0x80000000:    
            b = 3;    
            break;    
    }    
    return b;    
}

and indeed, is_exp_transform_viable returned false due to the last case.  No
ICE happened.  I added this testcase to my personal set of testcases that I'll
run on subsequent versions of this patch.

> 
> As Andrew said working with wide_int might be best though signedness
> might interfere here as well.
> 

Ok, I'm looking into using wide_ints instead of HOST_WIDE_INTs.  However, I
need some help:  How should I check that case numbers are nonnegative?  If I
understand it correctly wide_ints don't carry signedness information so I'll
have to somehow check nonnegativity of trees, right?  Should I continue using
tree_fits_uhwi_p for this?

> > +    }
> > +
> > +  /* Update information about the switch statement.  */
> > +  tree first_label = gimple_switch_label (swtch, 1);
> > +  tree last_label = gimple_switch_label (swtch, num_labels - 1);
> > +
> > +  m_range_min = CASE_LOW (first_label);
> > +  m_range_max = CASE_LOW (last_label);
> > +  m_index_expr = gimple_switch_index (swtch);
> > +  m_switch_bb = swtch_bb;
> > +
> > +  unsigned HOST_WIDE_INT range_size_hwi = tree_to_uhwi (m_range_max)
> > +    - tree_to_uhwi (m_range_min);
> > +  m_range_size = build_int_cst (index_type, range_size_hwi);
> 
> Use int_const_binop (MINUS_EXPR, m_range_max, m_range_min) as is done
> elsewhere (m_range_size should be a wide_int, but that's out of scope 
> here).

Just to check that I understand the bit in parentheses correctly:  You think
that the switch conversion pass should represent m_range_size as a wide_int
instead of a tree.  However, that isn't something that should be changed in my
patch (it could be changed in some future patch instead), right?


Thanks for the suggestions.

Cheers,
Filip Kastl
Richard Biener July 10, 2024, 9:34 a.m. UTC | #4
On Mon, 8 Jul 2024, Filip Kastl wrote:

> Hi,
> 
> I'm replying to Richard and keeping Andrew in cc since your suggestions
> overlap.
> 
> 
> On Tue 2024-06-11 14:48:06, Richard Biener wrote:
> > On Thu, 30 May 2024, Filip Kastl wrote:
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-tree-switchconv -march=znver3" } */
> > 
> > I think it's better to enable -mpopcnt and -mbmi (or what remains
> > as minimal requirement).
> 
> Will do.  Currently the testcases are in the i386 directory.  After I exchange
> the -march for -mpopcnt -mbmi can I put these testcases into gcc.dg/tree-ssa?
> Will the -mpopcnt -mbmi options work with all target architectures?

No, those are i386 specific flags.  At least for popcount there's
dejagnu effective targets popcount, popcountl and popcountll so you
could do

/* { dg-additional-options "-mpopcnt" { target { x86_64-*-* i?86-*-* } } } 
*/

and guard the tree dump scan with { target popcount } to cover other
archs that have popcount (without adding extra flags).

> > > +/* Check that the "exponential index transform" can be applied to this switch.
> > > +
> > > +   See comment of the exp_index_transform function for details about this
> > > +   transformation.
> > > +
> > > +   We want:
> > > +   - This form of the switch is more efficient
> > > +   - Cases are powers of 2
> > > +
> > > +   Expects that SWTCH has at least one case.  */
> > > +
> > > +bool
> > > +switch_conversion::is_exp_index_transform_viable (gswitch *swtch)
> > > +{
> > > +  tree index = gimple_switch_index (swtch);
> > > +  tree index_type = TREE_TYPE (index);
> > > +  basic_block swtch_bb = gimple_bb (swtch);
> > > +  unsigned num_labels = gimple_switch_num_labels (swtch);
> > > +
> > > +  /* Check that we can efficiently compute logarithm of 2^k (using FFS) and
> > > +     test that a given number is 2^k for some k (using POPCOUNT).  */
> > > +  optimization_type opt_type = bb_optimization_type (swtch_bb);
> > > +  if (!direct_internal_fn_supported_p (IFN_FFS, index_type, opt_type)
> > > +    || !direct_internal_fn_supported_p (IFN_POPCOUNT, index_type, opt_type))
> > > +    return false;
> > > +
> > 
> > See above, I think this can be improved.  Would be nice to split out
> > a can_pow2p (type) and can_log2 (type) and a corresponding
> > gen_pow2p (op) and gen_log2 (op) function so this could be re-used
> > and alternate variants added when required.
> > 
> 
> Just to check that I understand this correctly:  You'd like me to create
> functions can_pow2p, can_log2.  Those functions will check that there are
> optabs for the target machine which allow us to efficiently test
> power-of-2-ness of a number and which allow us to efficiently compute the
> base-2 log of a power-of-2 number.  You'd also like me to create functions
> gen_pow2p and gen_log2 which generate this code.  For now these functions will
> just use POPCOUNT and FFS but they can be later extended to also consider
> different instructions.  Is that right?

Right.

> Into which file should I put these functions?

Just in this file for now.
 
> Is can_pow2p and gen_pow2p necessary?  As you noted one can always use
> (x & -x) == x so testing pow2p can always be done efficiently.

If you add this fallback then can_pow2p / gen_pow2p wouldn't be
necessary indeed.

> > > +  /* Insert a statement that takes the logarithm of the index variable.  */
> > > +  tree tmp2 = make_ssa_name (index_type);
> > > +  gsi = gsi_start_bb (swtch_bb);
> > 
> > Please use gsi_after_labels (swtch_bb) even though you know there's no
> > labels there.
> > 
> > > +  gcall *stmt_ffs = gimple_build_call_internal (IFN_FFS, 1, index);
> > > +  gimple_call_set_lhs (stmt_ffs, tmp2);
> > > +  gsi_insert_before (&gsi, stmt_ffs, GSI_SAME_STMT);
> > > +
> > > +  tree tmp3 = make_ssa_name (index_type);
> > > +  gassign *stmt_minus_one = gimple_build_assign (tmp3, MINUS_EXPR, tmp2, one);
> > > +  gsi_insert_before (&gsi, stmt_minus_one, GSI_SAME_STMT);
> > 
> > You could also use
> > 
> >      tree tmp2 = gimple_build (gsi, true, GSI_SAME_STMT,
> > 			       UNKNOWN_LOCATION, IFN_FFS, index);
> >      tree tmp3 = gimple_build (gsi, true, GSI_SAME_STMT,
> >                                UNKNOWN_LOCATION, MINUS_EXPR, tmp2, one);
> > 
> > which does the stmt building, temporary SSA name creation and insertion
> > plus eventually folding things with a definition.  There isn't a
> > gimple_build_cond with this API, but it would still work for the
> > popcount build above.
> 
> I've tried using the gimple_build API and it is indeed nicer.  However, I
> wasn't able to get it working with the FFS internal function.  IFN_FFS is of a
> different type than what gimple_build accepts.  I've tried this
> 
>   tree tmp2 = gimple_build (&gsi, true, GSI_SAME_STMT, UNKNOWN_LOCATION,
>                             as_combined_fn (IFN_FFS), index);
>
> but that only caused an ICE.  I tried looking for an example of using
> gimple_build to build an internal function call in the sources but I haven't
> found any.
> 
> If you'd be willing to help me with this, I will happily use the gimple_build
> API.  If I don't figure this out I will just stick with the less clean original
> version.

I think you need to use

  tree tmp2 = gimple_build (&gsi, true, GSI_SAME_STMT, UNKNOWN_LOCATION,
			    IFN_FFS, integer_type_node, index);

that is, you always need to specify the type of the value produced
(here the return type of ffs).
 
> > > +  /* Use the result of the logarithm as the new index variable.  */
> > > +  gimple_switch_set_index (swtch, tmp3);
> > > +  update_stmt (swtch);
> > > +
> > > +  /* Replace each case number with its logarithm.  */
> > > +  unsigned i;
> > > +  for (i = 1; i < num_labels; i++)
> > > +    {
> > > +      tree label = gimple_switch_label (swtch, i);
> > > +      unsigned HOST_WIDE_INT label_hwi = tree_to_uhwi (CASE_LOW (label));
> > > +      CASE_LOW (label) = build_int_cst (index_type, floor_log2 
> > (label_hwi));
> > 
> > You should be able to directly use
> > 
> >          CASE_LOW (label) = tree_log2 (CASE_LOW (label));
> > 
> > Do you have test coverage for a signed index and a 0x80000000 case?
> > It would satisfy popcount and ffs should work but the constant
> > doesnt fit an uhwi and likely tree_log2 would ICE on it.
> 
> A situation with signed index variable and a 0x80000000 case shouldn't be a
> problem since I check all cases for tree_fits_uhwi_p.  To make sure I made this
> testcase
> 
> int foo()    
> {    
>     int a, b;    
>     switch (a)    
>     {    
>         case 0x10000000:    
>             b = 0;    
>             break;    
>         case 0x20000000:    
>             b = 1;    
>             break;    
>         case 0x40000000:    
>             b = 2;    
>             break;    
>         case 0x80000000:    
>             b = 3;    
>             break;    
>     }    
>     return b;    
> }
> 
> and indeed, is_exp_transform_viable returned false due to the last case.  No
> ICE happened.  I added this testcase to my personal set of testcases that I'll
> run on subsequent versions of this patch.
> 
> > 
> > As Andrew said working with wide_int might be best though signedness
> > might interfere here as well.
> > 
> 
> Ok, I'm looking into using wide_ints instead of HOST_WIDE_INTs.  However, I
> need some help:  How should I check that case numbers are nonnegative?  If I
> understand it correctly wide_ints don't carry signedness information so I'll
> have to somehow check nonnegativity of trees, right?  Should I continue using
> tree_fits_uhwi_p for this?

You can use wi::ge_p (wide-int, 0, TYPE_SIGN (type-of-constant)) that
will then dispatch to ge{s,u}_p based on whether the number is signed
or unsigned.

> > > +    }
> > > +
> > > +  /* Update information about the switch statement.  */
> > > +  tree first_label = gimple_switch_label (swtch, 1);
> > > +  tree last_label = gimple_switch_label (swtch, num_labels - 1);
> > > +
> > > +  m_range_min = CASE_LOW (first_label);
> > > +  m_range_max = CASE_LOW (last_label);
> > > +  m_index_expr = gimple_switch_index (swtch);
> > > +  m_switch_bb = swtch_bb;
> > > +
> > > +  unsigned HOST_WIDE_INT range_size_hwi = tree_to_uhwi (m_range_max)
> > > +    - tree_to_uhwi (m_range_min);
> > > +  m_range_size = build_int_cst (index_type, range_size_hwi);
> > 
> > Use int_const_binop (MINUS_EXPR, m_range_max, m_range_min) as is done
> > elsewhere (m_range_size should be a wide_int, but that's out of scope 
> > here).
> 
> Just to check that I understand the bit in parentheses correctly:  You think
> that the switch conversion pass should represent m_range_size as a wide_int
> instead of a tree.  However, that isn't something that should be changed in my
> patch (it could be changed in some future patch instead), right?

Right.

Thanks,
Richard.

> 
> Thanks for the suggestions.
> 
> Cheers,
> Filip Kastl
>
Filip Kastl July 11, 2024, 3:03 p.m. UTC | #5
> > > > +/* Check that the "exponential index transform" can be applied to this switch.
> > > > +
> > > > +   See comment of the exp_index_transform function for details about this
> > > > +   transformation.
> > > > +
> > > > +   We want:
> > > > +   - This form of the switch is more efficient
> > > > +   - Cases are powers of 2
> > > > +
> > > > +   Expects that SWTCH has at least one case.  */
> > > > +
> > > > +bool
> > > > +switch_conversion::is_exp_index_transform_viable (gswitch *swtch)
> > > > +{
> > > > +  tree index = gimple_switch_index (swtch);
> > > > +  tree index_type = TREE_TYPE (index);
> > > > +  basic_block swtch_bb = gimple_bb (swtch);
> > > > +  unsigned num_labels = gimple_switch_num_labels (swtch);
> > > > +
> > > > +  /* Check that we can efficiently compute logarithm of 2^k (using FFS) and
> > > > +     test that a given number is 2^k for some k (using POPCOUNT).  */
> > > > +  optimization_type opt_type = bb_optimization_type (swtch_bb);
> > > > +  if (!direct_internal_fn_supported_p (IFN_FFS, index_type, opt_type)
> > > > +    || !direct_internal_fn_supported_p (IFN_POPCOUNT, index_type, opt_type))
> > > > +    return false;
> > > > +
> > > 
> > > See above, I think this can be improved.  Would be nice to split out
> > > a can_pow2p (type) and can_log2 (type) and a corresponding
> > > gen_pow2p (op) and gen_log2 (op) function so this could be re-used
> > > and alternate variants added when required.
> > > 
> > 
> > Just to check that I understand this correctly:  You'd like me to create
> > functions can_pow2p, can_log2.  Those functions will check that there are
> > optabs for the target machine which allow us to efficiently test
> > power-of-2-ness of a number and which allow us to efficiently compute the
> > base-2 log of a power-of-2 number.  You'd also like me to create functions
> > gen_pow2p and gen_log2 which generate this code.  For now these functions will
> > just use POPCOUNT and FFS but they can be later extended to also consider
> > different instructions.  Is that right?
> 
> Right.
> 
> > Into which file should I put these functions?
> 
> Just in this file for now.
>  
> > Is can_pow2p and gen_pow2p necessary?  As you noted one can always use
> > (x & -x) == x so testing pow2p can always be done efficiently.
> 
> If you add this fallback then can_pow2p / gen_pow2p wouldn't be
> necessary indeed.

Hi Richard,

I put some work into splitting out the can_ and gen_ functions as you
suggested.  I'm still a bit unsure what your vision of these is so before I
submit all the changes I made to the patch as version 2 I would like to share
how I implemented the functions (see bellow).  Is this how you imagined the
functions?  Would you change something or do the they look ok?

I wasn't sure how generic to make the functions.  The more generic the more
convoluted the implementation becomes.  For example: I could make them more
generic by also including a gsi_iterator_update parameter or I could make them
less generic but more straightforward by removing the BEFORE parameter.

Cheers,
Filip Kastl


/* Does the target have optabs needed to efficiently compute exact base two
   logarithm of a value with type TYPE?

   See gen_log2.  */

static bool
can_log2 (tree type, optimization_type opt_type)
{
  /* Check if target supports FFS.  */
  return direct_internal_fn_supported_p (IFN_FFS, type, opt_type);
}

/* Assume that OP is a power of two.  Build a sequence of gimple statements
   efficiently computing the base two logarithm of OP using special optabs.
   Insert statements before GSI if BEFORE is true or after GSI otherwise.
   Return the result value as an ssa name tree.

   Leave GSI at the same statement (GSI_SAME_STMT behavior).

   Should only be used if target supports the needed optabs.  See can_log2.  */

static tree
gen_log2 (gimple_stmt_iterator *gsi, bool before, location_t loc, tree op)
{
  /* Use .FFS (op) - 1.  */
  gimple *s = gsi->ptr;
  tree type = TREE_TYPE (op);
  gsi_iterator_update update = before ? GSI_SAME_STMT : GSI_NEW_STMT;
  tree tmp1 = gimple_build (gsi, before, update, loc, IFN_FFS, type, op);
  tree tmp2 = gimple_build (gsi, before, update, loc, MINUS_EXPR, type, tmp1,
                            build_one_cst (type));
  gsi->ptr = s;
  return tmp2;
}

/* Build a sequence of gimple statements checking that OP is a power of 2.  Use
   special optabs if targets supports them.  Insert statements before GSI if
   BEFORE is true or after GSI otherwise.  Return the result value as a
   boolean_type_node ssa name tree.

   Leave GSI at the same statement (GSI_SAME_STMT behavior).  */

static tree
gen_pow2p (gimple_stmt_iterator *gsi, bool before, location_t loc, tree op)
{
  tree result;

  /* Use either .POPCOUNT (op) == 1 or op & -op == op.  */
  tree type = TREE_TYPE (op);
  gimple *s = gsi->ptr;
  gsi_iterator_update update = before ? GSI_SAME_STMT : GSI_NEW_STMT;
  optimization_type opt_type = bb_optimization_type (gsi->bb);
  if (direct_internal_fn_supported_p (IFN_POPCOUNT, type, opt_type))
    {
      tree tmp = gimple_build (gsi, before, update, loc, IFN_POPCOUNT, type,
                               op);
      result = gimple_build (gsi, before, update, loc, EQ_EXPR,
                             boolean_type_node, tmp, build_one_cst (type));
    }
  else
    {
      tree tmp1 = gimple_build (gsi, before, update, loc, NEGATE_EXPR, type,
                                op);
      tree tmp2 = gimple_build (gsi, before, update, loc, BIT_AND_EXPR, type,
                                op, tmp1);
      result = gimple_build (gsi, before, update, loc, EQ_EXPR,
                             boolean_type_node, tmp2, op);
    }
  gsi->ptr = s;

  return result;
}
Richard Biener July 12, 2024, 7:07 a.m. UTC | #6
On Thu, 11 Jul 2024, Filip Kastl wrote:

> > > > > +/* Check that the "exponential index transform" can be applied to this switch.
> > > > > +
> > > > > +   See comment of the exp_index_transform function for details about this
> > > > > +   transformation.
> > > > > +
> > > > > +   We want:
> > > > > +   - This form of the switch is more efficient
> > > > > +   - Cases are powers of 2
> > > > > +
> > > > > +   Expects that SWTCH has at least one case.  */
> > > > > +
> > > > > +bool
> > > > > +switch_conversion::is_exp_index_transform_viable (gswitch *swtch)
> > > > > +{
> > > > > +  tree index = gimple_switch_index (swtch);
> > > > > +  tree index_type = TREE_TYPE (index);
> > > > > +  basic_block swtch_bb = gimple_bb (swtch);
> > > > > +  unsigned num_labels = gimple_switch_num_labels (swtch);
> > > > > +
> > > > > +  /* Check that we can efficiently compute logarithm of 2^k (using FFS) and
> > > > > +     test that a given number is 2^k for some k (using POPCOUNT).  */
> > > > > +  optimization_type opt_type = bb_optimization_type (swtch_bb);
> > > > > +  if (!direct_internal_fn_supported_p (IFN_FFS, index_type, opt_type)
> > > > > +    || !direct_internal_fn_supported_p (IFN_POPCOUNT, index_type, opt_type))
> > > > > +    return false;
> > > > > +
> > > > 
> > > > See above, I think this can be improved.  Would be nice to split out
> > > > a can_pow2p (type) and can_log2 (type) and a corresponding
> > > > gen_pow2p (op) and gen_log2 (op) function so this could be re-used
> > > > and alternate variants added when required.
> > > > 
> > > 
> > > Just to check that I understand this correctly:  You'd like me to create
> > > functions can_pow2p, can_log2.  Those functions will check that there are
> > > optabs for the target machine which allow us to efficiently test
> > > power-of-2-ness of a number and which allow us to efficiently compute the
> > > base-2 log of a power-of-2 number.  You'd also like me to create functions
> > > gen_pow2p and gen_log2 which generate this code.  For now these functions will
> > > just use POPCOUNT and FFS but they can be later extended to also consider
> > > different instructions.  Is that right?
> > 
> > Right.
> > 
> > > Into which file should I put these functions?
> > 
> > Just in this file for now.
> >  
> > > Is can_pow2p and gen_pow2p necessary?  As you noted one can always use
> > > (x & -x) == x so testing pow2p can always be done efficiently.
> > 
> > If you add this fallback then can_pow2p / gen_pow2p wouldn't be
> > necessary indeed.
> 
> Hi Richard,
> 
> I put some work into splitting out the can_ and gen_ functions as you
> suggested.  I'm still a bit unsure what your vision of these is so before I
> submit all the changes I made to the patch as version 2 I would like to share
> how I implemented the functions (see bellow).  Is this how you imagined the
> functions?  Would you change something or do the they look ok?
> 
> I wasn't sure how generic to make the functions.  The more generic the more
> convoluted the implementation becomes.  For example: I could make them more
> generic by also including a gsi_iterator_update parameter or I could make them
> less generic but more straightforward by removing the BEFORE parameter.

I think they are fine as-is.  Other APIs like this chose to emit stmts
to a gimple_seq which the caller can then insert in the way of his
choosing.

Ideally we'd abstract this better in the infrastructure, like having
a insert-iterator that embeds both 'before' and update with the
insertion point.  I guess the standard library would have a
insert_before_forward_iterator and insert_after_forward_iterator
and insert_before_backward_iterator, etc. ...  Note that as you
emit a sequence of stmts their order is of course fixed, it's only
the sequence insertion that the caller should influence.

I think that GSI_CONTINUE_LINKING is what works for both before
and after insertion, I'm not sure your GSI_SAME_STMT use for before
is correct?

Richard.

> Cheers,
> Filip Kastl
> 
> 
> /* Does the target have optabs needed to efficiently compute exact base two
>    logarithm of a value with type TYPE?
> 
>    See gen_log2.  */
> 
> static bool
> can_log2 (tree type, optimization_type opt_type)
> {
>   /* Check if target supports FFS.  */
>   return direct_internal_fn_supported_p (IFN_FFS, type, opt_type);
> }
> 
> /* Assume that OP is a power of two.  Build a sequence of gimple statements
>    efficiently computing the base two logarithm of OP using special optabs.
>    Insert statements before GSI if BEFORE is true or after GSI otherwise.
>    Return the result value as an ssa name tree.
> 
>    Leave GSI at the same statement (GSI_SAME_STMT behavior).
> 
>    Should only be used if target supports the needed optabs.  See can_log2.  */
> 
> static tree
> gen_log2 (gimple_stmt_iterator *gsi, bool before, location_t loc, tree op)
> {
>   /* Use .FFS (op) - 1.  */
>   gimple *s = gsi->ptr;
>   tree type = TREE_TYPE (op);
>   gsi_iterator_update update = before ? GSI_SAME_STMT : GSI_NEW_STMT;
>   tree tmp1 = gimple_build (gsi, before, update, loc, IFN_FFS, type, op);
>   tree tmp2 = gimple_build (gsi, before, update, loc, MINUS_EXPR, type, tmp1,
>                             build_one_cst (type));
>   gsi->ptr = s;
>   return tmp2;
> }
> 
> /* Build a sequence of gimple statements checking that OP is a power of 2.  Use
>    special optabs if targets supports them.  Insert statements before GSI if
>    BEFORE is true or after GSI otherwise.  Return the result value as a
>    boolean_type_node ssa name tree.
> 
>    Leave GSI at the same statement (GSI_SAME_STMT behavior).  */
> 
> static tree
> gen_pow2p (gimple_stmt_iterator *gsi, bool before, location_t loc, tree op)
> {
>   tree result;
> 
>   /* Use either .POPCOUNT (op) == 1 or op & -op == op.  */
>   tree type = TREE_TYPE (op);
>   gimple *s = gsi->ptr;
>   gsi_iterator_update update = before ? GSI_SAME_STMT : GSI_NEW_STMT;
>   optimization_type opt_type = bb_optimization_type (gsi->bb);
>   if (direct_internal_fn_supported_p (IFN_POPCOUNT, type, opt_type))
>     {
>       tree tmp = gimple_build (gsi, before, update, loc, IFN_POPCOUNT, type,
>                                op);
>       result = gimple_build (gsi, before, update, loc, EQ_EXPR,
>                              boolean_type_node, tmp, build_one_cst (type));
>     }
>   else
>     {
>       tree tmp1 = gimple_build (gsi, before, update, loc, NEGATE_EXPR, type,
>                                 op);
>       tree tmp2 = gimple_build (gsi, before, update, loc, BIT_AND_EXPR, type,
>                                 op, tmp1);
>       result = gimple_build (gsi, before, update, loc, EQ_EXPR,
>                              boolean_type_node, tmp2, op);
>     }
>   gsi->ptr = s;
> 
>   return result;
> }
>
Filip Kastl July 16, 2024, 2:17 p.m. UTC | #7
On Wed 2024-07-10 11:34:44, Richard Biener wrote:
> On Mon, 8 Jul 2024, Filip Kastl wrote:
> 
> > Hi,
> > 
> > I'm replying to Richard and keeping Andrew in cc since your suggestions
> > overlap.
> > 
> > 
> > On Tue 2024-06-11 14:48:06, Richard Biener wrote:
> > > On Thu, 30 May 2024, Filip Kastl wrote:
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -fdump-tree-switchconv -march=znver3" } */
> > > 
> > > I think it's better to enable -mpopcnt and -mbmi (or what remains
> > > as minimal requirement).
> > 
> > Will do.  Currently the testcases are in the i386 directory.  After I exchange
> > the -march for -mpopcnt -mbmi can I put these testcases into gcc.dg/tree-ssa?
> > Will the -mpopcnt -mbmi options work with all target architectures?
> 
> No, those are i386 specific flags.  At least for popcount there's
> dejagnu effective targets popcount, popcountl and popcountll so you
> could do
> 
> /* { dg-additional-options "-mpopcnt" { target { x86_64-*-* i?86-*-* } } } 
> */
> 
> and guard the tree dump scan with { target popcount } to cover other
> archs that have popcount (without adding extra flags).
> 

How does this take into account the FFS instruction?  If -mbmi is i386 specific
then I can't just put it into dg-options, right?  And if I wanted to handle it
similarly to how you suggest handling POPCOUNT, there would have to be
something like { target bmi }.  Is there something like that?

Note that I commited to adding x & -x == x as a fallback to POPCOUNT so now I
do not require -mpopcount.  I now just have to ensure that the testcase only
runs when the target supports FFS (or runs always but scans output only when
target supports FFS).

Cheers,
Filip Kastl
Richard Biener July 17, 2024, 11:18 a.m. UTC | #8
On Tue, 16 Jul 2024, Filip Kastl wrote:

> On Wed 2024-07-10 11:34:44, Richard Biener wrote:
> > On Mon, 8 Jul 2024, Filip Kastl wrote:
> > 
> > > Hi,
> > > 
> > > I'm replying to Richard and keeping Andrew in cc since your suggestions
> > > overlap.
> > > 
> > > 
> > > On Tue 2024-06-11 14:48:06, Richard Biener wrote:
> > > > On Thu, 30 May 2024, Filip Kastl wrote:
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-O2 -fdump-tree-switchconv -march=znver3" } */
> > > > 
> > > > I think it's better to enable -mpopcnt and -mbmi (or what remains
> > > > as minimal requirement).
> > > 
> > > Will do.  Currently the testcases are in the i386 directory.  After I exchange
> > > the -march for -mpopcnt -mbmi can I put these testcases into gcc.dg/tree-ssa?
> > > Will the -mpopcnt -mbmi options work with all target architectures?
> > 
> > No, those are i386 specific flags.  At least for popcount there's
> > dejagnu effective targets popcount, popcountl and popcountll so you
> > could do
> > 
> > /* { dg-additional-options "-mpopcnt" { target { x86_64-*-* i?86-*-* } } } 
> > */
> > 
> > and guard the tree dump scan with { target popcount } to cover other
> > archs that have popcount (without adding extra flags).
> > 
> 
> How does this take into account the FFS instruction?  If -mbmi is i386 specific
> then I can't just put it into dg-options, right?  And if I wanted to handle it
> similarly to how you suggest handling POPCOUNT, there would have to be
> something like { target bmi }.  Is there something like that?

I don't think so.  You can of course add architecture specific tests.

Richard.

> Note that I commited to adding x & -x == x as a fallback to POPCOUNT so now I
> do not require -mpopcount.  I now just have to ensure that the testcase only
> runs when the target supports FFS (or runs always but scans output only when
> target supports FFS).



> Cheers,
> Filip Kastl
>
diff mbox series

Patch

diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
new file mode 100644
index 00000000000..d51dd110623
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
@@ -0,0 +1,34 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-switchconv -march=znver3" } */
+
+/* Checks that exponential index transform enables switch conversion to convert
+   this switch into an array lookup.  Also checks that the "index variable is a
+   power of two" check has been generated.  -march=znver3 because that
+   microarchitecture supports POPCOUNT and FFS instructions and exponential
+   index transform requires that.  */
+
+int foo(unsigned bar)
+{
+    switch (bar)
+    {
+        case (1 << 0):
+            return 1;
+        case (1 << 1):
+            return 2;
+        case (1 << 2):
+            return 3;
+        case (1 << 3):
+            return 4;
+        case (1 << 4):
+            return 8;
+        case (1 << 5):
+            return 13;
+        case (1 << 6):
+            return 21;
+        default:
+            return 0;
+    }
+}
+
+/* { dg-final { scan-tree-dump "CSWTCH" "switchconv" } } */
+/* { dg-final { scan-tree-dump "POPCOUNT" "switchconv" } } */
diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-2.c b/gcc/testsuite/gcc.target/i386/switch-exp-transform-2.c
new file mode 100644
index 00000000000..9f2b536aa74
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-2.c
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-switchconv -march=znver3" } */
+
+/* Checks that when exponential index transform is viable but switch conversion
+   decides that the switch cannot be converted, the exponential index transform
+   is not done.  -march=znver3 because that microarchitecture supports POPCOUNT
+   and FFS instructions and exponential index transform requires that.  */
+
+int a;
+
+int foo(unsigned bar)
+{
+    switch (bar)
+    {
+        case (1 << 0):
+            return 0;
+        case (1 << 1):
+            return 1;
+        case (1 << 2):
+            return 2;
+        case (1 << 3):
+            return 3;
+        case (1 << 4):
+            return 4;
+        case (1 << 5):
+            a = 3;
+            return 5;
+        case (1 << 6):
+            return 6;
+        default:
+            return 0;
+    }
+}
+
+/* { dg-final { scan-tree-dump "Exponential index transform viable" "switchconv" } } */
+/* { dg-final { scan-tree-dump-not "Applying exponential index transform" "switchconv" } } */
diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c b/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
new file mode 100644
index 00000000000..e9665d4a38d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
@@ -0,0 +1,151 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-switchconv -march=znver3" } */
+
+/* Checks that the exponential index transformation is done for all these types
+   of the index variable:
+   - (unsigned) int
+   - (unsigned) long
+   - (unsigned) long long
+
+   -march=znver3 because that microarchitecture supports POPCOUNT
+   and FFS instructions and exponential index transform requires that.  */
+
+int unopt_int(int bit_position)
+{
+    switch (bit_position)
+    {
+        case (1 << 0):
+            return 0;
+        case (1 << 1):
+            return 1;
+        case (1 << 2):
+            return 2;
+        case (1 << 3):
+            return 3;
+        case (1 << 4):
+            return 4;
+        case (1 << 5):
+            return 5;
+        case (1 << 6):
+            return 6;
+        default:
+            return 0;
+    }
+}
+
+int unopt_unsigned_int(unsigned int bit_position)
+{
+    switch (bit_position)
+    {
+        case (1 << 0):
+            return 0;
+        case (1 << 1):
+            return 1;
+        case (1 << 2):
+            return 2;
+        case (1 << 3):
+            return 3;
+        case (1 << 4):
+            return 4;
+        case (1 << 5):
+            return 5;
+        case (1 << 6):
+            return 6;
+        default:
+            return 0;
+    }
+}
+
+int unopt_long(long bit_position)
+{
+    switch (bit_position)
+    {
+        case (1 << 0):
+            return 0;
+        case (1 << 1):
+            return 1;
+        case (1 << 2):
+            return 2;
+        case (1 << 3):
+            return 3;
+        case (1 << 4):
+            return 4;
+        case (1 << 5):
+            return 5;
+        case (1 << 6):
+            return 6;
+        default:
+            return 0;
+    }
+}
+
+int unopt_unsigned_long(unsigned long bit_position)
+{
+    switch (bit_position)
+    {
+        case (1 << 0):
+            return 0;
+        case (1 << 1):
+            return 1;
+        case (1 << 2):
+            return 2;
+        case (1 << 3):
+            return 3;
+        case (1 << 4):
+            return 4;
+        case (1 << 5):
+            return 5;
+        case (1 << 6):
+            return 6;
+        default:
+            return 0;
+    }
+}
+
+int unopt_long_long(long long bit_position)
+{
+    switch (bit_position)
+    {
+        case (1 << 0):
+            return 0;
+        case (1 << 1):
+            return 1;
+        case (1 << 2):
+            return 2;
+        case (1 << 3):
+            return 3;
+        case (1 << 4):
+            return 4;
+        case (1 << 5):
+            return 5;
+        case (1 << 6):
+            return 6;
+        default:
+            return 0;
+    }
+}
+
+int unopt_unsigned_long_long(unsigned long long bit_position)
+{
+    switch (bit_position)
+    {
+        case (1 << 0):
+            return 0;
+        case (1 << 1):
+            return 1;
+        case (1 << 2):
+            return 2;
+        case (1 << 3):
+            return 3;
+        case (1 << 4):
+            return 4;
+        case (1 << 5):
+            return 5;
+        case (1 << 6):
+            return 6;
+        default:
+            return 0;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "Applying exponential index transform" 6 "switchconv" } } */
diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index 3a5b84c09e2..975888c0969 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -52,6 +52,8 @@  Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
 #include "omp-general.h"
 #include "gimple-range.h"
 #include "tree-cfgcleanup.h"
+#include "hwint.h"
+#include "internal-fn.h"
 
 /* ??? For lang_hooks.types.type_for_mode, but is there a word_mode
    type in the GIMPLE type system that is language-independent?  */
@@ -66,7 +68,8 @@  using namespace tree_switch_conversion;
 switch_conversion::switch_conversion (): m_final_bb (NULL),
   m_constructors (NULL), m_default_values (NULL),
   m_arr_ref_first (NULL), m_arr_ref_last (NULL),
-  m_reason (NULL), m_default_case_nonstandard (false), m_cfg_altered (false)
+  m_reason (NULL), m_default_case_nonstandard (false), m_cfg_altered (false),
+  m_exp_index_transform_applied (false)
 {
 }
 
@@ -202,6 +205,267 @@  switch_conversion::collect (gswitch *swtch)
     }
 }
 
+/* Check that the "exponential index transform" can be applied to this switch.
+
+   See comment of the exp_index_transform function for details about this
+   transformation.
+
+   We want:
+   - This form of the switch is more efficient
+   - Cases are powers of 2
+
+   Expects that SWTCH has at least one case.  */
+
+bool
+switch_conversion::is_exp_index_transform_viable (gswitch *swtch)
+{
+  tree index = gimple_switch_index (swtch);
+  tree index_type = TREE_TYPE (index);
+  basic_block swtch_bb = gimple_bb (swtch);
+  unsigned num_labels = gimple_switch_num_labels (swtch);
+
+  /* Check that we can efficiently compute logarithm of 2^k (using FFS) and
+     test that a given number is 2^k for some k (using POPCOUNT).  */
+  optimization_type opt_type = bb_optimization_type (swtch_bb);
+  if (!direct_internal_fn_supported_p (IFN_FFS, index_type, opt_type)
+    || !direct_internal_fn_supported_p (IFN_POPCOUNT, index_type, opt_type))
+    return false;
+
+  /* Check that each case label corresponds only to one value
+     (no case 1..3).  */
+  unsigned i;
+  for (i = 1; i < num_labels; i++)
+    {
+      tree label = gimple_switch_label (swtch, i);
+      if (CASE_HIGH (label))
+	return false;
+    }
+
+  /* Check that each label is nonnegative.  */
+  for (i = 1; i < num_labels; i++)
+    {
+      tree label = gimple_switch_label (swtch, i);
+      if (!tree_fits_uhwi_p (CASE_LOW (label)))
+	return false;
+    }
+
+  /* Check that each case is a power of 2.  */
+  for (i = 1; i < num_labels; i++)
+    {
+      tree label = gimple_switch_label (swtch, i);
+      unsigned HOST_WIDE_INT label_hwi = tree_to_uhwi (CASE_LOW (label));
+      if (!pow2p_hwi (label_hwi))
+	return false;
+    }
+
+  if (dump_file)
+    fprintf (dump_file, "Exponential index transform viable\n");
+
+  return true;
+}
+
+/* Perform the "exponential index transform".
+
+   Assume that cases of SWTCH are powers of 2.  The transformation replaces the
+   cases by their exponents (2^k -> k).  It also inserts a statement that
+   computes the exponent of the original index variable (basically taking the
+   logarithm) and then sets the result as the new index variable.
+
+   The transformation also inserts a conditional statement checking that the
+   incoming original index variable is a power of 2 with the false edge leading
+   to the default case.
+
+   The exponential index transform shrinks the range of case numbers which
+   helps switch conversion convert switches it otherwise could not.
+
+   Consider for example:
+
+   switch (i)
+     {
+       case (1 << 0): return 0;
+       case (1 << 1): return 1;
+       case (1 << 2): return 2;
+       ...
+       case (1 << 30): return 30;
+       default: return 31;
+     }
+
+   First, exponential index transform gets applied.  Since each case becomes
+   case x: return x;, the rest of switch conversion is then able to get rid of
+   the switch statement.
+
+   if (i is power of 2)
+     return log2 (i);
+   else
+     return 31;
+
+   */
+
+void
+switch_conversion::exp_index_transform (gswitch *swtch)
+{
+  if (dump_file)
+    fprintf (dump_file, "Applying exponential index transform\n");
+
+  tree index = gimple_switch_index (swtch);
+  tree index_type = TREE_TYPE (index);
+  basic_block swtch_bb = gimple_bb (swtch);
+  unsigned num_labels = gimple_switch_num_labels (swtch);
+  tree one = build_one_cst (index_type);
+
+  /* Insert a cond stmt that checks if the index variable is a power of 2.  */
+  gimple_stmt_iterator gsi = gsi_for_stmt (swtch);
+  gsi_prev (&gsi);
+  gimple *foo = gsi_stmt (gsi);
+  edge new_edge1 = split_block (swtch_bb, foo);
+
+  swtch_bb = new_edge1->dest;
+  basic_block cond_bb = new_edge1->src;
+  new_edge1->flags |= EDGE_TRUE_VALUE;
+  new_edge1->flags &= ~EDGE_FALLTHRU;
+  new_edge1->probability = profile_probability::even ();
+
+  basic_block default_bb = gimple_switch_default_bb (cfun, swtch);
+  edge new_edge2 = make_edge (cond_bb, default_bb, EDGE_FALSE_VALUE);
+  new_edge2->probability = profile_probability::even ();
+
+  gsi = gsi_last_bb (cond_bb);
+
+  tree tmp1 = make_ssa_name (index_type);
+  gcall *stmt_popcount = gimple_build_call_internal (IFN_POPCOUNT, 2, index,
+						     one);
+  gimple_call_set_lhs (stmt_popcount, tmp1);
+  gsi_insert_after (&gsi, stmt_popcount, GSI_NEW_STMT);
+
+  gcond *stmt_cond = gimple_build_cond (EQ_EXPR, tmp1, one, NULL, NULL);
+  gsi_insert_after (&gsi, stmt_cond, GSI_NEW_STMT);
+
+  /* We just added an edge going to default bb so fix PHI nodes in that bb:
+     For each PHI add new PHI arg.  It will be the same arg as when comming to
+     the default bb from the switch bb.  */
+  edge default_edge = find_edge (swtch_bb, default_bb);
+  for (gphi_iterator gsi = gsi_start_phis (default_bb);
+       !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gphi *phi = gsi.phi ();
+      tree arg = PHI_ARG_DEF_FROM_EDGE (phi, default_edge);
+      location_t loc = gimple_phi_arg_location_from_edge (phi, default_edge);
+      add_phi_arg (phi, arg, new_edge2, loc);
+    }
+
+  /* Insert a statement that takes the logarithm of the index variable.  */
+  tree tmp2 = make_ssa_name (index_type);
+  gsi = gsi_start_bb (swtch_bb);
+  gcall *stmt_ffs = gimple_build_call_internal (IFN_FFS, 1, index);
+  gimple_call_set_lhs (stmt_ffs, tmp2);
+  gsi_insert_before (&gsi, stmt_ffs, GSI_SAME_STMT);
+
+  tree tmp3 = make_ssa_name (index_type);
+  gassign *stmt_minus_one = gimple_build_assign (tmp3, MINUS_EXPR, tmp2, one);
+  gsi_insert_before (&gsi, stmt_minus_one, GSI_SAME_STMT);
+
+  /* Use the result of the logarithm as the new index variable.  */
+  gimple_switch_set_index (swtch, tmp3);
+  update_stmt (swtch);
+
+  /* Replace each case number with its logarithm.  */
+  unsigned i;
+  for (i = 1; i < num_labels; i++)
+    {
+      tree label = gimple_switch_label (swtch, i);
+      unsigned HOST_WIDE_INT label_hwi = tree_to_uhwi (CASE_LOW (label));
+      CASE_LOW (label) = build_int_cst (index_type, floor_log2 (label_hwi));
+    }
+
+  /* Fix the dominator tree, if it is available.  */
+  if (dom_info_available_p (CDI_DOMINATORS))
+    {
+      /* Analysis of how dominators should look after we add the edge E going
+	 from the cond block to the default block.
+
+	 1 For the blocks between the switch block and the final block
+	 (excluding the final block itself):  They had the switch block as
+	 their immediate dominator.  That shouldn't change.
+
+	 2 The final block may now have the switch block or the cond block as
+	 its immediate dominator.  There's no easy way of knowing (consider
+	 two cases where in both m_default_case_nonstandard = true, in one a
+	 path through default intersects the final block and in one all paths
+	 through default avoid the final block but intersect a successor of the
+	 final block).
+
+	 3 Other blocks that had the switch block as their immediate dominator
+	 should now have the cond block as their immediate dominator.
+
+	 4 Immediate dominators of the rest of the blocks shouldn't change.
+
+	 Reasoning for 3 and 4:
+
+	 We'll only consider blocks that do not fall into 1 or 2.
+
+	 Consider a block X whose original imm dom was the switch block.  All
+	 paths to X must also intersect the cond block since it's the only
+	 pred of the switch block.  The final block doesn't dominate X so at
+	 least one path P must lead through the default block.  Let P' be P but
+	 instead of going through the switch block, take E.  The switch block
+	 doesn't dominate X so its imm dom must now be the cond block.
+
+	 Consider a block X whose original imm dom was Y != the switch block.
+	 We only added an edge so all original paths to X are still present.
+	 So X gained no new dominators.  Observe that Y still dominates X.
+	 There would have to be a path that avoids Y otherwise.  But any block
+	 we can avoid now except for the switch block we were able to avoid
+	 before adding E.  */
+
+      redirect_immediate_dominators (CDI_DOMINATORS, swtch_bb, cond_bb);
+
+      /* FIXME: Since exponential transform is run only if we know that the
+	 switch will be converted, we know these blocks will be removed and we
+	 maybe don't have to bother updating their dominators.  */
+      edge e;
+      edge_iterator ei;
+      FOR_EACH_EDGE (e, ei, swtch_bb->succs)
+	{
+	  basic_block bb = e->dest;
+	  if (bb == m_final_bb || bb == default_bb)
+	    continue;
+	  set_immediate_dominator (CDI_DOMINATORS, bb, swtch_bb);
+	}
+
+      vec<basic_block> v;
+      v.create (1);
+      v.quick_push (m_final_bb);
+      iterate_fix_dominators (CDI_DOMINATORS, v, true);
+    }
+
+  /* Update information about the switch statement.  */
+  tree first_label = gimple_switch_label (swtch, 1);
+  tree last_label = gimple_switch_label (swtch, num_labels - 1);
+
+  m_range_min = CASE_LOW (first_label);
+  m_range_max = CASE_LOW (last_label);
+  m_index_expr = gimple_switch_index (swtch);
+  m_switch_bb = swtch_bb;
+
+  unsigned HOST_WIDE_INT range_size_hwi = tree_to_uhwi (m_range_max)
+    - tree_to_uhwi (m_range_min);
+  m_range_size = build_int_cst (index_type, range_size_hwi);
+
+  m_cfg_altered = true;
+
+  m_contiguous_range = true;
+  unsigned HOST_WIDE_INT last_hwi = tree_to_uhwi (CASE_LOW (first_label));
+  for (i = 2; i < num_labels; i++)
+    {
+      tree label = gimple_switch_label (swtch, i);
+      unsigned HOST_WIDE_INT label_hwi = tree_to_uhwi (CASE_LOW (label));
+      m_contiguous_range &= last_hwi + 1 == label_hwi;
+      last_hwi = label_hwi;
+    }
+
+  m_exp_index_transform_applied = true;
+}
+
 /* Checks whether the range given by individual case statements of the switch
    switch statement isn't too big and whether the number of branches actually
    satisfies the size of the new array.  */
@@ -973,8 +1237,9 @@  switch_conversion::gen_inbound_check ()
     bbf->count = e1f->count () + e2f->count ();
 
   /* Tidy blocks that have become unreachable.  */
-  prune_bbs (bbd, m_final_bb,
-	     m_default_case_nonstandard ? m_default_bb : NULL);
+  bool prune_default_bb = !m_default_case_nonstandard
+    && !m_exp_index_transform_applied;
+  prune_bbs (bbd, m_final_bb, prune_default_bb ? NULL : m_default_bb);
 
   /* Fixup the PHI nodes in bbF.  */
   fix_phi_nodes (e1f, e2f, bbf);
@@ -1053,8 +1318,19 @@  switch_conversion::expand (gswitch *swtch)
       return;
     }
 
-  /* Check the case label values are within reasonable range:  */
-  if (!check_range ())
+  /* Sometimes it is possible to use the "exponential index transform" to help
+     switch conversion convert switches which it otherwise could not convert.
+     However, we want to do this transform only when we know that switch
+     conversion will then really be able to convert the switch.  So we first
+     check if the transformation is applicable and then maybe later do the
+     transformation.  */
+  bool exp_transform_viable = is_exp_index_transform_viable (swtch);
+
+  /* Check the case label values are within reasonable range.
+
+     If we will be doing exponential index transform, the range will be always
+     reasonable.  */
+  if (!exp_transform_viable && !check_range ())
     {
       gcc_assert (m_reason);
       return;
@@ -1076,6 +1352,9 @@  switch_conversion::expand (gswitch *swtch)
   /* At this point all checks have passed and we can proceed with the
      transformation.  */
 
+  if (exp_transform_viable)
+    exp_index_transform (swtch);
+
   create_temp_arrays ();
   gather_default_values (m_default_case_nonstandard
 			 ? gimple_switch_label (swtch, 1)
diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h
index 6939eec6018..1a865f85f3a 100644
--- a/gcc/tree-switch-conversion.h
+++ b/gcc/tree-switch-conversion.h
@@ -743,6 +743,19 @@  public:
   /* Collection information about SWTCH statement.  */
   void collect (gswitch *swtch);
 
+  /* Check that the 'exponential index transform' can be applied.
+
+     See the comment at the function definition for more details.  */
+  bool is_exp_index_transform_viable (gswitch *swtch);
+
+  /* Perform the 'exponential index transform'.
+
+     The exponential index transform shrinks the range of case numbers which
+     helps switch conversion convert switches it otherwise could not.
+
+     See the comment at the function definition for more details.  */
+  void exp_index_transform (gswitch *swtch);
+
   /* Checks whether the range given by individual case statements of the switch
      switch statement isn't too big and whether the number of branches actually
      satisfies the size of the new array.  */
@@ -900,6 +913,11 @@  public:
 
   /* True if CFG has been changed.  */
   bool m_cfg_altered;
+
+  /* True if exponential index transform has been applied.  See the comment at
+     the definition of exp_index_transform for details about the
+     transformation.  */
+  bool m_exp_index_transform_applied;
 };
 
 void