
Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)

Message ID 8f805fb1-d4ae-b0e3-ff26-57fd2c1fc1f7@arm.com
State New
Series Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)

Commit Message

Andre Vieira (lists) Aug. 8, 2022, 2:06 p.m. UTC
Hi,

So I've changed the approach from the RFC as suggested, moving the 
bitfield lowering to the if-convert pass.

So to reiterate, ifcvt will lower COMPONENT_REFs with DECL_BIT_FIELD
fields to either BIT_FIELD_REFs if they are reads or BIT_INSERT_EXPRs if
they are writes, using loads and writes of 'representatives' that are
big enough to contain the bitfield value.

In vect_recog I added two patterns to replace these BIT_FIELD_REFs and
BIT_INSERT_EXPRs with shifts and masks as appropriate.
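
To illustrate, the two patterns amount to something like the following
(a plain C sketch of the semantics, not the GIMPLE they emit; BITSIZE is
assumed to be smaller than the width of the representative):

static unsigned
bf_read (unsigned rep, unsigned bitpos, unsigned bitsize)
{
  /* Extract the field: shift the representative down and mask.  */
  return (rep >> bitpos) & ((1u << bitsize) - 1);
}

static unsigned
bf_write (unsigned rep, unsigned val, unsigned bitpos, unsigned bitsize)
{
  /* Insert the field: clear the destination bits, then OR in the
     shifted and masked value.  */
  unsigned mask = ((1u << bitsize) - 1) << bitpos;
  return (rep & ~mask) | ((val << bitpos) & mask);
}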

I'd like to see whether it is possible to remove the 'load' part of a
BIT_INSERT_EXPR when the representative write doesn't change any relevant
bits.  For example:

struct s{
int dont_care;
char a : 3;
};

s.a = <value>;

This should not require a load & write cycle; in fact, it wouldn't even
require any masking.  Though to achieve this we'd need to make sure the
representative doesn't overlap with any other field.  Any suggestions on
how to do this would be great, though I don't think we need to wait for
that, as it's merely a nice-to-have optimization, I guess.
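
Something like the following is what I'd hope the lowering could produce
for the example above (an illustrative C sketch only, assuming the 8-bit
representative of 'a' overlaps no other member, so its padding bits may
be clobbered):

static void
store_a (unsigned char *rep, int value)
{
  /* A plain byte store of the new value: no load of the old
     representative and no masking of the unused bits.  */
  *rep = (unsigned char) value;
}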

I am not sure where I should 'document' this change of behaviour in
ifcvt.  Also, should we change the name of the pass, since it's doing
more than if-conversion now?

Bootstrapped and regression tested this patch on aarch64-none-linux-gnu.

gcc/ChangeLog:
2022-08-08  Andre Vieira  <andre.simoesdiasvieira@arm.com>

         * tree-if-conv.cc (includes): Add expr.h and langhooks.h to list
         of includes.
         (need_to_lower_bitfields): New static bool.
         (need_to_ifcvt): Likewise.
         (version_loop_for_if_conversion): Adapt to work for bitfield
         lowering-only path.
         (bitfield_data_t): New typedef.
         (get_bitfield_data): New function.
         (lower_bitfield): New function.
         (bitfields_to_lower_p): New function.
         (tree_if_conversion): Change to lower-bitfields too.
         * tree-vect-data-refs.cc (vect_find_stmt_data_reference): Modify
         dump message to be more accurate.
         * tree-vect-patterns.cc (includes): Add gimplify-me.h include.
         (vect_recog_bitfield_ref_pattern): New function.
         (vect_recog_bit_insert_pattern): New function.
         (vect_vect_recog_func_ptrs): Add two new patterns.

gcc/testsuite/ChangeLog:
2022-08-08  Andre Vieira  <andre.simoesdiasvieira@arm.com>

         * gcc.dg/vect/vect-bitfield-read-1.c: New test.
         * gcc.dg/vect/vect-bitfield-read-2.c: New test.
         * gcc.dg/vect/vect-bitfield-read-3.c: New test.
         * gcc.dg/vect/vect-bitfield-read-4.c: New test.
         * gcc.dg/vect/vect-bitfield-write-1.c: New test.
         * gcc.dg/vect/vect-bitfield-write-2.c: New test.
         * gcc.dg/vect/vect-bitfield-write-3.c: New test.

Kind regards,
Andre

Comments

Richard Biener Aug. 9, 2022, 2:34 p.m. UTC | #1
On Mon, 8 Aug 2022, Andre Vieira (lists) wrote:

> Hi,
> 
> So I've changed the approach from the RFC as suggested, moving the bitfield
> lowering to the if-convert pass.
> 
> So to reiterate, ifcvt will lower COMPONENT_REF's with DECL_BIT_FIELD field's
> to either BIT_FIELD_REF if they are reads or BIT_INSERT_EXPR if they are
> writes, using loads and writes of 'representatives' that are big enough to
> contain the bitfield value.
> 
> In vect_recog I added two patterns to replace these BIT_FIELD_REF and
> BIT_INSERT_EXPR with shift's and masks as appropriate.
> 
> I'd like to see if it was possible to remove the 'load' part of a
> BIT_INSERT_EXPR if the representative write didn't change any relevant bits. 
> For example:
> 
> struct s{
> int dont_care;
> char a : 3;
> };
> 
> s.a = <value>;
> 
> Should not require a load & write cycle, in fact it wouldn't even require any
> masking either. Though to achieve this we'd need to make sure the
> representative didn't overlap with any other field. Any suggestions on how to
> do this would be great, though I don't think we need to wait for that, as
> that's merely a nice-to-have optimization I guess?

Hmm.  I'm not sure the middle-end can simply ignore padding.  If
some language standard says that would be OK then I think we should
exploit this during lowering when the frontend is still around to
ask - which means sometime during early optimization.

> I am not sure where I should 'document' this change of behavior to ifcvt,
> and/or we should change the name of the pass, since it's doing more than
> if-conversion now?

It's preparation for vectorization anyway since it will emit
.MASK_LOAD/STORE and friends already.  So I don't think anything
needs to change there.


@@ -2998,7 +3013,7 @@ ifcvt_split_critical_edges (class loop *loop, bool 
aggressive_if_conv)
   auto_vec<edge> critical_edges;

   /* Loop is not well formed.  */
-  if (num <= 2 || loop->inner || !single_exit (loop))
+  if (num <= 2 || loop->inner)
     return false;

   body = get_loop_body (loop);

this doesn't appear in the ChangeLog nor is it clear to me why it's
needed?  Likewise

-  /* Save BB->aux around loop_version as that uses the same field.  */
-  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
-  void **saved_preds = XALLOCAVEC (void *, save_length);
-  for (unsigned i = 0; i < save_length; i++)
-    saved_preds[i] = ifc_bbs[i]->aux;
+  void **saved_preds = NULL;
+  if (any_complicated_phi || need_to_predicate)
+    {
+      /* Save BB->aux around loop_version as that uses the same field.  */
+      save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
+      saved_preds = XALLOCAVEC (void *, save_length);
+      for (unsigned i = 0; i < save_length; i++)
+       saved_preds[i] = ifc_bbs[i]->aux;
+    }

is that just premature optimization?

+  /* BITSTART and BITEND describe the region we can safely load from inside
+     the structure.  BITPOS is the bit position of the value inside the
+     representative that we will end up loading OFFSET bytes from the start
+     of the struct.  BEST_MODE is the mode describing the optimal size of
+     the representative chunk we load.  If this is a write we will store
+     the same sized representative back, after we have changed the
+     appropriate bits.  */
+  get_bit_range (&bitstart, &bitend, comp_ref, &bitpos, &offset);

I think you need to give up when get_bit_range sets bitstart = bitend to 
zero
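
Something along these lines, I suppose (a sketch, assuming the usual
poly_uint64 bounds that get_bit_range fills in):

  if (known_eq (bitstart, 0U) && known_eq (bitend, 0U))
    return false;	/* no usable bit range, give up on this access */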

+  if (get_best_mode (bitsize, bitpos.to_constant (), bitstart, bitend,
+                    TYPE_ALIGN (TREE_TYPE (struct_expr)),
+                    INT_MAX, false, &best_mode))

+  tree rep_decl = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
+                             NULL_TREE, rep_type);
+  /* Load from the start of 'offset + bitpos % alignment'.  */
+  uint64_t extra_offset = bitpos.to_constant ();

you shouldn't build a new FIELD_DECL.  Either you use
DECL_BIT_FIELD_REPRESENTATIVE directly or you use a
BIT_FIELD_REF accessing the "representative".
DECL_BIT_FIELD_REPRESENTATIVE exists so it can maintain
a variable field offset, you can also subset that with an
intermediate BIT_FIELD_REF if DECL_BIT_FIELD_REPRESENTATIVE is
too large for your taste.

I'm not sure all the offset calculation you do is correct, but
since you shouldn't invent a new FIELD_DECL it probably needs
to change anyway ...

Note that for optimization it will be important that all
accesses to the bitfield members of the same bitfield use the
same underlying area (CSE and store-forwarding will thank you).

+
+  need_to_lower_bitfields = bitfields_to_lower_p (loop, &bitfields_to_lower);
+  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv)
+      && !need_to_lower_bitfields)
     goto cleanup;

so we lower bitfields even when we cannot split critical edges?
why?

+  need_to_ifcvt
+    = if_convertible_loop_p (loop) && dbg_cnt (if_conversion_tree);
+  if (!need_to_ifcvt && !need_to_lower_bitfields)
     goto cleanup;

likewise - if_convertible_loop_p performs other checks, the only
one we want to elide is the loop->num_nodes <= 2 check since
we want to lower bitfields in single-block loops as well.  That
means we only have to scan for bitfield accesses in the first
block "prematurely".  So I would interwind the need_to_lower_bitfields
into if_convertible_loop_p and if_convertible_loop_p_1 and
put the loop->num_nodes <= 2 after it when !need_to_lower_bitfields.

+         tree op = gimple_get_lhs (stmt);
+         bool write = TREE_CODE (op) == COMPONENT_REF;
+
+         if (!write)
+           op = gimple_assign_rhs1 (stmt);
+
+         if (TREE_CODE (op) != COMPONENT_REF)
+           continue;
+
+         if (DECL_BIT_FIELD (TREE_OPERAND (op, 1)))

note the canonical test for a bitfield access is to check
DECL_BIT_FIELD_TYPE, not DECL_BIT_FIELD.  In particular for

struct { int a : 4; int b : 4; int c : 8; int d : 4; int e : 12; }

'c' will _not_ have DECL_BIT_FIELD set but you want to lower its
access since you otherwise likely will get conflicting accesses
for the other fields (store forwarding).

+static bool
+bitfields_to_lower_p (class loop *loop, auto_vec <bitfield_data_t *, 4> *to_lower)

don't pass auto_vec<> *, just pass vec<>&, auto_vec will properly
decay.

+vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+                                tree *type_out)
+{
+  gassign *nop_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!nop_stmt
+      || gimple_assign_rhs_code (nop_stmt) != NOP_EXPR

CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (nop_stmt))

+  tree bf_ref = gimple_assign_rhs1 (bf_stmt);
+
+  tree load = TREE_OPERAND (bf_ref, 0);
+  tree size = TREE_OPERAND (bf_ref, 1);
+  tree offset = TREE_OPERAND (bf_ref, 2);

use bit_field_{size,offset}

+  /* Bail out if the load is already a vector type.  */
+  if (VECTOR_TYPE_P (TREE_TYPE (load)))
+    return NULL;

I think you want a "positive" check, what kind of type you
handle for the load.  An (unsigned?) INTEGRAL_TYPE_P one I guess.

+  tree ret_type = TREE_TYPE (gimple_get_lhs (nop_stmt));
+

gimple_assign_lhs

+  if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
+    {
+      pattern_stmt
+       = gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL),
+                              NOP_EXPR, lhs);
+      lhs = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }

hm - so you have for example

 int _1 = MEM;
 int:3 _2 = BIT_FIELD_REF <_1, ...>
 type _3 = (type) _2;

and that _3 = (type) _2 is because of integer promotion and you
perform all the shifting in that type.  I suppose you should
verify that the cast is indeed promoting, not narrowing, since
otherwise you'll produce wrong code?  That said, shouldn't you
perform the shift / mask in the type of _1 instead?  (the hope
is, of course, that typeof (_1) == type in most cases)
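
To make the hazard concrete (a C sketch, assuming the conversion narrows
from the 32-bit loaded type to char and the 4-bit field sits at bit
offset 8):

static unsigned char
extract_narrow_first (unsigned int rep)
{
  /* Wrong: narrowing before the shift throws the field away.  */
  return (unsigned char) ((unsigned char) rep >> 8);
}

static unsigned char
extract_shift_first (unsigned int rep)
{
  /* Right: shift and mask in the type of the load, then narrow.  */
  return (unsigned char) ((rep >> 8) & 0xf);
}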

Similar comments apply to vect_recog_bit_insert_pattern.

Overall it looks reasonable but it does still need some work.

Thanks,
Richard.



Andre Vieira (lists) Aug. 16, 2022, 10:24 a.m. UTC | #2
Hi,

New version of the patch attached, but haven't recreated the ChangeLog 
yet, just waiting to see if this is what you had in mind. See also some 
replies to your comments in-line below:

On 09/08/2022 15:34, Richard Biener wrote:

> @@ -2998,7 +3013,7 @@ ifcvt_split_critical_edges (class loop *loop, bool
> aggressive_if_conv)
>     auto_vec<edge> critical_edges;
>
>     /* Loop is not well formed.  */
> -  if (num <= 2 || loop->inner || !single_exit (loop))
> +  if (num <= 2 || loop->inner)
>       return false;
>
>     body = get_loop_body (loop);
>
> this doesn't appear in the ChangeLog nor is it clear to me why it's
> needed?  Likewise
So both these and...
>
> -  /* Save BB->aux around loop_version as that uses the same field.  */
> -  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
> -  void **saved_preds = XALLOCAVEC (void *, save_length);
> -  for (unsigned i = 0; i < save_length; i++)
> -    saved_preds[i] = ifc_bbs[i]->aux;
> +  void **saved_preds = NULL;
> +  if (any_complicated_phi || need_to_predicate)
> +    {
> +      /* Save BB->aux around loop_version as that uses the same field.
> */
> +      save_length = loop->inner ? loop->inner->num_nodes :
> loop->num_nodes;
> +      saved_preds = XALLOCAVEC (void *, save_length);
> +      for (unsigned i = 0; i < save_length; i++)
> +       saved_preds[i] = ifc_bbs[i]->aux;
> +    }
>
> is that just premature optimization?

.. these changes are to make sure we can still use the loop versioning 
code even for cases where there are bitfields to lower but no ifcvts 
(i.e. num of BBs <= 2).
I wasn't sure about the loop->inner condition, but with the small examples
I tried it seemed to work, that is, loop versioning seems to be able to
handle nested loops.

The single_exit condition is still required for both, because the code 
to create the loop versions depends on it. It does look like I missed 
this in the ChangeLog...

> +  /* BITSTART and BITEND describe the region we can safely load from
> inside the
> +     structure.  BITPOS is the bit position of the value inside the
> +     representative that we will end up loading OFFSET bytes from the
> start
> +     of the struct.  BEST_MODE is the mode describing the optimal size of
> the
> +     representative chunk we load.  If this is a write we will store the
> same
> +     sized representative back, after we have changed the appropriate
> bits.  */
> +  get_bit_range (&bitstart, &bitend, comp_ref, &bitpos, &offset);
>
> I think you need to give up when get_bit_range sets bitstart = bitend to
> zero
>
> +  if (get_best_mode (bitsize, bitpos.to_constant (), bitstart, bitend,
> +                    TYPE_ALIGN (TREE_TYPE (struct_expr)),
> +                    INT_MAX, false, &best_mode))
>
> +  tree rep_decl = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
> +                             NULL_TREE, rep_type);
> +  /* Load from the start of 'offset + bitpos % alignment'.  */
> +  uint64_t extra_offset = bitpos.to_constant ();
>
> you shouldn't build a new FIELD_DECL.  Either you use
> DECL_BIT_FIELD_REPRESENTATIVE directly or you use a
> BIT_FIELD_REF accessing the "representative".
> DECL_BIT_FIELD_REPRESENTATIVE exists so it can maintain
> a variable field offset, you can also subset that with an
> intermediate BIT_FIELD_REF if DECL_BIT_FIELD_REPRESENTATIVE is
> too large for your taste.
>
> I'm not sure all the offset calculation you do is correct, but
> since you shouldn't invent a new FIELD_DECL it probably needs
> to change anyway ...
I can use the DECL_BIT_FIELD_REPRESENTATIVE, but I'll still need some
offset calculation/extraction. It's easier to explain with an example:

In vect-bitfield-read-3.c the struct:
typedef struct {
     int  c;
     int  b;
     bool a : 1;
} struct_t;

and field access 'vect_false[i].a' or 'vect_true[i].a' will lead to a
DECL_BIT_FIELD_REPRESENTATIVE with a TYPE_SIZE of 8 (and TYPE_PRECISION is
also 8, as expected). However, the DECL_FIELD_OFFSET of either the
original field decl, the actual bitfield member, or the 
DECL_BIT_FIELD_REPRESENTATIVE is 0 and the DECL_FIELD_BIT_OFFSET is 64. 
These will lead to the correct load:
_1 = vect_false[i].D;

D here being the representative, i.e. an 8-bit load from vect_false[i] +
64 bits. So all good there. However, when we construct the BIT_FIELD_REF we
can't simply use DECL_FIELD_BIT_OFFSET (field_decl) as the 
BIT_FIELD_REF's bitpos.  During `verify_gimple` it checks that bitpos + 
bitsize < TYPE_SIZE (TREE_TYPE (load)) where BIT_FIELD_REF (load, 
bitsize, bitpos).

So instead I change bitpos such that:
align_of_representative = TYPE_ALIGN (TREE_TYPE (representative));
bitpos -= bitpos.to_constant () / align_of_representative * align_of_representative;

I've now rewritten this to:
poly_int64 q, r;
if (can_div_trunc_p (bitpos, align_of_representative, &q, &r))
  bitpos = r;

It makes it slightly clearer, also because I no longer need the changes 
to the original tree offset as I'm just using D for the load.
> Note that for optimization it will be important that all
> accesses to the bitfield members of the same bitfield use the
> same underlying area (CSE and store-forwarding will thank you).
>
> +
> +  need_to_lower_bitfields = bitfields_to_lower_p (loop,
> &bitfields_to_lower);
> +  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv)
> +      && !need_to_lower_bitfields)
>       goto cleanup;
>
> so we lower bitfields even when we cannot split critical edges?
> why?
>
> +  need_to_ifcvt
> +    = if_convertible_loop_p (loop) && dbg_cnt (if_conversion_tree);
> +  if (!need_to_ifcvt && !need_to_lower_bitfields)
>       goto cleanup;
>
> likewise - if_convertible_loop_p performs other checks, the only
> one we want to elide is the loop->num_nodes <= 2 check since
> we want to lower bitfields in single-block loops as well.  That
> means we only have to scan for bitfield accesses in the first
> block "prematurely".  So I would interwind the need_to_lower_bitfields
> into if_convertible_loop_p and if_convertible_loop_p_1 and
> put the loop->num_nodes <= 2 after it when !need_to_lower_bitfields.
I'm not sure I understood this. But I'd rather keep the 'need_to_ifcvt' 
(new) and 'need_to_lower_bitfields' separate. One thing I did change is 
that we no longer check for bitfields to lower if there are if-stmts 
that we can't lower, since we will not be vectorizing this loop anyway 
so no point in wasting time lowering bitfields. At the same time though, 
I'd like to be able to lower bitfields if there are no ifcvts.
> +  if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
> +    {
> +      pattern_stmt
> +       = gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL),
> +                              NOP_EXPR, lhs);
> +      lhs = gimple_get_lhs (pattern_stmt);
> +      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
> +    }
>
> hm - so you have for example
>
>   int _1 = MEM;
>   int:3 _2 = BIT_FIELD_REF <_1, ...>
>   type _3 = (type) _2;
>
> and that _3 = (type) _2 is because of integer promotion and you
> perform all the shifting in that type.  I suppose you should
> verify that the cast is indeed promoting, not narrowing, since
> otherwise you'll produce wrong code?  That said, shouldn't you
> perform the shift / mask in the type of _1 instead?  (the hope
> is, of course, that typeof (_1) == type in most cases)
>
> Similar comments apply to vect_recog_bit_insert_pattern.
Good shout, I hadn't realized that yet because the testcases didn't
expose the problem, but when using the REPRESENTATIVE macro they do test
it now. I don't think the bit_insert is a problem though. In bit_insert,
'value' always has the relevant bits starting at its LSB. So regardless
of whether the load (and store) type is larger or smaller than the type
of 'value', performing the shifts and masks in this type should be OK,
as you'll only be 'cutting off' the MSBs, which would be the ones that
would get truncated anyway? Or am I missing something here?
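
In C terms, what I mean is something like this (just an illustration: a
4-bit field at bit 0 of an 8-bit representative, written from a wider
'value'):

static unsigned char
insert4 (unsigned char rep, unsigned int value)
{
  unsigned char v = (unsigned char) value;  /* narrowing drops only MSBs */
  return (unsigned char) ((rep & ~0x0fu) | (v & 0x0fu));
}
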
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..01cf34fb44484ca926ca5de99eef76dd99b69e92
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define ELT0 {0}
+#define ELT1 {1}
+#define ELT2 {2}
+#define ELT3 {3}
+#define N 32
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].i;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..1a4a1579c1478b9407ad21b19e8fbdca9f674b42
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 0}
+#define ELT1 {0x7FFFFFFFUL, 1}
+#define ELT2 {0x7FFFFFFFUL, 2}
+#define ELT3 {0x7FFFFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..216611a29fd8bbfbafdbdb79d790e520f44ba672
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+#include <stdbool.h>
+
+extern void abort(void);
+
+typedef struct {
+    int  c;
+    int  b;
+    bool a : 1;
+} struct_t;
+
+#define N 16
+#define ELT_F { 0xFFFFFFFF, 0xFFFFFFFF, 0 }
+#define ELT_T { 0xFFFFFFFF, 0xFFFFFFFF, 1 }
+
+struct_t vect_false[N] = { ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+struct_t vect_true[N]  = { ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+int main (void)
+{
+  unsigned ret = 0;
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_false[i].a;
+  }
+  if (ret)
+    abort ();
+
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_true[i].a;
+  }
+  if (!ret)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..5bc9c412e9616aefcbf49a4518f1603380a54b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 3, 0}
+#define ELT1 {0x7FFFFFFFUL, 3, 1}
+#define ELT2 {0x7FFFFFFFUL, 3, 2}
+#define ELT3 {0x7FFFFFFFUL, 3, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..19683d277b1ade1034496136f1d03bb2b446900f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].i = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].i != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..d550dd35ab75eb67f6e53f89fbf55b7315e50bc9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..3303d2610ff972d986be172962c129634ee64254
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 1c8e1a45234b8c3565edaacd55abbee23d8ea240..f450dbb1922586b3d405281f605fb0d8a7fc8fc2 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -91,6 +91,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "expmed.h"
+#include "expr.h"
 #include "optabs-query.h"
 #include "gimple-pretty-print.h"
 #include "alias.h"
@@ -123,6 +124,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
 
+/* For lang_hooks.types.type_for_mode.  */
+#include "langhooks.h"
+
 /* Only handle PHIs with no more arguments unless we are asked to by
    simd pragma.  */
 #define MAX_PHI_ARG_NUM \
@@ -145,6 +149,12 @@ static bool need_to_rewrite_undefined;
    before phi_convertible_by_degenerating_args.  */
 static bool any_complicated_phi;
 
+/* True if we have bitfield accesses we can lower.  */
+static bool need_to_lower_bitfields;
+
+/* True if there is any ifcvting to be done.  */
+static bool need_to_ifcvt;
+
 /* Hash for struct innermost_loop_behavior.  It depends on the user to
    free the memory.  */
 
@@ -2898,18 +2908,22 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
   class loop *new_loop;
   gimple *g;
   gimple_stmt_iterator gsi;
-  unsigned int save_length;
+  unsigned int save_length = 0;
 
   g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2,
 				  build_int_cst (integer_type_node, loop->num),
 				  integer_zero_node);
   gimple_call_set_lhs (g, cond);
 
-  /* Save BB->aux around loop_version as that uses the same field.  */
-  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
-  void **saved_preds = XALLOCAVEC (void *, save_length);
-  for (unsigned i = 0; i < save_length; i++)
-    saved_preds[i] = ifc_bbs[i]->aux;
+  void **saved_preds = NULL;
+  if (any_complicated_phi || need_to_predicate)
+    {
+      /* Save BB->aux around loop_version as that uses the same field.  */
+      save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
+      saved_preds = XALLOCAVEC (void *, save_length);
+      for (unsigned i = 0; i < save_length; i++)
+	saved_preds[i] = ifc_bbs[i]->aux;
+    }
 
   initialize_original_copy_tables ();
   /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED
@@ -2921,8 +2935,9 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
 			   profile_probability::always (), true);
   free_original_copy_tables ();
 
-  for (unsigned i = 0; i < save_length; i++)
-    ifc_bbs[i]->aux = saved_preds[i];
+  if (any_complicated_phi || need_to_predicate)
+    for (unsigned i = 0; i < save_length; i++)
+      ifc_bbs[i]->aux = saved_preds[i];
 
   if (new_loop == NULL)
     return NULL;
@@ -2998,7 +3013,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv)
   auto_vec<edge> critical_edges;
 
   /* Loop is not well formed.  */
-  if (num <= 2 || loop->inner || !single_exit (loop))
+  if (loop->inner)
     return false;
 
   body = get_loop_body (loop);
@@ -3259,6 +3274,196 @@ ifcvt_hoist_invariants (class loop *loop, edge pe)
   free (body);
 }
 
+/* Returns the DECL_BIT_FIELD_REPRESENTATIVE of the bitfield access in STMT
+   iff the representative's type mode is not BLKmode.  If BITPOS is not NULL
+   it will hold the poly_int64 value of the DECL_FIELD_BIT_OFFSET of the
+   bitfield access and STRUCT_EXPR, if not NULL, will hold the tree
+   representing the base struct of this bitfield.  */
+
+static tree
+get_bitfield_rep (gassign *stmt, bool write, poly_int64 *bitpos,
+		  tree *struct_expr)
+{
+  tree comp_ref = write ? gimple_get_lhs (stmt)
+			: gimple_assign_rhs1 (stmt);
+
+  if (struct_expr)
+    *struct_expr = TREE_OPERAND (comp_ref, 0);
+
+  tree field_decl = TREE_OPERAND (comp_ref, 1);
+  if (bitpos)
+    *bitpos = tree_to_poly_int64 (DECL_FIELD_BIT_OFFSET (field_decl));
+
+  tree rep_decl = DECL_BIT_FIELD_REPRESENTATIVE (field_decl);
+  /* Bail out if the representative is BLKmode as we will not be able to
+     vectorize this.  */
+  if (TYPE_MODE (TREE_TYPE (rep_decl)) == E_BLKmode)
+    return NULL_TREE;
+
+  return rep_decl;
+
+}
+
+/* Lowers the bitfield described by DATA.
+   For a write like:
+
+   struct.bf = _1;
+
+   lower to:
+
+   __ifc_1 = struct.<representative>;
+   __ifc_2 = BIT_INSERT_EXPR (__ifc_1, _1, bitpos);
+   struct.<representative> = __ifc_2;
+
+   For a read:
+
+   _1 = struct.bf;
+
+    lower to:
+
+    __ifc_1 = struct.<representative>;
+    _1 =  BIT_FIELD_REF (__ifc_1, bitsize, bitpos);
+
+    where representative is a legal load that contains the bitfield value,
+    bitsize is the size of the bitfield and bitpos the offset to the start of
+    the bitfield within the representative.  */
+
+static void
+lower_bitfield (gassign *stmt, bool write)
+{
+  tree struct_expr;
+  poly_int64 bitpos;
+  tree rep_decl = get_bitfield_rep (stmt, write, &bitpos, &struct_expr);
+  tree rep_type = TREE_TYPE (rep_decl);
+  tree bf_type = TREE_TYPE (gimple_get_lhs (stmt));
+
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Lowering:\n");
+      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+      fprintf (dump_file, "to:\n");
+    }
+
+  /* BITPOS represents the position of the first bit of the bitfield we are
+     accessing.  However, sometimes it is relative to the start of the struct
+     and sometimes relative to the start of the representative we are loading.
+     In the former case the following code adapts BITPOS to the latter, since
+     that is the value BIT_FIELD_REF expects as its bit position.  In the
+     latter case this has no effect.  */
+  HOST_WIDE_INT q;
+  poly_int64 r;
+  poly_int64 rep_align = TYPE_ALIGN (rep_type);
+  if (can_div_trunc_p (bitpos, rep_align, &q, &r))
+    bitpos = r;
+
+  /* REP_COMP_REF is a COMPONENT_REF for the representative.  NEW_VAL is its
+     defining SSA_NAME.  */
+  tree rep_comp_ref = build3 (COMPONENT_REF, rep_type, struct_expr, rep_decl,
+			      NULL_TREE);
+  tree new_val = ifc_temp_var (rep_type, rep_comp_ref, &gsi);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+  tree bitpos_tree = build_int_cst (bitsizetype, bitpos);
+  if (write)
+    {
+      new_val = ifc_temp_var (rep_type,
+			      build3 (BIT_INSERT_EXPR, rep_type, new_val,
+				      unshare_expr (gimple_assign_rhs1 (stmt)),
+				      bitpos_tree), &gsi);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+      gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref),
+					      new_val);
+      gimple_set_vuse (new_stmt, gimple_vuse (stmt));
+      tree vdef = gimple_vdef (stmt);
+      gimple_set_vdef (new_stmt, vdef);
+      SSA_NAME_DEF_STMT (vdef) = new_stmt;
+      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+  else
+    {
+      tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val,
+			 build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),
+			 bitpos_tree);
+      new_val = ifc_temp_var (bf_type, bfr, &gsi);
+      redundant_ssa_names.safe_push (std::make_pair (gimple_get_lhs (stmt),
+						     new_val));
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+    }
+
+  gsi_remove (&gsi, true);
+}
+
+/* Return TRUE if there are bitfields to lower in this LOOP.  Fill
+   READS_TO_LOWER and WRITES_TO_LOWER with the bitfield accesses found.  */
+
+static bool
+bitfields_to_lower_p (class loop *loop,
+		      vec <gassign *> &reads_to_lower,
+		      vec <gassign *> &writes_to_lower)
+{
+  basic_block *bbs = get_loop_body (loop);
+  gimple_stmt_iterator gsi;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Analyzing loop %d for bitfields:\n", loop->num);
+    }
+
+  for (unsigned i = 0; i < loop->num_nodes; ++i)
+    {
+      basic_block bb = bbs[i];
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gassign *stmt = dyn_cast<gassign*> (gsi_stmt (gsi));
+	  if (!stmt)
+	    continue;
+
+	  tree op = gimple_get_lhs (stmt);
+	  bool write = TREE_CODE (op) == COMPONENT_REF;
+
+	  if (!write)
+	    op = gimple_assign_rhs1 (stmt);
+
+	  if (TREE_CODE (op) != COMPONENT_REF)
+	    continue;
+
+	  if (DECL_BIT_FIELD_TYPE (TREE_OPERAND (op, 1)))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+
+	      if (!get_bitfield_rep (stmt, write, NULL, NULL))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "\t Bitfield NOT OK to lower,"
+					" representative is BLKmode.\n");
+		  return false;
+		}
+
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "\tBitfield OK to lower.\n");
+	      if (write)
+		writes_to_lower.safe_push (stmt);
+	      else
+		reads_to_lower.safe_push (stmt);
+	    }
+	}
+    }
+  return !reads_to_lower.is_empty () || !writes_to_lower.is_empty ();
+}
+
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -3269,12 +3474,18 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   unsigned int todo = 0;
   bool aggressive_if_conv;
   class loop *rloop;
+  vec <gassign *> reads_to_lower;
+  vec <gassign *> writes_to_lower;
   bitmap exit_bbs;
   edge pe;
 
  again:
+  reads_to_lower.create (4);
+  writes_to_lower.create (4);
   rloop = NULL;
   ifc_bbs = NULL;
+  need_to_lower_bitfields = false;
+  need_to_ifcvt = false;
   need_to_predicate = false;
   need_to_rewrite_undefined = false;
   any_complicated_phi = false;
@@ -3290,16 +3501,30 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	aggressive_if_conv = true;
     }
 
-  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv))
+  if (!single_exit (loop))
     goto cleanup;
 
-  if (!if_convertible_loop_p (loop)
-      || !dbg_cnt (if_conversion_tree))
-    goto cleanup;
+  /* If there are more than two BBs in the loop then there is at least one if
+     to convert.  */
+  if (loop->num_nodes > 2)
+    {
+      need_to_ifcvt = true;
+      if (!ifcvt_split_critical_edges (loop, aggressive_if_conv))
+	goto cleanup;
+
+      if (!if_convertible_loop_p (loop) || !dbg_cnt (if_conversion_tree))
+	goto cleanup;
+
+      if ((need_to_predicate || any_complicated_phi)
+	  && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
+	      || loop->dont_vectorize))
+	goto cleanup;
+    }
 
-  if ((need_to_predicate || any_complicated_phi)
-      && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
-	  || loop->dont_vectorize))
+  need_to_lower_bitfields = bitfields_to_lower_p (loop, reads_to_lower,
+						  writes_to_lower);
+
+  if (!need_to_ifcvt && !need_to_lower_bitfields)
     goto cleanup;
 
   /* The edge to insert invariant stmts on.  */
@@ -3310,7 +3535,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
      Either version this loop, or if the pattern is right for outer-loop
      vectorization, version the outer loop.  In the latter case we will
      still if-convert the original inner loop.  */
-  if (need_to_predicate
+  if (need_to_lower_bitfields
+      || need_to_predicate
       || any_complicated_phi
       || flag_tree_loop_if_convert != 1)
     {
@@ -3350,10 +3576,31 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	pe = single_pred_edge (gimple_bb (preds->last ()));
     }
 
-  /* Now all statements are if-convertible.  Combine all the basic
-     blocks into one huge basic block doing the if-conversion
-     on-the-fly.  */
-  combine_blocks (loop);
+  if (need_to_lower_bitfields)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "-------------------------\n");
+	  fprintf (dump_file, "Start lowering bitfields\n");
+	}
+      while (!reads_to_lower.is_empty ())
+	lower_bitfield (reads_to_lower.pop (), false);
+      while (!writes_to_lower.is_empty ())
+	lower_bitfield (writes_to_lower.pop (), true);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Done lowering bitfields\n");
+	  fprintf (dump_file, "-------------------------\n");
+	}
+    }
+  if (need_to_ifcvt)
+    {
+      /* Now all statements are if-convertible.  Combine all the basic
+	 blocks into one huge basic block doing the if-conversion
+	 on-the-fly.  */
+      combine_blocks (loop);
+    }
 
   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
      and stores are involved.  CSE only the loop body, not the entry
@@ -3380,6 +3627,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   todo |= TODO_cleanup_cfg;
 
  cleanup:
+  reads_to_lower.release ();
+  writes_to_lower.release ();
   if (ifc_bbs)
     {
       unsigned int i;
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index b279a82551eb70379804d405983ae5dc44b66bf5..e93cdc727da4bb7863b2ad13f29f7d550492adea 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4301,7 +4301,8 @@ vect_find_stmt_data_reference (loop_p loop, gimple *stmt,
       free_data_ref (dr);
       return opt_result::failure_at (stmt,
 				     "not vectorized:"
-				     " statement is bitfield access %G", stmt);
+				     " statement is an unsupported"
+				     " bitfield access %G", stmt);
     }
 
   if (DR_BASE_ADDRESS (dr)
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index dfbfb71b3c69a0205ccc1b287cb50fa02a70942e..5486aa72a33274db954abf275c2c30dae3accc1c 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
+#include "gimplify-me.h"
 #include "cfgloop.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
@@ -1828,6 +1829,206 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+/* Function vect_recog_bitfield_ref_pattern
+
+   Try to find the following pattern:
+
+   _2 = BIT_FIELD_REF (_1, bitsize, bitpos);
+   _3 = (type) _2;
+
+   where type is a non-bitfield type, that is to say, its precision matches
+   2^(TYPE_SIZE(type) - (TYPE_UNSIGNED (type) ? 1 : 2)).
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern search begins.
+   here it starts with:
+   _3 = (type) _2;
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. In this case it will be:
+   patt1 = (type) _1;
+   patt2 = patt1 >> bitpos;
+   _3 = patt2 & ((1 << bitsize) - 1);
+
+*/
+
+static gimple *
+vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+				 tree *type_out)
+{
+  gassign *nop_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!nop_stmt
+      || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (nop_stmt))
+      || TREE_CODE (gimple_assign_rhs1 (nop_stmt)) != SSA_NAME)
+    return NULL;
+
+  gassign *bf_stmt
+    = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (gimple_assign_rhs1 (nop_stmt)));
+
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
+    return NULL;
+
+  tree bf_ref = gimple_assign_rhs1 (bf_stmt);
+  tree lhs = TREE_OPERAND (bf_ref, 0);
+
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
+    return NULL;
+
+  gimple *pattern_stmt;
+  tree ret_type = TREE_TYPE (gimple_assign_lhs (nop_stmt));
+
+  /* We move the conversion earlier if the loaded type is smaller than the
+     return type to enable the use of widening loads.  */
+  if (TYPE_PRECISION (TREE_TYPE (lhs)) < TYPE_PRECISION (ret_type)
+      && !useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL),
+			       NOP_EXPR, lhs);
+      lhs = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+
+  unsigned HOST_WIDE_INT shift_n = bit_field_offset (bf_ref).to_constant ();
+  unsigned HOST_WIDE_INT mask_i = bit_field_size (bf_ref).to_constant ();
+  tree mask = build_int_cst (TREE_TYPE (lhs),
+			     ((1ULL << mask_i) - 1) << shift_n);
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL),
+			   BIT_AND_EXPR, lhs, mask);
+  lhs = gimple_get_lhs (pattern_stmt);
+  if (shift_n)
+    {
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
+			      get_vectype_for_scalar_type (vinfo,
+							   TREE_TYPE (lhs)));
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL),
+			       RSHIFT_EXPR, lhs,
+			       build_int_cst (sizetype, shift_n));
+      lhs = gimple_get_lhs (pattern_stmt);
+    }
+
+  if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
+    {
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL),
+			       NOP_EXPR, lhs);
+      lhs = gimple_get_lhs (pattern_stmt);
+    }
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bitfield_ref pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+/* Function vect_recog_bit_insert_pattern
+
+   Try to find the following pattern:
+
+   _3 = BIT_INSERT_EXPR (_1, _2, bitpos);
+
+   Input:
+
+   * STMT_VINFO: The stmt we want to replace.
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. In this case it will be:
+   patt1 = _2 & mask;		    // Clearing of the non-relevant bits in the
+				    // 'to-write value'.
+   patt2 = patt1 << bitpos;	    // Shift the cleaned value in to place.
+   patt3 = _1 & ~(mask << bitpos);  // Clear the bits we want to write to
+				    // in the value we are writing into.
+   _3 = patt3 | patt2;		    // Write bits.
+
+
+   where mask = ((1 << TYPE_PRECISION (_2)) - 1), a mask to keep the number of
+   bits corresponding to the real size of the bitfield value we are writing to.
+
+*/
+
+static gimple *
+vect_recog_bit_insert_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+			       tree *type_out)
+{
+  gassign *bf_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_INSERT_EXPR)
+    return NULL;
+
+  tree load = gimple_assign_rhs1 (bf_stmt);
+  tree value = gimple_assign_rhs2 (bf_stmt);
+  tree offset = gimple_assign_rhs3 (bf_stmt);
+
+  tree bf_type = TREE_TYPE (value);
+  tree load_type = TREE_TYPE (load);
+
+  if (!INTEGRAL_TYPE_P (load_type))
+    return NULL;
+
+  gimple *pattern_stmt;
+
+  if (!useless_type_conversion_p (TREE_TYPE (value), load_type))
+    {
+      value = fold_build1 (NOP_EXPR, load_type, value);
+      if (!CONSTANT_CLASS_P (value))
+	{
+	  pattern_stmt
+	    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+				   value);
+	  value = gimple_get_lhs (pattern_stmt);
+	  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+	}
+    }
+
+  unsigned HOST_WIDE_INT mask_i = (1ULL << TYPE_PRECISION (bf_type)) - 1;
+  tree mask_t = build_int_cst (load_type, mask_i);
+  /* Clear bits we don't want to write back from value and shift it in place.  */
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			   fold_build2 (BIT_AND_EXPR, load_type, value,
+					mask_t));
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+  unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (offset);
+  if (shift_n)
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			       LSHIFT_EXPR, value, offset);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      value = gimple_get_lhs (pattern_stmt);
+    }
+  /* Mask off the bits in the loaded value.  */
+  mask_i <<= shift_n;
+  mask_i = ~mask_i;
+  mask_t = build_int_cst (load_type, mask_i);
+
+  tree lhs = vect_recog_temp_ssa_var (load_type, NULL);
+  pattern_stmt = gimple_build_assign (lhs, BIT_AND_EXPR, load, mask_t);
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+
+  /* Compose the value to write back.  */
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			   BIT_IOR_EXPR, lhs, value);
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bit_insert pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+
 /* Recognize cases in which an operation is performed in one type WTYPE
    but could be done more efficiently in a narrower type NTYPE.  For example,
    if we have:
@@ -5623,6 +5824,8 @@ struct vect_recog_func
    taken which means usually the more complex one needs to preceed the
    less comples onex (widen_sum only after dot_prod or sad for example).  */
 static vect_recog_func vect_vect_recog_func_ptrs[] = {
+  { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
+  { vect_recog_bit_insert_pattern, "bit_insert" },
   { vect_recog_over_widening_pattern, "over_widening" },
   /* Must come after over_widening, which narrows the shift as much as
      possible beforehand.  */
Richard Biener Aug. 17, 2022, 12:49 p.m. UTC | #3
On Tue, 16 Aug 2022, Andre Vieira (lists) wrote:

> Hi,
> 
> New version of the patch attached, but haven't recreated the ChangeLog yet,
> just waiting to see if this is what you had in mind. See also some replies to
> your comments in-line below:
> 
> On 09/08/2022 15:34, Richard Biener wrote:
> 
> > @@ -2998,7 +3013,7 @@ ifcvt_split_critical_edges (class loop *loop, bool
> > aggressive_if_conv)
> >     auto_vec<edge> critical_edges;
> >
> >     /* Loop is not well formed.  */
> > -  if (num <= 2 || loop->inner || !single_exit (loop))
> > +  if (num <= 2 || loop->inner)
> >       return false;
> >
> >     body = get_loop_body (loop);
> >
> > this doesn't appear in the ChangeLog nor is it clear to me why it's
> > needed?  Likewise
> So both these and...
> >
> > -  /* Save BB->aux around loop_version as that uses the same field.  */
> > -  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
> > -  void **saved_preds = XALLOCAVEC (void *, save_length);
> > -  for (unsigned i = 0; i < save_length; i++)
> > -    saved_preds[i] = ifc_bbs[i]->aux;
> > +  void **saved_preds = NULL;
> > +  if (any_complicated_phi || need_to_predicate)
> > +    {
> > +      /* Save BB->aux around loop_version as that uses the same field.
> > */
> > +      save_length = loop->inner ? loop->inner->num_nodes :
> > loop->num_nodes;
> > +      saved_preds = XALLOCAVEC (void *, save_length);
> > +      for (unsigned i = 0; i < save_length; i++)
> > +       saved_preds[i] = ifc_bbs[i]->aux;
> > +    }
> >
> > is that just premature optimization?
> 
> .. these changes are to make sure we can still use the loop versioning code
> even for cases where there are bitfields to lower but no ifcvts (i.e. num of
> BBs <= 2).
> I wasn't sure about the loop-inner condition and the small examples I tried it
> seemed to work, that is loop version seems to be able to handle nested loops.
> 
> The single_exit condition is still required for both, because the code to
> create the loop versions depends on it. It does look like I missed this in the
> ChangeLog...
> 
> > +  /* BITSTART and BITEND describe the region we can safely load from
> > inside the
> > +     structure.  BITPOS is the bit position of the value inside the
> > +     representative that we will end up loading OFFSET bytes from the
> > start
> > +     of the struct.  BEST_MODE is the mode describing the optimal size of
> > the
> > +     representative chunk we load.  If this is a write we will store the
> > same
> > +     sized representative back, after we have changed the appropriate
> > bits.  */
> > +  get_bit_range (&bitstart, &bitend, comp_ref, &bitpos, &offset);
> >
> > I think you need to give up when get_bit_range sets bitstart = bitend to
> > zero
> >
> > +  if (get_best_mode (bitsize, bitpos.to_constant (), bitstart, bitend,
> > +                    TYPE_ALIGN (TREE_TYPE (struct_expr)),
> > +                    INT_MAX, false, &best_mode))
> >
> > +  tree rep_decl = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
> > +                             NULL_TREE, rep_type);
> > +  /* Load from the start of 'offset + bitpos % alignment'.  */
> > +  uint64_t extra_offset = bitpos.to_constant ();
> >
> > you shouldn't build a new FIELD_DECL.  Either you use
> > DECL_BIT_FIELD_REPRESENTATIVE directly or you use a
> > BIT_FIELD_REF accessing the "representative".
> > DECL_BIT_FIELD_REPRESENTATIVE exists so it can maintain
> > a variable field offset, you can also subset that with an
> > intermediate BIT_FIELD_REF if DECL_BIT_FIELD_REPRESENTATIVE is
> > too large for your taste.
> >
> > I'm not sure all the offset calculation you do is correct, but
> > since you shouldn't invent a new FIELD_DECL it probably needs
> > to change anyway ...
> I can use the DECL_BIT_FIELD_REPRESENTATIVE, but I'll still need some offset
> calculation/extraction. It's easier to explain with an example:
> 
> In vect-bitfield-read-3.c the struct:
> typedef struct {
>     int  c;
>     int  b;
>     bool a : 1;
> } struct_t;
> 
> and field access 'vect_false[i].a' or 'vect_true[i].a' will lead to a
> DECL_BIT_FIELD_REPRESENTATIVE of TYPE_SIZE of 8 (and TYPE_PRECISION is also 8
> as expected). However, the DECL_FIELD_OFFSET of either the original field
> decl, the actual bitfield member, or the DECL_BIT_FIELD_REPRESENTATIVE is 0
> and the DECL_FIELD_BIT_OFFSET is 64. These will lead to the correct load:
> _1 = vect_false[i].D;
> 
> D here being the representative is an 8-bit load from vect_false[i] + 64bits.
> So all good there. However, when we construct BIT_FIELD_REF we can't simply
> use DECL_FIELD_BIT_OFFSET (field_decl) as the BIT_FIELD_REF's bitpos.  During
> `verify_gimple` it checks that bitpos + bitsize < TYPE_SIZE (TREE_TYPE (load))
> where BIT_FIELD_REF (load, bitsize, bitpos).

Yes, of course.  What you need to do is subtract DECL_FIELD_BIT_OFFSET
of the representative from DECL_FIELD_BIT_OFFSET of the original bitfield
access - that's the offset within the representative (by construction
both fields share DECL_FIELD_OFFSET).
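
In terms of the existing macros that's roughly (a sketch, names assumed):

  tree rep = DECL_BIT_FIELD_REPRESENTATIVE (field_decl);
  poly_int64 rep_bitpos
    = tree_to_poly_int64 (DECL_FIELD_BIT_OFFSET (field_decl))
      - tree_to_poly_int64 (DECL_FIELD_BIT_OFFSET (rep));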

> So instead I change bitpos such that:
> align_of_representative = TYPE_ALIGN (TREE_TYPE (representative));
> bitpos -= bitpos.to_constant () / align_of_representative *
> align_of_representative;

?  Not sure why alignment comes into play here?

> I've now rewritten this to:
> poly_int64 q, r;
> if (can_div_trunc_p (bitpos, align_of_representative, &q, &r))
>   bitpos = r;
> 
> It makes it slightly clearer, also because I no longer need the changes to the
> original tree offset as I'm just using D for the load.
>
> > Note that for optimization it will be important that all
> > accesses to the bitfield members of the same bitfield use the
> > same underlying area (CSE and store-forwarding will thank you).
> >
> > +
> > +  need_to_lower_bitfields = bitfields_to_lower_p (loop,
> > &bitfields_to_lower);
> > +  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv)
> > +      && !need_to_lower_bitfields)
> >       goto cleanup;
> >
> > so we lower bitfields even when we cannot split critical edges?
> > why?
> >
> > +  need_to_ifcvt
> > +    = if_convertible_loop_p (loop) && dbg_cnt (if_conversion_tree);
> > +  if (!need_to_ifcvt && !need_to_lower_bitfields)
> >       goto cleanup;
> >
> > likewise - if_convertible_loop_p performs other checks, the only
> > one we want to elide is the loop->num_nodes <= 2 check since
> > we want to lower bitfields in single-block loops as well.  That
> > means we only have to scan for bitfield accesses in the first
> > block "prematurely".  So I would interwind the need_to_lower_bitfields
> > into if_convertible_loop_p and if_convertible_loop_p_1 and
> > put the loop->num_nodes <= 2 after it when !need_to_lower_bitfields.
> I'm not sure I understood this. But I'd rather keep the 'need_to_ifcvt' (new)
> and 'need_to_lower_bitfields' separate. One thing I did change is that we no
> longer check for bitfields to lower if there are if-stmts that we can't lower,
> since we will not be vectorizing this loop anyway so no point in wasting time
> lowering bitfields. At the same time though, I'd like to be able to
> lower-bitfields if there are no ifcvts.

Sure.

> > +  if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
> > +    {
> > +      pattern_stmt
> > +       = gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL),
> > +                              NOP_EXPR, lhs);
> > +      lhs = gimple_get_lhs (pattern_stmt);
> > +      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
> > +    }
> >
> > hm - so you have for example
> >
> >   int _1 = MEM;
> >   int:3 _2 = BIT_FIELD_REF <_1, ...>
> >   type _3 = (type) _2;
> >
> > and that _3 = (type) _2 is because of integer promotion and you
> > perform all the shifting in that type.  I suppose you should
> > verify that the cast is indeed promoting, not narrowing, since
> > otherwise you'll produce wrong code?  That said, shouldn't you
> > perform the shift / mask in the type of _1 instead?  (the hope
> > is, of course, that typeof (_1) == type in most cases)
> >
> > Similar comments apply to vect_recog_bit_insert_pattern.
> Good shout, hadn't realized that yet because of how the testcases didn't have
> that problem, but when using the REPRESENTATIVE macro they do test that now. I
> don't think the bit_insert is a problem though. In bit_insert, 'value' always
> has the relevant bits starting at its LSB. So regardless of whether the load
> (and store) type is larger or smaller than the type, performing the shifts and
> masks in this type should be OK as you'll only be 'cutting off' the MSB's
> which would be the ones that would get truncated anyway? Or am missing
> something here?

Not sure what you are saying but "yes", all shifting and masking should
happen in the type of the representative.

+  tree bitpos_tree = build_int_cst (bitsizetype, bitpos);

for your convenience there's bitsize_int (bitpos) you can use.

I don't think you are using the correct bitpos though, you fail to
adjust it for the BIT_FIELD_REF/BIT_INSERT_EXPR.

+                        build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),

the size of the bitfield reference is DECL_SIZE of the original
FIELD_DECL - it might be bigger than the precision of its type.
You probably want to double-check it's equal to the precision
(because of the insert but also because of all the masking) and
refuse to lower if not.
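
For instance (a sketch only; field_decl and bf_type are the names used in
the patch for the original FIELD_DECL and the bitfield value's type):

  if (compare_tree_int (DECL_SIZE (field_decl),
                        TYPE_PRECISION (bf_type)) != 0)
    return NULL_TREE;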

+/* Return TRUE if there are bitfields to lower in this LOOP.  Fill TO_LOWER
+   with data structures representing these bitfields.  */
+
+static bool
+bitfields_to_lower_p (class loop *loop,
+                     vec <gassign *> &reads_to_lower,
+                     vec <gassign *> &writes_to_lower)
+{
+  basic_block *bbs = get_loop_body (loop);
+  gimple_stmt_iterator gsi;

as said I'd prefer to do this walk as part of the other walks we
already do - if and if only because get_loop_body () is a DFS
walk over the loop body (you should at least share that).

+         gassign *stmt = dyn_cast<gassign*> (gsi_stmt (gsi));
+         if (!stmt)
+           continue;
+
+         tree op = gimple_get_lhs (stmt);

gimple_assign_lhs (stmt)

+         bool write = TREE_CODE (op) == COMPONENT_REF;
+
+         if (!write)
+           op = gimple_assign_rhs1 (stmt);
+
+         if (TREE_CODE (op) != COMPONENT_REF)
+           continue;
+
+         if (DECL_BIT_FIELD_TYPE (TREE_OPERAND (op, 1)))
+           {

rumors say that at least with Ada you can have non-integral, maybe
even aggregate "bitfields", so please add

  && INTEGRAL_TYPE_P (TREE_TYPE (op))

@@ -3269,12 +3474,18 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   unsigned int todo = 0;
   bool aggressive_if_conv;
   class loop *rloop;
+  vec <gassign *> reads_to_lower;
+  vec <gassign *> writes_to_lower;
   bitmap exit_bbs;

you should be able to use auto_vec<> here

  again:
+  reads_to_lower.create (4);
+  writes_to_lower.create (4);

I think repeated .create will not release what is there.  With
auto_vec<> above there's no need to .create, just do
truncate (0) here.
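
I.e. something like this (a sketch of the suggestion; the updated patch
below adopts the auto_vec part and truncates right before the goto):

  auto_vec <gassign *, 4> reads_to_lower;
  auto_vec <gassign *, 4> writes_to_lower;
  ...
 again:
  reads_to_lower.truncate (0);
  writes_to_lower.truncate (0);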

+  tree mask = build_int_cst (TREE_TYPE (lhs),
+                            ((1ULL << mask_i) - 1) << shift_n);

please use wide_int_to_tree (TREE_TYPE (lhs),
                             wi::shifted_mask (shift_n, mask_i, false,
                                               TYPE_PRECISION (TREE_TYPE (lhs))));

1ULL would better be (unsigned HOST_WIDE_INT)1 or HOST_WIDE_INT_1U.
But note the representative could be __int128_t where uint64_t
mask operations fall apart...

Btw, instead of (val & mask) >> shift it might be better to use
(val >> shift) & mask since the resulting mask values are "smaller"
and maybe easier to code generate?
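
As a purely hypothetical scalar illustration, extracting a 4-bit field at
bit offset 8 from a 32-bit container - both forms compute the same value,
but the second only needs the "small" mask 0xf instead of 0xf00:

  unsigned mask_then_shift (unsigned v) { return (v & 0xf00u) >> 8; }
  unsigned shift_then_mask (unsigned v) { return (v >> 8) & 0xfu; }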

+   patt1 = _2 & mask;              // Clearing of the non-relevant bits in the
+                                   // 'to-write value'.
+   patt2 = patt1 << bitpos;        // Shift the cleaned value in to place.
+   patt3 = _1 & ~(mask << bitpos);  // Clearing the bits we want to write to,

same here, shifting patt1 first and then masking allows to just
invert the mask (or use andn), no need for two different (constant)
masks?
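
Again as a hypothetical scalar sketch, inserting a 4-bit value at bit
offset 8 with the value shifted into place first, so that one constant
mask (and its inversion) suffices:

  unsigned
  insert_field (unsigned loaded, unsigned val)
  {
    unsigned mask = 0xfu << 8;   /* ((1 << bitsize) - 1) << bitpos  */
    return (loaded & ~mask) | ((val << 8) & mask);
  }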

+      value = fold_build1 (NOP_EXPR, load_type, value);

fold_convert (load_type, value)

+      if (!CONSTANT_CLASS_P (value))
+       {
+         pattern_stmt
+           = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+                                  value);
+         value = gimple_get_lhs (pattern_stmt);

there's in principle

     gimple_seq stmts = NULL;
     value = gimple_convert (&stmts, load_type, value);
     if (!gimple_seq_empty_p (stmts))
       {
         pattern_stmt = gimple_seq_first_stmt (stmts);
         append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
       }

though a append_pattern_def_seq helper to add a convenience sequence
would be nice to have here.

+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+                          fold_build2 (BIT_AND_EXPR, load_type, value,
+                                       mask_t));

please avoid building GENERIC and then gimple from it.  Either use

  gimple_build_assing (..., BIT_AND_EXPR, load_type, value, mask_t);

or, if you want to fold, use

  result_value = gimple_build (&stmts, BIT_AND_EXPR, load_type, value, mask_t);

as above with gimple_convert.  See my comment about the nice to have
helper so you can block-process the 'stmts' sequence as pattern
def sequence.

+  mask_i <<= shift_n;
+  mask_i = ~mask_i;

you have to use wide_ints again, a HOST_WIDE_INT might not be
large enough.

You probably want to double-check your lowering code by
bootstrapping / testing with -ftree-loop-if-convert.

Richard.
Andre Vieira (lists) Aug. 25, 2022, 9:09 a.m. UTC | #4
On 17/08/2022 13:49, Richard Biener wrote:
> Yes, of course.  What you need to do is subtract DECL_FIELD_BIT_OFFSET
> of the representative from DECL_FIELD_BIT_OFFSET of the original bitfield
> access - that's the offset within the representative (by construction
> both fields share DECL_FIELD_OFFSET).
Doh! That makes sense...
>> So instead I change bitpos such that:
>> align_of_representative = TYPE_ALIGN (TREE_TYPE (representative));
>> bitpos -= bitpos.to_constant () / align_of_representative *
>> align_of_representative;
> ?  Not sure why alignment comes into play here?
Yeah just forget about this... it was my ill attempt at basically doing 
what you described above.
> Not sure what you are saying but "yes", all shifting and masking should
> happen in the type of the representative.
>
> +  tree bitpos_tree = build_int_cst (bitsizetype, bitpos);
>
> for your convenience there's bitsize_int (bitpos) you can use.
>
> I don't think you are using the correct bitpos though, you fail to
> adjust it for the BIT_FIELD_REF/BIT_INSERT_EXPR.
Not sure I understand what you mean? I do adjust it, I've changed it now 
so it should hopefully be clearer.
>
> +                        build_int_cst (bitsizetype, TYPE_PRECISION
> (bf_type)),
>
> the size of the bitfield reference is DECL_SIZE of the original
> FIELD_DECL - it might be bigger than the precision of its type.
> You probably want to double-check it's equal to the precision
> (because of the insert but also because of all the masking) and
> refuse to lower if not.
I added a check for this but out of curiosity, how can the DECL_SIZE of 
a bitfield FIELD_DECL be different than its type's precision?
>
> +/* Return TRUE if there are bitfields to lower in this LOOP.  Fill
> TO_LOWER
> +   with data structures representing these bitfields.  */
> +
> +static bool
> +bitfields_to_lower_p (class loop *loop,
> +                     vec <gassign *> &reads_to_lower,
> +                     vec <gassign *> &writes_to_lower)
> +{
> +  basic_block *bbs = get_loop_body (loop);
> +  gimple_stmt_iterator gsi;
>
> as said I'd prefer to do this walk as part of the other walks we
> already do - if and if only because get_loop_body () is a DFS
> walk over the loop body (you should at least share that).
I'm now sharing the use of ifc_bbs. The reason I'd rather not share the 
walk over them is that it becomes quite complex to split out the two 
decisions: not lowering ifs because there are none (in which case we 
still want to lower bitfields) versus not lowering ifs because they are 
there but aren't lowerable (in which case we forego lowering bitfields, 
since we will not vectorize this loop anyway).
>
> +      value = fold_build1 (NOP_EXPR, load_type, value);
>
> fold_convert (load_type, value)
>
> +      if (!CONSTANT_CLASS_P (value))
> +       {
> +         pattern_stmt
> +           = gimple_build_assign (vect_recog_temp_ssa_var (load_type,
> NULL),
> +                                  value);
> +         value = gimple_get_lhs (pattern_stmt);
>
> there's in principle
>
>       gimple_seq stmts = NULL;
>       value = gimple_convert (&stmts, load_type, value);
>       if (!gimple_seq_empty_p (stmts))
>         {
>           pattern_stmt = gimple_seq_first_stmt (stmts);
>           append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
>         }
>
> though a append_pattern_def_seq helper to add a convenience sequence
> would be nice to have here.
Ended up using the existing 'vect_convert_input', seems to do nicely here.
> You probably want to double-check your lowering code by
> bootstrapping / testing with -ftree-loop-if-convert.
Done, this led me to find a new failure mode, where the type of the 
first operand of BIT_FIELD_REF was a FP type (TF mode), which then led 
to failures when constructing the masking and shifting. I ended up 
adding a nop-conversion to an INTEGER type of the same width first if 
necessary. Also did a follow-up bootstrap with the addition of 
`-ftree-vectorize` and `-fno-vect-cost-model` to further test the 
codegen. All seems to be working on aarch64-linux-gnu.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..01cf34fb44484ca926ca5de99eef76dd99b69e92
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define ELT0 {0}
+#define ELT1 {1}
+#define ELT2 {2}
+#define ELT3 {3}
+#define N 32
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].i;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..1a4a1579c1478b9407ad21b19e8fbdca9f674b42
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 0}
+#define ELT1 {0x7FFFFFFFUL, 1}
+#define ELT2 {0x7FFFFFFFUL, 2}
+#define ELT3 {0x7FFFFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..216611a29fd8bbfbafdbdb79d790e520f44ba672
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+#include <stdbool.h>
+
+extern void abort(void);
+
+typedef struct {
+    int  c;
+    int  b;
+    bool a : 1;
+} struct_t;
+
+#define N 16
+#define ELT_F { 0xFFFFFFFF, 0xFFFFFFFF, 0 }
+#define ELT_T { 0xFFFFFFFF, 0xFFFFFFFF, 1 }
+
+struct_t vect_false[N] = { ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+struct_t vect_true[N]  = { ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+int main (void)
+{
+  unsigned ret = 0;
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_false[i].a;
+  }
+  if (ret)
+    abort ();
+
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_true[i].a;
+  }
+  if (!ret)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..5bc9c412e9616aefcbf49a4518f1603380a54b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 3, 0}
+#define ELT1 {0x7FFFFFFFUL, 3, 1}
+#define ELT2 {0x7FFFFFFFUL, 3, 2}
+#define ELT3 {0x7FFFFFFFUL, 3, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..19683d277b1ade1034496136f1d03bb2b446900f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].i = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].i != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..d550dd35ab75eb67f6e53f89fbf55b7315e50bc9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..3303d2610ff972d986be172962c129634ee64254
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 1c8e1a45234b8c3565edaacd55abbee23d8ea240..c5c6d937a645e9caa0092c941c52c5192363bbd7 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -91,6 +91,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "expmed.h"
+#include "expr.h"
 #include "optabs-query.h"
 #include "gimple-pretty-print.h"
 #include "alias.h"
@@ -123,6 +124,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
 
+/* For lang_hooks.types.type_for_mode.  */
+#include "langhooks.h"
+
 /* Only handle PHIs with no more arguments unless we are asked to by
    simd pragma.  */
 #define MAX_PHI_ARG_NUM \
@@ -145,6 +149,12 @@ static bool need_to_rewrite_undefined;
    before phi_convertible_by_degenerating_args.  */
 static bool any_complicated_phi;
 
+/* True if we have bitfield accesses we can lower.  */
+static bool need_to_lower_bitfields;
+
+/* True if there is any ifcvting to be done.  */
+static bool need_to_ifcvt;
+
 /* Hash for struct innermost_loop_behavior.  It depends on the user to
    free the memory.  */
 
@@ -1411,15 +1421,6 @@ if_convertible_loop_p_1 (class loop *loop, vec<data_reference_p> *refs)
 
   calculate_dominance_info (CDI_DOMINATORS);
 
-  /* Allow statements that can be handled during if-conversion.  */
-  ifc_bbs = get_loop_body_in_if_conv_order (loop);
-  if (!ifc_bbs)
-    {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "Irreducible loop\n");
-      return false;
-    }
-
   for (i = 0; i < loop->num_nodes; i++)
     {
       basic_block bb = ifc_bbs[i];
@@ -2898,18 +2899,22 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
   class loop *new_loop;
   gimple *g;
   gimple_stmt_iterator gsi;
-  unsigned int save_length;
+  unsigned int save_length = 0;
 
   g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2,
 				  build_int_cst (integer_type_node, loop->num),
 				  integer_zero_node);
   gimple_call_set_lhs (g, cond);
 
-  /* Save BB->aux around loop_version as that uses the same field.  */
-  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
-  void **saved_preds = XALLOCAVEC (void *, save_length);
-  for (unsigned i = 0; i < save_length; i++)
-    saved_preds[i] = ifc_bbs[i]->aux;
+  void **saved_preds = NULL;
+  if (any_complicated_phi || need_to_predicate)
+    {
+      /* Save BB->aux around loop_version as that uses the same field.  */
+      save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
+      saved_preds = XALLOCAVEC (void *, save_length);
+      for (unsigned i = 0; i < save_length; i++)
+	saved_preds[i] = ifc_bbs[i]->aux;
+    }
 
   initialize_original_copy_tables ();
   /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED
@@ -2921,8 +2926,9 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
 			   profile_probability::always (), true);
   free_original_copy_tables ();
 
-  for (unsigned i = 0; i < save_length; i++)
-    ifc_bbs[i]->aux = saved_preds[i];
+  if (any_complicated_phi || need_to_predicate)
+    for (unsigned i = 0; i < save_length; i++)
+      ifc_bbs[i]->aux = saved_preds[i];
 
   if (new_loop == NULL)
     return NULL;
@@ -2998,7 +3004,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv)
   auto_vec<edge> critical_edges;
 
   /* Loop is not well formed.  */
-  if (num <= 2 || loop->inner || !single_exit (loop))
+  if (loop->inner)
     return false;
 
   body = get_loop_body (loop);
@@ -3259,6 +3265,200 @@ ifcvt_hoist_invariants (class loop *loop, edge pe)
   free (body);
 }
 
+/* Returns the representative FIELD_DECL for the bitfield access in STMT, or
+   NULL_TREE if the representative cannot be used for lowering (e.g. its type
+   mode is BLKmode).  If BITPOS is not NULL it will hold the offset, in bits,
+   of the bitfield within the representative, and STRUCT_EXPR, if not NULL,
+   will hold the tree representing the base struct of this bitfield.  */
+
+static tree
+get_bitfield_rep (gassign *stmt, bool write, tree *bitpos,
+		  tree *struct_expr)
+{
+  tree comp_ref = write ? gimple_assign_lhs (stmt)
+			: gimple_assign_rhs1 (stmt);
+
+  tree field_decl = TREE_OPERAND (comp_ref, 1);
+  tree rep_decl = DECL_BIT_FIELD_REPRESENTATIVE (field_decl);
+
+  /* Bail out if the representative is BLKmode as we will not be able to
+     vectorize this.  */
+  if (TYPE_MODE (TREE_TYPE (rep_decl)) == E_BLKmode)
+    return NULL_TREE;
+
+  /* Bail out if the DECL_SIZE of the field_decl isn't the same as the BF's
+     precision.  */
+  unsigned HOST_WIDE_INT decl_size = tree_to_uhwi (DECL_SIZE (field_decl));
+  if (TYPE_PRECISION (TREE_TYPE (gimple_assign_lhs (stmt))) != decl_size)
+    return NULL_TREE;
+
+  if (struct_expr)
+    *struct_expr = TREE_OPERAND (comp_ref, 0);
+
+  if (bitpos)
+    *bitpos
+      = fold_build2 (MINUS_EXPR, bitsizetype,
+		     DECL_FIELD_BIT_OFFSET (field_decl),
+		     DECL_FIELD_BIT_OFFSET (rep_decl));
+
+  return rep_decl;
+
+}
+
+/* Lowers the bitfield described by DATA.
+   For a write like:
+
+   struct.bf = _1;
+
+   lower to:
+
+   __ifc_1 = struct.<representative>;
+   __ifc_2 = BIT_INSERT_EXPR (__ifc_1, _1, bitpos);
+   struct.<representative> = __ifc_2;
+
+   For a read:
+
+   _1 = struct.bf;
+
+    lower to:
+
+    __ifc_1 = struct.<representative>;
+    _1 =  BIT_FIELD_REF (__ifc_1, bitsize, bitpos);
+
+    where representative is a legal load that contains the bitfield value,
+    bitsize is the size of the bitfield and bitpos the offset to the start of
+    the bitfield within the representative.  */
+
+static void
+lower_bitfield (gassign *stmt, bool write)
+{
+  tree struct_expr;
+  tree bitpos;
+  tree rep_decl = get_bitfield_rep (stmt, write, &bitpos, &struct_expr);
+  tree rep_type = TREE_TYPE (rep_decl);
+  tree bf_type = TREE_TYPE (gimple_assign_lhs (stmt));
+
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Lowering:\n");
+      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+      fprintf (dump_file, "to:\n");
+    }
+
+  /* REP_COMP_REF is a COMPONENT_REF for the representative.  NEW_VAL is its
+     defining SSA_NAME.  */
+  tree rep_comp_ref = build3 (COMPONENT_REF, rep_type, struct_expr, rep_decl,
+			      NULL_TREE);
+  tree new_val = ifc_temp_var (rep_type, rep_comp_ref, &gsi);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+  if (write)
+    {
+      new_val = ifc_temp_var (rep_type,
+			      build3 (BIT_INSERT_EXPR, rep_type, new_val,
+				      unshare_expr (gimple_assign_rhs1 (stmt)),
+				      bitpos), &gsi);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+      gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref),
+					      new_val);
+      gimple_set_vuse (new_stmt, gimple_vuse (stmt));
+      tree vdef = gimple_vdef (stmt);
+      gimple_set_vdef (new_stmt, vdef);
+      SSA_NAME_DEF_STMT (vdef) = new_stmt;
+      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+  else
+    {
+      tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val,
+			 build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),
+			 bitpos);
+      new_val = ifc_temp_var (bf_type, bfr, &gsi);
+      redundant_ssa_names.safe_push (std::make_pair (gimple_assign_lhs (stmt),
+						     new_val));
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+    }
+
+  gsi_remove (&gsi, true);
+}
+
+/* Return TRUE if there are bitfields to lower in this LOOP.  Fill TO_LOWER
+   with data structures representing these bitfields.  */
+
+static bool
+bitfields_to_lower_p (class loop *loop,
+		      vec <gassign *> &reads_to_lower,
+		      vec <gassign *> &writes_to_lower)
+{
+  gimple_stmt_iterator gsi;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Analyzing loop %d for bitfields:\n", loop->num);
+    }
+
+  for (unsigned i = 0; i < loop->num_nodes; ++i)
+    {
+      basic_block bb = ifc_bbs[i];
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gassign *stmt = dyn_cast<gassign*> (gsi_stmt (gsi));
+	  if (!stmt)
+	    continue;
+
+	  tree op = gimple_assign_lhs (stmt);
+	  bool write = TREE_CODE (op) == COMPONENT_REF;
+
+	  if (!write)
+	    op = gimple_assign_rhs1 (stmt);
+
+	  if (TREE_CODE (op) != COMPONENT_REF)
+	    continue;
+
+	  if (DECL_BIT_FIELD_TYPE (TREE_OPERAND (op, 1)))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+
+	      if (!INTEGRAL_TYPE_P (TREE_TYPE (op)))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "\t Bitfield NOT OK to lower,"
+					" field type is not Integral.\n");
+		  return false;
+		}
+
+	      if (!get_bitfield_rep (stmt, write, NULL, NULL))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "\t Bitfield NOT OK to lower,"
+					" representative is BLKmode.\n");
+		  return false;
+		}
+
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "\tBitfield OK to lower.\n");
+	      if (write)
+		writes_to_lower.safe_push (stmt);
+	      else
+		reads_to_lower.safe_push (stmt);
+	    }
+	}
+    }
+  return !reads_to_lower.is_empty () || !writes_to_lower.is_empty ();
+}
+
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -3269,12 +3469,16 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   unsigned int todo = 0;
   bool aggressive_if_conv;
   class loop *rloop;
+  auto_vec <gassign *, 4> reads_to_lower;
+  auto_vec <gassign *, 4> writes_to_lower;
   bitmap exit_bbs;
   edge pe;
 
  again:
   rloop = NULL;
   ifc_bbs = NULL;
+  need_to_lower_bitfields = false;
+  need_to_ifcvt = false;
   need_to_predicate = false;
   need_to_rewrite_undefined = false;
   any_complicated_phi = false;
@@ -3290,16 +3494,40 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	aggressive_if_conv = true;
     }
 
-  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv))
+  if (!single_exit (loop))
     goto cleanup;
 
-  if (!if_convertible_loop_p (loop)
-      || !dbg_cnt (if_conversion_tree))
+  /* If there are more than two BBs in the loop then there is at least one if
+     to convert.  */
+  if (loop->num_nodes > 2
+      && !ifcvt_split_critical_edges (loop, aggressive_if_conv))
     goto cleanup;
 
-  if ((need_to_predicate || any_complicated_phi)
-      && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
-	  || loop->dont_vectorize))
+  ifc_bbs = get_loop_body_in_if_conv_order (loop);
+  if (!ifc_bbs)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "Irreducible loop\n");
+      goto cleanup;
+    }
+
+  if (loop->num_nodes > 2)
+    {
+      need_to_ifcvt = true;
+
+      if (!if_convertible_loop_p (loop) || !dbg_cnt (if_conversion_tree))
+	goto cleanup;
+
+      if ((need_to_predicate || any_complicated_phi)
+	  && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
+	      || loop->dont_vectorize))
+	goto cleanup;
+    }
+
+  need_to_lower_bitfields = bitfields_to_lower_p (loop, reads_to_lower,
+						  writes_to_lower);
+
+  if (!need_to_ifcvt && !need_to_lower_bitfields)
     goto cleanup;
 
   /* The edge to insert invariant stmts on.  */
@@ -3310,7 +3538,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
      Either version this loop, or if the pattern is right for outer-loop
      vectorization, version the outer loop.  In the latter case we will
      still if-convert the original inner loop.  */
-  if (need_to_predicate
+  if (need_to_lower_bitfields
+      || need_to_predicate
       || any_complicated_phi
       || flag_tree_loop_if_convert != 1)
     {
@@ -3350,10 +3579,31 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	pe = single_pred_edge (gimple_bb (preds->last ()));
     }
 
-  /* Now all statements are if-convertible.  Combine all the basic
-     blocks into one huge basic block doing the if-conversion
-     on-the-fly.  */
-  combine_blocks (loop);
+  if (need_to_lower_bitfields)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "-------------------------\n");
+	  fprintf (dump_file, "Start lowering bitfields\n");
+	}
+      while (!reads_to_lower.is_empty ())
+	lower_bitfield (reads_to_lower.pop (), false);
+      while (!writes_to_lower.is_empty ())
+	lower_bitfield (writes_to_lower.pop (), true);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Done lowering bitfields\n");
+	  fprintf (dump_file, "-------------------------\n");
+	}
+    }
+  if (need_to_ifcvt)
+    {
+      /* Now all statements are if-convertible.  Combine all the basic
+	 blocks into one huge basic block doing the if-conversion
+	 on-the-fly.  */
+      combine_blocks (loop);
+    }
 
   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
      and stores are involved.  CSE only the loop body, not the entry
@@ -3393,6 +3643,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   if (rloop != NULL)
     {
       loop = rloop;
+      reads_to_lower.truncate (0);
+      writes_to_lower.truncate (0);
       goto again;
     }
 
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index b279a82551eb70379804d405983ae5dc44b66bf5..e93cdc727da4bb7863b2ad13f29f7d550492adea 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4301,7 +4301,8 @@ vect_find_stmt_data_reference (loop_p loop, gimple *stmt,
       free_data_ref (dr);
       return opt_result::failure_at (stmt,
 				     "not vectorized:"
-				     " statement is bitfield access %G", stmt);
+				     " statement is an unsupported"
+				     " bitfield access %G", stmt);
     }
 
   if (DR_BASE_ADDRESS (dr)
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index dfbfb71b3c69a0205ccc1b287cb50fa02a70942e..731b7c2bc1962ff22288c4439679c0b11232cb4a 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -35,6 +35,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
+#include "gimple-fold.h"
+#include "gimplify-me.h"
 #include "cfgloop.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
@@ -1828,6 +1830,294 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+/* Function vect_recog_bitfield_ref_pattern
+
+   Try to find the following pattern:
+
+   _2 = BIT_FIELD_REF (_1, bitsize, bitpos);
+   _3 = (type_out) _2;
+
+   where type_out is a non-bitfield type, that is to say, its precision matches
+   2^(TYPE_SIZE(type_out) - (TYPE_UNSIGNED (type_out) ? 1 : 2)).
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern search begins.
+   here it starts with:
+   _3 = (type_out) _2;
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. If the precision of type_out is bigger
+   than the precision of the type of _1 we perform the widening before the shifting,
+   since the new precision will be large enough to shift the value and moving
+   widening operations up the statement chain enables the generation of
+   widening loads.  If we are widening and the operation after the pattern is
+   an addition then we mask first and shift later, to enable the generation of
+   shifting adds.  In the case of narrowing we will always mask first, shift
+   last and then perform a narrowing operation.  This will enable the
+   generation of narrowing shifts.
+
+   Widening with mask first, shift later:
+   patt1 = (type_out) _1;
+   patt2 = patt1 & (((1 << bitsize) - 1) << bitpos);
+   _3 = patt2 >> bitpos;
+
+   Widening with shift first, mask last:
+   patt1 = (type_out) _1;
+   patt2 = patt1 >> bitpos;
+   _3 = patt2 & ((1 <<bitsize) - 1);
+
+   Narrowing:
+   patt1 = _1 & (((1 << bitsize) - 1) << bitpos);
+   patt2 = patt1 >> bitpos;
+   _3 = (type_out) patt2;
+
+   The shifting is always optional depending on whether bitpos != 0.
+
+*/
+
+static gimple *
+vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+				 tree *type_out)
+{
+  gassign *first_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+
+  if (!first_stmt)
+    return NULL;
+
+  gassign *bf_stmt;
+  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (first_stmt))
+      && TREE_CODE (gimple_assign_rhs1 (first_stmt)) == SSA_NAME)
+    {
+      gimple *second_stmt
+	= SSA_NAME_DEF_STMT (gimple_assign_rhs1 (first_stmt));
+      if (!second_stmt || gimple_code (second_stmt) != GIMPLE_ASSIGN
+	  || gimple_assign_rhs_code (second_stmt) != BIT_FIELD_REF)
+	return NULL;
+      bf_stmt = static_cast <gassign *> (second_stmt);
+    }
+  else
+    return NULL;
+
+  tree bf_ref = gimple_assign_rhs1 (bf_stmt);
+  tree lhs = TREE_OPERAND (bf_ref, 0);
+
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (bf_ref)))
+    return NULL;
+
+  gimple *use_stmt, *pattern_stmt;
+  use_operand_p use_p;
+  tree ret = gimple_assign_lhs (first_stmt);
+  tree ret_type = TREE_TYPE (ret);
+  bool shift_first = true;
+
+  /* We move the conversion earlier if the loaded type is smaller than the
+     return type to enable the use of widening loads.  */
+  if (TYPE_PRECISION (TREE_TYPE (lhs)) < TYPE_PRECISION (ret_type)
+      && !useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL),
+			       NOP_EXPR, lhs);
+      lhs = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+  else if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
+    /* If we are doing the conversion last then also delay the shift as we may
+       be able to combine the shift and conversion in certain cases.  */
+    shift_first = false;
+
+  tree vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
+  /* If the first operand of the BIT_FIELD_REF is not an INTEGER type, convert
+     it to one of the same width so we can perform the necessary masking and
+     shifting.  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
+    {
+      tree int_type
+	= build_nonstandard_integer_type (TYPE_PRECISION (TREE_TYPE (lhs)),
+					  true);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (int_type, NULL),
+			       NOP_EXPR, lhs);
+      vectype = get_vectype_for_scalar_type (vinfo, int_type);
+      lhs = gimple_assign_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+    }
+
+  /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
+     PLUS_EXPR then do the shift last as some targets can combine the shift and
+     add into a single instruction.  */
+  if (single_imm_use (gimple_assign_lhs (first_stmt), &use_p, &use_stmt))
+    {
+      if (gimple_code (use_stmt) == GIMPLE_ASSIGN
+	  && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)
+	shift_first = false;
+    }
+
+  unsigned HOST_WIDE_INT shift_n = bit_field_offset (bf_ref).to_constant ();
+  unsigned HOST_WIDE_INT mask_width = bit_field_size (bf_ref).to_constant ();
+  unsigned int prec = TYPE_PRECISION (TREE_TYPE (lhs));
+  if (shift_first)
+    {
+      if (shift_n)
+	{
+	  pattern_stmt
+	    = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs),
+							    NULL),
+				   RSHIFT_EXPR, lhs,
+				   build_int_cst (sizetype, shift_n));
+	  lhs = gimple_assign_lhs (pattern_stmt);
+	  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+	}
+
+      tree mask = wide_int_to_tree (TREE_TYPE (lhs),
+				    wi::mask (mask_width, false, prec));
+
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs),
+							NULL),
+			       BIT_AND_EXPR, lhs, mask);
+      lhs = gimple_assign_lhs (pattern_stmt);
+    }
+  else
+    {
+      tree mask = wide_int_to_tree (TREE_TYPE (lhs),
+				    wi::shifted_mask (shift_n, mask_width,
+						      false, prec));
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs),
+							NULL),
+			       BIT_AND_EXPR, lhs, mask);
+      lhs = gimple_assign_lhs (pattern_stmt);
+      if (shift_n)
+	{
+	  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+	  pattern_stmt
+	    = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs),
+							    NULL),
+				   RSHIFT_EXPR, lhs,
+				   build_int_cst (sizetype, shift_n));
+	  lhs = gimple_assign_lhs (pattern_stmt);
+	}
+    }
+
+  if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
+    {
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL),
+			       NOP_EXPR, lhs);
+      lhs = gimple_get_lhs (pattern_stmt);
+    }
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bitfield_ref pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+/* Function vect_recog_bit_insert_pattern
+
+   Try to find the following pattern:
+
+   _3 = BIT_INSERT_EXPR (_1, _2, bitpos);
+
+   Input:
+
+   * STMT_VINFO: The stmt we want to replace.
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. In this case it will be:
+   patt1 = _2 << bitpos;	      // Shift value into place
+   patt2 = patt1 & (mask << bitpos);  // Clearing of the non-relevant bits in the
+				      // 'to-write value'.
+   patt3 = _1 & ~(mask << bitpos);    // Clearing the bits we want to write to,
+				      // from the value we want to write to.
+   _3 = patt3 | patt2;		      // Write bits.
+
+
+   where mask = ((1 << TYPE_PRECISION (_2)) - 1), a mask to keep the number of
+   bits corresponding to the real size of the bitfield value we are writing to.
+
+*/
+
+static gimple *
+vect_recog_bit_insert_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+			       tree *type_out)
+{
+  gassign *bf_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_INSERT_EXPR)
+    return NULL;
+
+  tree load = gimple_assign_rhs1 (bf_stmt);
+  tree value = gimple_assign_rhs2 (bf_stmt);
+  tree offset = gimple_assign_rhs3 (bf_stmt);
+
+  tree bf_type = TREE_TYPE (value);
+  tree load_type = TREE_TYPE (load);
+
+  if (!INTEGRAL_TYPE_P (load_type))
+    return NULL;
+
+  gimple *pattern_stmt;
+
+  vect_unpromoted_value unprom;
+  unprom.set_op (value, vect_internal_def);
+  value = vect_convert_input (vinfo, stmt_info, load_type, &unprom,
+			      get_vectype_for_scalar_type (vinfo, load_type));
+
+  unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (offset);
+  if (shift_n)
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			       LSHIFT_EXPR, value, offset);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      value = gimple_get_lhs (pattern_stmt);
+    }
+
+  unsigned HOST_WIDE_INT mask_width = TYPE_PRECISION (bf_type);
+  unsigned int prec = TYPE_PRECISION (load_type);
+  tree mask_t
+    = wide_int_to_tree (load_type,
+			wi::shifted_mask (shift_n, mask_width, false, prec));
+
+  /* Clear bits we don't want to write back from value and shift it in place.  */
+  gimple_seq stmts = NULL;
+  value = gimple_build (&stmts, BIT_AND_EXPR, load_type, value, mask_t);
+  if (!gimple_seq_empty_p (stmts))
+    {
+      pattern_stmt = gimple_seq_first_stmt (stmts);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+
+  /* Mask off the bits in the loaded value.  */
+  mask_t = wide_int_to_tree (load_type,
+			     wi::shifted_mask (shift_n, mask_width, true, prec));
+  tree lhs = vect_recog_temp_ssa_var (load_type, NULL);
+  pattern_stmt = gimple_build_assign (lhs, BIT_AND_EXPR, load, mask_t);
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+
+  /* Compose the value to write back.  */
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			   BIT_IOR_EXPR, lhs, value);
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bit_insert pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+
 /* Recognize cases in which an operation is performed in one type WTYPE
    but could be done more efficiently in a narrower type NTYPE.  For example,
    if we have:
@@ -5623,6 +5913,8 @@ struct vect_recog_func
    taken which means usually the more complex one needs to preceed the
    less comples onex (widen_sum only after dot_prod or sad for example).  */
 static vect_recog_func vect_vect_recog_func_ptrs[] = {
+  { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
+  { vect_recog_bit_insert_pattern, "bit_insert" },
   { vect_recog_over_widening_pattern, "over_widening" },
   /* Must come after over_widening, which narrows the shift as much as
      possible beforehand.  */
Andre Vieira (lists) Sept. 8, 2022, 9:07 a.m. UTC | #5
Ping.

On 25/08/2022 10:09, Andre Vieira (lists) via Gcc-patches wrote:
>
> On 17/08/2022 13:49, Richard Biener wrote:
>> Yes, of course.  What you need to do is subtract DECL_FIELD_BIT_OFFSET
>> of the representative from DECL_FIELD_BIT_OFFSET of the original 
>> bitfield
>> access - that's the offset within the representative (by construction
>> both fields share DECL_FIELD_OFFSET).
> Doh! That makes sense...
>>> So instead I change bitpos such that:
>>> align_of_representative = TYPE_ALIGN (TREE_TYPE (representative));
>>> bitpos -= bitpos.to_constant () / align_of_representative *
>>> align_of_representative;
>> ?  Not sure why alignment comes into play here?
> Yeah just forget about this... it was my ill attempt at basically 
> doing what you described above.
>> Not sure what you are saying but "yes", all shifting and masking should
>> happen in the type of the representative.
>>
>> +  tree bitpos_tree = build_int_cst (bitsizetype, bitpos);
>>
>> for your convenience there's bitsize_int (bitpos) you can use.
>>
>> I don't think you are using the correct bitpos though, you fail to
>> adjust it for the BIT_FIELD_REF/BIT_INSERT_EXPR.
> Not sure I understand what you mean? I do adjust it, I've changed it 
> now so it should hopefully be clearer.
>>
>> +                        build_int_cst (bitsizetype, TYPE_PRECISION
>> (bf_type)),
>>
>> the size of the bitfield reference is DECL_SIZE of the original
>> FIELD_DECL - it might be bigger than the precision of its type.
>> You probably want to double-check it's equal to the precision
>> (because of the insert but also because of all the masking) and
>> refuse to lower if not.
> I added a check for this but out of curiosity, how can the DECL_SIZE 
> of a bitfield FIELD_DECL be different than it's type's precision?
>>
>> +/* Return TRUE if there are bitfields to lower in this LOOP. Fill
>> TO_LOWER
>> +   with data structures representing these bitfields.  */
>> +
>> +static bool
>> +bitfields_to_lower_p (class loop *loop,
>> +                     vec <gassign *> &reads_to_lower,
>> +                     vec <gassign *> &writes_to_lower)
>> +{
>> +  basic_block *bbs = get_loop_body (loop);
>> +  gimple_stmt_iterator gsi;
>>
>> as said I'd prefer to do this walk as part of the other walks we
>> already do - if and if only because get_loop_body () is a DFS
>> walk over the loop body (you should at least share that).
> I'm now sharing the use of ifc_bbs. The reason why I'd rather not 
> share the walk over them is because it becomes quite complex to split 
> out the decision to not lower if's because there are none, for which 
> we will still want to lower bitfields, versus not lowering if's when 
> they are there but aren't lowerable at which point we will forego 
> lowering bitfields since we will not vectorize this loop anyway.
>>
>> +      value = fold_build1 (NOP_EXPR, load_type, value);
>>
>> fold_convert (load_type, value)
>>
>> +      if (!CONSTANT_CLASS_P (value))
>> +       {
>> +         pattern_stmt
>> +           = gimple_build_assign (vect_recog_temp_ssa_var (load_type,
>> NULL),
>> +                                  value);
>> +         value = gimple_get_lhs (pattern_stmt);
>>
>> there's in principle
>>
>>       gimple_seq stmts = NULL;
>>       value = gimple_convert (&stmts, load_type, value);
>>       if (!gimple_seq_empty_p (stmts))
>>         {
>>           pattern_stmt = gimple_seq_first_stmt (stmts);
>>           append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
>>         }
>>
>> though a append_pattern_def_seq helper to add a convenience sequence
>> would be nice to have here.
> Ended up using the existing 'vect_convert_input', seems to do nicely 
> here.
>> You probably want to double-check your lowering code by
>> bootstrapping / testing with -ftree-loop-if-convert.
> Done, this lead me to find a new failure mode, where the type of the 
> first operand of BIT_FIELD_REF was a FP type (TF mode), which then 
> lead to failures when constructing the masking and shifting. I ended 
> up adding a nop-conversion to an INTEGER type of the same width first 
> if necessary. Also did a follow-up bootstrap with the addition of 
> `-ftree-vectorize` and `-fno-vect-cost-model` to further test the 
> codegen. All seems to be working on aarch64-linux-gnu.
Richard Biener Sept. 8, 2022, 11:51 a.m. UTC | #6
On Thu, 25 Aug 2022, Andre Vieira (lists) wrote:

> 
> On 17/08/2022 13:49, Richard Biener wrote:
> > Yes, of course.  What you need to do is subtract DECL_FIELD_BIT_OFFSET
> > of the representative from DECL_FIELD_BIT_OFFSET of the original bitfield
> > access - that's the offset within the representative (by construction
> > both fields share DECL_FIELD_OFFSET).
> Doh! That makes sense...
> >> So instead I change bitpos such that:
> >> align_of_representative = TYPE_ALIGN (TREE_TYPE (representative));
> >> bitpos -= bitpos.to_constant () / align_of_representative *
> >> align_of_representative;
> > ?  Not sure why alignment comes into play here?
> Yeah just forget about this... it was my ill attempt at basically doing what
> you described above.
> > Not sure what you are saying but "yes", all shifting and masking should
> > happen in the type of the representative.
> >
> > +  tree bitpos_tree = build_int_cst (bitsizetype, bitpos);
> >
> > for your convenience there's bitsize_int (bitpos) you can use.
> >
> > I don't think you are using the correct bitpos though, you fail to
> > adjust it for the BIT_FIELD_REF/BIT_INSERT_EXPR.
> Not sure I understand what you mean? I do adjust it, I've changed it now so it
> should hopefully be clearer.
> >
> > +                        build_int_cst (bitsizetype, TYPE_PRECISION
> > (bf_type)),
> >
> > the size of the bitfield reference is DECL_SIZE of the original
> > FIELD_DECL - it might be bigger than the precision of its type.
> > You probably want to double-check it's equal to the precision
> > (because of the insert but also because of all the masking) and
> > refuse to lower if not.
> I added a check for this but out of curiosity, how can the DECL_SIZE of a
> bitfield FIELD_DECL be different than it's type's precision?

It's probably not possible to create a C testcase, but I don't see
what makes it impossible in general to have padding in a bitfield
object.

> >
> > +/* Return TRUE if there are bitfields to lower in this LOOP.  Fill
> > TO_LOWER
> > +   with data structures representing these bitfields.  */
> > +
> > +static bool
> > +bitfields_to_lower_p (class loop *loop,
> > +                     vec <gassign *> &reads_to_lower,
> > +                     vec <gassign *> &writes_to_lower)
> > +{
> > +  basic_block *bbs = get_loop_body (loop);
> > +  gimple_stmt_iterator gsi;
> >
> > as said I'd prefer to do this walk as part of the other walks we
> > already do - if and if only because get_loop_body () is a DFS
> > walk over the loop body (you should at least share that).
> I'm now sharing the use of ifc_bbs. The reason why I'd rather not share the
> walk over them is because it becomes quite complex to split out the decision
> to not lower if's because there are none, for which we will still want to
> lower bitfields, versus not lowering if's when they are there but aren't
> lowerable at which point we will forego lowering bitfields since we will not
> vectorize this loop anyway.
> >
> > +      value = fold_build1 (NOP_EXPR, load_type, value);
> >
> > fold_convert (load_type, value)
> >
> > +      if (!CONSTANT_CLASS_P (value))
> > +       {
> > +         pattern_stmt
> > +           = gimple_build_assign (vect_recog_temp_ssa_var (load_type,
> > NULL),
> > +                                  value);
> > +         value = gimple_get_lhs (pattern_stmt);
> >
> > there's in principle
> >
> >       gimple_seq stmts = NULL;
> >       value = gimple_convert (&stmts, load_type, value);
> >       if (!gimple_seq_empty_p (stmts))
> >         {
> >           pattern_stmt = gimple_seq_first_stmt (stmts);
> >           append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
> >         }
> >
> > though a append_pattern_def_seq helper to add a convenience sequence
> > would be nice to have here.
> Ended up using the existing 'vect_convert_input', seems to do nicely here.
> > You probably want to double-check your lowering code by
> > bootstrapping / testing with -ftree-loop-if-convert.
> Done, this lead me to find a new failure mode, where the type of the first
> operand of BIT_FIELD_REF was a FP type (TF mode), which then lead to failures
> when constructing the masking and shifting. I ended up adding a nop-conversion
> to an INTEGER type of the same width first if necessary.

You want a VIEW_CONVERT (aka bit-cast) here.
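
I.e. roughly (a sketch only, reusing the int_type the current patch
already builds for this case):

  pattern_stmt
    = gimple_build_assign (vect_recog_temp_ssa_var (int_type, NULL),
                           build1 (VIEW_CONVERT_EXPR, int_type, lhs));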

> Also did a follow-up
> bootstrap with the addition of `-ftree-vectorize` and `-fno-vect-cost-model`
> to further test the codegen. All seems to be working on aarch64-linux-gnu.

+static tree
+get_bitfield_rep (gassign *stmt, bool write, tree *bitpos,
+                 tree *struct_expr)
...
+  /* Bail out if the DECL_SIZE of the field_decl isn't the same as the 
BF's
+     precision.  */
+  unsigned HOST_WIDE_INT decl_size = tree_to_uhwi (DECL_SIZE 
(field_decl));
+  if (TYPE_PRECISION (TREE_TYPE (gimple_assign_lhs (stmt))) != decl_size)
+    return NULL_TREE;

you can

use compare_tree_int (DECL_SIZE (field_decl), TYPE_PRECISION (...)) != 0

which avoids caring for the case the size isn't a uhwi ...

+      gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref),
+                                             new_val);
+      gimple_set_vuse (new_stmt, gimple_vuse (stmt));
+      tree vdef = gimple_vdef (stmt);
+      gimple_set_vdef (new_stmt, vdef);
+      SSA_NAME_DEF_STMT (vdef) = new_stmt;

you can use gimple_move_vops (new_stmt, stmt); here

+      tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val,
+                        build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),
+                        bitpos);
+      new_val = ifc_temp_var (bf_type, bfr, &gsi);
+      redundant_ssa_names.safe_push (std::make_pair (gimple_assign_lhs (stmt),
+                                                    new_val));

I'm curious, why the push to redundant_ssa_names?  That could use
a comment ...

+  need_to_lower_bitfields = bitfields_to_lower_p (loop, reads_to_lower,
+                                                 writes_to_lower);

do we want to conditionalize this on flag_tree_loop_vectorize?  That is,
I think the lowering should for now happen only on the loop version
guarded by .IFN_VECTORIZED.  There's

  if ((need_to_predicate || any_complicated_phi)
      && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
          || loop->dont_vectorize))
    goto cleanup;

for the cases that will force versioning, but I think we should
simply not lower bitfields in the

         ((!flag_tree_loop_vectorize && !loop->force_vectorize)
          || loop->dont_vectorize)

case?
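
I.e. something like (a sketch of the suggested gating, not tested):

  need_to_lower_bitfields
    = ((flag_tree_loop_vectorize || loop->force_vectorize)
       && !loop->dont_vectorize
       && bitfields_to_lower_p (loop, reads_to_lower, writes_to_lower));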

+      if (!second_stmt || gimple_code (second_stmt) != GIMPLE_ASSIGN
+         || gimple_assign_rhs_code (second_stmt) != BIT_FIELD_REF)
+       return NULL;

the first || goes to a new line

+      bf_stmt = static_cast <gassign *> (second_stmt);

"nicer" and shorter is

       bf_stmt = dyn_cast <gassign *> (second_stmt);
       if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
         return NULL;

+  tree lhs = TREE_OPERAND (bf_ref, 0);
+
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (bf_ref)))
+    return NULL;
+
+  gimple *use_stmt, *pattern_stmt;
+  use_operand_p use_p;
+  tree ret = gimple_assign_lhs (first_stmt);

just when reading, generic variables like 'lhs' are not helpful
(when they are not an actual lhs even less so ...).
You have nice docs ontop of the function - when you use
atual names for _2 = BIT_FIELD_REF (_1, ...) variables you can
even use them in the code so docs and code match up nicely.

+  /* If the first operand of the BIT_FIELD_REF is not an INTEGER type, 
convert
+     it to one of the same width so we can perform the necessary masking 
and
+     shifting.  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
+    {
+      tree int_type
+       = build_nonstandard_integer_type (TYPE_PRECISION (TREE_TYPE (lhs)),
+                                         true);

so you probably run into this from code that's not lowered from
original bitfield reads?  Note you should use TYPE_SIZE here,
definitely not TYPE_PRECISION on arbitrary types (if it's a vector
type then that will yield the number of units for example).

+  unsigned HOST_WIDE_INT shift_n = bit_field_offset (bf_ref).to_constant ();
+  unsigned HOST_WIDE_INT mask_width = bit_field_size (bf_ref).to_constant ();

is there anything that prevents this from running on VLA vector extractions?
I think it would be nice to test constantness at the start of the
function.
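
E.g. something like (sketch):

  if (!bit_field_offset (bf_ref).is_constant ()
      || !bit_field_size (bf_ref).is_constant ())
    return NULL;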

+         pattern_stmt
+           = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs),
+                                                           NULL),

eh, seeing that multiple times the vect_recog_temp_ssa_var needs
a defaulted NULL second argument ...

Note I fear we will have endianness issues when translating
bit-field accesses to BIT_FIELD_REF/INSERT and then to shifts.  Rules
for memory and register operations do not match up (IIRC, I repeatedly
run into issues here myself).  The testcases all look like they
won't catch this - I think an example would be sth like
struct X { unsigned a : 23; unsigned b : 9; }, can you see to do
testing on a big-endian target?

Otherwise the patch looks good, so there are only minor things to
fix up (in case the endianness issue turns out to be a non-issue).

Sorry for the delay in reviewing.

Thanks,
Richard.
Andre Vieira (lists) Sept. 26, 2022, 3:23 p.m. UTC | #7
On 08/09/2022 12:51, Richard Biener wrote:
>
> I'm curious, why the push to redundant_ssa_names?  That could use
> a comment ...
So I purposefully left a #if 0 #else #endif in there so you can see the 
two options. But the reason I used redundant_ssa_names is because ifcvt 
seems to use that as a container for all pairs of (old, new) ssa names 
to replace later. So I just piggybacked on that. I don't know if
there's a specific reason they do the replacement at the end? Maybe some
ordering issue? Either way, both adding it to redundant_ssa_names and
doing the replacement inline work for the bitfield lowering (or work in
my testing at least).
> Note I fear we will have endianess issues when translating
> bit-field accesses to BIT_FIELD_REF/INSERT and then to shifts.  Rules
> for memory and register operations do not match up (IIRC, I repeatedly
> run into issues here myself).  The testcases all look like they
> won't catch this - I think an example would be sth like
> struct X { unsigned a : 23; unsigned b : 9; }, can you see to do
> testing on a big-endian target?
I've done some testing and you were right, it did fall apart on 
big-endian. I fixed it by changing the way we compute the 'shift' value 
and added two extra testcases for read and write each.
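
For reference, a standalone sketch (not part of the patch) of what the
shift computation boils down to for your struct X example, assuming a
32-bit representative and the usual bit allocation rules:

struct X { unsigned a : 23; unsigned b : 9; };

/* Extract 'b' from the loaded 32-bit representative.  The memory bit
   offset of 'b' is 23 on both targets, but once the word sits in a
   register the shift has to be flipped on big-endian.  */
unsigned
extract_b (unsigned container, int big_endian)
{
  unsigned prec = 32, bitpos = 23, bitsize = 9;
  unsigned shift = big_endian ? prec - bitpos - bitsize : bitpos;
  return (container >> shift) & ((1u << bitsize) - 1);
}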
>
> Sorry for the delay in reviewing.
No worries, apologies myself for the delay in reworking this, had a nice 
little week holiday in between :)

I'll write the ChangeLogs once the patch has stabilized.

Thanks,
Andre
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..01cf34fb44484ca926ca5de99eef76dd99b69e92
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define ELT0 {0}
+#define ELT1 {1}
+#define ELT2 {2}
+#define ELT3 {3}
+#define N 32
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].i;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..1a4a1579c1478b9407ad21b19e8fbdca9f674b42
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 0}
+#define ELT1 {0x7FFFFFFFUL, 1}
+#define ELT2 {0x7FFFFFFFUL, 2}
+#define ELT3 {0x7FFFFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..216611a29fd8bbfbafdbdb79d790e520f44ba672
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+#include <stdbool.h>
+
+extern void abort(void);
+
+typedef struct {
+    int  c;
+    int  b;
+    bool a : 1;
+} struct_t;
+
+#define N 16
+#define ELT_F { 0xFFFFFFFF, 0xFFFFFFFF, 0 }
+#define ELT_T { 0xFFFFFFFF, 0xFFFFFFFF, 1 }
+
+struct_t vect_false[N] = { ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+struct_t vect_true[N]  = { ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+int main (void)
+{
+  unsigned ret = 0;
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_false[i].a;
+  }
+  if (ret)
+    abort ();
+
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_true[i].a;
+  }
+  if (!ret)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..5bc9c412e9616aefcbf49a4518f1603380a54b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 3, 0}
+#define ELT1 {0x7FFFFFFFUL, 3, 1}
+#define ELT2 {0x7FFFFFFFUL, 3, 2}
+#define ELT3 {0x7FFFFFFFUL, 3, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c
new file mode 100644
index 0000000000000000000000000000000000000000..1dc24d3eded192144dc9ad94589b4c5c3d999e65
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned a : 23; unsigned b : 9;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFUL, 0}
+#define ELT1 {0x7FFFFFUL, 1}
+#define ELT2 {0x7FFFFFUL, 2}
+#define ELT3 {0x7FFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].b;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c
new file mode 100644
index 0000000000000000000000000000000000000000..7d24c29975865883a7cdc7aa057fbb6bf413e0bc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned a : 23; unsigned b : 8;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFUL, 0}
+#define ELT1 {0x7FFFFFUL, 1}
+#define ELT2 {0x7FFFFFUL, 2}
+#define ELT3 {0x7FFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].b;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..19683d277b1ade1034496136f1d03bb2b446900f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].i = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].i != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..d550dd35ab75eb67f6e53f89fbf55b7315e50bc9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..3303d2610ff972d986be172962c129634ee64254
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..fae6ea3557dcaba7b330ebdaa471281d33d2ba15
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned b : 23;
+    unsigned a : 9;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c
new file mode 100644
index 0000000000000000000000000000000000000000..99360c2967b076212c67eb4f34b8fd91711d8821
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned b : 23;
+    unsigned a : 8;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 1c8e1a45234b8c3565edaacd55abbee23d8ea240..ee6226b7bee713598141468de00728abff675e52 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -91,6 +91,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "expmed.h"
+#include "expr.h"
 #include "optabs-query.h"
 #include "gimple-pretty-print.h"
 #include "alias.h"
@@ -123,6 +124,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
 
+/* For lang_hooks.types.type_for_mode.  */
+#include "langhooks.h"
+
 /* Only handle PHIs with no more arguments unless we are asked to by
    simd pragma.  */
 #define MAX_PHI_ARG_NUM \
@@ -145,6 +149,12 @@ static bool need_to_rewrite_undefined;
    before phi_convertible_by_degenerating_args.  */
 static bool any_complicated_phi;
 
+/* True if we have bitfield accesses we can lower.  */
+static bool need_to_lower_bitfields;
+
+/* True if there is any ifcvting to be done.  */
+static bool need_to_ifcvt;
+
 /* Hash for struct innermost_loop_behavior.  It depends on the user to
    free the memory.  */
 
@@ -1411,15 +1421,6 @@ if_convertible_loop_p_1 (class loop *loop, vec<data_reference_p> *refs)
 
   calculate_dominance_info (CDI_DOMINATORS);
 
-  /* Allow statements that can be handled during if-conversion.  */
-  ifc_bbs = get_loop_body_in_if_conv_order (loop);
-  if (!ifc_bbs)
-    {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "Irreducible loop\n");
-      return false;
-    }
-
   for (i = 0; i < loop->num_nodes; i++)
     {
       basic_block bb = ifc_bbs[i];
@@ -2898,18 +2899,22 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
   class loop *new_loop;
   gimple *g;
   gimple_stmt_iterator gsi;
-  unsigned int save_length;
+  unsigned int save_length = 0;
 
   g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2,
 				  build_int_cst (integer_type_node, loop->num),
 				  integer_zero_node);
   gimple_call_set_lhs (g, cond);
 
-  /* Save BB->aux around loop_version as that uses the same field.  */
-  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
-  void **saved_preds = XALLOCAVEC (void *, save_length);
-  for (unsigned i = 0; i < save_length; i++)
-    saved_preds[i] = ifc_bbs[i]->aux;
+  void **saved_preds = NULL;
+  if (any_complicated_phi || need_to_predicate)
+    {
+      /* Save BB->aux around loop_version as that uses the same field.  */
+      save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
+      saved_preds = XALLOCAVEC (void *, save_length);
+      for (unsigned i = 0; i < save_length; i++)
+	saved_preds[i] = ifc_bbs[i]->aux;
+    }
 
   initialize_original_copy_tables ();
   /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED
@@ -2921,8 +2926,9 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
 			   profile_probability::always (), true);
   free_original_copy_tables ();
 
-  for (unsigned i = 0; i < save_length; i++)
-    ifc_bbs[i]->aux = saved_preds[i];
+  if (any_complicated_phi || need_to_predicate)
+    for (unsigned i = 0; i < save_length; i++)
+      ifc_bbs[i]->aux = saved_preds[i];
 
   if (new_loop == NULL)
     return NULL;
@@ -2998,7 +3004,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv)
   auto_vec<edge> critical_edges;
 
   /* Loop is not well formed.  */
-  if (num <= 2 || loop->inner || !single_exit (loop))
+  if (loop->inner)
     return false;
 
   body = get_loop_body (loop);
@@ -3259,6 +3265,202 @@ ifcvt_hoist_invariants (class loop *loop, edge pe)
   free (body);
 }
 
+/* Returns the representative (DECL_BIT_FIELD_REPRESENTATIVE) of the bitfield
+   access in STMT iff the representative's type mode is not BLKmode.  If
+   BITPOS is not NULL it will hold the bit offset of the bitfield relative to
+   the representative, and STRUCT_EXPR, if not NULL, will hold the tree
+   representing the base struct of this bitfield.  */
+
+static tree
+get_bitfield_rep (gassign *stmt, bool write, tree *bitpos,
+		  tree *struct_expr)
+{
+  tree comp_ref = write ? gimple_assign_lhs (stmt)
+			: gimple_assign_rhs1 (stmt);
+
+  tree field_decl = TREE_OPERAND (comp_ref, 1);
+  tree rep_decl = DECL_BIT_FIELD_REPRESENTATIVE (field_decl);
+
+  /* Bail out if the representative is BLKmode as we will not be able to
+     vectorize this.  */
+  if (TYPE_MODE (TREE_TYPE (rep_decl)) == E_BLKmode)
+    return NULL_TREE;
+
+  /* Bail out if the DECL_SIZE of the field_decl isn't the same as the BF's
+     precision.  */
+  unsigned HOST_WIDE_INT bf_prec
+    = TYPE_PRECISION (TREE_TYPE (gimple_assign_lhs (stmt)));
+  if (compare_tree_int (DECL_SIZE (field_decl), bf_prec) != 0)
+    return NULL_TREE;
+
+  if (struct_expr)
+    *struct_expr = TREE_OPERAND (comp_ref, 0);
+
+  if (bitpos)
+    *bitpos
+      = fold_build2 (MINUS_EXPR, bitsizetype,
+		     DECL_FIELD_BIT_OFFSET (field_decl),
+		     DECL_FIELD_BIT_OFFSET (rep_decl));
+
+  return rep_decl;
+
+}
+
+/* Lowers the bitfield described by DATA.
+   For a write like:
+
+   struct.bf = _1;
+
+   lower to:
+
+   __ifc_1 = struct.<representative>;
+   __ifc_2 = BIT_INSERT_EXPR (__ifc_1, _1, bitpos);
+   struct.<representative> = __ifc_2;
+
+   For a read:
+
+   _1 = struct.bf;
+
+    lower to:
+
+    __ifc_1 = struct.<representative>;
+    _1 =  BIT_FIELD_REF (__ifc_1, bitsize, bitpos);
+
+    where representative is a legal load that contains the bitfield value,
+    bitsize is the size of the bitfield and bitpos the offset to the start of
+    the bitfield within the representative.  */
+
+static void
+lower_bitfield (gassign *stmt, bool write)
+{
+  tree struct_expr;
+  tree bitpos;
+  tree rep_decl = get_bitfield_rep (stmt, write, &bitpos, &struct_expr);
+  tree rep_type = TREE_TYPE (rep_decl);
+  tree bf_type = TREE_TYPE (gimple_assign_lhs (stmt));
+
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Lowering:\n");
+      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+      fprintf (dump_file, "to:\n");
+    }
+
+  /* REP_COMP_REF is a COMPONENT_REF for the representative.  NEW_VAL is its
+     defining SSA_NAME.  */
+  tree rep_comp_ref = build3 (COMPONENT_REF, rep_type, struct_expr, rep_decl,
+			      NULL_TREE);
+  tree new_val = ifc_temp_var (rep_type, rep_comp_ref, &gsi);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+  if (write)
+    {
+      new_val = ifc_temp_var (rep_type,
+			      build3 (BIT_INSERT_EXPR, rep_type, new_val,
+				      unshare_expr (gimple_assign_rhs1 (stmt)),
+				      bitpos), &gsi);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+      gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref),
+					      new_val);
+      gimple_move_vops (new_stmt, stmt);
+      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+  else
+    {
+      tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val,
+			 build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),
+			 bitpos);
+      new_val = ifc_temp_var (bf_type, bfr, &gsi);
+#if 0
+      redundant_ssa_names.safe_push (std::make_pair (gimple_assign_lhs (stmt),
+						     new_val));
+#else
+    replace_uses_by (gimple_assign_lhs (stmt), new_val);
+#endif
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+    }
+
+  gsi_remove (&gsi, true);
+}
+
+/* Return TRUE if there are bitfields to lower in this LOOP.  Fill TO_LOWER
+   with data structures representing these bitfields.  */
+
+static bool
+bitfields_to_lower_p (class loop *loop,
+		      vec <gassign *> &reads_to_lower,
+		      vec <gassign *> &writes_to_lower)
+{
+  gimple_stmt_iterator gsi;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Analyzing loop %d for bitfields:\n", loop->num);
+    }
+
+  for (unsigned i = 0; i < loop->num_nodes; ++i)
+    {
+      basic_block bb = ifc_bbs[i];
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gassign *stmt = dyn_cast<gassign*> (gsi_stmt (gsi));
+	  if (!stmt)
+	    continue;
+
+	  tree op = gimple_assign_lhs (stmt);
+	  bool write = TREE_CODE (op) == COMPONENT_REF;
+
+	  if (!write)
+	    op = gimple_assign_rhs1 (stmt);
+
+	  if (TREE_CODE (op) != COMPONENT_REF)
+	    continue;
+
+	  if (DECL_BIT_FIELD_TYPE (TREE_OPERAND (op, 1)))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+
+	      if (!INTEGRAL_TYPE_P (TREE_TYPE (op)))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "\t Bitfield NOT OK to lower,"
+					" field type is not Integral.\n");
+		  return false;
+		}
+
+	      if (!get_bitfield_rep (stmt, write, NULL, NULL))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "\t Bitfield NOT OK to lower,"
+					" representative is BLKmode.\n");
+		  return false;
+		}
+
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "\tBitfield OK to lower.\n");
+	      if (write)
+		writes_to_lower.safe_push (stmt);
+	      else
+		reads_to_lower.safe_push (stmt);
+	    }
+	}
+    }
+  return !reads_to_lower.is_empty () || !writes_to_lower.is_empty ();
+}
+
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -3269,12 +3471,16 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   unsigned int todo = 0;
   bool aggressive_if_conv;
   class loop *rloop;
+  auto_vec <gassign *, 4> reads_to_lower;
+  auto_vec <gassign *, 4> writes_to_lower;
   bitmap exit_bbs;
   edge pe;
 
  again:
   rloop = NULL;
   ifc_bbs = NULL;
+  need_to_lower_bitfields = false;
+  need_to_ifcvt = false;
   need_to_predicate = false;
   need_to_rewrite_undefined = false;
   any_complicated_phi = false;
@@ -3290,16 +3496,42 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	aggressive_if_conv = true;
     }
 
-  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv))
+  if (!single_exit (loop))
     goto cleanup;
 
-  if (!if_convertible_loop_p (loop)
-      || !dbg_cnt (if_conversion_tree))
+  /* If there are more than two BBs in the loop then there is at least one if
+     to convert.  */
+  if (loop->num_nodes > 2
+      && !ifcvt_split_critical_edges (loop, aggressive_if_conv))
     goto cleanup;
 
-  if ((need_to_predicate || any_complicated_phi)
-      && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
-	  || loop->dont_vectorize))
+  ifc_bbs = get_loop_body_in_if_conv_order (loop);
+  if (!ifc_bbs)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "Irreducible loop\n");
+      goto cleanup;
+    }
+
+  if (loop->num_nodes > 2)
+    {
+      need_to_ifcvt = true;
+
+      if (!if_convertible_loop_p (loop) || !dbg_cnt (if_conversion_tree))
+	goto cleanup;
+
+      if ((need_to_predicate || any_complicated_phi)
+	  && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
+	      || loop->dont_vectorize))
+	goto cleanup;
+    }
+
+  if ((flag_tree_loop_vectorize || loop->force_vectorize)
+      && !loop->dont_vectorize)
+    need_to_lower_bitfields = bitfields_to_lower_p (loop, reads_to_lower,
+						    writes_to_lower);
+
+  if (!need_to_ifcvt && !need_to_lower_bitfields)
     goto cleanup;
 
   /* The edge to insert invariant stmts on.  */
@@ -3310,7 +3542,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
      Either version this loop, or if the pattern is right for outer-loop
      vectorization, version the outer loop.  In the latter case we will
      still if-convert the original inner loop.  */
-  if (need_to_predicate
+  if (need_to_lower_bitfields
+      || need_to_predicate
       || any_complicated_phi
       || flag_tree_loop_if_convert != 1)
     {
@@ -3350,10 +3583,31 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	pe = single_pred_edge (gimple_bb (preds->last ()));
     }
 
-  /* Now all statements are if-convertible.  Combine all the basic
-     blocks into one huge basic block doing the if-conversion
-     on-the-fly.  */
-  combine_blocks (loop);
+  if (need_to_lower_bitfields)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "-------------------------\n");
+	  fprintf (dump_file, "Start lowering bitfields\n");
+	}
+      while (!reads_to_lower.is_empty ())
+	lower_bitfield (reads_to_lower.pop (), false);
+      while (!writes_to_lower.is_empty ())
+	lower_bitfield (writes_to_lower.pop (), true);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Done lowering bitfields\n");
+	  fprintf (dump_file, "-------------------------\n");
+	}
+    }
+  if (need_to_ifcvt)
+    {
+      /* Now all statements are if-convertible.  Combine all the basic
+	 blocks into one huge basic block doing the if-conversion
+	 on-the-fly.  */
+      combine_blocks (loop);
+    }
 
   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
      and stores are involved.  CSE only the loop body, not the entry
@@ -3393,6 +3647,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   if (rloop != NULL)
     {
       loop = rloop;
+      reads_to_lower.truncate (0);
+      writes_to_lower.truncate (0);
       goto again;
     }
 
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index b279a82551eb70379804d405983ae5dc44b66bf5..e93cdc727da4bb7863b2ad13f29f7d550492adea 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4301,7 +4301,8 @@ vect_find_stmt_data_reference (loop_p loop, gimple *stmt,
       free_data_ref (dr);
       return opt_result::failure_at (stmt,
 				     "not vectorized:"
-				     " statement is bitfield access %G", stmt);
+				     " statement is an unsupported"
+				     " bitfield access %G", stmt);
     }
 
   if (DR_BASE_ADDRESS (dr)
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index dfbfb71b3c69a0205ccc1b287cb50fa02a70942e..9042599f04399eca37fe9038d2bd5c9f78e3a9e4 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -35,6 +35,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
+#include "gimple-fold.h"
+#include "gimplify-me.h"
 #include "cfgloop.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
@@ -663,7 +665,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
    is NULL, the caller must set SSA_NAME_DEF_STMT for the returned SSA var. */
 
 static tree
-vect_recog_temp_ssa_var (tree type, gimple *stmt)
+vect_recog_temp_ssa_var (tree type, gimple *stmt = NULL)
 {
   return make_temp_ssa_name (type, stmt, "patt");
 }
@@ -1828,6 +1830,329 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+/* Function vect_recog_bitfield_ref_pattern
+
+   Try to find the following pattern:
+
+   bf_value = BIT_FIELD_REF (container, bitsize, bitpos);
+   result = (type_out) bf_value;
+
+   where type_out is a non-bitfield type, that is to say, its precision matches
+   2^(TYPE_SIZE(type_out) - (TYPE_UNSIGNED (type_out) ? 1 : 2)).
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern search begins.
+   here it starts with:
+   result = (type_out) bf_value;
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. If the precision of type_out is bigger
+   than the precision type of _1 we perform the widening before the shifting,
+   since the new precision will be large enough to shift the value and moving
+   widening operations up the statement chain enables the generation of
+   widening loads.  If we are widening and the operation after the pattern is
+   an addition then we mask first and shift later, to enable the generation of
+   shifting adds.  In the case of narrowing we will always mask first, shift
+   last and then perform a narrowing operation.  This will enable the
+   generation of narrowing shifts.
+
+   Widening with mask first, shift later:
+   container = (type_out) container;
+   masked = container & (((1 << bitsize) - 1) << bitpos);
+   result = masked >> bitpos;
+
+   Widening with shift first, mask last:
+   container = (type_out) container;
+   shifted = container >> bitpos;
+   result = shifted & ((1 << bitsize) - 1);
+
+   Narrowing:
+   masked = container & (((1 << bitsize) - 1) << bitpos);
+   result = masked >> bitpos;
+   result = (type_out) result;
+
+   The shifting is always optional depending on whether bitpos != 0.
+
+*/
+
+static gimple *
+vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+				 tree *type_out)
+{
+  gassign *first_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+
+  if (!first_stmt)
+    return NULL;
+
+  gassign *bf_stmt;
+  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (first_stmt))
+      && TREE_CODE (gimple_assign_rhs1 (first_stmt)) == SSA_NAME)
+    {
+      gimple *second_stmt
+	= SSA_NAME_DEF_STMT (gimple_assign_rhs1 (first_stmt));
+      bf_stmt = dyn_cast <gassign *> (second_stmt);
+      if (!bf_stmt
+	  || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
+	return NULL;
+    }
+  else
+    return NULL;
+
+  tree bf_ref = gimple_assign_rhs1 (bf_stmt);
+  tree container = TREE_OPERAND (bf_ref, 0);
+
+  if (!bit_field_offset (bf_ref).is_constant ()
+      || !bit_field_size (bf_ref).is_constant ()
+      || !tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (container))))
+    return NULL;
+
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (bf_ref)))
+    return NULL;
+
+  gimple *use_stmt, *pattern_stmt;
+  use_operand_p use_p;
+  tree ret = gimple_assign_lhs (first_stmt);
+  tree ret_type = TREE_TYPE (ret);
+  bool shift_first = true;
+  tree vectype;
+
+  /* If the first operand of the BIT_FIELD_REF is not an INTEGER type, convert
+     it to one of the same width so we can perform the necessary masking and
+     shifting.  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (container)))
+    {
+      unsigned HOST_WIDE_INT container_size =
+	tree_to_uhwi (TYPE_SIZE (TREE_TYPE (container)));
+      tree int_type = build_nonstandard_integer_type (container_size, true);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (int_type),
+			       VIEW_CONVERT_EXPR, container);
+      vectype = get_vectype_for_scalar_type (vinfo, int_type);
+      container = gimple_assign_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+    }
+  else
+    vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (container));
+
+  /* We move the conversion earlier if the loaded type is smaller than the
+     return type to enable the use of widening loads.  */
+  if (TYPE_PRECISION (TREE_TYPE (container)) < TYPE_PRECISION (ret_type)
+      && !useless_type_conversion_p (TREE_TYPE (container), ret_type))
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type),
+			       NOP_EXPR, container);
+      container = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+  else if (!useless_type_conversion_p (TREE_TYPE (container), ret_type))
+    /* If we are doing the conversion last then also delay the shift as we may
+       be able to combine the shift and conversion in certain cases.  */
+    shift_first = false;
+
+  tree container_type = TREE_TYPE (container);
+
+  /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
+     PLUS_EXPR then do the shift last as some targets can combine the shift and
+     add into a single instruction.  */
+  if (single_imm_use (gimple_assign_lhs (first_stmt), &use_p, &use_stmt))
+    {
+      if (gimple_code (use_stmt) == GIMPLE_ASSIGN
+	  && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)
+	shift_first = false;
+    }
+
+  unsigned HOST_WIDE_INT shift_n = bit_field_offset (bf_ref).to_constant ();
+  unsigned HOST_WIDE_INT mask_width = bit_field_size (bf_ref).to_constant ();
+  unsigned HOST_WIDE_INT prec = tree_to_uhwi (TYPE_SIZE (container_type));
+  if (BYTES_BIG_ENDIAN)
+    shift_n = prec - shift_n - mask_width;
+
+  /* If we don't have to shift we only generate the mask, so just fix the
+     code-path to shift_first.  */
+  if (shift_n == 0)
+    shift_first = true;
+
+  tree result;
+  if (shift_first)
+    {
+      tree shifted = container;
+      if (shift_n)
+	{
+	  pattern_stmt
+	    = gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+				   RSHIFT_EXPR, container,
+				   build_int_cst (sizetype, shift_n));
+	  shifted = gimple_assign_lhs (pattern_stmt);
+	  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+	}
+
+      tree mask = wide_int_to_tree (container_type,
+				    wi::mask (mask_width, false, prec));
+
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       BIT_AND_EXPR, shifted, mask);
+      result = gimple_assign_lhs (pattern_stmt);
+    }
+  else
+    {
+      tree mask = wide_int_to_tree (container_type,
+				    wi::shifted_mask (shift_n, mask_width,
+						      false, prec));
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       BIT_AND_EXPR, container, mask);
+      tree masked = gimple_assign_lhs (pattern_stmt);
+
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       RSHIFT_EXPR, masked,
+			       build_int_cst (sizetype, shift_n));
+      result = gimple_assign_lhs (pattern_stmt);
+    }
+
+  if (!useless_type_conversion_p (TREE_TYPE (result), ret_type))
+    {
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type),
+			       NOP_EXPR, result);
+    }
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bitfield_ref pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+/* Function vect_recog_bit_insert_pattern
+
+   Try to find the following pattern:
+
+   written = BIT_INSERT_EXPR (container, value, bitpos);
+
+   Input:
+
+   * STMT_VINFO: The stmt we want to replace.
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. In this case it will be:
+   value = (container_type) value;	    // Make sure value has the container's type
+   shifted = value << bitpos;		    // Shift value into place
+   masked = shifted & (mask << bitpos);	    // Mask off the non-relevant bits in
+					    // the 'to-write value'.
+   cleared = container & ~(mask << bitpos); // Clear from the container the
+					    // bits we are going to overwrite
+					    // with the new value.
+   written = cleared | masked;		    // Write bits.
+
+
+   where mask = ((1 << TYPE_PRECISION (value)) - 1), a mask to keep the number of
+   bits corresponding to the real size of the bitfield value we are writing to.
+   The shifting is always optional depending on whether bitpos != 0.
+
+*/
+
+static gimple *
+vect_recog_bit_insert_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+			       tree *type_out)
+{
+  gassign *bf_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_INSERT_EXPR)
+    return NULL;
+
+  tree container = gimple_assign_rhs1 (bf_stmt);
+  tree value = gimple_assign_rhs2 (bf_stmt);
+  tree shift = gimple_assign_rhs3 (bf_stmt);
+
+  tree bf_type = TREE_TYPE (value);
+  tree container_type = TREE_TYPE (container);
+
+  if (!INTEGRAL_TYPE_P (container_type)
+      || !tree_fits_uhwi_p (TYPE_SIZE (container_type)))
+    return NULL;
+
+  gimple *pattern_stmt;
+
+  vect_unpromoted_value unprom;
+  unprom.set_op (value, vect_internal_def);
+  value = vect_convert_input (vinfo, stmt_info, container_type, &unprom,
+			      get_vectype_for_scalar_type (vinfo,
+							   container_type));
+
+  unsigned HOST_WIDE_INT mask_width = TYPE_PRECISION (bf_type);
+  unsigned HOST_WIDE_INT prec = tree_to_uhwi (TYPE_SIZE (container_type));
+  unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (shift);
+  if (BYTES_BIG_ENDIAN)
+    {
+      shift_n = prec - shift_n - mask_width;
+      shift = build_int_cst (TREE_TYPE (shift), shift_n);
+    }
+
+  if (!useless_type_conversion_p (TREE_TYPE (value), container_type))
+    {
+      pattern_stmt =
+	gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			     NOP_EXPR, value);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      value = gimple_get_lhs (pattern_stmt);
+    }
+
+  /* Shift VALUE into place.  */
+  tree shifted = value;
+  if (shift_n)
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       LSHIFT_EXPR, value, shift);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      shifted = gimple_get_lhs (pattern_stmt);
+    }
+
+  tree mask_t
+    = wide_int_to_tree (container_type,
+			wi::shifted_mask (shift_n, mask_width, false, prec));
+
+  /* Clear bits we don't want to write back from SHIFTED.  */
+  gimple_seq stmts = NULL;
+  tree masked = gimple_build (&stmts, BIT_AND_EXPR, container_type, shifted,
+			      mask_t);
+  if (!gimple_seq_empty_p (stmts))
+    {
+      pattern_stmt = gimple_seq_first_stmt (stmts);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+
+  /* Mask off the bits in the container that we are to write to.  */
+  mask_t = wide_int_to_tree (container_type,
+			     wi::shifted_mask (shift_n, mask_width, true, prec));
+  tree cleared = vect_recog_temp_ssa_var (container_type);
+  pattern_stmt = gimple_build_assign (cleared, BIT_AND_EXPR, container, mask_t);
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+
+  /* Write MASKED into CLEARED.  */
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			   BIT_IOR_EXPR, cleared, masked);
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bit_insert pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+
 /* Recognize cases in which an operation is performed in one type WTYPE
    but could be done more efficiently in a narrower type NTYPE.  For example,
    if we have:
@@ -5623,6 +5948,8 @@ struct vect_recog_func
    taken which means usually the more complex one needs to preceed the
    less comples onex (widen_sum only after dot_prod or sad for example).  */
 static vect_recog_func vect_vect_recog_func_ptrs[] = {
+  { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
+  { vect_recog_bit_insert_pattern, "bit_insert" },
   { vect_recog_over_widening_pattern, "over_widening" },
   /* Must come after over_widening, which narrows the shift as much as
      possible beforehand.  */
Richard Biener Sept. 27, 2022, 12:34 p.m. UTC | #8
On Mon, 26 Sep 2022, Andre Vieira (lists) wrote:

> 
> On 08/09/2022 12:51, Richard Biener wrote:
> >
> > I'm curious, why the push to redundant_ssa_names?  That could use
> > a comment ...
> So I purposefully left a #if 0 #else #endif in there so you can see the two
> options. But the reason I used redundant_ssa_names is because ifcvt seems to
> use that as a container for all pairs of (old, new) ssa names to replace
> later. So I just piggy backed on that. I don't know if there's a specific
> reason they do the replacement at the end? Maybe some ordering issue? Either
> way both adding it to redundant_ssa_names or doing the replacement inline work
> for the bitfield lowering (or work in my testing at least).

Possibly because we (in the past?) inserted/copied stuff based on
predicates generated at analysis time after we decide to elide something
so we need to watch for later appearing uses.  But who knows ... my mind
fails me here.

If it works to replace uses immediately please do so.  But now
I wonder why we need this - the value shouldn't change so you
should get away with re-using the existing SSA name for the final value?

> > Note I fear we will have endianess issues when translating
> > bit-field accesses to BIT_FIELD_REF/INSERT and then to shifts.  Rules
> > for memory and register operations do not match up (IIRC, I repeatedly
> > run into issues here myself).  The testcases all look like they
> > won't catch this - I think an example would be sth like
> > struct X { unsigned a : 23; unsigned b : 9; }, can you see to do
> > testing on a big-endian target?
> I've done some testing and you were right, it did fall apart on big-endian. I
> fixed it by changing the way we compute the 'shift' value and added two extra
> testcases for read and write each.
> >
> > Sorry for the delay in reviewing.
> No worries, apologies myself for the delay in reworking this, had a nice
> little week holiday in between :)
> 
> I'll write the ChangeLogs once the patch has stabilized.

Thanks,
Richard.
Andre Vieira (lists) Sept. 28, 2022, 9:43 a.m. UTC | #9
On 27/09/2022 13:34, Richard Biener wrote:
> On Mon, 26 Sep 2022, Andre Vieira (lists) wrote:
>
>> On 08/09/2022 12:51, Richard Biener wrote:
>>> I'm curious, why the push to redundant_ssa_names?  That could use
>>> a comment ...
>> So I purposefully left a #if 0 #else #endif in there so you can see the two
>> options. But the reason I used redundant_ssa_names is because ifcvt seems to
>> use that as a container for all pairs of (old, new) ssa names to replace
>> later. So I just piggy backed on that. I don't know if there's a specific
>> reason they do the replacement at the end? Maybe some ordering issue? Either
>> way both adding it to redundant_ssa_names or doing the replacement inline work
>> for the bitfield lowering (or work in my testing at least).
> Possibly because we (in the past?) inserted/copied stuff based on
> predicates generated at analysis time after we decide to elide something
> so we need to watch for later appearing uses.  But who knows ... my mind
> fails me here.
>
> If it works to replace uses immediately please do so.  But now
> I wonder why we need this - the value shouldn't change so you
> should get away with re-using the existing SSA name for the final value?

Yeah... good point. A quick change and minor testing seems to agree. I'm sure I had a good reason to do it initially ;)

I'll run a full-regression on this change to make sure I didn't miss anything.
Andre Vieira (lists) Sept. 28, 2022, 5:31 p.m. UTC | #10
Made the change and also created the ChangeLogs.

gcc/ChangeLog:

         * tree-if-conv.cc (if_convertible_loop_p_1): Move ordering of 
loop bb's from here...
         (tree_if_conversion): ... to here.  Also call bitfield lowering 
when appropriate.
         (version_loop_for_if_conversion): Adapt to enable loop 
versioning when we only need
         to lower bitfields.
         (ifcvt_split_critical_edges): Relax condition of expected loop 
form as this is checked earlier.
         (get_bitfield_rep): New function.
         (lower_bitfield): Likewise.
         (bitfields_to_lower_p): Likewise.
         (need_to_lower_bitfields): New global boolean.
         (need_to_ifcvt): Likewise.
         * tree-vect-data-refs.cc (vect_find_stmt_data_reference): 
Improve diagnostic message.
         * tree-vect-patterns.cc (vect_recog_temp_ssa_var): Add default 
value for last parameter.
         (vect_recog_bitfield_ref_pattern): New.
         (vect_recog_bit_insert_pattern): New.

gcc/testsuite/ChangeLog:

         * gcc.dg/vect/vect-bitfield-read-1.c: New test.
         * gcc.dg/vect/vect-bitfield-read-2.c: New test.
         * gcc.dg/vect/vect-bitfield-read-3.c: New test.
         * gcc.dg/vect/vect-bitfield-read-4.c: New test.
         * gcc.dg/vect/vect-bitfield-read-5.c: New test.
         * gcc.dg/vect/vect-bitfield-read-6.c: New test.
         * gcc.dg/vect/vect-bitfield-write-1.c: New test.
         * gcc.dg/vect/vect-bitfield-write-2.c: New test.
         * gcc.dg/vect/vect-bitfield-write-3.c: New test.
         * gcc.dg/vect/vect-bitfield-write-4.c: New test.
         * gcc.dg/vect/vect-bitfield-write-5.c: New test.

On 28/09/2022 10:43, Andre Vieira (lists) via Gcc-patches wrote:
>
> On 27/09/2022 13:34, Richard Biener wrote:
>> On Mon, 26 Sep 2022, Andre Vieira (lists) wrote:
>>
>>> On 08/09/2022 12:51, Richard Biener wrote:
>>>> I'm curious, why the push to redundant_ssa_names?  That could use
>>>> a comment ...
>>> So I purposefully left a #if 0 #else #endif in there so you can see 
>>> the two
>>> options. But the reason I used redundant_ssa_names is because ifcvt 
>>> seems to
>>> use that as a container for all pairs of (old, new) ssa names to 
>>> replace
>>> later. So I just piggy backed on that. I don't know if there's a 
>>> specific
>>> reason they do the replacement at the end? Maybe some ordering 
>>> issue? Either
>>> way both adding it to redundant_ssa_names or doing the replacement 
>>> inline work
>>> for the bitfield lowering (or work in my testing at least).
>> Possibly because we (in the past?) inserted/copied stuff based on
>> predicates generated at analysis time after we decide to elide something
>> so we need to watch for later appearing uses.  But who knows ... my mind
>> fails me here.
>>
>> If it works to replace uses immediately please do so.  But now
>> I wonder why we need this - the value shouldn't change so you
>> should get away with re-using the existing SSA name for the final value?
>
> Yeah... good point. A quick change and minor testing seems to agree. 
> I'm sure I had a good reason to do it initially ;)
>
> I'll run a full-regression on this change to make sure I didn't miss 
> anything.
>
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..01cf34fb44484ca926ca5de99eef76dd99b69e92
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define ELT0 {0}
+#define ELT1 {1}
+#define ELT2 {2}
+#define ELT3 {3}
+#define N 32
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].i;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..1a4a1579c1478b9407ad21b19e8fbdca9f674b42
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 0}
+#define ELT1 {0x7FFFFFFFUL, 1}
+#define ELT2 {0x7FFFFFFFUL, 2}
+#define ELT3 {0x7FFFFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..216611a29fd8bbfbafdbdb79d790e520f44ba672
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+#include <stdbool.h>
+
+extern void abort(void);
+
+typedef struct {
+    int  c;
+    int  b;
+    bool a : 1;
+} struct_t;
+
+#define N 16
+#define ELT_F { 0xFFFFFFFF, 0xFFFFFFFF, 0 }
+#define ELT_T { 0xFFFFFFFF, 0xFFFFFFFF, 1 }
+
+struct_t vect_false[N] = { ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+struct_t vect_true[N]  = { ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+int main (void)
+{
+  unsigned ret = 0;
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_false[i].a;
+  }
+  if (ret)
+    abort ();
+
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_true[i].a;
+  }
+  if (!ret)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..5bc9c412e9616aefcbf49a4518f1603380a54b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 3, 0}
+#define ELT1 {0x7FFFFFFFUL, 3, 1}
+#define ELT2 {0x7FFFFFFFUL, 3, 2}
+#define ELT3 {0x7FFFFFFFUL, 3, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c
new file mode 100644
index 0000000000000000000000000000000000000000..1dc24d3eded192144dc9ad94589b4c5c3d999e65
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned a : 23; unsigned b : 9;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFUL, 0}
+#define ELT1 {0x7FFFFFUL, 1}
+#define ELT2 {0x7FFFFFUL, 2}
+#define ELT3 {0x7FFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].b;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c
new file mode 100644
index 0000000000000000000000000000000000000000..7d24c29975865883a7cdc7aa057fbb6bf413e0bc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned a : 23; unsigned b : 8;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFUL, 0}
+#define ELT1 {0x7FFFFFUL, 1}
+#define ELT2 {0x7FFFFFUL, 2}
+#define ELT3 {0x7FFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].b;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..19683d277b1ade1034496136f1d03bb2b446900f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].i = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].i != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..d550dd35ab75eb67f6e53f89fbf55b7315e50bc9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..3303d2610ff972d986be172962c129634ee64254
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..fae6ea3557dcaba7b330ebdaa471281d33d2ba15
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned b : 23;
+    unsigned a : 9;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c
new file mode 100644
index 0000000000000000000000000000000000000000..99360c2967b076212c67eb4f34b8fd91711d8821
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned b : 23;
+    unsigned a : 8;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 1c8e1a45234b8c3565edaacd55abbee23d8ea240..d13b2fa6661d56e911bb9ec37cd3a9885fa653bb 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -91,6 +91,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "expmed.h"
+#include "expr.h"
 #include "optabs-query.h"
 #include "gimple-pretty-print.h"
 #include "alias.h"
@@ -123,6 +124,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
 
+/* For lang_hooks.types.type_for_mode.  */
+#include "langhooks.h"
+
 /* Only handle PHIs with no more arguments unless we are asked to by
    simd pragma.  */
 #define MAX_PHI_ARG_NUM \
@@ -145,6 +149,12 @@ static bool need_to_rewrite_undefined;
    before phi_convertible_by_degenerating_args.  */
 static bool any_complicated_phi;
 
+/* True if we have bitfield accesses we can lower.  */
+static bool need_to_lower_bitfields;
+
+/* True if there is any ifcvting to be done.  */
+static bool need_to_ifcvt;
+
 /* Hash for struct innermost_loop_behavior.  It depends on the user to
    free the memory.  */
 
@@ -1411,15 +1421,6 @@ if_convertible_loop_p_1 (class loop *loop, vec<data_reference_p> *refs)
 
   calculate_dominance_info (CDI_DOMINATORS);
 
-  /* Allow statements that can be handled during if-conversion.  */
-  ifc_bbs = get_loop_body_in_if_conv_order (loop);
-  if (!ifc_bbs)
-    {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "Irreducible loop\n");
-      return false;
-    }
-
   for (i = 0; i < loop->num_nodes; i++)
     {
       basic_block bb = ifc_bbs[i];
@@ -2898,18 +2899,22 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
   class loop *new_loop;
   gimple *g;
   gimple_stmt_iterator gsi;
-  unsigned int save_length;
+  unsigned int save_length = 0;
 
   g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2,
 				  build_int_cst (integer_type_node, loop->num),
 				  integer_zero_node);
   gimple_call_set_lhs (g, cond);
 
-  /* Save BB->aux around loop_version as that uses the same field.  */
-  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
-  void **saved_preds = XALLOCAVEC (void *, save_length);
-  for (unsigned i = 0; i < save_length; i++)
-    saved_preds[i] = ifc_bbs[i]->aux;
+  void **saved_preds = NULL;
+  if (any_complicated_phi || need_to_predicate)
+    {
+      /* Save BB->aux around loop_version as that uses the same field.  */
+      save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
+      saved_preds = XALLOCAVEC (void *, save_length);
+      for (unsigned i = 0; i < save_length; i++)
+	saved_preds[i] = ifc_bbs[i]->aux;
+    }
 
   initialize_original_copy_tables ();
   /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED
@@ -2921,8 +2926,9 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
 			   profile_probability::always (), true);
   free_original_copy_tables ();
 
-  for (unsigned i = 0; i < save_length; i++)
-    ifc_bbs[i]->aux = saved_preds[i];
+  if (any_complicated_phi || need_to_predicate)
+    for (unsigned i = 0; i < save_length; i++)
+      ifc_bbs[i]->aux = saved_preds[i];
 
   if (new_loop == NULL)
     return NULL;
@@ -2998,7 +3004,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv)
   auto_vec<edge> critical_edges;
 
   /* Loop is not well formed.  */
-  if (num <= 2 || loop->inner || !single_exit (loop))
+  if (loop->inner)
     return false;
 
   body = get_loop_body (loop);
@@ -3259,6 +3265,201 @@ ifcvt_hoist_invariants (class loop *loop, edge pe)
   free (body);
 }
 
+/* Returns the representative decl (DECL_BIT_FIELD_REPRESENTATIVE) of the
+   bitfield access in STMT iff its type mode is not BLKmode.  If BITPOS is
+   not NULL it will hold the offset, in bits, of the bitfield within the
+   representative and STRUCT_EXPR, if not NULL, will hold the tree
+   representing the base struct of this bitfield.  */
+
+static tree
+get_bitfield_rep (gassign *stmt, bool write, tree *bitpos,
+		  tree *struct_expr)
+{
+  tree comp_ref = write ? gimple_assign_lhs (stmt)
+			: gimple_assign_rhs1 (stmt);
+
+  tree field_decl = TREE_OPERAND (comp_ref, 1);
+  tree rep_decl = DECL_BIT_FIELD_REPRESENTATIVE (field_decl);
+
+  /* Bail out if the representative is BLKmode as we will not be able to
+     vectorize this.  */
+  if (TYPE_MODE (TREE_TYPE (rep_decl)) == E_BLKmode)
+    return NULL_TREE;
+
+  /* Bail out if the DECL_SIZE of the field_decl isn't the same as the BF's
+     precision.  */
+  unsigned HOST_WIDE_INT bf_prec
+    = TYPE_PRECISION (TREE_TYPE (gimple_assign_lhs (stmt)));
+  if (compare_tree_int (DECL_SIZE (field_decl), bf_prec) != 0)
+    return NULL_TREE;
+
+  if (struct_expr)
+    *struct_expr = TREE_OPERAND (comp_ref, 0);
+
+  if (bitpos)
+    *bitpos
+      = fold_build2 (MINUS_EXPR, bitsizetype,
+		     DECL_FIELD_BIT_OFFSET (field_decl),
+		     DECL_FIELD_BIT_OFFSET (rep_decl));
+
+  return rep_decl;
+
+}
+
+/* Lowers the bitfield access described by STMT.
+   For a write like:
+
+   struct.bf = _1;
+
+   lower to:
+
+   __ifc_1 = struct.<representative>;
+   __ifc_2 = BIT_INSERT_EXPR (__ifc_1, _1, bitpos);
+   struct.<representative> = __ifc_2;
+
+   For a read:
+
+   _1 = struct.bf;
+
+    lower to:
+
+    __ifc_1 = struct.<representative>;
+    _1 =  BIT_FIELD_REF (__ifc_1, bitsize, bitpos);
+
+    where representative is a legal load that contains the bitfield value,
+    bitsize is the size of the bitfield and bitpos the offset to the start of
+    the bitfield within the representative.  */
+
+static void
+lower_bitfield (gassign *stmt, bool write)
+{
+  tree struct_expr;
+  tree bitpos;
+  tree rep_decl = get_bitfield_rep (stmt, write, &bitpos, &struct_expr);
+  tree rep_type = TREE_TYPE (rep_decl);
+  tree bf_type = TREE_TYPE (gimple_assign_lhs (stmt));
+
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Lowering:\n");
+      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+      fprintf (dump_file, "to:\n");
+    }
+
+  /* REP_COMP_REF is a COMPONENT_REF for the representative.  NEW_VAL is its
+     defining SSA_NAME.  */
+  tree rep_comp_ref = build3 (COMPONENT_REF, rep_type, struct_expr, rep_decl,
+			      NULL_TREE);
+  tree new_val = ifc_temp_var (rep_type, rep_comp_ref, &gsi);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+  if (write)
+    {
+      new_val = ifc_temp_var (rep_type,
+			      build3 (BIT_INSERT_EXPR, rep_type, new_val,
+				      unshare_expr (gimple_assign_rhs1 (stmt)),
+				      bitpos), &gsi);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+      gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref),
+					      new_val);
+      gimple_move_vops (new_stmt, stmt);
+      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+  else
+    {
+      tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val,
+			 build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),
+			 bitpos);
+      new_val = ifc_temp_var (bf_type, bfr, &gsi);
+
+      gimple *new_stmt = gimple_build_assign (gimple_assign_lhs (stmt),
+					      new_val);
+      gimple_move_vops (new_stmt, stmt);
+      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+
+  gsi_remove (&gsi, true);
+}
+
+/* Return TRUE if there are bitfields to lower in this LOOP.  Fill
+   READS_TO_LOWER and WRITES_TO_LOWER with the bitfield accesses to lower.  */
+
+static bool
+bitfields_to_lower_p (class loop *loop,
+		      vec <gassign *> &reads_to_lower,
+		      vec <gassign *> &writes_to_lower)
+{
+  gimple_stmt_iterator gsi;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Analyzing loop %d for bitfields:\n", loop->num);
+    }
+
+  for (unsigned i = 0; i < loop->num_nodes; ++i)
+    {
+      basic_block bb = ifc_bbs[i];
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gassign *stmt = dyn_cast<gassign*> (gsi_stmt (gsi));
+	  if (!stmt)
+	    continue;
+
+	  tree op = gimple_assign_lhs (stmt);
+	  bool write = TREE_CODE (op) == COMPONENT_REF;
+
+	  if (!write)
+	    op = gimple_assign_rhs1 (stmt);
+
+	  if (TREE_CODE (op) != COMPONENT_REF)
+	    continue;
+
+	  if (DECL_BIT_FIELD_TYPE (TREE_OPERAND (op, 1)))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+
+	      if (!INTEGRAL_TYPE_P (TREE_TYPE (op)))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "\t Bitfield NOT OK to lower,"
+					" field type is not Integral.\n");
+		  return false;
+		}
+
+	      if (!get_bitfield_rep (stmt, write, NULL, NULL))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "\t Bitfield NOT OK to lower,"
+					" representative is BLKmode.\n");
+		  return false;
+		}
+
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "\tBitfield OK to lower.\n");
+	      if (write)
+		writes_to_lower.safe_push (stmt);
+	      else
+		reads_to_lower.safe_push (stmt);
+	    }
+	}
+    }
+  return !reads_to_lower.is_empty () || !writes_to_lower.is_empty ();
+}
+
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -3269,12 +3470,16 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   unsigned int todo = 0;
   bool aggressive_if_conv;
   class loop *rloop;
+  auto_vec <gassign *, 4> reads_to_lower;
+  auto_vec <gassign *, 4> writes_to_lower;
   bitmap exit_bbs;
   edge pe;
 
  again:
   rloop = NULL;
   ifc_bbs = NULL;
+  need_to_lower_bitfields = false;
+  need_to_ifcvt = false;
   need_to_predicate = false;
   need_to_rewrite_undefined = false;
   any_complicated_phi = false;
@@ -3290,16 +3495,42 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	aggressive_if_conv = true;
     }
 
-  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv))
+  if (!single_exit (loop))
     goto cleanup;
 
-  if (!if_convertible_loop_p (loop)
-      || !dbg_cnt (if_conversion_tree))
+  /* If there are more than two BBs in the loop then there is at least one if
+     to convert.  */
+  if (loop->num_nodes > 2
+      && !ifcvt_split_critical_edges (loop, aggressive_if_conv))
     goto cleanup;
 
-  if ((need_to_predicate || any_complicated_phi)
-      && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
-	  || loop->dont_vectorize))
+  ifc_bbs = get_loop_body_in_if_conv_order (loop);
+  if (!ifc_bbs)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "Irreducible loop\n");
+      goto cleanup;
+    }
+
+  if (loop->num_nodes > 2)
+    {
+      need_to_ifcvt = true;
+
+      if (!if_convertible_loop_p (loop) || !dbg_cnt (if_conversion_tree))
+	goto cleanup;
+
+      if ((need_to_predicate || any_complicated_phi)
+	  && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
+	      || loop->dont_vectorize))
+	goto cleanup;
+    }
+
+  if ((flag_tree_loop_vectorize || loop->force_vectorize)
+      && !loop->dont_vectorize)
+    need_to_lower_bitfields = bitfields_to_lower_p (loop, reads_to_lower,
+						    writes_to_lower);
+
+  if (!need_to_ifcvt && !need_to_lower_bitfields)
     goto cleanup;
 
   /* The edge to insert invariant stmts on.  */
@@ -3310,7 +3541,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
      Either version this loop, or if the pattern is right for outer-loop
      vectorization, version the outer loop.  In the latter case we will
      still if-convert the original inner loop.  */
-  if (need_to_predicate
+  if (need_to_lower_bitfields
+      || need_to_predicate
       || any_complicated_phi
       || flag_tree_loop_if_convert != 1)
     {
@@ -3350,10 +3582,31 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	pe = single_pred_edge (gimple_bb (preds->last ()));
     }
 
-  /* Now all statements are if-convertible.  Combine all the basic
-     blocks into one huge basic block doing the if-conversion
-     on-the-fly.  */
-  combine_blocks (loop);
+  if (need_to_lower_bitfields)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "-------------------------\n");
+	  fprintf (dump_file, "Start lowering bitfields\n");
+	}
+      while (!reads_to_lower.is_empty ())
+	lower_bitfield (reads_to_lower.pop (), false);
+      while (!writes_to_lower.is_empty ())
+	lower_bitfield (writes_to_lower.pop (), true);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Done lowering bitfields\n");
+	  fprintf (dump_file, "-------------------------\n");
+	}
+    }
+  if (need_to_ifcvt)
+    {
+      /* Now all statements are if-convertible.  Combine all the basic
+	 blocks into one huge basic block doing the if-conversion
+	 on-the-fly.  */
+      combine_blocks (loop);
+    }
 
   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
      and stores are involved.  CSE only the loop body, not the entry
@@ -3393,6 +3646,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   if (rloop != NULL)
     {
       loop = rloop;
+      reads_to_lower.truncate (0);
+      writes_to_lower.truncate (0);
       goto again;
     }
 
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index b279a82551eb70379804d405983ae5dc44b66bf5..e93cdc727da4bb7863b2ad13f29f7d550492adea 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4301,7 +4301,8 @@ vect_find_stmt_data_reference (loop_p loop, gimple *stmt,
       free_data_ref (dr);
       return opt_result::failure_at (stmt,
 				     "not vectorized:"
-				     " statement is bitfield access %G", stmt);
+				     " statement is an unsupported"
+				     " bitfield access %G", stmt);
     }
 
   if (DR_BASE_ADDRESS (dr)
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index dfbfb71b3c69a0205ccc1b287cb50fa02a70942e..9042599f04399eca37fe9038d2bd5c9f78e3a9e4 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -35,6 +35,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
+#include "gimple-fold.h"
+#include "gimplify-me.h"
 #include "cfgloop.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
@@ -663,7 +665,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
    is NULL, the caller must set SSA_NAME_DEF_STMT for the returned SSA var. */
 
 static tree
-vect_recog_temp_ssa_var (tree type, gimple *stmt)
+vect_recog_temp_ssa_var (tree type, gimple *stmt = NULL)
 {
   return make_temp_ssa_name (type, stmt, "patt");
 }
@@ -1828,6 +1830,329 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+/* Function vect_recog_bitfield_ref_pattern
+
+   Try to find the following pattern:
+
+   bf_value = BIT_FIELD_REF (container, bitsize, bitpos);
+   result = (type_out) bf_value;
+
+   where type_out is a non-bitfield type, that is to say, its precision matches
+   2^(TYPE_SIZE(type_out) - (TYPE_UNSIGNED (type_out) ? 1 : 2)).
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern search begins.
+   here it starts with:
+   result = (type_out) bf_value;
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. If the precision of type_out is bigger
+   than the precision type of _1 we perform the widening before the shifting,
+   since the new precision will be large enough to shift the value and moving
+   widening operations up the statement chain enables the generation of
+   widening loads.  If we are widening and the operation after the pattern is
+   an addition then we mask first and shift later, to enable the generation of
+   shifting adds.  In the case of narrowing we will always mask first, shift
+   last and then perform a narrowing operation.  This will enable the
+   generation of narrowing shifts.
+
+   Widening with mask first, shift later:
+   container = (type_out) container;
+   masked = container & (((1 << bitsize) - 1) << bitpos);
+   result = masked >> bitpos;
+
+   Widening with shift first, mask last:
+   container = (type_out) container;
+   shifted = container >> bitpos;
+   result = shifted & ((1 << bitsize) - 1);
+
+   Narrowing:
+   masked = container & (((1 << bitsize) - 1) << bitpos);
+   result = masked >> bitpos;
+   result = (type_out) result;
+
+   The shifting is always optional depending on whether bitpos != 0.
+
+*/
+
+static gimple *
+vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+				 tree *type_out)
+{
+  gassign *first_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+
+  if (!first_stmt)
+    return NULL;
+
+  gassign *bf_stmt;
+  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (first_stmt))
+      && TREE_CODE (gimple_assign_rhs1 (first_stmt)) == SSA_NAME)
+    {
+      gimple *second_stmt
+	= SSA_NAME_DEF_STMT (gimple_assign_rhs1 (first_stmt));
+      bf_stmt = dyn_cast <gassign *> (second_stmt);
+      if (!bf_stmt
+	  || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
+	return NULL;
+    }
+  else
+    return NULL;
+
+  tree bf_ref = gimple_assign_rhs1 (bf_stmt);
+  tree container = TREE_OPERAND (bf_ref, 0);
+
+  if (!bit_field_offset (bf_ref).is_constant ()
+      || !bit_field_size (bf_ref).is_constant ()
+      || !tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (container))))
+    return NULL;
+
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (bf_ref)))
+    return NULL;
+
+  gimple *use_stmt, *pattern_stmt;
+  use_operand_p use_p;
+  tree ret = gimple_assign_lhs (first_stmt);
+  tree ret_type = TREE_TYPE (ret);
+  bool shift_first = true;
+  tree vectype;
+
+  /* If the first operand of the BIT_FIELD_REF is not an INTEGER type, convert
+     it to one of the same width so we can perform the necessary masking and
+     shifting.  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (container)))
+    {
+      unsigned HOST_WIDE_INT container_size =
+	tree_to_uhwi (TYPE_SIZE (TREE_TYPE (container)));
+      tree int_type = build_nonstandard_integer_type (container_size, true);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (int_type),
+			       VIEW_CONVERT_EXPR, container);
+      vectype = get_vectype_for_scalar_type (vinfo, int_type);
+      container = gimple_assign_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+    }
+  else
+    vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (container));
+
+  /* We move the conversion earlier if the loaded type is smaller than the
+     return type to enable the use of widening loads.  */
+  if (TYPE_PRECISION (TREE_TYPE (container)) < TYPE_PRECISION (ret_type)
+      && !useless_type_conversion_p (TREE_TYPE (container), ret_type))
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type),
+			       NOP_EXPR, container);
+      container = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+  else if (!useless_type_conversion_p (TREE_TYPE (container), ret_type))
+    /* If we are doing the conversion last then also delay the shift as we may
+       be able to combine the shift and conversion in certain cases.  */
+    shift_first = false;
+
+  tree container_type = TREE_TYPE (container);
+
+  /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
+     PLUS_EXPR then do the shift last as some targets can combine the shift and
+     add into a single instruction.  */
+  if (single_imm_use (gimple_assign_lhs (first_stmt), &use_p, &use_stmt))
+    {
+      if (gimple_code (use_stmt) == GIMPLE_ASSIGN
+	  && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)
+	shift_first = false;
+    }
+
+  unsigned HOST_WIDE_INT shift_n = bit_field_offset (bf_ref).to_constant ();
+  unsigned HOST_WIDE_INT mask_width = bit_field_size (bf_ref).to_constant ();
+  unsigned HOST_WIDE_INT prec = tree_to_uhwi (TYPE_SIZE (container_type));
+  if (BYTES_BIG_ENDIAN)
+    shift_n = prec - shift_n - mask_width;
+
+  /* If we don't have to shift we only generate the mask, so just fix the
+     code-path to shift_first.  */
+  if (shift_n == 0)
+    shift_first = true;
+
+  tree result;
+  if (shift_first)
+    {
+      tree shifted = container;
+      if (shift_n)
+	{
+	  pattern_stmt
+	    = gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+				   RSHIFT_EXPR, container,
+				   build_int_cst (sizetype, shift_n));
+	  shifted = gimple_assign_lhs (pattern_stmt);
+	  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+	}
+
+      tree mask = wide_int_to_tree (container_type,
+				    wi::mask (mask_width, false, prec));
+
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       BIT_AND_EXPR, shifted, mask);
+      result = gimple_assign_lhs (pattern_stmt);
+    }
+  else
+    {
+      tree mask = wide_int_to_tree (container_type,
+				    wi::shifted_mask (shift_n, mask_width,
+						      false, prec));
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       BIT_AND_EXPR, container, mask);
+      tree masked = gimple_assign_lhs (pattern_stmt);
+
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       RSHIFT_EXPR, masked,
+			       build_int_cst (sizetype, shift_n));
+      result = gimple_assign_lhs (pattern_stmt);
+    }
+
+  if (!useless_type_conversion_p (TREE_TYPE (result), ret_type))
+    {
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type),
+			       NOP_EXPR, result);
+    }
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bitfield_ref pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+/* Function vect_recog_bit_insert_pattern
+
+   Try to find the following pattern:
+
+   written = BIT_INSERT_EXPR (container, value, bitpos);
+
+   Input:
+
+   * STMT_VINFO: The stmt we want to replace.
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. In this case it will be:
+   value = (container_type) value;	    // Make sure value has container_type.
+   shifted = value << bitpos;		    // Shift value into place
+   masked = shifted & (mask << bitpos);	    // Mask off the non-relevant bits in
+					    // the 'to-write value'.
+   cleared = container & ~(mask << bitpos); // Clear the bits in the
+					    // container that we are about
+					    // to write to.
+   written = cleared | masked;		    // Write bits.
+
+
+   where mask = ((1 << TYPE_PRECISION (value)) - 1), a mask covering the bits
+   of the bitfield value we are writing.
+   The shifting is always optional depending on whether bitpos != 0.
+
+*/
+
+static gimple *
+vect_recog_bit_insert_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+			       tree *type_out)
+{
+  gassign *bf_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_INSERT_EXPR)
+    return NULL;
+
+  tree container = gimple_assign_rhs1 (bf_stmt);
+  tree value = gimple_assign_rhs2 (bf_stmt);
+  tree shift = gimple_assign_rhs3 (bf_stmt);
+
+  tree bf_type = TREE_TYPE (value);
+  tree container_type = TREE_TYPE (container);
+
+  if (!INTEGRAL_TYPE_P (container_type)
+      || !tree_fits_uhwi_p (TYPE_SIZE (container_type)))
+    return NULL;
+
+  gimple *pattern_stmt;
+
+  vect_unpromoted_value unprom;
+  unprom.set_op (value, vect_internal_def);
+  value = vect_convert_input (vinfo, stmt_info, container_type, &unprom,
+			      get_vectype_for_scalar_type (vinfo,
+							   container_type));
+
+  unsigned HOST_WIDE_INT mask_width = TYPE_PRECISION (bf_type);
+  unsigned HOST_WIDE_INT prec = tree_to_uhwi (TYPE_SIZE (container_type));
+  unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (shift);
+  if (BYTES_BIG_ENDIAN)
+    {
+      shift_n = prec - shift_n - mask_width;
+      shift = build_int_cst (TREE_TYPE (shift), shift_n);
+    }
+
+  if (!useless_type_conversion_p (TREE_TYPE (value), container_type))
+    {
+      pattern_stmt =
+	gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			     NOP_EXPR, value);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      value = gimple_get_lhs (pattern_stmt);
+    }
+
+  /* Shift VALUE into place.  */
+  tree shifted = value;
+  if (shift_n)
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       LSHIFT_EXPR, value, shift);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      shifted = gimple_get_lhs (pattern_stmt);
+    }
+
+  tree mask_t
+    = wide_int_to_tree (container_type,
+			wi::shifted_mask (shift_n, mask_width, false, prec));
+
+  /* Clear bits we don't want to write back from SHIFTED.  */
+  gimple_seq stmts = NULL;
+  tree masked = gimple_build (&stmts, BIT_AND_EXPR, container_type, shifted,
+			      mask_t);
+  if (!gimple_seq_empty_p (stmts))
+    {
+      pattern_stmt = gimple_seq_first_stmt (stmts);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+
+  /* Mask off the bits in the container that we are to write to.  */
+  mask_t = wide_int_to_tree (container_type,
+			     wi::shifted_mask (shift_n, mask_width, true, prec));
+  tree cleared = vect_recog_temp_ssa_var (container_type);
+  pattern_stmt = gimple_build_assign (cleared, BIT_AND_EXPR, container, mask_t);
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+
+  /* Write MASKED into CLEARED.  */
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			   BIT_IOR_EXPR, cleared, masked);
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bit_insert pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+
 /* Recognize cases in which an operation is performed in one type WTYPE
    but could be done more efficiently in a narrower type NTYPE.  For example,
    if we have:
@@ -5623,6 +5948,8 @@ struct vect_recog_func
    taken which means usually the more complex one needs to preceed the
    less comples onex (widen_sum only after dot_prod or sad for example).  */
 static vect_recog_func vect_vect_recog_func_ptrs[] = {
+  { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
+  { vect_recog_bit_insert_pattern, "bit_insert" },
   { vect_recog_over_widening_pattern, "over_widening" },
   /* Must come after over_widening, which narrows the shift as much as
      possible beforehand.  */
Richard Biener Sept. 29, 2022, 7:54 a.m. UTC | #11
On Wed, Sep 28, 2022 at 7:32 PM Andre Vieira (lists) via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Made the change and also created the ChangeLogs.

OK if bootstrap / testing succeeds.

Thanks,
Richard.

> gcc/ChangeLog:
>
>          * tree-if-conv.cc (if_convertible_loop_p_1): Move ordering of
> loop bb's from here...
>          (tree_if_conversion): ... to here.  Also call bitfield lowering
> when appropriate.
>          (version_loop_for_if_conversion): Adapt to enable loop
> versioning when we only need
>          to lower bitfields.
>          (ifcvt_split_critical_edges): Relax condition of expected loop
> form as this is checked earlier.
>          (get_bitfield_rep): New function.
>          (lower_bitfield): Likewise.
>          (bitfields_to_lower_p): Likewise.
>          (need_to_lower_bitfields): New global boolean.
>          (need_to_ifcvt): Likewise.
>          * tree-vect-data-refs.cc (vect_find_stmt_data_reference):
> Improve diagnostic message.
>          * tree-vect-patterns.cc (vect_recog_temp_ssa_var): Add default
> value for last parameter.
>          (vect_recog_bitfield_ref_pattern): New.
>          (vect_recog_bit_insert_pattern): New.
>
> gcc/testsuite/ChangeLog:
>
>          * gcc.dg/vect/vect-bitfield-read-1.c: New test.
>          * gcc.dg/vect/vect-bitfield-read-2.c: New test.
>          * gcc.dg/vect/vect-bitfield-read-3.c: New test.
>          * gcc.dg/vect/vect-bitfield-read-4.c: New test.
>          * gcc.dg/vect/vect-bitfield-read-5.c: New test.
>          * gcc.dg/vect/vect-bitfield-read-6.c: New test.
>          * gcc.dg/vect/vect-bitfield-write-1.c: New test.
>          * gcc.dg/vect/vect-bitfield-write-2.c: New test.
>          * gcc.dg/vect/vect-bitfield-write-3.c: New test.
>          * gcc.dg/vect/vect-bitfield-write-4.c: New test.
>          * gcc.dg/vect/vect-bitfield-write-5.c: New test.
>
> On 28/09/2022 10:43, Andre Vieira (lists) via Gcc-patches wrote:
> >
> > On 27/09/2022 13:34, Richard Biener wrote:
> >> On Mon, 26 Sep 2022, Andre Vieira (lists) wrote:
> >>
> >>> On 08/09/2022 12:51, Richard Biener wrote:
> >>>> I'm curious, why the push to redundant_ssa_names?  That could use
> >>>> a comment ...
> >>> So I purposefully left a #if 0 #else #endif in there so you can see
> >>> the two
> >>> options. But the reason I used redundant_ssa_names is because ifcvt
> >>> seems to
> >>> use that as a container for all pairs of (old, new) ssa names to
> >>> replace
> >>> later. So I just piggy backed on that. I don't know if there's a
> >>> specific
> >>> reason they do the replacement at the end? Maybe some ordering
> >>> issue? Either
> >>> way both adding it to redundant_ssa_names or doing the replacement
> >>> inline work
> >>> for the bitfield lowering (or work in my testing at least).
> >> Possibly because we (in the past?) inserted/copied stuff based on
> >> predicates generated at analysis time after we decide to elide something
> >> so we need to watch for later appearing uses.  But who knows ... my mind
> >> fails me here.
> >>
> >> If it works to replace uses immediately please do so.  But now
> >> I wonder why we need this - the value shouldn't change so you
> >> should get away with re-using the existing SSA name for the final value?
> >
> > Yeah... good point. A quick change and minor testing seems to agree.
> > I'm sure I had a good reason to do it initially ;)
> >
> > I'll run a full-regression on this change to make sure I didn't miss
> > anything.
> >
Andre Vieira (lists) Oct. 7, 2022, 2:20 p.m. UTC | #12
Hi,

Whilst running a bootstrap with extra options to force bitfield 
vectorization '-O2 -ftree-vectorize -ftree-loop-if-convert 
-fno-vect-cost-model' I ran into an ICE in vect-patterns where a 
bit_field_ref had a container that wasn't INTEGRAL_TYPE and whose mode
was E_BLKmode, which meant we failed to build an integer type with the same
size. For that reason I added a check to bail out earlier if the 
TYPE_MODE of the container is indeed E_BLKmode. The pattern for the 
bitfield inserts required no change as we currently don't support 
containers that aren't integer typed.
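
A minimal sketch of the added guard (the attached patch has the actual
hunk; 'container' here is the first operand of the BIT_FIELD_REF in
vect_recog_bitfield_ref_pattern):

  /* Bail out if the container has BLKmode: we cannot build an integer
     type of the same size to do the masking and shifting on.  */
  if (TYPE_MODE (TREE_TYPE (container)) == E_BLKmode)
    return NULL;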

Also changed a testcase because on BIG-ENDIAN it was not vectorizing, due
to a container size that isn't supported there.

This passes the same bootstrap and regressions on aarch64-none-linux and 
no regressions on aarch64_be-none-elf either.

I assume you are OK with these changes, Richard, but I don't like to
commit on Friday in case something breaks over the weekend, so I'll
leave it until Monday.

Thanks,
Andre

On 29/09/2022 08:54, Richard Biener wrote:
> On Wed, Sep 28, 2022 at 7:32 PM Andre Vieira (lists) via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>> Made the change and also created the ChangeLogs.
> OK if bootstrap / testing succeeds.
>
> Thanks,
> Richard.
>
>> gcc/ChangeLog:
>>
>>           * tree-if-conv.cc (if_convertible_loop_p_1): Move ordering of
>> loop bb's from here...
>>           (tree_if_conversion): ... to here.  Also call bitfield lowering
>> when appropriate.
>>           (version_loop_for_if_conversion): Adapt to enable loop
>> versioning when we only need
>>           to lower bitfields.
>>           (ifcvt_split_critical_edges): Relax condition of expected loop
>> form as this is checked earlier.
>>           (get_bitfield_rep): New function.
>>           (lower_bitfield): Likewise.
>>           (bitfields_to_lower_p): Likewise.
>>           (need_to_lower_bitfields): New global boolean.
>>           (need_to_ifcvt): Likewise.
>>           * tree-vect-data-refs.cc (vect_find_stmt_data_reference):
>> Improve diagnostic message.
>>           * tree-vect-patterns.cc (vect_recog_temp_ssa_var): Add default
>> value for last parameter.
>>           (vect_recog_bitfield_ref_pattern): New.
>>           (vect_recog_bit_insert_pattern): New.
>>
>> gcc/testsuite/ChangeLog:
>>
>>           * gcc.dg/vect/vect-bitfield-read-1.c: New test.
>>           * gcc.dg/vect/vect-bitfield-read-2.c: New test.
>>           * gcc.dg/vect/vect-bitfield-read-3.c: New test.
>>           * gcc.dg/vect/vect-bitfield-read-4.c: New test.
>>           * gcc.dg/vect/vect-bitfield-read-5.c: New test.
>>           * gcc.dg/vect/vect-bitfield-read-6.c: New test.
>>           * gcc.dg/vect/vect-bitfield-write-1.c: New test.
>>           * gcc.dg/vect/vect-bitfield-write-2.c: New test.
>>           * gcc.dg/vect/vect-bitfield-write-3.c: New test.
>>           * gcc.dg/vect/vect-bitfield-write-4.c: New test.
>>           * gcc.dg/vect/vect-bitfield-write-5.c: New test.
>>
>> On 28/09/2022 10:43, Andre Vieira (lists) via Gcc-patches wrote:
>>> On 27/09/2022 13:34, Richard Biener wrote:
>>>> On Mon, 26 Sep 2022, Andre Vieira (lists) wrote:
>>>>
>>>>> On 08/09/2022 12:51, Richard Biener wrote:
>>>>>> I'm curious, why the push to redundant_ssa_names?  That could use
>>>>>> a comment ...
>>>>> So I purposefully left a #if 0 #else #endif in there so you can see
>>>>> the two
>>>>> options. But the reason I used redundant_ssa_names is because ifcvt
>>>>> seems to
>>>>> use that as a container for all pairs of (old, new) ssa names to
>>>>> replace
>>>>> later. So I just piggy backed on that. I don't know if there's a
>>>>> specific
>>>>> reason they do the replacement at the end? Maybe some ordering
>>>>> issue? Either
>>>>> way both adding it to redundant_ssa_names or doing the replacement
>>>>> inline work
>>>>> for the bitfield lowering (or work in my testing at least).
>>>> Possibly because we (in the past?) inserted/copied stuff based on
>>>> predicates generated at analysis time after we decide to elide something
>>>> so we need to watch for later appearing uses.  But who knows ... my mind
>>>> fails me here.
>>>>
>>>> If it works to replace uses immediately please do so.  But now
>>>> I wonder why we need this - the value shouldn't change so you
>>>> should get away with re-using the existing SSA name for the final value?
>>> Yeah... good point. A quick change and minor testing seems to agree.
>>> I'm sure I had a good reason to do it initially ;)
>>>
>>> I'll run a full-regression on this change to make sure I didn't miss
>>> anything.
>>>
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..01cf34fb44484ca926ca5de99eef76dd99b69e92
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define ELT0 {0}
+#define ELT1 {1}
+#define ELT2 {2}
+#define ELT3 {3}
+#define N 32
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].i;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..1a4a1579c1478b9407ad21b19e8fbdca9f674b42
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 0}
+#define ELT1 {0x7FFFFFFFUL, 1}
+#define ELT2 {0x7FFFFFFFUL, 2}
+#define ELT3 {0x7FFFFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..849f4a017e1818eee4abd66385417a326c497696
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
@@ -0,0 +1,44 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+#include <stdbool.h>
+
+extern void abort(void);
+
+typedef struct {
+    int  c;
+    int  b;
+    bool a : 1;
+    int  d : 31;
+} struct_t;
+
+#define N 16
+#define ELT_F { 0xFFFFFFFF, 0xFFFFFFFF, 0, 0x7FFFFFFF }
+#define ELT_T { 0xFFFFFFFF, 0xFFFFFFFF, 1, 0x7FFFFFFF }
+
+struct_t vect_false[N] = { ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+struct_t vect_true[N]  = { ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+int main (void)
+{
+  unsigned ret = 0;
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_false[i].a;
+  }
+  if (ret)
+    abort ();
+
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_true[i].a;
+  }
+  if (!ret)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..5bc9c412e9616aefcbf49a4518f1603380a54b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 3, 0}
+#define ELT1 {0x7FFFFFFFUL, 3, 1}
+#define ELT2 {0x7FFFFFFFUL, 3, 2}
+#define ELT3 {0x7FFFFFFFUL, 3, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c
new file mode 100644
index 0000000000000000000000000000000000000000..1dc24d3eded192144dc9ad94589b4c5c3d999e65
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned a : 23; unsigned b : 9;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFUL, 0}
+#define ELT1 {0x7FFFFFUL, 1}
+#define ELT2 {0x7FFFFFUL, 2}
+#define ELT3 {0x7FFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].b;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c
new file mode 100644
index 0000000000000000000000000000000000000000..7d24c29975865883a7cdc7aa057fbb6bf413e0bc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned a : 23; unsigned b : 8;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFUL, 0}
+#define ELT1 {0x7FFFFFUL, 1}
+#define ELT2 {0x7FFFFFUL, 2}
+#define ELT3 {0x7FFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].b;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..19683d277b1ade1034496136f1d03bb2b446900f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].i = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].i != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..d550dd35ab75eb67f6e53f89fbf55b7315e50bc9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..3303d2610ff972d986be172962c129634ee64254
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..fae6ea3557dcaba7b330ebdaa471281d33d2ba15
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned b : 23;
+    unsigned a : 9;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c
new file mode 100644
index 0000000000000000000000000000000000000000..99360c2967b076212c67eb4f34b8fd91711d8821
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned b : 23;
+    unsigned a : 8;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index bac29fb557462f5d3193481ef180f1412e8bc639..e468a4659fa28a3a31c3390cf19bee65f4590b80 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -91,6 +91,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "expmed.h"
+#include "expr.h"
 #include "optabs-query.h"
 #include "gimple-pretty-print.h"
 #include "alias.h"
@@ -123,6 +124,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
 
+/* For lang_hooks.types.type_for_mode.  */
+#include "langhooks.h"
+
 /* Only handle PHIs with no more arguments unless we are asked to by
    simd pragma.  */
 #define MAX_PHI_ARG_NUM \
@@ -145,6 +149,12 @@ static bool need_to_rewrite_undefined;
    before phi_convertible_by_degenerating_args.  */
 static bool any_complicated_phi;
 
+/* True if we have bitfield accesses we can lower.  */
+static bool need_to_lower_bitfields;
+
+/* True if there is any ifcvting to be done.  */
+static bool need_to_ifcvt;
+
 /* Hash for struct innermost_loop_behavior.  It depends on the user to
    free the memory.  */
 
@@ -1411,15 +1421,6 @@ if_convertible_loop_p_1 (class loop *loop, vec<data_reference_p> *refs)
 
   calculate_dominance_info (CDI_DOMINATORS);
 
-  /* Allow statements that can be handled during if-conversion.  */
-  ifc_bbs = get_loop_body_in_if_conv_order (loop);
-  if (!ifc_bbs)
-    {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "Irreducible loop\n");
-      return false;
-    }
-
   for (i = 0; i < loop->num_nodes; i++)
     {
       basic_block bb = ifc_bbs[i];
@@ -2899,18 +2900,22 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
   class loop *new_loop;
   gimple *g;
   gimple_stmt_iterator gsi;
-  unsigned int save_length;
+  unsigned int save_length = 0;
 
   g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2,
 				  build_int_cst (integer_type_node, loop->num),
 				  integer_zero_node);
   gimple_call_set_lhs (g, cond);
 
-  /* Save BB->aux around loop_version as that uses the same field.  */
-  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
-  void **saved_preds = XALLOCAVEC (void *, save_length);
-  for (unsigned i = 0; i < save_length; i++)
-    saved_preds[i] = ifc_bbs[i]->aux;
+  void **saved_preds = NULL;
+  if (any_complicated_phi || need_to_predicate)
+    {
+      /* Save BB->aux around loop_version as that uses the same field.  */
+      save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
+      saved_preds = XALLOCAVEC (void *, save_length);
+      for (unsigned i = 0; i < save_length; i++)
+	saved_preds[i] = ifc_bbs[i]->aux;
+    }
 
   initialize_original_copy_tables ();
   /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED
@@ -2922,8 +2927,9 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
 			   profile_probability::always (), true);
   free_original_copy_tables ();
 
-  for (unsigned i = 0; i < save_length; i++)
-    ifc_bbs[i]->aux = saved_preds[i];
+  if (any_complicated_phi || need_to_predicate)
+    for (unsigned i = 0; i < save_length; i++)
+      ifc_bbs[i]->aux = saved_preds[i];
 
   if (new_loop == NULL)
     return NULL;
@@ -2999,7 +3005,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv)
   auto_vec<edge> critical_edges;
 
   /* Loop is not well formed.  */
-  if (num <= 2 || loop->inner || !single_exit (loop))
+  if (loop->inner)
     return false;
 
   body = get_loop_body (loop);
@@ -3260,6 +3266,201 @@ ifcvt_hoist_invariants (class loop *loop, edge pe)
   free (body);
 }
 
+/* Returns the DECL_BIT_FIELD_REPRESENTATIVE of the bitfield access in STMT
+   iff the representative's type mode is not BLKmode.  If BITPOS is not NULL
+   it will hold the tree for the bit offset of the bitfield relative to the
+   representative and STRUCT_EXPR, if not NULL, will hold the tree
+   representing the base struct of this bitfield.  */
+
+static tree
+get_bitfield_rep (gassign *stmt, bool write, tree *bitpos,
+		  tree *struct_expr)
+{
+  tree comp_ref = write ? gimple_assign_lhs (stmt)
+			: gimple_assign_rhs1 (stmt);
+
+  tree field_decl = TREE_OPERAND (comp_ref, 1);
+  tree rep_decl = DECL_BIT_FIELD_REPRESENTATIVE (field_decl);
+
+  /* Bail out if the representative is BLKmode as we will not be able to
+     vectorize this.  */
+  if (TYPE_MODE (TREE_TYPE (rep_decl)) == E_BLKmode)
+    return NULL_TREE;
+
+  /* Bail out if the DECL_SIZE of the field_decl isn't the same as the BF's
+     precision.  */
+  unsigned HOST_WIDE_INT bf_prec
+    = TYPE_PRECISION (TREE_TYPE (gimple_assign_lhs (stmt)));
+  if (compare_tree_int (DECL_SIZE (field_decl), bf_prec) != 0)
+    return NULL_TREE;
+
+  if (struct_expr)
+    *struct_expr = TREE_OPERAND (comp_ref, 0);
+
+  if (bitpos)
+    *bitpos
+      = fold_build2 (MINUS_EXPR, bitsizetype,
+		     DECL_FIELD_BIT_OFFSET (field_decl),
+		     DECL_FIELD_BIT_OFFSET (rep_decl));
+
+  return rep_decl;
+
+}
+
+/* Lowers the bitfield access in STMT.  WRITE is true if it is a write.
+   For a write like:
+
+   struct.bf = _1;
+
+   lower to:
+
+   __ifc_1 = struct.<representative>;
+   __ifc_2 = BIT_INSERT_EXPR (__ifc_1, _1, bitpos);
+   struct.<representative> = __ifc_2;
+
+   For a read:
+
+   _1 = struct.bf;
+
+    lower to:
+
+    __ifc_1 = struct.<representative>;
+    _1 =  BIT_FIELD_REF (__ifc_1, bitsize, bitpos);
+
+    where representative is a legal load that contains the bitfield value,
+    bitsize is the size of the bitfield and bitpos the offset to the start of
+    the bitfield within the representative.  */
+
+static void
+lower_bitfield (gassign *stmt, bool write)
+{
+  tree struct_expr;
+  tree bitpos;
+  tree rep_decl = get_bitfield_rep (stmt, write, &bitpos, &struct_expr);
+  tree rep_type = TREE_TYPE (rep_decl);
+  tree bf_type = TREE_TYPE (gimple_assign_lhs (stmt));
+
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Lowering:\n");
+      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+      fprintf (dump_file, "to:\n");
+    }
+
+  /* REP_COMP_REF is a COMPONENT_REF for the representative.  NEW_VAL is the
+     SSA_NAME holding its loaded value.  */
+  tree rep_comp_ref = build3 (COMPONENT_REF, rep_type, struct_expr, rep_decl,
+			      NULL_TREE);
+  tree new_val = ifc_temp_var (rep_type, rep_comp_ref, &gsi);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+  if (write)
+    {
+      new_val = ifc_temp_var (rep_type,
+			      build3 (BIT_INSERT_EXPR, rep_type, new_val,
+				      unshare_expr (gimple_assign_rhs1 (stmt)),
+				      bitpos), &gsi);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+      gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref),
+					      new_val);
+      gimple_move_vops (new_stmt, stmt);
+      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+  else
+    {
+      tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val,
+			 build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),
+			 bitpos);
+      new_val = ifc_temp_var (bf_type, bfr, &gsi);
+
+      gimple *new_stmt = gimple_build_assign (gimple_assign_lhs (stmt),
+					      new_val);
+      gimple_move_vops (new_stmt, stmt);
+      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+
+  gsi_remove (&gsi, true);
+}
+
+/* Return TRUE if there are bitfields to lower in this LOOP.  Fill
+   READS_TO_LOWER and WRITES_TO_LOWER with the statements to lower.  */
+
+static bool
+bitfields_to_lower_p (class loop *loop,
+		      vec <gassign *> &reads_to_lower,
+		      vec <gassign *> &writes_to_lower)
+{
+  gimple_stmt_iterator gsi;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Analyzing loop %d for bitfields:\n", loop->num);
+    }
+
+  for (unsigned i = 0; i < loop->num_nodes; ++i)
+    {
+      basic_block bb = ifc_bbs[i];
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gassign *stmt = dyn_cast<gassign*> (gsi_stmt (gsi));
+	  if (!stmt)
+	    continue;
+
+	  tree op = gimple_assign_lhs (stmt);
+	  bool write = TREE_CODE (op) == COMPONENT_REF;
+
+	  if (!write)
+	    op = gimple_assign_rhs1 (stmt);
+
+	  if (TREE_CODE (op) != COMPONENT_REF)
+	    continue;
+
+	  if (DECL_BIT_FIELD_TYPE (TREE_OPERAND (op, 1)))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+
+	      if (!INTEGRAL_TYPE_P (TREE_TYPE (op)))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "\t Bitfield NOT OK to lower,"
+					" field type is not Integral.\n");
+		  return false;
+		}
+
+	      if (!get_bitfield_rep (stmt, write, NULL, NULL))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "\t Bitfield NOT OK to lower,"
+					" representative is BLKmode.\n");
+		  return false;
+		}
+
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "\tBitfield OK to lower.\n");
+	      if (write)
+		writes_to_lower.safe_push (stmt);
+	      else
+		reads_to_lower.safe_push (stmt);
+	    }
+	}
+    }
+  return !reads_to_lower.is_empty () || !writes_to_lower.is_empty ();
+}
+
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -3270,12 +3471,16 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   unsigned int todo = 0;
   bool aggressive_if_conv;
   class loop *rloop;
+  auto_vec <gassign *, 4> reads_to_lower;
+  auto_vec <gassign *, 4> writes_to_lower;
   bitmap exit_bbs;
   edge pe;
 
  again:
   rloop = NULL;
   ifc_bbs = NULL;
+  need_to_lower_bitfields = false;
+  need_to_ifcvt = false;
   need_to_predicate = false;
   need_to_rewrite_undefined = false;
   any_complicated_phi = false;
@@ -3291,16 +3496,42 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	aggressive_if_conv = true;
     }
 
-  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv))
+  if (!single_exit (loop))
     goto cleanup;
 
-  if (!if_convertible_loop_p (loop)
-      || !dbg_cnt (if_conversion_tree))
+  /* If there are more than two BBs in the loop then there is at least one if
+     to convert.  */
+  if (loop->num_nodes > 2
+      && !ifcvt_split_critical_edges (loop, aggressive_if_conv))
     goto cleanup;
 
-  if ((need_to_predicate || any_complicated_phi)
-      && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
-	  || loop->dont_vectorize))
+  ifc_bbs = get_loop_body_in_if_conv_order (loop);
+  if (!ifc_bbs)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "Irreducible loop\n");
+      goto cleanup;
+    }
+
+  if (loop->num_nodes > 2)
+    {
+      need_to_ifcvt = true;
+
+      if (!if_convertible_loop_p (loop) || !dbg_cnt (if_conversion_tree))
+	goto cleanup;
+
+      if ((need_to_predicate || any_complicated_phi)
+	  && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
+	      || loop->dont_vectorize))
+	goto cleanup;
+    }
+
+  if ((flag_tree_loop_vectorize || loop->force_vectorize)
+      && !loop->dont_vectorize)
+    need_to_lower_bitfields = bitfields_to_lower_p (loop, reads_to_lower,
+						    writes_to_lower);
+
+  if (!need_to_ifcvt && !need_to_lower_bitfields)
     goto cleanup;
 
   /* The edge to insert invariant stmts on.  */
@@ -3311,7 +3542,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
      Either version this loop, or if the pattern is right for outer-loop
      vectorization, version the outer loop.  In the latter case we will
      still if-convert the original inner loop.  */
-  if (need_to_predicate
+  if (need_to_lower_bitfields
+      || need_to_predicate
       || any_complicated_phi
       || flag_tree_loop_if_convert != 1)
     {
@@ -3351,10 +3583,31 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	pe = single_pred_edge (gimple_bb (preds->last ()));
     }
 
-  /* Now all statements are if-convertible.  Combine all the basic
-     blocks into one huge basic block doing the if-conversion
-     on-the-fly.  */
-  combine_blocks (loop);
+  if (need_to_lower_bitfields)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "-------------------------\n");
+	  fprintf (dump_file, "Start lowering bitfields\n");
+	}
+      while (!reads_to_lower.is_empty ())
+	lower_bitfield (reads_to_lower.pop (), false);
+      while (!writes_to_lower.is_empty ())
+	lower_bitfield (writes_to_lower.pop (), true);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Done lowering bitfields\n");
+	  fprintf (dump_file, "-------------------------\n");
+	}
+    }
+  if (need_to_ifcvt)
+    {
+      /* Now all statements are if-convertible.  Combine all the basic
+	 blocks into one huge basic block doing the if-conversion
+	 on-the-fly.  */
+      combine_blocks (loop);
+    }
 
   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
      and stores are involved.  CSE only the loop body, not the entry
@@ -3394,6 +3647,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   if (rloop != NULL)
     {
       loop = rloop;
+      reads_to_lower.truncate (0);
+      writes_to_lower.truncate (0);
       goto again;
     }
 
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index e03b50498d164144da3220df8ee5bcf4248db821..4a23d6172aaa12ad7049dc626e5c4afbd5ca3f74 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4302,7 +4302,8 @@ vect_find_stmt_data_reference (loop_p loop, gimple *stmt,
       free_data_ref (dr);
       return opt_result::failure_at (stmt,
 				     "not vectorized:"
-				     " statement is bitfield access %G", stmt);
+				     " statement is an unsupported"
+				     " bitfield access %G", stmt);
     }
 
   if (DR_BASE_ADDRESS (dr)
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index d2bd15b5e9005bce2612f0b32c0acf6ffe776343..0cc315d312667c05a27df4cdf435f0d0e6fd4a52 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -35,6 +35,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
+#include "gimple-fold.h"
+#include "gimplify-me.h"
 #include "cfgloop.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
@@ -663,7 +665,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
    is NULL, the caller must set SSA_NAME_DEF_STMT for the returned SSA var. */
 
 static tree
-vect_recog_temp_ssa_var (tree type, gimple *stmt)
+vect_recog_temp_ssa_var (tree type, gimple *stmt = NULL)
 {
   return make_temp_ssa_name (type, stmt, "patt");
 }
@@ -1829,6 +1831,330 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+/* Function vect_recog_bitfield_ref_pattern
+
+   Try to find the following pattern:
+
+   bf_value = BIT_FIELD_REF (container, bitsize, bitpos);
+   result = (type_out) bf_value;
+
+   where type_out is a non-bitfield type, that is to say, its precision matches
+   2^(TYPE_SIZE(type_out) - (TYPE_UNSIGNED (type_out) ? 1 : 2)).
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern search begins.
+   here it starts with:
+   result = (type_out) bf_value;
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. If the precision of type_out is bigger
+   than the precision type of _1 we perform the widening before the shifting,
+   since the new precision will be large enough to shift the value and moving
+   widening operations up the statement chain enables the generation of
+   widening loads.  If we are widening and the operation after the pattern is
+   an addition then we mask first and shift later, to enable the generation of
+   shifting adds.  In the case of narrowing we will always mask first, shift
+   last and then perform a narrowing operation.  This will enable the
+   generation of narrowing shifts.
+
+   Widening with mask first, shift later:
+   container = (type_out) container;
+   masked = container & (((1 << bitsize) - 1) << bitpos);
+   result = masked >> bitpos;
+
+   Widening with shift first, mask last:
+   container = (type_out) container;
+   shifted = container >> bitpos;
+   result = shifted & ((1 << bitsize) - 1);
+
+   Narrowing:
+   masked = container & (((1 << bitsize) - 1) << bitpos);
+   result = masked >> bitpos;
+   result = (type_out) result;
+
+   The shifting is optional and is omitted when bitpos is 0.
+
+*/
+
+static gimple *
+vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+				 tree *type_out)
+{
+  gassign *first_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+
+  if (!first_stmt)
+    return NULL;
+
+  gassign *bf_stmt;
+  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (first_stmt))
+      && TREE_CODE (gimple_assign_rhs1 (first_stmt)) == SSA_NAME)
+    {
+      gimple *second_stmt
+	= SSA_NAME_DEF_STMT (gimple_assign_rhs1 (first_stmt));
+      bf_stmt = dyn_cast <gassign *> (second_stmt);
+      if (!bf_stmt
+	  || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
+	return NULL;
+    }
+  else
+    return NULL;
+
+  tree bf_ref = gimple_assign_rhs1 (bf_stmt);
+  tree container = TREE_OPERAND (bf_ref, 0);
+
+  if (!bit_field_offset (bf_ref).is_constant ()
+      || !bit_field_size (bf_ref).is_constant ()
+      || !tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (container))))
+    return NULL;
+
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (bf_ref))
+      || TYPE_MODE (TREE_TYPE (container)) == E_BLKmode)
+    return NULL;
+
+  gimple *use_stmt, *pattern_stmt;
+  use_operand_p use_p;
+  tree ret = gimple_assign_lhs (first_stmt);
+  tree ret_type = TREE_TYPE (ret);
+  bool shift_first = true;
+  tree vectype;
+
+  /* If the first operand of the BIT_FIELD_REF is not an INTEGER type, convert
+     it to one of the same width so we can perform the necessary masking and
+     shifting.  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (container)))
+    {
+      unsigned HOST_WIDE_INT container_size =
+	tree_to_uhwi (TYPE_SIZE (TREE_TYPE (container)));
+      tree int_type = build_nonstandard_integer_type (container_size, true);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (int_type),
+			       VIEW_CONVERT_EXPR, container);
+      vectype = get_vectype_for_scalar_type (vinfo, int_type);
+      container = gimple_assign_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+    }
+  else
+    vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (container));
+
+  /* We move the conversion earlier if the loaded type is smaller than the
+     return type to enable the use of widening loads.  */
+  if (TYPE_PRECISION (TREE_TYPE (container)) < TYPE_PRECISION (ret_type)
+      && !useless_type_conversion_p (TREE_TYPE (container), ret_type))
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type),
+			       NOP_EXPR, container);
+      container = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+  else if (!useless_type_conversion_p (TREE_TYPE (container), ret_type))
+    /* If we are doing the conversion last then also delay the shift as we may
+       be able to combine the shift and conversion in certain cases.  */
+    shift_first = false;
+
+  tree container_type = TREE_TYPE (container);
+
+  /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
+     PLUS_EXPR then do the shift last as some targets can combine the shift and
+     add into a single instruction.  */
+  if (single_imm_use (gimple_assign_lhs (first_stmt), &use_p, &use_stmt))
+    {
+      if (gimple_code (use_stmt) == GIMPLE_ASSIGN
+	  && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)
+	shift_first = false;
+    }
+
+  unsigned HOST_WIDE_INT shift_n = bit_field_offset (bf_ref).to_constant ();
+  unsigned HOST_WIDE_INT mask_width = bit_field_size (bf_ref).to_constant ();
+  unsigned HOST_WIDE_INT prec = tree_to_uhwi (TYPE_SIZE (container_type));
+  if (BYTES_BIG_ENDIAN)
+    shift_n = prec - shift_n - mask_width;
+
+  /* If we don't have to shift we only generate the mask, so just fix the
+     code-path to shift_first.  */
+  if (shift_n == 0)
+    shift_first = true;
+
+  tree result;
+  if (shift_first)
+    {
+      tree shifted = container;
+      if (shift_n)
+	{
+	  pattern_stmt
+	    = gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+				   RSHIFT_EXPR, container,
+				   build_int_cst (sizetype, shift_n));
+	  shifted = gimple_assign_lhs (pattern_stmt);
+	  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+	}
+
+      tree mask = wide_int_to_tree (container_type,
+				    wi::mask (mask_width, false, prec));
+
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       BIT_AND_EXPR, shifted, mask);
+      result = gimple_assign_lhs (pattern_stmt);
+    }
+  else
+    {
+      tree mask = wide_int_to_tree (container_type,
+				    wi::shifted_mask (shift_n, mask_width,
+						      false, prec));
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       BIT_AND_EXPR, container, mask);
+      tree masked = gimple_assign_lhs (pattern_stmt);
+
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       RSHIFT_EXPR, masked,
+			       build_int_cst (sizetype, shift_n));
+      result = gimple_assign_lhs (pattern_stmt);
+    }
+
+  if (!useless_type_conversion_p (TREE_TYPE (result), ret_type))
+    {
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type),
+			       NOP_EXPR, result);
+    }
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bitfield_ref pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+/* Function vect_recog_bit_insert_pattern
+
+   Try to find the following pattern:
+
+   written = BIT_INSERT_EXPR (container, value, bitpos);
+
+   Input:
+
+   * STMT_VINFO: The stmt we want to replace.
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. In this case it will be:
+   value = (container_type) value;	    // Convert to the container's type.
+   shifted = value << bitpos;		    // Shift value into place
+   masked = shifted & (mask << bitpos);	    // Mask off the non-relevant bits in
+					    // the 'to-write value'.
+   cleared = container & ~(mask << bitpos); // Clear the bits in the container
+					    // that we are about to
+					    // overwrite.
+   written = cleared | masked;		    // Write bits.
+
+
+   where mask = ((1 << TYPE_PRECISION (value)) - 1), a mask to keep the number of
+   bits corresponding to the real size of the bitfield value we are writing to.
+   The shifting is optional and is omitted when bitpos is 0.
+
+*/
+
+static gimple *
+vect_recog_bit_insert_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+			       tree *type_out)
+{
+  gassign *bf_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_INSERT_EXPR)
+    return NULL;
+
+  tree container = gimple_assign_rhs1 (bf_stmt);
+  tree value = gimple_assign_rhs2 (bf_stmt);
+  tree shift = gimple_assign_rhs3 (bf_stmt);
+
+  tree bf_type = TREE_TYPE (value);
+  tree container_type = TREE_TYPE (container);
+
+  if (!INTEGRAL_TYPE_P (container_type)
+      || !tree_fits_uhwi_p (TYPE_SIZE (container_type)))
+    return NULL;
+
+  gimple *pattern_stmt;
+
+  vect_unpromoted_value unprom;
+  unprom.set_op (value, vect_internal_def);
+  value = vect_convert_input (vinfo, stmt_info, container_type, &unprom,
+			      get_vectype_for_scalar_type (vinfo,
+							   container_type));
+
+  unsigned HOST_WIDE_INT mask_width = TYPE_PRECISION (bf_type);
+  unsigned HOST_WIDE_INT prec = tree_to_uhwi (TYPE_SIZE (container_type));
+  unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (shift);
+  if (BYTES_BIG_ENDIAN)
+    {
+      shift_n = prec - shift_n - mask_width;
+      shift = build_int_cst (TREE_TYPE (shift), shift_n);
+    }
+
+  if (!useless_type_conversion_p (TREE_TYPE (value), container_type))
+    {
+      pattern_stmt =
+	gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			     NOP_EXPR, value);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      value = gimple_get_lhs (pattern_stmt);
+    }
+
+  /* Shift VALUE into place.  */
+  tree shifted = value;
+  if (shift_n)
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       LSHIFT_EXPR, value, shift);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      shifted = gimple_get_lhs (pattern_stmt);
+    }
+
+  tree mask_t
+    = wide_int_to_tree (container_type,
+			wi::shifted_mask (shift_n, mask_width, false, prec));
+
+  /* Clear bits we don't want to write back from SHIFTED.  */
+  gimple_seq stmts = NULL;
+  tree masked = gimple_build (&stmts, BIT_AND_EXPR, container_type, shifted,
+			      mask_t);
+  if (!gimple_seq_empty_p (stmts))
+    {
+      pattern_stmt = gimple_seq_first_stmt (stmts);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+
+  /* Mask off the bits in the container that we are to write to.  */
+  mask_t = wide_int_to_tree (container_type,
+			     wi::shifted_mask (shift_n, mask_width, true, prec));
+  tree cleared = vect_recog_temp_ssa_var (container_type);
+  pattern_stmt = gimple_build_assign (cleared, BIT_AND_EXPR, container, mask_t);
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+
+  /* Write MASKED into CLEARED.  */
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			   BIT_IOR_EXPR, cleared, masked);
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bit_insert pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+
 /* Recognize cases in which an operation is performed in one type WTYPE
    but could be done more efficiently in a narrower type NTYPE.  For example,
    if we have:
@@ -5622,6 +5948,8 @@ struct vect_recog_func
    taken which means usually the more complex one needs to preceed the
    less comples onex (widen_sum only after dot_prod or sad for example).  */
 static vect_recog_func vect_vect_recog_func_ptrs[] = {
+  { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
+  { vect_recog_bit_insert_pattern, "bit_insert" },
   { vect_recog_over_widening_pattern, "over_widening" },
   /* Must come after over_widening, which narrows the shift as much as
      possible beforehand.  */
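
For orientation, here is a scalar sketch of what the ifcvt lowering plus the
bitfield_ref pattern boil down to for a read (the shift-first variant).  It
is illustrative only and not part of the patch: the 32-bit representative,
the 4-bit unsigned field and the bit offset of 8 are assumed values.

/* Illustrative sketch: read a 4-bit unsigned bitfield at bit offset 8 of a
   32-bit DECL_BIT_FIELD_REPRESENTATIVE, little-endian bit numbering.  */
static int
read_bitfield (unsigned int rep)	/* rep: the loaded representative.  */
{
  unsigned int shifted = rep >> 8;	/* RSHIFT_EXPR by bitpos.  */
  return (int) (shifted & 0xf);		/* BIT_AND_EXPR with (1 << bitsize) - 1,
					   then the conversion to the result
					   type.  */
}

On a BYTES_BIG_ENDIAN target the shift amount would instead be
prec - bitpos - bitsize = 32 - 8 - 4 = 20, which matches the adjustment the
patterns apply before emitting the shift.
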
Hongtao Liu Oct. 12, 2022, 1:55 a.m. UTC | #13
This commit failed tests

FAIL: gcc.target/i386/pr101668.c scan-assembler vpmovsxdq
FAIL: gcc.target/i386/pr101668.c scan-assembler vpmovsxdq
FAIL: gcc.target/i386/pr101668.c scan-assembler vpmovsxdq
FAIL: gcc.target/i386/pr92645.c scan-tree-dump-times optimized "vec_unpack_" 4
FAIL: gcc.target/i386/pr92645.c scan-tree-dump-times optimized "vec_unpack_" 4
FAIL: gcc.target/i386/pr92645.c scan-tree-dump-times optimized "vec_unpack_" 4
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbd 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbd 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbd 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbw 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbw 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbw 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwd 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwd 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwd 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbd 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbd 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbd 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbw 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbw 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbw 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwd 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwd 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwd 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwq 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwq 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwq 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbd 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbd 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbd 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbw 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbw 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbw 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwd 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwd 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwd 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbd 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbd 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbd 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbw 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbw 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbw 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwd 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwd 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwd 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwq 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwq 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwq 2
FAIL: gcc.target/i386/pr92658-avx512bw-trunc.c scan-assembler-times vpmovwb 3
FAIL: gcc.target/i386/pr92658-avx512bw-trunc.c scan-assembler-times vpmovwb 3
FAIL: gcc.target/i386/pr92658-avx512bw-trunc.c scan-assembler-times vpmovwb 3
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdb 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdb 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdb 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdw 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdw 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdw 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqb 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqb 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqb 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqd 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqd 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqd 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqw 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqw 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqw 1
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdb 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdb 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdb 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdw 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdw 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdw 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%xmm 1
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%xmm 1
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%xmm 1
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%ymm 1
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%ymm 1
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%ymm 1
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqd 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqd 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqd 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqw 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqw 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqw 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbd 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbd 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbd 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbw 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbw 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbw 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwd 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwd 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwd 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbd 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbd 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbd 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbw 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbw 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbw 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwd 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwd 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwd 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2

On Fri, Oct 7, 2022 at 10:21 PM Andre Vieira (lists) via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>
> Whilst running a bootstrap with extra options to force bitfield
> vectorization '-O2 -ftree-vectorize -ftree-loop-if-convert
> -fno-vect-cost-model' I ran into an ICE in vect-patterns where a
> bit_field_ref had a container that wasn't INTEGRAL_TYPE and had
> E_BLKmode, which meant we failed to build an integer type with the same
> size. For that reason I added a check to bail out earlier if the
> TYPE_MODE of the container is indeed E_BLKmode. The pattern for the
> bitfield inserts required no change as we currently don't support
> containers that aren't integer typed.
>
> Also changed a testcase because in BIG-ENDIAN it was not vectorizing due
> to a different size of container that wasn't supported.
>
> This passes the same bootstrap and regressions on aarch64-none-linux and
> no regressions on aarch64_be-none-elf either.
>
> I assume you are OK with these changes Richard, but I don't like to
> commit on Friday in case something breaks over the weekend, so I'll
> leave it until Monday.
>
> Thanks,
> Andre
>
> On 29/09/2022 08:54, Richard Biener wrote:
> > On Wed, Sep 28, 2022 at 7:32 PM Andre Vieira (lists) via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >> Made the change and also created the ChangeLogs.
> > OK if bootstrap / testing succeeds.
> >
> > Thanks,
> > Richard.
> >
> >> gcc/ChangeLog:
> >>
> >>           * tree-if-conv.cc (if_convertible_loop_p_1): Move ordering of
> >> loop bb's from here...
> >>           (tree_if_conversion): ... to here.  Also call bitfield lowering
> >> when appropriate.
> >>           (version_loop_for_if_conversion): Adapt to enable loop
> >> versioning when we only need
> >>           to lower bitfields.
> >>           (ifcvt_split_critical_edges): Relax condition of expected loop
> >> form as this is checked earlier.
> >>           (get_bitfield_rep): New function.
> >>           (lower_bitfield): Likewise.
> >>           (bitfields_to_lower_p): Likewise.
> >>           (need_to_lower_bitfields): New global boolean.
> >>           (need_to_ifcvt): Likewise.
> >>           * tree-vect-data-refs.cc (vect_find_stmt_data_reference):
> >> Improve diagnostic message.
> >>           * tree-vect-patterns.cc (vect_recog_temp_ssa_var): Add default
> >> value for last parameter.
> >>           (vect_recog_bitfield_ref_pattern): New.
> >>           (vect_recog_bit_insert_pattern): New.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>           * gcc.dg/vect/vect-bitfield-read-1.c: New test.
> >>           * gcc.dg/vect/vect-bitfield-read-2.c: New test.
> >>           * gcc.dg/vect/vect-bitfield-read-3.c: New test.
> >>           * gcc.dg/vect/vect-bitfield-read-4.c: New test.
> >>           * gcc.dg/vect/vect-bitfield-read-5.c: New test.
> >>           * gcc.dg/vect/vect-bitfield-read-6.c: New test.
> >>           * gcc.dg/vect/vect-bitfield-write-1.c: New test.
> >>           * gcc.dg/vect/vect-bitfield-write-2.c: New test.
> >>           * gcc.dg/vect/vect-bitfield-write-3.c: New test.
> >>           * gcc.dg/vect/vect-bitfield-write-4.c: New test.
> >>           * gcc.dg/vect/vect-bitfield-write-5.c: New test.
> >>
> >> On 28/09/2022 10:43, Andre Vieira (lists) via Gcc-patches wrote:
> >>> On 27/09/2022 13:34, Richard Biener wrote:
> >>>> On Mon, 26 Sep 2022, Andre Vieira (lists) wrote:
> >>>>
> >>>>> On 08/09/2022 12:51, Richard Biener wrote:
> >>>>>> I'm curious, why the push to redundant_ssa_names?  That could use
> >>>>>> a comment ...
> >>>>> So I purposefully left a #if 0 #else #endif in there so you can see
> >>>>> the two
> >>>>> options. But the reason I used redundant_ssa_names is because ifcvt
> >>>>> seems to
> >>>>> use that as a container for all pairs of (old, new) ssa names to
> >>>>> replace
> >>>>> later. So I just piggy backed on that. I don't know if there's a
> >>>>> specific
> >>>>> reason they do the replacement at the end? Maybe some ordering
> >>>>> issue? Either
> >>>>> way both adding it to redundant_ssa_names or doing the replacement
> >>>>> inline work
> >>>>> for the bitfield lowering (or work in my testing at least).
> >>>> Possibly because we (in the past?) inserted/copied stuff based on
> >>>> predicates generated at analysis time after we decide to elide something
> >>>> so we need to watch for later appearing uses.  But who knows ... my mind
> >>>> fails me here.
> >>>>
> >>>> If it works to replace uses immediately please do so.  But now
> >>>> I wonder why we need this - the value shouldn't change so you
> >>>> should get away with re-using the existing SSA name for the final value?
> >>> Yeah... good point. A quick change and minor testing seems to agree.
> >>> I'm sure I had a good reason to do it initially ;)
> >>>
> >>> I'll run a full-regression on this change to make sure I didn't miss
> >>> anything.
> >>>
Hongtao Liu Oct. 12, 2022, 2:11 a.m. UTC | #14
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107226

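
Before the testcases below, the matching scalar sketch for the write side
(again illustrative only and not part of the patch; the container width,
field width and bit offset are assumed), showing the shift/mask/clear/or
sequence the bit_insert pattern produces:

/* Illustrative sketch: write VAL into a 4-bit bitfield at bit offset 8 of a
   32-bit representative REP and return the value to be stored back.  */
static unsigned int
write_bitfield (unsigned int rep, int val)
{
  unsigned int mask = 0xfu << 8;			    /* shifted mask.  */
  unsigned int shifted = ((unsigned int) val << 8) & mask;  /* LSHIFT + AND.  */
  unsigned int cleared = rep & ~mask;			    /* clear target bits.  */
  return cleared | shifted;				    /* BIT_IOR_EXPR.  */
}
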
diff mbox series

Patch

diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..01cf34fb44484ca926ca5de99eef76dd99b69e92
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
@@ -0,0 +1,40 @@ 
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define ELT0 {0}
+#define ELT1 {1}
+#define ELT2 {2}
+#define ELT3 {3}
+#define N 32
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].i;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..1a4a1579c1478b9407ad21b19e8fbdca9f674b42
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
@@ -0,0 +1,43 @@ 
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 0}
+#define ELT1 {0x7FFFFFFFUL, 1}
+#define ELT2 {0x7FFFFFFFUL, 2}
+#define ELT3 {0x7FFFFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..216611a29fd8bbfbafdbdb79d790e520f44ba672
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
@@ -0,0 +1,43 @@ 
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+#include <stdbool.h>
+
+extern void abort(void);
+
+typedef struct {
+    int  c;
+    int  b;
+    bool a : 1;
+} struct_t;
+
+#define N 16
+#define ELT_F { 0xFFFFFFFF, 0xFFFFFFFF, 0 }
+#define ELT_T { 0xFFFFFFFF, 0xFFFFFFFF, 1 }
+
+struct_t vect_false[N] = { ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+struct_t vect_true[N]  = { ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+int main (void)
+{
+  unsigned ret = 0;
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_false[i].a;
+  }
+  if (ret)
+    abort ();
+
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_true[i].a;
+  }
+  if (!ret)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..5bc9c412e9616aefcbf49a4518f1603380a54b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
@@ -0,0 +1,45 @@ 
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 3, 0}
+#define ELT1 {0x7FFFFFFFUL, 3, 1}
+#define ELT2 {0x7FFFFFFFUL, 3, 2}
+#define ELT3 {0x7FFFFFFFUL, 3, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..19683d277b1ade1034496136f1d03bb2b446900f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
@@ -0,0 +1,39 @@ 
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].i = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].i != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..d550dd35ab75eb67f6e53f89fbf55b7315e50bc9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
@@ -0,0 +1,42 @@ 
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..3303d2610ff972d986be172962c129634ee64254
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
@@ -0,0 +1,43 @@ 
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 1c8e1a45234b8c3565edaacd55abbee23d8ea240..4070fa2f45970e564f13de794707613356cb5045 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -91,6 +91,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "expmed.h"
+#include "expr.h"
 #include "optabs-query.h"
 #include "gimple-pretty-print.h"
 #include "alias.h"
@@ -123,6 +124,9 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
 
+/* For lang_hooks.types.type_for_mode.  */
+#include "langhooks.h"
+
 /* Only handle PHIs with no more arguments unless we are asked to by
    simd pragma.  */
 #define MAX_PHI_ARG_NUM \
@@ -145,6 +149,12 @@  static bool need_to_rewrite_undefined;
    before phi_convertible_by_degenerating_args.  */
 static bool any_complicated_phi;
 
+/* True if we have bitfield accesses we can lower.  */
+static bool need_to_lower_bitfields;
+
+/* True if there is any ifcvting to be done.  */
+static bool need_to_ifcvt;
+
 /* Hash for struct innermost_loop_behavior.  It depends on the user to
    free the memory.  */
 
@@ -2898,18 +2908,22 @@  version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
   class loop *new_loop;
   gimple *g;
   gimple_stmt_iterator gsi;
-  unsigned int save_length;
+  unsigned int save_length = 0;
 
   g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2,
 				  build_int_cst (integer_type_node, loop->num),
 				  integer_zero_node);
   gimple_call_set_lhs (g, cond);
 
-  /* Save BB->aux around loop_version as that uses the same field.  */
-  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
-  void **saved_preds = XALLOCAVEC (void *, save_length);
-  for (unsigned i = 0; i < save_length; i++)
-    saved_preds[i] = ifc_bbs[i]->aux;
+  void **saved_preds = NULL;
+  if (any_complicated_phi || need_to_predicate)
+    {
+      /* Save BB->aux around loop_version as that uses the same field.  */
+      save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
+      saved_preds = XALLOCAVEC (void *, save_length);
+      for (unsigned i = 0; i < save_length; i++)
+	saved_preds[i] = ifc_bbs[i]->aux;
+    }
 
   initialize_original_copy_tables ();
   /* At this point we invalidate profile consistency until IFN_LOOP_VECTORIZED
@@ -2921,8 +2935,9 @@  version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
 			   profile_probability::always (), true);
   free_original_copy_tables ();
 
-  for (unsigned i = 0; i < save_length; i++)
-    ifc_bbs[i]->aux = saved_preds[i];
+  if (any_complicated_phi || need_to_predicate)
+    for (unsigned i = 0; i < save_length; i++)
+      ifc_bbs[i]->aux = saved_preds[i];
 
   if (new_loop == NULL)
     return NULL;
@@ -2998,7 +3013,7 @@  ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv)
   auto_vec<edge> critical_edges;
 
   /* Loop is not well formed.  */
-  if (num <= 2 || loop->inner || !single_exit (loop))
+  if (num <= 2 || loop->inner)
     return false;
 
   body = get_loop_body (loop);
@@ -3259,6 +3274,225 @@  ifcvt_hoist_invariants (class loop *loop, edge pe)
   free (body);
 }
 
+typedef struct
+{
+  scalar_int_mode best_mode;
+  tree struct_expr;
+  tree bf_type;
+  tree offset;
+  poly_int64 bitpos;
+  bool write;
+  gassign *stmt;
+} bitfield_data_t;
+
+/* Return TRUE if we can lower the bitfield in STMT.  Fill DATA with the
+   relevant information required to lower this bitfield.  */
+
+static bool
+get_bitfield_data (gassign *stmt, bool write, bitfield_data_t *data)
+{
+  poly_uint64 bitstart, bitend;
+  scalar_int_mode best_mode;
+  tree comp_ref = write ? gimple_get_lhs (stmt)
+			: gimple_assign_rhs1 (stmt);
+  tree struct_expr = TREE_OPERAND (comp_ref, 0);
+  tree field_decl = TREE_OPERAND (comp_ref, 1);
+  tree bf_type = TREE_TYPE (field_decl);
+  poly_int64 bitpos
+    = tree_to_poly_int64 (DECL_FIELD_BIT_OFFSET (field_decl));
+  unsigned HOST_WIDE_INT bitsize = TYPE_PRECISION (bf_type);
+  tree offset = DECL_FIELD_OFFSET (field_decl);
+  /* BITSTART and BITEND describe the region we can safely load from inside the
+     structure.  BITPOS is the bit position of the value inside the
+     representative that we will end up loading OFFSET bytes from the start
+     of the struct.  BEST_MODE is the mode describing the optimal size of the
+     representative chunk we load.  If this is a write we will store the same
+     sized representative back, after we have changed the appropriate bits.  */
+  get_bit_range (&bitstart, &bitend, comp_ref, &bitpos, &offset);
+  if (get_best_mode (bitsize, bitpos.to_constant (), bitstart, bitend,
+		     TYPE_ALIGN (TREE_TYPE (struct_expr)),
+		     INT_MAX, false, &best_mode))
+    {
+      data->best_mode = best_mode;
+      data->struct_expr = struct_expr;
+      data->bf_type = bf_type;
+      data->offset = offset;
+      data->bitpos = bitpos;
+      data->write = write;
+      data->stmt = stmt;
+      return true;
+    }
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\t\tCannot lower bitfield, could not determine"
+			  " best mode.\n");
+    }
+  return false;
+}
+
+/* Lowers the bitfield described by DATA.
+   For a write like:
+
+   struct.bf = _1;
+
+   lower to:
+
+   __ifc_1 = struct.<representative>;
+   __ifc_2 = BIT_INSERT_EXPR (__ifc_1, _1, bitpos);
+   struct.<representative> = __ifc_2;
+
+   For a read:
+
+   _1 = struct.bf;
+
+    lower to:
+
+    __ifc_1 = struct.<representative>;
+    _1 =  BIT_FIELD_REF (__ifc_1, bitsize, bitpos);
+
+    where representative is a legal load that contains the bitfield value,
+    bitsize is the size of the bitfield and bitpos the offset to the start of
+    the bitfield within the representative.  */
+
+static void
+lower_bitfield (bitfield_data_t *data)
+{
+  scalar_int_mode best_mode = data->best_mode;
+  tree struct_expr = data->struct_expr;
+  tree bf_type = data->bf_type;
+  tree offset = data->offset;
+  poly_int64 bitpos = data->bitpos;
+  bool write = data->write;
+  gassign *stmt = data->stmt;
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  /* Type of the representative.  */
+  tree rep_type
+    = lang_hooks.types.type_for_mode (best_mode, TYPE_UNSIGNED (bf_type));
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Lowering:\n");
+      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+      fprintf (dump_file, "to:\n");
+    }
+
+  tree rep_decl = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
+			      NULL_TREE, rep_type);
+  /* Start the load at BITPOS rounded down to the bitfield's alignment.  */
+  uint64_t extra_offset = bitpos.to_constant ();
+  extra_offset /= TYPE_ALIGN (bf_type);
+  extra_offset *= TYPE_ALIGN (bf_type);
+  offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset,
+			build_int_cst (TREE_TYPE (offset),
+				       extra_offset / BITS_PER_UNIT));
+  /* Adapt the BITPOS to reflect the number of bits between the start of the
+     load and the start of the bitfield value.  */
+  bitpos -= extra_offset;
+  DECL_FIELD_BIT_OFFSET (rep_decl) = build_zero_cst (bitsizetype);
+  DECL_FIELD_OFFSET (rep_decl) = offset;
+  DECL_SIZE (rep_decl) = TYPE_SIZE (rep_type);
+  DECL_CONTEXT (rep_decl) = TREE_TYPE (struct_expr);
+  tree bitpos_tree = build_int_cst (bitsizetype, bitpos);
+  /* REP_COMP_REF is a COMPONENT_REF for the representative.  */
+  tree rep_comp_ref = build3 (COMPONENT_REF, rep_type, struct_expr, rep_decl,
+			      NULL_TREE);
+  tree new_val = ifc_temp_var (rep_type, rep_comp_ref, &gsi);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+  if (write)
+    {
+      new_val = ifc_temp_var (rep_type,
+			      build3 (BIT_INSERT_EXPR, rep_type, new_val,
+				      unshare_expr (gimple_assign_rhs1 (stmt)),
+				      bitpos_tree), &gsi);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+      gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref),
+					      new_val);
+      gimple_set_vuse (new_stmt, gimple_vuse (stmt));
+      tree vdef = gimple_vdef (stmt);
+      gimple_set_vdef (new_stmt, vdef);
+      SSA_NAME_DEF_STMT (vdef) = new_stmt;
+      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+  else
+    {
+      tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val,
+			 build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),
+			 bitpos_tree);
+      new_val = ifc_temp_var (bf_type, bfr, &gsi);
+      redundant_ssa_names.safe_push (std::make_pair (gimple_get_lhs (stmt),
+						     new_val));
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+    }
+
+  gsi_remove (&gsi, true);
+}
+
+/* Return TRUE if there are bitfields to lower in this LOOP.  Fill TO_LOWER
+   with data structures representing these bitfields.  */
+
+static bool
+bitfields_to_lower_p (class loop *loop, auto_vec <bitfield_data_t *, 4> *to_lower)
+{
+  basic_block *bbs = get_loop_body (loop);
+  gimple_stmt_iterator gsi;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Analyzing loop %d for bitfields:\n", loop->num);
+    }
+
+  for (unsigned i = 0; i < loop->num_nodes; ++i)
+    {
+      basic_block bb = bbs[i];
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gassign *stmt = dyn_cast<gassign*> (gsi_stmt (gsi));
+	  if (!stmt)
+	    continue;
+
+	  tree op = gimple_get_lhs (stmt);
+	  bool write = TREE_CODE (op) == COMPONENT_REF;
+
+	  if (!write)
+	    op = gimple_assign_rhs1 (stmt);
+
+	  if (TREE_CODE (op) != COMPONENT_REF)
+	    continue;
+
+	  if (DECL_BIT_FIELD (TREE_OPERAND (op, 1)))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+
+	      bitfield_data_t *data = new bitfield_data_t ();
+	      if (get_bitfield_data (stmt, write, data))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "\tBitfield OK to lower.\n");
+		  to_lower->safe_push (data);
+		}
+	      else
+		{
+		  delete data;
+		  return false;
+		}
+	    }
+	}
+    }
+  return !to_lower->is_empty ();
+}
+
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -3269,12 +3503,15 @@  tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   unsigned int todo = 0;
   bool aggressive_if_conv;
   class loop *rloop;
+  auto_vec <bitfield_data_t *, 4> bitfields_to_lower;
   bitmap exit_bbs;
   edge pe;
 
  again:
   rloop = NULL;
   ifc_bbs = NULL;
+  need_to_lower_bitfields = false;
+  need_to_ifcvt = false;
   need_to_predicate = false;
   need_to_rewrite_undefined = false;
   any_complicated_phi = false;
@@ -3290,11 +3527,17 @@  tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	aggressive_if_conv = true;
     }
 
-  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv))
+  if (!single_exit (loop))
+    goto cleanup;
+
+  need_to_lower_bitfields = bitfields_to_lower_p (loop, &bitfields_to_lower);
+  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv)
+      && !need_to_lower_bitfields)
     goto cleanup;
 
-  if (!if_convertible_loop_p (loop)
-      || !dbg_cnt (if_conversion_tree))
+  need_to_ifcvt
+    = if_convertible_loop_p (loop) && dbg_cnt (if_conversion_tree);
+  if (!need_to_ifcvt && !need_to_lower_bitfields)
     goto cleanup;
 
   if ((need_to_predicate || any_complicated_phi)
@@ -3310,7 +3553,8 @@  tree_if_conversion (class loop *loop, vec<gimple *> *preds)
      Either version this loop, or if the pattern is right for outer-loop
      vectorization, version the outer loop.  In the latter case we will
      still if-convert the original inner loop.  */
-  if (need_to_predicate
+  if (need_to_lower_bitfields
+      || need_to_predicate
       || any_complicated_phi
       || flag_tree_loop_if_convert != 1)
     {
@@ -3350,10 +3594,32 @@  tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	pe = single_pred_edge (gimple_bb (preds->last ()));
     }
 
-  /* Now all statements are if-convertible.  Combine all the basic
-     blocks into one huge basic block doing the if-conversion
-     on-the-fly.  */
-  combine_blocks (loop);
+  if (need_to_lower_bitfields)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "-------------------------\n");
+	  fprintf (dump_file, "Start lowering bitfields\n");
+	}
+      while (!bitfields_to_lower.is_empty ())
+	{
+	  bitfield_data_t *data = bitfields_to_lower.pop ();
+	  lower_bitfield (data);
+	  delete data;
+	}
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Done lowering bitfields\n");
+	  fprintf (dump_file, "-------------------------\n");
+	}
+    }
+  if (need_to_ifcvt)
+    {
+      /* Now all statements are if-convertible.  Combine all the basic
+	 blocks into one huge basic block doing the if-conversion
+	 on-the-fly.  */
+      combine_blocks (loop);
+    }
 
   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
      and stores are involved.  CSE only the loop body, not the entry
@@ -3395,6 +3661,11 @@  tree_if_conversion (class loop *loop, vec<gimple *> *preds)
       loop = rloop;
       goto again;
     }
+  while (!bitfields_to_lower.is_empty ())
+    {
+      bitfield_data_t *data = bitfields_to_lower.pop ();
+      delete data;
+    }
 
   return todo;
 }
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index b279a82551eb70379804d405983ae5dc44b66bf5..e93cdc727da4bb7863b2ad13f29f7d550492adea 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4301,7 +4301,8 @@  vect_find_stmt_data_reference (loop_p loop, gimple *stmt,
       free_data_ref (dr);
       return opt_result::failure_at (stmt,
 				     "not vectorized:"
-				     " statement is bitfield access %G", stmt);
+				     " statement is an unsupported"
+				     " bitfield access %G", stmt);
     }
 
   if (DR_BASE_ADDRESS (dr)
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index dfbfb71b3c69a0205ccc1b287cb50fa02a70942e..435b75f860784a929041d5214d39c876c5ba790a 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -35,6 +35,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
+#include "gimplify-me.h"
 #include "cfgloop.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
@@ -1828,6 +1829,204 @@  vect_recog_widen_sum_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+/* Function vect_recog_bitfield_ref_pattern
+
+   Try to find the following pattern:
+
+   _2 = BIT_FIELD_REF (_1, bitsize, bitpos);
+   _3 = (type) _2;
+
+   where type is a non-bitfield type, that is to say, its precision matches
+   2^(TYPE_SIZE(type) - (TYPE_UNSIGNED (type) ? 1 : 2)).
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern search begins.
+   here it starts with:
+   _3 = (type) _2;
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. In this case it will be:
+   patt1 = (type) _1;
+   patt2 = patt1 >> bitpos;
+   _3 = patt2 & ((1 << bitsize) - 1);
+
+*/
+
+static gimple *
+vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+				 tree *type_out)
+{
+  gassign *nop_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!nop_stmt
+      || gimple_assign_rhs_code (nop_stmt) != NOP_EXPR
+      || TREE_CODE (gimple_assign_rhs1 (nop_stmt)) != SSA_NAME)
+    return NULL;
+
+  gassign *bf_stmt
+    = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (gimple_assign_rhs1 (nop_stmt)));
+
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
+    return NULL;
+
+  tree bf_ref = gimple_assign_rhs1 (bf_stmt);
+
+  tree load = TREE_OPERAND (bf_ref, 0);
+  tree size = TREE_OPERAND (bf_ref, 1);
+  tree offset = TREE_OPERAND (bf_ref, 2);
+
+  /* Bail out if the load is already a vector type.  */
+  if (VECTOR_TYPE_P (TREE_TYPE (load)))
+    return NULL;
+
+
+  gimple *pattern_stmt;
+  tree lhs = load;
+  tree ret_type = TREE_TYPE (gimple_get_lhs (nop_stmt));
+
+  if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL),
+			       NOP_EXPR, lhs);
+      lhs = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+
+  unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (offset);
+  if (shift_n)
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL),
+			       RSHIFT_EXPR, lhs, offset);
+      lhs = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+
+  unsigned HOST_WIDE_INT mask_i = tree_to_uhwi (size);
+  tree mask = build_int_cst (TREE_TYPE (lhs), (1ULL << mask_i) - 1);
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL),
+			   BIT_AND_EXPR, lhs, mask);
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bit_field_ref pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+/* Function vect_recog_bit_insert_pattern
+
+   Try to find the following pattern:
+
+   _3 = BIT_INSERT_EXPR (_1, _2, bitpos);
+
+   Input:
+
+   * STMT_VINFO: The stmt we want to replace.
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. In this case it will be:
+   patt1 = _2 & mask;		    // Clearing of the non-relevant bits in the
+				    // 'to-write value'.
+   patt2 = patt1 << bitpos;	    // Shift the cleaned value into place.
+   patt3 = _1 & ~(mask << bitpos);  // Clear the bits we are writing to
+				    // in the loaded value.
+   _3 = patt3 | patt2;		    // Write bits.
+
+
+   where mask = ((1 << TYPE_PRECISION (_2)) - 1), a mask to keep the number of
+   bits corresponding to the real size of the bitfield value we are writing to.
+
+*/
+
+static gimple *
+vect_recog_bit_insert_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+			       tree *type_out)
+{
+  gassign *bf_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_INSERT_EXPR)
+    return NULL;
+
+  tree load = gimple_assign_rhs1 (bf_stmt);
+  tree value = gimple_assign_rhs2 (bf_stmt);
+  tree offset = gimple_assign_rhs3 (bf_stmt);
+
+  tree bf_type = TREE_TYPE (value);
+  tree load_type = TREE_TYPE (load);
+
+  /* Bail out if the load is already of vector type.  */
+  if (VECTOR_TYPE_P (load_type))
+    return NULL;
+
+  gimple *pattern_stmt;
+
+  if (CONSTANT_CLASS_P (value))
+    value = fold_build1 (NOP_EXPR, load_type, value);
+  else
+    {
+      if (TREE_CODE (value) != SSA_NAME)
+	return NULL;
+      gassign *nop_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (value));
+      if (!nop_stmt || gimple_assign_rhs_code (nop_stmt) != NOP_EXPR)
+	return NULL;
+      if (!useless_type_conversion_p (TREE_TYPE (value), load_type))
+	{
+	  value = fold_build1 (NOP_EXPR, load_type, gimple_assign_rhs1 (nop_stmt));
+	  pattern_stmt
+	    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+				   value);
+	  value = gimple_get_lhs (pattern_stmt);
+	  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+	}
+    }
+
+  unsigned HOST_WIDE_INT mask_i = (1ULL << TYPE_PRECISION (bf_type)) - 1;
+  tree mask_t = build_int_cst (load_type, mask_i);
+  /* Clear bits we don't want to write back from value and shift it in place.  */
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			   fold_build2 (BIT_AND_EXPR, load_type, value,
+					mask_t));
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+  unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (offset);
+  if (shift_n)
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			       LSHIFT_EXPR, value, offset);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      value = gimple_get_lhs (pattern_stmt);
+    }
+  /* Mask off the bits in the loaded value.  */
+  mask_i <<= shift_n;
+  mask_i = ~mask_i;
+  mask_t = build_int_cst (load_type, mask_i);
+
+  tree lhs = vect_recog_temp_ssa_var (load_type, NULL);
+  pattern_stmt = gimple_build_assign (lhs, BIT_AND_EXPR, load, mask_t);
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+
+  /* Compose the value to write back.  */
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			   BIT_IOR_EXPR, lhs, value);
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bit_field_ref pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+
 /* Recognize cases in which an operation is performed in one type WTYPE
    but could be done more efficiently in a narrower type NTYPE.  For example,
    if we have:
@@ -5623,6 +5822,8 @@  struct vect_recog_func
   taken which means usually the more complex one needs to precede the
   less complex ones (widen_sum only after dot_prod or sad for example).  */
 static vect_recog_func vect_vect_recog_func_ptrs[] = {
+  { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
+  { vect_recog_bit_insert_pattern, "bit_insert" },
   { vect_recog_over_widening_pattern, "over_widening" },
   /* Must come after over_widening, which narrows the shift as much as
      possible beforehand.  */