[next-queue,1/3] i40e: Store the irq number in i40e_q_vector

Message ID 1664958703-4224-2-git-send-email-jdamato@fastly.com
State Superseded
Series i40e: Add an i40e_napi_poll tracepoint

Commit Message

Joe Damato Oct. 5, 2022, 8:31 a.m. UTC
Make it easy to figure out the IRQ number for a particular i40e_q_vector by
storing the assigned IRQ in the structure itself.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h      | 1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
 2 files changed, 2 insertions(+)

Comments

Maciej Fijalkowski Oct. 5, 2022, 10:29 a.m. UTC | #1
On Wed, Oct 05, 2022 at 01:31:41AM -0700, Joe Damato wrote:
> Make it easy to figure out the IRQ number for a particular i40e_q_vector by
> storing the assigned IRQ in the structure itself.
> 
> Signed-off-by: Joe Damato <jdamato@fastly.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e.h      | 1 +
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
> index 9926c4e..8e1f395 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e.h
> @@ -992,6 +992,7 @@ struct i40e_q_vector {
>  	struct rcu_head rcu;	/* to avoid race with update stats on free */
>  	char name[I40E_INT_NAME_STR_LEN];
>  	bool arm_wb_state;
> +	int irq_num;		/* IRQ assigned to this q_vector */

This struct looks like a mess in terms of members order. Can you check
with pahole how your patch affects the layout of it? Maybe while at it you
could pack it in a better way?

>  } ____cacheline_internodealigned_in_smp;
>  
>  /* lan device */
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 6b7535a..6efe130 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -4123,6 +4123,7 @@ static int i40e_vsi_request_irq_msix(struct i40e_vsi *vsi, char *basename)
>  		}
>  
>  		/* register for affinity change notifications */
> +		q_vector->irq_num = irq_num;
>  		q_vector->affinity_notify.notify = i40e_irq_affinity_notify;
>  		q_vector->affinity_notify.release = i40e_irq_affinity_release;
>  		irq_set_affinity_notifier(irq_num, &q_vector->affinity_notify);
> -- 
> 2.7.4
>
Joe Damato Oct. 5, 2022, 5 p.m. UTC | #2
On Wed, Oct 05, 2022 at 12:29:24PM +0200, Maciej Fijalkowski wrote:
> On Wed, Oct 05, 2022 at 01:31:41AM -0700, Joe Damato wrote:
> > Make it easy to figure out the IRQ number for a particular i40e_q_vector by
> > storing the assigned IRQ in the structure itself.
> > 
> > Signed-off-by: Joe Damato <jdamato@fastly.com>
> > ---
> >  drivers/net/ethernet/intel/i40e/i40e.h      | 1 +
> >  drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
> >  2 files changed, 2 insertions(+)
> > 
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
> > index 9926c4e..8e1f395 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e.h
> > +++ b/drivers/net/ethernet/intel/i40e/i40e.h
> > @@ -992,6 +992,7 @@ struct i40e_q_vector {
> >  	struct rcu_head rcu;	/* to avoid race with update stats on free */
> >  	char name[I40E_INT_NAME_STR_LEN];
> >  	bool arm_wb_state;
> > +	int irq_num;		/* IRQ assigned to this q_vector */
> 
> This struct looks like a mess in terms of members order. Can you check
> with pahole how your patch affects the layout of it? Maybe while at it you
> could pack it in a better way?

OK, sure. I used pahole and asked it to reorganize the struct members,
which saves 24 bytes.

I'll update this commit to include the following reorganization in the v2 of
this set:

$ pahole -R -C i40e_q_vector i40e.ko

struct i40e_q_vector {
	struct i40e_vsi *          vsi;                  /*     0     8 */
	u16                        v_idx;                /*     8     2 */
	u16                        reg_idx;              /*    10     2 */
	u8                         num_ringpairs;        /*    12     1 */
	u8                         itr_countdown;        /*    13     1 */
	bool                       arm_wb_state;         /*    14     1 */

	/* XXX 1 byte hole, try to pack */

	struct napi_struct         napi;                 /*    16   400 */
	/* --- cacheline 6 boundary (384 bytes) was 32 bytes ago --- */
	struct i40e_ring_container rx;                   /*   416    32 */
	/* --- cacheline 7 boundary (448 bytes) --- */
	struct i40e_ring_container tx;                   /*   448    32 */
	cpumask_t                  affinity_mask;        /*   480    24 */
	struct irq_affinity_notify affinity_notify;      /*   504    56 */
	/* --- cacheline 8 boundary (512 bytes) was 48 bytes ago --- */
	struct callback_head       rcu;                  /*   560    16 */
	/* --- cacheline 9 boundary (576 bytes) --- */
	char                       name[32];             /*   576    32 */

	/* XXX 4 bytes hole, try to pack */

	int                        irq_num;              /*   612     4 */

	/* size: 616, cachelines: 10, members: 14 */
	/* sum members: 611, holes: 2, sum holes: 5 */
	/* last cacheline: 40 bytes */
};   /* saved 24 bytes! */
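
For readers who have not used pahole's reorganize mode: the saving comes from grouping members so that fewer padding holes are needed. The following is only a minimal userspace sketch of that effect. The two structs and the print_sizes() helper are hypothetical and are not taken from the i40e driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical struct: small members interleaved with pointers,
 * so the compiler has to pad before each pointer. */
struct scattered {
	uint8_t  a;
	void    *p;
	uint16_t b;
	void    *q;
	uint8_t  c;
};

/* Same members, largest first: the small ones now sit together. */
struct grouped {
	void    *p;
	void    *q;
	uint16_t b;
	uint8_t  a;
	uint8_t  c;
};

/* Print both sizes; the exact values depend on the target ABI. */
void print_sizes(void)
{
	printf("scattered: %zu bytes\n", sizeof(struct scattered));
	printf("grouped:   %zu bytes\n", sizeof(struct grouped));
}
```

Calling print_sizes() from a small test program should show the grouped variant as the smaller one. Note that this only measures size; it says nothing about which members are accessed together.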
Jesse Brandeburg Oct. 5, 2022, 6:25 p.m. UTC | #3
On 10/5/2022 10:00 AM, Joe Damato wrote:
> On Wed, Oct 05, 2022 at 12:29:24PM +0200, Maciej Fijalkowski wrote:
>> On Wed, Oct 05, 2022 at 01:31:41AM -0700, Joe Damato wrote:
>>> Make it easy to figure out the IRQ number for a particular i40e_q_vector by
>>> storing the assigned IRQ in the structure itself.
>>>
>>> Signed-off-by: Joe Damato <jdamato@fastly.com>
>>> ---
>>>   drivers/net/ethernet/intel/i40e/i40e.h      | 1 +
>>>   drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
>>>   2 files changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
>>> index 9926c4e..8e1f395 100644
>>> --- a/drivers/net/ethernet/intel/i40e/i40e.h
>>> +++ b/drivers/net/ethernet/intel/i40e/i40e.h
>>> @@ -992,6 +992,7 @@ struct i40e_q_vector {
>>>   	struct rcu_head rcu;	/* to avoid race with update stats on free */
>>>   	char name[I40E_INT_NAME_STR_LEN];
>>>   	bool arm_wb_state;
>>> +	int irq_num;		/* IRQ assigned to this q_vector */
>>
>> This struct looks like a mess in terms of members order. Can you check
>> with pahole how your patch affects the layout of it? Maybe while at it you
>> could pack it in a better way?
> 
> OK, sure. I used pahole and asked it to reorganize the struct members,
> which saves 24 bytes.

Hi Joe, thanks for your patches,

Saving 24 bytes is admirable, but these structures are generally
optimized in access pattern order (most used at the top) and not so much
for "packing efficiency", especially since it has that alignment
directive at the bottom which causes each struct to start at its own
cacheline anyway.
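
To make the alignment point concrete, here is a minimal userspace sketch using the sizes from the pahole output in this thread (616 bytes after reorganizing, so 640 before). It assumes a 64-byte alignment, which is an assumption for illustration only since the value behind ____cacheline_internodealigned_in_smp depends on the kernel configuration. The struct names and print_aligned_sizes() are hypothetical.

```c
#include <stdio.h>

#define ASSUMED_ALIGN 64 /* assumption: alignment in bytes, illustration only */

/* 640 bytes of members: stands in for the layout before reorganizing. */
struct original {
	_Alignas(ASSUMED_ALIGN) char bytes[640];
};

/* 616 bytes of members: stands in for the reorganized layout. */
struct reorganized {
	_Alignas(ASSUMED_ALIGN) char bytes[616];
};

/* sizeof() is always a multiple of the alignment, so the 616-byte
 * variant is rounded back up to the next multiple of 64. */
void print_aligned_sizes(void)
{
	printf("original:    %zu bytes\n", sizeof(struct original));
	printf("reorganized: %zu bytes\n", sizeof(struct reorganized));
}
```

Under that assumption both structs occupy 640 bytes, so the 24 bytes reclaimed from holes end up as tail padding instead.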


> 
> I'll update this commit to include the following reorganization in the v2 of
> this set:
> 
> $ pahole -R -C i40e_q_vector i40e.ko
> 
> struct i40e_q_vector {
> 	struct i40e_vsi *          vsi;                  /*     0     8 */
> 	u16                        v_idx;                /*     8     2 */
> 	u16                        reg_idx;              /*    10     2 */
> 	u8                         num_ringpairs;        /*    12     1 */
> 	u8                         itr_countdown;        /*    13     1 */
> 	bool                       arm_wb_state;         /*    14     1 */
> 
> 	/* XXX 1 byte hole, try to pack */
> 
> 	struct napi_struct         napi;                 /*    16   400 */
> 	/* --- cacheline 6 boundary (384 bytes) was 32 bytes ago --- */
> 	struct i40e_ring_container rx;                   /*   416    32 */
> 	/* --- cacheline 7 boundary (448 bytes) --- */
> 	struct i40e_ring_container tx;                   /*   448    32 */
> 	cpumask_t                  affinity_mask;        /*   480    24 */
> 	struct irq_affinity_notify affinity_notify;      /*   504    56 */
> 	/* --- cacheline 8 boundary (512 bytes) was 48 bytes ago --- */
> 	struct callback_head       rcu;                  /*   560    16 */
> 	/* --- cacheline 9 boundary (576 bytes) --- */
> 	char                       name[32];             /*   576    32 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	int                        irq_num;              /*   612     4 */

The right spot for this debug item is at the end of the struct, so that 
part is good.

> 
> 	/* size: 616, cachelines: 10, members: 14 */
> 	/* sum members: 611, holes: 2, sum holes: 5 */
> 	/* last cacheline: 40 bytes */
> };   /* saved 24 bytes! */

I'd prefer it if you don't do two things at once in a single patch (add 
members / reorganize).

I know Maciej said this is a mess and I kind of agree with him, but I'm 
not sure it's a priority for your patch set to fix it now, especially 
since you're trying to add a debugging assist, and not performance 
tuning the code.

If you're really wanting to reorganize these structs I'd prefer a bit
more diligent effort to prove no inadvertent side effects (like maybe by
turning up the interrupt rate and looking at perf data while receiving
512 byte packets). The rate should remain the same (or better) and the
number of cache misses on these structs should remain roughly the same.
Maybe a separate patch series?

Jesse
Joe Damato Oct. 5, 2022, 6:40 p.m. UTC | #4
On Wed, Oct 05, 2022 at 11:25:32AM -0700, Jesse Brandeburg wrote:
> On 10/5/2022 10:00 AM, Joe Damato wrote:
> >On Wed, Oct 05, 2022 at 12:29:24PM +0200, Maciej Fijalkowski wrote:
> >>On Wed, Oct 05, 2022 at 01:31:41AM -0700, Joe Damato wrote:
> >>>Make it easy to figure out the IRQ number for a particular i40e_q_vector by
> >>>storing the assigned IRQ in the structure itself.
> >>>
> >>>Signed-off-by: Joe Damato <jdamato@fastly.com>
> >>>---
> >>>  drivers/net/ethernet/intel/i40e/i40e.h      | 1 +
> >>>  drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
> >>>  2 files changed, 2 insertions(+)
> >>>
> >>>diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
> >>>index 9926c4e..8e1f395 100644
> >>>--- a/drivers/net/ethernet/intel/i40e/i40e.h
> >>>+++ b/drivers/net/ethernet/intel/i40e/i40e.h
> >>>@@ -992,6 +992,7 @@ struct i40e_q_vector {
> >>>  	struct rcu_head rcu;	/* to avoid race with update stats on free */
> >>>  	char name[I40E_INT_NAME_STR_LEN];
> >>>  	bool arm_wb_state;
> >>>+	int irq_num;		/* IRQ assigned to this q_vector */
> >>
> >>This struct looks like a mess in terms of members order. Can you check
> >>with pahole how your patch affects the layout of it? Maybe while at it you
> >>could pack it in a better way?
> >
> >OK, sure. I used pahole and asked it to reorganize the struct members,
> >which saves 24 bytes.
> 
> Hi Joe, thanks for your patches,
> 
> Saving 24 bytes is admirable, but these structures are generally optimized
> in access pattern order (most used at the top) and not so much for "packing
> efficiency", especially since it has that alignment directive at the bottom
> which causes each struct to start at its own cacheline anyway.
> 
> 
> >
> >I'll update this commit to include the following reorganization in the v2 of
> >this set:
> >
> >$ pahole -R -C i40e_q_vector i40e.ko
> >
> >struct i40e_q_vector {
> >	struct i40e_vsi *          vsi;                  /*     0     8 */
> >	u16                        v_idx;                /*     8     2 */
> >	u16                        reg_idx;              /*    10     2 */
> >	u8                         num_ringpairs;        /*    12     1 */
> >	u8                         itr_countdown;        /*    13     1 */
> >	bool                       arm_wb_state;         /*    14     1 */
> >
> >	/* XXX 1 byte hole, try to pack */
> >
> >	struct napi_struct         napi;                 /*    16   400 */
> >	/* --- cacheline 6 boundary (384 bytes) was 32 bytes ago --- */
> >	struct i40e_ring_container rx;                   /*   416    32 */
> >	/* --- cacheline 7 boundary (448 bytes) --- */
> >	struct i40e_ring_container tx;                   /*   448    32 */
> >	cpumask_t                  affinity_mask;        /*   480    24 */
> >	struct irq_affinity_notify affinity_notify;      /*   504    56 */
> >	/* --- cacheline 8 boundary (512 bytes) was 48 bytes ago --- */
> >	struct callback_head       rcu;                  /*   560    16 */
> >	/* --- cacheline 9 boundary (576 bytes) --- */
> >	char                       name[32];             /*   576    32 */
> >
> >	/* XXX 4 bytes hole, try to pack */
> >
> >	int                        irq_num;              /*   612     4 */
> 
> The right spot for this debug item is at the end of the struct, so that part
> is good.
> 
> >
> >	/* size: 616, cachelines: 10, members: 14 */
> >	/* sum members: 611, holes: 2, sum holes: 5 */
> >	/* last cacheline: 40 bytes */
> >};   /* saved 24 bytes! */
> 
> I'd prefer it if you don't do two things at once in a single patch (add
> members / reorganize).
> 
> I know Maciej said this is a mess and I kind of agree with him, but I'm not
> sure it's a priority for your patch set to fix it now, especially since
> you're trying to add a debugging assist, and not performance tuning the
> code.
> 
> If you're really wanting to reorganize these structs I'd prefer a bit more
> diligent effort to prove no inadvertent side effects (like maybe by turning
> up the interrupt rate and looking at perf data while receiving 512 byte
> packets). The rate should remain the same (or better) and the number of cache
> misses on these structs should remain roughly the same. Maybe a separate
> patch series?

I honestly did think that reorganizing the struct was probably out of scope
of this change, so if you agree, I'll drop this change from the v2 and
keep the original, which adds irq_num to the end of the struct.
Jesse Brandeburg Oct. 5, 2022, 7:37 p.m. UTC | #5
On 10/5/2022 11:40 AM, Joe Damato wrote:

>> If you're really wanting to reorganize these structs I'd prefer a bit more
>> diligent effort to prove no inadvertent side effects (like maybe by turning
>> up the interrupt rate and looking at perf data while receiving 512 byte
>> packets). The rate should remain the same (or better) and the number of cache
>> misses on these structs should remain roughly the same. Maybe a separate
>> patch series?
> 
> I honestly did think that reorganizing the struct was probably out of scope
> of this change, so if you agree, I'll drop this change from the v2 and
> keep the original, which adds irq_num to the end of the struct.

I agree, especially in these routines, doing simple, 
explainable/observable changes is best.
Maciej Fijalkowski Oct. 6, 2022, 1:06 p.m. UTC | #6
On Wed, Oct 05, 2022 at 12:37:19PM -0700, Jesse Brandeburg wrote:
> On 10/5/2022 11:40 AM, Joe Damato wrote:
> 
> > > If you're really wanting to reorganize these structs I'd prefer a bit more
> > > diligent effort to prove no inadvertent side effects (like maybe by turning
> > > up the interrupt rate and looking at perf data while receiving 512 byte
> > > packets). The rate should remain the same (or better) and the number of cache
> > > misses on these structs should remain roughly the same. Maybe a separate
> > > patch series?
> > 
> > I honestly did think that reorganizing the struct was probably out of scope
> > of this change, so if you agree, I'll drop this change from the v2 and
> > keep the original, which adds irq_num to the end of the struct.
> 
> I agree, especially in these routines, doing simple, explainable/observable
> changes is best.

Jesse, I recall now that the ice driver has this same odd q_vector struct
layout. Maybe we should document it somewhere/somehow (or even explain it)
to avoid people touching it? I believe I'm not the only one who would have
such a knee-jerk reaction to try to pack it.
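
One possible way to document such a layout, sketched here purely as an illustration, is a comment block on the struct plus compile-time checks that fail if someone reorders the members. The struct below is a hypothetical stand-in, not the real i40e or ice definition, and the 64-byte line size is an assumption. In-kernel code would more likely use the kernel's own static_assert() or BUILD_BUG_ON() helpers than the plain C11 keyword.

```c
#include <stddef.h>
#include <stdint.h>

#define ASSUMED_CACHELINE 64 /* assumption: line size in bytes */

/*
 * Hypothetical stand-in for a queue-vector struct.
 *
 * Layout rule being documented: members are ordered by access
 * pattern (hot members first), not by packing efficiency.
 * Do not reorder without measuring.
 */
struct demo_q_vector {
	/* hot: touched on every poll */
	void    *vsi;
	uint16_t v_idx;
	uint16_t reg_idx;
	uint8_t  num_ringpairs;

	/* cold: setup and debug only, kept at the end */
	char     name[32];
	int      irq_num;
};

/* Compile-time guards: building fails if the documented order changes. */
_Static_assert(offsetof(struct demo_q_vector, vsi) == 0,
	       "hot members must start the struct");
_Static_assert(offsetof(struct demo_q_vector, num_ringpairs) < ASSUMED_CACHELINE,
	       "hot members must stay in the first cacheline");
_Static_assert(offsetof(struct demo_q_vector, irq_num) >
	       offsetof(struct demo_q_vector, name),
	       "debug members stay last");
```

The design choice here is that the comment explains the intent while the assertions enforce it, so a well-meaning repacking patch fails at build time rather than in review.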

Patch

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 9926c4e..8e1f395 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -992,6 +992,7 @@  struct i40e_q_vector {
 	struct rcu_head rcu;	/* to avoid race with update stats on free */
 	char name[I40E_INT_NAME_STR_LEN];
 	bool arm_wb_state;
+	int irq_num;		/* IRQ assigned to this q_vector */
 } ____cacheline_internodealigned_in_smp;
 
 /* lan device */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 6b7535a..6efe130 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -4123,6 +4123,7 @@  static int i40e_vsi_request_irq_msix(struct i40e_vsi *vsi, char *basename)
 		}
 
 		/* register for affinity change notifications */
+		q_vector->irq_num = irq_num;
 		q_vector->affinity_notify.notify = i40e_irq_affinity_notify;
 		q_vector->affinity_notify.release = i40e_irq_affinity_release;
 		irq_set_affinity_notifier(irq_num, &q_vector->affinity_notify);