
[net] enic: add wq clean up budget

Message ID 20171205191441.7288-1-gvaradar@cisco.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Govindarajulu Varadarajan Dec. 5, 2017, 7:14 p.m. UTC
In case of tx clean up, we set '-1' as the budget. Since the budget is an
unsigned int, this means clean up until the wq is empty or until
(1 << 32) - 1 pkts are cleaned. Under heavy load this will run for a long
time and cause a
"watchdog: BUG: soft lockup - CPU#25 stuck for 21s!" warning.

This patch sets wq clean up budget to 256.
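For reference, the budget variable is an unsigned int, so assigning '-1'
yields UINT_MAX. A minimal standalone illustration (not driver code):

#include <limits.h>
#include <stdio.h>

int main(void)
{
        unsigned int wq_work_to_do = -1;        /* as in enic_poll() today */

        /* -1 converts to UINT_MAX, i.e. (1ULL << 32) - 1 = 4294967295 */
        printf("%u %d\n", wq_work_to_do, wq_work_to_do == UINT_MAX);
        return 0;
}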

Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
---
 drivers/net/ethernet/cisco/enic/enic.h      | 2 ++
 drivers/net/ethernet/cisco/enic/enic_main.c | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

Comments

David Miller Dec. 6, 2017, 7:09 p.m. UTC | #1
From: Govindarajulu Varadarajan <gvaradar@cisco.com>
Date: Tue,  5 Dec 2017 11:14:41 -0800

> In case of tx clean up, we set '-1' as the budget. Since the budget is an
> unsigned int, this means clean up until the wq is empty or until
> (1 << 32) - 1 pkts are cleaned. Under heavy load this will run for a long
> time and cause a
> "watchdog: BUG: soft lockup - CPU#25 stuck for 21s!" warning.
> 
> This patch sets wq clean up budget to 256.
> 
> Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>

This driver, with all of its indirection and layers upon layers of
macros for queue processing, is so difficult to read, and this can't
be generating nice optimal code either...

Anyway, I was walking through the driver to see if the logic is
contributing to this.

The limit you are proposing sounds unnecessary; nobody else I can
see needs this, and that includes all of the most heavily used
drivers under load.

If I had to guess I'd say that the problem is that the queue loop
keeps sampling the head and tail pointers, whereas it should just
do that _once_, only process the TX entries found in that
snapshot, and return to the poll() routine immediately afterwards.
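
Roughly, that pattern would look like this (a sketch with made-up
names, not enic's actual structures):

struct tx_ring {
        unsigned int next_to_clean;             /* software tail */
        unsigned int size;                      /* ring size, power of two */
};

unsigned int read_hw_head(struct tx_ring *ring);        /* single read */
void tx_free_one(struct tx_ring *ring, unsigned int idx);

static void reclaim_tx_snapshot(struct tx_ring *ring)
{
        unsigned int head = read_hw_head(ring); /* sample _once_ */
        unsigned int tail = ring->next_to_clean;

        /* Process only the entries seen in that single snapshot. */
        while (tail != head) {
                tx_free_one(ring, tail);
                tail = (tail + 1) & (ring->size - 1);
        }
        ring->next_to_clean = tail;
        /* ...and return to the poll() routine immediately. */
}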

Thanks.
Govindarajulu Varadarajan Dec. 9, 2017, 12:14 a.m. UTC | #2
On Wed, 6 Dec 2017, David Miller wrote:

> From: Govindarajulu Varadarajan <gvaradar@cisco.com>
> Date: Tue,  5 Dec 2017 11:14:41 -0800
>
>> In case of tx clean up, we set '-1' as the budget. Since the budget is an
>> unsigned int, this means clean up until the wq is empty or until
>> (1 << 32) - 1 pkts are cleaned. Under heavy load this will run for a long
>> time and cause a
>> "watchdog: BUG: soft lockup - CPU#25 stuck for 21s!" warning.
>>
>> This patch sets wq clean up budget to 256.
>>
>> Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
>
> This driver, with all of its indirection and layers upon layers of
> macros for queue processing, is so difficult to read, and this can't
> be generating nice optimal code either...
>
> Anyway, I was walking through the driver to see if the logic is
> contributing to this.
>
> The limit you are proposing sounds unnecessary; nobody else I can
> see needs this, and that includes all of the most heavily used
> drivers under load.

I used 256 as the limit because most of the other drivers use it.

* mlx4 uses MLX4_EN_DEFAULT_TX_WORK as the tx budget in mlx4_en_process_tx_cq()
   Added in commit fbc6daf19745 ("net/mlx4_en: Ignore budget on TX napi polling")

* i40e/i40evf uses vsi->work_limit as tx budget in i40e_clean_tx_irq(), which is
   set to I40E_DEFAULT_IRQ_WORK. Added in commit
   a619afe814453 ("i40e/i40evf: Add support for bulk free in Tx cleanup")

* ixgbe uses q_vector->tx.work_limit as tx budget in ixgbe_clean_tx_irq(),
   which is set to IXGBE_DEFAULT_TX_WORK. Added in commit
   592245559e900 ("ixgbe: Change default Tx work limit size to 256 buffers")
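
The shared pattern is roughly the following (a kernel-style sketch with
illustrative names, not any one driver's exact code):

#define DEFAULT_TX_WORK 256

struct tx_ring;                                 /* opaque ring state */
bool tx_desc_done(struct tx_ring *ring);        /* next descriptor done? */
void tx_free_one(struct tx_ring *ring);         /* unmap and free one buf */

static bool clean_tx_irq(struct tx_ring *ring)
{
        unsigned int budget = DEFAULT_TX_WORK;

        while (budget && tx_desc_done(ring)) {
                tx_free_one(ring);
                budget--;
        }

        /* If the budget was exhausted, the caller leaves napi scheduled
         * and the remaining entries are reclaimed on the next poll
         * instead of looping here indefinitely.
         */
        return budget != 0;
}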

>
> If I had to guess I'd say that the problem is that the queue loop
> keeps sampling the head and tail pointers, whereas it should just
> do that _once_, only process the TX entries found in that
> snapshot, and return to the poll() routine immediately afterwards.

The only way to know the tail pointer at the time napi is scheduled is to
read the hw fetch_index register. This is discouraged by the hw engineers.

We work around this by using a color bit. Every cq entry has a color bit,
which is either 0 or 1. Hw flips the bit when it creates a new cq entry,
so every new cq entry will have a different color bit than the previous
one. We reach the end of the queue when the previous color bit is the same
as the current cq entry's color, i.e. hw did not flip the bit, so it is
not a new cq entry.

So the enic driver cannot know the tail pointer at the time napi is
scheduled; we only discover it by walking the queue until we reach it.
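
Roughly, the end-of-queue check looks like this (a sketch with made-up
field and function names, not the exact enic code):

struct cq_desc {
        /* ... completion fields ... */
        u8 type_color;                  /* top bit holds the color */
};

struct cq {
        struct cq_desc *ring;
        unsigned int to_clean;          /* index of next entry to examine */
        u8 last_color;                  /* color of the last consumed entry */
};

static bool cq_entry_is_new(struct cq *cq)
{
        u8 color = cq->ring[cq->to_clean].type_color >> 7;

        /* Hw flips the color each time it writes a new entry, so an
         * entry whose color still matches the previous one has not been
         * rewritten: we have reached the (otherwise unknown) tail.
         */
        return color != cq->last_color;
}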
Govindarajulu Varadarajan Dec. 20, 2017, 12:37 a.m. UTC | #3
On Fri, 8 Dec 2017, Govindarajulu Varadarajan wrote:

> On Wed, 6 Dec 2017, David Miller wrote:
>
>> From: Govindarajulu Varadarajan <gvaradar@cisco.com>
>> Date: Tue,  5 Dec 2017 11:14:41 -0800
>> 
>>> In case of tx clean up, we set '-1' as the budget. Since the budget is an
>>> unsigned int, this means clean up until the wq is empty or until
>>> (1 << 32) - 1 pkts are cleaned. Under heavy load this will run for a long
>>> time and cause a
>>> "watchdog: BUG: soft lockup - CPU#25 stuck for 21s!" warning.
>>> 
>>> This patch sets wq clean up budget to 256.
>>> 
>>> Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
>> 
>> This driver, with all of its indirection and layers upon layers of
>> macros for queue processing, is so difficult to read, and this can't
>> be generating nice optimal code either...
>>
>> Anyway, I was walking through the driver to see if the logic is
>> contributing to this.
>>
>> The limit you are proposing sounds unnecessary; nobody else I can
>> see needs this, and that includes all of the most heavily used
>> drivers under load.
>
> I used 256 as the limit because most of the other drivers use it.
>
> * mlx4 uses MLX4_EN_DEFAULT_TX_WORK as the tx budget in mlx4_en_process_tx_cq()
>   Added in commit fbc6daf19745 ("net/mlx4_en: Ignore budget on TX napi polling")
>
> * i40e/i40evf uses vsi->work_limit as tx budget in i40e_clean_tx_irq(), which is
>   set to I40E_DEFAULT_IRQ_WORK. Added in commit
>   a619afe814453 ("i40e/i40evf: Add support for bulk free in Tx cleanup")
>
> * ixgbe uses q_vector->tx.work_limit as tx budget in ixgbe_clean_tx_irq(),
>   which is set to IXGBE_DEFAULT_TX_WORK. Added in commit
>   592245559e900 ("ixgbe: Change default Tx work limit size to 256 buffers")
>
>> 
>> If I had to guess I'd say that the problem is that the queue loop
>> keeps sampling the head and tail pointers, whereas it should just
>> do that _once_, only process the TX entries found in that
>> snapshot, and return to the poll() routine immediately afterwards.
>
> The only way to know the tail pointer at the time napi is scheduled is to
> read the hw fetch_index register. This is discouraged by the hw engineers.
>
> We work around this by using a color bit. Every cq entry has a color bit,
> which is either 0 or 1. Hw flips the bit when it creates a new cq entry,
> so every new cq entry will have a different color bit than the previous
> one. We reach the end of the queue when the previous color bit is the same
> as the current cq entry's color, i.e. hw did not flip the bit, so it is
> not a new cq entry.
>
> So the enic driver cannot know the tail pointer at the time napi is
> scheduled; we only discover it by walking the queue until we reach it.
>

David,

How would you want us to fix this issue? Is doing an ioread on fetch_index
for every poll our only option? (to get the head and tail pointers once)

If 256 is not reasonable, would a wq budget equal to the wq ring size be
acceptable? At any point, the number of wq entries to be cleaned cannot be
more than the ring size.

Thanks
Govind
David Miller Dec. 20, 2017, 6:50 p.m. UTC | #4
From: Govindarajulu Varadarajan <gvaradar@cisco.com>
Date: Tue, 19 Dec 2017 16:37:14 -0800

> How would you want us to fix this issue? Is doing an ioread on
> fetch_index for every poll our only option? (to get the head and tail
> pointers once)
>
> If 256 is not reasonable, would a wq budget equal to the wq ring size
> be acceptable? At any point, the number of wq entries to be cleaned
> cannot be more than the ring size.

You can resubmit your original patch to increase things to 256, but
realize that I still consider this driver terrible at layering,
indirection, and how it does reclaim.

Patch

diff --git a/drivers/net/ethernet/cisco/enic/enic.h b/drivers/net/ethernet/cisco/enic/enic.h
index 6a9527004cb1..9b218f0e5a4c 100644
--- a/drivers/net/ethernet/cisco/enic/enic.h
+++ b/drivers/net/ethernet/cisco/enic/enic.h
@@ -43,6 +43,8 @@ 
 #define ENIC_CQ_MAX		(ENIC_WQ_MAX + ENIC_RQ_MAX)
 #define ENIC_INTR_MAX		(ENIC_CQ_MAX + 2)
 
+#define ENIC_WQ_NAPI_BUDGET	256
+
 #define ENIC_AIC_LARGE_PKT_DIFF	3
 
 struct enic_msix_entry {
diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index d98676e43e03..f202ba72a811 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -1500,7 +1500,7 @@  static int enic_poll(struct napi_struct *napi, int budget)
 	unsigned int cq_wq = enic_cq_wq(enic, 0);
 	unsigned int intr = enic_legacy_io_intr();
 	unsigned int rq_work_to_do = budget;
-	unsigned int wq_work_to_do = -1; /* no limit */
+	unsigned int wq_work_to_do = ENIC_WQ_NAPI_BUDGET;
 	unsigned int  work_done, rq_work_done = 0, wq_work_done;
 	int err;
 
@@ -1598,7 +1598,7 @@  static int enic_poll_msix_wq(struct napi_struct *napi, int budget)
 	struct vnic_wq *wq = &enic->wq[wq_index];
 	unsigned int cq;
 	unsigned int intr;
-	unsigned int wq_work_to_do = -1; /* clean all desc possible */
+	unsigned int wq_work_to_do = ENIC_WQ_NAPI_BUDGET;
 	unsigned int wq_work_done;
 	unsigned int wq_irq;