
[PATCHv2,0/2] forcedeth: recv cache to make NIC work steadily

Message ID 1563713633-25528-1-git-send-email-yanjun.zhu@oracle.com

Message

Zhu Yanjun July 21, 2019, 12:53 p.m. UTC
These patches address the following scenario:

"
When a host has been running for a long time, its memory becomes heavily
fragmented. The kernel may compact the fragments, but it is still often
difficult for the NIC driver to allocate memory from the kernel. The
stat_rx_dropped counter confirms that the NIC driver very frequently
fails to allocate an skb.
"

Because the NIC driver cannot allocate skbs in time, some important tasks
are not completed in time.
To avoid this, a recv cache is created to pre-allocate skbs for the NIC
driver, so that the important tasks can be completed in time.
In Nan's tests in the lab, these patches made the NIC driver work steadily.
The patches are now applied on production hosts.
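
To illustrate the idea only, here is a minimal userspace sketch of a
pre-allocated receive cache. It is not the code from these patches: the
names recv_cache, recv_cache_refill, rx_get_buffer and recv_cache_destroy,
the capacity and the buffer size are all hypothetical, malloc stands in for
skb allocation, and the "try the allocator first, then fall back to the
cache" order is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define RECV_CACHE_MAX 8    /* hypothetical capacity, chosen for the example */
#define RX_BUF_SIZE    2048 /* hypothetical receive buffer size in bytes */

/* A fixed-size stack of buffers allocated ahead of time. */
struct recv_cache {
	void *bufs[RECV_CACHE_MAX];
	size_t count;
};

/*
 * Top the cache up to capacity and return how many buffers were added.
 * In a driver this would run outside the receive hot path.
 */
static size_t recv_cache_refill(struct recv_cache *c)
{
	size_t added = 0;

	assert(c != NULL);
	while (c->count < RECV_CACHE_MAX) {
		void *buf = malloc(RX_BUF_SIZE);

		if (!buf)
			break; /* allocator is under pressure; retry later */
		c->bufs[c->count++] = buf;
		added++;
	}
	return added;
}

/*
 * Receive path: try a normal allocation first and fall back to the cache.
 * alloc_ok simulates whether the normal allocation succeeds.
 * A NULL return means the packet would be dropped.
 */
static void *rx_get_buffer(struct recv_cache *c, bool alloc_ok,
			   bool *from_cache)
{
	void *buf = alloc_ok ? malloc(RX_BUF_SIZE) : NULL;

	*from_cache = false;
	if (!buf && c->count > 0) {
		buf = c->bufs[--c->count];
		*from_cache = true;
	}
	return buf;
}

/* Release every buffer still held by the cache. */
static void recv_cache_destroy(struct recv_cache *c)
{
	while (c->count > 0)
		free(c->bufs[--c->count]);
}
```

The trade-off is visible even in this toy version: the buffers held in
the cache are unavailable to everything else for as long as the cache
exists.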

With these patches, each NIC port needs 125 MiB of reserved memory, which
cannot be used by anything else. On a host where network communication is
not essential, it is not necessary to reserve that much memory, so the recv
cache is disabled by default.

V1->V2:
1. ndelay is replaced with a call to __netdev_alloc_skb using GFP_KERNEL.
2. skb_queue_purge is used when recv cache is destroyed.
3. RECV_LIST_ALLOCATE bit is removed.
4. schedule_delayed_work is moved out of while loop.

Zhu Yanjun (2):
  forcedeth: add recv cache make nic work steadily
  forcedeth: disable recv cache by default

 drivers/net/ethernet/nvidia/Kconfig     |  11 +++
 drivers/net/ethernet/nvidia/Makefile    |   1 +
 drivers/net/ethernet/nvidia/forcedeth.c | 129 +++++++++++++++++++++++++++++++-
 3 files changed, 139 insertions(+), 2 deletions(-)

Comments

Andrew Lunn July 21, 2019, 2:53 p.m. UTC | #1
On Sun, Jul 21, 2019 at 08:53:51AM -0400, Zhu Yanjun wrote:
> These patches are to this scenario:
> 
> "
> When the host run for long time, there are a lot of memory fragments in
> the hosts. And it is possible that kernel will compact memory fragments.
> But normally it is difficult for NIC driver to allocate a memory from
> kernel. From this variable stat_rx_dropped, we can confirm that NIC driver
> can not allocate skb very frequently.
> "
> 
> Since NIC driver can not allocate skb in time, this makes some important
> tasks not be completed in time.
> To avoid it, a recv cache is created to pre-allocate skb for NIC driver.
> This can make the important tasks be completed in time.
> From Nan's tests in LAB, these patches can make NIC driver work steadily.
> Now in production hosts, these patches are applied.
> 
> With these patches, one NIC port needs 125MiB reserved. This 125MiB memory
> can not be used by others. To a host on which the communications are not
> mandatory, it is not necessary to reserve so much memory. So this recv cache
> is disabled by default.
> 
> V1->V2:
> 1. ndelay is replaced with GFP_KERNEL function __netdev_alloc_skb.
> 2. skb_queue_purge is used when recv cache is destroyed.
> 3. RECV_LIST_ALLOCATE bit is removed.
> 4. schedule_delayed_work is moved out of while loop.

Hi Zhu

You don't appear to have addressed David's comment that this is probably
the wrong way to do this, and that it should be a generic solution.

Also, that there should be enough atomic memory in the system
anyway. Have you looked at what other drivers are using atomic memory?
It could actually be you need to debug some other driver, rather than
add hacks to forcedeth.

    Andrew
David Miller July 21, 2019, 6:45 p.m. UTC | #2
I made it abundantly clear that I am completely not supportive of
changes like this.

If anything, we need to improve the behavior of the core kernel
allocators, and the mid-level networking interfaces which use them,
to fix problems like this.

It is absolutely not sustainable to have every driver implement
a cache of some sort to "improve" allocation behavior.

Sorry, there is no way in the world I'm applying changes like
these.