net/sock: don't drop udp packets if udp_mem[2] not reached

Message ID: 20200907144435.43165-1-dust.li@linux.alibaba.com
State: Changes Requested
Delegated to: David Miller
Series: net/sock: don't drop udp packets if udp_mem[2] not reached

Commit Message

Dust Li Sept. 7, 2020, 2:44 p.m. UTC
We encountered UDP packet drops under pretty low pressure
with net.ipv4.udp_mem[0] set to a small value (4096).

After some tracing and debugging, we found that for the UDP
protocol, __sk_mem_raise_allocated() may drop packets when:
  udp_mem[0] < udp_prot.memory_allocated < udp_mem[2]

That's because __sk_mem_raise_allocated() doesn't handle
the above condition for protocols like UDP that don't
implement the memory_pressure callback (so
sk_has_memory_pressure() is false).
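
For reference, here is a condensed sketch of the relevant flow in
__sk_mem_raise_allocated() (paraphrased from net/core/sock.c, not
the verbatim source):
```
	long allocated = sk_memory_allocated_add(sk, amt);

	/* under udp_mem[0]: always accept */
	if (allocated <= sk_prot_mem_limits(sk, 0))
		return 1;

	/* over udp_mem[2]: over the hard limit */
	if (allocated > sk_prot_mem_limits(sk, 2))
		goto suppress_allocation;

	/* ... per-socket minimum-buffer guarantees ... */

	if (sk_has_memory_pressure(sk)) {
		/* TCP-style fair-share check, quoted below */
	}

	/* UDP has no memory_pressure callback, so between
	 * udp_mem[0] and udp_mem[2] it can fall through to
	 * suppress_allocation, and the packet is dropped
	 */
suppress_allocation:
	...
```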

We can reproduce this under the following conditions:
1. udp_mem[0] is relatively small,
2. net.core.rmem_default/max > udp_mem[0] * 4K,
3. the UDP server receives slowly, causing udp_prot.memory_allocated
   to exceed udp_mem[0] while still staying under udp_mem[2].

I wrote a test script to reproduce this:
https://github.com/dust-li/kernel-test/blob/master/exceed_udp_mem_min_drop/exceed_udp_mem_min_drop.sh

Obviously, we should not drop packets when udp_prot.memory_allocated
merely exceeds udp_mem[0] but is still under the hard limit.

For protocols with a memory_pressure callback (like TCP), this is
not a problem, because there is an extra check:
```
  if (sk_has_memory_pressure(sk)) {
  	u64 alloc;

  	/* not under global memory pressure: accept */
  	if (!sk_under_memory_pressure(sk))
  		return 1;

  	/* under pressure: accept only while this socket's own
  	 * usage, multiplied by the number of sockets, stays
  	 * below the hard limit, i.e. the socket is within its
  	 * fair share of the total budget
  	 */
  	alloc = sk_sockets_allocated_read_positive(sk);
  	if (sk_prot_mem_limits(sk, 2) > alloc *
  			sk_mem_pages(sk->sk_wmem_queued +
  				atomic_read(&sk->sk_rmem_alloc) +
  				sk->sk_forward_alloc))
  		return 1;
  }
```
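
In other words, under pressure each socket is allowed roughly
sk_prot_mem_limits(sk, 2) / nr_sockets pages. With illustrative
numbers: for a hard limit of 183888 pages and 16 sockets, a socket
keeps passing the check while it holds fewer than
183888 / 16 = 11493 pages.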

But UDP doesn't perform this check, so I added an extra check here
to make sure UDP packets are not dropped until the hard limit
is reached.

Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
---
 net/core/sock.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Paolo Abeni Sept. 7, 2020, 5:18 p.m. UTC | #1
Hi,

On Mon, 2020-09-07 at 22:44 +0800, Dust Li wrote:
> We encountered UDP packet drops under pretty low pressure
> with net.ipv4.udp_mem[0] set to a small value (4096).
> 
> After some tracing and debugging, we found that for the UDP
> protocol, __sk_mem_raise_allocated() may drop packets when:
>   udp_mem[0] < udp_prot.memory_allocated < udp_mem[2]
> 
> That's because __sk_mem_raise_allocated() doesn't handle
> the above condition for protocols like UDP that don't
> implement the memory_pressure callback (so
> sk_has_memory_pressure() is false).
> 
> We can reproduce this under the following conditions:
> 1. udp_mem[0] is relatively small,
> 2. net.core.rmem_default/max > udp_mem[0] * 4K

This looks like something that could/should be addressed at the
configuration level?!

udp_mem[0] should comfortably accommodate at least one socket.

Cheers,

Paolo
Dust Li Sept. 8, 2020, 3:15 a.m. UTC | #2
On Mon, Sep 07, 2020 at 07:18:48PM +0200, Paolo Abeni wrote:
>Hi,
>
>On Mon, 2020-09-07 at 22:44 +0800, Dust Li wrote:
>> We encountered UDP packet drops under pretty low pressure
>> with net.ipv4.udp_mem[0] set to a small value (4096).
>> 
>> After some tracing and debugging, we found that for the UDP
>> protocol, __sk_mem_raise_allocated() may drop packets when:
>>   udp_mem[0] < udp_prot.memory_allocated < udp_mem[2]
>> 
>> That's because __sk_mem_raise_allocated() doesn't handle
>> the above condition for protocols like UDP that don't
>> implement the memory_pressure callback (so
>> sk_has_memory_pressure() is false).
>> 
>> We can reproduce this under the following conditions:
>> 1. udp_mem[0] is relatively small,
>> 2. net.core.rmem_default/max > udp_mem[0] * 4K
>
>This looks like something that could/should be addressed at the
>configuration level?!
Thanks a lot for the review!

Sorry, maybe I haven't made it clear enough.

The real problem is scalability with the number of sockets.
Since udp_mem applies to all UDP sockets, as the number of UDP
sockets grows, sooner or later udp_prot.memory_allocated will
exceed udp_mem[0], and __sk_mem_raise_allocated() will start
dropping packets here, even though the total UDP memory allocated
may still be far under udp_mem[1] or udp_mem[2].

>
>udp_mem[0] should comfortably accommodate at least one socket.

Yeah, I agree udp_mem[0] should be large enough for at least one
socket.

Here I used 4096 just to keep things simple and reproduce what we
hit before.

I changed my test program a bit:
 - with 16 server sockets
 - with 1 client sending 3000 messages (size: 4096 bytes) to each
   of those server sockets
 - set net.core.rmem_default/max to (2*4096*4096)
 - and keep udp_mem unset, which by default on my 4GB VM is
   'net.ipv4.udp_mem = 91944        122592  183888'

https://github.com/dust-li/kernel-test/blob/master/exceed_udp_mem_min_drop/multi_sockets_test.sh
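
Back-of-the-envelope (illustrative only, ignoring skb truesize
overhead and treating rmem as a hard per-socket cap): with 4KB pages,

  per-socket cap = (2*4096*4096) / 4096 = 8192 pages
  16 sockets     = 16 * 8192            = 131072 pages

which lands between udp_mem[0] = 91944 and udp_mem[2] = 183888,
exactly the window where packets get dropped.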


Actually, with more UDP sockets, I can always push the total
allocation past udp_mem[0] and trigger drops well before udp_mem[1]
and udp_mem[2] are reached.

>
>Cheers,
>
>Paolo
Paolo Abeni Sept. 8, 2020, 8:46 a.m. UTC | #3
Hi,

On Tue, 2020-09-08 at 11:15 +0800, dust.li wrote:
> Actually, with more UDP sockets, I can always push the total
> allocation past udp_mem[0] and trigger drops well before udp_mem[1]
> and udp_mem[2] are reached.

Sure, but with enough sockets you can also exceed any limit ;).

> diff --git a/net/core/sock.c b/net/core/sock.c
> index 6c5c6b18eff4..fed8211d8dbe 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2648,6 +2648,12 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
>                                  atomic_read(&sk->sk_rmem_alloc) +
>                                  sk->sk_forward_alloc))
>                         return 1;
> +       } else {
> +               /* for prots without memory_pressure callbacks, we should not
> +                * drop until hard limit reached
> +                */
> +               if (allocated <= sk_prot_mem_limits(sk, 2))
> +                       return 1;

At this point, the above condition is always true, due to an earlier
check (the hard-limit test already jumps to suppress_allocation when
allocated > udp_mem[2]). Additionally, accepting any value below
udp_mem[2] would render the earlier checks that guarantee a minimum
per-socket memory useless.

You can obtain the same result by setting udp_mem[0] = udp_mem[2],
without any kernel change.

But with this change applied, you can't guarantee a minimum
per-socket amount of memory anymore.

I think you are possibly misled by your own comment: the point is
that we should never allow allocation above the hard limit, but the
protocol is allowed to drop as soon as the memory allocated rises
above the lower limit.

Note that the current behavior is correctly documented
in Documentation/networking/ip-sysctl.rst.

Your problem must be solved in another way, e.g. by raising
udp_mem[0] and keeping udp_mem[2] above that value.

Cheers,

Paolo
Dust Li Sept. 9, 2020, 3:24 a.m. UTC | #4
On Tue, Sep 08, 2020 at 10:46:13AM +0200, Paolo Abeni wrote:
>Hi,
>
>On Tue, 2020-09-08 at 11:15 +0800, dust.li wrote:
>> Actually, with more UDP sockets, I can always push the total
>> allocation past udp_mem[0] and trigger drops well before udp_mem[1]
>> and udp_mem[2] are reached.
>
>Sure, but with enough sockets you can also exceed any limit ;).
>
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index 6c5c6b18eff4..fed8211d8dbe 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -2648,6 +2648,12 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
>>                                  atomic_read(&sk->sk_rmem_alloc) +
>>                                  sk->sk_forward_alloc))
>>                         return 1;
>> +       } else {
>> +               /* for prots without memory_pressure callbacks, we should not
>> +                * drop until hard limit reached
>> +                */
>> +               if (allocated <= sk_prot_mem_limits(sk, 2))
>> +                       return 1;
>
>At this point, the above condition is always true, due to an earlier
>check (the hard-limit test already jumps to suppress_allocation when
>allocated > udp_mem[2]). Additionally, accepting any value below
>udp_mem[2] would render the earlier checks that guarantee a minimum
>per-socket memory useless.

Thanks for your reply !

Yeah, this is wrong :O

>
>You can obtain the same result by setting udp_mem[0] = udp_mem[2],
>without any kernel change.
>
>But with this change applied, you can't guarantee a minimum
>per-socket amount of memory anymore.

Yeah, understood.


>
>I think you are possibly misled by your own comment: the point is
>that we should never allow allocation above the hard limit, but the
>protocol is allowed to drop as soon as the memory allocated rises
>above the lower limit.

Agree.


>
>Note that the current behavior is correctly documented
>in Documentation/networking/ip-sysctl.rst.
>
>Your problem must be solved in another way, e.g. by raising
>udp_mem[0] and keeping udp_mem[2] above that value.

I think this somehow doesn't quite fit what
Documentation/networking/ip-sysctl.rst says:

    min: Below this number of pages UDP is not bothered about its
    memory appetite. When amount of memory allocated by UDP exceeds
    this number, UDP starts to moderate memory usage.

It says UDP starts to moderate memory usage once it exceeds min, but
here it actually drops all packets once min is exceeded (unless a
socket is under its minimum buffer), which means udp_mem[0] is
effectively the hard limit, and udp_mem[1]/udp_mem[2] are useless.


Do you think it would be useful to move the per-socket memory check
to the common path, shared by protocols both with and without a
memory_pressure callback? Then only the sockets that consume most of
the memory would get dropped before the hard limit. Something like
this:

  u64 alloc;

  if (sk_has_memory_pressure(sk)) {

          if (!sk_under_memory_pressure(sk))
                  return 1;
  }

  /* fair-share check for all protocols */
  alloc = sk_sockets_allocated_read_positive(sk);
  if (sk_prot_mem_limits(sk, 2) > alloc *
      sk_mem_pages(sk->sk_wmem_queued +
                   atomic_read(&sk->sk_rmem_alloc) +
                   sk->sk_forward_alloc))
          return 1;
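
(Sketch only, untested: under this variant a protocol without a
memory_pressure callback would no longer drop everything once
udp_mem[0] is exceeded; instead a socket would be refused only when
its own queued memory goes above roughly udp_mem[2] / nr_sockets
pages.)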


Sorry for bothering again :)

>
>Cheers,
>
>Paolo

Patch

diff --git a/net/core/sock.c b/net/core/sock.c
index 6c5c6b18eff4..fed8211d8dbe 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2648,6 +2648,12 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
 				 atomic_read(&sk->sk_rmem_alloc) +
 				 sk->sk_forward_alloc))
 			return 1;
+	} else {
+		/* for prots without memory_pressure callbacks, we should not
+		 * drop until hard limit reached
+		 */
+		if (allocated <= sk_prot_mem_limits(sk, 2))
+			return 1;
 	}
 
 suppress_allocation: