diff mbox series

[ovs-dev,3/3] util: munlockall() and retry when locked memory allocation fails

Message ID 20210326183024.3214527-3-blp@ovn.org
State Rejected
Headers show
Series [ovs-dev,1/3] ovs-lldp: Get rid of pointless null pointer check. | expand

Commit Message

Ben Pfaff March 26, 2021, 6:30 p.m. UTC
From: James Page <james.page@ubuntu.com>

Normally when OVS runs on a server the effective system limit for
locked memory is unlimited as the ovs-vswitchd runs as the root
user in the root namespace of the server.

When OVS is run in a non-priviledged container a system limit will
apply and its possible that this limit will be reached during
normal operation.

If this occurs unlock all memory and re-attempt (re)allocation of
memory rather than fail and abort.

If this subsequent attempt also fails abort the process as before.

Signed-off-by: James Page <james.page@ubuntu.com>
[blp@ovn.org adapted to cover more cases]
Signed-off-by: Ben Pfaff <blp@ovn.org>
---
 lib/stream-ssl.c |  4 ++--
 lib/util.c       | 40 +++++++++++++++++++++++++++++++---------
 lib/util.h       |  3 ++-
 3 files changed, 35 insertions(+), 12 deletions(-)

Comments

Ilya Maximets March 26, 2021, 6:52 p.m. UTC | #1
On 3/26/21 7:30 PM, Ben Pfaff wrote:
> From: James Page <james.page@ubuntu.com>
> 
> Normally when OVS runs on a server the effective system limit for
> locked memory is unlimited as the ovs-vswitchd runs as the root
> user in the root namespace of the server.
> 
> When OVS is run in a non-priviledged container a system limit will
> apply and its possible that this limit will be reached during
> normal operation.
> 
> If this occurs unlock all memory and re-attempt (re)allocation of
> memory rather than fail and abort.
> 
> If this subsequent attempt also fails abort the process as before.
> 
> Signed-off-by: James Page <james.page@ubuntu.com>
> [blp@ovn.org adapted to cover more cases]
> Signed-off-by: Ben Pfaff <blp@ovn.org>
> ---

Hi, Ben and James.

I'll repeat here what I said in a comment on github.

DPDK uses memory locking for mempool pages and it relies on these
pages not being moved or unloaded, because it uses physical addresses
to configure hardware NICs.  Also, some DPDK drivers are using memory
locking to lock in memory pages that are used for driver-specific
device I/O.  Unlocking any of these pages in runtime will likely crash
the system by NMI if such a page will be moved or swapped without
notifying the device and driver.

Blindly unlocking all the memory is very dangerous.

I'd say it's much more safe to just run OVS without passing --mlockall.
Or we need to itroduce a special cmdline flag to allow dynamic unlocking
and also forbid DPDK initialization if this flag enabled.
And all this should be sufficiently documented.

Best regards, Ilya Maximets.
0-day Robot March 26, 2021, 7:02 p.m. UTC | #2
Bleep bloop.  Greetings Ben Pfaff, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Ben Pfaff <blp@ovn.org>
ERROR: C99 style comment
#88 FILE: lib/util.c:132:
#endif // HAVE_MLOCKALL

ERROR: Use xrealloc() in place of realloc()
#119 FILE: lib/util.c:167:
    void *rp = realloc(p, size ? size : 1);

Lines checked: 156, Warnings: 1, Errors: 2


Please check this out.  If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
Ben Pfaff March 26, 2021, 8:18 p.m. UTC | #3
On Fri, Mar 26, 2021 at 07:52:18PM +0100, Ilya Maximets wrote:
> On 3/26/21 7:30 PM, Ben Pfaff wrote:
> > From: James Page <james.page@ubuntu.com>
> > 
> > Normally when OVS runs on a server the effective system limit for
> > locked memory is unlimited as the ovs-vswitchd runs as the root
> > user in the root namespace of the server.
> > 
> > When OVS is run in a non-priviledged container a system limit will
> > apply and its possible that this limit will be reached during
> > normal operation.
> > 
> > If this occurs unlock all memory and re-attempt (re)allocation of
> > memory rather than fail and abort.
> > 
> > If this subsequent attempt also fails abort the process as before.
> > 
> > Signed-off-by: James Page <james.page@ubuntu.com>
> > [blp@ovn.org adapted to cover more cases]
> > Signed-off-by: Ben Pfaff <blp@ovn.org>
> > ---
> 
> Hi, Ben and James.
> 
> I'll repeat here what I said in a comment on github.
> 
> DPDK uses memory locking for mempool pages and it relies on these
> pages not being moved or unloaded, because it uses physical addresses
> to configure hardware NICs.  Also, some DPDK drivers are using memory
> locking to lock in memory pages that are used for driver-specific
> device I/O.  Unlocking any of these pages in runtime will likely crash
> the system by NMI if such a page will be moved or swapped without
> notifying the device and driver.
> 
> Blindly unlocking all the memory is very dangerous.
> 
> I'd say it's much more safe to just run OVS without passing --mlockall.
> Or we need to itroduce a special cmdline flag to allow dynamic unlocking
> and also forbid DPDK initialization if this flag enabled.
> And all this should be sufficiently documented.

I don't have a dog in this fight.  I'll leave it to you and James to
figure out the preferred behavior.

Patches 1 and 2 may be of interest either way.
Ilya Maximets April 7, 2021, 11:19 a.m. UTC | #4
On 3/26/21 9:18 PM, Ben Pfaff wrote:
> On Fri, Mar 26, 2021 at 07:52:18PM +0100, Ilya Maximets wrote:
>> On 3/26/21 7:30 PM, Ben Pfaff wrote:
>>> From: James Page <james.page@ubuntu.com>
>>>
>>> Normally when OVS runs on a server the effective system limit for
>>> locked memory is unlimited as the ovs-vswitchd runs as the root
>>> user in the root namespace of the server.
>>>
>>> When OVS is run in a non-priviledged container a system limit will
>>> apply and its possible that this limit will be reached during
>>> normal operation.
>>>
>>> If this occurs unlock all memory and re-attempt (re)allocation of
>>> memory rather than fail and abort.
>>>
>>> If this subsequent attempt also fails abort the process as before.
>>>
>>> Signed-off-by: James Page <james.page@ubuntu.com>
>>> [blp@ovn.org adapted to cover more cases]
>>> Signed-off-by: Ben Pfaff <blp@ovn.org>
>>> ---
>>
>> Hi, Ben and James.
>>
>> I'll repeat here what I said in a comment on github.
>>
>> DPDK uses memory locking for mempool pages and it relies on these
>> pages not being moved or unloaded, because it uses physical addresses
>> to configure hardware NICs.  Also, some DPDK drivers are using memory
>> locking to lock in memory pages that are used for driver-specific
>> device I/O.  Unlocking any of these pages in runtime will likely crash
>> the system by NMI if such a page will be moved or swapped without
>> notifying the device and driver.
>>
>> Blindly unlocking all the memory is very dangerous.
>>
>> I'd say it's much more safe to just run OVS without passing --mlockall.
>> Or we need to itroduce a special cmdline flag to allow dynamic unlocking
>> and also forbid DPDK initialization if this flag enabled.
>> And all this should be sufficiently documented.
> 
> I don't have a dog in this fight.  I'll leave it to you and James to
> figure out the preferred behavior.

Sure.  AFAIK, container management systems like kubernetes are tracking
memory consumption and doesn't schedule/create new containers if system
(kubelet daemon on the node) reports memory pressure.  So, it should not
be a big concern to run OVS without memory locking.

In its current form change is not acceptable as it will cause problems
for DPDK users (and, probably, afxdp, but I'm not sure about that).

> 
> Patches 1 and 2 may be of interest either way.
> 

Replied to these patches.

Best regards, Ilya Maximets.
Ben Pfaff April 7, 2021, 4:45 p.m. UTC | #5
On Wed, Apr 07, 2021 at 01:19:09PM +0200, Ilya Maximets wrote:
> On 3/26/21 9:18 PM, Ben Pfaff wrote:
> > On Fri, Mar 26, 2021 at 07:52:18PM +0100, Ilya Maximets wrote:
> >> On 3/26/21 7:30 PM, Ben Pfaff wrote:
> >>> From: James Page <james.page@ubuntu.com>
> >>>
> >>> Normally when OVS runs on a server the effective system limit for
> >>> locked memory is unlimited as the ovs-vswitchd runs as the root
> >>> user in the root namespace of the server.
> >>>
> >>> When OVS is run in a non-priviledged container a system limit will
> >>> apply and its possible that this limit will be reached during
> >>> normal operation.
> >>>
> >>> If this occurs unlock all memory and re-attempt (re)allocation of
> >>> memory rather than fail and abort.
> >>>
> >>> If this subsequent attempt also fails abort the process as before.
> >>>
> >>> Signed-off-by: James Page <james.page@ubuntu.com>
> >>> [blp@ovn.org adapted to cover more cases]
> >>> Signed-off-by: Ben Pfaff <blp@ovn.org>
> >>> ---
> >>
> >> Hi, Ben and James.
> >>
> >> I'll repeat here what I said in a comment on github.
> >>
> >> DPDK uses memory locking for mempool pages and it relies on these
> >> pages not being moved or unloaded, because it uses physical addresses
> >> to configure hardware NICs.  Also, some DPDK drivers are using memory
> >> locking to lock in memory pages that are used for driver-specific
> >> device I/O.  Unlocking any of these pages in runtime will likely crash
> >> the system by NMI if such a page will be moved or swapped without
> >> notifying the device and driver.
> >>
> >> Blindly unlocking all the memory is very dangerous.
> >>
> >> I'd say it's much more safe to just run OVS without passing --mlockall.
> >> Or we need to itroduce a special cmdline flag to allow dynamic unlocking
> >> and also forbid DPDK initialization if this flag enabled.
> >> And all this should be sufficiently documented.
> > 
> > I don't have a dog in this fight.  I'll leave it to you and James to
> > figure out the preferred behavior.
> 
> Sure.  AFAIK, container management systems like kubernetes are tracking
> memory consumption and doesn't schedule/create new containers if system
> (kubelet daemon on the node) reports memory pressure.  So, it should not
> be a big concern to run OVS without memory locking.
> 
> In its current form change is not acceptable as it will cause problems
> for DPDK users (and, probably, afxdp, but I'm not sure about that).

Consider this dropped.
James Page April 26, 2021, 9:56 a.m. UTC | #6
On Wed, Apr 7, 2021 at 5:45 PM Ben Pfaff <blp@ovn.org> wrote:

> On Wed, Apr 07, 2021 at 01:19:09PM +0200, Ilya Maximets wrote:
> > On 3/26/21 9:18 PM, Ben Pfaff wrote:
> > > On Fri, Mar 26, 2021 at 07:52:18PM +0100, Ilya Maximets wrote:
> > >> On 3/26/21 7:30 PM, Ben Pfaff wrote:
> > >>> From: James Page <james.page@ubuntu.com>
> > >>>
> > >>> Normally when OVS runs on a server the effective system limit for
> > >>> locked memory is unlimited as the ovs-vswitchd runs as the root
> > >>> user in the root namespace of the server.
> > >>>
> > >>> When OVS is run in a non-priviledged container a system limit will
> > >>> apply and its possible that this limit will be reached during
> > >>> normal operation.
> > >>>
> > >>> If this occurs unlock all memory and re-attempt (re)allocation of
> > >>> memory rather than fail and abort.
> > >>>
> > >>> If this subsequent attempt also fails abort the process as before.
> > >>>
> > >>> Signed-off-by: James Page <james.page@ubuntu.com>
> > >>> [blp@ovn.org adapted to cover more cases]
> > >>> Signed-off-by: Ben Pfaff <blp@ovn.org>
> > >>> ---
> > >>
> > >> Hi, Ben and James.
> > >>
> > >> I'll repeat here what I said in a comment on github.
> > >>
> > >> DPDK uses memory locking for mempool pages and it relies on these
> > >> pages not being moved or unloaded, because it uses physical addresses
> > >> to configure hardware NICs.  Also, some DPDK drivers are using memory
> > >> locking to lock in memory pages that are used for driver-specific
> > >> device I/O.  Unlocking any of these pages in runtime will likely crash
> > >> the system by NMI if such a page will be moved or swapped without
> > >> notifying the device and driver.
> > >>
> > >> Blindly unlocking all the memory is very dangerous.
> > >>
> > >> I'd say it's much more safe to just run OVS without passing
> --mlockall.
> > >> Or we need to itroduce a special cmdline flag to allow dynamic
> unlocking
> > >> and also forbid DPDK initialization if this flag enabled.
> > >> And all this should be sufficiently documented.
> > >
> > > I don't have a dog in this fight.  I'll leave it to you and James to
> > > figure out the preferred behavior.
> >
> > Sure.  AFAIK, container management systems like kubernetes are tracking
> > memory consumption and doesn't schedule/create new containers if system
> > (kubelet daemon on the node) reports memory pressure.  So, it should not
> > be a big concern to run OVS without memory locking.
> >
> > In its current form change is not acceptable as it will cause problems
> > for DPDK users (and, probably, afxdp, but I'm not sure about that).
>
> Consider this dropped.
>

Apologies for the laggy response - email woes now resolved.

I'll raise a review to simply detect execution in a container and stop
passing mlockall in this combo
diff mbox series

Patch

diff --git a/lib/stream-ssl.c b/lib/stream-ssl.c
index 078fcbc3aa4a..cd906e9e7b1d 100644
--- a/lib/stream-ssl.c
+++ b/lib/stream-ssl.c
@@ -1096,10 +1096,10 @@  tmp_dh_callback(SSL *ssl OVS_UNUSED, int is_export OVS_UNUSED, int keylength)
 
     for (dh = dh_table; dh < &dh_table[ARRAY_SIZE(dh_table)]; dh++) {
         if (dh->keylength == keylength) {
-            if (!dh->dh) {
+            while (!dh->dh) {
                 dh->dh = dh->constructor();
                 if (!dh->dh) {
-                    out_of_memory();
+                    munlockall_or_out_of_memory();
                 }
             }
             return dh->dh;
diff --git a/lib/util.c b/lib/util.c
index 1195c7982118..e9c5c40094bc 100644
--- a/lib/util.c
+++ b/lib/util.c
@@ -41,6 +41,9 @@ 
 #ifdef _WIN32
 #include <shlwapi.h>
 #endif
+#ifdef HAVE_MLOCKALL
+#include <sys/mman.h>
+#endif
 
 VLOG_DEFINE_THIS_MODULE(util);
 
@@ -103,6 +106,12 @@  set_memory_locked(void)
     is_memory_locked = true;
 }
 
+void
+set_memory_unlocked(void)
+{
+    is_memory_locked = false;
+}
+
 bool
 memory_locked(void)
 {
@@ -110,8 +119,17 @@  memory_locked(void)
 }
 
 void
-out_of_memory(void)
-{
+munlockall_or_out_of_memory(void)
+{
+#ifdef HAVE_MLOCKALL
+    if (errno == EAGAIN && memory_locked()) {
+        /* Locked memory allocation failed
+         * unlock and try again */
+        munlockall();
+        set_memory_unlocked();
+        return;
+    }
+#endif // HAVE_MLOCKALL
     ovs_abort(0, "virtual memory exhausted");
 }
 
@@ -120,7 +138,8 @@  xcalloc__(size_t count, size_t size)
 {
     void *p = count && size ? calloc(count, size) : malloc(1);
     if (p == NULL) {
-        out_of_memory();
+        munlockall_or_out_of_memory();
+        return xcalloc__(count, size);
     }
     return p;
 }
@@ -136,7 +155,8 @@  xmalloc__(size_t size)
 {
     void *p = malloc(size ? size : 1);
     if (p == NULL) {
-        out_of_memory();
+        munlockall_or_out_of_memory();
+        return xmalloc__(size);
     }
     return p;
 }
@@ -144,11 +164,12 @@  xmalloc__(size_t size)
 void *
 xrealloc__(void *p, size_t size)
 {
-    p = realloc(p, size ? size : 1);
-    if (p == NULL) {
-        out_of_memory();
+    void *rp = realloc(p, size ? size : 1);
+    if (rp == NULL) {
+        munlockall_or_out_of_memory();
+        return xrealloc__(p, size);
     }
-    return p;
+    return rp;
 }
 
 void *
@@ -253,7 +274,8 @@  xmalloc_size_align(size_t size, size_t alignment)
     COVERAGE_INC(util_xalloc);
     error = posix_memalign(&p, alignment, size ? size : 1);
     if (error != 0) {
-        out_of_memory();
+        munlockall_or_out_of_memory();
+        return xmalloc_size_align(size, alignment);
     }
     return p;
 #else
diff --git a/lib/util.h b/lib/util.h
index 9c2b14fae304..a1958fcf93b4 100644
--- a/lib/util.h
+++ b/lib/util.h
@@ -144,8 +144,9 @@  void ctl_timeout_setup(unsigned int secs);
 void ovs_print_version(uint8_t min_ofp, uint8_t max_ofp);
 
 void set_memory_locked(void);
+void set_memory_unlocked(void);
 bool memory_locked(void);
-OVS_NO_RETURN void out_of_memory(void);
+void munlockall_or_out_of_memory(void);
 
 /* Allocation wrappers that abort if memory is exhausted. */
 void *xmalloc(size_t) MALLOC_LIKE;