Message ID | 20240919174251.5136-1-naveen.yerramneni@nutanix.com |
---|---|
State | New |
Headers | show |
Series | [ovs-dev] controller: Fix "use after free" issue in statctrl_run(). | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/github-robot-_ovn-kubernetes | fail | github build: failed |
Hi Team, Could you please review this patch ? Thanks, Naveen From: Naveen Yerramneni <naveen.yerramneni@nutanix.com> Date: Thursday, 19 September 2024 at 11:13 PM To: dev@openvswitch.org <dev@openvswitch.org> Cc: Mary Manohar <mary.manohar@nutanix.com>, Shibir Basak <shibir.basak@nutanix.com>, Naveen Yerramneni <naveen.yerramneni@nutanix.com> Subject: [PATCH ovn] controller: Fix "use after free" issue in statctrl_run(). Issue: ===== OVN controller main() is caching the "sbrec_mac_binding/sbrec_fdb" pointers in "mac_cache_mac_binding->mac_bindings/fdbs" which are used to update timestamp in mac_cache_mb_stats_run()/ mac_cache_fdb_stats_run(). There is a condition where cached "sbrec_mac_binding/sbrec_fdb" pointers gets freed without removing the refs from "mac_cache_mac_binding->mac_bindings/fdbs" which is leading to crash due to "use after free". This condition can occur due to following reason. - en_runtime_data is input to en_mac_cache. - en_runtime_data went for full recompute but engine_run() is called with recompute_allowed as false due to ofctrl_has_backlog() returned true. This caused to avoid the recompute in the current engine run. - If there are any mac_binding/fdb deletes in the same run then, entries gets deleted during ovsdb_idl_loop_run() and rows are freed in ovsdb_idl_track_clear() but references to them are present in mac_cache_data->mac_bindings/fdbs. Fix: === Avoid statctrl_run() when engine recompute is not allowed. Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com> --- controller/ovn-controller.c | 57 +++++++++++++++++++------------------ 1 file changed, 30 insertions(+), 27 deletions(-) diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index c48667887..6bb8f07db 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -5363,6 +5363,7 @@ main(int argc, char *argv[]) /* Main loop. */ bool sb_monitor_all = false; while (!exit_args.exiting) { + bool recompute_allowed = true; memory_run(); if (memory_should_report()) { struct simap usage = SIMAP_INITIALIZER(&usage); @@ -5536,34 +5537,34 @@ main(int argc, char *argv[]) stopwatch_start(CONTROLLER_LOOP_STOPWATCH_NAME, time_msec()); - if (ovnsb_idl_txn) { - if (ofctrl_has_backlog()) { - /* When there are in-flight messages pending to - * ovs-vswitchd, we should hold on recomputing so - * that the previous flow installations won't be - * delayed. However, we still want to try if - * recompute is not needed and we can quickly - * incrementally process the new changes, to avoid - * unnecessarily forced recomputes later on. This - * is because the OVSDB change tracker cannot - * preserve tracked changes across iterations. If - * change tracking is improved, we can simply skip - * this round of engine_run and continue processing - * acculated changes incrementally later when - * ofctrl_has_backlog() returns false. */ - engine_run(false); - } else { - engine_run(true); - } - } else { + if (!ovnsb_idl_txn || ofctrl_has_backlog()) { + /* No ovnsb_idl_txn */ /* Even if there's no SB DB transaction available, - * try to run the engine so that we can handle any - * incremental changes that don't require a recompute. - * If a recompute is required, the engine will cancel, - * triggerring a full run in the next iteration. - */ - engine_run(false); + * try to run the engine so that we can handle any + * incremental changes that don't require a recompute. + * If a recompute is required, the engine will cancel, + * triggerring a full run in the next iteration. + */ + + /* OR */ + + /* ofctrl_has_backlog */ + /* When there are in-flight messages pending to + * ovs-vswitchd, we should hold on recomputing so + * that the previous flow installations won't be + * delayed. However, we still want to try if + * recompute is not needed and we can quickly + * incrementally process the new changes, to avoid + * unnecessarily forced recomputes later on. This + * is because the OVSDB change tracker cannot + * preserve tracked changes across iterations. If + * change tracking is improved, we can simply skip + * this round of engine_run and continue processing + * acculated changes incrementally later when + * ofctrl_has_backlog() returns false. */ + recompute_allowed = false; } + engine_run(recompute_allowed); stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME, time_msec()); if (engine_has_updated()) { @@ -5685,7 +5686,9 @@ main(int argc, char *argv[]) } } - if (mac_cache_data) { + if (mac_cache_data && recompute_allowed) { + /* Run only when engine recompute is allowed + * since mac_binding/FDB rows are cached */ statctrl_run(ovnsb_idl_txn, mac_cache_data); } -- 2.36.6
On Thu, Sep 19, 2024 at 7:43 PM Naveen Yerramneni via dev < ovs-dev@openvswitch.org> wrote: > Issue: > ===== > OVN controller main() is caching the "sbrec_mac_binding/sbrec_fdb" > pointers in "mac_cache_mac_binding->mac_bindings/fdbs" which are > used to update timestamp in mac_cache_mb_stats_run()/ > mac_cache_fdb_stats_run(). There is a condition where cached > "sbrec_mac_binding/sbrec_fdb" pointers gets freed without > removing the refs from "mac_cache_mac_binding->mac_bindings/fdbs" > which is leading to crash due to "use after free". > > This condition can occur due to following reason. > - en_runtime_data is input to en_mac_cache. > - en_runtime_data went for full recompute but engine_run() > is called with recompute_allowed as false due to > ofctrl_has_backlog() returned true. This caused to avoid > the recompute in the current engine run. > - If there are any mac_binding/fdb deletes in the same run > then, entries gets deleted during ovsdb_idl_loop_run() and > rows are freed in ovsdb_idl_track_clear() but references > to them are present in mac_cache_data->mac_bindings/fdbs. > > Fix: > === > Avoid statctrl_run() when engine recompute is not allowed. > > Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com> > --- > Hello Neveen, thank you for the patch. I have one comment down below. This also makes me wonder, don't we have potentially the same issue with pinctrl? I'm pretty sure that there are some SB references too. controller/ovn-controller.c | 57 +++++++++++++++++++------------------ > 1 file changed, 30 insertions(+), 27 deletions(-) > > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > index c48667887..6bb8f07db 100644 > --- a/controller/ovn-controller.c > +++ b/controller/ovn-controller.c > @@ -5363,6 +5363,7 @@ main(int argc, char *argv[]) > /* Main loop. */ > bool sb_monitor_all = false; > while (!exit_args.exiting) { > + bool recompute_allowed = true; > There is no need for the scope of this variable to be the whole main loop. It could actually be changed to: bool recompute_allowed = !ovnsb_idl_txn || ofctrl_has_backlog(); engine_run(recompute_allowed); Keeping the combined comment of course. memory_run(); > if (memory_should_report()) { > struct simap usage = SIMAP_INITIALIZER(&usage); > @@ -5536,34 +5537,34 @@ main(int argc, char *argv[]) > > stopwatch_start(CONTROLLER_LOOP_STOPWATCH_NAME, > time_msec()); > - if (ovnsb_idl_txn) { > - if (ofctrl_has_backlog()) { > - /* When there are in-flight messages pending > to > - * ovs-vswitchd, we should hold on > recomputing so > - * that the previous flow installations won't > be > - * delayed. However, we still want to try if > - * recompute is not needed and we can quickly > - * incrementally process the new changes, to > avoid > - * unnecessarily forced recomputes later on. > This > - * is because the OVSDB change tracker cannot > - * preserve tracked changes across > iterations. If > - * change tracking is improved, we can simply > skip > - * this round of engine_run and continue > processing > - * acculated changes incrementally later when > - * ofctrl_has_backlog() returns false. */ > - engine_run(false); > - } else { > - engine_run(true); > - } > - } else { > + if (!ovnsb_idl_txn || ofctrl_has_backlog()) { > + /* No ovnsb_idl_txn */ > /* Even if there's no SB DB transaction available, > - * try to run the engine so that we can handle any > - * incremental changes that don't require a > recompute. > - * If a recompute is required, the engine will > cancel, > - * triggerring a full run in the next iteration. > - */ > - engine_run(false); > + * try to run the engine so that we can handle any > + * incremental changes that don't require a > recompute. > + * If a recompute is required, the engine will > cancel, > + * triggerring a full run in the next iteration. > + */ > + > + /* OR */ > + > + /* ofctrl_has_backlog */ > + /* When there are in-flight messages pending to > + * ovs-vswitchd, we should hold on recomputing so > + * that the previous flow installations won't be > + * delayed. However, we still want to try if > + * recompute is not needed and we can quickly > + * incrementally process the new changes, to avoid > + * unnecessarily forced recomputes later on. This > + * is because the OVSDB change tracker cannot > + * preserve tracked changes across iterations. If > + * change tracking is improved, we can simply skip > + * this round of engine_run and continue > processing > + * acculated changes incrementally later when > + * ofctrl_has_backlog() returns false. */ > + recompute_allowed = false; > } > + engine_run(recompute_allowed); > stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME, > time_msec()); > if (engine_has_updated()) { > @@ -5685,7 +5686,9 @@ main(int argc, char *argv[]) > } > } > > - if (mac_cache_data) { > + if (mac_cache_data && recompute_allowed) { > + /* Run only when engine recompute is allowed > + * since mac_binding/FDB rows are cached */ > statctrl_run(ovnsb_idl_txn, mac_cache_data); > } > > -- > 2.36.6 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > Thanks, Ales
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index c48667887..6bb8f07db 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -5363,6 +5363,7 @@ main(int argc, char *argv[]) /* Main loop. */ bool sb_monitor_all = false; while (!exit_args.exiting) { + bool recompute_allowed = true; memory_run(); if (memory_should_report()) { struct simap usage = SIMAP_INITIALIZER(&usage); @@ -5536,34 +5537,34 @@ main(int argc, char *argv[]) stopwatch_start(CONTROLLER_LOOP_STOPWATCH_NAME, time_msec()); - if (ovnsb_idl_txn) { - if (ofctrl_has_backlog()) { - /* When there are in-flight messages pending to - * ovs-vswitchd, we should hold on recomputing so - * that the previous flow installations won't be - * delayed. However, we still want to try if - * recompute is not needed and we can quickly - * incrementally process the new changes, to avoid - * unnecessarily forced recomputes later on. This - * is because the OVSDB change tracker cannot - * preserve tracked changes across iterations. If - * change tracking is improved, we can simply skip - * this round of engine_run and continue processing - * acculated changes incrementally later when - * ofctrl_has_backlog() returns false. */ - engine_run(false); - } else { - engine_run(true); - } - } else { + if (!ovnsb_idl_txn || ofctrl_has_backlog()) { + /* No ovnsb_idl_txn */ /* Even if there's no SB DB transaction available, - * try to run the engine so that we can handle any - * incremental changes that don't require a recompute. - * If a recompute is required, the engine will cancel, - * triggerring a full run in the next iteration. - */ - engine_run(false); + * try to run the engine so that we can handle any + * incremental changes that don't require a recompute. + * If a recompute is required, the engine will cancel, + * triggerring a full run in the next iteration. + */ + + /* OR */ + + /* ofctrl_has_backlog */ + /* When there are in-flight messages pending to + * ovs-vswitchd, we should hold on recomputing so + * that the previous flow installations won't be + * delayed. However, we still want to try if + * recompute is not needed and we can quickly + * incrementally process the new changes, to avoid + * unnecessarily forced recomputes later on. This + * is because the OVSDB change tracker cannot + * preserve tracked changes across iterations. If + * change tracking is improved, we can simply skip + * this round of engine_run and continue processing + * acculated changes incrementally later when + * ofctrl_has_backlog() returns false. */ + recompute_allowed = false; } + engine_run(recompute_allowed); stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME, time_msec()); if (engine_has_updated()) { @@ -5685,7 +5686,9 @@ main(int argc, char *argv[]) } } - if (mac_cache_data) { + if (mac_cache_data && recompute_allowed) { + /* Run only when engine recompute is allowed + * since mac_binding/FDB rows are cached */ statctrl_run(ovnsb_idl_txn, mac_cache_data); }
Issue: ===== OVN controller main() is caching the "sbrec_mac_binding/sbrec_fdb" pointers in "mac_cache_mac_binding->mac_bindings/fdbs" which are used to update timestamp in mac_cache_mb_stats_run()/ mac_cache_fdb_stats_run(). There is a condition where cached "sbrec_mac_binding/sbrec_fdb" pointers gets freed without removing the refs from "mac_cache_mac_binding->mac_bindings/fdbs" which is leading to crash due to "use after free". This condition can occur due to following reason. - en_runtime_data is input to en_mac_cache. - en_runtime_data went for full recompute but engine_run() is called with recompute_allowed as false due to ofctrl_has_backlog() returned true. This caused to avoid the recompute in the current engine run. - If there are any mac_binding/fdb deletes in the same run then, entries gets deleted during ovsdb_idl_loop_run() and rows are freed in ovsdb_idl_track_clear() but references to them are present in mac_cache_data->mac_bindings/fdbs. Fix: === Avoid statctrl_run() when engine recompute is not allowed. Signed-off-by: Naveen Yerramneni <naveen.yerramneni@nutanix.com> --- controller/ovn-controller.c | 57 +++++++++++++++++++------------------ 1 file changed, 30 insertions(+), 27 deletions(-)