Message ID | 566D7DD4.5070708@redhat.com |
---|---|
State | Changes Requested |
Headers | show |
On Sun, Dec 13, 2015 at 07:46:52PM +0530, Numan Siddique wrote: > Fixes the issue > Reported-at: http://openvswitch.org/pipermail/discuss/2015-November/019443.html > > Signed-off-by: Numan Siddique <nusiddiq@redhat.com> This patch has been sitting too long without review. This is my fault. I apologize. I'm concerned about this patch for a couple of reasons. First, the goal of the ovsdb_idl_loop abstraction is to make it easy for clients to get all of this stuff right. This patch adds more code outside the ovsdb_idl_loop implementation in its client, somewhat reducing the value of the abstraction. It would be better if we could instead integrate the logic into ovsdb_idl_loop itself. My second concern is larger. TXN_ERROR is supposed to indicate a hard error, one for which retrying immediately is not going to help. What kind of error did testing actually encounter? Perhaps the real bug is that an error is being misclassified as TXN_ERROR.
On 02/04/2016 05:05 AM, Ben Pfaff wrote: > I'm concerned about this patch for a couple of reasons. First, the goal > of the ovsdb_idl_loop abstraction is to make it easy for clients to get > all of this stuff right. This patch adds more code outside the > ovsdb_idl_loop implementation in its client, somewhat reducing the value > of the abstraction. It would be better if we could instead integrate > the logic into ovsdb_idl_loop itself. Thanks for the review and comments Ben. I agree. > My second concern is larger. TXN_ERROR is supposed to indicate a hard > error, one for which retrying immediately is not going to help. What > kind of error did testing actually encounter? Perhaps the real bug is > that an error is being misclassified as TXN_ERROR. I will investigate it further to see why TXN_ERROR is returned. Thanks Numan
On 02/04/2016 12:37 PM, Numan Siddique wrote: > Thanks for the review and comments Ben. I agree. >> > My second concern is larger. TXN_ERROR is supposed to indicate a hard >> > error, one for which retrying immediately is not going to help. What >> > kind of error did testing actually encounter? Perhaps the real bug is >> > that an error is being misclassified as TXN_ERROR. > I will investigate it further to see why TXN_ERROR is returned. I did some testing and I am not able to see the TXN_ERROR anymore, but still I am able to reproduce the issue mentioned here - http://openvswitch.org/pipermail/discuss/2015-November/019443.html Just to brief about the issue - Suppose Southbound db table Logical_Flows has 8 flows in it - And I modify the ovn_northd.c to add a flow (lets say in build_lswitch_flows) and restart ovn-northd, I don't see the Logical_Flows table updated to 9 flows. - Later if i update Northbound db (by running ovn-nbctl), the Logical_Flows gets synced up. - I am able to reproduce this issue most of the times. The reason for the issue is the below code ********************** if (ovnnb_seqno != ovsdb_idl_get_seqno(ctx.ovnnb_idl)) { ovnnb_seqno = ovsdb_idl_get_seqno(ctx.ovnnb_idl); ovnnb_db_run(&ctx); } if (ovnsb_seqno != ovsdb_idl_get_seqno(ctx.ovnsb_idl)) { ovnsb_seqno = ovsdb_idl_get_seqno(ctx.ovnsb_idl); ovnsb_db_run(&ctx); } ******************** When ovnnb_db_run is called, and if ctx->ovnsb_txn is NULL, ovnnb_db_run returns immediately without generating the logical flows. ovnnb_db_run is not called again until the Northbound db seqno changes. I think its better to revert this commit - f20396e051ea4fd623bfc022cc78d9bd52a850e5 Thanks Numan
diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c index 0d02ae8..2e536c9 100644 --- a/lib/ovsdb-idl.c +++ b/lib/ovsdb-idl.c @@ -3209,6 +3209,7 @@ ovsdb_idl_loop_commit_and_wait(struct ovsdb_idl_loop *loop) ovsdb_idl_txn_destroy(txn); loop->committing_txn = NULL; } + loop->last_commit_status = status; } ovsdb_idl_wait(loop->idl); diff --git a/lib/ovsdb-idl.h b/lib/ovsdb-idl.h index 4c66ae0..5f7f2fc 100644 --- a/lib/ovsdb-idl.h +++ b/lib/ovsdb-idl.h @@ -266,6 +266,7 @@ struct ovsdb_idl_loop { unsigned int precommit_seqno; struct ovsdb_idl_txn *open_txn; + enum ovsdb_idl_txn_status last_commit_status; }; #define OVSDB_IDL_LOOP_INITIALIZER(IDL) { .idl = (IDL) } diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c index 270b116..be4d6a8 100644 --- a/ovn/northd/ovn-northd.c +++ b/ovn/northd/ovn-northd.c @@ -1912,6 +1912,8 @@ main(int argc, char *argv[]) /* Main loop. */ exiting = false; + bool ovnnb_changes_pending = false; + bool ovnsb_changes_pending = false; while (!exiting) { struct northd_context ctx = { .ovnnb_idl = ovnnb_idl_loop.idl, @@ -1922,11 +1924,28 @@ main(int argc, char *argv[]) if (ovnnb_seqno != ovsdb_idl_get_seqno(ctx.ovnnb_idl)) { ovnnb_seqno = ovsdb_idl_get_seqno(ctx.ovnnb_idl); - ovnnb_db_run(&ctx); + ovnnb_changes_pending = true; } if (ovnsb_seqno != ovsdb_idl_get_seqno(ctx.ovnsb_idl)) { ovnsb_seqno = ovsdb_idl_get_seqno(ctx.ovnsb_idl); + ovnsb_changes_pending = true; + } + + /* + * We need to recalculate the mappings to the OVN-sb db + * - If the OVN-nb db has changed or + * - If the previous update to the OVN-sb is not yet completed. + */ + if (ovnnb_changes_pending || + ovnsb_idl_loop.last_commit_status == TXN_ERROR) { + ovnnb_db_run(&ctx); + ovnnb_changes_pending = false; + } + + if (ovnsb_changes_pending || + ovnnb_idl_loop.last_commit_status == TXN_ERROR) { ovnsb_db_run(&ctx); + ovnsb_changes_pending = false; } unixctl_server_run(unixctl);
Fixes the issue Reported-at: http://openvswitch.org/pipermail/discuss/2015-November/019443.html Signed-off-by: Numan Siddique <nusiddiq@redhat.com> --- lib/ovsdb-idl.c | 1 + lib/ovsdb-idl.h | 1 + ovn/northd/ovn-northd.c | 21 ++++++++++++++++++++- 3 files changed, 22 insertions(+), 1 deletion(-)