Message ID | 168192964823.4031872.3228556334798413886.stgit@fed.void |
---|---|
State | Changes Requested |
Headers | show |
Series | conntrack: Fix failed assertion in conn_update_state(). | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/intel-ovs-compilation | success | test: success |
On 4/19/23 20:40, Paolo Valerio wrote: > During the creation of a new connection, there's a chance both key and > rev_key end up having the same hash. This is more common in the case > of all-zero snat with no collisions. In that case, once the > connection is expired, but not cleaned up, if a new packet with the > same 5-tuple is received, an assertion failure gets triggered in > conn_update_state() because of a previous failure of retrieving a > CT_CONN_TYPE_DEFAULT connection. > > Fix it by releasing the nat_conn during the connection creation in the > case of same hash for both key and rev_key. This sounds a bit odd. Shouldn't we treat hash collision as a normal case? Looking at the code, I'm assuming that the issue comes from the following part in process_one(): if (OVS_LIKELY(conn)) { if (conn->conn_type == CT_CONN_TYPE_UN_NAT) { ... conn_key_lookup(ct, &ctx->key, hash, now, &conn, &ctx->reply); And here we get the same connection again, because the default one is already expired. Is that correct? If so, maybe we should add an extra condition to conn_key_lookup() to only look for DEFAULT connections instead, just for this case? Since we really don't want to get the UN_NAT one here. Best regards, Ilya Maximets. > > Reported-by: Michael Plato <michael.plato@tu-berlin.de> > Fixes: 61e48c2d1db2 ("conntrack: Handle SNAT with all-zero IP address.") > Signed-off-by: Paolo Valerio <pvalerio@redhat.com> > --- > In this thread [0] there are some more details. A similar > approach here could be to avoid to add the nat_conn to the cmap and > letting the sweeper release the memory for nat_conn once the whole > connection gets freed. > That approach could still be ok, but the drawback is that it could > require a different patch for older branches that don't include > 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with > rculists."). It still worth to be considered. > > [0] https://mail.openvswitch.org/pipermail/ovs-discuss/2023-April/052339.html > --- > lib/conntrack.c | 21 +++++++++++++-------- > 1 file changed, 13 insertions(+), 8 deletions(-) > > diff --git a/lib/conntrack.c b/lib/conntrack.c > index 7e1fc4b1f..d2ee127d9 100644 > --- a/lib/conntrack.c > +++ b/lib/conntrack.c > @@ -1007,14 +1007,19 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, > } > > nat_packet(pkt, nc, false, ctx->icmp_related); > - memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key); > - memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key); > - nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; > - nat_conn->nat_action = 0; > - nat_conn->alg = NULL; > - nat_conn->nat_conn = NULL; > - uint32_t nat_hash = conn_key_hash(&nat_conn->key, ct->hash_basis); > - cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); > + uint32_t nat_hash = conn_key_hash(&nc->rev_key, ct->hash_basis); > + if (nat_hash != ctx->hash) { > + memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key); > + memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key); > + nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; > + nat_conn->nat_action = 0; > + nat_conn->alg = NULL; > + nat_conn->nat_conn = NULL; > + cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); > + } else { > + free(nat_conn); > + nat_conn = NULL; > + } > } > > nc->nat_conn = nat_conn; >
Ilya Maximets <i.maximets@ovn.org> writes: > On 4/19/23 20:40, Paolo Valerio wrote: >> During the creation of a new connection, there's a chance both key and >> rev_key end up having the same hash. This is more common in the case >> of all-zero snat with no collisions. In that case, once the >> connection is expired, but not cleaned up, if a new packet with the >> same 5-tuple is received, an assertion failure gets triggered in >> conn_update_state() because of a previous failure of retrieving a >> CT_CONN_TYPE_DEFAULT connection. >> >> Fix it by releasing the nat_conn during the connection creation in the >> case of same hash for both key and rev_key. > > This sounds a bit odd. Shouldn't we treat hash collision as a normal case? > > Looking at the code, I'm assuming that the issue comes from the following > part in process_one(): > > if (OVS_LIKELY(conn)) { > if (conn->conn_type == CT_CONN_TYPE_UN_NAT) { > ... > conn_key_lookup(ct, &ctx->key, hash, now, &conn, &ctx->reply); > > And here we get the same connection again, because the default one is already > expired. Is that correct? > > If so, maybe we should add an extra condition to conn_key_lookup() to > only look for DEFAULT connections instead, just for this case? Since > we really don't want to get the UN_NAT one here. > Hello Ilya, It's a fair point. I initially thought about the approach you're suggesting, but I had some concerns about it that I'll try to summarize below. For sure it would fix the issue (it could require the first patch to be applied as well for the branches with rcu exp lists). Based on the current logic, new packets matching that expired connection but not evicted will be marked as +inv and further packets will be marked so for the whole sweep interval unless an exception like this get added: uint32_t hash = conn_key_hash(&conn->rev_key, ct->hash_basis); /* the last flag indicates CT_CONN_TYPE_DEFAULT only */ conn_key_lookup_(ct, &ctx->key, hash, now, &conn, &ctx->reply, true); /* special case where there's hash collision */ if (!conn && ctx->hash != hash) { pkt->md.ct_state |= CS_INVALID; write_ct_md(pkt, zone, NULL, NULL, NULL); ... return; } This would further require that subsequent lookup in the create_new_conn path are restricted to CT_CONN_TYPE_DEFAULT, e.g.: uint32_t hash = conn_key_hash(&ctx->key, ct->hash_basis); /* Only check for CT_CONN_TYPE_DEFAULT */ if (!conn_key_lookup_(ct, &ctx->key, hash, now, NULL, NULL, true)) { conn = conn_not_found(ct, pkt, ctx, commit, now, nat_action_info, helper, alg_exp, ct_alg_ctl, tp_id); } otherwise we could incur in a false positive which prevent to create a new connection. > Best regards, Ilya Maximets. > >> >> Reported-by: Michael Plato <michael.plato@tu-berlin.de> >> Fixes: 61e48c2d1db2 ("conntrack: Handle SNAT with all-zero IP address.") >> Signed-off-by: Paolo Valerio <pvalerio@redhat.com> >> --- >> In this thread [0] there are some more details. A similar >> approach here could be to avoid to add the nat_conn to the cmap and >> letting the sweeper release the memory for nat_conn once the whole >> connection gets freed. >> That approach could still be ok, but the drawback is that it could >> require a different patch for older branches that don't include >> 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with >> rculists."). It still worth to be considered. >> >> [0] https://mail.openvswitch.org/pipermail/ovs-discuss/2023-April/052339.html >> --- >> lib/conntrack.c | 21 +++++++++++++-------- >> 1 file changed, 13 insertions(+), 8 deletions(-) >> >> diff --git a/lib/conntrack.c b/lib/conntrack.c >> index 7e1fc4b1f..d2ee127d9 100644 >> --- a/lib/conntrack.c >> +++ b/lib/conntrack.c >> @@ -1007,14 +1007,19 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, >> } >> >> nat_packet(pkt, nc, false, ctx->icmp_related); >> - memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key); >> - memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key); >> - nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; >> - nat_conn->nat_action = 0; >> - nat_conn->alg = NULL; >> - nat_conn->nat_conn = NULL; >> - uint32_t nat_hash = conn_key_hash(&nat_conn->key, ct->hash_basis); >> - cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); >> + uint32_t nat_hash = conn_key_hash(&nc->rev_key, ct->hash_basis); >> + if (nat_hash != ctx->hash) { >> + memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key); >> + memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key); >> + nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; >> + nat_conn->nat_action = 0; >> + nat_conn->alg = NULL; >> + nat_conn->nat_conn = NULL; >> + cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); >> + } else { >> + free(nat_conn); >> + nat_conn = NULL; >> + } >> } >> >> nc->nat_conn = nat_conn; >>
On 5/4/23 19:21, Paolo Valerio wrote: > Ilya Maximets <i.maximets@ovn.org> writes: > >> On 4/19/23 20:40, Paolo Valerio wrote: >>> During the creation of a new connection, there's a chance both key and >>> rev_key end up having the same hash. This is more common in the case >>> of all-zero snat with no collisions. In that case, once the >>> connection is expired, but not cleaned up, if a new packet with the >>> same 5-tuple is received, an assertion failure gets triggered in >>> conn_update_state() because of a previous failure of retrieving a >>> CT_CONN_TYPE_DEFAULT connection. >>> >>> Fix it by releasing the nat_conn during the connection creation in the >>> case of same hash for both key and rev_key. >> >> This sounds a bit odd. Shouldn't we treat hash collision as a normal case? >> >> Looking at the code, I'm assuming that the issue comes from the following >> part in process_one(): >> >> if (OVS_LIKELY(conn)) { >> if (conn->conn_type == CT_CONN_TYPE_UN_NAT) { >> ... >> conn_key_lookup(ct, &ctx->key, hash, now, &conn, &ctx->reply); >> >> And here we get the same connection again, because the default one is already >> expired. Is that correct? >> >> If so, maybe we should add an extra condition to conn_key_lookup() to >> only look for DEFAULT connections instead, just for this case? Since >> we really don't want to get the UN_NAT one here. >> > > Hello Ilya, > > It's a fair point. > I initially thought about the approach you're suggesting, but I had some > concerns about it that I'll try to summarize below. > > For sure it would fix the issue (it could require the first patch to be > applied as well for the branches with rcu exp lists). > > Based on the current logic, new packets matching that expired connection > but not evicted will be marked as +inv and further packets will be > marked so for the whole sweep interval unless an exception like this get > added: > > uint32_t hash = conn_key_hash(&conn->rev_key, ct->hash_basis); > /* the last flag indicates CT_CONN_TYPE_DEFAULT only */ > conn_key_lookup_(ct, &ctx->key, hash, now, &conn, &ctx->reply, true); > /* special case where there's hash collision */ > if (!conn && ctx->hash != hash) { > pkt->md.ct_state |= CS_INVALID; > write_ct_md(pkt, zone, NULL, NULL, NULL); > ... > return; > } > > This would further require that subsequent lookup in the create_new_conn > path are restricted to CT_CONN_TYPE_DEFAULT, e.g.: > > uint32_t hash = conn_key_hash(&ctx->key, ct->hash_basis); > /* Only check for CT_CONN_TYPE_DEFAULT */ > if (!conn_key_lookup_(ct, &ctx->key, hash, now, NULL, NULL, true)) { > conn = conn_not_found(ct, pkt, ctx, commit, now, nat_action_info, > helper, alg_exp, ct_alg_ctl, tp_id); > } > > otherwise we could incur in a false positive which prevent to create a > new connection. I'm not really sure if what described above is more correct way of doing things or not... Aaron, do you have opinion on this? Another thought: Can we expire the CT_CONN_TYPE_UN_NAT connection the moment DEFAULT counterpart of it expires? Or that will that be against some logic / not possible to do? > >> Best regards, Ilya Maximets. >> >>> >>> Reported-by: Michael Plato <michael.plato@tu-berlin.de> >>> Fixes: 61e48c2d1db2 ("conntrack: Handle SNAT with all-zero IP address.") >>> Signed-off-by: Paolo Valerio <pvalerio@redhat.com> >>> --- >>> In this thread [0] there are some more details. A similar >>> approach here could be to avoid to add the nat_conn to the cmap and >>> letting the sweeper release the memory for nat_conn once the whole >>> connection gets freed. >>> That approach could still be ok, but the drawback is that it could >>> require a different patch for older branches that don't include >>> 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with >>> rculists."). It still worth to be considered. >>> >>> [0] https://mail.openvswitch.org/pipermail/ovs-discuss/2023-April/052339.html >>> --- >>> lib/conntrack.c | 21 +++++++++++++-------- >>> 1 file changed, 13 insertions(+), 8 deletions(-) >>> >>> diff --git a/lib/conntrack.c b/lib/conntrack.c >>> index 7e1fc4b1f..d2ee127d9 100644 >>> --- a/lib/conntrack.c >>> +++ b/lib/conntrack.c >>> @@ -1007,14 +1007,19 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, >>> } >>> >>> nat_packet(pkt, nc, false, ctx->icmp_related); >>> - memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key); >>> - memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key); >>> - nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; >>> - nat_conn->nat_action = 0; >>> - nat_conn->alg = NULL; >>> - nat_conn->nat_conn = NULL; >>> - uint32_t nat_hash = conn_key_hash(&nat_conn->key, ct->hash_basis); >>> - cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); >>> + uint32_t nat_hash = conn_key_hash(&nc->rev_key, ct->hash_basis); >>> + if (nat_hash != ctx->hash) { >>> + memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key); >>> + memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key); >>> + nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; >>> + nat_conn->nat_action = 0; >>> + nat_conn->alg = NULL; >>> + nat_conn->nat_conn = NULL; >>> + cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); >>> + } else { >>> + free(nat_conn); >>> + nat_conn = NULL; >>> + } >>> } >>> >>> nc->nat_conn = nat_conn; >>> > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
Ilya Maximets <i.maximets@ovn.org> writes: > On 5/4/23 19:21, Paolo Valerio wrote: >> Ilya Maximets <i.maximets@ovn.org> writes: >> >>> On 4/19/23 20:40, Paolo Valerio wrote: >>>> During the creation of a new connection, there's a chance both key and >>>> rev_key end up having the same hash. This is more common in the case >>>> of all-zero snat with no collisions. In that case, once the >>>> connection is expired, but not cleaned up, if a new packet with the >>>> same 5-tuple is received, an assertion failure gets triggered in >>>> conn_update_state() because of a previous failure of retrieving a >>>> CT_CONN_TYPE_DEFAULT connection. >>>> >>>> Fix it by releasing the nat_conn during the connection creation in the >>>> case of same hash for both key and rev_key. >>> >>> This sounds a bit odd. Shouldn't we treat hash collision as a normal case? >>> >>> Looking at the code, I'm assuming that the issue comes from the following >>> part in process_one(): >>> >>> if (OVS_LIKELY(conn)) { >>> if (conn->conn_type == CT_CONN_TYPE_UN_NAT) { >>> ... >>> conn_key_lookup(ct, &ctx->key, hash, now, &conn, &ctx->reply); >>> >>> And here we get the same connection again, because the default one is already >>> expired. Is that correct? >>> >>> If so, maybe we should add an extra condition to conn_key_lookup() to >>> only look for DEFAULT connections instead, just for this case? Since >>> we really don't want to get the UN_NAT one here. >>> >> >> Hello Ilya, >> >> It's a fair point. >> I initially thought about the approach you're suggesting, but I had some >> concerns about it that I'll try to summarize below. >> >> For sure it would fix the issue (it could require the first patch to be >> applied as well for the branches with rcu exp lists). >> >> Based on the current logic, new packets matching that expired connection >> but not evicted will be marked as +inv and further packets will be >> marked so for the whole sweep interval unless an exception like this get >> added: >> >> uint32_t hash = conn_key_hash(&conn->rev_key, ct->hash_basis); >> /* the last flag indicates CT_CONN_TYPE_DEFAULT only */ >> conn_key_lookup_(ct, &ctx->key, hash, now, &conn, &ctx->reply, true); >> /* special case where there's hash collision */ >> if (!conn && ctx->hash != hash) { >> pkt->md.ct_state |= CS_INVALID; >> write_ct_md(pkt, zone, NULL, NULL, NULL); >> ... >> return; >> } >> >> This would further require that subsequent lookup in the create_new_conn >> path are restricted to CT_CONN_TYPE_DEFAULT, e.g.: >> >> uint32_t hash = conn_key_hash(&ctx->key, ct->hash_basis); >> /* Only check for CT_CONN_TYPE_DEFAULT */ >> if (!conn_key_lookup_(ct, &ctx->key, hash, now, NULL, NULL, true)) { >> conn = conn_not_found(ct, pkt, ctx, commit, now, nat_action_info, >> helper, alg_exp, ct_alg_ctl, tp_id); >> } >> >> otherwise we could incur in a false positive which prevent to create a >> new connection. > > I'm not really sure if what described above is more correct way of doing > things or not... Aaron, do you have opinion on this? > > Another thought: Can we expire the CT_CONN_TYPE_UN_NAT connection the > moment DEFAULT counterpart of it expires? Or that will that be against > some logic / not possible to do? > As far as I can tell, this could not be straightforward as simply marking it as expired should not be reliable (e.g. doing it from the sweeper), and I guess that managing the expiration time field for the nat_conn as well would require updating the nat_conn every time the default one gets updated, probably making it a bit unpractical. Another approach would be removing the nat_conn [1] altogether. The problem in this case is backporting. Some adjustments that would add to the patch might be needed for older branches. [1] https://patchwork.ozlabs.org/project/openvswitch/patch/20201129033255.64647-2-hepeng.0320@bytedance.com/ > >> >>> Best regards, Ilya Maximets. >>> >>>> >>>> Reported-by: Michael Plato <michael.plato@tu-berlin.de> >>>> Fixes: 61e48c2d1db2 ("conntrack: Handle SNAT with all-zero IP address.") >>>> Signed-off-by: Paolo Valerio <pvalerio@redhat.com> >>>> --- >>>> In this thread [0] there are some more details. A similar >>>> approach here could be to avoid to add the nat_conn to the cmap and >>>> letting the sweeper release the memory for nat_conn once the whole >>>> connection gets freed. >>>> That approach could still be ok, but the drawback is that it could >>>> require a different patch for older branches that don't include >>>> 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with >>>> rculists."). It still worth to be considered. >>>> >>>> [0] https://mail.openvswitch.org/pipermail/ovs-discuss/2023-April/052339.html >>>> --- >>>> lib/conntrack.c | 21 +++++++++++++-------- >>>> 1 file changed, 13 insertions(+), 8 deletions(-) >>>> >>>> diff --git a/lib/conntrack.c b/lib/conntrack.c >>>> index 7e1fc4b1f..d2ee127d9 100644 >>>> --- a/lib/conntrack.c >>>> +++ b/lib/conntrack.c >>>> @@ -1007,14 +1007,19 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, >>>> } >>>> >>>> nat_packet(pkt, nc, false, ctx->icmp_related); >>>> - memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key); >>>> - memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key); >>>> - nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; >>>> - nat_conn->nat_action = 0; >>>> - nat_conn->alg = NULL; >>>> - nat_conn->nat_conn = NULL; >>>> - uint32_t nat_hash = conn_key_hash(&nat_conn->key, ct->hash_basis); >>>> - cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); >>>> + uint32_t nat_hash = conn_key_hash(&nc->rev_key, ct->hash_basis); >>>> + if (nat_hash != ctx->hash) { >>>> + memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key); >>>> + memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key); >>>> + nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; >>>> + nat_conn->nat_action = 0; >>>> + nat_conn->alg = NULL; >>>> + nat_conn->nat_conn = NULL; >>>> + cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); >>>> + } else { >>>> + free(nat_conn); >>>> + nat_conn = NULL; >>>> + } >>>> } >>>> >>>> nc->nat_conn = nat_conn; >>>> >> >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>
Paolo Valerio <pvalerio@redhat.com> writes: > Ilya Maximets <i.maximets@ovn.org> writes: > >> On 5/4/23 19:21, Paolo Valerio wrote: >>> Ilya Maximets <i.maximets@ovn.org> writes: >>> >>>> On 4/19/23 20:40, Paolo Valerio wrote: >>>>> During the creation of a new connection, there's a chance both key and >>>>> rev_key end up having the same hash. This is more common in the case >>>>> of all-zero snat with no collisions. In that case, once the >>>>> connection is expired, but not cleaned up, if a new packet with the >>>>> same 5-tuple is received, an assertion failure gets triggered in >>>>> conn_update_state() because of a previous failure of retrieving a >>>>> CT_CONN_TYPE_DEFAULT connection. >>>>> >>>>> Fix it by releasing the nat_conn during the connection creation in the >>>>> case of same hash for both key and rev_key. >>>> >>>> This sounds a bit odd. Shouldn't we treat hash collision as a normal case? >>>> >>>> Looking at the code, I'm assuming that the issue comes from the following >>>> part in process_one(): >>>> >>>> if (OVS_LIKELY(conn)) { >>>> if (conn->conn_type == CT_CONN_TYPE_UN_NAT) { >>>> ... >>>> conn_key_lookup(ct, &ctx->key, hash, now, &conn, &ctx->reply); >>>> >>>> And here we get the same connection again, because the default one is already >>>> expired. Is that correct? >>>> >>>> If so, maybe we should add an extra condition to conn_key_lookup() to >>>> only look for DEFAULT connections instead, just for this case? Since >>>> we really don't want to get the UN_NAT one here. >>>> >>> >>> Hello Ilya, >>> >>> It's a fair point. >>> I initially thought about the approach you're suggesting, but I had some >>> concerns about it that I'll try to summarize below. >>> >>> For sure it would fix the issue (it could require the first patch to be >>> applied as well for the branches with rcu exp lists). >>> >>> Based on the current logic, new packets matching that expired connection >>> but not evicted will be marked as +inv and further packets will be >>> marked so for the whole sweep interval unless an exception like this get >>> added: >>> >>> uint32_t hash = conn_key_hash(&conn->rev_key, ct->hash_basis); >>> /* the last flag indicates CT_CONN_TYPE_DEFAULT only */ >>> conn_key_lookup_(ct, &ctx->key, hash, now, &conn, &ctx->reply, true); >>> /* special case where there's hash collision */ >>> if (!conn && ctx->hash != hash) { >>> pkt->md.ct_state |= CS_INVALID; >>> write_ct_md(pkt, zone, NULL, NULL, NULL); >>> ... >>> return; >>> } >>> >>> This would further require that subsequent lookup in the create_new_conn >>> path are restricted to CT_CONN_TYPE_DEFAULT, e.g.: >>> >>> uint32_t hash = conn_key_hash(&ctx->key, ct->hash_basis); >>> /* Only check for CT_CONN_TYPE_DEFAULT */ >>> if (!conn_key_lookup_(ct, &ctx->key, hash, now, NULL, NULL, true)) { >>> conn = conn_not_found(ct, pkt, ctx, commit, now, nat_action_info, >>> helper, alg_exp, ct_alg_ctl, tp_id); >>> } >>> >>> otherwise we could incur in a false positive which prevent to create a >>> new connection. >> >> I'm not really sure if what described above is more correct way of doing >> things or not... Aaron, do you have opinion on this? >> >> Another thought: Can we expire the CT_CONN_TYPE_UN_NAT connection the >> moment DEFAULT counterpart of it expires? Or that will that be against >> some logic / not possible to do? >> > > As far as I can tell, this could not be straightforward as simply > marking it as expired should not be reliable (e.g. doing it from the > sweeper), and I guess that managing the expiration time field for the > nat_conn as well would require updating the nat_conn every time the > default one gets updated, probably making it a bit unpractical. > > Another approach would be removing the nat_conn [1] altogether. > The problem in this case is backporting. Some adjustments that would add > to the patch might be needed for older branches. > > [1] > https://patchwork.ozlabs.org/project/openvswitch/patch/20201129033255.64647-2-hepeng.0320@bytedance.com/ I think that work was interesting, and maybe the best way to go forward. Backports would become difficult, though - agreed. >> >>> >>>> Best regards, Ilya Maximets. >>>> >>>>> >>>>> Reported-by: Michael Plato <michael.plato@tu-berlin.de> >>>>> Fixes: 61e48c2d1db2 ("conntrack: Handle SNAT with all-zero IP address.") >>>>> Signed-off-by: Paolo Valerio <pvalerio@redhat.com> >>>>> --- >>>>> In this thread [0] there are some more details. A similar >>>>> approach here could be to avoid to add the nat_conn to the cmap and >>>>> letting the sweeper release the memory for nat_conn once the whole >>>>> connection gets freed. >>>>> That approach could still be ok, but the drawback is that it could >>>>> require a different patch for older branches that don't include >>>>> 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with >>>>> rculists."). It still worth to be considered. >>>>> >>>>> [0] https://mail.openvswitch.org/pipermail/ovs-discuss/2023-April/052339.html >>>>> --- >>>>> lib/conntrack.c | 21 +++++++++++++-------- >>>>> 1 file changed, 13 insertions(+), 8 deletions(-) >>>>> >>>>> diff --git a/lib/conntrack.c b/lib/conntrack.c >>>>> index 7e1fc4b1f..d2ee127d9 100644 >>>>> --- a/lib/conntrack.c >>>>> +++ b/lib/conntrack.c >>>>> @@ -1007,14 +1007,19 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, >>>>> } >>>>> >>>>> nat_packet(pkt, nc, false, ctx->icmp_related); >>>>> - memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key); >>>>> - memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key); >>>>> - nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; >>>>> - nat_conn->nat_action = 0; >>>>> - nat_conn->alg = NULL; >>>>> - nat_conn->nat_conn = NULL; >>>>> - uint32_t nat_hash = conn_key_hash(&nat_conn->key, ct->hash_basis); >>>>> - cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); >>>>> + uint32_t nat_hash = conn_key_hash(&nc->rev_key, ct->hash_basis); >>>>> + if (nat_hash != ctx->hash) { >>>>> + memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key); >>>>> + memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key); >>>>> + nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; >>>>> + nat_conn->nat_action = 0; >>>>> + nat_conn->alg = NULL; >>>>> + nat_conn->nat_conn = NULL; >>>>> + cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); >>>>> + } else { >>>>> + free(nat_conn); >>>>> + nat_conn = NULL; >>>>> + } >>>>> } >>>>> >>>>> nc->nat_conn = nat_conn; >>>>> >>> >>> _______________________________________________ >>> dev mailing list >>> dev@openvswitch.org >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>
Hi, Paolo, IIRC, you have a revision version on these patches? I guess it should be closer to the upstream than mine? Aaron Conole <aconole@redhat.com> 于2023年5月17日周三 21:54写道: > Paolo Valerio <pvalerio@redhat.com> writes: > > > Ilya Maximets <i.maximets@ovn.org> writes: > > > >> On 5/4/23 19:21, Paolo Valerio wrote: > >>> Ilya Maximets <i.maximets@ovn.org> writes: > >>> > >>>> On 4/19/23 20:40, Paolo Valerio wrote: > >>>>> During the creation of a new connection, there's a chance both key > and > >>>>> rev_key end up having the same hash. This is more common in the case > >>>>> of all-zero snat with no collisions. In that case, once the > >>>>> connection is expired, but not cleaned up, if a new packet with the > >>>>> same 5-tuple is received, an assertion failure gets triggered in > >>>>> conn_update_state() because of a previous failure of retrieving a > >>>>> CT_CONN_TYPE_DEFAULT connection. > >>>>> > >>>>> Fix it by releasing the nat_conn during the connection creation in > the > >>>>> case of same hash for both key and rev_key. > >>>> > >>>> This sounds a bit odd. Shouldn't we treat hash collision as a normal > case? > >>>> > >>>> Looking at the code, I'm assuming that the issue comes from the > following > >>>> part in process_one(): > >>>> > >>>> if (OVS_LIKELY(conn)) { > >>>> if (conn->conn_type == CT_CONN_TYPE_UN_NAT) { > >>>> ... > >>>> conn_key_lookup(ct, &ctx->key, hash, now, &conn, > &ctx->reply); > >>>> > >>>> And here we get the same connection again, because the default one is > already > >>>> expired. Is that correct? > >>>> > >>>> If so, maybe we should add an extra condition to conn_key_lookup() to > >>>> only look for DEFAULT connections instead, just for this case? Since > >>>> we really don't want to get the UN_NAT one here. > >>>> > >>> > >>> Hello Ilya, > >>> > >>> It's a fair point. > >>> I initially thought about the approach you're suggesting, but I had > some > >>> concerns about it that I'll try to summarize below. > >>> > >>> For sure it would fix the issue (it could require the first patch to be > >>> applied as well for the branches with rcu exp lists). > >>> > >>> Based on the current logic, new packets matching that expired > connection > >>> but not evicted will be marked as +inv and further packets will be > >>> marked so for the whole sweep interval unless an exception like this > get > >>> added: > >>> > >>> uint32_t hash = conn_key_hash(&conn->rev_key, ct->hash_basis); > >>> /* the last flag indicates CT_CONN_TYPE_DEFAULT only */ > >>> conn_key_lookup_(ct, &ctx->key, hash, now, &conn, &ctx->reply, true); > >>> /* special case where there's hash collision */ > >>> if (!conn && ctx->hash != hash) { > >>> pkt->md.ct_state |= CS_INVALID; > >>> write_ct_md(pkt, zone, NULL, NULL, NULL); > >>> ... > >>> return; > >>> } > >>> > >>> This would further require that subsequent lookup in the > create_new_conn > >>> path are restricted to CT_CONN_TYPE_DEFAULT, e.g.: > >>> > >>> uint32_t hash = conn_key_hash(&ctx->key, ct->hash_basis); > >>> /* Only check for CT_CONN_TYPE_DEFAULT */ > >>> if (!conn_key_lookup_(ct, &ctx->key, hash, now, NULL, NULL, true)) { > >>> conn = conn_not_found(ct, pkt, ctx, commit, now, nat_action_info, > >>> helper, alg_exp, ct_alg_ctl, tp_id); > >>> } > >>> > >>> otherwise we could incur in a false positive which prevent to create a > >>> new connection. > >> > >> I'm not really sure if what described above is more correct way of doing > >> things or not... Aaron, do you have opinion on this? > >> > >> Another thought: Can we expire the CT_CONN_TYPE_UN_NAT connection the > >> moment DEFAULT counterpart of it expires? Or that will that be against > >> some logic / not possible to do? > >> > > > > As far as I can tell, this could not be straightforward as simply > > marking it as expired should not be reliable (e.g. doing it from the > > sweeper), and I guess that managing the expiration time field for the > > nat_conn as well would require updating the nat_conn every time the > > default one gets updated, probably making it a bit unpractical. > > > > Another approach would be removing the nat_conn [1] altogether. > > The problem in this case is backporting. Some adjustments that would add > > to the patch might be needed for older branches. > > > > [1] > > > https://patchwork.ozlabs.org/project/openvswitch/patch/20201129033255.64647-2-hepeng.0320@bytedance.com/ > > I think that work was interesting, and maybe the best way to go > forward. Backports would become difficult, though - agreed. > > >> > >>> > >>>> Best regards, Ilya Maximets. > >>>> > >>>>> > >>>>> Reported-by: Michael Plato <michael.plato@tu-berlin.de> > >>>>> Fixes: 61e48c2d1db2 ("conntrack: Handle SNAT with all-zero IP > address.") > >>>>> Signed-off-by: Paolo Valerio <pvalerio@redhat.com> > >>>>> --- > >>>>> In this thread [0] there are some more details. A similar > >>>>> approach here could be to avoid to add the nat_conn to the cmap and > >>>>> letting the sweeper release the memory for nat_conn once the whole > >>>>> connection gets freed. > >>>>> That approach could still be ok, but the drawback is that it could > >>>>> require a different patch for older branches that don't include > >>>>> 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with > >>>>> rculists."). It still worth to be considered. > >>>>> > >>>>> [0] > https://mail.openvswitch.org/pipermail/ovs-discuss/2023-April/052339.html > >>>>> --- > >>>>> lib/conntrack.c | 21 +++++++++++++-------- > >>>>> 1 file changed, 13 insertions(+), 8 deletions(-) > >>>>> > >>>>> diff --git a/lib/conntrack.c b/lib/conntrack.c > >>>>> index 7e1fc4b1f..d2ee127d9 100644 > >>>>> --- a/lib/conntrack.c > >>>>> +++ b/lib/conntrack.c > >>>>> @@ -1007,14 +1007,19 @@ conn_not_found(struct conntrack *ct, struct > dp_packet *pkt, > >>>>> } > >>>>> > >>>>> nat_packet(pkt, nc, false, ctx->icmp_related); > >>>>> - memcpy(&nat_conn->key, &nc->rev_key, sizeof > nat_conn->key); > >>>>> - memcpy(&nat_conn->rev_key, &nc->key, sizeof > nat_conn->rev_key); > >>>>> - nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; > >>>>> - nat_conn->nat_action = 0; > >>>>> - nat_conn->alg = NULL; > >>>>> - nat_conn->nat_conn = NULL; > >>>>> - uint32_t nat_hash = conn_key_hash(&nat_conn->key, > ct->hash_basis); > >>>>> - cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); > >>>>> + uint32_t nat_hash = conn_key_hash(&nc->rev_key, > ct->hash_basis); > >>>>> + if (nat_hash != ctx->hash) { > >>>>> + memcpy(&nat_conn->key, &nc->rev_key, sizeof > nat_conn->key); > >>>>> + memcpy(&nat_conn->rev_key, &nc->key, sizeof > nat_conn->rev_key); > >>>>> + nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; > >>>>> + nat_conn->nat_action = 0; > >>>>> + nat_conn->alg = NULL; > >>>>> + nat_conn->nat_conn = NULL; > >>>>> + cmap_insert(&ct->conns, &nat_conn->cm_node, > nat_hash); > >>>>> + } else { > >>>>> + free(nat_conn); > >>>>> + nat_conn = NULL; > >>>>> + } > >>>>> } > >>>>> > >>>>> nc->nat_conn = nat_conn; > >>>>> > >>> > >>> _______________________________________________ > >>> dev mailing list > >>> dev@openvswitch.org > >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >>> > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
On 5/18/23 09:37, Peng He wrote: > Hi, Paolo, > > IIRC, you have a revision version on these patches? > I guess it should be closer to the upstream than mine? I suppose, this one was the last revision of the patch: https://patchwork.ozlabs.org/project/openvswitch/patch/165668250354.1967719.13981409928749386554.stgit@fed.void/ It was part of the 'buckets' approach for multithread scalability and we didn't go with it. Best regards, Ilya Maximets. > > > Aaron Conole <aconole@redhat.com <mailto:aconole@redhat.com>> 于2023年5月17日周三 21:54写道: > > Paolo Valerio <pvalerio@redhat.com <mailto:pvalerio@redhat.com>> writes: > > > Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org>> writes: > > > >> On 5/4/23 19:21, Paolo Valerio wrote: > >>> Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org>> writes: > >>> > >>>> On 4/19/23 20:40, Paolo Valerio wrote: > >>>>> During the creation of a new connection, there's a chance both key and > >>>>> rev_key end up having the same hash. This is more common in the case > >>>>> of all-zero snat with no collisions. In that case, once the > >>>>> connection is expired, but not cleaned up, if a new packet with the > >>>>> same 5-tuple is received, an assertion failure gets triggered in > >>>>> conn_update_state() because of a previous failure of retrieving a > >>>>> CT_CONN_TYPE_DEFAULT connection. > >>>>> > >>>>> Fix it by releasing the nat_conn during the connection creation in the > >>>>> case of same hash for both key and rev_key. > >>>> > >>>> This sounds a bit odd. Shouldn't we treat hash collision as a normal case? > >>>> > >>>> Looking at the code, I'm assuming that the issue comes from the following > >>>> part in process_one(): > >>>> > >>>> if (OVS_LIKELY(conn)) { > >>>> if (conn->conn_type == CT_CONN_TYPE_UN_NAT) { > >>>> ... > >>>> conn_key_lookup(ct, &ctx->key, hash, now, &conn, &ctx->reply); > >>>> > >>>> And here we get the same connection again, because the default one is already > >>>> expired. Is that correct? > >>>> > >>>> If so, maybe we should add an extra condition to conn_key_lookup() to > >>>> only look for DEFAULT connections instead, just for this case? Since > >>>> we really don't want to get the UN_NAT one here. > >>>> > >>> > >>> Hello Ilya, > >>> > >>> It's a fair point. > >>> I initially thought about the approach you're suggesting, but I had some > >>> concerns about it that I'll try to summarize below. > >>> > >>> For sure it would fix the issue (it could require the first patch to be > >>> applied as well for the branches with rcu exp lists). > >>> > >>> Based on the current logic, new packets matching that expired connection > >>> but not evicted will be marked as +inv and further packets will be > >>> marked so for the whole sweep interval unless an exception like this get > >>> added: > >>> > >>> uint32_t hash = conn_key_hash(&conn->rev_key, ct->hash_basis); > >>> /* the last flag indicates CT_CONN_TYPE_DEFAULT only */ > >>> conn_key_lookup_(ct, &ctx->key, hash, now, &conn, &ctx->reply, true); > >>> /* special case where there's hash collision */ > >>> if (!conn && ctx->hash != hash) { > >>> pkt->md.ct_state |= CS_INVALID; > >>> write_ct_md(pkt, zone, NULL, NULL, NULL); > >>> ... > >>> return; > >>> } > >>> > >>> This would further require that subsequent lookup in the create_new_conn > >>> path are restricted to CT_CONN_TYPE_DEFAULT, e.g.: > >>> > >>> uint32_t hash = conn_key_hash(&ctx->key, ct->hash_basis); > >>> /* Only check for CT_CONN_TYPE_DEFAULT */ > >>> if (!conn_key_lookup_(ct, &ctx->key, hash, now, NULL, NULL, true)) { > >>> conn = conn_not_found(ct, pkt, ctx, commit, now, nat_action_info, > >>> helper, alg_exp, ct_alg_ctl, tp_id); > >>> } > >>> > >>> otherwise we could incur in a false positive which prevent to create a > >>> new connection. > >> > >> I'm not really sure if what described above is more correct way of doing > >> things or not... Aaron, do you have opinion on this? > >> > >> Another thought: Can we expire the CT_CONN_TYPE_UN_NAT connection the > >> moment DEFAULT counterpart of it expires? Or that will that be against > >> some logic / not possible to do? > >> > > > > As far as I can tell, this could not be straightforward as simply > > marking it as expired should not be reliable (e.g. doing it from the > > sweeper), and I guess that managing the expiration time field for the > > nat_conn as well would require updating the nat_conn every time the > > default one gets updated, probably making it a bit unpractical. > > > > Another approach would be removing the nat_conn [1] altogether. > > The problem in this case is backporting. Some adjustments that would add > > to the patch might be needed for older branches. > > > > [1] > > https://patchwork.ozlabs.org/project/openvswitch/patch/20201129033255.64647-2-hepeng.0320@bytedance.com/ <https://patchwork.ozlabs.org/project/openvswitch/patch/20201129033255.64647-2-hepeng.0320@bytedance.com/> > > I think that work was interesting, and maybe the best way to go > forward. Backports would become difficult, though - agreed. > > >> > >>> > >>>> Best regards, Ilya Maximets. > >>>> > >>>>> > >>>>> Reported-by: Michael Plato <michael.plato@tu-berlin.de <mailto:michael.plato@tu-berlin.de>> > >>>>> Fixes: 61e48c2d1db2 ("conntrack: Handle SNAT with all-zero IP address.") > >>>>> Signed-off-by: Paolo Valerio <pvalerio@redhat.com <mailto:pvalerio@redhat.com>> > >>>>> --- > >>>>> In this thread [0] there are some more details. A similar > >>>>> approach here could be to avoid to add the nat_conn to the cmap and > >>>>> letting the sweeper release the memory for nat_conn once the whole > >>>>> connection gets freed. > >>>>> That approach could still be ok, but the drawback is that it could > >>>>> require a different patch for older branches that don't include > >>>>> 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with > >>>>> rculists."). It still worth to be considered. > >>>>> > >>>>> [0] https://mail.openvswitch.org/pipermail/ovs-discuss/2023-April/052339.html <https://mail.openvswitch.org/pipermail/ovs-discuss/2023-April/052339.html> > >>>>> --- > >>>>> lib/conntrack.c | 21 +++++++++++++-------- > >>>>> 1 file changed, 13 insertions(+), 8 deletions(-) > >>>>> > >>>>> diff --git a/lib/conntrack.c b/lib/conntrack.c > >>>>> index 7e1fc4b1f..d2ee127d9 100644 > >>>>> --- a/lib/conntrack.c > >>>>> +++ b/lib/conntrack.c > >>>>> @@ -1007,14 +1007,19 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, > >>>>> } > >>>>> > >>>>> nat_packet(pkt, nc, false, ctx->icmp_related); > >>>>> - memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key); > >>>>> - memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key); > >>>>> - nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; > >>>>> - nat_conn->nat_action = 0; > >>>>> - nat_conn->alg = NULL; > >>>>> - nat_conn->nat_conn = NULL; > >>>>> - uint32_t nat_hash = conn_key_hash(&nat_conn->key, ct->hash_basis); > >>>>> - cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); > >>>>> + uint32_t nat_hash = conn_key_hash(&nc->rev_key, ct->hash_basis); > >>>>> + if (nat_hash != ctx->hash) { > >>>>> + memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key); > >>>>> + memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key); > >>>>> + nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; > >>>>> + nat_conn->nat_action = 0; > >>>>> + nat_conn->alg = NULL; > >>>>> + nat_conn->nat_conn = NULL; > >>>>> + cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); > >>>>> + } else { > >>>>> + free(nat_conn); > >>>>> + nat_conn = NULL; > >>>>> + } > >>>>> } > >>>>> > >>>>> nc->nat_conn = nat_conn; > >>>>> > >>> > >>> _______________________________________________ > >>> dev mailing list > >>> dev@openvswitch.org <mailto:dev@openvswitch.org> > >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev <https://mail.openvswitch.org/mailman/listinfo/ovs-dev> > >>> > > _______________________________________________ > dev mailing list > dev@openvswitch.org <mailto:dev@openvswitch.org> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev <https://mail.openvswitch.org/mailman/listinfo/ovs-dev> > > > > -- > hepeng
Hi Paolo, On 4/19/23 2:40 PM, Paolo Valerio wrote: > During the creation of a new connection, there's a chance both key and > rev_key end up having the same hash. This is more common in the case > of all-zero snat with no collisions. In that case, once the > connection is expired, but not cleaned up, if a new packet with the > same 5-tuple is received, an assertion failure gets triggered in > conn_update_state() because of a previous failure of retrieving a > CT_CONN_TYPE_DEFAULT connection. > > Fix it by releasing the nat_conn during the connection creation in the > case of same hash for both key and rev_key. Sorry for reviving a two month-old thread, but we recently started seeing this issue which seemed to also be related to [0], but I can't find it in patchworks or the tree. Was there a plan to update it? Thanks, -Brian [0] https://www.mail-archive.com/ovs-discuss@openvswitch.org/msg08945.html > > Reported-by: Michael Plato <michael.plato@tu-berlin.de> > Fixes: 61e48c2d1db2 ("conntrack: Handle SNAT with all-zero IP address.") > Signed-off-by: Paolo Valerio <pvalerio@redhat.com> > --- > In this thread [0] there are some more details. A similar > approach here could be to avoid to add the nat_conn to the cmap and > letting the sweeper release the memory for nat_conn once the whole > connection gets freed. > That approach could still be ok, but the drawback is that it could > require a different patch for older branches that don't include > 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with > rculists."). It still worth to be considered. > > [0] https://mail.openvswitch.org/pipermail/ovs-discuss/2023-April/052339.html > --- > lib/conntrack.c | 21 +++++++++++++-------- > 1 file changed, 13 insertions(+), 8 deletions(-) > > diff --git a/lib/conntrack.c b/lib/conntrack.c > index 7e1fc4b1f..d2ee127d9 100644 > --- a/lib/conntrack.c > +++ b/lib/conntrack.c > @@ -1007,14 +1007,19 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, > } > > nat_packet(pkt, nc, false, ctx->icmp_related); > - memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key); > - memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key); > - nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; > - nat_conn->nat_action = 0; > - nat_conn->alg = NULL; > - nat_conn->nat_conn = NULL; > - uint32_t nat_hash = conn_key_hash(&nat_conn->key, ct->hash_basis); > - cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); > + uint32_t nat_hash = conn_key_hash(&nc->rev_key, ct->hash_basis); > + if (nat_hash != ctx->hash) { > + memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key); > + memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key); > + nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; > + nat_conn->nat_action = 0; > + nat_conn->alg = NULL; > + nat_conn->nat_conn = NULL; > + cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); > + } else { > + free(nat_conn); > + nat_conn = NULL; > + } > } > > nc->nat_conn = nat_conn; > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Brian Haley <haleyb.dev@gmail.com> writes: > Hi Paolo, > > On 4/19/23 2:40 PM, Paolo Valerio wrote: >> During the creation of a new connection, there's a chance both key and >> rev_key end up having the same hash. This is more common in the case >> of all-zero snat with no collisions. In that case, once the >> connection is expired, but not cleaned up, if a new packet with the >> same 5-tuple is received, an assertion failure gets triggered in >> conn_update_state() because of a previous failure of retrieving a >> CT_CONN_TYPE_DEFAULT connection. >> >> Fix it by releasing the nat_conn during the connection creation in the >> case of same hash for both key and rev_key. > > Sorry for reviving a two month-old thread, but we recently started > seeing this issue which seemed to also be related to [0], but I can't > find it in patchworks or the tree. Was there a plan to update it? > Hi Brian, It transitioned to "Changes Requested" [0]. At the moment the idea is to upstream a patch initially proposed by Peng. I'm pretty busy at the moment, and I can't look at it right away, but yes, the plan is to update it. [0] https://patchwork.ozlabs.org/project/openvswitch/list/?series=351579&state=* > Thanks, > > -Brian > > [0] https://www.mail-archive.com/ovs-discuss@openvswitch.org/msg08945.html > >> >> Reported-by: Michael Plato <michael.plato@tu-berlin.de> >> Fixes: 61e48c2d1db2 ("conntrack: Handle SNAT with all-zero IP address.") >> Signed-off-by: Paolo Valerio <pvalerio@redhat.com> >> --- >> In this thread [0] there are some more details. A similar >> approach here could be to avoid to add the nat_conn to the cmap and >> letting the sweeper release the memory for nat_conn once the whole >> connection gets freed. >> That approach could still be ok, but the drawback is that it could >> require a different patch for older branches that don't include >> 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with >> rculists."). It still worth to be considered. >> >> [0] https://mail.openvswitch.org/pipermail/ovs-discuss/2023-April/052339.html >> --- >> lib/conntrack.c | 21 +++++++++++++-------- >> 1 file changed, 13 insertions(+), 8 deletions(-) >> >> diff --git a/lib/conntrack.c b/lib/conntrack.c >> index 7e1fc4b1f..d2ee127d9 100644 >> --- a/lib/conntrack.c >> +++ b/lib/conntrack.c >> @@ -1007,14 +1007,19 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, >> } >> >> nat_packet(pkt, nc, false, ctx->icmp_related); >> - memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key); >> - memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key); >> - nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; >> - nat_conn->nat_action = 0; >> - nat_conn->alg = NULL; >> - nat_conn->nat_conn = NULL; >> - uint32_t nat_hash = conn_key_hash(&nat_conn->key, ct->hash_basis); >> - cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); >> + uint32_t nat_hash = conn_key_hash(&nc->rev_key, ct->hash_basis); >> + if (nat_hash != ctx->hash) { >> + memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key); >> + memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key); >> + nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; >> + nat_conn->nat_action = 0; >> + nat_conn->alg = NULL; >> + nat_conn->nat_conn = NULL; >> + cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); >> + } else { >> + free(nat_conn); >> + nat_conn = NULL; >> + } >> } >> >> nc->nat_conn = nat_conn; >> >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On 6/8/23 3:52 PM, Paolo Valerio wrote: > Brian Haley <haleyb.dev@gmail.com> writes: > >> Hi Paolo, >> >> On 4/19/23 2:40 PM, Paolo Valerio wrote: >>> During the creation of a new connection, there's a chance both key and >>> rev_key end up having the same hash. This is more common in the case >>> of all-zero snat with no collisions. In that case, once the >>> connection is expired, but not cleaned up, if a new packet with the >>> same 5-tuple is received, an assertion failure gets triggered in >>> conn_update_state() because of a previous failure of retrieving a >>> CT_CONN_TYPE_DEFAULT connection. >>> >>> Fix it by releasing the nat_conn during the connection creation in the >>> case of same hash for both key and rev_key. >> >> Sorry for reviving a two month-old thread, but we recently started >> seeing this issue which seemed to also be related to [0], but I can't >> find it in patchworks or the tree. Was there a plan to update it? >> > > Hi Brian, > > It transitioned to "Changes Requested" [0]. > > At the moment the idea is to upstream a patch initially proposed by > Peng. I'm pretty busy at the moment, and I can't look at it right away, > but yes, the plan is to update it. > > [0] https://patchwork.ozlabs.org/project/openvswitch/list/?series=351579&state=* Ok, let me know if you need any help with it, might be able to put it in our environment that's showing the problem. -Brian
diff --git a/lib/conntrack.c b/lib/conntrack.c index 7e1fc4b1f..d2ee127d9 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -1007,14 +1007,19 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, } nat_packet(pkt, nc, false, ctx->icmp_related); - memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key); - memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key); - nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; - nat_conn->nat_action = 0; - nat_conn->alg = NULL; - nat_conn->nat_conn = NULL; - uint32_t nat_hash = conn_key_hash(&nat_conn->key, ct->hash_basis); - cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); + uint32_t nat_hash = conn_key_hash(&nc->rev_key, ct->hash_basis); + if (nat_hash != ctx->hash) { + memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key); + memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key); + nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; + nat_conn->nat_action = 0; + nat_conn->alg = NULL; + nat_conn->nat_conn = NULL; + cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); + } else { + free(nat_conn); + nat_conn = NULL; + } } nc->nat_conn = nat_conn;
During the creation of a new connection, there's a chance both key and rev_key end up having the same hash. This is more common in the case of all-zero snat with no collisions. In that case, once the connection is expired, but not cleaned up, if a new packet with the same 5-tuple is received, an assertion failure gets triggered in conn_update_state() because of a previous failure of retrieving a CT_CONN_TYPE_DEFAULT connection. Fix it by releasing the nat_conn during the connection creation in the case of same hash for both key and rev_key. Reported-by: Michael Plato <michael.plato@tu-berlin.de> Fixes: 61e48c2d1db2 ("conntrack: Handle SNAT with all-zero IP address.") Signed-off-by: Paolo Valerio <pvalerio@redhat.com> --- In this thread [0] there are some more details. A similar approach here could be to avoid to add the nat_conn to the cmap and letting the sweeper release the memory for nat_conn once the whole connection gets freed. That approach could still be ok, but the drawback is that it could require a different patch for older branches that don't include 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with rculists."). It still worth to be considered. [0] https://mail.openvswitch.org/pipermail/ovs-discuss/2023-April/052339.html --- lib/conntrack.c | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-)