Message ID | 20240208214939.12799-1-numans@ovn.org
---|---
State | Accepted
Series | northd memory and CPU increase fix due to lflow-mgr.
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
On Thu, Feb 8, 2024 at 1:50 PM <numans@ovn.org> wrote:
>
> From: Ilya Maximets <i.maximets@ovn.org>
>
> Currently, whenever a new logical flow is created, northd allocates
> a reference counter for every datapath in the datapath group for that
> logical flow.  Those reference counters are necessary in order to not
> remove the datapath from a logical flow during incremental processing
> if it was created from two different objects for the same datapath and
> now one of them is removed.
>
> However, that creates a serious scalability problem.  In large scale
> setups with tens of thousands of logical flows applied to large
> datapath groups we may have hundreds of millions of these reference
> counters allocated, which is many GB of RSS just for that purpose.
>
> For example, in ovn-heater's cluster-density 500 node test, ovn-northd
> started to consume up to 8.5 GB of RAM.  In the same test before the
> reference counting, ovn-northd consumed only 2.5 GB.  All those memory
> allocations also increased CPU usage.  Re-compute time went up from
> 1.5 seconds to 6 seconds in the same test.  In the end we have about
> a 4x increase in both CPU and memory usage.
>
> Running under valgrind --tool=massif shows:
>
> -------------------------------------------------------------
>      total(B)  useful-heap(B)  extra-heap(B)  stacks(B)
> -------------------------------------------------------------
> 9,416,438,200   7,401,971,556  2,014,466,644          0
>
> 78.61% (7,401 MB) (heap allocation functions) malloc
> ->45.78% (4,311 MB) xcalloc__ (util.c:124)
> | ->45.68% (4,301 MB) xzalloc__ (util.c:134)
> | | ->45.68% (4,301 MB) xzalloc (util.c:168)
> | |   ->40.97% (3,857 MB) dp_refcnt_use (lflow-mgr.c:1348)
> | |   | ->40.89% (3,850 MB) lflow_table_add_lflow (lflow-mgr.c:696)
> | |   | | ->12.27% (1,155 MB) build_lb_rules_pre_stateful (northd.c:7180)
> | |   | | ->12.27% (1,155 MB) build_lb_rules (northd.c:7658)
> | |   | | ->12.27% (1,155 MB) build_gw_lrouter_nat_flows_for_lb (northd.c:11054)
> ->28.62% (2,694 MB) xmalloc__ (util.c:140)
> | ->28.62% (2,694 MB) xmalloc (util.c:175)
> |   ->06.71% (631 MB) resize (hmap.c:100)
> |   | ->06.01% (565 MB) hmap_expand_at (hmap.c:175)
> |   | | ->05.24% (492 MB) hmap_insert_at (hmap.h:309)
> |   | | | ->05.24% (492 MB) dp_refcnt_use (lflow-mgr.c:1351)
>
> 45% of all the memory is allocated for the reference counters themselves
> and another 7% is taken by the hash maps that hold them.  Also, there is
> more than 20% of total memory allocation overhead (extra-heap), since
> all these allocated objects are very small (32 B).
>
> This test allocates 120 M reference counters in total.
>
> However, the vast majority of all the reference counters always have
> a value of 1, i.e. these datapaths are not used more than once.
>
> Defer allocation of reference counters until the datapath is used
> by the same logical flow for the second time.  We can do that by
> checking the current datapath group bitmap.
>
> With this change, the number of allocated reference counters goes
> down from 120 M to just 12 M.  Memory consumption is reduced from
> 8.5 GB to 2.67 GB and the northd recompute time is reduced from 6 to
> 2.1 seconds.
>
> That is still a little higher than the resource usage before the
> introduction of incremental processing for logical flows, but it is
> fairly manageable.  Also, the resource usage and memory consumption
> may be further improved by reducing the number of cases where northd
> attempts to create logical flows for the same datapaths multiple times.
>
> Note: the cluster-density test in ovn-heater creates new port groups
> on every iteration and ovn-northd doesn't handle this incrementally,
> so it always re-computes.  That's why there is no benefit from
> northd I-P for CPU usage in this scenario.
>
> Fixes: a623606052ea ("northd: Refactor lflow management into a separate module.")
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---
>  northd/lflow-mgr.c | 39 ++++++++++++++++++++-------------------
>  1 file changed, 20 insertions(+), 19 deletions(-)
>
> diff --git a/northd/lflow-mgr.c b/northd/lflow-mgr.c
> index df62cd6ab4..f70896cce7 100644
> --- a/northd/lflow-mgr.c
> +++ b/northd/lflow-mgr.c
> @@ -47,8 +47,7 @@ static void ovn_lflow_destroy(struct lflow_table *lflow_table,
>  static char *ovn_lflow_hint(const struct ovsdb_idl_row *row);
>  
>  static struct ovn_lflow *do_ovn_lflow_add(
> -    struct lflow_table *, const struct ovn_datapath *,
> -    const unsigned long *dp_bitmap, size_t dp_bitmap_len, uint32_t hash,
> +    struct lflow_table *, size_t dp_bitmap_len, uint32_t hash,
>      enum ovn_stage stage, uint16_t priority, const char *match,
>      const char *actions, const char *io_port,
>      const char *ctrl_meter,
> @@ -674,9 +673,9 @@ lflow_table_add_lflow(struct lflow_table *lflow_table,
>  
>      hash_lock = lflow_hash_lock(&lflow_table->entries, hash);
>      struct ovn_lflow *lflow =
> -        do_ovn_lflow_add(lflow_table, od, dp_bitmap,
> -                         dp_bitmap_len, hash, stage,
> -                         priority, match, actions,
> +        do_ovn_lflow_add(lflow_table,
> +                         od ? ods_size(od->datapaths) : dp_bitmap_len,
> +                         hash, stage, priority, match, actions,
>                           io_port, ctrl_meter, stage_hint, where);
>  
>      if (lflow_ref) {
> @@ -702,17 +701,24 @@ lflow_table_add_lflow(struct lflow_table *lflow_table,
>                  ovs_assert(lrn->dpgrp_bitmap_len == dp_bitmap_len);
>                  size_t index;
>                  BITMAP_FOR_EACH_1 (index, dp_bitmap_len, dp_bitmap) {
> -                    dp_refcnt_use(&lflow->dp_refcnts_map, index);
> +                    /* Allocate a reference counter only if already used. */
> +                    if (bitmap_is_set(lflow->dpg_bitmap, index)) {
> +                        dp_refcnt_use(&lflow->dp_refcnts_map, index);
> +                    }
>                  }
>              } else {
> -                dp_refcnt_use(&lflow->dp_refcnts_map, lrn->dp_index);
> +                /* Allocate a reference counter only if already used. */
> +                if (bitmap_is_set(lflow->dpg_bitmap, lrn->dp_index)) {
> +                    dp_refcnt_use(&lflow->dp_refcnts_map, lrn->dp_index);
> +                }
>              }
>          }
>          lrn->linked = true;
>      }
>  
> -    lflow_hash_unlock(hash_lock);
> +    ovn_dp_group_add_with_reference(lflow, od, dp_bitmap, dp_bitmap_len);
>  
> +    lflow_hash_unlock(hash_lock);
>  }
>  
>  void
> @@ -946,9 +952,7 @@ ovn_lflow_destroy(struct lflow_table *lflow_table, struct ovn_lflow *lflow)
>  }
>  
>  static struct ovn_lflow *
> -do_ovn_lflow_add(struct lflow_table *lflow_table,
> -                 const struct ovn_datapath *od,
> -                 const unsigned long *dp_bitmap, size_t dp_bitmap_len,
> +do_ovn_lflow_add(struct lflow_table *lflow_table, size_t dp_bitmap_len,
>                   uint32_t hash, enum ovn_stage stage, uint16_t priority,
>                   const char *match, const char *actions,
>                   const char *io_port, const char *ctrl_meter,
> @@ -959,14 +963,11 @@ do_ovn_lflow_add(struct lflow_table *lflow_table,
>      struct ovn_lflow *old_lflow;
>      struct ovn_lflow *lflow;
>  
> -    size_t bitmap_len = od ? ods_size(od->datapaths) : dp_bitmap_len;
> -    ovs_assert(bitmap_len);
> +    ovs_assert(dp_bitmap_len);
>  
>      old_lflow = ovn_lflow_find(&lflow_table->entries, stage,
>                                 priority, match, actions, ctrl_meter, hash);
>      if (old_lflow) {
> -        ovn_dp_group_add_with_reference(old_lflow, od, dp_bitmap,
> -                                        bitmap_len);
>          return old_lflow;
>      }
>  
> @@ -974,14 +975,12 @@ do_ovn_lflow_add(struct lflow_table *lflow_table,
>      /* While adding new logical flows we're not setting single datapath, but
>       * collecting a group. 'od' will be updated later for all flows with only
>       * one datapath in a group, so it could be hashed correctly. */
> -    ovn_lflow_init(lflow, NULL, bitmap_len, stage, priority,
> +    ovn_lflow_init(lflow, NULL, dp_bitmap_len, stage, priority,
>                     xstrdup(match), xstrdup(actions),
>                     io_port ? xstrdup(io_port) : NULL,
>                     nullable_xstrdup(ctrl_meter),
>                     ovn_lflow_hint(stage_hint), where);
>  
> -    ovn_dp_group_add_with_reference(lflow, od, dp_bitmap, bitmap_len);
> -
>      if (parallelization_state != STATE_USE_PARALLELIZATION) {
>          hmap_insert(&lflow_table->entries, &lflow->hmap_node, hash);
>      } else {
> @@ -1350,8 +1349,10 @@ dp_refcnt_use(struct hmap *dp_refcnts_map, size_t dp_index)
>      struct dp_refcnt *dp_refcnt = dp_refcnt_find(dp_refcnts_map, dp_index);
>  
>      if (!dp_refcnt) {
> -        dp_refcnt = xzalloc(sizeof *dp_refcnt);
> +        dp_refcnt = xmalloc(sizeof *dp_refcnt);
>          dp_refcnt->dp_index = dp_index;
> +        /* Allocation is happening on the second (!) use. */
> +        dp_refcnt->refcnt = 1;
>  
>          hmap_insert(dp_refcnts_map, &dp_refcnt->key_node, dp_index);
>      }
> -- 
> 2.43.0
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Thanks Ilya and Numan.
Acked-by: Han Zhou <hzhou@ovn.org>
On 2/13/24 08:19, Han Zhou wrote:
> On Thu, Feb 8, 2024 at 1:50 PM <numans@ovn.org> wrote:
>>
>> From: Ilya Maximets <i.maximets@ovn.org>
>>
>> [full patch quoted in Han's reply above]
>
> Thanks Ilya and Numan.
> Acked-by: Han Zhou <hzhou@ovn.org>

Looks good to me too:

Acked-by: Dumitru Ceara <dceara@redhat.com>

Thanks!