Message ID | 20200508053629.210324-13-irogers@google.com |
---|---|
State | RFC |
Delegated to: | BPF Maintainers |
Headers | show |
Series | Share events between metrics | expand |
On Thu, May 07, 2020 at 10:36:27PM -0700, Ian Rogers wrote: > When adding event groups to the group list, insert them in size order. > This performs an insertion sort on the group list. By placing the > largest groups at the front of the group list it is possible to see if a > larger group contains the same events as a later group. This can make > the later group redundant - it can reuse the events from the large group. > A later patch will add this sharing. I'm not sure if size is that great an heuristic. The dedup algorithm should work in any case even if you don't order by size, right? I suppose in theory some kind of topological sort would be better. -Andi > > Signed-off-by: Ian Rogers <irogers@google.com> > --- > tools/perf/util/metricgroup.c | 16 +++++++++++++++- > 1 file changed, 15 insertions(+), 1 deletion(-) > > diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c > index 2a6456fa178b..69fbff47089f 100644 > --- a/tools/perf/util/metricgroup.c > +++ b/tools/perf/util/metricgroup.c > @@ -520,7 +520,21 @@ static int __metricgroup__add_metric(struct list_head *group_list, > return -EINVAL; > } > > - list_add_tail(&eg->nd, group_list); > + if (list_empty(group_list)) > + list_add(&eg->nd, group_list); > + else { > + struct list_head *pos; > + > + /* Place the largest groups at the front. */ > + list_for_each_prev(pos, group_list) { > + struct egroup *old = list_entry(pos, struct egroup, nd); > + > + if (hashmap__size(&eg->pctx.ids) <= > + hashmap__size(&old->pctx.ids)) > + break; > + } > + list_add(&eg->nd, pos); > + } > > return 0; > } > -- > 2.26.2.645.ge9eca65c58-goog >
On Fri, May 8, 2020 at 5:25 PM Andi Kleen <ak@linux.intel.com> wrote: > > On Thu, May 07, 2020 at 10:36:27PM -0700, Ian Rogers wrote: > > When adding event groups to the group list, insert them in size order. > > This performs an insertion sort on the group list. By placing the > > largest groups at the front of the group list it is possible to see if a > > larger group contains the same events as a later group. This can make > > the later group redundant - it can reuse the events from the large group. > > A later patch will add this sharing. > > I'm not sure if size is that great an heuristic. The dedup algorithm should > work in any case even if you don't order by size, right? Consider two metrics: - metric 1 with events {A,B} - metric 2 with events {A,B,C,D} If the list isn't sorted then as the matching takes the first group with all the events, metric 1 will match {A,B} and metric 2 {A,B,C,D}. If the order is sorted to {A,B,C,D},{A,B} then metric 1 matches within the {A,B,C,D} group as does metric 2. The events in metric 1 aren't used and are removed. The dedup algorithm is very naive :-) > I suppose in theory some kind of topological sort would be better. We could build something more complex, such as a directed acyclic graph where metrics with a subset of events are children of parent metrics. Children could have >1 parent for example {A,B,C,D},{A,B,E},{A,B} where {A,B} is a subset of both {A,B,C,D} and {A,B,E} and so a child of both. Presumably in that case it'd be better to match {A,B} with {A,B,E} to reduce multiplexing. As we're merging smaller groups into bigger, the sorting of the list is a quick and dirty approximation of this. Thanks, Ian > -Andi > > > > Signed-off-by: Ian Rogers <irogers@google.com> > > --- > > tools/perf/util/metricgroup.c | 16 +++++++++++++++- > > 1 file changed, 15 insertions(+), 1 deletion(-) > > > > diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c > > index 2a6456fa178b..69fbff47089f 100644 > > --- a/tools/perf/util/metricgroup.c > > +++ b/tools/perf/util/metricgroup.c > > @@ -520,7 +520,21 @@ static int __metricgroup__add_metric(struct list_head *group_list, > > return -EINVAL; > > } > > > > - list_add_tail(&eg->nd, group_list); > > + if (list_empty(group_list)) > > + list_add(&eg->nd, group_list); > > + else { > > + struct list_head *pos; > > + > > + /* Place the largest groups at the front. */ > > + list_for_each_prev(pos, group_list) { > > + struct egroup *old = list_entry(pos, struct egroup, nd); > > + > > + if (hashmap__size(&eg->pctx.ids) <= > > + hashmap__size(&old->pctx.ids)) > > + break; > > + } > > + list_add(&eg->nd, pos); > > + } > > > > return 0; > > } > > -- > > 2.26.2.645.ge9eca65c58-goog > >
> > I'm not sure if size is that great an heuristic. The dedup algorithm should > > work in any case even if you don't order by size, right? > > Consider two metrics: > - metric 1 with events {A,B} > - metric 2 with events {A,B,C,D} > If the list isn't sorted then as the matching takes the first group > with all the events, metric 1 will match {A,B} and metric 2 {A,B,C,D}. > If the order is sorted to {A,B,C,D},{A,B} then metric 1 matches within > the {A,B,C,D} group as does metric 2. The events in metric 1 aren't > used and are removed. Ok. It's better for the longer metric if they stay together. > > The dedup algorithm is very naive :-) I guess what matters is that it gives reasonable results on the current metrics. I assume it does? How much deduping is happening if you run all metrics? For toplev on my long term todo list was to compare it against a hopefully better schedule generated by or-tools, but I never got around to coding that up. -Andi
On Fri, May 8, 2020 at 7:40 PM Andi Kleen <ak@linux.intel.com> wrote: > > > > I'm not sure if size is that great an heuristic. The dedup algorithm should > > > work in any case even if you don't order by size, right? > > > > Consider two metrics: > > - metric 1 with events {A,B} > > - metric 2 with events {A,B,C,D} > > If the list isn't sorted then as the matching takes the first group > > with all the events, metric 1 will match {A,B} and metric 2 {A,B,C,D}. > > If the order is sorted to {A,B,C,D},{A,B} then metric 1 matches within > > the {A,B,C,D} group as does metric 2. The events in metric 1 aren't > > used and are removed. > > Ok. It's better for the longer metric if they stay together. > > > > > The dedup algorithm is very naive :-) > > I guess what matters is that it gives reasonable results on the current > metrics. I assume it does? > > How much deduping is happening if you run all metrics? Hi Andi, I included this data in the latest cover-letter: https://lore.kernel.org/lkml/20200520072814.128267-1-irogers@google.com/ > For toplev on my long term todo list was to compare it against > a hopefully better schedule generated by or-tools, but I never > got around to coding that up. > > -Andi Agreed, and this relates to your comments about doing the algorithm as a separate pass outside of find_evsel_group. For that, I don't disagree but would like to land what we have and then follow up with improvements. Thanks, Ian
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c index 2a6456fa178b..69fbff47089f 100644 --- a/tools/perf/util/metricgroup.c +++ b/tools/perf/util/metricgroup.c @@ -520,7 +520,21 @@ static int __metricgroup__add_metric(struct list_head *group_list, return -EINVAL; } - list_add_tail(&eg->nd, group_list); + if (list_empty(group_list)) + list_add(&eg->nd, group_list); + else { + struct list_head *pos; + + /* Place the largest groups at the front. */ + list_for_each_prev(pos, group_list) { + struct egroup *old = list_entry(pos, struct egroup, nd); + + if (hashmap__size(&eg->pctx.ids) <= + hashmap__size(&old->pctx.ids)) + break; + } + list_add(&eg->nd, pos); + } return 0; }
When adding event groups to the group list, insert them in size order. This performs an insertion sort on the group list. By placing the largest groups at the front of the group list it is possible to see if a larger group contains the same events as a later group. This can make the later group redundant - it can reuse the events from the large group. A later patch will add this sharing. Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/util/metricgroup.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-)