Message ID: 1512418610-84032-5-git-send-email-bhanuprakash.bodireddy@intel.com
State: Superseded
Series: [ovs-dev,RFC,1/5] compiler: Introduce OVS_PREFETCH variants.
> Prefetch the cacheline having the cycle stats so that we can speed up
> the cycles_count_start() and cycles_count_intermediate().

Do you have any performance results?

>
> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com>
> ---
>  lib/dpif-netdev.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index b74b5d7..ab13d83 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -576,7 +576,7 @@ struct dp_netdev_pmd_thread {
>          struct ovs_mutex flow_mutex;
>          /* 8 pad bytes. */
>      );
> -    PADDED_MEMBERS(CACHE_LINE_SIZE,
> +    PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cachelineC,
>          struct cmap flow_table OVS_GUARDED; /* Flow table. */
>
>          /* One classifier per in_port polled by the pmd */
> @@ -4082,6 +4082,7 @@ reload:
>          lc = UINT_MAX;
>      }
>
> +    OVS_PREFETCH_CACHE(&pmd->cachelineC, OPCH_HTW);
>      cycles_count_start(pmd);
>      for (;;) {
>          for (i = 0; i < poll_cnt; i++) {
> --
> 2.4.11
>> Prefetch the cacheline having the cycle stats so that we can speed up
>> the cycles_count_start() and cycles_count_intermediate().
>
> Do you have any performance results?

I don't have numbers for this patch alone. I was testing the overall
throughput along with other patches (that were *not* part of this RFC
series) to verify performance improvements. I will include numbers in the
commit log once I have them for the individual patches.

BTW, I usually look at the % of total instructions getting retired, and at
the cycles spent in the front-end and back-end for these functions, to see
whether prefetching improves or degrades performance.

- Bhanuprakash.

>
>>
>> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at
>> intel.com>
>> ---
>>  lib/dpif-netdev.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index b74b5d7..ab13d83 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -576,7 +576,7 @@ struct dp_netdev_pmd_thread {
>>          struct ovs_mutex flow_mutex;
>>          /* 8 pad bytes. */
>>      );
>> -    PADDED_MEMBERS(CACHE_LINE_SIZE,
>> +    PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cachelineC,
>>          struct cmap flow_table OVS_GUARDED; /* Flow table. */
>>
>>          /* One classifier per in_port polled by the pmd */
>> @@ -4082,6 +4082,7 @@ reload:
>>          lc = UINT_MAX;
>>      }
>>
>> +    OVS_PREFETCH_CACHE(&pmd->cachelineC, OPCH_HTW);
>>      cycles_count_start(pmd);
>>      for (;;) {
>>          for (i = 0; i < poll_cnt; i++) {
>> --
>> 2.4.11
On 05.12.2017 18:11, Bodireddy, Bhanuprakash wrote:
>>
>>> Prefetch the cacheline having the cycle stats so that we can speed up
>>> the cycles_count_start() and cycles_count_intermediate().
>>
>> Do you have any performance results?
>
> I don't have numbers for this patch alone. I was testing the overall
> throughput along with other patches (that were *not* part of this RFC
> series) to verify performance improvements. I will include numbers in the
> commit log once I have them for the individual patches.
>
> BTW, I usually look at the % of total instructions getting retired, and at
> the cycles spent in the front-end and back-end for these functions, to see
> whether prefetching improves or degrades performance.
>
> - Bhanuprakash.
>
>>
>>>
>>> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at
>>> intel.com>
>>> ---
>>>  lib/dpif-netdev.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index b74b5d7..ab13d83 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -576,7 +576,7 @@ struct dp_netdev_pmd_thread {
>>>          struct ovs_mutex flow_mutex;
>>>          /* 8 pad bytes. */
>>>      );
>>> -    PADDED_MEMBERS(CACHE_LINE_SIZE,
>>> +    PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cachelineC,
>>>          struct cmap flow_table OVS_GUARDED; /* Flow table. */
>>>
>>>          /* One classifier per in_port polled by the pmd */
>>> @@ -4082,6 +4082,7 @@ reload:
>>>          lc = UINT_MAX;
>>>      }
>>>
>>> +    OVS_PREFETCH_CACHE(&pmd->cachelineC, OPCH_HTW);

How is a prefetch just before the infinite loop supposed to improve
performance? I didn't test it, but IMHO this should have zero impact.

>>>      cycles_count_start(pmd);
>>>      for (;;) {
>>>          for (i = 0; i < poll_cnt; i++) {
>>> --
>>> 2.4.11
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index b74b5d7..ab13d83 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -576,7 +576,7 @@ struct dp_netdev_pmd_thread {
         struct ovs_mutex flow_mutex;
         /* 8 pad bytes. */
     );
-    PADDED_MEMBERS(CACHE_LINE_SIZE,
+    PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cachelineC,
         struct cmap flow_table OVS_GUARDED; /* Flow table. */

         /* One classifier per in_port polled by the pmd */
@@ -4082,6 +4082,7 @@ reload:
         lc = UINT_MAX;
     }

+    OVS_PREFETCH_CACHE(&pmd->cachelineC, OPCH_HTW);
     cycles_count_start(pmd);
     for (;;) {
         for (i = 0; i < poll_cnt; i++) {
Prefetch the cacheline having the cycle stats so that we can speed up
the cycles_count_start() and cycles_count_intermediate().

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
---
 lib/dpif-netdev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)