Message ID | 1515778879-60075-1-git-send-email-bhanuprakash.bodireddy@intel.com |
---|---|
State | Changes Requested |
Delegated to: | Ian Stokes |
Series | [ovs-dev,1/4] compiler: Introduce OVS_PREFETCH variants. |
Hi Bhanu, who do you think should review this series? Is it something that Ian should pick up for dpdk_merge?
>-----Original Message-----
>From: Ben Pfaff [mailto:blp@ovn.org]
>Sent: Friday, January 12, 2018 6:20 PM
>To: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy@intel.com>
>Cc: dev@openvswitch.org
>Subject: Re: [ovs-dev] [PATCH 1/4] compiler: Introduce OVS_PREFETCH
>variants.
>
>Hi Bhanu, who do you think should review this series? Is it something that Ian
>should pick up for dpdk_merge?

Hi Ben,

I will check with Ian if he has time to review this. As the patch series doesn't
change any functionality at this point it shouldn't take much time.

-Bhanuprakash.
On Fri, Jan 12, 2018 at 07:38:49PM +0000, Bodireddy, Bhanuprakash wrote:
> >-----Original Message-----
> >From: Ben Pfaff [mailto:blp@ovn.org]
> >Sent: Friday, January 12, 2018 6:20 PM
> >To: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy@intel.com>
> >Cc: dev@openvswitch.org
> >Subject: Re: [ovs-dev] [PATCH 1/4] compiler: Introduce OVS_PREFETCH
> >variants.
> >
> >Hi Bhanu, who do you think should review this series? Is it something that Ian
> >should pick up for dpdk_merge?
>
> Hi Ben,
>
> I will check with Ian if he has time to review this. As the patch series doesn't
> change any functionality at this point it shouldn't take much time.

OK. Let me know if someone else should review the series.
> -----Original Message-----
> From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-bounces@openvswitch.org]
> On Behalf Of Bhanuprakash Bodireddy
> Sent: Friday, January 12, 2018 5:41 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH 1/4] compiler: Introduce OVS_PREFETCH variants.
>
> This commit introduces prefetch variants by using the GCC built-in
> prefetch function.
>
> The prefetch variants gives the user better control on designing data
> caching strategy in order to increase cache efficiency and minimize cache
> pollution. Data reference patterns here can be classified in to
>
>    - Non-temporal(NT) - Data that is referenced once and not reused in
>                         immediate future.
>    - Temporal         - Data will be used again soon.
>
> The Macro variants can be used where there are
>    - Predictable memory access patterns.
>    - Execution pipeline can stall if data isn't available.
>    - Time consuming loops.
>
> For example:
>
>    OVS_PREFETCH_CACHE(addr, OPCH_LTR)
>       - OPCH_LTR : OVS PREFETCH CACHE HINT-LOW TEMPORAL READ.
>       - __builtin_prefetch(addr, 0, 1)
>       - Prefetch data in to L3 cache for readonly purpose.
>
>    OVS_PREFETCH_CACHE(addr, OPCH_HTW)
>       - OPCH_HTW : OVS PREFETCH CACHE HINT-HIGH TEMPORAL WRITE.
>       - __builtin_prefetch(addr, 1, 3)
>       - Prefetch data in to all caches in anticipation of write. In doing
>         so it invalidates other cached copies so as to gain 'exclusive'
>         access.
>
>    OVS_PREFETCH(addr)
>       - OPCH_HTR : OVS PREFETCH CACHE HINT-HIGH TEMPORAL READ.
>       - __builtin_prefetch(addr, 0, 3)
>       - Prefetch data in to all caches in anticipation of read and that
>         data will be used again soon (HTR - High Temporal Read).
>
> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
> ---
>  include/openvswitch/compiler.h | 147 ++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 139 insertions(+), 8 deletions(-)
>
> diff --git a/include/openvswitch/compiler.h b/include/openvswitch/compiler.h
> index c7cb930..94bb24d 100644
> --- a/include/openvswitch/compiler.h
> +++ b/include/openvswitch/compiler.h
> @@ -222,18 +222,149 @@
>      static void f(void)
>  #endif
>
> -/* OVS_PREFETCH() can be used to instruct the CPU to fetch the cache
> - * line containing the given address to a CPU cache.
> - * OVS_PREFETCH_WRITE() should be used when the memory is going to be
> - * written to. Depending on the target CPU, this can generate the same
> - * instruction as OVS_PREFETCH(), or bring the data into the cache in an
> - * exclusive state. */
>  #if __GNUC__
> -#define OVS_PREFETCH(addr) __builtin_prefetch((addr))
> -#define OVS_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1)
> +enum cache_locality {
> +    NON_TEMPORAL_LOCALITY,
> +    LOW_TEMPORAL_LOCALITY,
> +    MODERATE_TEMPORAL_LOCALITY,
> +    HIGH_TEMPORAL_LOCALITY
> +};
> +
> +enum cache_rw {
> +    PREFETCH_READ,
> +    PREFETCH_WRITE
> +};
> +
> +/* The prefetch variants gives the user better control on designing data
> + * caching strategy in order to increase cache efficiency and minimize
> + * cache pollution. Data reference patterns here can be classified in to
> + *
> + *   Non-temporal(NT) - Data that is referenced once and not reused in
> + *                      immediate future.
> + *   Temporal         - Data will be used again soon.
> + *
> + * The Macro variants can be used where there are
> + *   o Predictable memory access patterns.
> + *   o Execution pipeline can stall if data isn't available.
> + *   o Time consuming loops.
> + *
> + * OVS_PREFETCH_CACHE() can be used to instruct the CPU to fetch the cache
> + * line containing the given address to a CPU cache. The second argument
> + * OPCH_XXR (or) OPCH_XXW is used to hint if the prefetched data is going
> + * to be read or written to by core.
> + *
> + * Example Usage:
> + *
> + *   OVS_PREFETCH_CACHE(addr, OPCH_LTR)
> + *       - OPCH_LTR : OVS PREFETCH CACHE HINT-LOW TEMPORAL READ.
> + *       - __builtin_prefetch(addr, 0, 1)
> + *       - Prefetch data in to L3 cache for readonly purpose.
> + *
> + *   OVS_PREFETCH_CACHE(addr, OPCH_HTW)
> + *       - OPCH_HTW : OVS PREFETCH CACHE HINT-HIGH TEMPORAL WRITE.
> + *       - __builtin_prefetch(addr, 1, 3)
> + *       - Prefetch data in to all caches in anticipation of write. In doing
> + *         so it invalidates other cached copies so as to gain 'exclusive'
> + *         access.
> + *
> + *   OVS_PREFETCH(addr)
> + *       - OPCH_HTR : OVS PREFETCH CACHE HINT-HIGH TEMPORAL READ.
> + *       - __builtin_prefetch(addr, 0, 3)
> + *       - Prefetch data in to all caches in anticipation of read and that
> + *         data will be used again soon (HTR - High Temporal Read).
> + *
> + * Implementation details of prefetch hint instructions may vary across
> + * different processors and microarchitectures.

Herein lies a potential problem, have you tested this on systems that have
different interpretations of the prefetch hints? What about systems that
don't support it?

In some cases OVS will be compiled on one system but then deployed on
another, they might not be the same HW platform. What happens in that
case?

Will it behave as expected i.e. similar fashion to how prefetch currently
behaves?

> + *
> + * OPCH_NTW, OPCH_LTW, OPCH_MTW uses prefetchwt1 instruction and OPCH_HTW
> + * uses prefetchw instruction when available. Refer Documentation on how
> + * to enable prefetchwt1 instruction.

Just to clarify, Is it HW documentation for a user's setup they must refer to?
Are there any extra setup steps for compilers etc. for these instructions?

I would expect something like this to be added to the OVS docs.

> + *
> + * PREFETCH HINT    Instruction     GCC builtin function
> + * -------------------------------------------------------
> + * OPCH_NTR         prefetchnta     __builtin_prefetch(a, 0, 0)
> + * OPCH_LTR         prefetcht2      __builtin_prefetch(a, 0, 1)
> + * OPCH_MTR         prefetcht1      __builtin_prefetch(a, 0, 2)
> + * OPCH_HTR         prefetcht0      __builtin_prefetch(a, 0, 3)
> + *
> + * OPCH_NTW         prefetchwt1     __builtin_prefetch(a, 1, 0)
> + * OPCH_LTW         prefetchwt1     __builtin_prefetch(a, 1, 1)
> + * OPCH_MTW         prefetchwt1     __builtin_prefetch(a, 1, 2)
> + * OPCH_HTW         prefetchw       __builtin_prefetch(a, 1, 3)
> + *
> + * */
> +#define OVS_PREFETCH_CACHE_HINT \
> +    OPCH(OPCH_NTR, PREFETCH_READ, NON_TEMPORAL_LOCALITY, \
> +         "Fetch data to non-temporal cache close to processor" \
> +         "to minimize cache pollution") \
> +    OPCH(OPCH_LTR, PREFETCH_READ, LOW_TEMPORAL_LOCALITY, \
> +         "Fetch data to L2 and L3 cache") \
> +    OPCH(OPCH_MTR, PREFETCH_READ, MODERATE_TEMPORAL_LOCALITY, \
> +         "Fetch data to L2 and L3 caches, same as LTR on" \
> +         "Nehalem, Westmere, Sandy Bridge and newer microarchitectures") \
> +    OPCH(OPCH_HTR, PREFETCH_READ, HIGH_TEMPORAL_LOCALITY, \
> +         "Fetch data in to all cache levels L1, L2 and L3") \
> +    OPCH(OPCH_NTW, PREFETCH_WRITE, NON_TEMPORAL_LOCALITY, \
> +         "Fetch data to L2 and L3 cache in exclusive state" \
> +         "in anticipation of write") \
> +    OPCH(OPCH_LTW, PREFETCH_WRITE, LOW_TEMPORAL_LOCALITY, \
> +         "Fetch data to L2 and L3 cache in exclusive state") \
> +    OPCH(OPCH_MTW, PREFETCH_WRITE, MODERATE_TEMPORAL_LOCALITY, \
> +         "Fetch data in to L2 and L3 caches in exclusive state") \
> +    OPCH(OPCH_HTW, PREFETCH_WRITE, HIGH_TEMPORAL_LOCALITY, \
> +         "Fetch data in to all cache levels in exclusive state")
> +
> +/* Indexes for cache prefetch types. */
> +enum {
> +#define OPCH(ENUM, RW, LOCALITY, EXPLANATION) ENUM##_INDEX,
> +    OVS_PREFETCH_CACHE_HINT
> +#undef OPCH
> +};
> +
> +/* Cache prefetch types. */
> +enum ovs_prefetch_type {
> +#define OPCH(ENUM, RW, LOCALITY, EXPLANATION) ENUM = 1 << ENUM##_INDEX,
> +    OVS_PREFETCH_CACHE_HINT
> +#undef OPCH
> +};
> +
> +#define OVS_PREFETCH_CACHE(addr, TYPE) switch(TYPE) \

Checkpatch caught the following:

ERROR: Improper whitespace around control block
#164 FILE: include/openvswitch/compiler.h:331:
#define OVS_PREFETCH_CACHE(addr, TYPE) switch(TYPE) \

Lines checked: 204, Warnings: 0, Errors: 1

> +{ \
> +    case OPCH_NTR: \
> +        __builtin_prefetch((addr), PREFETCH_READ, NON_TEMPORAL_LOCALITY); \
> +        break; \
> +    case OPCH_LTR: \
> +        __builtin_prefetch((addr), PREFETCH_READ, LOW_TEMPORAL_LOCALITY); \
> +        break; \
> +    case OPCH_MTR: \
> +        __builtin_prefetch((addr), PREFETCH_READ, \
> +                           MODERATE_TEMPORAL_LOCALITY); \
> +        break; \
> +    case OPCH_HTR: \
> +        __builtin_prefetch((addr), PREFETCH_READ, HIGH_TEMPORAL_LOCALITY); \
> +        break; \
> +    case OPCH_NTW: \
> +        __builtin_prefetch((addr), PREFETCH_WRITE, NON_TEMPORAL_LOCALITY); \
> +        break; \
> +    case OPCH_LTW: \
> +        __builtin_prefetch((addr), PREFETCH_WRITE, LOW_TEMPORAL_LOCALITY); \
> +        break; \
> +    case OPCH_MTW: \
> +        __builtin_prefetch((addr), PREFETCH_WRITE, \
> +                           MODERATE_TEMPORAL_LOCALITY); \
> +        break; \
> +    case OPCH_HTW: \
> +        __builtin_prefetch((addr), PREFETCH_WRITE, HIGH_TEMPORAL_LOCALITY); \
> +        break; \
> +}
> +
> +/* Retain this for backward compatibility. */
> +#define OVS_PREFETCH(addr) OVS_PREFETCH_CACHE(addr, OPCH_HTR)
> +#define OVS_PREFETCH_WRITE(addr) OVS_PREFETCH_CACHE(addr, OPCH_HTW)
>  #else
>  #define OVS_PREFETCH(addr)
>  #define OVS_PREFETCH_WRITE(addr)
> +#define OVS_PREFETCH_CACHE(addr, OP)
>  #endif
>
>  /* Build assertions.
> --
> 2.4.11
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
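Beyond the whitespace that checkpatch flags, a macro that expands to a bare `switch` statement has a second hazard: used as `MACRO(...);` in an unbraced `if`/`else`, the trailing `;` ends the `if` and the `else` no longer compiles. The sketch below shows one common way to avoid that, by wrapping the body in `do { ... } while (0)`. It is only an illustration under stated assumptions, not the form the patch uses: every `SKETCH_*` and `sketch_*` name is hypothetical, and only two of the eight hints are shown.

```c
/* Hypothetical two-hint subset, named SKETCH_* to avoid implying
 * that these are the identifiers from the patch. */
enum sketch_hint {
    SKETCH_HTR = 1 << 0,    /* High temporal locality, read. */
    SKETCH_HTW = 1 << 1     /* High temporal locality, write. */
};

#if defined(__GNUC__)
/* do { } while (0) makes the expansion a single statement, so the
 * macro is safe in an unbraced if/else.  The rw and locality
 * arguments of __builtin_prefetch must be compile-time constants. */
#define SKETCH_PREFETCH_CACHE(addr, TYPE)              \
    do {                                               \
        switch (TYPE) {                                \
        case SKETCH_HTR:                               \
            __builtin_prefetch((addr), 0, 3);          \
            break;                                     \
        case SKETCH_HTW:                               \
            __builtin_prefetch((addr), 1, 3);          \
            break;                                     \
        }                                              \
    } while (0)
#else
/* Without the builtin, expand to an empty statement. */
#define SKETCH_PREFETCH_CACHE(addr, TYPE) do { } while (0)
#endif

/* Uses the macro in an unbraced if/else, then optionally writes. */
static int
sketch_touch(int *p, int will_write)
{
    if (will_write)
        SKETCH_PREFETCH_CACHE(p, SKETCH_HTW);
    else
        SKETCH_PREFETCH_CACHE(p, SKETCH_HTR);

    if (will_write) {
        *p += 1;
    }
    return *p;
}
```

Because the hint argument is a constant at each call site, an optimizing compiler can normally reduce the `switch` to the single selected prefetch.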
>
>> -----Original Message-----
>> From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-bounces@openvswitch.org]
>> On Behalf Of Bhanuprakash Bodireddy
>> Sent: Friday, January 12, 2018 5:41 PM
>> To: dev@openvswitch.org
>> Subject: [ovs-dev] [PATCH 1/4] compiler: Introduce OVS_PREFETCH variants.
>>
>> This commit introduces prefetch variants by using the GCC built-in
>> prefetch function.
>>
>> The prefetch variants gives the user better control on designing data
>> caching strategy in order to increase cache efficiency and minimize
>> cache pollution. Data reference patterns here can be classified in to
>>
>>    - Non-temporal(NT) - Data that is referenced once and not reused in
>>                         immediate future.
>>    - Temporal         - Data will be used again soon.
>>
>> The Macro variants can be used where there are
>>    - Predictable memory access patterns.
>>    - Execution pipeline can stall if data isn't available.
>>    - Time consuming loops.
>>
>> For example:
>>
>>    OVS_PREFETCH_CACHE(addr, OPCH_LTR)
>>       - OPCH_LTR : OVS PREFETCH CACHE HINT-LOW TEMPORAL READ.
>>       - __builtin_prefetch(addr, 0, 1)
>>       - Prefetch data in to L3 cache for readonly purpose.
>>
>>    OVS_PREFETCH_CACHE(addr, OPCH_HTW)
>>       - OPCH_HTW : OVS PREFETCH CACHE HINT-HIGH TEMPORAL WRITE.
>>       - __builtin_prefetch(addr, 1, 3)
>>       - Prefetch data in to all caches in anticipation of write. In doing
>>         so it invalidates other cached copies so as to gain 'exclusive'
>>         access.
>>
>>    OVS_PREFETCH(addr)
>>       - OPCH_HTR : OVS PREFETCH CACHE HINT-HIGH TEMPORAL READ.
>>       - __builtin_prefetch(addr, 0, 3)
>>       - Prefetch data in to all caches in anticipation of read and that
>>         data will be used again soon (HTR - High Temporal Read).
>>
>> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
>> ---
>>  include/openvswitch/compiler.h | 147 ++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 139 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/openvswitch/compiler.h b/include/openvswitch/compiler.h
>> index c7cb930..94bb24d 100644
>> --- a/include/openvswitch/compiler.h
>> +++ b/include/openvswitch/compiler.h
>> @@ -222,18 +222,149 @@
>>      static void f(void)
>>  #endif
>>
>> -/* OVS_PREFETCH() can be used to instruct the CPU to fetch the cache
>> - * line containing the given address to a CPU cache.
>> - * OVS_PREFETCH_WRITE() should be used when the memory is going to be
>> - * written to. Depending on the target CPU, this can generate the same
>> - * instruction as OVS_PREFETCH(), or bring the data into the cache in an
>> - * exclusive state. */
>>  #if __GNUC__
>> -#define OVS_PREFETCH(addr) __builtin_prefetch((addr))
>> -#define OVS_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1)
>> +enum cache_locality {
>> +    NON_TEMPORAL_LOCALITY,
>> +    LOW_TEMPORAL_LOCALITY,
>> +    MODERATE_TEMPORAL_LOCALITY,
>> +    HIGH_TEMPORAL_LOCALITY
>> +};
>> +
>> +enum cache_rw {
>> +    PREFETCH_READ,
>> +    PREFETCH_WRITE
>> +};
>> +
>> +/* The prefetch variants gives the user better control on designing data
>> + * caching strategy in order to increase cache efficiency and minimize
>> + * cache pollution. Data reference patterns here can be classified in to
>> + *
>> + *   Non-temporal(NT) - Data that is referenced once and not reused in
>> + *                      immediate future.
>> + *   Temporal         - Data will be used again soon.
>> + *
>> + * The Macro variants can be used where there are
>> + *   o Predictable memory access patterns.
>> + *   o Execution pipeline can stall if data isn't available.
>> + *   o Time consuming loops.
>> + *
>> + * OVS_PREFETCH_CACHE() can be used to instruct the CPU to fetch the cache
>> + * line containing the given address to a CPU cache. The second argument
>> + * OPCH_XXR (or) OPCH_XXW is used to hint if the prefetched data is going
>> + * to be read or written to by core.
>> + *
>> + * Example Usage:
>> + *
>> + *   OVS_PREFETCH_CACHE(addr, OPCH_LTR)
>> + *       - OPCH_LTR : OVS PREFETCH CACHE HINT-LOW TEMPORAL READ.
>> + *       - __builtin_prefetch(addr, 0, 1)
>> + *       - Prefetch data in to L3 cache for readonly purpose.
>> + *
>> + *   OVS_PREFETCH_CACHE(addr, OPCH_HTW)
>> + *       - OPCH_HTW : OVS PREFETCH CACHE HINT-HIGH TEMPORAL WRITE.
>> + *       - __builtin_prefetch(addr, 1, 3)
>> + *       - Prefetch data in to all caches in anticipation of write. In doing
>> + *         so it invalidates other cached copies so as to gain 'exclusive'
>> + *         access.
>> + *
>> + *   OVS_PREFETCH(addr)
>> + *       - OPCH_HTR : OVS PREFETCH CACHE HINT-HIGH TEMPORAL READ.
>> + *       - __builtin_prefetch(addr, 0, 3)
>> + *       - Prefetch data in to all caches in anticipation of read and that
>> + *         data will be used again soon (HTR - High Temporal Read).
>> + *
>> + * Implementation details of prefetch hint instructions may vary across
>> + * different processors and microarchitectures.
>
>Herein lies a potential problem, have you tested this on systems that have
>different interpretations of the prefetch hints? What about systems that
>don't support it?

[BHANU] I have tested it on different Intel microarchitectures (Haswell,
Broadwell, Skylake). I understand that you are concerned about the ARM
platform. I see that ARM does support the prefetch variants, and they have the
same functionality as on x86_64.

For example, here is the below code snippet when compiled on ARM64 with gcc 5.4:

    void pref(void *p)
    {
        __builtin_prefetch(p,0,0);
        __builtin_prefetch(p,0,1);
        __builtin_prefetch(p,0,2);
        __builtin_prefetch(p,0,3);
        __builtin_prefetch(p,1,0);
        __builtin_prefetch(p,1,1);
        __builtin_prefetch(p,1,2);
        __builtin_prefetch(p,1,3);
    }

ON ARM64 (gcc 5.4):

    pref:
        prfm PLDL1STRM, [x0]
        prfm PLDL3KEEP, [x0]
        prfm PLDL2KEEP, [x0]
        prfm PLDL1KEEP, [x0]
        prfm PSTL1STRM, [x0]
        prfm PSTL3KEEP, [x0]
        prfm PSTL2KEEP, [x0]
        prfm PSTL1KEEP, [x0]
        ret

On instruction details:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802b/PRFM_imm.html

The best way to verify different platforms and compiler versions is to use
https://gcc.godbolt.org/

>
>In some cases OVS will be compiled on one system but then deployed on
>another, they might not be the same HW platform. What happens in that
>case?

If the target doesn't support the prefetch instruction, it might be a NOP on
that platform, and it doesn't cause any application crashes or performance
penalties.

>
>Will it behave as expected i.e. similar fashion to how prefetch currently
>behaves?

Yes.

>
>> + *
>> + * OPCH_NTW, OPCH_LTW, OPCH_MTW uses prefetchwt1 instruction and OPCH_HTW
>> + * uses prefetchw instruction when available. Refer Documentation on how
>> + * to enable prefetchwt1 instruction.
>
>Just to clarify, Is it HW documentation for a user's setup they must refer to?

[BHANU] Nope, I meant the OvS Documentation in this patch.
https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343101.html

>Are there any extra setup steps for compilers etc. for these instructions?

[BHANU] True, this has been clearly mentioned in the Documentation at the link
specified above.

>
>I would expect something like this to be added to the OVS docs.
>
>> + *
>> + * PREFETCH HINT    Instruction     GCC builtin function
>> + * -------------------------------------------------------
>> + * OPCH_NTR         prefetchnta     __builtin_prefetch(a, 0, 0)
>> + * OPCH_LTR         prefetcht2      __builtin_prefetch(a, 0, 1)
>> + * OPCH_MTR         prefetcht1      __builtin_prefetch(a, 0, 2)
>> + * OPCH_HTR         prefetcht0      __builtin_prefetch(a, 0, 3)
>> + *
>> + * OPCH_NTW         prefetchwt1     __builtin_prefetch(a, 1, 0)
>> + * OPCH_LTW         prefetchwt1     __builtin_prefetch(a, 1, 1)
>> + * OPCH_MTW         prefetchwt1     __builtin_prefetch(a, 1, 2)
>> + * OPCH_HTW         prefetchw       __builtin_prefetch(a, 1, 3)
>> + *
>> + * */
>> +#define OVS_PREFETCH_CACHE_HINT \
>> +    OPCH(OPCH_NTR, PREFETCH_READ, NON_TEMPORAL_LOCALITY, \
>> +         "Fetch data to non-temporal cache close to processor" \
>> +         "to minimize cache pollution") \
>> +    OPCH(OPCH_LTR, PREFETCH_READ, LOW_TEMPORAL_LOCALITY, \
>> +         "Fetch data to L2 and L3 cache") \
>> +    OPCH(OPCH_MTR, PREFETCH_READ, MODERATE_TEMPORAL_LOCALITY, \
>> +         "Fetch data to L2 and L3 caches, same as LTR on" \
>> +         "Nehalem, Westmere, Sandy Bridge and newer microarchitectures") \
>> +    OPCH(OPCH_HTR, PREFETCH_READ, HIGH_TEMPORAL_LOCALITY, \
>> +         "Fetch data in to all cache levels L1, L2 and L3") \
>> +    OPCH(OPCH_NTW, PREFETCH_WRITE, NON_TEMPORAL_LOCALITY, \
>> +         "Fetch data to L2 and L3 cache in exclusive state" \
>> +         "in anticipation of write") \
>> +    OPCH(OPCH_LTW, PREFETCH_WRITE, LOW_TEMPORAL_LOCALITY, \
>> +         "Fetch data to L2 and L3 cache in exclusive state") \
>> +    OPCH(OPCH_MTW, PREFETCH_WRITE, MODERATE_TEMPORAL_LOCALITY, \
>> +         "Fetch data in to L2 and L3 caches in exclusive state") \
>> +    OPCH(OPCH_HTW, PREFETCH_WRITE, HIGH_TEMPORAL_LOCALITY, \
>> +         "Fetch data in to all cache levels in exclusive state")
>> +
>> +/* Indexes for cache prefetch types. */
>> +enum {
>> +#define OPCH(ENUM, RW, LOCALITY, EXPLANATION) ENUM##_INDEX,
>> +    OVS_PREFETCH_CACHE_HINT
>> +#undef OPCH
>> +};
>> +
>> +/* Cache prefetch types. */
>> +enum ovs_prefetch_type {
>> +#define OPCH(ENUM, RW, LOCALITY, EXPLANATION) ENUM = 1 << ENUM##_INDEX,
>> +    OVS_PREFETCH_CACHE_HINT
>> +#undef OPCH
>> +};
>> +
>> +#define OVS_PREFETCH_CACHE(addr, TYPE) switch(TYPE) \
>
>Checkpatch caught the following:
>
>ERROR: Improper whitespace around control block
>#164 FILE: include/openvswitch/compiler.h:331:
>#define OVS_PREFETCH_CACHE(addr, TYPE) switch(TYPE) \
>
>Lines checked: 204, Warnings: 0, Errors: 1

[BHANU] I will fix this.

>> +{ \
>> +    case OPCH_NTR: \
>> +        __builtin_prefetch((addr), PREFETCH_READ, NON_TEMPORAL_LOCALITY); \
>> +        break; \
>> +    case OPCH_LTR: \
>> +        __builtin_prefetch((addr), PREFETCH_READ, LOW_TEMPORAL_LOCALITY); \
>> +        break; \
>> +    case OPCH_MTR: \
>> +        __builtin_prefetch((addr), PREFETCH_READ, \
>> +                           MODERATE_TEMPORAL_LOCALITY); \
>> +        break; \
>> +    case OPCH_HTR: \
>> +        __builtin_prefetch((addr), PREFETCH_READ, HIGH_TEMPORAL_LOCALITY); \
>> +        break; \
>> +    case OPCH_NTW: \
>> +        __builtin_prefetch((addr), PREFETCH_WRITE, NON_TEMPORAL_LOCALITY); \
>> +        break; \
>> +    case OPCH_LTW: \
>> +        __builtin_prefetch((addr), PREFETCH_WRITE, LOW_TEMPORAL_LOCALITY); \
>> +        break; \
>> +    case OPCH_MTW: \
>> +        __builtin_prefetch((addr), PREFETCH_WRITE, \
>> +                           MODERATE_TEMPORAL_LOCALITY); \
>> +        break; \
>> +    case OPCH_HTW: \
>> +        __builtin_prefetch((addr), PREFETCH_WRITE, HIGH_TEMPORAL_LOCALITY); \
>> +        break; \
>> +}
>> +
>> +/* Retain this for backward compatibility. */
>> +#define OVS_PREFETCH(addr) OVS_PREFETCH_CACHE(addr, OPCH_HTR)
>> +#define OVS_PREFETCH_WRITE(addr) OVS_PREFETCH_CACHE(addr, OPCH_HTW)
>>  #else
>>  #define OVS_PREFETCH(addr)
>>  #define OVS_PREFETCH_WRITE(addr)
>> +#define OVS_PREFETCH_CACHE(addr, OP)
>>  #endif
>>
>>  /* Build assertions.
>> --
>> 2.4.11
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
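On the question of crashes: GCC's documentation for `__builtin_prefetch` states that a data prefetch does not generate a fault if the address is invalid, although the address expression itself must be valid to evaluate. The sketch below illustrates that property with a linked-list walk that prefetches `n->next` even when it is NULL. It is a minimal illustration, not OVS code: the `sketch_*` and `SKETCH_*` names are hypothetical, and whether the prefetch helps performance depends on the target.

```c
#include <stddef.h>

/* Hypothetical list node, for illustration only. */
struct sketch_node {
    struct sketch_node *next;
    int value;
};

#if defined(__GNUC__)
/* Read prefetch with high temporal locality (rw = 0, locality = 3). */
#define SKETCH_PREFETCH(addr) __builtin_prefetch((addr), 0, 3)
#else
/* Fallback for compilers without the builtin: do nothing. */
#define SKETCH_PREFETCH(addr) ((void) 0)
#endif

/* Sums the list while prefetching the following node.  Evaluating
 * n->next is safe because n is non-NULL inside the loop; the value
 * it yields may be NULL, which the prefetch tolerates. */
static long
sketch_list_sum(const struct sketch_node *head)
{
    long sum = 0;

    for (const struct sketch_node *n = head; n != NULL; n = n->next) {
        SKETCH_PREFETCH(n->next);
        sum += n->value;
    }
    return sum;
}
```

The result is the same with or without the prefetch, since a prefetch is only a hint and does not change program semantics.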
diff --git a/include/openvswitch/compiler.h b/include/openvswitch/compiler.h
index c7cb930..94bb24d 100644
--- a/include/openvswitch/compiler.h
+++ b/include/openvswitch/compiler.h
@@ -222,18 +222,149 @@
     static void f(void)
 #endif
 
-/* OVS_PREFETCH() can be used to instruct the CPU to fetch the cache
- * line containing the given address to a CPU cache.
- * OVS_PREFETCH_WRITE() should be used when the memory is going to be
- * written to. Depending on the target CPU, this can generate the same
- * instruction as OVS_PREFETCH(), or bring the data into the cache in an
- * exclusive state. */
 #if __GNUC__
-#define OVS_PREFETCH(addr) __builtin_prefetch((addr))
-#define OVS_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1)
+enum cache_locality {
+    NON_TEMPORAL_LOCALITY,
+    LOW_TEMPORAL_LOCALITY,
+    MODERATE_TEMPORAL_LOCALITY,
+    HIGH_TEMPORAL_LOCALITY
+};
+
+enum cache_rw {
+    PREFETCH_READ,
+    PREFETCH_WRITE
+};
+
+/* The prefetch variants gives the user better control on designing data
+ * caching strategy in order to increase cache efficiency and minimize
+ * cache pollution. Data reference patterns here can be classified in to
+ *
+ *   Non-temporal(NT) - Data that is referenced once and not reused in
+ *                      immediate future.
+ *   Temporal         - Data will be used again soon.
+ *
+ * The Macro variants can be used where there are
+ *   o Predictable memory access patterns.
+ *   o Execution pipeline can stall if data isn't available.
+ *   o Time consuming loops.
+ *
+ * OVS_PREFETCH_CACHE() can be used to instruct the CPU to fetch the cache
+ * line containing the given address to a CPU cache. The second argument
+ * OPCH_XXR (or) OPCH_XXW is used to hint if the prefetched data is going
+ * to be read or written to by core.
+ *
+ * Example Usage:
+ *
+ *   OVS_PREFETCH_CACHE(addr, OPCH_LTR)
+ *       - OPCH_LTR : OVS PREFETCH CACHE HINT-LOW TEMPORAL READ.
+ *       - __builtin_prefetch(addr, 0, 1)
+ *       - Prefetch data in to L3 cache for readonly purpose.
+ *
+ *   OVS_PREFETCH_CACHE(addr, OPCH_HTW)
+ *       - OPCH_HTW : OVS PREFETCH CACHE HINT-HIGH TEMPORAL WRITE.
+ *       - __builtin_prefetch(addr, 1, 3)
+ *       - Prefetch data in to all caches in anticipation of write. In doing
+ *         so it invalidates other cached copies so as to gain 'exclusive'
+ *         access.
+ *
+ *   OVS_PREFETCH(addr)
+ *       - OPCH_HTR : OVS PREFETCH CACHE HINT-HIGH TEMPORAL READ.
+ *       - __builtin_prefetch(addr, 0, 3)
+ *       - Prefetch data in to all caches in anticipation of read and that
+ *         data will be used again soon (HTR - High Temporal Read).
+ *
+ * Implementation details of prefetch hint instructions may vary across
+ * different processors and microarchitectures.
+ *
+ * OPCH_NTW, OPCH_LTW, OPCH_MTW uses prefetchwt1 instruction and OPCH_HTW
+ * uses prefetchw instruction when available. Refer Documentation on how
+ * to enable prefetchwt1 instruction.
+ *
+ * PREFETCH HINT    Instruction     GCC builtin function
+ * -------------------------------------------------------
+ * OPCH_NTR         prefetchnta     __builtin_prefetch(a, 0, 0)
+ * OPCH_LTR         prefetcht2      __builtin_prefetch(a, 0, 1)
+ * OPCH_MTR         prefetcht1      __builtin_prefetch(a, 0, 2)
+ * OPCH_HTR         prefetcht0      __builtin_prefetch(a, 0, 3)
+ *
+ * OPCH_NTW         prefetchwt1     __builtin_prefetch(a, 1, 0)
+ * OPCH_LTW         prefetchwt1     __builtin_prefetch(a, 1, 1)
+ * OPCH_MTW         prefetchwt1     __builtin_prefetch(a, 1, 2)
+ * OPCH_HTW         prefetchw       __builtin_prefetch(a, 1, 3)
+ *
+ * */
+#define OVS_PREFETCH_CACHE_HINT \
+    OPCH(OPCH_NTR, PREFETCH_READ, NON_TEMPORAL_LOCALITY, \
+         "Fetch data to non-temporal cache close to processor" \
+         "to minimize cache pollution") \
+    OPCH(OPCH_LTR, PREFETCH_READ, LOW_TEMPORAL_LOCALITY, \
+         "Fetch data to L2 and L3 cache") \
+    OPCH(OPCH_MTR, PREFETCH_READ, MODERATE_TEMPORAL_LOCALITY, \
+         "Fetch data to L2 and L3 caches, same as LTR on" \
+         "Nehalem, Westmere, Sandy Bridge and newer microarchitectures") \
+    OPCH(OPCH_HTR, PREFETCH_READ, HIGH_TEMPORAL_LOCALITY, \
+         "Fetch data in to all cache levels L1, L2 and L3") \
+    OPCH(OPCH_NTW, PREFETCH_WRITE, NON_TEMPORAL_LOCALITY, \
+         "Fetch data to L2 and L3 cache in exclusive state" \
+         "in anticipation of write") \
+    OPCH(OPCH_LTW, PREFETCH_WRITE, LOW_TEMPORAL_LOCALITY, \
+         "Fetch data to L2 and L3 cache in exclusive state") \
+    OPCH(OPCH_MTW, PREFETCH_WRITE, MODERATE_TEMPORAL_LOCALITY, \
+         "Fetch data in to L2 and L3 caches in exclusive state") \
+    OPCH(OPCH_HTW, PREFETCH_WRITE, HIGH_TEMPORAL_LOCALITY, \
+         "Fetch data in to all cache levels in exclusive state")
+
+/* Indexes for cache prefetch types. */
+enum {
+#define OPCH(ENUM, RW, LOCALITY, EXPLANATION) ENUM##_INDEX,
+    OVS_PREFETCH_CACHE_HINT
+#undef OPCH
+};
+
+/* Cache prefetch types. */
+enum ovs_prefetch_type {
+#define OPCH(ENUM, RW, LOCALITY, EXPLANATION) ENUM = 1 << ENUM##_INDEX,
+    OVS_PREFETCH_CACHE_HINT
+#undef OPCH
+};
+
+#define OVS_PREFETCH_CACHE(addr, TYPE) switch(TYPE) \
+{ \
+    case OPCH_NTR: \
+        __builtin_prefetch((addr), PREFETCH_READ, NON_TEMPORAL_LOCALITY); \
+        break; \
+    case OPCH_LTR: \
+        __builtin_prefetch((addr), PREFETCH_READ, LOW_TEMPORAL_LOCALITY); \
+        break; \
+    case OPCH_MTR: \
+        __builtin_prefetch((addr), PREFETCH_READ, \
+                           MODERATE_TEMPORAL_LOCALITY); \
+        break; \
+    case OPCH_HTR: \
+        __builtin_prefetch((addr), PREFETCH_READ, HIGH_TEMPORAL_LOCALITY); \
+        break; \
+    case OPCH_NTW: \
+        __builtin_prefetch((addr), PREFETCH_WRITE, NON_TEMPORAL_LOCALITY); \
+        break; \
+    case OPCH_LTW: \
+        __builtin_prefetch((addr), PREFETCH_WRITE, LOW_TEMPORAL_LOCALITY); \
+        break; \
+    case OPCH_MTW: \
+        __builtin_prefetch((addr), PREFETCH_WRITE, \
+                           MODERATE_TEMPORAL_LOCALITY); \
+        break; \
+    case OPCH_HTW: \
+        __builtin_prefetch((addr), PREFETCH_WRITE, HIGH_TEMPORAL_LOCALITY); \
+        break; \
+}
+
+/* Retain this for backward compatibility. */
+#define OVS_PREFETCH(addr) OVS_PREFETCH_CACHE(addr, OPCH_HTR)
+#define OVS_PREFETCH_WRITE(addr) OVS_PREFETCH_CACHE(addr, OPCH_HTW)
 #else
 #define OVS_PREFETCH(addr)
 #define OVS_PREFETCH_WRITE(addr)
+#define OVS_PREFETCH_CACHE(addr, OP)
 #endif
 
 /* Build assertions.
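The patch builds its two enums from a single list macro that is expanded twice with different definitions of `OPCH()`, a pattern often called an X-macro. The sketch below reproduces that pattern on a reduced, hypothetical list so the expansion is easier to follow. The `SKETCH`/`SK_*` names, the `SK_N_HINTS` counter and the lookup table are assumptions added for illustration and do not appear in the patch.

```c
/* One entry per hint: name, rw flag, locality.  Hypothetical names. */
#define SKETCH_HINTS            \
    SKETCH(SK_NTR, 0, 0)        \
    SKETCH(SK_LTR, 0, 1)        \
    SKETCH(SK_MTR, 0, 2)        \
    SKETCH(SK_HTR, 0, 3)        \
    SKETCH(SK_NTW, 1, 0)        \
    SKETCH(SK_LTW, 1, 1)        \
    SKETCH(SK_MTW, 1, 2)        \
    SKETCH(SK_HTW, 1, 3)

/* First expansion: consecutive indexes SK_NTR_INDEX = 0, ... */
enum {
#define SKETCH(ENUM, RW, LOCALITY) ENUM##_INDEX,
    SKETCH_HINTS
#undef SKETCH
    SK_N_HINTS              /* Number of entries (not in the patch). */
};

/* Second expansion: one bit per hint, SK_NTR = 1 << 0, ... */
enum sketch_type {
#define SKETCH(ENUM, RW, LOCALITY) ENUM = 1 << ENUM##_INDEX,
    SKETCH_HINTS
#undef SKETCH
};

/* Third expansion: a lookup table indexed by ENUM##_INDEX. */
struct sketch_hint_info {
    const char *name;
    int rw;
    int locality;
};

static const struct sketch_hint_info sketch_table[] = {
#define SKETCH(ENUM, RW, LOCALITY) { #ENUM, RW, LOCALITY },
    SKETCH_HINTS
#undef SKETCH
};
```

Keeping the list in one place means that adding a hint updates every expansion at once, which is presumably why the patch structures `OVS_PREFETCH_CACHE_HINT` this way.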
This commit introduces prefetch variants by using the GCC built-in
prefetch function.

The prefetch variants gives the user better control on designing data
caching strategy in order to increase cache efficiency and minimize
cache pollution. Data reference patterns here can be classified in to

   - Non-temporal(NT) - Data that is referenced once and not reused in
                        immediate future.
   - Temporal         - Data will be used again soon.

The Macro variants can be used where there are
   - Predictable memory access patterns.
   - Execution pipeline can stall if data isn't available.
   - Time consuming loops.

For example:

   OVS_PREFETCH_CACHE(addr, OPCH_LTR)
      - OPCH_LTR : OVS PREFETCH CACHE HINT-LOW TEMPORAL READ.
      - __builtin_prefetch(addr, 0, 1)
      - Prefetch data in to L3 cache for readonly purpose.

   OVS_PREFETCH_CACHE(addr, OPCH_HTW)
      - OPCH_HTW : OVS PREFETCH CACHE HINT-HIGH TEMPORAL WRITE.
      - __builtin_prefetch(addr, 1, 3)
      - Prefetch data in to all caches in anticipation of write. In doing
        so it invalidates other cached copies so as to gain 'exclusive'
        access.

   OVS_PREFETCH(addr)
      - OPCH_HTR : OVS PREFETCH CACHE HINT-HIGH TEMPORAL READ.
      - __builtin_prefetch(addr, 0, 3)
      - Prefetch data in to all caches in anticipation of read and that
        data will be used again soon (HTR - High Temporal Read).

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
---
 include/openvswitch/compiler.h | 147 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 139 insertions(+), 8 deletions(-)
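As a rough illustration of the "predictable memory access patterns" and "time consuming loops" cases named in the commit message, the sketch below prefetches a few elements ahead while summing an array, using the low-temporal read hint (`__builtin_prefetch(addr, 0, 1)`). It is not taken from the series: the `sketch_*` and `SKETCH_*` names are hypothetical, and the prefetch distance of 8 elements is an arbitrary assumption that would need measurement on the target hardware.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__)
/* Low temporal locality, read: the hint the commit message calls LTR. */
#define SKETCH_PREFETCH_READ_LOW(addr) __builtin_prefetch((addr), 0, 1)
#else
/* Fallback for compilers without the builtin: do nothing. */
#define SKETCH_PREFETCH_READ_LOW(addr) ((void) 0)
#endif

/* Assumed distance, in elements, between the current access and the
 * prefetched one.  Purely illustrative; tune by measurement. */
#define SKETCH_PREFETCH_DISTANCE 8

/* Sums 'n' values, hinting the cache about elements needed soon. */
static uint64_t
sketch_sum(const uint32_t *values, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + SKETCH_PREFETCH_DISTANCE < n) {
            SKETCH_PREFETCH_READ_LOW(&values[i + SKETCH_PREFETCH_DISTANCE]);
        }
        sum += values[i];
    }
    return sum;
}
```

For a simple sequential walk like this, hardware prefetchers often do the job already, so an explicit hint tends to matter more for irregular but predictable patterns such as table lookups keyed by the next packet.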