Message ID | 20230407093147.3646597-1-kconsul@linux.vnet.ibm.com (mailing list archive)
---|---
State | Rejected, archived
Series | KVM: PPC: BOOK3S: book3s_hv_nested.c: improve branch prediction for k.alloc
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/github-powerpc_ppctests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_selftests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_sparse | success | Successfully ran 4 jobs. |
snowpatch_ozlabs/github-powerpc_kernel_qemu | success | Successfully ran 24 jobs. |
snowpatch_ozlabs/github-powerpc_clang | success | Successfully ran 6 jobs. |
On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> I used the unlikely() macro on the return values of the k.alloc
> calls and found that it changes the code generation a bit.
> Optimize all return paths of k.alloc calls by improving
> branch prediction on return value of k.alloc.

What about below?

"Improve branch prediction on kmalloc() and kzalloc() call by using
unlikely() macro to optimize their return paths."

That is, try to avoid first-person construct (I).

Thanks.
On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
> On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> > I used the unlikely() macro on the return values of the k.alloc
> > calls and found that it changes the code generation a bit.
> > Optimize all return paths of k.alloc calls by improving
> > branch prediction on return value of k.alloc.

Nit, this is improving code generation, not branch prediction.

> What about below?
>
> "Improve branch prediction on kmalloc() and kzalloc() call by using
> unlikely() macro to optimize their return paths."

Another nit, using unlikely() doesn't necessarily provide a measurable optimization.
As above, it does often improve code generation for the happy path, but that doesn't
always equate to improved performance, e.g. if the CPU can easily predict the branch
and/or there is no impact on the cache footprint.
On 2023-04-07 09:01:29, Sean Christopherson wrote:
> On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
> > On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> > > I used the unlikely() macro on the return values of the k.alloc
> > > calls and found that it changes the code generation a bit.
> > > Optimize all return paths of k.alloc calls by improving
> > > branch prediction on return value of k.alloc.
>
> Nit, this is improving code generation, not branch prediction.

Sorry, my mistake.

> > What about below?
> >
> > "Improve branch prediction on kmalloc() and kzalloc() call by using
> > unlikely() macro to optimize their return paths."
>
> Another nit, using unlikely() doesn't necessarily provide a measurable optimization.
> As above, it does often improve code generation for the happy path, but that doesn't
> always equate to improved performance, e.g. if the CPU can easily predict the branch
> and/or there is no impact on the cache footprint.

I see. I will submit a v2 of the patch with a better and more accurate
description. Does anyone else have any comments before I do so?
Kautuk Consul <kconsul@linux.vnet.ibm.com> writes:
> On 2023-04-07 09:01:29, Sean Christopherson wrote:
>> On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
>> > On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
>> > > I used the unlikely() macro on the return values of the k.alloc
>> > > calls and found that it changes the code generation a bit.
>> > > Optimize all return paths of k.alloc calls by improving
>> > > branch prediction on return value of k.alloc.
>>
>> Nit, this is improving code generation, not branch prediction.
> Sorry, my mistake.
>>
>> > What about below?
>> >
>> > "Improve branch prediction on kmalloc() and kzalloc() call by using
>> > unlikely() macro to optimize their return paths."
>>
>> Another nit, using unlikely() doesn't necessarily provide a measurable optimization.
>> As above, it does often improve code generation for the happy path, but that doesn't
>> always equate to improved performance, e.g. if the CPU can easily predict the branch
>> and/or there is no impact on the cache footprint.
> I see. I will submit a v2 of the patch with a better and more accurate
> description. Does anyone else have any comments before I do so?

In general I think unlikely should be saved for cases where either the
compiler is generating terrible code, or the likelihood of the condition
might be surprising to a human reader.

eg. if you had some code that does a NULL check and it's *expected* that
the value is NULL, then wrapping that check in likely() actually adds
information for a human reader.

Also please don't use unlikely in init paths or other cold paths, it
clutters the code (only slightly, but a little) and that's not worth the
possible tiny benefit for code that only runs once or infrequently.

I would expect the compilers to do the right thing in all
these cases without the unlikely. But if you can demonstrate that they
meaningfully improve the code generation with a before/after
disassembly then I'd be interested.

cheers
Sorry, last email rejected by the mailing lists. Can you please look at
the diff file attached?

On Tue, Apr 11, 2023 at 2:14 PM Kautuk Consul <kautuk.consul.80@gmail.com> wrote:
>
> Hi,
>
> Sorry, I'm replying back using my private gmail ID as I can't figure out
> how to attach multiple files using mutt.
>
> On Tue, Apr 11, 2023 at 12:05 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
> >
> > Kautuk Consul <kconsul@linux.vnet.ibm.com> writes:
> > > On 2023-04-07 09:01:29, Sean Christopherson wrote:
> > >> On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
> > >> > On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> > >> > > I used the unlikely() macro on the return values of the k.alloc
> > >> > > calls and found that it changes the code generation a bit.
> > >> > > Optimize all return paths of k.alloc calls by improving
> > >> > > branch prediction on return value of k.alloc.
> > >>
> > >> Nit, this is improving code generation, not branch prediction.
> > > Sorry, my mistake.
> > >>
> > >> > What about below?
> > >> >
> > >> > "Improve branch prediction on kmalloc() and kzalloc() call by using
> > >> > unlikely() macro to optimize their return paths."
> > >>
> > >> Another nit, using unlikely() doesn't necessarily provide a measurable optimization.
> > >> As above, it does often improve code generation for the happy path, but that doesn't
> > >> always equate to improved performance, e.g. if the CPU can easily predict the branch
> > >> and/or there is no impact on the cache footprint.
> > >
> > > I see. I will submit a v2 of the patch with a better and more accurate
> > > description. Does anyone else have any comments before I do so?
> >
> > In general I think unlikely should be saved for cases where either the
> > compiler is generating terrible code, or the likelihood of the condition
> > might be surprising to a human reader.
> >
> > eg. if you had some code that does a NULL check and it's *expected* that
> > the value is NULL, then wrapping that check in likely() actually adds
> > information for a human reader.
> >
> > Also please don't use unlikely in init paths or other cold paths, it
> > clutters the code (only slightly but a little) and that's not worth the
> > possible tiny benefit for code that only runs once or infrequently.
> >
> > I would expect the compilers to do the right thing in all
> > these cases without the unlikely. But if you can demonstrate that they
> > meaningfully improve the code generation with a before/after
> > disassembly then I'd be interested.
> >
> There are surprisingly many changes to code generation before and
> after using these instances of the unlikely macro. I couldn't really
> analyze all of them to be able to state that they are indeed improving
> performance in some way. I assumed the compiler would generate optimal
> code for these unlikely paths.
> Please find the before and after files attached to this email.
>
> > cheers
Hi,

On 2023-04-11 16:35:10, Michael Ellerman wrote:
> Kautuk Consul <kconsul@linux.vnet.ibm.com> writes:
> > On 2023-04-07 09:01:29, Sean Christopherson wrote:
> >> On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
> >> > On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> >> > > I used the unlikely() macro on the return values of the k.alloc
> >> > > calls and found that it changes the code generation a bit.
> >> > > Optimize all return paths of k.alloc calls by improving
> >> > > branch prediction on return value of k.alloc.
> >>
> >> Nit, this is improving code generation, not branch prediction.
> > Sorry, my mistake.
> >>
> >> > What about below?
> >> >
> >> > "Improve branch prediction on kmalloc() and kzalloc() call by using
> >> > unlikely() macro to optimize their return paths."
> >>
> >> Another nit, using unlikely() doesn't necessarily provide a measurable optimization.
> >> As above, it does often improve code generation for the happy path, but that doesn't
> >> always equate to improved performance, e.g. if the CPU can easily predict the branch
> >> and/or there is no impact on the cache footprint.
> >
> > I see. I will submit a v2 of the patch with a better and more accurate
> > description. Does anyone else have any comments before I do so?
>
> In general I think unlikely should be saved for cases where either the
> compiler is generating terrible code, or the likelihood of the condition
> might be surprising to a human reader.
>
> eg. if you had some code that does a NULL check and it's *expected* that
> the value is NULL, then wrapping that check in likely() actually adds
> information for a human reader.
>
> Also please don't use unlikely in init paths or other cold paths, it
> clutters the code (only slightly but a little) and that's not worth the
> possible tiny benefit for code that only runs once or infrequently.
>
> I would expect the compilers to do the right thing in all
> these cases without the unlikely. But if you can demonstrate that they
> meaningfully improve the code generation with a before/after
> disassembly then I'd be interested.

Just FYI, the last email by kautuk.consul.80@gmail.com was by me.
That last email contains a diff file attachment which compares 2 files:
before my changes and after my changes.
This diff file shows a lot of changes in code generation. I'm assuming
all those changes are made by the compiler towards optimizing all return
paths to k.alloc calls.
Kindly review and comment.

> cheers
On 2023-04-12 12:34:13, Kautuk Consul wrote:
> Hi,
>
> On 2023-04-11 16:35:10, Michael Ellerman wrote:
> > Kautuk Consul <kconsul@linux.vnet.ibm.com> writes:
> > > On 2023-04-07 09:01:29, Sean Christopherson wrote:
> > >> On Fri, Apr 07, 2023, Bagas Sanjaya wrote:
> > >> > On Fri, Apr 07, 2023 at 05:31:47AM -0400, Kautuk Consul wrote:
> > >> > > I used the unlikely() macro on the return values of the k.alloc
> > >> > > calls and found that it changes the code generation a bit.
> > >> > > Optimize all return paths of k.alloc calls by improving
> > >> > > branch prediction on return value of k.alloc.
> > >>
> > >> Nit, this is improving code generation, not branch prediction.
> > > Sorry, my mistake.
> > >>
> > >> > What about below?
> > >> >
> > >> > "Improve branch prediction on kmalloc() and kzalloc() call by using
> > >> > unlikely() macro to optimize their return paths."
> > >>
> > >> Another nit, using unlikely() doesn't necessarily provide a measurable optimization.
> > >> As above, it does often improve code generation for the happy path, but that doesn't
> > >> always equate to improved performance, e.g. if the CPU can easily predict the branch
> > >> and/or there is no impact on the cache footprint.
> > >
> > > I see. I will submit a v2 of the patch with a better and more accurate
> > > description. Does anyone else have any comments before I do so?
> >
> > In general I think unlikely should be saved for cases where either the
> > compiler is generating terrible code, or the likelihood of the condition
> > might be surprising to a human reader.
> >
> > eg. if you had some code that does a NULL check and it's *expected* that
> > the value is NULL, then wrapping that check in likely() actually adds
> > information for a human reader.
> >
> > Also please don't use unlikely in init paths or other cold paths, it
> > clutters the code (only slightly but a little) and that's not worth the
> > possible tiny benefit for code that only runs once or infrequently.
> >
> > I would expect the compilers to do the right thing in all
> > these cases without the unlikely. But if you can demonstrate that they
> > meaningfully improve the code generation with a before/after
> > disassembly then I'd be interested.
>
> Just FYI, the last email by kautuk.consul.80@gmail.com was by me.
> That last email contains a diff file attachment which compares 2 files:
> before my changes and after my changes.
> This diff file shows a lot of changes in code generation. I'm assuming
> all those changes are made by the compiler towards optimizing all return
> paths to k.alloc calls.
> Kindly review and comment.

Any comments on the numerous code generation changes as shown by the
files I attached to this mail chain? Sorry, I don't have concrete
figures of any type to prove that this leads to any measurable
performance improvements. I am just assuming that the compiler's
modified code generation (due to the use of the unlikely macro) would
be optimal.

Thanks.

> > cheers
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 5a64a1341e6f..dbf2dd073e1f 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -446,7 +446,7 @@ long kvmhv_nested_init(void)
 		ptb_order = 12;
 	pseries_partition_tb = kmalloc(sizeof(struct patb_entry) << ptb_order,
 				       GFP_KERNEL);
-	if (!pseries_partition_tb) {
+	if (unlikely(!pseries_partition_tb)) {
 		pr_err("kvm-hv: failed to allocated nested partition table\n");
 		return -ENOMEM;
 	}
@@ -575,7 +575,7 @@ long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu)
 		return H_PARAMETER;
 
 	buf = kzalloc(n, GFP_KERNEL | __GFP_NOWARN);
-	if (!buf)
+	if (unlikely(!buf))
 		return H_NO_MEM;
 
 	gp = kvmhv_get_nested(vcpu->kvm, l1_lpid, false);
@@ -689,7 +689,7 @@ static struct kvm_nested_guest *kvmhv_alloc_nested(struct kvm *kvm, unsigned int
 	long shadow_lpid;
 
 	gp = kzalloc(sizeof(*gp), GFP_KERNEL);
-	if (!gp)
+	if (unlikely(!gp))
 		return NULL;
 	gp->l1_host = kvm;
 	gp->l1_lpid = lpid;
@@ -1633,7 +1633,7 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu *vcpu,
 	/* 4. Insert the pte into our shadow_pgtable */
 	n_rmap = kzalloc(sizeof(*n_rmap), GFP_KERNEL);
-	if (!n_rmap)
+	if (unlikely(!n_rmap))
 		return RESUME_GUEST; /* Let the guest try again */
 	n_rmap->rmap = (n_gpa & RMAP_NESTED_GPA_MASK) |
 		(((unsigned long) gp->l1_lpid) << RMAP_NESTED_LPID_SHIFT);
I used the unlikely() macro on the return values of the k.alloc
calls and found that it changes the code generation a bit.
Optimize all return paths of k.alloc calls by improving
branch prediction on return value of k.alloc.

Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_hv_nested.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)