[v4] exec: Fix non-power-of-2 sized accesses

Message ID	20130816215706.23647.80992.stgit@bling.home
State	New
Headers
Return-Path: <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>
To: qemu-devel@nongnu.org
From: Alex Williamson <alex.williamson@redhat.com>
Date: Fri, 16 Aug 2013 15:58:49 -0600
Message-ID: <20130816215706.23647.80992.stgit@bling.home>
User-Agent: StGit/0.16
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Cc: lersek@redhat.com, qemu-stable@nongnu.org, rth@twiddle.net
Subject: [Qemu-devel] [PATCH v4] exec: Fix non-power-of-2 sized accesses
Precedence: list
Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org

Alex Williamson Aug. 16, 2013, 9:58 p.m. UTC

Since commit 23326164 we align access sizes to match the alignment of
the address, but we don't align the access size itself.  This means we
let illegal access sizes (ex. 3) slip through if the address is
sufficiently aligned (ex. 4).  This results in an abort which would be
easy for a guest to trigger.  Account for aligning the access size.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-stable@nongnu.org
---

v4: KISS
v3: Highest power of 2, not lowest
v2: Remove unnecessary loop condition

 exec.c |   18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)
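
To make the failure mode in the commit message concrete, here is a minimal stand-alone sketch in C. It is not the QEMU function: the MemoryRegion lookup is replaced by a plain max_size parameter, the names align_limit, clamp_old and clamp_v4 are hypothetical, and the use of addr & -addr to derive the alignment bound is an assumption about how the alignment step works.

```c
#include <stdint.h>

/* Hypothetical model of the alignment step: bound the maximum access
 * size by the natural alignment of the address. addr & -addr isolates
 * the lowest set bit of addr (assumed technique, not quoted from QEMU). */
static unsigned align_limit(uint64_t addr, unsigned max_size)
{
    uint64_t align = addr & -addr;

    if (align != 0 && align < max_size) {
        return (unsigned)align;
    }
    return max_size;
}

/* Behavior before the patch: only cap the length at the maximum. */
static unsigned clamp_old(unsigned l, unsigned access_size_max)
{
    return l > access_size_max ? access_size_max : l;
}

/* Behavior of the v4 patch: cap at the maximum, otherwise round the
 * length down to 8, 4, 2 or 1. */
static unsigned clamp_v4(unsigned l, unsigned access_size_max)
{
    if (l >= access_size_max) {
        return access_size_max;
    }
    if (l >= 8) {
        return 8;
    }
    if (l >= 4) {
        return 4;
    }
    if (l >= 2) {
        return 2;
    }
    return 1;
}
```

With an address of 4 and a region maximum of 4, align_limit() leaves the bound at 4. A 3-byte request then passes through clamp_old() unchanged, while clamp_v4() rounds it down to 2.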

Paolo Bonzini Aug. 17, 2013, 6:33 a.m. UTC | #1

On 16/08/2013 23:58, Alex Williamson wrote:
> Since commit 23326164 we align access sizes to match the alignment of
> the address, but we don't align the access size itself.  This means we
> let illegal access sizes (ex. 3) slip through if the address is
> sufficiently aligned (ex. 4).  This results in an abort which would be
> easy for a guest to trigger.  Account for aligning the access size.

Is it the same as this?

http://lists.gnu.org/archive/html/qemu-devel/2013-07/msg05398.html

(which perhaps is buggy as your v1/v2/v3 :))?

Paolo

> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> Cc: qemu-stable@nongnu.org
> ---
> 
> v4: KISS
> v3: Highest power of 2, not lowest
> v2: Remove unnecessary loop condition
> 
>  exec.c |   18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 3ca9381..67a822c 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1924,12 +1924,20 @@ static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
>          }
>      }
>  
> -    /* Don't attempt accesses larger than the maximum.  */
> -    if (l > access_size_max) {
> -        l = access_size_max;
> +    /* Don't attempt accesses larger than the maximum or unsupported sizes.  */
> +    if (l >= access_size_max) {
> +        return access_size_max;
> +    } else {
> +        if (l >= 8) {
> +            return 8;
> +        } else if (l >= 4) {
> +            return 4;
> +        } else if (l >= 2) {
> +            return 2;
> +        } else {
> +            return 1;
> +        }
>      }
> -
> -    return l;
>  }
>  
>  bool address_space_rw(AddressSpace *as, hwaddr addr, uint8_t *buf,
> 
> 
>

Laszlo Ersek Aug. 17, 2013, 8:23 a.m. UTC | #2

On 08/16/13 23:58, Alex Williamson wrote:
> Since commit 23326164 we align access sizes to match the alignment of
> the address, but we don't align the access size itself.  This means we
> let illegal access sizes (ex. 3) slip through if the address is
> sufficiently aligned (ex. 4).  This results in an abort which would be
> easy for a guest to trigger.  Account for aligning the access size.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> Cc: qemu-stable@nongnu.org
> ---
> 
> v4: KISS
> v3: Highest power of 2, not lowest
> v2: Remove unnecessary loop condition
> 
>  exec.c |   18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 3ca9381..67a822c 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1924,12 +1924,20 @@ static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
>          }
>      }
>  
> -    /* Don't attempt accesses larger than the maximum.  */
> -    if (l > access_size_max) {
> -        l = access_size_max;
> +    /* Don't attempt accesses larger than the maximum or unsupported sizes.  */
> +    if (l >= access_size_max) {
> +        return access_size_max;
> +    } else {
> +        if (l >= 8) {
> +            return 8;
> +        } else if (l >= 4) {
> +            return 4;
> +        } else if (l >= 2) {
> +            return 2;
> +        } else {
> +            return 1;
> +        }
>      }
> -
> -    return l;
>  }
>  
>  bool address_space_rw(AddressSpace *as, hwaddr addr, uint8_t *buf,
> 

Considering that each block contains a return statement, I'd drop the
else's:

    if (l >= access_size_max) {
        return access_size_max;
    }
    if (l >= 8) {
        return 8;
    }
    if (l >= 4) {
        return 4;
    }
    if (l >= 2) {
        return 2;
    }
    return 1;

Or even

    return l >= access_size_max ? access_size_max :
           l >= 8               ? 8               :
           l >= 4               ? 4               :
           l >= 2               ? 2               :
           1;

But this is just bikeshedding, so I'm not suggesting it.

Regarding function... I can at least understand this code. So, you want
to find the most significant bit set in "l", and clear everything else.
If said leftmost bit is to the left of bit#3, then use bit#3 instead.

This idea should work if "l" is already a whole power of two.

    if (l >= access_size_max) {
        return access_size_max;
    }
    return 1 << max(3, lmb(l));

What Paolo posted seems almost identical.

clz32(l):                     leading zeros in "l"
qemu_fls(l) == 32 - clz32(l): position of leftmost bit set, 1-based
qemu_fls(l) - 1:              position of leftmost bit set, 0-based
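
The relationship between these three quantities can be sketched as follows. This is an illustration only: fls_sketch and leftmost_bit_index are hypothetical names, not QEMU's qemu_fls(), and the code assumes a 32-bit unsigned int and the GCC/Clang builtin __builtin_clz(), whose result is undefined for 0.

```c
#include <assert.h>

/* Hypothetical stand-in for a find-last-set helper: 1-based position of
 * the leftmost set bit, or 0 when no bit is set. Assumes unsigned int
 * is 32 bits wide. */
static int fls_sketch(unsigned l)
{
    /* __builtin_clz(0) is undefined, so zero is handled separately. */
    return l ? 32 - __builtin_clz(l) : 0;
}

/* 0-based position of the leftmost set bit; the caller must pass l != 0. */
static int leftmost_bit_index(unsigned l)
{
    assert(l != 0);
    return fls_sketch(l) - 1;
}
```

For l == 9 (binary 1001) the leftmost set bit is bit 3, so fls_sketch() yields 4 and leftmost_bit_index() yields 3.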

Not sure if the (l & (l - 1)) check is needed in Paolo's patch. clz32()
is not generally usable when l==0, so maybe that's (too) what the check
is for. OTOH maybe l==0 is not even possible when entering
memory_access_size().

Second, Paolo's patch might lack the "max(3, ...)" part. Since you
didn't call my previous example with l==9 retarded, I guess clamping
(qemu_fls(l) - 1) at 3 would be necessary.

Third, clz32() is probably very fast when gcc has a builtin for it, and
probably slower than your open-coded version otherwise.

I still don't know enough about this topic, but I like this patch
because I can understand the intent at least :)

Reviewed-by: Laszlo Ersek <lersek@redhat.com>

(Bit-counting is a great complement to the Saturday morning espresso :))

Laszlo Ersek Aug. 17, 2013, 9:16 a.m. UTC | #3

(side point)

On 08/17/13 10:23, Laszlo Ersek wrote:

>     if (l >= access_size_max) {
>         return access_size_max;
>     }
>     return 1 << max(3, lmb(l));

lol, of course this should have been min()...

Alex's patch is OK of course.

Laszlo
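
With min() in place of max(), the bit-scan variant can be compared against the open-coded ladder. The sketch below is an assumption-laden illustration: clamp_ladder, clamp_bitscan and clamps_agree are hypothetical names, lmb() is replaced by the GCC/Clang builtin __builtin_clz() on a 32-bit unsigned int, and l == 0 is special-cased because the builtin is undefined for it.

```c
/* Open-coded ladder, as in the v4 patch. */
static unsigned clamp_ladder(unsigned l, unsigned access_size_max)
{
    if (l >= access_size_max) {
        return access_size_max;
    }
    if (l >= 8) {
        return 8;
    }
    if (l >= 4) {
        return 4;
    }
    if (l >= 2) {
        return 2;
    }
    return 1;
}

/* Bit-scan variant: keep only the leftmost set bit, clamped at bit 3. */
static unsigned clamp_bitscan(unsigned l, unsigned access_size_max)
{
    int msb;

    if (l >= access_size_max) {
        return access_size_max;
    }
    if (l == 0) {
        return 1; /* match the ladder; __builtin_clz(0) is undefined */
    }
    msb = 31 - __builtin_clz(l);
    return 1u << (msb < 3 ? msb : 3);
}

/* Returns 1 if both variants agree for every l in [0, limit]. */
static int clamps_agree(unsigned limit, unsigned access_size_max)
{
    unsigned l;

    for (l = 0; l <= limit; l++) {
        if (clamp_ladder(l, access_size_max) !=
            clamp_bitscan(l, access_size_max)) {
            return 0;
        }
    }
    return 1;
}
```

Calling clamps_agree(64, 8), for example, checks every length from 0 to 64 against a maximum of 8.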

Alex Williamson Aug. 17, 2013, 3:14 p.m. UTC | #4

On Sat, 2013-08-17 at 10:23 +0200, Laszlo Ersek wrote:
> On 08/16/13 23:58, Alex Williamson wrote:
> > Since commit 23326164 we align access sizes to match the alignment of
> > the address, but we don't align the access size itself.  This means we
> > let illegal access sizes (ex. 3) slip through if the address is
> > sufficiently aligned (ex. 4).  This results in an abort which would be
> > easy for a guest to trigger.  Account for aligning the access size.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > Cc: qemu-stable@nongnu.org
> > ---
> > 
> > v4: KISS
> > v3: Highest power of 2, not lowest
> > v2: Remove unnecessary loop condition
> > 
> >  exec.c |   18 +++++++++++++-----
> >  1 file changed, 13 insertions(+), 5 deletions(-)
> > 
> > diff --git a/exec.c b/exec.c
> > index 3ca9381..67a822c 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -1924,12 +1924,20 @@ static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
> >          }
> >      }
> >  
> > -    /* Don't attempt accesses larger than the maximum.  */
> > -    if (l > access_size_max) {
> > -        l = access_size_max;
> > +    /* Don't attempt accesses larger than the maximum or unsupported sizes.  */
> > +    if (l >= access_size_max) {
> > +        return access_size_max;
> > +    } else {
> > +        if (l >= 8) {
> > +            return 8;
> > +        } else if (l >= 4) {
> > +            return 4;
> > +        } else if (l >= 2) {
> > +            return 2;
> > +        } else {
> > +            return 1;
> > +        }
> >      }
> > -
> > -    return l;
> >  }
> >  
> >  bool address_space_rw(AddressSpace *as, hwaddr addr, uint8_t *buf,
> > 
> 
> Considering that each block contains a return statement, I'd drop the
> else's:
> 
>     if (l >= access_size_max) {
>         return access_size_max;
>     }
>     if (l >= 8) {
>         return 8;
>     }
>     if (l >= 4) {
>         return 4;
>     }
>     if (l >= 2) {
>         return 2;
>     }
>     return 1;
> 
> Or even
> 
>     return l >= access_size_max ? access_size_max :
>            l >= 8               ? 8               :
>            l >= 4               ? 4               :
>            l >= 2               ? 2               :
>            1;
> 
> But this is just bikeshedding, so I'm not suggesting it.
> 
> Regarding function... I can at least understand this code. So, you want
> to find the most significant bit set in "l", and clear everything else.
> If said leftmost bit is to the left of bit#3, then use bit#3 instead.
> 
> This idea should work if "l" is already a whole power of two.
> 
>     if (l >= access_size_max) {
>         return access_size_max;
>     }
>     return 1 << max(3, lmb(l));
> 
> What Paolo posted seems almost identical.
> 
> clz32(l):                     leading zeros in "l"
> qemu_fls(l) == 32 - clz32(l): position of leftmost bit set, 1-based
> qemu_fls(l) - 1:              position of leftmost bit set, 0-based
> 
> Not sure if the (l & (l - 1)) check is needed in Paolo's patch. clz32()
> is not generally usable when l==0, so maybe that's (too) what the check
> is for. OTOH maybe l==0 is not even possible when entering
> memory_access_size().
> 
> Second, Paolo's patch might lack the "max(3, ...)" part. Since you
> didn't call my previous example with l==9 retarded, I guess clamping
> (qemu_fls(l) - 1) at 3 would be necessary.

Whether we need to clamp on 3 really depends on the caller.  I'm
actually doubtful that this function ever gets called with l > 8.  So I
think Paolo's code works ok.  It's possible your example of l == 9 was a
red herring for my code, but I didn't have enough faith in it anyway.

> Third, clz32() is probably very fast when gcc has a builtin for it, and
> probably slower than your open-coded version otherwise.

Nope, the open coded version in v4 is significantly faster.  See the
attached test programs.  On my laptop I get these results (compiled with
-O):

$ time ./test-open

real	0m7.442s
user	0m7.412s
sys	0m0.005s

$ time ./test-fls

real	0m9.202s
user	0m9.117s
sys	0m0.024s

$ time ./test-pow2floor

real	0m13.884s
user	0m13.796s
sys	0m0.013s


At higher optimization levels the race gets a lot closer, but the open
coded version still seems to have an advantage (assuming the test code
even remains relevant at higher levels).  So, I conclude that it's
faster to open code for the very limited range of a power-of-2 function
we need here.
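
The attached test programs are not reproduced in this thread. As a rough idea of what such a comparison could look like, here is a hedged sketch; it is not the code that produced the timings above. All names (clamp_open, clamp_fls, checksum, time_seconds) are hypothetical, the maximum is fixed at 8, and the bit-scan variant assumes the GCC/Clang builtin __builtin_clz() on a 32-bit unsigned int.

```c
#include <stdint.h>
#include <time.h>

typedef unsigned (*clamp_fn)(unsigned l);

/* Open-coded ladder with the maximum fixed at 8. */
static unsigned clamp_open(unsigned l)
{
    if (l >= 8) {
        return 8;
    }
    if (l >= 4) {
        return 4;
    }
    if (l >= 2) {
        return 2;
    }
    return 1;
}

/* Bit-scan variant with the same results for every input. */
static unsigned clamp_fls(unsigned l)
{
    int msb;

    if (l == 0) {
        return 1; /* __builtin_clz(0) is undefined */
    }
    msb = 31 - __builtin_clz(l);
    return 1u << (msb < 3 ? msb : 3);
}

/* Sum the results over lengths 0..15, repeated; returning the sum keeps
 * the compiler from discarding the loop entirely. */
static uint64_t checksum(clamp_fn fn, unsigned iterations)
{
    uint64_t sum = 0;
    unsigned i;

    for (i = 0; i < iterations; i++) {
        sum += fn(i & 15);
    }
    return sum;
}

/* CPU time in seconds spent in one checksum() run. */
static double time_seconds(clamp_fn fn, unsigned iterations)
{
    volatile uint64_t sink;
    clock_t start = clock();

    sink = checksum(fn, iterations);
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling through a function pointer may or may not prevent inlining, depending on the optimization level, so any numbers from a harness like this say little beyond the machine and compiler flags used to produce them.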

> I still don't know enough about this topic, but I like this patch
> because I can understand the intent at least :)
> 
> Reviewed-by: Laszlo Ersek <lersek@redhat.com>

Thanks!
Alex

Alex Williamson Aug. 17, 2013, 3:19 p.m. UTC | #5

On Sat, 2013-08-17 at 08:33 +0200, Paolo Bonzini wrote:
> On 16/08/2013 23:58, Alex Williamson wrote:
> > Since commit 23326164 we align access sizes to match the alignment of
> > the address, but we don't align the access size itself.  This means we
> > let illegal access sizes (ex. 3) slip through if the address is
> > sufficiently aligned (ex. 4).  This results in an abort which would be
> > easy for a guest to trigger.  Account for aligning the access size.
> 
> Is it the same as this?
> 
> http://lists.gnu.org/archive/html/qemu-devel/2013-07/msg05398.html
> 
> (which perhaps is buggy as your v1/v2/v3 :))?

Too bad this didn't make 1.6.  I suspect your patch is ok because I
don't think we're going to see it called with a length greater than 8.
Maybe I don't even need that test in my version, but it's reassuring to
have it.  As I note in my reply to Laszlo, using generic power-of-2
functions is quite a bit slower than the limited case we need to handle,
so while initially tempted by fancy algorithms, I actually prefer the
version below.  Thanks,

Alex

> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > Cc: qemu-stable@nongnu.org
> > ---
> > 
> > v4: KISS
> > v3: Highest power of 2, not lowest
> > v2: Remove unnecessary loop condition
> > 
> >  exec.c |   18 +++++++++++++-----
> >  1 file changed, 13 insertions(+), 5 deletions(-)
> > 
> > diff --git a/exec.c b/exec.c
> > index 3ca9381..67a822c 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -1924,12 +1924,20 @@ static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
> >          }
> >      }
> >  
> > -    /* Don't attempt accesses larger than the maximum.  */
> > -    if (l > access_size_max) {
> > -        l = access_size_max;
> > +    /* Don't attempt accesses larger than the maximum or unsupported sizes.  */
> > +    if (l >= access_size_max) {
> > +        return access_size_max;
> > +    } else {
> > +        if (l >= 8) {
> > +            return 8;
> > +        } else if (l >= 4) {
> > +            return 4;
> > +        } else if (l >= 2) {
> > +            return 2;
> > +        } else {
> > +            return 1;
> > +        }
> >      }
> > -
> > -    return l;
> >  }
> >  
> >  bool address_space_rw(AddressSpace *as, hwaddr addr, uint8_t *buf,
> > 
> > 
> > 
>

Paolo Bonzini Aug. 17, 2013, 5:58 p.m. UTC | #6

On 17/08/2013 10:23, Laszlo Ersek wrote:
> What Paolo posted seems almost identical.
> 
> clz32(l):                     leading zeros in "l"
> qemu_fls(l) == 32 - clz32(l): position of leftmost bit set, 1-based
> qemu_fls(l) - 1:              position of leftmost bit set, 0-based
> 
> Not sure if the (l & (l - 1)) check is needed in Paolo's patch. clz32()
> is not generally usable when l==0, so maybe that's (too) what the check
> is for. OTOH maybe l==0 is not even possible when entering
> memory_access_size().

The check was an attempt at placating complaints about possible
performance problems. :)

> Second, Paolo's patch might lack the "max(3, ...)" part. Since you
> didn't call my previous example with l==9 retarded, I guess clamping
> (qemu_fls(l) - 1) at 3 would be necessary.

That shouldn't happen, since a uint64_t is all you have for the datum.
access_size_max should never exceed 8.

I don't really care which patch goes in, Alex's is fine as well.

Paolo
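
The (l & (l - 1)) check discussed above can be sketched as a fast path that skips the bit scan when the length is already a power of two. This is reconstructed only from the hints in this thread, and the linked patch may differ. The name clamp_pow2_check is hypothetical, and qemu_fls() is replaced by the GCC/Clang builtin __builtin_clz() on a 32-bit unsigned int.

```c
/* Hypothetical sketch: cap at the maximum, then round down to a power
 * of two only when the length is not one already. */
static unsigned clamp_pow2_check(unsigned l, unsigned access_size_max)
{
    if (l > access_size_max) {
        l = access_size_max;
    }
    /* l & (l - 1) clears the lowest set bit, so it is zero for 0 and
     * for exact powers of two; the scan runs only for sizes such as
     * 3, 5, 6 or 7. */
    if (l & (l - 1)) {
        l = 1u << (31 - __builtin_clz(l));
    }
    return l;
}
```

There is no clamp at bit 3 here, so the result stays at or below 8 only as long as access_size_max does. A length of 0 passes through unchanged.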
