mbox series

[0/7] Mitigation against unsafe data speculation (CVE-2017-5753)

Message ID 1531154299-28349-1-git-send-email-Richard.Earnshaw@arm.com
Headers show
Series Mitigation against unsafe data speculation (CVE-2017-5753) | expand

Message

Richard Earnshaw (lists) July 9, 2018, 4:38 p.m. UTC
The patches I posted earlier this year for mitigating against
CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
which it became obvious that a rethink was needed.  This mail, and the
following patches attempt to address that feedback and present a new
approach to mitigating against this form of attack surface.

There were two major issues with the original approach:

- The speculation bounds were too tightly constrained - essentially
  they had to represent and upper and lower bound on a pointer, or a
  pointer offset.
- The speculation constraints could only cover the immediately preceding
  branch, which often did not fit well with the structure of the existing
  code.

An additional criticism was that the shape of the intrinsic did not
fit particularly well with systems that used a single speculation
barrier that essentially had to wait until all preceding speculation
had to be resolved.

To address all of the above, these patches adopt a new approach, based
in part on a posting by Chandler Carruth to the LLVM developers list
(https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
but which we have extended to deal with inter-function speculation.
The patches divide the problem into two halves.

The first half is some target-specific code to track the speculation
condition through the generated code to provide an internal variable
which can tell us whether or not the CPU's control flow speculation
matches the data flow calculations.  The idea is that the internal
variable starts with the value TRUE and if the CPU's control flow
speculation ever causes a jump to the wrong block of code the variable
becomes false until such time as the incorrect control flow
speculation gets unwound.

The second half is that a new intrinsic function is introduced that is
much simpler than we had before.  The basic version of the intrinsic
is now simply:

      T var = __builtin_speculation_safe_value (T unsafe_var);

Full details of the syntax can be found in the documentation patch, in
patch 1.  In summary, when not speculating the intrinsic returns
unsafe_var; when speculating then if it can be shown that the
speculative flow has diverged from the intended control flow then zero
is returned.  An optional second argument can be used to return an
alternative value to zero.  The builtin may cause execution to pause
until the speculation state is resolved.

There are seven patches in this set, as follows.

1) Introduces the new intrinsic __builtin_sepculation_safe_value.
2) Adds a basic hard barrier implementation for AArch32 (arm) state.
3) Adds a basic hard barrier implementation for AArch64 state.
4) Adds a new command-line option -mtrack-speculation (currently a no-op).
5) Disables CB[N]Z and TB[N]Z when -mtrack-speculation.
6) Adds the new speculation tracking pass for AArch64
7) Uses the new speculation tracking pass to generate CSDB-based barrier
   sequences

I haven't added a speculation-tracking pass for AArch32 at this time.
It is possible to do this, but would require quite a lot of rework for
the arm backend due to the limited number of registers that are
available.

Although patch 6 is AArch64 specific, I'd appreciate a review from
someone more familiar with the branch edge code than myself.  There
appear to be a number of tricky issues with more complex edges so I'd
like a second opinion on that code in case I've missed an important
case.

R.

  

Richard Earnshaw (7):
  Add __builtin_speculation_safe_value
  Arm - add speculation_barrier pattern
  AArch64 - add speculation barrier
  AArch64 - Add new option -mtrack-speculation
  AArch64 - disable CB[N]Z TB[N]Z when tracking speculation
  AArch64 - new pass to add conditional-branch speculation tracking
  AArch64 - use CSDB based sequences if speculation tracking is enabled

 gcc/builtin-types.def                     |   6 +
 gcc/builtins.c                            |  57 ++++
 gcc/builtins.def                          |  20 ++
 gcc/c-family/c-common.c                   | 143 +++++++++
 gcc/c-family/c-cppbuiltin.c               |   5 +-
 gcc/config.gcc                            |   2 +-
 gcc/config/aarch64/aarch64-passes.def     |   1 +
 gcc/config/aarch64/aarch64-protos.h       |   3 +-
 gcc/config/aarch64/aarch64-speculation.cc | 494 ++++++++++++++++++++++++++++++
 gcc/config/aarch64/aarch64.c              |  88 +++++-
 gcc/config/aarch64/aarch64.md             | 140 ++++++++-
 gcc/config/aarch64/aarch64.opt            |   4 +
 gcc/config/aarch64/iterators.md           |   3 +
 gcc/config/aarch64/t-aarch64              |  10 +
 gcc/config/arm/arm.md                     |  21 ++
 gcc/config/arm/unspecs.md                 |   1 +
 gcc/doc/cpp.texi                          |   4 +
 gcc/doc/extend.texi                       |  29 ++
 gcc/doc/invoke.texi                       |  10 +-
 gcc/doc/md.texi                           |  15 +
 gcc/doc/tm.texi                           |  20 ++
 gcc/doc/tm.texi.in                        |   2 +
 gcc/target.def                            |  23 ++
 gcc/targhooks.c                           |  27 ++
 gcc/targhooks.h                           |   2 +
 gcc/testsuite/gcc.dg/spec-barrier-1.c     |  40 +++
 gcc/testsuite/gcc.dg/spec-barrier-2.c     |  19 ++
 gcc/testsuite/gcc.dg/spec-barrier-3.c     |  13 +
 28 files changed, 1191 insertions(+), 11 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-speculation.cc
 create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-1.c
 create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-2.c
 create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-3.c

Comments

Jeff Law July 9, 2018, 11:13 p.m. UTC | #1
On 07/09/2018 10:38 AM, Richard Earnshaw wrote:
> 
> The patches I posted earlier this year for mitigating against
> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
> which it became obvious that a rethink was needed.  This mail, and the
> following patches attempt to address that feedback and present a new
> approach to mitigating against this form of attack surface.
> 
> There were two major issues with the original approach:
> 
> - The speculation bounds were too tightly constrained - essentially
>   they had to represent and upper and lower bound on a pointer, or a
>   pointer offset.
> - The speculation constraints could only cover the immediately preceding
>   branch, which often did not fit well with the structure of the existing
>   code.
> 
> An additional criticism was that the shape of the intrinsic did not
> fit particularly well with systems that used a single speculation
> barrier that essentially had to wait until all preceding speculation
> had to be resolved.
Right.  I suggest the Intel and IBM reps chime in on the updated semantics.

> 
> To address all of the above, these patches adopt a new approach, based
> in part on a posting by Chandler Carruth to the LLVM developers list
> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
> but which we have extended to deal with inter-function speculation.
> The patches divide the problem into two halves.
We're essentially turning the control dependency into a value that we
can then use to munge the pointer or the resultant data.

> 
> The first half is some target-specific code to track the speculation
> condition through the generated code to provide an internal variable
> which can tell us whether or not the CPU's control flow speculation
> matches the data flow calculations.  The idea is that the internal
> variable starts with the value TRUE and if the CPU's control flow
> speculation ever causes a jump to the wrong block of code the variable
> becomes false until such time as the incorrect control flow
> speculation gets unwound.
Right.

So one of the things that comes immediately to mind is you have to run
this early enough that you can still get to all the control flow and
build your predicates.  Otherwise you have do undo stuff like
conditional move generation.

On the flip side, the earlier you do this mitigation, the more you have
to worry about what the optimizers are going to do to the code later in
the pipeline.  It's almost guaranteed a naive implementation is going to
muck this up since we can propagate the state of the condition into the
arms which will make the predicate state a compile time constant.

In fact this seems to be running into the area of pointer providence and
some discussions we had around atomic a few years back.

I also wonder if this could be combined with taint analysis to produce a
much lower overhead solution in cases were developers have done analysis
and know what objects are potentially under attacker control.  So
instead of analyzing everything, we can have a much narrower focus.

The pointer munging could well run afoul of alias analysis engines that
don't expect to be seeing those kind of operations.

Anyway, just some initial high level thoughts.  I'm sure there'll be
more as I read the implementation.


Jeff
Richard Biener July 10, 2018, 7:19 a.m. UTC | #2
On Mon, Jul 9, 2018 at 6:39 PM Richard Earnshaw
<Richard.Earnshaw@arm.com> wrote:
>
>
> The patches I posted earlier this year for mitigating against
> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
> which it became obvious that a rethink was needed.  This mail, and the
> following patches attempt to address that feedback and present a new
> approach to mitigating against this form of attack surface.
>
> There were two major issues with the original approach:
>
> - The speculation bounds were too tightly constrained - essentially
>   they had to represent and upper and lower bound on a pointer, or a
>   pointer offset.
> - The speculation constraints could only cover the immediately preceding
>   branch, which often did not fit well with the structure of the existing
>   code.
>
> An additional criticism was that the shape of the intrinsic did not
> fit particularly well with systems that used a single speculation
> barrier that essentially had to wait until all preceding speculation
> had to be resolved.
>
> To address all of the above, these patches adopt a new approach, based
> in part on a posting by Chandler Carruth to the LLVM developers list
> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
> but which we have extended to deal with inter-function speculation.
> The patches divide the problem into two halves.
>
> The first half is some target-specific code to track the speculation
> condition through the generated code to provide an internal variable
> which can tell us whether or not the CPU's control flow speculation
> matches the data flow calculations.  The idea is that the internal
> variable starts with the value TRUE and if the CPU's control flow
> speculation ever causes a jump to the wrong block of code the variable
> becomes false until such time as the incorrect control flow
> speculation gets unwound.
>
> The second half is that a new intrinsic function is introduced that is
> much simpler than we had before.  The basic version of the intrinsic
> is now simply:
>
>       T var = __builtin_speculation_safe_value (T unsafe_var);
>
> Full details of the syntax can be found in the documentation patch, in
> patch 1.  In summary, when not speculating the intrinsic returns
> unsafe_var; when speculating then if it can be shown that the
> speculative flow has diverged from the intended control flow then zero
> is returned.  An optional second argument can be used to return an
> alternative value to zero.  The builtin may cause execution to pause
> until the speculation state is resolved.

So a trivial target implementation would be to emit a barrier and then
it would always return unsafe_var and never zero.  What I don't understand
fully is what users should do here, thus what the value of ever returning
"unsafe" is.  Also I wonder why the API is forcing you to single-out a
special value instead of doing

 bool safe = __builtin_speculation_safe_value_p (T unsafe_value);
 if (!safe)
   /* what now? */

I'm only guessing that the correct way to handle "unsafe" is basically

 while (__builtin_speculation_safe_value (val) == 0)
    ;

use val, it's now safe

that is, the return value is only interesting in sofar as to whether it is equal
to val or the special value?

That said, I wonder why we don't hide that details from the user and
provide a predicate instead.

Richard.

> There are seven patches in this set, as follows.
>
> 1) Introduces the new intrinsic __builtin_sepculation_safe_value.
> 2) Adds a basic hard barrier implementation for AArch32 (arm) state.
> 3) Adds a basic hard barrier implementation for AArch64 state.
> 4) Adds a new command-line option -mtrack-speculation (currently a no-op).
> 5) Disables CB[N]Z and TB[N]Z when -mtrack-speculation.
> 6) Adds the new speculation tracking pass for AArch64
> 7) Uses the new speculation tracking pass to generate CSDB-based barrier
>    sequences
>
> I haven't added a speculation-tracking pass for AArch32 at this time.
> It is possible to do this, but would require quite a lot of rework for
> the arm backend due to the limited number of registers that are
> available.
>
> Although patch 6 is AArch64 specific, I'd appreciate a review from
> someone more familiar with the branch edge code than myself.  There
> appear to be a number of tricky issues with more complex edges so I'd
> like a second opinion on that code in case I've missed an important
> case.
>
> R.
>
>
>
> Richard Earnshaw (7):
>   Add __builtin_speculation_safe_value
>   Arm - add speculation_barrier pattern
>   AArch64 - add speculation barrier
>   AArch64 - Add new option -mtrack-speculation
>   AArch64 - disable CB[N]Z TB[N]Z when tracking speculation
>   AArch64 - new pass to add conditional-branch speculation tracking
>   AArch64 - use CSDB based sequences if speculation tracking is enabled
>
>  gcc/builtin-types.def                     |   6 +
>  gcc/builtins.c                            |  57 ++++
>  gcc/builtins.def                          |  20 ++
>  gcc/c-family/c-common.c                   | 143 +++++++++
>  gcc/c-family/c-cppbuiltin.c               |   5 +-
>  gcc/config.gcc                            |   2 +-
>  gcc/config/aarch64/aarch64-passes.def     |   1 +
>  gcc/config/aarch64/aarch64-protos.h       |   3 +-
>  gcc/config/aarch64/aarch64-speculation.cc | 494 ++++++++++++++++++++++++++++++
>  gcc/config/aarch64/aarch64.c              |  88 +++++-
>  gcc/config/aarch64/aarch64.md             | 140 ++++++++-
>  gcc/config/aarch64/aarch64.opt            |   4 +
>  gcc/config/aarch64/iterators.md           |   3 +
>  gcc/config/aarch64/t-aarch64              |  10 +
>  gcc/config/arm/arm.md                     |  21 ++
>  gcc/config/arm/unspecs.md                 |   1 +
>  gcc/doc/cpp.texi                          |   4 +
>  gcc/doc/extend.texi                       |  29 ++
>  gcc/doc/invoke.texi                       |  10 +-
>  gcc/doc/md.texi                           |  15 +
>  gcc/doc/tm.texi                           |  20 ++
>  gcc/doc/tm.texi.in                        |   2 +
>  gcc/target.def                            |  23 ++
>  gcc/targhooks.c                           |  27 ++
>  gcc/targhooks.h                           |   2 +
>  gcc/testsuite/gcc.dg/spec-barrier-1.c     |  40 +++
>  gcc/testsuite/gcc.dg/spec-barrier-2.c     |  19 ++
>  gcc/testsuite/gcc.dg/spec-barrier-3.c     |  13 +
>  28 files changed, 1191 insertions(+), 11 deletions(-)
>  create mode 100644 gcc/config/aarch64/aarch64-speculation.cc
>  create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-3.c
Richard Earnshaw (lists) July 10, 2018, 8:39 a.m. UTC | #3
On 10/07/18 08:19, Richard Biener wrote:
> On Mon, Jul 9, 2018 at 6:39 PM Richard Earnshaw
> <Richard.Earnshaw@arm.com> wrote:
>>
>>
>> The patches I posted earlier this year for mitigating against
>> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
>> which it became obvious that a rethink was needed.  This mail, and the
>> following patches attempt to address that feedback and present a new
>> approach to mitigating against this form of attack surface.
>>
>> There were two major issues with the original approach:
>>
>> - The speculation bounds were too tightly constrained - essentially
>>   they had to represent and upper and lower bound on a pointer, or a
>>   pointer offset.
>> - The speculation constraints could only cover the immediately preceding
>>   branch, which often did not fit well with the structure of the existing
>>   code.
>>
>> An additional criticism was that the shape of the intrinsic did not
>> fit particularly well with systems that used a single speculation
>> barrier that essentially had to wait until all preceding speculation
>> had to be resolved.
>>
>> To address all of the above, these patches adopt a new approach, based
>> in part on a posting by Chandler Carruth to the LLVM developers list
>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
>> but which we have extended to deal with inter-function speculation.
>> The patches divide the problem into two halves.
>>
>> The first half is some target-specific code to track the speculation
>> condition through the generated code to provide an internal variable
>> which can tell us whether or not the CPU's control flow speculation
>> matches the data flow calculations.  The idea is that the internal
>> variable starts with the value TRUE and if the CPU's control flow
>> speculation ever causes a jump to the wrong block of code the variable
>> becomes false until such time as the incorrect control flow
>> speculation gets unwound.
>>
>> The second half is that a new intrinsic function is introduced that is
>> much simpler than we had before.  The basic version of the intrinsic
>> is now simply:
>>
>>       T var = __builtin_speculation_safe_value (T unsafe_var);
>>
>> Full details of the syntax can be found in the documentation patch, in
>> patch 1.  In summary, when not speculating the intrinsic returns
>> unsafe_var; when speculating then if it can be shown that the
>> speculative flow has diverged from the intended control flow then zero
>> is returned.  An optional second argument can be used to return an
>> alternative value to zero.  The builtin may cause execution to pause
>> until the speculation state is resolved.
> 
> So a trivial target implementation would be to emit a barrier and then
> it would always return unsafe_var and never zero.  What I don't understand
> fully is what users should do here, thus what the value of ever returning
> "unsafe" is.  Also I wonder why the API is forcing you to single-out a
> special value instead of doing
> 
>  bool safe = __builtin_speculation_safe_value_p (T unsafe_value);
>  if (!safe)
>    /* what now? */
> 
> I'm only guessing that the correct way to handle "unsafe" is basically
> 
>  while (__builtin_speculation_safe_value (val) == 0)
>     ;
> 
> use val, it's now safe

No, a safe version of val is returned, not a bool telling you it is now
safe to use the original.  You must use the sanitized version in future,
not the unprotected version.


So the usage is going to be more like:

val = __builtin_speculation_safe_value (val);  // Overwrite val with a
sanitized version.

You have to use the cleaned up version, the unclean version is still
vulnerable to incorrect speculation.

R.

> 
> that is, the return value is only interesting in sofar as to whether it is equal
> to val or the special value?
> 
> That said, I wonder why we don't hide that details from the user and
> provide a predicate instead.
> 
> Richard.
> 
>> There are seven patches in this set, as follows.
>>
>> 1) Introduces the new intrinsic __builtin_sepculation_safe_value.
>> 2) Adds a basic hard barrier implementation for AArch32 (arm) state.
>> 3) Adds a basic hard barrier implementation for AArch64 state.
>> 4) Adds a new command-line option -mtrack-speculation (currently a no-op).
>> 5) Disables CB[N]Z and TB[N]Z when -mtrack-speculation.
>> 6) Adds the new speculation tracking pass for AArch64
>> 7) Uses the new speculation tracking pass to generate CSDB-based barrier
>>    sequences
>>
>> I haven't added a speculation-tracking pass for AArch32 at this time.
>> It is possible to do this, but would require quite a lot of rework for
>> the arm backend due to the limited number of registers that are
>> available.
>>
>> Although patch 6 is AArch64 specific, I'd appreciate a review from
>> someone more familiar with the branch edge code than myself.  There
>> appear to be a number of tricky issues with more complex edges so I'd
>> like a second opinion on that code in case I've missed an important
>> case.
>>
>> R.
>>
>>
>>
>> Richard Earnshaw (7):
>>   Add __builtin_speculation_safe_value
>>   Arm - add speculation_barrier pattern
>>   AArch64 - add speculation barrier
>>   AArch64 - Add new option -mtrack-speculation
>>   AArch64 - disable CB[N]Z TB[N]Z when tracking speculation
>>   AArch64 - new pass to add conditional-branch speculation tracking
>>   AArch64 - use CSDB based sequences if speculation tracking is enabled
>>
>>  gcc/builtin-types.def                     |   6 +
>>  gcc/builtins.c                            |  57 ++++
>>  gcc/builtins.def                          |  20 ++
>>  gcc/c-family/c-common.c                   | 143 +++++++++
>>  gcc/c-family/c-cppbuiltin.c               |   5 +-
>>  gcc/config.gcc                            |   2 +-
>>  gcc/config/aarch64/aarch64-passes.def     |   1 +
>>  gcc/config/aarch64/aarch64-protos.h       |   3 +-
>>  gcc/config/aarch64/aarch64-speculation.cc | 494 ++++++++++++++++++++++++++++++
>>  gcc/config/aarch64/aarch64.c              |  88 +++++-
>>  gcc/config/aarch64/aarch64.md             | 140 ++++++++-
>>  gcc/config/aarch64/aarch64.opt            |   4 +
>>  gcc/config/aarch64/iterators.md           |   3 +
>>  gcc/config/aarch64/t-aarch64              |  10 +
>>  gcc/config/arm/arm.md                     |  21 ++
>>  gcc/config/arm/unspecs.md                 |   1 +
>>  gcc/doc/cpp.texi                          |   4 +
>>  gcc/doc/extend.texi                       |  29 ++
>>  gcc/doc/invoke.texi                       |  10 +-
>>  gcc/doc/md.texi                           |  15 +
>>  gcc/doc/tm.texi                           |  20 ++
>>  gcc/doc/tm.texi.in                        |   2 +
>>  gcc/target.def                            |  23 ++
>>  gcc/targhooks.c                           |  27 ++
>>  gcc/targhooks.h                           |   2 +
>>  gcc/testsuite/gcc.dg/spec-barrier-1.c     |  40 +++
>>  gcc/testsuite/gcc.dg/spec-barrier-2.c     |  19 ++
>>  gcc/testsuite/gcc.dg/spec-barrier-3.c     |  13 +
>>  28 files changed, 1191 insertions(+), 11 deletions(-)
>>  create mode 100644 gcc/config/aarch64/aarch64-speculation.cc
>>  create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-1.c
>>  create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-2.c
>>  create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-3.c
Richard Earnshaw (lists) July 10, 2018, 8:49 a.m. UTC | #4
On 10/07/18 00:13, Jeff Law wrote:
> On 07/09/2018 10:38 AM, Richard Earnshaw wrote:
>>
>> The patches I posted earlier this year for mitigating against
>> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
>> which it became obvious that a rethink was needed.  This mail, and the
>> following patches attempt to address that feedback and present a new
>> approach to mitigating against this form of attack surface.
>>
>> There were two major issues with the original approach:
>>
>> - The speculation bounds were too tightly constrained - essentially
>>   they had to represent and upper and lower bound on a pointer, or a
>>   pointer offset.
>> - The speculation constraints could only cover the immediately preceding
>>   branch, which often did not fit well with the structure of the existing
>>   code.
>>
>> An additional criticism was that the shape of the intrinsic did not
>> fit particularly well with systems that used a single speculation
>> barrier that essentially had to wait until all preceding speculation
>> had to be resolved.
> Right.  I suggest the Intel and IBM reps chime in on the updated semantics.
> 

Yes, logically, this is a boolean tracker value.  In practice we use ~0
for true and 0 for false, so that we can simply use it as a mask
operation later.

I hope this intrinsic will be even more acceptable than the one that
Bill Schmidt acked previously, it's even simpler than the version we had
last time.

>>
>> To address all of the above, these patches adopt a new approach, based
>> in part on a posting by Chandler Carruth to the LLVM developers list
>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
>> but which we have extended to deal with inter-function speculation.
>> The patches divide the problem into two halves.
> We're essentially turning the control dependency into a value that we
> can then use to munge the pointer or the resultant data.
> 
>>
>> The first half is some target-specific code to track the speculation
>> condition through the generated code to provide an internal variable
>> which can tell us whether or not the CPU's control flow speculation
>> matches the data flow calculations.  The idea is that the internal
>> variable starts with the value TRUE and if the CPU's control flow
>> speculation ever causes a jump to the wrong block of code the variable
>> becomes false until such time as the incorrect control flow
>> speculation gets unwound.
> Right.
> 
> So one of the things that comes immediately to mind is you have to run
> this early enough that you can still get to all the control flow and
> build your predicates.  Otherwise you have do undo stuff like
> conditional move generation.

No, the opposite, in fact.  We want to run this very late, at least on
Arm systems (AArch64 or AArch32).  Conditional move instructions are
fine - they're data-flow operations, not control flow (in fact, that's
exactly what the control flow tracker instructions are).  By running it
late we avoid disrupting any of the earlier optimization passes as well.

> 
> On the flip side, the earlier you do this mitigation, the more you have
> to worry about what the optimizers are going to do to the code later in
> the pipeline.  It's almost guaranteed a naive implementation is going to
> muck this up since we can propagate the state of the condition into the
> arms which will make the predicate state a compile time constant.
> 
> In fact this seems to be running into the area of pointer providence and
> some discussions we had around atomic a few years back.
> 
> I also wonder if this could be combined with taint analysis to produce a
> much lower overhead solution in cases were developers have done analysis
> and know what objects are potentially under attacker control.  So
> instead of analyzing everything, we can have a much narrower focus.

Automatic application of the tracker to vulnerable variables would be
nice, but I haven't attempted to go there yet: at present I still rely
on the user to annotate code with the new intrinsic.

That doesn't mean that we couldn't extend the overall approach later to
include automatic tracking.

> 
> The pointer munging could well run afoul of alias analysis engines that
> don't expect to be seeing those kind of operations.

I think the pass runs late enough that it isn't a problem.

> 
> Anyway, just some initial high level thoughts.  I'm sure there'll be
> more as I read the implementation.
> 

Thanks for starting to look at this so quickly.

R.

> 
> Jeff
>
Richard Biener July 10, 2018, 10:10 a.m. UTC | #5
On Tue, Jul 10, 2018 at 10:39 AM Richard Earnshaw (lists)
<Richard.Earnshaw@arm.com> wrote:
>
> On 10/07/18 08:19, Richard Biener wrote:
> > On Mon, Jul 9, 2018 at 6:39 PM Richard Earnshaw
> > <Richard.Earnshaw@arm.com> wrote:
> >>
> >>
> >> The patches I posted earlier this year for mitigating against
> >> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
> >> which it became obvious that a rethink was needed.  This mail, and the
> >> following patches attempt to address that feedback and present a new
> >> approach to mitigating against this form of attack surface.
> >>
> >> There were two major issues with the original approach:
> >>
> >> - The speculation bounds were too tightly constrained - essentially
> >>   they had to represent and upper and lower bound on a pointer, or a
> >>   pointer offset.
> >> - The speculation constraints could only cover the immediately preceding
> >>   branch, which often did not fit well with the structure of the existing
> >>   code.
> >>
> >> An additional criticism was that the shape of the intrinsic did not
> >> fit particularly well with systems that used a single speculation
> >> barrier that essentially had to wait until all preceding speculation
> >> had to be resolved.
> >>
> >> To address all of the above, these patches adopt a new approach, based
> >> in part on a posting by Chandler Carruth to the LLVM developers list
> >> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
> >> but which we have extended to deal with inter-function speculation.
> >> The patches divide the problem into two halves.
> >>
> >> The first half is some target-specific code to track the speculation
> >> condition through the generated code to provide an internal variable
> >> which can tell us whether or not the CPU's control flow speculation
> >> matches the data flow calculations.  The idea is that the internal
> >> variable starts with the value TRUE and if the CPU's control flow
> >> speculation ever causes a jump to the wrong block of code the variable
> >> becomes false until such time as the incorrect control flow
> >> speculation gets unwound.
> >>
> >> The second half is that a new intrinsic function is introduced that is
> >> much simpler than we had before.  The basic version of the intrinsic
> >> is now simply:
> >>
> >>       T var = __builtin_speculation_safe_value (T unsafe_var);
> >>
> >> Full details of the syntax can be found in the documentation patch, in
> >> patch 1.  In summary, when not speculating the intrinsic returns
> >> unsafe_var; when speculating then if it can be shown that the
> >> speculative flow has diverged from the intended control flow then zero
> >> is returned.  An optional second argument can be used to return an
> >> alternative value to zero.  The builtin may cause execution to pause
> >> until the speculation state is resolved.
> >
> > So a trivial target implementation would be to emit a barrier and then
> > it would always return unsafe_var and never zero.  What I don't understand
> > fully is what users should do here, thus what the value of ever returning
> > "unsafe" is.  Also I wonder why the API is forcing you to single-out a
> > special value instead of doing
> >
> >  bool safe = __builtin_speculation_safe_value_p (T unsafe_value);
> >  if (!safe)
> >    /* what now? */
> >
> > I'm only guessing that the correct way to handle "unsafe" is basically
> >
> >  while (__builtin_speculation_safe_value (val) == 0)
> >     ;
> >
> > use val, it's now safe
>
> No, a safe version of val is returned, not a bool telling you it is now
> safe to use the original.

OK, so making the old value dead is required to preserve the desired
dataflow.

But how should I use the special value that signaled "failure"?

Obviously the user isn't supposed to simply replace 'val' with

 val = __builtin_speculation_safe_value (val);

to make it speculation-proof.  So - how should the user _use_ this
builtin?  The docs do not say anything about this but says the
very confusing

+The function may use target-dependent speculation tracking state to cause
+@var{failval} to be returned when it is known that speculative
+execution has incorrectly predicted a conditional branch operation.

because speculation is about executing instructions as if they were
supposed to be executed.  Once it is known the prediciton was wrong
no more "wrong" instructions will be executed but a previously
speculated instruction cannot know it was "falsely" speculated.

Does the above try to say that the function may return failval if the
instruction is currently executed speculatively instead?  That would
make sense to me.  And return failval independent of if the speculation
later turns out to be correct or not.

>  You must use the sanitized version in future,
> not the unprotected version.
>
>
> So the usage is going to be more like:
>
> val = __builtin_speculation_safe_value (val);  // Overwrite val with a
> sanitized version.
>
> You have to use the cleaned up version, the unclean version is still
> vulnerable to incorrect speculation.

But then where does failval come into play?  The above cannot be correct
if failval matters.

Does the builtin try to say that iff the current instruction is speculated
then it will return failval, otherwise it will return val and thus following
code needs to handle 'failval' in a way that is safe?

Again, the wording "when it is known that speculative execution has
incorrectly predicted a conditional branch operation" is very confusing...

Richard.

> R.
>
> >
> > that is, the return value is only interesting in sofar as to whether it is equal
> > to val or the special value?
> >
> > That said, I wonder why we don't hide that details from the user and
> > provide a predicate instead.
> >
> > Richard.
> >
> >> There are seven patches in this set, as follows.
> >>
> >> 1) Introduces the new intrinsic __builtin_sepculation_safe_value.
> >> 2) Adds a basic hard barrier implementation for AArch32 (arm) state.
> >> 3) Adds a basic hard barrier implementation for AArch64 state.
> >> 4) Adds a new command-line option -mtrack-speculation (currently a no-op).
> >> 5) Disables CB[N]Z and TB[N]Z when -mtrack-speculation.
> >> 6) Adds the new speculation tracking pass for AArch64
> >> 7) Uses the new speculation tracking pass to generate CSDB-based barrier
> >>    sequences
> >>
> >> I haven't added a speculation-tracking pass for AArch32 at this time.
> >> It is possible to do this, but would require quite a lot of rework for
> >> the arm backend due to the limited number of registers that are
> >> available.
> >>
> >> Although patch 6 is AArch64 specific, I'd appreciate a review from
> >> someone more familiar with the branch edge code than myself.  There
> >> appear to be a number of tricky issues with more complex edges so I'd
> >> like a second opinion on that code in case I've missed an important
> >> case.
> >>
> >> R.
> >>
> >>
> >>
> >> Richard Earnshaw (7):
> >>   Add __builtin_speculation_safe_value
> >>   Arm - add speculation_barrier pattern
> >>   AArch64 - add speculation barrier
> >>   AArch64 - Add new option -mtrack-speculation
> >>   AArch64 - disable CB[N]Z TB[N]Z when tracking speculation
> >>   AArch64 - new pass to add conditional-branch speculation tracking
> >>   AArch64 - use CSDB based sequences if speculation tracking is enabled
> >>
> >>  gcc/builtin-types.def                     |   6 +
> >>  gcc/builtins.c                            |  57 ++++
> >>  gcc/builtins.def                          |  20 ++
> >>  gcc/c-family/c-common.c                   | 143 +++++++++
> >>  gcc/c-family/c-cppbuiltin.c               |   5 +-
> >>  gcc/config.gcc                            |   2 +-
> >>  gcc/config/aarch64/aarch64-passes.def     |   1 +
> >>  gcc/config/aarch64/aarch64-protos.h       |   3 +-
> >>  gcc/config/aarch64/aarch64-speculation.cc | 494 ++++++++++++++++++++++++++++++
> >>  gcc/config/aarch64/aarch64.c              |  88 +++++-
> >>  gcc/config/aarch64/aarch64.md             | 140 ++++++++-
> >>  gcc/config/aarch64/aarch64.opt            |   4 +
> >>  gcc/config/aarch64/iterators.md           |   3 +
> >>  gcc/config/aarch64/t-aarch64              |  10 +
> >>  gcc/config/arm/arm.md                     |  21 ++
> >>  gcc/config/arm/unspecs.md                 |   1 +
> >>  gcc/doc/cpp.texi                          |   4 +
> >>  gcc/doc/extend.texi                       |  29 ++
> >>  gcc/doc/invoke.texi                       |  10 +-
> >>  gcc/doc/md.texi                           |  15 +
> >>  gcc/doc/tm.texi                           |  20 ++
> >>  gcc/doc/tm.texi.in                        |   2 +
> >>  gcc/target.def                            |  23 ++
> >>  gcc/targhooks.c                           |  27 ++
> >>  gcc/targhooks.h                           |   2 +
> >>  gcc/testsuite/gcc.dg/spec-barrier-1.c     |  40 +++
> >>  gcc/testsuite/gcc.dg/spec-barrier-2.c     |  19 ++
> >>  gcc/testsuite/gcc.dg/spec-barrier-3.c     |  13 +
> >>  28 files changed, 1191 insertions(+), 11 deletions(-)
> >>  create mode 100644 gcc/config/aarch64/aarch64-speculation.cc
> >>  create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-1.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-2.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-3.c
>
Richard Earnshaw (lists) July 10, 2018, 10:53 a.m. UTC | #6
On 10/07/18 11:10, Richard Biener wrote:
> On Tue, Jul 10, 2018 at 10:39 AM Richard Earnshaw (lists)
> <Richard.Earnshaw@arm.com> wrote:
>>
>> On 10/07/18 08:19, Richard Biener wrote:
>>> On Mon, Jul 9, 2018 at 6:39 PM Richard Earnshaw
>>> <Richard.Earnshaw@arm.com> wrote:
>>>>
>>>>
>>>> The patches I posted earlier this year for mitigating against
>>>> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
>>>> which it became obvious that a rethink was needed.  This mail, and the
>>>> following patches attempt to address that feedback and present a new
>>>> approach to mitigating against this form of attack surface.
>>>>
>>>> There were two major issues with the original approach:
>>>>
>>>> - The speculation bounds were too tightly constrained - essentially
>>>>   they had to represent and upper and lower bound on a pointer, or a
>>>>   pointer offset.
>>>> - The speculation constraints could only cover the immediately preceding
>>>>   branch, which often did not fit well with the structure of the existing
>>>>   code.
>>>>
>>>> An additional criticism was that the shape of the intrinsic did not
>>>> fit particularly well with systems that used a single speculation
>>>> barrier that essentially had to wait until all preceding speculation
>>>> had to be resolved.
>>>>
>>>> To address all of the above, these patches adopt a new approach, based
>>>> in part on a posting by Chandler Carruth to the LLVM developers list
>>>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
>>>> but which we have extended to deal with inter-function speculation.
>>>> The patches divide the problem into two halves.
>>>>
>>>> The first half is some target-specific code to track the speculation
>>>> condition through the generated code to provide an internal variable
>>>> which can tell us whether or not the CPU's control flow speculation
>>>> matches the data flow calculations.  The idea is that the internal
>>>> variable starts with the value TRUE and if the CPU's control flow
>>>> speculation ever causes a jump to the wrong block of code the variable
>>>> becomes false until such time as the incorrect control flow
>>>> speculation gets unwound.
>>>>
>>>> The second half is that a new intrinsic function is introduced that is
>>>> much simpler than we had before.  The basic version of the intrinsic
>>>> is now simply:
>>>>
>>>>       T var = __builtin_speculation_safe_value (T unsafe_var);
>>>>
>>>> Full details of the syntax can be found in the documentation patch, in
>>>> patch 1.  In summary, when not speculating the intrinsic returns
>>>> unsafe_var; when speculating then if it can be shown that the
>>>> speculative flow has diverged from the intended control flow then zero
>>>> is returned.  An optional second argument can be used to return an
>>>> alternative value to zero.  The builtin may cause execution to pause
>>>> until the speculation state is resolved.
>>>
>>> So a trivial target implementation would be to emit a barrier and then
>>> it would always return unsafe_var and never zero.  What I don't understand
>>> fully is what users should do here, thus what the value of ever returning
>>> "unsafe" is.  Also I wonder why the API is forcing you to single-out a
>>> special value instead of doing
>>>
>>>  bool safe = __builtin_speculation_safe_value_p (T unsafe_value);
>>>  if (!safe)
>>>    /* what now? */
>>>
>>> I'm only guessing that the correct way to handle "unsafe" is basically
>>>
>>>  while (__builtin_speculation_safe_value (val) == 0)
>>>     ;
>>>
>>> use val, it's now safe
>>
>> No, a safe version of val is returned, not a bool telling you it is now
>> safe to use the original.
> 
> OK, so making the old value dead is required to preserve the desired
> dataflow.
> 
> But how should I use the special value that signaled "failure"?
> 
> Obviously the user isn't supposed to simply replace 'val' with
> 
>  val = __builtin_speculation_safe_value (val);
> 
> to make it speculation-proof.  So - how should the user _use_ this
> builtin?  The docs do not say anything about this but says the
> very confusing
> 
> +The function may use target-dependent speculation tracking state to cause
> +@var{failval} to be returned when it is known that speculative
> +execution has incorrectly predicted a conditional branch operation.
> 
> because speculation is about executing instructions as if they were
> supposed to be executed.  Once it is known the prediciton was wrong
> no more "wrong" instructions will be executed but a previously
> speculated instruction cannot know it was "falsely" speculated.
> 
> Does the above try to say that the function may return failval if the
> instruction is currently executed speculatively instead?  That would
> make sense to me.  And return failval independent of if the speculation
> later turns out to be correct or not.
> 
>>  You must use the sanitized version in future,
>> not the unprotected version.
>>
>>
>> So the usage is going to be more like:
>>
>> val = __builtin_speculation_safe_value (val);  // Overwrite val with a
>> sanitized version.
>>
>> You have to use the cleaned up version, the unclean version is still
>> vulnerable to incorrect speculation.
> 
> But then where does failval come into play?  The above cannot be correct
> if failval matters.

failval only comes into play if the CPU is speculating incorrectly.
It's just a safe value that some CPUs might use until such time as the
speculation gets unwound.  In most cases it can be zero.  Perhaps a more
concrete example would be something like.


void **mem;

void* f(unsigned untrusted)
{
  if (untrusted < 100)
    return mem[untrusted];
  return NULL;
}

This can be rewritten as either:

void *mem;

void* f(unsigned untrusted)
{
  if (untrusted < 100)
    return mem[__builtin_speculation_safe_value (untrusted)];
  return NULL;
}

if leaking mem[0] is not a problem; or if you're really paranoid:

void* f(unsigned untrusted)
{
  if (untrusted < 100)
    return *__builtin_speculation_safe_value (mem + untrusted);
  return NULL;
}

which becomes a NULL pointer dereference if the speculation is
incorrect.  The programmer doesn't need to test this code (any test
would likely be useless anyway as the CPU would predict the outcome and
speculate based on such a prediction).  They do need to think a bit
about what might leak if the CPU does miss-speculate, hence the two
examples above.

A more complex case, where you might want a different failure value can
come up if you have a mask operation.  A slightly convoluted example
(derived from a real example we found in the Linux kernel) might be

if (mode32)
  mask_from = 0x100000000;
else
  mask_from = 0;

... // some code

return mem[untrusted_val & (mask_from - 1)];

It would probably be better to place the speculation cleanup nearer the
initialization of mask_from, especially if mask_from could be used more
than once; but in that case 0 is less safe than any other (since 0-1 is
all bits 1).  In this case you might write:

if (mode32)
  mask_from = 0x100000000;
else
  mask_from = 0;

mask_from = __builtin_speculation_safe_value (mask_from, 1);

... // some code

return mem[untrusted_val & (mask_from - 1)];

And in this case if the speculation is incorrect the (mask_from - 1)
expression becomes 0 and the whole range is protected.

Note that speculation is fine if it is correct, we don't want to hold
things up in that case since it will happen anyway.  It's only a problem
if the speculation is incorrect :-)


> 
> Does the builtin try to say that iff the current instruction is speculated
> then it will return failval, otherwise it will return val and thus following
> code needs to handle 'failval' in a way that is safe?

No.  From an abstract machine sense the builtin only ever returns its
first argument.  The fail value only appears if the CPU guesses the
direction of a branch and guesses *incorrectly* (incorrect speculation)

> 
> Again, the wording "when it is known that speculative execution has
> incorrectly predicted a conditional branch operation" is very confusing...
> 
> Richard.
> 
>> R.
>>
>>>
>>> that is, the return value is only interesting in sofar as to whether it is equal
>>> to val or the special value?
>>>
>>> That said, I wonder why we don't hide that details from the user and
>>> provide a predicate instead.
>>>
>>> Richard.
>>>
>>>> There are seven patches in this set, as follows.
>>>>
>>>> 1) Introduces the new intrinsic __builtin_sepculation_safe_value.
>>>> 2) Adds a basic hard barrier implementation for AArch32 (arm) state.
>>>> 3) Adds a basic hard barrier implementation for AArch64 state.
>>>> 4) Adds a new command-line option -mtrack-speculation (currently a no-op).
>>>> 5) Disables CB[N]Z and TB[N]Z when -mtrack-speculation.
>>>> 6) Adds the new speculation tracking pass for AArch64
>>>> 7) Uses the new speculation tracking pass to generate CSDB-based barrier
>>>>    sequences
>>>>
>>>> I haven't added a speculation-tracking pass for AArch32 at this time.
>>>> It is possible to do this, but would require quite a lot of rework for
>>>> the arm backend due to the limited number of registers that are
>>>> available.
>>>>
>>>> Although patch 6 is AArch64 specific, I'd appreciate a review from
>>>> someone more familiar with the branch edge code than myself.  There
>>>> appear to be a number of tricky issues with more complex edges so I'd
>>>> like a second opinion on that code in case I've missed an important
>>>> case.
>>>>
>>>> R.
>>>>
>>>>
>>>>
>>>> Richard Earnshaw (7):
>>>>   Add __builtin_speculation_safe_value
>>>>   Arm - add speculation_barrier pattern
>>>>   AArch64 - add speculation barrier
>>>>   AArch64 - Add new option -mtrack-speculation
>>>>   AArch64 - disable CB[N]Z TB[N]Z when tracking speculation
>>>>   AArch64 - new pass to add conditional-branch speculation tracking
>>>>   AArch64 - use CSDB based sequences if speculation tracking is enabled
>>>>
>>>>  gcc/builtin-types.def                     |   6 +
>>>>  gcc/builtins.c                            |  57 ++++
>>>>  gcc/builtins.def                          |  20 ++
>>>>  gcc/c-family/c-common.c                   | 143 +++++++++
>>>>  gcc/c-family/c-cppbuiltin.c               |   5 +-
>>>>  gcc/config.gcc                            |   2 +-
>>>>  gcc/config/aarch64/aarch64-passes.def     |   1 +
>>>>  gcc/config/aarch64/aarch64-protos.h       |   3 +-
>>>>  gcc/config/aarch64/aarch64-speculation.cc | 494 ++++++++++++++++++++++++++++++
>>>>  gcc/config/aarch64/aarch64.c              |  88 +++++-
>>>>  gcc/config/aarch64/aarch64.md             | 140 ++++++++-
>>>>  gcc/config/aarch64/aarch64.opt            |   4 +
>>>>  gcc/config/aarch64/iterators.md           |   3 +
>>>>  gcc/config/aarch64/t-aarch64              |  10 +
>>>>  gcc/config/arm/arm.md                     |  21 ++
>>>>  gcc/config/arm/unspecs.md                 |   1 +
>>>>  gcc/doc/cpp.texi                          |   4 +
>>>>  gcc/doc/extend.texi                       |  29 ++
>>>>  gcc/doc/invoke.texi                       |  10 +-
>>>>  gcc/doc/md.texi                           |  15 +
>>>>  gcc/doc/tm.texi                           |  20 ++
>>>>  gcc/doc/tm.texi.in                        |   2 +
>>>>  gcc/target.def                            |  23 ++
>>>>  gcc/targhooks.c                           |  27 ++
>>>>  gcc/targhooks.h                           |   2 +
>>>>  gcc/testsuite/gcc.dg/spec-barrier-1.c     |  40 +++
>>>>  gcc/testsuite/gcc.dg/spec-barrier-2.c     |  19 ++
>>>>  gcc/testsuite/gcc.dg/spec-barrier-3.c     |  13 +
>>>>  28 files changed, 1191 insertions(+), 11 deletions(-)
>>>>  create mode 100644 gcc/config/aarch64/aarch64-speculation.cc
>>>>  create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-1.c
>>>>  create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-2.c
>>>>  create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-3.c
>>
Richard Biener July 10, 2018, 11:21 a.m. UTC | #7
On Tue, Jul 10, 2018 at 12:53 PM Richard Earnshaw (lists)
<Richard.Earnshaw@arm.com> wrote:
>
> On 10/07/18 11:10, Richard Biener wrote:
> > On Tue, Jul 10, 2018 at 10:39 AM Richard Earnshaw (lists)
> > <Richard.Earnshaw@arm.com> wrote:
> >>
> >> On 10/07/18 08:19, Richard Biener wrote:
> >>> On Mon, Jul 9, 2018 at 6:39 PM Richard Earnshaw
> >>> <Richard.Earnshaw@arm.com> wrote:
> >>>>
> >>>>
> >>>> The patches I posted earlier this year for mitigating against
> >>>> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
> >>>> which it became obvious that a rethink was needed.  This mail, and the
> >>>> following patches attempt to address that feedback and present a new
> >>>> approach to mitigating against this form of attack surface.
> >>>>
> >>>> There were two major issues with the original approach:
> >>>>
> >>>> - The speculation bounds were too tightly constrained - essentially
> >>>>   they had to represent and upper and lower bound on a pointer, or a
> >>>>   pointer offset.
> >>>> - The speculation constraints could only cover the immediately preceding
> >>>>   branch, which often did not fit well with the structure of the existing
> >>>>   code.
> >>>>
> >>>> An additional criticism was that the shape of the intrinsic did not
> >>>> fit particularly well with systems that used a single speculation
> >>>> barrier that essentially had to wait until all preceding speculation
> >>>> had to be resolved.
> >>>>
> >>>> To address all of the above, these patches adopt a new approach, based
> >>>> in part on a posting by Chandler Carruth to the LLVM developers list
> >>>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
> >>>> but which we have extended to deal with inter-function speculation.
> >>>> The patches divide the problem into two halves.
> >>>>
> >>>> The first half is some target-specific code to track the speculation
> >>>> condition through the generated code to provide an internal variable
> >>>> which can tell us whether or not the CPU's control flow speculation
> >>>> matches the data flow calculations.  The idea is that the internal
> >>>> variable starts with the value TRUE and if the CPU's control flow
> >>>> speculation ever causes a jump to the wrong block of code the variable
> >>>> becomes false until such time as the incorrect control flow
> >>>> speculation gets unwound.
> >>>>
> >>>> The second half is that a new intrinsic function is introduced that is
> >>>> much simpler than we had before.  The basic version of the intrinsic
> >>>> is now simply:
> >>>>
> >>>>       T var = __builtin_speculation_safe_value (T unsafe_var);
> >>>>
> >>>> Full details of the syntax can be found in the documentation patch, in
> >>>> patch 1.  In summary, when not speculating the intrinsic returns
> >>>> unsafe_var; when speculating then if it can be shown that the
> >>>> speculative flow has diverged from the intended control flow then zero
> >>>> is returned.  An optional second argument can be used to return an
> >>>> alternative value to zero.  The builtin may cause execution to pause
> >>>> until the speculation state is resolved.
> >>>
> >>> So a trivial target implementation would be to emit a barrier and then
> >>> it would always return unsafe_var and never zero.  What I don't understand
> >>> fully is what users should do here, thus what the value of ever returning
> >>> "unsafe" is.  Also I wonder why the API is forcing you to single-out a
> >>> special value instead of doing
> >>>
> >>>  bool safe = __builtin_speculation_safe_value_p (T unsafe_value);
> >>>  if (!safe)
> >>>    /* what now? */
> >>>
> >>> I'm only guessing that the correct way to handle "unsafe" is basically
> >>>
> >>>  while (__builtin_speculation_safe_value (val) == 0)
> >>>     ;
> >>>
> >>> use val, it's now safe
> >>
> >> No, a safe version of val is returned, not a bool telling you it is now
> >> safe to use the original.
> >
> > OK, so making the old value dead is required to preserve the desired
> > dataflow.
> >
> > But how should I use the special value that signaled "failure"?
> >
> > Obviously the user isn't supposed to simply replace 'val' with
> >
> >  val = __builtin_speculation_safe_value (val);
> >
> > to make it speculation-proof.  So - how should the user _use_ this
> > builtin?  The docs do not say anything about this but says the
> > very confusing
> >
> > +The function may use target-dependent speculation tracking state to cause
> > +@var{failval} to be returned when it is known that speculative
> > +execution has incorrectly predicted a conditional branch operation.
> >
> > because speculation is about executing instructions as if they were
> > supposed to be executed.  Once it is known the prediciton was wrong
> > no more "wrong" instructions will be executed but a previously
> > speculated instruction cannot know it was "falsely" speculated.
> >
> > Does the above try to say that the function may return failval if the
> > instruction is currently executed speculatively instead?  That would
> > make sense to me.  And return failval independent of if the speculation
> > later turns out to be correct or not.
> >
> >>  You must use the sanitized version in future,
> >> not the unprotected version.
> >>
> >>
> >> So the usage is going to be more like:
> >>
> >> val = __builtin_speculation_safe_value (val);  // Overwrite val with a
> >> sanitized version.
> >>
> >> You have to use the cleaned up version, the unclean version is still
> >> vulnerable to incorrect speculation.
> >
> > But then where does failval come into play?  The above cannot be correct
> > if failval matters.
>
> failval only comes into play if the CPU is speculating incorrectly.
> It's just a safe value that some CPUs might use until such time as the
> speculation gets unwound.  In most cases it can be zero.  Perhaps a more
> concrete example would be something like.
>
>
> void **mem;
>
> void* f(unsigned untrusted)
> {
>   if (untrusted < 100)
>     return mem[untrusted];
>   return NULL;
> }
>
> This can be rewritten as either:
>
> void *mem;
>
> void* f(unsigned untrusted)
> {
>   if (untrusted < 100)
>     return mem[__builtin_speculation_safe_value (untrusted)];
>   return NULL;
> }
>
> if leaking mem[0] is not a problem; or if you're really paranoid:
>
> void* f(unsigned untrusted)
> {
>   if (untrusted < 100)
>     return *__builtin_speculation_safe_value (mem + untrusted);
>   return NULL;
> }
>
> which becomes a NULL pointer dereference if the speculation is
> incorrect.  The programmer doesn't need to test this code (any test
> would likely be useless anyway as the CPU would predict the outcome and
> speculate based on such a prediction).  They do need to think a bit
> about what might leak if the CPU does miss-speculate, hence the two
> examples above.
>
> A more complex case, where you might want a different failure value can
> come up if you have a mask operation.  A slightly convoluted example
> (derived from a real example we found in the Linux kernel) might be
>
> if (mode32)
>   mask_from = 0x100000000;
> else
>   mask_from = 0;
>
> ... // some code
>
> return mem[untrusted_val & (mask_from - 1)];
>
> It would probably be better to place the speculation cleanup nearer the
> initialization of mask_from, especially if mask_from could be used more
> than once; but in that case 0 is less safe than any other (since 0-1 is
> all bits 1).  In this case you might write:
>
> if (mode32)
>   mask_from = 0x100000000;
> else
>   mask_from = 0;
>
> mask_from = __builtin_speculation_safe_value (mask_from, 1);
>
> ... // some code
>
> return mem[untrusted_val & (mask_from - 1)];
>
> And in this case if the speculation is incorrect the (mask_from - 1)
> expression becomes 0 and the whole range is protected.
>
> Note that speculation is fine if it is correct, we don't want to hold
> things up in that case since it will happen anyway.  It's only a problem
> if the speculation is incorrect :-)

Ok, please amend at least the user-level documentation with one or two
examples like above.

>
> >
> > Does the builtin try to say that iff the current instruction is speculated
> > then it will return failval, otherwise it will return val and thus following
> > code needs to handle 'failval' in a way that is safe?
>
> No.  From an abstract machine sense the builtin only ever returns its
> first argument.  The fail value only appears if the CPU guesses the
> direction of a branch and guesses *incorrectly* (incorrect speculation)

I understand how it works when the implementation issues a fence.
How does the actual implementation (assembly) look like that makes
the CPU know whether it speculated wrongly or not at the time the
value is needed for further speculation given the *__bsfs (mem + offset)
instruction _is_ speculated as well.

Richard.

> >
> > Again, the wording "when it is known that speculative execution has
> > incorrectly predicted a conditional branch operation" is very confusing...
> >
> > Richard.
> >
> >> R.
> >>
> >>>
> >>> that is, the return value is only interesting in sofar as to whether it is equal
> >>> to val or the special value?
> >>>
> >>> That said, I wonder why we don't hide that details from the user and
> >>> provide a predicate instead.
> >>>
> >>> Richard.
> >>>
> >>>> There are seven patches in this set, as follows.
> >>>>
> >>>> 1) Introduces the new intrinsic __builtin_sepculation_safe_value.
> >>>> 2) Adds a basic hard barrier implementation for AArch32 (arm) state.
> >>>> 3) Adds a basic hard barrier implementation for AArch64 state.
> >>>> 4) Adds a new command-line option -mtrack-speculation (currently a no-op).
> >>>> 5) Disables CB[N]Z and TB[N]Z when -mtrack-speculation.
> >>>> 6) Adds the new speculation tracking pass for AArch64
> >>>> 7) Uses the new speculation tracking pass to generate CSDB-based barrier
> >>>>    sequences
> >>>>
> >>>> I haven't added a speculation-tracking pass for AArch32 at this time.
> >>>> It is possible to do this, but would require quite a lot of rework for
> >>>> the arm backend due to the limited number of registers that are
> >>>> available.
> >>>>
> >>>> Although patch 6 is AArch64 specific, I'd appreciate a review from
> >>>> someone more familiar with the branch edge code than myself.  There
> >>>> appear to be a number of tricky issues with more complex edges so I'd
> >>>> like a second opinion on that code in case I've missed an important
> >>>> case.
> >>>>
> >>>> R.
> >>>>
> >>>>
> >>>>
> >>>> Richard Earnshaw (7):
> >>>>   Add __builtin_speculation_safe_value
> >>>>   Arm - add speculation_barrier pattern
> >>>>   AArch64 - add speculation barrier
> >>>>   AArch64 - Add new option -mtrack-speculation
> >>>>   AArch64 - disable CB[N]Z TB[N]Z when tracking speculation
> >>>>   AArch64 - new pass to add conditional-branch speculation tracking
> >>>>   AArch64 - use CSDB based sequences if speculation tracking is enabled
> >>>>
> >>>>  gcc/builtin-types.def                     |   6 +
> >>>>  gcc/builtins.c                            |  57 ++++
> >>>>  gcc/builtins.def                          |  20 ++
> >>>>  gcc/c-family/c-common.c                   | 143 +++++++++
> >>>>  gcc/c-family/c-cppbuiltin.c               |   5 +-
> >>>>  gcc/config.gcc                            |   2 +-
> >>>>  gcc/config/aarch64/aarch64-passes.def     |   1 +
> >>>>  gcc/config/aarch64/aarch64-protos.h       |   3 +-
> >>>>  gcc/config/aarch64/aarch64-speculation.cc | 494 ++++++++++++++++++++++++++++++
> >>>>  gcc/config/aarch64/aarch64.c              |  88 +++++-
> >>>>  gcc/config/aarch64/aarch64.md             | 140 ++++++++-
> >>>>  gcc/config/aarch64/aarch64.opt            |   4 +
> >>>>  gcc/config/aarch64/iterators.md           |   3 +
> >>>>  gcc/config/aarch64/t-aarch64              |  10 +
> >>>>  gcc/config/arm/arm.md                     |  21 ++
> >>>>  gcc/config/arm/unspecs.md                 |   1 +
> >>>>  gcc/doc/cpp.texi                          |   4 +
> >>>>  gcc/doc/extend.texi                       |  29 ++
> >>>>  gcc/doc/invoke.texi                       |  10 +-
> >>>>  gcc/doc/md.texi                           |  15 +
> >>>>  gcc/doc/tm.texi                           |  20 ++
> >>>>  gcc/doc/tm.texi.in                        |   2 +
> >>>>  gcc/target.def                            |  23 ++
> >>>>  gcc/targhooks.c                           |  27 ++
> >>>>  gcc/targhooks.h                           |   2 +
> >>>>  gcc/testsuite/gcc.dg/spec-barrier-1.c     |  40 +++
> >>>>  gcc/testsuite/gcc.dg/spec-barrier-2.c     |  19 ++
> >>>>  gcc/testsuite/gcc.dg/spec-barrier-3.c     |  13 +
> >>>>  28 files changed, 1191 insertions(+), 11 deletions(-)
> >>>>  create mode 100644 gcc/config/aarch64/aarch64-speculation.cc
> >>>>  create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-1.c
> >>>>  create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-2.c
> >>>>  create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-3.c
> >>
>
Richard Earnshaw (lists) July 10, 2018, 1:43 p.m. UTC | #8
On 10/07/18 12:21, Richard Biener wrote:
> On Tue, Jul 10, 2018 at 12:53 PM Richard Earnshaw (lists)
> <Richard.Earnshaw@arm.com> wrote:
>>
>> On 10/07/18 11:10, Richard Biener wrote:
>>> On Tue, Jul 10, 2018 at 10:39 AM Richard Earnshaw (lists)
>>> <Richard.Earnshaw@arm.com> wrote:
>>>>
>>>> On 10/07/18 08:19, Richard Biener wrote:
>>>>> On Mon, Jul 9, 2018 at 6:39 PM Richard Earnshaw
>>>>> <Richard.Earnshaw@arm.com> wrote:
>>>>>>
>>>>>>
>>>>>> The patches I posted earlier this year for mitigating against
>>>>>> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
>>>>>> which it became obvious that a rethink was needed.  This mail, and the
>>>>>> following patches attempt to address that feedback and present a new
>>>>>> approach to mitigating against this form of attack surface.
>>>>>>
>>>>>> There were two major issues with the original approach:
>>>>>>
>>>>>> - The speculation bounds were too tightly constrained - essentially
>>>>>>   they had to represent and upper and lower bound on a pointer, or a
>>>>>>   pointer offset.
>>>>>> - The speculation constraints could only cover the immediately preceding
>>>>>>   branch, which often did not fit well with the structure of the existing
>>>>>>   code.
>>>>>>
>>>>>> An additional criticism was that the shape of the intrinsic did not
>>>>>> fit particularly well with systems that used a single speculation
>>>>>> barrier that essentially had to wait until all preceding speculation
>>>>>> had to be resolved.
>>>>>>
>>>>>> To address all of the above, these patches adopt a new approach, based
>>>>>> in part on a posting by Chandler Carruth to the LLVM developers list
>>>>>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
>>>>>> but which we have extended to deal with inter-function speculation.
>>>>>> The patches divide the problem into two halves.
>>>>>>
>>>>>> The first half is some target-specific code to track the speculation
>>>>>> condition through the generated code to provide an internal variable
>>>>>> which can tell us whether or not the CPU's control flow speculation
>>>>>> matches the data flow calculations.  The idea is that the internal
>>>>>> variable starts with the value TRUE and if the CPU's control flow
>>>>>> speculation ever causes a jump to the wrong block of code the variable
>>>>>> becomes false until such time as the incorrect control flow
>>>>>> speculation gets unwound.
>>>>>>
>>>>>> The second half is that a new intrinsic function is introduced that is
>>>>>> much simpler than we had before.  The basic version of the intrinsic
>>>>>> is now simply:
>>>>>>
>>>>>>       T var = __builtin_speculation_safe_value (T unsafe_var);
>>>>>>
>>>>>> Full details of the syntax can be found in the documentation patch, in
>>>>>> patch 1.  In summary, when not speculating the intrinsic returns
>>>>>> unsafe_var; when speculating then if it can be shown that the
>>>>>> speculative flow has diverged from the intended control flow then zero
>>>>>> is returned.  An optional second argument can be used to return an
>>>>>> alternative value to zero.  The builtin may cause execution to pause
>>>>>> until the speculation state is resolved.
>>>>>
>>>>> So a trivial target implementation would be to emit a barrier and then
>>>>> it would always return unsafe_var and never zero.  What I don't understand
>>>>> fully is what users should do here, thus what the value of ever returning
>>>>> "unsafe" is.  Also I wonder why the API is forcing you to single-out a
>>>>> special value instead of doing
>>>>>
>>>>>  bool safe = __builtin_speculation_safe_value_p (T unsafe_value);
>>>>>  if (!safe)
>>>>>    /* what now? */
>>>>>
>>>>> I'm only guessing that the correct way to handle "unsafe" is basically
>>>>>
>>>>>  while (__builtin_speculation_safe_value (val) == 0)
>>>>>     ;
>>>>>
>>>>> use val, it's now safe
>>>>
>>>> No, a safe version of val is returned, not a bool telling you it is now
>>>> safe to use the original.
>>>
>>> OK, so making the old value dead is required to preserve the desired
>>> dataflow.
>>>
>>> But how should I use the special value that signaled "failure"?
>>>
>>> Obviously the user isn't supposed to simply replace 'val' with
>>>
>>>  val = __builtin_speculation_safe_value (val);
>>>
>>> to make it speculation-proof.  So - how should the user _use_ this
>>> builtin?  The docs do not say anything about this but says the
>>> very confusing
>>>
>>> +The function may use target-dependent speculation tracking state to cause
>>> +@var{failval} to be returned when it is known that speculative
>>> +execution has incorrectly predicted a conditional branch operation.
>>>
>>> because speculation is about executing instructions as if they were
>>> supposed to be executed.  Once it is known the prediciton was wrong
>>> no more "wrong" instructions will be executed but a previously
>>> speculated instruction cannot know it was "falsely" speculated.
>>>
>>> Does the above try to say that the function may return failval if the
>>> instruction is currently executed speculatively instead?  That would
>>> make sense to me.  And return failval independent of if the speculation
>>> later turns out to be correct or not.
>>>
>>>>  You must use the sanitized version in future,
>>>> not the unprotected version.
>>>>
>>>>
>>>> So the usage is going to be more like:
>>>>
>>>> val = __builtin_speculation_safe_value (val);  // Overwrite val with a
>>>> sanitized version.
>>>>
>>>> You have to use the cleaned up version, the unclean version is still
>>>> vulnerable to incorrect speculation.
>>>
>>> But then where does failval come into play?  The above cannot be correct
>>> if failval matters.
>>
>> failval only comes into play if the CPU is speculating incorrectly.
>> It's just a safe value that some CPUs might use until such time as the
>> speculation gets unwound.  In most cases it can be zero.  Perhaps a more
>> concrete example would be something like.
>>
>>
>> void **mem;
>>
>> void* f(unsigned untrusted)
>> {
>>   if (untrusted < 100)
>>     return mem[untrusted];
>>   return NULL;
>> }
>>
>> This can be rewritten as either:
>>
>> void *mem;
>>
>> void* f(unsigned untrusted)
>> {
>>   if (untrusted < 100)
>>     return mem[__builtin_speculation_safe_value (untrusted)];
>>   return NULL;
>> }
>>
>> if leaking mem[0] is not a problem; or if you're really paranoid:
>>
>> void* f(unsigned untrusted)
>> {
>>   if (untrusted < 100)
>>     return *__builtin_speculation_safe_value (mem + untrusted);
>>   return NULL;
>> }
>>
>> which becomes a NULL pointer dereference if the speculation is
>> incorrect.  The programmer doesn't need to test this code (any test
>> would likely be useless anyway as the CPU would predict the outcome and
>> speculate based on such a prediction).  They do need to think a bit
>> about what might leak if the CPU does miss-speculate, hence the two
>> examples above.
>>
>> A more complex case, where you might want a different failure value can
>> come up if you have a mask operation.  A slightly convoluted example
>> (derived from a real example we found in the Linux kernel) might be
>>
>> if (mode32)
>>   mask_from = 0x100000000;
>> else
>>   mask_from = 0;
>>
>> ... // some code
>>
>> return mem[untrusted_val & (mask_from - 1)];
>>
>> It would probably be better to place the speculation cleanup nearer the
>> initialization of mask_from, especially if mask_from could be used more
>> than once; but in that case 0 is less safe than any other (since 0-1 is
>> all bits 1).  In this case you might write:
>>
>> if (mode32)
>>   mask_from = 0x100000000;
>> else
>>   mask_from = 0;
>>
>> mask_from = __builtin_speculation_safe_value (mask_from, 1);
>>
>> ... // some code
>>
>> return mem[untrusted_val & (mask_from - 1)];
>>
>> And in this case if the speculation is incorrect the (mask_from - 1)
>> expression becomes 0 and the whole range is protected.
>>
>> Note that speculation is fine if it is correct, we don't want to hold
>> things up in that case since it will happen anyway.  It's only a problem
>> if the speculation is incorrect :-)
> 
> Ok, please amend at least the user-level documentation with one or two
> examples like above.
> 
>>
>>>
>>> Does the builtin try to say that iff the current instruction is speculated
>>> then it will return failval, otherwise it will return val and thus following
>>> code needs to handle 'failval' in a way that is safe?
>>
>> No.  From an abstract machine sense the builtin only ever returns its
>> first argument.  The fail value only appears if the CPU guesses the
>> direction of a branch and guesses *incorrectly* (incorrect speculation)
> 
> I understand how it works when the implementation issues a fence.
> How does the actual implementation (assembly) look like that makes
> the CPU know whether it speculated wrongly or not at the time the
> value is needed for further speculation given the *__bsfs (mem + offset)
> instruction _is_ speculated as well.
> 

If you've not already read it, then I suggest you read
https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/download-the-whitepaper
and the link I posted in the original mail to Chandler's posting on this
subject.

In essence, we start with a register (tracker) initialized to ~0.  At
every conditional branch we change

	b.cond tgt
	...
tgt:
	...

into

	b.cond tgt
	// tracker == ~cond ? tracker, 0
	csel	tracker, tracker, 0, ~cond
	...
tgt:
	// tracker == cond ? tracker, 0
	csel	tracker, tracker, 0, cond


Now when we need to inhibit speculation we emit

	and	safe_val, unsafe_val, tracker
	CSDB

Assuming we want the default result of 0 if speculating unsafely.  CSDB
is a light-weight barrier that ensures that all preceding
/data-processing/ instructions are using real results from earlier
instructions: it will resolve any data-flow speculation, but not
necessarily any control-flow speculation.

Obviously the inserted instructions only appear on the edges between the
blocks, but we take care of that during the insertion process.

R.

> Richard.
> 
>>>
>>> Again, the wording "when it is known that speculative execution has
>>> incorrectly predicted a conditional branch operation" is very confusing...
>>>
>>> Richard.
>>>
>>>> R.
>>>>
>>>>>
>>>>> that is, the return value is only interesting in sofar as to whether it is equal
>>>>> to val or the special value?
>>>>>
>>>>> That said, I wonder why we don't hide that details from the user and
>>>>> provide a predicate instead.
>>>>>
>>>>> Richard.
>>>>>
>>>>>> There are seven patches in this set, as follows.
>>>>>>
>>>>>> 1) Introduces the new intrinsic __builtin_sepculation_safe_value.
>>>>>> 2) Adds a basic hard barrier implementation for AArch32 (arm) state.
>>>>>> 3) Adds a basic hard barrier implementation for AArch64 state.
>>>>>> 4) Adds a new command-line option -mtrack-speculation (currently a no-op).
>>>>>> 5) Disables CB[N]Z and TB[N]Z when -mtrack-speculation.
>>>>>> 6) Adds the new speculation tracking pass for AArch64
>>>>>> 7) Uses the new speculation tracking pass to generate CSDB-based barrier
>>>>>>    sequences
>>>>>>
>>>>>> I haven't added a speculation-tracking pass for AArch32 at this time.
>>>>>> It is possible to do this, but would require quite a lot of rework for
>>>>>> the arm backend due to the limited number of registers that are
>>>>>> available.
>>>>>>
>>>>>> Although patch 6 is AArch64 specific, I'd appreciate a review from
>>>>>> someone more familiar with the branch edge code than myself.  There
>>>>>> appear to be a number of tricky issues with more complex edges so I'd
>>>>>> like a second opinion on that code in case I've missed an important
>>>>>> case.
>>>>>>
>>>>>> R.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Richard Earnshaw (7):
>>>>>>   Add __builtin_speculation_safe_value
>>>>>>   Arm - add speculation_barrier pattern
>>>>>>   AArch64 - add speculation barrier
>>>>>>   AArch64 - Add new option -mtrack-speculation
>>>>>>   AArch64 - disable CB[N]Z TB[N]Z when tracking speculation
>>>>>>   AArch64 - new pass to add conditional-branch speculation tracking
>>>>>>   AArch64 - use CSDB based sequences if speculation tracking is enabled
>>>>>>
>>>>>>  gcc/builtin-types.def                     |   6 +
>>>>>>  gcc/builtins.c                            |  57 ++++
>>>>>>  gcc/builtins.def                          |  20 ++
>>>>>>  gcc/c-family/c-common.c                   | 143 +++++++++
>>>>>>  gcc/c-family/c-cppbuiltin.c               |   5 +-
>>>>>>  gcc/config.gcc                            |   2 +-
>>>>>>  gcc/config/aarch64/aarch64-passes.def     |   1 +
>>>>>>  gcc/config/aarch64/aarch64-protos.h       |   3 +-
>>>>>>  gcc/config/aarch64/aarch64-speculation.cc | 494 ++++++++++++++++++++++++++++++
>>>>>>  gcc/config/aarch64/aarch64.c              |  88 +++++-
>>>>>>  gcc/config/aarch64/aarch64.md             | 140 ++++++++-
>>>>>>  gcc/config/aarch64/aarch64.opt            |   4 +
>>>>>>  gcc/config/aarch64/iterators.md           |   3 +
>>>>>>  gcc/config/aarch64/t-aarch64              |  10 +
>>>>>>  gcc/config/arm/arm.md                     |  21 ++
>>>>>>  gcc/config/arm/unspecs.md                 |   1 +
>>>>>>  gcc/doc/cpp.texi                          |   4 +
>>>>>>  gcc/doc/extend.texi                       |  29 ++
>>>>>>  gcc/doc/invoke.texi                       |  10 +-
>>>>>>  gcc/doc/md.texi                           |  15 +
>>>>>>  gcc/doc/tm.texi                           |  20 ++
>>>>>>  gcc/doc/tm.texi.in                        |   2 +
>>>>>>  gcc/target.def                            |  23 ++
>>>>>>  gcc/targhooks.c                           |  27 ++
>>>>>>  gcc/targhooks.h                           |   2 +
>>>>>>  gcc/testsuite/gcc.dg/spec-barrier-1.c     |  40 +++
>>>>>>  gcc/testsuite/gcc.dg/spec-barrier-2.c     |  19 ++
>>>>>>  gcc/testsuite/gcc.dg/spec-barrier-3.c     |  13 +
>>>>>>  28 files changed, 1191 insertions(+), 11 deletions(-)
>>>>>>  create mode 100644 gcc/config/aarch64/aarch64-speculation.cc
>>>>>>  create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-1.c
>>>>>>  create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-2.c
>>>>>>  create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-3.c
>>>>
>>
Bill Schmidt July 10, 2018, 1:48 p.m. UTC | #9
> On Jul 10, 2018, at 3:49 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
> 
> On 10/07/18 00:13, Jeff Law wrote:
>> On 07/09/2018 10:38 AM, Richard Earnshaw wrote:
>>> 
>>> The patches I posted earlier this year for mitigating against
>>> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
>>> which it became obvious that a rethink was needed.  This mail, and the
>>> following patches attempt to address that feedback and present a new
>>> approach to mitigating against this form of attack surface.
>>> 
>>> There were two major issues with the original approach:
>>> 
>>> - The speculation bounds were too tightly constrained - essentially
>>>  they had to represent and upper and lower bound on a pointer, or a
>>>  pointer offset.
>>> - The speculation constraints could only cover the immediately preceding
>>>  branch, which often did not fit well with the structure of the existing
>>>  code.
>>> 
>>> An additional criticism was that the shape of the intrinsic did not
>>> fit particularly well with systems that used a single speculation
>>> barrier that essentially had to wait until all preceding speculation
>>> had to be resolved.
>> Right.  I suggest the Intel and IBM reps chime in on the updated semantics.
>> 
> 
> Yes, logically, this is a boolean tracker value.  In practice we use ~0
> for true and 0 for false, so that we can simply use it as a mask
> operation later.
> 
> I hope this intrinsic will be even more acceptable than the one that
> Bill Schmidt acked previously, it's even simpler than the version we had
> last time.

Yes, I think this looks quite good.  Thanks!

Thanks also for digging into the speculation tracking algorithm.  This
has good potential as a conservative opt-in approach.  The obvious
concern is whether performance will be acceptable even for apps
that really want the protection.

We took a look at Chandler's WIP LLVM patch and ran some SPEC2006 
numbers on a Skylake box.  We saw geomean degradations of about
42% (int) and 33% (fp).  (This was just one test, so caveat emptor.)
This isn't terrible given the number of potential false positives and the
early state of the algorithm, but it's still a lot from a customer perspective.
I'll be interested in whether your interprocedural improvements are
able to reduce the conservatism a bit.

Thanks,
Bill
> 
>>> 
>>> To address all of the above, these patches adopt a new approach, based
>>> in part on a posting by Chandler Carruth to the LLVM developers list
>>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
>>> but which we have extended to deal with inter-function speculation.
>>> The patches divide the problem into two halves.
>> We're essentially turning the control dependency into a value that we
>> can then use to munge the pointer or the resultant data.
>> 
>>> 
>>> The first half is some target-specific code to track the speculation
>>> condition through the generated code to provide an internal variable
>>> which can tell us whether or not the CPU's control flow speculation
>>> matches the data flow calculations.  The idea is that the internal
>>> variable starts with the value TRUE and if the CPU's control flow
>>> speculation ever causes a jump to the wrong block of code the variable
>>> becomes false until such time as the incorrect control flow
>>> speculation gets unwound.
>> Right.
>> 
>> So one of the things that comes immediately to mind is you have to run
>> this early enough that you can still get to all the control flow and
>> build your predicates.  Otherwise you have do undo stuff like
>> conditional move generation.
> 
> No, the opposite, in fact.  We want to run this very late, at least on
> Arm systems (AArch64 or AArch32).  Conditional move instructions are
> fine - they're data-flow operations, not control flow (in fact, that's
> exactly what the control flow tracker instructions are).  By running it
> late we avoid disrupting any of the earlier optimization passes as well.
> 
>> 
>> On the flip side, the earlier you do this mitigation, the more you have
>> to worry about what the optimizers are going to do to the code later in
>> the pipeline.  It's almost guaranteed a naive implementation is going to
>> muck this up since we can propagate the state of the condition into the
>> arms which will make the predicate state a compile time constant.
>> 
>> In fact this seems to be running into the area of pointer providence and
>> some discussions we had around atomic a few years back.
>> 
>> I also wonder if this could be combined with taint analysis to produce a
>> much lower overhead solution in cases were developers have done analysis
>> and know what objects are potentially under attacker control.  So
>> instead of analyzing everything, we can have a much narrower focus.
> 
> Automatic application of the tracker to vulnerable variables would be
> nice, but I haven't attempted to go there yet: at present I still rely
> on the user to annotate code with the new intrinsic.
> 
> That doesn't mean that we couldn't extend the overall approach later to
> include automatic tracking.
> 
>> 
>> The pointer munging could well run afoul of alias analysis engines that
>> don't expect to be seeing those kind of operations.
> 
> I think the pass runs late enough that it isn't a problem.
> 
>> 
>> Anyway, just some initial high level thoughts.  I'm sure there'll be
>> more as I read the implementation.
>> 
> 
> Thanks for starting to look at this so quickly.
> 
> R.
> 
>> 
>> Jeff
Richard Earnshaw (lists) July 10, 2018, 2:14 p.m. UTC | #10
On 10/07/18 14:48, Bill Schmidt wrote:
> 
>> On Jul 10, 2018, at 3:49 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
>>
>> On 10/07/18 00:13, Jeff Law wrote:
>>> On 07/09/2018 10:38 AM, Richard Earnshaw wrote:
>>>>
>>>> The patches I posted earlier this year for mitigating against
>>>> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
>>>> which it became obvious that a rethink was needed.  This mail, and the
>>>> following patches attempt to address that feedback and present a new
>>>> approach to mitigating against this form of attack surface.
>>>>
>>>> There were two major issues with the original approach:
>>>>
>>>> - The speculation bounds were too tightly constrained - essentially
>>>>  they had to represent and upper and lower bound on a pointer, or a
>>>>  pointer offset.
>>>> - The speculation constraints could only cover the immediately preceding
>>>>  branch, which often did not fit well with the structure of the existing
>>>>  code.
>>>>
>>>> An additional criticism was that the shape of the intrinsic did not
>>>> fit particularly well with systems that used a single speculation
>>>> barrier that essentially had to wait until all preceding speculation
>>>> had to be resolved.
>>> Right.  I suggest the Intel and IBM reps chime in on the updated semantics.
>>>
>>
>> Yes, logically, this is a boolean tracker value.  In practice we use ~0
>> for true and 0 for false, so that we can simply use it as a mask
>> operation later.
>>
>> I hope this intrinsic will be even more acceptable than the one that
>> Bill Schmidt acked previously, it's even simpler than the version we had
>> last time.
> 
> Yes, I think this looks quite good.  Thanks!
> 
> Thanks also for digging into the speculation tracking algorithm.  This
> has good potential as a conservative opt-in approach.  The obvious
> concern is whether performance will be acceptable even for apps
> that really want the protection.
> 
> We took a look at Chandler's WIP LLVM patch and ran some SPEC2006 
> numbers on a Skylake box.  We saw geomean degradations of about
> 42% (int) and 33% (fp).  (This was just one test, so caveat emptor.)
> This isn't terrible given the number of potential false positives and the
> early state of the algorithm, but it's still a lot from a customer perspective.
> I'll be interested in whether your interprocedural improvements are
> able to reduce the conservatism a bit.
> 

So I don't have any numbers for SPEC2006.  I have some initial numbers
for SPEC2000 when just adding the tracking code (so not applying the
second part of the mitigation).  In that case INT2000 is down by ~13%
and FP2000 was by comparison almost in the noise (~2.4%).

Applying the tracker value to all memory loads would push those numbers
up significantly, I suspect.  That's part of the reason for preferring
the intrinsic rather than automatic mitigation: the intrinsic is much
more targeted.

R.


> Thanks,
> Bill
>>
>>>>
>>>> To address all of the above, these patches adopt a new approach, based
>>>> in part on a posting by Chandler Carruth to the LLVM developers list
>>>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
>>>> but which we have extended to deal with inter-function speculation.
>>>> The patches divide the problem into two halves.
>>> We're essentially turning the control dependency into a value that we
>>> can then use to munge the pointer or the resultant data.
>>>
>>>>
>>>> The first half is some target-specific code to track the speculation
>>>> condition through the generated code to provide an internal variable
>>>> which can tell us whether or not the CPU's control flow speculation
>>>> matches the data flow calculations.  The idea is that the internal
>>>> variable starts with the value TRUE and if the CPU's control flow
>>>> speculation ever causes a jump to the wrong block of code the variable
>>>> becomes false until such time as the incorrect control flow
>>>> speculation gets unwound.
>>> Right.
>>>
>>> So one of the things that comes immediately to mind is you have to run
>>> this early enough that you can still get to all the control flow and
>>> build your predicates.  Otherwise you have do undo stuff like
>>> conditional move generation.
>>
>> No, the opposite, in fact.  We want to run this very late, at least on
>> Arm systems (AArch64 or AArch32).  Conditional move instructions are
>> fine - they're data-flow operations, not control flow (in fact, that's
>> exactly what the control flow tracker instructions are).  By running it
>> late we avoid disrupting any of the earlier optimization passes as well.
>>
>>>
>>> On the flip side, the earlier you do this mitigation, the more you have
>>> to worry about what the optimizers are going to do to the code later in
>>> the pipeline.  It's almost guaranteed a naive implementation is going to
>>> muck this up since we can propagate the state of the condition into the
>>> arms which will make the predicate state a compile time constant.
>>>
>>> In fact this seems to be running into the area of pointer providence and
>>> some discussions we had around atomic a few years back.
>>>
>>> I also wonder if this could be combined with taint analysis to produce a
>>> much lower overhead solution in cases were developers have done analysis
>>> and know what objects are potentially under attacker control.  So
>>> instead of analyzing everything, we can have a much narrower focus.
>>
>> Automatic application of the tracker to vulnerable variables would be
>> nice, but I haven't attempted to go there yet: at present I still rely
>> on the user to annotate code with the new intrinsic.
>>
>> That doesn't mean that we couldn't extend the overall approach later to
>> include automatic tracking.
>>
>>>
>>> The pointer munging could well run afoul of alias analysis engines that
>>> don't expect to be seeing those kind of operations.
>>
>> I think the pass runs late enough that it isn't a problem.
>>
>>>
>>> Anyway, just some initial high level thoughts.  I'm sure there'll be
>>> more as I read the implementation.
>>>
>>
>> Thanks for starting to look at this so quickly.
>>
>> R.
>>
>>>
>>> Jeff
>
Jeff Law July 10, 2018, 3:42 p.m. UTC | #11
On 07/10/2018 02:49 AM, Richard Earnshaw (lists) wrote:
> On 10/07/18 00:13, Jeff Law wrote:
>> On 07/09/2018 10:38 AM, Richard Earnshaw wrote:
>>>
>>> To address all of the above, these patches adopt a new approach, based
>>> in part on a posting by Chandler Carruth to the LLVM developers list
>>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
>>> but which we have extended to deal with inter-function speculation.
>>> The patches divide the problem into two halves.
>> We're essentially turning the control dependency into a value that we
>> can then use to munge the pointer or the resultant data.
>>
>>>
>>> The first half is some target-specific code to track the speculation
>>> condition through the generated code to provide an internal variable
>>> which can tell us whether or not the CPU's control flow speculation
>>> matches the data flow calculations.  The idea is that the internal
>>> variable starts with the value TRUE and if the CPU's control flow
>>> speculation ever causes a jump to the wrong block of code the variable
>>> becomes false until such time as the incorrect control flow
>>> speculation gets unwound.
>> Right.
>>
>> So one of the things that comes immediately to mind is you have to run
>> this early enough that you can still get to all the control flow and
>> build your predicates.  Otherwise you have do undo stuff like
>> conditional move generation.
> 
> No, the opposite, in fact.  We want to run this very late, at least on
> Arm systems (AArch64 or AArch32).  Conditional move instructions are
> fine - they're data-flow operations, not control flow (in fact, that's
> exactly what the control flow tracker instructions are).  By running it
> late we avoid disrupting any of the earlier optimization passes as well.
Ack.  I looked at the aarch64 implementation after sending my message
and it clearly runs very late.

I haven't convinced myself that all the work generic parts of the
compiler to rewrite and eliminate conditionals is safe.  But even if it
isn't, you're probably getting enough coverage to drastically reduce the
attack surface.  I'm going to have to think about the early
transformations we make and how they interact here harder.  But I think
the general approach can dramatically reduce the attack surface.

With running very late, as you noted, the big concern is edge
insertions.  I'm going to have to re-familiarize myself with all the
rules there :-)    I did note you stumbled on some of the issues in that
space (what to do with calls that throw exceptions).

Placement before the final bbro pass probably avoids a lot of pain.  So
the basic placement seems reasonable.  And again, if we're missing
something due to the effects of earlier passes, I still think you're
reducing the attack surface in a meaningful way.



> 
>>
>> On the flip side, the earlier you do this mitigation, the more you have
>> to worry about what the optimizers are going to do to the code later in
>> the pipeline.  It's almost guaranteed a naive implementation is going to
>> muck this up since we can propagate the state of the condition into the
>> arms which will make the predicate state a compile time constant.
>>
>> In fact this seems to be running into the area of pointer providence and
>> some discussions we had around atomic a few years back.
>>
>> I also wonder if this could be combined with taint analysis to produce a
>> much lower overhead solution in cases were developers have done analysis
>> and know what objects are potentially under attacker control.  So
>> instead of analyzing everything, we can have a much narrower focus.
> 
> Automatic application of the tracker to vulnerable variables would be
> nice, but I haven't attempted to go there yet: at present I still rely
> on the user to annotate code with the new intrinsic.
ACK.  My sense is we are going to want taint analysis.  I think it'd be
useful here and in other contexts.  However, I don't think it
necessarily needs to be a requirement to go forward.

I'm going to review the atomic discussion we had a while back with the
kernel folks as well as some pointer providence discussions I've had
with Martin S.  I can't put my finger on it yet, but I still have the
sense there's some interactions here we want to at least be aware of.

> 
> That doesn't mean that we couldn't extend the overall approach later to
> include automatic tracking.
Absolutely.

> 
>>
>> The pointer munging could well run afoul of alias analysis engines that
>> don't expect to be seeing those kind of operations.
> 
> I think the pass runs late enough that it isn't a problem.
Yea, I think you're right.


> 
>>
>> Anyway, just some initial high level thoughts.  I'm sure there'll be
>> more as I read the implementation.
>>
> 
> Thanks for starting to look at this so quickly.
NP.  Your timing to come back to this is good.

Jeff
Jeff Law July 10, 2018, 3:44 p.m. UTC | #12
On 07/10/2018 08:14 AM, Richard Earnshaw (lists) wrote:
> On 10/07/18 14:48, Bill Schmidt wrote:
>>
>>> On Jul 10, 2018, at 3:49 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
>>>
>>> On 10/07/18 00:13, Jeff Law wrote:
>>>> On 07/09/2018 10:38 AM, Richard Earnshaw wrote:
>>>>>
>>>>> The patches I posted earlier this year for mitigating against
>>>>> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
>>>>> which it became obvious that a rethink was needed.  This mail, and the
>>>>> following patches attempt to address that feedback and present a new
>>>>> approach to mitigating against this form of attack surface.
>>>>>
>>>>> There were two major issues with the original approach:
>>>>>
>>>>> - The speculation bounds were too tightly constrained - essentially
>>>>>  they had to represent and upper and lower bound on a pointer, or a
>>>>>  pointer offset.
>>>>> - The speculation constraints could only cover the immediately preceding
>>>>>  branch, which often did not fit well with the structure of the existing
>>>>>  code.
>>>>>
>>>>> An additional criticism was that the shape of the intrinsic did not
>>>>> fit particularly well with systems that used a single speculation
>>>>> barrier that essentially had to wait until all preceding speculation
>>>>> had to be resolved.
>>>> Right.  I suggest the Intel and IBM reps chime in on the updated semantics.
>>>>
>>>
>>> Yes, logically, this is a boolean tracker value.  In practice we use ~0
>>> for true and 0 for false, so that we can simply use it as a mask
>>> operation later.
>>>
>>> I hope this intrinsic will be even more acceptable than the one that
>>> Bill Schmidt acked previously, it's even simpler than the version we had
>>> last time.
>>
>> Yes, I think this looks quite good.  Thanks!
>>
>> Thanks also for digging into the speculation tracking algorithm.  This
>> has good potential as a conservative opt-in approach.  The obvious
>> concern is whether performance will be acceptable even for apps
>> that really want the protection.
>>
>> We took a look at Chandler's WIP LLVM patch and ran some SPEC2006 
>> numbers on a Skylake box.  We saw geomean degradations of about
>> 42% (int) and 33% (fp).  (This was just one test, so caveat emptor.)
>> This isn't terrible given the number of potential false positives and the
>> early state of the algorithm, but it's still a lot from a customer perspective.
>> I'll be interested in whether your interprocedural improvements are
>> able to reduce the conservatism a bit.
>>
> 
> So I don't have any numbers for SPEC2006.  I have some initial numbers
> for SPEC2000 when just adding the tracking code (so not applying the
> second part of the mitigation).  In that case INT2000 is down by ~13%
> and FP2000 was by comparison almost in the noise (~2.4%).
> 
> Applying the tracker value to all memory loads would push those numbers
> up significantly, I suspect.  That's part of the reason for preferring
> the intrinsic rather than automatic mitigation: the intrinsic is much
> more targeted.
Right.  Fully automatic without any "hints" is going to be very
expensive, possibly prohibitively expensive.

Using the intrinsic or exploiting some kind of taint analysis has the
potential to drastically reduce the overhead.  At least it seems like
they should :-)

jeff
Jeff Law July 10, 2018, 3:55 p.m. UTC | #13
On 07/10/2018 04:53 AM, Richard Earnshaw (lists) wrote:
> On 10/07/18 11:10, Richard Biener wrote:
>> On Tue, Jul 10, 2018 at 10:39 AM Richard Earnshaw (lists)
>> <Richard.Earnshaw@arm.com> wrote:
>>>
>>> On 10/07/18 08:19, Richard Biener wrote:
>>>> On Mon, Jul 9, 2018 at 6:39 PM Richard Earnshaw
>>>> <Richard.Earnshaw@arm.com> wrote:
>>>>>
>>>>>
>>>>> The patches I posted earlier this year for mitigating against
>>>>> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
>>>>> which it became obvious that a rethink was needed.  This mail, and the
>>>>> following patches attempt to address that feedback and present a new
>>>>> approach to mitigating against this form of attack surface.
>>>>>
>>>>> There were two major issues with the original approach:
>>>>>
>>>>> - The speculation bounds were too tightly constrained - essentially
>>>>>   they had to represent and upper and lower bound on a pointer, or a
>>>>>   pointer offset.
>>>>> - The speculation constraints could only cover the immediately preceding
>>>>>   branch, which often did not fit well with the structure of the existing
>>>>>   code.
>>>>>
>>>>> An additional criticism was that the shape of the intrinsic did not
>>>>> fit particularly well with systems that used a single speculation
>>>>> barrier that essentially had to wait until all preceding speculation
>>>>> had to be resolved.
>>>>>
>>>>> To address all of the above, these patches adopt a new approach, based
>>>>> in part on a posting by Chandler Carruth to the LLVM developers list
>>>>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
>>>>> but which we have extended to deal with inter-function speculation.
>>>>> The patches divide the problem into two halves.
>>>>>
>>>>> The first half is some target-specific code to track the speculation
>>>>> condition through the generated code to provide an internal variable
>>>>> which can tell us whether or not the CPU's control flow speculation
>>>>> matches the data flow calculations.  The idea is that the internal
>>>>> variable starts with the value TRUE and if the CPU's control flow
>>>>> speculation ever causes a jump to the wrong block of code the variable
>>>>> becomes false until such time as the incorrect control flow
>>>>> speculation gets unwound.
>>>>>
>>>>> The second half is that a new intrinsic function is introduced that is
>>>>> much simpler than we had before.  The basic version of the intrinsic
>>>>> is now simply:
>>>>>
>>>>>       T var = __builtin_speculation_safe_value (T unsafe_var);
>>>>>
>>>>> Full details of the syntax can be found in the documentation patch, in
>>>>> patch 1.  In summary, when not speculating the intrinsic returns
>>>>> unsafe_var; when speculating then if it can be shown that the
>>>>> speculative flow has diverged from the intended control flow then zero
>>>>> is returned.  An optional second argument can be used to return an
>>>>> alternative value to zero.  The builtin may cause execution to pause
>>>>> until the speculation state is resolved.
>>>>
>>>> So a trivial target implementation would be to emit a barrier and then
>>>> it would always return unsafe_var and never zero.  What I don't understand
>>>> fully is what users should do here, thus what the value of ever returning
>>>> "unsafe" is.  Also I wonder why the API is forcing you to single-out a
>>>> special value instead of doing
>>>>
>>>>  bool safe = __builtin_speculation_safe_value_p (T unsafe_value);
>>>>  if (!safe)
>>>>    /* what now? */
>>>>
>>>> I'm only guessing that the correct way to handle "unsafe" is basically
>>>>
>>>>  while (__builtin_speculation_safe_value (val) == 0)
>>>>     ;
>>>>
>>>> use val, it's now safe
>>>
>>> No, a safe version of val is returned, not a bool telling you it is now
>>> safe to use the original.
>>
>> OK, so making the old value dead is required to preserve the desired
>> dataflow.
>>
>> But how should I use the special value that signaled "failure"?
>>
>> Obviously the user isn't supposed to simply replace 'val' with
>>
>>  val = __builtin_speculation_safe_value (val);
>>
>> to make it speculation-proof.  So - how should the user _use_ this
>> builtin?  The docs do not say anything about this but says the
>> very confusing
>>
>> +The function may use target-dependent speculation tracking state to cause
>> +@var{failval} to be returned when it is known that speculative
>> +execution has incorrectly predicted a conditional branch operation.
>>
>> because speculation is about executing instructions as if they were
>> supposed to be executed.  Once it is known the prediciton was wrong
>> no more "wrong" instructions will be executed but a previously
>> speculated instruction cannot know it was "falsely" speculated.
>>
>> Does the above try to say that the function may return failval if the
>> instruction is currently executed speculatively instead?  That would
>> make sense to me.  And return failval independent of if the speculation
>> later turns out to be correct or not.
>>
>>>  You must use the sanitized version in future,
>>> not the unprotected version.
>>>
>>>
>>> So the usage is going to be more like:
>>>
>>> val = __builtin_speculation_safe_value (val);  // Overwrite val with a
>>> sanitized version.
>>>
>>> You have to use the cleaned up version, the unclean version is still
>>> vulnerable to incorrect speculation.
>>
>> But then where does failval come into play?  The above cannot be correct
>> if failval matters.
> 
> failval only comes into play if the CPU is speculating incorrectly.
> It's just a safe value that some CPUs might use until such time as the
> speculation gets unwound.  In most cases it can be zero.  Perhaps a more
> concrete example would be something like.
The value really only matters in the sense that if the processor is
speculating and using that value that the uses don't lead information leaks.

That's why 0 is so useful here as safeval.  Reading/writing *0
speculatively isn't going to populate the cache in a way that is useful
to the attacker.

Jeff
Richard Earnshaw (lists) July 10, 2018, 4:43 p.m. UTC | #14
On 10/07/18 16:42, Jeff Law wrote:
> On 07/10/2018 02:49 AM, Richard Earnshaw (lists) wrote:
>> On 10/07/18 00:13, Jeff Law wrote:
>>> On 07/09/2018 10:38 AM, Richard Earnshaw wrote:
>>>>
>>>> To address all of the above, these patches adopt a new approach, based
>>>> in part on a posting by Chandler Carruth to the LLVM developers list
>>>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
>>>> but which we have extended to deal with inter-function speculation.
>>>> The patches divide the problem into two halves.
>>> We're essentially turning the control dependency into a value that we
>>> can then use to munge the pointer or the resultant data.
>>>
>>>>
>>>> The first half is some target-specific code to track the speculation
>>>> condition through the generated code to provide an internal variable
>>>> which can tell us whether or not the CPU's control flow speculation
>>>> matches the data flow calculations.  The idea is that the internal
>>>> variable starts with the value TRUE and if the CPU's control flow
>>>> speculation ever causes a jump to the wrong block of code the variable
>>>> becomes false until such time as the incorrect control flow
>>>> speculation gets unwound.
>>> Right.
>>>
>>> So one of the things that comes immediately to mind is you have to run
>>> this early enough that you can still get to all the control flow and
>>> build your predicates.  Otherwise you have do undo stuff like
>>> conditional move generation.
>>
>> No, the opposite, in fact.  We want to run this very late, at least on
>> Arm systems (AArch64 or AArch32).  Conditional move instructions are
>> fine - they're data-flow operations, not control flow (in fact, that's
>> exactly what the control flow tracker instructions are).  By running it
>> late we avoid disrupting any of the earlier optimization passes as well.
> Ack.  I looked at the aarch64 implementation after sending my message
> and it clearly runs very late.
> 
> I haven't convinced myself that all the work generic parts of the
> compiler to rewrite and eliminate conditionals is safe.  But even if it
> isn't, you're probably getting enough coverage to drastically reduce the
> attack surface.  I'm going to have to think about the early
> transformations we make and how they interact here harder.  But I think
> the general approach can dramatically reduce the attack surface.

My argument here would be that we are concerned about speculation that
the CPU does with the generated program.  We're not particularly
bothered about the abstract machine description it's based upon.  As
long as the earlier transforms lead to a valid translation (it hasn't
removed a necessary bounds check) then running late is fine.

I can't currently conceive a situation where the compiler would be able
to remove a /necessary/ bounds check that could lead to unsafe
speculation later on.  A redundant bounds check removal shouldn't be a
problem as the non-redundant check should remain and that will still get
tracking code added.

> 
> With running very late, as you noted, the big concern is edge
> insertions.  I'm going to have to re-familiarize myself with all the
> rules there :-)    I did note you stumbled on some of the issues in that
> space (what to do with calls that throw exceptions).
> 
> Placement before the final bbro pass probably avoids a lot of pain.  So
> the basic placement seems reasonable.  And again, if we're missing
> something due to the effects of earlier passes, I still think you're
> reducing the attack surface in a meaningful way.
> 
> 
> 
>>
>>>
>>> On the flip side, the earlier you do this mitigation, the more you have
>>> to worry about what the optimizers are going to do to the code later in
>>> the pipeline.  It's almost guaranteed a naive implementation is going to
>>> muck this up since we can propagate the state of the condition into the
>>> arms which will make the predicate state a compile time constant.
>>>
>>> In fact this seems to be running into the area of pointer providence and
>>> some discussions we had around atomic a few years back.
>>>
>>> I also wonder if this could be combined with taint analysis to produce a
>>> much lower overhead solution in cases were developers have done analysis
>>> and know what objects are potentially under attacker control.  So
>>> instead of analyzing everything, we can have a much narrower focus.
>>
>> Automatic application of the tracker to vulnerable variables would be
>> nice, but I haven't attempted to go there yet: at present I still rely
>> on the user to annotate code with the new intrinsic.
> ACK.  My sense is we are going to want taint analysis.  I think it'd be
> useful here and in other contexts.  However, I don't think it
> necessarily needs to be a requirement to go forward.
> 
> I'm going to review the atomic discussion we had a while back with the
> kernel folks as well as some pointer providence discussions I've had
> with Martin S.  I can't put my finger on it yet, but I still have the
> sense there's some interactions here we want to at least be aware of.
> 
>>
>> That doesn't mean that we couldn't extend the overall approach later to
>> include automatic tracking.
> Absolutely.
> 
>>
>>>
>>> The pointer munging could well run afoul of alias analysis engines that
>>> don't expect to be seeing those kind of operations.
>>
>> I think the pass runs late enough that it isn't a problem.
> Yea, I think you're right.
> 
> 
>>
>>>
>>> Anyway, just some initial high level thoughts.  I'm sure there'll be
>>> more as I read the implementation.
>>>
>>
>> Thanks for starting to look at this so quickly.
> NP.  Your timing to come back to this is good.
> 
> Jeff
>
Jeff Law July 11, 2018, 8:46 p.m. UTC | #15
On 07/10/2018 10:43 AM, Richard Earnshaw (lists) wrote:
> On 10/07/18 16:42, Jeff Law wrote:
>> On 07/10/2018 02:49 AM, Richard Earnshaw (lists) wrote:
>>> On 10/07/18 00:13, Jeff Law wrote:
>>>> On 07/09/2018 10:38 AM, Richard Earnshaw wrote:
>>>>>
>>>>> To address all of the above, these patches adopt a new approach, based
>>>>> in part on a posting by Chandler Carruth to the LLVM developers list
>>>>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
>>>>> but which we have extended to deal with inter-function speculation.
>>>>> The patches divide the problem into two halves.
>>>> We're essentially turning the control dependency into a value that we
>>>> can then use to munge the pointer or the resultant data.
>>>>
>>>>>
>>>>> The first half is some target-specific code to track the speculation
>>>>> condition through the generated code to provide an internal variable
>>>>> which can tell us whether or not the CPU's control flow speculation
>>>>> matches the data flow calculations.  The idea is that the internal
>>>>> variable starts with the value TRUE and if the CPU's control flow
>>>>> speculation ever causes a jump to the wrong block of code the variable
>>>>> becomes false until such time as the incorrect control flow
>>>>> speculation gets unwound.
>>>> Right.
>>>>
>>>> So one of the things that comes immediately to mind is you have to run
>>>> this early enough that you can still get to all the control flow and
>>>> build your predicates.  Otherwise you have do undo stuff like
>>>> conditional move generation.
>>>
>>> No, the opposite, in fact.  We want to run this very late, at least on
>>> Arm systems (AArch64 or AArch32).  Conditional move instructions are
>>> fine - they're data-flow operations, not control flow (in fact, that's
>>> exactly what the control flow tracker instructions are).  By running it
>>> late we avoid disrupting any of the earlier optimization passes as well.
>> Ack.  I looked at the aarch64 implementation after sending my message
>> and it clearly runs very late.
>>
>> I haven't convinced myself that all the work generic parts of the
>> compiler to rewrite and eliminate conditionals is safe.  But even if it
>> isn't, you're probably getting enough coverage to drastically reduce the
>> attack surface.  I'm going to have to think about the early
>> transformations we make and how they interact here harder.  But I think
>> the general approach can dramatically reduce the attack surface.
> 
> My argument here would be that we are concerned about speculation that
> the CPU does with the generated program.  We're not particularly
> bothered about the abstract machine description it's based upon.  As
> long as the earlier transforms lead to a valid translation (it hasn't
> removed a necessary bounds check) then running late is fine.
I'm thinking about obfuscation of the bounds check or the pointer or
turning branchy into straightline code, possibly doing some speculation
in the process, if-conversion and the like.

For example hoist_adjacent_loads which results in speculative loads and
likely a conditional move to select between the two loaded values.

Or what if we've done something like

if (x < maxval)
   res = *p;

And we've turned that into


t = *p;
res = (x < maxval) ? t : res;


That may be implemented as a conditional move at the RTL level, so
protecting that may be nontrivial.

In those examples the compiler itself has introduced the speculation.

I can't find the conditional obfuscation I was looking for, so it's hard
to rule it in our out as potentially problematical.

WRT pointer obfuscation, we no longer propagate conditional equivalences
very agressively, so it may be a non-issue in the end.

But again, even with these concerns I think what you're doing cuts down
the attack surface in meaningful ways.



> 
> I can't currently conceive a situation where the compiler would be able
> to remove a /necessary/ bounds check that could lead to unsafe
> speculation later on.  A redundant bounds check removal shouldn't be a
> problem as the non-redundant check should remain and that will still get
> tracking code added.
It's less about removal and more about either compiler-generated
speculation or obfuscation of the patterns you're looking for.


jeff
Richard Earnshaw (lists) July 11, 2018, 10:30 p.m. UTC | #16
On 11/07/18 21:46, Jeff Law wrote:
> On 07/10/2018 10:43 AM, Richard Earnshaw (lists) wrote:
>> On 10/07/18 16:42, Jeff Law wrote:
>>> On 07/10/2018 02:49 AM, Richard Earnshaw (lists) wrote:
>>>> On 10/07/18 00:13, Jeff Law wrote:
>>>>> On 07/09/2018 10:38 AM, Richard Earnshaw wrote:
>>>>>>
>>>>>> To address all of the above, these patches adopt a new approach, based
>>>>>> in part on a posting by Chandler Carruth to the LLVM developers list
>>>>>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
>>>>>> but which we have extended to deal with inter-function speculation.
>>>>>> The patches divide the problem into two halves.
>>>>> We're essentially turning the control dependency into a value that we
>>>>> can then use to munge the pointer or the resultant data.
>>>>>
>>>>>>
>>>>>> The first half is some target-specific code to track the speculation
>>>>>> condition through the generated code to provide an internal variable
>>>>>> which can tell us whether or not the CPU's control flow speculation
>>>>>> matches the data flow calculations.  The idea is that the internal
>>>>>> variable starts with the value TRUE and if the CPU's control flow
>>>>>> speculation ever causes a jump to the wrong block of code the variable
>>>>>> becomes false until such time as the incorrect control flow
>>>>>> speculation gets unwound.
>>>>> Right.
>>>>>
>>>>> So one of the things that comes immediately to mind is you have to run
>>>>> this early enough that you can still get to all the control flow and
>>>>> build your predicates.  Otherwise you have do undo stuff like
>>>>> conditional move generation.
>>>>
>>>> No, the opposite, in fact.  We want to run this very late, at least on
>>>> Arm systems (AArch64 or AArch32).  Conditional move instructions are
>>>> fine - they're data-flow operations, not control flow (in fact, that's
>>>> exactly what the control flow tracker instructions are).  By running it
>>>> late we avoid disrupting any of the earlier optimization passes as well.
>>> Ack.  I looked at the aarch64 implementation after sending my message
>>> and it clearly runs very late.
>>>
>>> I haven't convinced myself that all the work generic parts of the
>>> compiler to rewrite and eliminate conditionals is safe.  But even if it
>>> isn't, you're probably getting enough coverage to drastically reduce the
>>> attack surface.  I'm going to have to think about the early
>>> transformations we make and how they interact here harder.  But I think
>>> the general approach can dramatically reduce the attack surface.
>>
>> My argument here would be that we are concerned about speculation that
>> the CPU does with the generated program.  We're not particularly
>> bothered about the abstract machine description it's based upon.  As
>> long as the earlier transforms lead to a valid translation (it hasn't
>> removed a necessary bounds check) then running late is fine.
> I'm thinking about obfuscation of the bounds check or the pointer or
> turning branchy into straightline code, possibly doing some speculation
> in the process, if-conversion and the like.
> 
> For example hoist_adjacent_loads which results in speculative loads and
> likely a conditional move to select between the two loaded values.
> 
> Or what if we've done something like
> 
> if (x < maxval)
>    res = *p;
> 
> And we've turned that into
> 
> 
> t = *p;
> res = (x < maxval) ? t : res;

Hmm, interesting.  But for that to be safe, the compiler would have to
be able to prove that dereferencing p was safe even if x >= maxval,
otherwise the run-time code could fault (so if there's any chance that
it could point to something vulnerable, then there must also be a chance
that it points to unmapped memory).  Given that requirement, I don't
think this case can be a specific concern, since the requirement implies
that p must already be within some known bounds for the type of object
it points to.

R.

> 
> 
> That may be implemented as a conditional move at the RTL level, so
> protecting that may be nontrivial.
> 
> In those examples the compiler itself has introduced the speculation.
> 
> I can't find the conditional obfuscation I was looking for, so it's hard
> to rule it in our out as potentially problematical.
> 
> WRT pointer obfuscation, we no longer propagate conditional equivalences
> very agressively, so it may be a non-issue in the end.
> 
> But again, even with these concerns I think what you're doing cuts down
> the attack surface in meaningful ways.
> 
> 
> 
>>
>> I can't currently conceive a situation where the compiler would be able
>> to remove a /necessary/ bounds check that could lead to unsafe
>> speculation later on.  A redundant bounds check removal shouldn't be a
>> problem as the non-redundant check should remain and that will still get
>> tracking code added.
> It's less about removal and more about either compiler-generated
> speculation or obfuscation of the patterns you're looking for.
> 
> 
> jeff
> 
> 
> 
>