Message ID | 1531154299-28349-1-git-send-email-Richard.Earnshaw@arm.com |
---|---|
Headers | show |
Series | Mitigation against unsafe data speculation (CVE-2017-5753) | expand |
On 07/09/2018 10:38 AM, Richard Earnshaw wrote: > > The patches I posted earlier this year for mitigating against > CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from > which it became obvious that a rethink was needed. This mail, and the > following patches attempt to address that feedback and present a new > approach to mitigating against this form of attack surface. > > There were two major issues with the original approach: > > - The speculation bounds were too tightly constrained - essentially > they had to represent and upper and lower bound on a pointer, or a > pointer offset. > - The speculation constraints could only cover the immediately preceding > branch, which often did not fit well with the structure of the existing > code. > > An additional criticism was that the shape of the intrinsic did not > fit particularly well with systems that used a single speculation > barrier that essentially had to wait until all preceding speculation > had to be resolved. Right. I suggest the Intel and IBM reps chime in on the updated semantics. > > To address all of the above, these patches adopt a new approach, based > in part on a posting by Chandler Carruth to the LLVM developers list > (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html), > but which we have extended to deal with inter-function speculation. > The patches divide the problem into two halves. We're essentially turning the control dependency into a value that we can then use to munge the pointer or the resultant data. > > The first half is some target-specific code to track the speculation > condition through the generated code to provide an internal variable > which can tell us whether or not the CPU's control flow speculation > matches the data flow calculations. The idea is that the internal > variable starts with the value TRUE and if the CPU's control flow > speculation ever causes a jump to the wrong block of code the variable > becomes false until such time as the incorrect control flow > speculation gets unwound. Right. So one of the things that comes immediately to mind is you have to run this early enough that you can still get to all the control flow and build your predicates. Otherwise you have do undo stuff like conditional move generation. On the flip side, the earlier you do this mitigation, the more you have to worry about what the optimizers are going to do to the code later in the pipeline. It's almost guaranteed a naive implementation is going to muck this up since we can propagate the state of the condition into the arms which will make the predicate state a compile time constant. In fact this seems to be running into the area of pointer providence and some discussions we had around atomic a few years back. I also wonder if this could be combined with taint analysis to produce a much lower overhead solution in cases were developers have done analysis and know what objects are potentially under attacker control. So instead of analyzing everything, we can have a much narrower focus. The pointer munging could well run afoul of alias analysis engines that don't expect to be seeing those kind of operations. Anyway, just some initial high level thoughts. I'm sure there'll be more as I read the implementation. Jeff
On Mon, Jul 9, 2018 at 6:39 PM Richard Earnshaw <Richard.Earnshaw@arm.com> wrote: > > > The patches I posted earlier this year for mitigating against > CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from > which it became obvious that a rethink was needed. This mail, and the > following patches attempt to address that feedback and present a new > approach to mitigating against this form of attack surface. > > There were two major issues with the original approach: > > - The speculation bounds were too tightly constrained - essentially > they had to represent and upper and lower bound on a pointer, or a > pointer offset. > - The speculation constraints could only cover the immediately preceding > branch, which often did not fit well with the structure of the existing > code. > > An additional criticism was that the shape of the intrinsic did not > fit particularly well with systems that used a single speculation > barrier that essentially had to wait until all preceding speculation > had to be resolved. > > To address all of the above, these patches adopt a new approach, based > in part on a posting by Chandler Carruth to the LLVM developers list > (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html), > but which we have extended to deal with inter-function speculation. > The patches divide the problem into two halves. > > The first half is some target-specific code to track the speculation > condition through the generated code to provide an internal variable > which can tell us whether or not the CPU's control flow speculation > matches the data flow calculations. The idea is that the internal > variable starts with the value TRUE and if the CPU's control flow > speculation ever causes a jump to the wrong block of code the variable > becomes false until such time as the incorrect control flow > speculation gets unwound. > > The second half is that a new intrinsic function is introduced that is > much simpler than we had before. The basic version of the intrinsic > is now simply: > > T var = __builtin_speculation_safe_value (T unsafe_var); > > Full details of the syntax can be found in the documentation patch, in > patch 1. In summary, when not speculating the intrinsic returns > unsafe_var; when speculating then if it can be shown that the > speculative flow has diverged from the intended control flow then zero > is returned. An optional second argument can be used to return an > alternative value to zero. The builtin may cause execution to pause > until the speculation state is resolved. So a trivial target implementation would be to emit a barrier and then it would always return unsafe_var and never zero. What I don't understand fully is what users should do here, thus what the value of ever returning "unsafe" is. Also I wonder why the API is forcing you to single-out a special value instead of doing bool safe = __builtin_speculation_safe_value_p (T unsafe_value); if (!safe) /* what now? */ I'm only guessing that the correct way to handle "unsafe" is basically while (__builtin_speculation_safe_value (val) == 0) ; use val, it's now safe that is, the return value is only interesting in sofar as to whether it is equal to val or the special value? That said, I wonder why we don't hide that details from the user and provide a predicate instead. Richard. > There are seven patches in this set, as follows. > > 1) Introduces the new intrinsic __builtin_sepculation_safe_value. > 2) Adds a basic hard barrier implementation for AArch32 (arm) state. > 3) Adds a basic hard barrier implementation for AArch64 state. > 4) Adds a new command-line option -mtrack-speculation (currently a no-op). > 5) Disables CB[N]Z and TB[N]Z when -mtrack-speculation. > 6) Adds the new speculation tracking pass for AArch64 > 7) Uses the new speculation tracking pass to generate CSDB-based barrier > sequences > > I haven't added a speculation-tracking pass for AArch32 at this time. > It is possible to do this, but would require quite a lot of rework for > the arm backend due to the limited number of registers that are > available. > > Although patch 6 is AArch64 specific, I'd appreciate a review from > someone more familiar with the branch edge code than myself. There > appear to be a number of tricky issues with more complex edges so I'd > like a second opinion on that code in case I've missed an important > case. > > R. > > > > Richard Earnshaw (7): > Add __builtin_speculation_safe_value > Arm - add speculation_barrier pattern > AArch64 - add speculation barrier > AArch64 - Add new option -mtrack-speculation > AArch64 - disable CB[N]Z TB[N]Z when tracking speculation > AArch64 - new pass to add conditional-branch speculation tracking > AArch64 - use CSDB based sequences if speculation tracking is enabled > > gcc/builtin-types.def | 6 + > gcc/builtins.c | 57 ++++ > gcc/builtins.def | 20 ++ > gcc/c-family/c-common.c | 143 +++++++++ > gcc/c-family/c-cppbuiltin.c | 5 +- > gcc/config.gcc | 2 +- > gcc/config/aarch64/aarch64-passes.def | 1 + > gcc/config/aarch64/aarch64-protos.h | 3 +- > gcc/config/aarch64/aarch64-speculation.cc | 494 ++++++++++++++++++++++++++++++ > gcc/config/aarch64/aarch64.c | 88 +++++- > gcc/config/aarch64/aarch64.md | 140 ++++++++- > gcc/config/aarch64/aarch64.opt | 4 + > gcc/config/aarch64/iterators.md | 3 + > gcc/config/aarch64/t-aarch64 | 10 + > gcc/config/arm/arm.md | 21 ++ > gcc/config/arm/unspecs.md | 1 + > gcc/doc/cpp.texi | 4 + > gcc/doc/extend.texi | 29 ++ > gcc/doc/invoke.texi | 10 +- > gcc/doc/md.texi | 15 + > gcc/doc/tm.texi | 20 ++ > gcc/doc/tm.texi.in | 2 + > gcc/target.def | 23 ++ > gcc/targhooks.c | 27 ++ > gcc/targhooks.h | 2 + > gcc/testsuite/gcc.dg/spec-barrier-1.c | 40 +++ > gcc/testsuite/gcc.dg/spec-barrier-2.c | 19 ++ > gcc/testsuite/gcc.dg/spec-barrier-3.c | 13 + > 28 files changed, 1191 insertions(+), 11 deletions(-) > create mode 100644 gcc/config/aarch64/aarch64-speculation.cc > create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-1.c > create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-2.c > create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-3.c
On 10/07/18 08:19, Richard Biener wrote: > On Mon, Jul 9, 2018 at 6:39 PM Richard Earnshaw > <Richard.Earnshaw@arm.com> wrote: >> >> >> The patches I posted earlier this year for mitigating against >> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from >> which it became obvious that a rethink was needed. This mail, and the >> following patches attempt to address that feedback and present a new >> approach to mitigating against this form of attack surface. >> >> There were two major issues with the original approach: >> >> - The speculation bounds were too tightly constrained - essentially >> they had to represent and upper and lower bound on a pointer, or a >> pointer offset. >> - The speculation constraints could only cover the immediately preceding >> branch, which often did not fit well with the structure of the existing >> code. >> >> An additional criticism was that the shape of the intrinsic did not >> fit particularly well with systems that used a single speculation >> barrier that essentially had to wait until all preceding speculation >> had to be resolved. >> >> To address all of the above, these patches adopt a new approach, based >> in part on a posting by Chandler Carruth to the LLVM developers list >> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html), >> but which we have extended to deal with inter-function speculation. >> The patches divide the problem into two halves. >> >> The first half is some target-specific code to track the speculation >> condition through the generated code to provide an internal variable >> which can tell us whether or not the CPU's control flow speculation >> matches the data flow calculations. The idea is that the internal >> variable starts with the value TRUE and if the CPU's control flow >> speculation ever causes a jump to the wrong block of code the variable >> becomes false until such time as the incorrect control flow >> speculation gets unwound. >> >> The second half is that a new intrinsic function is introduced that is >> much simpler than we had before. The basic version of the intrinsic >> is now simply: >> >> T var = __builtin_speculation_safe_value (T unsafe_var); >> >> Full details of the syntax can be found in the documentation patch, in >> patch 1. In summary, when not speculating the intrinsic returns >> unsafe_var; when speculating then if it can be shown that the >> speculative flow has diverged from the intended control flow then zero >> is returned. An optional second argument can be used to return an >> alternative value to zero. The builtin may cause execution to pause >> until the speculation state is resolved. > > So a trivial target implementation would be to emit a barrier and then > it would always return unsafe_var and never zero. What I don't understand > fully is what users should do here, thus what the value of ever returning > "unsafe" is. Also I wonder why the API is forcing you to single-out a > special value instead of doing > > bool safe = __builtin_speculation_safe_value_p (T unsafe_value); > if (!safe) > /* what now? */ > > I'm only guessing that the correct way to handle "unsafe" is basically > > while (__builtin_speculation_safe_value (val) == 0) > ; > > use val, it's now safe No, a safe version of val is returned, not a bool telling you it is now safe to use the original. You must use the sanitized version in future, not the unprotected version. So the usage is going to be more like: val = __builtin_speculation_safe_value (val); // Overwrite val with a sanitized version. You have to use the cleaned up version, the unclean version is still vulnerable to incorrect speculation. R. > > that is, the return value is only interesting in sofar as to whether it is equal > to val or the special value? > > That said, I wonder why we don't hide that details from the user and > provide a predicate instead. > > Richard. > >> There are seven patches in this set, as follows. >> >> 1) Introduces the new intrinsic __builtin_sepculation_safe_value. >> 2) Adds a basic hard barrier implementation for AArch32 (arm) state. >> 3) Adds a basic hard barrier implementation for AArch64 state. >> 4) Adds a new command-line option -mtrack-speculation (currently a no-op). >> 5) Disables CB[N]Z and TB[N]Z when -mtrack-speculation. >> 6) Adds the new speculation tracking pass for AArch64 >> 7) Uses the new speculation tracking pass to generate CSDB-based barrier >> sequences >> >> I haven't added a speculation-tracking pass for AArch32 at this time. >> It is possible to do this, but would require quite a lot of rework for >> the arm backend due to the limited number of registers that are >> available. >> >> Although patch 6 is AArch64 specific, I'd appreciate a review from >> someone more familiar with the branch edge code than myself. There >> appear to be a number of tricky issues with more complex edges so I'd >> like a second opinion on that code in case I've missed an important >> case. >> >> R. >> >> >> >> Richard Earnshaw (7): >> Add __builtin_speculation_safe_value >> Arm - add speculation_barrier pattern >> AArch64 - add speculation barrier >> AArch64 - Add new option -mtrack-speculation >> AArch64 - disable CB[N]Z TB[N]Z when tracking speculation >> AArch64 - new pass to add conditional-branch speculation tracking >> AArch64 - use CSDB based sequences if speculation tracking is enabled >> >> gcc/builtin-types.def | 6 + >> gcc/builtins.c | 57 ++++ >> gcc/builtins.def | 20 ++ >> gcc/c-family/c-common.c | 143 +++++++++ >> gcc/c-family/c-cppbuiltin.c | 5 +- >> gcc/config.gcc | 2 +- >> gcc/config/aarch64/aarch64-passes.def | 1 + >> gcc/config/aarch64/aarch64-protos.h | 3 +- >> gcc/config/aarch64/aarch64-speculation.cc | 494 ++++++++++++++++++++++++++++++ >> gcc/config/aarch64/aarch64.c | 88 +++++- >> gcc/config/aarch64/aarch64.md | 140 ++++++++- >> gcc/config/aarch64/aarch64.opt | 4 + >> gcc/config/aarch64/iterators.md | 3 + >> gcc/config/aarch64/t-aarch64 | 10 + >> gcc/config/arm/arm.md | 21 ++ >> gcc/config/arm/unspecs.md | 1 + >> gcc/doc/cpp.texi | 4 + >> gcc/doc/extend.texi | 29 ++ >> gcc/doc/invoke.texi | 10 +- >> gcc/doc/md.texi | 15 + >> gcc/doc/tm.texi | 20 ++ >> gcc/doc/tm.texi.in | 2 + >> gcc/target.def | 23 ++ >> gcc/targhooks.c | 27 ++ >> gcc/targhooks.h | 2 + >> gcc/testsuite/gcc.dg/spec-barrier-1.c | 40 +++ >> gcc/testsuite/gcc.dg/spec-barrier-2.c | 19 ++ >> gcc/testsuite/gcc.dg/spec-barrier-3.c | 13 + >> 28 files changed, 1191 insertions(+), 11 deletions(-) >> create mode 100644 gcc/config/aarch64/aarch64-speculation.cc >> create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-1.c >> create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-2.c >> create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-3.c
On 10/07/18 00:13, Jeff Law wrote: > On 07/09/2018 10:38 AM, Richard Earnshaw wrote: >> >> The patches I posted earlier this year for mitigating against >> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from >> which it became obvious that a rethink was needed. This mail, and the >> following patches attempt to address that feedback and present a new >> approach to mitigating against this form of attack surface. >> >> There were two major issues with the original approach: >> >> - The speculation bounds were too tightly constrained - essentially >> they had to represent and upper and lower bound on a pointer, or a >> pointer offset. >> - The speculation constraints could only cover the immediately preceding >> branch, which often did not fit well with the structure of the existing >> code. >> >> An additional criticism was that the shape of the intrinsic did not >> fit particularly well with systems that used a single speculation >> barrier that essentially had to wait until all preceding speculation >> had to be resolved. > Right. I suggest the Intel and IBM reps chime in on the updated semantics. > Yes, logically, this is a boolean tracker value. In practice we use ~0 for true and 0 for false, so that we can simply use it as a mask operation later. I hope this intrinsic will be even more acceptable than the one that Bill Schmidt acked previously, it's even simpler than the version we had last time. >> >> To address all of the above, these patches adopt a new approach, based >> in part on a posting by Chandler Carruth to the LLVM developers list >> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html), >> but which we have extended to deal with inter-function speculation. >> The patches divide the problem into two halves. > We're essentially turning the control dependency into a value that we > can then use to munge the pointer or the resultant data. > >> >> The first half is some target-specific code to track the speculation >> condition through the generated code to provide an internal variable >> which can tell us whether or not the CPU's control flow speculation >> matches the data flow calculations. The idea is that the internal >> variable starts with the value TRUE and if the CPU's control flow >> speculation ever causes a jump to the wrong block of code the variable >> becomes false until such time as the incorrect control flow >> speculation gets unwound. > Right. > > So one of the things that comes immediately to mind is you have to run > this early enough that you can still get to all the control flow and > build your predicates. Otherwise you have do undo stuff like > conditional move generation. No, the opposite, in fact. We want to run this very late, at least on Arm systems (AArch64 or AArch32). Conditional move instructions are fine - they're data-flow operations, not control flow (in fact, that's exactly what the control flow tracker instructions are). By running it late we avoid disrupting any of the earlier optimization passes as well. > > On the flip side, the earlier you do this mitigation, the more you have > to worry about what the optimizers are going to do to the code later in > the pipeline. It's almost guaranteed a naive implementation is going to > muck this up since we can propagate the state of the condition into the > arms which will make the predicate state a compile time constant. > > In fact this seems to be running into the area of pointer providence and > some discussions we had around atomic a few years back. > > I also wonder if this could be combined with taint analysis to produce a > much lower overhead solution in cases were developers have done analysis > and know what objects are potentially under attacker control. So > instead of analyzing everything, we can have a much narrower focus. Automatic application of the tracker to vulnerable variables would be nice, but I haven't attempted to go there yet: at present I still rely on the user to annotate code with the new intrinsic. That doesn't mean that we couldn't extend the overall approach later to include automatic tracking. > > The pointer munging could well run afoul of alias analysis engines that > don't expect to be seeing those kind of operations. I think the pass runs late enough that it isn't a problem. > > Anyway, just some initial high level thoughts. I'm sure there'll be > more as I read the implementation. > Thanks for starting to look at this so quickly. R. > > Jeff >
On Tue, Jul 10, 2018 at 10:39 AM Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote: > > On 10/07/18 08:19, Richard Biener wrote: > > On Mon, Jul 9, 2018 at 6:39 PM Richard Earnshaw > > <Richard.Earnshaw@arm.com> wrote: > >> > >> > >> The patches I posted earlier this year for mitigating against > >> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from > >> which it became obvious that a rethink was needed. This mail, and the > >> following patches attempt to address that feedback and present a new > >> approach to mitigating against this form of attack surface. > >> > >> There were two major issues with the original approach: > >> > >> - The speculation bounds were too tightly constrained - essentially > >> they had to represent and upper and lower bound on a pointer, or a > >> pointer offset. > >> - The speculation constraints could only cover the immediately preceding > >> branch, which often did not fit well with the structure of the existing > >> code. > >> > >> An additional criticism was that the shape of the intrinsic did not > >> fit particularly well with systems that used a single speculation > >> barrier that essentially had to wait until all preceding speculation > >> had to be resolved. > >> > >> To address all of the above, these patches adopt a new approach, based > >> in part on a posting by Chandler Carruth to the LLVM developers list > >> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html), > >> but which we have extended to deal with inter-function speculation. > >> The patches divide the problem into two halves. > >> > >> The first half is some target-specific code to track the speculation > >> condition through the generated code to provide an internal variable > >> which can tell us whether or not the CPU's control flow speculation > >> matches the data flow calculations. The idea is that the internal > >> variable starts with the value TRUE and if the CPU's control flow > >> speculation ever causes a jump to the wrong block of code the variable > >> becomes false until such time as the incorrect control flow > >> speculation gets unwound. > >> > >> The second half is that a new intrinsic function is introduced that is > >> much simpler than we had before. The basic version of the intrinsic > >> is now simply: > >> > >> T var = __builtin_speculation_safe_value (T unsafe_var); > >> > >> Full details of the syntax can be found in the documentation patch, in > >> patch 1. In summary, when not speculating the intrinsic returns > >> unsafe_var; when speculating then if it can be shown that the > >> speculative flow has diverged from the intended control flow then zero > >> is returned. An optional second argument can be used to return an > >> alternative value to zero. The builtin may cause execution to pause > >> until the speculation state is resolved. > > > > So a trivial target implementation would be to emit a barrier and then > > it would always return unsafe_var and never zero. What I don't understand > > fully is what users should do here, thus what the value of ever returning > > "unsafe" is. Also I wonder why the API is forcing you to single-out a > > special value instead of doing > > > > bool safe = __builtin_speculation_safe_value_p (T unsafe_value); > > if (!safe) > > /* what now? */ > > > > I'm only guessing that the correct way to handle "unsafe" is basically > > > > while (__builtin_speculation_safe_value (val) == 0) > > ; > > > > use val, it's now safe > > No, a safe version of val is returned, not a bool telling you it is now > safe to use the original. OK, so making the old value dead is required to preserve the desired dataflow. But how should I use the special value that signaled "failure"? Obviously the user isn't supposed to simply replace 'val' with val = __builtin_speculation_safe_value (val); to make it speculation-proof. So - how should the user _use_ this builtin? The docs do not say anything about this but says the very confusing +The function may use target-dependent speculation tracking state to cause +@var{failval} to be returned when it is known that speculative +execution has incorrectly predicted a conditional branch operation. because speculation is about executing instructions as if they were supposed to be executed. Once it is known the prediciton was wrong no more "wrong" instructions will be executed but a previously speculated instruction cannot know it was "falsely" speculated. Does the above try to say that the function may return failval if the instruction is currently executed speculatively instead? That would make sense to me. And return failval independent of if the speculation later turns out to be correct or not. > You must use the sanitized version in future, > not the unprotected version. > > > So the usage is going to be more like: > > val = __builtin_speculation_safe_value (val); // Overwrite val with a > sanitized version. > > You have to use the cleaned up version, the unclean version is still > vulnerable to incorrect speculation. But then where does failval come into play? The above cannot be correct if failval matters. Does the builtin try to say that iff the current instruction is speculated then it will return failval, otherwise it will return val and thus following code needs to handle 'failval' in a way that is safe? Again, the wording "when it is known that speculative execution has incorrectly predicted a conditional branch operation" is very confusing... Richard. > R. > > > > > that is, the return value is only interesting in sofar as to whether it is equal > > to val or the special value? > > > > That said, I wonder why we don't hide that details from the user and > > provide a predicate instead. > > > > Richard. > > > >> There are seven patches in this set, as follows. > >> > >> 1) Introduces the new intrinsic __builtin_sepculation_safe_value. > >> 2) Adds a basic hard barrier implementation for AArch32 (arm) state. > >> 3) Adds a basic hard barrier implementation for AArch64 state. > >> 4) Adds a new command-line option -mtrack-speculation (currently a no-op). > >> 5) Disables CB[N]Z and TB[N]Z when -mtrack-speculation. > >> 6) Adds the new speculation tracking pass for AArch64 > >> 7) Uses the new speculation tracking pass to generate CSDB-based barrier > >> sequences > >> > >> I haven't added a speculation-tracking pass for AArch32 at this time. > >> It is possible to do this, but would require quite a lot of rework for > >> the arm backend due to the limited number of registers that are > >> available. > >> > >> Although patch 6 is AArch64 specific, I'd appreciate a review from > >> someone more familiar with the branch edge code than myself. There > >> appear to be a number of tricky issues with more complex edges so I'd > >> like a second opinion on that code in case I've missed an important > >> case. > >> > >> R. > >> > >> > >> > >> Richard Earnshaw (7): > >> Add __builtin_speculation_safe_value > >> Arm - add speculation_barrier pattern > >> AArch64 - add speculation barrier > >> AArch64 - Add new option -mtrack-speculation > >> AArch64 - disable CB[N]Z TB[N]Z when tracking speculation > >> AArch64 - new pass to add conditional-branch speculation tracking > >> AArch64 - use CSDB based sequences if speculation tracking is enabled > >> > >> gcc/builtin-types.def | 6 + > >> gcc/builtins.c | 57 ++++ > >> gcc/builtins.def | 20 ++ > >> gcc/c-family/c-common.c | 143 +++++++++ > >> gcc/c-family/c-cppbuiltin.c | 5 +- > >> gcc/config.gcc | 2 +- > >> gcc/config/aarch64/aarch64-passes.def | 1 + > >> gcc/config/aarch64/aarch64-protos.h | 3 +- > >> gcc/config/aarch64/aarch64-speculation.cc | 494 ++++++++++++++++++++++++++++++ > >> gcc/config/aarch64/aarch64.c | 88 +++++- > >> gcc/config/aarch64/aarch64.md | 140 ++++++++- > >> gcc/config/aarch64/aarch64.opt | 4 + > >> gcc/config/aarch64/iterators.md | 3 + > >> gcc/config/aarch64/t-aarch64 | 10 + > >> gcc/config/arm/arm.md | 21 ++ > >> gcc/config/arm/unspecs.md | 1 + > >> gcc/doc/cpp.texi | 4 + > >> gcc/doc/extend.texi | 29 ++ > >> gcc/doc/invoke.texi | 10 +- > >> gcc/doc/md.texi | 15 + > >> gcc/doc/tm.texi | 20 ++ > >> gcc/doc/tm.texi.in | 2 + > >> gcc/target.def | 23 ++ > >> gcc/targhooks.c | 27 ++ > >> gcc/targhooks.h | 2 + > >> gcc/testsuite/gcc.dg/spec-barrier-1.c | 40 +++ > >> gcc/testsuite/gcc.dg/spec-barrier-2.c | 19 ++ > >> gcc/testsuite/gcc.dg/spec-barrier-3.c | 13 + > >> 28 files changed, 1191 insertions(+), 11 deletions(-) > >> create mode 100644 gcc/config/aarch64/aarch64-speculation.cc > >> create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-1.c > >> create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-2.c > >> create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-3.c >
On 10/07/18 11:10, Richard Biener wrote: > On Tue, Jul 10, 2018 at 10:39 AM Richard Earnshaw (lists) > <Richard.Earnshaw@arm.com> wrote: >> >> On 10/07/18 08:19, Richard Biener wrote: >>> On Mon, Jul 9, 2018 at 6:39 PM Richard Earnshaw >>> <Richard.Earnshaw@arm.com> wrote: >>>> >>>> >>>> The patches I posted earlier this year for mitigating against >>>> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from >>>> which it became obvious that a rethink was needed. This mail, and the >>>> following patches attempt to address that feedback and present a new >>>> approach to mitigating against this form of attack surface. >>>> >>>> There were two major issues with the original approach: >>>> >>>> - The speculation bounds were too tightly constrained - essentially >>>> they had to represent and upper and lower bound on a pointer, or a >>>> pointer offset. >>>> - The speculation constraints could only cover the immediately preceding >>>> branch, which often did not fit well with the structure of the existing >>>> code. >>>> >>>> An additional criticism was that the shape of the intrinsic did not >>>> fit particularly well with systems that used a single speculation >>>> barrier that essentially had to wait until all preceding speculation >>>> had to be resolved. >>>> >>>> To address all of the above, these patches adopt a new approach, based >>>> in part on a posting by Chandler Carruth to the LLVM developers list >>>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html), >>>> but which we have extended to deal with inter-function speculation. >>>> The patches divide the problem into two halves. >>>> >>>> The first half is some target-specific code to track the speculation >>>> condition through the generated code to provide an internal variable >>>> which can tell us whether or not the CPU's control flow speculation >>>> matches the data flow calculations. The idea is that the internal >>>> variable starts with the value TRUE and if the CPU's control flow >>>> speculation ever causes a jump to the wrong block of code the variable >>>> becomes false until such time as the incorrect control flow >>>> speculation gets unwound. >>>> >>>> The second half is that a new intrinsic function is introduced that is >>>> much simpler than we had before. The basic version of the intrinsic >>>> is now simply: >>>> >>>> T var = __builtin_speculation_safe_value (T unsafe_var); >>>> >>>> Full details of the syntax can be found in the documentation patch, in >>>> patch 1. In summary, when not speculating the intrinsic returns >>>> unsafe_var; when speculating then if it can be shown that the >>>> speculative flow has diverged from the intended control flow then zero >>>> is returned. An optional second argument can be used to return an >>>> alternative value to zero. The builtin may cause execution to pause >>>> until the speculation state is resolved. >>> >>> So a trivial target implementation would be to emit a barrier and then >>> it would always return unsafe_var and never zero. What I don't understand >>> fully is what users should do here, thus what the value of ever returning >>> "unsafe" is. Also I wonder why the API is forcing you to single-out a >>> special value instead of doing >>> >>> bool safe = __builtin_speculation_safe_value_p (T unsafe_value); >>> if (!safe) >>> /* what now? */ >>> >>> I'm only guessing that the correct way to handle "unsafe" is basically >>> >>> while (__builtin_speculation_safe_value (val) == 0) >>> ; >>> >>> use val, it's now safe >> >> No, a safe version of val is returned, not a bool telling you it is now >> safe to use the original. > > OK, so making the old value dead is required to preserve the desired > dataflow. > > But how should I use the special value that signaled "failure"? > > Obviously the user isn't supposed to simply replace 'val' with > > val = __builtin_speculation_safe_value (val); > > to make it speculation-proof. So - how should the user _use_ this > builtin? The docs do not say anything about this but says the > very confusing > > +The function may use target-dependent speculation tracking state to cause > +@var{failval} to be returned when it is known that speculative > +execution has incorrectly predicted a conditional branch operation. > > because speculation is about executing instructions as if they were > supposed to be executed. Once it is known the prediciton was wrong > no more "wrong" instructions will be executed but a previously > speculated instruction cannot know it was "falsely" speculated. > > Does the above try to say that the function may return failval if the > instruction is currently executed speculatively instead? That would > make sense to me. And return failval independent of if the speculation > later turns out to be correct or not. > >> You must use the sanitized version in future, >> not the unprotected version. >> >> >> So the usage is going to be more like: >> >> val = __builtin_speculation_safe_value (val); // Overwrite val with a >> sanitized version. >> >> You have to use the cleaned up version, the unclean version is still >> vulnerable to incorrect speculation. > > But then where does failval come into play? The above cannot be correct > if failval matters. failval only comes into play if the CPU is speculating incorrectly. It's just a safe value that some CPUs might use until such time as the speculation gets unwound. In most cases it can be zero. Perhaps a more concrete example would be something like. void **mem; void* f(unsigned untrusted) { if (untrusted < 100) return mem[untrusted]; return NULL; } This can be rewritten as either: void *mem; void* f(unsigned untrusted) { if (untrusted < 100) return mem[__builtin_speculation_safe_value (untrusted)]; return NULL; } if leaking mem[0] is not a problem; or if you're really paranoid: void* f(unsigned untrusted) { if (untrusted < 100) return *__builtin_speculation_safe_value (mem + untrusted); return NULL; } which becomes a NULL pointer dereference if the speculation is incorrect. The programmer doesn't need to test this code (any test would likely be useless anyway as the CPU would predict the outcome and speculate based on such a prediction). They do need to think a bit about what might leak if the CPU does miss-speculate, hence the two examples above. A more complex case, where you might want a different failure value can come up if you have a mask operation. A slightly convoluted example (derived from a real example we found in the Linux kernel) might be if (mode32) mask_from = 0x100000000; else mask_from = 0; ... // some code return mem[untrusted_val & (mask_from - 1)]; It would probably be better to place the speculation cleanup nearer the initialization of mask_from, especially if mask_from could be used more than once; but in that case 0 is less safe than any other (since 0-1 is all bits 1). In this case you might write: if (mode32) mask_from = 0x100000000; else mask_from = 0; mask_from = __builtin_speculation_safe_value (mask_from, 1); ... // some code return mem[untrusted_val & (mask_from - 1)]; And in this case if the speculation is incorrect the (mask_from - 1) expression becomes 0 and the whole range is protected. Note that speculation is fine if it is correct, we don't want to hold things up in that case since it will happen anyway. It's only a problem if the speculation is incorrect :-) > > Does the builtin try to say that iff the current instruction is speculated > then it will return failval, otherwise it will return val and thus following > code needs to handle 'failval' in a way that is safe? No. From an abstract machine sense the builtin only ever returns its first argument. The fail value only appears if the CPU guesses the direction of a branch and guesses *incorrectly* (incorrect speculation) > > Again, the wording "when it is known that speculative execution has > incorrectly predicted a conditional branch operation" is very confusing... > > Richard. > >> R. >> >>> >>> that is, the return value is only interesting in sofar as to whether it is equal >>> to val or the special value? >>> >>> That said, I wonder why we don't hide that details from the user and >>> provide a predicate instead. >>> >>> Richard. >>> >>>> There are seven patches in this set, as follows. >>>> >>>> 1) Introduces the new intrinsic __builtin_sepculation_safe_value. >>>> 2) Adds a basic hard barrier implementation for AArch32 (arm) state. >>>> 3) Adds a basic hard barrier implementation for AArch64 state. >>>> 4) Adds a new command-line option -mtrack-speculation (currently a no-op). >>>> 5) Disables CB[N]Z and TB[N]Z when -mtrack-speculation. >>>> 6) Adds the new speculation tracking pass for AArch64 >>>> 7) Uses the new speculation tracking pass to generate CSDB-based barrier >>>> sequences >>>> >>>> I haven't added a speculation-tracking pass for AArch32 at this time. >>>> It is possible to do this, but would require quite a lot of rework for >>>> the arm backend due to the limited number of registers that are >>>> available. >>>> >>>> Although patch 6 is AArch64 specific, I'd appreciate a review from >>>> someone more familiar with the branch edge code than myself. There >>>> appear to be a number of tricky issues with more complex edges so I'd >>>> like a second opinion on that code in case I've missed an important >>>> case. >>>> >>>> R. >>>> >>>> >>>> >>>> Richard Earnshaw (7): >>>> Add __builtin_speculation_safe_value >>>> Arm - add speculation_barrier pattern >>>> AArch64 - add speculation barrier >>>> AArch64 - Add new option -mtrack-speculation >>>> AArch64 - disable CB[N]Z TB[N]Z when tracking speculation >>>> AArch64 - new pass to add conditional-branch speculation tracking >>>> AArch64 - use CSDB based sequences if speculation tracking is enabled >>>> >>>> gcc/builtin-types.def | 6 + >>>> gcc/builtins.c | 57 ++++ >>>> gcc/builtins.def | 20 ++ >>>> gcc/c-family/c-common.c | 143 +++++++++ >>>> gcc/c-family/c-cppbuiltin.c | 5 +- >>>> gcc/config.gcc | 2 +- >>>> gcc/config/aarch64/aarch64-passes.def | 1 + >>>> gcc/config/aarch64/aarch64-protos.h | 3 +- >>>> gcc/config/aarch64/aarch64-speculation.cc | 494 ++++++++++++++++++++++++++++++ >>>> gcc/config/aarch64/aarch64.c | 88 +++++- >>>> gcc/config/aarch64/aarch64.md | 140 ++++++++- >>>> gcc/config/aarch64/aarch64.opt | 4 + >>>> gcc/config/aarch64/iterators.md | 3 + >>>> gcc/config/aarch64/t-aarch64 | 10 + >>>> gcc/config/arm/arm.md | 21 ++ >>>> gcc/config/arm/unspecs.md | 1 + >>>> gcc/doc/cpp.texi | 4 + >>>> gcc/doc/extend.texi | 29 ++ >>>> gcc/doc/invoke.texi | 10 +- >>>> gcc/doc/md.texi | 15 + >>>> gcc/doc/tm.texi | 20 ++ >>>> gcc/doc/tm.texi.in | 2 + >>>> gcc/target.def | 23 ++ >>>> gcc/targhooks.c | 27 ++ >>>> gcc/targhooks.h | 2 + >>>> gcc/testsuite/gcc.dg/spec-barrier-1.c | 40 +++ >>>> gcc/testsuite/gcc.dg/spec-barrier-2.c | 19 ++ >>>> gcc/testsuite/gcc.dg/spec-barrier-3.c | 13 + >>>> 28 files changed, 1191 insertions(+), 11 deletions(-) >>>> create mode 100644 gcc/config/aarch64/aarch64-speculation.cc >>>> create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-1.c >>>> create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-2.c >>>> create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-3.c >>
On Tue, Jul 10, 2018 at 12:53 PM Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote: > > On 10/07/18 11:10, Richard Biener wrote: > > On Tue, Jul 10, 2018 at 10:39 AM Richard Earnshaw (lists) > > <Richard.Earnshaw@arm.com> wrote: > >> > >> On 10/07/18 08:19, Richard Biener wrote: > >>> On Mon, Jul 9, 2018 at 6:39 PM Richard Earnshaw > >>> <Richard.Earnshaw@arm.com> wrote: > >>>> > >>>> > >>>> The patches I posted earlier this year for mitigating against > >>>> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from > >>>> which it became obvious that a rethink was needed. This mail, and the > >>>> following patches attempt to address that feedback and present a new > >>>> approach to mitigating against this form of attack surface. > >>>> > >>>> There were two major issues with the original approach: > >>>> > >>>> - The speculation bounds were too tightly constrained - essentially > >>>> they had to represent and upper and lower bound on a pointer, or a > >>>> pointer offset. > >>>> - The speculation constraints could only cover the immediately preceding > >>>> branch, which often did not fit well with the structure of the existing > >>>> code. > >>>> > >>>> An additional criticism was that the shape of the intrinsic did not > >>>> fit particularly well with systems that used a single speculation > >>>> barrier that essentially had to wait until all preceding speculation > >>>> had to be resolved. > >>>> > >>>> To address all of the above, these patches adopt a new approach, based > >>>> in part on a posting by Chandler Carruth to the LLVM developers list > >>>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html), > >>>> but which we have extended to deal with inter-function speculation. > >>>> The patches divide the problem into two halves. > >>>> > >>>> The first half is some target-specific code to track the speculation > >>>> condition through the generated code to provide an internal variable > >>>> which can tell us whether or not the CPU's control flow speculation > >>>> matches the data flow calculations. The idea is that the internal > >>>> variable starts with the value TRUE and if the CPU's control flow > >>>> speculation ever causes a jump to the wrong block of code the variable > >>>> becomes false until such time as the incorrect control flow > >>>> speculation gets unwound. > >>>> > >>>> The second half is that a new intrinsic function is introduced that is > >>>> much simpler than we had before. The basic version of the intrinsic > >>>> is now simply: > >>>> > >>>> T var = __builtin_speculation_safe_value (T unsafe_var); > >>>> > >>>> Full details of the syntax can be found in the documentation patch, in > >>>> patch 1. In summary, when not speculating the intrinsic returns > >>>> unsafe_var; when speculating then if it can be shown that the > >>>> speculative flow has diverged from the intended control flow then zero > >>>> is returned. An optional second argument can be used to return an > >>>> alternative value to zero. The builtin may cause execution to pause > >>>> until the speculation state is resolved. > >>> > >>> So a trivial target implementation would be to emit a barrier and then > >>> it would always return unsafe_var and never zero. What I don't understand > >>> fully is what users should do here, thus what the value of ever returning > >>> "unsafe" is. Also I wonder why the API is forcing you to single-out a > >>> special value instead of doing > >>> > >>> bool safe = __builtin_speculation_safe_value_p (T unsafe_value); > >>> if (!safe) > >>> /* what now? */ > >>> > >>> I'm only guessing that the correct way to handle "unsafe" is basically > >>> > >>> while (__builtin_speculation_safe_value (val) == 0) > >>> ; > >>> > >>> use val, it's now safe > >> > >> No, a safe version of val is returned, not a bool telling you it is now > >> safe to use the original. > > > > OK, so making the old value dead is required to preserve the desired > > dataflow. > > > > But how should I use the special value that signaled "failure"? > > > > Obviously the user isn't supposed to simply replace 'val' with > > > > val = __builtin_speculation_safe_value (val); > > > > to make it speculation-proof. So - how should the user _use_ this > > builtin? The docs do not say anything about this but says the > > very confusing > > > > +The function may use target-dependent speculation tracking state to cause > > +@var{failval} to be returned when it is known that speculative > > +execution has incorrectly predicted a conditional branch operation. > > > > because speculation is about executing instructions as if they were > > supposed to be executed. Once it is known the prediciton was wrong > > no more "wrong" instructions will be executed but a previously > > speculated instruction cannot know it was "falsely" speculated. > > > > Does the above try to say that the function may return failval if the > > instruction is currently executed speculatively instead? That would > > make sense to me. And return failval independent of if the speculation > > later turns out to be correct or not. > > > >> You must use the sanitized version in future, > >> not the unprotected version. > >> > >> > >> So the usage is going to be more like: > >> > >> val = __builtin_speculation_safe_value (val); // Overwrite val with a > >> sanitized version. > >> > >> You have to use the cleaned up version, the unclean version is still > >> vulnerable to incorrect speculation. > > > > But then where does failval come into play? The above cannot be correct > > if failval matters. > > failval only comes into play if the CPU is speculating incorrectly. > It's just a safe value that some CPUs might use until such time as the > speculation gets unwound. In most cases it can be zero. Perhaps a more > concrete example would be something like. > > > void **mem; > > void* f(unsigned untrusted) > { > if (untrusted < 100) > return mem[untrusted]; > return NULL; > } > > This can be rewritten as either: > > void *mem; > > void* f(unsigned untrusted) > { > if (untrusted < 100) > return mem[__builtin_speculation_safe_value (untrusted)]; > return NULL; > } > > if leaking mem[0] is not a problem; or if you're really paranoid: > > void* f(unsigned untrusted) > { > if (untrusted < 100) > return *__builtin_speculation_safe_value (mem + untrusted); > return NULL; > } > > which becomes a NULL pointer dereference if the speculation is > incorrect. The programmer doesn't need to test this code (any test > would likely be useless anyway as the CPU would predict the outcome and > speculate based on such a prediction). They do need to think a bit > about what might leak if the CPU does miss-speculate, hence the two > examples above. > > A more complex case, where you might want a different failure value can > come up if you have a mask operation. A slightly convoluted example > (derived from a real example we found in the Linux kernel) might be > > if (mode32) > mask_from = 0x100000000; > else > mask_from = 0; > > ... // some code > > return mem[untrusted_val & (mask_from - 1)]; > > It would probably be better to place the speculation cleanup nearer the > initialization of mask_from, especially if mask_from could be used more > than once; but in that case 0 is less safe than any other (since 0-1 is > all bits 1). In this case you might write: > > if (mode32) > mask_from = 0x100000000; > else > mask_from = 0; > > mask_from = __builtin_speculation_safe_value (mask_from, 1); > > ... // some code > > return mem[untrusted_val & (mask_from - 1)]; > > And in this case if the speculation is incorrect the (mask_from - 1) > expression becomes 0 and the whole range is protected. > > Note that speculation is fine if it is correct, we don't want to hold > things up in that case since it will happen anyway. It's only a problem > if the speculation is incorrect :-) Ok, please amend at least the user-level documentation with one or two examples like above. > > > > > Does the builtin try to say that iff the current instruction is speculated > > then it will return failval, otherwise it will return val and thus following > > code needs to handle 'failval' in a way that is safe? > > No. From an abstract machine sense the builtin only ever returns its > first argument. The fail value only appears if the CPU guesses the > direction of a branch and guesses *incorrectly* (incorrect speculation) I understand how it works when the implementation issues a fence. How does the actual implementation (assembly) look like that makes the CPU know whether it speculated wrongly or not at the time the value is needed for further speculation given the *__bsfs (mem + offset) instruction _is_ speculated as well. Richard. > > > > Again, the wording "when it is known that speculative execution has > > incorrectly predicted a conditional branch operation" is very confusing... > > > > Richard. > > > >> R. > >> > >>> > >>> that is, the return value is only interesting in sofar as to whether it is equal > >>> to val or the special value? > >>> > >>> That said, I wonder why we don't hide that details from the user and > >>> provide a predicate instead. > >>> > >>> Richard. > >>> > >>>> There are seven patches in this set, as follows. > >>>> > >>>> 1) Introduces the new intrinsic __builtin_sepculation_safe_value. > >>>> 2) Adds a basic hard barrier implementation for AArch32 (arm) state. > >>>> 3) Adds a basic hard barrier implementation for AArch64 state. > >>>> 4) Adds a new command-line option -mtrack-speculation (currently a no-op). > >>>> 5) Disables CB[N]Z and TB[N]Z when -mtrack-speculation. > >>>> 6) Adds the new speculation tracking pass for AArch64 > >>>> 7) Uses the new speculation tracking pass to generate CSDB-based barrier > >>>> sequences > >>>> > >>>> I haven't added a speculation-tracking pass for AArch32 at this time. > >>>> It is possible to do this, but would require quite a lot of rework for > >>>> the arm backend due to the limited number of registers that are > >>>> available. > >>>> > >>>> Although patch 6 is AArch64 specific, I'd appreciate a review from > >>>> someone more familiar with the branch edge code than myself. There > >>>> appear to be a number of tricky issues with more complex edges so I'd > >>>> like a second opinion on that code in case I've missed an important > >>>> case. > >>>> > >>>> R. > >>>> > >>>> > >>>> > >>>> Richard Earnshaw (7): > >>>> Add __builtin_speculation_safe_value > >>>> Arm - add speculation_barrier pattern > >>>> AArch64 - add speculation barrier > >>>> AArch64 - Add new option -mtrack-speculation > >>>> AArch64 - disable CB[N]Z TB[N]Z when tracking speculation > >>>> AArch64 - new pass to add conditional-branch speculation tracking > >>>> AArch64 - use CSDB based sequences if speculation tracking is enabled > >>>> > >>>> gcc/builtin-types.def | 6 + > >>>> gcc/builtins.c | 57 ++++ > >>>> gcc/builtins.def | 20 ++ > >>>> gcc/c-family/c-common.c | 143 +++++++++ > >>>> gcc/c-family/c-cppbuiltin.c | 5 +- > >>>> gcc/config.gcc | 2 +- > >>>> gcc/config/aarch64/aarch64-passes.def | 1 + > >>>> gcc/config/aarch64/aarch64-protos.h | 3 +- > >>>> gcc/config/aarch64/aarch64-speculation.cc | 494 ++++++++++++++++++++++++++++++ > >>>> gcc/config/aarch64/aarch64.c | 88 +++++- > >>>> gcc/config/aarch64/aarch64.md | 140 ++++++++- > >>>> gcc/config/aarch64/aarch64.opt | 4 + > >>>> gcc/config/aarch64/iterators.md | 3 + > >>>> gcc/config/aarch64/t-aarch64 | 10 + > >>>> gcc/config/arm/arm.md | 21 ++ > >>>> gcc/config/arm/unspecs.md | 1 + > >>>> gcc/doc/cpp.texi | 4 + > >>>> gcc/doc/extend.texi | 29 ++ > >>>> gcc/doc/invoke.texi | 10 +- > >>>> gcc/doc/md.texi | 15 + > >>>> gcc/doc/tm.texi | 20 ++ > >>>> gcc/doc/tm.texi.in | 2 + > >>>> gcc/target.def | 23 ++ > >>>> gcc/targhooks.c | 27 ++ > >>>> gcc/targhooks.h | 2 + > >>>> gcc/testsuite/gcc.dg/spec-barrier-1.c | 40 +++ > >>>> gcc/testsuite/gcc.dg/spec-barrier-2.c | 19 ++ > >>>> gcc/testsuite/gcc.dg/spec-barrier-3.c | 13 + > >>>> 28 files changed, 1191 insertions(+), 11 deletions(-) > >>>> create mode 100644 gcc/config/aarch64/aarch64-speculation.cc > >>>> create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-1.c > >>>> create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-2.c > >>>> create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-3.c > >> >
On 10/07/18 12:21, Richard Biener wrote: > On Tue, Jul 10, 2018 at 12:53 PM Richard Earnshaw (lists) > <Richard.Earnshaw@arm.com> wrote: >> >> On 10/07/18 11:10, Richard Biener wrote: >>> On Tue, Jul 10, 2018 at 10:39 AM Richard Earnshaw (lists) >>> <Richard.Earnshaw@arm.com> wrote: >>>> >>>> On 10/07/18 08:19, Richard Biener wrote: >>>>> On Mon, Jul 9, 2018 at 6:39 PM Richard Earnshaw >>>>> <Richard.Earnshaw@arm.com> wrote: >>>>>> >>>>>> >>>>>> The patches I posted earlier this year for mitigating against >>>>>> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from >>>>>> which it became obvious that a rethink was needed. This mail, and the >>>>>> following patches attempt to address that feedback and present a new >>>>>> approach to mitigating against this form of attack surface. >>>>>> >>>>>> There were two major issues with the original approach: >>>>>> >>>>>> - The speculation bounds were too tightly constrained - essentially >>>>>> they had to represent and upper and lower bound on a pointer, or a >>>>>> pointer offset. >>>>>> - The speculation constraints could only cover the immediately preceding >>>>>> branch, which often did not fit well with the structure of the existing >>>>>> code. >>>>>> >>>>>> An additional criticism was that the shape of the intrinsic did not >>>>>> fit particularly well with systems that used a single speculation >>>>>> barrier that essentially had to wait until all preceding speculation >>>>>> had to be resolved. >>>>>> >>>>>> To address all of the above, these patches adopt a new approach, based >>>>>> in part on a posting by Chandler Carruth to the LLVM developers list >>>>>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html), >>>>>> but which we have extended to deal with inter-function speculation. >>>>>> The patches divide the problem into two halves. >>>>>> >>>>>> The first half is some target-specific code to track the speculation >>>>>> condition through the generated code to provide an internal variable >>>>>> which can tell us whether or not the CPU's control flow speculation >>>>>> matches the data flow calculations. The idea is that the internal >>>>>> variable starts with the value TRUE and if the CPU's control flow >>>>>> speculation ever causes a jump to the wrong block of code the variable >>>>>> becomes false until such time as the incorrect control flow >>>>>> speculation gets unwound. >>>>>> >>>>>> The second half is that a new intrinsic function is introduced that is >>>>>> much simpler than we had before. The basic version of the intrinsic >>>>>> is now simply: >>>>>> >>>>>> T var = __builtin_speculation_safe_value (T unsafe_var); >>>>>> >>>>>> Full details of the syntax can be found in the documentation patch, in >>>>>> patch 1. In summary, when not speculating the intrinsic returns >>>>>> unsafe_var; when speculating then if it can be shown that the >>>>>> speculative flow has diverged from the intended control flow then zero >>>>>> is returned. An optional second argument can be used to return an >>>>>> alternative value to zero. The builtin may cause execution to pause >>>>>> until the speculation state is resolved. >>>>> >>>>> So a trivial target implementation would be to emit a barrier and then >>>>> it would always return unsafe_var and never zero. What I don't understand >>>>> fully is what users should do here, thus what the value of ever returning >>>>> "unsafe" is. Also I wonder why the API is forcing you to single-out a >>>>> special value instead of doing >>>>> >>>>> bool safe = __builtin_speculation_safe_value_p (T unsafe_value); >>>>> if (!safe) >>>>> /* what now? */ >>>>> >>>>> I'm only guessing that the correct way to handle "unsafe" is basically >>>>> >>>>> while (__builtin_speculation_safe_value (val) == 0) >>>>> ; >>>>> >>>>> use val, it's now safe >>>> >>>> No, a safe version of val is returned, not a bool telling you it is now >>>> safe to use the original. >>> >>> OK, so making the old value dead is required to preserve the desired >>> dataflow. >>> >>> But how should I use the special value that signaled "failure"? >>> >>> Obviously the user isn't supposed to simply replace 'val' with >>> >>> val = __builtin_speculation_safe_value (val); >>> >>> to make it speculation-proof. So - how should the user _use_ this >>> builtin? The docs do not say anything about this but says the >>> very confusing >>> >>> +The function may use target-dependent speculation tracking state to cause >>> +@var{failval} to be returned when it is known that speculative >>> +execution has incorrectly predicted a conditional branch operation. >>> >>> because speculation is about executing instructions as if they were >>> supposed to be executed. Once it is known the prediciton was wrong >>> no more "wrong" instructions will be executed but a previously >>> speculated instruction cannot know it was "falsely" speculated. >>> >>> Does the above try to say that the function may return failval if the >>> instruction is currently executed speculatively instead? That would >>> make sense to me. And return failval independent of if the speculation >>> later turns out to be correct or not. >>> >>>> You must use the sanitized version in future, >>>> not the unprotected version. >>>> >>>> >>>> So the usage is going to be more like: >>>> >>>> val = __builtin_speculation_safe_value (val); // Overwrite val with a >>>> sanitized version. >>>> >>>> You have to use the cleaned up version, the unclean version is still >>>> vulnerable to incorrect speculation. >>> >>> But then where does failval come into play? The above cannot be correct >>> if failval matters. >> >> failval only comes into play if the CPU is speculating incorrectly. >> It's just a safe value that some CPUs might use until such time as the >> speculation gets unwound. In most cases it can be zero. Perhaps a more >> concrete example would be something like. >> >> >> void **mem; >> >> void* f(unsigned untrusted) >> { >> if (untrusted < 100) >> return mem[untrusted]; >> return NULL; >> } >> >> This can be rewritten as either: >> >> void *mem; >> >> void* f(unsigned untrusted) >> { >> if (untrusted < 100) >> return mem[__builtin_speculation_safe_value (untrusted)]; >> return NULL; >> } >> >> if leaking mem[0] is not a problem; or if you're really paranoid: >> >> void* f(unsigned untrusted) >> { >> if (untrusted < 100) >> return *__builtin_speculation_safe_value (mem + untrusted); >> return NULL; >> } >> >> which becomes a NULL pointer dereference if the speculation is >> incorrect. The programmer doesn't need to test this code (any test >> would likely be useless anyway as the CPU would predict the outcome and >> speculate based on such a prediction). They do need to think a bit >> about what might leak if the CPU does miss-speculate, hence the two >> examples above. >> >> A more complex case, where you might want a different failure value can >> come up if you have a mask operation. A slightly convoluted example >> (derived from a real example we found in the Linux kernel) might be >> >> if (mode32) >> mask_from = 0x100000000; >> else >> mask_from = 0; >> >> ... // some code >> >> return mem[untrusted_val & (mask_from - 1)]; >> >> It would probably be better to place the speculation cleanup nearer the >> initialization of mask_from, especially if mask_from could be used more >> than once; but in that case 0 is less safe than any other (since 0-1 is >> all bits 1). In this case you might write: >> >> if (mode32) >> mask_from = 0x100000000; >> else >> mask_from = 0; >> >> mask_from = __builtin_speculation_safe_value (mask_from, 1); >> >> ... // some code >> >> return mem[untrusted_val & (mask_from - 1)]; >> >> And in this case if the speculation is incorrect the (mask_from - 1) >> expression becomes 0 and the whole range is protected. >> >> Note that speculation is fine if it is correct, we don't want to hold >> things up in that case since it will happen anyway. It's only a problem >> if the speculation is incorrect :-) > > Ok, please amend at least the user-level documentation with one or two > examples like above. > >> >>> >>> Does the builtin try to say that iff the current instruction is speculated >>> then it will return failval, otherwise it will return val and thus following >>> code needs to handle 'failval' in a way that is safe? >> >> No. From an abstract machine sense the builtin only ever returns its >> first argument. The fail value only appears if the CPU guesses the >> direction of a branch and guesses *incorrectly* (incorrect speculation) > > I understand how it works when the implementation issues a fence. > How does the actual implementation (assembly) look like that makes > the CPU know whether it speculated wrongly or not at the time the > value is needed for further speculation given the *__bsfs (mem + offset) > instruction _is_ speculated as well. > If you've not already read it, then I suggest you read https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/download-the-whitepaper and the link I posted in the original mail to Chandler's posting on this subject. In essence, we start with a register (tracker) initialized to ~0. At every conditional branch we change b.cond tgt ... tgt: ... into b.cond tgt // tracker == ~cond ? tracker, 0 csel tracker, tracker, 0, ~cond ... tgt: // tracker == cond ? tracker, 0 csel tracker, tracker, 0, cond Now when we need to inhibit speculation we emit and safe_val, unsafe_val, tracker CSDB Assuming we want the default result of 0 if speculating unsafely. CSDB is a light-weight barrier that ensures that all preceding /data-processing/ instructions are using real results from earlier instructions: it will resolve any data-flow speculation, but not necessarily any control-flow speculation. Obviously the inserted instructions only appear on the edges between the blocks, but we take care of that during the insertion process. R. > Richard. > >>> >>> Again, the wording "when it is known that speculative execution has >>> incorrectly predicted a conditional branch operation" is very confusing... >>> >>> Richard. >>> >>>> R. >>>> >>>>> >>>>> that is, the return value is only interesting in sofar as to whether it is equal >>>>> to val or the special value? >>>>> >>>>> That said, I wonder why we don't hide that details from the user and >>>>> provide a predicate instead. >>>>> >>>>> Richard. >>>>> >>>>>> There are seven patches in this set, as follows. >>>>>> >>>>>> 1) Introduces the new intrinsic __builtin_sepculation_safe_value. >>>>>> 2) Adds a basic hard barrier implementation for AArch32 (arm) state. >>>>>> 3) Adds a basic hard barrier implementation for AArch64 state. >>>>>> 4) Adds a new command-line option -mtrack-speculation (currently a no-op). >>>>>> 5) Disables CB[N]Z and TB[N]Z when -mtrack-speculation. >>>>>> 6) Adds the new speculation tracking pass for AArch64 >>>>>> 7) Uses the new speculation tracking pass to generate CSDB-based barrier >>>>>> sequences >>>>>> >>>>>> I haven't added a speculation-tracking pass for AArch32 at this time. >>>>>> It is possible to do this, but would require quite a lot of rework for >>>>>> the arm backend due to the limited number of registers that are >>>>>> available. >>>>>> >>>>>> Although patch 6 is AArch64 specific, I'd appreciate a review from >>>>>> someone more familiar with the branch edge code than myself. There >>>>>> appear to be a number of tricky issues with more complex edges so I'd >>>>>> like a second opinion on that code in case I've missed an important >>>>>> case. >>>>>> >>>>>> R. >>>>>> >>>>>> >>>>>> >>>>>> Richard Earnshaw (7): >>>>>> Add __builtin_speculation_safe_value >>>>>> Arm - add speculation_barrier pattern >>>>>> AArch64 - add speculation barrier >>>>>> AArch64 - Add new option -mtrack-speculation >>>>>> AArch64 - disable CB[N]Z TB[N]Z when tracking speculation >>>>>> AArch64 - new pass to add conditional-branch speculation tracking >>>>>> AArch64 - use CSDB based sequences if speculation tracking is enabled >>>>>> >>>>>> gcc/builtin-types.def | 6 + >>>>>> gcc/builtins.c | 57 ++++ >>>>>> gcc/builtins.def | 20 ++ >>>>>> gcc/c-family/c-common.c | 143 +++++++++ >>>>>> gcc/c-family/c-cppbuiltin.c | 5 +- >>>>>> gcc/config.gcc | 2 +- >>>>>> gcc/config/aarch64/aarch64-passes.def | 1 + >>>>>> gcc/config/aarch64/aarch64-protos.h | 3 +- >>>>>> gcc/config/aarch64/aarch64-speculation.cc | 494 ++++++++++++++++++++++++++++++ >>>>>> gcc/config/aarch64/aarch64.c | 88 +++++- >>>>>> gcc/config/aarch64/aarch64.md | 140 ++++++++- >>>>>> gcc/config/aarch64/aarch64.opt | 4 + >>>>>> gcc/config/aarch64/iterators.md | 3 + >>>>>> gcc/config/aarch64/t-aarch64 | 10 + >>>>>> gcc/config/arm/arm.md | 21 ++ >>>>>> gcc/config/arm/unspecs.md | 1 + >>>>>> gcc/doc/cpp.texi | 4 + >>>>>> gcc/doc/extend.texi | 29 ++ >>>>>> gcc/doc/invoke.texi | 10 +- >>>>>> gcc/doc/md.texi | 15 + >>>>>> gcc/doc/tm.texi | 20 ++ >>>>>> gcc/doc/tm.texi.in | 2 + >>>>>> gcc/target.def | 23 ++ >>>>>> gcc/targhooks.c | 27 ++ >>>>>> gcc/targhooks.h | 2 + >>>>>> gcc/testsuite/gcc.dg/spec-barrier-1.c | 40 +++ >>>>>> gcc/testsuite/gcc.dg/spec-barrier-2.c | 19 ++ >>>>>> gcc/testsuite/gcc.dg/spec-barrier-3.c | 13 + >>>>>> 28 files changed, 1191 insertions(+), 11 deletions(-) >>>>>> create mode 100644 gcc/config/aarch64/aarch64-speculation.cc >>>>>> create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-1.c >>>>>> create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-2.c >>>>>> create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-3.c >>>> >>
> On Jul 10, 2018, at 3:49 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote: > > On 10/07/18 00:13, Jeff Law wrote: >> On 07/09/2018 10:38 AM, Richard Earnshaw wrote: >>> >>> The patches I posted earlier this year for mitigating against >>> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from >>> which it became obvious that a rethink was needed. This mail, and the >>> following patches attempt to address that feedback and present a new >>> approach to mitigating against this form of attack surface. >>> >>> There were two major issues with the original approach: >>> >>> - The speculation bounds were too tightly constrained - essentially >>> they had to represent and upper and lower bound on a pointer, or a >>> pointer offset. >>> - The speculation constraints could only cover the immediately preceding >>> branch, which often did not fit well with the structure of the existing >>> code. >>> >>> An additional criticism was that the shape of the intrinsic did not >>> fit particularly well with systems that used a single speculation >>> barrier that essentially had to wait until all preceding speculation >>> had to be resolved. >> Right. I suggest the Intel and IBM reps chime in on the updated semantics. >> > > Yes, logically, this is a boolean tracker value. In practice we use ~0 > for true and 0 for false, so that we can simply use it as a mask > operation later. > > I hope this intrinsic will be even more acceptable than the one that > Bill Schmidt acked previously, it's even simpler than the version we had > last time. Yes, I think this looks quite good. Thanks! Thanks also for digging into the speculation tracking algorithm. This has good potential as a conservative opt-in approach. The obvious concern is whether performance will be acceptable even for apps that really want the protection. We took a look at Chandler's WIP LLVM patch and ran some SPEC2006 numbers on a Skylake box. We saw geomean degradations of about 42% (int) and 33% (fp). (This was just one test, so caveat emptor.) This isn't terrible given the number of potential false positives and the early state of the algorithm, but it's still a lot from a customer perspective. I'll be interested in whether your interprocedural improvements are able to reduce the conservatism a bit. Thanks, Bill > >>> >>> To address all of the above, these patches adopt a new approach, based >>> in part on a posting by Chandler Carruth to the LLVM developers list >>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html), >>> but which we have extended to deal with inter-function speculation. >>> The patches divide the problem into two halves. >> We're essentially turning the control dependency into a value that we >> can then use to munge the pointer or the resultant data. >> >>> >>> The first half is some target-specific code to track the speculation >>> condition through the generated code to provide an internal variable >>> which can tell us whether or not the CPU's control flow speculation >>> matches the data flow calculations. The idea is that the internal >>> variable starts with the value TRUE and if the CPU's control flow >>> speculation ever causes a jump to the wrong block of code the variable >>> becomes false until such time as the incorrect control flow >>> speculation gets unwound. >> Right. >> >> So one of the things that comes immediately to mind is you have to run >> this early enough that you can still get to all the control flow and >> build your predicates. Otherwise you have do undo stuff like >> conditional move generation. > > No, the opposite, in fact. We want to run this very late, at least on > Arm systems (AArch64 or AArch32). Conditional move instructions are > fine - they're data-flow operations, not control flow (in fact, that's > exactly what the control flow tracker instructions are). By running it > late we avoid disrupting any of the earlier optimization passes as well. > >> >> On the flip side, the earlier you do this mitigation, the more you have >> to worry about what the optimizers are going to do to the code later in >> the pipeline. It's almost guaranteed a naive implementation is going to >> muck this up since we can propagate the state of the condition into the >> arms which will make the predicate state a compile time constant. >> >> In fact this seems to be running into the area of pointer providence and >> some discussions we had around atomic a few years back. >> >> I also wonder if this could be combined with taint analysis to produce a >> much lower overhead solution in cases were developers have done analysis >> and know what objects are potentially under attacker control. So >> instead of analyzing everything, we can have a much narrower focus. > > Automatic application of the tracker to vulnerable variables would be > nice, but I haven't attempted to go there yet: at present I still rely > on the user to annotate code with the new intrinsic. > > That doesn't mean that we couldn't extend the overall approach later to > include automatic tracking. > >> >> The pointer munging could well run afoul of alias analysis engines that >> don't expect to be seeing those kind of operations. > > I think the pass runs late enough that it isn't a problem. > >> >> Anyway, just some initial high level thoughts. I'm sure there'll be >> more as I read the implementation. >> > > Thanks for starting to look at this so quickly. > > R. > >> >> Jeff
On 10/07/18 14:48, Bill Schmidt wrote: > >> On Jul 10, 2018, at 3:49 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote: >> >> On 10/07/18 00:13, Jeff Law wrote: >>> On 07/09/2018 10:38 AM, Richard Earnshaw wrote: >>>> >>>> The patches I posted earlier this year for mitigating against >>>> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from >>>> which it became obvious that a rethink was needed. This mail, and the >>>> following patches attempt to address that feedback and present a new >>>> approach to mitigating against this form of attack surface. >>>> >>>> There were two major issues with the original approach: >>>> >>>> - The speculation bounds were too tightly constrained - essentially >>>> they had to represent and upper and lower bound on a pointer, or a >>>> pointer offset. >>>> - The speculation constraints could only cover the immediately preceding >>>> branch, which often did not fit well with the structure of the existing >>>> code. >>>> >>>> An additional criticism was that the shape of the intrinsic did not >>>> fit particularly well with systems that used a single speculation >>>> barrier that essentially had to wait until all preceding speculation >>>> had to be resolved. >>> Right. I suggest the Intel and IBM reps chime in on the updated semantics. >>> >> >> Yes, logically, this is a boolean tracker value. In practice we use ~0 >> for true and 0 for false, so that we can simply use it as a mask >> operation later. >> >> I hope this intrinsic will be even more acceptable than the one that >> Bill Schmidt acked previously, it's even simpler than the version we had >> last time. > > Yes, I think this looks quite good. Thanks! > > Thanks also for digging into the speculation tracking algorithm. This > has good potential as a conservative opt-in approach. The obvious > concern is whether performance will be acceptable even for apps > that really want the protection. > > We took a look at Chandler's WIP LLVM patch and ran some SPEC2006 > numbers on a Skylake box. We saw geomean degradations of about > 42% (int) and 33% (fp). (This was just one test, so caveat emptor.) > This isn't terrible given the number of potential false positives and the > early state of the algorithm, but it's still a lot from a customer perspective. > I'll be interested in whether your interprocedural improvements are > able to reduce the conservatism a bit. > So I don't have any numbers for SPEC2006. I have some initial numbers for SPEC2000 when just adding the tracking code (so not applying the second part of the mitigation). In that case INT2000 is down by ~13% and FP2000 was by comparison almost in the noise (~2.4%). Applying the tracker value to all memory loads would push those numbers up significantly, I suspect. That's part of the reason for preferring the intrinsic rather than automatic mitigation: the intrinsic is much more targeted. R. > Thanks, > Bill >> >>>> >>>> To address all of the above, these patches adopt a new approach, based >>>> in part on a posting by Chandler Carruth to the LLVM developers list >>>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html), >>>> but which we have extended to deal with inter-function speculation. >>>> The patches divide the problem into two halves. >>> We're essentially turning the control dependency into a value that we >>> can then use to munge the pointer or the resultant data. >>> >>>> >>>> The first half is some target-specific code to track the speculation >>>> condition through the generated code to provide an internal variable >>>> which can tell us whether or not the CPU's control flow speculation >>>> matches the data flow calculations. The idea is that the internal >>>> variable starts with the value TRUE and if the CPU's control flow >>>> speculation ever causes a jump to the wrong block of code the variable >>>> becomes false until such time as the incorrect control flow >>>> speculation gets unwound. >>> Right. >>> >>> So one of the things that comes immediately to mind is you have to run >>> this early enough that you can still get to all the control flow and >>> build your predicates. Otherwise you have do undo stuff like >>> conditional move generation. >> >> No, the opposite, in fact. We want to run this very late, at least on >> Arm systems (AArch64 or AArch32). Conditional move instructions are >> fine - they're data-flow operations, not control flow (in fact, that's >> exactly what the control flow tracker instructions are). By running it >> late we avoid disrupting any of the earlier optimization passes as well. >> >>> >>> On the flip side, the earlier you do this mitigation, the more you have >>> to worry about what the optimizers are going to do to the code later in >>> the pipeline. It's almost guaranteed a naive implementation is going to >>> muck this up since we can propagate the state of the condition into the >>> arms which will make the predicate state a compile time constant. >>> >>> In fact this seems to be running into the area of pointer providence and >>> some discussions we had around atomic a few years back. >>> >>> I also wonder if this could be combined with taint analysis to produce a >>> much lower overhead solution in cases were developers have done analysis >>> and know what objects are potentially under attacker control. So >>> instead of analyzing everything, we can have a much narrower focus. >> >> Automatic application of the tracker to vulnerable variables would be >> nice, but I haven't attempted to go there yet: at present I still rely >> on the user to annotate code with the new intrinsic. >> >> That doesn't mean that we couldn't extend the overall approach later to >> include automatic tracking. >> >>> >>> The pointer munging could well run afoul of alias analysis engines that >>> don't expect to be seeing those kind of operations. >> >> I think the pass runs late enough that it isn't a problem. >> >>> >>> Anyway, just some initial high level thoughts. I'm sure there'll be >>> more as I read the implementation. >>> >> >> Thanks for starting to look at this so quickly. >> >> R. >> >>> >>> Jeff >
On 07/10/2018 02:49 AM, Richard Earnshaw (lists) wrote: > On 10/07/18 00:13, Jeff Law wrote: >> On 07/09/2018 10:38 AM, Richard Earnshaw wrote: >>> >>> To address all of the above, these patches adopt a new approach, based >>> in part on a posting by Chandler Carruth to the LLVM developers list >>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html), >>> but which we have extended to deal with inter-function speculation. >>> The patches divide the problem into two halves. >> We're essentially turning the control dependency into a value that we >> can then use to munge the pointer or the resultant data. >> >>> >>> The first half is some target-specific code to track the speculation >>> condition through the generated code to provide an internal variable >>> which can tell us whether or not the CPU's control flow speculation >>> matches the data flow calculations. The idea is that the internal >>> variable starts with the value TRUE and if the CPU's control flow >>> speculation ever causes a jump to the wrong block of code the variable >>> becomes false until such time as the incorrect control flow >>> speculation gets unwound. >> Right. >> >> So one of the things that comes immediately to mind is you have to run >> this early enough that you can still get to all the control flow and >> build your predicates. Otherwise you have do undo stuff like >> conditional move generation. > > No, the opposite, in fact. We want to run this very late, at least on > Arm systems (AArch64 or AArch32). Conditional move instructions are > fine - they're data-flow operations, not control flow (in fact, that's > exactly what the control flow tracker instructions are). By running it > late we avoid disrupting any of the earlier optimization passes as well. Ack. I looked at the aarch64 implementation after sending my message and it clearly runs very late. I haven't convinced myself that all the work generic parts of the compiler to rewrite and eliminate conditionals is safe. But even if it isn't, you're probably getting enough coverage to drastically reduce the attack surface. I'm going to have to think about the early transformations we make and how they interact here harder. But I think the general approach can dramatically reduce the attack surface. With running very late, as you noted, the big concern is edge insertions. I'm going to have to re-familiarize myself with all the rules there :-) I did note you stumbled on some of the issues in that space (what to do with calls that throw exceptions). Placement before the final bbro pass probably avoids a lot of pain. So the basic placement seems reasonable. And again, if we're missing something due to the effects of earlier passes, I still think you're reducing the attack surface in a meaningful way. > >> >> On the flip side, the earlier you do this mitigation, the more you have >> to worry about what the optimizers are going to do to the code later in >> the pipeline. It's almost guaranteed a naive implementation is going to >> muck this up since we can propagate the state of the condition into the >> arms which will make the predicate state a compile time constant. >> >> In fact this seems to be running into the area of pointer providence and >> some discussions we had around atomic a few years back. >> >> I also wonder if this could be combined with taint analysis to produce a >> much lower overhead solution in cases were developers have done analysis >> and know what objects are potentially under attacker control. So >> instead of analyzing everything, we can have a much narrower focus. > > Automatic application of the tracker to vulnerable variables would be > nice, but I haven't attempted to go there yet: at present I still rely > on the user to annotate code with the new intrinsic. ACK. My sense is we are going to want taint analysis. I think it'd be useful here and in other contexts. However, I don't think it necessarily needs to be a requirement to go forward. I'm going to review the atomic discussion we had a while back with the kernel folks as well as some pointer providence discussions I've had with Martin S. I can't put my finger on it yet, but I still have the sense there's some interactions here we want to at least be aware of. > > That doesn't mean that we couldn't extend the overall approach later to > include automatic tracking. Absolutely. > >> >> The pointer munging could well run afoul of alias analysis engines that >> don't expect to be seeing those kind of operations. > > I think the pass runs late enough that it isn't a problem. Yea, I think you're right. > >> >> Anyway, just some initial high level thoughts. I'm sure there'll be >> more as I read the implementation. >> > > Thanks for starting to look at this so quickly. NP. Your timing to come back to this is good. Jeff
On 07/10/2018 08:14 AM, Richard Earnshaw (lists) wrote: > On 10/07/18 14:48, Bill Schmidt wrote: >> >>> On Jul 10, 2018, at 3:49 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote: >>> >>> On 10/07/18 00:13, Jeff Law wrote: >>>> On 07/09/2018 10:38 AM, Richard Earnshaw wrote: >>>>> >>>>> The patches I posted earlier this year for mitigating against >>>>> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from >>>>> which it became obvious that a rethink was needed. This mail, and the >>>>> following patches attempt to address that feedback and present a new >>>>> approach to mitigating against this form of attack surface. >>>>> >>>>> There were two major issues with the original approach: >>>>> >>>>> - The speculation bounds were too tightly constrained - essentially >>>>> they had to represent and upper and lower bound on a pointer, or a >>>>> pointer offset. >>>>> - The speculation constraints could only cover the immediately preceding >>>>> branch, which often did not fit well with the structure of the existing >>>>> code. >>>>> >>>>> An additional criticism was that the shape of the intrinsic did not >>>>> fit particularly well with systems that used a single speculation >>>>> barrier that essentially had to wait until all preceding speculation >>>>> had to be resolved. >>>> Right. I suggest the Intel and IBM reps chime in on the updated semantics. >>>> >>> >>> Yes, logically, this is a boolean tracker value. In practice we use ~0 >>> for true and 0 for false, so that we can simply use it as a mask >>> operation later. >>> >>> I hope this intrinsic will be even more acceptable than the one that >>> Bill Schmidt acked previously, it's even simpler than the version we had >>> last time. >> >> Yes, I think this looks quite good. Thanks! >> >> Thanks also for digging into the speculation tracking algorithm. This >> has good potential as a conservative opt-in approach. The obvious >> concern is whether performance will be acceptable even for apps >> that really want the protection. >> >> We took a look at Chandler's WIP LLVM patch and ran some SPEC2006 >> numbers on a Skylake box. We saw geomean degradations of about >> 42% (int) and 33% (fp). (This was just one test, so caveat emptor.) >> This isn't terrible given the number of potential false positives and the >> early state of the algorithm, but it's still a lot from a customer perspective. >> I'll be interested in whether your interprocedural improvements are >> able to reduce the conservatism a bit. >> > > So I don't have any numbers for SPEC2006. I have some initial numbers > for SPEC2000 when just adding the tracking code (so not applying the > second part of the mitigation). In that case INT2000 is down by ~13% > and FP2000 was by comparison almost in the noise (~2.4%). > > Applying the tracker value to all memory loads would push those numbers > up significantly, I suspect. That's part of the reason for preferring > the intrinsic rather than automatic mitigation: the intrinsic is much > more targeted. Right. Fully automatic without any "hints" is going to be very expensive, possibly prohibitively expensive. Using the intrinsic or exploiting some kind of taint analysis has the potential to drastically reduce the overhead. At least it seems like they should :-) jeff
On 07/10/2018 04:53 AM, Richard Earnshaw (lists) wrote: > On 10/07/18 11:10, Richard Biener wrote: >> On Tue, Jul 10, 2018 at 10:39 AM Richard Earnshaw (lists) >> <Richard.Earnshaw@arm.com> wrote: >>> >>> On 10/07/18 08:19, Richard Biener wrote: >>>> On Mon, Jul 9, 2018 at 6:39 PM Richard Earnshaw >>>> <Richard.Earnshaw@arm.com> wrote: >>>>> >>>>> >>>>> The patches I posted earlier this year for mitigating against >>>>> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from >>>>> which it became obvious that a rethink was needed. This mail, and the >>>>> following patches attempt to address that feedback and present a new >>>>> approach to mitigating against this form of attack surface. >>>>> >>>>> There were two major issues with the original approach: >>>>> >>>>> - The speculation bounds were too tightly constrained - essentially >>>>> they had to represent and upper and lower bound on a pointer, or a >>>>> pointer offset. >>>>> - The speculation constraints could only cover the immediately preceding >>>>> branch, which often did not fit well with the structure of the existing >>>>> code. >>>>> >>>>> An additional criticism was that the shape of the intrinsic did not >>>>> fit particularly well with systems that used a single speculation >>>>> barrier that essentially had to wait until all preceding speculation >>>>> had to be resolved. >>>>> >>>>> To address all of the above, these patches adopt a new approach, based >>>>> in part on a posting by Chandler Carruth to the LLVM developers list >>>>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html), >>>>> but which we have extended to deal with inter-function speculation. >>>>> The patches divide the problem into two halves. >>>>> >>>>> The first half is some target-specific code to track the speculation >>>>> condition through the generated code to provide an internal variable >>>>> which can tell us whether or not the CPU's control flow speculation >>>>> matches the data flow calculations. The idea is that the internal >>>>> variable starts with the value TRUE and if the CPU's control flow >>>>> speculation ever causes a jump to the wrong block of code the variable >>>>> becomes false until such time as the incorrect control flow >>>>> speculation gets unwound. >>>>> >>>>> The second half is that a new intrinsic function is introduced that is >>>>> much simpler than we had before. The basic version of the intrinsic >>>>> is now simply: >>>>> >>>>> T var = __builtin_speculation_safe_value (T unsafe_var); >>>>> >>>>> Full details of the syntax can be found in the documentation patch, in >>>>> patch 1. In summary, when not speculating the intrinsic returns >>>>> unsafe_var; when speculating then if it can be shown that the >>>>> speculative flow has diverged from the intended control flow then zero >>>>> is returned. An optional second argument can be used to return an >>>>> alternative value to zero. The builtin may cause execution to pause >>>>> until the speculation state is resolved. >>>> >>>> So a trivial target implementation would be to emit a barrier and then >>>> it would always return unsafe_var and never zero. What I don't understand >>>> fully is what users should do here, thus what the value of ever returning >>>> "unsafe" is. Also I wonder why the API is forcing you to single-out a >>>> special value instead of doing >>>> >>>> bool safe = __builtin_speculation_safe_value_p (T unsafe_value); >>>> if (!safe) >>>> /* what now? */ >>>> >>>> I'm only guessing that the correct way to handle "unsafe" is basically >>>> >>>> while (__builtin_speculation_safe_value (val) == 0) >>>> ; >>>> >>>> use val, it's now safe >>> >>> No, a safe version of val is returned, not a bool telling you it is now >>> safe to use the original. >> >> OK, so making the old value dead is required to preserve the desired >> dataflow. >> >> But how should I use the special value that signaled "failure"? >> >> Obviously the user isn't supposed to simply replace 'val' with >> >> val = __builtin_speculation_safe_value (val); >> >> to make it speculation-proof. So - how should the user _use_ this >> builtin? The docs do not say anything about this but says the >> very confusing >> >> +The function may use target-dependent speculation tracking state to cause >> +@var{failval} to be returned when it is known that speculative >> +execution has incorrectly predicted a conditional branch operation. >> >> because speculation is about executing instructions as if they were >> supposed to be executed. Once it is known the prediciton was wrong >> no more "wrong" instructions will be executed but a previously >> speculated instruction cannot know it was "falsely" speculated. >> >> Does the above try to say that the function may return failval if the >> instruction is currently executed speculatively instead? That would >> make sense to me. And return failval independent of if the speculation >> later turns out to be correct or not. >> >>> You must use the sanitized version in future, >>> not the unprotected version. >>> >>> >>> So the usage is going to be more like: >>> >>> val = __builtin_speculation_safe_value (val); // Overwrite val with a >>> sanitized version. >>> >>> You have to use the cleaned up version, the unclean version is still >>> vulnerable to incorrect speculation. >> >> But then where does failval come into play? The above cannot be correct >> if failval matters. > > failval only comes into play if the CPU is speculating incorrectly. > It's just a safe value that some CPUs might use until such time as the > speculation gets unwound. In most cases it can be zero. Perhaps a more > concrete example would be something like. The value really only matters in the sense that if the processor is speculating and using that value that the uses don't lead information leaks. That's why 0 is so useful here as safeval. Reading/writing *0 speculatively isn't going to populate the cache in a way that is useful to the attacker. Jeff
On 10/07/18 16:42, Jeff Law wrote: > On 07/10/2018 02:49 AM, Richard Earnshaw (lists) wrote: >> On 10/07/18 00:13, Jeff Law wrote: >>> On 07/09/2018 10:38 AM, Richard Earnshaw wrote: >>>> >>>> To address all of the above, these patches adopt a new approach, based >>>> in part on a posting by Chandler Carruth to the LLVM developers list >>>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html), >>>> but which we have extended to deal with inter-function speculation. >>>> The patches divide the problem into two halves. >>> We're essentially turning the control dependency into a value that we >>> can then use to munge the pointer or the resultant data. >>> >>>> >>>> The first half is some target-specific code to track the speculation >>>> condition through the generated code to provide an internal variable >>>> which can tell us whether or not the CPU's control flow speculation >>>> matches the data flow calculations. The idea is that the internal >>>> variable starts with the value TRUE and if the CPU's control flow >>>> speculation ever causes a jump to the wrong block of code the variable >>>> becomes false until such time as the incorrect control flow >>>> speculation gets unwound. >>> Right. >>> >>> So one of the things that comes immediately to mind is you have to run >>> this early enough that you can still get to all the control flow and >>> build your predicates. Otherwise you have do undo stuff like >>> conditional move generation. >> >> No, the opposite, in fact. We want to run this very late, at least on >> Arm systems (AArch64 or AArch32). Conditional move instructions are >> fine - they're data-flow operations, not control flow (in fact, that's >> exactly what the control flow tracker instructions are). By running it >> late we avoid disrupting any of the earlier optimization passes as well. > Ack. I looked at the aarch64 implementation after sending my message > and it clearly runs very late. > > I haven't convinced myself that all the work generic parts of the > compiler to rewrite and eliminate conditionals is safe. But even if it > isn't, you're probably getting enough coverage to drastically reduce the > attack surface. I'm going to have to think about the early > transformations we make and how they interact here harder. But I think > the general approach can dramatically reduce the attack surface. My argument here would be that we are concerned about speculation that the CPU does with the generated program. We're not particularly bothered about the abstract machine description it's based upon. As long as the earlier transforms lead to a valid translation (it hasn't removed a necessary bounds check) then running late is fine. I can't currently conceive a situation where the compiler would be able to remove a /necessary/ bounds check that could lead to unsafe speculation later on. A redundant bounds check removal shouldn't be a problem as the non-redundant check should remain and that will still get tracking code added. > > With running very late, as you noted, the big concern is edge > insertions. I'm going to have to re-familiarize myself with all the > rules there :-) I did note you stumbled on some of the issues in that > space (what to do with calls that throw exceptions). > > Placement before the final bbro pass probably avoids a lot of pain. So > the basic placement seems reasonable. And again, if we're missing > something due to the effects of earlier passes, I still think you're > reducing the attack surface in a meaningful way. > > > >> >>> >>> On the flip side, the earlier you do this mitigation, the more you have >>> to worry about what the optimizers are going to do to the code later in >>> the pipeline. It's almost guaranteed a naive implementation is going to >>> muck this up since we can propagate the state of the condition into the >>> arms which will make the predicate state a compile time constant. >>> >>> In fact this seems to be running into the area of pointer providence and >>> some discussions we had around atomic a few years back. >>> >>> I also wonder if this could be combined with taint analysis to produce a >>> much lower overhead solution in cases were developers have done analysis >>> and know what objects are potentially under attacker control. So >>> instead of analyzing everything, we can have a much narrower focus. >> >> Automatic application of the tracker to vulnerable variables would be >> nice, but I haven't attempted to go there yet: at present I still rely >> on the user to annotate code with the new intrinsic. > ACK. My sense is we are going to want taint analysis. I think it'd be > useful here and in other contexts. However, I don't think it > necessarily needs to be a requirement to go forward. > > I'm going to review the atomic discussion we had a while back with the > kernel folks as well as some pointer providence discussions I've had > with Martin S. I can't put my finger on it yet, but I still have the > sense there's some interactions here we want to at least be aware of. > >> >> That doesn't mean that we couldn't extend the overall approach later to >> include automatic tracking. > Absolutely. > >> >>> >>> The pointer munging could well run afoul of alias analysis engines that >>> don't expect to be seeing those kind of operations. >> >> I think the pass runs late enough that it isn't a problem. > Yea, I think you're right. > > >> >>> >>> Anyway, just some initial high level thoughts. I'm sure there'll be >>> more as I read the implementation. >>> >> >> Thanks for starting to look at this so quickly. > NP. Your timing to come back to this is good. > > Jeff >
On 07/10/2018 10:43 AM, Richard Earnshaw (lists) wrote: > On 10/07/18 16:42, Jeff Law wrote: >> On 07/10/2018 02:49 AM, Richard Earnshaw (lists) wrote: >>> On 10/07/18 00:13, Jeff Law wrote: >>>> On 07/09/2018 10:38 AM, Richard Earnshaw wrote: >>>>> >>>>> To address all of the above, these patches adopt a new approach, based >>>>> in part on a posting by Chandler Carruth to the LLVM developers list >>>>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html), >>>>> but which we have extended to deal with inter-function speculation. >>>>> The patches divide the problem into two halves. >>>> We're essentially turning the control dependency into a value that we >>>> can then use to munge the pointer or the resultant data. >>>> >>>>> >>>>> The first half is some target-specific code to track the speculation >>>>> condition through the generated code to provide an internal variable >>>>> which can tell us whether or not the CPU's control flow speculation >>>>> matches the data flow calculations. The idea is that the internal >>>>> variable starts with the value TRUE and if the CPU's control flow >>>>> speculation ever causes a jump to the wrong block of code the variable >>>>> becomes false until such time as the incorrect control flow >>>>> speculation gets unwound. >>>> Right. >>>> >>>> So one of the things that comes immediately to mind is you have to run >>>> this early enough that you can still get to all the control flow and >>>> build your predicates. Otherwise you have do undo stuff like >>>> conditional move generation. >>> >>> No, the opposite, in fact. We want to run this very late, at least on >>> Arm systems (AArch64 or AArch32). Conditional move instructions are >>> fine - they're data-flow operations, not control flow (in fact, that's >>> exactly what the control flow tracker instructions are). By running it >>> late we avoid disrupting any of the earlier optimization passes as well. >> Ack. I looked at the aarch64 implementation after sending my message >> and it clearly runs very late. >> >> I haven't convinced myself that all the work generic parts of the >> compiler to rewrite and eliminate conditionals is safe. But even if it >> isn't, you're probably getting enough coverage to drastically reduce the >> attack surface. I'm going to have to think about the early >> transformations we make and how they interact here harder. But I think >> the general approach can dramatically reduce the attack surface. > > My argument here would be that we are concerned about speculation that > the CPU does with the generated program. We're not particularly > bothered about the abstract machine description it's based upon. As > long as the earlier transforms lead to a valid translation (it hasn't > removed a necessary bounds check) then running late is fine. I'm thinking about obfuscation of the bounds check or the pointer or turning branchy into straightline code, possibly doing some speculation in the process, if-conversion and the like. For example hoist_adjacent_loads which results in speculative loads and likely a conditional move to select between the two loaded values. Or what if we've done something like if (x < maxval) res = *p; And we've turned that into t = *p; res = (x < maxval) ? t : res; That may be implemented as a conditional move at the RTL level, so protecting that may be nontrivial. In those examples the compiler itself has introduced the speculation. I can't find the conditional obfuscation I was looking for, so it's hard to rule it in our out as potentially problematical. WRT pointer obfuscation, we no longer propagate conditional equivalences very agressively, so it may be a non-issue in the end. But again, even with these concerns I think what you're doing cuts down the attack surface in meaningful ways. > > I can't currently conceive a situation where the compiler would be able > to remove a /necessary/ bounds check that could lead to unsafe > speculation later on. A redundant bounds check removal shouldn't be a > problem as the non-redundant check should remain and that will still get > tracking code added. It's less about removal and more about either compiler-generated speculation or obfuscation of the patterns you're looking for. jeff
On 11/07/18 21:46, Jeff Law wrote: > On 07/10/2018 10:43 AM, Richard Earnshaw (lists) wrote: >> On 10/07/18 16:42, Jeff Law wrote: >>> On 07/10/2018 02:49 AM, Richard Earnshaw (lists) wrote: >>>> On 10/07/18 00:13, Jeff Law wrote: >>>>> On 07/09/2018 10:38 AM, Richard Earnshaw wrote: >>>>>> >>>>>> To address all of the above, these patches adopt a new approach, based >>>>>> in part on a posting by Chandler Carruth to the LLVM developers list >>>>>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html), >>>>>> but which we have extended to deal with inter-function speculation. >>>>>> The patches divide the problem into two halves. >>>>> We're essentially turning the control dependency into a value that we >>>>> can then use to munge the pointer or the resultant data. >>>>> >>>>>> >>>>>> The first half is some target-specific code to track the speculation >>>>>> condition through the generated code to provide an internal variable >>>>>> which can tell us whether or not the CPU's control flow speculation >>>>>> matches the data flow calculations. The idea is that the internal >>>>>> variable starts with the value TRUE and if the CPU's control flow >>>>>> speculation ever causes a jump to the wrong block of code the variable >>>>>> becomes false until such time as the incorrect control flow >>>>>> speculation gets unwound. >>>>> Right. >>>>> >>>>> So one of the things that comes immediately to mind is you have to run >>>>> this early enough that you can still get to all the control flow and >>>>> build your predicates. Otherwise you have do undo stuff like >>>>> conditional move generation. >>>> >>>> No, the opposite, in fact. We want to run this very late, at least on >>>> Arm systems (AArch64 or AArch32). Conditional move instructions are >>>> fine - they're data-flow operations, not control flow (in fact, that's >>>> exactly what the control flow tracker instructions are). By running it >>>> late we avoid disrupting any of the earlier optimization passes as well. >>> Ack. I looked at the aarch64 implementation after sending my message >>> and it clearly runs very late. >>> >>> I haven't convinced myself that all the work generic parts of the >>> compiler to rewrite and eliminate conditionals is safe. But even if it >>> isn't, you're probably getting enough coverage to drastically reduce the >>> attack surface. I'm going to have to think about the early >>> transformations we make and how they interact here harder. But I think >>> the general approach can dramatically reduce the attack surface. >> >> My argument here would be that we are concerned about speculation that >> the CPU does with the generated program. We're not particularly >> bothered about the abstract machine description it's based upon. As >> long as the earlier transforms lead to a valid translation (it hasn't >> removed a necessary bounds check) then running late is fine. > I'm thinking about obfuscation of the bounds check or the pointer or > turning branchy into straightline code, possibly doing some speculation > in the process, if-conversion and the like. > > For example hoist_adjacent_loads which results in speculative loads and > likely a conditional move to select between the two loaded values. > > Or what if we've done something like > > if (x < maxval) > res = *p; > > And we've turned that into > > > t = *p; > res = (x < maxval) ? t : res; Hmm, interesting. But for that to be safe, the compiler would have to be able to prove that dereferencing p was safe even if x >= maxval, otherwise the run-time code could fault (so if there's any chance that it could point to something vulnerable, then there must also be a chance that it points to unmapped memory). Given that requirement, I don't think this case can be a specific concern, since the requirement implies that p must already be within some known bounds for the type of object it points to. R. > > > That may be implemented as a conditional move at the RTL level, so > protecting that may be nontrivial. > > In those examples the compiler itself has introduced the speculation. > > I can't find the conditional obfuscation I was looking for, so it's hard > to rule it in our out as potentially problematical. > > WRT pointer obfuscation, we no longer propagate conditional equivalences > very agressively, so it may be a non-issue in the end. > > But again, even with these concerns I think what you're doing cuts down > the attack surface in meaningful ways. > > > >> >> I can't currently conceive a situation where the compiler would be able >> to remove a /necessary/ bounds check that could lead to unsafe >> speculation later on. A redundant bounds check removal shouldn't be a >> problem as the non-redundant check should remain and that will still get >> tracking code added. > It's less about removal and more about either compiler-generated > speculation or obfuscation of the patterns you're looking for. > > > jeff > > > >