
[2/2] arm64: bpf: add BPF XADD instruction

Message ID 1447195301-16757-3-git-send-email-yang.shi@linaro.org
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Yang Shi Nov. 10, 2015, 10:41 p.m. UTC
aarch64 doesn't have native support for the XADD instruction, so implement it
with the instruction sequence below:

Load (dst + off) to a register
Add src to it
Store it back to (dst + off)
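As C pseudocode, the proposed lowering amounts to a plain read-modify-write (hypothetical sketch for illustration; the real patch emits arm64 instructions):

```c
#include <stdint.h>

/* Sketch of the sequence described above: plain load, add, store.
 * Because the three steps are separate, a concurrent updater can run
 * between the load and the store, and its increment would be lost. */
static void xadd_naive(uint64_t *dst, int16_t off, uint64_t src)
{
    uint64_t *p = (uint64_t *)((char *)dst + off);
    uint64_t tmp = *p;  /* Load (dst + off) to a register */
    tmp += src;         /* Add src to it */
    *p = tmp;           /* Store it back to (dst + off) */
}
```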

Signed-off-by: Yang Shi <yang.shi@linaro.org>
CC: Zi Shen Lim <zlim.lnx@gmail.com>
CC: Xi Wang <xi.wang@gmail.com>
---
 arch/arm64/net/bpf_jit_comp.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

Comments

Eric Dumazet Nov. 11, 2015, 12:08 a.m. UTC | #1
On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
> aarch64 doesn't have native support for XADD instruction, implement it by
> the below instruction sequence:
> 
> Load (dst + off) to a register
> Add src to it
> Store it back to (dst + off)

Not really what is needed ?

See this BPF_XADD as an atomic_add() equivalent.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yang Shi Nov. 11, 2015, 12:26 a.m. UTC | #2
On 11/10/2015 4:08 PM, Eric Dumazet wrote:
> On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
>> aarch64 doesn't have native support for XADD instruction, implement it by
>> the below instruction sequence:
>>
>> Load (dst + off) to a register
>> Add src to it
>> Store it back to (dst + off)
>
> Not really what is needed ?
>
> See this BPF_XADD as an atomic_add() equivalent.

I see, thanks. The documentation doesn't say much about the add being 
"exclusive". If so, it will need load-acquire/store-release.

I will rework it.

Yang

>
>

Alexei Starovoitov Nov. 11, 2015, 12:42 a.m. UTC | #3
On Tue, Nov 10, 2015 at 04:26:02PM -0800, Shi, Yang wrote:
> On 11/10/2015 4:08 PM, Eric Dumazet wrote:
> >On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
> >>aarch64 doesn't have native support for XADD instruction, implement it by
> >>the below instruction sequence:
> >>
> >>Load (dst + off) to a register
> >>Add src to it
> >>Store it back to (dst + off)
> >
> >Not really what is needed ?
> >
> >See this BPF_XADD as an atomic_add() equivalent.
> 
> I see. Thanks. The documentation doesn't say too much about "exclusive" add.
> If so it should need load-acquire/store-release.

I think the doc is clear enough, but it can always be improved. Please suggest a patch.
It's quite hard to write a test for atomicity in the test_bpf framework, so
code review is key. Eric, thanks for catching it!

Zi Shen Lim Nov. 11, 2015, 2:52 a.m. UTC | #4
Yang,

On Tue, Nov 10, 2015 at 4:42 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Tue, Nov 10, 2015 at 04:26:02PM -0800, Shi, Yang wrote:
>> On 11/10/2015 4:08 PM, Eric Dumazet wrote:
>> >On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
>> >>aarch64 doesn't have native support for XADD instruction, implement it by
>> >>the below instruction sequence:

aarch64 supports atomic add in ARMv8.1.
For ARMv8(.0), please consider using LDXR/STXR sequence.
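As a hedged illustration (not part of the patch), the shape of such a sequence can be seen by compiling a relaxed atomic add: on ARMv8.0 the compiler emits an LDXR/STXR retry loop, while on ARMv8.1 with LSE it can become a single instruction.

```c
#include <stdint.h>

/* On ARMv8.0 a compiler lowers this to roughly:
 *   1: ldxr  x8, [x0]        // load-exclusive
 *      add   x8, x8, x1      // add
 *      stxr  w9, x8, [x0]    // store-exclusive, w9 != 0 on failure
 *      cbnz  w9, 1b          // retry if the exclusive store failed
 * On ARMv8.1 with LSE it can be a single "stadd x1, [x0]". */
static void atomic_add64(uint64_t *p, uint64_t v)
{
    __atomic_fetch_add(p, v, __ATOMIC_RELAXED);
}
```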

>> >>
>> >>Load (dst + off) to a register
>> >>Add src to it
>> >>Store it back to (dst + off)
>> >
>> >Not really what is needed ?
>> >
>> >See this BPF_XADD as an atomic_add() equivalent.
>>
>> I see. Thanks. The documentation doesn't say too much about "exclusive" add.
>> If so it should need load-acquire/store-release.
>
> I think doc is clear enough, but it can always be improved. Pls suggest a patch.
> It's quite hard to write a test for atomicity in test_bpf framework, so
> code review is the key. Eric, thanks for catching it!
>
Arnd Bergmann Nov. 11, 2015, 8:49 a.m. UTC | #5
On Tuesday 10 November 2015 18:52:45 Z Lim wrote:
> On Tue, Nov 10, 2015 at 4:42 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Tue, Nov 10, 2015 at 04:26:02PM -0800, Shi, Yang wrote:
> >> On 11/10/2015 4:08 PM, Eric Dumazet wrote:
> >> >On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
> >> >>aarch64 doesn't have native support for XADD instruction, implement it by
> >> >>the below instruction sequence:
> 
> aarch64 supports atomic add in ARMv8.1.
> For ARMv8(.0), please consider using LDXR/STXR sequence.

Is it worth optimizing for the 8.1 case? It would add a bit of complexity
to make the code depend on the CPU feature, but it's certainly doable.

	Arnd
Will Deacon Nov. 11, 2015, 10:24 a.m. UTC | #6
On Wed, Nov 11, 2015 at 09:49:48AM +0100, Arnd Bergmann wrote:
> On Tuesday 10 November 2015 18:52:45 Z Lim wrote:
> > On Tue, Nov 10, 2015 at 4:42 PM, Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > > On Tue, Nov 10, 2015 at 04:26:02PM -0800, Shi, Yang wrote:
> > >> On 11/10/2015 4:08 PM, Eric Dumazet wrote:
> > >> >On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
> > >> >>aarch64 doesn't have native support for XADD instruction, implement it by
> > >> >>the below instruction sequence:
> > 
> > aarch64 supports atomic add in ARMv8.1.
> > For ARMv8(.0), please consider using LDXR/STXR sequence.
> 
> Is it worth optimizing for the 8.1 case? It would add a bit of complexity
> to make the code depend on the CPU feature, but it's certainly doable.

What's the atomicity required for? Put another way, what are we racing
with (I thought bpf was single-threaded)? Do we need to worry about
memory barriers?

Apologies if these are stupid questions, but all I could find was
samples/bpf/sock_example.c and it didn't help much :(

Will
Daniel Borkmann Nov. 11, 2015, 10:42 a.m. UTC | #7
On 11/11/2015 11:24 AM, Will Deacon wrote:
> On Wed, Nov 11, 2015 at 09:49:48AM +0100, Arnd Bergmann wrote:
>> On Tuesday 10 November 2015 18:52:45 Z Lim wrote:
>>> On Tue, Nov 10, 2015 at 4:42 PM, Alexei Starovoitov
>>> <alexei.starovoitov@gmail.com> wrote:
>>>> On Tue, Nov 10, 2015 at 04:26:02PM -0800, Shi, Yang wrote:
>>>>> On 11/10/2015 4:08 PM, Eric Dumazet wrote:
>>>>>> On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
>>>>>>> aarch64 doesn't have native support for XADD instruction, implement it by
>>>>>>> the below instruction sequence:
>>>
>>> aarch64 supports atomic add in ARMv8.1.
>>> For ARMv8(.0), please consider using LDXR/STXR sequence.
>>
>> Is it worth optimizing for the 8.1 case? It would add a bit of complexity
>> to make the code depend on the CPU feature, but it's certainly doable.
>
> What's the atomicity required for? Put another way, what are we racing
> with (I thought bpf was single-threaded)? Do we need to worry about
> memory barriers?
>
> Apologies if these are stupid questions, but all I could find was
> samples/bpf/sock_example.c and it didn't help much :(

The equivalent code, in more readable restricted C syntax (which can be
compiled by llvm), can be found in samples/bpf/sockex1_kern.c. There, the
built-in __sync_fetch_and_add() is translated into a BPF_XADD insn
variant.

What you can race against is that an eBPF map can be _shared_ by
multiple eBPF programs that are attached somewhere in the system, and
they could all update a particular entry/counter from the map at the
same time.
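A minimal sketch of that update pattern (hypothetical fragment in the style of sockex1_kern.c; the map slot is a plain global here so the snippet stays self-contained, where a real program would get the pointer from bpf_map_lookup_elem()):

```c
#include <stdint.h>

/* Stand-in for one shared map slot. Several attached eBPF programs can
 * hit this update concurrently, which is why it must be atomic. LLVM
 * turns the builtin below into a BPF_XADD insn. */
static uint64_t map_slot;

static void count_bytes(uint64_t len)
{
    __sync_fetch_and_add(&map_slot, len);
}
```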

Best,
Daniel
Will Deacon Nov. 11, 2015, 11:58 a.m. UTC | #8
Hi Daniel,

On Wed, Nov 11, 2015 at 11:42:11AM +0100, Daniel Borkmann wrote:
> On 11/11/2015 11:24 AM, Will Deacon wrote:
> >On Wed, Nov 11, 2015 at 09:49:48AM +0100, Arnd Bergmann wrote:
> >>On Tuesday 10 November 2015 18:52:45 Z Lim wrote:
> >>>On Tue, Nov 10, 2015 at 4:42 PM, Alexei Starovoitov
> >>><alexei.starovoitov@gmail.com> wrote:
> >>>>On Tue, Nov 10, 2015 at 04:26:02PM -0800, Shi, Yang wrote:
> >>>>>On 11/10/2015 4:08 PM, Eric Dumazet wrote:
> >>>>>>On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
> >>>>>>>aarch64 doesn't have native support for XADD instruction, implement it by
> >>>>>>>the below instruction sequence:
> >>>
> >>>aarch64 supports atomic add in ARMv8.1.
> >>>For ARMv8(.0), please consider using LDXR/STXR sequence.
> >>
> >>Is it worth optimizing for the 8.1 case? It would add a bit of complexity
> >>to make the code depend on the CPU feature, but it's certainly doable.
> >
> >What's the atomicity required for? Put another way, what are we racing
> >with (I thought bpf was single-threaded)? Do we need to worry about
> >memory barriers?
> >
> >Apologies if these are stupid questions, but all I could find was
> >samples/bpf/sock_example.c and it didn't help much :(
> 
> The equivalent code more readable in restricted C syntax (that can be
> compiled by llvm) can be found in samples/bpf/sockex1_kern.c. So the
> built-in __sync_fetch_and_add() will be translated into a BPF_XADD
> insn variant.

Yikes, so the memory-model for BPF is based around the deprecated GCC
__sync builtins, that inherit their semantics from ia64? Any reason not
to use the C11-compatible __atomic builtins[1] as a base?

> What you can race against is that an eBPF map can be _shared_ by
> multiple eBPF programs that are attached somewhere in the system, and
> they could all update a particular entry/counter from the map at the
> same time.

Ok, so it does sound like eBPF needs to define/choose a memory-model and
I worry that riding on the back of __sync isn't necessarily the right
thing to do, particularly as it's fallen out of favour with the compiler
folks. On weakly-ordered architectures, it's also going to result in
heavy-weight barriers for all atomic operations.

Will

[1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
Daniel Borkmann Nov. 11, 2015, 12:21 p.m. UTC | #9
On 11/11/2015 12:58 PM, Will Deacon wrote:
> On Wed, Nov 11, 2015 at 11:42:11AM +0100, Daniel Borkmann wrote:
>> On 11/11/2015 11:24 AM, Will Deacon wrote:
>>> On Wed, Nov 11, 2015 at 09:49:48AM +0100, Arnd Bergmann wrote:
>>>> On Tuesday 10 November 2015 18:52:45 Z Lim wrote:
>>>>> On Tue, Nov 10, 2015 at 4:42 PM, Alexei Starovoitov
>>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>>> On Tue, Nov 10, 2015 at 04:26:02PM -0800, Shi, Yang wrote:
>>>>>>> On 11/10/2015 4:08 PM, Eric Dumazet wrote:
>>>>>>>> On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
>>>>>>>>> aarch64 doesn't have native support for XADD instruction, implement it by
>>>>>>>>> the below instruction sequence:
>>>>>
>>>>> aarch64 supports atomic add in ARMv8.1.
>>>>> For ARMv8(.0), please consider using LDXR/STXR sequence.
>>>>
>>>> Is it worth optimizing for the 8.1 case? It would add a bit of complexity
>>>> to make the code depend on the CPU feature, but it's certainly doable.
>>>
>>> What's the atomicity required for? Put another way, what are we racing
>>> with (I thought bpf was single-threaded)? Do we need to worry about
>>> memory barriers?
>>>
>>> Apologies if these are stupid questions, but all I could find was
>>> samples/bpf/sock_example.c and it didn't help much :(
>>
>> The equivalent code more readable in restricted C syntax (that can be
>> compiled by llvm) can be found in samples/bpf/sockex1_kern.c. So the
>> built-in __sync_fetch_and_add() will be translated into a BPF_XADD
>> insn variant.
>
> Yikes, so the memory-model for BPF is based around the deprecated GCC
> __sync builtins, that inherit their semantics from ia64? Any reason not
> to use the C11-compatible __atomic builtins[1] as a base?

Hmm, gcc doesn't have an eBPF compiler backend, so this won't work on
gcc at all. The eBPF backend in LLVM recognizes the __sync_fetch_and_add()
keyword and maps that to a BPF_XADD version (BPF_W or BPF_DW). In the
interpreter (__bpf_prog_run()), as Eric mentioned, this maps to atomic_add()
and atomic64_add(), respectively. So the struct bpf_insn prog[] you saw
from sock_example.c can be regarded as one possible equivalent program
section output from the compiler.
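A hedged user-space model of that interpreter step (the real code lives in __bpf_prog_run(); atomic64_add() is modelled here with a relaxed __atomic builtin, and the insn struct is reduced to the one field used):

```c
#include <stdint.h>

/* BPF_STX | BPF_XADD | BPF_DW, modelled in user space: add SRC into
 * *(uint64_t *)(DST + insn->off) atomically, discarding the result,
 * just as the kernel's atomic64_add() does. */
struct bpf_insn_model { int16_t off; };

static void stx_xadd_dw(const struct bpf_insn_model *insn,
                        char *dst_reg, uint64_t src_reg)
{
    __atomic_fetch_add((uint64_t *)(dst_reg + insn->off), src_reg,
                       __ATOMIC_RELAXED);
}
```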

>> What you can race against is that an eBPF map can be _shared_ by
>> multiple eBPF programs that are attached somewhere in the system, and
>> they could all update a particular entry/counter from the map at the
>> same time.
>
> Ok, so it does sound like eBPF needs to define/choose a memory-model and
> I worry that riding on the back of __sync isn't necessarily the right
> thing to do, particularly as its fallen out of favour with the compiler
> folks. On weakly-ordered architectures, it's also going to result in
> heavy-weight barriers for all atomic operations.
>
> Will
>
> [1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
Will Deacon Nov. 11, 2015, 12:38 p.m. UTC | #10
On Wed, Nov 11, 2015 at 01:21:04PM +0100, Daniel Borkmann wrote:
> On 11/11/2015 12:58 PM, Will Deacon wrote:
> >On Wed, Nov 11, 2015 at 11:42:11AM +0100, Daniel Borkmann wrote:
> >>On 11/11/2015 11:24 AM, Will Deacon wrote:
> >>>On Wed, Nov 11, 2015 at 09:49:48AM +0100, Arnd Bergmann wrote:
> >>>>On Tuesday 10 November 2015 18:52:45 Z Lim wrote:
> >>>>>On Tue, Nov 10, 2015 at 4:42 PM, Alexei Starovoitov
> >>>>><alexei.starovoitov@gmail.com> wrote:
> >>>>>>On Tue, Nov 10, 2015 at 04:26:02PM -0800, Shi, Yang wrote:
> >>>>>>>On 11/10/2015 4:08 PM, Eric Dumazet wrote:
> >>>>>>>>On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
> >>>>>>>>>aarch64 doesn't have native support for XADD instruction, implement it by
> >>>>>>>>>the below instruction sequence:
> >>>>>
> >>>>>aarch64 supports atomic add in ARMv8.1.
> >>>>>For ARMv8(.0), please consider using LDXR/STXR sequence.
> >>>>
> >>>>Is it worth optimizing for the 8.1 case? It would add a bit of complexity
> >>>>to make the code depend on the CPU feature, but it's certainly doable.
> >>>
> >>>What's the atomicity required for? Put another way, what are we racing
> >>>with (I thought bpf was single-threaded)? Do we need to worry about
> >>>memory barriers?
> >>>
> >>>Apologies if these are stupid questions, but all I could find was
> >>>samples/bpf/sock_example.c and it didn't help much :(
> >>
> >>The equivalent code more readable in restricted C syntax (that can be
> >>compiled by llvm) can be found in samples/bpf/sockex1_kern.c. So the
> >>built-in __sync_fetch_and_add() will be translated into a BPF_XADD
> >>insn variant.
> >
> >Yikes, so the memory-model for BPF is based around the deprecated GCC
> >__sync builtins, that inherit their semantics from ia64? Any reason not
> >to use the C11-compatible __atomic builtins[1] as a base?
> 
> Hmm, gcc doesn't have an eBPF compiler backend, so this won't work on
> gcc at all. The eBPF backend in LLVM recognizes the __sync_fetch_and_add()
> keyword and maps that to a BPF_XADD version (BPF_W or BPF_DW). In the
> interpreter (__bpf_prog_run()), as Eric mentioned, this maps to atomic_add()
> and atomic64_add(), respectively. So the struct bpf_insn prog[] you saw
> from sock_example.c can be regarded as one possible equivalent program
> section output from the compiler.

Ok, so if I understand you correctly, then __sync_fetch_and_add() has
different semantics depending on the backend target. That seems counter
to the LLVM atomics Documentation:

  http://llvm.org/docs/Atomics.html

which specifically calls out the __sync_* primitives as being
sequentially-consistent and requiring barriers on ARM (which isn't the
case for atomic[64]_add in the kernel).

If we re-use the __sync_* naming scheme in the source language, I don't
think we can overlay our own semantics in the backend. The
__sync_fetch_and_add primitive is also expected to return the old value,
which doesn't appear to be the case for BPF_XADD.

Will
Peter Zijlstra Nov. 11, 2015, 12:58 p.m. UTC | #11
On Wed, Nov 11, 2015 at 12:38:31PM +0000, Will Deacon wrote:
> > Hmm, gcc doesn't have an eBPF compiler backend, so this won't work on
> > gcc at all. The eBPF backend in LLVM recognizes the __sync_fetch_and_add()
> > keyword and maps that to a BPF_XADD version (BPF_W or BPF_DW). In the
> > interpreter (__bpf_prog_run()), as Eric mentioned, this maps to atomic_add()
> > and atomic64_add(), respectively. So the struct bpf_insn prog[] you saw
> > from sock_example.c can be regarded as one possible equivalent program
> > section output from the compiler.
> 
> Ok, so if I understand you correctly, then __sync_fetch_and_add() has
> different semantics depending on the backend target. That seems counter
> to the LLVM atomics Documentation:
> 
>   http://llvm.org/docs/Atomics.html
> 
> which specifically calls out the __sync_* primitives as being
> sequentially-consistent and requiring barriers on ARM (which isn't the
> case for atomic[64]_add in the kernel).
> 
> If we re-use the __sync_* naming scheme in the source language, I don't
> think we can overlay our own semantics in the backend. The
> __sync_fetch_and_add primitive is also expected to return the old value,
> which doesn't appear to be the case for BPF_XADD.

Yikes. That's double fail. Please don't do this.

If you use the __sync stuff (and I agree with Will, you should not) it
really _SHOULD_ be sequentially consistent, which means full barriers
all over the place.

And if you name something XADD (exchange and add, or fetch-add) then it
had better return the previous value.

atomic*_add() does neither.
Daniel Borkmann Nov. 11, 2015, 3:52 p.m. UTC | #12
On 11/11/2015 01:58 PM, Peter Zijlstra wrote:
> On Wed, Nov 11, 2015 at 12:38:31PM +0000, Will Deacon wrote:
>>> Hmm, gcc doesn't have an eBPF compiler backend, so this won't work on
>>> gcc at all. The eBPF backend in LLVM recognizes the __sync_fetch_and_add()
>>> keyword and maps that to a BPF_XADD version (BPF_W or BPF_DW). In the
>>> interpreter (__bpf_prog_run()), as Eric mentioned, this maps to atomic_add()
>>> and atomic64_add(), respectively. So the struct bpf_insn prog[] you saw
>>> from sock_example.c can be regarded as one possible equivalent program
>>> section output from the compiler.
>>
>> Ok, so if I understand you correctly, then __sync_fetch_and_add() has
>> different semantics depending on the backend target. That seems counter
>> to the LLVM atomics Documentation:
>>
>>    http://llvm.org/docs/Atomics.html
>>
>> which specifically calls out the __sync_* primitives as being
>> sequentially-consistent and requiring barriers on ARM (which isn't the
>> case for atomic[64]_add in the kernel).
>>
>> If we re-use the __sync_* naming scheme in the source language, I don't
>> think we can overlay our own semantics in the backend. The
>> __sync_fetch_and_add primitive is also expected to return the old value,
>> which doesn't appear to be the case for BPF_XADD.
>
> Yikes. That's double fail. Please don't do this.
>
> If you use the __sync stuff (and I agree with Will, you should not) it
> really _SHOULD_ be sequentially consistent, which means full barriers
> all over the place.
>
> And if you name something XADD (exchange and add, or fetch-add) then it
> had better return the previous value.
>
> atomic*_add() does neither.

unsigned int ui;
unsigned long long ull;

void foo(void)
{
   (void) __sync_fetch_and_add(&ui, 1);
   (void) __sync_fetch_and_add(&ull, 1);
}

So the clang front-end translates this snippet into the following
intermediate representation ...

clang test.c -S -emit-llvm -o -
[...]
define void @foo() #0 {
   %1 = atomicrmw add i32* @ui, i32 1 seq_cst
   %2 = atomicrmw add i64* @ull, i64 1 seq_cst
   ret void
}
[...]

... which, if I see this correctly, the BPF target then maps (for
atomicrmw add {i32,i64}) into BPF_XADD as mentioned:

// Atomics
class XADD<bits<2> SizeOp, string OpcodeStr, PatFrag OpNode>
     : InstBPF<(outs GPR:$dst), (ins MEMri:$addr, GPR:$val),
               !strconcat(OpcodeStr, "\t$dst, $addr, $val"),
               [(set GPR:$dst, (OpNode ADDRri:$addr, GPR:$val))]> {
   bits<3> mode;
   bits<2> size;
   bits<4> src;
   bits<20> addr;

   let Inst{63-61} = mode;
   let Inst{60-59} = size;
   let Inst{51-48} = addr{19-16}; // base reg
   let Inst{55-52} = src;
   let Inst{47-32} = addr{15-0}; // offset

   let mode = 6;     // BPF_XADD
   let size = SizeOp;
   let BPFClass = 3; // BPF_STX
}

let Constraints = "$dst = $val" in {
def XADD32 : XADD<0, "xadd32", atomic_load_add_32>;
def XADD64 : XADD<3, "xadd64", atomic_load_add_64>;
// undefined def XADD16 : XADD<1, "xadd16", atomic_load_add_16>;
// undefined def XADD8  : XADD<2, "xadd8", atomic_load_add_8>;
}

I played around a bit with eBPF code, assigning the __sync_fetch_and_add()
return value to a variable and dumping it to the trace pipe, or using it as
the return code. llvm compiles it (with the result assignment) and it looks like:

[...]
206: (b7) r3 = 3
207: (db) lock *(u64 *)(r0 +0) += r3
208: (bf) r1 = r10
209: (07) r1 += -16
210: (b7) r2 = 10
211: (85) call 6 // r3 dumped here
[...]

[...]
206: (b7) r5 = 3
207: (db) lock *(u64 *)(r0 +0) += r5
208: (bf) r1 = r10
209: (07) r1 += -16
210: (b7) r2 = 10
211: (b7) r3 = 43
212: (b7) r4 = 42
213: (85) call 6 // r5 dumped here
[...]

[...]
11: (b7) r0 = 3
12: (db) lock *(u64 *)(r1 +0) += r0
13: (95) exit // r0 returned here
[...]

It seems that we 'get back' the value (== 3 here, in r3, r5, r0) that
we're adding, at least judging by the generated register assignments.
Hmm, the semantic differences of the bpf target should be documented
somewhere, so that people writing eBPF programs are aware of them.

Best,
Daniel
Will Deacon Nov. 11, 2015, 4:23 p.m. UTC | #13
Hi Daniel,

Thanks for investigating this further.

On Wed, Nov 11, 2015 at 04:52:00PM +0100, Daniel Borkmann wrote:
> I played a bit around with eBPF code to assign the __sync_fetch_and_add()
> return value to a var and dump it to trace pipe, or use it as return code.
> llvm compiles it (with the result assignment) and it looks like:
> 
> [...]
> 206: (b7) r3 = 3
> 207: (db) lock *(u64 *)(r0 +0) += r3
> 208: (bf) r1 = r10
> 209: (07) r1 += -16
> 210: (b7) r2 = 10
> 211: (85) call 6 // r3 dumped here
> [...]
> 
> [...]
> 206: (b7) r5 = 3
> 207: (db) lock *(u64 *)(r0 +0) += r5
> 208: (bf) r1 = r10
> 209: (07) r1 += -16
> 210: (b7) r2 = 10
> 211: (b7) r3 = 43
> 212: (b7) r4 = 42
> 213: (85) call 6 // r5 dumped here
> [...]
> 
> [...]
> 11: (b7) r0 = 3
> 12: (db) lock *(u64 *)(r1 +0) += r0
> 13: (95) exit // r0 returned here
> [...]
> 
> What it seems is that we 'get back' the value (== 3 here in r3, r5, r0)
> that we're adding, at least that's what seems to be generated wrt
> register assignments. Hmm, the semantic differences of bpf target
> should be documented somewhere for people writing eBPF programs to
> be aware of.

If we're going to document it, a bug tracker might be a good place to
start. The behaviour, as it stands, is broken wrt the definition of the
__sync primitives. That is, there is no way to build __sync_fetch_and_add
out of BPF_XADD without changing its semantics.

We could fix this by either:

(1) Defining BPF_XADD to match __sync_fetch_and_add (including memory
    barriers).

(2) Introducing some new BPF_ atomics, that map to something like the
    C11 __atomic builtins and deprecating BPF_XADD in favour of these.

(3) Introducing new source-language intrinsics to match what BPF can do
    (unlikely to be popular).

As it stands, I'm not especially keen on adding BPF_XADD to the arm64
JIT backend until we have at least (1) and preferably (2) as well.

Will
Alexei Starovoitov Nov. 11, 2015, 5:27 p.m. UTC | #14
On Wed, Nov 11, 2015 at 04:23:41PM +0000, Will Deacon wrote:
> 
> If we're going to document it, a bug tracker might be a good place to
> start. The behaviour, as it stands, is broken wrt the definition of the
> __sync primitives. That is, there is no way to build __sync_fetch_and_add
> out of BPF_XADD without changing its semantics.

BPF_XADD == atomic_add() in kernel. Period.
We are not going to deprecate it or introduce something else.
The semantics of __sync* or atomics in the C standard and/or gcc/llvm have
nothing to do with this.
The arm64 JIT needs to JIT the bpf_xadd insn as the equivalent of
atomic_add(), which is 'stadd' in armv8.1.
The cpu check can be done by the jit and for older cpus just fall back
to the interpreter. Trivial.

> We could fix this by either:
> 
> (1) Defining BPF_XADD to match __sync_fetch_and_add (including memory
>     barriers).

nope.

> (2) Introducing some new BPF_ atomics, that map to something like the
>     C11 __atomic builtins and deprecating BPF_XADD in favour of these.

nope.

> (3) Introducing new source-language intrinsics to match what BPF can do
>     (unlikely to be popular).

llvm's __sync intrinsic is used temporarily until we have time to do a
new intrinsic in llvm that matches the kernel's atomic_add() properly.
It will be done similar to the llvm-bpf load_byte/word intrinsics.
Note that we've been hiding it under a lock_xadd() wrapper, like here:
https://github.com/iovisor/bcc/blob/master/examples/networking/tunnel_monitor/monitor.c#L130
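The wrapper referenced above is essentially a cast-to-void around the builtin (sketch; see the bcc sources for the actual definition), which keeps programs from depending on a fetched value that BPF_XADD does not provide:

```c
#include <stdint.h>

/* Sketch of the bcc-style wrapper: discard the fetched value so user
 * code relies only on the add itself, matching BPF_XADD semantics. */
#define lock_xadd(ptr, val) ((void)__sync_fetch_and_add((ptr), (val)))

static uint64_t counter;

static void touch(void)
{
    lock_xadd(&counter, 1);
}
```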

David Miller Nov. 11, 2015, 5:35 p.m. UTC | #15
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Date: Wed, 11 Nov 2015 09:27:00 -0800

> BPF_XADD == atomic_add() in kernel. period.
> we are not going to deprecate it or introduce something else.

Agreed, it makes no sense to try and tie C99 or whatever atomic
semantics to something that is already clearly defined to have
exactly kernel atomic_add() semantics.
Will Deacon Nov. 11, 2015, 5:44 p.m. UTC | #16
On Wed, Nov 11, 2015 at 12:35:48PM -0500, David Miller wrote:
> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Date: Wed, 11 Nov 2015 09:27:00 -0800
> 
> > BPF_XADD == atomic_add() in kernel. period.
> > we are not going to deprecate it or introduce something else.
> 
> Agreed, it makes no sense to try and tie C99 or whatever atomic
> semantics to something that is already clearly defined to have
> exactly kernel atomic_add() semantics.

... and which is emitted by LLVM when asked to compile __sync_fetch_and_add,
which has clearly defined (yet conflicting) semantics.

If the discrepancy is in LLVM (and it sounds like it is), then I'll raise
a bug over there instead.

Will
Peter Zijlstra Nov. 11, 2015, 5:57 p.m. UTC | #17
On Wed, Nov 11, 2015 at 12:35:48PM -0500, David Miller wrote:
> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Date: Wed, 11 Nov 2015 09:27:00 -0800
> 
> > BPF_XADD == atomic_add() in kernel. period.
> > we are not going to deprecate it or introduce something else.
> 
> Agreed, it makes no sense to try and tie C99 or whatever atomic
> semantics to something that is already clearly defined to have
> exactly kernel atomic_add() semantics.

Dave, this really doesn't make any sense to me. __sync primitives have
well defined semantics and (e)BPF is violating this.

Furthermore, the fetch_and_add (or XADD) name has well defined
semantics, which (e)BPF also violates.

Atomicity is hard enough as it is; backends giving random interpretations
to these primitives isn't helping anybody.

It also baffles me that Alexei is seemingly unwilling to change/rev the
(e)BPF instructions, which would be invisible to the regular user, yet
does want to change the language itself, which will impact all
'scripts'.

Alexei Starovoitov Nov. 11, 2015, 6:11 p.m. UTC | #18
On Wed, Nov 11, 2015 at 06:57:41PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 11, 2015 at 12:35:48PM -0500, David Miller wrote:
> > From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Date: Wed, 11 Nov 2015 09:27:00 -0800
> > 
> > > BPF_XADD == atomic_add() in kernel. period.
> > > we are not going to deprecate it or introduce something else.
> > 
> > Agreed, it makes no sense to try and tie C99 or whatever atomic
> > semantics to something that is already clearly defined to have
> > exactly kernel atomic_add() semantics.
> 
> Dave, this really doesn't make any sense to me. __sync primitives have
> well defined semantics and (e)BPF is violating this.

bpf_xadd was never meant to be a __sync_fetch_and_add equivalent.
From day one it was meant to be atomic_add() as the kernel does it.
I did piggyback on __sync in the llvm backend because it was the quick
and dirty way to move forward.
In retrospect I should have introduced a clean intrinsic for that instead,
but it's not too late to do it now. User space we can change at any time,
unlike the kernel.

> Furthermore, the fetch_and_add (or XADD) name has well defined
> semantics, which (e)BPF also violates.

bpf_xadd also was never meant to 'fetch'. It has returned void from the beginning.

> Atomicy is hard enough as it is, backends giving random interpretations
> to them isn't helping anybody.

no randomness. bpf_xadd == atomic_add() in kernel.
imo that is the simplest and cleanest interpretation one can have, no?

> It also baffles me that Alexei is seemingly unwilling to change/rev the
> (e)BPF instructions, which would be invisible to the regular user, he
> does want to change the language itself, which will impact all
> 'scripts'.

well, we cannot change it in the kernel because it's ABI.
I'm not against adding new insns. We definitely can, but let's figure out why.
Is anything broken? No. So what new insns make sense?
Add a new one that does 'fetch_and_add'? What is the real use case it
will be used for?
Adding new intrinsic to llvm is not a big deal. I'll add it as soon
as I have time to work on it or if somebody beats me to it I would be
glad to test it and apply it.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra Nov. 11, 2015, 6:31 p.m. UTC | #19
On Wed, Nov 11, 2015 at 10:11:33AM -0800, Alexei Starovoitov wrote:
> On Wed, Nov 11, 2015 at 06:57:41PM +0100, Peter Zijlstra wrote:
> > On Wed, Nov 11, 2015 at 12:35:48PM -0500, David Miller wrote:
> > > From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > > Date: Wed, 11 Nov 2015 09:27:00 -0800
> > > 
> > > > BPF_XADD == atomic_add() in kernel. period.
> > > > we are not going to deprecate it or introduce something else.
> > > 
> > > Agreed, it makes no sense to try and tie C99 or whatever atomic
> > > semantics to something that is already clearly defined to have
> > > exactly kernel atomic_add() semantics.
> > 
> > Dave, this really doesn't make any sense to me. __sync primitives have
> > well defined semantics and (e)BPF is violating this.
> 
> bpf_xadd was never meant to be a __sync_fetch_and_add equivalent.
> From day one it was meant to be atomic_add() as the kernel does it.
> I did piggy back on __sync in the llvm backend because it was the quick
> and dirty way to move forward.
> In retrospect I should have introduced a clean intrinsic for that instead,
> but it's not too late to do it now. User space we can change at any time,
> unlike the kernel.

I would argue that breaking userspace (language in this case) is equally
bad. Programs that used to work will now no longer work.

> > Furthermore, the fetch_and_add (or XADD) name has well defined
> > semantics, which (e)BPF also violates.
> 
> bpf_xadd also wasn't meant to be a 'fetch'. It was void return from the beginning.

Then why the 'X'? The XADD name, does and always has meant: eXchange-ADD,
this means it must have a return value.

You using the XADD name for something that is not in fact XADD is just
wrong.

> > Atomicity is hard enough as it is, backends giving random interpretations
> > to them isn't helping anybody.
> 
> no randomness. 

You mean every other backend translating __sync_fetch_and_add()
differently than you isn't random on your part?

> bpf_xadd == atomic_add() in kernel.
> imo that is the simplest and cleanest interpretation one can have, no?

Wrong though, if you'd named it BPF_ADD, sure, XADD, not so much. That
is 'randomly' co-opting something that has well defined meaning and
semantics with something else.

> > It also baffles me that Alexei is seemingly unwilling to change/rev the
> > (e)BPF instructions, which would be invisible to the regular user, he
> > does want to change the language itself, which will impact all
> > 'scripts'.
> 
> well, we cannot change it in kernel because it's ABI.

You can always rev it. Introduce a new set, and wait for users of the
old set to die, then remove it. We do that all the time with Linux ABI.

> I'm not against adding new insns. We definitely can, but let's figure out why?
> Is anything broken? No. 

Yes, __sync_fetch_and_add() is broken when pulled through the eBPF
backend.

> So what new insns make sense?

Depends a bit on how fancy you want to go. If you want to support weakly
ordered architectures at full speed you'll need more (and more
complexity) than if you decide to not go that way.

The simplest option would be a fully ordered compare-and-swap operation.
That is enough to implement everything else (at a cost). The other
extreme is a weak ll/sc with an optimizer pass recognising various forms
to translate into 'better' native instructions.

> Add new one that does 'fetch_and_add' ? What is the real use case it
> will be used for?

Look at all the atomic_{add,dec}_return*() users in the kernel. A typical
example would be a reader-writer lock implementation. See
include/asm-generic/rwsem.h for examples.

> Adding new intrinsic to llvm is not a big deal. I'll add it as soon
> as I have time to work on it or if somebody beats me to it I would be
> glad to test it and apply it.

This isn't a speed coding contest. You want to think about this
properly.
Peter Zijlstra Nov. 11, 2015, 6:41 p.m. UTC | #20
On Wed, Nov 11, 2015 at 07:31:28PM +0100, Peter Zijlstra wrote:
> > Adding new intrinsic to llvm is not a big deal. I'll add it as soon
> > as I have time to work on it or if somebody beats me to it I would be
> > glad to test it and apply it.
> 
> This isn't a speed coding contest. You want to think about this
> properly.

That is, I don't think you want to go add LLVM intrinsics at all. You
want to piggy back on the memory model work done by the C/C++11 people.

What you want to think about is what the memory model of your virtual
machine is and how many instructions you want to expose for that.

Concurrency is a right pain; a little time and effort now will save
heaps of pain down the road.
Peter Zijlstra Nov. 11, 2015, 6:44 p.m. UTC | #21
On Wed, Nov 11, 2015 at 07:31:28PM +0100, Peter Zijlstra wrote:
> > Add new one that does 'fetch_and_add' ? What is the real use case it
> > will be used for?
> 
> Look at all the atomic_{add,dec}_return*() users in the kernel. A typical
> > example would be a reader-writer lock implementation. See
> include/asm-generic/rwsem.h for examples.

Maybe a better example would be refcounting, where you free on 0.

	if (!fetch_add(&obj->ref, -1))
		free(obj);


Will Deacon Nov. 11, 2015, 6:46 p.m. UTC | #22
On Wed, Nov 11, 2015 at 10:11:33AM -0800, Alexei Starovoitov wrote:
> On Wed, Nov 11, 2015 at 06:57:41PM +0100, Peter Zijlstra wrote:
> > On Wed, Nov 11, 2015 at 12:35:48PM -0500, David Miller wrote:
> > > From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > > Date: Wed, 11 Nov 2015 09:27:00 -0800
> > > 
> > > > BPF_XADD == atomic_add() in kernel. period.
> > > > we are not going to deprecate it or introduce something else.
> > > 
> > > Agreed, it makes no sense to try and tie C99 or whatever atomic
> > > semantics to something that is already clearly defined to have
> > > exactly kernel atomic_add() semantics.
> > 
> > Dave, this really doesn't make any sense to me. __sync primitives have
> > well defined semantics and (e)BPF is violating this.
> 
> bpf_xadd was never meant to be a __sync_fetch_and_add equivalent.
> From day one it was meant to be atomic_add() as the kernel does it.
> I did piggy back on __sync in the llvm backend because it was the quick
> and dirty way to move forward.
> In retrospect I should have introduced a clean intrinsic for that instead,
> but it's not too late to do it now. User space we can change at any time,
> unlike the kernel.

But it's not just "user space", it's the source language definition!
I also don't see how you can change it now, without simply rejecting
the __sync primitives outright.

> > Furthermore, the fetch_and_add (or XADD) name has well defined
> > semantics, which (e)BPF also violates.
> 
> bpf_xadd also wasn't meant to be a 'fetch'. It was void return from the beginning.

Right, so it's just a misnomer.

> > Atomicity is hard enough as it is, backends giving random interpretations
> > to them isn't helping anybody.
> 
> no randomness. bpf_xadd == atomic_add() in kernel.
> imo that is the simplest and cleanest interpretation one can have, no?

I don't really mind, as long as there is a semantic that everybody agrees
on. Really, I just want this to be consistent because memory models are
a PITA enough without having multiple interpretations flying around.

> > It also baffles me that Alexei is seemingly unwilling to change/rev the
> > (e)BPF instructions, which would be invisible to the regular user, he
> > does want to change the language itself, which will impact all
> > 'scripts'.
> 
> well, we cannot change it in kernel because it's ABI.
> I'm not against adding new insns. We definitely can, but let's figure out why?
> Is anything broken? No. So what new insns make sense?

If you end up needing a suite of atomics, I would suggest the __atomic
builtins because they are likely to be more portable and more flexible
than trying to use the kernel memory model outside of the environment
for which it was developed. However, I agree with you that we can cross
that bridge when we get there.

> Adding new intrinsic to llvm is not a big deal. I'll add it as soon
> as I have time to work on it or if somebody beats me to it I would be
> glad to test it and apply it.

I'm more interested in what you do about the existing intrinsic. Anyway,
I'll raise a ticket against LLVM so that they're aware (and maybe
somebody else will fix it :).

Will
Daniel Borkmann Nov. 11, 2015, 6:50 p.m. UTC | #23
On 11/11/2015 07:31 PM, Peter Zijlstra wrote:
> On Wed, Nov 11, 2015 at 10:11:33AM -0800, Alexei Starovoitov wrote:
>> On Wed, Nov 11, 2015 at 06:57:41PM +0100, Peter Zijlstra wrote:
>>> On Wed, Nov 11, 2015 at 12:35:48PM -0500, David Miller wrote:
>>>> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>>>> Date: Wed, 11 Nov 2015 09:27:00 -0800
>>>>
>>>>> BPF_XADD == atomic_add() in kernel. period.
>>>>> we are not going to deprecate it or introduce something else.
>>>>
>>>> Agreed, it makes no sense to try and tie C99 or whatever atomic
>>>> semantics to something that is already clearly defined to have
>>>> exactly kernel atomic_add() semantics.
>>>
>>> Dave, this really doesn't make any sense to me. __sync primitives have
>>> well defined semantics and (e)BPF is violating this.
>>
>> bpf_xadd was never meant to be a __sync_fetch_and_add equivalent.
>> From day one it was meant to be atomic_add() as the kernel does it.
>> I did piggy back on __sync in the llvm backend because it was the quick
>> and dirty way to move forward.
>> In retrospect I should have introduced a clean intrinsic for that instead,
>> but it's not too late to do it now. User space we can change at any time,
>> unlike the kernel.
>
> I would argue that breaking userspace (language in this case) is equally
> bad. Programs that used to work will now no longer work.

Well, on that note, it's not like you just change the target to bpf in your
Makefile and can compile (& load into the kernel) anything you want with it.
You do have to write small, restricted programs from scratch for a specific
use-case with the limited set of helper functions and intrinsics that are
available from the kernel. So I don't think that "Programs that used to work
will now no longer work." holds if you regard it as such.

>>> Furthermore, the fetch_and_add (or XADD) name has well defined
>>> semantics, which (e)BPF also violates.
>>
>> bpf_xadd also wasn't meant to be a 'fetch'. It was void return from the beginning.
>
> Then why the 'X'? The XADD name, does and always has meant: eXchange-ADD,
> this means it must have a return value.
>
> You using the XADD name for something that is not in fact XADD is just
> wrong.
>
>>> Atomicity is hard enough as it is, backends giving random interpretations
>>> to them isn't helping anybody.
>>
>> no randomness.
>
> You mean every other backend translating __sync_fetch_and_add()
> differently than you isn't random on your part?
>
>> bpf_xadd == atomic_add() in kernel.
>> imo that is the simplest and cleanest interpretation one can have, no?
>
> Wrong though, if you'd named it BPF_ADD, sure, XADD, not so much. That
> is 'randomly' co-opting something that has well defined meaning and
> semantics with something else.
>
>>> It also baffles me that Alexei is seemingly unwilling to change/rev the
>>> (e)BPF instructions, which would be invisible to the regular user, he
>>> does want to change the language itself, which will impact all
>>> 'scripts'.
>>
>> well, we cannot change it in kernel because it's ABI.
>
> You can always rev it. Introduce a new set, and wait for users of the
> old set to die, then remove it. We do that all the time with Linux ABI.
>
>> I'm not against adding new insns. We definitely can, but let's figure out why?
>> Is anything broken? No.
>
> Yes, __sync_fetch_and_add() is broken when pulled through the eBPF
> backend.
>
>> So what new insns make sense?
>
> Depends a bit on how fancy you want to go. If you want to support weakly
> ordered architectures at full speed you'll need more (and more
> complexity) than if you decide to not go that way.
>
> The simplest option would be a fully ordered compare-and-swap operation.
> That is enough to implement everything else (at a cost). The other
> extreme is a weak ll/sc with an optimizer pass recognising various forms
> to translate into 'better' native instructions.
>
>> Add new one that does 'fetch_and_add' ? What is the real use case it
>> will be used for?
>
> Look at all the atomic_{add,dec}_return*() users in the kernel. A typical
> example would be a reader-writer lock implementation. See
> include/asm-generic/rwsem.h for examples.
>
>> Adding new intrinsic to llvm is not a big deal. I'll add it as soon
>> as I have time to work on it or if somebody beats me to it I would be
>> glad to test it and apply it.
>
> This isn't a speed coding contest. You want to think about this
> properly.
>

Peter Zijlstra Nov. 11, 2015, 6:54 p.m. UTC | #24
On Wed, Nov 11, 2015 at 07:44:27PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 11, 2015 at 07:31:28PM +0100, Peter Zijlstra wrote:
> > > Add new one that does 'fetch_and_add' ? What is the real use case it
> > > will be used for?
> > 
> > Look at all the atomic_{add,dec}_return*() users in the kernel. A typical
> > example would be a reader-writer lock implementations. See
> > include/asm-generic/rwsem.h for examples.
> 
> Maybe a better example would be refcounting, where you free on 0.
> 
> 	if (!fetch_add(&obj->ref, -1))
> 		free(obj);

Urgh, too used to the atomic_add_return(), which returns post op. That
wants to be:

	if (fetch_add(&obj->ref, -1) == 1)
		free(obj);

Note that I would very much recommend _against_ encoding the post-op
thing in instructions. It works for reversible operations (like add) but
is pointless for irreversible operations (like or).

That is, given or_return(), you cannot reconstruct the state
prior to the operation, so or_return() provides less information than
fetch_or().


David Miller Nov. 11, 2015, 7:01 p.m. UTC | #25
From: Will Deacon <will.deacon@arm.com>
Date: Wed, 11 Nov 2015 17:44:01 +0000

> On Wed, Nov 11, 2015 at 12:35:48PM -0500, David Miller wrote:
>> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>> Date: Wed, 11 Nov 2015 09:27:00 -0800
>> 
>> > BPF_XADD == atomic_add() in kernel. period.
>> > we are not going to deprecate it or introduce something else.
>> 
>> Agreed, it makes no sense to try and tie C99 or whatever atomic
>> semantics to something that is already clearly defined to have
>> exactly kernel atomic_add() semantics.
> 
> ... and which is emitted by LLVM when asked to compile __sync_fetch_and_add,
> which has clearly defined (yet conflicting) semantics.

Alexei clearly stated that he knows about this issue and will fully
fix this up in LLVM.

What more do you need to hear from him once he's stated that he is
aware and is working on it?  Meanwhile you should make your JIT emit
what is expected, rather than arguing to change the semantics.

Thanks.
David Miller Nov. 11, 2015, 7:01 p.m. UTC | #26
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Date: Wed, 11 Nov 2015 10:11:33 -0800

> bpf_xadd was never meant to be a __sync_fetch_and_add equivalent.
> From day one it was meant to be atomic_add() as the kernel does it.

+1

> I did piggy back on __sync in the llvm backend because it was the quick
> and dirty way to move forward.
> In retrospect I should have introduced a clean intrinsic for that instead,
> but it's not too late to do it now. User space we can change at any time,
> unlike the kernel.

+1
David Miller Nov. 11, 2015, 7:04 p.m. UTC | #27
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Wed, 11 Nov 2015 19:50:15 +0100

> Well, on that note, it's not like you just change the target to bpf
> in your Makefile and can compile (& load into the kernel) anything
> you want with it.  You do have to write small, restricted programs
> from scratch for a specific use-case with the limited set of helper
> functions and intrinsics that are available from the kernel. So I
> don't think that "Programs that used to work will now no longer
> work." holds if you regard it as such.

+1

Strict C language semantics do not apply here at all; we are talking
about purposefully built modules of "C like" code that have any
semantics we want and make the most sense for us.

Maybe BPF_XADD is unfortunately named, but this is tangential to
our ability to choose what atomic operations mean and what semantics
they match up to.
Peter Zijlstra Nov. 11, 2015, 7:23 p.m. UTC | #28
On Wed, Nov 11, 2015 at 07:50:15PM +0100, Daniel Borkmann wrote:
> Well, on that note, it's not like you just change the target to bpf in your
> Makefile and can compile (& load into the kernel) anything you want with it.
> You do have to write small, restricted programs from scratch for a specific
> use-case with the limited set of helper functions and intrinsics that are
> available from the kernel. So I don't think that "Programs that used to work
> will now no longer work." holds if you regard it as such.

So I don't get this argument. If everything is so targeted, then why are
the BPF instructions an ABI.

If OTOH you're expected to be able to transfer these small proglets,
then too I would expect to transfer the source of these proglets.

You cannot argue both ways.
Daniel Borkmann Nov. 11, 2015, 7:41 p.m. UTC | #29
On 11/11/2015 08:23 PM, Peter Zijlstra wrote:
> On Wed, Nov 11, 2015 at 07:50:15PM +0100, Daniel Borkmann wrote:
>> Well, on that note, it's not like you just change the target to bpf in your
>> Makefile and can compile (& load into the kernel) anything you want with it.
>> You do have to write small, restricted programs from scratch for a specific
>> use-case with the limited set of helper functions and intrinsics that are
>> available from the kernel. So I don't think that "Programs that used to work
>> will now no longer work." holds if you regard it as such.
>
> So I don't get this argument. If everything is so targeted, then why are
> the BPF instructions an ABI.
>
> If OTOH you're expected to be able to transfer these small proglets,
> then too I would expect to transfer the source of these proglets.
>
> You cannot argue both ways.

Ohh, I think we were talking past each other. ;) So, yeah, you'd likely need
to add new intrinsics that then map to the existing BPF_XADD instructions,
and perhaps emit a warning when __sync_fetch_and_add() is being used, to
advise the developer to switch to the new intrinsics instead. From the kernel
ABI PoV nothing would change.
Alexei Starovoitov Nov. 11, 2015, 7:55 p.m. UTC | #30
On Wed, Nov 11, 2015 at 07:54:15PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 11, 2015 at 07:44:27PM +0100, Peter Zijlstra wrote:
> > On Wed, Nov 11, 2015 at 07:31:28PM +0100, Peter Zijlstra wrote:
> > > > Add new one that does 'fetch_and_add' ? What is the real use case it
> > > > will be used for?
> > > 
> > > Look at all the atomic_{add,dec}_return*() users in the kernel. A typical
> > > example would be a reader-writer lock implementation. See
> > > include/asm-generic/rwsem.h for examples.
> > 
> > Maybe a better example would be refcounting, where you free on 0.
> > 
> > 	if (!fetch_add(&obj->ref, -1))
> > 		free(obj);
> 
> Urgh, too used to the atomic_add_return(), which returns post op. That
> wants to be:
> 
> 	if (fetch_add(&obj->ref, -1) == 1)
> 		free(obj);

this type of code will never be acceptable in bpf world.
If C code does cmpxchg-like things, it's clearly beyond bpf abilities.
There are no locks or support for locks in bpf design and will not be.
We don't want a program to grab a lock and then terminate automatically
because it did divide by zero.
Programs are not allowed to directly allocate/free memory either.
We don't want dangling pointers.
Therefore things like memory barriers, full set of atomics are not applicable
in bpf world.
The only goal for bpf_xadd (could have been named better, agreed) was to
do counters. Like counting packets or bytes or events. In all such cases
there is no need to do 'fetch' part.
Another reason for the lack of a 'fetch' part is simplifying the JITs.
It's easier to emit an 'atomic_add' equivalent than an 'atomic_add_return'.
The only shared data structure two programs can see is a map element.
They can increment counters via bpf_xadd or replace the whole map element
atomically via bpf_update_map_elem() helper. That's it.
If the program needs to grab the lock, do some writes and release it,
then probably bpf is not suitable for such use case.
The bpf programs should be "fast by design" meaning that there should
be no mechanisms in bpf architecture that would allow a program to slow
down other programs or the kernel in general.

Peter Zijlstra Nov. 11, 2015, 10:21 p.m. UTC | #31
On Wed, Nov 11, 2015 at 11:55:59AM -0800, Alexei Starovoitov wrote:
> Therefore things like memory barriers, full set of atomics are not applicable
> in bpf world.

There are still plenty of wait-free constructs one can make using them.

Say a barrier/rendezvous construct for knowing when an event has
happened on all CPUs.

But if you really do not want any of that, I suppose that is a valid
choice.


Is even privileged (e)BPF not allowed things like this? I was thinking
the strict no loops stuff was for unpriv (e)BPF only.
Alexei Starovoitov Nov. 11, 2015, 11:40 p.m. UTC | #32
On Wed, Nov 11, 2015 at 11:21:35PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 11, 2015 at 11:55:59AM -0800, Alexei Starovoitov wrote:
> > Therefore things like memory barriers, full set of atomics are not applicable
> > in bpf world.
> 
> There are still plenty of wait-free constructs one can make using them.

yes, but all such lock-free algos are typically based on cmpxchg8b and a
tight loop, so it would be very hard for the verifier to prove termination
of such loops. I think when we'd need to add something like this, we'll
add a new bpf insn that will be membarrier+cmpxchg8b+check+loop as
a single insn, so it cannot be misused.
I don't know of any concrete use case yet. All possible, though.

> Say a barrier/rendezvous construct for knowing when an event has
> happened on all CPUs.
> 
> But if you really do not want any of that, I suppose that is a valid
> choice.

I do want it :) and I think in the future we'll add a bunch
of interesting stuff. Maybe including things like the above. I just
don't want to rush things in just because x86 has such an insn
or because gcc has a builtin for it.
Like we discussed adding a popcnt insn. It can be useful in some cases,
but doesn't seem worth the pain of adding it to the interpreter, JITs
and llvm backends... as of today... Maybe tomorrow it will be a must-have.

> Is even privileged (e)BPF not allowed things like this? I was thinking
> the strict no loops stuff was for unpriv (e)BPF only.

the only difference between unpriv and priv is the ability to send
all values (including kernel addresses) to user space (like tracing
needs to see all registers). The rest is the same.
root should never crash the kernel either. If we relax even a little bit
for root then the whole bpf stuff is no better than a kernel module.

btw, support for mini loops was requested many times in the past.
I guess we'd have to add something like this, but it's tricky.
Mainly because control flow graph analysis becomes much more complicated.

Peter Zijlstra Nov. 12, 2015, 8:57 a.m. UTC | #33
On Wed, Nov 11, 2015 at 03:40:15PM -0800, Alexei Starovoitov wrote:
> On Wed, Nov 11, 2015 at 11:21:35PM +0100, Peter Zijlstra wrote:
> > On Wed, Nov 11, 2015 at 11:55:59AM -0800, Alexei Starovoitov wrote:
> > > Therefore things like memory barriers, full set of atomics are not applicable
> > > in bpf world.
> > 
> > There are still plenty of wait-free constructs one can make using them.
> 
> yes, but all such lock-free algos are typically based on cmpxchg8b and a
> tight loop, so it would be very hard for the verifier to prove termination
> of such loops. I think when we'd need to add something like this, we'll
> add a new bpf insn that will be membarrier+cmpxchg8b+check+loop as
> a single insn, so it cannot be misused.
> I don't know of any concrete use case yet. All possible though.

So this is where the 'unconditional' atomic ops come in handy.

Like the x86: xchg, lock {xadd,add,sub,inc,dec,or,and,xor}

Those do not have a loop, and then you can create truly wait-free
things; even some applications of cmpxchg do not actually need the loop.

But this class of wait-free constructs is indeed significantly smaller
than the class of lock-less constructs.

> btw, support for mini loops was requested many times in the past.
> I guess we'd have to add something like this, but it's tricky.
> Mainly because control flow graph analysis becomes much more complicated.

Agreed, that does sound like an 'interesting' problem :-)

Something like:

atomic_op(ptr, f)
{
	for (;;) {
		val = *ptr;
		new = f(val)
		old = cmpxchg(ptr, val, new);
		if (old == val)
			break;

		cpu_relax();
	}
}

might be castable as an instruction I suppose, but I'm not sure you have
function references in (e)BPF.

The above is 'sane' if f is sane (although there is a
starvation case, which is why things like sparc (iirc) need an
increasing backoff instead of cpu_relax()).
diff mbox

Patch

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 49c1f1b..0b1d2d3 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -609,7 +609,21 @@  emit_cond_jmp:
 	case BPF_STX | BPF_XADD | BPF_W:
 	/* STX XADD: lock *(u64 *)(dst + off) += src */
 	case BPF_STX | BPF_XADD | BPF_DW:
-		goto notyet;
+		ctx->tmp_used = 1;
+		emit_a64_mov_i(1, tmp2, off, ctx);
+		switch (BPF_SIZE(code)) {
+		case BPF_W:
+			emit(A64_LDR32(tmp, dst, tmp2), ctx);
+			emit(A64_ADD(is64, tmp, tmp, src), ctx);
+			emit(A64_STR32(tmp, dst, tmp2), ctx);
+			break;
+		case BPF_DW:
+			emit(A64_LDR64(tmp, dst, tmp2), ctx);
+			emit(A64_ADD(is64, tmp, tmp, src), ctx);
+			emit(A64_STR64(tmp, dst, tmp2), ctx);
+			break;
+		}
+		break;
 
 	/* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + imm)) */
 	case BPF_LD | BPF_ABS | BPF_W:
@@ -679,9 +693,6 @@  emit_cond_jmp:
 		}
 		break;
 	}
-notyet:
-		pr_info_once("*** NOT YET: opcode %02x ***\n", code);
-		return -EFAULT;
 
 	default:
 		pr_err_once("unknown opcode %02x\n", code);