[bpf,0/2] bpf: do not use bucket_lock for hashmap iterator

Message ID	20200902235340.2001300-1-yhs@fb.com
Series	bpf: do not use bucket_lock for hashmap iterator
  [bpf,0/2] bpf: do not use bucket_lock for hashmap iterator
  [bpf,1/2] bpf: do not use bucket_lock for hashmap iterator
  [bpf,2/2] selftests/bpf: add bpf_{update,delete}_map_elem in hashmap iter program

Headers

From: Yonghong Song <yhs@fb.com>
To: <bpf@vger.kernel.org>, Lorenz Bauer <lmb@cloudflare.com>,
	Martin KaFai Lau <kafai@fb.com>, <netdev@vger.kernel.org>
CC: Alexei Starovoitov <ast@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>, <kernel-team@fb.com>
Subject: [PATCH bpf 0/2] bpf: do not use bucket_lock for hashmap iterator
Date: Wed, 2 Sep 2020 16:53:40 -0700
Message-ID: <20200902235340.2001300-1-yhs@fb.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain
Sender: bpf-owner@vger.kernel.org
Precedence: bulk

Message

Yonghong Song Sept. 2, 2020, 11:53 p.m. UTC

Currently, the bpf hashmap iterator takes the bucket_lock, a spinlock,
before visiting each element in the bucket. This causes a deadlock
if the iterator program updates or deletes an element of the visited
map that hashes to the bucket currently being visited, because the
update/delete path tries to take the same bucket_lock.
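A minimal user-space sketch of the self-deadlock described above. It is a single-threaded model, not kernel code: struct model_bucket, bucket_trylock(), model_delete() and iterate_with_bucket_lock() are hypothetical names, atomic_flag stands in for the bucket spinlock, and a failed try-acquire marks the point where the kernel would instead spin forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of one hash bucket. The kernel bucket
 * lock is a spinlock; atomic_flag stands in for it here. */
struct model_bucket {
	atomic_flag lock;
	int keys[4];
	size_t nkeys;
};

/* Try-acquire instead of spinning, so the model reports a would-be
 * deadlock instead of hanging. Returns true if the lock was taken. */
static bool bucket_trylock(struct model_bucket *b)
{
	return !atomic_flag_test_and_set(&b->lock);
}

static void bucket_unlock(struct model_bucket *b)
{
	atomic_flag_clear(&b->lock);
}

/* Model of a map delete: like the real one, it needs the bucket lock.
 * Returns false where the kernel would spin forever on a lock that the
 * same context already holds. */
static bool model_delete(struct model_bucket *b, int key)
{
	if (!bucket_trylock(b))
		return false; /* self-deadlock in the real kernel */
	for (size_t i = 0; i < b->nkeys; i++) {
		if (b->keys[i] == key) {
			b->keys[i] = b->keys[--b->nkeys]; /* swap-remove */
			break;
		}
	}
	bucket_unlock(b);
	return true;
}

/* Old scheme: hold the bucket lock while running the iterator program
 * on each element; here the "program" deletes the element it visits. */
static bool iterate_with_bucket_lock(struct model_bucket *b)
{
	bool ok = true;
	bool held = bucket_trylock(b);

	assert(held); /* uncontended in this single-threaded model */
	(void)held;
	for (size_t i = 0; i < b->nkeys && ok; i++)
		ok = model_delete(b, b->keys[i]);
	bucket_unlock(b);
	return ok;
}
```

With a bucket holding keys {1, 2}, iterate_with_bucket_lock() fails on the very first element and leaves both keys in place; once the iterator has dropped the lock, the same delete succeeds.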

To avoid the deadlock, let us just use rcu_read_lock instead of
bucket_lock. If concurrent map updates/deletes happen on the same map,
this may result in visiting stale elements, missing some elements,
or visiting some elements more than once. I think using rcu_read_lock
is a reasonable compromise. Users who care about stale, missing, or
repeated elements can use the bpf map batch access syscall interface
(e.g. BPF_MAP_LOOKUP_BATCH) instead.
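A minimal single-threaded sketch of the rcu_read_lock scheme, assuming a bucket modeled as a linked list whose unlinked nodes keep their next pointer and are never freed during the walk (standing in for RCU list deletion plus a deferred free). The ordering inside model_update() (link the fresh node at the head, then unlink the old one) is an assumption about how an existing key is replaced. All names (struct node, struct bucket, model_update(), iterate_rcu()) are hypothetical; this is not the hashtab.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of one hash bucket as a linked list.
 * Unlinking leaves the removed node's ->next intact and never frees it,
 * mimicking RCU list deletion plus a deferred (grace-period) free. */
struct node {
	int key;
	struct node *next;
};

struct bucket {
	struct node *head;
	bool locked; /* stands in for the bucket spinlock */
};

static bool bucket_lock(struct bucket *b)
{
	if (b->locked)
		return false; /* would self-deadlock */
	b->locked = true;
	return true;
}

static void bucket_unlock(struct bucket *b)
{
	assert(b->locked);
	b->locked = false;
}

/* Model of updating an existing key (assumed ordering): link a fresh
 * node at the head, then unlink the old one, under the bucket lock. */
static bool model_update(struct bucket *b, struct node *old, struct node *fresh)
{
	if (!bucket_lock(b))
		return false;
	fresh->key = old->key;
	fresh->next = b->head;
	b->head = fresh;
	for (struct node **pp = &b->head; *pp; pp = &(*pp)->next) {
		if (*pp == old) {
			*pp = old->next; /* old->next stays valid for readers */
			break;
		}
	}
	bucket_unlock(b);
	return true;
}

/* New scheme: walk the bucket under (modeled) rcu_read_lock only, and
 * let the "program" update every element it visits. Returns the number
 * of elements visited, or -1 if an update could not take the lock or
 * the caller supplied too few spare nodes. */
static int iterate_rcu(struct bucket *b, struct node *spares, size_t nspares)
{
	size_t visited = 0;

	for (struct node *n = b->head; n; n = n->next) {
		if (visited == nspares || !model_update(b, n, &spares[visited]))
			return -1;
		visited++;
	}
	return (int)visited;
}
```

For a bucket 1 -> 2 -> 3, every update takes the lock without contention and each original element is visited once. Each visited node is already unlinked by the time the walk follows its next pointer (a stale element), and the replacement nodes linked at the head are not visited in this pass. The missing and repeated cases mentioned above involve concurrent changes that this single-bucket, single-pass model does not try to reproduce.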

Note that another approach would be to check, at bpf_iter link creation
time, whether the iterator program might update or delete elements of
the visited map, and reject link_create if it might. For this approach,
the verifier would need to record, for each map, whether the program
performs an update/delete operation on it. I just feel this check is
too specialized, hence I still prefer the rcu_read_lock approach.
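The rejected alternative could be sketched as a link-time lookup over per-map information recorded by the verifier. All names here (struct prog_map_use, struct prog_info, check_iter_link()) are hypothetical, and returning -EINVAL is an arbitrary choice; this is not real verifier or link_create code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-program record the verifier would have to fill in:
 * for each map the program references, whether it may call
 * bpf_map_update_elem() or bpf_map_delete_elem() on it. */
struct prog_map_use {
	int map_id;
	bool may_modify;
};

struct prog_info {
	const struct prog_map_use *uses;
	size_t nuses;
};

/* Hypothetical link-time check: refuse to attach an iterator program to
 * a map it might modify. Returns 0 to allow, -EINVAL to reject. */
static int check_iter_link(const struct prog_info *prog, int visited_map_id)
{
	assert(prog->uses != NULL || prog->nuses == 0);
	for (size_t i = 0; i < prog->nuses; i++)
		if (prog->uses[i].map_id == visited_map_id &&
		    prog->uses[i].may_modify)
			return -EINVAL;
	return 0;
}
```

The extra per-map bookkeeping in the verifier, used only by this one check, is the specialization the cover letter argues against.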

Patch #1 contains the kernel implementation, and Patch #2 adds a selftest
which can trigger the deadlock without Patch #1.

Yonghong Song (2):
  bpf: do not use bucket_lock for hashmap iterator
  selftests/bpf: add bpf_{update,delete}_map_elem in hashmap iter
    program

 kernel/bpf/hashtab.c                              | 15 ++++-----------
 .../selftests/bpf/progs/bpf_iter_bpf_hash_map.c   | 15 +++++++++++++++
 2 files changed, 19 insertions(+), 11 deletions(-)

Comments

Alexei Starovoitov Sept. 4, 2020, 12:44 a.m. UTC | #1

On Wed, Sep 2, 2020 at 4:54 PM Yonghong Song <yhs@fb.com> wrote:
>
> Currently, the bpf hashmap iterator takes a bucket_lock, a spin_lock,
> before visiting each element in the bucket. This will cause a deadlock
> if a map update/delete operates on an element with the same
> bucket id of the visited map.
>
> To avoid the deadlock, let us just use rcu_read_lock instead of
> bucket_lock. This may result in visiting stale elements, missing some elements,
> or repeating some elements, if concurrent map delete/update happens for the
> same map. I think using rcu_read_lock is a reasonable compromise.
> For users caring stale/missing/repeating element issues, bpf map batch
> access syscall interface can be used.
>
> Note that another approach is during bpf_iter link stage, we check
> whether the iter program might be able to do update/delete to the visited
> map. If it is, reject the link_create. Verifier needs to record whether
> an update/delete operation happens for each map for this approach.
> I just feel this checking is too specialized, hence still prefer
> rcu_read_lock approach.
>
> Patch #1 has the kernel implementation and Patch #2 added a selftest
> which can trigger deadlock without Patch #1.

Applied. Thanks