From patchwork Sat Aug 1 08:47:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Song Liu X-Patchwork-Id: 1339677 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=reject dis=none) header.from=fb.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=fb.com header.i=@fb.com header.a=rsa-sha256 header.s=facebook header.b=dZRQys+Y; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4BJdBZ6xKdz9sV9 for ; Sat, 1 Aug 2020 18:49:58 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728581AbgHAIty (ORCPT ); Sat, 1 Aug 2020 04:49:54 -0400 Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:15168 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728484AbgHAItx (ORCPT ); Sat, 1 Aug 2020 04:49:53 -0400 Received: from pps.filterd (m0001303.ppops.net [127.0.0.1]) by m0001303.ppops.net (8.16.0.42/8.16.0.42) with SMTP id 0718mPNE022611 for ; Sat, 1 Aug 2020 01:49:50 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=WIgQmJTcjQAh1WZkR5Yt3/IhXJ/pNkVpepRij4ldUnk=; b=dZRQys+YRCMJEE+UCAQx77GSGd+RHBQXMZ76QTGDLhrxNEYSAOEJewNzf7S/NM/fPBMV vvkA8UEc9+53nVsJdmtVAbUeDq7Cij4FmbIxAwkIEF6hFKETdGppB9xl51OkkUaSpN47 m7zTjRqy0sMb0sL/HqpejIPhGT723LvllfA= Received: from maileast.thefacebook.com ([163.114.130.16]) by m0001303.ppops.net with ESMTP id 32n42m05vt-4 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Sat, 01 Aug 2020 01:49:50 -0700 Received: from intmgw002.03.ash8.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::f) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.1979.3; Sat, 1 Aug 2020 01:49:49 -0700 Received: by devbig006.ftw2.facebook.com (Postfix, from userid 4523) id A474562E53C3; Sat, 1 Aug 2020 01:47:25 -0700 (PDT) Smtp-Origin-Hostprefix: devbig From: Song Liu Smtp-Origin-Hostname: devbig006.ftw2.facebook.com To: , , CC: , , , , , , , Song Liu Smtp-Origin-Cluster: ftw2c04 Subject: [PATCH bpf-next 1/5] bpf: introduce BPF_PROG_TYPE_USER Date: Sat, 1 Aug 2020 01:47:17 -0700 Message-ID: <20200801084721.1812607-2-songliubraving@fb.com> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20200801084721.1812607-1-songliubraving@fb.com> References: <20200801084721.1812607-1-songliubraving@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.235, 18.0.687 definitions=2020-08-01_07:2020-07-31,2020-08-01 signatures=0 X-Proofpoint-Spam-Details: rule=fb_default_notspam policy=fb_default score=0 mlxscore=0 bulkscore=0 suspectscore=0 mlxlogscore=992 spamscore=0 phishscore=0 adultscore=0 malwarescore=0 impostorscore=0 lowpriorityscore=0 clxscore=1015 priorityscore=1501 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2008010068 X-FB-Internal: deliver Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org As of today, to trigger BPF program from user space, the common practise is to create a uprobe on a special function and calls that function. For example, bpftrace uses BEGIN_trigger and END_trigger for the BEGIN and END programs. However, uprobe is not ideal for this use case. First, uprobe uses trap, which adds non-trivial overhead. Second, uprobe requires calculating function offset at runtime, which is not very reliable. bpftrace has seen issues with this: https://github.com/iovisor/bpftrace/pull/1438 https://github.com/iovisor/bpftrace/issues/1440 This patch introduces a new BPF program type BPF_PROG_TYPE_USER, or "user program". User program is triggered via sys_bpf(BPF_PROG_TEST_RUN), which is significant faster than a trap. To make user program more flexible, we enabled the following features: 1. The user can specify on which cpu the program should run. If the target cpu is not current cpu, the program is triggered via IPI. 2. User can pass optional argument to user program. Currently, the argument can only be 5x u64 numbers. User program has access to helper functions in bpf_tracing_func_proto() and bpf_get_stack|stackid(). Signed-off-by: Song Liu Reported-by: kernel test robot Reported-by: kernel test robot Reported-by: kernel test robot --- include/linux/bpf_types.h | 2 + include/uapi/linux/bpf.h | 19 ++++++ kernel/bpf/syscall.c | 3 +- kernel/trace/bpf_trace.c | 121 +++++++++++++++++++++++++++++++++ tools/include/uapi/linux/bpf.h | 19 ++++++ 5 files changed, 163 insertions(+), 1 deletion(-) diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h index a52a5688418e5..3c52f3207aced 100644 --- a/include/linux/bpf_types.h +++ b/include/linux/bpf_types.h @@ -76,6 +76,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_EXT, bpf_extension, BPF_PROG_TYPE(BPF_PROG_TYPE_LSM, lsm, void *, void *) #endif /* CONFIG_BPF_LSM */ +BPF_PROG_TYPE(BPF_PROG_TYPE_USER, user, + void *, void *) #endif BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index eb5e0c38eb2cf..f6b9d4e7eeb4e 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -190,6 +190,7 @@ enum bpf_prog_type { BPF_PROG_TYPE_EXT, BPF_PROG_TYPE_LSM, BPF_PROG_TYPE_SK_LOOKUP, + BPF_PROG_TYPE_USER, }; enum bpf_attach_type { @@ -556,6 +557,12 @@ union bpf_attr { */ __aligned_u64 ctx_in; __aligned_u64 ctx_out; + __u32 cpu_plus; /* run this program on cpu + * (cpu_plus - 1). + * If cpu_plus == 0, run on + * current cpu. Only valid + * for BPF_PROG_TYPE_USER. + */ } test; struct { /* anonymous struct used by BPF_*_GET_*_ID */ @@ -4441,4 +4448,16 @@ struct bpf_sk_lookup { __u32 local_port; /* Host byte order */ }; +struct pt_regs; + +#define BPF_USER_PROG_MAX_ARGS 5 +struct bpf_user_prog_args { + __u64 args[BPF_USER_PROG_MAX_ARGS]; +}; + +struct bpf_user_prog_ctx { + struct pt_regs *regs; + __u64 args[BPF_USER_PROG_MAX_ARGS]; +}; + #endif /* _UAPI__LINUX_BPF_H__ */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index cd3d599e9e90e..f5a28fd8a9bc2 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2078,6 +2078,7 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type) case BPF_PROG_TYPE_LSM: case BPF_PROG_TYPE_STRUCT_OPS: /* has access to struct sock */ case BPF_PROG_TYPE_EXT: /* extends any prog */ + case BPF_PROG_TYPE_USER: return true; default: return false; @@ -2969,7 +2970,7 @@ static int bpf_prog_query(const union bpf_attr *attr, } } -#define BPF_PROG_TEST_RUN_LAST_FIELD test.ctx_out +#define BPF_PROG_TEST_RUN_LAST_FIELD test.cpu_plus static int bpf_prog_test_run(const union bpf_attr *attr, union bpf_attr __user *uattr) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index cb91ef902cc43..cbe789bc1b986 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -16,6 +16,7 @@ #include #include +#include #include #include "trace_probe.h" @@ -1740,6 +1741,126 @@ const struct bpf_verifier_ops perf_event_verifier_ops = { const struct bpf_prog_ops perf_event_prog_ops = { }; +struct bpf_user_prog_test_run_info { + struct bpf_prog *prog; + struct bpf_user_prog_ctx ctx; + u32 retval; +}; + +static void +__bpf_prog_test_run_user(struct bpf_user_prog_test_run_info *info) +{ + rcu_read_lock(); + migrate_disable(); + info->retval = BPF_PROG_RUN(info->prog, &info->ctx); + migrate_enable(); + rcu_read_unlock(); +} + +static void _bpf_prog_test_run_user(void *data) +{ + struct bpf_user_prog_test_run_info *info = data; + + info->ctx.regs = get_irq_regs(); + __bpf_prog_test_run_user(info); +} + +static int bpf_prog_test_run_user(struct bpf_prog *prog, + const union bpf_attr *kattr, + union bpf_attr __user *uattr) +{ + void __user *data_in = u64_to_user_ptr(kattr->test.data_in); + __u32 data_size = kattr->test.data_size_in; + struct bpf_user_prog_test_run_info info; + int cpu = kattr->test.cpu_plus - 1; + int err; + + if (kattr->test.ctx_in || kattr->test.ctx_out || + kattr->test.duration || kattr->test.repeat || + kattr->test.data_out) + return -EINVAL; + + if ((data_in && !data_size) || (!data_in && data_size)) + return -EINVAL; + + /* if provided, data_in should be struct bpf_user_prog_args */ + if (data_size > 0 && data_size != sizeof(struct bpf_user_prog_args)) + return -EINVAL; + + if (kattr->test.data_size_in) { + if (copy_from_user(&info.ctx.args, data_in, + sizeof(struct bpf_user_prog_args))) + return -EFAULT; + } else { + memset(&info.ctx.args, 0, sizeof(struct bpf_user_prog_args)); + } + + info.prog = prog; + + if (!kattr->test.cpu_plus || cpu == smp_processor_id()) { + /* non-IPI, use regs from perf_fetch_caller_regs */ + info.ctx.regs = get_bpf_raw_tp_regs(); + if (IS_ERR(info.ctx.regs)) + return PTR_ERR(info.ctx.regs); + perf_fetch_caller_regs(info.ctx.regs); + __bpf_prog_test_run_user(&info); + put_bpf_raw_tp_regs(); + } else { + err = smp_call_function_single(cpu, _bpf_prog_test_run_user, + &info, 1); + if (err) + return err; + } + + if (copy_to_user(&uattr->test.retval, &info.retval, sizeof(u32))) + return -EFAULT; + + return 0; +} + +static bool user_prog_is_valid_access(int off, int size, enum bpf_access_type type, + const struct bpf_prog *prog, + struct bpf_insn_access_aux *info) +{ + const int size_u64 = sizeof(u64); + + if (off < 0 || off >= sizeof(struct bpf_user_prog_ctx)) + return false; + + switch (off) { + case bpf_ctx_range(struct bpf_user_prog_ctx, regs): + bpf_ctx_record_field_size(info, size_u64); + if (!bpf_ctx_narrow_access_ok(off, size, size_u64)) + return false; + break; + default: + break; + } + return true; +} + +static const struct bpf_func_proto * +user_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) +{ + switch (func_id) { + case BPF_FUNC_get_stackid: + return &bpf_get_stackid_proto; + case BPF_FUNC_get_stack: + return &bpf_get_stack_proto; + default: + return bpf_tracing_func_proto(func_id, prog); + } +} + +const struct bpf_verifier_ops user_verifier_ops = { + .get_func_proto = user_prog_func_proto, + .is_valid_access = user_prog_is_valid_access, +}; + +const struct bpf_prog_ops user_prog_ops = { + .test_run = bpf_prog_test_run_user, +}; + static DEFINE_MUTEX(bpf_event_mutex); #define BPF_TRACE_MAX_PROGS 64 diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index eb5e0c38eb2cf..f6b9d4e7eeb4e 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -190,6 +190,7 @@ enum bpf_prog_type { BPF_PROG_TYPE_EXT, BPF_PROG_TYPE_LSM, BPF_PROG_TYPE_SK_LOOKUP, + BPF_PROG_TYPE_USER, }; enum bpf_attach_type { @@ -556,6 +557,12 @@ union bpf_attr { */ __aligned_u64 ctx_in; __aligned_u64 ctx_out; + __u32 cpu_plus; /* run this program on cpu + * (cpu_plus - 1). + * If cpu_plus == 0, run on + * current cpu. Only valid + * for BPF_PROG_TYPE_USER. + */ } test; struct { /* anonymous struct used by BPF_*_GET_*_ID */ @@ -4441,4 +4448,16 @@ struct bpf_sk_lookup { __u32 local_port; /* Host byte order */ }; +struct pt_regs; + +#define BPF_USER_PROG_MAX_ARGS 5 +struct bpf_user_prog_args { + __u64 args[BPF_USER_PROG_MAX_ARGS]; +}; + +struct bpf_user_prog_ctx { + struct pt_regs *regs; + __u64 args[BPF_USER_PROG_MAX_ARGS]; +}; + #endif /* _UAPI__LINUX_BPF_H__ */