diff mbox series

[bpf-next,v12,1/2] bpf: add new helper get_file_path for mapping a file descriptor to a pathname

Message ID 0117d6e17ba8b3b1273e5a964f87a71c1b2d8741.1576381512.git.ethercflow@gmail.com
State Changes Requested
Delegated to: BPF Maintainers
Headers show
Series bpf: adding get_file_path helper | expand

Commit Message

Wenbo Zhang Dec. 15, 2019, 4:01 a.m. UTC
When people want to identify which file system files are being opened,
read, and written to, they can use this helper with file descriptor as
input to achieve this goal. Other pseudo filesystems are also supported.

This requirement is mainly discussed here:

  https://github.com/iovisor/bcc/issues/237

v11->v12: addressed Alexei's feedback
- only allow tracepoints to make sure it won't dead lock

v10->v11: addressed Al and Alexei's feedback
- fix missing fput()

v9->v10: addressed Andrii's feedback
- send this patch together with the patch selftests as one patch series

v8->v9:
- format helper description

v7->v8: addressed Alexei's feedback
- use fget_raw instead of fdget_raw, as fdget_raw is only used inside fs/
- ensure we're in user context which is safe fot the help to run
- filter unmountable pseudo filesystem, because they don't have real path
- supplement the description of this helper function

v6->v7:
- fix missing signed-off-by line

v5->v6: addressed Andrii's feedback
- avoid unnecessary goto end by having two explicit returns

v4->v5: addressed Andrii and Daniel's feedback
- rename bpf_fd2path to bpf_get_file_path to be consistent with other
helper's names
- when fdget_raw fails, set ret to -EBADF instead of -EINVAL
- remove fdput from fdget_raw's error path
- use IS_ERR instead of IS_ERR_OR_NULL as d_path ether returns a pointer
into the buffer or an error code if the path was too long
- modify the normal path's return value to return copied string length
including NUL
- update this helper description's Return bits.

v3->v4: addressed Daniel's feedback
- fix missing fdput()
- move fd2path from kernel/bpf/trace.c to kernel/trace/bpf_trace.c
- move fd2path's test code to another patch
- add comment to explain why use fdget_raw instead of fdget

v2->v3: addressed Yonghong's feedback
- remove unnecessary LOCKDOWN_BPF_READ
- refactor error handling section for enhanced readability
- provide a test case in tools/testing/selftests/bpf

v1->v2: addressed Daniel's feedback
- fix backward compatibility
- add this helper description
- fix signed-off name

Signed-off-by: Wenbo Zhang <ethercflow@gmail.com>
---
 include/uapi/linux/bpf.h       | 29 +++++++++++++-
 kernel/trace/bpf_trace.c       | 70 ++++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h | 29 +++++++++++++-
 3 files changed, 126 insertions(+), 2 deletions(-)

Comments

Yonghong Song Dec. 15, 2019, 4:05 p.m. UTC | #1
On 12/14/19 8:01 PM, Wenbo Zhang wrote:
> When people want to identify which file system files are being opened,
> read, and written to, they can use this helper with file descriptor as
> input to achieve this goal. Other pseudo filesystems are also supported.
> 
> This requirement is mainly discussed here:
> 
>    https://github.com/iovisor/bcc/issues/237
> 
> v11->v12: addressed Alexei's feedback
> - only allow tracepoints to make sure it won't dead lock
> 
> v10->v11: addressed Al and Alexei's feedback
> - fix missing fput()
> 
> v9->v10: addressed Andrii's feedback
> - send this patch together with the patch selftests as one patch series
> 
> v8->v9:
> - format helper description
> 
> v7->v8: addressed Alexei's feedback
> - use fget_raw instead of fdget_raw, as fdget_raw is only used inside fs/
> - ensure we're in user context which is safe fot the help to run
> - filter unmountable pseudo filesystem, because they don't have real path
> - supplement the description of this helper function
> 
> v6->v7:
> - fix missing signed-off-by line
> 
> v5->v6: addressed Andrii's feedback
> - avoid unnecessary goto end by having two explicit returns
> 
> v4->v5: addressed Andrii and Daniel's feedback
> - rename bpf_fd2path to bpf_get_file_path to be consistent with other
> helper's names
> - when fdget_raw fails, set ret to -EBADF instead of -EINVAL
> - remove fdput from fdget_raw's error path
> - use IS_ERR instead of IS_ERR_OR_NULL as d_path ether returns a pointer
> into the buffer or an error code if the path was too long
> - modify the normal path's return value to return copied string length
> including NUL
> - update this helper description's Return bits.
> 
> v3->v4: addressed Daniel's feedback
> - fix missing fdput()
> - move fd2path from kernel/bpf/trace.c to kernel/trace/bpf_trace.c
> - move fd2path's test code to another patch
> - add comment to explain why use fdget_raw instead of fdget
> 
> v2->v3: addressed Yonghong's feedback
> - remove unnecessary LOCKDOWN_BPF_READ
> - refactor error handling section for enhanced readability
> - provide a test case in tools/testing/selftests/bpf
> 
> v1->v2: addressed Daniel's feedback
> - fix backward compatibility
> - add this helper description
> - fix signed-off name
> 
> Signed-off-by: Wenbo Zhang <ethercflow@gmail.com>
> ---
>   include/uapi/linux/bpf.h       | 29 +++++++++++++-
>   kernel/trace/bpf_trace.c       | 70 ++++++++++++++++++++++++++++++++++
>   tools/include/uapi/linux/bpf.h | 29 +++++++++++++-
>   3 files changed, 126 insertions(+), 2 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index dbbcf0b02970..71d9705df120 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2821,6 +2821,32 @@ union bpf_attr {
>    * 	Return
>    * 		On success, the strictly positive length of the string,	including
>    * 		the trailing NUL character. On error, a negative value.
> + *
> + * int bpf_get_file_path(char *path, u32 size, int fd)
> + *	Description
> + *		Get **file** atrribute from the current task by *fd*, then call
> + *		**d_path** to get it's absolute path and copy it as string into
> + *		*path* of *size*. Notice the **path** don't support unmountable
> + *		pseudo filesystems as they don't have path (eg: SOCKFS, PIPEFS).
> + *		The *size* must be strictly positive. On success, the helper
> + *		makes sure that the *path* is NUL-terminated, and the buffer
> + *		could be:
> + *		- a regular full path (include mountable fs eg: /proc, /sys)
> + *		- a regular full path with "(deleted)" at the end.

Let us say with " (deleted)" is appended to be consistent with comments
in d_path() and is more clear to user what the format will looks like.

> + *		On failure, it is filled with zeroes.
> + *	Return
> + *		On success, returns the length of the copied string INCLUDING
> + *		the trailing NUL.

trailing '\0'.

> + *
> + *		On failure, the returned value is one of the following:
> + *
> + *		**-EPERM** if no permission to get the path (eg: in irq ctx).
> + *
> + *		**-EBADF** if *fd* is invalid.
> + *
> + *		**-EINVAL** if *fd* corresponds to a unmountable pseudo fs
> + *
> + *		**-ENAMETOOLONG** if full path is longer than *size*
>    */
>   #define __BPF_FUNC_MAPPER(FN)		\
>   	FN(unspec),			\
> @@ -2938,7 +2964,8 @@ union bpf_attr {
>   	FN(probe_read_user),		\
>   	FN(probe_read_kernel),		\
>   	FN(probe_read_user_str),	\
> -	FN(probe_read_kernel_str),
> +	FN(probe_read_kernel_str),	\
> +	FN(get_file_path),
>   
>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>    * function eBPF program intends to call
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index e5ef4ae9edb5..db9c0ec46a5d 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -762,6 +762,72 @@ static const struct bpf_func_proto bpf_send_signal_proto = {
>   	.arg1_type	= ARG_ANYTHING,
>   };
>   
> +BPF_CALL_3(bpf_get_file_path, char *, dst, u32, size, int, fd)
> +{
> +	struct file *f;
> +	char *p;
> +	int ret = -EBADF;
> +
> +	/* Ensure we're in user context which is safe for the helper to
> +	 * run. This helper has no business in a kthread.
> +	 */
> +	if (unlikely(in_interrupt() ||
> +		     current->flags & (PF_KTHREAD | PF_EXITING))) {
> +		ret = -EPERM;
> +		goto error;
> +	}
> +
> +	/* Use fget_raw instead of fget to support O_PATH, and it doesn't
> +	 * have any sleepable code, so it's ok to be here.
> +	 */
> +	f = fget_raw(fd);
> +	if (!f)
> +		goto error;
> +
> +	/* For unmountable pseudo filesystem, it seems to have no meaning
> +	 * to get their fake paths as they don't have path, and to be no
> +	 * way to validate this function pointer can be always safe to call
> +	 * in the current context.
> +	 */
> +	if (f->f_path.dentry->d_op && f->f_path.dentry->d_op->d_dname) {
> +		ret = -EINVAL;
> +		fput(f);
> +		goto error;
> +	}
> +
> +	/* After filter unmountable pseudo filesytem, d_path won't call
> +	 * dentry->d_op->d_name(), the normally path doesn't have any
> +	 * sleepable code, and despite it uses the current macro to get
> +	 * fs_struct (current->fs), we've already ensured we're in user
> +	 * context, so it's ok to be here.
> +	 */
> +	p = d_path(&f->f_path, dst, size);
> +	if (IS_ERR(p)) {
> +		ret = PTR_ERR(p);
> +		fput(f);
> +		goto error;
> +	}
> +
> +	ret = strlen(p);
> +	memmove(dst, p, ret);
> +	dst[ret++] = '\0';

nit: you could do memmove(dst, p, ret + 1)?

> +	fput(f);
> +	return ret;

The description says the return value length including trailing '\0'.
The above 'ret' does not include trailing '\0'.

> +
> +error:
> +	memset(dst, '0', size);
> +	return ret;
> +}
> +
> +static const struct bpf_func_proto bpf_get_file_path_proto = {
> +	.func       = bpf_get_file_path,
> +	.gpl_only   = true,
> +	.ret_type   = RET_INTEGER,
> +	.arg1_type  = ARG_PTR_TO_UNINIT_MEM,
> +	.arg2_type  = ARG_CONST_SIZE,
> +	.arg3_type  = ARG_ANYTHING,
> +};
> +
>   static const struct bpf_func_proto *
>   tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>   {
> @@ -953,6 +1019,8 @@ tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>   		return &bpf_get_stackid_proto_tp;
>   	case BPF_FUNC_get_stack:
>   		return &bpf_get_stack_proto_tp;
> +	case BPF_FUNC_get_file_path:
> +		return &bpf_get_file_path_proto;
>   	default:
>   		return tracing_func_proto(func_id, prog);
>   	}
> @@ -1146,6 +1214,8 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>   		return &bpf_get_stackid_proto_raw_tp;
>   	case BPF_FUNC_get_stack:
>   		return &bpf_get_stack_proto_raw_tp;
> +	case BPF_FUNC_get_file_path:
> +		return &bpf_get_file_path_proto;
>   	default:
>   		return tracing_func_proto(func_id, prog);
>   	}
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index dbbcf0b02970..71d9705df120 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -2821,6 +2821,32 @@ union bpf_attr {
>    * 	Return
>    * 		On success, the strictly positive length of the string,	including
>    * 		the trailing NUL character. On error, a negative value.
> + *
> + * int bpf_get_file_path(char *path, u32 size, int fd)
> + *	Description
> + *		Get **file** atrribute from the current task by *fd*, then call
> + *		**d_path** to get it's absolute path and copy it as string into
> + *		*path* of *size*. Notice the **path** don't support unmountable
> + *		pseudo filesystems as they don't have path (eg: SOCKFS, PIPEFS).
> + *		The *size* must be strictly positive. On success, the helper
> + *		makes sure that the *path* is NUL-terminated, and the buffer
> + *		could be:
> + *		- a regular full path (include mountable fs eg: /proc, /sys)
> + *		- a regular full path with "(deleted)" at the end.

ditto

> + *		On failure, it is filled with zeroes.
> + *	Return
> + *		On success, returns the length of the copied string INCLUDING
> + *		the trailing NUL.

ditto

> + *
> + *		On failure, the returned value is one of the following:
> + *
> + *		**-EPERM** if no permission to get the path (eg: in irq ctx).
> + *
> + *		**-EBADF** if *fd* is invalid.
> + *
> + *		**-EINVAL** if *fd* corresponds to a unmountable pseudo fs
> + *
> + *		**-ENAMETOOLONG** if full path is longer than *size*
>    */
>   #define __BPF_FUNC_MAPPER(FN)		\
>   	FN(unspec),			\
> @@ -2938,7 +2964,8 @@ union bpf_attr {
>   	FN(probe_read_user),		\
>   	FN(probe_read_kernel),		\
>   	FN(probe_read_user_str),	\
> -	FN(probe_read_kernel_str),
> +	FN(probe_read_kernel_str),	\
> +	FN(get_file_path),
>   
>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>    * function eBPF program intends to call
>
Yonghong Song Dec. 15, 2019, 4:10 p.m. UTC | #2
On 12/14/19 8:01 PM, Wenbo Zhang wrote:
> When people want to identify which file system files are being opened,
> read, and written to, they can use this helper with file descriptor as
> input to achieve this goal. Other pseudo filesystems are also supported.
> 
> This requirement is mainly discussed here:
> 
>    https://github.com/iovisor/bcc/issues/237
> 
> v11->v12: addressed Alexei's feedback
> - only allow tracepoints to make sure it won't dead lock
> 
> v10->v11: addressed Al and Alexei's feedback
> - fix missing fput()
> 
> v9->v10: addressed Andrii's feedback
> - send this patch together with the patch selftests as one patch series
> 
> v8->v9:
> - format helper description
> 
> v7->v8: addressed Alexei's feedback
> - use fget_raw instead of fdget_raw, as fdget_raw is only used inside fs/
> - ensure we're in user context which is safe fot the help to run
> - filter unmountable pseudo filesystem, because they don't have real path
> - supplement the description of this helper function
> 
> v6->v7:
> - fix missing signed-off-by line
> 
> v5->v6: addressed Andrii's feedback
> - avoid unnecessary goto end by having two explicit returns
> 
> v4->v5: addressed Andrii and Daniel's feedback
> - rename bpf_fd2path to bpf_get_file_path to be consistent with other
> helper's names
> - when fdget_raw fails, set ret to -EBADF instead of -EINVAL
> - remove fdput from fdget_raw's error path
> - use IS_ERR instead of IS_ERR_OR_NULL as d_path ether returns a pointer
> into the buffer or an error code if the path was too long
> - modify the normal path's return value to return copied string length
> including NUL
> - update this helper description's Return bits.
> 
> v3->v4: addressed Daniel's feedback
> - fix missing fdput()
> - move fd2path from kernel/bpf/trace.c to kernel/trace/bpf_trace.c
> - move fd2path's test code to another patch
> - add comment to explain why use fdget_raw instead of fdget
> 
> v2->v3: addressed Yonghong's feedback
> - remove unnecessary LOCKDOWN_BPF_READ
> - refactor error handling section for enhanced readability
> - provide a test case in tools/testing/selftests/bpf
> 
> v1->v2: addressed Daniel's feedback
> - fix backward compatibility
> - add this helper description
> - fix signed-off name
> 
> Signed-off-by: Wenbo Zhang <ethercflow@gmail.com>
> ---
>   include/uapi/linux/bpf.h       | 29 +++++++++++++-
>   kernel/trace/bpf_trace.c       | 70 ++++++++++++++++++++++++++++++++++
>   tools/include/uapi/linux/bpf.h | 29 +++++++++++++-
>   3 files changed, 126 insertions(+), 2 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index dbbcf0b02970..71d9705df120 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2821,6 +2821,32 @@ union bpf_attr {
>    * 	Return
>    * 		On success, the strictly positive length of the string,	including
>    * 		the trailing NUL character. On error, a negative value.
> + *
> + * int bpf_get_file_path(char *path, u32 size, int fd)
> + *	Description
> + *		Get **file** atrribute from the current task by *fd*, then call
> + *		**d_path** to get it's absolute path and copy it as string into
> + *		*path* of *size*. Notice the **path** don't support unmountable
> + *		pseudo filesystems as they don't have path (eg: SOCKFS, PIPEFS).
> + *		The *size* must be strictly positive. On success, the helper
> + *		makes sure that the *path* is NUL-terminated, and the buffer
> + *		could be:
> + *		- a regular full path (include mountable fs eg: /proc, /sys)
> + *		- a regular full path with "(deleted)" at the end.
> + *		On failure, it is filled with zeroes.
> + *	Return
> + *		On success, returns the length of the copied string INCLUDING
> + *		the trailing NUL.
> + *
> + *		On failure, the returned value is one of the following:
> + *
> + *		**-EPERM** if no permission to get the path (eg: in irq ctx).
> + *
> + *		**-EBADF** if *fd* is invalid.
> + *
> + *		**-EINVAL** if *fd* corresponds to a unmountable pseudo fs
> + *
> + *		**-ENAMETOOLONG** if full path is longer than *size*
>    */
>   #define __BPF_FUNC_MAPPER(FN)		\
>   	FN(unspec),			\
> @@ -2938,7 +2964,8 @@ union bpf_attr {
>   	FN(probe_read_user),		\
>   	FN(probe_read_kernel),		\
>   	FN(probe_read_user_str),	\
> -	FN(probe_read_kernel_str),
> +	FN(probe_read_kernel_str),	\
> +	FN(get_file_path),
>   
>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>    * function eBPF program intends to call
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index e5ef4ae9edb5..db9c0ec46a5d 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -762,6 +762,72 @@ static const struct bpf_func_proto bpf_send_signal_proto = {
>   	.arg1_type	= ARG_ANYTHING,
>   };
>   
> +BPF_CALL_3(bpf_get_file_path, char *, dst, u32, size, int, fd)
> +{
> +	struct file *f;
> +	char *p;
> +	int ret = -EBADF;

please try to use reverse Christmas tree for declarations.

> +
> +	/* Ensure we're in user context which is safe for the helper to
> +	 * run. This helper has no business in a kthread.
> +	 */
> +	if (unlikely(in_interrupt() ||
> +		     current->flags & (PF_KTHREAD | PF_EXITING))) {
> +		ret = -EPERM;
> +		goto error;
> +	}
> +
> +	/* Use fget_raw instead of fget to support O_PATH, and it doesn't
> +	 * have any sleepable code, so it's ok to be here.
> +	 */
> +	f = fget_raw(fd);
> +	if (!f)
> +		goto error;
> +
> +	/* For unmountable pseudo filesystem, it seems to have no meaning
> +	 * to get their fake paths as they don't have path, and to be no
> +	 * way to validate this function pointer can be always safe to call
> +	 * in the current context.
> +	 */
> +	if (f->f_path.dentry->d_op && f->f_path.dentry->d_op->d_dname) {
> +		ret = -EINVAL;
> +		fput(f);
> +		goto error;
> +	}
> +
> +	/* After filter unmountable pseudo filesytem, d_path won't call
> +	 * dentry->d_op->d_name(), the normally path doesn't have any
> +	 * sleepable code, and despite it uses the current macro to get
> +	 * fs_struct (current->fs), we've already ensured we're in user
> +	 * context, so it's ok to be here.
> +	 */
> +	p = d_path(&f->f_path, dst, size);
> +	if (IS_ERR(p)) {
> +		ret = PTR_ERR(p);
> +		fput(f);
> +		goto error;
> +	}
> +
> +	ret = strlen(p);
> +	memmove(dst, p, ret);
> +	dst[ret++] = '\0';
> +	fput(f);
> +	return ret;
> +
> +error:
> +	memset(dst, '0', size);
> +	return ret;
> +}
> +
[...]
Brendan Gregg Dec. 16, 2019, 10:09 p.m. UTC | #3
On Sat, Dec 14, 2019 at 8:01 PM Wenbo Zhang <ethercflow@gmail.com> wrote:
>
> When people want to identify which file system files are being opened,
> read, and written to, they can use this helper with file descriptor as
> input to achieve this goal. Other pseudo filesystems are also supported.
>
> This requirement is mainly discussed here:
>
>   https://github.com/iovisor/bcc/issues/237
>
> v11->v12: addressed Alexei's feedback
> - only allow tracepoints to make sure it won't dead lock
>
> v10->v11: addressed Al and Alexei's feedback
> - fix missing fput()
>
> v9->v10: addressed Andrii's feedback
> - send this patch together with the patch selftests as one patch series
>
> v8->v9:
> - format helper description
>
> v7->v8: addressed Alexei's feedback
> - use fget_raw instead of fdget_raw, as fdget_raw is only used inside fs/
> - ensure we're in user context which is safe fot the help to run
> - filter unmountable pseudo filesystem, because they don't have real path
> - supplement the description of this helper function
>
> v6->v7:
> - fix missing signed-off-by line
>
> v5->v6: addressed Andrii's feedback
> - avoid unnecessary goto end by having two explicit returns
>
> v4->v5: addressed Andrii and Daniel's feedback
> - rename bpf_fd2path to bpf_get_file_path to be consistent with other
> helper's names
> - when fdget_raw fails, set ret to -EBADF instead of -EINVAL
> - remove fdput from fdget_raw's error path
> - use IS_ERR instead of IS_ERR_OR_NULL as d_path ether returns a pointer
> into the buffer or an error code if the path was too long
> - modify the normal path's return value to return copied string length
> including NUL
> - update this helper description's Return bits.
>
> v3->v4: addressed Daniel's feedback
> - fix missing fdput()
> - move fd2path from kernel/bpf/trace.c to kernel/trace/bpf_trace.c
> - move fd2path's test code to another patch
> - add comment to explain why use fdget_raw instead of fdget
>
> v2->v3: addressed Yonghong's feedback
> - remove unnecessary LOCKDOWN_BPF_READ
> - refactor error handling section for enhanced readability
> - provide a test case in tools/testing/selftests/bpf
>
> v1->v2: addressed Daniel's feedback
> - fix backward compatibility
> - add this helper description
> - fix signed-off name
>
> Signed-off-by: Wenbo Zhang <ethercflow@gmail.com>
> ---
>  include/uapi/linux/bpf.h       | 29 +++++++++++++-
>  kernel/trace/bpf_trace.c       | 70 ++++++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h | 29 +++++++++++++-
>  3 files changed, 126 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index dbbcf0b02970..71d9705df120 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2821,6 +2821,32 @@ union bpf_attr {
>   *     Return
>   *             On success, the strictly positive length of the string, including
>   *             the trailing NUL character. On error, a negative value.
> + *
> + * int bpf_get_file_path(char *path, u32 size, int fd)
> + *     Description
> + *             Get **file** atrribute from the current task by *fd*, then call
> + *             **d_path** to get it's absolute path and copy it as string into
> + *             *path* of *size*. Notice the **path** don't support unmountable
> + *             pseudo filesystems as they don't have path (eg: SOCKFS, PIPEFS).
> + *             The *size* must be strictly positive. On success, the helper
> + *             makes sure that the *path* is NUL-terminated, and the buffer
> + *             could be:
> + *             - a regular full path (include mountable fs eg: /proc, /sys)
> + *             - a regular full path with "(deleted)" at the end.
> + *             On failure, it is filled with zeroes.
> + *     Return
> + *             On success, returns the length of the copied string INCLUDING
> + *             the trailing NUL.
> + *
> + *             On failure, the returned value is one of the following:
> + *
> + *             **-EPERM** if no permission to get the path (eg: in irq ctx).
> + *
> + *             **-EBADF** if *fd* is invalid.
> + *
> + *             **-EINVAL** if *fd* corresponds to a unmountable pseudo fs
> + *
> + *             **-ENAMETOOLONG** if full path is longer than *size*
>   */
>  #define __BPF_FUNC_MAPPER(FN)          \
>         FN(unspec),                     \
> @@ -2938,7 +2964,8 @@ union bpf_attr {
>         FN(probe_read_user),            \
>         FN(probe_read_kernel),          \
>         FN(probe_read_user_str),        \
> -       FN(probe_read_kernel_str),
> +       FN(probe_read_kernel_str),      \
> +       FN(get_file_path),
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index e5ef4ae9edb5..db9c0ec46a5d 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -762,6 +762,72 @@ static const struct bpf_func_proto bpf_send_signal_proto = {
>         .arg1_type      = ARG_ANYTHING,
>  };
>
> +BPF_CALL_3(bpf_get_file_path, char *, dst, u32, size, int, fd)
> +{
> +       struct file *f;
> +       char *p;
> +       int ret = -EBADF;
> +
> +       /* Ensure we're in user context which is safe for the helper to
> +        * run. This helper has no business in a kthread.
> +        */
> +       if (unlikely(in_interrupt() ||
> +                    current->flags & (PF_KTHREAD | PF_EXITING))) {
> +               ret = -EPERM;
> +               goto error;
> +       }
> +
> +       /* Use fget_raw instead of fget to support O_PATH, and it doesn't
> +        * have any sleepable code, so it's ok to be here.
> +        */
> +       f = fget_raw(fd);
> +       if (!f)
> +               goto error;
> +
> +       /* For unmountable pseudo filesystem, it seems to have no meaning
> +        * to get their fake paths as they don't have path, and to be no
> +        * way to validate this function pointer can be always safe to call
> +        * in the current context.
> +        */
> +       if (f->f_path.dentry->d_op && f->f_path.dentry->d_op->d_dname) {
> +               ret = -EINVAL;
> +               fput(f);
> +               goto error;
> +       }
> +
> +       /* After filter unmountable pseudo filesytem, d_path won't call
> +        * dentry->d_op->d_name(), the normally path doesn't have any
> +        * sleepable code, and despite it uses the current macro to get
> +        * fs_struct (current->fs), we've already ensured we're in user
> +        * context, so it's ok to be here.
> +        */
> +       p = d_path(&f->f_path, dst, size);
> +       if (IS_ERR(p)) {
> +               ret = PTR_ERR(p);
> +               fput(f);
> +               goto error;
> +       }
> +
> +       ret = strlen(p);
> +       memmove(dst, p, ret);
> +       dst[ret++] = '\0';
> +       fput(f);
> +       return ret;
> +
> +error:
> +       memset(dst, '0', size);
> +       return ret;
> +}
> +
> +static const struct bpf_func_proto bpf_get_file_path_proto = {
> +       .func       = bpf_get_file_path,
> +       .gpl_only   = true,
> +       .ret_type   = RET_INTEGER,
> +       .arg1_type  = ARG_PTR_TO_UNINIT_MEM,
> +       .arg2_type  = ARG_CONST_SIZE,
> +       .arg3_type  = ARG_ANYTHING,
> +};
> +
>  static const struct bpf_func_proto *
>  tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  {
> @@ -953,6 +1019,8 @@ tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>                 return &bpf_get_stackid_proto_tp;
>         case BPF_FUNC_get_stack:
>                 return &bpf_get_stack_proto_tp;
> +       case BPF_FUNC_get_file_path:
> +               return &bpf_get_file_path_proto;
>         default:
>                 return tracing_func_proto(func_id, prog);
>         }
> @@ -1146,6 +1214,8 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>                 return &bpf_get_stackid_proto_raw_tp;
>         case BPF_FUNC_get_stack:
>                 return &bpf_get_stack_proto_raw_tp;
> +       case BPF_FUNC_get_file_path:
> +               return &bpf_get_file_path_proto;
>         default:
>                 return tracing_func_proto(func_id, prog);
>         }
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index dbbcf0b02970..71d9705df120 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -2821,6 +2821,32 @@ union bpf_attr {
>   *     Return
>   *             On success, the strictly positive length of the string, including
>   *             the trailing NUL character. On error, a negative value.
> + *
> + * int bpf_get_file_path(char *path, u32 size, int fd)
> + *     Description
> + *             Get **file** atrribute from the current task by *fd*, then call
> + *             **d_path** to get it's absolute path and copy it as string into
> + *             *path* of *size*. Notice the **path** don't support unmountable
> + *             pseudo filesystems as they don't have path (eg: SOCKFS, PIPEFS).
> + *             The *size* must be strictly positive. On success, the helper
> + *             makes sure that the *path* is NUL-terminated, and the buffer
> + *             could be:
> + *             - a regular full path (include mountable fs eg: /proc, /sys)
> + *             - a regular full path with "(deleted)" at the end.
> + *             On failure, it is filled with zeroes.
> + *     Return
> + *             On success, returns the length of the copied string INCLUDING
> + *             the trailing NUL.
> + *
> + *             On failure, the returned value is one of the following:
> + *
> + *             **-EPERM** if no permission to get the path (eg: in irq ctx).
> + *
> + *             **-EBADF** if *fd* is invalid.
> + *
> + *             **-EINVAL** if *fd* corresponds to a unmountable pseudo fs
> + *
> + *             **-ENAMETOOLONG** if full path is longer than *size*
>   */
>  #define __BPF_FUNC_MAPPER(FN)          \
>         FN(unspec),                     \
> @@ -2938,7 +2964,8 @@ union bpf_attr {
>         FN(probe_read_user),            \
>         FN(probe_read_kernel),          \
>         FN(probe_read_user_str),        \
> -       FN(probe_read_kernel_str),
> +       FN(probe_read_kernel_str),      \
> +       FN(get_file_path),


I just realized that among my tools that want the path, the input is either:

A) syscall tracepoints: int fd
B) kprobes: struct file *

This serves (A). If we ever add a different helper for (B), we might
think that this helper was misnamed. Should it be called get_fd_path
instead? That leaves get_file_path available for a later "struct file
*" -> pathname helper.

Brendan
Wenbo Zhang Dec. 17, 2019, 4:05 a.m. UTC | #4
> I just realized that among my tools that want the path, the input is either:

> A) syscall tracepoints: int fd
> B) kprobes: struct file *

> This serves (A). If we ever add a different helper for (B), we might
> think that this helper was misnamed. Should it be called get_fd_path
> instead? That leaves get_file_path available for a later "struct file
> *" -> pathname helper.

+1, I'll rename it in the next version.

Brendan Gregg <bgregg@netflix.com> 于2019年12月17日周二 上午6:09写道:
>
> On Sat, Dec 14, 2019 at 8:01 PM Wenbo Zhang <ethercflow@gmail.com> wrote:
> >
> > When people want to identify which file system files are being opened,
> > read, and written to, they can use this helper with file descriptor as
> > input to achieve this goal. Other pseudo filesystems are also supported.
> >
> > This requirement is mainly discussed here:
> >
> >   https://github.com/iovisor/bcc/issues/237
> >
> > v11->v12: addressed Alexei's feedback
> > - only allow tracepoints to make sure it won't dead lock
> >
> > v10->v11: addressed Al and Alexei's feedback
> > - fix missing fput()
> >
> > v9->v10: addressed Andrii's feedback
> > - send this patch together with the patch selftests as one patch series
> >
> > v8->v9:
> > - format helper description
> >
> > v7->v8: addressed Alexei's feedback
> > - use fget_raw instead of fdget_raw, as fdget_raw is only used inside fs/
> > - ensure we're in user context which is safe fot the help to run
> > - filter unmountable pseudo filesystem, because they don't have real path
> > - supplement the description of this helper function
> >
> > v6->v7:
> > - fix missing signed-off-by line
> >
> > v5->v6: addressed Andrii's feedback
> > - avoid unnecessary goto end by having two explicit returns
> >
> > v4->v5: addressed Andrii and Daniel's feedback
> > - rename bpf_fd2path to bpf_get_file_path to be consistent with other
> > helper's names
> > - when fdget_raw fails, set ret to -EBADF instead of -EINVAL
> > - remove fdput from fdget_raw's error path
> > - use IS_ERR instead of IS_ERR_OR_NULL as d_path ether returns a pointer
> > into the buffer or an error code if the path was too long
> > - modify the normal path's return value to return copied string length
> > including NUL
> > - update this helper description's Return bits.
> >
> > v3->v4: addressed Daniel's feedback
> > - fix missing fdput()
> > - move fd2path from kernel/bpf/trace.c to kernel/trace/bpf_trace.c
> > - move fd2path's test code to another patch
> > - add comment to explain why use fdget_raw instead of fdget
> >
> > v2->v3: addressed Yonghong's feedback
> > - remove unnecessary LOCKDOWN_BPF_READ
> > - refactor error handling section for enhanced readability
> > - provide a test case in tools/testing/selftests/bpf
> >
> > v1->v2: addressed Daniel's feedback
> > - fix backward compatibility
> > - add this helper description
> > - fix signed-off name
> >
> > Signed-off-by: Wenbo Zhang <ethercflow@gmail.com>
> > ---
> >  include/uapi/linux/bpf.h       | 29 +++++++++++++-
> >  kernel/trace/bpf_trace.c       | 70 ++++++++++++++++++++++++++++++++++
> >  tools/include/uapi/linux/bpf.h | 29 +++++++++++++-
> >  3 files changed, 126 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index dbbcf0b02970..71d9705df120 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -2821,6 +2821,32 @@ union bpf_attr {
> >   *     Return
> >   *             On success, the strictly positive length of the string, including
> >   *             the trailing NUL character. On error, a negative value.
> > + *
> > + * int bpf_get_file_path(char *path, u32 size, int fd)
> > + *     Description
> > + *             Get **file** atrribute from the current task by *fd*, then call
> > + *             **d_path** to get it's absolute path and copy it as string into
> > + *             *path* of *size*. Notice the **path** don't support unmountable
> > + *             pseudo filesystems as they don't have path (eg: SOCKFS, PIPEFS).
> > + *             The *size* must be strictly positive. On success, the helper
> > + *             makes sure that the *path* is NUL-terminated, and the buffer
> > + *             could be:
> > + *             - a regular full path (include mountable fs eg: /proc, /sys)
> > + *             - a regular full path with "(deleted)" at the end.
> > + *             On failure, it is filled with zeroes.
> > + *     Return
> > + *             On success, returns the length of the copied string INCLUDING
> > + *             the trailing NUL.
> > + *
> > + *             On failure, the returned value is one of the following:
> > + *
> > + *             **-EPERM** if no permission to get the path (eg: in irq ctx).
> > + *
> > + *             **-EBADF** if *fd* is invalid.
> > + *
> > + *             **-EINVAL** if *fd* corresponds to a unmountable pseudo fs
> > + *
> > + *             **-ENAMETOOLONG** if full path is longer than *size*
> >   */
> >  #define __BPF_FUNC_MAPPER(FN)          \
> >         FN(unspec),                     \
> > @@ -2938,7 +2964,8 @@ union bpf_attr {
> >         FN(probe_read_user),            \
> >         FN(probe_read_kernel),          \
> >         FN(probe_read_user_str),        \
> > -       FN(probe_read_kernel_str),
> > +       FN(probe_read_kernel_str),      \
> > +       FN(get_file_path),
> >
> >  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> >   * function eBPF program intends to call
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index e5ef4ae9edb5..db9c0ec46a5d 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -762,6 +762,72 @@ static const struct bpf_func_proto bpf_send_signal_proto = {
> >         .arg1_type      = ARG_ANYTHING,
> >  };
> >
> > +BPF_CALL_3(bpf_get_file_path, char *, dst, u32, size, int, fd)
> > +{
> > +       struct file *f;
> > +       char *p;
> > +       int ret = -EBADF;
> > +
> > +       /* Ensure we're in user context which is safe for the helper to
> > +        * run. This helper has no business in a kthread.
> > +        */
> > +       if (unlikely(in_interrupt() ||
> > +                    current->flags & (PF_KTHREAD | PF_EXITING))) {
> > +               ret = -EPERM;
> > +               goto error;
> > +       }
> > +
> > +       /* Use fget_raw instead of fget to support O_PATH, and it doesn't
> > +        * have any sleepable code, so it's ok to be here.
> > +        */
> > +       f = fget_raw(fd);
> > +       if (!f)
> > +               goto error;
> > +
> > +       /* For unmountable pseudo filesystem, it seems to have no meaning
> > +        * to get their fake paths as they don't have path, and to be no
> > +        * way to validate this function pointer can be always safe to call
> > +        * in the current context.
> > +        */
> > +       if (f->f_path.dentry->d_op && f->f_path.dentry->d_op->d_dname) {
> > +               ret = -EINVAL;
> > +               fput(f);
> > +               goto error;
> > +       }
> > +
> > +       /* After filter unmountable pseudo filesytem, d_path won't call
> > +        * dentry->d_op->d_name(), the normally path doesn't have any
> > +        * sleepable code, and despite it uses the current macro to get
> > +        * fs_struct (current->fs), we've already ensured we're in user
> > +        * context, so it's ok to be here.
> > +        */
> > +       p = d_path(&f->f_path, dst, size);
> > +       if (IS_ERR(p)) {
> > +               ret = PTR_ERR(p);
> > +               fput(f);
> > +               goto error;
> > +       }
> > +
> > +       ret = strlen(p);
> > +       memmove(dst, p, ret);
> > +       dst[ret++] = '\0';
> > +       fput(f);
> > +       return ret;
> > +
> > +error:
> > +       memset(dst, '0', size);
> > +       return ret;
> > +}
> > +
> > +static const struct bpf_func_proto bpf_get_file_path_proto = {
> > +       .func       = bpf_get_file_path,
> > +       .gpl_only   = true,
> > +       .ret_type   = RET_INTEGER,
> > +       .arg1_type  = ARG_PTR_TO_UNINIT_MEM,
> > +       .arg2_type  = ARG_CONST_SIZE,
> > +       .arg3_type  = ARG_ANYTHING,
> > +};
> > +
> >  static const struct bpf_func_proto *
> >  tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> >  {
> > @@ -953,6 +1019,8 @@ tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> >                 return &bpf_get_stackid_proto_tp;
> >         case BPF_FUNC_get_stack:
> >                 return &bpf_get_stack_proto_tp;
> > +       case BPF_FUNC_get_file_path:
> > +               return &bpf_get_file_path_proto;
> >         default:
> >                 return tracing_func_proto(func_id, prog);
> >         }
> > @@ -1146,6 +1214,8 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> >                 return &bpf_get_stackid_proto_raw_tp;
> >         case BPF_FUNC_get_stack:
> >                 return &bpf_get_stack_proto_raw_tp;
> > +       case BPF_FUNC_get_file_path:
> > +               return &bpf_get_file_path_proto;
> >         default:
> >                 return tracing_func_proto(func_id, prog);
> >         }
> > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > index dbbcf0b02970..71d9705df120 100644
> > --- a/tools/include/uapi/linux/bpf.h
> > +++ b/tools/include/uapi/linux/bpf.h
> > @@ -2821,6 +2821,32 @@ union bpf_attr {
> >   *     Return
> >   *             On success, the strictly positive length of the string, including
> >   *             the trailing NUL character. On error, a negative value.
> > + *
> > + * int bpf_get_file_path(char *path, u32 size, int fd)
> > + *     Description
> > + *             Get **file** atrribute from the current task by *fd*, then call
> > + *             **d_path** to get it's absolute path and copy it as string into
> > + *             *path* of *size*. Notice the **path** don't support unmountable
> > + *             pseudo filesystems as they don't have path (eg: SOCKFS, PIPEFS).
> > + *             The *size* must be strictly positive. On success, the helper
> > + *             makes sure that the *path* is NUL-terminated, and the buffer
> > + *             could be:
> > + *             - a regular full path (include mountable fs eg: /proc, /sys)
> > + *             - a regular full path with "(deleted)" at the end.
> > + *             On failure, it is filled with zeroes.
> > + *     Return
> > + *             On success, returns the length of the copied string INCLUDING
> > + *             the trailing NUL.
> > + *
> > + *             On failure, the returned value is one of the following:
> > + *
> > + *             **-EPERM** if no permission to get the path (eg: in irq ctx).
> > + *
> > + *             **-EBADF** if *fd* is invalid.
> > + *
> > + *             **-EINVAL** if *fd* corresponds to a unmountable pseudo fs
> > + *
> > + *             **-ENAMETOOLONG** if full path is longer than *size*
> >   */
> >  #define __BPF_FUNC_MAPPER(FN)          \
> >         FN(unspec),                     \
> > @@ -2938,7 +2964,8 @@ union bpf_attr {
> >         FN(probe_read_user),            \
> >         FN(probe_read_kernel),          \
> >         FN(probe_read_user_str),        \
> > -       FN(probe_read_kernel_str),
> > +       FN(probe_read_kernel_str),      \
> > +       FN(get_file_path),
>
>
> I just realized that among my tools that want the path, the input is either:
>
> A) syscall tracepoints: int fd
> B) kprobes: struct file *
>
> This serves (A). If we ever add a different helper for (B), we might
> think that this helper was misnamed. Should it be called get_fd_path
> instead? That leaves get_file_path available for a later "struct file
> *" -> pathname helper.
>
> Brendan
>
> --
> Brendan Gregg, Senior Performance Architect, Netflix
Wenbo Zhang Dec. 17, 2019, 6:26 a.m. UTC | #5
> > + *           - a regular full path (include mountable fs eg: /proc, /sys)
> > + *           - a regular full path with "(deleted)" at the end.

> Let us say with " (deleted)" is appended to be consistent with comments
> in d_path() and is more clear to user what the format will looks like.

Thank you, I'll fix this.

> > +     ret = strlen(p);
> > +     memmove(dst, p, ret);
> > +     dst[ret++] = '\0';

> nit: you could do memmove(dst, p, ret + 1)?

I did with `dst[ret++]='\0';`  to return value length including
trailing '\0'. as you mentioned below:

> > +     fput(f);
> > +     return ret;

> The description says the return value length including trailing '\0'.
> The above 'ret' does not include trailing '\0'.

It seems `[ret++]` not very clear to read and '\0' can be done by
`memmove`. I think I'll refactor to

```
ret = strlen(p) + 1;
memmove(dst, p, ret);
fput(f);
return ret;
```

Is this better?

Yonghong Song <yhs@fb.com> 于2019年12月16日周一 上午12:06写道:
>
>
>
> On 12/14/19 8:01 PM, Wenbo Zhang wrote:
> > When people want to identify which file system files are being opened,
> > read, and written to, they can use this helper with file descriptor as
> > input to achieve this goal. Other pseudo filesystems are also supported.
> >
> > This requirement is mainly discussed here:
> >
> >    https://github.com/iovisor/bcc/issues/237
> >
> > v11->v12: addressed Alexei's feedback
> > - only allow tracepoints to make sure it won't dead lock
> >
> > v10->v11: addressed Al and Alexei's feedback
> > - fix missing fput()
> >
> > v9->v10: addressed Andrii's feedback
> > - send this patch together with the patch selftests as one patch series
> >
> > v8->v9:
> > - format helper description
> >
> > v7->v8: addressed Alexei's feedback
> > - use fget_raw instead of fdget_raw, as fdget_raw is only used inside fs/
> > - ensure we're in user context which is safe fot the help to run
> > - filter unmountable pseudo filesystem, because they don't have real path
> > - supplement the description of this helper function
> >
> > v6->v7:
> > - fix missing signed-off-by line
> >
> > v5->v6: addressed Andrii's feedback
> > - avoid unnecessary goto end by having two explicit returns
> >
> > v4->v5: addressed Andrii and Daniel's feedback
> > - rename bpf_fd2path to bpf_get_file_path to be consistent with other
> > helper's names
> > - when fdget_raw fails, set ret to -EBADF instead of -EINVAL
> > - remove fdput from fdget_raw's error path
> > - use IS_ERR instead of IS_ERR_OR_NULL as d_path ether returns a pointer
> > into the buffer or an error code if the path was too long
> > - modify the normal path's return value to return copied string length
> > including NUL
> > - update this helper description's Return bits.
> >
> > v3->v4: addressed Daniel's feedback
> > - fix missing fdput()
> > - move fd2path from kernel/bpf/trace.c to kernel/trace/bpf_trace.c
> > - move fd2path's test code to another patch
> > - add comment to explain why use fdget_raw instead of fdget
> >
> > v2->v3: addressed Yonghong's feedback
> > - remove unnecessary LOCKDOWN_BPF_READ
> > - refactor error handling section for enhanced readability
> > - provide a test case in tools/testing/selftests/bpf
> >
> > v1->v2: addressed Daniel's feedback
> > - fix backward compatibility
> > - add this helper description
> > - fix signed-off name
> >
> > Signed-off-by: Wenbo Zhang <ethercflow@gmail.com>
> > ---
> >   include/uapi/linux/bpf.h       | 29 +++++++++++++-
> >   kernel/trace/bpf_trace.c       | 70 ++++++++++++++++++++++++++++++++++
> >   tools/include/uapi/linux/bpf.h | 29 +++++++++++++-
> >   3 files changed, 126 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index dbbcf0b02970..71d9705df120 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -2821,6 +2821,32 @@ union bpf_attr {
> >    *  Return
> >    *          On success, the strictly positive length of the string, including
> >    *          the trailing NUL character. On error, a negative value.
> > + *
> > + * int bpf_get_file_path(char *path, u32 size, int fd)
> > + *   Description
> > + *           Get **file** atrribute from the current task by *fd*, then call
> > + *           **d_path** to get it's absolute path and copy it as string into
> > + *           *path* of *size*. Notice the **path** don't support unmountable
> > + *           pseudo filesystems as they don't have path (eg: SOCKFS, PIPEFS).
> > + *           The *size* must be strictly positive. On success, the helper
> > + *           makes sure that the *path* is NUL-terminated, and the buffer
> > + *           could be:
> > + *           - a regular full path (include mountable fs eg: /proc, /sys)
> > + *           - a regular full path with "(deleted)" at the end.
>
> Let us say with " (deleted)" is appended to be consistent with comments
> in d_path() and is more clear to user what the format will looks like.
>
> > + *           On failure, it is filled with zeroes.
> > + *   Return
> > + *           On success, returns the length of the copied string INCLUDING
> > + *           the trailing NUL.
>
> trailing '\0'.
>
> > + *
> > + *           On failure, the returned value is one of the following:
> > + *
> > + *           **-EPERM** if no permission to get the path (eg: in irq ctx).
> > + *
> > + *           **-EBADF** if *fd* is invalid.
> > + *
> > + *           **-EINVAL** if *fd* corresponds to a unmountable pseudo fs
> > + *
> > + *           **-ENAMETOOLONG** if full path is longer than *size*
> >    */
> >   #define __BPF_FUNC_MAPPER(FN)               \
> >       FN(unspec),                     \
> > @@ -2938,7 +2964,8 @@ union bpf_attr {
> >       FN(probe_read_user),            \
> >       FN(probe_read_kernel),          \
> >       FN(probe_read_user_str),        \
> > -     FN(probe_read_kernel_str),
> > +     FN(probe_read_kernel_str),      \
> > +     FN(get_file_path),
> >
> >   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> >    * function eBPF program intends to call
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index e5ef4ae9edb5..db9c0ec46a5d 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -762,6 +762,72 @@ static const struct bpf_func_proto bpf_send_signal_proto = {
> >       .arg1_type      = ARG_ANYTHING,
> >   };
> >
> > +BPF_CALL_3(bpf_get_file_path, char *, dst, u32, size, int, fd)
> > +{
> > +     struct file *f;
> > +     char *p;
> > +     int ret = -EBADF;
> > +
> > +     /* Ensure we're in user context which is safe for the helper to
> > +      * run. This helper has no business in a kthread.
> > +      */
> > +     if (unlikely(in_interrupt() ||
> > +                  current->flags & (PF_KTHREAD | PF_EXITING))) {
> > +             ret = -EPERM;
> > +             goto error;
> > +     }
> > +
> > +     /* Use fget_raw instead of fget to support O_PATH, and it doesn't
> > +      * have any sleepable code, so it's ok to be here.
> > +      */
> > +     f = fget_raw(fd);
> > +     if (!f)
> > +             goto error;
> > +
> > +     /* For unmountable pseudo filesystem, it seems to have no meaning
> > +      * to get their fake paths as they don't have path, and to be no
> > +      * way to validate this function pointer can be always safe to call
> > +      * in the current context.
> > +      */
> > +     if (f->f_path.dentry->d_op && f->f_path.dentry->d_op->d_dname) {
> > +             ret = -EINVAL;
> > +             fput(f);
> > +             goto error;
> > +     }
> > +
> > +     /* After filter unmountable pseudo filesytem, d_path won't call
> > +      * dentry->d_op->d_name(), the normally path doesn't have any
> > +      * sleepable code, and despite it uses the current macro to get
> > +      * fs_struct (current->fs), we've already ensured we're in user
> > +      * context, so it's ok to be here.
> > +      */
> > +     p = d_path(&f->f_path, dst, size);
> > +     if (IS_ERR(p)) {
> > +             ret = PTR_ERR(p);
> > +             fput(f);
> > +             goto error;
> > +     }
> > +
> > +     ret = strlen(p);
> > +     memmove(dst, p, ret);
> > +     dst[ret++] = '\0';
>
> nit: you could do memmove(dst, p, ret + 1)?
>
> > +     fput(f);
> > +     return ret;
>
> The description says the return value length including trailing '\0'.
> The above 'ret' does not include trailing '\0'.
>
> > +
> > +error:
> > +     memset(dst, '0', size);
> > +     return ret;
> > +}
> > +
> > +static const struct bpf_func_proto bpf_get_file_path_proto = {
> > +     .func       = bpf_get_file_path,
> > +     .gpl_only   = true,
> > +     .ret_type   = RET_INTEGER,
> > +     .arg1_type  = ARG_PTR_TO_UNINIT_MEM,
> > +     .arg2_type  = ARG_CONST_SIZE,
> > +     .arg3_type  = ARG_ANYTHING,
> > +};
> > +
> >   static const struct bpf_func_proto *
> >   tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> >   {
> > @@ -953,6 +1019,8 @@ tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> >               return &bpf_get_stackid_proto_tp;
> >       case BPF_FUNC_get_stack:
> >               return &bpf_get_stack_proto_tp;
> > +     case BPF_FUNC_get_file_path:
> > +             return &bpf_get_file_path_proto;
> >       default:
> >               return tracing_func_proto(func_id, prog);
> >       }
> > @@ -1146,6 +1214,8 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> >               return &bpf_get_stackid_proto_raw_tp;
> >       case BPF_FUNC_get_stack:
> >               return &bpf_get_stack_proto_raw_tp;
> > +     case BPF_FUNC_get_file_path:
> > +             return &bpf_get_file_path_proto;
> >       default:
> >               return tracing_func_proto(func_id, prog);
> >       }
> > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > index dbbcf0b02970..71d9705df120 100644
> > --- a/tools/include/uapi/linux/bpf.h
> > +++ b/tools/include/uapi/linux/bpf.h
> > @@ -2821,6 +2821,32 @@ union bpf_attr {
> >    *  Return
> >    *          On success, the strictly positive length of the string, including
> >    *          the trailing NUL character. On error, a negative value.
> > + *
> > + * int bpf_get_file_path(char *path, u32 size, int fd)
> > + *   Description
> > + *           Get **file** atrribute from the current task by *fd*, then call
> > + *           **d_path** to get it's absolute path and copy it as string into
> > + *           *path* of *size*. Notice the **path** don't support unmountable
> > + *           pseudo filesystems as they don't have path (eg: SOCKFS, PIPEFS).
> > + *           The *size* must be strictly positive. On success, the helper
> > + *           makes sure that the *path* is NUL-terminated, and the buffer
> > + *           could be:
> > + *           - a regular full path (include mountable fs eg: /proc, /sys)
> > + *           - a regular full path with "(deleted)" at the end.
>
> ditto
>
> > + *           On failure, it is filled with zeroes.
> > + *   Return
> > + *           On success, returns the length of the copied string INCLUDING
> > + *           the trailing NUL.
>
> ditto
>
> > + *
> > + *           On failure, the returned value is one of the following:
> > + *
> > + *           **-EPERM** if no permission to get the path (eg: in irq ctx).
> > + *
> > + *           **-EBADF** if *fd* is invalid.
> > + *
> > + *           **-EINVAL** if *fd* corresponds to a unmountable pseudo fs
> > + *
> > + *           **-ENAMETOOLONG** if full path is longer than *size*
> >    */
> >   #define __BPF_FUNC_MAPPER(FN)               \
> >       FN(unspec),                     \
> > @@ -2938,7 +2964,8 @@ union bpf_attr {
> >       FN(probe_read_user),            \
> >       FN(probe_read_kernel),          \
> >       FN(probe_read_user_str),        \
> > -     FN(probe_read_kernel_str),
> > +     FN(probe_read_kernel_str),      \
> > +     FN(get_file_path),
> >
> >   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> >    * function eBPF program intends to call
> >
Wenbo Zhang Dec. 17, 2019, 6:27 a.m. UTC | #6
> > +BPF_CALL_3(bpf_get_file_path, char *, dst, u32, size, int, fd)
> > +{
> > +     struct file *f;
> > +     char *p;
> > +     int ret = -EBADF;

> please try to use reverse Christmas tree for declarations.

Thank you, I'll fix this.

Yonghong Song <yhs@fb.com> 于2019年12月16日周一 上午12:10写道:
>
>
>
> On 12/14/19 8:01 PM, Wenbo Zhang wrote:
> > When people want to identify which file system files are being opened,
> > read, and written to, they can use this helper with file descriptor as
> > input to achieve this goal. Other pseudo filesystems are also supported.
> >
> > This requirement is mainly discussed here:
> >
> >    https://github.com/iovisor/bcc/issues/237
> >
> > v11->v12: addressed Alexei's feedback
> > - only allow tracepoints to make sure it won't dead lock
> >
> > v10->v11: addressed Al and Alexei's feedback
> > - fix missing fput()
> >
> > v9->v10: addressed Andrii's feedback
> > - send this patch together with the patch selftests as one patch series
> >
> > v8->v9:
> > - format helper description
> >
> > v7->v8: addressed Alexei's feedback
> > - use fget_raw instead of fdget_raw, as fdget_raw is only used inside fs/
> > - ensure we're in user context which is safe fot the help to run
> > - filter unmountable pseudo filesystem, because they don't have real path
> > - supplement the description of this helper function
> >
> > v6->v7:
> > - fix missing signed-off-by line
> >
> > v5->v6: addressed Andrii's feedback
> > - avoid unnecessary goto end by having two explicit returns
> >
> > v4->v5: addressed Andrii and Daniel's feedback
> > - rename bpf_fd2path to bpf_get_file_path to be consistent with other
> > helper's names
> > - when fdget_raw fails, set ret to -EBADF instead of -EINVAL
> > - remove fdput from fdget_raw's error path
> > - use IS_ERR instead of IS_ERR_OR_NULL as d_path ether returns a pointer
> > into the buffer or an error code if the path was too long
> > - modify the normal path's return value to return copied string length
> > including NUL
> > - update this helper description's Return bits.
> >
> > v3->v4: addressed Daniel's feedback
> > - fix missing fdput()
> > - move fd2path from kernel/bpf/trace.c to kernel/trace/bpf_trace.c
> > - move fd2path's test code to another patch
> > - add comment to explain why use fdget_raw instead of fdget
> >
> > v2->v3: addressed Yonghong's feedback
> > - remove unnecessary LOCKDOWN_BPF_READ
> > - refactor error handling section for enhanced readability
> > - provide a test case in tools/testing/selftests/bpf
> >
> > v1->v2: addressed Daniel's feedback
> > - fix backward compatibility
> > - add this helper description
> > - fix signed-off name
> >
> > Signed-off-by: Wenbo Zhang <ethercflow@gmail.com>
> > ---
> >   include/uapi/linux/bpf.h       | 29 +++++++++++++-
> >   kernel/trace/bpf_trace.c       | 70 ++++++++++++++++++++++++++++++++++
> >   tools/include/uapi/linux/bpf.h | 29 +++++++++++++-
> >   3 files changed, 126 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index dbbcf0b02970..71d9705df120 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -2821,6 +2821,32 @@ union bpf_attr {
> >    *  Return
> >    *          On success, the strictly positive length of the string, including
> >    *          the trailing NUL character. On error, a negative value.
> > + *
> > + * int bpf_get_file_path(char *path, u32 size, int fd)
> > + *   Description
> > + *           Get **file** atrribute from the current task by *fd*, then call
> > + *           **d_path** to get it's absolute path and copy it as string into
> > + *           *path* of *size*. Notice the **path** don't support unmountable
> > + *           pseudo filesystems as they don't have path (eg: SOCKFS, PIPEFS).
> > + *           The *size* must be strictly positive. On success, the helper
> > + *           makes sure that the *path* is NUL-terminated, and the buffer
> > + *           could be:
> > + *           - a regular full path (include mountable fs eg: /proc, /sys)
> > + *           - a regular full path with "(deleted)" at the end.
> > + *           On failure, it is filled with zeroes.
> > + *   Return
> > + *           On success, returns the length of the copied string INCLUDING
> > + *           the trailing NUL.
> > + *
> > + *           On failure, the returned value is one of the following:
> > + *
> > + *           **-EPERM** if no permission to get the path (eg: in irq ctx).
> > + *
> > + *           **-EBADF** if *fd* is invalid.
> > + *
> > + *           **-EINVAL** if *fd* corresponds to a unmountable pseudo fs
> > + *
> > + *           **-ENAMETOOLONG** if full path is longer than *size*
> >    */
> >   #define __BPF_FUNC_MAPPER(FN)               \
> >       FN(unspec),                     \
> > @@ -2938,7 +2964,8 @@ union bpf_attr {
> >       FN(probe_read_user),            \
> >       FN(probe_read_kernel),          \
> >       FN(probe_read_user_str),        \
> > -     FN(probe_read_kernel_str),
> > +     FN(probe_read_kernel_str),      \
> > +     FN(get_file_path),
> >
> >   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> >    * function eBPF program intends to call
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index e5ef4ae9edb5..db9c0ec46a5d 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -762,6 +762,72 @@ static const struct bpf_func_proto bpf_send_signal_proto = {
> >       .arg1_type      = ARG_ANYTHING,
> >   };
> >
> > +BPF_CALL_3(bpf_get_file_path, char *, dst, u32, size, int, fd)
> > +{
> > +     struct file *f;
> > +     char *p;
> > +     int ret = -EBADF;
>
> please try to use reverse Christmas tree for declarations.
>
> > +
> > +     /* Ensure we're in user context which is safe for the helper to
> > +      * run. This helper has no business in a kthread.
> > +      */
> > +     if (unlikely(in_interrupt() ||
> > +                  current->flags & (PF_KTHREAD | PF_EXITING))) {
> > +             ret = -EPERM;
> > +             goto error;
> > +     }
> > +
> > +     /* Use fget_raw instead of fget to support O_PATH, and it doesn't
> > +      * have any sleepable code, so it's ok to be here.
> > +      */
> > +     f = fget_raw(fd);
> > +     if (!f)
> > +             goto error;
> > +
> > +     /* For unmountable pseudo filesystem, it seems to have no meaning
> > +      * to get their fake paths as they don't have path, and to be no
> > +      * way to validate this function pointer can be always safe to call
> > +      * in the current context.
> > +      */
> > +     if (f->f_path.dentry->d_op && f->f_path.dentry->d_op->d_dname) {
> > +             ret = -EINVAL;
> > +             fput(f);
> > +             goto error;
> > +     }
> > +
> > +     /* After filter unmountable pseudo filesytem, d_path won't call
> > +      * dentry->d_op->d_name(), the normally path doesn't have any
> > +      * sleepable code, and despite it uses the current macro to get
> > +      * fs_struct (current->fs), we've already ensured we're in user
> > +      * context, so it's ok to be here.
> > +      */
> > +     p = d_path(&f->f_path, dst, size);
> > +     if (IS_ERR(p)) {
> > +             ret = PTR_ERR(p);
> > +             fput(f);
> > +             goto error;
> > +     }
> > +
> > +     ret = strlen(p);
> > +     memmove(dst, p, ret);
> > +     dst[ret++] = '\0';
> > +     fput(f);
> > +     return ret;
> > +
> > +error:
> > +     memset(dst, '0', size);
> > +     return ret;
> > +}
> > +
> [...]
Yonghong Song Dec. 17, 2019, 6:33 a.m. UTC | #7
On 12/16/19 10:26 PM, Wenbo Zhang wrote:
>>> + *           - a regular full path (include mountable fs eg: /proc, /sys)
>>> + *           - a regular full path with "(deleted)" at the end.
> 
>> Let us say with " (deleted)" is appended to be consistent with comments
>> in d_path() and is more clear to user what the format will looks like.
> 
> Thank you, I'll fix this.
> 
>>> +     ret = strlen(p);
>>> +     memmove(dst, p, ret);
>>> +     dst[ret++] = '\0';
> 
>> nit: you could do memmove(dst, p, ret + 1)?
> 
> I did with `dst[ret++]='\0';`  to return value length including
> trailing '\0'. as you mentioned below:
> 
>>> +     fput(f);
>>> +     return ret;
> 
>> The description says the return value length including trailing '\0'.
>> The above 'ret' does not include trailing '\0'.
> 
> It seems `[ret++]` not very clear to read and '\0' can be done by
> `memmove`. I think I'll refactor to
> 
> ```
> ret = strlen(p) + 1;
> memmove(dst, p, ret);
> fput(f);
> return ret;
> ```
> 
> Is this better?

Ah, I missed ret++ in dst[ret++]. Indeed the above code is better.
diff mbox series

Patch

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index dbbcf0b02970..71d9705df120 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2821,6 +2821,32 @@  union bpf_attr {
  * 	Return
  * 		On success, the strictly positive length of the string,	including
  * 		the trailing NUL character. On error, a negative value.
+ *
+ * int bpf_get_file_path(char *path, u32 size, int fd)
+ *	Description
+ *		Get **file** atrribute from the current task by *fd*, then call
+ *		**d_path** to get it's absolute path and copy it as string into
+ *		*path* of *size*. Notice the **path** don't support unmountable
+ *		pseudo filesystems as they don't have path (eg: SOCKFS, PIPEFS).
+ *		The *size* must be strictly positive. On success, the helper
+ *		makes sure that the *path* is NUL-terminated, and the buffer
+ *		could be:
+ *		- a regular full path (include mountable fs eg: /proc, /sys)
+ *		- a regular full path with "(deleted)" at the end.
+ *		On failure, it is filled with zeroes.
+ *	Return
+ *		On success, returns the length of the copied string INCLUDING
+ *		the trailing NUL.
+ *
+ *		On failure, the returned value is one of the following:
+ *
+ *		**-EPERM** if no permission to get the path (eg: in irq ctx).
+ *
+ *		**-EBADF** if *fd* is invalid.
+ *
+ *		**-EINVAL** if *fd* corresponds to a unmountable pseudo fs
+ *
+ *		**-ENAMETOOLONG** if full path is longer than *size*
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2938,7 +2964,8 @@  union bpf_attr {
 	FN(probe_read_user),		\
 	FN(probe_read_kernel),		\
 	FN(probe_read_user_str),	\
-	FN(probe_read_kernel_str),
+	FN(probe_read_kernel_str),	\
+	FN(get_file_path),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index e5ef4ae9edb5..db9c0ec46a5d 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -762,6 +762,72 @@  static const struct bpf_func_proto bpf_send_signal_proto = {
 	.arg1_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_3(bpf_get_file_path, char *, dst, u32, size, int, fd)
+{
+	struct file *f;
+	char *p;
+	int ret = -EBADF;
+
+	/* Ensure we're in user context which is safe for the helper to
+	 * run. This helper has no business in a kthread.
+	 */
+	if (unlikely(in_interrupt() ||
+		     current->flags & (PF_KTHREAD | PF_EXITING))) {
+		ret = -EPERM;
+		goto error;
+	}
+
+	/* Use fget_raw instead of fget to support O_PATH, and it doesn't
+	 * have any sleepable code, so it's ok to be here.
+	 */
+	f = fget_raw(fd);
+	if (!f)
+		goto error;
+
+	/* For unmountable pseudo filesystem, it seems to have no meaning
+	 * to get their fake paths as they don't have path, and to be no
+	 * way to validate this function pointer can be always safe to call
+	 * in the current context.
+	 */
+	if (f->f_path.dentry->d_op && f->f_path.dentry->d_op->d_dname) {
+		ret = -EINVAL;
+		fput(f);
+		goto error;
+	}
+
+	/* After filter unmountable pseudo filesytem, d_path won't call
+	 * dentry->d_op->d_name(), the normally path doesn't have any
+	 * sleepable code, and despite it uses the current macro to get
+	 * fs_struct (current->fs), we've already ensured we're in user
+	 * context, so it's ok to be here.
+	 */
+	p = d_path(&f->f_path, dst, size);
+	if (IS_ERR(p)) {
+		ret = PTR_ERR(p);
+		fput(f);
+		goto error;
+	}
+
+	ret = strlen(p);
+	memmove(dst, p, ret);
+	dst[ret++] = '\0';
+	fput(f);
+	return ret;
+
+error:
+	memset(dst, '0', size);
+	return ret;
+}
+
+static const struct bpf_func_proto bpf_get_file_path_proto = {
+	.func       = bpf_get_file_path,
+	.gpl_only   = true,
+	.ret_type   = RET_INTEGER,
+	.arg1_type  = ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type  = ARG_CONST_SIZE,
+	.arg3_type  = ARG_ANYTHING,
+};
+
 static const struct bpf_func_proto *
 tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -953,6 +1019,8 @@  tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_stackid_proto_tp;
 	case BPF_FUNC_get_stack:
 		return &bpf_get_stack_proto_tp;
+	case BPF_FUNC_get_file_path:
+		return &bpf_get_file_path_proto;
 	default:
 		return tracing_func_proto(func_id, prog);
 	}
@@ -1146,6 +1214,8 @@  raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_stackid_proto_raw_tp;
 	case BPF_FUNC_get_stack:
 		return &bpf_get_stack_proto_raw_tp;
+	case BPF_FUNC_get_file_path:
+		return &bpf_get_file_path_proto;
 	default:
 		return tracing_func_proto(func_id, prog);
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index dbbcf0b02970..71d9705df120 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2821,6 +2821,32 @@  union bpf_attr {
  * 	Return
  * 		On success, the strictly positive length of the string,	including
  * 		the trailing NUL character. On error, a negative value.
+ *
+ * int bpf_get_file_path(char *path, u32 size, int fd)
+ *	Description
+ *		Get **file** atrribute from the current task by *fd*, then call
+ *		**d_path** to get it's absolute path and copy it as string into
+ *		*path* of *size*. Notice the **path** don't support unmountable
+ *		pseudo filesystems as they don't have path (eg: SOCKFS, PIPEFS).
+ *		The *size* must be strictly positive. On success, the helper
+ *		makes sure that the *path* is NUL-terminated, and the buffer
+ *		could be:
+ *		- a regular full path (include mountable fs eg: /proc, /sys)
+ *		- a regular full path with "(deleted)" at the end.
+ *		On failure, it is filled with zeroes.
+ *	Return
+ *		On success, returns the length of the copied string INCLUDING
+ *		the trailing NUL.
+ *
+ *		On failure, the returned value is one of the following:
+ *
+ *		**-EPERM** if no permission to get the path (eg: in irq ctx).
+ *
+ *		**-EBADF** if *fd* is invalid.
+ *
+ *		**-EINVAL** if *fd* corresponds to a unmountable pseudo fs
+ *
+ *		**-ENAMETOOLONG** if full path is longer than *size*
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2938,7 +2964,8 @@  union bpf_attr {
 	FN(probe_read_user),		\
 	FN(probe_read_kernel),		\
 	FN(probe_read_user_str),	\
-	FN(probe_read_kernel_str),
+	FN(probe_read_kernel_str),	\
+	FN(get_file_path),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call