From patchwork Sat May 9 16:52:00 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dmitry Yakunin
X-Patchwork-Id: 1286745
X-Patchwork-Delegate: dsahern@gmail.com
Return-Path:
X-Original-To: patchwork-incoming-netdev@ozlabs.org
Delivered-To: patchwork-incoming-netdev@ozlabs.org
Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized)
	smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18;
	helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org;
	receiver=)
Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none)
	header.from=yandex-team.ru
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected)
	header.d=yandex-team.ru header.i=@yandex-team.ru header.a=rsa-sha256
	header.s=default header.b=MJQZH00W; dkim-atps=neutral
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by ozlabs.org (Postfix) with ESMTP id 49KCt66YTYz9sNH
	for ; Sun, 10 May 2020 02:52:30 +1000 (AEST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1728104AbgEIQw2 (ORCPT ); Sat, 9 May 2020 12:52:28 -0400
Received: from forwardcorp1j.mail.yandex.net ([5.45.199.163]:38342 "EHLO
	forwardcorp1j.mail.yandex.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1726214AbgEIQw1 (ORCPT );
	Sat, 9 May 2020 12:52:27 -0400
Received: from mxbackcorp1j.mail.yandex.net (mxbackcorp1j.mail.yandex.net
	[IPv6:2a02:6b8:0:1619::162])
	by forwardcorp1j.mail.yandex.net (Yandex) with ESMTP id C1ED32E0DF2;
	Sat, 9 May 2020 19:52:22 +0300 (MSK)
Received: from vla1-81430ab5870b.qloud-c.yandex.net
	(vla1-81430ab5870b.qloud-c.yandex.net [2a02:6b8:c0d:35a1:0:640:8143:ab5])
	by mxbackcorp1j.mail.yandex.net (mxbackcorp/Yandex) with ESMTP id
	XdwogZeIWl-qMWmaNAC; Sat, 09 May 2020 19:52:22 +0300
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.ru;
	s=default; t=1589043142;
	bh=8jFLpqTS7drSlAmHwwOWLmMftyzh4hpif5bq1HUzQG0=;
	h=Message-Id:Date:Subject:To:From:Cc;
	b=MJQZH00WHvpQNDpGyq2iS/yTFR78QR4SL6sI/WAzh+trZ8ezTEzUBD0iLrqSTzGs5
	9RrZgCUCYOJ37rHLAKa+1DPpJk7KgWzVjYEK6vGU+fL+GbQwLFJcvUx+cI9mV3YRqy
	oJWFV+3zXlPl8VV5V0LHnlMK6GR8a31bitFAH3d0=
Authentication-Results: mxbackcorp1j.mail.yandex.net; dkim=pass
	header.i=@yandex-team.ru
Received: from 178.154.191.33-vpn.dhcp.yndx.net
	(178.154.191.33-vpn.dhcp.yndx.net [178.154.191.33])
	by vla1-81430ab5870b.qloud-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id
	lkcxSGO050-qLXaPCrk; Sat, 09 May 2020 19:52:21 +0300
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(Client certificate not present)
From: Dmitry Yakunin
To: netdev@vger.kernel.org, dsahern@gmail.com
Cc: cgroups@vger.kernel.org
Subject: [PATCH iproute2-next v2 1/3] ss: introduce cgroup2 cache and helper functions
Date: Sat, 9 May 2020 19:52:00 +0300
Message-Id: <20200509165202.17959-1-zeil@yandex-team.ru>
MIME-Version: 1.0
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

This patch prepares the infrastructure for matching sockets by cgroups.
Two helper functions are added for converting between a cgroup v2 ID
and its pathname. The cgroup v2 cache is implemented as a hash table
indexed by ID. This cache is needed for faster lookups of a socket's
cgroup.

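A minimal sketch of an ID-indexed cache of the kind described above,
for illustration only. It assumes a plain singly linked bucket chain
rather than the hlist helpers from list.h, and the names id_entry,
id_cache_lookup and id_cache_insert are hypothetical, not part of this
patch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Power of two, so "id & (ID_BUCKETS - 1)" selects a bucket. */
#define ID_BUCKETS 1024

/* Hypothetical cache entry: one cgroup ID and its pathname. */
struct id_entry {
	struct id_entry *next;
	uint64_t id;
	char path[];	/* flexible array member, NUL-terminated path */
};

static struct id_entry *buckets[ID_BUCKETS];

/* Return the cached path for id, or NULL if id is not cached. */
static const char *id_cache_lookup(uint64_t id)
{
	struct id_entry *e;

	for (e = buckets[id & (ID_BUCKETS - 1)]; e; e = e->next)
		if (e->id == id)
			return e->path;

	return NULL;
}

/* Add (id, path) at the head of its bucket; 0 on success, -1 on OOM. */
static int id_cache_insert(uint64_t id, const char *path)
{
	size_t len = strlen(path) + 1;
	struct id_entry *e = malloc(sizeof(*e) + len);

	if (!e)
		return -1;

	e->id = id;
	memcpy(e->path, path, len);
	e->next = buckets[id & (ID_BUCKETS - 1)];
	buckets[id & (ID_BUCKETS - 1)] = e;

	return 0;
}
```

Two IDs that differ by a multiple of ID_BUCKETS land in the same bucket
and are told apart by the full 64-bit comparison in the lookup loop.
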
v2:
  - style fixes (David Ahern)

Signed-off-by: Dmitry Yakunin
---
 include/cg_map.h |   6 +++
 include/utils.h  |   4 +-
 ip/ipvrf.c       |   4 +-
 lib/Makefile     |   2 +-
 lib/cg_map.c     | 135 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 lib/fs.c         | 137 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 6 files changed, 282 insertions(+), 6 deletions(-)
 create mode 100644 include/cg_map.h
 create mode 100644 lib/cg_map.c

diff --git a/include/cg_map.h b/include/cg_map.h
new file mode 100644
index 0000000..d30517f
--- /dev/null
+++ b/include/cg_map.h
@@ -0,0 +1,6 @@
+#ifndef __CG_MAP_H__
+#define __CG_MAP_H__
+
+const char *cg_id_to_path(__u64 id);
+
+#endif /* __CG_MAP_H__ */
diff --git a/include/utils.h b/include/utils.h
index 001491a..7041c46 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -302,7 +302,9 @@ int get_real_family(int rtm_type, int rtm_family);
 int cmd_exec(const char *cmd, char **argv, bool do_fork,
 	     int (*setup)(void *), void *arg);
 int make_path(const char *path, mode_t mode);
-char *find_cgroup2_mount(void);
+char *find_cgroup2_mount(bool do_mount);
+__u64 get_cgroup2_id(const char *path);
+char *get_cgroup2_path(__u64 id, bool full);
 int get_command_name(const char *pid, char *comm, size_t len);
 
 int get_rtnl_link_stats_rta(struct rtnl_link_stats64 *stats64,
diff --git a/ip/ipvrf.c b/ip/ipvrf.c
index b9a4367..28dd8e2 100644
--- a/ip/ipvrf.c
+++ b/ip/ipvrf.c
@@ -225,7 +225,7 @@ static int ipvrf_pids(int argc, char **argv)
 		return -1;
 	}
 
-	mnt = find_cgroup2_mount();
+	mnt = find_cgroup2_mount(true);
 	if (!mnt)
 		return -1;
 
@@ -366,7 +366,7 @@ static int vrf_switch(const char *name)
 		}
 	}
 
-	mnt = find_cgroup2_mount();
+	mnt = find_cgroup2_mount(true);
 	if (!mnt)
 		return -1;
 
diff --git a/lib/Makefile b/lib/Makefile
index bab8cbf..7cba185 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -5,7 +5,7 @@ CFLAGS += -fPIC
 
 UTILOBJ = utils.o rt_names.o ll_map.o ll_types.o ll_proto.o ll_addr.o \
 	inet_proto.o namespace.o json_writer.o json_print.o \
-	names.o color.o bpf.o exec.o fs.o
+	names.o color.o bpf.o exec.o fs.o cg_map.o
 
 NLOBJ=libgenl.o libnetlink.o
 
diff --git a/lib/cg_map.c b/lib/cg_map.c
new file mode 100644
index 0000000..77f030e
--- /dev/null
+++ b/lib/cg_map.c
@@ -0,0 +1,135 @@
+/*
+ * cg_map.c	cgroup v2 cache
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Dmitry Yakunin
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "cg_map.h"
+#include "list.h"
+#include "utils.h"
+
+struct cg_cache {
+	struct hlist_node id_hash;
+	__u64	id;
+	char	path[];
+};
+
+#define IDMAP_SIZE	1024
+static struct hlist_head id_head[IDMAP_SIZE];
+
+static struct cg_cache *cg_get_by_id(__u64 id)
+{
+	unsigned int h = id & (IDMAP_SIZE - 1);
+	struct hlist_node *n;
+
+	hlist_for_each(n, &id_head[h]) {
+		struct cg_cache *cg;
+
+		cg = container_of(n, struct cg_cache, id_hash);
+		if (cg->id == id)
+			return cg;
+	}
+
+	return NULL;
+}
+
+static struct cg_cache *cg_entry_create(__u64 id, const char *path)
+{
+	unsigned int h = id & (IDMAP_SIZE - 1);
+	struct cg_cache *cg;
+
+	cg = malloc(sizeof(*cg) + strlen(path) + 1);
+	if (!cg) {
+		fprintf(stderr,
+			"Failed to allocate memory for cgroup2 cache entry\n");
+		return NULL;
+	}
+	cg->id = id;
+	strcpy(cg->path, path);
+
+	hlist_add_head(&cg->id_hash, &id_head[h]);
+
+	return cg;
+}
+
+static int mntlen;
+
+static int nftw_fn(const char *fpath, const struct stat *sb,
+		   int typeflag, struct FTW *ftw)
+{
+	const char *path;
+	__u64 id;
+
+	if (typeflag != FTW_D)
+		return 0;
+
+	id = get_cgroup2_id(fpath);
+	if (!id)
+		return -1;
+
+	path = fpath + mntlen;
+	if (*path == '\0')
+		/* root cgroup */
+		path = "/";
+	if (!cg_entry_create(id, path))
+		return -1;
+
+	return 0;
+}
+
+static void cg_init_map(void)
+{
+	char *mnt;
+
+	mnt = find_cgroup2_mount(false);
+	if (!mnt)
+		exit(1);
+
+	mntlen = strlen(mnt);
+	if (nftw(mnt, nftw_fn, 1024, FTW_MOUNT) < 0)
+		exit(1);
+
+	free(mnt);
+}
+
+const char *cg_id_to_path(__u64 id)
+{
+	static int initialized;
+	static char buf[64];
+
+	const struct cg_cache *cg;
+	char *path;
+
+	if (!initialized) {
+		cg_init_map();
+		initialized = 1;
+	}
+
+	cg = cg_get_by_id(id);
+	if (cg)
+		return cg->path;
+
+	path = get_cgroup2_path(id, false);
+	if (path) {
+		cg = cg_entry_create(id, path);
+		free(path);
+		if (cg)
+			return cg->path;
+	}
+
+	snprintf(buf, sizeof(buf), "unreachable:%llx", id);
+	return buf;
+}
diff --git a/lib/fs.c b/lib/fs.c
index 86efd4e..e265fc0 100644
--- a/lib/fs.c
+++ b/lib/fs.c
@@ -59,13 +59,18 @@ static char *find_fs_mount(const char *fs_to_find)
 }
 
 /* caller needs to free string returned */
-char *find_cgroup2_mount(void)
+char *find_cgroup2_mount(bool do_mount)
 {
 	char *mnt = find_fs_mount(CGROUP2_FS_NAME);
 
 	if (mnt)
 		return mnt;
 
+	if (!do_mount) {
+		fprintf(stderr, "Failed to find cgroup2 mount\n");
+		return NULL;
+	}
+
 	mnt = strdup(MNT_CGRP2_PATH);
 	if (!mnt) {
 		fprintf(stderr, "Failed to allocate memory for cgroup2 path\n");
@@ -74,7 +79,7 @@ char *find_cgroup2_mount(void)
 	}
 
 	if (make_path(mnt, 0755)) {
-		fprintf(stderr, "Failed to setup vrf cgroup2 directory\n");
+		fprintf(stderr, "Failed to setup cgroup2 directory\n");
 		free(mnt);
 		return NULL;
 	}
@@ -99,6 +104,134 @@ out:
 	return mnt;
 }
 
+__u64 get_cgroup2_id(const char *path)
+{
+	char fh_buf[sizeof(struct file_handle) + sizeof(__u64)] = { 0 };
+	struct file_handle *fhp = (struct file_handle *)fh_buf;
+	union {
+		__u64 id;
+		unsigned char bytes[sizeof(__u64)];
+	} cg_id = { .id = 0 };
+	char *mnt = NULL;
+	int mnt_fd = -1;
+	int mnt_id;
+
+	if (!path) {
+		fprintf(stderr, "Invalid cgroup2 path\n");
+		return 0;
+	}
+
+	fhp->handle_bytes = sizeof(__u64);
+	if (name_to_handle_at(AT_FDCWD, path, fhp, &mnt_id, 0) < 0) {
+		/* try at cgroup2 mount */
+
+		while (*path == '/')
+			path++;
+		if (*path == '\0') {
+			fprintf(stderr, "Invalid cgroup2 path\n");
+			goto out;
+		}
+
+		mnt = find_cgroup2_mount(false);
+		if (!mnt)
+			goto out;
+
+		mnt_fd = open(mnt, O_RDONLY);
+		if (mnt_fd < 0) {
+			fprintf(stderr, "Failed to open cgroup2 mount\n");
+			goto out;
+		}
+
+		fhp->handle_bytes = sizeof(__u64);
+		if (name_to_handle_at(mnt_fd, path, fhp, &mnt_id, 0) < 0) {
+			fprintf(stderr, "Failed to get cgroup2 ID: %s\n",
+					strerror(errno));
+			goto out;
+		}
+		if (fhp->handle_bytes != sizeof(__u64)) {
+			fprintf(stderr, "Invalid size of cgroup2 ID\n");
+			goto out;
+		}
+	}
+
+	memcpy(cg_id.bytes, fhp->f_handle, sizeof(__u64));
+
+out:
+	close(mnt_fd);
+	free(mnt);
+
+	return cg_id.id;
+}
+
+#define FILEID_INO32_GEN 1
+
+/* caller needs to free string returned */
+char *get_cgroup2_path(__u64 id, bool full)
+{
+	char fh_buf[sizeof(struct file_handle) + sizeof(__u64)] = { 0 };
+	struct file_handle *fhp = (struct file_handle *)fh_buf;
+	union {
+		__u64 id;
+		unsigned char bytes[sizeof(__u64)];
+	} cg_id = { .id = id };
+	int mnt_fd = -1, fd = -1;
+	char link_buf[PATH_MAX];
+	char *path = NULL;
+	char fd_path[64];
+	int link_len;
+	char *mnt;
+
+	if (!id) {
+		fprintf(stderr, "Invalid cgroup2 ID\n");
+		return NULL;
+	}
+
+	mnt = find_cgroup2_mount(false);
+	if (!mnt)
+		return NULL;
+
+	mnt_fd = open(mnt, O_RDONLY);
+	if (mnt_fd < 0) {
+		fprintf(stderr, "Failed to open cgroup2 mount\n");
+		goto out;
+	}
+
+	fhp->handle_bytes = sizeof(__u64);
+	fhp->handle_type = FILEID_INO32_GEN;
+	memcpy(fhp->f_handle, cg_id.bytes, sizeof(__u64));
+
+	fd = open_by_handle_at(mnt_fd, fhp, 0);
+	if (fd < 0) {
+		fprintf(stderr, "Failed to open cgroup2 by ID\n");
+		goto out;
+	}
+
+	snprintf(fd_path, sizeof(fd_path), "/proc/self/fd/%d", fd);
+	link_len = readlink(fd_path, link_buf, sizeof(link_buf) - 1);
+	if (link_len < 0) {
+		fprintf(stderr,
+			"Failed to read value of symbolic link %s\n",
+			fd_path);
+		goto out;
+	}
+	link_buf[link_len] = '\0';
+
+	if (full)
+		path = strdup(link_buf);
+	else
+		path = strdup(link_buf + strlen(mnt));
+	if (!path)
+		fprintf(stderr,
+			"Failed to allocate memory for cgroup2 path\n");
+
+out:
+	close(fd);
+	close(mnt_fd);
+	free(mnt);
+
+	return path;
+}
+
 int make_path(const char *path, mode_t mode)
 {
 	char *dir, *delim;

From patchwork Sat May 9 16:52:01 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dmitry Yakunin
X-Patchwork-Id: 1286746
X-Patchwork-Delegate: dsahern@gmail.com
Return-Path:
X-Original-To: patchwork-incoming-netdev@ozlabs.org
Delivered-To: patchwork-incoming-netdev@ozlabs.org
Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized)
	smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18;
	helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org;
	receiver=)
Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none)
	header.from=yandex-team.ru
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected)
	header.d=yandex-team.ru header.i=@yandex-team.ru header.a=rsa-sha256
	header.s=default header.b=luADOpYZ; dkim-atps=neutral
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by ozlabs.org (Postfix) with ESMTP id 49KCtZ31BVz9sNH
	for ; Sun, 10 May 2020 02:52:54 +1000 (AEST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1728223AbgEIQwx (ORCPT ); Sat, 9 May 2020 12:52:53 -0400
Received: from forwardcorp1j.mail.yandex.net ([5.45.199.163]:38788 "EHLO
	forwardcorp1j.mail.yandex.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1726214AbgEIQwx (ORCPT );
	Sat, 9 May 2020 12:52:53 -0400
Received: from mxbackcorp2j.mail.yandex.net (mxbackcorp2j.mail.yandex.net
	[IPv6:2a02:6b8:0:1619::119])
	by forwardcorp1j.mail.yandex.net (Yandex) with ESMTP id 29D672E0DF2;
	Sat, 9 May 2020 19:52:49 +0300 (MSK)
Received: from vla1-81430ab5870b.qloud-c.yandex.net
	(vla1-81430ab5870b.qloud-c.yandex.net [2a02:6b8:c0d:35a1:0:640:8143:ab5])
	by mxbackcorp2j.mail.yandex.net (mxbackcorp/Yandex) with ESMTP id
	MD8WDsPY7T-qlXqssvI; Sat, 09 May 2020 19:52:49 +0300
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.ru;
	s=default; t=1589043169;
	bh=eY9ApoVEM+jffkVAX35mJ0Fa3hnBUudjCCNzhbqxNJA=;
	h=In-Reply-To:Message-Id:References:Date:Subject:To:From:Cc;
	b=luADOpYZbYLZcwfHbDI2pLY98TFP7659u6awPXe7YDRlHvjO/RRiKmgoSyiZap/kW
	hWvWpeNcSksDo8NydqtquPhfzOGknHzkoW04g/K1yBZZW9VqXxVrLbcgK5Zx5rH8pY
	NSjKtb43YqJUkmV0Izpic0D9hyJSeyRZxMsnXZ6U=
Authentication-Results: mxbackcorp2j.mail.yandex.net; dkim=pass
	header.i=@yandex-team.ru
Received: from 178.154.191.33-vpn.dhcp.yndx.net
	(178.154.191.33-vpn.dhcp.yndx.net [178.154.191.33])
	by vla1-81430ab5870b.qloud-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id
	lkcxSGO050-qlXa0QV4; Sat, 09 May 2020 19:52:47 +0300
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(Client certificate not present)
From: Dmitry Yakunin
To: netdev@vger.kernel.org, dsahern@gmail.com
Cc: cgroups@vger.kernel.org
Subject: [PATCH iproute2-next v2 2/3] ss: add support for cgroup v2 information and filtering
Date: Sat, 9 May 2020 19:52:01 +0300
Message-Id: <20200509165202.17959-2-zeil@yandex-team.ru>
In-Reply-To: <20200509165202.17959-1-zeil@yandex-team.ru>
References: <20200509165202.17959-1-zeil@yandex-team.ru>
MIME-Version: 1.0
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

This patch introduces two new features: obtaining cgroup information and
filtering sockets by cgroups. Both rely on the cgroup v2 ID field in the
socket (the kernel must be built with CONFIG_SOCK_CGROUP_DATA).

Cgroup information can be obtained by specifying the --cgroup flag and
currently contains only the pathname. For faster pathname lookups a cgroup
cache is implemented. This cache is filled on ss startup, and missing
entries are resolved and saved on the fly.

The cgroup filter extends EXPRESSION and allows specifying a cgroup
pathname (relative or absolute) to obtain only the sockets attached to
this cgroup.

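When the filter is passed to the kernel, the cgroup condition travels as
one bytecode instruction: an op header followed directly by the 64-bit
cgroup ID, which is how the SSF_CGROUPCOND case of ssfilter_bytecompile()
in this patch builds it. The sketch below shows that layout in isolation.
It is an illustration under assumptions: struct bc_op is a local mirror
of struct inet_diag_bc_op, encode_cgroup_cond() is a hypothetical helper,
and the reading of "yes" and "no" as jump offsets for match and mismatch
follows the usual inet_diag bytecode convention.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of struct inet_diag_bc_op from <linux/inet_diag.h>. */
struct bc_op {
	unsigned char code;	/* opcode, e.g. INET_DIAG_BC_CGROUP_COND */
	unsigned char yes;	/* offset taken when the condition holds */
	unsigned short no;	/* offset taken when it does not */
};

/* One cgroup condition: op header immediately followed by the ID. */
struct cgroup_instr {
	struct bc_op op;
	uint64_t cgroup_id;
} __attribute__((packed));

/*
 * Hypothetical helper: write a single cgroup condition into buf.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t encode_cgroup_cond(void *buf, size_t buflen,
				 unsigned char opcode, uint64_t cgroup_id)
{
	struct cgroup_instr ins = {
		.op = {
			.code = opcode,
			/* match: continue right after this instruction */
			.yes = sizeof(struct cgroup_instr),
			/* mismatch: jump past the end of the program */
			.no = sizeof(struct cgroup_instr) + 4,
		},
		.cgroup_id = cgroup_id,
	};

	if (buflen < sizeof(ins))
		return 0;

	memcpy(buf, &ins, sizeof(ins));
	return sizeof(ins);
}
```

The packed attribute keeps the instruction at 12 bytes (4 for the header,
8 for the ID), so no padding is inserted between the two fields.
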
Filter syntax:

ss [ cgroup PATHNAME ]

Examples:

ss -a cgroup /sys/fs/cgroup/unified (or ss -a cgroup .)
ss -a cgroup /sys/fs/cgroup/unified/cgroup1 (or ss -a cgroup cgroup1)

v2:
  - style fixes (David Ahern)

Signed-off-by: Dmitry Yakunin
---
 include/uapi/linux/inet_diag.h |  2 ++
 man/man8/ss.8                  |  9 +++++++
 misc/ss.c                      | 61 ++++++++++++++++++++++++++++++++++++++++++
 misc/ssfilter.h                |  2 ++
 misc/ssfilter.y                | 22 ++++++++++++++-
 5 files changed, 95 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index 0c1c781..f009abf 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -96,6 +96,7 @@ enum {
 	INET_DIAG_BC_MARK_COND,
 	INET_DIAG_BC_S_EQ,
 	INET_DIAG_BC_D_EQ,
+	INET_DIAG_BC_CGROUP_COND,   /* u64 cgroup v2 ID */
 };
 
 struct inet_diag_hostcond {
@@ -157,6 +158,7 @@ enum {
 	INET_DIAG_MD5SIG,
 	INET_DIAG_ULP_INFO,
 	INET_DIAG_SK_BPF_STORAGES,
+	INET_DIAG_CGROUP_ID,
 	__INET_DIAG_MAX,
 };
 
diff --git a/man/man8/ss.8 b/man/man8/ss.8
index 023d771..894cb20 100644
--- a/man/man8/ss.8
+++ b/man/man8/ss.8
@@ -281,6 +281,15 @@ Class id set by net_cls cgroup. If class is zero this shows priority
 set by SO_PRIORITY.
 .RE
 .TP
+.B \-\-cgroup
+Show cgroup information. Below fields may appear:
+.RS
+.P
+.TP
+.B cgroup
+Cgroup v2 pathname. This pathname is relative to the mount point of the hierarchy.
+.RE
+.TP
 .B \-K, \-\-kill
 Attempts to forcibly close sockets. This option displays sockets that are
 successfully closed and silently skips sockets that the kernel does not support
diff --git a/misc/ss.c b/misc/ss.c
index 75fde23..b9e6b15 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -36,6 +36,7 @@
 #include "namespace.h"
 #include "SNAPSHOT.h"
 #include "rt_names.h"
+#include "cg_map.h"
 
 #include
 #include
@@ -122,6 +123,7 @@ static int follow_events;
 static int sctp_ino;
 static int show_tipcinfo;
 static int show_tos;
+static int show_cgroup;
 int oneline;
 
 enum col_id {
@@ -797,6 +799,7 @@ struct sockstat {
 	char		    *name;
 	char		    *peer_name;
 	__u32		    mark;
+	__u64		    cgroup_id;
 };
 
 struct dctcpstat {
@@ -1417,6 +1420,9 @@ static void sock_details_print(struct sockstat *s)
 
 	if (s->mark)
 		out(" fwmark:0x%x", s->mark);
+
+	if (s->cgroup_id)
+		out(" cgroup:%s", cg_id_to_path(s->cgroup_id));
 }
 
 static void sock_addr_print(const char *addr, char *delim, const char *port,
@@ -1643,6 +1649,7 @@ struct aafilter {
 	unsigned int	iface;
 	__u32		mark;
 	__u32		mask;
+	__u64		cgroup_id;
 	struct aafilter *next;
 };
 
@@ -1771,6 +1778,12 @@ static int run_ssfilter(struct ssfilter *f, struct sockstat *s)
 
 		return (s->mark & a->mask) == a->mark;
 	}
+		case SSF_CGROUPCOND:
+	{
+		struct aafilter *a = (void *)f->pred;
+
+		return s->cgroup_id == a->cgroup_id;
+	}
 		/* Yup. It is recursion. Sorry. */
 		case SSF_AND:
 		return run_ssfilter(f->pred, s) && run_ssfilter(f->post, s);
@@ -1963,6 +1976,23 @@ static int ssfilter_bytecompile(struct ssfilter *f, char **bytecode)
 
 		return inslen;
 	}
+		case SSF_CGROUPCOND:
+	{
+		struct aafilter *a = (void *)f->pred;
+		struct instr {
+			struct inet_diag_bc_op op;
+			__u64 cgroup_id;
+		} __attribute__((packed));
+		int inslen = sizeof(struct instr);
+
+		if (!(*bytecode = malloc(inslen))) abort();
+		((struct instr *)*bytecode)[0] = (struct instr) {
+			{ INET_DIAG_BC_CGROUP_COND, inslen, inslen + 4 },
+			a->cgroup_id,
+		};
+
+		return inslen;
+	}
 		default:
 		abort();
 	}
@@ -2300,6 +2330,22 @@ void *parse_markmask(const char *markmask)
 	return res;
 }
 
+void *parse_cgroupcond(const char *path)
+{
+	struct aafilter *res;
+	__u64 id;
+
+	id = get_cgroup2_id(path);
+	if (!id)
+		return NULL;
+
+	res = malloc(sizeof(*res));
+	if (res)
+		res->cgroup_id = id;
+
+	return res;
+}
+
 static void proc_ctx_print(struct sockstat *s)
 {
 	char *buf;
@@ -3104,6 +3150,9 @@ static void parse_diag_msg(struct nlmsghdr *nlh, struct sockstat *s)
 	s->mark = 0;
 	if (tb[INET_DIAG_MARK])
 		s->mark = rta_getattr_u32(tb[INET_DIAG_MARK]);
+	s->cgroup_id = 0;
+	if (tb[INET_DIAG_CGROUP_ID])
+		s->cgroup_id = rta_getattr_u64(tb[INET_DIAG_CGROUP_ID]);
 	if (tb[INET_DIAG_PROTOCOL])
 		s->raw_prot = rta_getattr_u8(tb[INET_DIAG_PROTOCOL]);
 	else
@@ -3171,6 +3220,11 @@ static int inet_show_sock(struct nlmsghdr *nlh,
 			out(" class_id:%#x", rta_getattr_u32(tb[INET_DIAG_CLASS_ID]));
 	}
 
+	if (show_cgroup) {
+		if (tb[INET_DIAG_CGROUP_ID])
+			out(" cgroup:%s", cg_id_to_path(rta_getattr_u64(tb[INET_DIAG_CGROUP_ID])));
+	}
+
 	if (show_mem || (show_tcpinfo && s->type != IPPROTO_UDP)) {
 		if (!oneline)
 			out("\n\t");
@@ -4996,6 +5050,7 @@ static void _usage(FILE *dest)
 "       --tipcinfo      show internal tipc socket information\n"
 "   -s, --summary       show socket usage summary\n"
 "       --tos           show tos and priority information\n"
+"       --cgroup        show cgroup information\n"
 "   -b, --bpf           show bpf filter socket information\n"
 "   -E, --events        continually display sockets as they are destroyed\n"
 "   -Z, --context       display process SELinux security contexts\n"
@@ -5106,6 +5161,8 @@ static int scan_state(const char *state)
 /* Values of 'x' are already used so a non-character is used */
 #define OPT_XDPSOCK 260
 
+#define OPT_CGROUP 261
+
 static const struct option long_opts[] = {
 	{ "numeric", 0, 0, 'n' },
 	{ "resolve", 0, 0, 'r' },
@@ -5142,6 +5199,7 @@ static const struct option long_opts[] = {
 	{ "net", 1, 0, 'N' },
 	{ "tipcinfo", 0, 0, OPT_TIPCINFO},
 	{ "tos", 0, 0, OPT_TOS },
+	{ "cgroup", 0, 0, OPT_CGROUP },
 	{ "kill", 0, 0, 'K' },
 	{ "no-header", 0, 0, 'H' },
 	{ "xdp", 0, 0, OPT_XDPSOCK},
@@ -5329,6 +5387,9 @@ int main(int argc, char *argv[])
 		case OPT_TOS:
 			show_tos = 1;
 			break;
+		case OPT_CGROUP:
+			show_cgroup = 1;
+			break;
 		case 'K':
 			current_filter.kill = 1;
 			break;
diff --git a/misc/ssfilter.h b/misc/ssfilter.h
index f5b0bc8..d85c084 100644
--- a/misc/ssfilter.h
+++ b/misc/ssfilter.h
@@ -11,6 +11,7 @@
 #define SSF_S_AUTO 9
 #define SSF_DEVCOND 10
 #define SSF_MARKMASK 11
+#define SSF_CGROUPCOND 12
 
 #include
 
@@ -25,3 +26,4 @@ int ssfilter_parse(struct ssfilter **f, int argc, char **argv, FILE *fp);
 void *parse_hostcond(char *addr, bool is_port);
 void *parse_devcond(char *name);
 void *parse_markmask(const char *markmask);
+void *parse_cgroupcond(const char *path);
diff --git a/misc/ssfilter.y b/misc/ssfilter.y
index a901ae7..b417579 100644
--- a/misc/ssfilter.y
+++ b/misc/ssfilter.y
@@ -36,7 +36,7 @@ static void yyerror(char *s)
 
 %}
 
-%token HOSTCOND DCOND SCOND DPORT SPORT LEQ GEQ NEQ AUTOBOUND DEVCOND DEVNAME MARKMASK FWMARK
+%token HOSTCOND DCOND SCOND DPORT SPORT LEQ GEQ NEQ AUTOBOUND DEVCOND DEVNAME MARKMASK FWMARK CGROUPCOND CGROUPPATH
 %left '|'
 %left '&'
 %nonassoc '!'
@@ -156,6 +156,14 @@ expr: '(' exprlist ')'
         {
                 $$ = alloc_node(SSF_NOT, alloc_node(SSF_MARKMASK, $3));
         }
+        | CGROUPPATH eq CGROUPCOND
+        {
+                $$ = alloc_node(SSF_CGROUPCOND, $3);
+        }
+        | CGROUPPATH NEQ CGROUPCOND
+        {
+                $$ = alloc_node(SSF_NOT, alloc_node(SSF_CGROUPCOND, $3));
+        }
         | AUTOBOUND
         {
                 $$ = alloc_node(SSF_S_AUTO, NULL);
@@ -276,6 +284,10 @@ int yylex(void)
 			tok_type = FWMARK;
 			return FWMARK;
 		}
+		if (strcmp(curtok, "cgroup") == 0) {
+			tok_type = CGROUPPATH;
+			return CGROUPPATH;
+		}
 		if (strcmp(curtok, ">=") == 0 ||
 		    strcmp(curtok, "ge") == 0 ||
 		    strcmp(curtok, "geq") == 0)
@@ -318,6 +330,14 @@ int yylex(void)
 			}
 			return MARKMASK;
 		}
+		if (tok_type == CGROUPPATH) {
+			yylval = (void*)parse_cgroupcond(curtok);
+			if (yylval == NULL) {
+				fprintf(stderr, "Cannot parse cgroup %s.\n", curtok);
+				exit(1);
+			}
+			return CGROUPCOND;
+		}
 		yylval = (void*)parse_hostcond(curtok, tok_type == SPORT || tok_type == DPORT);
 		if (yylval == NULL) {
 			fprintf(stderr, "Cannot parse dst/src address.\n");

From patchwork Sat May 9 16:52:02 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dmitry Yakunin
X-Patchwork-Id: 1286747
X-Patchwork-Delegate: dsahern@gmail.com
Return-Path:
X-Original-To: patchwork-incoming-netdev@ozlabs.org
Delivered-To: patchwork-incoming-netdev@ozlabs.org
Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized)
	smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18;
	helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org;
	receiver=)
Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none)
	header.from=yandex-team.ru
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected)
	header.d=yandex-team.ru header.i=@yandex-team.ru header.a=rsa-sha256
	header.s=default header.b=jJOJvZfR; dkim-atps=neutral
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by ozlabs.org (Postfix) with ESMTP id 49KCtc1V4Hz9sSc
	for ; Sun, 10 May 2020 02:52:56 +1000 (AEST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1728299AbgEIQwz (ORCPT ); Sat, 9 May 2020 12:52:55 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39830 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL)
	by vger.kernel.org with ESMTP id S1726214AbgEIQwy (ORCPT );
	Sat, 9 May 2020 12:52:54 -0400
Received: from forwardcorp1p.mail.yandex.net (forwardcorp1p.mail.yandex.net
	[IPv6:2a02:6b8:0:1472:2741:0:8b6:217])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0DA24C061A0C;
	Sat, 9 May 2020 09:52:54 -0700 (PDT)
Received: from mxbackcorp1g.mail.yandex.net (mxbackcorp1g.mail.yandex.net
	[IPv6:2a02:6b8:0:1402::301])
	by forwardcorp1p.mail.yandex.net (Yandex) with ESMTP id BE06B2E14DB;
	Sat, 9 May 2020 19:52:51 +0300 (MSK)
Received: from vla1-81430ab5870b.qloud-c.yandex.net
	(vla1-81430ab5870b.qloud-c.yandex.net [2a02:6b8:c0d:35a1:0:640:8143:ab5])
	by mxbackcorp1g.mail.yandex.net (mxbackcorp/Yandex) with ESMTP id
	RvmXcuqFWP-qpAimdQ6; Sat, 09 May 2020 19:52:51 +0300
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.ru;
	s=default; t=1589043171;
	bh=8k209FJXinjPV43znfx17gvseR4z9Ra4J9eybSHcz/s=;
	h=In-Reply-To:Message-Id:References:Date:Subject:To:From:Cc;
	b=jJOJvZfRM/NeBG6xUQsKebwBdxzXs55cw2qCbS+Px8HIM4apTo+xLIzUjbBsqjsCJ
	g4mqVlXD4zzhL/URfDmpRaExHdhgkqdtCwflanMmw/w35k0IVv8/PzIIizsuavPp6l
	vryQpH9gx3WTe2yz/OSBHkYzS7ic7KRw//u70u4k=
Authentication-Results: mxbackcorp1g.mail.yandex.net; dkim=pass
	header.i=@yandex-team.ru
Received: from 178.154.191.33-vpn.dhcp.yndx.net
	(178.154.191.33-vpn.dhcp.yndx.net [178.154.191.33])
	by vla1-81430ab5870b.qloud-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id
	lkcxSGO050-qoXaRVLJ; Sat, 09 May 2020 19:52:50 +0300
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(Client certificate not present)
From: Dmitry Yakunin
To: netdev@vger.kernel.org, dsahern@gmail.com
Cc: cgroups@vger.kernel.org
Subject: [PATCH iproute2-next v2 3/3] ss: add checks for bc filter support
Date: Sat, 9 May 2020 19:52:02 +0300
Message-Id: <20200509165202.17959-3-zeil@yandex-team.ru>
In-Reply-To: <20200509165202.17959-1-zeil@yandex-team.ru>
References: <20200509165202.17959-1-zeil@yandex-team.ru>
MIME-Version: 1.0
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

As noted by David Ahern, if a bytecode filter is not supported by the
running kernel, the error message that is printed is not clear. This
patch attempts to detect such cases and print a proper message. This is
done by providing a check function for new filter types. As an example,
a check function for the cgroup filter is implemented. It sends a valid
lightweight request (idiag_states = 0) with a zero cgroup condition to
the kernel and checks the returned errno. If the filter is not supported,
EINVAL is returned. The result of the check is cached to avoid repeated
checks when the same filter is specified several times.

Signed-off-by: Dmitry Yakunin
---
 misc/Makefile         |   2 +-
 misc/ss.c             |  17 +--------
 misc/ss_util.h        |  22 +++++++++++
 misc/ssfilter.h       |  34 +++++++++--------
 misc/ssfilter.y       |   9 ++++-
 misc/ssfilter_check.c | 103 ++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 154 insertions(+), 33 deletions(-)
 create mode 100644 misc/ss_util.h
 create mode 100644 misc/ssfilter_check.c

diff --git a/misc/Makefile b/misc/Makefile
index 1debfb1..50dae79 100644
--- a/misc/Makefile
+++ b/misc/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
-SSOBJ=ss.o ssfilter.tab.o
+SSOBJ=ss.o ssfilter_check.o ssfilter.tab.o
 LNSTATOBJ=lnstat.o lnstat_util.o
 
 TARGETS=ss nstat ifstat rtacct lnstat
diff --git a/misc/ss.c b/misc/ss.c
index b9e6b15..1891e9c 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -29,6 +29,7 @@
 #include
 #include
 
+#include "ss_util.h"
 #include "utils.h"
 #include "rt_names.h"
 #include "ll_map.h"
@@ -39,8 +40,6 @@
 #include "cg_map.h"
 
 #include
-#include
-#include
 #include
 #include /* for MAX_ADDR_LEN */
 #include
@@ -63,24 +62,10 @@
 #define AF_VSOCK PF_VSOCK
 #endif
 
-#define MAGIC_SEQ 123456
 #define BUF_CHUNK (1024 * 1024)	/* Buffer chunk allocation size */
 #define BUF_CHUNKS_MAX 5	/* Maximum number of allocated buffer chunks */
 #define LEN_ALIGN(x) (((x) + 1) & ~1)
 
-#define DIAG_REQUEST(_req, _r)						    \
-	struct {							    \
-		struct nlmsghdr nlh;					    \
-		_r;							    \
-	} _req = {							    \
-		.nlh = {						    \
-			.nlmsg_type = SOCK_DIAG_BY_FAMILY,		    \
-			.nlmsg_flags = NLM_F_ROOT|NLM_F_MATCH|NLM_F_REQUEST,\
-			.nlmsg_seq = MAGIC_SEQ,				    \
-			.nlmsg_len = sizeof(_req),			    \
-		},							    \
-	}
-
 #if HAVE_SELINUX
 #include
 #else
diff --git a/misc/ss_util.h b/misc/ss_util.h
new file mode 100644
index 0000000..f7e40bb
--- /dev/null
+++ b/misc/ss_util.h
@@ -0,0 +1,22 @@
+#ifndef __SS_UTIL_H__
+#define __SS_UTIL_H__
+
+#include
+#include
+
+#define MAGIC_SEQ 123456
+
+#define DIAG_REQUEST(_req, _r)						    \
+	struct {							    \
+		struct nlmsghdr nlh;					    \
+		_r;							    \
+	} _req = {							    \
+		.nlh = {						    \
+			.nlmsg_type = SOCK_DIAG_BY_FAMILY,		    \
+			.nlmsg_flags = NLM_F_ROOT|NLM_F_MATCH|NLM_F_REQUEST,\
+			.nlmsg_seq = MAGIC_SEQ,				    \
+			.nlmsg_len = sizeof(_req),			    \
+		},							    \
+	}
+
+#endif /* __SS_UTIL_H__ */
diff --git a/misc/ssfilter.h b/misc/ssfilter.h
index d85c084..0be3b1e 100644
--- a/misc/ssfilter.h
+++ b/misc/ssfilter.h
@@ -1,20 +1,24 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#define SSF_DCOND 0
-#define SSF_SCOND 1
-#define SSF_OR 2
-#define SSF_AND 3
-#define SSF_NOT 4
-#define SSF_D_GE 5
-#define SSF_D_LE 6
-#define SSF_S_GE 7
-#define SSF_S_LE 8
-#define SSF_S_AUTO 9
-#define SSF_DEVCOND 10
-#define SSF_MARKMASK 11
-#define SSF_CGROUPCOND 12
-
 #include
 
+enum {
+	SSF_DCOND,
+	SSF_SCOND,
+	SSF_OR,
+	SSF_AND,
+	SSF_NOT,
+	SSF_D_GE,
+	SSF_D_LE,
+	SSF_S_GE,
+	SSF_S_LE,
+	SSF_S_AUTO,
+	SSF_DEVCOND,
+	SSF_MARKMASK,
+	SSF_CGROUPCOND,
+	SSF__MAX
+};
+
+bool ssfilter_is_supported(int type);
+
 struct ssfilter
 {
 	int type;
diff --git a/misc/ssfilter.y b/misc/ssfilter.y
index b417579..8e16b44 100644
--- a/misc/ssfilter.y
+++ b/misc/ssfilter.y
@@ -12,7 +12,14 @@ typedef struct ssfilter * ssfilter_t;
 
 static struct ssfilter * alloc_node(int type, void *pred)
 {
-	struct ssfilter *n = malloc(sizeof(*n));
+	struct ssfilter *n;
+
+	if (!ssfilter_is_supported(type)) {
+		fprintf(stderr, "It looks like such filter is not supported! Too old kernel?\n");
+		exit(-1);
+	}
+
+	n = malloc(sizeof(*n));
 	if (n == NULL)
 		abort();
 	n->type = type;
diff --git a/misc/ssfilter_check.c b/misc/ssfilter_check.c
new file mode 100644
index 0000000..38c960c
--- /dev/null
+++ b/misc/ssfilter_check.c
@@ -0,0 +1,103 @@
+#include
+#include
+#include
+
+#include "libnetlink.h"
+#include "ssfilter.h"
+#include "ss_util.h"
+
+static int dummy_filter(struct nlmsghdr *n, void *arg)
+{
+	/* just stops rtnl_dump_filter() */
+	return -1;
+}
+
+static bool cgroup_filter_check(void)
+{
+	struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK };
+	DIAG_REQUEST(req, struct inet_diag_req_v2 r);
+	struct instr {
+		struct inet_diag_bc_op op;
+		__u64 cgroup_id;
+	} __attribute__((packed));
+	int inslen = sizeof(struct instr);
+	struct instr instr = {
+		{ INET_DIAG_BC_CGROUP_COND, inslen, inslen + 4 },
+		0
+	};
+	struct rtnl_handle rth;
+	struct iovec iov[3];
+	struct msghdr msg;
+	struct rtattr rta;
+	bool ret = false;
+	int iovlen = 3;
+
+	if (rtnl_open_byproto(&rth, 0, NETLINK_SOCK_DIAG))
+		return false;
+	rth.dump = MAGIC_SEQ;
+	rth.flags = RTNL_HANDLE_F_SUPPRESS_NLERR;
+
+	memset(&req.r, 0, sizeof(req.r));
+	req.r.sdiag_family = AF_INET;
+	req.r.sdiag_protocol = IPPROTO_TCP;
+	req.nlh.nlmsg_len += RTA_LENGTH(inslen);
+
+	rta.rta_type = INET_DIAG_REQ_BYTECODE;
+	rta.rta_len = RTA_LENGTH(inslen);
+
+	iov[0] = (struct iovec) { &req, sizeof(req) };
+	iov[1] = (struct iovec) { &rta, sizeof(rta) };
+	iov[2] = (struct iovec) { &instr, inslen };
+
+	msg = (struct msghdr) {
+		.msg_name = (void *)&nladdr,
+		.msg_namelen = sizeof(nladdr),
+		.msg_iov = iov,
+		.msg_iovlen = iovlen,
+	};
+
+	if (sendmsg(rth.fd, &msg, 0) < 0)
+		goto out;
+
+	if (rtnl_dump_filter(&rth, dummy_filter, NULL) < 0) {
+		ret = (errno != EINVAL);
+		goto out;
+	}
+
+	ret = true;
+
+out:
+	rtnl_close(&rth);
+
+	return ret;
+}
+
+
+struct filter_check_t {
+	bool (*check)(void);
+	unsigned int checked:1,
+		     supported:1;
+};
+
+static struct filter_check_t filter_checks[SSF__MAX] = {
+	[SSF_CGROUPCOND] = { cgroup_filter_check, 0 },
+};
+
+bool ssfilter_is_supported(int type)
+{
+	struct filter_check_t *f;
+
+	if (type < 0 || type >= SSF__MAX)
+		return false;
+
+	f = &filter_checks[type];
+	if (!f->check)
+		return true;
+
+	if (!f->checked) {
+		f->supported = f->check();
+		f->checked = 1;
+	}
+
+	return f->supported;
+}
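
The result caching described in the commit message of this patch can be
sketched in isolation as follows. This is an illustration with
hypothetical names (probe, probe_b_check, probe_is_supported), not code
from the patch. It takes a pointer to the table entry so that the cached
flags are stored in the table itself; working on a local copy of the
entry would discard them on return and repeat the probe on every call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical filter types; PROBE__MAX is the table size. */
enum { PROBE_A, PROBE_B, PROBE__MAX };

struct probe {
	bool (*check)(void);	/* NULL means "always supported" */
	bool checked;		/* has check() already been run? */
	bool supported;		/* cached result of check() */
};

/* Counts how often the (potentially expensive) check really ran. */
static int probe_calls;

static bool probe_b_check(void)
{
	probe_calls++;
	return true;	/* stand-in for a real round trip to the kernel */
}

static struct probe probes[PROBE__MAX] = {
	[PROBE_B] = { .check = probe_b_check },
};

static bool probe_is_supported(int type)
{
	struct probe *p;

	if (type < 0 || type >= PROBE__MAX)
		return false;

	p = &probes[type];	/* pointer, so the cached result persists */
	if (!p->check)
		return true;

	if (!p->checked) {
		p->supported = p->check();
		p->checked = true;
	}

	return p->supported;
}
```

With this shape, asking twice about PROBE_B runs probe_b_check() only
once, and types without a check function never trigger a probe at all.
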