Message ID | 20190606114142.15972-2-christian@brauner.io |
---|---|
State | Awaiting Upstream |
Delegated to: | David Miller |
Series | br_netfilter: enable in non-initial netns |
On Thu, 6 Jun 2019 13:41:41 +0200
Christian Brauner <christian@brauner.io> wrote:

> +struct netns_brnf {
> +#ifdef CONFIG_SYSCTL
> +	struct ctl_table_header *ctl_hdr;
> +#endif
> +
> +	/* default value is 1 */
> +	int call_iptables;
> +	int call_ip6tables;
> +	int call_arptables;
> +
> +	/* default value is 0 */
> +	int filter_vlan_tagged;
> +	int filter_pppoe_tagged;
> +	int pass_vlan_indev;
> +};

Do you really need to waste four bytes for each flag value?
If you use a u8 that would work just as well.

Bool would also work, but the kernel developers frown on bool in structures.
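To put a rough number on the saving being discussed, here is a minimal userspace sketch that compares the two layouts. It is not kernel code: `struct brnf_flags_int`, `struct brnf_flags_u8` and `brnf_flags_saving()` are hypothetical names made up for this illustration, `uint8_t` stands in for the kernel's `u8`, and the `ctl_hdr` pointer is left out because it is the same in both variants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flags as posted in the patch: one int per flag. */
struct brnf_flags_int {
	int call_iptables;
	int call_ip6tables;
	int call_arptables;
	int filter_vlan_tagged;
	int filter_pppoe_tagged;
	int pass_vlan_indev;
};

/* Hypothetical compact variant: one byte per flag. */
struct brnf_flags_u8 {
	uint8_t call_iptables;
	uint8_t call_ip6tables;
	uint8_t call_arptables;
	uint8_t filter_vlan_tagged;
	uint8_t filter_pppoe_tagged;
	uint8_t pass_vlan_indev;
};

/* The compact layout is only interesting if it is actually smaller. */
static_assert(sizeof(struct brnf_flags_u8) < sizeof(struct brnf_flags_int),
	      "u8 flags should take less room than int flags");

/* Bytes saved per network namespace by the compact layout. */
static size_t brnf_flags_saving(void)
{
	return sizeof(struct brnf_flags_int) - sizeof(struct brnf_flags_u8);
}
```

On an ABI where `int` is four bytes, `brnf_flags_saving()` comes out at 18 bytes per `struct net`, so the saving is real but small.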
On Thu, Jun 06, 2019 at 08:14:40AM -0700, Stephen Hemminger wrote:
> On Thu, 6 Jun 2019 13:41:41 +0200
> Christian Brauner <christian@brauner.io> wrote:
>
> > +struct netns_brnf {
> > +#ifdef CONFIG_SYSCTL
> > +	struct ctl_table_header *ctl_hdr;
> > +#endif
> > +
> > +	/* default value is 1 */
> > +	int call_iptables;
> > +	int call_ip6tables;
> > +	int call_arptables;
> > +
> > +	/* default value is 0 */
> > +	int filter_vlan_tagged;
> > +	int filter_pppoe_tagged;
> > +	int pass_vlan_indev;
> > +};
>
> Do you really need to waste four bytes for each
> flag value? If you use a u8 that would work just as well.

I think we had discussed something like this, but the reason we
can't do this stems from how the sysctl-table stuff is implemented.
I distinctly remember that it couldn't be done with a flag due to that.

Christian
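One plausible reading of this concern, offered as an assumption rather than something the thread confirms: the table entries in the patch declare `.maxlen = sizeof(int)`, and an int-based handler stores a whole `int` at the address in `.data`. The userspace sketch below shows what such a store would do to byte-sized flags. `struct flags_u8`, `store_int()` and `neighbours_clobbered()` are hypothetical names for this illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical compact layout; uint8_t stands in for the kernel's u8. */
struct flags_u8 {
	uint8_t call_iptables;
	uint8_t call_ip6tables;
	uint8_t call_arptables;
	uint8_t filter_vlan_tagged;
};

/* The demonstration assumes one int covers exactly these four flags. */
static_assert(sizeof(int) == sizeof(struct flags_u8),
	      "sketch assumes a 4-byte int");

/* Stand-in for an int-based handler: stores a whole int at `data`. */
static void store_int(void *data, int value)
{
	memcpy(data, &value, sizeof(value));
}

/* Write 0 to the first flag only, then count how many other flags changed. */
static int neighbours_clobbered(void)
{
	struct flags_u8 f = { 1, 1, 1, 1 };
	unsigned char *base = (unsigned char *)&f;

	store_int(base + offsetof(struct flags_u8, call_iptables), 0);

	return (f.call_ip6tables != 1) + (f.call_arptables != 1) +
	       (f.filter_vlan_tagged != 1);
}
```

In this model the single write also zeroes the three neighbouring flags, which is why byte-sized fields would need a handler that knows the field width instead of one that assumes `int`.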
On Thu, Jun 06, 2019 at 05:19:39PM +0200, Christian Brauner wrote:
> On Thu, Jun 06, 2019 at 08:14:40AM -0700, Stephen Hemminger wrote:
> > [...]
> > Do you really need to waste four bytes for each
> > flag value? If you use a u8 that would work just as well.
>
> I think we had discussed something like this, but the reason we
> can't do this stems from how the sysctl-table stuff is implemented.
> I distinctly remember that it couldn't be done with a flag due to that.

Could you define a pernet_operations object? I mean, define the id and size
fields, then pass it to register_pernet_subsys() for registration.

Similar to what we do in net/ipv4/netfilter/ipt_CLUSTERIP.c, see
clusterip_net_ops and clusterip_pernet() for instance.
On Thu, Jun 06, 2019 at 06:30:35PM +0200, Pablo Neira Ayuso wrote:
> On Thu, Jun 06, 2019 at 05:19:39PM +0200, Christian Brauner wrote:
> > [...]
> > I think we had discussed something like this, but the reason we
> > can't do this stems from how the sysctl-table stuff is implemented.
> > I distinctly remember that it couldn't be done with a flag due to that.
>
> Could you define a pernet_operations object? I mean, define the id and size
> fields, then pass it to register_pernet_subsys() for registration.
>
> Similar to what we do in net/ipv4/netfilter/ipt_CLUSTERIP.c, see
> clusterip_net_ops and clusterip_pernet() for instance.

Hm, I don't think that would work. The sysctls for br_netfilter are
located in /proc/sys/net/bridge under /proc/sys/net, which is tightly
integrated with the sysctl infrastructure for all of net/ and all the
folders underneath it, including "core", "ipv4" and "ipv6".
I don't think creating and managing files manually in /proc/sys/net is
going to fly. It also doesn't seem very wise from a consistency and
complexity pov. I'm also not sure if this would work at all wrt file
creation and reference counting if there are two different ways of
managing them in the same subfolder...
(clusterip creates files manually underneath /proc/net, which probably is
the reason why it gets away with it.)

Christian
On Fri, Jun 07, 2019 at 03:25:16PM +0200, Christian Brauner wrote:
> On Thu, Jun 06, 2019 at 06:30:35PM +0200, Pablo Neira Ayuso wrote:
> > [...]
> > Could you define a pernet_operations object? I mean, define the id and size
> > fields, then pass it to register_pernet_subsys() for registration.
> >
> > Similar to what we do in net/ipv4/netfilter/ipt_CLUSTERIP.c, see
> > clusterip_net_ops and clusterip_pernet() for instance.
>
> Hm, I don't think that would work. The sysctls for br_netfilter are
> located in /proc/sys/net/bridge under /proc/sys/net, which is tightly
> integrated with the sysctl infrastructure for all of net/ and all the
> folders underneath it, including "core", "ipv4" and "ipv6".
> I don't think creating and managing files manually in /proc/sys/net is
> going to fly. It also doesn't seem very wise from a consistency and
> complexity pov. I'm also not sure if this would work at all wrt file
> creation and reference counting if there are two different ways of
> managing them in the same subfolder...
> (clusterip creates files manually underneath /proc/net, which probably is
> the reason why it gets away with it.)

br_netfilter is now a module, and br_netfilter_hooks.c is part of it.
IIRC, this file registers these sysctl entries from the module __init
path.

It would be a matter of adding a new .init callback to the existing
brnf_net_ops object in br_netfilter_hooks.c. Then, call
register_net_sysctl() from this .init callback to register the sysctl
entries per netns.

There is already a brnf_net area that you can reuse for this purpose,
to place these pernetns flags...

struct brnf_net {
	bool enabled;
};

which is going to be glad to have more fields (under the #ifdef
CONFIG_SYSCTL) there.
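The per-netns registration suggested here can be modelled in userspace roughly as follows. This is only a sketch of the idea, not the kernel API: `struct entry`, `struct ns_flags`, `ns_init()`, `ns_exit()` and `ns_write()` are hypothetical stand-ins, with `ns_init()` playing the role of a pernet `.init` callback that sets the defaults and builds a table whose data pointers refer to that namespace's own fields.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define N_ENTRIES 2

/* Stand-in for one sysctl table entry: a name and the int it controls. */
struct entry {
	const char *procname;
	int *data;
};

/* Stand-in for the per-namespace area holding flags and the table copy. */
struct ns_flags {
	int call_iptables;
	int filter_vlan_tagged;
	struct entry *table;
};

/* Models a pernet .init callback: defaults first, then a private table. */
static int ns_init(struct ns_flags *ns)
{
	ns->call_iptables = 1;
	ns->filter_vlan_tagged = 0;

	ns->table = calloc(N_ENTRIES, sizeof(*ns->table));
	if (!ns->table)
		return -1;

	/* Each namespace's entries point at that namespace's own fields. */
	ns->table[0] = (struct entry){ "bridge-nf-call-iptables",
				       &ns->call_iptables };
	ns->table[1] = (struct entry){ "bridge-nf-filter-vlan-tagged",
				       &ns->filter_vlan_tagged };
	return 0;
}

/* Models the matching .exit callback. */
static void ns_exit(struct ns_flags *ns)
{
	free(ns->table);
	ns->table = NULL;
}

/* Models a sysctl write: look the name up and store through .data. */
static int ns_write(struct ns_flags *ns, const char *name, int value)
{
	assert(ns->table != NULL);

	for (size_t i = 0; i < N_ENTRIES; i++) {
		if (strcmp(ns->table[i].procname, name) == 0) {
			*ns->table[i].data = value;
			return 0;
		}
	}
	return -1;
}
```

With two initialised namespaces, writing `bridge-nf-call-iptables` in one leaves the other at its default, which is the behaviour the series is after for non-initial network namespaces.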
On Fri, Jun 07, 2019 at 04:28:58PM +0200, Pablo Neira Ayuso wrote:
> [...]
> br_netfilter is now a module, and br_netfilter_hooks.c is part of it.
> IIRC, this file registers these sysctl entries from the module __init
> path.
>
> It would be a matter of adding a new .init callback to the existing
> brnf_net_ops object in br_netfilter_hooks.c. Then, call
> register_net_sysctl() from this .init callback to register the sysctl
> entries per netns.

Actually, this is what your patch is doing...

> There is already a brnf_net area that you can reuse for this purpose,
> to place these pernetns flags...
>
> struct brnf_net {
> 	bool enabled;
> };
>
> which is going to be glad to have more fields (under the #ifdef
> CONFIG_SYSCTL) there.

... except that struct brnf_net is not used to store the ctl_table.

So what I propose should result in a small update to your patch 2/2.
On Fri, Jun 07, 2019 at 04:43:43PM +0200, Pablo Neira Ayuso wrote:
> On Fri, Jun 07, 2019 at 04:28:58PM +0200, Pablo Neira Ayuso wrote:
> > [...]
> > It would be a matter of adding a new .init callback to the existing
> > brnf_net_ops object in br_netfilter_hooks.c. Then, call
> > register_net_sysctl() from this .init callback to register the sysctl
> > entries per netns.
>
> Actually, this is what your patch is doing...
>
> > There is already a brnf_net area that you can reuse for this purpose,
> > to place these pernetns flags...
> >
> > struct brnf_net {
> > 	bool enabled;
> > };
> >
> > which is going to be glad to have more fields (under the #ifdef
> > CONFIG_SYSCTL) there.
>
> ... except that struct brnf_net is not used to store the ctl_table.
>
> So what I propose should result in a small update to your patch 2/2.

Actually not, I think. I had to rework it substantially but I think the
outcome is quite nice. :) I'll send a new version now/today. :)

Thanks!
Christian
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 12689ddfc24c..a958d09dc14d 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -127,6 +127,9 @@ struct net {
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
 	struct netns_ct		ct;
 #endif
+#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
+	struct netns_brnf	brnf;
+#endif
 #if defined(CONFIG_NF_TABLES) || defined(CONFIG_NF_TABLES_MODULE)
 	struct netns_nftables	nft;
 #endif
diff --git a/include/net/netns/netfilter.h b/include/net/netns/netfilter.h
index ca043342c0eb..eedbd1ac940e 100644
--- a/include/net/netns/netfilter.h
+++ b/include/net/netns/netfilter.h
@@ -35,4 +35,20 @@ struct netns_nf {
 	bool			defrag_ipv6;
 #endif
 };
+
+struct netns_brnf {
+#ifdef CONFIG_SYSCTL
+	struct ctl_table_header *ctl_hdr;
+#endif
+
+	/* default value is 1 */
+	int call_iptables;
+	int call_ip6tables;
+	int call_arptables;
+
+	/* default value is 0 */
+	int filter_vlan_tagged;
+	int filter_pppoe_tagged;
+	int pass_vlan_indev;
+};
 #endif
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index 34fa72c72ad8..b51c6b49fc6f 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -49,23 +49,6 @@ struct brnf_net {
 	bool enabled;
 };
 
-#ifdef CONFIG_SYSCTL
-static struct ctl_table_header *brnf_sysctl_header;
-static int brnf_call_iptables __read_mostly = 1;
-static int brnf_call_ip6tables __read_mostly = 1;
-static int brnf_call_arptables __read_mostly = 1;
-static int brnf_filter_vlan_tagged __read_mostly;
-static int brnf_filter_pppoe_tagged __read_mostly;
-static int brnf_pass_vlan_indev __read_mostly;
-#else
-#define brnf_call_iptables 1
-#define brnf_call_ip6tables 1
-#define brnf_call_arptables 1
-#define brnf_filter_vlan_tagged 0
-#define brnf_filter_pppoe_tagged 0
-#define brnf_pass_vlan_indev 0
-#endif
-
 #define IS_IP(skb) \
 	(!skb_vlan_tag_present(skb) && skb->protocol == htons(ETH_P_IP))
 
@@ -87,15 +70,15 @@ static inline __be16 vlan_proto(const struct sk_buff *skb)
 
 #define IS_VLAN_IP(skb) \
 	(vlan_proto(skb) == htons(ETH_P_IP) && \
-	 brnf_filter_vlan_tagged)
+	 init_net.brnf.filter_vlan_tagged)
 
 #define IS_VLAN_IPV6(skb) \
 	(vlan_proto(skb) == htons(ETH_P_IPV6) && \
-	 brnf_filter_vlan_tagged)
+	 init_net.brnf.filter_vlan_tagged)
 
 #define IS_VLAN_ARP(skb) \
 	(vlan_proto(skb) == htons(ETH_P_ARP) && \
-	 brnf_filter_vlan_tagged)
+	 init_net.brnf.filter_vlan_tagged)
 
 static inline __be16 pppoe_proto(const struct sk_buff *skb)
 {
@@ -106,12 +89,12 @@ static inline __be16 pppoe_proto(const struct sk_buff *skb)
 #define IS_PPPOE_IP(skb) \
 	(skb->protocol == htons(ETH_P_PPP_SES) && \
 	 pppoe_proto(skb) == htons(PPP_IP) && \
-	 brnf_filter_pppoe_tagged)
+	 init_net.brnf.filter_pppoe_tagged)
 
 #define IS_PPPOE_IPV6(skb) \
 	(skb->protocol == htons(ETH_P_PPP_SES) && \
 	 pppoe_proto(skb) == htons(PPP_IPV6) && \
-	 brnf_filter_pppoe_tagged)
+	 init_net.brnf.filter_pppoe_tagged)
 
 /* largest possible L2 header, see br_nf_dev_queue_xmit() */
 #define NF_BRIDGE_MAX_MAC_HEADER_LENGTH (PPPOE_SES_HLEN + ETH_HLEN)
@@ -413,7 +396,7 @@ static struct net_device *brnf_get_logical_dev(struct sk_buff *skb, const struct
 	struct net_device *vlan, *br;
 
 	br = bridge_parent(dev);
-	if (brnf_pass_vlan_indev == 0 || !skb_vlan_tag_present(skb))
+	if (init_net.brnf.pass_vlan_indev == 0 || !skb_vlan_tag_present(skb))
 		return br;
 
 	vlan = __vlan_find_dev_deep_rcu(br, skb->vlan_proto,
@@ -470,7 +453,7 @@ static unsigned int br_nf_pre_routing(void *priv,
 	br = p->br;
 
 	if (IS_IPV6(skb) || IS_VLAN_IPV6(skb) || IS_PPPOE_IPV6(skb)) {
-		if (!brnf_call_ip6tables &&
+		if (!init_net.brnf.call_ip6tables &&
 		    !br_opt_get(br, BROPT_NF_CALL_IP6TABLES))
 			return NF_ACCEPT;
 
@@ -478,7 +461,8 @@ static unsigned int br_nf_pre_routing(void *priv,
 		return br_nf_pre_routing_ipv6(priv, skb, state);
 	}
 
-	if (!brnf_call_iptables && !br_opt_get(br, BROPT_NF_CALL_IPTABLES))
+	if (!init_net.brnf.call_iptables &&
+	    !br_opt_get(br, BROPT_NF_CALL_IPTABLES))
 		return NF_ACCEPT;
 
 	if (!IS_IP(skb) && !IS_VLAN_IP(skb) && !IS_PPPOE_IP(skb))
@@ -621,7 +605,8 @@ static unsigned int br_nf_forward_arp(void *priv,
 		return NF_ACCEPT;
 	br = p->br;
 
-	if (!brnf_call_arptables && !br_opt_get(br, BROPT_NF_CALL_ARPTABLES))
+	if (!init_net.brnf.call_arptables &&
+	    !br_opt_get(br, BROPT_NF_CALL_ARPTABLES))
 		return NF_ACCEPT;
 
 	if (!IS_ARP(skb)) {
@@ -1021,42 +1006,42 @@ int brnf_sysctl_call_tables(struct ctl_table *ctl, int write,
 static struct ctl_table brnf_table[] = {
 	{
 		.procname	= "bridge-nf-call-arptables",
-		.data		= &brnf_call_arptables,
+		.data		= &init_net.brnf.call_arptables,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= brnf_sysctl_call_tables,
 	},
 	{
 		.procname	= "bridge-nf-call-iptables",
-		.data		= &brnf_call_iptables,
+		.data		= &init_net.brnf.call_iptables,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= brnf_sysctl_call_tables,
 	},
 	{
 		.procname	= "bridge-nf-call-ip6tables",
-		.data		= &brnf_call_ip6tables,
+		.data		= &init_net.brnf.call_ip6tables,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= brnf_sysctl_call_tables,
 	},
 	{
 		.procname	= "bridge-nf-filter-vlan-tagged",
-		.data		= &brnf_filter_vlan_tagged,
+		.data		= &init_net.brnf.filter_vlan_tagged,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= brnf_sysctl_call_tables,
 	},
 	{
 		.procname	= "bridge-nf-filter-pppoe-tagged",
-		.data		= &brnf_filter_pppoe_tagged,
+		.data		= &init_net.brnf.filter_pppoe_tagged,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= brnf_sysctl_call_tables,
 	},
 	{
 		.procname	= "bridge-nf-pass-vlan-input-dev",
-		.data		= &brnf_pass_vlan_indev,
+		.data		= &init_net.brnf.pass_vlan_indev,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= brnf_sysctl_call_tables,
@@ -1065,6 +1050,16 @@ static struct ctl_table brnf_table[] = {
 };
 #endif
 
+static inline void br_netfilter_sysctl_default(struct netns_brnf *brnf)
+{
+	brnf->call_iptables = 1;
+	brnf->call_ip6tables = 1;
+	brnf->call_arptables = 1;
+	brnf->filter_vlan_tagged = 0;
+	brnf->filter_pppoe_tagged = 0;
+	brnf->pass_vlan_indev = 0;
+}
+
 static int __init br_netfilter_init(void)
 {
 	int ret;
@@ -1079,9 +1074,12 @@ static int __init br_netfilter_init(void)
 		return ret;
 	}
 
+	/* Always set default values. Even if CONFIG_SYSCTL is not set. */
+	br_netfilter_sysctl_default(&init_net.brnf);
+
 #ifdef CONFIG_SYSCTL
-	brnf_sysctl_header = register_net_sysctl(&init_net, "net/bridge", brnf_table);
-	if (brnf_sysctl_header == NULL) {
+	init_net.brnf.ctl_hdr = register_net_sysctl(&init_net, "net/bridge", brnf_table);
+	if (!init_net.brnf.ctl_hdr) {
 		printk(KERN_WARNING
 		       "br_netfilter: can't register to sysctl.\n");
 		unregister_netdevice_notifier(&brnf_notifier);
@@ -1100,7 +1098,7 @@ static void __exit br_netfilter_fini(void)
 	unregister_netdevice_notifier(&brnf_notifier);
 	unregister_pernet_subsys(&brnf_net_ops);
 #ifdef CONFIG_SYSCTL
-	unregister_net_sysctl_table(brnf_sysctl_header);
+	unregister_net_sysctl_table(init_net.brnf.ctl_hdr);
 #endif
 }