Message ID | 20200603212516.22414-1-dwilder@us.ibm.com |
---|---|
State | RFC |
Delegated to: | David Miller |
Headers | show |
Series | [(RFC)] NULL pointer dereference on rmmod iptable_mangle. | expand |
David Wilder <dwilder@us.ibm.com> wrote: > This crash happened on a ppc64le system running ltp network tests when ltp script ran "rmmod iptable_mangle". > > [213425.602369] BUG: Kernel NULL pointer dereference at 0x00000010 > [213425.602388] Faulting instruction address: 0xc008000000550bdc [..] > In the crash we find in iptable_mangle_hook() that state->net->ipv4.iptable_mangle=NULL causing a NULL pointer dereference. net->ipv4.iptable_mangle is set to NULL in iptable_mangle_net_exit() and called when ip_mangle modules is unloaded. A rmmod task was found in the crash dump. A 2nd crash showed the same problem when running "rmmod iptable_filter" (net->ipv4.iptable_filter=NULL). > > Once a hook is registered packets will picked up a pointer from: net->ipv4.iptable_$table. The patch adds a call to synchronize_net() in ipt_unregister_table() to insure no packets are in flight that have picked up the pointer before completing the un-register. > > This change has has prevented the problem in our testing. However, we have concerns with this change as it would mean that on netns cleanup, we would need one synchronize_net() call for every table in use. Also, on module unload, there would be one synchronize_net() for every existing netns. Yes, I agree with the analysis. > Signed-off-by: David Wilder <dwilder@us.ibm.com> > --- > net/ipv4/netfilter/ip_tables.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c > index c2670ea..97c4121 100644 > --- a/net/ipv4/netfilter/ip_tables.c > +++ b/net/ipv4/netfilter/ip_tables.c > @@ -1800,8 +1800,10 @@ int ipt_register_table(struct net *net, const struct xt_table *table, > void ipt_unregister_table(struct net *net, struct xt_table *table, > const struct nf_hook_ops *ops) > { > - if (ops) > + if (ops) { > nf_unregister_net_hooks(net, ops, hweight32(table->valid_hooks)); > + synchronize_net(); > + } I'd wager ebtables, arptables and ip6tables have the same bug. The extra synchronize_net() isn't ideal. We could probably do it this way and then improve in a second patch. One way to fix this without a new synchronize_net() is to switch all iptable_foo.c to use ".pre_exit" hook as well. pre_exit would unregister the underlying hook and .exit would to the table freeing. Since the netns core already does an unconditional synchronize_rcu after the pre_exit hooks this would avoid the problem as well.
On 2020-06-03 15:05, Florian Westphal wrote: > David Wilder <dwilder@us.ibm.com> wrote: >> This crash happened on a ppc64le system running ltp network tests when >> ltp script ran "rmmod iptable_mangle". >> >> [213425.602369] BUG: Kernel NULL pointer dereference at 0x00000010 >> [213425.602388] Faulting instruction address: 0xc008000000550bdc > [..] > >> In the crash we find in iptable_mangle_hook() that >> state->net->ipv4.iptable_mangle=NULL causing a NULL pointer >> dereference. net->ipv4.iptable_mangle is set to NULL in >> iptable_mangle_net_exit() and called when ip_mangle modules is >> unloaded. A rmmod task was found in the crash dump. A 2nd crash >> showed the same problem when running "rmmod iptable_filter" >> (net->ipv4.iptable_filter=NULL). >> >> Once a hook is registered packets will picked up a pointer from: >> net->ipv4.iptable_$table. The patch adds a call to synchronize_net() >> in ipt_unregister_table() to insure no packets are in flight that have >> picked up the pointer before completing the un-register. >> >> This change has has prevented the problem in our testing. However, we >> have concerns with this change as it would mean that on netns cleanup, >> we would need one synchronize_net() call for every table in use. Also, >> on module unload, there would be one synchronize_net() for every >> existing netns. > > Yes, I agree with the analysis. > >> Signed-off-by: David Wilder <dwilder@us.ibm.com> >> --- >> net/ipv4/netfilter/ip_tables.c | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/net/ipv4/netfilter/ip_tables.c >> b/net/ipv4/netfilter/ip_tables.c >> index c2670ea..97c4121 100644 >> --- a/net/ipv4/netfilter/ip_tables.c >> +++ b/net/ipv4/netfilter/ip_tables.c >> @@ -1800,8 +1800,10 @@ int ipt_register_table(struct net *net, const >> struct xt_table *table, >> void ipt_unregister_table(struct net *net, struct xt_table *table, >> const struct nf_hook_ops *ops) >> { >> - if (ops) >> + if (ops) { >> nf_unregister_net_hooks(net, ops, hweight32(table->valid_hooks)); >> + synchronize_net(); >> + } > > I'd wager ebtables, arptables and ip6tables have the same bug. > > The extra synchronize_net() isn't ideal. We could probably do it this > way and then improve in a second patch. > > One way to fix this without a new synchronize_net() is to switch all > iptable_foo.c to use ".pre_exit" hook as well. > > pre_exit would unregister the underlying hook and .exit would to the > table freeing. > > Since the netns core already does an unconditional synchronize_rcu > after > the pre_exit hooks this would avoid the problem as well. Something like this? (un-tested) diff --git a/net/ipv4/netfilter/iptable_mangle.c b/net/ipv4/netfilter/iptable_mangle.c index bb9266ea3785..0d448e4d5213 100644 --- a/net/ipv4/netfilter/iptable_mangle.c +++ b/net/ipv4/netfilter/iptable_mangle.c @@ -100,15 +100,26 @@ static int __net_init iptable_mangle_table_init(struct net *net) return ret; } +static void __net_exit iptable_mangle_net_pre_exit(struct net *net) +{ + struct xt_table *table = net->ipv4.iptable_mangle; + + if (mangle_ops) + nf_unregister_net_hooks(net, mangle_ops, + hweight32(table->valid_hooks)); +} + + static void __net_exit iptable_mangle_net_exit(struct net *net) { if (!net->ipv4.iptable_mangle) return; - ipt_unregister_table(net, net->ipv4.iptable_mangle, mangle_ops); + ipt_unregister_table(net, net->ipv4.iptable_mangle, NULL); net->ipv4.iptable_mangle = NULL; } static struct pernet_operations iptable_mangle_net_ops = { + .pre_exit = iptable_mangle_net_pre_exit, .exit = iptable_mangle_net_exit, };
dwilder <dwilder@us.ibm.com> wrote: > > Since the netns core already does an unconditional synchronize_rcu after > > the pre_exit hooks this would avoid the problem as well. > > Something like this? (un-tested) Yes. > diff --git a/net/ipv4/netfilter/iptable_mangle.c > b/net/ipv4/netfilter/iptable_mangle.c > index bb9266ea3785..0d448e4d5213 100644 > --- a/net/ipv4/netfilter/iptable_mangle.c > +++ b/net/ipv4/netfilter/iptable_mangle.c > @@ -100,15 +100,26 @@ static int __net_init iptable_mangle_table_init(struct > net *net) > return ret; > } > > +static void __net_exit iptable_mangle_net_pre_exit(struct net *net) > +{ > + struct xt_table *table = net->ipv4.iptable_mangle; > + > + if (mangle_ops) > + nf_unregister_net_hooks(net, mangle_ops, > + hweight32(table->valid_hooks)); > +} You probably need if (table) here, not mangle_ops. I'm not sure if it makes sense to add a new xt_unregister_table_hook() helper, I guess one would have to try and see if that reduces copy&paste programming or not. > static void __net_exit iptable_mangle_net_exit(struct net *net) > { > if (!net->ipv4.iptable_mangle) > return; > - ipt_unregister_table(net, net->ipv4.iptable_mangle, mangle_ops); > + ipt_unregister_table(net, net->ipv4.iptable_mangle, NULL); I guess the 3rd arg could be removed from the helper. But yes, this looks like what I had in mind.
Florian Westphal <fw@strlen.de> wrote: > dwilder <dwilder@us.ibm.com> wrote: > > > Since the netns core already does an unconditional synchronize_rcu after > > > the pre_exit hooks this would avoid the problem as well. > > > > Something like this? (un-tested) > > Yes. > > > diff --git a/net/ipv4/netfilter/iptable_mangle.c > > b/net/ipv4/netfilter/iptable_mangle.c > > index bb9266ea3785..0d448e4d5213 100644 > > --- a/net/ipv4/netfilter/iptable_mangle.c > > +++ b/net/ipv4/netfilter/iptable_mangle.c > > @@ -100,15 +100,26 @@ static int __net_init iptable_mangle_table_init(struct > > net *net) > > return ret; > > } > > > > +static void __net_exit iptable_mangle_net_pre_exit(struct net *net) > > +{ > > + struct xt_table *table = net->ipv4.iptable_mangle; > > + > > + if (mangle_ops) > > + nf_unregister_net_hooks(net, mangle_ops, > > + hweight32(table->valid_hooks)); > > +} > > You probably need if (table) here, not mangle_ops. > I'm not sure if it makes sense to add a new > > xt_unregister_table_hook() helper, I guess one would have to try > and see if that reduces copy&paste programming or not. > > > static void __net_exit iptable_mangle_net_exit(struct net *net) > > { > > if (!net->ipv4.iptable_mangle) > > return; > > - ipt_unregister_table(net, net->ipv4.iptable_mangle, mangle_ops); > > + ipt_unregister_table(net, net->ipv4.iptable_mangle, NULL); > > I guess the 3rd arg could be removed from the helper. > > But yes, this looks like what I had in mind. Will there be a followup? Otherwise I will place this on my todo-list. Thanks.
On 2020-06-15 04:44, Florian Westphal wrote: > Florian Westphal <fw@strlen.de> wrote: >> dwilder <dwilder@us.ibm.com> wrote: >> > > Since the netns core already does an unconditional synchronize_rcu after >> > > the pre_exit hooks this would avoid the problem as well. >> > >> > Something like this? (un-tested) >> >> Yes. >> >> > diff --git a/net/ipv4/netfilter/iptable_mangle.c >> > b/net/ipv4/netfilter/iptable_mangle.c >> > index bb9266ea3785..0d448e4d5213 100644 >> > --- a/net/ipv4/netfilter/iptable_mangle.c >> > +++ b/net/ipv4/netfilter/iptable_mangle.c >> > @@ -100,15 +100,26 @@ static int __net_init iptable_mangle_table_init(struct >> > net *net) >> > return ret; >> > } >> > >> > +static void __net_exit iptable_mangle_net_pre_exit(struct net *net) >> > +{ >> > + struct xt_table *table = net->ipv4.iptable_mangle; >> > + >> > + if (mangle_ops) >> > + nf_unregister_net_hooks(net, mangle_ops, >> > + hweight32(table->valid_hooks)); >> > +} >> >> You probably need if (table) here, not mangle_ops. >> I'm not sure if it makes sense to add a new >> >> xt_unregister_table_hook() helper, I guess one would have to try >> and see if that reduces copy&paste programming or not. >> >> > static void __net_exit iptable_mangle_net_exit(struct net *net) >> > { >> > if (!net->ipv4.iptable_mangle) >> > return; >> > - ipt_unregister_table(net, net->ipv4.iptable_mangle, mangle_ops); >> > + ipt_unregister_table(net, net->ipv4.iptable_mangle, NULL); >> >> I guess the 3rd arg could be removed from the helper. >> >> But yes, this looks like what I had in mind. > > Will there be a followup? Otherwise I will place this on my todo-list. > > Thanks. Hi Florian I am working on a patch. Will post soon. Sorry for the delay.
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index c2670ea..97c4121 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -1800,8 +1800,10 @@ int ipt_register_table(struct net *net, const struct xt_table *table, void ipt_unregister_table(struct net *net, struct xt_table *table, const struct nf_hook_ops *ops) { - if (ops) + if (ops) { nf_unregister_net_hooks(net, ops, hweight32(table->valid_hooks)); + synchronize_net(); + } __ipt_unregister_table(net, table); }
This crash happened on a ppc64le system running ltp network tests when ltp script ran "rmmod iptable_mangle". [213425.602369] BUG: Kernel NULL pointer dereference at 0x00000010 [213425.602388] Faulting instruction address: 0xc008000000550bdc [213425.602399] Oops: Kernel access of bad area, sig: 11 [#1] [213425.602409] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries [213425.602418] Modules linked in: nf_log_ipv4 nf_log_common iptable_mangle(-) iptable_nat nf_nat nf_conntrack iptable_filter ip_tables xt_limit xt_multiport xt_LOG xt_tcpudp nf_defrag_ipv6 nf_defrag_ipv4 x_tables sch_netem tcp_bbr rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver rds dummy sctp crypto_user veth uhid kvm_pr kvm vfio_iommu_spapr_tce vfio_spapr_eeh vfio hci_vhci bluetooth ecdh_generic ecc vhost_net tap vhost_vsock vmw_vsock_virtio_transport_common vhost vsock uinput n_gsm pps_ldisc ppp_synctty ppp_async ppp_generic slip slhc serport brd tun fuse vfat fat xfs ext4 crc16 mbcache jbd2 mlx5_ib ib_uverbs ib_core mlx5_core mlxfw tls loop be2net ibmveth(XX) st sr_mod cdrom lp parport_pc parport nvram xfrm_user joydev binfmt_misc rpadlpar_io(XX) rpaphp(XX) xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag nfsv3 nfs_acl nfs lockd grace sunrpc fscache af_packet rfkill vmx_crypto gf128mul ibmvnic uio_pdrv_genirq crct10dif_vpmsum uio rtc_generic btrfs [213425.602577] libcrc32c xor raid6_pq dm_service_time sd_mod ibmvfc(XX) scsi_transport_fc crc32c_vpmsum dm_mirror dm_region_hash dm_log sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod [last unloaded: ipt_REJECT] [213425.602659] Supported: No, Unreleased kernel [213425.602671] CPU: 0 PID: 10 Comm: ksoftirqd/0 Tainted: G X 5.3.18-14-default #1 SLE15-SP2 (unreleased) [213425.602682] NIP: c008000000550bdc LR: c008000001de00c8 CTR: c008000000550b48 [213425.602692] REGS: c000000002973250 TRAP: 0380 Tainted: G X (5.3.18-14-default) [213425.602701] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 88082822 XER: 00000001 [213425.602726] CFAR: c008000000551050 IRQMASK: 0 GPR00: c008000001de00c8 c0000000029734e0 c00800000055d800 c00000087b7c3600 GPR04: c000000002973768 0000000000000000 0000000000000000 c0000007ab050800 GPR08: 000000000000000e c0000007ab050814 c000000001558380 c008000001de04e0 GPR12: c008000000550b48 c0000000021e0000 c00000000016b358 0000000000000100 GPR16: 000000000000008e 00000000000000a0 0000000000000000 0000000000000005 GPR20: 0000000000000000 c000000001168fa8 0000000000000000 c0000007ac4b46d4 GPR24: c000000002973768 c008000000555f80 0000000000000001 c0000000011ee000 GPR28: c0000000011ee000 c00000087b7c3600 c0000007ab05080e c000000002973768 [213425.602816] NIP [c008000000550bdc] ipt_do_table+0x94/0x980 [ip_tables] [213425.602827] LR [c008000001de00c8] iptable_mangle_hook+0x50/0x180 [iptable_mangle] [213425.602835] Call Trace: [213425.602843] [c0000000029734e0] [c000000002973570] 0xc000000002973570 (unreliable) [213425.602856] [c000000002973690] [c008000001de00c8] iptable_mangle_hook+0x50/0x180 [iptable_mangle] [213425.602871] [c0000000029736f0] [c000000000a82b60] nf_hook_slow+0x70/0x140 [213425.602882] [c000000002973740] [c000000000a90cdc] ip_rcv+0xac/0x120 [213425.602894] [c0000000029737c0] [c0000000009d978c] __netif_receive_skb_core+0x42c/0x1160 [213425.602906] [c0000000029738a0] [c0000000009dab80] __netif_receive_skb_list_core+0x130/0x330 [213425.602919] [c000000002973940] [c0000000009dafa4] netif_receive_skb_list_internal+0x224/0x350 [213425.602932] [c0000000029739c0] [c0000000009db2b4] gro_normal_list.part.109+0x34/0x60 [213425.602943] [c0000000029739f0] [c0000000009dc0c8] napi_gro_receive+0x1b8/0x200 [213425.602957] [c000000002973a30] [c008000000e32368] ibmvnic_poll+0x2d0/0x410 [ibmvnic] [213425.602969] [c000000002973b10] [c0000000009dcebc] net_rx_action+0x1ec/0x540 [213425.602982] [c000000002973c30] [c000000000c1ff68] __do_softirq+0x178/0x424 [213425.602994] [c000000002973d20] [c00000000013c924] run_ksoftirqd+0x64/0x90 [213425.603006] [c000000002973d40] [c0000000001717c0] smpboot_thread_fn+0x270/0x2c0 [213425.603018] [c000000002973db0] [c00000000016b4fc] kthread+0x1ac/0x1c0 [213425.603029] [c000000002973e20] [c00000000000b660] ret_from_kernel_thread+0x5c/0x7c [213425.603038] Instruction dump: [213425.603046] e8e300c0 82c40000 e92d1178 f9210118 39200000 2fbc0000 7fc74214 419e046c [213425.603067] eb380010 2fb90000 419e0474 393e0006 <80850010> 38c00000 7d404e2c 39200001 [213425.603089] ---[ end trace f2babb2170f723cc ]--- [213425.690517] In the crash we find in iptable_mangle_hook() that state->net->ipv4.iptable_mangle=NULL causing a NULL pointer dereference. net->ipv4.iptable_mangle is set to NULL in iptable_mangle_net_exit() and called when ip_mangle modules is unloaded. A rmmod task was found in the crash dump. A 2nd crash showed the same problem when running "rmmod iptable_filter" (net->ipv4.iptable_filter=NULL). Once a hook is registered packets will picked up a pointer from: net->ipv4.iptable_$table. The patch adds a call to synchronize_net() in ipt_unregister_table() to insure no packets are in flight that have picked up the pointer before completing the un-register. This change has has prevented the problem in our testing. However, we have concerns with this change as it would mean that on netns cleanup, we would need one synchronize_net() call for every table in use. Also, on module unload, there would be one synchronize_net() for every existing netns. Signed-off-by: David Wilder <dwilder@us.ibm.com> --- net/ipv4/netfilter/ip_tables.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)