Message ID | 20161214154729.g4yg4mxcgkvpe5w7@codemonkey.org.uk |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
From: Dave Jones <davej@codemonkey.org.uk> Date: Wed, 14 Dec 2016 10:47:29 -0500 > It seems to be possible to craft a packet for sendmsg that triggers > the -EFAULT path in skb_copy_bits resulting in a BUG_ON that looks like: > > RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40 > RSP: 0018:ffff881f6c4a7c18 EFLAGS: 00010282 > RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002 > RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00 > RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009 > R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030 > R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80 > > Call Trace: > [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830 > [<ffffffff81772697>] inet_sendmsg+0x67/0xa0 > [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50 > [<ffffffff816d982f>] SYSC_sendto+0xef/0x170 > [<ffffffff816da27e>] SyS_sendto+0xe/0x10 > [<ffffffff81002910>] do_syscall_64+0x50/0xa0 > [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25 > > Handle this in rawv6_push_pending_frames and jump to the failure path. > > Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Hmmm, that's interesting. Becaue the code in __ip6_append_data(), which sets up the ->cork.base.length value, seems to be defensively trying to avoid this possibility. For example, it checks things like: if (cork->length + length > mtu - headersize && ipc6->dontfrag && (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_RAW)) { This is why the transport offset plus the length should never exceed the total length for that skb_copy_bits() call. Perhaps this protocol check in the code above is incomplete? Do you know what the sk->sk_protocol value was when that BUG triggered? That might shine some light on what is really happening here. Thanks.
On Sat, Dec 17, 2016 at 10:41:20AM -0500, David Miller wrote: > From: Dave Jones <davej@codemonkey.org.uk> > Date: Wed, 14 Dec 2016 10:47:29 -0500 > > > It seems to be possible to craft a packet for sendmsg that triggers > > the -EFAULT path in skb_copy_bits resulting in a BUG_ON that looks like: > > > > RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40 > > RSP: 0018:ffff881f6c4a7c18 EFLAGS: 00010282 > > RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002 > > RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00 > > RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009 > > R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030 > > R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80 > > > > Call Trace: > > [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830 > > [<ffffffff81772697>] inet_sendmsg+0x67/0xa0 > > [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50 > > [<ffffffff816d982f>] SYSC_sendto+0xef/0x170 > > [<ffffffff816da27e>] SyS_sendto+0xe/0x10 > > [<ffffffff81002910>] do_syscall_64+0x50/0xa0 > > [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25 > > > > Handle this in rawv6_push_pending_frames and jump to the failure path. > > > > Signed-off-by: Dave Jones <davej@codemonkey.org.uk> > > Hmmm, that's interesting. Becaue the code in __ip6_append_data(), which > sets up the ->cork.base.length value, seems to be defensively trying to > avoid this possibility. > > For example, it checks things like: > > if (cork->length + length > mtu - headersize && ipc6->dontfrag && > (sk->sk_protocol == IPPROTO_UDP || > sk->sk_protocol == IPPROTO_RAW)) { > > This is why the transport offset plus the length should never exceed > the total length for that skb_copy_bits() call. > > Perhaps this protocol check in the code above is incomplete? Do you > know what the sk->sk_protocol value was when that BUG triggered? That > might shine some light on what is really happening here. I'll see if I can craft up a reproducer next week. For some reason I've not hit this on my test setup at home, but it reproduces daily in our test setup at facebook. The only thing I can think of is that those fb boxes are ipv6 only, so I might be exercising v4 more at home. Dave
On Sat, Dec 17, 2016 at 10:41:20AM -0500, David Miller wrote: > > It seems to be possible to craft a packet for sendmsg that triggers > > the -EFAULT path in skb_copy_bits resulting in a BUG_ON that looks like: > > > > RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40 > > RSP: 0018:ffff881f6c4a7c18 EFLAGS: 00010282 > > RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002 > > RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00 > > RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009 > > R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030 > > R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80 > > > > Call Trace: > > [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830 > > [<ffffffff81772697>] inet_sendmsg+0x67/0xa0 > > [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50 > > [<ffffffff816d982f>] SYSC_sendto+0xef/0x170 > > [<ffffffff816da27e>] SyS_sendto+0xe/0x10 > > [<ffffffff81002910>] do_syscall_64+0x50/0xa0 > > [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25 > > > > Handle this in rawv6_push_pending_frames and jump to the failure path. > > > > Signed-off-by: Dave Jones <davej@codemonkey.org.uk> > > Hmmm, that's interesting. Becaue the code in __ip6_append_data(), which > sets up the ->cork.base.length value, seems to be defensively trying to > avoid this possibility. > > For example, it checks things like: > > if (cork->length + length > mtu - headersize && ipc6->dontfrag && > (sk->sk_protocol == IPPROTO_UDP || > sk->sk_protocol == IPPROTO_RAW)) { > > This is why the transport offset plus the length should never exceed > the total length for that skb_copy_bits() call. > > Perhaps this protocol check in the code above is incomplete? Do you > know what the sk->sk_protocol value was when that BUG triggered? That > might shine some light on what is really happening here. Hm. sk_protocol = 7, struct sock { __sk_common = { { skc_addrpair = 0, { skc_daddr = 0, skc_rcv_saddr = 0 } }, { skc_hash = 0, skc_u16hashes = {0, 0} }, { skc_portpair = 458752, { skc_dport = 0, skc_num = 7 } }, skc_family = 10, skc_state = 7 '\a', skc_reuse = 1 '\001', skc_reuseport = 0 '\000', skc_ipv6only = 0 '\000', skc_net_refcnt = 1 '\001', skc_bound_dev_if = 0, { skc_bind_node = { next = 0x0, pprev = 0x0 }, skc_portaddr_node = { next = 0x0, pprev = 0x0 } }, skc_prot = 0xffffffff81cf3bc0 <rawv6_prot>, skc_net = { net = 0xffffffff81ce78c0 <init_net> }, skc_v6_daddr = { in6_u = { u6_addr8 = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000", u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, u6_addr32 = {0, 0, 0, 0} } }, }, skc_v6_rcv_saddr = { in6_u = { u6_addr8 = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000", u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, u6_addr32 = {0, 0, 0, 0} } }, skc_cookie = { counter = 0 }, { skc_flags = 256, skc_listener = 0x100, skc_tw_dr = 0x100 }, skc_dontcopy_begin = 0xffff881fd1ce9b68, { skc_node = { next = 0x0, pprev = 0x0 }, skc_nulls_node = { next = 0x0, pprev = 0x0 } }, skc_tx_queue_mapping = -1, { skc_incoming_cpu = -1, skc_rcv_wnd = 4294967295, skc_tw_rcv_nxt = 4294967295 }, skc_refcnt = { counter = 1 }, skc_dontcopy_end = 0xffff881fd1ce9b84, { skc_rxhash = 0, skc_window_clamp = 0, skc_tw_snd_nxt = 0 } }, sk_lock = { slock = { { rlock = { raw_lock = { val = { counter = 0 } } } } }, owned = 1, wq = { lock = { { rlock = { raw_lock = { val = { counter = 0 } } } } }, task_list = { next = 0xffff881fd1ce9b98, prev = 0xffff881fd1ce9b98 } } }, sk_receive_queue = { next = 0xffff881fd1ce9ba8, prev = 0xffff881fd1ce9ba8, qlen = 0, lock = { { rlock = { raw_lock = { val = { counter = 0 } } } } } }, sk_backlog = { rmem_alloc = { counter = 0 }, len = 0, head = 0x0, tail = 0x0 }, sk_forward_alloc = 0, sk_txhash = 0, sk_napi_id = 0, sk_ll_usec = 0, sk_drops = { counter = 0 }, sk_rcvbuf = 1024000, sk_filter = 0x0, { sk_wq = 0xffff881fc4b3ab00, sk_wq_raw = 0xffff881fc4b3ab00 }, sk_policy = {0x0, 0x0}, sk_rx_dst = 0x0, sk_dst_cache = 0x0, sk_wmem_alloc = { counter = 769 }, sk_omem_alloc = { counter = 640 }, sk_sndbuf = 212992, sk_write_queue = { next = 0xffff881f6fc60b00, prev = 0xffff881f6fc60b00, qlen = 1, lock = { { rlock = { raw_lock = { val = { counter = 0 } } } } } }, sk_shutdown = 0, sk_no_check_tx = 0, sk_no_check_rx = 0, sk_userlocks = 0, sk_protocol = 7, sk_type = 3, sk_wmem_queued = 0, sk_allocation = 37748928, sk_pacing_rate = 4294967295, sk_max_pacing_rate = 4294967295, sk_route_caps = 0, sk_route_nocaps = 0, sk_gso_type = 0, sk_gso_max_size = 0, sk_gso_max_segs = 0, sk_rcvlowat = 1, sk_lingertime = 0, sk_error_queue = { next = 0xffff881fd1ce9c88, prev = 0xffff881fd1ce9c88, qlen = 0, lock = { { rlock = { raw_lock = { val = { counter = 0 } } } } } }, sk_prot_creator = 0xffffffff81cf3bc0 <rawv6_prot>, sk_callback_lock = { raw_lock = { cnts = { counter = 0 }, wait_lock = { val = { counter = 0 } } } }, sk_err = 0, sk_err_soft = 0, sk_ack_backlog = 0, sk_max_ack_backlog = 0, sk_priority = 0, sk_mark = 0, sk_peer_pid = 0x0, sk_peer_cred = 0x0, sk_rcvtimeo = 9223372036854775807, sk_sndtimeo = 9223372036854775807, sk_timer = { entry = { next = 0x0, pprev = 0x0 }, expires = 0, function = 0x0, data = 0, flags = 4, slack = -1, start_pid = -1, start_site = 0x0, start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000" }, sk_stamp = { tv64 = 1481510503238183349 }, sk_tsflags = 0, sk_tskey = 0, sk_socket = 0xffff881fd0d24b00, sk_user_data = 0x0, sk_frag = { page = 0x0, offset = 0, size = 0 }, sk_send_head = 0x0, sk_peek_off = -1, sk_write_pending = 0, sk_security = 0x0, sk_cgrp_data = { { { is_data = 48 '0', padding = 109 'm', prioidx = 33292, classid = 4294967295 }, val = 18446744071596436784 } }, sk_memcg = 0x0, sk_state_change = 0xffffffff816dbed0 <sock_def_wakeup>, sk_data_ready = 0xffffffff816dcd20 <sock_def_readable>, sk_write_space = 0xffffffff816dcc90 <sock_def_write_space>, sk_error_report = 0xffffffff816dcc30 <sock_def_error_report>, sk_backlog_rcv = 0xffffffff817c5690 <rawv6_rcv_skb>, sk_destruct = 0xffffffff81772460 <inet_sock_destruct>, sk_reuseport_cb = 0x0 }
From: Dave Jones <davej@codemonkey.org.uk> Date: Mon, 19 Dec 2016 12:03:20 -0500 > On Sat, Dec 17, 2016 at 10:41:20AM -0500, David Miller wrote: > > > > It seems to be possible to craft a packet for sendmsg that triggers > > > the -EFAULT path in skb_copy_bits resulting in a BUG_ON that looks like: > > > > > > RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40 > > > RSP: 0018:ffff881f6c4a7c18 EFLAGS: 00010282 > > > RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002 > > > RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00 > > > RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009 > > > R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030 > > > R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80 > > > > > > Call Trace: > > > [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830 > > > [<ffffffff81772697>] inet_sendmsg+0x67/0xa0 > > > [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50 > > > [<ffffffff816d982f>] SYSC_sendto+0xef/0x170 > > > [<ffffffff816da27e>] SyS_sendto+0xe/0x10 > > > [<ffffffff81002910>] do_syscall_64+0x50/0xa0 > > > [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25 > > > > > > Handle this in rawv6_push_pending_frames and jump to the failure path. > > > > > > Signed-off-by: Dave Jones <davej@codemonkey.org.uk> > > > > Hmmm, that's interesting. Becaue the code in __ip6_append_data(), which > > sets up the ->cork.base.length value, seems to be defensively trying to > > avoid this possibility. > > > > For example, it checks things like: > > > > if (cork->length + length > mtu - headersize && ipc6->dontfrag && > > (sk->sk_protocol == IPPROTO_UDP || > > sk->sk_protocol == IPPROTO_RAW)) { > > > > This is why the transport offset plus the length should never exceed > > the total length for that skb_copy_bits() call. > > > > Perhaps this protocol check in the code above is incomplete? Do you > > know what the sk->sk_protocol value was when that BUG triggered? That > > might shine some light on what is really happening here. > > Hm. > sk_protocol = 7, So, some arbitrary value. Obviously, the test above intends to handle RAW sockets and that is exactly what we have here. A raw socket gets whatever the user specified as 'protocol' at create time as the sk->sk_protocol value. One thing that's interesting is that if the user picks "IPPROTO_RAW" as the value of 'protocol' we set inet->hdrincl to 1. The user can also set inet->hdrincl to 1 or 0 via setsockopt(). I think this is part of the problem. The test above means to check for "RAW socket with hdrincl set" and is trying to do this more simply. But because setsockopt() can set this arbitrarily, testing sk_protocol alone isn't enough. So changing: sk->sk_protocol == IPPROTO_RAW into something like: (sk->sk_socket->type == SOCK_RAW && inet_sk(sk)->hdrincl) should correct the test. That is because this hdrincl thing is what controls which code path we take in rawv6_sendmsg(): if (inet->hdrincl) err = rawv6_send_hdrinc(sk, msg, len, &fl6, &dst, msg->msg_flags); else { ipc6.opt = opt; lock_sock(sk); err = ip6_append_data(sk, raw6_getfrag, &rfv, len, 0, &ipc6, &fl6, (struct rt6_info *)dst, msg->msg_flags, &sockc); if (err) ip6_flush_pending_frames(sk); else if (!(msg->msg_flags & MSG_MORE)) err = rawv6_push_pending_frames(sk, &fl6, rp); release_sock(sk); } You can test if the change I suggest above works.
On Mon, Dec 19, 2016 at 02:48:48PM -0500, David Miller wrote: > One thing that's interesting is that if the user picks "IPPROTO_RAW" > as the value of 'protocol' we set inet->hdrincl to 1. > > The user can also set inet->hdrincl to 1 or 0 via setsockopt(). > > I think this is part of the problem. The test above means to check > for "RAW socket with hdrincl set" and is trying to do this more simply. > But because setsockopt() can set this arbitrarily, testing sk_protocol > alone isn't enough. > > So changing: > > sk->sk_protocol == IPPROTO_RAW > > into something like: > > (sk->sk_socket->type == SOCK_RAW && inet_sk(sk)->hdrincl) > > should correct the test. > .. > > You can test if the change I suggest above works. Unfortunately, this made no difference. I spent some time today trying to make a better reproducer, but failed. I'll revisit again tomorrow. Maybe I need >1 process/thread to trigger this. That would explain why I can trigger it with Trinity. Dave
On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote: > Unfortunately, this made no difference. I spent some time today trying > to make a better reproducer, but failed. I'll revisit again tomorrow. > > Maybe I need >1 process/thread to trigger this. That would explain why > I can trigger it with Trinity. scratch that last part, I finally just repro'd it with a single process. Dave
From: Dave Jones <davej@codemonkey.org.uk> Date: Mon, 19 Dec 2016 19:40:13 -0500 > On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote: > > > Unfortunately, this made no difference. I spent some time today trying > > to make a better reproducer, but failed. I'll revisit again tomorrow. > > > > Maybe I need >1 process/thread to trigger this. That would explain why > > I can trigger it with Trinity. > > scratch that last part, I finally just repro'd it with a single process. Thanks for the info, I'll try to think about this some more.
On Mon, Dec 19, 2016 at 08:36:23PM -0500, David Miller wrote: > From: Dave Jones <davej@codemonkey.org.uk> > Date: Mon, 19 Dec 2016 19:40:13 -0500 > > > On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote: > > > > > Unfortunately, this made no difference. I spent some time today trying > > > to make a better reproducer, but failed. I'll revisit again tomorrow. > > > > > > Maybe I need >1 process/thread to trigger this. That would explain why > > > I can trigger it with Trinity. > > > > scratch that last part, I finally just repro'd it with a single process. > > Thanks for the info, I'll try to think about this some more. I threw in some debug printks right before that BUG_ON. it's always this: skb->len=31 skb->data_len=0 offset:30 total_len:9 Shouldn't we have kicked out data_len=0 skb's somewhere before we got this far ? Dave
From: Dave Jones <davej@codemonkey.org.uk> Date: Tue, 20 Dec 2016 13:17:28 -0500 > On Mon, Dec 19, 2016 at 08:36:23PM -0500, David Miller wrote: > > From: Dave Jones <davej@codemonkey.org.uk> > > Date: Mon, 19 Dec 2016 19:40:13 -0500 > > > > > On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote: > > > > > > > Unfortunately, this made no difference. I spent some time today trying > > > > to make a better reproducer, but failed. I'll revisit again tomorrow. > > > > > > > > Maybe I need >1 process/thread to trigger this. That would explain why > > > > I can trigger it with Trinity. > > > > > > scratch that last part, I finally just repro'd it with a single process. > > > > Thanks for the info, I'll try to think about this some more. > > I threw in some debug printks right before that BUG_ON. > it's always this: > > skb->len=31 skb->data_len=0 offset:30 total_len:9 > > Shouldn't we have kicked out data_len=0 skb's somewhere before we got this far ? skb->data_len is just the length of any non-linear data in the SKB. This has to do with the SKB buffer layout and geometry, not whether the packet is "fragmented" in the protocol sense. So no, this isn't a criteria for packets being filtered out by this point. Can you try to capture what sk->sk_socket->type and inet_sk(sk)->hdrincl are set to at the time of the crash? Thanks.
On Tue, Dec 20, 2016 at 10:17 AM, Dave Jones <davej@codemonkey.org.uk> wrote: > On Mon, Dec 19, 2016 at 08:36:23PM -0500, David Miller wrote: > > From: Dave Jones <davej@codemonkey.org.uk> > > Date: Mon, 19 Dec 2016 19:40:13 -0500 > > > > > On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote: > > > > > > > Unfortunately, this made no difference. I spent some time today trying > > > > to make a better reproducer, but failed. I'll revisit again tomorrow. > > > > > > > > Maybe I need >1 process/thread to trigger this. That would explain why > > > > I can trigger it with Trinity. > > > > > > scratch that last part, I finally just repro'd it with a single process. > > > > Thanks for the info, I'll try to think about this some more. > > I threw in some debug printks right before that BUG_ON. > it's always this: > > skb->len=31 skb->data_len=0 offset:30 total_len:9 Clearly we fail because 30 > 31 - 2, seems 'offset' is not correct here, off-by-one?
On Tue, Dec 20, 2016 at 01:28:13PM -0500, David Miller wrote: > This has to do with the SKB buffer layout and geometry, not whether > the packet is "fragmented" in the protocol sense. > > So no, this isn't a criteria for packets being filtered out by this > point. > > Can you try to capture what sk->sk_socket->type and > inet_sk(sk)->hdrincl are set to at the time of the crash? > type:3 hdrincl:0 Dave
On Tue, Dec 20, 2016 at 11:31:38AM -0800, Cong Wang wrote: > On Tue, Dec 20, 2016 at 10:17 AM, Dave Jones <davej@codemonkey.org.uk> wrote: > > On Mon, Dec 19, 2016 at 08:36:23PM -0500, David Miller wrote: > > > From: Dave Jones <davej@codemonkey.org.uk> > > > Date: Mon, 19 Dec 2016 19:40:13 -0500 > > > > > > > On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote: > > > > > > > > > Unfortunately, this made no difference. I spent some time today trying > > > > > to make a better reproducer, but failed. I'll revisit again tomorrow. > > > > > > > > > > Maybe I need >1 process/thread to trigger this. That would explain why > > > > > I can trigger it with Trinity. > > > > > > > > scratch that last part, I finally just repro'd it with a single process. > > > > > > Thanks for the info, I'll try to think about this some more. > > > > I threw in some debug printks right before that BUG_ON. > > it's always this: > > > > skb->len=31 skb->data_len=0 offset:30 total_len:9 > > Clearly we fail because 30 > 31 - 2, seems 'offset' is not correct here, > off-by-one? Ok, I finally made a messy, albeit good enough reproducer. #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <sys/types.h> #include <sys/socket.h> #include <netinet/in.h> #define LEN 504 int main(int argc, char* argv[]) { int fd; int zero = 0; char buf[LEN]; memset(buf, 0, LEN); fd = socket(AF_INET6, SOCK_RAW, 7); setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4); setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN); sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110); }
On Tue, Dec 20, 2016 at 2:12 PM, Dave Jones <davej@codemonkey.org.uk> wrote: > fd = socket(AF_INET6, SOCK_RAW, 7); > > setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4); > setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN); > Interesting, you set the checksum offset to be 0, but the packet size is actually 49, transport header is located at offset 48, so apparently the packet doesn't have room for a 16bit checksum after network header. Your original patch seems reasonable to me, unless there is some check in __ip6_append_data() which is supposed to catch this, but CHECKSUM is specific to raw socket only.
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 291ebc260e70..35aa82faa052 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -591,7 +591,9 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6, } offset += skb_transport_offset(skb); - BUG_ON(skb_copy_bits(skb, offset, &csum, 2)); + err = skb_copy_bits(skb, offset, &csum, 2); + if (err < 0) + goto out; /* in case cksum was not initialized */ if (unlikely(csum))
It seems to be possible to craft a packet for sendmsg that triggers the -EFAULT path in skb_copy_bits resulting in a BUG_ON that looks like: RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40 RSP: 0018:ffff881f6c4a7c18 EFLAGS: 00010282 RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002 RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00 RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009 R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030 R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80 Call Trace: [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830 [<ffffffff81772697>] inet_sendmsg+0x67/0xa0 [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50 [<ffffffff816d982f>] SYSC_sendto+0xef/0x170 [<ffffffff816da27e>] SyS_sendto+0xe/0x10 [<ffffffff81002910>] do_syscall_64+0x50/0xa0 [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25 Handle this in rawv6_push_pending_frames and jump to the failure path. Signed-off-by: Dave Jones <davej@codemonkey.org.uk>