Message ID | 1314961686-30870-1-git-send-email-phil.sutter@viprinet.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
On Fri, 2011-09-02 at 13:08 +0200, Phil Sutter wrote:
> This flushes the cache before and after accessing the mmapped packet
> buffer. It seems like the call to flush_dcache_page from inside
> __packet_get_status is not enough on Kirkwood (or ARM in general).
> ---
> I know this is far from an optimal solution, but it's in fact the only working
> one I found.
[...]

This is ridiculous. If flush_dcache_page() isn't doing everything it should, you need to fix that.

Ben.
On Fri, Sep 02, 2011 at 02:46:17PM +0100, Ben Hutchings wrote:
> On Fri, 2011-09-02 at 13:08 +0200, Phil Sutter wrote:
> > This flushes the cache before and after accessing the mmapped packet
> > buffer. It seems like the call to flush_dcache_page from inside
> > __packet_get_status is not enough on Kirkwood (or ARM in general).
> > ---
> > I know this is far from an optimal solution, but it's in fact the only working
> > one I found.
> [...]
>
> This is ridiculous. If flush_dcache_page() isn't doing everything it
> should, you need to fix that.

You're absolutely correct. But in fact this problem goes way too deep for me to find its cause. And since my time is finite, I doubt this will change in the near future. So I asked for help, a pointer in whatever direction, or anything I could try to help analyze it further, without any response (unless I missed it, in which case I apologize).

Please don't get me wrong. I have no intention of getting this patch into mainline; I just want to give others with the same problem a starting point.

Greetings, Phil
> > This flushes the cache before and after accessing the mmapped packet > buffer. It seems like the call to flush_dcache_page from inside > __packet_get_status is not enough on Kirkwood (or ARM in general). > + kw_extra_cache_flush(); > + rc = po->tx_ring.pg_vec ? tpacket_snd(po, msg) : > + packet_snd(sock, msg, len); > + kw_extra_cache_flush(); > + return rc; > } If a workaround is needed for mmap, then why not change tpacket_snd? Also, is this workaround actually working for all the cases? Because packet_get_status is not being touched in your patch. Also, I don't see any changes for the Rx-path. Is that working ok? Chetan Loke -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Sep 02, 2011 at 10:00:16AM -0400, chetan loke wrote:
> >
> > This flushes the cache before and after accessing the mmapped packet
> > buffer. It seems like the call to flush_dcache_page from inside
> > __packet_get_status is not enough on Kirkwood (or ARM in general).
> >
>
> > + kw_extra_cache_flush();
> > + rc = po->tx_ring.pg_vec ? tpacket_snd(po, msg) :
> > + packet_snd(sock, msg, len);
> > + kw_extra_cache_flush();
> > + return rc;
> > }
>
> If a workaround is needed for mmap, then why not change tpacket_snd?

I did not verify that packet_snd() is not affected. OTOH, adding it there was quite "intuitive".

> Also, is this workaround actually working for all the cases? Because
> packet_get_status is not being touched in your patch.
>
> Also, I don't see any changes for the Rx-path. Is that working ok?

So far we haven't noticed problems in that direction. I just tried an explicit test: having tcpdump print local timestamps (not the pcap ones) for every received packet, activating icmp_echo_ignore_all, and pinging the host on a dedicated line. I expected to sometimes see a difference of a second between the two timestamps because, as with sending, from time to time a packet should get "lost" in the cache and only reach userspace after the next one has arrived. Maybe my test is broken, or RX is indeed unaffected.

Greetings and thanks for the hints, Phil
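For context on what the kernel-side status check reads: with a TX ring, userspace marks a frame as ready by writing a status word into the frame header through its own mapping, and the kernel later reads that word through a different mapping of the same memory. The sketch below shows only the userspace half, assuming the `struct tpacket_hdr` layout and the `TP_STATUS_*` constants from `<linux/if_packet.h>`. The helper name `tx_frame_queue` is hypothetical and not part of the patch, and copying the payload into the frame is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/if_packet.h> /* struct tpacket_hdr, TP_STATUS_* */

/*
 * Hypothetical helper (not from the patch): mark one TX ring frame as ready.
 * Returns 1 if the frame was handed to the kernel, 0 if the kernel still
 * owns it. Copying the packet data into the frame is omitted here.
 */
int tx_frame_queue(volatile struct tpacket_hdr *hdr, unsigned int len)
{
	assert(hdr != NULL);

	/* Only frames the kernel has released may be reused. */
	if (hdr->tp_status != TP_STATUS_AVAILABLE)
		return 0;

	hdr->tp_len = len;
	/* Order the length write before the status write (compiler/CPU barrier). */
	__sync_synchronize();
	hdr->tp_status = TP_STATUS_SEND_REQUEST;
	return 1;
}
```

After queuing frames this way, userspace calls send() on the socket, which is the path that ends up in packet_sendmsg(). If the status write is still sitting in a cache line that the kernel's mapping cannot see, the kernel would find no frame marked for sending, which would be consistent with the delayed packets described in this thread.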
On Fri, Sep 02, 2011 at 02:46:17PM +0100, Ben Hutchings wrote:
> On Fri, 2011-09-02 at 13:08 +0200, Phil Sutter wrote:
> > This flushes the cache before and after accessing the mmapped packet
> > buffer. It seems like the call to flush_dcache_page from inside
> > __packet_get_status is not enough on Kirkwood (or ARM in general).
> > ---
> > I know this is far from an optimal solution, but it's in fact the only working
> > one I found.
> [...]
>
> This is ridiculous. If flush_dcache_page() isn't doing everything it
> should, you need to fix that.

It does do everything it should - which is to perform maintenance on page cache pages. It flushes the kernel mapping of the page. It also flushes the userspace mappings of the page which it finds by walking the mmap list via the associated struct page. It does not touch vmalloc mappings because it has no way to know whether they exist or not.

It doesn't do so much for anonymous pages - to do so would only duplicate what flush_anon_page() does at the very same callsites. Plus the mmap list isn't available for such pages so there's no way to find out what userspace addresses to flush.

If the AF_PACKET buffers are created from anonymous pages and it's using flush_dcache_page(), it's using the wrong interface.
Hi,

On Fri, Sep 02, 2011 at 06:28:50PM +0100, Russell King - ARM Linux wrote:
> On Fri, Sep 02, 2011 at 02:46:17PM +0100, Ben Hutchings wrote:
> > On Fri, 2011-09-02 at 13:08 +0200, Phil Sutter wrote:
> > > This flushes the cache before and after accessing the mmapped packet
> > > buffer. It seems like the call to flush_dcache_page from inside
> > > __packet_get_status is not enough on Kirkwood (or ARM in general).
> > > ---
> > > I know this is far from an optimal solution, but it's in fact the only working
> > > one I found.
> > [...]
> >
> > This is ridiculous. If flush_dcache_page() isn't doing everything it
> > should, you need to fix that.
>
> It does do everything it should - which is to perform maintenance on
> page cache pages. It flushes the kernel mapping of the page. It
> also flushes the userspace mappings of the page which it finds by
> walking the mmap list via the associated struct page. It does not
> touch vmalloc mappings because it has no way to know whether they
> exist or not.
>
> It doesn't do so much for anonymous pages - to do so would only
> duplicate what flush_anon_page() does at the very same callsites.
> Plus the mmap list isn't available for such pages so there's no
> way to find out what userspace addresses to flush.

Indeed very interesting information, thanks a lot!

The code in question uses __get_free_pages(), and if that fails uses vmalloc() (see alloc_one_pg_vec_page() for reference). Both code paths result in the same faulty behaviour.

> If the AF_PACKET buffers are created from anonymous pages and it's
> using flush_dcache_page(), it's using the wrong interface.

So, in order to fix this, which alternative would you suggest? Quite a lot of work has been done regarding memory allocation, so I guess changing that side is a no-go.

Greetings, Phil
On Mon, Sep 05, 2011 at 09:57:14PM +0200, Phil Sutter wrote:
> Hi,
>
> On Fri, Sep 02, 2011 at 06:28:50PM +0100, Russell King - ARM Linux wrote:
> > On Fri, Sep 02, 2011 at 02:46:17PM +0100, Ben Hutchings wrote:
> > > On Fri, 2011-09-02 at 13:08 +0200, Phil Sutter wrote:
> > > > This flushes the cache before and after accessing the mmapped packet
> > > > buffer. It seems like the call to flush_dcache_page from inside
> > > > __packet_get_status is not enough on Kirkwood (or ARM in general).
> > > > ---
> > > > I know this is far from an optimal solution, but it's in fact the only working
> > > > one I found.
> > > [...]
> > >
> > > This is ridiculous. If flush_dcache_page() isn't doing everything it
> > > should, you need to fix that.
> >
> > It does do everything it should - which is to perform maintenance on
> > page cache pages. It flushes the kernel mapping of the page. It
> > also flushes the userspace mappings of the page which it finds by
> > walking the mmap list via the associated struct page. It does not
> > touch vmalloc mappings because it has no way to know whether they
> > exist or not.
> >
> > It doesn't do so much for anonymous pages - to do so would only
> > duplicate what flush_anon_page() does at the very same callsites.
> > Plus the mmap list isn't available for such pages so there's no
> > way to find out what userspace addresses to flush.
>
> Indeed very interesting information, thanks a lot!
>
> The code in question uses __get_free_pages(), and if that fails uses
> vmalloc() (see alloc_one_pg_vec_page() for reference). Both code paths
> result in the same faulty behaviour.

So, what you're wanting is cache coherency between vmalloc() and userspace. There is no API in the kernel to do that, and you'll see the same failures of this interface not only on ARM but also other architectures with virtual caches.

It sounds like we need an API to flush the cache using both the userspace address, plus the kernel-side address, be that in the direct map or the vmalloc map areas.

Or maybe the right solution is to simply disable AF_PACKET MMAP support for virtual cached architectures - it may be that adding cache flushing calls makes the thing too expensive and the benefits of mmap over normal read/write are lost.
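The aliasing problem described here can be illustrated with a deliberately tiny userspace model. This is only a sketch under loud assumptions: it is not how ARM or any real cache is implemented, every `toy_*` name is made up for this example, and the "cache" is a direct-mapped, write-back array that is indexed and tagged purely by virtual address. Two virtual addresses are mapped to the same physical word; a write through one alias stays invisible to the other until both aliases are flushed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_LINES  8   /* direct-mapped cache, one word per line */
#define TOY_VADDRS 32  /* size of the toy virtual address space */
#define TOY_PADDRS 16  /* size of the toy physical memory */

struct toy_line {
	int valid, dirty;
	unsigned vaddr;  /* virtual tag: the cache never sees physical addresses */
	uint32_t data;
};

struct toy_machine {
	uint32_t phys[TOY_PADDRS];
	unsigned page_table[TOY_VADDRS];  /* vaddr -> paddr */
	struct toy_line cache[TOY_LINES];
};

static void toy_map(struct toy_machine *m, unsigned vaddr, unsigned paddr)
{
	assert(vaddr < TOY_VADDRS && paddr < TOY_PADDRS);
	m->page_table[vaddr] = paddr;
}

/* Write a dirty line back to memory, then drop it from the cache. */
static void toy_evict(struct toy_machine *m, struct toy_line *l)
{
	if (l->valid && l->dirty)
		m->phys[m->page_table[l->vaddr]] = l->data;
	l->valid = l->dirty = 0;
}

/* Return the cache line for vaddr, filling it from memory on a miss. */
static struct toy_line *toy_lookup(struct toy_machine *m, unsigned vaddr)
{
	assert(vaddr < TOY_VADDRS);
	struct toy_line *l = &m->cache[vaddr % TOY_LINES];

	if (l->valid && l->vaddr != vaddr)
		toy_evict(m, l);  /* conflict miss: make room */
	if (!l->valid) {
		l->valid = 1;
		l->vaddr = vaddr;
		l->data = m->phys[m->page_table[vaddr]];
	}
	return l;
}

static uint32_t toy_read(struct toy_machine *m, unsigned vaddr)
{
	return toy_lookup(m, vaddr)->data;
}

static void toy_write(struct toy_machine *m, unsigned vaddr, uint32_t val)
{
	struct toy_line *l = toy_lookup(m, vaddr);

	l->data = val;
	l->dirty = 1;  /* write-back: memory is not updated yet */
}

/* Clean and invalidate the line belonging to ONE virtual address. */
static void toy_flush(struct toy_machine *m, unsigned vaddr)
{
	assert(vaddr < TOY_VADDRS);
	struct toy_line *l = &m->cache[vaddr % TOY_LINES];

	if (l->valid && l->vaddr == vaddr)
		toy_evict(m, l);
}

struct toy_result {
	uint32_t before_flush, after_reader_flush, after_both_flushed;
};

/* Writer and reader use different virtual aliases of one physical word. */
struct toy_result toy_alias_demo(void)
{
	enum { WRITER_VADDR = 3, READER_VADDR = 20, SHARED_PADDR = 5 };
	struct toy_machine m;
	struct toy_result r;

	memset(&m, 0, sizeof(m));
	toy_map(&m, WRITER_VADDR, SHARED_PADDR);
	toy_map(&m, READER_VADDR, SHARED_PADDR);

	toy_write(&m, WRITER_VADDR, 1);
	r.before_flush = toy_read(&m, READER_VADDR);        /* stale */

	toy_flush(&m, READER_VADDR);                         /* one alias only */
	r.after_reader_flush = toy_read(&m, READER_VADDR);  /* still stale */

	toy_flush(&m, WRITER_VADDR);                         /* write back */
	toy_flush(&m, READER_VADDR);                         /* drop old copy */
	r.after_both_flushed = toy_read(&m, READER_VADDR);  /* fresh */
	return r;
}
```

In this model, flushing only the alias the flusher happens to know about changes nothing; the new value becomes visible only once the writer's line is written back and the reader's stale line is invalidated. That is the toy-model counterpart of an API that takes both the userspace address and the kernel-side address.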
On Tue, Sep 06, 2011 at 10:57:22AM +0100, Russell King - ARM Linux wrote:
> > The code in question uses __get_free_pages(), and if that fails uses
> > vmalloc() (see alloc_one_pg_vec_page() for reference). Both code paths
> > result in the same faulty behaviour.
>
> So, what you're wanting is cache coherency between vmalloc() and
> userspace. There is no API in the kernel to do that, and you'll see
> the same failures of this interface not only on ARM but also other
> architectures with virtual caches.
>
> It sounds like we need an API to flush the cache using both the
> userspace address, plus the kernel side address be that in the direct
> map or the vmalloc map areas.
>
> Or maybe the right solution is to simply disable AF_PACKET MMAP support
> for virtual cached architectures - it may be that adding cache flushing
> calls makes the thing too expensive and the benefits of mmap over normal
> read/write are lost.

OK, that's horrible. Of course we depend on just this combination to work flawlessly, i.e. PACKET_MMAP && VIVT. :(

Another userspace interface I'm working on uses a different solution: memory is allocated in userspace and accessed from kernelspace using get_user_pages(). I did not explicitly search for the fault pattern described earlier, but we didn't notice any problem with this approach on the very same hardware either.

I already see myself writing TPACKET_V3. ;) What do you think?

Greetings, Phil
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 243946d..d7b5c2e 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -87,6 +87,14 @@
 #include <net/inet_common.h>
 #endif
 
+/* whether we need additional cacheflushing between user- and kernel-space */
+#ifdef CONFIG_ARCH_KIRKWOOD
+# define ENABLE_CACHEPROB_WORKAROUND
+# define kw_extra_cache_flush() flush_cache_all()
+#else
+# define kw_extra_cache_flush() /* nothing */
+#endif
+
 /*
    Assumptions:
    - if device has no dev->hard_header routine, it adds and removes ll header
@@ -1239,10 +1247,13 @@ static int packet_sendmsg(struct kiocb *iocb, struct socket *sock,
 {
 	struct sock *sk = sock->sk;
 	struct packet_sock *po = pkt_sk(sk);
-	if (po->tx_ring.pg_vec)
-		return tpacket_snd(po, msg);
-	else
-		return packet_snd(sock, msg, len);
+	int rc;
+
+	kw_extra_cache_flush();
+	rc = po->tx_ring.pg_vec ? tpacket_snd(po, msg) :
+			packet_snd(sock, msg, len);
+	kw_extra_cache_flush();
+	return rc;
 }
 
 /*
@@ -2622,6 +2633,11 @@ static int __init packet_init(void)
 	sock_register(&packet_family_ops);
 	register_pernet_subsys(&packet_net_ops);
 	register_netdevice_notifier(&packet_netdev_notifier);
+
+#ifdef ENABLE_CACHEPROB_WORKAROUND
+	printk(KERN_INFO "af_packet: cache coherency workaround for kirkwood is active!\n");
+#endif
+
 out:
 	return rc;
 }