Message ID:   1227284770-19215-4-git-send-email-joerg.roedel@amd.com
State:        Not Applicable, archived
Delegated to: David Miller
* Joerg Roedel <joerg.roedel@amd.com> wrote:

> +extern
> +void dma_debug_init(void);

this can be on a single line.

> +
> +#else /* CONFIG_DMA_API_DEBUG */
> +
> +static inline
> +void dma_debug_init(void)

this too. (when something fits on a single line, we prefer it so)

> +#include <linux/types.h>
> +#include <linux/scatterlist.h>
> +#include <linux/list.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/module.h>
> +#include <linux/hardirq.h>
> +#include <linux/dma-mapping.h>
> +#include <asm/bug.h>
> +#include <asm/dma-mapping.h>
> +#include <asm/dma_debug.h>

to reduce the chances of commit conflicts in the future, we generally
sort include lines in x86 files the following way:

> +#include <linux/scatterlist.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/spinlock.h>
> +#include <linux/hardirq.h>
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/list.h>
> +#include <linux/slab.h>
>
> +#include <asm/bug.h>
> +#include <asm/dma-mapping.h>
> +#include <asm/dma_debug.h>

[ note the extra newline too between the linux/ and asm/ portions. ]

> +#define HASH_SIZE 256
> +#define HASH_FN_SHIFT 20
> +#define HASH_FN_MASK 0xffULL

please align the values vertically.

> +/* Hash list to save the allocated dma addresses */
> +static struct list_head dma_entry_hash[HASH_SIZE];

should be cacheline-aligned i guess - if this feature is enabled this
is a hot area.

> +/* A slab cache to allocate dma_map_entries fast */
> +static struct kmem_cache *dma_entry_cache;

__read_mostly - to isolate it from the above hot area.

> +/* lock to protect the data structures */
> +static DEFINE_SPINLOCK(dma_lock);

should have a separate cacheline too i guess.

> +static int hash_fn(struct dma_debug_entry *entry)
> +{
> +	/*
> +	 * Hash function is based on the dma address.
> +	 * We use bits 20-27 here as the index into the hash
> +	 */
> +	BUG_ON(entry->dev_addr == bad_dma_address);

please use WARN_ON_ONCE() instead of crashing the box in possibly
hard to debug spots.

> +	return (entry->dev_addr >> HASH_FN_SHIFT) & HASH_FN_MASK;
> +}
> +
> +static struct dma_debug_entry *dma_entry_alloc(void)
> +{
> +	gfp_t gfp = GFP_KERNEL | __GFP_ZERO;
> +
> +	if (in_atomic())
> +		gfp |= GFP_ATOMIC;
> +
> +	return kmem_cache_alloc(dma_entry_cache, gfp);

hm. that slab allocation in the middle of DMA mapping ops makes me a
bit nervous. the DMA mapping ops are generally rather atomic.

and the in_atomic() check there is a bug on !PREEMPT kernels, so it
wont fly. We dont have a gfp flag passed in as all the DMA mapping
APIs really expect all allocations having been done in advance
already.

Any chance to attach the debug entry to the iotlb entries themselves -
either during build time (for swiotlb) or during iommu init time (for
the hw iommu's)?

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
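The indexing scheme Ingo is commenting on can be modeled in plain userspace C to make both behaviours visible: bucket selection from bits 20-27, and the 1MB-granularity collisions he points out (addresses inside the same 1MB chunk always hash to the same bucket). The constants mirror the patch; the rest is an illustrative sketch, not the kernel code itself:

```c
#include <stdint.h>

#define HASH_SIZE     256
#define HASH_FN_SHIFT 20
#define HASH_FN_MASK  0xffULL

/* Bucket index = bits 20-27 of the dma address (1MB granularity). */
static unsigned int hash_fn(uint64_t dev_addr)
{
	return (unsigned int)((dev_addr >> HASH_FN_SHIFT) & HASH_FN_MASK);
}
```

Because the low 20 bits are discarded, per-page DMA addresses that differ only below the 1MB boundary all pile into one bucket, which is exactly the entropy loss being criticized.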
On Fri, Nov 21, 2008 at 05:56:28PM +0100, Ingo Molnar wrote:
> > +	return (entry->dev_addr >> HASH_FN_SHIFT) & HASH_FN_MASK;
> > +}
> > +
> > +static struct dma_debug_entry *dma_entry_alloc(void)
> > +{
> > +	gfp_t gfp = GFP_KERNEL | __GFP_ZERO;
> > +
> > +	if (in_atomic())
> > +		gfp |= GFP_ATOMIC;
> > +
> > +	return kmem_cache_alloc(dma_entry_cache, gfp);
>
> hm. that slab allocation in the middle of DMA mapping ops makes me a
> bit nervous. the DMA mapping ops are generally rather atomic.
>
> and the in_atomic() check there is a bug on !PREEMPT kernels, so it
> wont fly.

I am not sure I understand this correctly. You say the check for
in_atomic() will break on !PREEMPT kernels?

> We dont have a gfp flag passed in as all the DMA mapping APIs really
> expect all allocations having been done in advance already.

Hmm, I can change the code to pre-allocate a certain (configurable?)
number of these entries and disable the checking if we run out of them.

> Any chance to attach the debug entry to the iotlb entries themselves -
> either during build time (for swiotlb) or during iommu init time (for
> the hw iommu's)?

Hm, I want to avoid that. This approach has the advantage that it
works independent of any dma_ops implementation. So it can be used to
find out if a DMA bug originates from the device driver or (in my
case) from the IOMMU driver.

	Joerg
* Joerg Roedel <joerg.roedel@amd.com> wrote:

> On Fri, Nov 21, 2008 at 05:56:28PM +0100, Ingo Molnar wrote:
>
> [...]
>
> > hm. that slab allocation in the middle of DMA mapping ops makes me
> > a bit nervous. the DMA mapping ops are generally rather atomic.
> >
> > and the in_atomic() check there is a bug on !PREEMPT kernels, so
> > it wont fly.
>
> I am not sure I understand this correctly. You say the check for
> in_atomic() will break on !PREEMPT kernels?

Correct. There is no check to be able to tell whether we can schedule
or not. I.e. on !PREEMPT your patches will crash and burn.

> > We dont have a gfp flag passed in as all the DMA mapping APIs
> > really expect all allocations having been done in advance already.
>
> Hmm, I can change the code to pre-allocate a certain (configurable?)
> number of these entries and disable the checking if we run out of
> them.

yeah, that's a good approach too - that's what lockdep does. Perhaps
make the max entries nr a Kconfig entity - so it can be tuned up/down
depending on what type of iommu scheme is enabled.

	Ingo
* Ingo Molnar <mingo@elte.hu> wrote:

> > > We dont have a gfp flag passed in as all the DMA mapping APIs
> > > really expect all allocations having been done in advance
> > > already.
> >
> > Hmm, I can change the code to pre-allocate a certain
> > (configurable?) number of these entries and disable the checking
> > if we run out of them.
>
> yeah, that's a good approach too - that's what lockdep does. Perhaps
> make the max entries nr a Kconfig entity - so it can be tuned
> up/down depending on what type of iommu scheme is enabled.

there's another reason why we want to do that: this scheme covers all
of DMA - not just the ones which need to go via an iommu and for
which there's an IOTLB entry present. So the pool should probably be
sized after RAM size, to be on the safe side.

	Ingo
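The lockdep-style scheme being converged on - a fixed pool of entries allocated at init time, with checking disabled rather than crashing when the pool runs dry - can be sketched in userspace C. The names and the tiny pool size are illustrative (the real pool would be a Kconfig-tunable count, and allocation would happen under a lock):

```c
#include <stddef.h>

#define PREALLOC_ENTRIES 4	/* stand-in for a Kconfig-tunable count */

struct dma_debug_entry {
	struct dma_debug_entry *next;	/* free-list link */
	unsigned long long dev_addr;
};

static struct dma_debug_entry pool[PREALLOC_ENTRIES];
static struct dma_debug_entry *free_list;
static int checking_disabled;	/* set once the pool is exhausted */

static void pool_init(void)
{
	for (int i = 0; i < PREALLOC_ENTRIES; i++) {
		pool[i].next = free_list;
		free_list = &pool[i];
	}
}

static struct dma_debug_entry *entry_alloc(void)
{
	struct dma_debug_entry *e = free_list;

	if (!e) {
		/* Out of entries: turn the checking off, don't crash
		 * and don't report a false positive. */
		checking_disabled = 1;
		return NULL;
	}
	free_list = e->next;
	return e;
}

static void entry_free(struct dma_debug_entry *e)
{
	e->next = free_list;
	free_list = e;
}
```

No allocation ever happens in the map/unmap path itself, which sidesteps both the GFP-flag problem and the in_atomic() question entirely.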
* Joerg Roedel <joerg.roedel@amd.com> wrote:

> +static struct list_head dma_entry_hash[HASH_SIZE];
> +
> +/* A slab cache to allocate dma_map_entries fast */
> +static struct kmem_cache *dma_entry_cache;
> +
> +/* lock to protect the data structures */
> +static DEFINE_SPINLOCK(dma_lock);

some more generic comments about the data structure: its main purpose
is to provide a mapping based on (dev, addr). There's little if any
cross-entry interaction - same-address+same-dev DMA is checked.

1)

the hash:

+	return (entry->dev_addr >> HASH_FN_SHIFT) & HASH_FN_MASK;

should mix in entry->dev as well - that way we get not just per
address but per device hash space separation as well.

2)

HASH_FN_SHIFT is 1MB chunks right now - that's probably fine in
practice albeit perhaps a bit too small. There's seldom any coherency
between the physical addresses of DMA - we rarely have any real
(performance-relevant) physical co-location of DMA addresses beyond
4K granularity. So using 1MB chunking here will discard a good deal
of random low bits we should be hashing on.

3)

And the most scalable locking would be per hash bucket locking - no
global lock is needed. The bucket hash heads should probably be
cacheline sized - so we'd get one lock per bucket.

This way if there's irq+DMA traffic on one CPU from one device into
one range of memory, and irq+DMA traffic on another CPU to another
device, they will map to two different hash buckets.

4)

Plus it might be an option to make hash lookup lockless as well:
depending on the DMA flux we can get a lot of lookups, and taking the
bucket lock can be avoided, if you use RCU-safe list ops and drive
the refilling of the free entries pool from RCU.

	Ingo
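Points 1) and 3) together can be sketched as a userspace model: one cacheline-sized bucket carrying its own lock, and a hash index that mixes the device pointer in with the address bits. The shift on the device pointer and the XOR mixing are illustrative choices, not what any final patch necessarily uses:

```c
#include <pthread.h>
#include <stdint.h>

#define HASH_SIZE     256
#define HASH_FN_SHIFT 20
#define HASH_FN_MASK  0xffULL

struct list_head {
	struct list_head *next, *prev;
};

/* One lock per bucket; pad each bucket to its own cacheline so two
 * CPUs hammering different buckets never share a line. */
struct hash_bucket {
	struct list_head head;
	pthread_mutex_t  lock;
} __attribute__((aligned(64)));

static struct hash_bucket dma_entry_hash[HASH_SIZE];

/* Mix the device pointer into the index so different devices spread
 * across buckets even when they DMA to the same addresses.  The >> 4
 * drops alignment zeroes from the pointer; purely illustrative. */
static unsigned int hash_fn(const void *dev, uint64_t dev_addr)
{
	uintptr_t d = (uintptr_t)dev;

	return (unsigned int)(((dev_addr >> HASH_FN_SHIFT) ^ (d >> 4))
			      & HASH_FN_MASK);
}
```

With this layout, a lookup takes only the one bucket mutex, so traffic from two devices on two CPUs contends only when both the address chunk and the mixed-in device bits coincide.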
On Fri, 21 Nov 2008 17:26:03 +0100
Joerg Roedel <joerg.roedel@amd.com> wrote:

> Impact: creates necessary data structures for DMA-API debugging
>
> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
> ---
>  arch/x86/include/asm/dma-mapping.h |    1 +
>  arch/x86/include/asm/dma_debug.h   |   14 +++++
>  arch/x86/kernel/Makefile           |    2 +
>  arch/x86/kernel/pci-dma-debug.c    |  111 ++++++++++++++++++++++++++++++++++++
>  arch/x86/kernel/pci-dma.c          |    2 +
>  5 files changed, 130 insertions(+), 0 deletions(-)
>  create mode 100644 arch/x86/kernel/pci-dma-debug.c
>
> [...]
>
> +static int hash_fn(struct dma_debug_entry *entry)
> +{
> +	/*
> +	 * Hash function is based on the dma address.
> +	 * We use bits 20-27 here as the index into the hash
> +	 */
> +	BUG_ON(entry->dev_addr == bad_dma_address);

'bad_dma_address' is x86 specific. You already found it though.
On Sat, Nov 22, 2008 at 12:27:41PM +0900, FUJITA Tomonori wrote:
> On Fri, 21 Nov 2008 17:26:03 +0100
> Joerg Roedel <joerg.roedel@amd.com> wrote:
>
> [...]
>
> > +static int hash_fn(struct dma_debug_entry *entry)
> > +{
> > +	/*
> > +	 * Hash function is based on the dma address.
> > +	 * We use bits 20-27 here as the index into the hash
> > +	 */
> > +	BUG_ON(entry->dev_addr == bad_dma_address);
>
> 'bad_dma_address' is x86 specific. You already found it though.

Interesting. Is there another value for dma_addr_t which drivers can
check for to find out if a dma-api operation failed?

	Joerg
On Fri, Nov 21, 2008 at 06:43:48PM +0100, Ingo Molnar wrote:
>
> * Joerg Roedel <joerg.roedel@amd.com> wrote:
>
> > +static struct list_head dma_entry_hash[HASH_SIZE];
> > +
> > +/* A slab cache to allocate dma_map_entries fast */
> > +static struct kmem_cache *dma_entry_cache;
> > +
> > +/* lock to protect the data structures */
> > +static DEFINE_SPINLOCK(dma_lock);
>
> some more generic comments about the data structure: its main
> purpose is to provide a mapping based on (dev, addr). There's little
> if any cross-entry interaction - same-address+same-dev DMA is
> checked.
>
> [...]
>
> 3)
>
> And the most scalable locking would be per hash bucket locking - no
> global lock is needed. The bucket hash heads should probably be
> cacheline sized - so we'd get one lock per bucket.

Hmm, I just had the idea of saving this data in struct device. How
about that? The locking should scale too and we can extend it more
easily. For example it simplifies a per-device disable function for
the checking. Or another future feature might be leak tracing.

> This way if there's irq+DMA traffic on one CPU from one device into
> one range of memory, and irq+DMA traffic on another CPU to another
> device, they will map to two different hash buckets.
>
> 4)
>
> Plus it might be an option to make hash lookup lockless as well:
> depending on the DMA flux we can get a lot of lookups, and taking
> the bucket lock can be avoided, if you use RCU-safe list ops and
> drive the refilling of the free entries pool from RCU.

	Joerg
On Sat, 22 Nov 2008 10:40:32 +0100
Joerg Roedel <joro@8bytes.org> wrote:

> > > +static int hash_fn(struct dma_debug_entry *entry)
> > > +{
> > > +	/*
> > > +	 * Hash function is based on the dma address.
> > > +	 * We use bits 20-27 here as the index into the hash
> > > +	 */
> > > +	BUG_ON(entry->dev_addr == bad_dma_address);
> >
> > 'bad_dma_address' is x86 specific. You already found it though.
>
> Interesting. Is there another value for dma_addr_t which drivers can
> check for to find out if a dma-api operation failed?

They are architecture dependent. But only X86 uses a variable because
of GART and swiotlb, I think.
On Sat, 22 Nov 2008 19:16:05 +0900
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:

> On Sat, 22 Nov 2008 10:40:32 +0100
> Joerg Roedel <joro@8bytes.org> wrote:
>
> [...]
>
> > Interesting. Is there another value for dma_addr_t which drivers
> > can check for to find out if a dma-api operation failed?
>
> They are architecture dependent. But only X86 uses a variable
> because of GART and swiotlb, I think.

BTW, this code doesn't work even on x86 (swiotlb).
dma_mapping_error() should be used, which is an
architecture-independent way to test a dma address.
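For reference, the architecture-independent check being recommended is the standard driver-side pattern below (a sketch; around this kernel version dma_mapping_error() had just gained its struct device argument, and the error/unwind path is illustrative):

```c
dma_addr_t dma_handle;

dma_handle = dma_map_single(dev, buf, size, DMA_TO_DEVICE);
if (dma_mapping_error(dev, dma_handle)) {
	/* Mapping failed: unwind instead of handing the device a
	 * bogus address.  Never compare against bad_dma_address
	 * directly - that variable only exists on x86. */
	return -ENOMEM;
}
```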
* Joerg Roedel <joro@8bytes.org> wrote:

> On Fri, Nov 21, 2008 at 06:43:48PM +0100, Ingo Molnar wrote:
>
> [...]
>
> > And the most scalable locking would be per hash bucket locking -
> > no global lock is needed. The bucket hash heads should probably be
> > cacheline sized - so we'd get one lock per bucket.
>
> Hmm, I just had the idea of saving this data in struct device. How
> about that? The locking should scale too and we can extend it more
> easily. For example it simplifies a per-device disable function for
> the checking. Or another future feature might be leak tracing.

that will help with spreading the hash across devices, but brings in
lifetime issues: you must be absolutely sure all DMA has drained at
the point a device is deinitialized. Dunno ...

i think it's still better to have a central hash for all DMA ops that
is aware of per device details. The moment we spread this out to
struct device we've lost the ability to _potentially_ do inter-device
checking. (for example to make sure no other device is DMA-ing on the
same address - which is a possible bug pattern as well although right
now Linux does not really avoid it actively)

Hm?

Btw., also have a look at lib/debugobjects.c: i think we should also
consider extending that facility and then just hook it up to the DMA
ops. The DMA checking is kind of a similar "op lifetime" scenario -
with a few extras to extend lib/debugobjects.c with. Note how it
already has pooling, a good hash, etc.

	Ingo
* Ingo Molnar <mingo@elte.hu> wrote:

> * Joerg Roedel <joro@8bytes.org> wrote:
>
> [...]
>
> > Hmm, I just had the idea of saving this data in struct device. How
> > about that? The locking should scale too and we can extend it more
> > easily. For example it simplifies a per-device disable function
> > for the checking. Or another future feature might be leak tracing.
>
> that will help with spreading the hash across devices, but brings in
> lifetime issues: you must be absolutely sure all DMA has drained at
> the point a device is deinitialized.

Note that obviously proper DMA quiescence is a must-have during
device deinit anyway, but still, it's an extra complication to
init/deinit the hashes, etc.

	Ingo
Another generic suggestion:

Why not just replace dma_ops with a debug version? That way it could
be a runtime feature based off the already existing DMA op callbacks,
without any extra overhead.

I'd prefer such a solution much more with an x86 architecture
maintainer hat on, because this way distributions could enable this
debug feature (with the facility being off by default) without
worrying about the wrapping overhead.

This would basically be a special variant of stacked DMA ops support.

	Ingo
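The stacked-ops idea can be modeled in userspace C with a minimal ops table: at enable time the debug layer remembers whatever ops are installed, substitutes its own, and each debug op records the event and forwards to the original. All names here are hypothetical stand-ins for the kernel's dma_mapping_ops:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

/* Minimal stand-in for struct dma_mapping_ops. */
struct dma_ops {
	dma_addr_t (*map_single)(void *dev, void *ptr, size_t size);
};

static dma_addr_t real_map_single(void *dev, void *ptr, size_t size)
{
	(void)dev; (void)size;
	return (dma_addr_t)(uintptr_t)ptr;	/* identity map for the model */
}

static struct dma_ops real_ops = { .map_single = real_map_single };

/* The debug layer wraps whatever ops were installed, records the
 * mapping, then forwards to the original implementation. */
static struct dma_ops *backing_ops;
static int recorded_mappings;

static dma_addr_t debug_map_single(void *dev, void *ptr, size_t size)
{
	dma_addr_t addr = backing_ops->map_single(dev, ptr, size);

	recorded_mappings++;	/* real code would add a dma_debug_entry */
	return addr;
}

static struct dma_ops debug_ops = { .map_single = debug_map_single };

static struct dma_ops *dma_ops = &real_ops;

static void dma_debug_install(void)
{
	backing_ops = dma_ops;
	dma_ops = &debug_ops;	/* runtime switch: zero cost when disabled */
}
```

When the feature stays off, callers keep going through the original pointer with no wrapping at all, which is the distribution-friendly property being argued for.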
Joerg Roedel <joerg.roedel@amd.com> writes:

> +/* Hash list to save the allocated dma addresses */
> +static struct list_head dma_entry_hash[HASH_SIZE];

Hash tables should use hlists.

> +static int hash_fn(struct dma_debug_entry *entry)
> +{
> +	/*
> +	 * Hash function is based on the dma address.
> +	 * We use bits 20-27 here as the index into the hash
> +	 */
> +	BUG_ON(entry->dev_addr == bad_dma_address);
> +
> +	return (entry->dev_addr >> HASH_FN_SHIFT) & HASH_FN_MASK;

It would probably be safer to use a stronger hash like FNV. There are
a couple to reuse in include/.

> +}
> +
> +static struct dma_debug_entry *dma_entry_alloc(void)
> +{
> +	gfp_t gfp = GFP_KERNEL | __GFP_ZERO;
> +
> +	if (in_atomic())
> +		gfp |= GFP_ATOMIC;
> +
> +	return kmem_cache_alloc(dma_entry_cache, gfp);
> +}

While the basic idea is reasonable this function is unfortunately
broken. It's not always safe to allocate memory (e.g. in the block
write out path which uses map_sg). You would need to use a mempool or
something.

Besides, the other problem of using GFP_ATOMIC is that it can fail
under high load, and you don't handle this case very well (it would
report a bug incorrectly). And stress tests tend to trigger that;
reporting false positives in such a case is a very very bad thing, it
leads to QA people putting these messages on their blacklists.

-Andi
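A stronger hash of the kind Andi suggests is easy to sketch: 32-bit FNV-1a over the raw bytes of the (dev, dev_addr) pair, masked down to the bucket count. This is a userspace illustration of the technique, not the kernel's own include/ helpers:

```c
#include <stddef.h>
#include <stdint.h>

#define HASH_SIZE 256	/* power of two, so masking works as modulo */

/* 32-bit FNV-1a: XOR each byte in, then multiply by the FNV prime. */
static uint32_t fnv1a(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t h = 2166136261u;	/* FNV offset basis */

	while (len--) {
		h ^= *p++;
		h *= 16777619u;		/* FNV prime */
	}
	return h;
}

struct key {
	uintptr_t dev;		/* the device pointer, mixed in as bytes */
	uint64_t  dev_addr;
};

static unsigned int hash_fn(const struct key *k)
{
	return fnv1a(k, sizeof(*k)) & (HASH_SIZE - 1);
}
```

Unlike the bits-20-27 scheme, every bit of the address and of the device pointer influences the bucket, so sub-1MB address differences no longer collide systematically.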
diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index 7f225a4..83d7b7d 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -9,6 +9,7 @@
 #include <linux/scatterlist.h>
 #include <asm/io.h>
 #include <asm/swiotlb.h>
+#include <asm/dma_debug.h>
 #include <asm-generic/dma-coherent.h>
 
 extern dma_addr_t bad_dma_address;
diff --git a/arch/x86/include/asm/dma_debug.h b/arch/x86/include/asm/dma_debug.h
index d79f024..f2c3d53 100644
--- a/arch/x86/include/asm/dma_debug.h
+++ b/arch/x86/include/asm/dma_debug.h
@@ -38,4 +38,18 @@ struct dma_debug_entry {
 	int direction;
 };
 
+#ifdef CONFIG_DMA_API_DEBUG
+
+extern
+void dma_debug_init(void);
+
+#else /* CONFIG_DMA_API_DEBUG */
+
+static inline
+void dma_debug_init(void)
+{
+}
+
+#endif /* CONFIG_DMA_API_DEBUG */
+
 #endif /* __ASM_X86_DMA_DEBUG */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index e489ff9..6271cd2 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -105,6 +105,8 @@ microcode-$(CONFIG_MICROCODE_INTEL)	+= microcode_intel.o
 microcode-$(CONFIG_MICROCODE_AMD)	+= microcode_amd.o
 obj-$(CONFIG_MICROCODE)			+= microcode.o
 
+obj-$(CONFIG_DMA_API_DEBUG)		+= pci-dma-debug.o
+
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/pci-dma-debug.c b/arch/x86/kernel/pci-dma-debug.c
new file mode 100644
index 0000000..c2d3408
--- /dev/null
+++ b/arch/x86/kernel/pci-dma-debug.c
@@ -0,0 +1,111 @@
+/*
+ * Copyright (C) 2008 Advanced Micro Devices, Inc.
+ *
+ * Author: Joerg Roedel <joerg.roedel@amd.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/types.h>
+#include <linux/scatterlist.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/module.h>
+#include <linux/hardirq.h>
+#include <linux/dma-mapping.h>
+#include <asm/bug.h>
+#include <asm/dma-mapping.h>
+#include <asm/dma_debug.h>
+
+#define HASH_SIZE 256
+#define HASH_FN_SHIFT 20
+#define HASH_FN_MASK 0xffULL
+
+/* Hash list to save the allocated dma addresses */
+static struct list_head dma_entry_hash[HASH_SIZE];
+
+/* A slab cache to allocate dma_map_entries fast */
+static struct kmem_cache *dma_entry_cache;
+
+/* lock to protect the data structures */
+static DEFINE_SPINLOCK(dma_lock);
+
+static int hash_fn(struct dma_debug_entry *entry)
+{
+	/*
+	 * Hash function is based on the dma address.
+	 * We use bits 20-27 here as the index into the hash
+	 */
+	BUG_ON(entry->dev_addr == bad_dma_address);
+
+	return (entry->dev_addr >> HASH_FN_SHIFT) & HASH_FN_MASK;
+}
+
+static struct dma_debug_entry *dma_entry_alloc(void)
+{
+	gfp_t gfp = GFP_KERNEL | __GFP_ZERO;
+
+	if (in_atomic())
+		gfp |= GFP_ATOMIC;
+
+	return kmem_cache_alloc(dma_entry_cache, gfp);
+}
+
+static void dma_entry_free(struct dma_debug_entry *entry)
+{
+	kmem_cache_free(dma_entry_cache, entry);
+}
+
+static struct dma_debug_entry *
+find_dma_entry(struct dma_debug_entry *ref)
+{
+	int idx = hash_fn(ref);
+	struct dma_debug_entry *entry;
+
+	list_for_each_entry(entry, &dma_entry_hash[idx], list) {
+		if ((entry->dev_addr == ref->dev_addr) &&
+		    (entry->dev == ref->dev))
+			return entry;
+	}
+
+	return NULL;
+}
+
+static void add_dma_entry(struct dma_debug_entry *entry)
+{
+	int idx = hash_fn(entry);
+
+	list_add_tail(&entry->list, &dma_entry_hash[idx]);
+}
+
+static void remove_dma_entry(struct dma_debug_entry *entry)
+{
+	list_del(&entry->list);
+}
+
+void dma_debug_init(void)
+{
+	int i;
+
+	for (i = 0; i < HASH_SIZE; ++i)
+		INIT_LIST_HEAD(&dma_entry_hash[i]);
+
+	dma_entry_cache = kmem_cache_create("dma_debug_entry_cache",
+					    sizeof(struct dma_debug_entry),
+					    0, SLAB_PANIC, NULL);
+
+	printk(KERN_INFO "PCI-DMA: DMA API debugging enabled by kernel config\n");
+}
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 1926248..94096b8 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -275,6 +275,8 @@ EXPORT_SYMBOL(dma_supported);
 
 static int __init pci_iommu_init(void)
 {
+	dma_debug_init();
+
 	calgary_iommu_init();
 
 	intel_iommu_init();
Impact: creates necessary data structures for DMA-API debugging

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 arch/x86/include/asm/dma-mapping.h |    1 +
 arch/x86/include/asm/dma_debug.h   |   14 +++++
 arch/x86/kernel/Makefile           |    2 +
 arch/x86/kernel/pci-dma-debug.c    |  111 ++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/pci-dma.c          |    2 +
 5 files changed, 130 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/kernel/pci-dma-debug.c