Message ID: 1231517970-20288-4-git-send-email-joerg.roedel@amd.com
State: Not Applicable, archived
Delegated to: David Miller
Hi Joerg.

On Fri, Jan 09, 2009 at 05:19:17PM +0100, Joerg Roedel (joerg.roedel@amd.com) wrote:
> +/*
> + * Request exclusive access to a hash bucket for a given dma_debug_entry.
> + */
> +static struct hash_bucket *get_hash_bucket(struct dma_debug_entry *entry,
> +					   unsigned long *flags)
> +{
> +	int idx = hash_fn(entry);
> +	unsigned long __flags;
> +
> +	spin_lock_irqsave(&dma_entry_hash[idx].lock, __flags);
> +	*flags = __flags;
> +	return &dma_entry_hash[idx];
> +}
> +
> +/*
> + * Give up exclusive access to the hash bucket
> + */
> +static void put_hash_bucket(struct hash_bucket *bucket,
> +			    unsigned long *flags)
> +{
> +	unsigned long __flags = *flags;
> +
> +	spin_unlock_irqrestore(&bucket->lock, __flags);
> +}

Why do you need such ugly helpers?

> + * Add an entry to a hash bucket
> + */
> +static void hash_bucket_add(struct hash_bucket *bucket,
> +			    struct dma_debug_entry *entry)
> +{
> +	list_add_tail(&entry->list, &bucket->list);
> +}

> +/*
> + * Remove entry from a hash bucket list
> + */
> +static void hash_bucket_del(struct dma_debug_entry *entry)
> +{
> +	list_del(&entry->list);
> +}

Do you really need these given that they are called from only a single place?
On Fri, Jan 09, 2009 at 08:55:42PM +0300, Evgeniy Polyakov wrote:
> Hi Joerg.
>
> On Fri, Jan 09, 2009 at 05:19:17PM +0100, Joerg Roedel (joerg.roedel@amd.com) wrote:
> > +/*
> > + * Request exclusive access to a hash bucket for a given dma_debug_entry.
> > + */
> > +static struct hash_bucket *get_hash_bucket(struct dma_debug_entry *entry,
> > +					   unsigned long *flags)
> > +{
> > +	int idx = hash_fn(entry);
> > +	unsigned long __flags;
> > +
> > +	spin_lock_irqsave(&dma_entry_hash[idx].lock, __flags);
> > +	*flags = __flags;
> > +	return &dma_entry_hash[idx];
> > +}
> > +
> > +/*
> > + * Give up exclusive access to the hash bucket
> > + */
> > +static void put_hash_bucket(struct hash_bucket *bucket,
> > +			    unsigned long *flags)
> > +{
> > +	unsigned long __flags = *flags;
> > +
> > +	spin_unlock_irqrestore(&bucket->lock, __flags);
> > +}
>
> Why do you need such ugly helpers?

Because everything else I thought about here was even more ugly. But
maybe you have a better idea? I tried to lock directly in the debug_
functions. But this is ugly and unnecessary code duplication.

> > + * Add an entry to a hash bucket
> > + */
> > +static void hash_bucket_add(struct hash_bucket *bucket,
> > +			    struct dma_debug_entry *entry)
> > +{
> > +	list_add_tail(&entry->list, &bucket->list);
> > +}
>
> > +/*
> > + * Remove entry from a hash bucket list
> > + */
> > +static void hash_bucket_del(struct dma_debug_entry *entry)
> > +{
> > +	list_del(&entry->list);
> > +}
>
> Do you really need these given that they are called from only a single place?

Hmm, true. I will inline these functions.

Joerg
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jan 09, 2009 at 07:14:46PM +0100, Joerg Roedel (joro@8bytes.org) wrote:
> > > +static struct hash_bucket *get_hash_bucket(struct dma_debug_entry *entry,
> > > +					   unsigned long *flags)
> > > +{
> > > +	int idx = hash_fn(entry);
> > > +	unsigned long __flags;
> > > +
> > > +	spin_lock_irqsave(&dma_entry_hash[idx].lock, __flags);
> > > +	*flags = __flags;
> > > +	return &dma_entry_hash[idx];
> > > +}
> > > +
> > > +/*
> > > + * Give up exclusive access to the hash bucket
> > > + */
> > > +static void put_hash_bucket(struct hash_bucket *bucket,
> > > +			    unsigned long *flags)
> > > +{
> > > +	unsigned long __flags = *flags;
> > > +
> > > +	spin_unlock_irqrestore(&bucket->lock, __flags);
> > > +}
> >
> > Why do you need such ugly helpers?
>
> Because everything else I thought about here was even more ugly. But
> maybe you have a better idea? I tried to lock directly in the debug_
> functions. But this is ugly and unnecessary code duplication.

I believe that having direct locking in the debug_ functions is not
duplication: everyone will have a direct view of the locking and of the
hash array dereference, and it will be just one additional line compared
to the get_* call, and the same number of lines for the put :)
On Fri, Jan 09, 2009 at 09:23:39PM +0300, Evgeniy Polyakov wrote:
> On Fri, Jan 09, 2009 at 07:14:46PM +0100, Joerg Roedel (joro@8bytes.org) wrote:
> > > > +static struct hash_bucket *get_hash_bucket(struct dma_debug_entry *entry,
> > > > +					   unsigned long *flags)
> > > > +{
> > > > +	int idx = hash_fn(entry);
> > > > +	unsigned long __flags;
> > > > +
> > > > +	spin_lock_irqsave(&dma_entry_hash[idx].lock, __flags);
> > > > +	*flags = __flags;
> > > > +	return &dma_entry_hash[idx];
> > > > +}
> > > > +
> > > > +/*
> > > > + * Give up exclusive access to the hash bucket
> > > > + */
> > > > +static void put_hash_bucket(struct hash_bucket *bucket,
> > > > +			    unsigned long *flags)
> > > > +{
> > > > +	unsigned long __flags = *flags;
> > > > +
> > > > +	spin_unlock_irqrestore(&bucket->lock, __flags);
> > > > +}
> > >
> > > Why do you need such ugly helpers?
> >
> > Because everything else I thought about here was even more ugly. But
> > maybe you have a better idea? I tried to lock directly in the debug_
> > functions. But this is ugly and unnecessary code duplication.
>
> I believe that having direct locking in the debug_ functions is not
> duplication: everyone will have a direct view of the locking and of the
> hash array dereference, and it will be just one additional line compared
> to the get_* call, and the same number of lines for the put :)

Even more additional lines, because of the additional variables needed
in every function. Anyway, I will try it, and if it does not look good I
will keep that change ;)

Joerg
On Fri, 9 Jan 2009 17:19:17 +0100 Joerg Roedel <joerg.roedel@amd.com> wrote:

> +struct hash_bucket {
> +	struct list_head list;
> +	spinlock_t lock;
> +} ____cacheline_aligned;

__cacheline_aligned_in_smp.

This all looks like an exotically large amount of code for a debug
thingy?
On Tue, 2009-01-13 at 00:51 -0800, Andrew Morton wrote:
> This all looks like an exotically large amount of code for a debug
> thingy?

Perhaps so, but it's a useful debug thingy which lets us debug _very_
common problems in drivers, which many people are unlikely to notice
without this 'assistance'.
* Andrew Morton <akpm@linux-foundation.org> wrote:

> On Fri, 9 Jan 2009 17:19:17 +0100 Joerg Roedel <joerg.roedel@amd.com> wrote:
>
> > +struct hash_bucket {
> > +	struct list_head list;
> > +	spinlock_t lock;
> > +} ____cacheline_aligned;
>
> __cacheline_aligned_in_smp.
>
> This all looks like an exotically large amount of code for a debug
> thingy?

this code checks the DMA usage of ~1 million lines of kernel code - all
the DMA using drivers. I think Joerg's feature is hugely relevant as DMA
scribbles are one of the hardest to debug kernel bugs: they can end up in
permanent data corruption or other hard to find bugs. In fact i think his
patchset is rather simple and even having 10 times as much debug code
would pay for its existence in the long run.

	Ingo
On Wed, 14 Jan 2009 12:43:47 +0100 Ingo Molnar <mingo@elte.hu> wrote:

> * Andrew Morton <akpm@linux-foundation.org> wrote:
>
> > On Fri, 9 Jan 2009 17:19:17 +0100 Joerg Roedel <joerg.roedel@amd.com> wrote:
> >
> > > +struct hash_bucket {
> > > +	struct list_head list;
> > > +	spinlock_t lock;
> > > +} ____cacheline_aligned;
> >
> > __cacheline_aligned_in_smp.
> >
> > This all looks like an exotically large amount of code for a debug
> > thingy?
>
> this code checks the DMA usage of ~1 million lines of kernel code - all
> the DMA using drivers. I think Joerg's feature is hugely relevant as DMA
> scribbles are one of the hardest to debug kernel bugs: they can end up in
> permanent data corruption or other hard to find bugs. In fact i think his
> patchset is rather simple and even having 10 times as much debug code
> would pay for its existence in the long run.

Have we previously found bugs by other means which this facility would
have detected?  I don't recall any...
* Andrew Morton <akpm@linux-foundation.org> wrote:

> On Wed, 14 Jan 2009 12:43:47 +0100 Ingo Molnar <mingo@elte.hu> wrote:
>
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > > On Fri, 9 Jan 2009 17:19:17 +0100 Joerg Roedel <joerg.roedel@amd.com> wrote:
> > >
> > > > +struct hash_bucket {
> > > > +	struct list_head list;
> > > > +	spinlock_t lock;
> > > > +} ____cacheline_aligned;
> > >
> > > __cacheline_aligned_in_smp.
> > >
> > > This all looks like an exotically large amount of code for a debug
> > > thingy?
> >
> > this code checks the DMA usage of ~1 million lines of kernel code - all
> > the DMA using drivers. I think Joerg's feature is hugely relevant as DMA
> > scribbles are one of the hardest to debug kernel bugs: they can end up in
> > permanent data corruption or other hard to find bugs. In fact i think his
> > patchset is rather simple and even having 10 times as much debug code
> > would pay for its existence in the long run.
>
> Have we previously found bugs by other means which this facility would
> have detected?  I don't recall any...

this facility found a handful of real bugs already - Joerg, do you have
more specifics about that?

Also, such a facility would be useful during driver development, when
such bugs occur en masse. Faster driver development under Linux is
certainly desirable.

This facility basically adds a sandbox to all DMA ops and tracks the
lifetime and validity of DMA operations. Given how ugly DMA bugs can be
in practice and how hard to debug they are, i see this as a good step
forward.

	Ingo
* Andrew Morton <akpm@linux-foundation.org> wrote:

> On Wed, 14 Jan 2009 12:43:47 +0100 Ingo Molnar <mingo@elte.hu> wrote:
>
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > > On Fri, 9 Jan 2009 17:19:17 +0100 Joerg Roedel <joerg.roedel@amd.com> wrote:
> > >
> > > > +struct hash_bucket {
> > > > +	struct list_head list;
> > > > +	spinlock_t lock;
> > > > +} ____cacheline_aligned;
> > >
> > > __cacheline_aligned_in_smp.
> > >
> > > This all looks like an exotically large amount of code for a debug
> > > thingy?
> >
> > this code checks the DMA usage of ~1 million lines of kernel code - all
> > the DMA using drivers. I think Joerg's feature is hugely relevant as DMA
> > scribbles are one of the hardest to debug kernel bugs: they can end up in
> > permanent data corruption or other hard to find bugs. In fact i think his
> > patchset is rather simple and even having 10 times as much debug code
> > would pay for its existence in the long run.
>
> Have we previously found bugs by other means which this facility would
> have detected?  I don't recall any...

btw., during the past decade we have had countless very ugly driver DMA
bugs that took vendors months and specialized equipment to track down.

I cannot give you a number breakdown, only an impression: storage drivers
tended to be the hardest hit (due to the severity of the bugs and due to
their inherent complexity) - but DMA bugs in networking drivers can be
hard to track down too.

Plus there's another benefit: if a driver passes this checking and
there's still DMA related corruption observed, then the hardware /
firmware becomes a stronger suspect. This helps debugging too.

	Ingo
On Wed, 2009-01-14 at 09:39 -0800, Andrew Morton wrote:
> > this code checks the DMA usage of ~1 million lines of kernel code - all
> > the DMA using drivers. I think Joerg's feature is hugely relevant as DMA
> > scribbles are one of the hardest to debug kernel bugs: they can end up in
> > permanent data corruption or other hard to find bugs. In fact i think his
> > patchset is rather simple and even having 10 times as much debug code
> > would pay for its existence in the long run.
>
> Have we previously found bugs by other means which this facility would
> have detected?  I don't recall any...

Yes. We've found network driver bugs which only showed up when an IOMMU
was active -- and they were painful to diagnose because they involved
DMA going wrong. It happens when drivers make mistakes with the DMA
mapping APIs. With this debug facility, we can find such problems on
_any_ hardware, and get a sane report about what's wrong.
On Wed, Jan 14, 2009 at 09:39:18AM -0800, Andrew Morton wrote:
>
> Have we previously found bugs by other means which this facility would
> have detected?  I don't recall any...
>

See for example this bugfix:

	http://www.spinics.net/lists/netdev/msg83208.html

The bug was found using the first version of this patchset.

Joerg
On Wed, 14 Jan 2009 18:48:04 +0100 Ingo Molnar <mingo@elte.hu> wrote:

> * Andrew Morton <akpm@linux-foundation.org> wrote:
>
> > On Wed, 14 Jan 2009 12:43:47 +0100 Ingo Molnar <mingo@elte.hu> wrote:
> >
> > > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > > On Fri, 9 Jan 2009 17:19:17 +0100 Joerg Roedel <joerg.roedel@amd.com> wrote:
> > > >
> > > > > +struct hash_bucket {
> > > > > +	struct list_head list;
> > > > > +	spinlock_t lock;
> > > > > +} ____cacheline_aligned;
> > > >
> > > > __cacheline_aligned_in_smp.
> > > >
> > > > This all looks like an exotically large amount of code for a debug
> > > > thingy?
> > >
> > > this code checks the DMA usage of ~1 million lines of kernel code - all
> > > the DMA using drivers. I think Joerg's feature is hugely relevant as DMA
> > > scribbles are one of the hardest to debug kernel bugs: they can end up in
> > > permanent data corruption or other hard to find bugs. In fact i think his
> > > patchset is rather simple and even having 10 times as much debug code
> > > would pay for its existence in the long run.
> >
> > Have we previously found bugs by other means which this facility would
> > have detected?  I don't recall any...
>
> btw., during the past decade we have had countless very ugly driver DMA
> bugs that took vendors months and specialized equipment to track down.
>
> I cannot give you a number breakdown, only an impression: storage drivers
> tended to be the hardest hit (due to the severity of the bugs and due to
> their inherent complexity) - but DMA bugs in networking drivers can be
> hard to track down too.

We have had ugly DMA bugs in scsi drivers, but I think that this new
feature can find very few of them. I can't think of any SCSI driver bugs
that this could find. This feature can't find the common DMA bugs in
scsi drivers, such as a driver messing up its DMA descriptors when
building them from a scatter-gather list.

But this can find some kinds of DMA bugs, and it's just about 1,000
lines. I don't see any problem with merging this. Is 1,000 lines too
large? Maybe cutting just 1,000 lines up into too many pieces is what
makes it look deceptively large.
diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index d04f8b6..74a0f36 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -18,9 +18,14 @@
  */

 #include <linux/dma-debug.h>
+#include <linux/spinlock.h>
 #include <linux/types.h>
 #include <linux/list.h>

+#define HASH_SIZE     256
+#define HASH_FN_SHIFT 20
+#define HASH_FN_MASK  0xffULL
+
 enum {
 	dma_debug_single,
 	dma_debug_sg,
@@ -37,3 +42,99 @@ struct dma_debug_entry {
 	int direction;
 };

+struct hash_bucket {
+	struct list_head list;
+	spinlock_t lock;
+} ____cacheline_aligned;
+
+/* Hash list to save the allocated dma addresses */
+static struct hash_bucket dma_entry_hash[HASH_SIZE];
+
+/*
+ * Hash related functions
+ *
+ * Every DMA-API request is saved into a struct dma_debug_entry. To
+ * have quick access to these structs they are stored into a hash.
+ */
+static int hash_fn(struct dma_debug_entry *entry)
+{
+	/*
+	 * Hash function is based on the dma address.
+	 * We use bits 20-27 here as the index into the hash
+	 */
+	return (entry->dev_addr >> HASH_FN_SHIFT) & HASH_FN_MASK;
+}
+
+/*
+ * Request exclusive access to a hash bucket for a given dma_debug_entry.
+ */
+static struct hash_bucket *get_hash_bucket(struct dma_debug_entry *entry,
+					   unsigned long *flags)
+{
+	int idx = hash_fn(entry);
+	unsigned long __flags;
+
+	spin_lock_irqsave(&dma_entry_hash[idx].lock, __flags);
+	*flags = __flags;
+	return &dma_entry_hash[idx];
+}
+
+/*
+ * Give up exclusive access to the hash bucket
+ */
+static void put_hash_bucket(struct hash_bucket *bucket,
+			    unsigned long *flags)
+{
+	unsigned long __flags = *flags;
+
+	spin_unlock_irqrestore(&bucket->lock, __flags);
+}
+
+/*
+ * Search a given entry in the hash bucket list
+ */
+static struct dma_debug_entry *hash_bucket_find(struct hash_bucket *bucket,
+						struct dma_debug_entry *ref)
+{
+	struct dma_debug_entry *entry;
+
+	list_for_each_entry(entry, &bucket->list, list) {
+		if ((entry->dev_addr == ref->dev_addr) &&
+		    (entry->dev == ref->dev))
+			return entry;
+	}
+
+	return NULL;
+}
+
+/*
+ * Add an entry to a hash bucket
+ */
+static void hash_bucket_add(struct hash_bucket *bucket,
+			    struct dma_debug_entry *entry)
+{
+	list_add_tail(&entry->list, &bucket->list);
+}
+
+/*
+ * Remove entry from a hash bucket list
+ */
+static void hash_bucket_del(struct dma_debug_entry *entry)
+{
+	list_del(&entry->list);
+}
+
+/*
+ * Wrapper function for adding an entry to the hash.
+ * This function takes care of locking itself.
+ */
+static void add_dma_entry(struct dma_debug_entry *entry)
+{
+	struct hash_bucket *bucket;
+	unsigned long flags;
+
+	bucket = get_hash_bucket(entry, &flags);
+	hash_bucket_add(bucket, entry);
+	put_hash_bucket(bucket, &flags);
+}
Impact: implement necessary functions for the core hash

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 lib/dma-debug.c |  101 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 101 insertions(+), 0 deletions(-)