diff mbox series

[net-next,v5,01/21] lib/bitmap: add bitmap_{read, write}()

Message ID 20240201122216.2634007-2-aleksander.lobakin@intel.com
State Handled Elsewhere
Headers show
Series ice: add PFCP filter support | expand

Commit Message

Alexander Lobakin Feb. 1, 2024, 12:21 p.m. UTC
From: Syed Nayyar Waris <syednwaris@gmail.com>

The two new functions allow reading/writing values of length up to
BITS_PER_LONG bits at arbitrary position in the bitmap.

The code was taken from "bitops: Introduce the for_each_set_clump macro"
by Syed Nayyar Waris with a number of changes and simplifications:
 - instead of using roundup(), which adds an unnecessary dependency
   on <linux/math.h>, we calculate space as BITS_PER_LONG-offset;
 - indentation is reduced by not using else-clauses (suggested by
   checkpatch for bitmap_get_value());
 - bitmap_get_value()/bitmap_set_value() are renamed to bitmap_read()
   and bitmap_write();
 - some redundant computations are omitted.

Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Syed Nayyar Waris <syednwaris@gmail.com>
Signed-off-by: William Breathitt Gray <william.gray@linaro.org>
Link: https://lore.kernel.org/lkml/fe12eedf3666f4af5138de0e70b67a07c7f40338.1592224129.git.syednwaris@gmail.com/
Suggested-by: Yury Norov <yury.norov@gmail.com>
Co-developed-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/linux/bitmap.h | 77 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

Comments

Arnd Bergmann Feb. 1, 2024, 1:23 p.m. UTC | #1
On Thu, Feb 1, 2024, at 13:21, Alexander Lobakin wrote:
> From: Syed Nayyar Waris <syednwaris@gmail.com>
>
> The two new functions allow reading/writing values of length up to
> BITS_PER_LONG bits at arbitrary position in the bitmap.
>
> The code was taken from "bitops: Introduce the for_each_set_clump macro"
> by Syed Nayyar Waris with a number of changes and simplifications:
>  - instead of using roundup(), which adds an unnecessary dependency
>    on <linux/math.h>, we calculate space as BITS_PER_LONG-offset;
>  - indentation is reduced by not using else-clauses (suggested by
>    checkpatch for bitmap_get_value());
>  - bitmap_get_value()/bitmap_set_value() are renamed to bitmap_read()
>    and bitmap_write();
>  - some redundant computations are omitted.

These functions feel like they should not be inline but are
better off in lib/bitmap.c given their length.

As far as I can tell, the header ends up being included
indirectly almost everywhere, so just parsing these functions
likey adds not just dependencies but also compile time.

     Arnd
Alexander Potapenko Feb. 1, 2024, 1:45 p.m. UTC | #2
On Thu, Feb 1, 2024 at 2:23 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Thu, Feb 1, 2024, at 13:21, Alexander Lobakin wrote:
> > From: Syed Nayyar Waris <syednwaris@gmail.com>
> >
> > The two new functions allow reading/writing values of length up to
> > BITS_PER_LONG bits at arbitrary position in the bitmap.
> >
> > The code was taken from "bitops: Introduce the for_each_set_clump macro"
> > by Syed Nayyar Waris with a number of changes and simplifications:
> >  - instead of using roundup(), which adds an unnecessary dependency
> >    on <linux/math.h>, we calculate space as BITS_PER_LONG-offset;
> >  - indentation is reduced by not using else-clauses (suggested by
> >    checkpatch for bitmap_get_value());
> >  - bitmap_get_value()/bitmap_set_value() are renamed to bitmap_read()
> >    and bitmap_write();
> >  - some redundant computations are omitted.
>
> These functions feel like they should not be inline but are
> better off in lib/bitmap.c given their length.
>
> As far as I can tell, the header ends up being included
> indirectly almost everywhere, so just parsing these functions
> likey adds not just dependencies but also compile time.
>
>      Arnd

Removing particular functions from a header to reduce compilation time
does not really scale.
Do we know this case has a noticeable impact on the compilation time?
If yes, maybe we need to tackle this problem in a different way (e.g.
reduce the number of dependencies on it)?
Arnd Bergmann Feb. 1, 2024, 2:02 p.m. UTC | #3
On Thu, Feb 1, 2024, at 14:45, Alexander Potapenko wrote:
> On Thu, Feb 1, 2024 at 2:23 PM Arnd Bergmann <arnd@arndb.de> wrote:
>> On Thu, Feb 1, 2024, at 13:21, Alexander Lobakin wrote:
>>
>> As far as I can tell, the header ends up being included
>> indirectly almost everywhere, so just parsing these functions
>> likey adds not just dependencies but also compile time.
>>
>
> Removing particular functions from a header to reduce compilation time
> does not really scale.
> Do we know this case has a noticeable impact on the compilation time?
> If yes, maybe we need to tackle this problem in a different way (e.g.
> reduce the number of dependencies on it)?

Cleaning up the header dependencies is definitely possible in
theory, and there are other places we could start this, but
it's also a multi-year effort that several people have tried
without much success.

All I'm asking here is to not make it worse by adding this
one without need. If the function is not normally inlined
anyway, there is no benefit to having it in the header.

      Arnd
Alexander Lobakin Feb. 1, 2024, 3:49 p.m. UTC | #4
From: Arnd Bergmann <arnd@arndb.de>
Date: Thu, 01 Feb 2024 14:23:33 +0100

> On Thu, Feb 1, 2024, at 13:21, Alexander Lobakin wrote:
>> From: Syed Nayyar Waris <syednwaris@gmail.com>
>>
>> The two new functions allow reading/writing values of length up to
>> BITS_PER_LONG bits at arbitrary position in the bitmap.
>>
>> The code was taken from "bitops: Introduce the for_each_set_clump macro"
>> by Syed Nayyar Waris with a number of changes and simplifications:
>>  - instead of using roundup(), which adds an unnecessary dependency
>>    on <linux/math.h>, we calculate space as BITS_PER_LONG-offset;
>>  - indentation is reduced by not using else-clauses (suggested by
>>    checkpatch for bitmap_get_value());
>>  - bitmap_get_value()/bitmap_set_value() are renamed to bitmap_read()
>>    and bitmap_write();
>>  - some redundant computations are omitted.
> 
> These functions feel like they should not be inline but are
> better off in lib/bitmap.c given their length.

When their arguments are compile-time constants, they got optimized
well. They're also used on hotpath, so making them external could hurt
performance + taking the first sentence into account, making them
external will hurt the performance even more, 'cause they won't be then
as optimized by the compiler as they are now.

> 
> As far as I can tell, the header ends up being included
> indirectly almost everywhere, so just parsing these functions
> likey adds not just dependencies but also compile time.
> 
>      Arnd

Thanks,
Olek
Yury Norov Feb. 28, 2024, 4:10 p.m. UTC | #5
On Thu, Feb 01, 2024 at 03:02:50PM +0100, Arnd Bergmann wrote:
> On Thu, Feb 1, 2024, at 14:45, Alexander Potapenko wrote:
> > On Thu, Feb 1, 2024 at 2:23 PM Arnd Bergmann <arnd@arndb.de> wrote:
> >> On Thu, Feb 1, 2024, at 13:21, Alexander Lobakin wrote:
> >>
> >> As far as I can tell, the header ends up being included
> >> indirectly almost everywhere, so just parsing these functions
> >> likey adds not just dependencies but also compile time.
> >>
> >
> > Removing particular functions from a header to reduce compilation time
> > does not really scale.
> > Do we know this case has a noticeable impact on the compilation time?
> > If yes, maybe we need to tackle this problem in a different way (e.g.
> > reduce the number of dependencies on it)?
> 
> Cleaning up the header dependencies is definitely possible in
> theory, and there are other places we could start this, but
> it's also a multi-year effort that several people have tried
> without much success.
> 
> All I'm asking here is to not make it worse by adding this
> one without need. If the function is not normally inlined
> anyway, there is no benefit to having it in the header.
> 
>       Arnd

Hi Arnd,

I think Alexander has shown that the functions are normally inlined.
If for some target that doesn't hold, we'd use __always_inline.

They are very lightweight by nature - one or at max two word fetches
followed by some shifting. We spent quite some cycles making sure
that the generated code looks efficient, at least not worse than the
existing bitmap_{get,set}_value8(), which is a special case of the
bitmap_{read,write}.

I agree that bitmap header is overwhelmed (like many other kernel
headers), and I'm working on unloading it.

I checked allyesconfig build time before and after this patch, and
I found no difference for me. So if you're concerned about compilation
time, this patch doesn't make things worse in this department.

With all that, Alexander, can you please double-check that the
functions get inlined, and if so:

Signed-off-by: Yury Norov <yury.norov@gmail.com>
diff mbox series

Patch

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 99451431e4d6..7ca0379be8c1 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -79,6 +79,10 @@  struct device;
  *  bitmap_to_arr64(buf, src, nbits)            Copy nbits from buf to u64[] dst
  *  bitmap_get_value8(map, start)               Get 8bit value from map at start
  *  bitmap_set_value8(map, value, start)        Set 8bit value to map at start
+ *  bitmap_read(map, start, nbits)              Read an nbits-sized value from
+ *                                              map at start
+ *  bitmap_write(map, value, start, nbits)      Write an nbits-sized value to
+ *                                              map at start
  *
  * Note, bitmap_zero() and bitmap_fill() operate over the region of
  * unsigned longs, that is, bits behind bitmap till the unsigned long
@@ -636,6 +640,79 @@  static inline void bitmap_set_value8(unsigned long *map, unsigned long value,
 	map[index] |= value << offset;
 }
 
+/**
+ * bitmap_read - read a value of n-bits from the memory region
+ * @map: address to the bitmap memory region
+ * @start: bit offset of the n-bit value
+ * @nbits: size of value in bits, nonzero, up to BITS_PER_LONG
+ *
+ * Returns: value of @nbits bits located at the @start bit offset within the
+ * @map memory region. For @nbits = 0 and @nbits > BITS_PER_LONG the return
+ * value is undefined.
+ */
+static inline unsigned long bitmap_read(const unsigned long *map,
+					unsigned long start,
+					unsigned long nbits)
+{
+	size_t index = BIT_WORD(start);
+	unsigned long offset = start % BITS_PER_LONG;
+	unsigned long space = BITS_PER_LONG - offset;
+	unsigned long value_low, value_high;
+
+	if (unlikely(!nbits || nbits > BITS_PER_LONG))
+		return 0;
+
+	if (space >= nbits)
+		return (map[index] >> offset) & BITMAP_LAST_WORD_MASK(nbits);
+
+	value_low = map[index] & BITMAP_FIRST_WORD_MASK(start);
+	value_high = map[index + 1] & BITMAP_LAST_WORD_MASK(start + nbits);
+	return (value_low >> offset) | (value_high << space);
+}
+
+/**
+ * bitmap_write - write n-bit value within a memory region
+ * @map: address to the bitmap memory region
+ * @value: value to write, clamped to nbits
+ * @start: bit offset of the n-bit value
+ * @nbits: size of value in bits, nonzero, up to BITS_PER_LONG.
+ *
+ * bitmap_write() behaves as-if implemented as @nbits calls of __assign_bit(),
+ * i.e. bits beyond @nbits are ignored:
+ *
+ *   for (bit = 0; bit < nbits; bit++)
+ *           __assign_bit(start + bit, bitmap, val & BIT(bit));
+ *
+ * For @nbits == 0 and @nbits > BITS_PER_LONG no writes are performed.
+ */
+static inline void bitmap_write(unsigned long *map, unsigned long value,
+				unsigned long start, unsigned long nbits)
+{
+	size_t index;
+	unsigned long offset;
+	unsigned long space;
+	unsigned long mask;
+	bool fit;
+
+	if (unlikely(!nbits || nbits > BITS_PER_LONG))
+		return;
+
+	mask = BITMAP_LAST_WORD_MASK(nbits);
+	value &= mask;
+	offset = start % BITS_PER_LONG;
+	space = BITS_PER_LONG - offset;
+	fit = space >= nbits;
+	index = BIT_WORD(start);
+
+	map[index] &= (fit ? (~(mask << offset)) : ~BITMAP_FIRST_WORD_MASK(start));
+	map[index] |= value << offset;
+	if (fit)
+		return;
+
+	map[index + 1] &= BITMAP_FIRST_WORD_MASK(start + nbits);
+	map[index + 1] |= (value >> space);
+}
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __LINUX_BITMAP_H */